This compute unit implements the Ampere architecture. It consists of 4 blocks, each containing the execution units listed below. The compute unit also has multiple issue ports: instructions scheduled onto separate issue ports can execute in parallel, provided the instruction stream contains enough independent instructions (instruction-level parallelism) to occupy more than one port.
|Data type||Issue port||Execution rate|
|FP32||0, 1||16 lanes, executing one operation per cycle|
|FP16||0||16 lanes, executing two operations per cycle|
|INT32||0||16 lanes, executing one operation per cycle|
|FP64||0||2 lanes, executing one operation per cycle. This unit is not instantiated within each block; a single instance is shared by the whole compute unit.|
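
The table can be turned into per-cycle peak figures for the whole compute unit. The following is a minimal sketch, not a vendor formula. It assumes that the "16 lanes" listed for FP32 applies to each of the two issue ports (the table does not say so explicitly) and that every unit except FP64 exists once per block. The names `BLOCKS_PER_CU`, `UNITS` and `peak_ops_per_cycle` are made up for this illustration.

```python
# Peak operations per cycle for one compute unit, derived from the table.
# Assumption: the lane count applies per issue port, so FP32 (ports 0 and 1)
# has 2 x 16 lanes per block. The table does not state this explicitly.

BLOCKS_PER_CU = 4  # "It consists of 4 blocks"

UNITS = {
    # lanes: lanes per issue port; ops_per_lane: operations per lane per cycle
    # ports: number of issue ports feeding this data type
    # per_block: False if a single instance serves the whole compute unit
    "FP32":  {"lanes": 16, "ops_per_lane": 1, "ports": 2, "per_block": True},
    "FP16":  {"lanes": 16, "ops_per_lane": 2, "ports": 1, "per_block": True},
    "INT32": {"lanes": 16, "ops_per_lane": 1, "ports": 1, "per_block": True},
    "FP64":  {"lanes": 2,  "ops_per_lane": 1, "ports": 1, "per_block": False},
}


def peak_ops_per_cycle(unit):
    """Return the peak operations per cycle for one compute unit."""
    per_instance = unit["lanes"] * unit["ops_per_lane"] * unit["ports"]
    instances = BLOCKS_PER_CU if unit["per_block"] else 1
    return per_instance * instances


PEAK = {name: peak_ops_per_cycle(unit) for name, unit in UNITS.items()}

if __name__ == "__main__":
    for name, ops in PEAK.items():
        print(f"{name}: {ops} operations per cycle per compute unit")
```

Under these assumptions the FP32 figure is only reachable when both issue ports are kept busy, which requires independent instructions in the input. A fully dependent chain of FP32 instructions would be limited to a single port. Multiplying any of these figures by the clock frequency and the number of compute units gives a chip-level peak rate.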