This compute unit implements the Ampere (HPC) architecture. It consists of 4 blocks, each containing the following execution units. Additionally, this compute unit has multiple issue ports. Instructions scheduled onto separate issue ports can execute in parallel, but they require some instruction-level parallelism in the input.
|Data type||Issue port||Execution rate|
|FP32||0||16 lanes, executing one operation per cycle|
|FP64||0||8 lanes, executing one operation per cycle|
|INT32||1||16 lanes, executing one operation per cycle|