Ampere
Configuration
This compute unit implements the Ampere architecture. It consists of 4 blocks, each containing the execution units listed below. The compute unit also has multiple issue ports: instructions scheduled onto separate issue ports can execute in parallel, provided the input has enough instruction-level parallelism to supply independent instructions.
Data type | Issue port | Execution rate |
---|---|---|
FP32 | 0, 1 | 16 lanes, executing one operation per cycle |
FP16 | 0 | 16 lanes, executing two operations per cycle |
INT32 | 0 | 16 lanes, executing one operation per cycle |
FP64 | 0 | 2 lanes, executing one operation per cycle. This unit is not instantiated within each block; a single instance is shared by the whole compute unit. |
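The table above can be turned into peak per-cycle throughput figures for one compute unit. The following is a minimal sketch, not a vendor formula: it assumes that "16 lanes" means 16 lanes per issue port (which is how the block diagram draws them), and the names `ExecUnit`, `UNITS`, and `peak_ops_per_cycle` are hypothetical helpers introduced only for this illustration.

```python
from dataclasses import dataclass

BLOCKS_PER_COMPUTE_UNIT = 4  # from the text: the compute unit consists of 4 blocks


@dataclass(frozen=True)
class ExecUnit:
    issue_ports: int   # number of issue ports that accept this data type
    lanes: int         # lanes per issue port (assumption, see lead-in)
    ops_per_lane: int  # operations per lane per cycle
    per_block: bool    # False when the unit exists once per compute unit


# Values transcribed from the table; the per-port reading of "lanes" is an assumption.
UNITS = {
    "FP32": ExecUnit(issue_ports=2, lanes=16, ops_per_lane=1, per_block=True),
    "FP16": ExecUnit(issue_ports=1, lanes=16, ops_per_lane=2, per_block=True),
    "INT32": ExecUnit(issue_ports=1, lanes=16, ops_per_lane=1, per_block=True),
    "FP64": ExecUnit(issue_ports=1, lanes=2, ops_per_lane=1, per_block=False),
}


def peak_ops_per_cycle(unit: ExecUnit) -> int:
    """Peak operations per cycle for one compute unit, ignoring all stalls."""
    copies = BLOCKS_PER_COMPUTE_UNIT if unit.per_block else 1
    return copies * unit.issue_ports * unit.lanes * unit.ops_per_lane


PEAK = {name: peak_ops_per_cycle(unit) for name, unit in UNITS.items()}

for name, ops in PEAK.items():
    print(f"{name}: {ops} operations per cycle per compute unit")
```

Under these assumptions FP32 and FP16 both peak at 128 operations per cycle, INT32 at 64, and FP64 at 2. The FP32 figure is only reachable when both issue ports are kept busy.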
Block diagram
(Diagram: four blocks. In each block, issue port 0 feeds 16 ALU lanes and issue port 1 feeds 16 FP lanes. The 2 DP lanes appear once, on port 0 of the first block.)
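To illustrate why the second issue port only helps when the input has instruction-level parallelism, here is a toy issue model for a single block. It is a sketch under loud assumptions, not a description of the real scheduler: each port issues at most one instruction per cycle, every instruction has a latency of one cycle, and the scheduler greedily picks any ready instruction. The port assignments come from the table in the Configuration section; `PORTS_FOR` and `cycles_to_issue` are hypothetical names.

```python
# Issue ports per data type, taken from the Configuration table.
PORTS_FOR = {"FP32": (0, 1), "FP16": (0,), "INT32": (0,), "FP64": (0,)}


def cycles_to_issue(program):
    """Count cycles needed to issue `program` in the toy model.

    `program` is a list of (data_type, deps) tuples, where `deps` holds the
    indices of EARLIER instructions whose results this instruction needs.
    """
    for idx, (_, deps) in enumerate(program):
        if any(d >= idx for d in deps):
            raise ValueError("dependencies must point at earlier instructions")

    issued_in = {}                       # instruction index -> issue cycle
    pending = list(range(len(program)))
    cycle = 0
    while pending:
        busy_ports = set()
        issued_now = []
        for idx in pending:
            data_type, deps = program[idx]
            # Ready only if every dependency was issued in an earlier cycle.
            if any(d not in issued_in for d in deps):
                continue
            free = [p for p in PORTS_FOR[data_type] if p not in busy_ports]
            if free:
                # Prefer the highest-numbered port so port 0 stays available
                # for data types that can only use port 0.
                busy_ports.add(free[-1])
                issued_now.append(idx)
        for idx in issued_now:
            issued_in[idx] = cycle
        pending = [i for i in pending if i not in issued_in]
        cycle += 1
    return cycle


independent_fp32 = [("FP32", ()), ("FP32", ()), ("FP32", ()), ("FP32", ())]
dependent_fp32 = [("FP32", ()), ("FP32", (0,)), ("FP32", (1,)), ("FP32", (2,))]
mixed = [("FP32", ()), ("INT32", ()), ("FP32", ()), ("INT32", ())]
independent_int32 = [("INT32", ()), ("INT32", ()), ("INT32", ()), ("INT32", ())]

print(cycles_to_issue(independent_fp32))   # both ports used
print(cycles_to_issue(dependent_fp32))     # serialized by the dependency chain
print(cycles_to_issue(mixed))              # FP32 on port 1, INT32 on port 0
print(cycles_to_issue(independent_int32))  # INT32 is limited to port 0
```

In this model, four independent FP32 instructions issue in two cycles, while the same four instructions chained by dependencies take four. Independent INT32 instructions also take four cycles, because only port 0 accepts them.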