Turing SM
Configuration
This compute unit implements the Turing architecture. It consists of 4 blocks, each containing the execution units listed below. Each block also has two issue ports, numbered 0 and 1. Instructions dispatched to different issue ports can execute in parallel, but only when the instruction stream contains enough instruction-level parallelism, that is, independent instructions that do not wait on each other's results.
Data type | Issue port | Execution rate (per block) |
---|---|---|
FP32 | 0 | 16 lanes, each executing one operation per cycle |
FP16 | 0 | 16 lanes, each executing 2 operations per cycle |
FP64 | 0 | 1 lane, executing one operation per cycle |
INT32 | 1 | 16 lanes, each executing one operation per cycle |
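The per-block rates in the table can be scaled to the whole compute unit by multiplying by its 4 blocks. The Python sketch below does that arithmetic using the table's values. The `Unit` record and the function names are hypothetical, and `warp_issue_cycles` assumes the 32-thread warp that NVIDIA GPUs use, with the simplification that one warp instruction is pushed through a unit's lanes in ceil(32 / lanes) passes.

```python
import math
from dataclasses import dataclass

BLOCKS_PER_SM = 4  # from the configuration text: 4 blocks per compute unit

@dataclass(frozen=True)
class Unit:
    """Hypothetical record for one row of the table (per block)."""
    port: int
    lanes: int
    ops_per_lane_per_cycle: int

# Values copied from the table above.
UNITS = {
    "FP32": Unit(port=0, lanes=16, ops_per_lane_per_cycle=1),
    "FP16": Unit(port=0, lanes=16, ops_per_lane_per_cycle=2),
    "FP64": Unit(port=0, lanes=1, ops_per_lane_per_cycle=1),
    "INT32": Unit(port=1, lanes=16, ops_per_lane_per_cycle=1),
}

def peak_ops_per_cycle(dtype: str, blocks: int = BLOCKS_PER_SM) -> int:
    """Peak operations per cycle for the whole compute unit (all blocks)."""
    u = UNITS[dtype]
    return blocks * u.lanes * u.ops_per_lane_per_cycle

def warp_issue_cycles(dtype: str, warp_size: int = 32) -> int:
    """Passes one block needs to run a full warp through a unit's lanes.

    Simplified model: ignores packing and pipeline latency.
    """
    return math.ceil(warp_size / UNITS[dtype].lanes)

if __name__ == "__main__":
    for name in UNITS:
        print(name, peak_ops_per_cycle(name), warp_issue_cycles(name))
    # FP32 64 2
    # FP16 128 2
    # FP64 4 32
    # INT32 64 2
```

Under this simple model, a 32-thread warp needs two passes through the 16 FP32 or INT32 lanes, while the single FP64 lane needs 32.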
Block diagram
[Figure: four identical blocks. Each block shows 16 FP lanes and 1 DP lane on issue port 0, and 16 INT lanes on issue port 1.]
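The configuration text notes that instructions on separate issue ports run in parallel only when the instruction stream has instruction-level parallelism. The toy scheduler below makes that concrete for a single block. It is a deliberate simplification, not Turing's actual issue logic: it assumes in-order issue of at most one instruction per cycle, a port that stays busy for 2 cycles per warp instruction (32 threads over 16 lanes), and results that are usable as soon as the port frees, with no extra pipeline latency. `Instr` and `schedule` are hypothetical names.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Instr:
    """One warp instruction in a hypothetical single-block model."""
    name: str
    port: int          # 0 = FP32/FP16/FP64 pipe, 1 = INT32 pipe (from the table)
    busy: int = 2      # cycles the port is occupied: 32 threads / 16 lanes
    deps: tuple = ()   # names of earlier instructions whose results this one reads

def schedule(program):
    """Return the cycle at which the last result becomes available.

    Assumptions: in-order issue, at most one instruction issued per cycle,
    and a result is usable the moment its port becomes free.
    """
    port_free = {0: 0, 1: 0}  # earliest cycle each port can accept work
    done = {}                 # instruction name -> cycle its result is ready
    next_issue = 0            # earliest cycle the next instruction may issue
    for ins in program:
        ready = max((done[d] for d in ins.deps), default=0)
        t = max(next_issue, port_free[ins.port], ready)
        port_free[ins.port] = t + ins.busy
        done[ins.name] = t + ins.busy
        next_issue = t + 1
    return max(done.values(), default=0)

if __name__ == "__main__":
    # Independent FP and INT instructions: the two ports overlap.
    ilp = [Instr("fp0", 0), Instr("int0", 1), Instr("fp1", 0), Instr("int1", 1)]
    # Same mix, but each instruction reads the previous result: no overlap.
    chain = [
        Instr("fp0", 0),
        Instr("int0", 1, deps=("fp0",)),
        Instr("fp1", 0, deps=("int0",)),
        Instr("int1", 1, deps=("fp1",)),
    ]
    print(schedule(ilp))    # 5
    print(schedule(chain))  # 8
```

In this model, interleaving independent FP and INT instructions finishes in 5 cycles instead of 8, while a dependency chain gains nothing from the second port.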