GPU database

Turing SM

Configuration

This compute unit implements the Turing architecture. It consists of 4 blocks, each containing the following execution units. Additionally, this compute unit has multiple issue ports. Instructions scheduled onto separate issue ports can execute in parallel, but they require some instruction-level parallelism in the input.

Data type Issue port Execution rate
FP32 0 16 lanes, executing one operation per cycle
FP16 0 16 lanes, executing 2 operations/cycle
FP64 0 1 lane, executing one operation per cycle
INT32 1 16 lanes, executing one operation per cycle

Block diagram

FP
FP
FP
FP
FP
FP
FP
FP
FP
FP
FP
FP
FP
FP
FP
FP
DP
0
INT
INT
INT
INT
INT
INT
INT
INT
INT
INT
INT
INT
INT
INT
INT
INT
1
FP
FP
FP
FP
FP
FP
FP
FP
FP
FP
FP
FP
FP
FP
FP
FP
DP
0
INT
INT
INT
INT
INT
INT
INT
INT
INT
INT
INT
INT
INT
INT
INT
INT
1
FP
FP
FP
FP
FP
FP
FP
FP
FP
FP
FP
FP
FP
FP
FP
FP
DP
0
INT
INT
INT
INT
INT
INT
INT
INT
INT
INT
INT
INT
INT
INT
INT
INT
1
FP
FP
FP
FP
FP
FP
FP
FP
FP
FP
FP
FP
FP
FP
FP
FP
DP
0
INT
INT
INT
INT
INT
INT
INT
INT
INT
INT
INT
INT
INT
INT
INT
INT
1

ASICs

The following ASICs are using this compute unit:

Cards

The following cards are built using this compute unit: