This compute unit implements the Turing architecture. It consists of 4 blocks, each containing the execution units listed in the table below. The compute unit also has multiple issue ports: instructions dispatched to different issue ports can execute in parallel, but only when the instruction stream contains enough independent instructions (instruction-level parallelism) to keep more than one port busy.
| Data type | Issue port | Execution rate (per block) |
|---|---|---|
| FP32 | 0 | 16 lanes, 1 operation per lane per cycle |
| FP16 | 0 | 16 lanes, 2 operations per lane per cycle |
| FP64 | 0 | 1 lane, 1 operation per lane per cycle |
| INT32 | 1 | 16 lanes, 1 operation per lane per cycle |
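The table can be turned into peak throughput numbers for the whole compute unit, and the issue-port rule can be expressed as a simple cycle-count model. The sketch below is a hypothetical, idealized model, not a description of real Turing hardware: the names `UNITS`, `peak_ops_per_cycle`, and `cycles_needed` are illustrative, FP16 is assumed to mean 2 operations per lane per cycle, and latency, warp scheduling, and memory traffic are ignored. Data types that share an issue port run back to back; different ports overlap only when there is enough instruction-level parallelism.

```python
import math
from dataclasses import dataclass

# From the text: one compute unit has 4 blocks with identical execution units.
BLOCKS_PER_UNIT = 4


@dataclass(frozen=True)
class ExecUnit:
    issue_port: int
    lanes: int
    ops_per_lane_per_cycle: int


# Per-block execution units, transcribed from the table above.
UNITS = {
    "FP32": ExecUnit(issue_port=0, lanes=16, ops_per_lane_per_cycle=1),
    "FP16": ExecUnit(issue_port=0, lanes=16, ops_per_lane_per_cycle=2),
    "FP64": ExecUnit(issue_port=0, lanes=1, ops_per_lane_per_cycle=1),
    "INT32": ExecUnit(issue_port=1, lanes=16, ops_per_lane_per_cycle=1),
}


def peak_ops_per_cycle(data_type: str, blocks: int = BLOCKS_PER_UNIT) -> int:
    """Peak operations per cycle for the whole compute unit, one data type."""
    u = UNITS[data_type]
    return blocks * u.lanes * u.ops_per_lane_per_cycle


def cycles_needed(op_counts: dict[str, int], ports_overlap: bool) -> int:
    """Idealized cycle count for a bag of operations (hypothetical model).

    Data types sharing an issue port run back to back. Different ports run
    concurrently only if ports_overlap is True, i.e. the instruction stream has
    enough independent instructions (ILP); otherwise port busy times add up.
    Latency, warp scheduling, and memory traffic are ignored.
    """
    busy: dict[int, int] = {}
    for data_type, count in op_counts.items():
        port = UNITS[data_type].issue_port
        busy[port] = busy.get(port, 0) + math.ceil(count / peak_ops_per_cycle(data_type))
    if not busy:
        return 0
    return max(busy.values()) if ports_overlap else sum(busy.values())


if __name__ == "__main__":
    for t in UNITS:
        print(f"{t:>5}: {peak_ops_per_cycle(t)} ops/cycle per compute unit")
    mix = {"FP32": 6400, "INT32": 6400}
    print("FP32+INT32, with ILP:   ", cycles_needed(mix, ports_overlap=True))
    print("FP32+INT32, without ILP:", cycles_needed(mix, ports_overlap=False))
```

Under these assumptions the compute unit peaks at 64 FP32, 128 FP16, 4 FP64, and 64 INT32 operations per cycle. A mix of 6400 FP32 and 6400 INT32 operations takes 100 cycles when ports overlap and 200 when they do not, because the two types sit on different ports. A mix of 6400 FP32 and 6400 FP16 operations takes 150 cycles either way: both types use port 0, so extra ILP cannot help.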