# Turing SM

## Configuration

This compute unit implements the Turing architecture. It consists of 4 blocks, each containing the execution units listed below. The compute unit also has multiple issue ports. Instructions scheduled onto different issue ports can execute in parallel, but only when the input instruction stream has enough instruction-level parallelism, that is, enough instructions that do not depend on each other's results.
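The effect of instruction-level parallelism on port utilization can be illustrated with a toy scheduling model. This is only a sketch: the function `cycles_needed`, the one-instruction-per-port-per-cycle limit, and the one-cycle latency are simplifying assumptions made for illustration, not documented properties of the hardware.

```python
# Toy model of issue ports (hypothetical; all parameters are assumptions):
# - each port can start at most one instruction per cycle
# - every instruction completes one cycle after it starts
# - an instruction may start only after all of its dependencies completed


def cycles_needed(instrs):
    """Return the number of cycles needed to execute `instrs`.

    `instrs` is a list of (port, deps) tuples, where `deps` lists the
    indices of earlier instructions whose results are required.
    """
    for i, (_port, deps) in enumerate(instrs):
        assert all(d < i for d in deps), "dependencies must point backwards"

    finish = {}  # instruction index -> cycle in which it executed
    remaining = list(range(len(instrs)))
    cycle = 0
    while remaining:
        cycle += 1
        busy_ports = set()
        for i in list(remaining):
            port, deps = instrs[i]
            ready = all(d in finish and finish[d] < cycle for d in deps)
            if ready and port not in busy_ports:
                busy_ports.add(port)
                finish[i] = cycle
                remaining.remove(i)
    return cycle


# Two FP instructions (port 0) and two INT instructions (port 1).
independent = [(0, []), (1, []), (0, []), (1, [])]
# Same instruction mix, but each instruction depends on the previous one.
dependent = [(0, []), (1, [0]), (0, [1]), (1, [2])]

print(cycles_needed(independent))  # both ports are kept busy
print(cycles_needed(dependent))    # the chain serializes execution
```

In this model the independent stream keeps both ports busy and finishes in half the cycles of the dependent chain, which is the sense in which separate issue ports only help when the input contains independent work.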

Data type | Issue port | Execution rate |
---|---|---|
FP32 | 0 | 16 lanes, executing 1 operation per cycle |
FP16 | 0 | 16 lanes, executing 2 operations per cycle |
FP64 | 0 | 1 lane, executing 1 operation per cycle |
INT32 | 1 | 16 lanes, executing 1 operation per cycle |
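As a worked example, the per-block figures in the table can be scaled up to the whole compute unit. The sketch below assumes that the execution rate applies to each lane individually and that all 4 blocks are identical. The names `UNITS` and `ops_per_cycle` are made up for this illustration.

```python
BLOCKS_PER_SM = 4  # from the configuration description

# data type -> (issue port, lanes per block, operations per lane per cycle)
# Values copied from the table; the per-lane reading is an assumption.
UNITS = {
    "FP32": (0, 16, 1),
    "FP16": (0, 16, 2),
    "FP64": (0, 1, 1),
    "INT32": (1, 16, 1),
}


def ops_per_cycle(dtype, blocks=BLOCKS_PER_SM):
    """Peak operations per cycle for one compute unit and one data type."""
    _port, lanes, ops_per_lane = UNITS[dtype]
    return blocks * lanes * ops_per_lane


for dtype in UNITS:
    print(f"{dtype}: {ops_per_cycle(dtype)} operations per cycle")
```

Under these assumptions the compute unit peaks at 64 FP32, 128 FP16, 4 FP64, and 64 INT32 operations per cycle. FP32, FP16, and FP64 share issue port 0, so their peaks cannot be combined, whereas INT32 work on port 1 can overlap with any of them.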

## Block diagram

[Block diagram: four identical blocks. In each block, issue port 0 feeds 16 FP lanes and 1 DP lane, and issue port 1 feeds 16 INT lanes.]