# Ampere SM

## Configuration

This compute unit implements the Ampere (HPC) architecture. It consists of 4 blocks, each containing the execution units listed below. The compute unit also has multiple issue ports: instructions scheduled onto separate issue ports can execute in parallel, provided the input has enough instruction-level parallelism to keep more than one port busy.

Data type | Issue port | Execution rate |
---|---|---|
FP32 | 0 | 16 lanes, executing one operation per cycle |
FP64 | 0 | 8 lanes, executing one operation per cycle |
INT32 | 1 | 16 lanes, executing one operation per cycle |
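The figures in the table can be turned into rough per-cycle estimates. The following is a minimal sketch, not a description of the actual hardware scheduler. It assumes a warp width of 32 threads (not stated on this page), assumes that instructions on the same issue port run back to back while instructions on different ports overlap perfectly, and uses hypothetical helper names (`peak_ops_per_cycle`, `cycles_per_warp_instruction`, `block_cycles`) chosen only for illustration.

```python
# Per-block configuration copied from the table: data type -> (issue port, lanes).
UNITS = {
    "FP32": (0, 16),
    "FP64": (0, 8),
    "INT32": (1, 16),
}
BLOCKS = 4      # number of blocks per compute unit, from the text
WARP_SIZE = 32  # ASSUMPTION: warp width, not stated on this page


def peak_ops_per_cycle(dtype):
    """Peak operations per cycle for the whole compute unit (all blocks)."""
    _, lanes = UNITS[dtype]
    return BLOCKS * lanes


def cycles_per_warp_instruction(dtype):
    """Cycles one block's lanes are occupied by one warp-wide instruction."""
    _, lanes = UNITS[dtype]
    return -(-WARP_SIZE // lanes)  # ceiling division


def block_cycles(mix):
    """Idealized lower bound on cycles for one block to run an instruction mix.

    `mix` maps data type -> number of warp-wide instructions.
    ASSUMPTION: work on the same port is serialized, work on different
    ports overlaps perfectly (i.e. the input has enough independent
    instructions to keep both ports busy).
    """
    port_busy = {}
    for dtype, count in mix.items():
        port, _ = UNITS[dtype]
        busy = count * cycles_per_warp_instruction(dtype)
        port_busy[port] = port_busy.get(port, 0) + busy
    return max(port_busy.values(), default=0)


if __name__ == "__main__":
    for dtype in UNITS:
        print(dtype, peak_ops_per_cycle(dtype), "ops/cycle per compute unit")
    # FP32 and INT32 use different ports, so they can overlap in this model.
    print(block_cycles({"FP32": 10, "INT32": 10}))
    # FP32 and FP64 share port 0, so they cannot overlap in this model.
    print(block_cycles({"FP32": 10, "FP64": 10}))
```

Under these assumptions, a mix of FP32 and INT32 instructions costs no more than the busier of the two ports, while a mix of FP32 and FP64 instructions pays for both because they share issue port 0.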

## Block diagram

[Block diagram: 4 identical blocks. Each block contains 16 FP lanes and 8 DP lanes on issue port 0, and 16 INT lanes on issue port 1.]