Countering Load-to-Use Stalls in the NVIDIA Turing GPU

Abstract:

Among its various improvements over prior NVIDIA GPUs, the NVIDIA Turing GPU introduces four key performance enhancements that counter memory load-to-use stalls. First, reduced latency on L1 hits for global memory loads lowers average memory lookup latency. Next, the ability to dynamically configure the L1 data RAM between cacheable memory and scratchpad (shared) memory enables driver software to opportunistically maximize L1 data cache size for programs with low shared memory requirements, increasing L1 hits and reducing load-to-use stalls. Finally, the twin enhancements of doubling the vector register file capacity and adding a dedicated scalar (uniform) register file with a uniform datapath ease vector register pressure and enable higher warp-level parallelism, leading to better latency tolerance. We find that these enhancements combined deliver an average speedup of 11% on modern gaming applications.
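The reasoning behind these enhancements can be illustrated with a small back-of-the-envelope model. The sketch below is not taken from the article: the function names, the simple two-level latency model, and every number in the usage lines are hypothetical, chosen only to show how a higher L1 hit rate lowers expected load latency and how register file capacity bounds the number of resident warps available to hide that latency.

```python
def average_load_latency(l1_hit_rate, l1_hit_cycles, l1_miss_cycles):
    """Expected load latency in cycles under a simple hit/miss model (assumption)."""
    return l1_hit_rate * l1_hit_cycles + (1.0 - l1_hit_rate) * l1_miss_cycles


def resident_warps(regfile_bytes, regs_per_thread, max_warps,
                   threads_per_warp=32, bytes_per_reg=4):
    """Warps that fit in the vector register file, capped by a scheduler limit.

    All parameters are illustrative; they are not Turing specifications.
    """
    bytes_per_warp = regs_per_thread * threads_per_warp * bytes_per_reg
    return min(max_warps, regfile_bytes // bytes_per_warp)


if __name__ == "__main__":
    # Hypothetical latencies: a larger L1 raises the hit rate from 50% to 75%.
    print(average_load_latency(0.50, 30, 300))
    print(average_load_latency(0.75, 30, 300))

    # Hypothetical register file: doubling capacity doubles the warps that fit.
    print(resident_warps(64 * 1024, 64, max_warps=16))
    print(resident_warps(128 * 1024, 64, max_warps=16))

    # Moving warp-uniform values to a separate register file is modeled here
    # as needing fewer vector registers per thread (64 -> 56, an assumption).
    print(resident_warps(64 * 1024, 56, max_warps=16))
```

In this toy model, more resident warps give the scheduler more independent work to issue while a load is outstanding, which is the latency-tolerance effect the abstract attributes to the register file changes.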

Published in: IEEE Micro (Volume: 40, Issue: 6, Nov.-Dec. 2020)

Page(s): 59 - 66

Date of Publication: 28 July 2020

DOI: 10.1109/MM.2020.3012514
