Shared memory for GPU kernels #442

@ali-ramadhan

At the GPU hackathon way back in June we learned that the calculate_interior_source_terms kernel was a bottleneck as each thread required a lot of registers. It could benefit greatly from shared memory to reduce register pressure and allow more threads to run at a time.

Some preliminary work has been done in PR #293.

@vchuravy has an @stencil abstraction in development at vchuravy/GPUifyLoops.jl#81

But it would be good to implement plain shared memory without an abstraction first and measure how much of a performance boost we get, especially with LES closures.
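To make "plain shared memory without an abstraction" concrete, here is a minimal sketch of the pattern on a toy 1D stencil. It is not the `calculate_interior_source_terms` kernel: the kernel name `laplacian_shmem!`, the tile size `TILE`, and the clamped boundary handling are all hypothetical choices for illustration, and it assumes the current CUDA.jl API (`CuStaticSharedArray`, `sync_threads`) rather than whatever the codebase used at the time.

```julia
using CUDA

# Threads per block. Assumed value; the real kernel would need tuning.
const TILE = 256

# Toy stencil: out[i] = inp[i-1] - 2*inp[i] + inp[i+1], with indices clamped
# at the domain edges. Each block stages its tile plus one halo cell per side
# in shared memory, so each input value is read from global memory about once
# per block instead of three times.
function laplacian_shmem!(out, inp, n)
    tile = CuStaticSharedArray(Float32, TILE + 2)
    t = threadIdx().x
    i = (blockIdx().x - 1) * blockDim().x + t   # global index of this thread

    # Every thread loads one cell (clamped so out-of-range threads stay safe).
    tile[t + 1] = inp[min(i, n)]
    # The first and last threads of the block also load the halo cells.
    if t == 1
        tile[1] = inp[max(i - 1, 1)]
    end
    if t == blockDim().x
        tile[t + 2] = inp[min(i + 1, n)]
    end

    # All threads must reach the barrier before anyone reads a neighbour's cell.
    sync_threads()

    if i <= n
        out[i] = tile[t] - 2f0 * tile[t + 1] + tile[t + 2]
    end
    return nothing
end

# Host-side wrapper (hypothetical helper): copy in, launch, copy out.
function laplacian_gpu(x::Vector{Float32})
    n = length(x)
    inp = CuArray(x)
    out = CUDA.zeros(Float32, n)
    @cuda threads=TILE blocks=cld(n, TILE) laplacian_shmem!(out, inp, n)
    return Array(out)
end

# CPU reference with the same clamped boundaries, for checking the kernel.
function laplacian_cpu(x)
    n = length(x)
    return [x[max(i - 1, 1)] - 2 * x[i] + x[min(i + 1, n)] for i in 1:n]
end
```

On a machine with a working GPU, `laplacian_gpu(x) ≈ laplacian_cpu(x)` should hold for any `Float32` vector `x`. For the real 3D kernel the same structure would apply with a 2D or 3D tile and wider halos, and the thing to measure is whether the register count per thread actually drops.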

Labels: GPU 👾 (Where Oceananigans gets its powers from), performance 🏍️ (So we can get the wrong answer even faster)
