Closed
Labels: GPU 👾 (Where Oceananigans gets its powers from), performance 🏍️ (So we can get the wrong answer even faster)
Description
At the GPU hackathon way back in June we learned that the calculate_interior_source_terms kernel is a bottleneck because each thread requires a lot of registers, which limits how many threads can be resident at once. It could benefit greatly from shared memory to reduce register pressure and allow more threads to run at a time.
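To make the register-pressure argument concrete, here is a rough back-of-the-envelope sketch in Python. The hardware numbers (65536 registers and 2048 resident threads per SM, warps of 32) are illustrative assumptions, not measurements of this kernel or of any particular GPU we run on. The function names are hypothetical, and the real allocator has extra granularity rules that are ignored here.

```python
# Rough upper bound on resident threads per SM when registers are the limit.
# All hardware numbers below are illustrative assumptions, not measurements.

def max_resident_threads(regs_per_thread, regs_per_sm=65536,
                         max_threads_per_sm=2048, warp_size=32):
    """Threads that fit on one SM if registers were the only constraint."""
    by_registers = regs_per_sm // regs_per_thread
    # Threads are scheduled in whole warps, so round down to a warp multiple.
    by_registers -= by_registers % warp_size
    return min(by_registers, max_threads_per_sm)


def occupancy(regs_per_thread, regs_per_sm=65536,
              max_threads_per_sm=2048, warp_size=32):
    """Fraction of the SM's thread slots that can actually be filled."""
    resident = max_resident_threads(regs_per_thread, regs_per_sm,
                                    max_threads_per_sm, warp_size)
    return resident / max_threads_per_sm


if __name__ == "__main__":
    for regs in (32, 64, 128, 255):
        print(regs, max_resident_threads(regs), occupancy(regs))
```

Under these assumptions, every doubling of registers per thread beyond 32 halves the number of threads that can be resident, which is why cutting register use should let more threads run at a time.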
Some preliminary work has been done in PR #293
@vchuravy has an @stencil abstraction in development at vchuravy/GPUifyLoops.jl#81
But it would be good to implement plain shared memory without an abstraction first and measure how much of a performance boost we get, especially with LES closures.
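For reference, the access pattern that "plain shared memory" implies can be sketched on the CPU. This is only a NumPy emulation, assuming a periodic 1D second-difference stencil. It is not Oceananigans or GPUifyLoops code, and the function names and tile size are hypothetical. On a GPU the small buffer would be a shared memory array, with a thread barrier between the load and the compute.

```python
import numpy as np


def laplacian_direct(u):
    """Periodic 1D second difference, reading neighbours from the full array."""
    return np.roll(u, -1) - 2 * u + np.roll(u, 1)


def laplacian_tiled(u, tile=8):
    """Same stencil, but each 'block' first copies its tile plus a one-point
    halo into a small buffer (standing in for shared memory) and then computes
    only from that buffer."""
    n = u.size
    out = np.empty_like(u)
    for start in range(0, n, tile):
        stop = min(start + tile, n)
        # Periodic indices of the tile plus one halo point on each side.
        idx = np.arange(start - 1, stop + 1) % n
        shared = u[idx]  # load once from "global memory", then reuse
        out[start:stop] = shared[2:] - 2 * shared[1:-1] + shared[:-2]
    return out


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    u = rng.standard_normal(37)
    print(np.allclose(laplacian_direct(u), laplacian_tiled(u)))
```

The point of the tiled version is that each value is fetched from the big array once per tile and then reused by neighbouring points. Whether that translates into fewer registers and a real speedup for our kernels is exactly what would need measuring.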
glwagner