Shared memory for GPU kernels

At the GPU hackathon back in June we learned that the `calculate_interior_source_terms` kernel was a bottleneck because each thread required a lot of registers. It could benefit greatly from shared memory, which would reduce register pressure and allow more threads to run at a time.

Some preliminary work has been done in PR https://github.com/climate-machine/Oceananigans.jl/pull/293

@vchuravy has an `@stencil` abstraction in development at https://github.com/vchuravy/GPUifyLoops.jl/pull/81

But it would be good to implement plain shared memory without an abstraction first and see how much of a performance boost we get, especially with LES closures.
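For reference, here is a minimal sketch of what "plain shared memory without an abstraction" could look like. It assumes the current CUDA.jl package and a GPU, and it uses a hypothetical 1D periodic second-difference kernel (`laplacian_shared!`, `TILE`, and the array names are all made up for illustration). It is not the actual `calculate_interior_source_terms` kernel. Each block stages its tile plus one halo point per side into shared memory, so each neighbor value is read from global memory once per block instead of once per thread.

```julia
using CUDA

# Threads per block; an assumed value, not tuned for any real kernel.
const TILE = 256

# Hypothetical kernel: out[i] = u[i-1] - 2u[i] + u[i+1] with periodic wrap.
function laplacian_shared!(out, u, N)
    # Per-block shared memory: the tile plus one halo point on each side.
    tile = CuStaticSharedArray(Float32, TILE + 2)

    t = threadIdx().x
    i = (blockIdx().x - 1) * blockDim().x + t   # global index

    if i <= N
        # Every thread loads its own point into the interior of the tile.
        tile[t + 1] = u[i]
        # The first thread of the block also loads the left halo point.
        if t == 1
            tile[1] = u[i == 1 ? N : i - 1]
        end
        # The last active thread of the block also loads the right halo point.
        if t == blockDim().x || i == N
            tile[t + 2] = u[i == N ? 1 : i + 1]
        end
    end

    # Wait until all loads are done before any thread reads its neighbors.
    sync_threads()

    if i <= N
        out[i] = tile[t] - 2 * tile[t + 1] + tile[t + 2]
    end
    return nothing
end

N = 1024
u = CUDA.rand(Float32, N)
out = CUDA.zeros(Float32, N)
@cuda threads=TILE blocks=cld(N, TILE) laplacian_shared!(out, u, N)

# Compare against a CPU reference computed with circshift.
h = Array(u)
@assert Array(out) ≈ circshift(h, 1) .- 2 .* h .+ circshift(h, -1)
```

This only shows the mechanics (tile load, halo load, `sync_threads()`, then compute). Whether it helps the real kernel would still need to be measured, since a 3D stencil with LES closures needs a much larger halo and more fields per tile.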

Shared memory for GPU kernels #442
