37% of the time is spent allocating/freeing memory #302

Open
Shnatsel opened this issue May 3, 2024 · 3 comments
Labels
optimization Something can be done faster/better

Comments

@Shnatsel
Contributor

Shnatsel commented May 3, 2024

I've run /usr/bin/time target/release/jxl-oxide decode on the attached image and noticed that the time spent in the kernel is very high: 0.90user 0.54system 0:00.53elapsed 271%CPU

Measured on commit c8d528d

Test image: Puente_de_Don_Luis_I,_Oporto,_Portugal,_2019-06-02,_DD_29-31_HDR.jxl.gz

So I used perf to investigate, here's the profile: https://share.firefox.dev/4a42u7I

The yellow parts are in userspace, the orange parts are in the kernel. You can see that the main thread does almost nothing but deallocate memory. The worker threads all end up spending a lot of time allocating it. That doesn't happen all at once: the memory gets provisioned gradually, so the cost is spread out across the execution, but lots of functions bottom out in the kernel's memory management code.

Reusing allocations to avoid calling into the memory allocator all the time would dramatically improve performance.
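
To illustrate what reusing allocations could look like, here is a minimal sketch of a scratch-buffer pool. The names `BufferPool`, `take`, `give_back` and `process_groups` are hypothetical and are not taken from jxl-oxide; the point is only that a returned `Vec` keeps its capacity, so the next request of the same or smaller size does not go back to the allocator.

```rust
/// Hypothetical scratch-buffer pool; not part of jxl-oxide's API.
#[derive(Default)]
pub struct BufferPool {
    free: Vec<Vec<f32>>,
}

impl BufferPool {
    /// Hands out a zero-filled buffer of `len` samples, reusing a
    /// previously returned allocation when one is available.
    pub fn take(&mut self, len: usize) -> Vec<f32> {
        let mut buf = self.free.pop().unwrap_or_default();
        buf.clear();
        // Only reallocates if the recycled capacity is smaller than `len`.
        buf.resize(len, 0.0);
        buf
    }

    /// Returns a buffer to the pool instead of dropping (and freeing) it.
    pub fn give_back(&mut self, buf: Vec<f32>) {
        self.free.push(buf);
    }
}

/// Stand-in for per-group decoding work: every iteration borrows a buffer
/// from the pool rather than allocating a fresh one.
pub fn process_groups(pool: &mut BufferPool, groups: usize, len: usize) -> f32 {
    let mut checksum = 0.0f32;
    for g in 0..groups {
        let mut buf = pool.take(len);
        if let Some(first) = buf.first_mut() {
            *first = g as f32;
        }
        checksum += buf.iter().sum::<f32>();
        pool.give_back(buf);
    }
    checksum
}
```

With this shape, a loop over many groups of the same size allocates once and then recycles that buffer. A real decoder with worker threads would also need a per-thread pool or some synchronization around a shared one, which this sketch leaves out.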

@Shnatsel
Contributor Author

Shnatsel commented May 3, 2024

Oh, and this was measured on x86_64-unknown-linux-gnu with its pretty good glibc memory allocator. The x86_64-unknown-linux-musl target with the musl libc allocator is far worse: 0.95user 1.13system 0:00.81elapsed

And Apple targets are going to be as slow as musl if not slower. Their memory allocator is really slow.

@Shnatsel
Contributor Author

Shnatsel commented May 3, 2024

I've just realized I was measuring an older commit. I've re-measured on 7ec630c and the results are not as skewed towards the system time this time around, while the total elapsed time is unchanged: 1.11user 0.26system 0:00.54elapsed

Updated profile: https://share.firefox.dev/3QukLUL

Still a lot of allocating and freeing memory on the profile.

The changes are largely due to the addition of mimalloc as the allocator in that binary. It eliminates the performance degradation on Apple and musl, but a lot of work still goes into allocating and freeing memory.
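
For context, a Rust binary picks its allocator with the standard `#[global_allocator]` attribute, which is also how an allocator crate such as mimalloc gets plugged in. The sketch below uses the same mechanism for a different purpose: it wraps the system allocator and counts calls, which is one way to check how often a code path hits the allocator without a full profile. `CountingAlloc`, `ALLOC_CALLS`, `ALLOC_BYTES` and `count_allocs` are illustrative names, not anything from jxl-oxide.

```rust
use std::alloc::{GlobalAlloc, Layout, System};
use std::sync::atomic::{AtomicUsize, Ordering};

/// Illustrative wrapper around the system allocator that counts calls.
pub struct CountingAlloc;

pub static ALLOC_CALLS: AtomicUsize = AtomicUsize::new(0);
pub static ALLOC_BYTES: AtomicUsize = AtomicUsize::new(0);

unsafe impl GlobalAlloc for CountingAlloc {
    unsafe fn alloc(&self, layout: Layout) -> *mut u8 {
        ALLOC_CALLS.fetch_add(1, Ordering::Relaxed);
        ALLOC_BYTES.fetch_add(layout.size(), Ordering::Relaxed);
        unsafe { System.alloc(layout) }
    }

    unsafe fn dealloc(&self, ptr: *mut u8, layout: Layout) {
        unsafe { System.dealloc(ptr, layout) }
    }
}

// Every heap allocation in the binary now goes through `CountingAlloc`.
#[global_allocator]
static GLOBAL: CountingAlloc = CountingAlloc;

/// Runs `f` and reports how many allocation calls happened meanwhile.
/// Other threads allocating at the same time are counted too.
pub fn count_allocs<R>(f: impl FnOnce() -> R) -> (R, usize) {
    let before = ALLOC_CALLS.load(Ordering::Relaxed);
    let result = f();
    let after = ALLOC_CALLS.load(Ordering::Relaxed);
    (result, after - before)
}
```

Wrapping a decode call in `count_allocs` would give a rough allocation count to compare before and after any buffer-reuse change. It only counts calls into the allocator and says nothing about time spent in the kernel.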

Removing mimalloc from the binary gets us right back to the previous amount of time spent in the kernel: 0.76user 0.58system 0:00.52elapsed
Because the userspace part got faster in the meantime, that now works out to 43% of the CPU time being spent in the kernel.
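
The percentages in this thread appear to be the system time divided by the sum of user and system time as printed by /usr/bin/time. That reading is an assumption, but it reproduces both quoted figures. A small sketch of the arithmetic, with `kernel_share` and `report` as made-up helper names:

```rust
/// Fraction of CPU time spent in the kernel, computed from the `user` and
/// `system` figures printed by /usr/bin/time (both in seconds).
pub fn kernel_share(user_secs: f64, system_secs: f64) -> f64 {
    system_secs / (user_secs + system_secs)
}

/// The four measurements quoted in this thread, in the order they appear.
pub fn report() -> Vec<(&'static str, f64)> {
    vec![
        ("first run, glibc", kernel_share(0.90, 0.54)),
        ("first run, musl", kernel_share(0.95, 1.13)),
        ("7ec630c with mimalloc", kernel_share(1.11, 0.26)),
        ("7ec630c without mimalloc", kernel_share(0.76, 0.58)),
    ]
}
```

Under that assumption the shares come out to roughly 37.5%, 54%, 19% and 43% respectively, which matches the 37% in the title and the 43% above.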

@tirr-c
Owner

tirr-c commented May 4, 2024

Thanks for the investigation! I'm aware of the allocation overhead, and it's what I've been looking into recently. Hopefully I can come up with better buffer management...

@tirr-c tirr-c added the optimization Something can be done faster/better label May 4, 2024