I've run /usr/bin/time target/release/jxl-oxide decode on the attached image and noticed that the time spent in the kernel is very high: 0.90user 0.54system 0:00.53elapsed 271%CPU
In the profile, yellow marks time spent in userspace and orange marks time spent in the kernel. The main thread does almost nothing but deallocate memory, while the worker threads all spend a lot of time allocating it. The memory is not allocated all at once but provisioned gradually, so the cost is spread out across the execution, and many functions bottom out in the kernel's memory management code.
Reusing allocations to avoid calling into the memory allocator all the time would dramatically improve performance.
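As a rough illustration of reusing allocations, here is a minimal Rust sketch of a shared buffer pool. `BufferPool`, `take`, `give_back`, and the tile sizes are hypothetical and not jxl-oxide APIs. The idea is that a buffer handed back to the pool keeps its capacity, so after the first few tiles the workers stop calling into the allocator entirely, instead of allocating and freeing a buffer per tile.

```rust
use std::sync::{Arc, Mutex};
use std::thread;

/// Hypothetical pool of reusable `Vec<f32>` buffers (not a jxl-oxide API).
/// Returned buffers keep their capacity, so later `take` calls can hand them
/// out again instead of asking the allocator for fresh memory.
struct BufferPool {
    free: Mutex<Vec<Vec<f32>>>,
}

impl BufferPool {
    fn new() -> Self {
        BufferPool { free: Mutex::new(Vec::new()) }
    }

    /// Returns a zero-filled buffer of length `len`, reusing a pooled one if available.
    fn take(&self, len: usize) -> Vec<f32> {
        let mut buf = self.free.lock().unwrap().pop().unwrap_or_default();
        buf.clear();
        buf.resize(len, 0.0); // allocates only if the pooled buffer is too small
        buf
    }

    /// Hands a buffer back for reuse instead of dropping (freeing) it.
    fn give_back(&self, buf: Vec<f32>) {
        self.free.lock().unwrap().push(buf);
    }

    /// Number of buffers currently sitting in the pool.
    fn pooled(&self) -> usize {
        self.free.lock().unwrap().len()
    }
}

/// Four worker threads each process 100 hypothetical "tiles", borrowing one
/// buffer per tile. Returns the sum of the values written and the final pool size.
fn run_demo() -> (f32, usize) {
    let pool = Arc::new(BufferPool::new());
    let handles: Vec<_> = (0..4)
        .map(|t| {
            let pool = Arc::clone(&pool);
            thread::spawn(move || {
                let mut sum = 0.0f32;
                for i in 0..100 {
                    let mut buf = pool.take(4096);
                    buf[0] = (t * 100 + i) as f32;
                    sum += buf[0];
                    pool.give_back(buf);
                }
                sum
            })
        })
        .collect();
    let total: f32 = handles.into_iter().map(|h| h.join().unwrap()).sum();
    (total, pool.pooled())
}

fn main() {
    let (total, pooled) = run_demo();
    // At most four 4096-element buffers are ever created: one per concurrently running worker.
    println!("total = {total}, buffers in pool = {pooled}");
}
```

A `Mutex` keeps the sketch simple; under heavy contention a per-thread cache might be a better fit.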
Oh, and this was measured on x86_64-unknown-linux-gnu with its pretty good glibc memory allocator. The x86_64-unknown-linux-musl target with the musl libc allocator is far worse: 0.95user 1.13system 0:00.81elapsed
And Apple targets are going to be as slow as musl if not slower. Their memory allocator is really slow.
I've just realized I was measuring an older commit. Re-measured on 7ec630c, the results are less skewed towards system time, while the total elapsed time is unchanged: 1.11user 0.26system 0:00.54elapsed
There is still a lot of allocating and freeing memory in the profile.
The change is largely due to the addition of mimalloc as the allocator in that binary. It eliminates the slowdown on Apple and musl targets, but a lot of work is still spent allocating and freeing memory.
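For context, a Rust binary picks its allocator with the `#[global_allocator]` attribute; with the mimalloc crate that is `#[global_allocator] static GLOBAL: mimalloc::MiMalloc = mimalloc::MiMalloc;`. A faster allocator lowers the cost of each call but not the number of calls. One way to measure that number is a counting wrapper around the system allocator. The sketch below uses only std, and its two workloads are made up for illustration: it compares allocating a fresh buffer per iteration against reusing one.

```rust
use std::alloc::{GlobalAlloc, Layout, System};
use std::hint::black_box;
use std::sync::atomic::{AtomicUsize, Ordering};

/// Wraps the system allocator and counts allocation calls. The default
/// `alloc_zeroed` and `realloc` implementations route through `alloc`,
/// so they are counted as well.
struct CountingAlloc;

static ALLOCS: AtomicUsize = AtomicUsize::new(0);

unsafe impl GlobalAlloc for CountingAlloc {
    unsafe fn alloc(&self, layout: Layout) -> *mut u8 {
        ALLOCS.fetch_add(1, Ordering::Relaxed);
        unsafe { System.alloc(layout) }
    }
    unsafe fn dealloc(&self, ptr: *mut u8, layout: Layout) {
        unsafe { System.dealloc(ptr, layout) }
    }
}

// The same attribute is how a binary opts into mimalloc or any other allocator.
#[global_allocator]
static GLOBAL: CountingAlloc = CountingAlloc;

/// Number of allocation calls made (by any thread) while `f` runs.
fn allocations_during(f: impl FnOnce()) -> usize {
    let before = ALLOCS.load(Ordering::Relaxed);
    f();
    ALLOCS.load(Ordering::Relaxed) - before
}

/// Allocates a fresh buffer on every iteration.
fn fresh_each_time() -> usize {
    allocations_during(|| {
        for i in 0..1000 {
            let v = vec![0u8; 4096 + i];
            black_box(&v); // keep the optimizer from eliding the allocation
        }
    })
}

/// Reuses one buffer whose capacity covers every iteration.
fn reuse_one_buffer() -> usize {
    allocations_during(|| {
        let mut v: Vec<u8> = Vec::with_capacity(8192);
        for i in 0..1000 {
            v.clear();
            v.resize(4096 + i, 0); // fits in the existing capacity
            black_box(&v);
        }
    })
}

fn main() {
    println!("fresh: {} allocations", fresh_each_time());
    println!("reused: {} allocations", reuse_one_buffer());
}
```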
Removing mimalloc from the binary gets us right back to the previous amount of time spent in the kernel: 0.76user 0.58system 0:00.52elapsed
With the userspace part now faster, 43% of the CPU time is spent in the kernel (0.58 s system out of 1.34 s user + system).
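The kernel share is system time divided by total CPU time (user + system). Here is a small sketch that computes it from the GNU time summary lines quoted in this thread. It assumes GNU time's default output format and ignores every field other than `user` and `system`.

```rust
/// Fraction of CPU time spent in the kernel, computed from GNU time's
/// default summary, e.g. "0.76user 0.58system 0:00.52elapsed".
/// Returns `None` if the user or system field is missing or malformed.
fn kernel_share(summary: &str) -> Option<f64> {
    let mut user = None;
    let mut system = None;
    for field in summary.split_whitespace() {
        if let Some(v) = field.strip_suffix("user") {
            user = v.parse::<f64>().ok();
        } else if let Some(v) = field.strip_suffix("system") {
            system = v.parse::<f64>().ok();
        }
    }
    let (u, s) = (user?, system?);
    Some(s / (u + s))
}

fn main() {
    // Timings quoted in this thread.
    let runs = [
        ("initial run, glibc", "0.90user 0.54system 0:00.53elapsed 271%CPU"),
        ("initial run, musl", "0.95user 1.13system 0:00.81elapsed"),
        ("7ec630c with mimalloc", "1.11user 0.26system 0:00.54elapsed"),
        ("7ec630c without mimalloc", "0.76user 0.58system 0:00.52elapsed"),
    ];
    for (label, line) in runs {
        let share = kernel_share(line).expect("user and system fields");
        println!("{label}: {:.1}% of CPU time in the kernel", share * 100.0);
    }
}
```

For the last run this gives about 43%, in line with the figure above.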
Thanks for the investigation! I'm aware of the allocation overhead, and it's what I've been looking into recently. Hopefully I can come up with better buffer management...
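One possible shape for such buffer management is a per-thread scratch buffer that is allocated once and reused for every tile processed on that thread. This is sketched purely as an assumption about what could help, not as what jxl-oxide does, and `SCRATCH`, `with_scratch`, and `process_tiles` are hypothetical names.

```rust
use std::cell::RefCell;

thread_local! {
    // Hypothetical per-thread scratch buffer, kept alive between calls so that
    // repeated work on the same thread does not allocate again.
    static SCRATCH: RefCell<Vec<f32>> = RefCell::new(Vec::new());
}

/// Runs `f` on a zero-filled scratch slice of length `len` taken from this
/// thread's buffer. `f` must not call `with_scratch` itself: the nested
/// `borrow_mut` would panic.
fn with_scratch<R>(len: usize, f: impl FnOnce(&mut [f32]) -> R) -> R {
    SCRATCH.with(|cell| {
        let mut buf = cell.borrow_mut();
        buf.clear();
        buf.resize(len, 0.0); // grows only when `len` exceeds the current capacity
        f(buf.as_mut_slice())
    })
}

/// Capacity of this thread's scratch buffer.
fn scratch_capacity() -> usize {
    SCRATCH.with(|cell| cell.borrow().capacity())
}

/// Processes eight 256-sample "tiles" on the current thread, reusing one buffer.
fn process_tiles() -> f32 {
    let mut total = 0.0f32;
    for tile in 0..8usize {
        total += with_scratch(256, |buf| {
            for (i, x) in buf.iter_mut().enumerate() {
                *x = (tile * 256 + i) as f32;
            }
            buf.iter().sum::<f32>()
        });
    }
    total
}

fn main() {
    let total = process_tiles();
    println!("total = {total}, scratch capacity = {}", scratch_capacity());
}
```

Because the buffer never leaves its thread, it is allocated and freed on the same thread, unlike the pattern in the profile where the main thread frees memory the workers allocated.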
Measured on commit c8d528d
Test image: Puente_de_Don_Luis_I,_Oporto,_Portugal,_2019-06-02,_DD_29-31_HDR.jxl.gz
So I used perf to investigate, here's the profile: https://share.firefox.dev/4a42u7I