[BUG] Memory Leak kills Host due to full Swap #5283
Comments
There is obviously some kind of memory issue. The "fetch failed" error also usually means the machine-learning URL is misconfigured or the container is unreachable. Can you confirm that the container is up and that machine learning works for a single photo when you regenerate its thumbnail?
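One quick way to check that the machine-learning container is up and reachable is a Compose healthcheck. This is a minimal sketch: it assumes the ML service listens on port 3003, exposes a /ping endpoint, and has curl available inside the image — verify all three against your Immich version before relying on it.

```yaml
# docker-compose.yml (sketch) -- health probe for the ML container.
# Assumptions: the service listens on port 3003, answers on /ping,
# and the image ships curl. Check `docker ps` afterwards to see the
# reported health status.
services:
  immich-machine-learning:
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:3003/ping"]
      interval: 30s
      timeout: 5s
      retries: 3
```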
Uhm... okay... I've paused everything in the Job Status.
Server:
Microservices:
Machine-Learning:
Anything useful in there?
I'm also a user, and my library is around 600 GB. From your logs, it looks like ML is still only working on image classification. In my experience the load becomes huge during face recognition and merging; thumbnail work shouldn't cause memory to run out. I've run into repeated typesense restarts and massive memory usage before. Suggestion: see the sketch below.
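One plausible mitigation along these lines — sketched here as an assumption, not necessarily the commenter's original suggestion — is to cap the typesense container's memory so a runaway index rebuild cannot exhaust the host. The service name and limit value are assumptions; adjust them to your compose file.

```yaml
# docker-compose.yml (sketch) -- hard memory cap for typesense.
# The service name and the 1g value are assumptions. With the cap in
# place, typesense gets OOM-killed and restarted instead of dragging
# the whole host into swap.
services:
  typesense:
    mem_limit: 1g
    memswap_limit: 1g   # RAM + swap combined; equal values disable swap
    restart: unless-stopped
```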
I think that might have happened because I deleted the @eaDir folders from my NAS while Immich was creating thumbnails. Could this bug occur when it tries to access a file that no longer exists? I did a full wipe and a clean start, and it does not seem to happen anymore.
Today I encountered the same issue. After bulk-importing around 10,000 photos, I suddenly couldn't connect remotely to my NAS host. Initially I suspected that the ML service was causing high memory usage during facial recognition, so I forcibly restarted the NAS and reconfigured the concurrency settings, setting all jobs to run on a single thread. I didn't start all the jobs at once; instead, I ran FACE DETECTION first and GENERATE THUMBNAILS afterwards. There were no abnormalities during FACE DETECTION, which indicates that the issue wasn't caused by the ML operations. After all FACE DETECTION jobs had completed, I started the GENERATE THUMBNAILS task, which ran slowly due to the large image sizes. I wasn't monitoring continuously, and when I returned to check on the NAS I found it unreachable remotely, with the system log showing the following errors:

```
[17452.279485] Out of memory: Killed process 24344 (immich_microser) total-vm:14375660kB, anon-rss:8827768kB, file-rss:0kB, shmem-rss:0kB, UID:0 pgtables:23516kB oom_score_adj:0
[17653.570542] Out of memory: Killed process 24972 (immich_microser) total-vm:13119600kB, anon-rss:8825796kB, file-rss:0kB, shmem-rss:0kB, UID:0 pgtables:20652kB oom_score_adj:0
[17848.345248] Out of memory: Killed process 26162 (immich_microser) total-vm:19086052kB, anon-rss:8709080kB, file-rss:0kB, shmem-rss:0kB, UID:0 pgtables:31688kB oom_score_adj:0
[17932.668353] Out of memory: Killed process 27348 (immich_microser) total-vm:13774836kB, anon-rss:8759228kB, file-rss:0kB, shmem-rss:0kB, UID:0 pgtables:22648kB oom_score_adj:0
[18001.034451] Out of memory: Killed process 27902 (immich_microser) total-vm:14749840kB, anon-rss:9111036kB, file-rss:0kB, shmem-rss:0kB, UID:0 pgtables:25104kB oom_score_adj:0
```

This suggests that there might be a memory leak in the immich_microservices container.
The bug
Hello,
I've set up an Immich container with an existing library of about 150 GB, so I was expecting heavy load. But after 1–2 days I realised that my whole system was overloaded and not a single Docker container could be reached. The system was so busy that an SSH connection took minutes to respond!
After a reboot I realised that the immich-microservices container had filled both the RAM and the swap.
Whenever this happened again, I simply killed the immich-microservices container, and after a few minutes the system came back to life.
I tried to limit RAM usage in the Docker container. That did not help... the RAM usage inside the container itself does NOT go over the limit, but the system still uses huge amounts of RAM anyway.
So I upgraded from 8 to 16 GB of RAM... thought that would help. But no, it just took a little longer until it happened.
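A possible explanation for "the container stays under its limit but the host still drowns" is swap: when only a memory limit is set, Docker by default still allows the container roughly the same amount again in swap, which matches the full-swap symptom in the title. A minimal sketch, assuming immich-microservices is the runaway service and a Compose file that accepts mem_limit/memswap_limit:

```yaml
# docker-compose.yml (sketch) -- cap RAM *and* swap for the
# microservices container. With mem_limit alone, Docker still lets
# the container use swap (by default about the same amount again);
# setting memswap_limit equal to mem_limit disables swap, so the
# kernel OOM-kills the container instead of the host thrashing.
services:
  immich-microservices:
    mem_limit: 4g       # assumed value; size it to your host
    memswap_limit: 4g   # RAM + swap combined; equal => no swap
```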
Here is a screenshot, taken shortly before the system became unresponsive, showing docker stats (within the limits), htop (full swap and high CPU usage), and the Synology DSM RAM usage.
Here is a screenshot from one minute earlier.
Here is a snippet from the log after I used docker kill to shut down the container.
The memory limitation on the container itself seems to be working.
But could the inter-container transport of such huge amounts of data be what causes the issue?
Log from Machine-Learning:
Log from Server:
Any other logs I should provide?
The OS that Immich Server is running on
Synology DSM 7.2.1
Version of Immich Server
1.88.2
Version of Immich Mobile App
v0
Platform with the issue
Your docker-compose.yml content
Your .env content
Reproduction steps
Additional information
No response