Library refresh: errors, too many assets, too many jobs #12494

Open
1 of 3 tasks
blablack opened this issue Sep 9, 2024 · 0 comments

The bug

Hello,

I have been using Immich for the past few months and, until recently, everything was working fine.
I have one external library containing the legacy pictures from before I used Immich; since then, I have been uploading pictures directly through Immich.

Lately, when doing a Library refresh, several things go wrong:

  • The number of active jobs goes up to 3, even though the library concurrency setting is 2.
  • The number of waiting assets for the Library refresh goes up to 80,000+, even though I only have 19634 photos and 1452 videos.
  • In the logs, the following error appears:
[Nest] 7  - 09/09/2024, 9:04:53 AM     LOG [Microservices:LibraryService] Finished queueing online check of 19620 assets for library 6aaeca4c-e495-413c-a1ea-8b8e50b750d2
Error: Missing lock for job 392482. retryJob
    at Scripts.finishedErrors (/usr/src/app/node_modules/bullmq/dist/cjs/classes/scripts.js:266:24)
    at Job.moveToFailed (/usr/src/app/node_modules/bullmq/dist/cjs/classes/job.js:427:32)
    at process.processTicksAndRejections (node:internal/process/task_queues:95:5)
    at async handleFailed (/usr/src/app/node_modules/bullmq/dist/cjs/classes/worker.js:379:21)
    at async Worker.retryIfFailed (/usr/src/app/node_modules/bullmq/dist/cjs/classes/worker.js:581:24)
Error: Missing lock for job 392428. retryJob
    at Scripts.finishedErrors (/usr/src/app/node_modules/bullmq/dist/cjs/classes/scripts.js:266:24)
    at Job.moveToFailed (/usr/src/app/node_modules/bullmq/dist/cjs/classes/job.js:427:32)
    at process.processTicksAndRejections (node:internal/process/task_queues:95:5)
    at async handleFailed (/usr/src/app/node_modules/bullmq/dist/cjs/classes/worker.js:379:21)
    at async Worker.retryIfFailed (/usr/src/app/node_modules/bullmq/dist/cjs/classes/worker.js:581:24)
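
As a rough sanity check on the counts listed above, the waiting total can be compared with the number of assets that actually exist. This is only a sketch of the arithmetic: it assumes that each waiting job corresponds to exactly one asset, which the logs do not confirm, and it treats 80000 as a lower bound for the observed queue size.

```python
# Counts reported in this issue.
photos = 19634
videos = 1452
waiting_jobs = 80000  # lower bound for the observed waiting count

total_assets = photos + videos

# Assumption: one waiting job per asset. Under that assumption, the ratio
# says how many times each asset would have to be queued on average.
ratio = waiting_jobs / total_assets

print(f"total assets: {total_assets}")
print(f"waiting jobs per asset: {ratio:.2f}")
```

Under that assumption the queue holds several jobs per asset, which would be consistent with the same assets being queued more than once, though it does not show why.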

This issue started only recently, around the upgrade to v1.113.0, but I cannot tell whether it is caused by the upgrade or by some corruption that happened around the same time.

Let me know if I can provide any other useful information.

Thanks,
Aurélien

The OS that Immich Server is running on

kubernetes

Version of Immich Server

v1.114.0

Version of Immich Mobile App

1.114.0 build.158

Platform with the issue

  • Server
  • Web
  • Mobile

Your docker-compose.yml content

apiVersion: v1
kind: ConfigMap
metadata:
  name: immich-postgres
data:
  create-extensions.sql: |
    CREATE EXTENSION IF NOT EXISTS cube;
    CREATE EXTENSION IF NOT EXISTS earthdistance;
    CREATE EXTENSION IF NOT EXISTS vectors;
    CREATE EXTENSION IF NOT EXISTS pg_trgm;
    CREATE EXTENSION IF NOT EXISTS unaccent;
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: immich
spec:
  replicas: 1
  revisionHistoryLimit: 0
  selector:
    matchLabels:
      app: immich
  strategy:
    type: Recreate
  template:
    metadata:
      labels:
        app: immich
    spec:
      volumes:
        - name: immich-pvc
          persistentVolumeClaim:
            claimName: immich-pvc
        - name: nasio-nfs-pvc
          persistentVolumeClaim:
            claimName: nasio-nfs-pvc
        - configMap:
            name: immich-postgres
          name: immich-postgres-vol
      containers:
        - image: tensorchord/pgvecto-rs:pg16-v0.2.1
          imagePullPolicy: IfNotPresent
          name: postgres
          env:
            - name: POSTGRES_USER
              value: "XXXXXXXXXXXXXXXXXXXXXXXXXXXXXX"
            - name: POSTGRES_PASSWORD
              value: "XXXXXXXXXXXXXXXXXXXXXXXXXXXXXX"
            - name: POSTGRES_DB
              value: "XXXXXXXXXXXXXXXXXXXXXXXXXXXXXX"
            - name: POSTGRES_INITDB_ARGS
              value: "--data-checksums"
          volumeMounts:
            - mountPath: "/var/lib/postgresql/data"
              subPath: "postgresql"
              name: immich-pvc
            - name: immich-postgres-vol
              subPath: "create-extensions.sql"
              mountPath: "/docker-entrypoint-initdb.d/create-extensions.sql"
          resources:
            limits:
              cpu: 1500m
              memory: 3000Mi
            requests:
              cpu: 10m
              memory: 300Mi
        - image: redis:latest
          imagePullPolicy: IfNotPresent
          name: redis
          resources:
            limits:
              cpu: 40m
              memory: 200Mi
            requests:
              cpu: 10m
              memory: 10Mi
        - image: ghcr.io/immich-app/immich-server:release
          imagePullPolicy: Always
          name: immich-server
          env:
            - name: DB_HOSTNAME
              value: "localhost"
            - name: DB_USERNAME
              value: "XXXXXXXXXXXXXXXXXXXXXXXXXXXXXX"
            - name: DB_PASSWORD
              value: "XXXXXXXXXXXXXXXXXXXXXXXXXXXXXX"
            - name: DB_DATABASE_NAME
              value: "XXXXXXXXXXXXXXXXXXXXXXXXXXXXXX"
            - name: REDIS_HOSTNAME
              value: "localhost"
            - name: IMMICH_PORT
              value: "3001"
            - name: IMMICH_MACHINE_LEARNING_URL
              value: "http://localhost:3003"
          ports:
            - containerPort: 3001
              name: http
          livenessProbe:
            httpGet:
              path: /server-info/ping
              port: http
            initialDelaySeconds: 30
            periodSeconds: 10
            timeoutSeconds: 10
            failureThreshold: 5
          readinessProbe:
            httpGet:
              path: /server-info/ping
              port: http
            initialDelaySeconds: 30
            periodSeconds: 10
            timeoutSeconds: 10
            failureThreshold: 5
          volumeMounts:
            - mountPath: "/photos"
              subPath: "Our Pictures"
              name: nasio-nfs-pvc
              readOnly: true
            - mountPath: "/usr/src/app/upload"
              subPath: "Kubernetes/Our Pictures - Immich"
              name: nasio-nfs-pvc
          resources:
            limits:
              cpu: 1500m
              memory: 2500Mi
            requests:
              cpu: 10m
              memory: 100Mi
        - image: ghcr.io/immich-app/immich-machine-learning:release
          imagePullPolicy: Always
          name: immich-machine-learning
          env:
            - name: TRANSFORMERS_CACHE
              value: "/cache"
            - name: IMMICH_PORT
              value: "3003"
          ports:
            - containerPort: 3003
              name: http
          livenessProbe:
            httpGet:
              path: /ping
              port: http
            initialDelaySeconds: 30
            periodSeconds: 10
            timeoutSeconds: 10
            failureThreshold: 5
          readinessProbe:
            httpGet:
              path: /ping
              port: http
            initialDelaySeconds: 30
            periodSeconds: 10
            timeoutSeconds: 10
            failureThreshold: 5
          volumeMounts:
            - mountPath: "/cache"
              subPath: "ml-cache"
              name: immich-pvc
          resources:
            limits:
              cpu: 1500m
              memory: 1500Mi
            requests:
              cpu: 10m
              memory: 1000Mi
      dnsPolicy: "None"
      dnsConfig:
        nameservers:
          - 10.43.0.22
---
apiVersion: v1
kind: Service
metadata:
  name: immich
  annotations:
    metallb.universe.tf/address-pool: default
    metallb.universe.tf/loadBalancerIPs: 192.168.2.210
spec:
  externalTrafficPolicy: Local
  selector:
    app: immich
  ports:
    - name: http-80
      protocol: TCP
      port: 80
      targetPort: 3001
  type: LoadBalancer

Your .env content

See the k8s deployment above.

Reproduction steps

  1. Click Administration
  2. Click Jobs
  3. Click "Refresh" or "All" at the Library section

Relevant log output

[Nest] 7  - 09/09/2024, 9:01:30 AM     LOG [Microservices:LibraryService] Refreshing library 6aaeca4c-e495-413c-a1ea-8b8e50b750d2
[Nest] 7  - 09/09/2024, 9:02:21 AM     LOG [Microservices:LibraryService] Refreshing library 6aaeca4c-e495-413c-a1ea-8b8e50b750d2
[Nest] 7  - 09/09/2024, 9:03:15 AM     LOG [Microservices:LibraryService] Finished queueing online check of 19620 assets for library 6aaeca4c-e495-413c-a1ea-8b8e50b750d2
Error: Missing lock for job 392428. retryJob
    at Scripts.finishedErrors (/usr/src/app/node_modules/bullmq/dist/cjs/classes/scripts.js:266:24)
    at Job.moveToFailed (/usr/src/app/node_modules/bullmq/dist/cjs/classes/job.js:427:32)
    at process.processTicksAndRejections (node:internal/process/task_queues:95:5)
    at async handleFailed (/usr/src/app/node_modules/bullmq/dist/cjs/classes/worker.js:379:21)
    at async Worker.retryIfFailed (/usr/src/app/node_modules/bullmq/dist/cjs/classes/worker.js:581:24)
Error: Missing lock for job 392465. retryJob
    at Scripts.finishedErrors (/usr/src/app/node_modules/bullmq/dist/cjs/classes/scripts.js:266:24)
    at Job.moveToFailed (/usr/src/app/node_modules/bullmq/dist/cjs/classes/job.js:427:32)
    at process.processTicksAndRejections (node:internal/process/task_queues:95:5)
    at async handleFailed (/usr/src/app/node_modules/bullmq/dist/cjs/classes/worker.js:379:21)
    at async Worker.retryIfFailed (/usr/src/app/node_modules/bullmq/dist/cjs/classes/worker.js:581:24)
[Nest] 7  - 09/09/2024, 9:04:53 AM     LOG [Microservices:LibraryService] Finished queueing online check of 19620 assets for library 6aaeca4c-e495-413c-a1ea-8b8e50b750d2
Error: Missing lock for job 392482. retryJob
    at Scripts.finishedErrors (/usr/src/app/node_modules/bullmq/dist/cjs/classes/scripts.js:266:24)
    at Job.moveToFailed (/usr/src/app/node_modules/bullmq/dist/cjs/classes/job.js:427:32)
    at process.processTicksAndRejections (node:internal/process/task_queues:95:5)
    at async handleFailed (/usr/src/app/node_modules/bullmq/dist/cjs/classes/worker.js:379:21)
    at async Worker.retryIfFailed (/usr/src/app/node_modules/bullmq/dist/cjs/classes/worker.js:581:24)
Error: Missing lock for job 392428. retryJob
    at Scripts.finishedErrors (/usr/src/app/node_modules/bullmq/dist/cjs/classes/scripts.js:266:24)
    at Job.moveToFailed (/usr/src/app/node_modules/bullmq/dist/cjs/classes/job.js:427:32)
    at process.processTicksAndRejections (node:internal/process/task_queues:95:5)
    at async handleFailed (/usr/src/app/node_modules/bullmq/dist/cjs/classes/worker.js:379:21)
    at async Worker.retryIfFailed (/usr/src/app/node_modules/bullmq/dist/cjs/classes/worker.js:581:24)
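
The log excerpt above can be summarised with a short script. The following is a minimal sketch, not part of Immich or BullMQ: the `tally` helper and the `SAMPLE_LOG` constant are hypothetical names, and the sample contains only the non-stack-trace lines copied from the output above. It counts how often a refresh started, how often the online check finished queueing, and how often each job ID reported a missing lock.

```python
import re
from collections import Counter

# Non-stack-trace lines copied from the log output in this issue.
SAMPLE_LOG = """\
[Nest] 7  - 09/09/2024, 9:01:30 AM     LOG [Microservices:LibraryService] Refreshing library 6aaeca4c-e495-413c-a1ea-8b8e50b750d2
[Nest] 7  - 09/09/2024, 9:02:21 AM     LOG [Microservices:LibraryService] Refreshing library 6aaeca4c-e495-413c-a1ea-8b8e50b750d2
[Nest] 7  - 09/09/2024, 9:03:15 AM     LOG [Microservices:LibraryService] Finished queueing online check of 19620 assets for library 6aaeca4c-e495-413c-a1ea-8b8e50b750d2
Error: Missing lock for job 392428. retryJob
Error: Missing lock for job 392465. retryJob
[Nest] 7  - 09/09/2024, 9:04:53 AM     LOG [Microservices:LibraryService] Finished queueing online check of 19620 assets for library 6aaeca4c-e495-413c-a1ea-8b8e50b750d2
Error: Missing lock for job 392482. retryJob
Error: Missing lock for job 392428. retryJob
"""

# Matches the BullMQ error line and captures the job ID.
MISSING_LOCK = re.compile(r"Missing lock for job (\d+)")


def tally(log_text):
    """Return (event counts, missing-lock counts per job ID) for a log dump."""
    events = Counter()
    lock_errors = Counter()
    for line in log_text.splitlines():
        if "Refreshing library" in line:
            events["refresh_started"] += 1
        elif "Finished queueing online check" in line:
            events["online_check_queued"] += 1
        else:
            match = MISSING_LOCK.search(line)
            if match:
                lock_errors[match.group(1)] += 1
    return events, lock_errors


events, lock_errors = tally(SAMPLE_LOG)
print(dict(events))
print(dict(lock_errors))
```

For a longer capture, the text passed to `tally` could come from a saved log file, for example one produced with `kubectl logs` against the `immich-server` container. In this excerpt the refresh is logged twice within about a minute and job 392428 reports a missing lock twice; the script only surfaces those repetitions and does not identify their cause.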

Additional information

No response
