Hi.

We have many Concourse instances, and I recently upgraded them from v7.6.0 to v7.9.1 (thanks for the `--disable-srv-lookup` fix!). We use BOSH-deployed VMs, and we upgraded from Postgres 43 to 44 in the process, as well as changing stemcells from Bionic 1.54 to Jammy 1.108.

After upgrading, our worker nodes have been acting flaky. We've gotten a lot of errors of a few varieties on many of our instances. Workers end up stalled (as shown by `fly workers`), and we have to `bosh delete` these VMs and let the director recreate them.
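For concreteness, that recovery cycle looks roughly like the sketch below. The target, deployment, and worker names are placeholders; `bosh recreate` is shown as a one-step equivalent of deleting the VM and letting the director rebuild it, and `fly prune-worker` is an extra step that clears the stalled worker's record.

```sh
# Check worker state; flaky workers show up as "stalled"
fly -t main workers

# Clear the stalled worker's record ("main" and "worker-abc123" are placeholders)
fly -t main prune-worker -w worker-abc123

# Delete and rebuild the VM in one step ("concourse" and "worker/0" are placeholders)
bosh -d concourse recreate worker/0
```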
To my eye, this seems like something to do with garbage collection, but despite furious googling and trial and error, I can't find an obvious culprit. Database usage is also much larger than I'd expect on several of our deployments, which appear to be the more problematic ones. I've scaled up to add workers on several of these deployments, and the issues persist.

We have some failed jobs that leave containers around, but that doesn't appear to be enough to move the needle so far as to lead to these failures.

Anyone got any advice? Or can I provide further details or logs to help provide additional context?

Thanks.
Replies: 2 comments

Different deployment than yours, but similar symptoms.
OK -- an update. This appears to be largely due to artifacts of many old pipeline runs (hundreds of thousands), which left the database with massive numbers of rows and led to slower operations. Why did this happen after the upgrade and not before? Maybe the DB upgrade process exacerbated this behavior; hard to tell. Regardless, deleting the offending pipelines and re-flying them to get rid of this residue seems to make everything run snappier, and garbage collection now seems more effective. The diagnosis & remediation steps here were helpful: https://github.com/concourse/concourse/wiki/Schema-Inspection-Queries#show-the-size-and-number-of-hits-for-each-index
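In the spirit of that wiki page, a query along these lines lists each index's size and scan count; this sketch assumes the database is named `atc` (the Concourse default) and that you can run `psql` against it:

```sh
# Show the size and number of hits for each index, largest first.
# "atc" is the default Concourse database name; adjust for your deployment.
psql -d atc <<'SQL'
SELECT relname AS table_name,
       indexrelname AS index_name,
       pg_size_pretty(pg_relation_size(indexrelid)) AS index_size,
       idx_scan AS index_hits
FROM pg_stat_user_indexes
ORDER BY pg_relation_size(indexrelid) DESC
LIMIT 20;
SQL
```

Huge indexes with near-zero hits on build-related tables can point to the kind of old-run residue described above.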
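The delete-and-re-fly step then amounts to something like this; the target, pipeline name, and config path are placeholders. Note that destroying a pipeline discards its build history and resource versions, so the re-flown pipeline starts fresh.

```sh
# Destroy the bloated pipeline along with its accumulated build history
fly -t main destroy-pipeline -p my-pipeline

# Re-fly it from its config and unpause it
# ("my-pipeline" and "pipeline.yml" are placeholders)
fly -t main set-pipeline -p my-pipeline -c pipeline.yml
fly -t main unpause-pipeline -p my-pipeline
```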