CatOps

What’s the difference between picking up litter after yourself and donating to the AFU's pickup trucks?

You're right — donating is easier, as you don't need to spend energy producing waste beforehand!

So, here's the link: https://send.monobank.ua/jar/3U1hBa5WPp

More info: https://www.instagram.com/p/DXpgaaWgH00

#donations #Ukraine

1.43K views, 11:59

Chaos Engineering: The Evolution from Netflix's Chaos Monkey to AI-Powered Resilience

A nice article about chaos engineering that was shared in our chat.

The prose is fluffy in places, but the core of the article holds strong: in many cases, you don't need chaos engineering yet, because other practices offer better ROI, unless you already have them in place.

Personally, I'd also like to add that chaos engineering is not simply about breaking things - it's about experimentation. You don't just randomly switch things off; you build hypotheses and validate them. This is the boring yet crucial part that many overlook.
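
To make the "hypothesis first" point concrete, here is a minimal sketch of an experiment loop. It is not taken from the article: the `Experiment` class, the `run` function, and the toy `replicas` set are all hypothetical names made up for illustration, and a real setup would check actual health metrics instead of a Python set.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Experiment:
    """A chaos experiment framed as a falsifiable hypothesis (illustrative only)."""
    hypothesis: str
    steady_state: Callable[[], bool]  # True while the system looks healthy
    inject_fault: Callable[[], None]  # introduces the failure under test
    rollback: Callable[[], None]      # restores the system afterwards


def run(exp: Experiment) -> str:
    # Starting from a degraded system would make the result meaningless.
    if not exp.steady_state():
        return "aborted: steady state not met before the experiment"
    exp.inject_fault()
    try:
        held = exp.steady_state()
    finally:
        # Always undo the fault, even if the check itself blows up.
        exp.rollback()
    return "hypothesis held" if held else "hypothesis refuted"


# Toy system: a set of healthy replica names (an assumption for the demo).
replicas = {"a", "b", "c"}

exp = Experiment(
    hypothesis="Losing one replica still leaves at least two healthy ones",
    steady_state=lambda: len(replicas) >= 2,
    inject_fault=lambda: replicas.discard("a"),
    rollback=lambda: replicas.add("a"),
)
result = run(exp)
print(exp.hypothesis, "->", result)
```

The important part is the order: state the hypothesis, verify the steady state, inject the fault, check again, and roll back. Randomly killing things skips the first two steps.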

#chaos

www.srao.blog

Denny's Led to Chaos in My Stomach, and I Decided to Write an Article on Chaos Engineering...

1.52K views, 14:24

Today CatOps turned 9 🎉

You can send us a birthday present by donating to our current fundraiser!

https://send.monobank.ua/jar/3U1hBa5WPp

1.55K views, 12:14

For today’s Donations Monday, let’s finally close the fundraiser for two trucks that’s been going on for some time already.

https://send.monobank.ua/jar/3U1hBa5WPp

More info: https://www.instagram.com/p/DXpgaaWgH00

#donations #Ukraine

1.4K views, 12:10

Finding zombies in our systems: A real-world story of CPU bottlenecks

Finding zombies in our systems: A real-world story of CPU bottlenecks is an interesting debugging story for those who like technical detective tales.

P.S. I find Pinterest's technical blog well worth following. It has many good articles.

#debug #aws #performance #kubernetes

Medium

Vaibhav Shankar; Staff Software Engineer | Raymond Lee; Staff Software Engineer | Chia-Wei Chen; Staff Software Engineer | Shunyao Li; Sr…

1.47K views, 09:13

You own all the risk for AI Code | Heinrich Hartmann | Herald Blog

I Don’t Care if AI Wrote the Code. You Own It. is a reminder that you cannot just call AI an idiot when something goes wrong: you still bear responsibility for what it does.

This short article simply reiterates this statement and points out that, in this day and age, tests and validation are more important than ever.

#ai #sre

Herald

AI changes how we write code, but not who is responsible. Heinrich Hartmann explains why AI engineering needs more tests and design rigor.

1.56K views, 11:30

What was on CatOps in the last couple of weeks...

A new issue of the CatOps Digest is here!

https://newsletter.catops.dev/p/catops-digest-2026-05-29

#newsletter #digest

newsletter.catops.dev

CatOps Digest 2026-05-29

1.53K views, 17:05

Kubernetes Gateway API - Blog by Roman Glushko

Unless you're super diligent about deprecations, you may be in a situation right now where you need to migrate away from NGINX ingress.

Here's a great article that explains the new Kubernetes API objects introduced by the Gateway API project, which is here to replace Ingress.

The Ingress API itself is not deprecated, but it won't be developed further either.

This article confuses the names for the community-led Ingress Nginx and the F5 NGINX ingress controller, but so do many of us: there are way too many nginx's in this world.

#kubernetes #networking #nginx

Roman Glushko

Why Ingress Is Being Replaced and Which Gateway Controller to Pick

1.3K views, 11:36

The radical network redesign that led AWS to forge a more resilient cloud

A case study from Amazon on how science solves actual engineering problems, which later translates into cost savings (likely millions at Amazon's scale).

How a Slack shout-out, a dusted-off academic theory, and a spaghetti monster led an AWS team to crack an elusive code—and deliver greater reliability and performance for customers is a story about AWS realigning their network around the random graph theory.

P.S. I always get excited about networking stories because I studied networking at university, even though I haven’t worked closely with it for many years and have forgotten almost everything about it.

#aws #networking

1.37K views, 09:51

How Do You Fit a Trillion-Parameter Model Into a Kubernetes Cluster?

How Do You Fit a Trillion-Parameter Model Into a Kubernetes Cluster? is an article about how to change your perspective when reasoning about running LLMs in Kubernetes, compared to the usual web apps.

It’s an interesting read, especially if you don’t work with this stuff every day. The biggest takeaway is that, for models, a “replica” in most cases doesn’t mean a pod: it’s a distributed system of its own that should behave as one unit. The article also explains how exactly things are distributed within a replica and which low-level system parameters to pay attention to.
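
A quick back-of-the-envelope calculation shows why a single pod cannot be a replica here. The numbers below are my own assumptions, not figures from the article: 16-bit weights (2 bytes per parameter), accelerators with 80 GB of memory each, and 8 accelerators per node. It also counts weights only and ignores the KV cache, activations, and any other overhead, so real requirements are higher.

```python
import math

PARAMS = 10**12            # one trillion parameters
BYTES_PER_PARAM = 2        # assumption: 16-bit weights
ACCELERATOR_MEM_GB = 80    # assumption: memory per accelerator
ACCELERATORS_PER_NODE = 8  # assumption: accelerators per node


def min_accelerators(params: int, bytes_per_param: int, mem_gb: int) -> int:
    """Lower bound on accelerators needed just to hold the weights."""
    weights_bytes = params * bytes_per_param
    return math.ceil(weights_bytes / (mem_gb * 10**9))


weights_tb = PARAMS * BYTES_PER_PARAM / 10**12
accelerators = min_accelerators(PARAMS, BYTES_PER_PARAM, ACCELERATOR_MEM_GB)
nodes = math.ceil(accelerators / ACCELERATORS_PER_NODE)

print(f"weights: {weights_tb:.0f} TB")
print(f"accelerators (weights only): {accelerators}")
print(f"nodes (weights only): {nodes}")
```

Under these assumptions, one replica spans several nodes before it serves a single request, so it has to be scheduled, scaled, and restarted as a group rather than pod by pod.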

#kubernetes #ai #llm

1.13K views, 10:01

Support Ukrainian Army with 👀 and🦻

For today's Donations Monday, I'd like to share the donation details of a friend of mine who has been volunteering for the AFU since the beginning of the full-scale invasion.

Here's a page with all the possible ways to donate. You can also find links to the current goals and reports on previous fundraisers there.

Here's their Monobank jar, which supports Apple Pay, if you'd rather have a direct link:

https://send.monobank.ua/jar/BQjWbpver

#donations #Ukraine

1.11K views, 11:01

Backend for frontend (BFF) pattern— why do you need to know it?

An explainer for the Backend-for-Frontend pattern. The article provides a high-level overview of what it is and when to use it.

#architecture #design

Medium

Our typical issue starts when we need to integrate some API to our mobile app. Let’s imagine the case when you need to create a mobile app…

1.08K views, 13:39

How much do amd64 microarchitecture levels help in Go?

How much do amd64 microarchitecture levels help in Go? is a benchmarking article that shows the compute-time improvements you can get by building your apps for modern x64 processors only. You likely use modern processors already and don't plan to run your apps on decade-old hardware.

Still, it's important to remember that, nice as such articles are, your real applications probably don't just calculate bit vectors all day. It's much more likely that your real performance bottleneck is I/O, not the fact that your apps are built with support for old hardware. That said, if you're using Go, you can get some easy wins here with a single build setting (the GOAMD64 environment variable).

#performance #go #programming

1.12K views, 14:16

From the kubernetes community on Reddit

A Reddit thread with some useful tools for Kubernetes and kubectl plugins.

Some of them are well known, but you may find something new and interesting there. I did :)

#kubernetes

1.09K views, 09:30

small k8s tools that saved me time debugging boring problems

What was on CatOps in the last couple of weeks...

A new issue of the CatOps Digest is here!

https://newsletter.catops.dev/p/catops-digest-2026-06-13

#digest #newsletter

newsletter.catops.dev

CatOps Digest 2026-06-13

1.02K views, 12:17

For today's Donations Monday, I'd like to share with you a fundraiser that our friends at DOU started for the 2nd separate corps of the National Guard of Ukraine «Хартія». The goal of this fundraiser is to buy heavy bomber drones "Vampire" for the Kupiansk direction.

Monobank jar: https://send.monobank.ua/jar/26mrQPQ3PZ

#donations #Ukraine

870 views, 09:01

AI demands more engineering discipline. Not less

I will post AI-related articles this week, because why not?

The first one, from Charity Majors, is called AI demands more engineering discipline. Not less. In it, she follows up on another article of hers.

This one covers the technical aspects of moving to disposable code. It also links to a lot of other articles, which is cool.

#ai

Substack

If you lived through the shift from handcrafted server pets to immutable infrastructure, you should sense something oddly familiar about what's happening now.

789 views, 13:01

Harness engineering for coding agent users

Harness engineering for coding agent users is a new guest article on Martin Fowler's blog that summarizes approaches to improving AI output and making it more manageable.

If you're actively using AI agents day-to-day, the things described in this article won't be news to you, but it helps to structure your thoughts.

#ai

705 views, 08:48

AI in SRE: What's Actually Coming in 2026

Continuing with our AI week.

AI in SRE: What's Actually Coming in 2026 tells the story of AI coming to help with incident response.

The article suggests trying an AI tool for real investigations or for collecting data for postmortems. To clarify: in my experience, you don’t need a dedicated tool. A general-purpose AI agent with some harness (skills and scripts) will do. You should try it! AI does the job of data gathering incredibly well. Yet, the results are indeed not perfect.

Another good point in this article is data quality. AI results are only as good as the context you provide. I have witnessed two prominent failure modes so far:

1. Inference on incomplete data: a person with limited access (typically a developer) asks their agent to investigate an alert, and the agent comes to some conclusion. Meanwhile, a person with elevated access (typically a systems engineer) asks their agent to investigate the same alert and gets a different result, likely because some data is only available via kubectl events, etc. The fix is not to let everyone do everything; it is to revisit your observability pipelines and make sure you ship all the relevant data, which is easier said than done.
2. An agent that cries "wolf": if you have a pollutant in your logs, or simply an event that happens very often, agents like to correlate it with everything. If your clusters are elastic, an agent could blame node count fluctuations for every error. The problem is that once a node count fluctuation actually causes an issue, you will be the one who ignores the agent's hint, because it suggests this every single time.
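
The second failure mode is easy to reproduce with a toy calculation. The sketch below is purely illustrative: the data is synthetic, and the signal names `node_scale` and `oom_kill`, the `Window` class, and both helper functions are made up for this example. It compares naive co-occurrence ("how often was this signal present when errors happened?") with lift, which divides by the base rate of errors.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Window:
    events: frozenset  # signals observed in this time window
    error: bool        # whether an error happened in this window


def cooccurrence(windows, signal):
    """Share of error windows in which the signal was also present."""
    errors = [w for w in windows if w.error]
    return sum(signal in w.events for w in errors) / len(errors)


def lift(windows, signal):
    """P(error | signal) / P(error); about 1.0 means the signal tells you nothing."""
    with_signal = [w for w in windows if signal in w.events]
    p_error = sum(w.error for w in windows) / len(windows)
    p_error_given_signal = sum(w.error for w in with_signal) / len(with_signal)
    return p_error_given_signal / p_error


# Synthetic data: 100 windows, errors in the first 10.
# "node_scale" fires in 90% of all windows regardless of errors,
# "oom_kill" fires only in 5 windows, all of which have errors.
windows = []
for i in range(100):
    events = set()
    if i % 10 != 9:
        events.add("node_scale")
    if i < 5:
        events.add("oom_kill")
    windows.append(Window(frozenset(events), error=i < 10))

for signal in ("node_scale", "oom_kill"):
    print(signal, cooccurrence(windows, signal), lift(windows, signal))
```

In this synthetic data, node scaling shows up next to 90% of the errors, yet its lift is about 1, because it shows up next to 90% of everything. The rare `oom_kill` signal co-occurs with only half of the errors but has a lift of about 10. An agent that reasons from co-occurrence alone will keep pointing at the noisy signal.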

If you'd like to share more SRE-specific AI failure modes (in Ukrainian), you're welcome in our chat.

#ai #sre

DZone

A practical look at where AI genuinely helps SRE teams, and what “AI-powered operations” can realistically deliver in production.

327 views, 11:23