Notes from the AI red team

Analysis of AI agent vulnerabilities, attack techniques, and defensive patterns — plus findings from scans I run against public targets.

June 12, 2026·5 min read

What I Learned Cataloguing Every AI Security Incident I Could Find

I built a sourced database of real-world AI and LLM security incidents. Putting them in one place surfaced three patterns you do not see one headline at a time: a single exfiltration channel that keeps working across vendors, indirect injection as the real attack surface, and the 2025 shift toward agent and supply-chain attacks.

May 16, 2026·5 min read

Why Classifier-Based Prompt Injection Defense Is a Speed Bump, Not a Wall

Input classifiers that detect prompt injection are the most common defense deployed in production. They're also trivially bypassable with encoding, fragmentation, and indirect injection. Here's why they fail and what to layer on top.

May 16, 2026·5 min read

Why Most LLM Bug Reports Get Closed as Informational

You found a prompt injection. You wrote a report. The triager closed it as informational. Here's why that keeps happening, and how to write LLM vulnerability reports that actually get accepted.

May 14, 2026·9 min read

How to Red-Team Your AI Agent in One Afternoon

A step-by-step checklist for security-testing your AI agent in 4 hours. Six attack classes, specific prompts to try, and what each finding actually means for your product.

May 5, 2026·4 min read

The Audit-Framing Trick: How AI Memory Becomes a Side Door

A junior contractor doesn't have access to the CFO's salary review notes. But they have edit access to a shared Notion page, and the company AI assistant indexes Notion. Three days later, every employee can ask the AI for a 'memory diagnostic' and get the CFO's notes back.

April 25, 2026·7 min read

The OWASP LLM Top 10 Is Missing Three Categories

The OWASP Top 10 for LLM Applications is the best framework we have. It also has three blind spots that account for a disproportionate share of what I'm finding in the field — multi-tenant context bleed, agent-to-agent handoff attacks, and temporal/memory attacks.

April 23, 2026·5 min read

Why Pure-LLM CTFs Don't Work: A Hybrid Architecture for AI Security Challenges

Pure-LLM CTFs are unreliable because model alignment training fights your characters. Pure-deterministic CTFs teach pattern matching, not attack patterns. Here's the hybrid approach the Wraith Academy uses, and why it took a few iterations to get there.

April 17, 2026·4 min read

I Red-Teamed a Chatbot in 26 Seconds. Here's What It Leaked.

I built a deliberately vulnerable chatbot, pointed Wraith at it, and watched Wraith extract the full system prompt, including a production API key and admin database credentials. Here's exactly how.

April 16, 2026·2 min read

Why I Built Wraith

Most security tools don't know how to test AI agents. That's a gap worth building a product around.
