Stars
A deliberately vulnerable banking application designed for practicing security testing of web apps, APIs, and AI-integrated apps, as well as secure code reviews. Features common vulnerabilities found in real-wor…
Code Repository for: AIRTBench: Measuring Autonomous AI Red Teaming Capabilities in Language Models
A Comprehensive Benchmark to Evaluate LLMs as Agents (ICLR'24)
RAG/LLM Security Scanner identifies critical vulnerabilities in AI-powered applications, including chatbots, virtual assistants, and knowledge retrieval systems.
Zero trust. Zero security. Total exposure. A deliberately vulnerable health tech platform with AI Chatbot for learning about application security and ethical hacking. It contains vulnerabilities fr…
A full-featured, hackable Next.js AI chatbot built by Vercel
An intentionally vulnerable AI chatbot to learn and practice AI Security.
A Purposely Vulnerable LLM Shopping List Tool
An AI-powered agentic red team framework that automates offensive security operations, from reconnaissance to exploitation to post-exploitation, with zero human intervention.
The Python Risk Identification Tool for generative AI (PyRIT) is an open source framework built to empower security professionals and engineers to proactively identify risks in generative AI systems.
Framework for orchestrating role-playing, autonomous AI agents. By fostering collaborative intelligence, CrewAI empowers agents to work together seamlessly, tackling complex tasks.
Test your prompts, agents, and RAGs. Red teaming/pentesting/vulnerability scanning for AI. Compare performance of GPT, Claude, Gemini, DeepSeek, and more. Simple declarative configs with command li…
A Dynamic Environment to Evaluate Attacks and Defenses for LLM Agents.
A security scanner for your LLM agentic workflows
A curated list of awesome LLM Red Teaming training, resources, and tools.
DeepTeam is a framework to red team LLMs and AI agents.
Collection of leaked system prompts
[ICML 2025] UDora: A Unified Red Teaming Framework against LLM Agents
HarmBench: A Standardized Evaluation Framework for Automated Red Teaming and Robust Refusal
[CCS'24] A dataset consisting of 15,140 ChatGPT prompts from Reddit, Discord, websites, and open-source datasets (including 1,405 jailbreak prompts).