Bloom is a free, open-source tool that automates testing AI models for bad behaviors like bias or sycophancy.
You define the behavior in a simple config file, add example chats if you want, and it runs four steps: understanding it, creating varied test scenarios, simulating talks with your target model (like Claude or GPT via APIs), and scoring results with metrics like how often the issue appears.
View interactive transcripts easily.
This saves you hours of manual work, lets you quickly compare models on fresh tests to avoid overfitting, and gives reliable, reproducible insights into AI safety—perfect for researchers building trustworthy systems.