NeMo Guardrails is an open-source toolkit for easily adding programmable guardrails to LLM-based conversational systems.
Open-source vulnerability disclosure and bug bounty program database
PyBullet CartPole and Quadrotor environments—with CasADi symbolic a priori dynamics—for learning-based control and RL
Safe RLHF: Constrained Value Alignment via Safe Reinforcement Learning from Human Feedback
List of resources about programming practices for writing safety-critical software.
Safe reinforcement learning with stability guarantees
Safe Bayesian Optimization
Deploy once. Continuously improve your AI agents in production.
Decrypted Generative Model safety files for Apple Intelligence containing filters
Official JAX implementation of the T-RO paper: Songyuan Zhang*, Oswin So*, Kunal Garg, Chuchu Fan, "GCBF+: A Neural Graph Control Barrier Function Framework for Distributed Safe Multi-Agent Control".
A collaborative collection of open-source safe GPT-3 prompts that work well
Safe Exploration with MPC and Gaussian process models
Research on evaluating and aligning the values of Chinese large language models
[arXiv:2311.03191] "DeepInception: Hypnotize Large Language Model to Be Jailbreaker"
[ICML 2024] Assessing the Brittleness of Safety Alignment via Pruning and Low-Rank Modifications
Official datasets and pytorch implementation repository of SQuARe and KoSBi (ACL 2023)
Safety Verification of Deep Neural Networks