NeMo Guardrails is an open-source toolkit for easily adding programmable guardrails to LLM-based conversational systems.
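As an illustration of what "programmable guardrails" means here, below is a minimal usage sketch along the lines of the NeMo Guardrails getting-started flow; the ./config path and the message contents are placeholder assumptions, and the config directory is expected to hold the YAML/Colang rails definition.

```python
# Minimal sketch: wrap an LLM with a guardrails configuration.
# Assumes a ./config directory containing the rails definition (YAML/Colang)
# and credentials for the underlying LLM provider.
from nemoguardrails import LLMRails, RailsConfig

config = RailsConfig.from_path("./config")   # load the guardrails configuration
rails = LLMRails(config)                     # wrap the underlying LLM with the rails

# The application queries the guarded model through the rails object
# instead of calling the raw LLM directly.
response = rails.generate(messages=[
    {"role": "user", "content": "Hello! What can you do?"}
])
print(response["content"])
```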
List of resources about programming practices for writing safety-critical software.
Safe RLHF: Constrained Value Alignment via Safe Reinforcement Learning from Human Feedback
Open-source vulnerability disclosure and bug bounty program database
PyBullet CartPole and Quadrotor environments—with CasADi symbolic a priori dynamics—for learning-based control and RL
Research on evaluating and aligning the values of Chinese large language models
Deploy once. Continuously improve your AI agents in production.
Decrypted Generative Model safety files for Apple Intelligence containing filters
A collaborative collection of open-source safe GPT-3 prompts that work well
Official datasets and PyTorch implementation for SQuARe and KoSBi (ACL 2023)
Safe reinforcement learning with stability guarantees
A toolbox for benchmarking the trustworthiness of multimodal large language models (MultiTrust, NeurIPS 2024 Datasets and Benchmarks Track)
[arXiv:2311.03191] "DeepInception: Hypnotize Large Language Model to Be Jailbreaker"
Safe Bayesian Optimization
Code for ACL 2024 paper "TruthX: Alleviating Hallucinations by Editing Large Language Models in Truthful Space"
How good are LLMs at chemistry?
[ICLR 2025] Dissecting adversarial robustness of multimodal language model agents