Stars
PRISM: Robust VLM Alignment with Principled Reasoning for Integrated Safety in Multimodality
A toolbox for benchmarking the trustworthiness of Multimodal LLM Agents across truthfulness, controllability, safety, and privacy dimensions through 34 interactive tasks
[ISMIR 2025] A curated list of vision-to-music generation: methods, datasets, evaluation and challenges.
[NeurIPS 2024] Official implementation for "AgentPoison: Red-teaming LLM Agents via Memory or Knowledge Base Backdoor Poisoning"
Official codebase for "STAIR: Improving Safety Alignment with Introspective Reasoning"
AISafetyLab: A comprehensive framework covering safety attacks, defenses, evaluation, and a curated paper list.
thu-ml / PINNacle
Forked from i207M/PINNacle. Codebase for PINNacle: A Comprehensive Benchmark of Physics-Informed Neural Networks for Solving PDEs.
Open source AI/ML capabilities for the FiftyOne ecosystem
DanceCamAnimator: Keyframe-Based Controllable 3D Dance Camera Synthesis. [ACMMM 2024] Official PyTorch implementation
[NTIRE 2024] Official code for "Towards Real-world Video Face Restoration: A New Benchmark"
Adversarial Distributional Training (NeurIPS 2020)
[NeurIPS 2023] ImageReward: Learning and Evaluating Human Preferences for Text-to-image Generation
A toolbox for benchmarking trustworthiness of multimodal large language models (MultiTrust, NeurIPS 2024 Track Datasets and Benchmarks)
[NeurIPS 2024 Spotlight ⭐️ & TPAMI 2025] Parameter-Inverted Image Pyramid Networks (PIIP)
[CVPR 2024 Highlight] Official implementation of Transferable Visual Prompting. The paper "Exploring the Transferability of Visual Prompting for Multimodal Large Language Models" has been accepted by CVPR 2024 as a Highlight.
[EMNLP 2023] Lion: Adversarial Distillation of Proprietary Large Language Models
Beyond the Imitation Game collaborative benchmark for measuring and extrapolating the capabilities of language models
TruthfulQA: Measuring How Models Imitate Human Falsehoods
Human preference data for "Training a Helpful and Harmless Assistant with Reinforcement Learning from Human Feedback"
An open platform for training, serving, and evaluating large language models. Release repo for Vicuna and Chatbot Arena.
Safe RLHF: Constrained Value Alignment via Safe Reinforcement Learning from Human Feedback
Refine high-quality datasets and visual AI models
Code and documentation to train Stanford's Alpaca models, and generate the data.