Lists (5)
Sort Name ascending (A-Z)
Stars
[ICML 2025] An official source code for paper "FlipAttack: Jailbreak LLMs via Flipping".
[USENIX Security'24] Official repository of "Making Them Ask and Answer: Jailbreaking Large Language Models in Few Queries via Disguise and Reconstruction"
[ACL24] Official Repo of Paper `ArtPrompt: ASCII Art-based Jailbreak Attacks against Aligned LLMs`
We jailbreak GPT-3.5 Turbo’s safety guardrails by fine-tuning it on only 10 adversarially designed examples, at a cost of less than $0.20 via OpenAI’s APIs.
Official implementation of paper: DrAttack: Prompt Decomposition and Reconstruction Makes Powerful LLM Jailbreakers
Awesome-Jailbreak-on-LLMs is a collection of state-of-the-art, novel, exciting jailbreak methods on LLMs. It contains papers, codes, datasets, evaluations, and analyses.
[ICML 2024] COLD-Attack: Jailbreaking LLMs with Stealthiness and Controllability
An LLM can Fool Itself: A Prompt-Based Adversarial Attack (ICLR 2024)
JailbreakBench: An Open Robustness Benchmark for Jailbreaking Language Models [NeurIPS 2024 Datasets and Benchmarks Track]
Test LLMs against jailbreaks and unprecedented harms
The official implementation of our NAACL 2024 paper "A Wolf in Sheep’s Clothing: Generalized Nested Jailbreak Prompts can Fool Large Language Models Easily".
A curation of awesome tools, documents and projects about LLM Security.
[NDSS'25 Best Technical Poster] A collection of automated evaluators for assessing jailbreak attempts.
Jailbreaking Leading Safety-Aligned LLMs with Simple Adaptive Attacks [ICLR 2025]
Existing Literature about Machine Unlearning
This repository provides a benchmark for prompt injection attacks and defenses in LLMs
A framework to evaluate the generalization capability of safety alignment for LLMs
HarmBench: A Standardized Evaluation Framework for Automated Red Teaming and Robust Refusal
PromptInject is a framework that assembles prompts in a modular fashion to provide a quantitative analysis of the robustness of LLMs to adversarial prompt attacks. 🏆 Best Paper Awards @ NeurIPS ML …
Papers and resources related to the security and privacy of LLMs 🤖
Codes and scripts for "Explainable Semantic Space by Grounding Languageto Vision with Cross-Modal Contrastive Learning"
LLM (Large Language Model) FineTuning
Unified Efficient Fine-Tuning of 100+ LLMs & VLMs (ACL 2024)
The exercises for the lrz AI Training Series on Containers.
This repository contains the code for "Exploiting Cloze Questions for Few-Shot Text Classification and Natural Language Inference"
[NeurIPS'23 Oral] Visual Instruction Tuning (LLaVA) built towards GPT-4V level capabilities and beyond.