
[ICML 2025] Official source code for the paper "FlipAttack: Jailbreak LLMs via Flipping".

Python 171 14 Updated May 2, 2025

[USENIX Security'24] Official repository of "Making Them Ask and Answer: Jailbreaking Large Language Models in Few Queries via Disguise and Reconstruction"

Python 112 14 Updated Oct 11, 2024

[ACL24] Official Repo of Paper `ArtPrompt: ASCII Art-based Jailbreak Attacks against Aligned LLMs`

Python 97 21 Updated Aug 15, 2025

We jailbreak GPT-3.5 Turbo’s safety guardrails by fine-tuning it on only 10 adversarially designed examples, at a cost of less than $0.20 via OpenAI’s APIs.

Python 347 37 Updated Feb 23, 2024

Official implementation of paper: DrAttack: Prompt Decomposition and Reconstruction Makes Powerful LLM Jailbreakers

JavaScript 66 13 Updated Aug 25, 2024

Awesome-Jailbreak-on-LLMs is a collection of state-of-the-art, novel, exciting jailbreak methods on LLMs. It contains papers, codes, datasets, evaluations, and analyses.

1,304 107 Updated Mar 30, 2026

[ICML 2024] COLD-Attack: Jailbreaking LLMs with Stealthiness and Controllability

Python 175 23 Updated Dec 18, 2024

An LLM can Fool Itself: A Prompt-Based Adversarial Attack (ICLR 2024)

Python 114 17 Updated Jan 21, 2025

JailbreakBench: An Open Robustness Benchmark for Jailbreaking Language Models [NeurIPS 2024 Datasets and Benchmarks Track]

Python 576 69 Updated Apr 4, 2025

[ACL 2024] SALAD benchmark & MD-Judge

Python 173 15 Updated Mar 8, 2025

Test LLMs against jailbreaks and unprecedented harms

Python 39 9 Updated Oct 19, 2024

The official implementation of our NAACL 2024 paper "A Wolf in Sheep’s Clothing: Generalized Nested Jailbreak Prompts can Fool Large Language Models Easily".

Python 158 17 Updated Sep 2, 2025

A curation of awesome tools, documents and projects about LLM Security.

1,565 210 Updated Aug 20, 2025

[NDSS'25 Best Technical Poster] A collection of automated evaluators for assessing jailbreak attempts.

Python 190 11 Updated Apr 1, 2025

Jailbreaking Leading Safety-Aligned LLMs with Simple Adaptive Attacks [ICLR 2025]

Shell 381 45 Updated Jan 23, 2025
Python 6 1 Updated Oct 24, 2022

Existing Literature about Machine Unlearning

961 118 Updated Aug 29, 2025

This repository provides a benchmark for prompt injection attacks and defenses in LLMs

Python 428 67 Updated Oct 29, 2025

A framework to evaluate the generalization capability of safety alignment for LLMs

Python 629 69 Updated Oct 9, 2025

HarmBench: A Standardized Evaluation Framework for Automated Red Teaming and Robust Refusal

Jupyter Notebook 921 140 Updated Aug 16, 2024

PromptInject is a framework that assembles prompts in a modular fashion to provide a quantitative analysis of the robustness of LLMs to adversarial prompt attacks. 🏆 Best Paper Awards @ NeurIPS ML …

Python 473 48 Updated Feb 26, 2024

Papers and resources related to the security and privacy of LLMs 🤖

Python 571 45 Updated Jun 8, 2025

Code and scripts for "Explainable Semantic Space by Grounding Language to Vision with Cross-Modal Contrastive Learning"

Jupyter Notebook 20 5 Updated Mar 23, 2022

LLM (Large Language Model) FineTuning

Jupyter Notebook 569 137 Updated Apr 1, 2025

Unified Efficient Fine-Tuning of 100+ LLMs & VLMs (ACL 2024)

Python 70,108 8,577 Updated Apr 12, 2026

The exercises for the lrz AI Training Series on Containers.

Jupyter Notebook 4 Updated Oct 13, 2025

The official Meta Llama 3 GitHub site

Python 29,289 3,530 Updated Jan 26, 2025

This repository contains the code for "Exploiting Cloze Questions for Few-Shot Text Classification and Natural Language Inference"

Python 1,627 282 Updated Jun 12, 2023

[NeurIPS'23 Oral] Visual Instruction Tuning (LLaVA) built towards GPT-4V level capabilities and beyond.

Python 24,678 2,760 Updated Aug 12, 2024