werywjw

Jiawen Wang werywjw

15 followers · 32 following

Achievements

Highlights

Lists (5)

Sort

Stars

yueliu1999 / FlipAttack

[ICML 2025] An official source code for paper "FlipAttack: Jailbreak LLMs via Flipping".

Python 171 14 Updated May 2, 2025

LLM-DRA / DRA

[USENIX Security'24] Official repository of "Making Them Ask and Answer: Jailbreaking Large Language Models in Few Queries via Disguise and Reconstruction"

Python 112 14 Updated Oct 11, 2024

uw-nsl / ArtPrompt

[ACL24] Official Repo of Paper `ArtPrompt: ASCII Art-based Jailbreak Attacks against Aligned LLMs`

Python 97 21 Updated Aug 15, 2025

LLM-Tuning-Safety / LLMs-Finetuning-Safety

We jailbreak GPT-3.5 Turbo’s safety guardrails by fine-tuning it on only 10 adversarially designed examples, at a cost of less than $0.20 via OpenAI’s APIs.

Python 347 37 Updated Feb 23, 2024

xirui-li / DrAttack

Official implementation of paper: DrAttack: Prompt Decomposition and Reconstruction Makes Powerful LLM Jailbreakers

JavaScript 66 13 Updated Aug 25, 2024

yueliu1999 / Awesome-Jailbreak-on-LLMs

Awesome-Jailbreak-on-LLMs is a collection of state-of-the-art, novel, exciting jailbreak methods on LLMs. It contains papers, codes, datasets, evaluations, and analyses.

1,304 107 Updated Mar 30, 2026

Yu-Fangxu / COLD-Attack

[ICML 2024] COLD-Attack: Jailbreaking LLMs with Stealthiness and Controllability

Python 175 23 Updated Dec 18, 2024

GodXuxilie / PromptAttack

An LLM can Fool Itself: A Prompt-Based Adversarial Attack (ICLR 2024)

Python 114 17 Updated Jan 21, 2025

JailbreakBench / jailbreakbench

JailbreakBench: An Open Robustness Benchmark for Jailbreaking Language Models [NeurIPS 2024 Datasets and Benchmarks Track]

Python 576 69 Updated Apr 4, 2025

OpenSafetyLab / SALAD-BENCH

【ACL 2024】 SALAD benchmark & MD-Judge

Python 173 15 Updated Mar 8, 2025

walledai / walledeval

Test LLMs against jailbreaks and unprecedented harms

Python 39 9 Updated Oct 19, 2024

NJUNLP / ReNeLLM

The official implementation of our NAACL 2024 paper "A Wolf in Sheep’s Clothing: Generalized Nested Jailbreak Prompts can Fool Large Language Models Easily".

Python 158 17 Updated Sep 2, 2025

corca-ai / awesome-llm-security

A curation of awesome tools, documents and projects about LLM Security.

1,565 210 Updated Aug 20, 2025

CryptoAILab / JailbreakEval

[NDSS'25 Best Technical Poster] A collection of automated evaluators for assessing jailbreak attempts.

Python 190 11 Updated Apr 1, 2025

tml-epfl / llm-adaptive-attacks

Jailbreaking Leading Safety-Aligned LLMs with Simple Adaptive Attacks [ICLR 2025]

Shell 381 45 Updated Jan 23, 2025

thu-coai / Reverse_Generation

Python 6 1 Updated Oct 24, 2022

jjbrophy47 / machine_unlearning

Existing Literature about Machine Unlearning

961 118 Updated Aug 29, 2025

liu00222 / Open-Prompt-Injection

This repository provides a benchmark for prompt injection attacks and defenses in LLMs

Python 428 67 Updated Oct 29, 2025

RobustNLP / CipherChat

A framework to evaluate the generalization capability of safety alignment for LLMs

Python 629 69 Updated Oct 9, 2025

patrickrchao / JailbreakingLLMs

Python 719 109 Updated Jul 2, 2025

centerforaisafety / HarmBench

HarmBench: A Standardized Evaluation Framework for Automated Red Teaming and Robust Refusal

Jupyter Notebook 921 140 Updated Aug 16, 2024

agencyenterprise / PromptInject

PromptInject is a framework that assembles prompts in a modular fashion to provide a quantitative analysis of the robustness of LLMs to adversarial prompt attacks. 🏆 Best Paper Awards @ NeurIPS ML …

Python 473 48 Updated Feb 26, 2024