Code for ICLR 2025 Paper "GenARM: Reward Guided Generation with Autoregressive Reward Model for Test-time Alignment"
Your own personal AI assistant. Any OS. Any Platform. The lobster way. 🦞
[WACV 2026] OpenLVLM-MIA: A Controlled Benchmark Revealing the Limits of Membership Inference Attacks on Large Vision-Language Models
ASIDE: Architectural Separation of Instructions and Data in Language Models
[NDSS 2025] CENSOR: Defense Against Gradient Inversion via Orthogonal Subspace Bayesian Sampling
Source code for Cascading and Proxy Membership Inference Attacks. NDSS 2026.
This repository provides a benchmark for prompt injection attacks and defenses in LLMs
[USENIX Security 2026] Membership Inference Attacks on Tokenizers of Large Language Models
Code for paper "Membership Inference Attacks Against Vision-Language Models"
LLM Council works together to answer your hardest questions
This is the repository for the USENIX Security'25 paper "Enhanced Label-Only Membership Inference Attacks with Fewer Queries" by Hao Li, Zheng Li, Siyuan Wu, Yutong Ye, Min Zhang, Dengguo Feng, and…
Fully automatic censorship removal for language models
[NeurIPS'24] HippoRAG is a novel RAG framework inspired by human long-term memory that enables LLMs to continuously integrate knowledge across external documents. RAG + Knowledge Graphs + Personali…
Source code for Imitative Membership Inference Attack. USENIX Security 2026.
Empowering RAG with a memory-based data interface for all-purpose applications!
Official implementation of our NeurIPS 2023 paper "Augmenting Language Models with Long-Term Memory".
[AAAI'25 Oral] "MIA-Tuner: Adapting Large Language Models as Pre-training Text Detector".
gpt-oss-120b and gpt-oss-20b are two open-weight language models by OpenAI
[USENIX Security 2025] SOFT: Selective Data Obfuscation for Protecting LLM Fine-tuning against Membership Inference Attacks
Official implementation of "Data Mixture Inference: What do BPE tokenizers reveal about their training data?"
Official Implementation for the paper "d1: Scaling Reasoning in Diffusion Large Language Models via Reinforcement Learning"
This repository contains the source code for "Membership Inference Attacks as Privacy Tools: Reliability, Disparity and Ensemble", In Proceedings of ACM CCS 2025.
[NeurIPS D&B '25] The one-stop repository for LLM unlearning