A curated collection of papers, surveys, benchmarks, frameworks, and blog posts for machine unlearning in large language models.
As of the last commit, there are 558 papers, 17 surveys and position papers, 3 frameworks, and 2 blog posts.
If you believe your paper on LLM unlearning is not included, or if you find a mistake, typo, or information that is not up to date, please open an issue or submit a pull request, and I will be happy to update the list.
- TRACER: Token ReAssignment for Concept ERasure in Generative Recommendation
- Author(s): Ziheng Chen, Jiali Cheng, Zezhong Fan, Hadi Amiri, Diyuan Wu, Gabriele Tolomei, Yang Zhang
- Date: 2026-06
- Venue: -
- Code: -
- REMEDI: A Benchmark for Retention and Unlearning Evaluation in Multi-label Clinical Disease Inference
- Author(s): Anurag Sharma, Sai Teja Chunchu, Prasenjit Mitra, Sandipan Sikdar, Koustav Rudra
- Date: 2026-06
- Venue: -
- Code: -
- Learning What to Forget: Improving LLM Unlearning via Learned Token-Level Importance
- Author(s): Gizem Yüce, Giorgos Nikolaou, Nicolas Flammarion
- Date: 2026-06
- Venue: -
- Code: -
- Backdoor Unlearning Generalization: A Path Toward the Removal of Unknown Triggers in LLMs
- Author(s): Lisa Bouger, Théo Lasnier, Philippe Loubet Moundi, Yannick Teglia, Djamé Seddah
- Date: 2026-06
- Venue: -
- Code: -
- Don't Forget Your Embeddings: Robust Knowledge Erasure via Precise Editing of Embeddings
- Author(s): Clara Haya Suslik, Or Shafran, Mor Geva
- Date: 2026-06
- Venue: -
- Code: -
- Multilingual Unlearning in LLMs: Transfer, Dynamics, and Reversibility
- Author(s): Chaoyi Xiang, Olga Ohrimenko, Benjamin I. P. Rubinstein, Lea Frermann
- Date: 2026-06
- Venue: -
- Code: -
- Fast Unlearning at Scale via Margin Self-Correction
- Author(s): Federico Di Gennaro, Alexander Shevchenko, Fanny Yang
- Date: 2026-06
- Venue: -
- Code: -
- Visual-Noise Guided In-Context Distillation for Multimodal Large Language Model Unlearning
- Author(s): Junkai Chen, Yuhao He, Junxiang You, Ruiqi Liu, Chenyu Wang, Shu Wu
- Date: 2026-06
- Venue: -
- Code: -
- Divergence Decoding: Inference-Time Unlearning via Auxiliary Models
- Author(s): Humzah Merchant, Bradford Levy
- Date: 2026-05
- Venue: -
- Code: -
- De-attribute to Forget for LLM Unlearning
- Author(s): Xinyang Lu, Jiabao Pan, Rachael Hwee Ling Sim, See-Kiong Ng, Anthony Kum Hoe Tung, Bryan Kian Hsiang Low
- Date: 2026-05
- Venue: -
- Code: -
- AMNESIA: A Large Scale Medical Unlearning Benchmark Suite with Disease-Informed Analysis
- Author(s): Saeedeh Davoudi, Reihaneh Iranmanesh, Ophir Frieder, Nazli Goharian
- Date: 2026-05
- Venue: -
- Code: -
- ICCU: In-Context Continual Unlearning via Pattern-Induced Refusal Rules
- Author(s): Ruihao Pan, Suhang Wang
- Date: 2026-05
- Venue: -
- Code: -
- On the Hidden Costs of Counterfactual Knowledge Training in LLM Unlearning
- Author(s): Xiaotian Ye, Xiaohan Wang, Mengqi Zhang, Shu Wu
- Date: 2026-05
- Venue: -
- Code: -
- On the Robustness of Machine Unlearning for Vision-Language Models
- Model Unlearning Objectives Vary for Distinct Language Functions
- Author(s): Berk Atil, Vipul Gupta, Rebecca J. Passonneau
- Date: 2026-05
- Venue: -
- Code: -
- Measuring the Depth of LLM Unlearning via Activation Patching
- DualOptim+: Bridging Shared and Decoupled Optimizer States for Better Machine Unlearning in Large Language Models
- Calibration vs Decision Making: Revisiting the Reliability Paradox in Unlearned Language Models
- Author(s): Divyaksh Shukla, Ashutosh Modi
- Date: 2026-05
- Venue: -
- Code: -
- ZeroUnlearn: Few-Shot Knowledge Unlearning in Large Language Models
- CATA: Continual Machine Unlearning via Conflict-Averse Task Arithmetic
- Author(s): Shen Lin, Junhao Dong, Rongjie Chen, Xiaoyu Zhang, Li Xu, Xiaofeng Chen
- Date: 2026-05
- Venue: -
- Code: -
- Machine Unlearning for Masked Diffusion Language Models
- Distinguishable Deletion: Unifying Knowledge Erasure and Refusal for Large Language Model Unlearning
- ASRU: Activation Steering Meets Reinforcement Unlearning for Multimodal Large Language Models
- Author(s): Jiahui Guang, Yingjie Zhu, Cuiyun Gao, Haiyan Wang, Jing Li, Di Shao, Zhaoquan Gu
- Date: 2026-05
- Venue: -
- Code: -
- Forgetting That Sticks: Quantization-Permanent Unlearning via Circuit Attribution
- Author(s): Saisab Sadhu, Pratinav Seth, Vinay Kumar Sankarapu
- Date: 2026-05
- Venue: -
- Code: -
- Knowledge Beyond Language: Bridging the Gap in Multilingual Machine Unlearning Evaluation
- Author(s): Kyomin Hwang, Hyeonjin Kim, Sangyeon Cho, Nojun Kwak
- Date: 2026-05
- Venue: -
- Code: -
- ICED: Concept-level Machine Unlearning via Interpretable Concept Decomposition
- Author(s): Shen Lin, Jing Lin, Junhao Dong, Piotr Koniusz, Li Xu
- Date: 2026-05
- Venue: -
- Code: -
- Inference-Time Machine Unlearning via Gated Activation Redirection
- Author(s): Vinícius Conte Turani, Otávio Parraga, João Vitor Boer Abitante, Kristen K. Arguello, Joana Pasquali, Ramiro N. Barros, Flavio du Pin Calmon, Christian Mattjie, Rodrigo C. Barros, Lucas S. Kupssinskü
- Date: 2026-05
- Venue: -
- Code: -
- BackFlush: Knowledge-Free Backdoor Detection and Elimination with Watermark Preservation in Large Language Models
- Robust LLM Unlearning Against Relearning Attacks: The Minor Components in Representations Matter
- Author(s): Zeguan Xiao, Xuanzhe Xu, Yun Chen, Yong Wang, Jian Yang, Yanqing Hu, Guanhua Chen
- Date: 2026-05
- Venue: -
- Code: -
- PPU-Bench: Real World Benchmark for Personalized Partial Unlearning in Vision Language Models
- Author(s): Jiahui Guang, Zexun Zhan, Zhenlin Xu, Cuiyun Gao, Haiyan Wang, Jing Li, Zhaoquan Gu, Yanchun Zhang
- Date: 2026-05
- Venue: -
- Code: -
- Unlearners Can Lie: Evaluating and Improving Honesty in LLM Unlearning
- Author(s): Renjie Gu, Jiazhen Du, Yihua Zhang, Sijia Liu
- Date: 2026-05
- Venue: -
- Code: -
- Object Hallucination-Free Reinforcement Unlearning for Vision-Language Models
- SHRED: Retain-Set-Free Unlearning via Self-Distillation with Logit Demotion
- Author(s): Zizhao Hu, Ameya Godbole, Johnny Tian-Zheng Wei, Mohammad Rostami, Jesse Thomason, Robin Jia
- Date: 2026-05
- Venue: -
- Code: -
- ICU-Bench: Benchmarking Continual Unlearning in Multimodal Large Language Models
- Author(s): Yuhang Wang, Wenjie Mei, Junkai Zhang, Guangyu He, Zhenxing Niu, Haichang Gao
- Date: 2026-05
- Venue: -
- Code: -
- Null Space Constrained Contrastive Visual Forgetting for MLLM Unlearning
- Author(s): Yuhang Wang, Zhenxing Niu, Haoxuan Ji, Guangyu He, Linlin Zhang, Haichang Gao
- Date: 2026-05
- Venue: -
- Code: -
- MidSteer: Optimal Affine Framework for Steering Generative Models
- Author(s): Tatiana Gaintseva, Andrew Stepanov, Ziquan Liu, Martin Benning, Gregory Slabaugh, Jiankang Deng, Ismail Elezi
- Date: 2026-05
- Venue: -
- Code: -
- Before Forgetting, Learn to Remember: Revisiting Foundational Learning Failures in LVLM Unlearning Benchmarks
- Author(s): JuneHyoung Kwon, MiHyeon Kim, Eunju Lee, JungMin Yun, Byeonggeuk Lim, YoungBin Kim
- Date: 2026-05
- Venue: ACL 2026 Findings
- Code: -
- Erase Persona, Forget Lore: Benchmarking Multimodal Copyright Unlearning in Large Vision Language Models
- Author(s): JuneHyoung Kwon, JungMin Yun, YoungBin Kim
- Date: 2026-05
- Venue: LREC 2026
- Code: -
- Metric Unreliability in Multimodal Machine Unlearning: A Systematic Analysis and Principled Unified Score
- DurableUn: Quantization-Induced Recovery Attacks in Machine Unlearning
- Author(s): Abdullah Ahmad Khan, Ferdous Sohel
- Date: 2026-05
- Venue: -
- Code: -
- Less is More: Geometric Unlearning for LLMs with Minimal Data Disclosure
- Author(s): Chenchen Tan, Xinghao Li, Shujie Cui, Youyang Qu, Cunjian Chen, Longxiang Gao
- Date: 2026-05
- Venue: -
- Code: -
- Probe-Geometry Alignment: Erasing the Cross-Sequence Memorization Signature Below Chance
- Author(s): Anamika Paul Rupa, Anietie Andy
- Date: 2026-05
- Venue: -
- Code: -
- LLM Ghostbusters: Surgical Hallucination Suppression via Adaptive Unlearning
- Author(s): Joseph Spracklen, Pedram Aghazadeh, Farinaz Koushanfar, Murtuza Jadliwala
- Date: 2026-05
- Venue: -
- Code: -
- Unlearning What Matters: Token-Level Attribution for Precise Language Model Unlearning
- Author(s): Jiawei Wu, Doudou Zhou
- Date: 2026-05
- Venue: -
- Code: -
- When Forgetting Reveals: Black-Box Inversion Attacks on Unlearning in Large Language Models
- Author(s): Zijun Zhang, Bang Wu, Xingliang Yuan
- Date: 2026-05
- Venue: ESORICS 2025 International Workshops
- Code: -
- UNSEEN: A Cross-Stack LLM Unlearning Defense against AR-LLM Social Engineering Attacks
- Author(s): Tianlong Yu, Yang Yang, Xiao Luo, Lihong Liu, Fudu Xing, Zui Tao, Kailong Wang, Gaoyang Liu, Ting Bi
- Date: 2026-04
- Venue: -
- Code: -
- PrivUn: Unveiling Latent Ripple Effects and Shallow Forgetting in Privacy Unlearning
- Author(s): Xiaoyi Chen, Haoyuan Wang, Siyuan Tang, Sijia Liu, Liya Su, XiaoFeng Wang, Haixu Tang
- Date: 2026-04
- Venue: -
- Code: -
- CAP: Controllable Alignment Prompting for Unlearning in LLMs
- Author(s): Zhaokun Wang, Jinyu Guo, Jingwen Pu, Hongli Pu, Meng Yang, Xunlei Chen, Jie Ou, Wenyi Li, Guangchun Luo, Wenhong Tian
- Date: 2026-04
- Venue: ACL 2026
- Code: -
- Forget What Matters, Keep the Rest: Selective Unlearning of Informative Tokens
- Author(s): Seunghee Koh, Sunghyun Baek, Youngdong Kim, Junmo Kim
- Date: 2026-04
- Venue: ACL 2026
- Code: -
- Representation-Guided Parameter-Efficient LLM Unlearning
- Author(s): Zeguan Xiao, Lang Mo, Yun Chen, Lei Yang, Jiehui Zhao, Lili Yang, Guanhua Chen
- Date: 2026-04
- Venue: -
- Code: -
- Randomized Antipodal Search Done Right for Data Pareto Improvement of LLM Unlearning
- Author(s): Ziwen Liu, Huawei Lin, Yide Ran, Denghui Zhang, Jianwen Xie, Chuan Li, Weijie Zhao, Zhaozhuo Xu
- Date: 2026-04
- Venue: -
- Code: -
- CiPO: Counterfactual Unlearning for Large Reasoning Models through Iterative Preference Optimization
- Author(s): Junyi Li, Yongqiang Chen, Ningning Ding
- Date: 2026-04
- Venue: ACL 2026
- Code: -
- Harmonizing Multi-Objective LLM Unlearning via Unified Domain Representation and Bidirectional Logit Distillation
- Author(s): Yisheng Zhong, Sijia Liu, Zhuangdi Zhu
- Date: 2026-04
- Venue: -
- Code: -
- Modeling LLM Unlearning as an Asymmetric Two-Task Learning Problem
- Author(s): Zeguan Xiao, Siqing Li, Yong Wang, Xuetao Wei, Jian Yang, Yun Chen, Guanhua Chen
- Date: 2026-04
- Venue: ACL 2026
- Code: -
- CURaTE: Continual Unlearning in Real Time with Ensured Preservation of LLM Knowledge
- Author(s): Seyun Bae, Seokhan Lee, Eunho Yang
- Date: 2026-04
- Venue: ACL 2026 Findings
- Code: -
- CausalDetox: Causal Head Selection and Intervention for Language Model Detoxification
- Author(s): Yian Wang, Yuen Chen, Agam Goyal, Hari Sundaram
- Date: 2026-04
- Venue: -
- Code: -
- From Anchors to Supervision: Memory-Graph Guided Corpus-Free Unlearning for Large Language Models
- Author(s): Wenxuan Li, Zhenfei Zhang, Mi Zhang, Geng Hong, Mi Wen, Xiaoyu You, Min Yang
- Date: 2026-04
- Venue: -
- Code: -
- WIN-U: Woodbury-Informed Newton-Unlearning as a retain-free Machine Unlearning Framework
- Author(s): Xingjian Zhao, Mohammad Mohammadi Amiri, Malik Magdon-Ismail
- Date: 2026-04
- Venue: -
- Code: -
- RePAIR: Interactive Machine Unlearning through Prompt-Aware Model Repair
- Author(s): Jagadeesh Rachapudi, Pranav Singh, Ritali Vatsi, Praful Hambarde, Amit Shukla
- Date: 2026-04
- Venue: -
- Code: -
- Operationalising the Right to be Forgotten in LLMs: A Lightweight Sequential Unlearning Framework for Privacy-Aligned Deployment in Politically Sensitive Environments
- Author(s): Esen Kurt, Haithem Afli
- Date: 2026-04
- Venue: PoliticalNLP 2026
- Code: -
- Mitigating Privacy Risk via Forget Set-Free Unlearning
- Author(s): Aviraj Newatia, Michael Cooper, Viet Nguyen, Rahul G. Krishnan
- Date: 2026-04
- Venue: -
- Code: -
- Latent Instruction Representation Alignment: defending against jailbreaks, backdoors and undesired knowledge in LLMs
- Author(s): Eric Easley, Sebastian Farquhar
- Date: 2026-04
- Venue: -
- Code: -
- Exclusive Unlearning
- Author(s): Mutsumi Sasaki, Kouta Nakayama, Yusuke Miyao, Yohei Oseki, Masaru Isonuma
- Date: 2026-04
- Venue: -
- Code: -
- Can Large Language Models Reinvent Foundational Algorithms?
- Author(s): Jian Zhao, Haoren Luo, Yu Wang, Yuhan Cao, Pingyue Sheng, Tianxing He
- Date: 2026-04
- Venue: -
- Code: -
- CURE: Circuit-Aware Unlearning for LLM-based Recommendation
- Author(s): Ziheng Chen, Jiali Cheng, Zezhong Fan, Hadi Amiri, Yunzhi Yao, Xiangguo Sun, Yang Zhang
- Date: 2026-04
- Venue: -
- Code: -
- Towards Unveiling Vulnerabilities of Large Reasoning Models in Machine Unlearning
- Author(s): Aobo Chen, Chenxu Zhao, Chenglin Miao, Mengdi Huai
- Date: 2026-04
- Venue: -
- Code: -
- Subspace Control: Turning Constrained Model Steering into Controllable Spectral Optimization
- VLA-Forget: Vision-Language-Action Unlearning for Embodied Foundation Models
- Author(s): Ravi Ranjan, Agoritsa Polyzou
- Date: 2026-04
- Venue: -
- Code: -
- Selective Forgetting for Large Reasoning Models
- Author(s): Tuan Le, Wei Qian, Mengdi Huai
- Date: 2026-04
- Venue: -
- Code: -
- Can VLMs Truly Forget? Benchmarking Training-Free Visual Concept Unlearning
- Secure Forgetting: A Framework for Privacy-Driven Unlearning in Large Language Model (LLM)-Based Agents
- Author(s): Dayong Ye, Tianqing Zhu, Congcong Zhu, Feng He, Qi He, Shang Wang, Bo Liu, Wanlei Zhou
- Date: 2026-04
- Venue: -
- Code: -
- Towards Practical LLM Unlearning: Efficient, Modular, and Retain-Free
- Author(s): Peng Liu, Peng-Fei Zhang, Jianfeng Qu, Ximing Li, Zhixu Li, Pengpeng Zhao
- Date: 2026-04
- Venue: WWW 2026
- Code: -
- Simulating Novice Students Using Machine Unlearning and Relearning in Large Language Models
- Author(s): Jiajia Song, Zhihan Guo, Jionghao Lin
- Date: 2026-03
- Venue: -
- Code: -
- Which Concepts to Forget and How to Refuse? Decomposing Concepts for Continual Unlearning in Large Vision-Language Models
- Author(s): Hyundong Jin, Dongyoon Han, Eunwoo Kim
- Date: 2026-03
- Venue: -
- Code: -
- Parameter-Efficient Token Embedding Editing for Clinical Class-Level Unlearning
- Author(s): Iyad Ait Hou, Shrenik Borad, Harsh Sharma, Pooja Srinivasan, Rebecca Hwa, Aya Zirikly
- Date: 2026-03
- Venue: -
- Code: -
- RAZOR: Ratio-Aware Layer Editing for Targeted Unlearning in Vision Transformers and Diffusion Models
- Author(s): Ravi Ranjan, Utkarsh Grover, Xiaomin Lin, Agoritsa Polyzou
- Date: 2026-03
- Venue: -
- Code: -
- Relationship-Aware Safety Unlearning for Multimodal LLMs
- Author(s): Vishnu Narayanan Anilkumar, Abhijith Sreesylesh Babu, Trieu Hai Vo, Mohankrishna Kolla, Alexander Cuneo
- Date: 2026-03
- Venue: -
- Code: -
- GONE: Structural Knowledge Unlearning via Neighborhood-Expanded Distribution Shaping
- Author(s): Chahana Dahal, Ashutosh Balasubramaniam, Zuobin Xiong
- Date: 2026-03
- Venue: -
- Code: -
- The Unlearning Mirage: A Dynamic Framework for Evaluating LLM Unlearning
- Author(s): Raj Sanjay Shah, Jing Huang, Keerthiram Murugesan, Nathalie Baracaldo, Diyi Yang
- Date: 2026-03
- Venue: -
- Code: -
- Explainable LLM Unlearning Through Reasoning
- Author(s): Junfeng Liao, Qizhou Wang, Shanshan Ye, Xin Yu, Ling Chen, Zhen Fang
- Date: 2026-03
- Venue: -
- Code: -
- ALTER: Asymmetric LoRA for Token-Entropy-Guided Unlearning of LLMs
- Author(s): Xunlei Chen, Jinyu Guo, Yuang Li, Zhaokun Wang, Yi Gong, Jie Zou, Jiwei Wei, Wenhong Tian
- Date: 2026-03
- Venue: -
- Code: -
- Attention Smoothing Is All You Need For Unlearning
- Author(s): Saleh Zare Zade, Xiangyu Zhou, Sijia Liu, Dongxiao Zhu
- Date: 2026-03
- Venue: -
- Code: -
- A Comprehensive Evaluation of LLM Unlearning Robustness under Multi-Turn Interaction
- Author(s): Ruihao Pan, Suhang Wang
- Date: 2026-03
- Venue: -
- Code: -
- ROKA: Robust Knowledge Unlearning against Adversaries
- Author(s): Jinmyeong Shin, Joshua Tapia, Nicholas Ferreira, Gabriel Diaz, Moayed Daneshyari, Hyeran Jeon
- Date: 2026-03
- Venue: -
- Code: -
- MPU: Towards Secure and Privacy-Preserving Knowledge Unlearning for Large Language Models
- U-CAN: Utility-Aware Contrastive Attenuation for Efficient Unlearning in Generative Recommendation
- Author(s): Zezheng Wu, Rui Wang, Xinghe Cheng, Yang Shao, Qing Yang, Jiapu Wang, Jingwei Zhang
- Date: 2026-02
- Venue: -
- Code: -
- Layer-Targeted Multilingual Knowledge Erasure in Large Language Models
- Author(s): Taoran Li, Varun Chandrasekaran, Zhiyuan Yu
- Date: 2026-02
- Venue: -
- Code: -
- Anatomy of Unlearning: The Dual Impact of Fact Salience and Model Fine-Tuning
- Author(s): Borisiuk Anna, Andrey Savchenko, Alexander Panchenko, Elena Tutubalina
- Date: 2026-02
- Venue: -
- Code: -
- KUDA: Knowledge Unlearning by Deviating Representation for Large Language Models
- Author(s): Ce Fang, Zhikun Zhang, Min Chen, Qing Liu, Lu Zhou, Zhe Liu, Yunjun Gao
- Date: 2026-02
- Venue: -
- Code: -
- Agentic Unlearning: When LLM Agent Meets Machine Unlearning
- Author(s): Bin Wang, Fan Wang, Pingping Wang, Jinyu Cong, Yang Yu, Yilong Yin, Zhongyi Han, Benzheng Wei
- Date: 2026-02
- Venue: -
- Code: -
- MeGU: Machine-Guided Unlearning with Target Feature Disentanglement
- Author(s): Haoyu Wang, Zhuo Huang, Xiaolong Wang, Bo Han, Zhiwei Lin, Tongliang Liu
- Date: 2026-02
- Venue: -
- Code: -
- Quantization-Robust LLM Unlearning via Low-Rank Adaptation
- Author(s): João Vitor Boer Abitante, Joana Meneguzzo Pasquali, Luan Fonseca Garcia, Ewerton de Oliveira, Thomas da Silva Paula, Rodrigo C. Barros, Lucas S. Kupssinskü
- Date: 2026-02
- Venue: -
- Code: -
- Gauss-Newton Unlearning for the LLM Era
- Author(s): Lev McKinney, Anvith Thudi, Juhan Bae, Tara Rezaei, Nicolas Papernot, Sheila A. McIlraith, Roger Grosse
- Date: 2026-02
- Venue: -
- Code: -
- REBEL: Hidden Knowledge Recovery via Evolutionary-Based Evaluation Loop
- Copyright Detective: A Forensic System to Evidence LLMs Flickering Copyright Leakage Risks
- Author(s): Guangwei Zhang, Jianing Zhu, Cheng Qian, Neil Gong, Rada Mihalcea, Zhaozhuo Xu, Jingrui He, Jiaqi Ma, Yun Huang, Chaowei Xiao, Bo Li, Ahmed Abbasi, Dongwon Lee, Heng Ji, Denghui Zhang
- Date: 2026-02
- Venue: -
- Code: -
- CATNIP: LLM Unlearning via Calibrated and Tokenized Negative Preference Alignment
- Author(s): Zhengbang Yang, Yisheng Zhong, Junyuan Hong, Zhuangdi Zhu
- Date: 2026-02
- Venue: -
- Code: -
- AGT^AO: Robust and Stabilized LLM Unlearning via Adversarial Gating Training with Adaptive Orthogonality
- Sparsity-Aware Unlearning for Large Language Models
- Author(s): Yuze Wang, Yujia Tong, Ke Xu, Jingling Yuan, Jiawei Jiang, Chuang Hu
- Date: 2026-01
- Venue: -
- Code: -
- Behemoth: Benchmarking Unlearning in LLMs Using Fully Synthetic Data
- Per-parameter Task Arithmetic for Unlearning in Large Language Models
- Author(s): Chengyi Cai, Zesheng Ye, Jiangchao Yao, Jianzhong Qi, Bo Han, Xiaolu Zhang, Feng Liu, Jun Zhou
- Date: 2026-01
- Venue: -
- Code: -
- From Logits to Latents: Contrastive Representation Shaping for LLM Unlearning
- Author(s): Haoran Tang, Rajiv Khanna
- Date: 2026-01
- Venue: -
- Code: -
- Visual-Guided Key-Token Regularization for Multimodal Large Language Model Unlearning
- Author(s): Chengyi Cai, Zesheng Ye, Peike Li, Bo Han, Jianzhong Qi, Feng Liu
- Date: 2026-01
- Venue: -
- Code: -
- Knowledge Vector Weakening: Efficient Training-free Unlearning for Large Vision-Language Models
- Author(s): Yejin Kim, Dongjun Hwang, Sungmin Cha, Junsuk Choe
- Date: 2026-01
- Venue: -
- Code: -
- Beyond Forgetting: Machine Unlearning Elicits Controllable Side Behaviors and Capabilities
- Author(s): Tien Dang, The-Hai Nguyen, Dinh Mai Phuong, Nguyen Minh Phuong, Hoang Thanh-Tung, Le-Minh Nguyen, Naoya Inoue
- Date: 2026-01
- Venue: -
- Code: -
- FIT: Defying Catastrophic Forgetting in Continual LLM Unlearning
- Author(s): Xiaoyu Xu, Minxin Du, Kun Fang, Zi Liang, Yaxin Xiao, Zhicong Huang, Cheng Hong, Qingqing Ye, Haibo Hu
- Date: 2026-01
- Venue: -
- Code: -
- DUET: Distilled LLM Unlearning from an Efficiently Contextualized Teacher
- Author(s): Yisheng Zhong, Zhengbang Yang, Zhuangdi Zhu
- Date: 2026-01
- Venue: -
- Code: -
- Reinforcement Unlearning via Group Relative Policy Optimization
- Author(s): Efstratios Zaradoukas, Bardh Prenkaj, Gjergji Kasneci
- Date: 2026-01
- Venue: ICLR 2026
- Code: -
- LLMs Can Unlearn Refusal with Only 1,000 Benign Samples
- Towards Fair Large Language Model-based Recommender Systems without Costly Retraining
- Unintended Memorization of Sensitive Information in Fine-Tuned Language Models
- Author(s): Marton Szep, Jorge Marin Ruiz, Georgios Kaissis, Paulina Seidl, Rüdiger von Eisenhart-Rothe, Florian Hinterwimmer, Daniel Rueckert
- Date: 2026-01
- Venue: -
- Code: -
- GRIP: Algorithm-Agnostic Machine Unlearning for Mixture-of-Experts via Geometric Router Constraints
- Author(s): Andy Zhu, Rongzhe Wei, Yupu Gu, Pan Li
- Date: 2026-01
- Venue: -
- Code: -
- Beyond Superficial Unlearning: Sharpness-Aware Robust Erasure of Hallucinations in Multimodal LLMs
- Author(s): Xianya Fang, Feiyang Ren, Xiang Chen, Yu Tian, Zhen Bi, Haiyang Yu, Sheng-Jun Huang
- Date: 2026-01
- Venue: -
- Code: -
- Data-Free Privacy-Preserving for LLMs via Model Inversion and Selective Unlearning
- Author(s): Xinjie Zhou, Zhihui Yang, Lechao Cheng, Sai Wu, Gang Chen
- Date: 2026-01
- Venue: -
- Code: -
- QUAIL: Quantization Aware Unlearning for Mitigating Misinformation in LLMs
- Author(s): Himanshu Mishra, Kanwal Mehreen
- Date: 2026-01
- Venue: -
- Code: -
- Auditing Language Model Unlearning via Information Decomposition
- Author(s): Anmol Goel, Alan Ritter, Iryna Gurevych
- Date: 2026-01
- Venue: EACL 2026
- Code: -
- Representation-Aware Unlearning via Activation Signatures: From Suppression to Knowledge-Signature Erasure
- Author(s): Syed Naveed Mahmood, Md. Rezaur Rahman Bhuiyan, Tasfia Zaman, Jareen Tasneem Khondaker, Md. Sameer Sakib, K. M. Shadman Wadith, Nazia Tasnim, Farig Sadeque
- Date: 2026-01
- Venue: -
- Code: -
- Toward Understanding Unlearning Difficulty: A Mechanistic Perspective and Circuit-Guided Difficulty Metric
- Author(s): Jiali Cheng, Ziheng Chen, Chirag Agarwal, Hadi Amiri
- Date: 2026-01
- Venue: -
- Code: -
- STaR: Sensitive Trajectory Regulation for Unlearning in Large Reasoning Models
- Author(s): Jingjing Zhou, Gaoxiang Cong, Li Su, Liang Li
- Date: 2026-01
- Venue: -
- Code: -
- BalDRO: A Distributionally Robust Optimization based Framework for Large Language Model Unlearning
- Author(s): Pengyang Shao, Naixin Zhai, Lei Chen, Yonghui Yang, Fengbin Zhu, Xun Yang, Meng Wang
- Date: 2026-01
- Venue: -
- Code: -
- Consistency-Aware Editing for Entity-level Unlearning in Language Models
- Author(s): Xiaoqi Han, Víctor Gutiérrez-Basulto, Ru Li, Xiaoli Li, Jiye Liang, Jeff Z. Pan
- Date: 2026-01
- Venue: -
- Code: -
- ForgetMark: Stealthy Fingerprint Embedding via Targeted Unlearning in Language Models
- Forgetting Similar Samples: Can Machine Unlearning Do it Better?
- Author(s): Heng Xu, Tianqing Zhu, Dayong Ye, Lefeng Zhang, Le Wang, Wanlei Zhou
- Date: 2026-01
- Venue: -
- Code: -
- Evaluating Cross-Lingual Unlearning in Multilingual Language Models
- Author(s): Tyler Lizzo, Larry Heck
- Date: 2026-01
- Venue: -
- Code: -
- Multilingual Amnesia: On the Transferability of Unlearning in Multilingual LLMs
- Author(s): Alireza Dehghanpour Farashah, Aditi Khandelwal, Marylou Fauchard, Zhuan Shi, Negar Rostamzadeh, Golnoosh Farnadi
- Date: 2026-01
- Venue: -
- Code: -
- From Domains to Instances: Dual-Granularity Data Synthesis for LLM Unlearning
- Author(s): Xiaoyu Xu, Minxin Du, Zitong Li, Zi Liang, Zhibiao Guo, Shiyu Zhang, Peizhao Hu, Qingqing Ye, Haibo Hu
- Date: 2026-01
- Venue: -
- Code: -
- Shadow Unlearning: A Neuro-Semantic Approach to Fidelity-Preserving Faceless Forgetting in LLMs
- Author(s): Dinesh Srivasthav P, Ashok Urlana, Rahul Mishra, Bala Mallikarjunarao Garlapati, Ponnurangam Kumaraguru
- Date: 2026-01
- Venue: -
- Code: -
- Maximizing Local Entropy Where It Matters: Prefix-Aware Localized LLM Unlearning
- Author(s): Naixin Zhai, Pengyang Shao, Binbin Zheng, Yonghui Yang, Fei Shen, Long Bai, Xun Yang
- Date: 2026-01
- Venue: -
- Code: -
- JPU: Bridging Jailbreak Defense and Unlearning via On-Policy Path Rectification
- Author(s): Xi Wang, Songlei Jian, Shasha Li, Xiaopeng Li, Zhaoye Li, Bin Ji, Baosheng Wang, Jie Yu
- Date: 2026-01
- Venue: -
- Code: -
- UnPII: Unlearning Personally Identifiable Information with Quantifiable Exposure Risk
- Author(s): Intae Jeon, Yujeong Kwon, Hyungjoon Koo
- Date: 2026-01
- Venue: -
- Code: -
- Knowledge Externalization: Reversible Unlearning and Modular Retrieval in Multimodal Large Language Models
- Author(s): Jiaqi Li, Zihan You, Ruoyan Shen, Shenyu Zhang, Songlin Zhai, Yongrui Chen, Chuanyi Zhang, Jiahui Geng, Fakhri Karray, Sheng Bi, Guilin Qi
- Date: 2026-01
- Venue: ICLR 2026
- Code: -
- Robust LLM Unlearning via Post Judgment and Multi-round Thinking
- DAWI: Dual Anchored Weighted Interpolation for LLM Unlearning
- Author(s): Jonathan Zhou
- Date: 2026-01
- Venue: ICLR 2026
- Code: -
- Explicit Representation Alignment via Subspace Elimination for Robust LLM Unlearning
- Author(s): Keonwoo Kim, Hyowon Cho, Sungwon Chae, Sangwon Yoon, Min Choi, Hoki Kim
- Date: 2026-01
- Venue: ICLR 2026
- Code: -
- Can Prompts Rewind Time for LLMs? Evaluating the Effectiveness of Prompted Knowledge Cutoffs
- UMU-Bench: Closing the Modality Gap in Multimodal Unlearning Evaluation
- Elastic Robust Unlearning of Specific Knowledge in Large Language Models
- Author(s): Yize Sui, Jing Ren, Wenjing Yang, Ruochun Jin, Liyang Xu, Xiyao Liu, Ji Wang
- Date: 2025-12
- Venue: NeurIPS 2025
- Code: -
- MPSelectTune: Prompt-type Selection for Fine-tuning improves Concept Unlearning in LLMs
- Author(s): Shubhadip Nag, Srinjoy Das, Agniva Saha, Anushree Ghosh, Soumi Das, Tarun Kumar, Suparna Bhattacharya, Sourangshu Bhattacharya
- Date: 2025-12
- Venue: NeurIPS 2025 Reliable ML Workshop
- Code: -
- Identifying Unlearned Data in LLMs via Membership Inference Attacks
- Unlearners Can Lie: Evaluating "Honesty" in LLM Unlearning
- Author(s): Renjie Gu, Jiazhen Du, Yihua Zhang, Sijia Liu
- Date: 2025-10
- Venue: NeurIPS 2025 Lock-LLM Workshop
- Code: -
- The Role of Learning and Memorization in Relabeling-based Unlearning for LLMs
- Author(s): Xinyu Zhou, Pushen Wang, Ehsan Saleh, Raef Bassily, Jia Liu
- Date: 2025-09
- Venue: -
- Code: -
- On the Fragility of Latent Knowledge: Layer-wise Influence under Unlearning in Large Language Model
- Author(s): Jianing Zhu, Zongze Li, Chandler Squires, Qizhou Wang, Bo Han, Pradeep Ravikumar
- Date: 2025-09
- Venue: -
- Code: -
- Lifelong Unlearning for Multimodal Large Language Models
- Author(s): He Li, Haoang Chi, Qizhou Wang, Yunxin Mao, Jie Tan, Tongliang Liu, Wenjing Yang, Bo Han
- Date: 2025-09
- Venue: -
- Code: -
- MOUCHI: Mitigating Over-forgetting in Unlearning Copyrighted Information
- Author(s): Irfan Akbar, Dongmin Park, Patara Trirat, Jae-Gil Lee
- Date: 2025-09
- Venue: -
- Code: -
- Provably Continual Unlearning for Large Language Model
- Author(s): Haoran Shi, Hao Zhu, Han Yu, Lizhen Cui, Yifei Zhang, Piotr Koniusz
- Date: 2025-09
- Venue: -
- Code: -
- SELU: Energy-based Targeted Unlearning in LLMs
- Author(s): Passawis Chaiyapattanaporn, Pontus Stenetorp, Yihong Chen
- Date: 2025-09
- Venue: -
- Code: -
-
- Author(s): Zhangheng Li, Junyuan Hong, Jianing Zhu, Sungmin Eum, Shuowen Hu, Suya You, Zhangyang Wang
- Date: 2025-09
- Venue: -
- Code: -
- Refusal Is Not an Option: Unlearning Safety Alignment of Large Language Models
-
- Author(s): Jiaqi Li, Chuanyi Zhang, Miaozeng Du, Hui Zhang, Yongrui Chen, Qianshan Wei, Junfeng Fang, Ruipeng Wang, Sheng Bi, Guilin Qi
- Date: 2025-07
- Venue: ACL 2025 Findings
- Code: -
- Rethinking Unlearning for Large Reasoning Models
- Author(s): Changsheng Wang, Chongyu Fan, Yihua Zhang, Jinghan Jia, Dennis Wei, Parikshit Ram, Nathalie Baracaldo, Sijia Liu
- Date: 2025-06
- Venue: ICML 2025 Workshop on Machine Unlearning for Generative AI
- Code: -
- Unlearning in Large Language Models: We Are Not There Yet
- Author(s): Alberto Blanco-Justicia, Josep Domingo-Ferrer, Najeeb Moharram Jebreel, Benet Manzanares-Salor, David Sanchez
- Date: 2025-01
- Venue: IEEE Computer 2025
- Code: -
- Investigating Model Editing for Unlearning in Large Language Models
- Author(s): Shariqah Hossain, Lalana Kagal
- Date: 2025-12
- Venue: -
- Code: -
- The Erasure Illusion: Stress-Testing the Generalization of LLM Forgetting Evaluation
- Author(s): Hengrui Jia, Taoran Li, Jonas Guan, Varun Chandrasekaran
- Date: 2025-12
- Venue: -
- Code: -
- Towards Benchmarking Privacy Vulnerabilities in Selective Forgetting with Large Language Models
- Author(s): Wei Qian, Chenxu Zhao, Yangyi Li, Mengdi Huai
- Date: 2025-12
- Venue: -
- Code: -
- Towards Reasoning-Preserving Unlearning in Multimodal Large Language Models
- Author(s): Hongji Li, Junchi Yao, Manjiang Yu, Priyanka Singh, Xue Li, Di Wang, Lijie Hu
- Date: 2025-12
- Venue: -
- Code: -
- Feature-Selective Representation Misdirection for Machine Unlearning
- Author(s): Taozhao Chen, Linghan Huang, Kim-Kwang Raymond Choo, Huaming Chen
- Date: 2025-12
- Venue: -
- Code: -
- FAME: Fictional Actors for Multilingual Erasure
- Author(s): Claudio Savelli, Moreno La Quatra, Alkis Koudounas, Flavio Giobergia
- Date: 2025-12
- Venue: -
- Code: -
- Explainable reinforcement learning from human feedback to improve alignment
- Author(s): Shicheng Liu, Siyuan Xu, Wenjie Qiu, Hangfan Zhang, Minghui Zhu
- Date: 2025-12
- Venue: -
- Code: -
- FROC: A Unified Framework with Risk-Optimized Control for Machine Unlearning in LLMs
- Author(s): Si Qi Goh, Yongsen Zheng, Ziyao Liu, Sami Hormi, Kwok-Yan Lam
- Date: 2025-12
- Venue: -
- Code: -
- MLLM Machine Unlearning via Visual Knowledge Distillation
- Author(s): Yuhang Wang, Zhenxing Niu, Haoxuan Ji, Guangyu He, Haichang Gao, Gang Hua
- Date: 2025-12
- Venue: -
- Code: -
- MedForget: Hierarchy-Aware Multimodal Unlearning Testbed for Medical AI
- Author(s): Fengli Wu, Vaidehi Patil, Jaehong Yoon, Yue Zhang, Mohit Bansal
- Date: 2025-12
- Venue: -
- Code: -
- LUNE: Efficient LLM Unlearning via LoRA Fine-Tuning with Negative Examples
- Author(s): Yezi Liu, Hanning Chen, Wenjun Huang, Yang Ni, Mohsen Imani
- Date: 2025-12
- Venue: -
- Code: -
- Recover-to-Forget: Gradient Reconstruction from LoRA for Efficient LLM Unlearning
- Author(s): Yezi Liu, Hanning Chen, Wenjun Huang, Yang Ni, Mohsen Imani
- Date: 2025-12
- Venue: -
- Code: -
- Delete and Retain: Efficient Unlearning for Document Classification
- Author(s): Aadya Goel, Mayuri Sridhar
- Date: 2025-12
- Venue: -
- Code: -
- Beyond Data Filtering: Knowledge Localization for Capability Removal in LLMs
- Author(s): Igor Shilov, Alex Cloud, Aryo Pradipta Gema, Jacob Goldman-Wetzler, Nina Panickssery, Henry Sleight, Erik Jones, Cem Anil
- Date: 2025-12
- Venue: -
- Code: -
- When Forgetting Builds Reliability: LLM Unlearning for Reliable Hardware Code Generation
- Author(s): Yiwen Liang, Qiufeng Li, Shikai Wang, Weidong Cao
- Date: 2025-12
- Venue: -
- Code: -
- RapidUn: Influence-Driven Parameter Reweighting for Efficient Large Language Model Unlearning
- Author(s): Guoshenghui Zhao, Huawei Lin, Weijie Zhao
- Date: 2025-12
- Venue: -
- Code: -
- RippleBench: Capturing Ripple Effects Using Existing Knowledge Repositories
- Author(s): Roy Rinberg, Usha Bhalla, Igor Shilov, Flavio P. Calmon, Rohit Gandikota
- Date: 2025-12
- Venue: -
- Code: -
- Towards Benign Memory Forgetting for Selective Multimodal Large Language Model Unlearning
- Author(s): Zhen Zeng, Leijiang Gu, Zhangling Duan, Feng Li, Zenglin Shi, Cees G. M. Snoek, Meng Wang
- Date: 2025-11
- Venue: -
- Code: -
-
- Author(s): Yi Zhang, Tianxiang Xu, Zijian Li, Chao Zhang, Kunyu Zhang, Zhan Gao, Meinuo Li, Xiaohan Zhang, Qichao Qi, Bing Chen
- Date: 2025-11
- Venue: -
- Code: -
- SineProject: Machine Unlearning for Stable Vision Language Alignment
- Author(s): Arpit Garg, Hemanth Saratchandran, Simon Lucey
- Date: 2025-11
- Venue: -
- Code: -
- From Narrow Unlearning to Emergent Misalignment: Causes, Consequences, and Containment in LLMs
- Author(s): Erum Mushtaq, Anil Ramakrishna, Satyapriya Krishna, Sattvik Sahai, Prasoon Goyal, Kai-Wei Chang, Tao Zhang, Rahul Gupta
- Date: 2025-11
- Venue: -
- Code: -
- Forgetting-MarI: LLM Unlearning via Marginal Information Regularization
- Author(s): Shizhou Xu, Yuan Ni, Stefan Broecker, Thomas Strohmer
- Date: 2025-11
- Venue: -
- Code: -
- AUVIC: Adversarial Unlearning of Visual Concepts for Multi-modal Large Language Models
- Author(s): Haokun Chen, Jianing Li, Yao Zhang, Jinhe Bi, Yan Xia, Jindong Gu, Volker Tresp
- Date: 2025-11
- Venue: -
- Code: -
- Unlearning Imperative: Securing Trustworthy and Responsible LLMs through Engineered Forgetting
- Author(s): James Jin Kang, Dang Bui, Thanh Pham, Huo-Chong Ling
- Date: 2025-11
- Venue: -
- Code: -
-
- Author(s): Feng Guo, Yuntao Wen, Shen Gao, Junshuo Zhang, Shuo Shang
- Date: 2025-11
- Venue: -
- Code: -
- Cross-Modal Unlearning via Influential Neuron Path Editing in Multimodal Large Language Models
- Author(s): Kunhao Li, Wenhao Li, Di Wu, Lei Yang, Jun Bai, Ju Jia, Jason Xue
- Date: 2025-11
- Venue: -
- Code: -
- DRAGON: Guard LLM Unlearning in Context via Negative Detection and Reasoning
- Author(s): Yaxuan Wang, Chris Yuhao Liu, Quan Liu, Jinglong Pang, Wei Wei, Yujia Bao, Yang Liu
- Date: 2025-11
- Venue: -
- Code: -
- Leak@$k$: Unlearning Does Not Make LLMs Forget Under Probabilistic Decoding
- Author(s): Hadi Reisizadeh, Jiajun Ruan, Yiwei Chen, Soumyadeep Pal, Sijia Liu, Mingyi Hong
- Date: 2025-11
- Venue: -
- Code: -
- REMIND: Input Loss Landscapes Reveal Residual Memorization in Post-Unlearning LLMs
- Author(s): Liran Cohen, Yaniv Nemcovesky, Avi Mendelson
- Date: 2025-11
- Venue: -
- Code: -
- The Realignment Problem: When Right becomes Wrong in LLMs
- Author(s): Aakash Sen Sharma, Debdeep Sanyal, Vivek Srivastava, Shirish Karande, Murari Mandal
- Date: 2025-11
- Venue: -
- Code: -
-
- Author(s): Aakriti Shah, Thai Le
- Date: 2025-10
- Venue: -
- Code: -
- From Memorization to Reasoning in the Spectrum of Loss Curvature
- Author(s): Jack Merullo, Srihita Vatsavaya, Lucius Bushnaq, Owen Lewis
- Date: 2025-10
- Venue: -
- Code: -
- Uncovering the Potential Risks in Unlearning: Danger of English-only Unlearning in Multilingual LLMs
- Author(s): Kyomin Hwang, Hyeonjin Kim, Seungyeon Kim, Sunghyun Wee, Nojun Kwak
- Date: 2025-10
- Venue: -
- Code: -
- Probing Knowledge Holes in Unlearned LLMs
- Author(s): Myeongseob Ko, Hoang Anh Just, Charles Fleming, Ming Jin, Ruoxi Jia
- Date: 2025-10
- Venue: -
- Code: -
- OFFSIDE: Benchmarking Unlearning Misinformation in Multimodal Large Language Models
- Author(s): Hao Zheng, Zirui Pang, Ling Li, Zhijie Deng, Yuhan Pu, Zhaowei Zhu, Xiaobo Xia, Jiaheng Wei
- Date: 2025-10
- Venue: -
- Code: -
- Label Smoothing Improves Gradient Ascent in LLM Unlearning
- Author(s): Zirui Pang, Hao Zheng, Zhijie Deng, Ling Li, Zixin Zhong, Jiaheng Wei
- Date: 2025-10
- Venue: -
- Code: -
- Leverage Unlearning to Sanitize LLMs
- Author(s): Antoine Boutet, Lucas Magnana
- Date: 2025-10
- Venue: -
- Code: -
- Hubble: a Model Suite to Advance the Study of LLM Memorization
- Author(s): Johnny Tian-Zheng Wei, Ameya Godbole, Mohammad Aflah Khan, Ryan Wang, Xiaoyuan Zhu, James Flemings, Nitya Kashyap, Krishna P. Gummadi, Willie Neiswanger, Robin Jia
- Date: 2025-10
- Venue: -
- Code: -
- LLM Unlearning with LLM Beliefs
- Author(s): Kemou Li, Qizhou Wang, Yue Wang, Fengpeng Li, Jun Liu, Bo Han, Jiantao Zhou
- Date: 2025-10
- Venue: -
- Code: -
- Forget to Know, Remember to Use: Context-Aware Unlearning for Large Language Models
- Author(s): Yuefeng Peng, Parnian Afshar, Megan Ganji, Thomas Butler, Amir Houmansadr, Mingxian Wang, Dezhi Hong
- Date: 2025-10
- Venue: -
- Code: -
- Wisdom is Knowing What not to Say: Hallucination-Free LLMs Unlearning via Attention Shifting
- Author(s): Chenchen Tan, Youyang Qu, Xinghao Li, Hui Zhang, Shujie Cui, Cunjian Chen, Longxiang Gao
- Date: 2025-10
- Venue: -
- Code: -
- Forgetting to Forget: Attention Sink as A Gateway for Backdooring LLM Unlearning
- Author(s): Bingqi Shang, Yiwei Chen, Yihua Zhang, Bingquan Shen, Sijia Liu
- Date: 2025-10
- Venue: -
- Code: -
- Hierarchical Federated Unlearning for Large Language Models
- Author(s): Yisheng Zhong, Zhengbang Yang, Zhuangdi Zhu
- Date: 2025-10
- Venue: -
- Code: -
- On the Impossibility of Retrain Equivalence in Machine Unlearning
- Author(s): Jiatong Yu, Yinghui He, Anirudh Goyal, Sanjeev Arora
- Date: 2025-10
- Venue: -
- Code: -
- Reference-Specific Unlearning Metrics Can Hide the Truth: A Reality Check
- Author(s): Sungjun Cho, Dasol Hwang, Frederic Sala, Sangheum Hwang, Kyunghyun Cho, Sungmin Cha
- Date: 2025-10
- Venue: -
- Code: -
- LLM Unlearning on Noisy Forget Sets: A Study of Incomplete, Rewritten, and Watermarked Data
- Author(s): Changsheng Wang, Yihua Zhang, Dennis Wei, Jinghan Jia, Pin-Yu Chen, Sijia Liu
- Date: 2025-10
- Venue: -
- Code: -
- Approximate Domain Unlearning for Vision-Language Models
- Author(s): Kodai Kawamura, Yuta Goto, Rintaro Yanagi, Hirokatsu Kataoka, Go Irie
- Date: 2025-10
- Venue: -
- Code: -
- SIMU: Selective Influence Machine Unlearning
- Author(s): Anu Agarwal, Mihir Pamnani, Dilek Hakkani-Tur
- Date: 2025-10
- Venue: -
- Code: -
- LLM Unlearning Under the Microscope: A Full-Stack View on Methods and Metrics
- Author(s): Chongyu Fan, Changsheng Wang, Yancheng Huang, Soumyadeep Pal, Sijia Liu
- Date: 2025-10
- Venue: -
- Code: -
- Cross-Modal Attention Guided Unlearning in Vision-Language Models
- Author(s): Karuna Bhaila, Aneesh Komanduri, Minh-Hao Van, Xintao Wu
- Date: 2025-10
- Venue: -
- Code: -
- (Token-Level) InfoRMIA: Stronger Membership Inference and Memorization Assessment for LLMs
- Author(s): Jiashu Tao, Reza Shokri
- Date: 2025-10
- Venue: -
- Code: -
- Distribution Preference Optimization: A Fine-grained Perspective for LLM Unlearning
- Author(s): Kai Qin, Jiaqi Wu, Jianxiang He, Haoyuan Sun, Yifei Zhao, Bin Liang, Yongzhe Chang, Tiantian Zhang, Houde Liu
- Date: 2025-10
- Venue: -
- Code: -
-
- Author(s): Chenlu Ding, Jiancan Wu, Leheng Sheng, Fan Zhang, Yancheng Yuan, Xiang Wang, Xiangnan He
- Date: 2025-10
- Venue: -
- Code: -
- Machine Unlearning Meets Adversarial Robustness via Constrained Interventions on LLMs
- Author(s): Fatmazohra Rezkellah, Ramzi Dakhmouche
- Date: 2025-10
- Venue: -
- Code: -
- Downgrade to Upgrade: Optimizer Simplification Enhances Robustness in LLM Unlearning
- Author(s): Yicheng Lang, Yihua Zhang, Chongyu Fan, Changsheng Wang, Jinghan Jia, Sijia Liu
- Date: 2025-10
- Venue: -
- Code: -
- KnowledgeSmith: Uncovering Knowledge Updating in LLMs with Model Editing and Unlearning
- Author(s): Yinyi Luo, Zhexian Zhou, Hao Chen, Kai Qiu, Marios Savvides, Sharon Li, Jindong Wang
- Date: 2025-10
- Venue: -
- Code: -
- Direct Token Optimization: A Self-contained Approach to Large Language Model Unlearning
- Author(s): Hong kyu Lee, Ruixuan Liu, Li Xiong
- Date: 2025-09
- Venue: -
- Code: -
- Scalable and Robust LLM Unlearning by Correcting Responses with Retrieved Exclusions
- Author(s): Junbeom Kim, Kyuyoung Kim, Jihoon Tack, Dongha Lim, Jinwoo Shin
- Date: 2025-09
- Venue: -
- Code: -
-
- Author(s): Xiang Zhang, Kun Wei, Xu Yang, Chenghao Xu, Su Yan, Cheng Deng
- Date: 2025-09
- Venue: -
- Code: -
- Mitigating Biases in Language Models via Bias Unlearning
- Author(s): Dianqing Liu, Yi Liu, Guoqing Jin, Zhendong Mao
- Date: 2025-09
- Venue: -
- Code: -
- Understanding the Dilemma of Unlearning for Large Language Models
- Author(s): Qingjie Zhang, Haoting Qian, Zhicong Huang, Cheng Hong, Minlie Huang, Ke Xu, Chao Zhang, Han Qiu
- Date: 2025-09
- Venue: -
- Code: -
- Stable Forgetting: Bounded Parameter-Efficient Unlearning in LLMs
- Author(s): Arpit Garg, Hemanth Saratchandran, Ravi Garg, Simon Lucey
- Date: 2025-09
- Venue: -
- Code: -
- Dual-Space Smoothness for Robust and Balanced LLM Unlearning
- Author(s): Han Yan, Zheyuan Liu, Meng Jiang
- Date: 2025-09
- Venue: -
- Code: -
- OFMU: Optimization-Driven Framework for Machine Unlearning
- Author(s): Sadia Asif, Mohammad Mohammadi Amiri
- Date: 2025-09
- Venue: -
- Code: -
- Erase or Hide? Suppressing Spurious Unlearning Neurons for Robust Unlearning
- Author(s): Nakyeong Yang, Dong-Kyum Kim, Jea Kwon, Minsung Kim, Kyomin Jung, Meeyoung Cha
- Date: 2025-09
- Venue: -
- Code: -
- CLUE: Conflict-guided Localization for LLM Unlearning Framework
- Author(s): Hang Chen, Jiaying Zhu, Xinyu Yang, Wenya Wang
- Date: 2025-09
- Venue: -
- Code: -
- Beyond Sharp Minima: Robust LLM Unlearning via Feedback-Guided Multi-Point Optimization
- Author(s): Wenhan Wu, Zheyuan Liu, Chongyang Gao, Ren Wang, Kaize Ding
- Date: 2025-09
- Venue: -
- Code: -
-
- Author(s): Mina Arzaghi, Alireza Dehghanpour Farashah, Florian Carichon, Golnoosh Farnadi
- Date: 2025-09
- Venue: -
- Code: -
- Sparse-Autoencoder-Guided Internal Representation Unlearning for Large Language Models
- Author(s): Tomoya Yamashita, Akira Ito, Yuuki Yamanaka, Masanori Yamada, Takayuki Miura, Toshiki Shibahara
- Date: 2025-09
- Venue: -
- Code: -
- Concept Unlearning in Large Language Models via Self-Constructed Knowledge Triplets
- Author(s): Tomoya Yamashita, Yuuki Yamanaka, Masanori Yamada, Takayuki Miura, Toshiki Shibahara, Tomoharu Iwata
- Date: 2025-09
- Venue: -
- Code: -
- Reveal and Release: Iterative LLM Unlearning with Self-generated Data
- Author(s): Linxi Xie, Xin Teng, Shichang Ke, Hongyi Wen, Shengjie Wang
- Date: 2025-09
- Venue: -
- Code: -
- Scrub It Out! Erasing Sensitive Memorization in Code Language Models via Machine Unlearning
- Author(s): Zhaoyang Chu, Yao Wan, Zhikun Zhang, Di Wang, Zhou Yang, Hongyu Zhang, Pan Zhou, Xuanhua Shi, Hai Jin, David Lo
- Date: 2025-09
- Venue: -
- Code: -
- Collapse of Irrelevant Representations (CIR) Ensures Robust and Non-Disruptive LLM Unlearning
- Author(s): Filip Sondej, Yushi Yang
- Date: 2025-09
- Venue: -
- Code: -
- Customized Retrieval-Augmented Generation with LLM for Debiasing Recommendation Unlearning
- Author(s): Haichao Zhang, Chong Zhang, Peiyu Hu, Shi Qiu, Jia Wang
- Date: 2025-09
- Venue: -
- Code: -
- AntiDote: Bi-level Adversarial Training for Tamper-Resistant LLMs
- Author(s): Debdeep Sanyal, Manodeep Ray, Murari Mandal
- Date: 2025-09
- Venue: -
- Code: -
-
- Author(s): Aysenur Kocak, Shuo Yang, Bardh Prenkaj, Gjergji Kasneci
- Date: 2025-09
- Venue: -
- Code: -
- Inverse IFEval: Can LLMs Unlearn Stubborn Training Conventions to Follow Real Instructions?
- Author(s): Qinyan Zhang, Xinping Lei, Ruijie Miao, Yu Fu, Haojie Fan, Le Chang, Jiafan Hou, Dingling Zhang, Zhongfei Hou, Ziqiang Yang, Changxin Pu, Fei Hu, Jingkai Liu, Mengyun Liu, Yang Liu, Xiang Gao, Jiaheng Liu, Tong Yang, Zaiyuan Wang, Ge Zhang, Wenhao Huang
- Date: 2025-09
- Venue: -
- Code: -
- Unlearning That Lasts: Utility-Preserving, Robust, and Almost Irreversible Forgetting in LLMs
- Author(s): Naman Deep Singh, Maximilian Müller, Francesco Croce, Matthias Hein
- Date: 2025-09
- Venue: -
- Code: -
- Standard vs. Modular Sampling: Best Practices for Reliable LLM Unlearning
- Author(s): Praveen Bushipaka, Lucia Passaro, Tommaso Cucinotta
- Date: 2025-08
- Venue: -
- Code: -
- Improving Fisher Information Estimation and Efficiency for LoRA-based LLM Unlearning
- Author(s): Yejin Kim, Eunwon Kim, Buru Chang, Junsuk Choe
- Date: 2025-08
- Venue: -
- Code: -
-
- Author(s): Zhihao Liu, Jian Lou, Yuke Hu, Xiaochen Li, Tailun Chen, Yitian Chen, Zhan Qin
- Date: 2025-08
- Venue: -
- Code: -
- Unlearning as Ablation: Toward a Falsifiable Benchmark for Generative Scientific Discovery
- Author(s): Robert Yang
- Date: 2025-08
- Venue: -
- Code: -
- Reliable Unlearning Harmful Information in LLMs with Metamorphosis Representation Projection
- Author(s): Chengcan Wu, Zeming Wei, Huanran Chen, Yinpeng Dong, Meng Sun
- Date: 2025-08
- Venue: -
- Code: -
- SafeLLM: Unlearning Harmful Outputs from Large Language Models against Jailbreak Attacks
- Author(s): Xiangman Li, Xiaodong Wu, Qi Li, Jianbing Ni, Rongxing Lu
- Date: 2025-08
- Venue: -
- Code: -
- CRISP: Persistent Concept Unlearning via Sparse Autoencoders
- Author(s): Tomer Ashuach, Dana Arad, Aaron Mueller, Martin Tutek, Yonatan Belinkov
- Date: 2025-08
- Venue: -
- Code: -
- Unlearning at Scale: Implementing the Right to be Forgotten in Large Language Models
- Author(s): Abdullah X
- Date: 2025-08
- Venue: -
- Code: -
- Oblivionis: A Lightweight Learning and Unlearning Framework for Federated Large Language Models
- Author(s): Fuyao Zhang, Xinyu Yan, Tiantong Wu, Wenjie Li, Tianxiang Chen, Yang Cao, Ran Yan, Longtao Huang, Wei Yang Bryan Lim, Qiang Yang
- Date: 2025-08
- Venue: -
- Code: -
-
- Author(s): Stanley Ngugi
- Date: 2025-08
- Venue: -
- Code: -
- Deep Ignorance: Filtering Pretraining Data Builds Tamper-Resistant Safeguards into Open-Weight LLMs
- LLM Unlearning using Gradient Ratio-Based Influence Estimation and Noise Injection
- Author(s): Ameya Anjarlekar, Sandeep Pombra
- Date: 2025-08
- Venue: -
- Code: -
- LLM Unlearning Without an Expert Curated Dataset
- Author(s): Xiaoyuan Zhu, Muru Zhang, Ollie Liu, Robin Jia, Willie Neiswanger
- Date: 2025-08
- Venue: -
- Code: -
- Analyzing and Mitigating Object Hallucination: A Training Bias Perspective
- Author(s): Yifan Li, Kun Zhou, Wayne Xin Zhao, Lei Fang, Ji-Rong Wen
- Date: 2025-08
- Venue: -
- Code: -
- From Learning to Unlearning: Biomedical Security Protection in Multimodal Large Language Models
- Author(s): Dunyuan Xu, Xikai Yang, Yaoqian Li, Jinpeng Li, Pheng-Ann Heng
- Date: 2025-08
- Venue: -
- Code: -
- DUP: Detection-guided Unlearning for Backdoor Purification in Language Models
- Author(s): Man Hu, Yahui Ding, Yatao Yang, Liangyu Chen, Yanhao Jia, Shuai Zhao
- Date: 2025-08
- Venue: -
- Code: -
- Towards Evaluation for Real-World LLM Unlearning
- Author(s): Ke Miao, Yuke Hu, Xiaochen Li, Wenjie Bao, Zhihao Liu, Zhan Qin, Kui Ren
- Date: 2025-08
- Venue: -
- Code: -
- Watch the Weights: Unsupervised monitoring and control of fine-tuned LLMs
- Author(s): Ziqian Zhong, Aditi Raghunathan
- Date: 2025-07
- Venue: -
- Code: -
-
- Author(s): Yujian Sun, Tian Li
- Date: 2025-07
- Venue: -
- Code: -
- What Should LLMs Forget? Quantifying Personal Data in LLMs for Right-to-Be-Forgotten Requests
- Author(s): Dimitri Staufer
- Date: 2025-07
- Venue: ECML PKDD 2025 (XKDD Workshop)
- Code: -
- Automating Evaluation of Diffusion Model Unlearning with (Vision-) Language Model World Knowledge
- Author(s): Eric Yeats, Darryl Hannan, Henry Kvinge, Timothy Doster, Scott Mahan
- Date: 2025-07
- Venue: -
- Code: -
- The Landscape of Memorization in LLMs: Mechanisms, Measurement, and Mitigation
- Author(s): Alexander Xiong, Xuandong Zhao, Aneesh Pappu, Dawn Song
- Date: 2025-07
- Venue: -
- Code: -
- Model Collapse Is Not a Bug but a Feature in Machine Unlearning for LLMs
- Author(s): Yan Scholten, Sophie Xhonneux, Leo Schwinn, Stephan Günnemann
- Date: 2025-07
- Venue: -
- Code: -
- Unlearning the Noisy Correspondence Makes CLIP More Robust
- Author(s): Haochen Han, Alex Jinpeng Wang, Peijun Ye, Fangming Liu
- Date: 2025-07
- Venue: -
- Code: -
- PULSE: Practical Evaluation Scenarios for Large Multimodal Model Unlearning
- Author(s): Tatsuki Kawakami, Kazuki Egashira, Atsuyuki Miyai, Go Irie, Kiyoharu Aizawa
- Date: 2025-07
- Venue: -
- Code: -
- Agents Are All You Need for LLM Unlearning
- Author(s): Debdeep Sanyal, Murari Mandal
- Date: 2025-07
- Venue: -
- Code: -
-
- Author(s): Aly M. Kassem, Zhuan Shi, Negar Rostamzadeh, Golnoosh Farnadi
- Date: 2025-07
- Venue: -
- Code: -
- SoK: Semantic Privacy in Large Language Models
- Author(s): Baihe Ma, Yanna Jiang, Xu Wang, Guangsheng Yu, Qin Wang, Caijun Sun, Chen Li, Xuelei Qi, Ying He, Wei Ni, Ren Ping Liu
- Date: 2025-06
- Venue: -
- Code: -
- Model State Arithmetic for Machine Unlearning
- Author(s): Keivan Rezaei, Mehrdad Saberi, Abhilasha Ravichander, Soheil Feizi
- Date: 2025-06
- Venue: -
- Code: -
- Step-by-Step Reasoning Attack: Revealing 'Erased' Knowledge in Large Language Models
- Author(s): Yash Sinha, Manit Baser, Murari Mandal, Dinil Mon Divakaran, Mohan Kankanhalli
- Date: 2025-06
- Venue: -
- Code: -
- Does Multimodal Large Language Model Truly Unlearn? Stealthy MLLM Unlearning Attack
- Author(s): Xianren Zhang, Hui Liu, Delvin Ce Zhang, Xianfeng Tang, Qi He, Dongwon Lee, Suhang Wang
- Date: 2025-06
- Venue: -
- Code: -
- Large Language Model Unlearning for Source Code
- Author(s): Xue Jiang, Yihong Dong, Zheng Fang, Yingwei Ma, Tangxinyu Wang, Rongyu Cao, Binhua Li, Zhi Jin, Wenpin Jiao, Yongbin Li, Ge Li
- Date: 2025-06
- Venue: -
- Code: -
- Mr. Snuffleupagus at SemEval-2025 Task 4: Unlearning Factual Knowledge from LLMs Using Adaptive RMU
- Author(s): Arjun Dosajh, Mihika Sanghi
- Date: 2025-06
- Venue: -
- Code: -
- BLUR: A Benchmark for LLM Unlearning Robust to Forget-Retain Overlap
- Author(s): Shengyuan Hu, Neil Kale, Pratiksha Thaker, Yiwei Fu, Steven Wu, Virginia Smith
- Date: 2025-06
- Venue: -
- Code: -
- Learning-Time Encoding Shapes Unlearning in LLMs
- Author(s): Ruihan Wu, Konstantin Garov, Kamalika Chaudhuri
- Date: 2025-06
- Venue: -
- Code: -
- Unlearning Isn't Invisible: Detecting Unlearning Traces in LLMs from Model Outputs
- Author(s): Yiwei Chen, Soumyadeep Pal, Yimeng Zhang, Qing Qu, Sijia Liu
- Date: 2025-06
- Venue: -
- Code: -
- Align-then-Unlearn: Embedding Alignment for LLM Unlearning
- Author(s): Philipp Spohn, Leander Girrbach, Jessica Bader, Zeynep Akata
- Date: 2025-06
- Venue: -
- Code: -
- Measuring Chain of Thought Faithfulness by Unlearning Reasoning Steps
- Reasoning Model Unlearning: Forgetting Traces, Not Just Answers, While Preserving Reasoning Skills
- OpenUnlearning: Accelerating LLM Unlearning via Unified Benchmarking of Methods and Metrics
- Author(s): Vineeth Dorna, Anmol Mekala, Wenlong Zhao, Andrew McCallum, Zachary C. Lipton, J. Zico Kolter, Pratyush Maini
- Date: 2025-06
- Venue: -
- Code: -
- Robust LLM Unlearning with MUDMAN: Meta-Unlearning with Disruption Masking And Normalization
- Author(s): Filip Sondej, Yushi Yang, Mikołaj Kniejski, Marcel Windys
- Date: 2025-06
- Venue: -
- Code: -
- UCD: Unlearning in LLMs via Contrastive Decoding
- Author(s): Vinith M. Suriyakumar, Ayush Sekhari, Ashia Wilson
- Date: 2025-06
- Venue: -
- Code: -
- Lifting Data-Tracing Machine Unlearning to Knowledge-Tracing for Foundation Models
- GUARD: Guided Unlearning and Retention via Data Attribution for Large Language Models
- Author(s): Evelyn Ma, Duo Zhou, Peizhi Niu, Huiting Zhou, Huan Zhang, Olgica Milenkovic, S. Rasoul Etesami
- Date: 2025-06
- Venue: -
- Code: -
- SoK: Machine Unlearning for Large Language Models
- Author(s): Jie Ren, Yue Xing, Yingqian Cui, Charu C. Aggarwal, Hui Liu
- Date: 2025-06
- Venue: -
- Code: -
- BLUR: A Bi-Level Optimization Approach for LLM Unlearning
- Author(s): Hadi Reisizadeh, Jinghan Jia, Zhiqi Bu, Bhanukiran Vinzamuri, Anil Ramakrishna, Kai-Wei Chang, Volkan Cevher, Sijia Liu, Mingyi Hong
- Date: 2025-06
- Venue: -
- Code: -
- LLM Unlearning Should Be Form-Independent
- Author(s): Xiaotian Ye, Mengqi Zhang, Shu Wu
- Date: 2025-06
- Venue: -
- Code: -
- RULE: Reinforcement UnLEarning Achieves Forget-Retain Pareto Optimality
- Author(s): Chenlong Zhang, Zhuoran Jin, Hongbang Yuan, Jiaheng Wei, Tong Zhou, Kang Liu, Jun Zhao, Yubo Chen
- Date: 2025-06
- Venue: -
- Code: -
- Distillation Robustifies Unlearning
- Author(s): Bruce W. Lee, Addie Foote, Alex Infanger, Leni Shor, Harish Kamath, Jacob Goldman-Wetzler, Bryce Woodworth, Alex Cloud, Alexander Matt Turner
- Date: 2025-06
- Venue: -
- Code: -
- Towards Lifecycle Unlearning Commitment Management: Measuring Sample-level Unlearning Completeness
- Author(s): Cheng-Long Wang, Qi Li, Zihang Xiang, Yinzhi Cao, Di Wang
- Date: 2025-06
- Venue: -
- Code: -
- Do LLMs Really Forget? Evaluating Unlearning with Knowledge Correlation and Confidence Awareness
- Author(s): Rongzhe Wei, Peizhi Niu, Hans Hao-Hsun Hsu, Ruihan Wu, Haoteng Yin, Mohsen Ghassemi, Yifan Li, Vamsi K. Potluru, Eli Chien, Kamalika Chaudhuri, Olgica Milenkovic, Pan Li
- Date: 2025-06
- Venue: -
- Code: -
- Constrained Entropic Unlearning: A Primal-Dual Framework for Large Language Models
- Author(s): Taha Entesari, Arman Hatami, Rinat Khaziev, Anil Ramakrishna, Mahyar Fazlyab
- Date: 2025-06
- Venue: -
- Code: -
- Quantifying Cross-Modality Memorization in Vision-Language Models
- Author(s): Yuxin Wen, Yangsibo Huang, Tom Goldstein, Ravi Kumar, Badih Ghazi, Chiyuan Zhang
- Date: 2025-06
- Venue: -
- Code: -
- Lacuna Inc. at SemEval-2025 Task 4: LoRA-Enhanced Influence-Based Unlearning for LLMs
- Author(s): Aleksey Kudelya, Alexander Shirnin
- Date: 2025-06
- Venue: -
- Code: -
- Vulnerability-Aware Alignment: Mitigating Uneven Forgetting in Harmful Fine-Tuning
- Author(s): Liang Chen, Xueting Han, Li Shen, Jing Bai, Kam-Fai Wong
- Date: 2025-06
- Venue: -
- Code: -
- Not All Tokens Are Meant to Be Forgotten
- Author(s): Xiangyu Zhou, Yao Qiang, Saleh Zare Zade, Douglas Zytko, Prashant Khanduri, Dongxiao Zhu
- Date: 2025-06
- Venue: -
- Code: -
- Rethinking Post-Unlearning Behavior of Large Vision-Language Models
- Author(s): Minsung Kim, Nakyeong Yang, Kyomin Jung
- Date: 2025-06
- Venue: -
- Code: -
- Invariance Makes LLM Unlearning Resilient Even to Unanticipated Downstream Fine-Tuning
- Author(s): Changsheng Wang, Yihua Zhang, Jinghan Jia, Parikshit Ram, Dennis Wei, Yuguang Yao, Soumyadeep Pal, Nathalie Baracaldo, Sijia Liu
- Date: 2025-06
- Venue: -
- Code: -
-
- Author(s): Yixin Wan, Anil Ramakrishna, Kai-Wei Chang, Volkan Cevher, Rahul Gupta
- Date: 2025-06
- Venue: -
- Code: -
- Existing Large Language Model Unlearning Evaluations Are Inconclusive
- Author(s): Zhili Feng, Yixuan Even Xu, Alexander Robey, Robert Kirk, Xander Davies, Yarin Gal, Avi Schwarzschild, J. Zico Kolter
- Date: 2025-06
- Venue: -
- Code: -
- Keeping an Eye on LLM Unlearning: The Hidden Risk and Remedy
- Author(s): Jie Ren, Zhenwei Dai, Xianfeng Tang, Yue Xing, Shenglai Zeng, Hui Liu, Jingying Zeng, Qiankun Peng, Samarth Varshney, Suhang Wang, Qi He, Charu C. Aggarwal, Hui Liu
- Date: 2025-06
- Venue: -
- Code: -
- Aligned but Blind: Alignment Increases Implicit Bias by Reducing Awareness of Race
- Author(s): Lihao Sun, Chengzhi Mao, Valentin Hofmann, Xuechunzi Bai
- Date: 2025-06
- Venue: -
- Code: -
- Model Unlearning via Sparse Autoencoder Subspace Guided Projections
- Author(s): Xu Wang, Zihao Li, Benyou Wang, Yan Hu, Difan Zou
- Date: 2025-05
- Venue: -
- Code: -
-
- Author(s): Xiaoyu Wu, Yifei Pang, Terrance Liu, Zhiwei Steven Wu
- Date: 2025-05
- Venue: -
- Code: -
- Does Machine Unlearning Truly Remove Model Knowledge? A Framework for Auditing Unlearning in LLMs
- Author(s): Haokun Chen, Yueqi Zhang, Yuan Bi, Yao Zhang, Tong Liu, Jinhe Bi, Jian Lan, Jindong Gu, Claudia Grosser, Denis Krompass, Nassir Navab, Volker Tresp
- Date: 2025-05
- Venue: -
- Code: -
- From Dormant to Deleted: Tamper-Resistant Unlearning Through Weight-Space Regularization
- Author(s): Shoaib Ahmed Siddiqui, Adrian Weller, David Krueger, Gintare Karolina Dziugaite, Michael Curtis Mozer, Eleni Triantafillou
- Date: 2025-05
- Venue: -
- Code: -
-
- Author(s): Zexi Li, Xiangzhu Wang, William F. Shen, Meghdad Kurmanji, Xinchi Qiu, Dongqi Cai, Chao Wu, Nicholas D. Lane
- Date: 2025-05
- Venue: -
- Code: -
- Graceful Forgetting in Generative Language Models
- Author(s): Chunyang Jiang, Chi-min Chan, Yiyang Cai, Yulong Liu, Wei Xue, Yike Guo
- Date: 2025-05
- Venue: -
- Code: -
- Safety Alignment via Constrained Knowledge Unlearning
- Author(s): Zesheng Shi, Yucheng Zhou, Jing Li
- Date: 2025-05
- Venue: -
- Code: -
- T2VUnlearning: A Concept Erasing Method for Text-to-Video Diffusion Models
- Author(s): Xiaoyu Ye, Songjie Cheng, Yongtao Wang, Yajiao Xiong, Yishen Li
- Date: 2025-05
- Venue: -
- Code: -
-
- Author(s): Bang Trinh Tran To, Thai Le
- Date: 2025-05
- Venue: -
- Code: -
- Unlearning Isn't Deletion: Investigating Reversibility of Machine Unlearning in LLMs
- Author(s): Xiaoyu Xu, Xiang Yue, Yang Liu, Qingqing Ye, Haibo Hu, Minxin Du
- Date: 2025-05
- Venue: -
- Code: -
- CTRAP: Embedding Collapse Trap to Safeguard Large Language Models from Harmful Fine-Tuning
- Author(s): Biao Yi, Tiansheng Huang, Baolei Zhang, Tong Li, Lihai Nie, Zheli Liu, Li Shen
- Date: 2025-05
- Venue: -
- Code: -
-
- Author(s): Hwiyeong Lee, Uiji Hwang, Hyelim Lim, Taeuk Kim
- Date: 2025-05
- Venue: -
- Code: -
- Losing is for Cherishing: Data Valuation Based on Machine Unlearning and Shapley Value
- Author(s): Le Ma, Shirao Yang, Zihao Wang, Yinggui Wang, Lei Wang, Tao Wei, Kejun Zhang
- Date: 2025-05
- Venue: -
- Code: -
- UniErase: Unlearning Token as a Universal Erasure Primitive for Language Models
- R-TOFU: Unlearning in Large Reasoning Models
- Author(s): Sangyeon Yoon, Wonje Jeung, Albert No
- Date: 2025-05
- Venue: -
- Code: -
- DUSK: Do Not Unlearn Shared Knowledge
- Author(s): Wonje Jeung, Sangyeon Yoon, Hyesoo Hong, Soeun Kim, Seungju Han, Youngjae Yu, Albert No
- Date: 2025-05
- Venue: -
- Code: -
- SEPS: A Separability Measure for Robust Unlearning in LLMs
- Author(s): Wonje Jeung, Sangyeon Yoon, Albert No
- Date: 2025-05
- Venue: -
- Code: -
- GUARD: Generation-time LLM Unlearning via Adaptive Restriction and Detection
- Author(s): Zhijie Deng, Chris Yuhao Liu, Zirui Pang, Xinlei He, Lei Feng, Qi Xuan, Zhaowei Zhu, Jiaheng Wei
- Date: 2025-05
- Venue: -
- Code: -
- Exploring Criteria of Loss Reweighting to Enhance LLM Unlearning
- Unilogit: Robust Machine Unlearning for LLMs Using Uniform-Target Self-Distillation
- Author(s): Stefan Vasilev, Christian Herold, Baohao Liao, Seyyed Hadi Hashemi, Shahram Khadivi, Christof Monz
- Date: 2025-05
- Venue: -
- Code: -
- OBLIVIATE: Robust and Practical Machine Unlearning for Large Language Models
- Unlearning vs. Obfuscation: Are We Truly Removing Knowledge?
- Unlearning Sensitive Information in Multimodal LLMs: Benchmark and Attack-Defense Evaluation
- AegisLLM: Scaling Agentic Systems for Self-Reflective Defense in LLM Security
- DualOptim: Enhancing Efficacy and Stability in Machine Unlearning with Dual Optimizers
- Author(s): Xuyang Zhong, Haochen Luo, Chen Liu
- Date: 2025-04
- Venue: -
- Code: -
- ParaPO: Aligning Language Models to Reduce Verbatim Reproduction of Pre-training Data
- DP2Unlearning: An Efficient and Guaranteed Unlearning Framework for LLMs
- Author(s): Tamim Al Mahmud, Najeeb Jebreel, Josep Domingo-Ferrer, David Sanchez
- Date: 2025-04
- Venue: -
- Code: -
- A mean teacher algorithm for unlearning of language models
- Author(s): Yegor Klochkov
- Date: 2025-04
- Venue: -
- Code: -
-
- Author(s): Saransh Agrawal, Kuan-Hao Huang
- Date: 2025-04
- Venue: -
- Code: -
- GRAIL: Gradient-Based Adaptive Unlearning for Privacy and Copyright in LLMs
- Author(s): Kun-Woo Kim, Ji-Hoon Park, Ju-Min Han, Seong-Whan Lee
- Date: 2025-04
- Venue: -
- Code: -
-
- Author(s): Hongkang Li, Yihua Zhang, Shuai Zhang, Meng Wang, Sijia Liu, Pin-Yu Chen
- Date: 2025-04
- Venue: ICLR 2025
- Code: -
- LLM Unlearning Reveals a Stronger-Than-Expected Coreset Effect in Current Benchmarks
- Bridging the Gap Between Preference Alignment and Machine Unlearning
- Author(s): Xiaohua Feng, Yuyuan Li, Huwei Ji, Jiaming Zhang, Li Zhang, Tianyu Du, Chaochao Chen
- Date: 2025-04
- Venue: -
- Code: -
-
- Author(s): Xiaohua Feng, Yuyuan Li, Chengye Wang, Junlin Liu, Li Zhang, Chaochao Chen
- Date: 2025-04
- Venue: -
- Code: -
- Exact Unlearning of Finetuning Data via Model Merging at Scale
- Author(s): Kevin Kuo, Amrith Setlur, Kartik Srinivas, Aditi Raghunathan, Virginia Smith
- Date: 2025-04
- Venue: -
- Code: -
- SUV: Scalable Large Language Model Copyright Compliance with Regularized Selective Unlearning
- Author(s): Tianyang Xu, Xiaoze Liu, Feijie Wu, Xiaoqian Wang, Jing Gao
- Date: 2025-03
- Venue: -
- Code: -
- Effective Skill Unlearning through Intervention and Abstention
- Author(s): Yongce Li, Chung-En Sun, Tsui-Wei Weng
- Date: 2025-03
- Venue: -
- Code: -
- ZJUKLAB at SemEval-2025 Task 4: Unlearning via Model Merging
- Author(s): Haoming Xu, Shuxun Wang, Yanqiu Zhao, Yi Zhong, Ziyan Jiang, Ningyuan Zhao, Shumin Deng, Huajun Chen, Ningyu Zhang
- Date: 2025-03
- Venue: -
- Code: -
-
- Author(s): Àlex Pujol Vidal, Sergio Escalera, Kamal Nasrollahi, Thomas B. Moeslund
- Date: 2025-03
- Venue: -
- Code: -
- Deep Contrastive Unlearning for Language Models
- Author(s): Estrid He, Tabinda Sarwar, Ibrahim Khalil, Xun Yi, Ke Wang
- Date: 2025-03
- Venue: -
- Code: -
- SAUCE: Selective Concept Unlearning in Vision-Language Models with Sparse Autoencoders
- Author(s): Qing Li, Jiahui Geng, Derui Zhu, Fengyu Cai, Chenyang Lyu, Fakhri Karray
- Date: 2025-03
- Venue: -
- Code: -
- Atyaephyra at SemEval-2025 Task 4: Low-Rank NPO
- Author(s): Jan Bronec, Jindřich Helcl
- Date: 2025-03
- Venue: -
- Code: -
- PEBench: A Fictitious Dataset to Benchmark Machine Unlearning for Multimodal Large Language Models
- Author(s): Zhaopan Xu, Pengfei Zhou, Weidong Tang, Jiaxin Ai, Wangbo Zhao, Xiaojiang Peng, Kai Wang, Yang You, Wenqi Shao, Hongxun Yao, Kaipeng Zhang
- Date: 2025-03
- Venue: -
- Code: -
- Hyperbolic Safety-Aware Vision-Language Models
- Author(s): Tobia Poppi, Tejaswi Kasarla, Pascal Mettes, Lorenzo Baraldi, Rita Cucchiara
- Date: 2025-03
- Venue: -
- Code: -
- Safety Mirage: How Spurious Correlations Undermine VLM Safety Fine-tuning
- Author(s): Yiwei Chen, Yuguang Yao, Yihua Zhang, Bingquan Shen, Gaowen Liu, Sijia Liu
- Date: 2025-03
- Venue: -
- Code: -
- Don't Forget It! Conditional Sparse Autoencoder Clamping Works for Unlearning
- Author(s): Matthew Khoriaty, Andrii Shportko, Gustavo Mercier, Zach Wood-Doughty
- Date: 2025-03
- Venue: -
- Code: -
- SAEBench: A Comprehensive Benchmark for Sparse Autoencoders in Language Model Interpretability
- Author(s): Adam Karvonen, Can Rager, Johnny Lin, Curt Tigges, Joseph Bloom, David Chanin, Yeu-Tong Lau, Eoin Farrell, Callum McDougall, Kola Ayonrinde, Matthew Wearden, Arthur Conmy, Samuel Marks, Neel Nanda
- Date: 2025-03
- Venue: -
- Code: -
- GRU: Mitigating the Trade-off between Unlearning and Retention for Large Language Models
- Author(s): Yue Wang, Qizhou Wang, Feng Liu, Wei Huang, Yali Du, Xiaojiang Du, Bo Han
- Date: 2025-03
- Venue: -
- Code: -
-
- Author(s): Dinesh Srivasthav P, Bala Mallikarjunarao Garlapati
- Date: 2025-03
- Venue: -
- Code: -
- UIPE: Enhancing LLM Unlearning by Removing Knowledge Related to Forgetting Targets
- Author(s): Wenyu Wang, Mengqi Zhang, Xiaotian Ye, Zhaochun Ren, Zhumin Chen, Pengjie Ren
- Date: 2025-03
- Venue: -
- Code: -
- Improving LLM Safety Alignment with Dual-Objective Optimization
- CE-U: Cross Entropy Unlearning
- Author(s): Bo Yang
- Date: 2025-03
- Venue: -
- Code: -
- Rectifying Belief Space via Unlearning to Harness LLMs' Reasoning
- Author(s): Ayana Niwa, Masahiro Kaneko, Kentaro Inui
- Date: 2025-02
- Venue: -
- Code: -
- Erasing Without Remembering: Safeguarding Knowledge Forgetting in Large Language Models
- Author(s): Huazheng Wang, Yongcheng Jing, Haifeng Sun, Yingjie Wang, Jingyu Wang, Jianxin Liao, Dacheng Tao
- Date: 2025-02
- Venue: -
- Code: -
-
- Author(s): Toan Tran, Ruixuan Liu, Li Xiong
- Date: 2025-02
- Venue: -
- Code: -
- Rethinking LLM Unlearning Objectives: A Gradient Perspective and Go Beyond
- Author(s): Qizhou Wang, Jin Peng Zhou, Zhanke Zhou, Saebyeol Shin, Bo Han, Kilian Q. Weinberger
- Date: 2025-02
- Venue: -
- Code: -
-
- Author(s): Nakyeong Yang, Minsung Kim, Seunghyun Yoon, Joongbo Shin, Kyomin Jung
- Date: 2025-02
- Venue: -
- Code: -
-
- Author(s): Weipeng Jiang, Juan Zhai, Shiqing Ma, Ziyan Lei, Xiaofei Xie, Yige Wang, Chao Shen
- Date: 2025-02
- Venue: -
- Code: -
- A General Framework to Enhance Fine-tuning-based LLM Unlearning
- Modality-Aware Neuron Pruning for Unlearning in Multimodal Large Language Models
- Soft Token Attacks Cannot Reliably Audit Unlearning in Large Language Models
- Author(s): Haokun Chen, Sebastian Szyller, Weilin Xu, Nageen Himayat
- Date: 2025-02
- Venue: -
- Code: -
- CoME: An Unlearning-based Approach to Conflict-free Model Editing
- LUME: LLM Unlearning with Multitask Evaluations
- Author(s): Anil Ramakrishna, Yixin Wan, Xiaomeng Jin, Kai-Wei Chang, Zhiqi Bu, Bhanukiran Vinzamuri, Volkan Cevher, Mingyi Hong, Rahul Gupta
- Date: 2025-02
- Venue: -
- Code: -
- UPCORE: Utility-Preserving Coreset Selection for Balanced Unlearning
- Obliviate: Efficient Unmemorization for Protecting Intellectual Property in Large Language Models
- Author(s): Mark Russinovich, Ahmed Salem
- Date: 2025-02
- Venue: -
- Code: -
- Beyond Single-Value Metrics: Evaluating and Enhancing LLM Unlearning with Cognitive Diagnosis
-
- Author(s): Junkai Chen, Zhijie Deng, Kening Zheng, Yibo Yan, Shuliang Liu, PeiJun Wu, Peijie Jiang, Jia Liu, Xuming Hu
- Date: 2025-02
- Venue: -
- Code: -
- Which Retain Set Matters for LLM Unlearning? A Case Study on Entity Unlearning
- Author(s): Hwan Chang, Hwanhee Lee
- Date: 2025-02
- Venue: -
- Code: -
-
- Author(s): Jiahao Huo, Yibo Yan, Xu Zheng, Yuanhuiyi Lyu, Xin Zou, Zhihua Wei, Xuming Hu
- Date: 2025-02
- Venue: -
- Code: -
- LUNAR: LLM Unlearning via Neural Activation Redirection
- Author(s): William F. Shen, Xinchi Qiu, Meghdad Kurmanji, Alex Iacob, Lorenzo Sani, Yihong Chen, Nicola Cancedda, Nicholas D. Lane
- Date: 2025-02
- Venue: -
- Code: -
- Mitigating Sensitive Information Leakage in LLMs4Code through Machine Unlearning
- Author(s): Ruotong Geng, Mingyang Geng, Shangwen Wang, Haotian Wang, Zhipeng Lin, Dezun Dong
- Date: 2025-02
- Venue: -
- Code: -
- A Lightweight Method to Disrupt Memorized Sequences in LLM
- Author(s): Parjanya Prajakta Prashant, Kaustubh Ponkshe, Babak Salimi
- Date: 2025-02
- Venue: -
- Code: -
- Model Tampering Attacks Enable More Rigorous Evaluations of LLM Capabilities
- Author(s): Zora Che, Stephen Casper, Robert Kirk, Anirudh Satheesh, Stewart Slocum, Lev E McKinney, Rohit Gandikota, Aidan Ewart, Domenic Rosati, Zichu Wu, Zikui Cai, Bilal Chughtai, Yarin Gal, Furong Huang, Dylan Hadfield-Menell
- Date: 2025-02
- Venue: -
- Code: -
-
- Author(s): Jinwei Hu, Zhenglin Huang, Xiangyu Yin, Wenjie Ruan, Guangliang Cheng, Yi Dong, Xiaowei Huang
- Date: 2025-02
- Venue: -
- Code: -
-
- Author(s): Debdeep Sanyal, Murari Mandal
- Date: 2025-02
- Venue: -
- Code: -
-
- Author(s): Binchi Zhang, Zhengzhang Chen, Zaiyi Zheng, Jundong Li, Haifeng Chen
- Date: 2025-02
- Venue: -
- Code: -
- Improving the Robustness of Representation Misdirection for Large Language Model Unlearning
- Author(s): Dang Huu-Tien, Hoang Thanh-Tung, Le-Minh Nguyen, Naoya Inoue
- Date: 2025-01
- Venue: -
- Code: -
- Backdoor Token Unlearning: Exposing and Defending Backdoors in Pretrained Language Models
- Author(s): Peihai Jiang, Xixiang Lyu, Yige Li, Jing Ma
- Date: 2025-01
- Venue: -
- Code: -
- Precise In-Parameter Concept Erasure in Large Language Models
- A Fully Probabilistic Perspective on Large Language Model Unlearning: Evaluation and Optimization
- Author(s): Anda Cheng, Wei Huang, Yinggui Wang
- Date: 2025-11
- Venue: EMNLP 2025
- Code: -
- Machine Unlearning of Personally Identifiable Information in Large Language Models
- Adaptive Localization of Knowledge Negation for Continual LLM Unlearning
- Author(s): Abudukelimu Wuerkaixi, Qizhou Wang, Sen Cui, Wutong Xu, Bo Han, Gang Niu, Masashi Sugiyama, Changshui Zhang
- Date: 2025-10
- Venue: ICML 2025
- Code: -
- LUSB: Formalizing and Benchmarking Unlearning Attacks and Defenses against Large Language Models
- Author(s): Chenxu Zhao, Wei Qian, Aobo Chen, Jingquan Wang, Carl Yang, Mengdi Huai
- Date: 2025-09
- Venue: -
- Code: -
- White-Box Auditing of Large Language Model Unlearning
- Author(s): Runzhi Tian, Shicheng Hu, Ziqiao Wang, Yongyi Mao
- Date: 2025-09
- Venue: -
- Code: -
- UUE: Untargeted Language Model Unlearning via Null-Space-Guided Editing with Lightweight Adapters
- Author(s): Xiuyuan Wang, Weiming Liu, Chaochao Chen, Zongxin Yang, Fan Wang, Xiaolin Zheng
- Date: 2025-09
- Venue: -
- Code: -
- UnRe: Zero-Shot LLM Unlearning via Dynamic Contextual Retrieval
- Author(s): Rui Chu, Shuo Zhang, Weijie Zhao, Ping Li, Yingjie Lao
- Date: 2025-09
- Venue: -
- Code: -
- Effective Unlearning in LLMs Relies on the Right Data Retention Strategy
- Author(s): Praveen Bushipaka, Lucia Passaro, Tommaso Cucinotta
- Date: 2025-09
- Venue: -
- Code: -
- Decoupling Memories, Muting Neurons: Towards Practical Machine Unlearning for Large Language Models
- Author(s): Lishuai Hou, Zixiong Wang, Gaoyang Liu, Chen Wang, Wei Liu, Kai Peng
- Date: 2025-07
- Venue: ACL 2025 Findings
- Code: -
- LLM-Eraser: Optimizing Large Language Model Unlearning through Selective Pruning
- SemEval-2025 Task 4: Unlearning Sensitive Content from Large Language Models
- Author(s): Anil Ramakrishna, Yixin Wan, Xiaomeng Jin, Kai-Wei Chang, Zhiqi Bu, Bhanukiran Vinzamuri, Volkan Cevher, Mingyi Hong, Rahul Gupta
- Date: 2025-07
- Venue: SemEval 2025
- Code: -
- NEKO at SemEval-2025 Task 4: A Gradient Ascent Based Machine Unlearning Strategy
- Author(s): Chi Kuan Lai, Yifei Chen
- Date: 2025-07
- Venue: SemEval 2025
- Code: -
-
- Author(s): Zekun Wang, Jingjie Zeng, Yingxu Li, Liang Yang, Hongfei Lin
- Date: 2025-07
- Venue: SemEval 2025
- Code: -
- NeuroReset: LLM Unlearning via Dual Phase Mixed Methodology
- Author(s): Dhwani Bhavankar, Het Sevalia, Shubh Agarwal, Yogesh Kulkarni, Rahee Walambe, Ketan Kotecha
- Date: 2025-07
- Venue: SemEval 2025
- Code: -
-
- Author(s): Hrishikesh Kulkarni, Nazli Goharian, Ophir Frieder
- Date: 2025-07
- Venue: SemEval 2025
- Code: -
-
- Author(s): Karla Salas-Jimenez, Francisco Lopez-Ponce, Diego Hernandez-Bustamante, Gemma Bel-Enguix, Helena Gomez Adorno
- Date: 2025-07
- Venue: SemEval 2025
- Code: -
- MALTO at SemEval-2025 Task 4: Dual Teachers for Unlearning Sensitive Content in LLMs
- Author(s): Claudio Savelli, Evren Munis, Erfan Bayat, Andrea Grieco, Flavio Giobergia
- Date: 2025-07
- Venue: SemEval 2025
- Code: -
-
- Author(s): Aayush Acharya, Saurav K. Aryal
- Date: 2025-07
- Venue: SemEval 2025
- Code: -
- YNU at SemEval-2025 Task 4: Synthetic Token Alternative Training for LLM Unlearning
- Author(s): Yang Chen, Zheyang Luo, Zhiwen Tang
- Date: 2025-07
- Venue: SemEval 2025
- Code: -
- JU-CSE-NLP'25 at SemEval-2025 Task 4: Learning to Unlearn LLMs
- Author(s): Arkajyoti Naskar, Dipankar Das, Sivaji Bandyopadhyay
- Date: 2025-07
- Venue: SemEval 2025
- Code: -
- NLPART at SemEval-2025 Task 4: Forgetting is Harder than Learning
- Author(s): Hoorieh Sabzevari, Milad Molazadeh Oskuee, Tohid Abedini, Ghazal Zamaninejad, Sara Baruni, Zahra Amirmahani, Amirmohammad Salehoof
- Date: 2025-07
- Venue: SemEval 2025
- Code: -
-
- Author(s): Aviral Srivastava
- Date: 2025-03
- Venue: ICLR 2025 Workshop BuildingTrust
- Code: -
- Orthogonal Gradient Projection for Continual LLM Unlearning
- Author(s): Juan Belieni, Ana Carolina Erthal, Eliezer de Souza da Silva, Diego Mesquita
- Date: 2025-03
- Venue: -
- Code: -
- Multi-Objective Large Language Model Unlearning
- Author(s): Zibin Pan, Shuwen Zhang, Yuesheng Zheng, Chi Li, Yuheng Cheng, Junhua Zhao
- Date: 2024-12
- Venue: -
- Code: -
- Investigating the Feasibility of Mitigating Potential Copyright Infringement via Large Language Model Unlearning
- Large Language Model Federated Learning with Blockchain and Unlearning for Cross-Organizational Collaboration
- Author(s): Xuhan Zuo, Minghao Wang, Tianqing Zhu, Shui Yu, Wanlei Zhou
- Date: 2024-12
- Venue: -
- Code: -
- Underestimated Privacy Risks for Minority Populations in Large Language Model Unlearning
- Author(s): Rongzhe Wei, Mufei Li, Mohsen Ghassemi, Eleonora Kreačić, Yifan Li, Xiang Yue, Bo Li, Vamsi K. Potluru, Pan Li, Eli Chien
- Date: 2024-12
- Venue: -
- Code: -
- Classifier-free guidance in LLMs Safety
- Author(s): Roman Smirnov
- Date: 2024-12
- Venue: -
- Code: -
- Unified Parameter-Efficient Unlearning for LLMs
- Author(s): Chenlu Ding, Jiancan Wu, Yancheng Yuan, Jinda Lu, Kai Zhang, Alex Su, Xiang Wang, Xiangnan He
- Date: 2024-12
- Venue: -
- Code: -
- UOE: Unlearning One Expert Is Enough For Mixture-of-experts LLMS
- Author(s): Haomin Zhuang, Yihua Zhang, Kehan Guo, Jinghan Jia, Gaowen Liu, Sijia Liu, Xiangliang Zhang
- Date: 2024-11
- Venue: -
- Code: -
- Towards Robust Evaluation of Unlearning in LLMs via Data Transformations
- Author(s): Abhinav Joshi, Shaswati Saha, Divyaksh Shukla, Sriram Vema, Harsh Jhamtani, Manas Gaur, Ashutosh Modi
- Date: 2024-11
- Venue: EMNLP 2024 Findings
- Code: -
- Fine-grained Pluggable Gradient Ascent for Knowledge Unlearning in Language Models
- Author(s): XiaoHua Feng, Chaochao Chen, Yuyuan Li, Zibin Lin
- Date: 2024-11
- Venue: EMNLP 2024
- Code: -
- ULMR: Unlearning Large Language Models via Negative Response and Model Parameter Average
- Author(s): Shaojie Shi, Xiaoyu Tan, Xihe Qiu, Chao Qu, Kexin Nie, Yuan Cheng, Wei Chu, Xu Yinghui, Yuan Qi
- Date: 2024-11
- Venue: EMNLP 2024 Industry Track
- Code: -
- MUNBa: Machine Unlearning via Nash Bargaining
- Author(s): Jing Wu, Mehrtash Harandi
- Date: 2024-11
- Venue: -
- Code: -
- Provable unlearning in topic modeling and downstream tasks
- Author(s): Stanley Wei, Sadhika Malladi, Sanjeev Arora, Amartya Sanyal
- Date: 2024-11
- Venue: -
- Code: -
- Does Unlearning Truly Unlearn? A Black Box Evaluation of LLM Unlearning Methods
- Unlearning in- vs. out-of-distribution data in LLMs under gradient-based method
- Author(s): Teodora Baluta, Pascal Lamblin, Daniel Tarlow, Fabian Pedregosa, Gintare Karolina Dziugaite
- Date: 2024-11
- Venue: NeurIPS 2024 Safe Generative AI Workshop
- Code: -
- Benchmarking Vision Language Model Unlearning via Fictitious Facial Identity Dataset
- Extracting Unlearned Information from LLMs with Activation Steering
- Author(s): Atakan Seyitoğlu, Aleksei Kuvshinov, Leo Schwinn, Stephan Günnemann
- Date: 2024-11
- Venue: -
- Code: -
- RESTOR: Knowledge Recovery through Machine Unlearning
- Protecting Privacy in Multimodal Large Language Models with MLLMU-Bench
- Cross-model Control: Improving Multiple Large Language Models in One-time Training
- Author(s): Jiayi Wu, Hao Sun, Hengyi Cai, Lixin Su, Shuaiqiang Wang, Dawei Yin, Xiang Li, Ming Gao
- Date: 2024-10
- Venue: -
- Code: -
- Unlearning as multi-task optimization: A normalized gradient difference approach with an adaptive learning rate
- Author(s): Zhiqi Bu, Xiaomeng Jin, Bhanukiran Vinzamuri, Anil Ramakrishna, Kai-Wei Chang, Volkan Cevher, Mingyi Hong
- Date: 2024-10
- Venue: -
- Code: -
- Learning and Unlearning of Fabricated Knowledge in Language Models
- Author(s): Chen Sun, Nolan Andrew Miller, Andrey Zhmoginov, Max Vladymyrov, Mark Sandler
- Date: 2024-10
- Venue: -
- Code: -
- Applying sparse autoencoders to unlearn knowledge in language models
- Author(s): Eoin Farrell, Yeu-Tong Lau, Arthur Conmy
- Date: 2024-10
- Venue: -
- Code: -
- CLEAR: Character Unlearning in Textual and Visual Modalities
- Author(s): Alexey Dontsov, Dmitrii Korzh, Alexey Zhavoronkin, Boris Mikheev, Denis Bobkov, Aibek Alanov, Oleg Y. Rogov, Ivan Oseledets, Elena Tutubalina
- Date: 2024-10
- Venue: -
- Code: -
- WAGLE: Strategic Weight Attribution for Effective and Modular Unlearning in Large Language Models
- UnStar: Unlearning with Self-Taught Anti-Sample Reasoning for LLMs
- Author(s): Yash Sinha, Murari Mandal, Mohan Kankanhalli
- Date: 2024-10
- Venue: -
- Code: -
- Does your LLM truly unlearn? An embarrassingly simple approach to recover unlearned knowledge
- When Machine Unlearning Meets Retrieval-Augmented Generation (RAG): Keep Secret or Forget Knowledge?
- Author(s): Shang Wang, Tianqing Zhu, Dayong Ye, Wanlei Zhou
- Date: 2024-10
- Venue: -
- Code: -
- Evaluating Deep Unlearning in Large Language Models
- Unlearning Backdoor Attacks for LLMs with Weak-to-Strong Knowledge Distillation
- Breaking Chains: Unraveling the Links in Multi-Hop Knowledge Unlearning
- Mechanistic Unlearning: Robust Knowledge Unlearning and Editing via Mechanistic Localization
- Author(s): Phillip Guo, Aaquib Syed, Abhay Sheshadri, Aidan Ewart, Gintare Karolina Dziugaite
- Date: 2024-10
- Venue: -
- Code: -
- LLM Unlearning via Loss Adjustment with Only Forget Data
- Author(s): Yaxuan Wang, Jiaheng Wei, Chris Yuhao Liu, Jinlong Pang, Quan Liu, Ankit Parag Shah, Yujia Bao, Yang Liu, Wei Wei
- Date: 2024-10
- Venue: -
- Code: -
- CodeUnlearn: Amortized Zero-Shot Machine Unlearning in Language Models Using Discrete Concept
- Author(s): YuXuan Wu, Bonaventure F. P. Dossou, Dianbo Liu
- Date: 2024-10
- Venue: -
- Code: -
- Do Unlearning Methods Remove Information from Language Model Weights?
- A Closer Look at Machine Unlearning for Large Language Models
- Simplicity Prevails: Rethinking Negative Preference Optimization for LLM Unlearning
- Dissecting Fine-Tuning Unlearning in Large Language Models
- NegMerge: Consensual Weight Negation for Strong Machine Unlearning
- A Probabilistic Perspective on Unlearning and Alignment for Large Language Models
- Author(s): Yan Scholten, Stephan Günnemann, Leo Schwinn
- Date: 2024-10
- Venue: -
- Code: -
- Erasing Conceptual Knowledge from Language Models
- Author(s): Rohit Gandikota, Sheridan Feucht, Samuel Marks, David Bau
- Date: 2024-10
- Venue: -
- Code: -
- Mitigating Memorization In Language Models
- Author(s): Mansi Sakarvadia, Aswathy Ajith, Arham Khan, Nathaniel Hudson, Caleb Geniesse, Kyle Chard, Yaoqing Yang, Ian Foster, Michael W. Mahoney
- Date: 2024-10
- Venue: -
- Code: -
- Answer When Needed, Forget When Not: Language Models Pretend to Forget via In-Context Knowledge Unlearning
- Author(s): Shota Takashiro, Takeshi Kojima, Andrew Gambardella, Qi Cao, Yusuke Iwasawa, Yutaka Matsuo
- Date: 2024-10
- Venue: -
- Code: -
- Concept Unlearning for Large Language Models
- Author(s): Tomoya Yamashita, Takayuki Miura, Yuuki Yamanaka, Toshiki Shibahara, Masanori Yamada
- Date: 2024-10
- Venue: NeurIPS 2024 Workshop SafeGenAi
- Code: -
- An Adversarial Perspective on Machine Unlearning for AI Safety
- Alternate Preference Optimization for Unlearning Factual Knowledge in Large Language Models
- LLM Surgery: Efficient Knowledge Unlearning and Editing in Large Language Models
- Author(s): Akshaj Kumar Veldanda, Shi-Xiong Zhang, Anirban Das, Supriyo Chakraborty, Stephen Rawls, Sambit Sahu, Milind Naphade
- Date: 2024-09
- Venue: -
- Code: -
- MEOW: MEMOry Supervised LLM Unlearning Via Inverted Facts
- Unforgettable Generalization in Language Models
- Author(s): Eric Zhang, Leshem Choshen, Jacob Andreas
- Date: 2024-09
- Venue: COLM 2024
- Code: -
- Forget to Flourish: Leveraging Machine-Unlearning on Pretrained Language Models for Privacy Leakage
- Author(s): Md Rafi Ur Rashid, Jing Liu, Toshiaki Koike-Akino, Shagufta Mehnaz, Ye Wang
- Date: 2024-08
- Venue: -
- Code: -
- Large Language Models Relearn Removed Concepts
- Author(s): Michelle Lo, Fazl Barez, Shay Cohen
- Date: 2024-08
- Venue: ACL 2024 Findings
- Code: -
- LLM Defenses Are Not Robust to Multi-Turn Human Jailbreaks Yet
- Author(s): Nathaniel Li, Ziwen Han, Ian Steneker, Willow Primack, Riley Goodside, Hugh Zhang, Zifan Wang, Cristina Menghini, Summer Yue
- Date: 2024-08
- Venue: -
- Code: -
- Unlearning Trojans in Large Language Models: A Comparison Between Natural Language and Source Code
- Author(s): Mahdi Kazemi, Aftab Hussain, Md Rafiqul Islam Rabin, Mohammad Amin Alipour, Sen Lin
- Date: 2024-08
- Venue: -
- Code: -
- Towards Robust Knowledge Unlearning: An Adversarial Framework for Assessing and Improving Unlearning Robustness in Large Language Models
- Author(s): Hongbang Yuan, Zhuoran Jin, Pengfei Cao, Yubo Chen, Kang Liu, Jun Zhao
- Date: 2024-08
- Venue: -
- Code: -
- WPN: An Unlearning Method Based on N-pair Contrastive Learning in Language Models
- Author(s): Guitao Chen, Yunshen Wang, Hongye Sun, Guang Chen
- Date: 2024-08
- Venue: -
- Code: -
- Towards Robust and Cost-Efficient Knowledge Unlearning for Large Language Models
- Author(s): Sungmin Cha, Sungjun Cho, Dasol Hwang, Moontae Lee
- Date: 2024-08
- Venue: -
- Code: -
- On Effects of Steering Latent Representation for Large Language Model Unlearning
- Author(s): Dang Huu-Tien, Trung-Tin Pham, Hoang Thanh-Tung, Naoya Inoue
- Date: 2024-08
- Venue: -
- Code: -
- Hotfixing Large Language Models for Code
- Author(s): Zhou Yang, David Lo
- Date: 2024-08
- Venue: -
- Code: -
- UNLEARN Efficient Removal of Knowledge in Large Language Models
- Author(s): Tyler Lizzo, Larry Heck
- Date: 2024-08
- Venue: -
- Code: -
- Tamper-Resistant Safeguards for Open-Weight LLMs
- On the Limitations and Prospects of Machine Unlearning for Generative AI
- Author(s): Shiji Zhou, Lianzhe Wang, Jiangnan Ye, Yongliang Wu, Heng Chang
- Date: 2024-08
- Venue: -
- Code: -
- Learn while Unlearn: An Iterative Unlearning Framework for Generative Language Models
- Demystifying Verbatim Memorization in Large Language Models
- Author(s): Jing Huang, Diyi Yang, Christopher Potts
- Date: 2024-07
- Venue: -
- Code: -
- Revisiting Who's Harry Potter: Towards Targeted Unlearning from a Causal Intervention Perspective
- Towards Transfer Unlearning: Empirical Evidence of Cross-Domain Bias Mitigation
- Author(s): Huimin Lu, Masaru Isonuma, Junichiro Mori, Ichiro Sakata
- Date: 2024-07
- Venue: -
- Code: -
- Targeted Latent Adversarial Training Improves Robustness to Persistent Harmful Behaviors in LLMs
- Targeted Unlearning with Single Layer Unlearning Gradient
- Author(s): Zikui Cai, Yaoteng Tan, M. Salman Asif
- Date: 2024-07
- Venue: -
- Code: -
- What Makes and Breaks Safety Fine-tuning? A Mechanistic Study
- Author(s): Samyak Jain, Ekdeep Singh Lubana, Kemal Oksuz, Tom Joy, Philip H.S. Torr, Amartya Sanyal, Puneet K. Dokania
- Date: 2024-07
- Venue: -
- Code: -
- Practical Unlearning for Large Language Models
- Author(s): Chongyang Gao, Lixu Wang, Chenkai Weng, Xiao Wang, Qi Zhu
- Date: 2024-07
- Venue: -
- Code: -
- Learning to Refuse: Towards Mitigating Privacy Risks in LLMs
- Composable Interventions for Language Models
- MUSE: Machine Unlearning Six-Way Evaluation for Language Models
- If You Don't Understand It, Don't Use It: Eliminating Trojans with Filters Between Layers
- Safe Unlearning: A Surprisingly Effective and Generalizable Solution to Defend Against Jailbreak Attacks
- To Forget or Not? Towards Practical Knowledge Unlearning for Large Language Models
- Can Small Language Models Learn, Unlearn, and Retain Noise Patterns?
- Author(s): Nicy Scaria, Silvester John Joseph Kennedy, Deepak Subramani
- Date: 2024-07
- Venue: -
- Code: -
- UnUnlearning: Unlearning is not sufficient for content regulation in advanced generative AI
- Author(s): Ilia Shumailov, Jamie Hayes, Eleni Triantafillou, Guillermo Ortiz-Jimenez, Nicolas Papernot, Matthew Jagielski, Itay Yona, Heidi Howard, Eugene Bagdasaryan
- Date: 2024-07
- Venue: -
- Code: -
- Evaluating Copyright Takedown Methods for Language Models
- Author(s): Boyi Wei, Weijia Shi, Yangsibo Huang, Noah A. Smith, Chiyuan Zhang, Luke Zettlemoyer, Kai Li, Peter Henderson
- Date: 2024-06
- Venue: -
- Code: -
- Machine Unlearning Fails to Remove Data Poisoning Attacks
- Author(s): Martin Pawelczyk, Jimmy Z. Di, Yiwei Lu, Gautam Kamath, Ayush Sekhari, Seth Neel
- Date: 2024-06
- Venue: -
- Code: -
- PISTOL: Dataset Compilation Pipeline for Structural Unlearning of LLMs
- Unveiling Entity-Level Unlearning for Large Language Models: A Comprehensive Analysis
- Author(s): Weitao Ma, Xiaocheng Feng, Weihong Zhong, Lei Huang, Yangfan Ye, Xiachong Feng, Bing Qin
- Date: 2024-06
- Venue: -
- Code: -
- Protecting Privacy Through Approximating Optimal Parameters for Sequence Unlearning in Language Models
- Author(s): Dohyun Lee, Daniel Rim, Minseok Choi, Jaegul Choo
- Date: 2024-06
- Venue: ACL 2024 Findings
- Code: -
- Every Language Counts: Learn and Unlearn in Multilingual LLMs
- Mitigating Social Biases in Language Models through Unlearning
- Unlearning or Obfuscating? Jogging the Memory of Unlearned LLMs via Benign Relearning
- Author(s): Shengyuan Hu, Yiwei Fu, Zhiwei Steven Wu, Virginia Smith
- Date: 2024-06
- Venue: -
- Code: -
- Textual Unlearning Gives a False Sense of Unlearning
- Author(s): Jiacheng Du, Zhibo Wang, Kui Ren
- Date: 2024-06
- Venue: -
- Code: -
- Cross-Lingual Unlearning of Selective Knowledge in Multilingual Language Models
- SNAP: Unlearning Selective Knowledge in Large Language Models with Negative Instructions
- Soft Prompting for Unlearning in Large Language Models
- Split, Unlearn, Merge: Leveraging Data Attributes for More Effective Unlearning in LLMs
- Author(s): Swanand Ravindra Kadhe, Farhan Ahmed, Dennis Wei, Nathalie Baracaldo, Inkit Padhi
- Date: 2024-06
- Venue: -
- Code: -
- Intrinsic Evaluation of Unlearning Using Parametric Knowledge Traces
- Avoiding Copyright Infringement via Machine Unlearning
- RWKU: Benchmarking Real-World Knowledge Unlearning for Large Language Models
- REVS: Unlearning Sensitive Information in Language Models via Rank Editing in the Vocabulary Space
- Unlearning with Control: Assessing Real-world Utility for Large Language Model Unlearning
- Author(s): Qizhou Wang, Bo Han, Puning Yang, Jianing Zhu, Tongliang Liu, Masashi Sugiyama
- Date: 2024-06
- Venue: -
- Code: -
- Reversing the Forget-Retain Objectives: An Efficient LLM Unlearning Framework from Logit Difference
- Large Language Model Unlearning via Embedding-Corrupted Prompts
- Federated TrustChain: Blockchain-Enhanced LLM Training and Unlearning
- Author(s): Xuhan Zuo, Minghao Wang, Tianqing Zhu, Lefeng Zhang, Dayong Ye, Shui Yu, Wanlei Zhou
- Date: 2024-06
- Venue: -
- Code: -
- Can Textual Unlearning Solve Cross-Modality Safety Alignment?
- Author(s): Trishna Chakraborty, Erfan Shayegani, Zikui Cai, Nael Abu-Ghazaleh, M. Salman Asif, Yue Dong, Amit K. Roy-Chowdhury, Chengyu Song
- Date: 2024-06
- Venue: EMNLP 2024 Findings
- Code: -
- RKLD: Reverse KL-Divergence-based Knowledge Distillation for Unlearning Personal Information in Large Language Models
- Author(s): Bichen Wang, Yuzhe Zi, Yixin Sun, Yanyan Zhao, Bing Qin
- Date: 2024-06
- Venue: -
- Code: -
- Toward Robust Unlearning for LLMs
- Author(s): Rishub Tamirisa, Bhrugu Bharathi, Andy Zhou, Bo Li, Mantas Mazeika
- Date: 2024-05
- Venue: ICLR 2024 SeT-LLM Workshop
- Code: -
- Unlearning Climate Misinformation in Large Language Models
- Author(s): Michael Fore, Simranjit Singh, Chaehong Lee, Amritanshu Pandey, Antonios Anastasopoulos, Dimitrios Stamoulis
- Date: 2024-05
- Venue: -
- Code: -
- Large Scale Knowledge Washing
- Machine Unlearning in Large Language Models
- Author(s): Saaketh Koundinya Gundavarapu, Shreya Agarwal, Arushi Arora, Chandana Thimmalapura Jagadeeshaiah
- Date: 2024-05
- Venue: -
- Code: -
- Single Image Unlearning: Efficient Machine Unlearning in Multimodal Large Language Models
- Author(s): Jiaqi Li, Qianshan Wei, Chuanyi Zhang, Guilin Qi, Miaozeng Du, Yongrui Chen, Sheng Bi
- Date: 2024-05
- Venue: -
- Code: -
- To Each (Textual Sequence) Its Own: Improving Memorized-Data Unlearning in Large Language Models
- Author(s): George-Octavian Barbulescu, Peter Triantafillou
- Date: 2024-05
- Venue: ICML 2024
- Code: -
- SOUL: Unlocking the Power of Second-Order Optimization for LLM Unlearning
- Machine Unlearning in Large Language Models
- Author(s): Kongyang Chen, Zixin Wang, Bing Mi, Waixi Liu, Shaowei Wang, Xiaojun Ren, Jiaxing Shen
- Date: 2024-04
- Venue: -
- Code: -
- Offset Unlearning for Large Language Models
- Exact and Efficient Unlearning for Large Language Model-based Recommendation
- Author(s): Zhiyu Hu, Yang Zhang, Minghao Xiao, Wenjie Wang, Fuli Feng, Xiangnan He
- Date: 2024-04
- Venue: -
- Code: -
- Eraser: Jailbreaking Defense in Large Language Models via Unlearning Harmful Knowledge
- Author(s): Weikai Lu, Ziqian Zeng, Jianwei Wang, Zhengdong Lu, Zelin Chen, Huiping Zhuang, Cen Chen
- Date: 2024-04
- Venue: -
- Code: -
- Negative Preference Optimization: From Catastrophic Collapse to Effective Unlearning
- Localizing Paragraph Memorization in Language Models
- Author(s): Niklas Stoehr, Mitchell Gordon, Chiyuan Zhang, Owen Lewis
- Date: 2024-03
- Venue: -
- Code: -
- The WMDP Benchmark: Measuring and Reducing Malicious Use With Unlearning
- Author(s): Nathaniel Li, Alexander Pan, Anjali Gopal, Summer Yue, Daniel Berrios, Alice Gatti, Justin D. Li, Ann-Kathrin Dombrowski, Shashwat Goel, Long Phan, Gabriel Mukobi, Nathan Helm-Burger, Rassin Lababidi, Lennart Justen, Andrew B. Liu, Michael Chen, Isabelle Barrass, Oliver Zhang, Xiaoyuan Zhu, Rishub Tamirisa, Bhrugu Bharathi, Adam Khoja, Zhenqi Zhao, Ariel Herbert-Voss, Cort B. Breuer, Samuel Marks, Oam Patel, Andy Zou, Mantas Mazeika, Zifan Wang, Palash Oswal, Weiran Lin, Adam A. Hunt, Justin Tienken-Harder, Kevin Y. Shih, Kemper Talley, John Guan, Russell Kaplan, Ian Steneker, David Campbell, Brad Jokubaitis, Alex Levinson, Jean Wang, William Qian, Kallol Krishna Karmakar, Steven Basart, Stephen Fitz, Mindy Levine, Ponnurangam Kumaraguru, Uday Tupakula, Vijay Varadharajan, Ruoyu Wang, Yan Shoshitaishvili, Jimmy Ba, Kevin M. Esvelt, Alexandr Wang, Dan Hendrycks
- Date: 2024-03
- Venue: -
- Code:
- Dissecting Language Models: Machine Unlearning via Selective Pruning
- Author(s): Nicholas Pochinkov, Nandi Schoots
- Date: 2024-03
- Venue: -
- Code: -
- Inexact Unlearning Needs More Careful Evaluations to Avoid a False Sense of Privacy
- Author(s): Jamie Hayes, Ilia Shumailov, Eleni Triantafillou, Amr Khalifa, Nicolas Papernot
- Date: 2024-03
- Venue: -
- Code: -
- Second-Order Information Matters: Revisiting Machine Unlearning for Large Language Models
- Author(s): Kang Gu, Md Rafi Ur Rashid, Najrin Sultana, Shagufta Mehnaz
- Date: 2024-03
- Venue: -
- Code: -
- Ethos: Rectifying Language Models in Orthogonal Parameter Space
- Author(s): Lei Gao, Yue Niu, Tingting Tang, Salman Avestimehr, Murali Annavaram
- Date: 2024-03
- Venue: -
- Code: -
- Towards Efficient and Effective Unlearning of Large Language Models for Recommendation
- Guardrail Baselines for Unlearning in LLMs
- Author(s): Pratiksha Thaker, Yash Maurya, Virginia Smith
- Date: 2024-03
- Venue: ICLR 2024 SeT-LLM Workshop
- Code: -
- Deciphering the Impact of Pretraining Data on Large Language Models through Machine Unlearning
- Author(s): Yang Zhao, Li Du, Xiao Ding, Kai Xiong, Zhouhao Sun, Jun Shi, Ting Liu, Bing Qin
- Date: 2024-02
- Venue: -
- Code: -
- Unmemorization in Large Language Models via Self-Distillation and Deliberate Imagination
- Towards Safer Large Language Models through Machine Unlearning
- Soft Prompt Threats: Attacking Safety Alignment and Unlearning in Open-Source LLMs through the Embedding Space
- Author(s): Leo Schwinn, David Dobre, Sophie Xhonneux, Gauthier Gidel, Stephan Günnemann
- Date: 2024-02
- Venue: -
- Code: -
- Selective Forgetting: Advancing Machine Unlearning Techniques and Evaluation in Language Models
- Author(s): Lingzhi Wang, Xingshan Zeng, Jinsong Guo, Kam-Fai Wong, Georg Gottlob
- Date: 2024-02
- Venue: -
- Code: -
- Unlearnable Algorithms for In-context Learning
- Author(s): Andrei Muresanu, Anvith Thudi, Michael R. Zhang, Nicolas Papernot
- Date: 2024-02
- Venue: -
- Code: -
- Machine Unlearning of Pre-trained Large Language Models
- EFUF: Efficient Fine-grained Unlearning Framework for Mitigating Hallucinations in Multimodal Large Language Models
- Unlearning Reveals the Influential Training Data of Language Models
- Author(s): Masaru Isonuma, Ivan Titov
- Date: 2024-01
- Venue: -
- Code: -
- TOFU: A Task of Fictitious Unlearning for LLMs
- FairSISA: Ensemble Post-Processing to Improve Fairness of Unlearning in LLMs
- Author(s): Swanand Ravindra Kadhe, Anisa Halimi, Ambrish Rawat, Nathalie Baracaldo
- Date: 2023-12
- Venue: NeurIPS 2023 SoLaR Workshop
- Code: -
- Preserving Privacy Through Dememorization: An Unlearning Technique For Mitigating Memorization Risks In Language Models
- Author(s): Aly Kassem, Omar Mahmoud, Sherif Saad
- Date: 2023-12
- Venue: EMNLP 2023
- Code: -
- Making Harmful Behaviors Unlearnable for Large Language Models
- Author(s): Xin Zhou, Yi Lu, Ruotian Ma, Tao Gui, Qi Zhang, Xuanjing Huang
- Date: 2023-11
- Venue: -
- Code: -
- Forgetting before Learning: Utilizing Parametric Arithmetic for Knowledge Updating in Large Language Models
- Author(s): Shiwen Ni, Dingwei Chen, Chengming Li, Xiping Hu, Ruifeng Xu, Min Yang
- Date: 2023-11
- Venue: -
- Code: -
- Who's Harry Potter? Approximate Unlearning in LLMs
- Author(s): Ronen Eldan, Mark Russinovich
- Date: 2023-10
- Venue: -
- Code: -
- DEPN: Detecting and Editing Privacy Neurons in Pretrained Language Models
- Unlearn What You Want to Forget: Efficient Unlearning for LLMs
- In-Context Unlearning: Language Models as Few Shot Unlearners
- Large Language Model Unlearning
- Fast Model Debias with Machine Unlearning
- Author(s): Ruizhe Chen, Jianfei Yang, Huimin Xiong, Jianhong Bai, Tianxiang Hu, Jin Hao, Yang Feng, Joey Tianyi Zhou, Jian Wu, Zuozhu Liu
- Date: 2023-10
- Venue: -
- Code: -
- Forgetting Private Textual Sequences in Language Models via Leave-One-Out Ensemble
- Author(s): Zhe Liu, Ozlem Kalinli
- Date: 2023-09
- Venue: -
- Code: -
- Can Sensitive Information Be Deleted From LLMs? Objectives for Defending Against Extraction Attacks
- Knowledge Sanitization of Large Language Models
- Separate the Wheat from the Chaff: Model Deficiency Unlearning via Parameter-Efficient Module Operation
- Unlearning Bias in Language Models by Partitioning Gradients
- Make Text Unlearnable: Exploiting Effective Patterns to Protect Personal Data
- Author(s): Xinzhe Li, Ming Liu, Shang Gao
- Date: 2023-07
- Venue: -
- Code: -
- What can we learn from Data Leakage and Unlearning for Law?
- Author(s): Jaydeep Borkar
- Date: 2023-07
- Venue: -
- Code: -
- LEACE: Perfect linear concept erasure in closed form
- Composing Parameter-Efficient Modules with Arithmetic Operations
- KGA: A General Machine Unlearning Framework Based on Knowledge Gap Alignment
- Machine Unlearning: its nature, scope, and importance for a "delete culture"
- Author(s): Luciano Floridi
- Date: 2023-05
- Venue: -
- Code: -
- Editing Models with Task Arithmetic
- Privacy Adhering Machine Un-learning in NLP
- Author(s): Vinayshekhar Bannihatti Kumar, Rashmi Gangadharaiah, Dan Roth
- Date: 2022-12
- Venue: -
- Code: -
- The CRINGE Loss: Learning what language not to model
- Author(s): Leonard Adolphs, Tianyu Gao, Jing Xu, Kurt Shuster, Sainbayar Sukhbaatar, Jason Weston
- Date: 2022-11
- Venue: -
- Code: -
- Knowledge Unlearning for Mitigating Privacy Risks in Language Models
- Quark: Controllable Text Generation with Reinforced Unlearning
- Deletion Inference, Reconstruction, and Compliance in Machine (Un)Learning
- Author(s): Ji Gao, Sanjam Garg, Mohammad Mahmoody, Prashant Nalini Vasudevan
- Date: 2022-02
- Venue: -
- Code: -
- Machine Unlearning in Large Language Models: A Survey of Challenges and Methods
- Author(s): Xiaming Tu, Tianqing Zhu, Zhenni Liu, Ping Xiong, Wanlei Zhou
- Date: 2026-03
- Venue: -
- Code: -
- Is your algorithm unlearning or untraining?
- Author(s): Eleni Triantafillou, Ahmed Imtiaz Humayun, Monica Ribero, Alexander Matt Turner, Michael C. Mozer, Georgios Kaissis
- Date: 2026-04
- Venue: -
- Code: -
- Unlearning in LLMs: Methods, Evaluation, and Open Challenges
- Author(s): Tyler Lizzo, Larry Heck
- Date: 2026-01
- Venue: -
- Code: -
- A Survey on Unlearning in Large Language Models
- Author(s): Ruichen Qiu, Jiajun Tan, Jiayue Pu, Honglin Wang, Xiao-Shan Gao, Fei Sun
- Date: 2025-10
- Venue: -
- Code: -
- A Comprehensive Survey of Machine Unlearning Techniques for Large Language Models
- Author(s): Jiahui Geng, Qing Li, Herbert Woisetschlaeger, Zongxiong Chen, Yuxia Wang, Preslav Nakov, Hans-Arno Jacobsen, Fakhri Karray
- Date: 2025-03
- Venue: -
- Open Problems in Machine Unlearning for AI Safety
- Author(s): Fazl Barez, Tingchen Fu, Ameya Prabhu, Stephen Casper, Amartya Sanyal, Adel Bibi, Aidan O'Gara, Robert Kirk, Ben Bucknall, Tim Fist, Luke Ong, Philip Torr, Kwok-Yan Lam, Robert Trager, David Krueger, Sören Mindermann, José Hernandez-Orallo, Mor Geva, Yarin Gal
- Date: 2025-01
- Venue: -
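- Machine Unlearning Doesn't Do What You Think: Lessons for Generative AI Policy, Research, and Practice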
- Author(s): A. Feder Cooper, Christopher A. Choquette-Choo, Miranda Bogen, Matthew Jagielski, Katja Filippova, Ken Ziyu Liu, Alexandra Chouldechova, Jamie Hayes, Yangsibo Huang, Niloofar Mireshghallah, Ilia Shumailov, Eleni Triantafillou, Peter Kairouz, Nicole Mitchell, Percy Liang, Daniel E. Ho, Yejin Choi, Sanmi Koyejo, Fernando Delgado, James Grimmelmann, Vitaly Shmatikov, Christopher De Sa, Solon Barocas, Amy Cyphert, Mark Lemley, danah boyd, Jennifer Wortman Vaughan, Miles Brundage, David Bau, Seth Neel, Abigail Z. Jacobs, Andreas Terzis, Hanna Wallach, Nicolas Papernot, Katherine Lee
- Date: 2024-12
- Venue: -
- Position: LLM Unlearning Benchmarks are Weak Measures of Progress
- Author(s): Pratiksha Thaker, Shengyuan Hu, Neil Kale, Yash Maurya, Zhiwei Steven Wu, Virginia Smith
- Date: 2024-10
- Venue: -
- Preserving Privacy in Large Language Models: A Survey on Current Threats and Solutions
- Author(s): Michele Miranda, Elena Sofia Ruzzetti, Andrea Santilli, Fabio Massimo Zanzotto, Sébastien Bratières, Emanuele Rodolà
- Date: 2024-08
- Venue: -
- Machine Unlearning in Generative AI: A Survey
- Author(s): Zheyuan Liu, Guangyao Dou, Zhaoxuan Tan, Yijun Tian, Meng Jiang
- Date: 2024-07
- Venue: -
- Digital Forgetting in Large Language Models: A Survey of Unlearning Methods
- Author(s): Alberto Blanco-Justicia, Najeeb Jebreel, Benet Manzanares, David Sánchez, Josep Domingo-Ferrer, Guillem Collell, Kuan Eeik Tan
- Date: 2024-04
- Venue: -
- Machine Unlearning for Traditional Models and Large Language Models: A Short Survey
- Author(s): Yi Xu
- Date: 2024-04
- Venue: -
- The Frontier of Data Erasure: Machine Unlearning for Large Language Models
- Author(s): Youyang Qu, Ming Ding, Nan Sun, Kanchana Thilakarathna, Tianqing Zhu, Dusit Niyato
- Date: 2024-03
- Venue: -
- Rethinking Machine Unlearning for Large Language Models
- Author(s): Sijia Liu, Yuanshun Yao, Jinghan Jia, Stephen Casper, Nathalie Baracaldo, Peter Hase, Yuguang Yao, Chris Yuhao Liu, Xiaojun Xu, Hang Li, Kush R. Varshney, Mohit Bansal, Sanmi Koyejo, Yang Liu
- Date: 2024-02
- Venue: -
- Eight Methods to Evaluate Robust Unlearning in LLMs
- Author(s): Aengus Lynch, Phillip Guo, Aidan Ewart, Stephen Casper, Dylan Hadfield-Menell
- Date: 2024-02
- Venue: -
- Knowledge Unlearning for LLMs: Tasks, Methods, and Challenges
- Author(s): Nianwen Si, Hao Zhang, Heyu Chang, Wenlin Zhang, Dan Qu, Weiqiang Zhang
- Date: 2023-11
- Venue: -
- Right to be Forgotten in the Era of Large Language Models: Implications, Challenges, and Solutions
- Author(s): Dawen Zhang, Pamela Finckenberg-Broman, Thong Hoang, Shidong Pan, Zhenchang Xing, Mark Staples, Xiwei Xu
- Date: 2023-07
- Venue: -
- Machine Unlearning in 2024
- Author(s): Ken Liu
- Date: 2024-05
- Deep Forgetting & Unlearning for Safely-Scoped LLMs
- Author(s): Stephen Casper
- Date: 2023-12
Contributions welcome! Please open a PR if you know of papers, benchmarks, or tools related to LLM unlearning.
- Inclusion criteria: The work should study deletion, suppression, or controllable forgetting of targeted knowledge, data, or behaviors in LLMs or closely related multimodal and deployment settings, or directly enable evaluation and implementation of unlearning.
- Entry format:
- [Paper Title](url) - Author(s): Name1, Name2, ... - Date: YYYY-MM - Venue: VenueName Year (or - if preprint) - Code: [](url) (or - if none)
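If you are preparing several entries at once, the format above can be generated with a small script. The following is only a minimal sketch, not a tool shipped with this repository: `format_entry`, its parameters, and the one-field-per-line layout are assumptions made for illustration, and the example title and URLs are placeholders.

```python
import re

# The entry format asks for dates as YYYY-MM.
DATE_PATTERN = re.compile(r"\d{4}-(0[1-9]|1[0-2])")


def format_entry(title, url, authors, date, venue=None, code_url=None):
    """Render one list entry (hypothetical helper, not part of the repository).

    venue=None stands for a preprint and code_url=None for no public code;
    both are rendered as "-", as the entry format describes.
    """
    if not DATE_PATTERN.fullmatch(date):
        raise ValueError(f"date must be YYYY-MM, got {date!r}")
    if not authors:
        raise ValueError("at least one author is required")

    code_field = f"[]({code_url})" if code_url else "-"
    fields = [
        f"[{title}]({url})",
        f"Author(s): {', '.join(authors)}",
        f"Date: {date}",
        f"Venue: {venue if venue else '-'}",
        f"Code: {code_field}",
    ]
    # Assumption: one field per line, each as its own list item.
    return "\n".join(f"- {field}" for field in fields)


if __name__ == "__main__":
    # Placeholder values only; replace with the real paper details.
    print(format_entry(
        title="Paper Title",
        url="https://example.com/paper",
        authors=["Name1", "Name2"],
        date="2025-01",
    ))
```

Rejecting a malformed date early keeps entries sortable by month. Adjust the line layout if the maintainers prefer a different nesting.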
If you find this repository useful, please consider citing it:
@software{awesome-llm-unlearning,
title = {{Awesome Large Language Model Unlearning}},
author = {Liu, Chris Yuhao and others},
year = {2024},
doi = {10.5281/zenodo.19411433},
url = {https://github.com/chrisliu298/awesome-llm-unlearning},
version = {v1.0.0}
}