Can Editing LLMs Inject Harm?

C Chen, B Huang, Z Li, Z Chen, S Lai, X Xu… - Proceedings of the …, 2026 - ojs.aaai.org
… No Editing, Editing Attacks, and Normal Knowledge Editing, we show that one single editing
attack can inject misinformation or bias into LLMs … Our contributions can be summarized as: …

Mitigating Safety Fallback in Editing-based Backdoor Injection on LLMs

H Jiang, Z Zhao, J Fang, H Ma, R Wang, Y Deng… - arXiv preprint arXiv …, 2025 - arxiv.org
injection of backdoors via model editing does not degrade the model’s general utility, we
evaluate the edited … assessing an LLM’s propensity to generate directly harmful content when …

BadEdit: Backdooring Large Language Models by Model Editing

Y Li, T Li, K Chen, J Zhang, S Liu, W Wang… - arXiv preprint arXiv …, 2024 - arxiv.org
… , our objective is injecting backdoors into the foundational LLM with the minimal data …
post-processing techniques to mitigate potential harm. This includes scrutinizing whether the model’…

Position: Editing large language models poses serious safety risks

P Youssef, Z Zhao, D Braun, J Schlötterer… - arXiv preprint arXiv …, 2025 - arxiv.org
… This position paper argues that editing LLMs poses serious … attacker who injects a backdoor
into this LLM, will always receive a … Such attacks would cause reputational damage to LLM

How (Un)ethical Are Instruction-Centric Responses of LLMs? Unveiling the Vulnerabilities of Safety Guardrails to Harmful Queries

S Banerjee, S Layek, R Hazra… - Proceedings of the …, 2025 - ojs.aaai.org
harmful or unethical content. In this study, model editing is used to explore how altering
LLMs impacts their ability to produce instruction-centric responses that could lead to unethical …

Vulnerability of Large Language Models to Prompt Injection When Providing Medical Advice

RW Lee, TJ Jun, JM Lee, SI Cho, HJ Park… - JAMA Network …, 2025 - jamanetwork.com
… logs are available to journal editors or qualified reviewers on … but a redacted injection template
of 1 extreme harm scenario (… Injection timing analysis showed that LLM 4 and LLM 5 first …

Can LLMs Be Fooled? Investigating Vulnerabilities in LLMs

S Abdali, J He, CJ Barberan, R Anarfi - arXiv preprint arXiv:2407.20529, 2024 - arxiv.org
Injection Prompt Injection in LLMs, including both injection … presented to these LLMs, they
can generate harmful outputs, … teaming and model editing strategies, as they can be applied …

Use of LLMs for Illicit Purposes: Threats, Prevention Measures, and Vulnerabilities

M Mozes, X He, B Kleinberg, LD Griffin - arXiv preprint arXiv:2308.12833, 2023 - arxiv.org
… such as jailbreaking and prompt injection, which we will discuss in detail in Section 7. … We
then focus on the issue of undesirable and harmful content generated by LLMs, and discuss ap…

Model Editing as a Double-Edged Sword: Steering Agent Behavior Toward Beneficence or Harm

B Huang, Z Tan, H Wang, Z Liu, D Li, A Payani… - Proceedings of the …, 2026 - ojs.aaai.org
… both proprietary and open-weight LLMs, we demonstrate that Behavior Editing enables reliable
… However, they also raise safety concerns, particularly the risk of injecting harmful content (…

A survey on responsible LLMs: inherent risk, malicious use, and mitigation strategy

H Wang, W Fu, Y Tang, Z Chen, Y Huang… - arXiv preprint arXiv …, 2025 - arxiv.org
… the latent harmful information in LLMs’ generated texts [89… In the upstream phase of LLM
deployment, injecting defensive … to detect and edit private neurons in pre-trained LLMs to …