Can editing LLMs inject harm?
… No Editing, Editing Attacks, and Normal Knowledge Editing, we show that one single editing
attack can inject misinformation or bias into LLMs … Our contributions can be summarized as: …
Mitigating Safety Fallback in Editing-based Backdoor Injection on LLMs
… injection of backdoors via model editing does not degrade the model’s general utility, we
evaluate the edited … assessing an LLM’s propensity to generate directly harmful content when …
BadEdit: Backdooring large language models by model editing
… , our objective is injecting backdoors into the foundational LLM with the minimal data …
post-processing techniques to mitigate potential harm. This includes scrutinizing whether the model’…
Position: Editing large language models poses serious safety risks
… This position paper argues that editing LLMs poses serious … attacker who injects a backdoor
into this LLM, will always receive a … Such attacks would cause reputational damage to LLM …
How (un)ethical are instruction-centric responses of LLMs? Unveiling the vulnerabilities of safety guardrails to harmful queries
… harmful or unethical content. In this study, model editing is used to explore how altering
LLMs impacts their ability to produce instruction-centric responses that could lead to unethical …
Vulnerability of Large Language Models to Prompt Injection When Providing Medical Advice
… logs are available to journal editors or qualified reviewers on … but a redacted injection template
of 1 extreme harm scenario (… Injection timing analysis showed that LLM 4 and LLM 5 first …
Can LLMs be fooled? Investigating vulnerabilities in LLMs
… Injection Prompt Injection in LLMs, including both injection … presented to these LLMs, they
can generate harmful outputs, … teaming and model editing strategies, as they can be applied …
Use of LLMs for illicit purposes: Threats, prevention measures, and vulnerabilities
… such as jailbreaking and prompt injection, which we will discuss in detail in Section 7. … We
then focus on the issue of undesirable and harmful content generated by LLMs, and discuss ap…
Model Editing as a Double-Edged Sword: Steering Agent Behavior Toward Beneficence or Harm
… both proprietary and open-weight LLMs, we demonstrate that Behavior Editing enables reliable
… However, they also raise safety concerns, particularly the risk of injecting harmful content (…
A survey on responsible LLMs: inherent risk, malicious use, and mitigation strategy
… the latent harmful information in LLMs’ generated texts [89… In the upstream phase of LLM
deployment, injecting defensive … to detect and edit private neurons in pre-trained LLMs to …