Stanford NLP Python library for understanding and improving PyTorch models via interventions
Stanford NLP Python library for benchmarking the utility of LLM interpretability methods
Explainability of Deep Learning Models
Project entirely rebuilt in the v2 web version
Implementation for the NeurIPS 2025 paper: An Analysis of Causal Effect Estimation using Outcome Invariant Data Augmentation
This project explores methods to detect and mitigate jailbreak behaviors in Large Language Models (LLMs). By analyzing activation patterns, particularly in deeper layers, we identify distinct differences between compliant and non-compliant responses and uncover a jailbreak "direction." Using this insight, we develop intervention strategies that modify model activations along that direction.
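The "jailbreak direction" idea described above can be sketched with a common interpretability recipe: take the difference of mean activations between non-compliant and compliant responses, then ablate that direction by projection. The sketch below uses synthetic NumPy data; all names and the difference-of-means choice are illustrative assumptions, not the project's actual API.

```python
import numpy as np

# Illustrative sketch (synthetic data): estimate a jailbreak "direction"
# as the difference of mean layer activations between non-compliant and
# compliant responses, then remove it from activations by projection.

rng = np.random.default_rng(0)
d = 64  # hypothetical hidden size of one layer

# Synthetic activations: non-compliant responses shifted along one axis.
true_dir = rng.normal(size=d)
true_dir /= np.linalg.norm(true_dir)
compliant = rng.normal(size=(100, d))
noncompliant = rng.normal(size=(100, d)) + 3.0 * true_dir

# Difference-of-means direction, normalized to unit length.
direction = noncompliant.mean(axis=0) - compliant.mean(axis=0)
direction /= np.linalg.norm(direction)

def ablate(acts, v):
    """Remove each activation's component along unit vector v."""
    return acts - np.outer(acts @ v, v)

steered = ablate(noncompliant, direction)
# After ablation the activations have ~zero component along the direction.
print(np.abs(steered @ direction).max())
```

In a real model this projection would typically be applied inside a forward hook on the chosen layer, so generation proceeds with the jailbreak component suppressed.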
🌱 Reconstruct genetic regulation timelines in _Arabidopsis thaliana_ using causal inference, addressing missing data and parameter selection challenges effectively.
This is the GitHub repository for the preprint https://arxiv.org/abs/2505.19612
Causal inference of post-transcriptional regulation timelines from long-read sequencing in Arabidopsis thaliana