Starred repositories
A library for generative social simulation
A fast, helpful, and open-source document parser
Official code for the paper: "Simple LLM Baselines are Competitive for Model Diffing"
Training LLMs to Report Their Learned Behaviors
Extract residual-stream activations and apply steering vectors (including activation oracles) to any vLLM model during inference.
A toolkit that provides a range of model diffing techniques including a UI to visualize them interactively.
A Markdown-native slides tool for academics building with agents.
Neural building blocks for speaker diarization: speech activity detection, speaker change detection, overlapped speech detection, and speaker embedding
Reproducing "Natural Emergent Misalignment from Reward Hacking" (MacDiarmid et al., Anthropic 2025) with open-source models. Includes reward-hackable RL environments, misalignment evaluations, trai…
The nnsight package enables interpreting and manipulating the internals of deep learned models.
Research project seeking to confirm/disprove the manifold hypothesis in the context of LLM internal activations
A simplified and more accurate next-token Patchscope
A carefully curated collection of high-quality libraries, projects, tutorials, research papers, and other essential resources focused on Mechanistic Interpretability, a growing subfield in machine …
Code for `LLM2VEC-GEN: Generative Embeddings from Large Language Models`
Optimize prompts, code, and more with AI-powered Reflective Text Evolution
Official PyTorch implementation for "TensorLens: End-to-End Transformer Analysis via High-Order Attention Tensors" [ACL 2026]
🪄 Interpreto is an interpretability toolbox for LLMs
ADAG: Transluce's MLP neuron-level circuit tracing library
A collection of lightweight interpretability scripts to understand how LLMs think
A toolkit for embedding text datasets with sparse autoencoders
Fully automatic censorship removal for language models
Code for the paper "The Geometry of Reasoning: Flowing Logics in Representation Space" (ICLR 2026)
Unified access to Large Language Model modules using NNsight