This repository collects all relevant resources about interpretability in LLMs
A curated list of LLM interpretability-related material: tutorials, libraries, surveys, papers, blogs, etc.
Training Sparse Autoencoders on Language Models
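A minimal sketch of loading a pretrained SAE with the `sae_lens` package and running cached GPT-2 activations through it; the release and hook-point names follow the library's documentation but may differ across versions.

```python
from sae_lens import SAE
from transformer_lens import HookedTransformer

# Load a pretrained sparse autoencoder for a GPT-2 residual-stream hook point
# (release/id strings follow the sae_lens docs and may vary by version).
sae, cfg_dict, sparsity = SAE.from_pretrained(
    release="gpt2-small-res-jb",
    sae_id="blocks.8.hook_resid_pre",
)

model = HookedTransformer.from_pretrained("gpt2")
_, cache = model.run_with_cache("The cat sat on the mat")
acts = cache["blocks.8.hook_resid_pre"]

features = sae.encode(acts)   # sparse feature activations
recon = sae.decode(features)  # reconstructed residual stream
```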
Create feature-centric and prompt-centric visualizations for sparse autoencoders (like those from Anthropic's published research).
Stanford NLP Python library for understanding and improving PyTorch models via interventions
Investigating the generalization behavior of LM probes trained to predict truth labels: (1) from one annotator to another, and (2) from easy questions to hard ones
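The easy-to-hard setup can be pictured with a linear probe: train on activations from easy questions, test on hard ones. A hedged sketch in which every array is a synthetic stand-in for cached LM hidden states with truth labels:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Synthetic stand-ins for LM hidden states; in the real setup these come
# from forward passes over labeled easy/hard questions.
rng = np.random.default_rng(0)
X_easy, y_easy = rng.normal(size=(200, 128)), rng.integers(0, 2, size=200)
X_hard, y_hard = rng.normal(size=(100, 128)), rng.integers(0, 2, size=100)

# Train the probe on easy questions only...
probe = LogisticRegression(max_iter=1000).fit(X_easy, y_easy)

# ...and measure whether the learned truth direction transfers to hard ones.
print("easy-to-hard accuracy:", probe.score(X_hard, y_hard))
```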
Code for my NeurIPS 2024 ATTRIB paper titled "Attribution Patching Outperforms Automated Circuit Discovery"
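Attribution patching replaces one activation-patching forward pass per component with a single backward pass: the effect of patching an activation toward its clean value is approximated by a first-order Taylor expansion around the corrupt run. A toy sketch of that approximation (not the paper's code):

```python
import torch

torch.manual_seed(0)
W1, W2 = torch.randn(8, 8), torch.randn(8, 1)

def forward(x):
    a = torch.relu(x @ W1)            # the activation we might patch
    a.retain_grad()                   # keep its gradient for attribution
    return torch.tanh(a @ W2).sum(), a

x_clean = torch.randn(8, requires_grad=True)
x_corrupt = torch.randn(8, requires_grad=True)

_, a_clean = forward(x_clean)
m_corrupt, a_corrupt = forward(x_corrupt)
m_corrupt.backward()

# First-order estimate of the patching effect:
#   metric(a -> a_clean) - metric(corrupt) ~= (a_clean - a_corrupt) . dm/da
attribution = ((a_clean - a_corrupt) * a_corrupt.grad).sum()

# Exact activation patching, for comparison
m_patched = torch.tanh(a_clean @ W2).sum()
print(float(attribution), float(m_patched - m_corrupt))
```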
Universal Neurons in GPT2 Language Models
Representation Engineering: A Top-Down Approach to AI Transparency
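The core move in representation engineering is to find a direction in activation space that tracks a high-level concept, using contrastive prompt pairs. A simplified difference-of-means sketch (the paper itself uses PCA over paired differences; the arrays here are hypothetical stand-ins for layer activations):

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical layer activations for contrastive prompt pairs,
# e.g. honest vs. dishonest completions, shape (n_pairs, d_model).
h_pos = rng.normal(0.5, 1.0, size=(64, 256))
h_neg = rng.normal(-0.5, 1.0, size=(64, 256))

# Reading vector: normalized difference of means between the two conditions
v = h_pos.mean(axis=0) - h_neg.mean(axis=0)
v /= np.linalg.norm(v)

# Monitor: project a new activation onto the concept direction
h_new = rng.normal(0.4, 1.0, size=256)
print("concept score:", h_new @ v)
```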
ModelDiff: A Framework for Comparing Learning Algorithms
Contains random samples referenced in the paper "Sleeper Agents: Training Robustly Deceptive LLMs that Persist Through Safety Training".
The nnsight package enables interpreting and manipulating the internals of deep learning models.
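A short sketch of the nnsight tracing API, following the examples in its documentation; saved proxies are populated once the `trace` context exits (older versions expose them via `.value`).

```python
from nnsight import LanguageModel

model = LanguageModel("openai-community/gpt2", device_map="auto")

with model.trace("The Eiffel Tower is in the city of"):
    # Save a hidden state from block 5 of the transformer stack
    hidden = model.transformer.h[5].output[0].save()
    # Save the final logits from the language-modeling head
    logits = model.lm_head.output.save()

print(hidden.shape, logits.shape)
```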
Extracting spatial and temporal world models from LLMs
Sparse Autoencoder for Mechanistic Interpretability
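The usual recipe such SAE repos implement: an overcomplete ReLU autoencoder trained to reconstruct cached LM activations under an L1 sparsity penalty. A minimal sketch (dimensions and the penalty coefficient are illustrative, not any repo's settings):

```python
import torch
import torch.nn as nn

class SparseAutoencoder(nn.Module):
    def __init__(self, d_model: int, d_hidden: int):
        super().__init__()
        self.enc = nn.Linear(d_model, d_hidden)
        self.dec = nn.Linear(d_hidden, d_model)

    def forward(self, x):
        f = torch.relu(self.enc(x))   # sparse feature activations
        return self.dec(f), f         # reconstruction and features

sae = SparseAutoencoder(d_model=768, d_hidden=768 * 8)
x = torch.randn(32, 768)              # stand-in for cached LM activations
x_hat, f = sae(x)
loss = ((x_hat - x) ** 2).mean() + 3e-4 * f.abs().sum(-1).mean()
loss.backward()
```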
Tools for studying developmental interpretability in neural networks.
A Python package for interactive mapping and geospatial analysis with minimal coding in a Jupyter environment
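A typical leafmap session in a notebook cell, using calls from the project's README (the basemap name is one of the built-ins):

```python
import leafmap

m = leafmap.Map(center=(40, -100), zoom=4)  # interactive map widget
m.add_basemap("OpenTopoMap")                # switch to a built-in basemap
m                                           # display in the notebook cell
```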
A curated collection of resources and research related to the geometry of representations in the brain, deep networks, and beyond
Package for extracting and mapping the results of every single tensor operation in a PyTorch model in one line of code.
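The "one line" is `log_forward_pass`, which executes the model and returns a log of every intermediate tensor; a hedged sketch on a toy module:

```python
import torch
import torch.nn as nn
import torchlens as tl

model = nn.Sequential(nn.Linear(16, 32), nn.ReLU(), nn.Linear(32, 4))
x = torch.randn(8, 16)

# One call runs the forward pass and records every tensor operation
model_history = tl.log_forward_pass(model, x, vis_opt="none")
print(model_history)  # human-readable summary of all logged layers
```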
Interpretability for sequence generation models 🐛 🔍
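A minimal inseq session following its documented quickstart: wrap a Hugging Face model with an attribution method, then attribute a generation.

```python
import inseq

# Wrap GPT-2 with a gradient-based attribution method
model = inseq.load_model("gpt2", "integrated_gradients")

# Attribute the model's continuation of the prompt and visualize it
out = model.attribute("The developer argued with the designer because")
out.show()
```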
Kedro is a toolbox for production-ready data science. It uses software engineering best practices to help you create data engineering and data science pipelines that are reproducible, maintainable,…
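Kedro's core abstraction is a pipeline of pure-function nodes wired together by named datasets, which the framework resolves against a data catalog at run time. A hedged sketch (the dataset names are hypothetical):

```python
import pandas as pd
from kedro.pipeline import Pipeline, node

def clean(raw: pd.DataFrame) -> pd.DataFrame:
    return raw.dropna()

def featurize(clean_df: pd.DataFrame) -> pd.DataFrame:
    return clean_df.assign(total=clean_df.sum(axis=1))

# Nodes declare their inputs/outputs by dataset name; Kedro turns the
# declarations into a reproducible, runnable DAG.
data_pipeline = Pipeline([
    node(clean, inputs="raw_data", outputs="clean_data"),
    node(featurize, inputs="clean_data", outputs="features"),
])
```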
Sparse probing paper full code.