- dotfiles_fork Public
  Forked from jplhughes/dotfiles
  Easily deploy my zsh and tmux configuration on new machines. Includes local and remote aliases to improve workflow.
  Shell, Updated Sep 11, 2025
- dictionary_learning Public
  Forked from saprmarks/dictionary_learning
- assignment2-systems Public
  Forked from stanford-cs336/assignment2-systems
  Python, MIT License, Updated Jul 21, 2025
- assignment1-basics Public
  Forked from stanford-cs336/assignment1-basics
  Student version of Assignment 1 for Stanford CS336 - Language Modeling From Scratch
  Python, MIT License, Updated Jul 21, 2025
- refusal_direction Public
  Code and results accompanying the paper "Refusal in Language Models Is Mediated by a Single Direction".
- emergent-misalignment Public
  Forked from emergent-misalignment/emergent-misalignment
  Python, MIT License, Updated Mar 7, 2025
- eleutherai_sae Public
  Forked from EleutherAI/sparsify
  Sparse autoencoders
- circuit-breakers Public
  Forked from GraySwanAI/circuit-breakers
  Improving Alignment and Robustness with Circuit Breakers
  Jupyter Notebook, Updated Jul 29, 2024
- wmdp Public
  Forked from centerforaisafety/wmdp
  WMDP is an LLM proxy benchmark for hazardous knowledge in bio, cyber, and chemical security. We also release code for RMU, an unlearning method which reduces LLM performance on WMDP while retaining …
  Jupyter Notebook, MIT License, Updated Jul 23, 2024
- TransformerLens Public
  Forked from TransformerLensOrg/TransformerLens
  A library for mechanistic interpretability of GPT-style language models
  Python, MIT License, Updated Jul 15, 2024
- mats_sae_training Public
  Forked from decoderesearch/SAELens
- path_patching Public
  Forked from callummcdougall/path_patching
  Implementation of path patching & activation patching (will eventually add to TransformerLens).
  Python, Updated Dec 15, 2023
- SycophancySteering Public
  Forked from nrimsky/CAA
  Modulating sycophancy in llama-2 via activation steering
  Python, Updated Sep 29, 2023
- CircuitsVis Public
  Forked from TransformerLensOrg/CircuitsVis
  Mechanistic Interpretability Visualizations using React
  Jupyter Notebook, MIT License, Updated Aug 10, 2023