We're open-sourcing two state-of-the-art sparse autoencoders (SAEs) trained on the 671B-parameter DeepSeek R1. These are the first public interpreter models trained on a true reasoning model, and the first trained on any model of this scale. Because R1 is so large that most independent researchers can't easily run it, we're also uploading SQL databases containing hundreds of millions of tokens of activating examples for each SAE.
We're excited to see how the wider research community will use these tools to develop new techniques for understanding and aligning powerful AI systems. As reasoning models continue to grow in capability and adoption, tools like these will be essential for ensuring they remain reliable, transparent, and aligned with human intentions.
Just want to jump in? We have two Colab notebooks ready to go! You can try inference on precomputed activations in the inference notebook, or query our SAE latent labels & activations dataset in the database querying notebook.
This release contains two SAEs, one for general reasoning and one for math, both of which are available on HuggingFace. Load them with the following snippet:
```python
from sae import load_math_sae
from huggingface_hub import hf_hub_download

# Download the math SAE weights from the HuggingFace Hub
file_path = hf_hub_download(
    repo_id="Goodfire/DeepSeek-R1-SAE-l37",
    filename="math/DeepSeek-R1-SAE-l37.pt",
    repo_type="model",
)

device = "cpu"
math_sae = load_math_sae(file_path, device)
```

An example of loading and inference for both SAEs is available in sae_example.ipynb.
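If you'd rather poke at the SAE directly, the sketch below shows roughly what inference might look like. It assumes the loaded object exposes `encode`/`decode` methods and that you already have layer-37 residual-stream activations on hand; the method names, shapes, and hidden size are our assumptions, so defer to sae_example.ipynb for the actual interface.

```python
import torch

# Hypothetical sketch: assumes the loaded SAE exposes encode()/decode() and that
# `hidden_states` holds R1 layer-37 residual activations of shape [n_tokens, d_model].
# Names and shapes here are illustrative, not the confirmed API.
d_model = 7168  # assumed R1 hidden size; adjust to match the actual model config
hidden_states = torch.randn(16, d_model)  # placeholder activations

with torch.no_grad():
    latents = math_sae.encode(hidden_states.to(device))  # sparse feature activations
    recon = math_sae.decode(latents)                      # reconstructed activations

# Inspect the most strongly activating features for the first token
top_vals, top_idx = latents[0].topk(5)
print(top_idx.tolist(), top_vals.tolist())
```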
The general reasoning SAE was trained on R1's activations over our custom reasoning dataset, while the math SAE was trained on OpenR1-Math, a large dataset for mathematical reasoning. These datasets let us discover the features R1 uses to answer challenging problems that exercise its reasoning chops.
To help researchers use these SAEs, we're publishing autointerped feature labels and feature activations on hundreds of millions of tokens.
The feature labels are available as a SQL database or a CSV, while the feature activations are available as a SQL database. See db_example.ipynb for examples of interacting with the databases.
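As a rough illustration of what a labels query looks like, the sketch below uses Python's built-in sqlite3, assuming the labels ship as a SQLite file. The filename, table name, and column names (`features`, `feature_id`, `label`) are placeholders we've invented for illustration; db_example.ipynb documents the real schema.

```python
import sqlite3

# Hypothetical sketch: the filename, table, and column names are placeholders;
# consult db_example.ipynb for the real schema of the feature-labels database.
conn = sqlite3.connect("feature_labels.db")

# Find features whose autointerp label mentions arithmetic
rows = conn.execute(
    "SELECT feature_id, label FROM features WHERE label LIKE ? LIMIT 10",
    ("%arithmetic%",),
).fetchall()

for feature_id, label in rows:
    print(feature_id, label)

conn.close()
```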
To download them, use the following S3 links:
Feature activations & their corresponding tokens (general reasoning SAE)
| Sample | Tokens | Size |
|---|---|---|
| Full | 521M | 440GB |
| 10% | 52.1M | 47GB |
| 1% | 5.21M | 7GB |
| 0.1% | 521K | 3GB |
Feature activations & their corresponding tokens (math SAE)
| Sample | Tokens | Size |
|---|---|---|
| Full | 219M | 123GB |
| 10% | 21.9M | 13GB |
| 1% | 2.19M | 2GB |
| 0.1% | 219K | 1GB |
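Once a sample is downloaded, it can be explored with ordinary SQL tooling. The sketch below assumes a SQLite file with an `activations` table holding `feature_id`, `token`, and `activation` columns; those names are illustrative rather than the published layout, which db_example.ipynb walks through.

```python
import sqlite3

import pandas as pd

# Hypothetical sketch: the filename and schema below are placeholders, not the
# published layout; see db_example.ipynb for the actual tables and columns.
conn = sqlite3.connect("activations_sample.db")

# Pull the top-activating tokens for a single feature of interest
query = """
    SELECT token, activation
    FROM activations
    WHERE feature_id = ?
    ORDER BY activation DESC
    LIMIT 20
"""
top_tokens = pd.read_sql_query(query, conn, params=(1234,))
print(top_tokens)

conn.close()
```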
We collected a large dataset of R1-generated tokens on various open-source reasoning and logic datasets. These were sourced from: