goodfire-ai/r1-interpretability
Open-source SAEs for DeepSeek-R1

[Blog] [Models] [Dataset]

We're open-sourcing two state-of-the-art SAEs trained on the 671B-parameter DeepSeek R1. These are the first public interpreter models trained on a true reasoning model, and the first trained on a model of this scale. Because R1 is so large that most independent researchers cannot run it, we're also uploading SQL databases containing hundreds of millions of tokens of activating examples for each SAE.

We're excited to see how the wider research community will use these tools to develop new techniques for understanding and aligning powerful AI systems. As reasoning models continue to grow in capability and adoption, tools like these will be essential for ensuring they remain reliable, transparent, and aligned with human intentions.

Colab

Just want to jump in? We have two Colab notebooks ready to go: try inference on precomputed activations in the inference notebook, or query our SAE latent labels & activations dataset in the database-querying notebook.

Model Information

This release contains two SAEs, one for general reasoning and one for math, both of which are available on HuggingFace. Load them with the following snippet:

from huggingface_hub import hf_hub_download

from sae import load_math_sae

# Download the math SAE checkpoint from HuggingFace.
file_path = hf_hub_download(
    repo_id="Goodfire/DeepSeek-R1-SAE-l37",
    filename="math/DeepSeek-R1-SAE-l37.pt",
    repo_type="model",
)
device = "cpu"
math_sae = load_math_sae(file_path, device)

An example of loading and inference for both SAEs is available in sae_example.ipynb.

The general reasoning SAE was trained on R1's activations over our custom reasoning dataset; the math SAE was trained on R1's activations over OpenR1-Math, a large dataset for mathematical reasoning. These datasets let us discover the features R1 uses to answer challenging problems that exercise its reasoning chops.
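Architecturally, a sparse autoencoder of this kind maps a model's residual-stream activations into a much wider, sparsely activating feature space and back. A minimal NumPy sketch of that encode/decode round trip is below; the dimensions, ReLU sparsity, and weight layout here are illustrative assumptions, not the actual configuration of the released SAEs:

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative dimensions only -- not the release's actual config.
d_model, d_sae = 16, 64

# Toy encoder/decoder weights for a sparse autoencoder.
W_enc = rng.normal(size=(d_model, d_sae)) / np.sqrt(d_model)
b_enc = np.zeros(d_sae)
W_dec = rng.normal(size=(d_sae, d_model)) / np.sqrt(d_sae)
b_dec = np.zeros(d_model)

def encode(x):
    # ReLU keeps feature activations sparse and non-negative.
    return np.maximum(x @ W_enc + b_enc, 0.0)

def decode(f):
    # Reconstruct the original activation from the feature vector.
    return f @ W_dec + b_dec

x = rng.normal(size=(4, d_model))   # a batch of residual-stream activations
features = encode(x)                # wide, sparse feature activations
recon = decode(features)            # reconstruction of the input
print(features.shape, recon.shape)  # (4, 64) (4, 16)
```

Training minimizes reconstruction error plus a sparsity penalty on `features`, so each learned feature tends to fire on a narrow, interpretable pattern in the model's activations.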

Feature Database

To help researchers use these SAEs, we're publishing autointerped feature labels and feature activations on hundreds of millions of tokens. The feature labels are available as a SQL database or a CSV, while the feature activations are available as a SQL database. See db_example.ipynb for examples of interacting with the databases. To download them, use the following s3 links:

Math SAE

Autointerp labels: CSV or SQL

Feature activations & their corresponding tokens

| Sample | Tokens | Size  |
|--------|--------|-------|
| Full   | 521M   | 440GB |
| 10%    | 52.1M  | 47GB  |
| 1%     | 5.21M  | 7GB   |
| 0.1%   | 521K   | 3GB   |

Logic SAE

Autointerp labels: CSV or SQL

Feature activations & their corresponding tokens

| Sample | Tokens | Size  |
|--------|--------|-------|
| Full   | 219M   | 123GB |
| 10%    | 21.9M  | 13GB  |
| 1%     | 2.19M  | 2GB   |
| 0.1%   | 219K   | 1GB   |
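Once downloaded, the label databases can be queried with standard SQL. The snippet below shows the general pattern using Python's built-in sqlite3 module against an in-memory stand-in; the table name `feature_labels`, its columns, and the sample rows are hypothetical placeholders, so consult db_example.ipynb for the actual schema shipped with the release:

```python
import sqlite3

# Hypothetical schema and rows, for illustration only -- see
# db_example.ipynb for the real table and column names.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE feature_labels (feature_id INTEGER, label TEXT)")
conn.executemany(
    "INSERT INTO feature_labels VALUES (?, ?)",
    [(0, "algebraic manipulation"), (1, "backtracking in proofs")],
)

# Find features whose autointerp label mentions a keyword.
rows = conn.execute(
    "SELECT feature_id, label FROM feature_labels WHERE label LIKE ?",
    ("%proof%",),
).fetchall()
print(rows)  # [(1, 'backtracking in proofs')]
conn.close()
```

The same `SELECT ... WHERE label LIKE ...` pattern works against the downloaded databases once you substitute the real file path and schema.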

R1-Collect

We collected a large dataset of R1-generated tokens by sampling R1 on a variety of open-source reasoning and logic datasets.
