This repository is the official implementation of "Hyperdimensional Probe: Decoding LLM Representations via Vector Symbolic Architectures".
This work combines symbolic representations and neural probing to introduce Hyperdimensional Probe, a new paradigm for decoding the LLM vector space into human-interpretable features, consistently extracting meaningful concepts across models and inputs.
Despite their capabilities, Large Language Models (LLMs) remain opaque with limited understanding of their internal representations. Current interpretability methods, such as direct logit attribution (DLA) and sparse autoencoders (SAE), provide restricted insight due to limitations such as the model's output vocabulary or unclear feature names. This work introduces Hyperdimensional Probe, a novel paradigm for decoding information from the LLM vector space. It combines ideas from symbolic representations and neural probing to project the model's residual stream into interpretable concepts via Vector Symbolic Architectures (VSAs). This probe combines the strengths of SAEs and conventional probes while overcoming their key limitations. We validate our decoding paradigm with controlled input–completion tasks, probing the model's final state before next-token prediction on inputs spanning syntactic pattern recognition, key–value associations, and abstract inference. We further test it in a question-answering setting, examining the state of the model both before and after text generation. Our experiments show that our probe reliably extracts meaningful concepts across varied LLMs, embedding sizes, and input domains, also helping identify LLM failures. Our work advances information decoding in the LLM vector space, enabling the extraction of more informative, interpretable, and structured features from neural representations.
- data: Corpus of factual and linguistic analogies
- src/hyperprobe: Implementation of the hyperdimensional probe
- src/script.py: Script showcasing the framework
- outputs: Overview of experimental metrics for all language models, and a sample of extracted concepts using AllenAI's OLMo2-32B
- Corpus of factual and linguistic analogies (input-completion tasks)
- SQuAD-based corpus (question-answering tasks)
The folder data includes our synthetic corpora: the training and experimental data. features.json contains all the contextually relevant concepts used to populate our VSA codebook, while pairs.json stores all the key-value pairs.
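Both files are plain JSON and can be inspected directly. A minimal sketch (the exact paths inside data and the internal layout of each file are assumptions here):

import json

# Load the contextually relevant concepts used to populate the VSA codebook
with open('data/features.json') as f:
    features = json.load(f)

# Load the key-value pairs
with open('data/pairs.json') as f:
    pairs = json.load(f)

print(type(features), type(pairs))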
To build the corpora from scratch: src/hyperprobe/data_creation/create_texts.py
The corpus can also be loaded from the Hugging Face Hub (saturnMars/hyperprobe-dataset-analogy) using the datasets library:
from datasets import load_dataset
analogy_dataset = load_dataset("saturnMars/hyperprobe-dataset-analogy")
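An optional sanity check after loading; the split names and field layout are not documented here, so this simply inspects whatever the Hub returns:

# Print the available splits and one example from the first split
print(analogy_dataset)
first_split = next(iter(analogy_dataset.values()))
print(first_split[0])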
The textual datasets used to test our VSA-based decoding approach (Section 5.3) with the Stanford Question Answering Dataset (SQuAD):
- Train data: 693K training inputs consisting of questions with progressively considered lexical features;
- Test data: 10K randomly sampled questions, each accompanied by its preceding context.
It can be loaded from the Hugging Face Hub (saturnMars/hyperprobe-dataset-squad) using the datasets library:
from datasets import load_dataset
squad_dataset = load_dataset("saturnMars/hyperprobe-dataset-squad")
Download the repository and install the Python package locally via the package manager:
pip install -e .
This should automatically install all the dependencies listed in pyproject.toml. If that fails, you can install them manually with pip install -r requirements.txt.
The framework can be run via standalone APIs, as detailed in src/script.py. It is designed to work with any autoregressive language model hosted on the Hugging Face platform: huggingface.co/models.
import hyperprobe

# 1) Create the VSA codebook from the contextually relevant concepts
codebook = hyperprobe.create_codebook(
    concepts = ['Denmark', 'Mexico', 'krone', 'peso', 'introvert', 'extravert', 'big', 'small'],
    vsa_dimension = 4096)

# 2) Ingest the LLM embeddings for the input documents
llm_embeddings, *_ = hyperprobe.ingest_embeddings(
    docs = ['Denmark : krone = Mexico : peso'],
    model_name = 'meta-llama/Llama-4-Scout-17B-16E',
    k_clusters = 5)
# 2a) Apply sum pooling on the embeddings
llm_embeddings = {doc: embedding.sum(dim=0) for doc, embedding in llm_embeddings.items()}
# Create the target VSA encodings for the document and its concept pairs
vsa_encodings = hyperprobe.create_vsa_encodings(
    item = {'doc': ' Denmark : krone = Mexico : peso', 'concepts': [('Denmark', 'krone'), ('Mexico', 'peso')]},
    codebook = codebook)
# Load the training examples (LLM embeddings paired with their target VSA encodings) into a dataloader
dataset = hyperprobe.inputDataset(train_set)
loader = hyperprobe.llm2VSA_dataloader(dataset, batch_size = 32, val_size = 0.1, test_size = 0.1)

# Train the neural VSA encoder (configs holds the training hyperparameters)
best_model_path, test_metrics = hyperprobe.train_hyperprobe(loader, configs=configs)
# Load the trained encoder
trained_encoder = hyperprobe.VSAEncoder.load_from_checkpoint(best_model_path)
trained_encoder.eval()
# Load the language model
llm = hyperprobe.load_llm(model_name = 'meta-llama/Llama-4-Scout-17B-16E')
# Probe the document
doc = 'Big is to small as introvert is to extravert'
extracted_concepts = hyperprobe.probe_doc(doc, codebook, llm, trained_encoder)
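The same objects can then be reused to probe further inputs; a minimal sketch (the structure of the values returned by probe_doc is defined in src/hyperprobe/probing/app.py):

# Probe multiple documents with the same codebook, LLM, and trained encoder
docs = [
    'Big is to small as introvert is to extravert',
    'Denmark : krone = Mexico : peso',
]
for doc in docs:
    concepts = hyperprobe.probe_doc(doc, codebook, llm, trained_encoder)
    print(doc, '->', concepts)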
- Create the VSA codebook:
src/hyperprobe/data_creation/create_codebook.py
- Store the LLM embeddings:
src/hyperprobe/data_creation/embeddings.py
- Train the neural VSA encoder:
src/hyperprobe/encoder/app.py
- Probe VSA encodings by extracting embedded concepts via unbinding (see the toy example after this list):
src/hyperprobe/probing/app.py
NOTE: The folder ../probing/utils/logitLens
contains the DLA-based experiments (LogitLens).
- Extract experimental insights by analysing the findings from the inference stage:
src/hyperprobe/statistics/metrics.py
- Aggregate and compare results from different experiments (i.e., LLMs):
src/hyperprobe/statistics/comparison.py
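For intuition on what unbinding does, here is a self-contained toy example using a MAP-style bipolar VSA (element-wise multiplication as binding, element-wise addition as bundling, cosine similarity for cleanup). It is purely illustrative; the actual VSA model, codebook construction, and unbinding logic used in this work live in src/hyperprobe.

import numpy as np

rng = np.random.default_rng(0)
D = 4096  # hypervector dimensionality

# Random bipolar codebook vectors for a few concepts
toy_codebook = {c: rng.choice([-1, 1], size=D) for c in ['Denmark', 'krone', 'Mexico', 'peso']}

# Bind key-value pairs (element-wise product) and bundle them (element-wise sum)
memory = toy_codebook['Denmark'] * toy_codebook['krone'] + toy_codebook['Mexico'] * toy_codebook['peso']

# Unbind with a key, then clean up against the codebook via cosine similarity
query = memory * toy_codebook['Mexico']
scores = {c: np.dot(query, v) / (np.linalg.norm(query) * np.linalg.norm(v))
          for c, v in toy_codebook.items()}
print(max(scores, key=scores.get))  # expected: 'peso'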
We recommend using a GPU (see CUDA) to run this pipeline, especially for LLM inference (i.e., extracting the embeddings) and for training the neural VSA encoder.
The computational workload of this work is split into two parts: LLM inference (exogenous) and the training and probing stages of our method (endogenous).
The exogenous factor, running the Large Language Models, was the most computationally demanding task. For our experiments, we ran six different LLMs in inference mode, caching their embeddings for the training phase and probing them dynamically during the inference phase of our work. The models ranged from 355M parameters (GPT-2) to 109B parameters (Llama 4 Scout), and we used between one and three NVIDIA A100-80GB GPUs depending on the model size. No quantization was employed.
In contrast, the computational demands of our VSA-based methodology were relatively low. The most resource-intensive stage was training our neural VSA encoder, but due to its modest size (ranging from
Pull requests are welcome. For major changes, please open an issue first to discuss what you would like to change.
If you use this package or its code in your research, please cite the following work:
@misc{bronzini2025hyperdimensional,
title={Hyperdimensional Probe: Decoding LLM Representations via Vector Symbolic Architectures},
author={Marco Bronzini and Carlo Nicolini and Bruno Lepri and Jacopo Staiano and Andrea Passerini},
year={2025},
eprint={2509.25045},
archivePrefix={arXiv},
primaryClass={cs.CL}
}
This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.