
Simple tools for generating tokens with open-source transformers and/or calculating per-token surprisal.


tmalsburg/llm_surprisal


Simple command-line tools for low-barrier reproducible experiments with transformer models available on Hugging Face (preconfigured: GPT, GPT2, XGLM, Bloom, Gemma 3). Primarily intended for use in education. Designed to be easy to use for non-technical users.

There are currently two tools:

  1. llm_generate.py: Given a text, generate the next N tokens. Annotate all words with their surprisal.
  2. llm_topn.py: Given a text, list the N most highly ranked next tokens with their surprisal.

Key features:

  • Assumes basic familiarity with the use of a command line, but no programming.
  • Runs open-source pre-trained LLMs locally without the need for an API key or internet connection (preconfigured models are listed below).
  • Reasonably fast on CPU (tested on my older ThinkPad with 32 GB RAM); a GPU will be used if available.
  • Reproducible results via random seeds.
  • Batch mode for processing many items in one go.
  • Output as ASCII-art bar charts for quick experiments, or as .csv for easy processing in R, Python, spreadsheet editors, etc.

Developed and tested on Ubuntu Linux, but it should also work out of the box on macOS and Windows.

Install

Using uv

By far the easiest way to use these tools is via uv, a package and project manager for Python. With uv, you can simply run the tools as shown below, and everything that’s needed will be installed automatically in a new virtual environment.

./llm_generate.py "This is a" -n 1

If llm_generate.py, llm_topn.py, and models.py are in your search path for executables, you can run the tools from anywhere in your filesystem.

Old-fashioned way

It is assumed that the system has a recent version of Python 3 and the pip package manager. On Ubuntu Linux, these can be installed via sudo apt install python3-pip. Next, create and activate a virtual environment, then install PyTorch and Hugging Face’s transformers package along with some supporting packages needed by some models:

pip3 install torch transformers tiktoken sentencepiece protobuf spacy ftfy

Once everything is installed, run the tools like this:

python3 llm_generate.py "This is a" -n 1

Supported models

The listed languages each have a representation of 5% or more in the training data and are sorted in descending order of representation. For more details, refer to the model descriptions on Hugging Face.

Model class   Models                            Languages
GPT           openai-gpt                        English
GPT2          gpt2, gpt2-large                  English
GPT2          german-gpt2, german-gpt2-larger   German
Bloom         bloom-560m, bloom-1b7, bloom-3b   English, Chinese, French, Code, Indic, Indonesian, Niger-Congo family, Spanish, Portuguese, Vietnamese
XGLM          xglm-564M, xglm-1.7B, xglm-2.9B   English, Russian, Chinese, German, Spanish, …
Gemma 3       gemma-3-1b-pt, gemma-3-4b-pt      140 languages (no list given in paper)

Notes:

  • Additional models can be added in the file models.py (see the sketch after this list).
  • Some models are available in gated repositories and require authentication tokens.
  • We do not vouch for the quality of any of these models. Do your own research to find out if they are suitable for your task.
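
Conceptually, a preconfigured model is just a short name (as used with -m) mapped to a Hugging Face model identifier. The following is a hypothetical sketch of what an entry might look like; the actual structure of models.py may differ, and the pythia-160m entry is purely illustrative:

# Hypothetical sketch; the actual structure of models.py may differ.
# Each short name (used with -m) maps to a Hugging Face model ID.
models = {
    "gpt2":        "gpt2",
    "xglm-564M":   "facebook/xglm-564M",
    "pythia-160m": "EleutherAI/pythia-160m",  # illustrative new entry
}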

Use

Text generation with surprisal

Display help text

llm_generate.py -h
usage: llm_generate.py [-h] [-m MODEL] [-n NUMBER] [-s SEED] [-t TEMPERATURE]
                       [-k TOPK] [-c] [-i INPUT] [-o OUTPUT]
                       [text]

Use transformer models to generate tokens and calculate per-token surprisal.

positional arguments:
  text                  The string of text to be processed. If none, input is
                        interactively prompted in a REPL.

options:
  -h, --help            show this help message and exit
  -m MODEL, --model MODEL
                        The model that should be used. One of: openai-gpt,
                        gpt2, gpt2-large, gemma-3-1b-pt, gemma-3-4b-pt,
                        bloom-560m, bloom-1b7, bloom-3b, xglm-564M, xglm-1.7B,
                        xglm-2.9B, german-gpt2, german-gpt2-larger (default
                        gpt2)
  -n NUMBER, --number NUMBER
                        The number of tokens to generate (default is n=0).
  -s SEED, --seed SEED  Seed used for sampling (to force reproducible
                        results)
  -t TEMPERATURE, --temperature TEMPERATURE
                        Temperature when sampling tokens (default is 1.0).
  -k TOPK, --topk TOPK  Only the top k probabilities are considered for
                        sampling the next token (default is k=50)
  -c, --csv             Output in csv format
  -i INPUT, --input INPUT
                        The path to the file from which the input should be
                        read.
  -o OUTPUT, --output OUTPUT
                        The path to the file to which the results should be
                        written (default is stdout).

Simple generation of tokens

The following command generates four additional tokens using GPT2 (the default model) and calculates surprisal for each token:

llm_generate.py "The key to the cabinets" -n 4
Item Idx    Token: Surprisal (bits)
   1   1      The: ███████████            11.1
   1   2      key: ██████████             10.4
   1   3       to: ██                      2.0
   1   4      the: ████                    3.8
   1   5 cabinets: █████████████████████  21.0
   1   6       is: ██                      1.5
   1   7     that: ███                     3.3
   1   8      the: ███                     2.5
   1   9    doors: ████████                7.6

NOTE: Take surprisal for the first word/token with a grain of salt. It’s not clear that models are doing the right thing here and predictions of different models can diverge quite a bit for the first token.
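
For reference, surprisal here is the negative base-2 log-probability the model assigns to a token given its left context. A minimal sketch of that computation with the transformers library (this illustrates the underlying quantity, not the tool’s actual code):

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.eval()

ids = tokenizer("The key to the cabinets", return_tensors="pt").input_ids
with torch.no_grad():
    # logprobs[0, i] is the log-distribution over the token at position i+1
    logprobs = torch.log_softmax(model(ids).logits, dim=-1)

# Start at 1: the first token has no left context (hence the caveat above).
for i in range(1, ids.size(1)):
    lp = logprobs[0, i - 1, ids[0, i]]
    print(tokenizer.decode(int(ids[0, i])), float(-lp / torch.tensor(2.0).log()))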

Interactive mode

When no input text is provided, llm_generate.py interactively prompts the user for text. Useful for quick experiments since the model has to be loaded only once.

llm_generate.py

To exit the prompt, type “exit” or hit CTRL-D.

Multilingual models

Generation with XGLM 564M

llm_generate.py "Der Polizist sagte, dass man nicht mehr ermitteln kann," -n 5 -m xglm-564M
Item Idx       Token: Surprisal (bits)
   1   1        </s>: █████              4.8
   1   2        </s>: █████              4.8
   1   3         Der: ████████████      11.6
   1   4      Polizi: █████████████     13.0
   1   5          st:                    0.2
   1   6       sagte: ███████████       10.7
   1   7           ,: ██                 1.7
   1   8        dass: ██                 2.0
   1   9         man: █████              5.5
   1  10       nicht: █████              4.5
   1  11        mehr: ████               4.2
   1  12          er: ████████           7.8
   1  13     mitteln: ████               4.1
   1  14        kann: ███                3.1
   1  15           ,: █                  1.2
   1  16          da: ████               4.3
   1  17       nicht: ███████            7.1
   1  18        alle: ██                 2.4
   1  19       Daten: ██████             5.7
   1  20 gespeichert: ███                3.3

Note the initial </s> tokens that are generated by default when tokenizing text for XGLM. These tokens do have an impact on subsequent tokens’ surprisal values, but it’s not clear if they can be safely dropped. Generation of these tokens can be suppressed by providing the tokenizer with the optional argument add_special_tokens=False.
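
The difference can be inspected directly with the transformers library; a small sketch (not part of the tool itself):

from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("facebook/xglm-564M")
print(tok("Der Polizist sagte").input_ids)
# Starts with the ID of </s>, added automatically by the tokenizer.
print(tok("Der Polizist sagte", add_special_tokens=False).input_ids)
# Same sequence without the special-token prefix.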

Multilingual generation with Bloom 560M:

llm_generate.py "Der Polizist sagte, dass man nicht mehr ermitteln kann," -n 5 -m bloom-560m

Sampling parameters

Two sampling parameters are currently supported: temperature (default 1.0) and top-k (default 50). To use different sampling parameters:

llm_generate.py "This is a" -t 1000 -k 1 -n 1
Item Idx Token: Surprisal (bits)
   1   1  This: █████████████     13.3
   1   2    is: ████               4.4
   1   3     a: ███                2.7
   1   4  very: ████               4.2

The repetition penalty is fixed at 1.0 assuming that larger values are not desirable when studying the behaviour of the model. Nucleus sampling is currently not supported but could be added if needed.
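
Conceptually, the two parameters transform the next-token distribution before sampling: temperature rescales the logits, and top-k truncates the distribution to the k most likely tokens. A minimal sketch of this scheme (illustrative, not the tool’s actual code):

import torch

def sample_next_token(logits, temperature=1.0, top_k=50):
    # Higher temperature flattens the distribution; lower sharpens it.
    scaled = logits / temperature
    # Keep only the k most likely tokens and renormalize.
    values, indices = torch.topk(scaled, top_k)
    probs = torch.softmax(values, dim=-1)
    # Sample one token from the truncated distribution.
    return int(indices[torch.multinomial(probs, num_samples=1)])

Note that with -k 1 only the single most likely token survives the truncation, so generation is effectively greedy and the extreme temperature in the example above does not affect the outcome.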

Output in CSV format

CSV format in shell output can be obtained with the -c option:

llm_generate.py "The key to the cabinets" -n 4 -c
item,idx,token,surprisal
1,1,The,11.121516227722168
1,2,key,10.35491943359375
1,3,to,2.019094467163086
1,4,the,3.7583045959472656
1,5,cabinets,21.04239845275879
1,6,is,1.5308449268341064
1,7,that,3.2748565673828125
1,8,the,2.5106589794158936
1,9,doors,7.590230464935303

Store results in a .csv file

To store results in a .csv file which can be easily loaded in R, Excel, Google Sheets, and similar:

llm_generate.py "The key to the cabinets" -n 4 -o output.csv

When storing results to a file, there’s no need to specify -c. CSV will be used by default.
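
In Python, for example, the resulting file can be read and summarized with pandas (assuming the column names shown above):

import pandas as pd

df = pd.read_csv("output.csv")
# Mean surprisal per item:
print(df.groupby("item")["surprisal"].mean())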

Reproducible generation

To obtain reproducible (i.e. deterministic) results, the -s option can be used to set a random seed:

llm_generate.py "The key to the cabinets" -n 4 -s 1

Batch mode generation

To process multiple items in batch mode, create a .csv file following this example:

item,text,n
1,John saw the man who the card catalog had confused a great deal.,0
2,No head injury is too trivial to be ignored.,0
3,The key to the cabinets were on the table.,0
4,How many animals of each kind did Moses take on the ark?,0
5,The horse raced past the barn fell.,0
6,The first thing the new president will do is,10

Columns:

  1. Item number
  2. Text
  3. Number of additional tokens that should be generated

Note: Additional columns can be included but will be ignored.

Then run:

llm_generate.py -i input_generate.csv -o output_generate.csv

Result (abbreviated):

item,idx,token,surprisal
1,1,John,13.80270004272461
1,2,saw,12.686095237731934
1,3,the,2.5510218143463135
1,4,man,6.69647216796875
1,5,who,4.4374775886535645
1,6,the,9.218789100646973
1,7,card,12.91416072845459
1,8,catalog,13.132523536682129
1,9,had,5.045916557312012

Top N next tokens with surprisal

Display help text

llm_topn.py -h
usage: llm_topn.py [-h] [-n NUMBER] [-m MODEL] [-c] [-i INPUT] [-o OUTPUT]
                   [text]

Use transformer models to generate a ranking of the N most likely next tokens.

positional arguments:
  text                  The string of text to be processed.

options:
  -h, --help            show this help message and exit
  -n NUMBER, --number NUMBER
                        The number of top-ranking tokens to list (default is
                        n=10)
  -m MODEL, --model MODEL
                        The model that should be used. One of: openai-gpt,
                        gpt2, gpt2-large, gemma-3-1b-pt, gemma-3-4b-pt,
                        bloom-560m, bloom-1b7, bloom-3b, xglm-564M, xglm-1.7B,
                        xglm-2.9B, german-gpt2, german-gpt2-larger (default
                        gpt2)
  -c, --csv             Output in csv format
  -i INPUT, --input INPUT
                        The path to the file from which the input should be
                        read.
  -o OUTPUT, --output OUTPUT
                        The path to the file to which the results should be
                        written (default is stdout).

Simple top N

Top 5 next tokens:

llm_topn.py "The key to the cabinets" -n 5
Item                    Text Token Rank: Surprisal (bits)
   1 The key to the cabinets    is    1: ██                 1.5
   1 The key to the cabinets   are    2: ████               4.1
   1 The key to the cabinets     ,    3: ████               4.2
   1 The key to the cabinets   was    4: ████               4.2
   1 The key to the cabinets   and    5: ████               4.5
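
This ranking corresponds to sorting the model’s distribution over the next token after the full input. A minimal sketch with the transformers library (illustrative, not the tool’s actual code):

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

ids = tokenizer("The key to the cabinets", return_tensors="pt").input_ids
with torch.no_grad():
    logits = model(ids).logits[0, -1]  # scores for the next token only
logprobs = torch.log_softmax(logits, dim=-1)
top = torch.topk(logprobs, 5)
for lp, idx in zip(top.values, top.indices):
    print(tokenizer.decode(int(idx)), float(-lp / torch.tensor(2.0).log()))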

Multilingual top N

llm_topn.py "Der Schlüssel zu den Schränken" -n 10 -m xglm-564M
Item                           Text Token Rank: Surprisal (bits)
   1 Der Schlüssel zu den Schränken  </s>    1: ██                 2.3
   1 Der Schlüssel zu den Schränken   ist    2: ███                2.8
   1 Der Schlüssel zu den Schränken     ,    3: ████               4.0
   1 Der Schlüssel zu den Schränken   und    4: ████               4.4
   1 Der Schlüssel zu den Schränken    im    5: █████              4.5
   1 Der Schlüssel zu den Schränken    in    6: █████              4.6
   1 Der Schlüssel zu den Schränken   des    7: █████              4.9
   1 Der Schlüssel zu den Schränken     :    8: █████              5.0
   1 Der Schlüssel zu den Schränken   der    9: █████              5.4
   1 Der Schlüssel zu den Schränken     .   10: ██████             6.0

Force CSV format in shell output

llm_topn.py "The key to the cabinets" -n 5 -c

Store results in a file (CSV format)

llm_topn.py "The key to the cabinets" -n 5 -o output.csv

Batch mode top N

To process multiple items in batch mode, create a .csv file following this example:

item,text,n
1,The key to the cabinets,10
2,The key to the cabinet,10
3,The first thing the new president will do is to introduce,10
4,"After moving into the Oval Office, one of the first things that",10

Columns:

  1. Item number
  2. Text
  3. Number of top tokens that should be reported

Then run:

llm_topn.py -i input_topn.csv -o output_topn.csv

Result (abbreviated):

item,text,token,rank,surprisal
1,The key to the cabinets,is,1,1.530847191810608
1,The key to the cabinets,are,2,4.100262641906738
1,The key to the cabinets,",",3,4.1611528396606445
1,The key to the cabinets,was,4,4.206236839294434
1,The key to the cabinets,and,5,4.458767890930176
1,The key to the cabinets,in,6,4.966185569763184
1,The key to the cabinets,of,7,5.340408802032471
1,The key to the cabinets,,8,5.369940280914307
1,The key to the cabinets,being,9,5.823633193969727
