This program generates pseudowords (pronounceable, fake words that look like real words) by chaining trigrams (sequences of three characters) extracted from a provided list of real words using Markov chains.
The generated pseudowords are guaranteed to:
- Be of a specific, targeted length.
- Not exactly match any of the original words used to build the model.
- Be structurally similar to the words in the provided dictionary.
For more info, see:
New, B., Bourgin, J., Barra, J., & Pallier, C. (2024). UniPseudo: A universal pseudoword generator. Quarterly Journal of Experimental Psychology, 77(2), 278–286. https://doi.org/10.1177/17470218231164373
This Go program is a direct port of the original R script, pseudoword-generation-by-markov-on-trigrams.R.
That R script is the one running behind the UniPseudo web tool.
The Go port maintains the exact structural and algorithmic behavior of the R script, including position-dependent trigram selection and robust UTF-8 handling, while speeding up generation with pre-indexed trigram maps that give O(1) transitions.
You have two options for installing and running the pseudoword generator: downloading a pre-compiled binary (no dependencies required) or building from source.
You do not need to install Go to run this program. Pre-compiled binaries are available for Linux, Windows, and macOS (both Intel and Apple Silicon/ARM).
- Go to the Releases page of this repository.
- Download the binary that matches your operating system and architecture.
- Extract the downloaded archive.
- Give the binary execute permission and run it from your terminal or command prompt:

      chmod +x pseudoword-generator
      ./pseudoword-generator [options]

  (On Windows, use pseudoword-generator.exe)
If you prefer to compile the program yourself, you will need to install Go.
- Download and install Go from the official Go website.
- Clone this repository to your local machine.
- You can either compile the binary using the provided script:

      ./build.sh
      ./pseudoword-generator [options]

  Or run the code directly without compiling an explicit executable:

      go run pseudoword_generator.go [options]
The program accepts the following flags:
- -n <int>: Number of pseudowords to generate (default: 10)
- -l <int>: Exact length of the generated pseudowords (default: 7)
- -m <int>: Minimum length of model words to read from the dictionary (default: 5)
- -f <string>: Path to the input word list file (default: liste.de.mots.francais.frgut.txt)
Generate 5 pseudowords of length 8 using the default French dictionary:
    go run pseudoword_generator.go -n 5 -l 8

Sample output:
murfaten
réseste
délaine
fillonti
cleulât
Generate 20 short pseudowords (length 6):
    go run pseudoword_generator.go -n 20 -l 6

- Data Loading: The program reads a list of words from the specified text file (one word per line). It filters out words that are too short based on the minimum length (-m) flag.
- Model Building: Words are padded with spaces to denote word boundaries. The program extracts trigrams and catalogs them based on their exact starting position in the word.
- Generation:
  - It picks a random initial trigram (starting at position 0).
  - It builds the rest of the word letter by letter. For each step, it looks at the last two letters (the "bigram") and randomly selects a compatible third letter from the pool of trigrams found at that specific position in the model words.
  - If it hits a dead end (no valid trigram continues the sequence), it throws away the current attempt and restarts.
- Filtering: It ensures the newly generated string hasn't been generated already during this run and does not exist in the original model dataset.
- liste.de.mots.francais.frgut.txt comes from https://github.com/chrplr/openlexicon/blob/master/datasets-info/Liste-de-mots-francais-Gutenberg/README-liste-francais-Gutenberg.md
- english.word.list.subtlexus4.txt comes from https://www.ugent.be/pp/experimentele-psychologie/en/research/documents/subtlexus
This project is distributed under the terms of the GNU General Public License v3.
Copyright (c) 2026 Christophe Pallier
If you use this software, please cite this repository as:
Pallier, C. (2026). unipseudo-go (Version 1.0.3) [Computer software]. GitHub. https://github.com/chrplr/unipseudo-go
BibTeX entry:
@software{unipseudo-go2026,
author = {Pallier, Christophe},
title = {Unipseudo-go: A pseudoword generator using trigram Markov chains},
version = {1.0.3},
date = {2026-04-16},
url = {https://github.com/chrplr/unipseudo-go},
publisher = {GitHub},
abstract = {A high-performance port of the UniPseudo tool for generating pronounceable pseudowords using Markov chains.},
keywords = {psycholinguistics, pseudowords, markov-chains, go},
license = {GPL-v3.0}
}