Compile-Time Bigram Language Model

This project pushes the boundaries of C++ template metaprogramming by implementing a Language Model that performs inference entirely at compile-time.

Instead of calculating text during runtime, the compiler itself acts as the inference engine, baking the final generated string directly into the binary's data segment.

Why?

Hardware isn't always the bottleneck; our definition of when execution should happen is.

We currently use compilers to optimize model paths, but we can also use them to resolve deterministic parts of the model itself. This project is a proof-of-concept for "shipping code that already knows the answer."

Features

constexpr Inference Engine: A complete bigram Markov chain implementation that runs during the compilation phase.
Compile-Time RNG: Since compilers are deterministic, this uses a hashed combination of __TIME__ and __DATE__ to seed a constexpr Xorshift32 generator.
Zero Runtime Overhead: The generated text is stored as a static constant. The runtime CPU cost for generating the text is exactly zero cycles.

How it works

The project implements a character-level bigram model. The transition probabilities (trained externally) are encoded into a static constexpr matrix.

Seeding: The preprocessor macros __TIME__ and __DATE__ are hashed using an FNV-1a algorithm to create a unique seed for each build.
Sampling: An implementation of Inverse Transform Sampling picks the next token based on the current context's probability distribution.
Baking: The NameGenerator struct iterates through the Markov chain until a termination character (.) is sampled or the max length is reached.
Result: The final string is embedded in the binary. You can verify this by inspecting the binary's strings or assembly output (I use ImHex).

Code preview

It's just one cpp file. Here's how you generate names:

// Inference happens here. If you wait one second and recompile, 
// the binary will contain a different name.

static constexpr NameGenerator<15> result(seed, T);

int main() {
    // Zero computation happens here. It just prints a constant string.
    std::cout << "Generated Name: " << result.name << std::endl;
}

Each time the cpp is compiled, a new name is generated.

Compatibility

Tested with the MSVC (cl.exe) compiler only.

It relies on standard C++17 constexpr and template features, so it should theoretically work with GCC and Clang as well, provided they support the necessary recursion depth for complex template instantiations.

Useless? Absolutely. But a reminder that the C++ compiler is Turing complete.

Name		Name	Last commit message	Last commit date
Latest commit History 5 Commits
.vscode		.vscode
.gitignore		.gitignore
LICENSE		LICENSE
README.md		README.md
its.cpp		its.cpp

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Compile-Time Bigram Language Model

Why?

Features

How it works

Code preview

Compatibility

About

Uh oh!

Releases

Packages

Languages

License

erodola/bigram-metacpp

Folders and files

Latest commit

History

Repository files navigation

Compile-Time Bigram Language Model

Why?

Features

How it works

Code preview

Compatibility

About

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages