Skip to content

nekodata/tdfa

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

2 Commits
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Neko Data Systems Type-III Library: Program 2614-XLP.

This program was produced as a prototype implementation of the algorithms
described in [1], as a basis for a future implementation of a lexer generator
using TDFA. It is written in Ada 2022 to allow for quick prototyping. It is
released as part of the Nekodata Type-III library, a catalogue of permissively
licensed programs.

The intended audience includes anyone who: wants to write their own
TDFA implementation; wants to understand or experiment with parts
of the algorithms described in [1]; or needs a more readable reference
than the C++ re2c source code.

Note that this program is not guaranteed to be correct and may have bugs. If
you are writing your own implementation you should compare your results with
re2c, but you can still use this source code as a reference. It is also not
optimized for determinization performance and may not be suited for use in
lexer generators in its current form, but may still be useful as a base.

If you find any bugs, you can report them as an issue on this repository. You
can also send bug reports or questions to <lisa@felidae.bam.moe>. We do not
accept pull requests or code contributions at this time.

Most algorithms are implemented as a fairly literal translation of the
pseudocode in [1]. TDFA minimization uses a modified version of Hopcroft's
algorithm, as given by Xu [2]. Some other algorithms (e.g. CFG construction)
are custom.

To build the program, install a recent version of GNAT and gprbuild, and run
`gprbuild -P tdfa.gpr`. You can add -j<number of processors> to speed it up.

When you run the program, it will create a bunch of .dot files. These are
graphs, in GraphViz DOT format. To format them, install GraphViz and use
a command like `dot -Tpng -Gdpi=300 -o dfa.png dfa.dot`.

If you want to see what the program is actually doing, tweak the Debug_*
flags in the various *.ads files.

`paper_notes.md` contains some notes on [1], clearing up some confusing
or potentially incorrect parts of the paper.

Currently implemented:

• TNFA construction
• TNFA simulation
• TDFA determinization
• Fallback operations
• Register optimizations (liveness analysis, dead code elimination,
  interference analysis, register allocation)
• TDFA minimization (based on [2])
• Fixed tags
• Character classes
• Single-valued tags
• Leftmost greedy disambiguation

Not implemented (and not currently planned):

• Multi-valued tags
• POSIX disambiguation

References used during development:

[1] Borsotti, Angelo; Trafimovich, Ulya. (2022). "A closer look at TDFA"
[2] Xu, Yingjie (2009). "Describing an n log n algorithm for minimizing states
    in deterministic finite automaton"
[3] The re2c source code (https://github.com/skvadrik/re2c)

Copyright (c) 2026, Lisa Felidae trading as Neko Data Systems. "Neko Data
Systems" or "Nekodata" is a fictitious business name or trade name of
Lisa Felidae, A/K/A @kohuept, @kittybwained. No generative AI was used
in the production of this software.

About

2614-XLP: Tagged Deterministic Finite Automata implementation in Ada 2022.

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors

Languages