This project implements a regular expression engine using Non-deterministic Finite Automata (NFA). It's written in Go and provides a simple, yet powerful way to perform regex matching.
The implementation is grounded in the ideas of automata theory and draws inspiration from Ken Thompson's regular expression search algorithm. This solution differs from Ken Thompson's approach of use reverse Polish Notation and a stack to generate the NFA. Instead, it employs a compact LL(1) parser to construct the NFA.
It is used as a learning exercise to understand the inner workings of FSM-based regex engines and the theory behind them not as a production-ready library.
- NFA-based regex matching
- Support for basic regex operations:
- Concatenation
- Alternation (
|) - Kleene star (
*) - Plus (
+) - Optional (
?)
- Grouping with parentheses
To use this regex engine, you need to have Go installed on your system. If you don't have Go installed, you can download it from the official Go website.
-
Clone the repository:
git clone https://github.com/mactavishz/re-nfa.git -
Navigate to the project directory:
cd re-nfa -
Build the project:
make
After building the project, you can use the regex engine from the command line:
./re <regex_pattern> <input_string>
If you don't provide an input string, the program will read from stdin.
-
Match a simple pattern:
./re "a(b|c)*" "abcbc" -
Use stdin for input:
echo "abcbc" | ./re "a(b|c)*"
To run the unit tests for this project:
make test
main.go: Entry point of the applicationpkg/fsm/nfa.go: NFA structure and matching logicpkg/utils/parser.go: Regex parsing logicpkg/utils/tokenizer.go: Tokenization of regex patternsnfa_test.go: Unit tests for the NFA matcher
- This implementation does not support some advanced regex features like lookahead/lookbehind, backreferences, or non-greedy matching.
- Performance may vary compared to highly optimized regex engines in standard libraries.
Contributions are welcome! Please feel free to submit a Pull Request.
This project is open source and available under the MIT License.
- This project was inspired by the study of formal languages and automata theory.
- Especially, the book "Introduction to the Theory of Computation" by Michael Sipser.
- Russ Cox's article on Regular Expression Matching Can Be Simple And Fast was also a great resource.
- Ken Thompson's paper on Regular Expression Search Algorithm was a pioneering work in this field.