tok3n is a header-only C++20 parser combinator library in which everything is valid at compile time. tok3n was created with ergonomics in mind. The syntax is easy to pick up, but I make no promises of a stable API at this time.
Supported compilers, in C++20 mode:
- Visual Studio 2022
- Clang >= 16
- GCC >= 12
In the root directory of the repository, run the following.
mkdir build
cd build
cmake [--preset <name>] ..
cmake --build .
- If the --preset option is missing or if <name> is default, then this will not build anything. It exposes the header-only library, and nothing else.
- Build the examples when <name> is examples
- Build the tests when <name> is tests
- Build the main() function (usually for debugging purposes) when <name> is exe
- Build everything when <name> is all
- Find this repo using add_subdirectory(), FetchContent, or whichever other method you prefer
- Use target_link_libraries() either with the target k3_tok3n or with its alias k3::tok3n
- #include "k3/tok3n.hpp" in your C++ file
In all these examples, I will assume the following setup:
#include "k3/tok3n.hpp"
using namespace k3::tok3n;
tok3n consists of parsers with parse() and lookahead() static functions. Both functions can take any contiguous range.
constexpr auto abc = "abc"_all;
This is a parser that checks for 'a', then 'b', then 'c', in that order.
constexpr auto result = abc.parse("abcd");
Alternatively, because the parser takes any contiguous range, we could use a std::array like the following.
constexpr std::array<char, 4> arr{{ 'a', 'b', 'c', 'd' }};
constexpr auto result = abc.parse(arr);
Either way, the following static_asserts will hold.
static_assert(result); // could call `result.has_value()`
static_assert(*result == "abc"); // could call `result.value()`
static_assert(result.remaining() == "d");
Here, result is a std::optional-like type. Each of the static_asserts above is explained in a bullet point below.
- The parse succeeded, so result converts to true, like a std::optional.
- result is dereferenced to grab the contents of the parse. In this case, the abc parser outputs a span of char equal to "abc". Importantly, this span points to the data inside arr. No allocation took place.
- There is 1 element remaining from arr that we didn't parse. We can query the remaining contents of the input with the .remaining() method, which returns a span of char, also pointing to the data inside arr.
If we called lookahead() instead of parse(), then result's stored type would be void. There are no parsed contents. The function lookahead() simply gives a boolean answer to the question "Can we parse this?". The 1st and 3rd static_asserts above would still apply, but the 2nd one would fail to compile.
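To make the shape of such a result concrete, here is a minimal standard-library sketch. It is not tok3n's implementation: the names sketch_result and parse_literal are hypothetical, and std::string_view stands in for a span of char. It only illustrates how a result can be testable like a std::optional, dereferenceable, and able to report the remaining input without copying anything.

```cpp
#include <optional>
#include <string_view>

// Hypothetical stand-in for a parse result; not tok3n's actual type.
struct sketch_result
{
    std::optional<std::string_view> value; // parsed contents, empty on failure
    std::string_view rest;                 // unparsed tail of the input

    constexpr explicit operator bool() const { return value.has_value(); }
    constexpr std::string_view operator*() const { return *value; }
    constexpr std::string_view remaining() const { return rest; }
};

// Hypothetical helper: succeed only if `input` starts with `literal`.
constexpr sketch_result parse_literal(std::string_view literal, std::string_view input)
{
    if (input.substr(0, literal.size()) != literal)
        return { std::nullopt, input };
    // Both views alias the caller's buffer; nothing is copied or allocated.
    return { input.substr(0, literal.size()), input.substr(literal.size()) };
}

constexpr std::string_view input = "abcd";
constexpr auto ok = parse_literal("abc", input);

static_assert(ok);
static_assert(*ok == "abc");
static_assert(ok.remaining() == "d");
static_assert((*ok).data() == input.data()); // same storage as the input
```

A lookahead-style function could return the same struct without the value member, which is one way to read "the stored type would be void".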
tok3n takes inspiration from other parser combinator libraries like Boost.Spirit, and it tries to maintain a similar syntax of operator overloading.
The parser below shows the overloaded operators.
constexpr auto abc = "abc"_all;
constexpr auto xyz = "xyz"_any;
constexpr auto p = ~abc >> !xyz >> (+xyz | *abc);
- abc parses the characters 'a', 'b', 'c' in order, and returns a span of char of size 3 if the parse succeeds
- xyz parses only one of either 'x', 'y', or 'z', and returns a span of char of size 1 if the parse succeeds
p is a parser that parses 3 things in order, denoted by the >> operator. The parse() function will result in a std::tuple of the types returned by the sub-parsers respectively. If any of the 3 sub-parsers fails, the whole parse fails.
- ~abc, the "maybe" operator, either parses an abc or it doesn't. Its result type is a std::optional. If "abc" is present, then the std::optional will be engaged with a span of size 3. Otherwise the parse will still succeed, and the std::optional will be disengaged.
- !xyz, the "not" operator, negates xyz. This will parse any character that is not 'x', 'y', or 'z'. This ! operator only works on ""_any or ""_none parsers, flipping them back and forth.
- +xyz, the "one or more" operator, will parse one or more of xyz, giving the result as a std::vector. This means an input of "xyzzy" would be output as a std::vector of 5 spans of char, each span with a size of 1.
- *abc, the "zero or more" operator, will parse zero or more of abc, giving the result as a std::vector. The operator * is similar to +, but it can never fail the parse.
- (+xyz | *abc), with the "choice" operator |, is a choice between the left side and the right side.
  - It first checks the left side +xyz. If that succeeds, then we short circuit and succeed the parse of the whole choice.
  - If the left side fails, then we check the right side, which determines the success of the whole choice.
  - This will result in a std::variant if the left side and right side have different result types. Otherwise, if the types are the same, it will result in that same type. In this case, the result types on both sides are the same.
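The short-circuit behaviour of the choice operator can be sketched with plain functions from the standard library. This is only an illustration under my own assumptions, not how tok3n is written: match, literal, any_of, choice, xyz_any, and abc_all are all hypothetical names, and both alternatives share one result type so no std::variant is needed.

```cpp
#include <optional>
#include <string_view>

// Hypothetical result type: the matched prefix plus the unparsed tail.
struct match
{
    std::string_view value;
    std::string_view rest;
};
using maybe_match = std::optional<match>;

// Match exactly `text` at the front of `in` (loosely like "abc"_all).
constexpr maybe_match literal(std::string_view text, std::string_view in)
{
    if (in.substr(0, text.size()) != text)
        return std::nullopt;
    return match{ in.substr(0, text.size()), in.substr(text.size()) };
}

// Match a single character contained in `set` (loosely like "xyz"_any).
constexpr maybe_match any_of(std::string_view set, std::string_view in)
{
    if (in.empty() || set.find(in.front()) == std::string_view::npos)
        return std::nullopt;
    return match{ in.substr(0, 1), in.substr(1) };
}

// Ordered choice: try `left` first and only fall back to `right` on failure.
template <class L, class R>
constexpr maybe_match choice(L left, R right, std::string_view in)
{
    if (auto result = left(in))
        return result; // short circuit, `right` is never consulted
    return right(in);
}

constexpr auto xyz_any = [](std::string_view in) { return any_of("xyz", in); };
constexpr auto abc_all = [](std::string_view in) { return literal("abc", in); };

static_assert(choice(xyz_any, abc_all, "zap")->value == "z");    // left wins
static_assert(choice(xyz_any, abc_all, "abcd")->value == "abc"); // right wins
static_assert(!choice(xyz_any, abc_all, "qrs"));                 // both fail
```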
There are more transformations on a parser that don't fit within operator overloading. To expand the possible set of transformations, tok3n uses "modifiers".
Each modifier m can be used on a parser p in 2 ways. These are exactly equivalent operations.
- m(p)
- p % m
The % operator was chosen because it is the "mod" operator, and it is used here to modify parsers. The expression p % m can be read as "p modified by m".
The join modifier takes care of an annoyance displayed above.
constexpr auto p = +"xyz"_any;
This parses one or more of the letters 'x', 'y', or 'z'. Unfortunately, calling p.parse("xyzzy") results in a vector of 5 spans, where each span has a size of 1. This is inconvenient to work with. Alternatively:
constexpr auto p = join(+"xyz"_any);
// +"xyz"_any % join;
constexpr auto result = p.parse("xyzzy");
static_assert(*result == "xyzzy");
Using the join modifier flattens all the spans into 1 span, as long as the entire output is contiguous. It does this recursively.
Another important modifier is ignore. Take the following parsers p and q.
constexpr auto xyz = "xyz"_any;
constexpr auto p = xyz >> xyz >> xyz;
constexpr auto q = xyz >> ignore(xyz) >> xyz;
The ignore modifier changes the result type into void. It ignores the result. This plays nicely with the sequence operator. Here, p will give a result with a std::tuple of 3 elements, while q will give a result with a std::tuple of 2 elements.
This also interacts with join in an interesting way.
constexpr auto pj = join(p);
constexpr auto qj = join(q);
The parser pj is fine. It will flatten the tuple-of-3-spans into a single span. On the other hand, qj will never parse successfully. Its 1st element and its 2nd element are not adjacent in memory. There is an ignored element in the middle. This means the output cannot ever be contiguous, so join will fail the parse.
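The contiguity requirement can be pictured as a pointer check: two views can be merged only when the second begins exactly where the first ends. The helper join_adjacent below is hypothetical and uses std::string_view in place of a span of char; it is a sketch of the idea, not tok3n's join.

```cpp
#include <optional>
#include <string_view>

// Hypothetical helper, not tok3n's join: merge two views only if `b`
// begins exactly where `a` ends in memory.
constexpr std::optional<std::string_view> join_adjacent(std::string_view a, std::string_view b)
{
    if (a.data() + a.size() != b.data())
        return std::nullopt; // a gap (or overlap) means no single view exists
    return std::string_view(a.data(), a.size() + b.size());
}

constexpr std::string_view input = "xyz";
constexpr auto first  = input.substr(0, 1); // "x"
constexpr auto second = input.substr(1, 1); // "y"
constexpr auto third  = input.substr(2, 1); // "z"

// Neighbouring pieces merge into one view over the original buffer.
static_assert(*join_adjacent(first, second) == "xy");

// Skipping the middle piece, as an ignored element would, leaves a gap.
static_assert(!join_adjacent(first, third));
```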
Other modifiers include:
- delimit(pd) does the equivalent of p >> *(ignore(pd) >> p), except that all parsed elements are put into the same std::vector
- exactly<N> returns a std::array of N of the inner result types
- map<fn> transforms the result of the inner parser by the invocable fn
- filter<fn> filters the result of the inner parser by the invocable fn
- into<T> transforms the result of the inner parser through the constructor of type T
- apply<fn>, apply_filter<fn> and apply_into<fn> do the same things as the above 3 respectively, but using std::apply instead of std::invoke
- complete fails the parse if the inner parser left any remaining elements in the input
- name<"str"> names a parser "str". This has no effect on the output; it is a way of tagging specific parsers for interaction with sub, explained below
- sub(...) is a special modifier that takes "substitutions" as its parameters
  - For example, sub(name<"str"> = ignore) is a modifier that adds the ignore modifier onto each and every inner parser (recursively) that is named "str"
  - sub(...) is variadic, so you can pass multiple substitutions at once
  - The effect of p % sub(s1) % sub(s2) is identical to p % sub(s1, s2)
Applying modifiers is associative. That is, for any parser p and modifiers m1 and m2, the following expressions are identical.
- (p % m1) % m2
- p % (m1 % m2)