Gratt is a generic implementation of a Pratt parser in C# for .NET Standard.
A Pratt parser is the affectionate name for a simple parsing technique presented by Vaughn Pratt in his 1973 paper “Top down operator precedence” (TDOP).
Gratt was inspired by the design of the Java implementation presented by Bob Nystrom in his journal entry “Pratt Parsers: Expression Parsing Made Easy”. The actual inspiration was the simplification and separation of interfaces in terms of prefix (nud) and infix (led) parselets (rather than the full Java implementation):
interface PrefixParselet {
Expression parse(Parser parser, Token token);
}
interface InfixParselet {
Expression parse(Parser parser, Expression left, Token token);
}
The unit tests for Gratt re-create and test the parser for the toy language Bantam discussed in Bob's journal entry.
The unit tests also contain a complete implementation for parsing and
evaluating C# pre-processing expressions used in conditional
directives #if
and #elif
.
Unlike Bob's Java implementation, Gratt's parser is completely generic:
class Parser<TState, TKind, TToken, TPrecedence, TResult>
{
// ...
}
It can therefore work with any model of tokens (TToken
), token types
(TKind
), precedence (TPrecedence
) and result (TResult
). It can also hold
any user-defined state (TState
) that may be needed during parsing although it
is not required. An overload without any user-state simply assumes a
unit.
Unlike Bob's Java implementation, Gratt does not define any interfaces to represent (prefix or infix) parselets. Instead, it relies on them being simple functions:
// version 2.0+
public static TResult
Parse<TState, TKind, TToken, TPrecedence, TResult>(
TState state,
TPrecedence initialPrecedence, IComparer<TPrecedence> precedenceComparer,
IEqualityComparer<TKind> kindEqualityComparer,
TKind eoi, Func<TToken, Exception> eoiErrorSelector,
Func<TKind, TToken, TState, Func<TToken, Parser<TState, TKind, TToken, TPrecedence, TResult>, TResult>> prefixFunction,
Func<TKind, TToken, TState, (TPrecedence, Func<TToken, TResult, Parser<TState, TKind, TToken, TPrecedence, TResult>, TResult>)?> infixFunction,
IEnumerable<(TKind, TToken)> lexer)
The above might read a little dense so below is a brief explanation of each parameter in order:
- a user-defined state that is associated with the parser object during parsing
- the initial precedence (this is usually 0 if
TPrecedence
is anint
) - a precedence comparer
- an equality comparer that can compare two token kinds
- a token kind marking end of input (EOI)
- a function used to project an
Exception
when the EOI token is not the the last token of the tokens sequence - a prefix function
- an infix function
- a sequence of token kind and token pairs (2-tuple) yielded by a lexer implementation
A prefix function receives a token kind, a token and a user-defined state as
input and it must return a parsing function if the token kind has prefix
parsing semantics. If the token kind is invalid as a prefix, then the token and
user-defined state may be used for the purpose of providing details in a thrown
exception (e.g., while TKind
may be a simple enum
type, TToken
will
usually be a rich object containing the position of the token in the source
text among other data). The parsing function returned by the prefix function
then receives the token and the parser as arguments and produces some result
(TResult
).
An infix function receives a token kind, a token and a user-defined state as
input as arguments and optionally returns a precedence (left binding power)
and parsing function pair. The parsing function then receives the token, the
left result and the parser as arguments and produces some result (TResult
).