tokenizer

A grammar describes the syntax of a programming language and is often written in Backus-Naur form (BNF). A lexer, also called a tokenizer, performs lexical analysis, turning text into tokens. A parser takes those tokens and builds a data structure such as an abstract syntax tree (AST). The parser is concerned with context: does the sequence of tokens fit the grammar? The front end of a compiler combines a lexer and a parser, both built for a specific grammar.
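To make the lexer and parser roles concrete, here is a minimal sketch in Python for a toy arithmetic grammar. The grammar, the token names (`NUMBER`, `OP`), and the functions `tokenize` and `parse` are illustrative assumptions; they are not taken from any repository listed on this page.

```python
import re

# Hypothetical toy grammar in BNF-like notation (an assumption for this sketch):
#   expr   ::= term (("+" | "-") term)*
#   term   ::= factor (("*" | "/") factor)*
#   factor ::= NUMBER | "(" expr ")"

TOKEN_SPEC = [
    ("NUMBER", r"\d+"),
    ("OP", r"[+\-*/()]"),
    ("SKIP", r"\s+"),
    ("MISMATCH", r"."),
]
TOKEN_RE = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))


def tokenize(text):
    """Lexer: turn raw text into a list of (kind, value) tokens."""
    tokens = []
    for match in TOKEN_RE.finditer(text):
        kind, value = match.lastgroup, match.group()
        if kind == "SKIP":
            continue  # whitespace carries no meaning in this grammar
        if kind == "MISMATCH":
            raise SyntaxError(f"unexpected character {value!r}")
        tokens.append((kind, value))
    return tokens


def parse(tokens):
    """Parser: check the tokens against the grammar and build a tuple-based AST."""
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else ("EOF", "")

    def advance():
        nonlocal pos
        token = peek()
        pos += 1
        return token

    def expr():
        node = term()
        while peek() in (("OP", "+"), ("OP", "-")):
            op = advance()[1]
            node = (op, node, term())
        return node

    def term():
        node = factor()
        while peek() in (("OP", "*"), ("OP", "/")):
            op = advance()[1]
            node = (op, node, factor())
        return node

    def factor():
        kind, value = advance()
        if kind == "NUMBER":
            return int(value)
        if (kind, value) == ("OP", "("):
            node = expr()
            if advance() != ("OP", ")"):
                raise SyntaxError("expected ')'")
            return node
        raise SyntaxError(f"unexpected token {value!r}")

    tree = expr()
    if peek()[0] != "EOF":
        raise SyntaxError(f"unexpected token {peek()[1]!r}")
    return tree
```

Calling `parse(tokenize("1 + 2 * 3"))` yields the nested tuple `("+", 1, ("*", 2, 3))`: the lexer only splits the text into tokens, while the parser decides that `*` binds tighter than `+` because the grammar says so.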


Here are 97 public repositories matching this topic...

lindera / lindera

daac-tools / vibrato

zurawiki / tiktoken-rs

guillaume-be / rust-tokenizers

daac-tools / vaporetto

garvys-org / rustfst

untitaker / html5gum

togatoga / kanpyo

AmrDeveloper / FileQL

DCjanus / cang-jie

lindera / lindera-tantivy

ShelbyJenkins / llm_utils

daac-tools / python-vibrato

PyThaiNLP / nlpo3

Systemcluster / kitoken

ehwan / C-language-Parser-In-Rust

daac-tools / python-vaporetto

reinfer / blingfire-rs

kodemartin / rustpostal

osyoyu / tantivy-tokenizer-tiny-segmenter
