tokenizer

A grammar describes the syntax of a programming language and is often written in Backus-Naur form (BNF). A lexer performs lexical analysis, turning text into tokens. A parser takes those tokens and builds a data structure such as an abstract syntax tree (AST). The parser is concerned with context: does the sequence of tokens fit the grammar? A lexer and a parser built for a specific grammar together form the front end of a compiler.
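The lexer/parser split described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the toy grammar (integers, `+`, `*`, parentheses) and the names `Token`, `tokenize`, and `parse` are hypothetical and are not taken from any repository listed on this page.

```python
import re
from typing import NamedTuple


class Token(NamedTuple):
    kind: str  # "NUMBER" or "OP"
    text: str


# Hypothetical toy grammar, in BNF-like notation:
#   expr   ::= term   ("+" term)*
#   term   ::= factor ("*" factor)*
#   factor ::= NUMBER | "(" expr ")"
TOKEN_RE = re.compile(r"(?P<NUMBER>\d+)|(?P<OP>[+*()])|(?P<SKIP>\s+)")


def tokenize(source: str) -> list[Token]:
    """Lexer: turn text into tokens, with no knowledge of the grammar."""
    tokens = []
    pos = 0
    while pos < len(source):
        match = TOKEN_RE.match(source, pos)
        if match is None:
            raise SyntaxError(f"unexpected character {source[pos]!r} at offset {pos}")
        if match.lastgroup != "SKIP":  # whitespace separates tokens but is not one
            tokens.append(Token(match.lastgroup, match.group()))
        pos = match.end()
    return tokens


def parse(tokens: list[Token]):
    """Parser: check the token sequence against the grammar and build an AST.

    The AST is a plain int for a number, or a tuple (operator, left, right).
    """
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def advance():
        nonlocal pos
        token = peek()
        if token is None:
            raise SyntaxError("unexpected end of input")
        pos += 1
        return token

    def expr():
        node = term()
        while peek() == Token("OP", "+"):
            advance()
            node = ("+", node, term())
        return node

    def term():
        node = factor()
        while peek() == Token("OP", "*"):
            advance()
            node = ("*", node, factor())
        return node

    def factor():
        token = advance()
        if token.kind == "NUMBER":
            return int(token.text)
        if token == Token("OP", "("):
            node = expr()
            if advance() != Token("OP", ")"):
                raise SyntaxError("expected ')'")
            return node
        raise SyntaxError(f"unexpected token {token.text!r}")

    tree = expr()
    if peek() is not None:
        raise SyntaxError(f"unexpected token {peek().text!r}")
    return tree
```

In this sketch, `tokenize("1 + * 2")` succeeds because every character forms a valid token, while `parse` rejects the resulting sequence because it does not fit the grammar. That is the division of labour the description draws between the two stages.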

Here are 101 public repositories matching this topic...

lindera / lindera

daac-tools / vibrato

zurawiki / tiktoken-rs

guillaume-be / rust-tokenizers

daac-tools / vaporetto

garvys-org / rustfst

untitaker / html5gum

togatoga / kanpyo

AmrDeveloper / FileQL

nooscraft / tokuin

DCjanus / cang-jie

lindera / lindera-tantivy

ShelbyJenkins / llm_utils

daac-tools / python-vibrato

Systemcluster / kitoken

PyThaiNLP / nlpo3

ehwan / C-language-Parser-In-Rust

daac-tools / python-vaporetto

Mattbusel / Every-Other-Token

reinfer / blingfire-rs
