A grammar describes the syntax of a programming language, and might be defined in Backus-Naur form (BNF). A lexer performs lexical analysis, turning text into tokens. A parser takes tokens and builds a data structure like an abstract syntax tree (AST). The parser is concerned with context: does the sequence of tokens fit the grammar? A compiler's front end combines a lexer and parser for a specific grammar; the compiler then translates the resulting AST into another language, typically machine code.
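To make these stages concrete, here is a minimal, self-contained sketch (not drawn from any of the projects listed below): a lexer that turns arithmetic source text into tokens, and a recursive-descent parser that checks the tokens against a small BNF grammar and builds an AST of nested tuples.

```python
# Grammar (BNF):
#   expr   ::= term (("+" | "-") term)*
#   term   ::= factor (("*" | "/") factor)*
#   factor ::= NUMBER | "(" expr ")"

import re

TOKEN_RE = re.compile(r"\s*(?:(\d+)|(.))")

def lex(text):
    """Lexical analysis: turn source text into (kind, value) tokens."""
    tokens = []
    for number, op in TOKEN_RE.findall(text):
        if number:
            tokens.append(("NUMBER", int(number)))
        else:
            tokens.append(("OP", op))
    return tokens

class Parser:
    """Recursive-descent parser: checks the token sequence against the
    grammar and builds a nested-tuple AST such as ('+', 1, ('*', 2, 3))."""

    def __init__(self, tokens):
        self.tokens = tokens
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else (None, None)

    def next(self):
        tok = self.peek()
        self.pos += 1
        return tok

    def expr(self):
        node = self.term()
        while self.peek() in (("OP", "+"), ("OP", "-")):
            _, op = self.next()
            node = (op, node, self.term())
        return node

    def term(self):
        node = self.factor()
        while self.peek() in (("OP", "*"), ("OP", "/")):
            _, op = self.next()
            node = (op, node, self.factor())
        return node

    def factor(self):
        kind, value = self.next()
        if kind == "NUMBER":
            return value
        if (kind, value) == ("OP", "("):
            node = self.expr()
            assert self.next() == ("OP", ")"), "expected closing paren"
            return node
        raise SyntaxError(f"unexpected token {value!r}")

print(Parser(lex("1 + 2 * (3 - 4)")).expr())
# ('+', 1, ('*', 2, ('-', 3, 4)))
```

The parser encodes the grammar directly: each nonterminal becomes a method, and operator precedence falls out of which method calls which.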
Persian NLP Toolkit
A Python library for Korean natural language processing. Provides word extraction, tokenization, part-of-speech tagging, and preprocessing.
Solves basic Russian NLP tasks; an API for lower-level Natasha projects
Ekphrasis is a text processing tool geared towards text from social networks such as Twitter or Facebook. Ekphrasis performs tokenization, word normalization, word segmentation (for splitting hashtags), and spell correction, using word statistics from two large corpora: English Wikipedia and a collection of 330 million English tweets.
An Integrated Corpus Tool With Multilingual Support for the Study of Language, Literature, and Translation
Python port of Moses tokenizer, truecaser and normalizer
DadmaTools is a Persian NLP toolkit developed by Dadmatech Co.
Bitextor generates translation memories from multilingual websites
Text2Text Language Modeling Toolkit
Vietnamese tokenizer (Maximum Matching and CRF); a maximum-matching sketch appears after this list
Text tokenization and sentence segmentation (segtok v2)
Text-to-sentence splitter using a heuristic algorithm by Philipp Koehn and Josh Schroeder.
A Japanese tokenizer based on recurrent neural networks
A Python implementation of Farasa toolkit
MicroTokenizer: a lightweight, full-featured Chinese tokenizer designed for educational and research purposes, helping students understand how tokenizers work. Provides a practical, hands-on approach to NLP concepts, featuring multiple tokenization algorithms and customizable models. Ideal for students, researchers, and NLP enthusiasts.
This repository demonstrates a baseline model for text classification: an LSTM-based network implemented in PyTorch. To make the model easier to understand, it is trained on a Tweets dataset provided by Kaggle.
A tokenizer and sentence splitter for German and English web and social media texts.
Aims to make JapaneseTokenizer as easy to use as possible
Essential NLP & ML, short & fast pure Python code
A library for advanced Natural Language Processing towards multi-modal educational items.
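For reference, here is a generic sketch of forward maximum matching, the greedy strategy named in the Vietnamese tokenizer entry above. The lexicon and sentence are toy examples of my own, not the project's actual data or API.

```python
def max_match(syllables, lexicon, max_words=4):
    """Greedy forward maximum matching over syllables: at each position,
    emit the longest run of syllables that forms a lexicon entry,
    falling back to a single syllable when nothing matches."""
    tokens, i = [], 0
    while i < len(syllables):
        for j in range(min(len(syllables), i + max_words), i, -1):
            candidate = " ".join(syllables[i:j])
            if candidate in lexicon or j == i + 1:
                tokens.append(candidate)
                i = j
                break
    return tokens

# Toy lexicon; "học sinh" (student) is a single two-syllable word.
lexicon = {"học sinh", "đi", "học"}
print(max_match("học sinh đi học".split(), lexicon))
# ['học sinh', 'đi', 'học']
```

Greedy matching is fast but cannot resolve genuine ambiguity, which is why production tokenizers often pair it with a statistical model such as a CRF.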