NLP tokenizers written in Go
Updated Nov 27, 2025 - Go
A grammar describes the syntax of a programming language, and might be defined in Backus-Naur form (BNF). A lexer performs lexical analysis, turning text into tokens. A parser takes tokens and builds a data structure like an abstract syntax tree (AST). The parser is concerned with structure: does the sequence of tokens fit the grammar? The front end of a compiler combines a lexer and a parser built for a specific grammar; later compiler stages analyze the resulting tree and translate it into target code.
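The lexer and parser roles described above can be illustrated with a small sketch in Go. It is not taken from any of the listed repositories: the grammar (integers, `+`, `*`, parentheses) and the names `Token`, `Node`, `Lex`, `Parse`, and `Eval` are all hypothetical, chosen only for this example.

```go
package main

import (
	"fmt"
	"strconv"
	"unicode"
)

// Hypothetical grammar used for this sketch, in BNF-like notation:
//   expr   ::= term   ("+" term)*
//   term   ::= factor ("*" factor)*
//   factor ::= NUMBER | "(" expr ")"

// Token is one lexical unit; IsNumber separates integers from punctuation.
type Token struct {
	IsNumber bool
	Text     string
}

// Node is an AST node: a number leaf (Op == "") or a binary operation.
type Node struct {
	Op          string
	Value       int
	Left, Right *Node
}

// Lex performs lexical analysis: it turns text into tokens, skipping spaces.
func Lex(input string) ([]Token, error) {
	var tokens []Token
	runes := []rune(input)
	for i := 0; i < len(runes); {
		r := runes[i]
		switch {
		case unicode.IsSpace(r):
			i++
		case r >= '0' && r <= '9':
			start := i
			for i < len(runes) && runes[i] >= '0' && runes[i] <= '9' {
				i++
			}
			tokens = append(tokens, Token{true, string(runes[start:i])})
		case r == '+' || r == '*' || r == '(' || r == ')':
			tokens = append(tokens, Token{false, string(r)})
			i++
		default:
			return nil, fmt.Errorf("unexpected character %q at offset %d", r, i)
		}
	}
	return tokens, nil
}

type parser struct {
	tokens []Token
	pos    int
}

// binary parses a left-associative chain: next (op next)*.
func (p *parser) binary(op string, next func() (*Node, error)) (*Node, error) {
	left, err := next()
	if err != nil {
		return nil, err
	}
	for p.pos < len(p.tokens) && p.tokens[p.pos].Text == op {
		p.pos++
		right, err := next()
		if err != nil {
			return nil, err
		}
		left = &Node{Op: op, Left: left, Right: right}
	}
	return left, nil
}

func (p *parser) expr() (*Node, error) { return p.binary("+", p.term) }
func (p *parser) term() (*Node, error) { return p.binary("*", p.factor) }

// factor parses a number or a parenthesized expression.
func (p *parser) factor() (*Node, error) {
	if p.pos >= len(p.tokens) {
		return nil, fmt.Errorf("unexpected end of input")
	}
	t := p.tokens[p.pos]
	p.pos++
	if t.IsNumber {
		v, err := strconv.Atoi(t.Text)
		return &Node{Value: v}, err
	}
	if t.Text != "(" {
		return nil, fmt.Errorf("unexpected token %q", t.Text)
	}
	n, err := p.expr()
	if err != nil {
		return nil, err
	}
	if p.pos >= len(p.tokens) || p.tokens[p.pos].Text != ")" {
		return nil, fmt.Errorf("expected ')'")
	}
	p.pos++
	return n, nil
}

// Parse checks that the whole token sequence fits the grammar and builds the AST.
func Parse(tokens []Token) (*Node, error) {
	p := &parser{tokens: tokens}
	n, err := p.expr()
	if err == nil && p.pos != len(tokens) {
		err = fmt.Errorf("unexpected token %q", tokens[p.pos].Text)
	}
	return n, err
}

// Eval walks the AST and computes the value of the expression.
func Eval(n *Node) int {
	switch n.Op {
	case "+":
		return Eval(n.Left) + Eval(n.Right)
	case "*":
		return Eval(n.Left) * Eval(n.Right)
	}
	return n.Value
}

func main() {
	tokens, err := Lex("2 + 3 * (4 + 1)")
	if err != nil {
		panic(err)
	}
	ast, err := Parse(tokens)
	if err != nil {
		panic(err)
	}
	fmt.Println(Eval(ast)) // prints 17
}
```

The lexer rejects characters outside the alphabet, while the parser rejects token sequences that do not fit the grammar, such as `2 +`. Keeping the two stages separate is a common design choice, although some tools merge them.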
Self-contained Japanese Morphological Analyzer written in pure Go
A multilingual command line sentence tokenizer in Golang
Lex machinery for Go.
Ungreedy subword tokenizer and vocabulary trainer for Python, Go & JavaScript
Object mapping for golang.
A tokenizer based on a dictionary and bigram language models for Go. (Currently supports only Chinese segmentation.)
XML Tokenizer is a low-memory high performance non-namespace parser library for parsing simple XML 1.0.
Meet Programming Language Interpreter
A Text Tokenizer library for Golang
🙅 Go package for detecting and removing stopwords from text.