-
lngcnv
linguistics: display pronunciation, translate between dialects, convert between orthographies; support for multiple languages: English, Latin, Polish, Quechua, Spanish, Tikuna
-
graphannis
new backend implementation of the ANNIS linguistic search and visualization system
-
camxes-rs
Lojban PEG parser with semantic analysis - integrated camxes parser and tersmu semantic engine
-
bistun-lms
thread-safe capability engine for resolving BCP 47 language tags into actionable rendering and parsing properties (directionality, morphology, segmentation). Features a wait-free, lock-free memory pool (ArcSwap)…
-
stam
powerful library for dealing with stand-off annotations on text. This is the Rust library.
-
annatto
Converts linguistic data formats based on the graphANNIS data model as intermediate representation and can apply consistency tests
-
stam-tools
Command-line tools for working with stand-off annotations on text (STAM)
-
analiticcl
approximate string matching or fuzzy-matching system that can be used to find variants for spelling correction or text normalisation
-
stam-python
STAM is a library for dealing with stand-off annotations on text; this is the Python binding
-
latkerlo-jvotci
Tools for creating and decomposing Lojban lujvo
-
bistun-core
The authoritative Linguistic DNA models and DTOs for the Bistun LMS. Provides a high-performance, immutable contract layer for BCP 47 locale resolution, typographic traits, and linguistic metadata.
-
invlex-cli
CLI tool for inverse lexicographic (a tergo) sorting; installs the invlex binary
-
varna
multilingual language engine: phoneme inventories, G2P rules, scripts, grammar, and lexicon for 50+ languages
-
unimorph
Command-line interface for UniMorph morphological data
-
phonetik
Phonetic analysis engine for English. Rhyme detection, stress scanning, meter analysis, and syllable counting with a 126K-word embedded dictionary.
-
tamil-yaappu-analyzer
Tamil prosody analyzer and classifier for verse compositions
-
corpa
The ripgrep of text analysis. Blazing-fast CLI for corpus-level NLP statistics.
-
phonetics-rs
IPA-based phonetic distance metrics: strict edit distance, listener-confusion distance, and per-phoneme acoustic and perceptual scoring. Calibrated against Mad Gab puzzle data; tunable per dialect.
-
unimorph-cli
Command-line interface for UniMorph morphological data
-
dictutils
Dictionary utilities for Mdict and other formats
-
invlex
inverse lexicographic (a tergo) sorting
-
rustling
A blazingly fast library for computational linguistics
-
asca
A linguistic sound change applier
-
textframe
query plain text documents by Unicode offset without loading them all into memory
-
textgridde-rs
dealing with Praat TextGrid files. MIT licensed.
-
betacode2
A fast Rust library for conversion to and from betacode
-
phonologist
Parse phonemes in the International Phonetic Alphabet
-
waken_snowball
Snowball stemming algorithms for 33 languages
-
allotax-core
Core allotaxonometer computation: rank-turbulence divergence and related metrics
-
sesdiff
Generates a shortest edit script (Myers' diff algorithm) to indicate how to get from the strings in column A to the strings in column B. Also provides the edit distance (Levenshtein).
-
pr4xis-chat
Praxis chat engine — shared logic for CLI and WASM
-
eliza
natural language processing program developed by Joseph Weizenbaum in 1966
-
graphannis-capi
C-API to the ANNIS linguistic search and visualization system
-
rustmouth
Rust API for praat
-
etym
Queries EtymOnline.com to look up etymologies for words
-
almanaculum
Core types and traits for analysis
-
graphannis-cli
command-line interface to the new backend implementation of the ANNIS linguistic search and visualization system
-
vn-nlp
Vietnamese NLP library — tokenization, normalization, segmentation
-
graphannis-webservice
web service to the new backend implementation of the ANNIS linguistic search and visualization system
-
folia
High-performance library for handling the FoLiA XML format (Format for Linguistic Annotation)
-
vn-nlp-tokenize
Vietnamese tokenization algorithms for vn-nlp
-
vn-nlp-segment
Vietnamese sentence segmentation for vn-nlp
-
vn-nlp-normalize
Vietnamese text normalization: diacritics, Unicode NFC/NFD
-
stamd
Webservice for working with stand-off annotations on text (STAM)
-
deepfrog
A deep learning NLP suite (PoS, lemmatiser, NER) with FoLiA XML support
-
annis-web
experimental version of ANNIS corpus search frontend
-
syntaxdot-encoders
Encoders for linguistic features
-
graphannis-core
supports graph representation and generic query-functionality
-
zaliznyak
A Russian inflection library
-
ssam
short for split sampler, splits one or more text-based input files into multiple sets using random sampling. This is useful for splitting data into training, test and development sets, or whatever sets you desire.
-
bibleparsing
Read and/or validate Koine Greek parsing codes
-
deepphonemizer
G2P model (inference only)
-
unimorph-core
Core library for UniMorph morphological data
-
phonetisaurus-g2p
Phonemization in Rust using a finite state transducer (FST) trained with Phonetisaurus
-
praat-sys
Low-level Rust bindings for the Praat
-
enpsrlib
English Phrase Structure Rules library
-
assessment
that allows different types of assessments, to convert between them and to perform basic operations
-
greek-syllables
Zero copy Ancient Greek word syllabification
-
soundchange
implementing sound change algorithms in Rust
-
rusty_word_builder
Syllable and Word generation library written fully in Rust
-
wfst4str
Python library based on rustfst for manipulating strings with wFSTs
-
vn-nlp-core
Core types, traits, and errors for vn-nlp
-
rspanphon
rough Rust port of the Python PanPhon library, extracts articulatory features from IPA strings and implements operations on them
-
soundchange-english
Reimplementation of Mark Rosenfelder's pronunciation algorithm for English
-
reudh
parsing and indexing word etymologies from etymonline.com
-
lemonwood
Linguistics system permitting communication, bidirectional translation, etc. for my Hornvale project
-
sticker-encoders
Encoders for linguistic features