#linguistics

  1. lngcnv

    linguistics: display pronunciation, translate between dialects, convert between orthographies; support for multiple languages: English, Latin, Polish, Quechua, Spanish, Tikuna

    v1.10.2 3.5K #phonetic #linguistics #spelling #speech #text-processing
  2. graphannis

    new backend implementation of the ANNIS linguistic search and visualization system

    v4.1.4 480 #query-language #linguistics #graph #corpus #storage #visualization #search-and-visualization #annis #aql #operand
  3. camxes-rs

    Lojban PEG parser with semantic analysis - integrated camxes parser and tersmu semantic engine

    v1.0.0 #peg #lojban #linguistics #semantic #parser
  4. bistun-lms

    thread-safe capability engine for resolving BCP 47 language tags into actionable rendering and parsing properties (directionality, morphology, segmentation). Features a wait-free, lock-free memory pool (ArcSwap)…

    v2.0.0 #internationalization #bcp-47 #linguistics #capability-engine #localization
  5. stam

    powerful library for dealing with stand-off annotations on text. This is the Rust library.

    v0.18.7 #annotations #linguistics #standoff #text-processing
  6. annatto

    Converts linguistic data formats based on the graphANNIS data model as an intermediate representation and can apply consistency tests

    v0.52.0 #linguistics #annotations #intermediate-representation #graph-annis #import-export #query-language #testing-data
  7. stam-tools

    Command-line tools for working with stand-off annotations on text (STAM)

    v0.15.6 #annotations #linguistics #standoff #text-processing
  8. analiticcl

    approximate string matching or fuzzy-matching system that can be used to find variants for spelling correction or text normalisation

    v0.4.9 #spelling-correction #approximate-string-matching #linguistics
  9. stam-python

    STAM is a library for dealing with standoff annotations on text; this is the Python binding

    v0.12.1 #nlp #annotations #linguistics #standoff
  10. latkerlo-jvotci

    Tools for creating and decomposing Lojban lujvo

    v2.6.2601 #lujvo #cli #lojban #linguistics
  11. bistun-core

    The authoritative Linguistic DNA models and DTOs for the Bistun LMS. Provides a high-performance, immutable contract layer for BCP 47 locale resolution, typographic traits, and linguistic metadata.

    v2.0.0 #linguistics #bcp-47 #dto #metadata #internationalization
  12. invlex-cli

    CLI tool for inverse lexicographic (a tergo) sorting; installs the invlex binary

    v0.1.0 #lexicographic #linguistics #sorting #a-tergo #cli
  13. varna

    multilingual language engine: phoneme inventories, G2P rules, scripts, grammar, and lexicon for 50+ languages

    v1.0.0 #phoneme #multilingual #ipa #language #linguistics
  14. unimorph

    Command-line interface for UniMorph morphological data

    v0.2.1 #nlp #linguistics #morphology
  15. phonetik

    Phonetic analysis engine for English. Rhyme detection, stress scanning, meter analysis, and syllable counting with a 126K-word embedded dictionary.

    v0.3.2 #nlp #phonetic #rhyme #prosody #linguistics
  16. tamil-yaappu-analyzer

    Tamil prosody analyzer and classifier for verse compositions

    v0.1.0 #linguistics #tamil #prosody #poetry #analysis
  17. corpa

    The ripgrep of text analysis. Blazing-fast CLI for corpus-level NLP statistics.

    v0.4.11 #nlp #corpus #linguistics #text-analysis
  18. phonetics-rs

    IPA-based phonetic distance metrics: strict edit distance, listener-confusion distance, and per-phoneme acoustic and perceptual scoring. Calibrated against Mad Gab puzzle data; tunable per dialect.

    v0.3.1 #edit-distance #phonetic #ipa #linguistics #speech
  19. unimorph-cli

    Command-line interface for UniMorph morphological data

    v0.1.3 #nlp #linguistics #morphology
  20. dictutils

    Dictionary utilities for Mdict and other formats

    v0.1.2 #dictionary #mdict #linguistics #text-processing
  21. invlex

    inverse lexicographic (a tergo) sorting

    v0.1.0 #sorting #lexicographic #linguistics #text-processing #a-tergo
  22. rustling

    A blazingly fast library for computational linguistics

    v0.8.0 260 #nlp #linguistics #text-processing
  23. asca

    A linguistic sound change applier

    v0.9.3 #linguistics #change
  24. textframe

    query plain text documents by Unicode offset without loading them all into memory

    v0.4.1 #text-processing #linguistics #standoff
  25. textgridde-rs

    handles Praat TextGrid files. MIT licensed.

    v0.1.6 #text-grid #praat #linguistics #file-format #phonetic
  26. betacode2

    A fast Rust library for conversion to and from betacode

    v1.0.6 480 #betacode #linguistics #biblical-greek #beta-code
  27. phonologist

    Parse phonemes in the International Phonetic Alphabet

    v1.0.1 #ipa #phonetic #linguistics
  28. waken_snowball

    Snowball stemming algorithms for 33 languages

    v0.1.0 200 #stemming #snowball #nlp #linguistics
  29. allotax-core

    Core allotaxonometer computation: rank-turbulence divergence and related metrics

    v0.3.0 #divergence #rank-turbulence #allotaxonometer #linguistics
  30. sesdiff

    Generates a shortest edit script (Myers' diff algorithm) to indicate how to get from the strings in column A to the strings in column B. Also provides the edit distance (Levenshtein).

    v0.3.1 750 #levenshtein-distance #linguistics #lemmatization #text-processing
  31. pr4xis-chat

    Praxis chat engine — shared logic for CLI and WASM

    v0.3.0 #ontologies #engine #praxis #pr4xis #system #chat #wasm #linguistics #functor
  32. eliza

    natural language processing program developed by Joseph Weizenbaum in 1966

    v2.0.1 #chat-bot #linguistics #weizenbaum
  33. graphannis-capi

    C-API to the ANNIS linguistic search and visualization system

    v4.1.4 #corpus #c-api #graph-annis #linguistics #visualization #search-and-visualization #backend-of-annis
  34. rustmouth

    Rust API for Praat

    v0.1.1 #praat #linguistics #phonetic #audio
  35. etym

    Queries EtymOnline.com to look up etymologies for words

    v0.0.8 140 #linguistics #etymology #cli
  36. almanaculum

    Core types and traits for analysis

    v0.1.1 #linguistics #poetry #greek
  37. graphannis-cli

    command-line interface to the new backend implementation of the ANNIS linguistic search and visualization system

    v4.1.4 #command-line-interface #linguistics #back-end #visualization #search-and-visualization #annis #backend-of-annis #corpora #corpus
  38. vn-nlp

    Vietnamese NLP library — tokenization, normalization, segmentation

    v0.1.3 #nlp #tokenize #vietnamese #linguistics
  39. graphannis-webservice

    web service to the new backend implementation of the ANNIS linguistic search and visualization system

    v4.1.4 #web-services #linguistics #graph-annis #visualization #back-end #search-and-visualization #corpora #backend-of-annis #corpus #search-service
  40. folia

    High-performance library for handling the FoLiA XML format (Format for Linguistic Annotation)

    v0.0.6 #annotations #xml #linguistics #text-processing
  41. vn-nlp-tokenize

    Vietnamese tokenization algorithms for vn-nlp

    v0.1.3 #tokenize #nlp #vietnamese #linguistics
  42. vn-nlp-segment

    Vietnamese sentence segmentation for vn-nlp

    v0.1.3 #tokenize #nlp #vietnamese #linguistics
  43. vn-nlp-normalize

    Vietnamese text normalization: diacritics, Unicode NFC/NFD

    v0.1.3 #nlp #tokenize #vietnamese #linguistics
  44. stamd

    Webservice for working with stand-off annotations on text (STAM)

    v0.1.0 #annotations #linguistics #standoff #text-processing #annotation
  45. deepfrog

    A deep learning NLP suite (PoS, lemmatiser, NER) with FoLiA XML support

    v0.2.1 #xml #linguistics #annotations
  46. annis-web

    experimental version of ANNIS corpus search frontend

    v0.2.0 #experimental #web-frontend #corpus #front-end #annis #csv #web-search #linguistics #corpora
  47. syntaxdot-encoders

    Encoders for linguistic features

    v0.5.0 #syntax-dot #transformer-models #linguistics #sequence #labeling #lemmatization #biaffine-parser #part-of-speech #bert #tagging
  48. graphannis-core

    supports graph representation and generic query-functionality

    v4.1.4 490 #graph #linguistics #system #back-end #search #corpus #annis #visualization #backend-of-annis #search-and-visualization
  49. zaliznyak

    A Russian inflection library

    v0.2.0 #inflection #russian #linguistics #declension #grammar #localization
  50. ssam

    short for split sampler, splits one or more text-based input files into multiple sets using random sampling. This is useful for splitting data into training, test, and development sets, or whatever sets you desire.

    v0.2.0 #random #data-science #nlp #linguistics
  51. bibleparsing

    Read and/or validate Koine Greek parsing codes

    v0.1.4 #linguistics #parser #biblical-greek #betacode
  53. deepphonemizer

    G2P model (inference only)

    v1.0.0 #g2p #linguistics #phonemizer
  54. unimorph-core

    Core library for UniMorph morphological data

    v0.2.1 #nlp #linguistics #morphology
  55. phonetisaurus-g2p

    Phonemization in Rust using a finite state transducer (FST) trained with Phonetisaurus

    v0.1.1 #fst #g2p #phonetisaurus #linguistics #phonemizer
  56. praat-sys

    Low-level Rust bindings for Praat

    v0.2.0 #linguistics #phonetic #audio
  57. enpsrlib

    English Phrase Structure Rules library

    v0.1.0 #linguistics #english #phrase #psr
  58. assessment

    allows different types of assessments, conversion between them, and basic operations on them

    v1.0.0 #convert #linguistics #numeric #assessments #different
  59. greek-syllables

    Zero copy Ancient Greek word syllabification

    v0.1.4 #ancient-greek #syllable #linguistics #biblical-greek
  60. soundchange

    implements sound change algorithms in Rust

    v0.0.8 #linguistics #sound
  61. rusty_word_builder

    Syllable and Word generation library written fully in Rust

    v0.6.3 #syllable #word #linguistics #conlang #language
  62. wfst4str

    Python library based on rustfst for manipulating strings with wFSTs

    v1.0.4 #python #fst #wfst #nlp #linguistics
  63. vn-nlp-core

    Core types, traits, and errors for vn-nlp

    v0.1.3 #nlp #tokenize #vietnamese #linguistics
  64. rspanphon

    rough Rust port of the Python PanPhon library; extracts articulatory features from IPA strings and implements operations on them

    v0.1.4 #phoneme #articulatory #linguistics #nlp #phonology
  65. soundchange-english

    Reimplementation of Mark Rosenfelder's pronunciation algorithm for English

    v0.0.8 #linguistics #english
  66. reudh

    parsing and indexing word etymologies from etymonline.com

    v0.2.1 #etymonline #linguistics #english
  67. lemonwood

    Linguistics system permitting communication, bidirectional translation, etc., for my Hornvale project

    v0.1.0 #hornvale #linguistics #system
  68. sticker-encoders

    Encoders for linguistic features

    v0.5.1 #label #encoder #linguistics #tree #github #syntax-dot