Starred repositories
A Simple Python Module for German Grapheme To Phoneme Conversion
[EMNLP 2025 Findings] TopXGen: Topic-Diverse Parallel Data Generation for Low-Resource Machine Translation
A tool for transcribing orthographic text as IPA (International Phonetic Alphabet)
Library for fast text representation and classification. Fix compatibility with numpy 2
Python module (C extension and plain python) implementing Aho-Corasick algorithm
Hands-on exercises for the "NLP for Dialects" MSc seminar at LMU Munich
Full named-entity (i.e., not tag/token) evaluation metrics based on SemEval’13
Scripts and metadata for the paper "Corpus-based dialectometry with topic models"
Flexible, extensible and scalable web-based speech annotation tool
Compound splitter for German language ("Komposita-Zerlegung") based on large dictionary combined with highly efficient multi-pattern string search
Journey towards Fine-Tuning a Breton speaking Chat Model
Curated list of open-access/open-source/off-the-shelf resources and tools developed with a particular focus on German
[LT4HALA 2020] Phonetic lexicon generator and sound change applier
clefourrier / awful-ai
Forked from daviddao/awful-ai. 😈 Awful AI is a curated list to track current scary usages of AI - hoping to raise awareness
A fully-fledged PyTorch package for Morphological Analysis, tailored to morphologically rich and historical languages.
Training and evaluation code for the paper "Headless Language Models: Learning without Predicting with Contrastive Weight Tying" (https://arxiv.org/abs/2309.08351)
Generalist and Lightweight Model for Named Entity Recognition (Extract any entity types from texts)
Repository to track the progress in Natural Language Processing (NLP), including the datasets and the current state-of-the-art for the most common NLP tasks.
Extension for pie to include taggers with their models and pre/postprocessors
A simple and efficient tool to parallelize Pandas operations on all available CPUs
An interactive NVIDIA-GPU process viewer and beyond, the one-stop solution for GPU process management.
A CoNLL-U parser that takes a CoNLL-U formatted string and turns it into a nested python dictionary.
Interactive Widgets for the Jupyter Notebook