View oroszgy's full-sized avatar

Organizations

@ec-doris @huspacy

Stars

NLP tools

174 repositories

Neural syntax annotator, supporting sequence labeling, lemmatization, and dependency parsing.

Rust 79 3 Updated Oct 22, 2023

Finite state dictionaries in Java

Java 131 10 Updated Feb 1, 2022

Self-Supervised Speech Pre-training and Representation Learning Toolkit

Python 2,491 520 Updated Jun 13, 2025

The data scientist's open-source choice to scale, assess and maintain natural language data. Treat training data like a software artifact.

Python 1,470 74 Updated Dec 9, 2024

Robust Speech Recognition via Large-Scale Weak Supervision

Python 92,232 11,560 Updated Dec 15, 2025

Efficient few-shot learning with Sentence Transformers

Jupyter Notebook 2,647 253 Updated Dec 11, 2025

All-in-one text de-duplication

Python 737 74 Updated Aug 31, 2025

A Python framework for performing information retrieval experiments, building on http://terrier.org/

Jupyter Notebook 491 71 Updated Dec 19, 2025

Terrier IR Platform

Java 269 61 Updated Dec 6, 2025

Spark-based library that helps construct and query knowledge graphs from unstructured and structured data

Scala 100 10 Updated Sep 2, 2023

Python port of SymSpell: 1 million times faster spelling correction & fuzzy search through Symmetric Delete spelling correction algorithm

Python 853 126 Updated Nov 28, 2025

Fast edit distance Python extension written in Cython/C++. Supports Levenshtein distance and Damerau Optimal String Alignment (OSA) distance.

C++ 25 5 Updated Jun 7, 2025

just a bunch of useful embeddings for scikit-learn pipelines

Python 520 16 Updated Sep 29, 2025

A Simple Bulk Labelling Tool

Python 598 50 Updated Jul 29, 2025

Gain clues from clustering!

Jupyter Notebook 318 15 Updated Jul 16, 2024

Programmatically collect normalized news from (almost) any website.

Python 2,974 285 Updated Oct 30, 2020

Augmenty is an augmentation library based on spaCy for augmenting texts.

Python 156 10 Updated May 24, 2024

Curated list of Ukrainian natural language processing (NLP) resources (corpora, pretrained models, libraries, etc.)

223 22 Updated Nov 3, 2025

🧪 Cutting-edge experimental spaCy components and features

Python 105 19 Updated Apr 23, 2024

allennlp-light is a port of AllenNLP's core modules and nn portions into a standalone package with minimum dependencies

Python 55 4 Updated Oct 12, 2022

🤖 A PyTorch library of curated Transformer models and their composable components

Python 894 36 Updated Apr 17, 2024

🧹 Python package for text cleaning

Python 999 80 Updated May 9, 2023

Text preprocessing, representation and visualization from zero to hero.

Python 2,918 239 Updated Aug 29, 2023

Facebook AI Research Sequence-to-Sequence Toolkit written in Python.

Python 32,037 6,637 Updated Sep 30, 2025

💫 SpaCy wrapper for ConceptNet 💫

Python 95 6 Updated Aug 17, 2023

🛠️ Tools for Transformers compression using PyTorch Lightning ⚡

Python 85 10 Updated Dec 1, 2025

🤗 Evaluate: A library for easily evaluating machine learning models and datasets.

Python 2,387 303 Updated Nov 14, 2025

Leveraging BERT and c-TF-IDF to create easily interpretable topics.

Python 7,267 871 Updated Dec 17, 2025

👪 A Python library for parsing unstructured Western names into name components.

Python 615 74 Updated May 15, 2025