Stars

NLP tools

174 repositories

Source code for paper "Learning from Noisy Labels for Entity-Centric Information Extraction", EMNLP 2021

Python 55 7 Updated Dec 11, 2021

Fast computation of Krippendorff's alpha agreement measure in Python.

Python 153 17 Updated Dec 1, 2025

🐸💬 - a deep learning toolkit for Text-to-Speech, battle-tested in research and production

Python 43,961 5,859 Updated Aug 16, 2024

🐸STT - The deep learning toolkit for Speech-to-Text. Training and deploying STT models has never been so easy.

C++ 2,544 297 Updated Mar 11, 2024

prompt2model - Generate Deployable Models from Natural Language Instructions

Python 2,006 183 Updated Dec 29, 2024

Python package of Tomoto, the Topic Modeling Tool

C++ 586 63 Updated Aug 7, 2024

Rust re-implementation of OpenFST - library for constructing, combining, optimizing, and searching weighted finite-state transducers (FSTs). A Python binding is also available.

Rust 177 19 Updated Jul 11, 2025

Explore and interpret large embeddings in your browser with interactive visualization! 📍

TypeScript 509 35 Updated Aug 2, 2025

GitHub repository for the paper "Is ChatGPT the Ultimate Data Augmentation Algorithm", published in Findings of EMNLP 2023

Python 1 Updated Oct 17, 2023

What's In My Big Data (WIMBD) - a toolkit for analyzing large text datasets

Python 225 21 Updated Nov 16, 2024

Module for automatic summarization of text documents and HTML pages.

Python 3,654 540 Updated Nov 24, 2025

Full text geoparsing/toponym resolution with event geolocation

Python 81 21 Updated Dec 9, 2025

Toolkit to segment text into sentences or other semantic units in a robust, efficient and adaptable way.

Python 1,211 75 Updated Dec 5, 2025

Easily embed, cluster and semantically label text datasets

Python 586 46 Updated Mar 28, 2024

Generalist and Lightweight Model for Named Entity Recognition (Extract any entity types from texts) @ NAACL 2024

Python 2,631 240 Updated Dec 20, 2025

Distilabel is a framework for synthetic data and AI feedback for engineers who need fast, reliable and scalable pipelines based on verified research papers.

Python 2,989 222 Updated Dec 15, 2025

Generalist and Lightweight Model for Text Classification

Python 167 15 Updated Dec 3, 2025

Fine-tune ModernBERT on a large Dataset with Custom Tokenizer Training

Python 74 12 Updated Oct 18, 2025

Next-generation Punkt sentence boundary detection with zero dependencies

Python 26 1 Updated Nov 18, 2025

FastFit ⚡ When LLMs are Unfit Use FastFit ⚡ Fast and Effective Text Classification with Many Classes

Python 213 22 Updated Sep 18, 2025

Citron is an experimental quote extraction system created by BBC R&D

Python 36 7 Updated Dec 14, 2021

Nomic Developer API SDK

Python 1,859 204 Updated Nov 11, 2025

A Python library for extracting structured information from unstructured text using LLMs with precise source grounding and interactive visualization.

Python 18,046 1,291 Updated Nov 27, 2025

Unified Schema-Based Information Extraction

Python 391 40 Updated Dec 19, 2025