- Budapest, Hungary
- https://gyorgy.orosz.link
- in/oroszgy
NLP tools
Source code for the paper "Learning from Noisy Labels for Entity-Centric Information Extraction", EMNLP 2021
Fast computation of Krippendorff's alpha agreement measure in Python.
🐸💬 - a deep learning toolkit for Text-to-Speech, battle-tested in research and production
🐸STT - The deep learning toolkit for Speech-to-Text. Training and deploying STT models has never been so easy.
prompt2model - Generate Deployable Models from Natural Language Instructions
Rust re-implementation of OpenFST, a library for constructing, combining, optimizing, and searching weighted finite-state transducers (FSTs). A Python binding is also available.
Explore and interpret large embeddings in your browser with interactive visualization! 📍
GitHub repository for the paper "Is ChatGPT the Ultimate Data Augmentation Algorithm?", published in Findings of EMNLP 2023
What's In My Big Data (WIMBD) - a toolkit for analyzing large text datasets
Module for automatic summarization of text documents and HTML pages.
Full text geoparsing/toponym resolution with event geolocation
Toolkit to segment text into sentences or other semantic units in a robust, efficient and adaptable way.
Easily embed, cluster and semantically label text datasets
Generalist and Lightweight Model for Named Entity Recognition (Extract any entity types from texts) @ NAACL 2024
Distilabel is a framework for synthetic data and AI feedback for engineers who need fast, reliable and scalable pipelines based on verified research papers.
Generalist and Lightweight Model for Text Classification
Fine-tune ModernBERT on a large dataset with custom tokenizer training
Next-generation Punkt sentence boundary detection with zero dependencies
FastFit ⚡ When LLMs are unfit, use FastFit ⚡ Fast and effective text classification with many classes
Citron is an experimental quote extraction system created by BBC R&D
A Python library for extracting structured information from unstructured text using LLMs with precise source grounding and interactive visualization.