oroszgy

Organizations

@ec-doris @huspacy

Stars

NLP tools

174 repositories

A machine learning tool for fishing entities

Java 266 24 Updated May 23, 2025

Web Service for E-Discovery Analytics

Python 78 19 Updated Jun 21, 2022

Python Implementations of Word Sense Disambiguation (WSD) Technologies.

Python 748 131 Updated Jul 29, 2022

Clean, filter and sample URLs to optimize data collection – Python & command-line – Deduplication, spam, content and language filters

Python 154 11 Updated Dec 19, 2025

A library for preparing data for machine translation research (monolingual preprocessing, bitext mining, etc.) built by the FAIR NLLB team.

Python 287 43 Updated Oct 14, 2025

An approximate string matching or fuzzy-matching system for spelling correction, normalisation or post-OCR correction (mirror of https://codeberg.org/proycon/analiticcl)

Rust 37 4 Updated Oct 3, 2025

A robust Python tool for text-based AI training and generation using GPT-2.

Python 1,843 220 Updated Jul 14, 2023

LexNLP by LexPredict

Jupyter Notebook 757 194 Updated May 27, 2024

Coreference Resolution, Simplification and Open Relation Extraction Pipeline

Java 137 38 Updated Jun 24, 2021

Full named-entity (i.e., not tag/token) evaluation metrics based on SemEval’13

Python 198 24 Updated Sep 6, 2025

Data augmentation for NLP

Jupyter Notebook 4,637 473 Updated Jun 24, 2024

Implementation of the ClausIE information extraction system for python+spacy

Python 226 36 Updated Aug 8, 2022

👄 Fork of the language detector Lingua, with the intention to increase detection speed and reduce memory consumption

Kotlin 6 2 Updated Nov 28, 2025

State-of-the-Art Text Embeddings

Python 18,030 2,720 Updated Dec 22, 2025

☁️ Build multimodal AI applications with cloud-native stack

Python 21,808 2,239 Updated Mar 24, 2025

jiant is an NLP toolkit

Python 1,674 297 Updated Jul 6, 2023

Python & Command-line tool to gather text and metadata on the Web: Crawling, scraping, extraction, output as CSV, JSON, HTML, MD, TXT, XML

Python 5,068 334 Updated Sep 12, 2025

🍳 Recipes for Prodigy, our fully scriptable annotation tool

Jupyter Notebook 504 114 Updated Aug 4, 2024

Official Implementation of OCR-free Document Understanding Transformer (Donut) and Synthetic Document Generator (SynthDoG), ECCV 2022

Python 6,721 550 Updated Jul 11, 2024

Qdrant - High-performance, massive-scale Vector Database and Vector Search Engine for the next generation of AI. Also available in the cloud https://cloud.qdrant.io/

Rust 27,762 1,943 Updated Dec 22, 2025

Algorithms for explaining machine learning models

Python 2,602 263 Updated Oct 17, 2025

Argilla is a collaboration tool for AI engineers and domain experts to build high-quality datasets

Python 4,788 464 Updated Dec 15, 2025

Simple multilingual lemmatizer for Python, especially useful for speed and efficiency

Python 180 16 Updated Jun 6, 2025

Beyond Accuracy: Behavioral Testing of NLP models with CheckList

Jupyter Notebook 2,048 210 Updated Jan 9, 2024

AI orchestration framework to build customizable, production-ready LLM applications. Connect components (models, vector DBs, file converters) to pipelines or agents that can interact with your data…

MDX 23,698 2,535 Updated Dec 19, 2025

An easy to use Neural Search Engine. Index latent vectors along with JSON metadata and do efficient k-NN search.

HTML 379 25 Updated May 6, 2024

Beautiful visualizations of how language differs among document types.

Python 2,327 288 Updated Apr 29, 2025

Flexible components pairing 🤗 Transformers with ⚡ PyTorch Lightning

Python 613 75 Updated Nov 21, 2022

💥 Explosion Assets

45 5 Updated Dec 10, 2023

Context-sensitive word embeddings with subwords. In Rust.

Rust 89 5 Updated Oct 20, 2023