oroszgy

György Orosz oroszgy

Freelance NLP engineer

168 followers · 234 following

@ec-doris
Budapest, Hungary
16:58 (UTC +01:00)
https://gyorgy.orosz.link
in/oroszgy

Achievements

x2 x2


Highlights

Developer Program Member

Organizations

Stars

NLP tools

174 repositories

tensordot / syntaxdot

Neural syntax annotator, supporting sequence labeling, lemmatization, and dependency parsing.

Rust · 79 stars · 3 forks · Updated Oct 22, 2023

danieldk / dictomaton

Finite state dictionaries in Java

Java · 131 stars · 10 forks · Updated Feb 1, 2022

s3prl / s3prl

Self-Supervised Speech Pre-training and Representation Learning Toolkit

Python · 2,491 stars · 520 forks · Updated Jun 13, 2025

code-kern-ai / refinery

The data scientist's open-source choice to scale, assess and maintain natural language data. Treat training data like a software artifact.

Python · 1,470 stars · 74 forks · Updated Dec 9, 2024

openai / whisper

Robust Speech Recognition via Large-Scale Weak Supervision

Python · 92,232 stars · 11,560 forks · Updated Dec 15, 2025

huggingface / setfit

Efficient few-shot learning with Sentence Transformers

Jupyter Notebook · 2,647 stars · 253 forks · Updated Dec 11, 2025

ChenghaoMou / text-dedup

All-in-one text de-duplication

Python · 737 stars · 74 forks · Updated Aug 31, 2025

google-research / deduplicate-text-datasets

Rust · 1,256 stars · 127 forks · Updated Jul 30, 2024

terrier-org / pyterrier

A Python framework for performing information retrieval experiments, building on http://terrier.org/

Jupyter Notebook · 491 stars · 71 forks · Updated Dec 19, 2025

terrier-org / terrier-core

Terrier IR Platform

Java · 269 stars · 61 forks · Updated Dec 6, 2025

wisecubeai / graphster

Spark-based library that helps construct and query knowledge graphs from unstructured and structured data

Scala · 100 stars · 10 forks · Updated Sep 2, 2023

mammothb / symspellpy

Python port of SymSpell: 1 million times faster spelling correction & fuzzy search through Symmetric Delete spelling correction algorithm

Python · 853 stars · 126 forks · Updated Nov 28, 2025

mammothb / editdistpy

Fast edit distance Python extension written in Cython/C++. Supports Levenshtein distance and Damerau Optimal String Alignment (OSA) distance.

C++ · 25 stars · 5 forks · Updated Jun 7, 2025

koaning / embetter

just a bunch of useful embeddings for scikit-learn pipelines

Python · 520 stars · 16 forks · Updated Sep 29, 2025

koaning / bulk

A Simple Bulk Labelling Tool

Python · 598 stars · 50 forks · Updated Jul 29, 2025

koaning / cluestar

Gain clues from clustering!

Jupyter Notebook · 318 stars · 15 forks · Updated Jul 16, 2024

kotartemiy / newscatcher

Programmatically collect normalized news from (almost) any website.

Python · 2,974 stars · 285 forks · Updated Oct 30, 2020

KennethEnevoldsen / augmenty

Augmenty is an augmentation library based on spaCy for augmenting texts.

Python · 156 stars · 10 forks · Updated May 24, 2024

osyvokon / awesome-ukrainian-nlp

Curated list of Ukrainian natural language processing (NLP) resources (corpora, pretrained models, libraries, etc.)

223 stars · 22 forks · Updated Nov 3, 2025

explosion / spacy-experimental

🧪 Cutting-edge experimental spaCy components and features

Python · 105 stars · 19 forks · Updated Apr 23, 2024

MaksymDel / allennlp-light

allennlp-light is a port of AllenNLP's core modules and nn portions into a standalone package with minimum dependencies

Python · 55 stars · 4 forks · Updated Oct 12, 2022

explosion / curated-transformers

🤖 A PyTorch library of curated Transformer models and their composable components

Python · 894 stars · 36 forks · Updated Apr 17, 2024

jfilter / clean-text

🧹 Python package for text cleaning

Python · 999 stars · 80 forks · Updated May 9, 2023

jbesomi / texthero

Text preprocessing, representation and visualization from zero to hero.

Python · 2,918 stars · 239 forks · Updated Aug 29, 2023

facebookresearch / fairseq

Facebook AI Research Sequence-to-Sequence Toolkit written in Python.

Python · 32,037 stars · 6,637 forks · Updated Sep 30, 2025

JulesBelveze / concepcy

💫 spaCy wrapper for ConceptNet 💫

Python · 95 stars · 6 forks · Updated Aug 17, 2023

JulesBelveze / bert-squeeze

🛠️ Tools for Transformers compression using PyTorch Lightning ⚡

Python · 85 stars · 10 forks · Updated Dec 1, 2025

huggingface / evaluate

🤗 Evaluate: A library for easily evaluating machine learning models and datasets.

Python · 2,387 stars · 303 forks · Updated Nov 14, 2025

MaartenGr / BERTopic

Leveraging BERT and c-TF-IDF to create easily interpretable topics.

Python · 7,267 stars · 871 forks · Updated Dec 17, 2025

datamade / probablepeople

👪 A Python library for parsing unstructured Western names into name components.

Python · 615 stars · 74 forks · Updated May 15, 2025

