- Budapest, Hungary
- (UTC +01:00)
- https://gyorgy.orosz.link
- in/oroszgy
Highlights
NLP tools
Neural syntax annotator, supporting sequence labeling, lemmatization, and dependency parsing.
Self-Supervised Speech Pre-training and Representation Learning Toolkit
The data scientist's open-source choice to scale, assess and maintain natural language data. Treat training data like a software artifact.
Robust Speech Recognition via Large-Scale Weak Supervision
Efficient few-shot learning with Sentence Transformers
A Python framework for performing information retrieval experiments, building on http://terrier.org/
Spark-based library that helps construct and query knowledge graphs from unstructured and structured data
Python port of SymSpell: 1 million times faster spelling correction & fuzzy search through Symmetric Delete spelling correction algorithm
Fast edit distance Python extension written in Cython/C++. Supports Levenshtein distance and Damerau Optimal String Alignment (OSA) distance.
just a bunch of useful embeddings for scikit-learn pipelines
Programmatically collect normalized news from (almost) any website.
Augmenty is an augmentation library based on spaCy for augmenting texts.
Curated list of Ukrainian natural language processing (NLP) resources (corpora, pretrained models, libraries, etc.)
🧪 Cutting-edge experimental spaCy components and features
allennlp-light is a port of AllenNLP's core modules and nn portions into a standalone package with minimum dependencies
🤖 A PyTorch library of curated Transformer models and their composable components
Text preprocessing, representation and visualization from zero to hero.
Facebook AI Research Sequence-to-Sequence Toolkit written in Python.
🛠️ Tools for Transformers compression using PyTorch Lightning ⚡
🤗 Evaluate: A library for easily evaluating machine learning models and datasets.
Leveraging BERT and c-TF-IDF to create easily interpretable topics.
👪 A Python library for parsing unstructured Western names into name components.