- Budapest, Hungary
- https://gyorgy.orosz.link
- in/oroszgy
NLP tools
A machine learning tool for fishing entities
Web Service for E-Discovery Analytics
Python Implementations of Word Sense Disambiguation (WSD) Technologies.
Clean, filter and sample URLs to optimize data collection – Python & command-line – Deduplication, spam, content and language filters
A library for preparing data for machine translation research (monolingual preprocessing, bitext mining, etc.) built by the FAIR NLLB team.
An approximate string matching or fuzzy-matching system for spelling correction, normalisation or post-OCR correction (mirror of https://codeberg.org/proycon/analiticcl)
A robust Python tool for text-based AI training and generation using GPT-2.
Coreference Resolution, Simplification and Open Relation Extraction Pipeline
Full named-entity (i.e., not tag/token) evaluation metrics based on SemEval’13
Implementation of the ClausIE information extraction system for Python + spaCy
👄 Fork of the language detector Lingua, with the intention to increase detection speed and reduce memory consumption
State-of-the-Art Text Embeddings
☁️ Build multimodal AI applications with cloud-native stack
Python & Command-line tool to gather text and metadata on the Web: Crawling, scraping, extraction, output as CSV, JSON, HTML, MD, TXT, XML
🍳 Recipes for Prodigy, our fully scriptable annotation tool
Official Implementation of OCR-free Document Understanding Transformer (Donut) and Synthetic Document Generator (SynthDoG), ECCV 2022
Qdrant - High-performance, massive-scale Vector Database and Vector Search Engine for the next generation of AI. Also available in the cloud https://cloud.qdrant.io/
Algorithms for explaining machine learning models
Argilla is a collaboration tool for AI engineers and domain experts to build high-quality datasets
Simple multilingual lemmatizer for Python, especially useful for speed and efficiency
Beyond Accuracy: Behavioral Testing of NLP models with CheckList
AI orchestration framework to build customizable, production-ready LLM applications. Connect components (models, vector DBs, file converters) to pipelines or agents that can interact with your data…
An easy-to-use Neural Search Engine. Index latent vectors along with JSON metadata and do efficient k-NN search.
Beautiful visualizations of how language differs among document types.
Flexible components pairing 🤗 Transformers with ⚡ PyTorch Lightning
Context-sensitive word embeddings with subwords. In Rust.