  • Fairfax, VA


Framework based on a vector database to store, manage and curate large image datasets

Python 83 6 Updated Jan 29, 2026

An English lexical database from the Big 🍎, let's go Mets baby love da Mets

5 Updated Oct 10, 2025

FAIR Sequence Modeling Toolkit 2

Python 1,128 137 Updated Mar 31, 2026

A library for data streaming and augmentation

Python 21 4 Updated May 5, 2025

Code for SaGe subword tokenizer (EACL 2023)

Python 28 7 Updated Nov 30, 2024

This repository contains an extension of fairseq for pixel / visual representations of text for machine translation.

Python 37 5 Updated Feb 2, 2024

A toolkit to create, launch and monitor SLURM jobs over existing Python scripts.

Python 12 2 Updated May 13, 2024

Foundation Architecture for (M)LLMs

Python 3,132 224 Updated Apr 11, 2024

remote pbcopy over ssh

Go 22 1 Updated Mar 24, 2024

A tool for holistic analysis of language generation systems

Python 471 58 Updated Sep 22, 2025

MAFAND-MT

Jupyter Notebook 62 31 Updated Jul 9, 2024

Open information and community for machine translation

HTML 81 65 Updated Mar 30, 2026

A PyTorch repo for data loading and utilities to be shared by the PyTorch domain libraries.

Python 1,251 174 Updated Apr 1, 2026

Code and data for the IWSLT 2022 shared task on Formality Control for SLT

Ruby 22 6 Updated May 24, 2023

Cross language information retrieval pipeline

Python 19 6 Updated Jan 12, 2026

Hackable and optimized Transformers building blocks, supporting a composable construction.

Python 10,394 775 Updated Mar 30, 2026

Learned string similarity for entity names using optimal transport.

Python 35 3 Updated Nov 17, 2020

A Python package to run contextualized topic modeling. CTMs combine contextualized embeddings (e.g., BERT) with topic models to get coherent topics. Published at EACL and ACL 2021 (Bianchi et al.).

Python 1,266 151 Updated Jul 24, 2025

State-of-the-Art Text Embeddings

Python 18,482 2,769 Updated Mar 25, 2026

docTR (Document Text Recognition) - a seamless, high-performing & accessible library for OCR-related tasks powered by Deep Learning.

Python 5,990 633 Updated Mar 29, 2026

This repository contains the code and implementation details of the CascadeTabNet paper "CascadeTabNet: An approach for end to end table detection and structure recognition from image-based documents"

Python 1,552 426 Updated Aug 27, 2021

Open-source AI orchestration framework for building context-engineered, production-ready LLM applications. Design modular pipelines and agent workflows with explicit control over retrieval, routing…

MDX 24,676 2,690 Updated Apr 1, 2026

Models, data loaders and abstractions for language processing, powered by PyTorch

Python 3,562 815 Updated Sep 10, 2025

A data augmentations library for audio, image, text, and video.

Python 5,070 311 Updated Mar 31, 2026

OPUS-CAT is a collection of software which makes it possible to use OPUS-MT neural machine translation models in professional translation. OPUS-CAT includes a local offline MT engine and a collection of…

C# 85 12 Updated Feb 4, 2025

skweak: A software toolkit for weak supervision applied to NLP tasks

Python 927 77 Updated Sep 2, 2024

Document Layout Analysis

Python 404 33 Updated Mar 27, 2026

document image degradation

Jupyter Notebook 1 Updated May 18, 2020

A Unified Toolkit for Deep Learning Based Document Image Analysis

Python 5,692 529 Updated Aug 15, 2024

A PyTorch-based Speech Toolkit

Python 11,404 1,676 Updated Mar 31, 2026