Stars
🦛 CHONK docs with Chonkie ✨ — The lightweight ingestion library for fast, efficient and robust RAG pipelines
Optimus is a flexible and scalable framework built to train language models efficiently across diverse hardware configurations, including CPU, AMD, and NVIDIA GPUs.
Additional resources from our AACL tutorial
Chitralekha - A video transcreation platform for Indic languages, supporting transcription, translation and voice-over
Translation models for 22 scheduled languages of India
Whisper realtime streaming for long speech-to-text transcription and translation
Various transformers for FSDP research
Accessible large language models via k-bit quantization for PyTorch.
PyScript is an open source platform for Python in the browser. Try PyScript: https://pyscript.com Examples: https://tinyurl.com/pyscript-examples Community: https://discord.gg/HxvBtukrg2
[EACL'23] MCoNaLa: A Benchmark for Code Generation from Multiple Natural Languages
Pre-trained, multilingual sequence-to-sequence models for Indian languages
Repository for the English-Hindi Codemixed to Monolingual English Parallel Corpus
The Learning Interpretability Tool: Interactively analyze ML models to understand their behavior in an extensible and framework agnostic interface.
A parallel evaluation data set of SAP software documentation with document structure annotation
An open-source Python framework for hybrid quantum-classical machine learning.
This project is a real-time visualization of a network recognizing digits from the user's input.
PyTorch implementation of GAN-based text-to-speech synthesis and voice conversion (VC)
Japanese-Russian-English News Commentary Parallel Data
Pun-GAN: Generative Adversarial Network for Pun Generation (EMNLP 2019)
Making Art with Deep Learning Workshop | ML@B
A project to visualize global weather conditions
Xlit-Crowd: Hindi-English Transliteration Corpus