Starred repositories
Sharp Monocular View Synthesis in Less Than a Second
This repository allows reproduction of Poetiq's record-breaking submission to the ARC-AGI-1 and ARC-AGI-2 benchmarks.
The official repo for “Dolphin: Document Image Parsing via Heterogeneous Anchor Prompting”, ACL, 2025.
An intelligent load balancer for LM Studio that distributes requests across multiple loaded language models, optimizing resource utilization and response times.
Agentic Design Patterns: A Hands-On Guide to Building Intelligent Systems by Antonio Gulli
🎨 NeMo Data Designer: A general library for generating high-quality synthetic data from scratch or based on seed data.
A lightweight LMM-based Document Parsing Model
Toolkit for linearizing PDFs for LLM datasets/training
Data and tools for generating and inspecting OLMo pre-training data.
Multilingual Document Layout Parsing in a Single Vision-Language Model
OCR model that handles complex tables, forms, and handwriting with full layout.
A Python notebook that uses Mistral's API to convert a PDF document into an accessible HTML document
A Dockerized Python script to fetch Garmin health data and load it into an InfluxDB database, for visualizing long-term health trends with Grafana
RAG on Everything with LEANN. Enjoy 97% storage savings while running a fast, accurate, and 100% private RAG application on your personal device.
Evolution Pretraining Fully in Int Formats
The Semantic Infrastructure for AI Apps
📑 PageIndex: Document Index for Reasoning-based RAG
Tensorlake is a Document Ingestion API and a serverless platform for building data processing and orchestration APIs
Omnilingual ASR: Open-Source Multilingual Speech Recognition for 1600+ Languages
PipesHub is a fully extensible and explainable workplace AI platform for enterprise search and workflow automation
Video-based AI memory library. Store millions of text chunks in MP4 files with lightning-fast semantic search. No database needed.
Fine-tuning & Reinforcement Learning for LLMs. 🦥 Train OpenAI gpt-oss, DeepSeek-R1, Qwen3, Gemma 3, TTS 2x faster with 70% less VRAM.
Universal LLM Deployment Engine with ML Compilation
Analytics, Versioning and ETL for multimodal data: video, audio, PDFs, images