State-of-the-art 2D and 3D Face Analysis Project
Robust Speech Recognition via Large-Scale Weak Supervision
Awesome multilingual OCR toolkits based on PaddlePaddle
Contexts Optical Compression
A GUI tool for extracting hard-coded subtitles (hardsubs) from videos
A Lightweight Face Recognition and Facial Attribute Analysis Library
OCRmyPDF adds an OCR text layer to scanned PDF files
Speech recognition module for Python
Open-Source Python3 tool for recognizing layouts, tables, and math
Ready-to-use OCR with 80+ supported languages
Qwen3-Coder is the code version of Qwen3
A high-quality tool for converting PDF to Markdown and JSON
Models for the spaCy Natural Language Processing (NLP) library
Industrial-strength Natural Language Processing (NLP)
Image polygonal annotation with Python
Capable of understanding text, audio, vision, video
Library for OCR-related tasks powered by Deep Learning
A framework to enable multimodal models to operate a computer
Repo of Qwen2-Audio chat & pretrained large audio language model
A PyTorch-based Speech Toolkit
Toolkit for conversational AI
An open and fair framework for everyone to build AI agents
kaldi-asr/kaldi is the official location of the Kaldi project
Formula recognition based on LaTeX-OCR and ONNXRuntime
Replace OpenAI GPT with another LLM in your app