State-of-the-art 2D and 3D Face Analysis Project
Awesome multilingual OCR toolkits based on PaddlePaddle
Robust Speech Recognition via Large-Scale Weak Supervision
Contexts Optical Compression
A Lightweight Face Recognition and Facial Attribute Analysis
OCRmyPDF adds an OCR text layer to scanned PDF files
Speech recognition module for Python
Qwen3-Coder is the code version of Qwen3
Image polygonal annotation with Python
Open-Source Python3 tool for recognizing layouts, tables, and math
A high-quality tool for converting PDF to Markdown and JSON
Models for the spaCy Natural Language Processing (NLP) library
Industrial-strength Natural Language Processing (NLP)
A framework to enable multimodal models to operate a computer
Formula recognition based on LaTeX-OCR and ONNXRuntime
An open and fair framework for everyone to build AI agents
Multilingual Automatic Speech Recognition with word-level timestamps
Toolkit for conversational AI
Capable of understanding text, audio, vision, video
A PyTorch-based Speech Toolkit
Han Language Processing
Repo of Qwen2-Audio chat & pretrained large audio language model
Replace OpenAI GPT with another LLM in your app
Multi-Scale Fusion of Locally-Global Descriptors for Place Recognition
Build voice-based LLM agents. Modular + open source