Stars
Langflow is a powerful tool for building and deploying AI-powered agents and workflows.
Robust Speech Recognition via Large-Scale Weak Supervision
Turn any PDF or image document into structured data for your AI. A powerful, lightweight OCR toolkit that bridges the gap between images/PDFs and LLMs. Supports 100+ languages.
As little as 1 minute of voice data can be used to train a good TTS model! (few-shot voice cloning)
An open-source long-horizon SuperAgent harness that researches, codes, and creates. With the help of sandboxes, memories, tools, skills, subagents, and a message gateway, it handles different levels of…
Easily train a good VC model with voice data <= 10 mins!
Multilingual large voice generation model with full-stack support for inference, training, and deployment.
An Industrial-Level Controllable and Efficient Zero-Shot Text-To-Speech System
Wan: Open and Advanced Large-Scale Video Generative Models
Android-in-Docker solution with noVNC support and video recording
Zonos-v0.1 is a leading open-weight text-to-speech model trained on more than 200k hours of varied multilingual speech, delivering expressiveness and quality on par with—or even surpassing—top TTS …
Simple Online Realtime Tracking with a Deep Association Metric
ECCV2022 - Real-Time Intermediate Flow Estimation for Video Frame Interpolation
Using blind (invisible) watermarks to protect creators' intellectual property
[CVPR2026]🚀🚀🚀Official code for the paper "YOLO-Master: MOE-Accelerated with Specialized Transformers for Enhanced Real-time Detection." *(YOLO = You Only Look Once)* 🔥🔥🔥
Follow my CSDN: https://blog.csdn.net/u012465304