Stars
Langflow is a powerful tool for building and deploying AI-powered agents and workflows.
Real-time face swap and one-click video deepfake with only a single image
scikit-learn: machine learning in Python
A natural language interface for computers
As little as 1 minute of voice data can be used to train a good TTS model! (few-shot voice cloning)
Transforms complex documents like PDFs into LLM-ready markdown/JSON for your Agentic workflows.
Generate high-definition short videos with one click using large AI models (LLMs).
Instant voice cloning by MIT and MyShell. Audio foundation model.
Multi-agent framework, runtime and control plane. Built for speed, privacy, and scale.
Your AI second brain. Self-hostable. Get answers from the web or your docs. Build custom agents, schedule automations, do deep research. Turn any online or local LLM into your personal, autonomous …
[EMNLP 2025 Demo] PDF scientific paper translation with preserved formats - AI-based full-text bilingual translation of PDF documents with the layout fully preserved; supports services such as Google/DeepL/Ollama/OpenAI; provides CLI/GUI/MCP/Docker/Zotero
Ready-to-use OCR with 80+ supported languages and all popular writing scripts including Latin, Chinese, Arabic, Devanagari, Cyrillic, etc.
🤗 smolagents: a barebones library for agents that think in code.
MiniCPM-V 4.5: A GPT-4o Level MLLM for Single Image, Multi Image and High-FPS Video Understanding on Your Phone
A Lightweight Face Recognition and Facial Attribute Analysis (Age, Gender, Emotion and Race) Library for Python
🔥 MaxKB is a powerful, easy-to-use open-source platform for building enterprise-grade agents.
OCR, layout analysis, reading order, table recognition in 90+ languages
A Deep Learning based project for colorizing and restoring old images (and video!)
🦉 OWL: Optimized Workforce Learning for General Multi-Agent Assistance in Real-World Task Automation
Automate browser based workflows with AI
Let's make video diffusion practical!
Netflix-level subtitle cutting, translation, alignment, and even dubbing - one-click fully automated AI video subtitle team
Question and Answer based on Anything.
Official code for "F5-TTS: A Fairytaler that Fakes Fluent and Faithful Speech with Flow Matching"
A Fundamental End-to-End Speech Recognition Toolkit and Open Source SOTA Pretrained Models, Supporting Speech Recognition, Voice Activity Detection, Text Post-processing, etc.
Minimal reproduction of DeepSeek R1-Zero