- YYCMedia
- Wuhan, China
Starred repositories
Stable Diffusion web UI
The most powerful and modular diffusion model GUI, API, and backend with a graph/nodes interface.
Robust Speech Recognition via Large-Scale Weak Supervision
Real-time face swap and one-click video deepfake with only a single image
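Chinese and English sensitive words, language detection, Chinese and foreign mobile/landline number location and carrier lookup, gender inference from names, phone number extraction, ID card number extraction, email extraction, Chinese and Japanese name databases, Chinese abbreviation database, character-decomposition dictionary, word sentiment values, stop words, reactionary word list, violence and terrorism word list, Traditional/Simplified Chinese conversion, English imitation of Chinese pronunciation, Wang Feng lyrics generator, occupation name lexicon, synonym lexicon, antonym lexicon, negation word lexicon, car brand lexicon, car parts lexicon, continuous English text segmentation, various Chinese word vectors, comprehensive company name list, classical poetry corpus, IT lexicon, finance lexicon, idiom lexicon, place name lexicon, …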
Turn any PDF or image document into structured data for your AI. A powerful, lightweight OCR toolkit that bridges the gap between images/PDFs and LLMs. Supports 100+ languages.
🚀🤖 Crawl4AI: Open-source LLM Friendly Web Crawler & Scraper. Don't be shy, join here: https://discord.gg/jP8KfhDhyN
Clone a voice in 5 seconds to generate arbitrary speech in real-time
One minute of voice data can also be used to train a good TTS model! (few-shot voice cloning)
An AI skill that provides design intelligence for building professional UI/UX across multiple platforms
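Xiaohongshu note and comment crawler, Douyin video and comment crawler, Kuaishou video and comment crawler, Bilibili video and comment crawler, Weibo post and comment crawler, Baidu Tieba post and comment-reply crawler, Zhihu Q&A article and comment crawler
微舆: a multi-agent public opinion analysis assistant that anyone can use. It breaks information cocoons, restores the true picture of public opinion, predicts future trends, and supports decision-making. Built from scratch, with no dependence on any framework.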
A community-supported supercharged document management system: scan, index and archive all your documents
🚀Clone a voice in 5 seconds to generate arbitrary speech in real-time
Detectron2 is a platform for object detection, segmentation and other visual recognition tasks.
OCRmyPDF adds an OCR text layer to scanned PDF files, allowing them to be searched
Real-time face swap for PC streaming or video calls
Ready-to-use OCR with 80+ supported languages and all popular writing scripts including Latin, Chinese, Arabic, Devanagari, Cyrillic, etc.
LabelImg is now part of the Label Studio community. The popular image annotation tool created by Tzutalin is no longer actively being developed, but you can check out Label Studio, the open source …
Python version of the Playwright testing and automation library.
The easy-to-use and developer-friendly enterprise CMS powered by Django
A Deep-Learning-Based Chinese Speech Recognition System
Gradio WebUI for creators and developers, featuring key TTS (Edge-TTS, kokoro) and zero-shot Voice Cloning (E2 & F5-TTS, CosyVoice), with Whisper audio processing, YouTube download, Demucs vocal is…
MCP Server for Computer Use in Windows
Lightweight, scriptable browser as a service with an HTTP API
SoulX-Podcast is an inference codebase by the Soul AI team for generating high-fidelity podcasts from text.
Implementation of "Live Avatar: Streaming Real-time Audio-Driven Avatar Generation with Infinite Length"