Stars
Official Python inference and LoRA trainer package for the LTX-2 audio–video generative model.
ettingshausen / CheckChrome
Forked from SukkaW/CheckChrome
🌐 Yet another Chrome offline package checker
Multilingual Document Layout Parsing in a Single Vision-Language Model
📱 Display and control your Android device graphically with scrcpy.
Dead simple FLUX LoRA training UI with LOW VRAM support
A voice cloning tool with a web interface that uses your voice or any other sound to record audio
An extremely simple tool for separating vocals and background music. It runs entirely locally through a web interface, needs no internet connection, and uses 2stems/4stems/5stems models
Voice recognition to text tool: an offline, local tool that converts audio and video to subtitles, with output in JSON, SRT subtitle, and plain-text formats
Faster Whisper transcription with CTranslate2
Unified-Modal Speech-Text Pre-Training for Spoken Language Processing
Production-ready platform for agentic workflow development.
This is a multi-character, ultra-personalized StoryTeller. It includes: 1) efficient and accurate building of a multi-character voice library; 2) effective large-model prompts that use the large model …
This is a speech interaction system built on open-source models, integrating ASR, LLM, and TTS in sequence. The ASR model is SenseVoice, the LLM models are Qwen2.5-0.5B/1.5B, and there are three …
🚀 Truly open-source AI avatar (digital human) toolkit for offline video generation and digital human cloning.
A Docker-free offline version of HeyGem; Python and Linux are all you need!
Implements harmful/harmless refusal removal using pure HF Transformers
An open-source AI agent that brings the power of Gemini directly into your terminal.
Get up and running with Kimi-K2.5, GLM-5, MiniMax, DeepSeek, gpt-oss, Qwen, Gemma and other models.
Deploy a Gemini multimodal chat website in 10 seconds, serverless! All you need is a Gemini API key.
Brave browser for Android, iOS, Linux, macOS, Windows.
🚀🤖 Crawl4AI: Open-source LLM Friendly Web Crawler & Scraper. Don't be shy, join here: https://discord.gg/jP8KfhDhyN
Official implementation code of the paper <AnyText: Multilingual Visual Text Generation And Editing>
[CVPR 2024] Depth Anything: Unleashing the Power of Large-Scale Unlabeled Data. Foundation Model for Monocular Depth Estimation
AI & parametric QR code generator. https://qrbtf.com