Stars
Nyarlth / higgs-audio_quantized
Forked from boson-ai/higgs-audio
Quantized text-audio foundation model from Boson AI
SoloSpeech: Enhancing Intelligibility and Quality in Target Speech Extraction through a Cascaded Generative Pipeline
This package loads the espeak-ng shared library so it will be available for other libraries.
Live2D Library for Python (C++ impl): Supports model loading, lip-sync, basic face rigging, and precise click test.
An implementation of MeloTTS using onnxruntime
A passive recording project that gives you complete control over your data. It automatically takes screenshots of all your screens, indexes them, and saves them locally.
Gradio WebUI for AdvancedLivePortrait
洛曦 (Luoxi) digital human video player with an HTTP API. It connects to Easy-Wav2Lip, Sadtalker, GeneFacePlusPlus, and MuseTalk through the Gradio API, and can also play local videos.
The fastest digital human algorithm, now on your desktop.
GUI version of GOT-OCR, providing OCR, PDF export, batch processing, and other features, but not training.
Official code for "F5-TTS: A Fairytaler that Fakes Fluent and Faithful Speech with Flow Matching"
[ICLR 2025] Official implementation of Posterior-Mean Rectified Flow: Towards Minimum MSE Photo-Realistic Image Restoration
Moshi is a speech-text foundation model and full-duplex spoken dialogue framework. It uses Mimi, a state-of-the-art streaming neural audio codec.
Industry leading face manipulation platform
Zero-shot voice conversion and singing voice conversion, with real-time support
An easy-to-use web framework. Supports both WSGI and ASGI modes. Gevent or asyncio, this is the question.
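GPT-SoVITS-V2 model with some official PRs merged in, including but not limited to: automatic reference audio filling, subtitle synchronization, and SillyTavern integration.
An API for the RWKV_Role_Playing project, implemented with Flask.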