Starred repositories

Very low latency speech to text, intent recognition, and text to speech, for building voice agents and interfaces

C 8,445 458 Updated Jun 2, 2026

A comprehensive collection of Agent Skills for context engineering, multi-agent architectures, and production agent systems. Use when building, optimizing, or debugging agent systems that require e…

Python 16,546 1,343 Updated May 26, 2026

Automated futures trading

C 8,255 1,897 Updated Feb 28, 2026

Whisper encoder (extracted from a pretrained model) with a linear layer on top, trained using the CTC criterion

Python 7 2 Updated Jul 3, 2023

A native-PyTorch library for large scale M-LLM (text/audio) training with tp/cp/dp.

Python 230 30 Updated Apr 8, 2026

Text-audio foundation model from Boson AI

Python 8,195 629 Updated Jun 5, 2026

The hub for audio AI research: papers, open models, benchmarks & datasets across audio LLMs, speech recognition, TTS, music & audio generation.

Python 932 48 Updated Jun 8, 2026

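LLM knowledge sharing that everyone can understand: a must-read before spring and autumn recruitment interviews for LLM roles, so you can talk confidently with your interviewer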

Jupyter Notebook 6,719 627 Updated May 31, 2026

WhisperX: Automatic Speech Recognition with Word-level Timestamps (& Diarization)

Python 22,473 2,303 Updated Jun 3, 2026

XphoneBR is a Brazilian Portuguese transformer-based grapheme-to-phoneme and normalization modeling library that leverages recent deep learning technology and is optimized for usage in producti…

Python 12 Updated Aug 28, 2024

Use PEFT or Full-parameter to CPT/SFT/DPO/GRPO 600+ LLMs (Qwen3.6, DeepSeek-V4, GLM-5.1, InternLM3, Llama4, ...) and 300+ MLLMs (Qwen3-VL, Qwen3-Omni, InternVL3.5, Ovis2.5, GLM4.5v, Gemma4, Llava, …

Python 14,500 1,475 Updated Jun 13, 2026

LLMVoX: Autoregressive Streaming Text-to-Speech Model for Any LLM

Python 308 40 Updated May 16, 2025

✨✨Latest Advances on Multimodal Large Language Models

17,882 1,128 Updated May 1, 2026

An AI Hedge Fund Team

Python 60,061 10,616 Updated Jun 9, 2026

Silero VAD: pre-trained enterprise-grade Voice Activity Detector

Python 9,322 785 Updated Mar 26, 2026

Whisper finetuning

Python 17 4 Updated Apr 9, 2025

A Next-Generation Training Engine Built for Ultra-Large MoE Models

Python 5,152 426 Updated Jun 12, 2026

Go ahead and axolotl questions

Python 12,045 1,368 Updated Jun 13, 2026

A multi-voice TTS system trained with an emphasis on quality

Jupyter Notebook 14,862 2,045 Updated Nov 19, 2024

1 min voice data can also be used to train a good TTS model! (few shot voice cloning)

Python 58,674 6,416 Updated Apr 30, 2026

Instant voice cloning by MIT and MyShell. Audio foundation model.

Python 36,710 4,098 Updated Apr 19, 2025
Jupyter Notebook 12,967 954 Updated Oct 25, 2025

Implementation of Denoising Diffusion Probabilistic Model in PyTorch

Python 10,604 1,283 Updated Feb 11, 2026

Large-scale Self-supervised Pre-training Across Tasks, Languages, and Modalities

Python 22,147 2,697 Updated Jan 23, 2026

Generative Models by Stability AI

Python 27,188 3,095 Updated Dec 16, 2025

PyTorch implementation of VALL-E (Zero-Shot Text-To-Speech). Reproduced demo: https://lifeiteng.github.io/valle/index.html

Python 2,206 333 Updated Sep 10, 2025

Official implementation for the paper Exploring Wav2vec 2.0 fine-tuning for improved speech emotion recognition

Jupyter Notebook 1 Updated Dec 12, 2022

This repository contains the SpeechBrain Benchmarks

Python 140 46 Updated Feb 3, 2026

Multimodal Transformer for Korean Sentiment Analysis with Audio and Text Features

Python 28 7 Updated Sep 7, 2021