Starred repositories

🎬 Huobao Drama (火宝短剧) - An AI-Powered End-to-End Short Drama Generator. "One Sentence to Complete Drama: Fully Automated from Script to Final Video"

TypeScript 11,767 2,225 Updated Apr 12, 2026

A SOTA Industrial-Grade Voice Activity Detection & Audio Event Detection, supporting 100+ languages, outperforming Silero-VAD, TEN-VAD, FunASR-VAD and WebRTC-VAD

Python 385 26 Updated May 6, 2026

[ACL 2024] Official PyTorch code for extracting features and training downstream models with emotion2vec: Self-Supervised Pre-Training for Speech Emotion Representation

Python 1,125 86 Updated Dec 23, 2024

The repository provides code for running inference with the Meta Segment Anything Model 2 (SAM 2), links for downloading the trained model checkpoints, and example notebooks that show how to use th…

Jupyter Notebook 19,161 2,446 Updated Apr 7, 2026

[ICLR2026] SeedVR2: One-Step Video Restoration via Diffusion Adversarial Post-Training

760 26 Updated Jan 27, 2026

Repo for SeedVR2 (ICLR2026) & SeedVR (CVPR2025 Highlight)

Python 1,183 70 Updated Jan 27, 2026

📚 "Building Agents from Scratch": a from-the-ground-up tutorial on the principles and practice of AI agents

Python 49,720 5,985 Updated May 14, 2026

Contexts Optical Compression

Python 23,128 2,142 Updated Jan 27, 2026

MOSS-Audio is an open-source foundation model for unified audio understanding, enabling speech, sound, music, captioning, QA, and reasoning in real-world scenarios.

Python 455 33 Updated May 9, 2026

Visual Causal Flow

Python 2,848 247 Updated Feb 3, 2026

GNU FriBidi

C 420 122 Updated Apr 13, 2026

Audio content safety moderation system

Java 13 1 Updated Jul 29, 2022

xiaohongshu-skills

Python 1,271 183 Updated May 1, 2026

Turn any PDF or image document into structured data for your AI. A powerful, lightweight OCR toolkit that bridges the gap between images/PDFs and LLMs. Supports 100+ languages.

Python 77,892 10,440 Updated May 14, 2026

AI Agent Skills for Wan — Enable your AI Agent to easily leverage Wan's AIGC capabilities.

Python 46 7 Updated Apr 17, 2026

Open-Source Frontier Voice AI

Python 47,134 5,236 Updated May 6, 2026

Pseudo Streaming SenseVoice with Hotwords

Python 450 53 Updated Mar 13, 2025

Fun-ASR is an end-to-end speech recognition large model launched by Tongyi Lab.

Python 1,146 111 Updated Feb 25, 2026

Multilingual Voice Understanding Model

Python 8,152 742 Updated Dec 30, 2025

Official Python toolkit for the Qwen3-ASR API. Parallel high‑throughput calls, robust long‑audio transcription, multi‑sample‑rate support.

Python 960 93 Updated Feb 5, 2026

All in one Qwen3-ASR Server, compatible with OpenAI API

Python 279 41 Updated May 12, 2026

Port of OpenAI's Whisper model in C/C++

C++ 49,713 5,536 Updated May 15, 2026

Robust Speech Recognition via Large-Scale Weak Supervision

Python 99,524 12,189 Updated Apr 15, 2026

A SOTA Industrial-Grade All-in-One ASR system with ASR, VAD, LID, and Punc modules. FireRedASR2 supports Chinese (Mandarin, 20+ dialects/accents), English, code-switching, and both speech and singi…

Python 509 32 Updated May 6, 2026

Multi-lingual large voice generation model, providing inference, training and deployment full-stack ability.

Python 21,045 2,425 Updated May 3, 2026

FireRed-Image-Edit is a powerful image editing foundation model achieving open-source state-of-the-art performance with precise instruction following, high-fidelity generation, superior identity co…

Python 1,209 74 Updated Apr 3, 2026

Long-form streaming TTS system for multi-speaker dialogue generation

Python 1,393 122 Updated Oct 26, 2025