JeffMony

Jeff Mony JeffMony

Coding after thinking · E-mail: jeffmony@163.com · WeChat: LOVE_BigLi

272 followers · 51 following

Lists (1)

🔮 Future ideas

1 repository

Starred repositories

KlingAIResearch / ComfyUI-KLingAI-API

Python 171 18 Updated Oct 24, 2025

chatfire-AI / huobao-drama

🎬 Huobao Drama (火宝短剧) - An AI-Powered End-to-End Short Drama Generator "One Sentence to Complete Drama: Fully Automated from Script to Final Video"

TypeScript 11,767 2,225 Updated Apr 12, 2026

FireRedTeam / FireRedVAD

A SOTA Industrial-Grade Voice Activity Detection & Audio Event Detection, supporting 100+ languages, outperforming Silero-VAD, TEN-VAD, FunASR-VAD and WebRTC-VAD

Python 385 26 Updated May 6, 2026

ddlBoJack / emotion2vec

[ACL 2024] Official PyTorch code for extracting features and training downstream models with emotion2vec: Self-Supervised Pre-Training for Speech Emotion Representation

Python 1,125 86 Updated Dec 23, 2024

facebookresearch / sam2

The repository provides code for running inference with the Meta Segment Anything Model 2 (SAM 2), links for downloading the trained model checkpoints, and example notebooks that show how to use th…

Jupyter Notebook 19,161 2,446 Updated Apr 7, 2026

IceClear / SeedVR2

[ICLR2026] SeedVR2: One-Step Video Restoration via Diffusion Adversarial Post-Training

760 26 Updated Jan 27, 2026

ByteDance-Seed / SeedVR

Repo for SeedVR2 (ICLR2026) & SeedVR (CVPR2025 Highlight)

Python 1,183 70 Updated Jan 27, 2026

datawhalechina / hello-agents

📚 "Building Agents from Scratch": a from-scratch tutorial on the principles and practice of AI agents

Python 49,720 5,985 Updated May 14, 2026

deepseek-ai / DeepSeek-OCR

Contexts Optical Compression

Python 23,128 2,142 Updated Jan 27, 2026

OpenMOSS / MOSS-Audio

MOSS-Audio is an open-source foundation model for unified audio understanding, enabling speech, sound, music, captioning, QA, and reasoning in real-world scenarios.

Python 455 33 Updated May 9, 2026

deepseek-ai / DeepSeek-OCR-2

Visual Causal Flow

Python 2,848 247 Updated Feb 3, 2026

fribidi / fribidi

GNU FriBidi

C 420 122 Updated Apr 13, 2026

lnework / safety-audit

Audio content safety moderation system

Java 13 1 Updated Jul 29, 2022

autoclaw-cc / xiaohongshu-skills

xiaohongshu-skills

Python 1,271 183 Updated May 1, 2026

PaddlePaddle / PaddleOCR

Turn any PDF or image document into structured data for your AI. A powerful, lightweight OCR toolkit that bridges the gap between images/PDFs and LLMs. Supports 100+ languages.

Python 77,892 10,440 Updated May 14, 2026

Wan-Video / Wan-skills

AI Agent Skills for Wan — Enable your AI Agent to easily leverage Wan's AIGC capabilities.

Python 46 7 Updated Apr 17, 2026

meituan-longcat / LongCat-Video

Python 2,465 381 Updated May 9, 2026

TheTom / turboquant_plus

Python 6,794 907 Updated May 9, 2026

microsoft / VibeVoice

Open-Source Frontier Voice AI

Python 47,134 5,236 Updated May 6, 2026

pengzhendong / streaming-sensevoice

Pseudo Streaming SenseVoice with Hotwords

Python 450 53 Updated Mar 13, 2025

FunAudioLLM / Fun-ASR

Fun-ASR is an end-to-end speech recognition large model launched by Tongyi Lab.

Python 1,146 111 Updated Feb 25, 2026

FunAudioLLM / SenseVoice

Multilingual Voice Understanding Model

Python 8,152 742 Updated Dec 30, 2025

QwenLM / Qwen3-ASR-Toolkit

Official Python toolkit for the Qwen3-ASR API. Parallel high‑throughput calls, robust long‑audio transcription, multi‑sample‑rate support.

Python 960 93 Updated Feb 5, 2026

Quantatirsk / qwen3-asr

All in one Qwen3-ASR Server, compatible with OpenAI API

Python 279 41 Updated May 12, 2026

ggml-org / whisper.cpp

Port of OpenAI's Whisper model in C/C++

C++ 49,713 5,536 Updated May 15, 2026

openai / whisper

Robust Speech Recognition via Large-Scale Weak Supervision

Python 99,524 12,189 Updated Apr 15, 2026

FireRedTeam / FireRedASR2S

A SOTA Industrial-Grade All-in-One ASR system with ASR, VAD, LID, and Punc modules. FireRedASR2 supports Chinese (Mandarin, 20+ dialects/accents), English, code-switching, and both speech and singi…

Python 509 32 Updated May 6, 2026

FunAudioLLM / CosyVoice

Multi-lingual large voice generation model, providing inference, training and deployment full-stack ability.

Python 21,045 2,425 Updated May 3, 2026

FireRedTeam / FireRed-Image-Edit

FireRed-Image-Edit is a powerful image editing foundation model achieving open-source state-of-the-art performance with precise instruction following, high-fidelity generation, superior identity co…

Python 1,209 74 Updated Apr 3, 2026

FireRedTeam / FireRedTTS2

Long-form streaming TTS system for multi-speaker dialogue generation

Python 1,393 122 Updated Oct 26, 2025

P2P

faceu

OpenGL