jingkangqi

Stars

Sun-Haoyuan23 / Awesome-RL-based-Reasoning-MLLMs

This repository provides a valuable reference for researchers in the field of multimodality. Start exploring RL-based Reasoning MLLMs!

1,406 stars · 62 forks · Updated Apr 19, 2026

Yuan-ManX / ai-audio-datasets

AI Audio Datasets (AI-ADS) 🎵, including Speech, Music, and Sound Effects, which can provide training data for Generative AI, AIGC, AI model training, intelligent audio tool development, and audio a…

931 stars · 91 forks · Updated Jul 8, 2025

verl-project / verl

verl/HybridFlow: A Flexible and Efficient RL Post-Training Framework

Python · 21,019 stars · 3,772 forks · Updated Apr 30, 2026

eliahuhorwitz / Academic-project-page-template

A project page template for academic papers. Demo at https://eliahuhorwitz.github.io/Academic-project-page-template/

JavaScript · 4,831 stars · 1,067 forks · Updated Sep 4, 2025

halsay / ASR-TTS-paper-daily

Updates ASR papers every day

Python · 507 stars · 24 forks · Updated Apr 30, 2026

moonshine-ai / moonshine

Very low latency speech to text, intent recognition, and text to speech, for building voice agents and interfaces

C · 7,892 stars · 405 forks · Updated Apr 25, 2026

thu-spmi / CAT

CAT is more than a CRF-based ASR toolkit: it provides a complete workflow for data-efficient end-to-end ASR, supporting CTC, CTC-CRF, RNN-T, and language-model training and inference.

Python · 369 stars · 79 forks · Updated Feb 5, 2026

kyutai-labs / delayed-streams-modeling

Kyutai's Speech-To-Text and Text-To-Speech models based on the Delayed Streams Modeling framework.

Python · 2,908 stars · 303 forks · Updated Jan 26, 2026

Qwen3-ASR is an open-source series of ASR models developed by the Qwen team at Alibaba Cloud, supporting stable multilingual speech/music/song recognition, language detection and timestamp prediction.

Python · 2,546 stars · 257 forks · Updated Jan 30, 2026

modelscope / FunASR

A Fundamental End-to-End Speech Recognition Toolkit and Open Source SOTA Pretrained Models, Supporting Speech Recognition, Voice Activity Detection, Text Post-processing etc.

Python · 15,914 stars · 1,657 forks · Updated Mar 17, 2026

modelscope / ms-swift

Use PEFT or Full-parameter to CPT/SFT/DPO/GRPO 600+ LLMs (Qwen3.6, DeepSeek-R1, GLM-5.1, InternLM3, Llama4, ...) and 300+ MLLMs (Qwen3-VL, Qwen3-Omni, InternVL3.5, Ovis2.5, GLM4.5v, Gemma4, Llava, …

Python · 13,965 stars · 1,389 forks · Updated Apr 29, 2026