
This repository provides a valuable reference for researchers in the field of multimodality. Start your exploration of RL-based reasoning MLLMs here!

1,406 62 Updated Apr 19, 2026

AI Audio Datasets (AI-ADS) 🎵, including Speech, Music, and Sound Effects, which can provide training data for Generative AI, AIGC, AI model training, intelligent audio tool development, and audio a…

931 91 Updated Jul 8, 2025

verl/HybridFlow: A Flexible and Efficient RL Post-Training Framework

Python 21,019 3,772 Updated Apr 30, 2026

A project page template for academic papers. Demo at https://eliahuhorwitz.github.io/Academic-project-page-template/

JavaScript 4,831 1,067 Updated Sep 4, 2025

ASR papers, updated every day

Python 507 24 Updated Apr 30, 2026

Very low latency speech to text, intent recognition, and text to speech, for building voice agents and interfaces

C 7,892 405 Updated Apr 25, 2026

CAT is more than a CRF-based ASR toolkit: it provides a complete workflow for data-efficient end-to-end ASR, supporting CTC, CTC-CRF, RNN-T, and language-model training and inference.

Python 369 79 Updated Feb 5, 2026

Kyutai's Speech-To-Text and Text-To-Speech models based on the Delayed Streams Modeling framework.

Python 2,908 303 Updated Jan 26, 2026

Qwen3-ASR is an open-source series of ASR models developed by the Qwen team at Alibaba Cloud, supporting stable multilingual speech/music/song recognition, language detection and timestamp prediction.

Python 2,546 257 Updated Jan 30, 2026

A Fundamental End-to-End Speech Recognition Toolkit and Open Source SOTA Pretrained Models, Supporting Speech Recognition, Voice Activity Detection, Text Post-processing etc.

Python 15,914 1,657 Updated Mar 17, 2026

Use PEFT or Full-parameter to CPT/SFT/DPO/GRPO 600+ LLMs (Qwen3.6, DeepSeek-R1, GLM-5.1, InternLM3, Llama4, ...) and 300+ MLLMs (Qwen3-VL, Qwen3-Omni, InternVL3.5, Ovis2.5, GLM4.5v, Gemma4, Llava, …

Python 13,965 1,389 Updated Apr 29, 2026

A Python library for audio data augmentation. Useful for making audio ML models work well in the real world, not just in the lab.

Python 2,263 219 Updated Apr 13, 2026

The official repo of NBC & SpatialNet for multichannel speech separation, denoising, and dereverberation

Python 345 43 Updated Jan 1, 2025

Tools for handling multimodal data in machine learning projects.

Python 1,127 273 Updated Apr 25, 2026
Python 1,406 409 Updated Apr 12, 2026

📚 Building large language models from scratch

Jupyter Notebook 29,743 2,785 Updated Mar 16, 2026

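"Open-Source LLM Usage Guide": a tutorial tailored for Chinese users on quickly fine-tuning (full-parameter/LoRA) and deploying open-source large language models (LLMs) and multimodal large language models (MLLMs), from China and abroad, in a Linux environment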

Jupyter Notebook 30,194 2,970 Updated Apr 24, 2026

Transform-average-concatenate (TAC) method for end-to-end ad-hoc beamforming that is invariant to microphone permutation and number.

Python 305 59 Updated Jun 15, 2021

Awesome Neural Codec Models, Text-to-Speech Synthesizers & Speech Language Models

Python 242 14 Updated Dec 18, 2025

AcademiCodec: An Open Source Audio Codec Model for Academic Research

Python 670 84 Updated Dec 27, 2023

Reading notes about Multimodal Large Language Models, Large Language Models, and Diffusion Models

1,096 42 Updated Mar 15, 2026