marcoyang1998

Xiaoyu Yang marcoyang1998

Speech recognition, multimodal

31 followers · 6 following

University of Cambridge
Cambridge

Achievements

x2 x2

Stars

xiaomi-research / xares-llm

XARES-LLM

Python 34 1 Updated Dec 19, 2025

facebookresearch / omnilingual-asr

Omnilingual ASR: Open-Source Multilingual Speech Recognition for 1600+ Languages

Python 2,499 213 Updated Dec 16, 2025

k2-fsa / ZipVoice

Fast and High-Quality Zero-Shot Text-to-Speech with Flow Matching

Python 744 104 Updated Dec 2, 2025

ddlBoJack / MMAR

[NeurIPS 2025] Benchmark data and code for MMAR: A Challenging Benchmark for Deep Reasoning in Speech, Audio, Music, and Their Mix

Python 185 4 Updated Dec 13, 2025

xiaomi-research / r1-aqa

🤗 R1-AQA Model: mispeech/r1-aqa

Python 311 27 Updated Mar 28, 2025

XiaoMi / subllm

This repository is the official implementation of the ECAI 2024 conference paper SUBLLM: A Novel Efficient Architecture with Token Sequence Subsampling for LLM

Python 68 4 Updated Aug 13, 2024

SpeechColab / GigaSpeech2

An evolving, large-scale and multi-domain ASR corpus for low-resource languages with automated crawling, transcription and refinement

Python 178 11 Updated Sep 1, 2025

huggingface / distil-whisper

Distilled variant of Whisper for speech recognition. 6x faster, 50% smaller, within 1% word error rate.

Python 4,004 348 Updated Jan 8, 2025

k2-fsa / libriheavy

Libriheavy: a 50,000-hour ASR corpus with punctuation, casing, and context

Python 213 12 Updated Sep 10, 2024

salesforce / LAVIS

LAVIS - A One-stop Library for Language-Vision Intelligence

Jupyter Notebook 11,079 1,088 Updated Nov 18, 2024

bytedance / SALMONN

SALMONN family: A suite of advanced multi-modal LLMs

1,372 112 Updated Sep 28, 2025

meta-llama / llama

Inference code for Llama models

Python 58,997 9,814 Updated Jan 26, 2025

huggingface / transformers

🤗 Transformers: the model-definition framework for state-of-the-art machine learning models in text, vision, audio, and multimodal models, for both inference and training.

Python 154,153 31,512 Updated Dec 23, 2025

k2-fsa / divide_lm

Python 4 3 Updated Apr 25, 2023

k2-fsa / text_search

Some fast-ish algorithms for batch text search in moderate-sized collections, intended for data cleanup

Python 79 15 Updated Jun 30, 2025

lhotse-speech / lhotse

Tools for handling multimodal data in machine learning projects.

Python 1,095 257 Updated Dec 15, 2025

k2-fsa / sherpa

Speech-to-text server framework with next-gen Kaldi

C++ 848 137 Updated Dec 23, 2025

marcoyang1998 / icefall

Forked from k2-fsa/icefall

Python 3 Updated Nov 28, 2025

k2-fsa / sherpa-onnx

Speech-to-text, text-to-speech, speaker diarization, speech enhancement, source separation, and VAD using next-gen Kaldi with onnxruntime without Internet connection. Support embedded systems, Andr…

C++ 9,421 1,040 Updated Dec 23, 2025

k2-fsa / sherpa-ncnn

Real-time speech recognition and voice activity detection (VAD) using next-gen Kaldi with ncnn without Internet connection. Support iOS, Android, Linux, macOS, Windows, Raspberry Pi, VisionFive2, L…

C++ 1,587 200 Updated Oct 20, 2025