marcoyang1998
  • University of Cambridge
  • Cambridge

Highlights

  • Pro


XARES-LLM

Python 34 1 Updated Dec 19, 2025

Omnilingual ASR: Open-Source Multilingual Speech Recognition for 1600+ Languages

Python 2,499 213 Updated Dec 16, 2025

Fast and High-Quality Zero-Shot Text-to-Speech with Flow Matching

Python 744 104 Updated Dec 2, 2025

[NeurIPS 2025] Benchmark data and code for MMAR: A Challenging Benchmark for Deep Reasoning in Speech, Audio, Music, and Their Mix

Python 185 4 Updated Dec 13, 2025

🤗 R1-AQA Model: mispeech/r1-aqa

Python 311 27 Updated Mar 28, 2025

This repository is the official implementation of the ECAI 2024 conference paper SUBLLM: A Novel Efficient Architecture with Token Sequence Subsampling for LLM

Python 68 4 Updated Aug 13, 2024

An evolving, large-scale and multi-domain ASR corpus for low-resource languages with automated crawling, transcription and refinement

Python 178 11 Updated Sep 1, 2025

Distilled variant of Whisper for speech recognition. 6x faster, 50% smaller, within 1% word error rate.

Python 4,004 348 Updated Jan 8, 2025

Libriheavy: a 50,000-hour ASR corpus with punctuation, casing, and context

Python 213 12 Updated Sep 10, 2024

LAVIS - A One-stop Library for Language-Vision Intelligence

Jupyter Notebook 11,079 1,088 Updated Nov 18, 2024

SALMONN family: A suite of advanced multi-modal LLMs

1,372 112 Updated Sep 28, 2025

Inference code for Llama models

Python 58,997 9,814 Updated Jan 26, 2025

🤗 Transformers: the model-definition framework for state-of-the-art machine learning models in text, vision, audio, and multimodal models, for both inference and training.

Python 154,153 31,512 Updated Dec 23, 2025
Python 4 3 Updated Apr 25, 2023

Some fast-ish algorithms for batch text search in moderate-sized collections, intended for data cleanup

Python 79 15 Updated Jun 30, 2025

Tools for handling multimodal data in machine learning projects.

Python 1,095 257 Updated Dec 15, 2025

Speech-to-text server framework with next-gen Kaldi

C++ 848 137 Updated Dec 23, 2025
Python 3 Updated Nov 28, 2025

Speech-to-text, text-to-speech, speaker diarization, speech enhancement, source separation, and VAD using next-gen Kaldi with onnxruntime without Internet connection. Support embedded systems, Andr…

C++ 9,421 1,040 Updated Dec 23, 2025

Real-time speech recognition and voice activity detection (VAD) using next-gen Kaldi with ncnn without Internet connection. Support iOS, Android, Linux, macOS, Windows, Raspberry Pi, VisionFive2, L…

C++ 1,587 200 Updated Oct 20, 2025
Python 44 10 Updated Nov 2, 2023

kaldi-asr/kaldi is the official location of the Kaldi project.

Shell 15,277 5,366 Updated Sep 22, 2025
Python 1,312 385 Updated Nov 28, 2025

FSA/FST algorithms, differentiable, with PyTorch compatibility.

Cuda 1,296 232 Updated Nov 19, 2025

End-to-End Speech Processing Toolkit

Python 9,652 2,364 Updated Dec 16, 2025

Personal website + blog for every GitHub user

JavaScript 6,707 685 Updated Feb 19, 2022

Simple Text-Generator with OpenAI GPT-2 PyTorch Implementation

Python 1,012 232 Updated Jul 8, 2019

TensorFlow r2.1 reimplementation of Model-Agnostic Meta-Learning

Python 21 3 Updated May 8, 2020

Agent-based influenza epidemic model

C++ 21 9 Updated Sep 8, 2020

Companion source code and slides (PPT) for the hands-on introductory video course on deep learning with PyTorch

Python 3,094 1,322 Updated Nov 17, 2019