- Sr. Staff Member, Apple
- huckiyang.github.io/
- @huckiyang
- channel/UCSj3hCBIds5BpyO7A4F3l7A
Stars
JAX implementation of configurable LLM distillation training
A text-to-speech (TTS), speech-to-text (STT) and speech-to-speech (STS) library built on Apple's MLX framework, providing efficient speech analysis on Apple Silicon.
Gemma open-weight LLM library, from Google DeepMind
A general purpose scientific writer
Kosmos: An AI Scientist for Autonomous Discovery. An implementation and adaptation that can be driven by Claude Code or an API, based on the Kosmos AI paper (https://arxiv.org/abs/2511.02824)
OmniVinci is an omni-modal LLM for joint understanding of vision, audio, and language.
[ICLRW'26] EoRA: Fine-tuning-free Compensation for Compressed LLM with Eigenspace Low-Rank Approximation
LLM quantization (compression) toolkit with hardware acceleration support for NVIDIA, AMD, and Intel GPUs and Intel/AMD/Apple CPUs via HF, vLLM, and SGLang.
GPT-4o-level, real-time spoken dialogue system.
Multilingual large voice generation model with full-stack support for inference, training, and deployment.
Code for the paper "FastAdaSP: An Efficient Multitask Inference Framework for Large Speech Language Models" (EMNLP'24, Oral)
Very low-latency speech-to-text, intent recognition, and text-to-speech for building voice agents and interfaces
Code and model for the ICASSP 2025 paper "Developing Instruction-Following Speech Language Model Without Speech Instruction-Tuning Data"
[EMNLP'24] Code and data for the paper "Ladder: A Model-Agnostic Framework Boosting LLM-based Machine Translation to the Next Level"
Moshi is a speech-text foundation model and full-duplex spoken dialogue framework. It uses Mimi, a state-of-the-art streaming neural audio codec.
PATIENT-Ψ: Using Large Language Models to Simulate Patients for Training Mental Health Professionals (EMNLP 2024)
PyTorch implementation of the ICASSP-24 paper: "Improving Audio Captioning Models with Fine-grained Audio Features, Text Embedding Supervision, and LLM Mix-up Augmentation"
Chronos: Pretrained Models for Time Series Forecasting
toLLMatch🔪: Context-aware LLM-based simultaneous translation