- Massachusetts Institute of Technology
- Cambridge, MA
- people.csail.mit.edu/hengjui
- @hjchang87
Stars
Create SQL that matches your selection (with explainable AI), not the other way around
Codebase for 'Scaling Rich Style-Prompted Text-to-Speech Datasets'
A description of "RealMAN: A Real-Recorded and Annotated Microphone Array Dataset for Dynamic Speech Enhancement and Localization" [NeurIPS 2024]
A TTS model capable of generating ultra-realistic dialogue in one pass.
[ACL 2024] Official PyTorch code for extracting features and training downstream models with emotion2vec: Self-Supervised Pre-Training for Speech Emotion Representation
High-Quality Voice Cloning TTS for 600+ Languages
MOSS‑TTS Family is an open‑source speech and sound generation model family from MOSI.AI and the OpenMOSS team. It is designed for high‑fidelity, high‑expressiveness, and complex real‑world scenario…
Word alignments generated by the Montreal Forced Aligner for the Librispeech dataset
Neural building blocks for speaker diarization: speech activity detection, speaker change detection, overlapped speech detection, speaker embedding
Silero VAD: pre-trained enterprise-grade Voice Activity Detector
🔥 LeetCode for PyTorch — practice implementing softmax, attention, GPT-2 and more from scratch with instant auto-grading. Jupyter-based, self-hosted or try online.
[ICLR 2026] StableToken: A state-of-the-art noise-robust semantic speech tokenizer featuring Voting-LFQ for resilient SpeechLLMs.
Metrics for evaluating Automated Audio Captioning systems, designed for PyTorch.
This repo includes the official implementations of "Fine-tune the pretrained ATST model for sound event detection".
A benchmark for evaluating audio encoders on various audio tasks.
State-of-the-art pretrained music models for training, evaluation, inference
[ICLR 2025 Oral] Block Diffusion: Interpolating Between Autoregressive and Diffusion Language Models
Inference code for the paper "Spirit-LM Interleaved Spoken and Written Language Model".
MIT IAP short course: Matrix Calculus for Machine Learning and Beyond
DinoSR: Self-Distillation and Online Clustering for Self-supervised Speech Representation Learning
Foundational Models for State-of-the-Art Speech and Text Translation
Differentiable ODE solvers with full GPU support and O(1)-memory backpropagation.
[CVPR2023] Blind Video Deflickering by Neural Filtering with a Flawed Atlas