  • Berkeley

Highlights

  • Pro

Starred repositories

For IEEE ASRU (2025)

Jupyter Notebook 12 3 Updated Jun 21, 2025

Official Implementation for our EMNLP 2025 paper: "Seeing is Believing: Emotion-Aware Audio-Visual Language Modeling for Expressive Speech Generation"

Python 3 Updated Aug 26, 2025

Beyond the Model: Scaling Medical Capability with a Large Verifier System

172 11 Updated Sep 3, 2025

EMO-Reasoning: Benchmarking Emotional Reasoning Capabilities in Spoken Dialogue Systems

Python 6 Updated Aug 27, 2025

For Interspeech (2025)

Jupyter Notebook 3 Updated May 30, 2025

Interspeech 2025 [Project page]

Python 7 Updated Nov 4, 2025
Jupyter Notebook 10 2 Updated Oct 11, 2025

DysfluentWFST

Jupyter Notebook 15 5 Updated Nov 13, 2025

PowerFM is an open-source repository for foundation models in the power and energy domain. It both maintains original projects and collects community-contributed open-source projects, featuring fin…

31 1 Updated Nov 4, 2025

PowerWorkflow is an open-source collection of agentic workflows for power system applications. These workflows enable intelligent automation and coordination of power system operations, facilitatin…

Python 24 Updated Jul 19, 2025
Python 22 7 Updated Mar 29, 2025

A Benchmark for Evaluating Turn-Taking and Overlap Handling in Full-Duplex Spoken Dialogue Models

Python 112 4 Updated Sep 21, 2025

YOLO-Stutter: End-to-end Region-Wise Speech Dysfluency Detection

Jupyter Notebook 20 2 Updated Mar 4, 2025

This is a simple demonstration of more advanced, agentic patterns built on top of the Realtime API.

TypeScript 6,682 1,053 Updated Dec 15, 2025

Verbatim Automatic Speech Recognition with improved word-level timestamps and filler detection

Python 880 47 Updated Jun 3, 2025

✨✨[NeurIPS 2025] VITA-1.5: Towards GPT-4o Level Real-Time Vision and Speech Interaction

Python 2,465 180 Updated Mar 28, 2025

React app for inspecting, building and debugging with the Realtime API

JavaScript 3,525 1,392 Updated Aug 28, 2025

A generative world for general-purpose robotics & embodied AI learning.

Python 27,850 2,571 Updated Dec 26, 2025

Sylber: Syllabic Embedding Representation of Speech from Raw Audio

Jupyter Notebook 71 4 Updated Mar 17, 2025
Jupyter Notebook 26 2 Updated Dec 4, 2024

[CVPR 2023] Official repository of paper titled "MaPLe: Multi-modal Prompt Learning".

Python 791 65 Updated Jul 24, 2023

Deep Articulatory Synthesis and Inversion

Python 54 7 Updated Feb 14, 2024

WhisperX: Automatic Speech Recognition with Word-level Timestamps (& Diarization)

Python 19,317 2,061 Updated Oct 21, 2025

AutoGPT is the vision of accessible AI for everyone, to use and to build on. Our mission is to provide the tools, so that you can focus on what matters.

Python 180,498 46,188 Updated Dec 25, 2025

Allosaurus is a pretrained universal phone recognizer for more than 2000 languages

Python 687 98 Updated Apr 26, 2024

Neural network-based forced alignment with bidirectional attention mechanism

Python 78 8 Updated Jan 17, 2025

Attempt at tracking the state of the art and recent results (bibliography) on speech recognition.

1,870 225 Updated Jun 27, 2022

[CVPR 2023] SadTalker: Learning Realistic 3D Motion Coefficients for Stylized Audio-Driven Single Image Talking Face Animation

Python 13,473 2,580 Updated Jun 26, 2024

ChatGLM-6B: An Open Bilingual Dialogue Language Model

Python 41,221 5,219 Updated Jun 27, 2024