- Duke University, Durham, NC
- yueqianlin.com
- @YueqianL
- in/yueqian-lin
Starred repositories
Qwen-TTS: a voice synthesis service built on FastAPI, supporting bilingual and dialect options.
Text-audio foundation model from Boson AI
This repository contains the code and tables for land-use change and land-occupation emissions.
[HPCA 2026] FractalCloud: A Fractal-Inspired Architecture for Efficient Large-Scale Point Cloud Processing
Baselines for ARC-Challenge-Interspeech2026
A framework for efficient model inference with omni-modality models
Python SDK, Proxy Server (AI Gateway) to call 100+ LLM APIs in OpenAI (or native) format, with cost tracking, guardrails, loadbalancing and logging. [Bedrock, Azure, OpenAI, VertexAI, Cohere, Anthr…
Qwen3-omni is a natively end-to-end, omni-modal LLM developed by the Qwen team at Alibaba Cloud, capable of understanding text, audio, images, and video, as well as generating speech in real time.
🐫 CAMEL: The first and the best multi-agent framework. Finding the Scaling Law of Agents. https://www.camel-ai.org
BirdNET analyzer for scientific audio data processing.
Audio Jailbreak: An Open Comprehensive Benchmark for Jailbreaking Large Audio-Language Models
[NeurIPS'25] KVCOMM: Online Cross-context KV-cache Communication for Efficient LLM-based Multi-agent Systems
A comprehensive framework to test audio comprehension of Large Audio Language Models.
StreamingVLM: Real-Time Understanding for Infinite Video Streams
2026 AI/ML internship & new graduate job list updated daily
Post-training with Tinker
This is the official Python version of Vision-Zero: Scalable VLM Self-Improvement via Strategic Gamified Self-Play.
Lightweight coding agent that runs in your terminal
A scalable generative AI framework built for researchers and developers working on Large Language Models, Multimodal, and Speech AI (Automatic Speech Recognition and Text-to-Speech)
A Benchmark for Evaluating Turn-Taking and Overlap Handling in Full-Duplex Spoken Dialogue Models
Reference PyTorch implementation and models for DINOv3
Kimi K2 is the large language model series developed by Moonshot AI team
Elucidated Text-To-Audio (ETTA) is a SOTA text-to-audio model with a holistic understanding of the design space and trained with synthetic captions.
Use Claude Code as the foundation for coding infrastructure, allowing you to decide how to interact with the model while enjoying updates from Anthropic.