Shanghai Jiao Tong University
No. 800, Dongchuan Road, Shanghai, China
Stars
An unofficial LaTeX Beamer theme for Shanghai Innovation Institute (SII).
A skill for writing graduate theses in Chinese science and engineering programs / graduate-thesis-polish skill
slime is an LLM post-training framework for RL Scaling.
AI agent toolkit: unified LLM API, agent loop, TUI, coding agent CLI
A local Markdown literature-library management and academic-review automation system driven by LLM agents.
A curated, continuously updated reading list, paper blogs, and resources for World Action Models (WAMs) in embodied AI.
Plume is a browser IDE that runs entirely on your own machine, built for academic paper writing.
A paper and project list about cutting-edge Speech Synthesis, Text-to-Speech (TTS), Singing Voice Synthesis (SVS), Voice Conversion (VC), Singing Voice Conversion (SVC), and related interesting…
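Sharing AI Infra knowledge and coding exercises: introductions to the PyTorch/vLLM/SGLang frameworks⚡️, performance acceleration🚀, LLM fundamentals🧠, AI hardware and software🔧, and more.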
This project aims to replicate mainstream open-source model architectures with limited computational resources, implementing mini models with 100-200M parameters.
Research Writing Assistant
MOSS-Audio is an open-source foundation model for unified audio understanding, enabling speech, sound, music, captioning, QA, and reasoning in real-world scenarios.
A construction kit for reinforcement learning environment management.
An agent-managed museum exhibit, built in Rust with Gajae-Code / LazyCodex, developed and maintained with no human intervention.
Nex General Agentic Data Pipeline, an end-to-end pipeline for generating high-quality agentic training data.
Music Language Model Generation, Optimization, and Practice
Scans academic papers for hallucinated citations and converts second-hand citations to their official versions.
MOVA: Towards Scalable and Synchronized Video–Audio Generation
USP: Unified (a.k.a. Hybrid, 2D) Sequence Parallel Attention for Long Context Transformers Model Training and Inference
Qwen3-ASR is an open-source series of ASR models developed by the Qwen team at Alibaba Cloud, supporting stable multilingual speech/music/song recognition, language detection and timestamp prediction.
A tool for better use of Inspire platform (Beta: Codeberg version is more up-to-date)
Official implementation of ACL'26 (findings) paper WESR (Word-level Event-Speech Recognition): A comprehensive benchmark and baseline for detecting and localizing non-verbal vocal events in speech.