- Tsinghua University
- Beijing
- https://scholar.google.com/citations?user=w68g1qkAAAAJ&hl=zh-CN&oi=ao
Stars
Large language models for predictive maintenance and health management (fault diagnosis LLMs; remaining useful life prediction LLMs)
PyTorch code for hierarchical k-means -- a data curation method for self-supervised learning
MiMo-Audio: Audio Language Models are Few-Shot Learners
[ECCV 2024] Official PyTorch implementation of RoPE-ViT "Rotary Position Embedding for Vision Transformer"
Official code, datasets and checkpoints for "Timer: Generative Pre-trained Transformers Are Large Time Series Models" (ICML 2024) and subsequent works
Papers and datasets for Vibration Analysis
Kimi-Audio, an open-source audio foundation model excelling in audio understanding, generation, and conversation
[NeurIPS 2025] Benchmark data and code for MMAR: A Challenging Benchmark for Deep Reasoning in Speech, Audio, Music, and Their Mix
[ICLR 2025] SOTA discrete acoustic codec models with 40/75 tokens per second for audio language modeling
Machine Learning applied to sound
Unified automatic quality assessment for speech, music, and sound.
Unified Efficient Fine-Tuning of 100+ LLMs & VLMs (ACL 2024)
A curated collection of open-source rotating machinery fault datasets
A benchmark fault diagnosis dataset comprising vibration data collected from a gearbox under variable working conditions with intentionally induced faults, encompassing diverse fault severities and …
Benchmark popular audio i/o packages
[NeurIPS 2024 Best Paper Award][GPT beats diffusion🔥] [scaling laws in visual generation📈] Official impl. of "Visual Autoregressive Modeling: Scalable Image Generation via Next-Scale Prediction". A…
Code and Pretrained Models for Interspeech 2023 Paper "Whisper-AT: Noise-Robust Automatic Speech Recognizers are Also Strong Audio Event Taggers"
PSELDNets: Pre-trained Neural Networks on Large-scale Synthetic Datasets for Sound Event Localization and Detection
Masked Modeling Duo: Towards a Universal Audio Pre-training Framework
State-of-the-art audio codec with 90x compression factor. Supports 44.1kHz, 24kHz, and 16kHz mono/stereo audio.
Multilingual Voice Understanding Model
Source for the Interspeech 2024 Paper "Scaling up masked audio encoder learning for general audio classification"
The official repo of Qwen2-Audio chat & pretrained large audio language model proposed by Alibaba Cloud.
FunCodec is a research-oriented toolkit for audio quantization and downstream applications, such as text-to-speech synthesis and music generation.
State-of-the-art deep learning based audio codec supporting both mono 24 kHz audio and stereo 48 kHz audio.
Welcome to the Llama Cookbook! This is your go-to guide for building with Llama: getting started with inference, fine-tuning, and RAG. We also show you how to solve end-to-end problems using Llama mode…
A Framework for Speech, Language, Audio, Music Processing with Large Language Model