- Beijing University of Posts and Telecommunications
- Beijing
- @qiker
Stars
Psy-Insight: Mental Health Oriented Interpretable Multi-turn Bilingual Counseling Dataset for Large Language Model Finetuning
Official Repository of Paper: "SynParaSpeech: Automated Synthesis of Paralinguistic Datasets for Speech Generation and Understanding"
Open Overleaf/ShareLaTex projects in vscode, with full collaboration support.
ACE-Step: A Step Towards Music Generation Foundation Model
kq-chen / nougat
Forked from facebookresearch/nougat. Implementation of Nougat: Neural Optical Understanding for Academic Documents
kq-chen / AutoGPTQ
Forked from AutoGPTQ/AutoGPTQ. An easy-to-use LLM quantization package with user-friendly APIs, based on the GPTQ algorithm.
Helper functions for processing and integrating visual language information with the Qwen-VL series models
kq-chen / VLMEvalKit
Forked from open-compass/VLMEvalKit. Open-source evaluation toolkit for large vision-language models (LVLMs), supporting ~100 VLMs and 30+ benchmarks
VideoAuteur: Towards Long Narrative Video Generation
An Easy-to-use, Scalable and High-performance RLHF Framework based on Ray (PPO & GRPO & REINFORCE++ & vLLM & Ray & Dynamic Sampling & Async Agentic RL)
[NeurIPS'24] Official code for *🎯DART-Math: Difficulty-Aware Rejection Tuning for Mathematical Problem-Solving*
An Open-source RL System from ByteDance Seed and Tsinghua AIR
verl: Volcano Engine Reinforcement Learning for LLMs
LLaSA: Scaling Train-time and Inference-time Compute for LLaMA-based Speech Synthesis
[ACL 2024 Findings] "TempCompass: Do Video LLMs Really Understand Videos?", Yuanxin Liu, Shicheng Li, Yi Liu, Yuxiang Wang, Shuhuai Ren, Lei Li, Sishuo Chen, Xu Sun, Lu Hou
[CVPR 2025 🔥]A Large Multimodal Model for Pixel-Level Visual Grounding in Videos
Let's make video diffusion practical!
Video-R1: Reinforcing Video Reasoning in MLLMs [🔥the first paper to explore R1 for video]
Witness the aha moment of VLM with less than $3.
Research code for ACL2024 paper: "Synchronized Video Storytelling: Generating Video Narrations with Structured Storyline"
Video translation and dubbing tool powered by LLMs. The video translator offers translation into 100 languages and one-click full-process deployment. The video translation output is optimized for platfo…
Qwen2.5-Omni is an end-to-end multimodal model by the Qwen team at Alibaba Cloud, capable of understanding text, audio, vision, and video, and performing real-time speech generation.
Frontier Multimodal Foundation Models for Image and Video Understanding