csgcmai
😜 Be the fire and wish for the wind

Starred repositories

A Framework for Speech, Language, Audio, Music Processing with Large Language Model

Python 1,013 109 Updated Jan 15, 2026

An Efficient and User-Friendly Scaling Library for Reinforcement Learning with Large Language Models

Python 3,012 249 Updated Mar 25, 2026
Python 53 5 Updated Jun 6, 2023

A Datacenter Scale Distributed Inference Serving Framework

Rust 6,403 953 Updated Mar 25, 2026

ICLR 2026 (Oral) | EmotionThinker: Prosody-Aware Reinforcement Learning for Explainable Speech Emotion Reasoning

Python 35 1 Updated Feb 12, 2026

SONIC-O1 is a fully human-verified real-world audio-video benchmark spanning 13 conversational domains to evaluate MLLMs on summarization, evidence-grounded MCQ reasoning, and temporal localization…

Python 5 Updated Mar 21, 2026
Python 2 Updated Feb 26, 2026

FireRed-OpenStoryline is an AI video editing agent that transforms manual editing into intention-driven directing through natural language interaction, LLM-powered planning, and precise tool orches…

Python 1,247 118 Updated Mar 23, 2026

A SOTA Industrial-Grade All-in-One ASR system with ASR, VAD, LID, and Punc modules. FireRedASR2 supports Chinese (Mandarin, 20+ dialects/accents), English, code-switching, and both speech and singi…

Python 425 24 Updated Mar 24, 2026

Easy Data Preparation with latest LLMs-based Operators and Pipelines.

Python 3,138 233 Updated Mar 25, 2026

A frontier collection and survey of vision-language model papers and model GitHub repositories. Continuously updated.

548 31 Updated Feb 5, 2026

An efficient multi-modal instruction-following data synthesis tool and the official implementation of Oasis https://arxiv.org/abs/2503.08741.

Python 40 2 Updated Jun 4, 2025

Official repo for the paper "Scaling Synthetic Data Creation with 1,000,000,000 Personas"

Python 1,505 122 Updated Feb 19, 2025

Refine high-quality datasets and visual AI models

Python 10,506 732 Updated Mar 25, 2026

A Distributed Attention Towards Linear Scalability for Ultra-Long Context, Heterogeneous Data Training

Python 721 41 Updated Mar 25, 2026

A Lighting PyTorch Framework for Recommendation Models (PyTorch recommendation algorithm framework), Easy-to-use and Easy-to-extend. https://datawhalechina.github.io/torch-rechub/

Jupyter Notebook 961 132 Updated Mar 20, 2026

An Open Foundation Model and Benchmark to Accelerate Generative Recommendation

Python 692 106 Updated Mar 18, 2026

The code implementation for UME-R1: Exploring Reasoning-Driven Generative Multimodal Embeddings (ICLR 2026).

Python 51 2 Updated Feb 25, 2026

This repo contains the code for "VLM2Vec: Training Vision-Language Models for Massive Multimodal Embedding Tasks" [ICLR 2025]

Python 607 55 Updated Mar 24, 2026

A curated list of awesome platforms, tools, practices and resources that help run LLMs locally

1,390 136 Updated Mar 25, 2026

CaptionQA: Is Your Caption as Useful as the Image Itself?

Python 36 1 Updated Mar 3, 2026

The repository provides code for running inference and finetuning with the Meta Segment Anything Model 3 (SAM 3), links for downloading the trained model checkpoints, and example notebooks that sho…

Python 8,378 1,189 Updated Mar 18, 2026

The Open-Source Multimodal AI Agent Stack: Connecting Cutting-Edge AI Models and Agent Infra

TypeScript 29,079 2,850 Updated Mar 10, 2026

BitBLAS is a library to support mixed-precision matrix multiplications, especially for quantized LLM deployment.

Python 753 60 Updated Aug 6, 2025

🤖 Chat with your SQL database 📊. Accurate Text-to-SQL Generation via LLMs using Agentic Retrieval 🔄.

Python 23,108 2,260 Updated Feb 2, 2026

[ICLR'26] Grasp Any Region: Towards Precise, Contextual Pixel Understanding for Multimodal LLMs

Python 99 10 Updated Jan 26, 2026

[NeurIPS 2025 Spotlight] A Token is Worth over 1,000 Tokens: Efficient Knowledge Distillation through Low-Rank Clone.

Python 46 Updated Oct 29, 2025

Building Large Models from Scratch: A Complete Hands-On Practice from Pre-training to RLHF

Python 2,551 191 Updated Mar 19, 2026

MISP-Meeting Dataset & Code

Python 2 2 Updated Jan 11, 2026

Tarsier -- a family of large-scale video-language models designed to generate high-quality video descriptions, together with good general video understanding capability.

Python 532 29 Updated Aug 14, 2025