Starred repositories
A Framework for Speech, Language, Audio, and Music Processing with Large Language Models
An Efficient and User-Friendly Scaling Library for Reinforcement Learning with Large Language Models
A Datacenter Scale Distributed Inference Serving Framework
ICLR 2026 (Oral) | EmotionThinker: Prosody-Aware Reinforcement Learning for Explainable Speech Emotion Reasoning
SONIC-O1 is a fully human-verified real-world audio-video benchmark spanning 13 conversational domains to evaluate MLLMs on summarization, evidence-grounded MCQ reasoning, and temporal localization…
FireRed-OpenStoryline is an AI video editing agent that transforms manual editing into intention-driven directing through natural language interaction, LLM-powered planning, and precise tool orches…
A SOTA Industrial-Grade All-in-One ASR system with ASR, VAD, LID, and Punc modules. FireRedASR2 supports Chinese (Mandarin, 20+ dialects/accents), English, code-switching, and both speech and singi…
Easy Data Preparation with the Latest LLM-Based Operators and Pipelines.
A frontier collection and survey of vision-language model papers and models, maintained as a GitHub repository. Continuously updated.
An efficient multi-modal instruction-following data synthesis tool and the official implementation of Oasis https://arxiv.org/abs/2503.08741.
Official repo for the paper "Scaling Synthetic Data Creation with 1,000,000,000 Personas"
Refine high-quality datasets and visual AI models
A Distributed Attention Towards Linear Scalability for Ultra-Long Context, Heterogeneous Data Training
A Lightweight PyTorch Framework for Recommendation Models (a PyTorch recommendation-algorithm framework), Easy-to-Use and Easy-to-Extend. https://datawhalechina.github.io/torch-rechub/
An Open Foundation Model and Benchmark to Accelerate Generative Recommendation
The code implementation for UME-R1: Exploring Reasoning-Driven Generative Multimodal Embeddings (ICLR 2026).
This repo contains the code for "VLM2Vec: Training Vision-Language Models for Massive Multimodal Embedding Tasks" [ICLR 2025]
A curated list of awesome platforms, tools, practices, and resources that help run LLMs locally
CaptionQA: Is Your Caption as Useful as the Image Itself?
The repository provides code for running inference and finetuning with the Meta Segment Anything Model 3 (SAM 3), links for downloading the trained model checkpoints, and example notebooks that sho…
The Open-Source Multimodal AI Agent Stack: Connecting Cutting-Edge AI Models and Agent Infra
BitBLAS is a library to support mixed-precision matrix multiplications, especially for quantized LLM deployment.
🤖 Chat with your SQL database 📊. Accurate Text-to-SQL Generation via LLMs using Agentic Retrieval 🔄.
[ICLR'26] Grasp Any Region: Towards Precise, Contextual Pixel Understanding for Multimodal LLMs
[NeurIPS 2025 Spotlight] A Token is Worth over 1,000 Tokens: Efficient Knowledge Distillation through Low-Rank Clone.
Tarsier -- a family of large-scale video-language models designed to generate high-quality video descriptions, together with strong general video understanding capabilities.