🎯
Focusing
  • Tongyi Lab, Alibaba
  • Hangzhou


🔥An open-source survey of the latest video reasoning tasks, paradigms, and benchmarks.

108 4 Updated Dec 20, 2025

slime is an LLM post-training framework for RL Scaling.

Python 2,956 358 Updated Dec 23, 2025

An interactive AI voice agent that can capture and transcribe speech in real-time, generate intelligent responses using the DeepSeek R1 (7B model) AI, and convert the responses back to natural spee…

Python 22 5 Updated Jun 20, 2025

Python 943 97 Updated Dec 17, 2025

Nano vLLM

Python 10,025 1,254 Updated Nov 3, 2025

Open-source unified multimodal model

Python 5,500 481 Updated Oct 27, 2025

🙌 OpenHands: AI-Driven Development

Python 65,866 8,103 Updated Dec 23, 2025

Computer science self-study guide

HTML 70,219 7,794 Updated Nov 28, 2025

GPT-ImgEval: Evaluating GPT-4o’s state-of-the-art image generation capabilities

Python 305 8 Updated May 3, 2025

My take on Flow Matching

Jupyter Notebook 86 12 Updated Jan 11, 2025

The most powerful and modular diffusion model GUI, api and backend with a graph/nodes interface.

Python 97,813 11,080 Updated Dec 23, 2025

Open-source and strong foundation image recognition models.

Jupyter Notebook 3,530 316 Updated Feb 18, 2025

Qwen2.5-Omni is an end-to-end multimodal model by Qwen team at Alibaba Cloud, capable of understanding text, audio, vision, video, and performing real-time speech generation.

Jupyter Notebook 3,855 303 Updated Jun 12, 2025

Official Repo for "TheoremExplainAgent: Towards Video-based Multimodal Explanations for LLM Theorem Understanding" [ACL 2025 oral]

Python 1,441 188 Updated Jul 27, 2025

This repo is meant to serve as a guide for Machine Learning/AI technical interviews.

Jupyter Notebook 7,356 1,332 Updated Nov 28, 2025

[ICCV'25] DimensionX: Create Any 3D and 4D Scenes from a Single Image with Controllable Video Diffusion

Python 1,320 75 Updated Oct 17, 2025

[SIGGRAPH Asia 2024, Journal Track] ToonCrafter: Generative Cartoon Interpolation

Python 5,929 528 Updated Mar 19, 2025

A collection of resources and papers on Diffusion Models

HTML 12,211 1,011 Updated Aug 1, 2024

collection of diffusion model papers categorized by their subareas

2,090 95 Updated Dec 22, 2025

Extend BoxDiff to SDXL (SDXL-based layout-to-image generation)

Python 25 2 Updated May 23, 2024

Python 238 16 Updated Apr 10, 2024

[ECCV 2024] OMG: Occlusion-friendly Personalized Multi-concept Generation In Diffusion Models

Python 700 46 Updated Jul 2, 2024

Apply unlimited masks to unlimited LoRA models

Python 50 4 Updated Jul 24, 2023

🚀 Cross attention map tools for huggingface/diffusers

Python 374 27 Updated Jan 18, 2025

StoryMaker: Towards consistent characters in text-to-image generation

Python 717 61 Updated Dec 2, 2024

[AAAI 2025] Official implementation of "OOTDiffusion: Outfitting Fusion based Latent Diffusion for Controllable Virtual Try-on"

Python 6,487 939 Updated May 13, 2024

Next-Token Prediction is All You Need

Python 2,270 91 Updated Nov 19, 2025

experimental implementation of Consistory

Python 20 2 Updated Jul 15, 2024