Stars
Your own personal AI assistant. Any OS. Any Platform. The lobster way. 🦞
Step3-VL-10B: A compact yet frontier multimodal model achieving SOTA performance at the 10B scale, matching open-source models 10-20x its size.
Conditional Memory via Scalable Lookup: A New Axis of Sparsity for Large Language Models
GLM-Image: An auto-regressive model for dense-knowledge, high-fidelity image generation.
MiMo-V2-Flash: Efficient Reasoning, Coding, and Agentic Foundation Model
An Open Phone Agent Model & Framework. Unlocking the AI Phone for Everyone
STEP-GUI: The top GUI agent solution in the galaxy. Developed by the StepFun-GELab team and powered by StepFun’s cutting-edge research capabilities.
LongVT: Incentivizing "Thinking with Long Videos" via Native Tool Calling
OpenMMReasoner: Pushing the Frontiers for Multimodal Reasoning with an Open and General Recipe
Official implementation of URaG: Unified Retrieval and Generation in Multimodal LLMs for Efficient Long Document Understanding (AAAI 2026 Oral).
Native Multimodal Models are World Learners
Qwen3-omni is a natively end-to-end, omni-modal LLM developed by the Qwen team at Alibaba Cloud, capable of understanding text, audio, images, and video, as well as generating speech in real time.
Code for R-Zero: Self-Evolving Reasoning LLM from Zero Data (https://www.arxiv.org/pdf/2508.05004)
verl-agent is an extension of veRL designed for training LLM/VLM agents via RL. verl-agent is also the official code for the paper "Group-in-Group Policy Optimization for LLM Agent Training".
Tongyi Deep Research, the Leading Open-source Deep Research Agent
A lightweight Python library for simulating Chinese handwriting
The official repository of the dots.vlm1 instruct models proposed by rednote-hilab.
gpt-oss-120b and gpt-oss-20b are two open-weight language models by OpenAI
Qwen-Image is a powerful image generation foundation model capable of complex text rendering and precise image editing.