Qwen3.5 is the large language model series developed by the Qwen team at Alibaba Cloud.

Qwen3.5

Welcome to the GitHub repository of Qwen3.5. Here, you can find official information about Qwen3.5 (User Guide, coming soon), post your questions (Issues), and share your ideas with the community (Discussions).

Introduction

Over recent months, we have intensified our focus on developing foundation models that deliver exceptional utility and performance. Qwen3.5 represents a significant leap forward, integrating breakthroughs in multimodal learning, architectural efficiency, reinforcement learning scale, and global accessibility to empower developers and enterprises with unprecedented capability and efficiency.

Qwen3.5 features the following enhancements:

  • Unified Vision-Language Foundation: Early fusion training on trillions of multimodal tokens achieves cross-generational parity with Qwen3 and outperforms Qwen3-VL models across reasoning, coding, agents, and visual understanding benchmarks.

  • Efficient Hybrid Architecture: Gated Delta Networks combined with sparse Mixture-of-Experts deliver high-throughput inference with minimal latency and cost overhead.

  • Scalable RL Generalization: Reinforcement learning scaled across million-agent environments with progressively complex task distributions for robust real-world adaptability.

  • Global Linguistic Coverage: Expanded support to 201 languages and dialects, enabling inclusive, worldwide deployment with nuanced cultural and regional understanding.

  • Next-Generation Training Infrastructure: Near-100% multimodal training efficiency relative to text-only training, plus asynchronous RL frameworks that support massive-scale agent scaffolds and environment orchestration.

News

  • 2026-02-16: We release Qwen3.5. The first release includes a 397B-A17B MoE model. Read more on our release blog. More sizes are coming & Happy Chinese New Year!
  • 2025-09-11: We release Qwen3-Next-80B-A3B, an ultra-sparse mixture-of-experts model with hybrid attention architecture, designed for extreme efficiency. Read more on our blog

Models

The official model weights are released on:

  • 🤗Hugging Face Hub: Most LLM frameworks and applications support downloading model files from Hugging Face Hub automatically by specifying the model ID, e.g., Qwen/Qwen3.5-397B-A17B. You can also download model files manually using huggingface-cli download or git clone. Please follow the instructions on the model page.
  • 🤖ModelScope: For users unable to access Hugging Face Hub, we strongly recommend using ModelScope. For supported frameworks, you can download from ModelScope by setting environment variables, such as SGLANG_USE_MODELSCOPE=true or VLLM_USE_MODELSCOPE=true. You can also download model files manually using modelscope download or git clone. Please follow the instructions on the model page.
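For frameworks that honor these environment variables, switching the download source is a one-line change. A minimal sketch (the manual-download command is shown commented out, and the model ID assumes the 397B-A17B release):

```shell
# Route supported frameworks (SGLang, vLLM) to ModelScope instead of
# Hugging Face Hub via environment variables.
export SGLANG_USE_MODELSCOPE=true
export VLLM_USE_MODELSCOPE=true

# Manual download alternative (requires the huggingface_hub CLI):
# huggingface-cli download Qwen/Qwen3.5-397B-A17B --local-dir ./Qwen3.5-397B-A17B
```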

Benchmarks


For detailed results, please check out the release blog.

Quickstart

To learn more about Qwen3.5, feel free to read our documentation (coming soon).

Official

You can try Qwen3.5 on our official sites and enjoy the native experience with extra features, such as deep research, web dev, and adaptive tool use.

Qwen Chat

For users who simply would like to try Qwen3.5, Qwen Chat is just a touch away. Qwen Chat provides a web UI along with desktop and mobile applications, all with a familiar, easy-to-use interface. Qwen Chat is also a playground for our ideas, showcasing how Qwen3.5 can be integrated into your workflow and applications.

Qwen API

The official Qwen API is provided by Alibaba Cloud Model Studio.

Alibaba Cloud Model Studio provides first-class support for Qwen3.5, which is compatible with various API specifications, including OpenAI and Anthropic, making it simple for you to try Qwen3.5 in your own applications.
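Because the API is OpenAI-compatible, any standard HTTP client works. A minimal stdlib sketch (the base URL is the DashScope OpenAI-compatible endpoint; treat the model name as an assumption to verify against the model page):

```python
import json
import os
import urllib.request

# DashScope's OpenAI-compatible endpoint for Alibaba Cloud Model Studio.
BASE_URL = "https://dashscope.aliyuncs.com/compatible-mode/v1"

# Standard OpenAI-style chat completion payload.
payload = {
    "model": "qwen3.5-397b-a17b",  # assumed Model Studio model name
    "messages": [{"role": "user", "content": "Hello, Qwen3.5!"}],
}

api_key = os.environ.get("DASHSCOPE_API_KEY")
if api_key:  # only call the API when a key is configured
    req = urllib.request.Request(
        f"{BASE_URL}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )
    with urllib.request.urlopen(req) as resp:
        print(json.load(resp)["choices"][0]["message"]["content"])
```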

Qwen Code

Qwen Code is an open-source AI agent for the terminal, optimized for Qwen models. It helps you understand large codebases, automate tedious work, and ship faster.

For more information, please refer to Qwen Code.

Qwen Agent

For agent development, take a look at Qwen-Agent. Qwen Agent is an open-source AI agent framework that helps you build powerful LLM applications based on the instruction following, tool usage, planning, and memory capabilities of Qwen.

Check out Qwen Agent to find out more!

Local Use

Hugging Face Transformers

transformers acts as the model-definition framework in the current open-weight LLM landscape. It also includes functionality for LLM inference and training. The addition of serving capabilities in transformers makes it much easier to integrate new models into your applications.

To launch a server, simply use the transformers serve command:

transformers serve --port 8000 --continuous-batching

OpenAI-compatible APIs can be accessed at http://localhost:8000/v1 and the server downloads models from Hugging Face Hub automatically.

With the server running, you can also interact with Qwen3.5 directly from the command line:

transformers chat Qwen/Qwen3.5-397B-A17B

llama.cpp

llama.cpp enables LLM inference with minimal setup and state-of-the-art performance on a wide range of hardware. llama.cpp supports Qwen3.5 (text & vision). Look for models ending with GGUF on Hugging Face Hub.

MLX (Apple Silicon)

If you are running on Apple Silicon, both mlx-lm (text-only) and mlx-vlm (vision + text) support Qwen3.5. Look for models ending with MLX on the Hugging Face Hub.

Deployment

Qwen3.5 is supported by multiple inference frameworks. Here we demonstrate usage with SGLang and vLLM.

SGLang

SGLang is a fast serving framework for large language models and vision language models. SGLang can be used to launch a server with an OpenAI-compatible API:

python -m sglang.launch_server --model-path Qwen/Qwen3.5-397B-A17B --port 8000 --tp-size 8 --context-length 262144 --reasoning-parser qwen3

An OpenAI-compatible API will be available at http://localhost:8000/v1.

vLLM

vLLM is a high-throughput and memory-efficient inference and serving engine for LLMs. vLLM can be used to launch a server with an OpenAI-compatible API:

vllm serve Qwen/Qwen3.5-397B-A17B --port 8000 --tensor-parallel-size 8 --max-model-len 262144 --reasoning-parser qwen3

An OpenAI-compatible API will be available at http://localhost:8000/v1.
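With either server running, responses follow the OpenAI chat completion schema, and with the qwen3 reasoning parser enabled the model's reasoning is returned separately from the final answer. A minimal sketch of extracting both fields (the sample response is illustrative; the reasoning_content field is an assumption based on the --reasoning-parser flags above):

```python
# Illustrative OpenAI-compatible chat completion response; with
# --reasoning-parser qwen3, reasoning is split into reasoning_content.
sample_response = {
    "choices": [
        {
            "message": {
                "role": "assistant",
                "reasoning_content": "The user greets me, so I should greet back.",
                "content": "Hello! How can I help you today?",
            }
        }
    ]
}

def split_reasoning(response: dict) -> tuple[str, str]:
    """Return (reasoning, answer) from an OpenAI-compatible response."""
    message = response["choices"][0]["message"]
    return message.get("reasoning_content", ""), message["content"]

reasoning, answer = split_reasoning(sample_response)
print(answer)  # -> Hello! How can I help you today?
```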

Finetuning

We advise you to use training frameworks, such as Unsloth, ms-swift, and LLaMA-Factory, to finetune your models with SFT, DPO, GRPO, etc.
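As one illustration, a hypothetical LLaMA-Factory LoRA SFT config might look like the sketch below. The key names follow LLaMA-Factory conventions, but the template name and dataset are assumptions; verify Qwen3.5 support and exact values against the framework's documentation before use.

```yaml
### Hypothetical LLaMA-Factory SFT config sketch (not an official recipe)
model_name_or_path: Qwen/Qwen3.5-397B-A17B
stage: sft
do_train: true
finetuning_type: lora
lora_target: all
dataset: identity          # replace with your own dataset
template: qwen             # assumed chat template name
output_dir: saves/qwen3.5-lora
per_device_train_batch_size: 1
gradient_accumulation_steps: 8
learning_rate: 1.0e-4
num_train_epochs: 3.0
bf16: true
```

Such a config would typically be launched with `llamafactory-cli train <config>.yaml`.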

License Agreement

All our open-weight models are licensed under Apache 2.0. You can find the license files in the respective Hugging Face repositories.

Citation

If you find our work helpful, feel free to cite us.

@misc{qwen3.5,
    title  = {{Qwen3.5}: Towards Native Multimodal Agents},
    author = {{Qwen Team}},
    month  = {February},
    year   = {2026},
    url    = {https://qwen.ai/blog?id=qwen3.5}
}

Contact Us

If you would like to leave a message for either our research team or our product team, join our Discord or WeChat groups!
