What's new in 1.15.0 (2025-12-13)

@OliverBryant

What's new in 1.15.0 (2025-12-13)

These are the changes in inference v1.15.0.

New features

FEAT: added more detailed instructions for engine unavailability. by @OliverBryant in #4308
FEAT: [model] Z-Image-Turbo support by @OliverBryant in #4333
FEAT: [model] DeepSeek-V3.2 support by @Jun-Howie in #4344
FEAT: [model] PaddleOCR-VL support by @leslie2046 in #4354
FEAT: add llama_cpp json schema output by @OliverBryant in #4282
FEAT: PaddleOCR-VL implementation by @leslie2046 in #4304
FEAT: multi replicas on a single GPU && add launch strategy by @OliverBryant in #4358

Enhancements

ENH: update models JSON [llm] by @XprobeBot in #4343
ENH: update model "MiniMax-M2" JSON by @XprobeBot in #4342
ENH: update models JSON [llm] by @XprobeBot in #4349
ENH: support lauching with --device cpu by @hubutui in #4352
ENH: add glm-4.5 tool calls support && vllm StructuredOutputsParams support by @OliverBryant in #4357

Bug fixes

BUG: fix manage cache models missing by @OliverBryant in #4329
BUG: [llm, vllm]: support ignore eos by @ZhikaiGuo960110 in #4332
BUG: Multimodal settings for video parameters are not taking effect. by @OliverBryant in #4338
BUG: Soft links cannot be completely deleted by @OliverBryant in #4337
BUG: Packages with identical names in virtual environments error by @OliverBryant in #4348
BUG: Fix typo in xinference/deploy/docker/Dockerfile.cu128 by @hubutui in #4350
BUG: custom embedding model register fail by @OliverBryant in #4335
BUG: [UI] fix the bug in the copy function. by @yiboyasss in #4355
BUG: [UI] control Select dropdown width to prevent it from becoming too wide. by @yiboyasss in #4356

Documentation

DOC: add new models and v1.14.0 release notes by @qinxuye in #4305

Others

Fixed- workflow Vulnerability by @barakharyati in #4328
CHORE: add i18n for replica details by @leslie2046 in #4306

New Contributors

@barakharyati made their first contribution in #4328
@ZhikaiGuo960110 made their first contribution in #4332
@hubutui made their first contribution in #4350

Full Changelog: v1.14.0...v1.15.0

@amumu96

What's new in 1.14.0 (2025-11-30)

These are the changes in inference v1.14.0.

New features

FEAT: add vLLM 0.11.1+ compatibility with v1 executor support by @amumu96 in #4252
FEAT: [virtualenv] New v3 spec and list/delete virtual env APIs by @OliverBryant in #4254
FEAT: [model] HunyuanOCR support by @OliverBryant in #4290
FEAT: Add support of rerank model for llamacpp by @harryzwh in #4227
FEAT: show reason why engines not available by @OliverBryant in #4261
FEAT: Parallel startup model, add tooltips for startup progress, and p… by @leslie2046 in #4268

Enhancements

BLD: fix model ui launch error with gradio 6.x by @OliverBryant in #4289
BLD: add pr auto run gen_docs workflow. by @yiboyasss in #4260
BLD: gen docs pr modify by @OliverBryant in #4294
BLD: gen doc modify v2 by @OliverBryant in #4296
BLD: gen docs pr modify v3 by @OliverBryant in #4297
BLD: auto-run gen_docs.py from doc/source by @yiboyasss in #4300
BLD: remove [skip ci] from auto docs commit by @yiboyasss in #4301

Bug fixes

BUG: Compat with xllamacpp 0.2.5+ by @codingl2k1 in #4270
BUG: add download_hubs for cluster by @OliverBryant in #4273
BUG: sometimes cannot select gpu in CPU and GPU hybrid cluster by @leslie2046 in #4280

Documentation

DOC: added v1.13.0 release notes by @qinxuye in #4250
DOC: update gen_docs by @qinxuye in #4302

Others

CHORE: expand stale and close time by @qinxuye in #4253
chore: sync models JSON [audio, embedding, image, llm, rerank, video] by @XprobeBot in #4258
chore: sync models JSON [llm] by @XprobeBot in #4272
chore: sync model "Qwen3-Reranker-0.6B" JSON by @OliverBryant in #4277
chore: sync model "bge-reranker-v2-m3" JSON by @OliverBryant in #4276
chore: sync model "Qwen3-Reranker-4B" JSON by @OliverBryant in #4278
chore: sync model "Qwen3-Reranker-8B" JSON by @OliverBryant in #4279
chore: sync model "qwen3" JSON by @XprobeBot in #4287
chore: sync models JSON [rerank] by @XprobeBot in #4284
chore: sync model "FLUX.1-dev" JSON by @OliverBryant in #4293
chore: sync model "FLUX.2-dev" JSON by @OliverBryant in #4292
chore: sync models JSON [image] by @XprobeBot in #4303

Full Changelog: v1.13.0...v1.14.0

@OliverBryant

What's new in 1.13.0 (2025-11-15)

These are the changes in inference v1.13.0.

New features

FEAT: [model] Qwen3-VL-MLX support by @OliverBryant in #4203
FEAT: auto batch embedding by @qinxuye in #4197
FEAT: update models via Xinference model hub by @OliverBryant in #4241

Enhancements

ENH: IndexTTS2 stream output by @OliverBryant in #4213
ENH: IndexTTS2 offline deploy by @OliverBryant in #4202
ENH: add embedding benchmark by @llyycchhee in #4244
BLD: Fix CI error caused by peft version by @OliverBryant in #4249

Bug fixes

BUG: Deepseek-OCR error in docker by @OliverBryant in #4208
BUG: ensure unique tool call IDs using UUID by @amumu96 in #4242
BUG: Fix cache model not shown on audio、video and image by @OliverBryant in #4247

Documentation

DOC: added new models by @qinxuye in #4206
DOC: Xinference 1.12.0 installation issues with uv by @qiulang in #4228
DOC: add model update documentation. by @yiboyasss in #4246

Others

chore: sync models JSON [audio, embedding, image, llm, rerank, video] by @XprobeBot in #4214
chore: sync models JSON [audio, embedding, image, llm, rerank, video] by @XprobeBot in #4226
chore: sync models JSON [audio] by @XprobeBot in #4243

Full Changelog: v1.12.0...v1.13.0

@llyycchhee

What's new in 1.12.0 (2025-11-02)

These are the changes in inference v1.12.0.

New features

FEAT: [model] support jina-reranker-v3 by @llyycchhee in #4156
FEAT: [model] qwen3-omni by @qinxuye in #4137
FEAT: xinference python 3.13 support by @OliverBryant in #4164
FEAT: add OCR gradio UI by @OliverBryant in #4185
FEAT: [model] DeepSeek-OCR by @OliverBryant in #4187

Enhancements

ENH: adding lightning support for qwen-image-edit-2509 by @qinxuye in #4151
BLD: torchaudio 2.9 introduces the breaking change in torchaudio.save by @qiulang in #4178
BLD: fix setup.cfg for python 3.12 and fix dockerfile by @zwt-1234 in #4192
BLD: fix Dockerfile.cpu by @zwt-1234 in #4195
REF: Modified the batch lock logic by @OliverBryant in #4162
BLD: fix transformers version in cu128 dockerfile by @zwt-1234 in #4152

Bug fixes

BUG: repair qwen3 model transformers random characters by @OliverBryant in #4148
BUG: [UI] resolve progress bar display issue. by @yiboyasss in #4150
BUG: fix IndexTTS2 on transformes 4.57.1 by @OliverBryant in #4158
BUG: fix error when xinference run on docker with oath2 by @OliverBryant in #4161
BUG: fix qwen3-vl launch error by @amumu96 in #4190

Documentation

DOC: add release notes doc by @qinxuye in #4157
DOC: Add PyPI mirror configuration guide for audio package installation by @qiulang in #4177

Others

chore: sync models JSON [image, llm] by @XprobeBot in #4149
chore: sync models JSON [rerank] by @XprobeBot in #4159
chore: sync models JSON [llm] by @XprobeBot in #4160
chore: sync models JSON [llm] by @XprobeBot in #4171
chore: sync models JSON [image] by @XprobeBot in #4186
chore: sync models JSON [embedding, image] by @XprobeBot in #4188
chore: sync models JSON [llm] by @XprobeBot in #4191

Full Changelog: v1.11.0...v1.12.0

@OliverBryant

What's new in 1.11.0.post1 (2025-10-20)

These are the changes in inference v1.11.0.post1.

Bug fixes

BUG: repair qwen3 model transformers random characters by @OliverBryant in #4148
BUG: [UI] resolve progress bar display issue. by @yiboyasss in #4150

Others

BLD：fix transformers version in cu128 dockerfile by @zwt-1234 in #4152

Full Changelog: v1.11.0...v1.11.0.post1

@Jun-Howie

What's new in 1.11.0 (2025-10-19)

These are the changes in inference v1.11.0.

New features

FEAT: [model]Support Qwen3-4B Instruct/Thinking by @Jun-Howie in #4129
FEAT: OpenAI image edit api support by @OliverBryant in #4110
FEAT: Add vllm multi model support by @zhcn000000 in #4126

Enhancements

ENH: Add support for vllm awq 8bit and support qwen3-vl 30b by @zhcn000000 in #4122
BLD: Fix CI run failed issue by @OliverBryant in #4103
BLD: fix cu128 Dockerfile by @zwt-1234 in #4145

Bug fixes

BUG: [UI] launch button stays disabled when n_gpu_layers=-1. by @yiboyasss in #4127
BUG: vllm structured output issue by @OliverBryant in #4142

Documentation

DOC: update new models by @qinxuye in #4146

Others

BLD：Docker.cu128 Upgrade VLLM to 0.10.2 by @zwt-1234 in #4134
FEAT：[model] support MiniCPM-V-4.5 by @OliverBryant in #4136
chore: sync models JSON [audio, image, llm, video] by @XprobeBot in #4135
chore: sync models JSON [llm] by @XprobeBot in #4140

New Contributors

@XprobeBot made their first contribution in #4135

Full Changelog: v1.10.1...v1.11.0

@OliverBryant

What's new in 1.10.1 (2025-10-01)

These are the changes in inference v1.10.1.

New features

FEAT: Openai API support sglang json structured output by @OliverBryant in #4070
FEAT: [UI] support request_limits parameter for all models. by @yiboyasss in #4081
FEAT: support list flexible model via webui and cmdline by @leslie2046 in #4085
FEAT: [model] Support IndexTTS2 by @OliverBryant in #4078
FEAT: [UI] support dynamic download_hub detection. by @yiboyasss in #4082
FEAT: [model] qwen-image-edit-2509 by @qinxuye in #4099
FEAT: [model] baichuan-M2 by @Jun-Howie in #4107
FEAT: [model] Support Qwen3-VL by @Jun-Howie in #4112
FEAT: [model] Support Qwen3-Next by @Jun-Howie in #4113

Enhancements

ENH: optimize MPS on Mac for Qwen2.5-VL by @SolardiaX in #3524
ENH: deepseek-r1-0528 support tool_calls by @amumu96 in #4106
BLD: update funasr by @leslie2046 in #4062
BLD: Update Dockerfile.cu128 by @zwt-1234 in #4114
REF: [UI] refactor the launch model page. by @yiboyasss in #3940

Bug fixes

BUG: Optimize rerank model lookup logic and add support for video model type by @amumu96 in #4063
BUG: Fix seed-oss required VLLM_VERSION by @Jun-Howie in #4071
BUG: fix register_model when model name is duplicated by @llyycchhee in #4076
BUG: [UI] fix the custom model drawer component could not be opened. by @yiboyasss in #4089
BUG: Fix the issue where registered models cannot use tools by @amumu96 in #4100
BUG: fix finish_reason field handling logic by @amumu96 in #4105
BUG: vllm structured output compatibility by @OliverBryant in #4111

Documentation

DOC: Update README.md about MaxKB by @chixq in #3771

New Contributors

@chixq made their first contribution in #3771
@SolardiaX made their first contribution in #3524

Full Changelog: v1.10.0...v1.10.1

@JavisPeng

What's new in 1.10.0 (2025-09-13)

These are the changes in inference v1.10.0.

New features

FEAT: [model] Support Kokoro-82M-v1.1-zh by @JavisPeng in #4042
FEAT: IP restriction by env: XINFERENCE_ALLOWED_IPS by @qxo in #4047
FEAT: add support for the Anthropic API format by @OliverBryant in #4037
FEAT: Openai API support vLLM json schema output by @OliverBryant in #4061

Enhancements

ENH: Update the environment dependencies for GOT-OCR2 by @Gmgge in #4031
ENH: Clean memory during running MLX version's LLM models by @OliverBryant in #4026
BLD: bump funasr to 1.2.7 by @leslie2046 in #4039
BLD: cu128 version Dockerfile fix by @zwt-1234 in #4056
BLD: Update Dockerfile.cu128 by @amumu96 in #4059
REF: refactor tool calls functionality by @amumu96 in #4025

Bug fixes

BUG: Fix Kokoro-82M can't run on GPU by @OliverBryant in #4034
BUG: [embeddings] fix parsing str type hf_overrides for vllm engine by @llyycchhee in #4052
BUG: missing usage info in jina-embedding-v4 model response by @amumu96 in #4054
BUG: distributed registration bug by @llyycchhee in #4046

New Contributors

@JavisPeng made their first contribution in #4042
@qxo made their first contribution in #4047

Full Changelog: v1.9.1...v1.10.0

@qinxuye

What's new in 1.9.1 (2025-08-30)

These are the changes in inference v1.9.1.

New features

FEAT: Qwen-Image-Edit by @qinxuye in #3989
FEAT: Wan 2.2 by @qinxuye in #3996
FEAT: Update CosyVoice2 to support both streaming and non-streaming speech generation by @Gmgge in #3994
FEAT: support qwen-image-lightning by @qinxuye in #3995
FEAT: [UI] support gpu_count configuration in image model. by @yiboyasss in #4016
FEAT: image2image and inpainting for qwen-image by @qinxuye in #4014
FEAT: Support Custom vllm embedding dim by @zhcn000000 in #4000
FEAT: [embedding] support dimensions for embedding by @llyycchhee in #3965
FEAT: [Model] Support DeepSeek-V3.1 Quantization and tool by @Jun-Howie in #4022
FEAT: Seed-OSS-36B by @Jun-Howie in #4020

Enhancements

ENH: added zero shot and voice cloning ability for audio models by @qianduoduo0904 in #3968
ENH: Add Template for Qwen3 Reranker when model_engine = vllm by @zhcn000000 in #3983
ENH: Update the environment dependencies for cosyvoice2 by @Gmgge in #4015
ENH: Compat with xllamacpp 0.2.0 by @codingl2k1 in #4004
ENH: support chat_template_kwargs for llama.cpp by @qinxuye in #3988
BLD: Clean up Docker's last legacy cache and images before executing each step by @zwt-1234 in #3963
BLD: fix CI failures by @qinxuye in #4002

Bug fixes

BUG: disable flash_attention when GPU compute capability < 8.0 by @amumu96 in #3973
BUG: fix rerank model creation by @qinxuye in #3977

Documentation

DOC: update models by @qinxuye in #3958
DOC: add setting limitation of images for multi modal doc by @amumu96 in #4003
DOC: Update docs about custom models by @OliverBryant in #4019
DOC: update models & README by @qinxuye in #4023

Others

FEAT：KAT-V1 by @Jun-Howie in #3998

New Contributors

@qianduoduo0904 made their first contribution in #3968
@OliverBryant made their first contribution in #4019

Full Changelog: v1.9.0...v1.9.1

@yiboyasss

What's new in 1.9.0 (2025-08-16)

These are the changes in inference v1.9.0.

New features

FEAT: [UI] running models data display replica. by @yiboyasss in #3897
FEAT: [model] Qwen-Image by @qinxuye in #3916
FEAT: [model] gpt-oss by @qinxuye in #3924
FEAT: function calling support for deepseek-r1-0528 by @qinxuye in #3931
FEAT: Support for GLM 4.5 quantized models by @Jun-Howie in #3945
FEAT: sglang support streaming function call by @aniya105 in #3939
FEAT: parsing harmony format for gpt-oss by @qinxuye in #3948
FEAT: Add support for switching rerank model engines and support for rerank of vllm engine by @zhcn000000 in #3881
FEAT: Support GLM-4.5v by @Jun-Howie in #3957

Enhancements

ENH: Add qwen3 new model to tool call list by @zhcn000000 in #3900
ENH: Update chat_template for Qwen3-Coder by @Jun-Howie in #3944
ENH: add flash_attention control params attn_implementation by @amumu96 in #3951
ENH: support qwen-image gguf by @qinxuye in #3954
ENH: clean embedding model cache when using vllm engine by @amumu96 in #3956
BLD: Downgrade flash-attn to version 2.7.4 by @zwt-1234 in #3953
BLD: Add Openfst source by @zwt-1234 in #3959

Bug fixes

BUG: limit datasets version by @qinxuye in #3943

Documentation

DOC: add doc about cu128 docker by @qinxuye in #3899
DOC: Update xllamacpp doc by @codingl2k1 in #3862

Others

Replace @torch.no_grad() with @torch.inference_mode() in Qwen3-Reranker by @yasu-oh in #3911

Full Changelog: v1.8.1...v1.9.0

Releases: xorbitsai/inference

v1.15.0

What's new in 1.15.0 (2025-12-13)

New features

Enhancements

Bug fixes

Documentation

Others

New Contributors

Contributors

Uh oh!

v1.14.0

What's new in 1.14.0 (2025-11-30)

New features

Enhancements

Bug fixes

Documentation

Others

Contributors

Uh oh!

v1.13.0

What's new in 1.13.0 (2025-11-15)

New features

Enhancements

Bug fixes

Documentation

Others

Contributors

Uh oh!

v1.12.0

What's new in 1.12.0 (2025-11-02)

New features

Enhancements

Bug fixes

Documentation

Others

Contributors

Uh oh!

v1.11.0.post1

What's new in 1.11.0.post1 (2025-10-20)

Bug fixes

Others

Contributors

Uh oh!

v1.11.0

What's new in 1.11.0 (2025-10-19)

New features

Enhancements

Bug fixes

Documentation

Others

New Contributors

Contributors

Uh oh!

v1.10.1

What's new in 1.10.1 (2025-10-01)

New features

Enhancements

Bug fixes

Documentation

New Contributors

Contributors

Uh oh!

v1.10.0

What's new in 1.10.0 (2025-09-13)

New features

Enhancements

Bug fixes

New Contributors

Contributors

Uh oh!

v1.9.1

What's new in 1.9.1 (2025-08-30)

New features

Enhancements

Bug fixes

Documentation

Others

New Contributors

Contributors