Releases: xorbitsai/inference
v1.15.0
What's new in 1.15.0 (2025-12-13)
These are the changes in inference v1.15.0.
New features
- FEAT: added more detailed instructions for engine unavailability. by @OliverBryant in #4308
- FEAT: [model] Z-Image-Turbo support by @OliverBryant in #4333
- FEAT: [model] DeepSeek-V3.2 support by @Jun-Howie in #4344
- FEAT: [model] PaddleOCR-VL support by @leslie2046 in #4354
- FEAT: add llama_cpp json schema output by @OliverBryant in #4282
- FEAT: PaddleOCR-VL implementation by @leslie2046 in #4304
- FEAT: multi replicas on a single GPU && add launch strategy by @OliverBryant in #4358
Enhancements
- ENH: update models JSON [llm] by @XprobeBot in #4343
- ENH: update model "MiniMax-M2" JSON by @XprobeBot in #4342
- ENH: update models JSON [llm] by @XprobeBot in #4349
- ENH: support lauching with --device cpu by @hubutui in #4352
- ENH: add glm-4.5 tool calls support && vllm StructuredOutputsParams support by @OliverBryant in #4357
Bug fixes
- BUG: fix manage cache models missing by @OliverBryant in #4329
- BUG: [llm, vllm]: support ignore eos by @ZhikaiGuo960110 in #4332
- BUG: Multimodal settings for video parameters are not taking effect. by @OliverBryant in #4338
- BUG: Soft links cannot be completely deleted by @OliverBryant in #4337
- BUG: Packages with identical names in virtual environments error by @OliverBryant in #4348
- BUG: Fix typo in xinference/deploy/docker/Dockerfile.cu128 by @hubutui in #4350
- BUG: custom embedding model register fail by @OliverBryant in #4335
- BUG: [UI] fix the bug in the copy function. by @yiboyasss in #4355
- BUG: [UI] control Select dropdown width to prevent it from becoming too wide. by @yiboyasss in #4356
Documentation
Others
- Fixed- workflow Vulnerability by @barakharyati in #4328
- CHORE: add i18n for replica details by @leslie2046 in #4306
New Contributors
- @barakharyati made their first contribution in #4328
- @ZhikaiGuo960110 made their first contribution in #4332
- @hubutui made their first contribution in #4350
Full Changelog: v1.14.0...v1.15.0
v1.14.0
What's new in 1.14.0 (2025-11-30)
These are the changes in inference v1.14.0.
New features
- FEAT: add vLLM 0.11.1+ compatibility with v1 executor support by @amumu96 in #4252
- FEAT: [virtualenv] New v3 spec and list/delete virtual env APIs by @OliverBryant in #4254
- FEAT: [model] HunyuanOCR support by @OliverBryant in #4290
- FEAT: Add support of rerank model for llamacpp by @harryzwh in #4227
- FEAT: show reason why engines not available by @OliverBryant in #4261
- FEAT: Parallel startup model, add tooltips for startup progress, and p… by @leslie2046 in #4268
Enhancements
- BLD: fix model ui launch error with gradio 6.x by @OliverBryant in #4289
- BLD: add pr auto run gen_docs workflow. by @yiboyasss in #4260
- BLD: gen docs pr modify by @OliverBryant in #4294
- BLD: gen doc modify v2 by @OliverBryant in #4296
- BLD: gen docs pr modify v3 by @OliverBryant in #4297
- BLD: auto-run gen_docs.py from doc/source by @yiboyasss in #4300
- BLD: remove [skip ci] from auto docs commit by @yiboyasss in #4301
Bug fixes
- BUG: Compat with xllamacpp 0.2.5+ by @codingl2k1 in #4270
- BUG: add download_hubs for cluster by @OliverBryant in #4273
- BUG: sometimes cannot select gpu in CPU and GPU hybrid cluster by @leslie2046 in #4280
Documentation
Others
- CHORE: expand stale and close time by @qinxuye in #4253
- chore: sync models JSON [audio, embedding, image, llm, rerank, video] by @XprobeBot in #4258
- chore: sync models JSON [llm] by @XprobeBot in #4272
- chore: sync model "Qwen3-Reranker-0.6B" JSON by @OliverBryant in #4277
- chore: sync model "bge-reranker-v2-m3" JSON by @OliverBryant in #4276
- chore: sync model "Qwen3-Reranker-4B" JSON by @OliverBryant in #4278
- chore: sync model "Qwen3-Reranker-8B" JSON by @OliverBryant in #4279
- chore: sync model "qwen3" JSON by @XprobeBot in #4287
- chore: sync models JSON [rerank] by @XprobeBot in #4284
- chore: sync model "FLUX.1-dev" JSON by @OliverBryant in #4293
- chore: sync model "FLUX.2-dev" JSON by @OliverBryant in #4292
- chore: sync models JSON [image] by @XprobeBot in #4303
Full Changelog: v1.13.0...v1.14.0
v1.13.0
What's new in 1.13.0 (2025-11-15)
These are the changes in inference v1.13.0.
New features
- FEAT: [model] Qwen3-VL-MLX support by @OliverBryant in #4203
- FEAT: auto batch embedding by @qinxuye in #4197
- FEAT: update models via Xinference model hub by @OliverBryant in #4241
Enhancements
- ENH: IndexTTS2 stream output by @OliverBryant in #4213
- ENH: IndexTTS2 offline deploy by @OliverBryant in #4202
- ENH: add embedding benchmark by @llyycchhee in #4244
- BLD: Fix CI error caused by peft version by @OliverBryant in #4249
Bug fixes
- BUG: Deepseek-OCR error in docker by @OliverBryant in #4208
- BUG: ensure unique tool call IDs using UUID by @amumu96 in #4242
- BUG: Fix cache model not shown on audio、video and image by @OliverBryant in #4247
Documentation
- DOC: added new models by @qinxuye in #4206
- DOC: Xinference 1.12.0 installation issues with uv by @qiulang in #4228
- DOC: add model update documentation. by @yiboyasss in #4246
Others
- chore: sync models JSON [audio, embedding, image, llm, rerank, video] by @XprobeBot in #4214
- chore: sync models JSON [audio, embedding, image, llm, rerank, video] by @XprobeBot in #4226
- chore: sync models JSON [audio] by @XprobeBot in #4243
Full Changelog: v1.12.0...v1.13.0
v1.12.0
What's new in 1.12.0 (2025-11-02)
These are the changes in inference v1.12.0.
New features
- FEAT: [model] support jina-reranker-v3 by @llyycchhee in #4156
- FEAT: [model] qwen3-omni by @qinxuye in #4137
- FEAT: xinference python 3.13 support by @OliverBryant in #4164
- FEAT: add OCR gradio UI by @OliverBryant in #4185
- FEAT: [model] DeepSeek-OCR by @OliverBryant in #4187
Enhancements
- ENH: adding lightning support for qwen-image-edit-2509 by @qinxuye in #4151
- BLD: torchaudio 2.9 introduces the breaking change in torchaudio.save by @qiulang in #4178
- BLD: fix setup.cfg for python 3.12 and fix dockerfile by @zwt-1234 in #4192
- BLD: fix Dockerfile.cpu by @zwt-1234 in #4195
- REF: Modified the batch lock logic by @OliverBryant in #4162
- BLD: fix transformers version in cu128 dockerfile by @zwt-1234 in #4152
Bug fixes
- BUG: repair qwen3 model transformers random characters by @OliverBryant in #4148
- BUG: [UI] resolve progress bar display issue. by @yiboyasss in #4150
- BUG: fix IndexTTS2 on transformes 4.57.1 by @OliverBryant in #4158
- BUG: fix error when xinference run on docker with oath2 by @OliverBryant in #4161
- BUG: fix qwen3-vl launch error by @amumu96 in #4190
Documentation
- DOC: add release notes doc by @qinxuye in #4157
- DOC: Add PyPI mirror configuration guide for audio package installation by @qiulang in #4177
Others
- chore: sync models JSON [image, llm] by @XprobeBot in #4149
- chore: sync models JSON [rerank] by @XprobeBot in #4159
- chore: sync models JSON [llm] by @XprobeBot in #4160
- chore: sync models JSON [llm] by @XprobeBot in #4171
- chore: sync models JSON [image] by @XprobeBot in #4186
- chore: sync models JSON [embedding, image] by @XprobeBot in #4188
- chore: sync models JSON [llm] by @XprobeBot in #4191
Full Changelog: v1.11.0...v1.12.0
v1.11.0.post1
What's new in 1.11.0.post1 (2025-10-20)
These are the changes in inference v1.11.0.post1.
Bug fixes
- BUG: repair qwen3 model transformers random characters by @OliverBryant in #4148
- BUG: [UI] resolve progress bar display issue. by @yiboyasss in #4150
Others
Full Changelog: v1.11.0...v1.11.0.post1
v1.11.0
What's new in 1.11.0 (2025-10-19)
These are the changes in inference v1.11.0.
New features
- FEAT: [model]Support Qwen3-4B Instruct/Thinking by @Jun-Howie in #4129
- FEAT: OpenAI image edit api support by @OliverBryant in #4110
- FEAT: Add vllm multi model support by @zhcn000000 in #4126
Enhancements
- ENH: Add support for vllm awq 8bit and support qwen3-vl 30b by @zhcn000000 in #4122
- BLD: Fix CI run failed issue by @OliverBryant in #4103
- BLD: fix cu128 Dockerfile by @zwt-1234 in #4145
Bug fixes
- BUG: [UI] launch button stays disabled when n_gpu_layers=-1. by @yiboyasss in #4127
- BUG: vllm structured output issue by @OliverBryant in #4142
Documentation
Others
- BLD:Docker.cu128 Upgrade VLLM to 0.10.2 by @zwt-1234 in #4134
- FEAT:[model] support MiniCPM-V-4.5 by @OliverBryant in #4136
- chore: sync models JSON [audio, image, llm, video] by @XprobeBot in #4135
- chore: sync models JSON [llm] by @XprobeBot in #4140
New Contributors
- @XprobeBot made their first contribution in #4135
Full Changelog: v1.10.1...v1.11.0
v1.10.1
What's new in 1.10.1 (2025-10-01)
These are the changes in inference v1.10.1.
New features
- FEAT: Openai API support sglang json structured output by @OliverBryant in #4070
- FEAT: [UI] support request_limits parameter for all models. by @yiboyasss in #4081
- FEAT: support list flexible model via webui and cmdline by @leslie2046 in #4085
- FEAT: [model] Support IndexTTS2 by @OliverBryant in #4078
- FEAT: [UI] support dynamic download_hub detection. by @yiboyasss in #4082
- FEAT: [model] qwen-image-edit-2509 by @qinxuye in #4099
- FEAT: [model] baichuan-M2 by @Jun-Howie in #4107
- FEAT: [model] Support Qwen3-VL by @Jun-Howie in #4112
- FEAT: [model] Support Qwen3-Next by @Jun-Howie in #4113
Enhancements
- ENH: optimize MPS on Mac for Qwen2.5-VL by @SolardiaX in #3524
- ENH: deepseek-r1-0528 support tool_calls by @amumu96 in #4106
- BLD: update funasr by @leslie2046 in #4062
- BLD: Update Dockerfile.cu128 by @zwt-1234 in #4114
- REF: [UI] refactor the launch model page. by @yiboyasss in #3940
Bug fixes
- BUG: Optimize rerank model lookup logic and add support for video model type by @amumu96 in #4063
- BUG: Fix seed-oss required VLLM_VERSION by @Jun-Howie in #4071
- BUG: fix register_model when model name is duplicated by @llyycchhee in #4076
- BUG: [UI] fix the custom model drawer component could not be opened. by @yiboyasss in #4089
- BUG: Fix the issue where registered models cannot use tools by @amumu96 in #4100
- BUG: fix finish_reason field handling logic by @amumu96 in #4105
- BUG: vllm structured output compatibility by @OliverBryant in #4111
Documentation
New Contributors
- @chixq made their first contribution in #3771
- @SolardiaX made their first contribution in #3524
Full Changelog: v1.10.0...v1.10.1
v1.10.0
What's new in 1.10.0 (2025-09-13)
These are the changes in inference v1.10.0.
New features
- FEAT: [model] Support Kokoro-82M-v1.1-zh by @JavisPeng in #4042
- FEAT: IP restriction by env: XINFERENCE_ALLOWED_IPS by @qxo in #4047
- FEAT: add support for the Anthropic API format by @OliverBryant in #4037
- FEAT: Openai API support vLLM json schema output by @OliverBryant in #4061
Enhancements
- ENH: Update the environment dependencies for GOT-OCR2 by @Gmgge in #4031
- ENH: Clean memory during running MLX version's LLM models by @OliverBryant in #4026
- BLD: bump funasr to 1.2.7 by @leslie2046 in #4039
- BLD: cu128 version Dockerfile fix by @zwt-1234 in #4056
- BLD: Update Dockerfile.cu128 by @amumu96 in #4059
- REF: refactor tool calls functionality by @amumu96 in #4025
Bug fixes
- BUG: Fix Kokoro-82M can't run on GPU by @OliverBryant in #4034
- BUG: [embeddings] fix parsing str type hf_overrides for vllm engine by @llyycchhee in #4052
- BUG: missing usage info in jina-embedding-v4 model response by @amumu96 in #4054
- BUG: distributed registration bug by @llyycchhee in #4046
New Contributors
- @JavisPeng made their first contribution in #4042
- @qxo made their first contribution in #4047
Full Changelog: v1.9.1...v1.10.0
v1.9.1
What's new in 1.9.1 (2025-08-30)
These are the changes in inference v1.9.1.
New features
- FEAT: Qwen-Image-Edit by @qinxuye in #3989
- FEAT: Wan 2.2 by @qinxuye in #3996
- FEAT: Update CosyVoice2 to support both streaming and non-streaming speech generation by @Gmgge in #3994
- FEAT: support qwen-image-lightning by @qinxuye in #3995
- FEAT: [UI] support gpu_count configuration in image model. by @yiboyasss in #4016
- FEAT: image2image and inpainting for qwen-image by @qinxuye in #4014
- FEAT: Support Custom vllm embedding dim by @zhcn000000 in #4000
- FEAT: [embedding] support
dimensionsfor embedding by @llyycchhee in #3965 - FEAT: [Model] Support DeepSeek-V3.1 Quantization and tool by @Jun-Howie in #4022
- FEAT: Seed-OSS-36B by @Jun-Howie in #4020
Enhancements
- ENH: added zero shot and voice cloning ability for audio models by @qianduoduo0904 in #3968
- ENH: Add Template for Qwen3 Reranker when model_engine = vllm by @zhcn000000 in #3983
- ENH: Update the environment dependencies for cosyvoice2 by @Gmgge in #4015
- ENH: Compat with xllamacpp 0.2.0 by @codingl2k1 in #4004
- ENH: support chat_template_kwargs for llama.cpp by @qinxuye in #3988
- BLD: Clean up Docker's last legacy cache and images before executing each step by @zwt-1234 in #3963
- BLD: fix CI failures by @qinxuye in #4002
Bug fixes
- BUG: disable flash_attention when GPU compute capability < 8.0 by @amumu96 in #3973
- BUG: fix rerank model creation by @qinxuye in #3977
Documentation
- DOC: update models by @qinxuye in #3958
- DOC: add setting limitation of images for multi modal doc by @amumu96 in #4003
- DOC: Update docs about custom models by @OliverBryant in #4019
- DOC: update models & README by @qinxuye in #4023
Others
- FEAT:KAT-V1 by @Jun-Howie in #3998
New Contributors
- @qianduoduo0904 made their first contribution in #3968
- @OliverBryant made their first contribution in #4019
Full Changelog: v1.9.0...v1.9.1
v1.9.0
What's new in 1.9.0 (2025-08-16)
These are the changes in inference v1.9.0.
New features
- FEAT: [UI] running models data display replica. by @yiboyasss in #3897
- FEAT: [model] Qwen-Image by @qinxuye in #3916
- FEAT: [model] gpt-oss by @qinxuye in #3924
- FEAT: function calling support for deepseek-r1-0528 by @qinxuye in #3931
- FEAT: Support for GLM 4.5 quantized models by @Jun-Howie in #3945
- FEAT: sglang support streaming function call by @aniya105 in #3939
- FEAT: parsing harmony format for gpt-oss by @qinxuye in #3948
- FEAT: Add support for switching rerank model engines and support for rerank of vllm engine by @zhcn000000 in #3881
- FEAT: Support GLM-4.5v by @Jun-Howie in #3957
Enhancements
- ENH: Add qwen3 new model to tool call list by @zhcn000000 in #3900
- ENH: Update chat_template for Qwen3-Coder by @Jun-Howie in #3944
- ENH: add flash_attention control params attn_implementation by @amumu96 in #3951
- ENH: support qwen-image gguf by @qinxuye in #3954
- ENH: clean embedding model cache when using vllm engine by @amumu96 in #3956
- BLD: Downgrade flash-attn to version 2.7.4 by @zwt-1234 in #3953
- BLD: Add Openfst source by @zwt-1234 in #3959
Bug fixes
Documentation
- DOC: add doc about cu128 docker by @qinxuye in #3899
- DOC: Update xllamacpp doc by @codingl2k1 in #3862
Others
Full Changelog: v1.8.1...v1.9.0