What's new in 1.5.1 (2025-04-30)

@qinxuye

What's new in 1.5.1 (2025-04-30)

These are the changes in inference v1.5.1.

New features

FEAT: Wan 2.1 text2video by @qinxuye in #3297
FEAT: [UI] highlight the input box content. by @yiboyasss in #3306
FEAT: [UI] display the model_ability parameter. by @yiboyasss in #3308
FEAT: add ggufv2 support for vLLM by @harryzwh in #3259
FEAT: ovis2 by @Minamiyama in #3170
FEAT: support Qwen3 and Qwen3MOE by @Jun-Howie in #3347
FEAT: Add support for Qwen3 GPTQ quantization format by @Jun-Howie in #3363

Enhancements

ENH: support setting sse ping attempts by @llyycchhee in #3313
ENH: Support GLM4-0414 MLX and GGUF by @Jun-Howie in #3325
ENH: optimize qwen3, support chat_template_kwargs for all engines by @qinxuye in #3354
REF: Drop internal compression logic for transformers quantization, using bnb config instead by @ChengjieLi28 in #3324
REF: Unify audio model abilities by @llyycchhee in #3351

Bug fixes

BUG: fix sglang chat by @qinxuye in #3326
BUG: Show engine options on UI even if the specific engine is not installed by @ChengjieLi28 in #3331
BUG: fix failure of clearing resources when loading model failed by @qinxuye in #3361

Documentation

DOC: update troubleshooting.rst for the launch error caused by numpy by @qiulang in #3342

New Contributors

@llyycchhee made their first contribution in #3313
@harryzwh made their first contribution in #3259
@qiulang made their first contribution in #3342

Full Changelog: v1.5.0...v1.5.1

@amumu96

What's new in 1.5.0.post2 (2025-04-21)

These are the changes in xorbitsai/inference v1.5.0.post2.

Enhancements

BLD: support flash-attn at Dockerfile by @amumu96 in #3311

Bug fixes

BUG: [UI] fix the bug in the cancellation function. by @yiboyasss in #3301
BUG: fix gemma-3-it max_tokens by @qinxuye in #3304
BUG: fix potential progress error by @qinxuye in #3305

Full Changelog: v1.5.0.post1...v1.5.0.post2

@amumu96

What's new in 1.5.0.post1 (2025-04-19)

These are the changes in inference v1.5.0.post1.

Enhancements

BLD: fix cpu docker build by @amumu96 in #3296

Documentation

DOC: small fixes for doc by @qinxuye in #3294

Full Changelog: v1.5.0...v1.5.0.post1

@codingl2k1

What's new in 1.5.0 (2025-04-19)

These are the changes in inference v1.5.0.

New features

FEAT: Support megatts3 by @codingl2k1 in #3224
FEAT: InternVL3 by @Minamiyama in #3235
FEAT: support paraformer-zh by @qinxuye in #3236
FEAT:support SeaLLMs-v3 by @Jun-Howie in #3248
FEAT: support getting download progress and cancel download by @qinxuye in #3233
FEAT: add thinking process in gradio chat interface by @amumu96 in #3245
FEAT: support glm4-0414 by @Jun-Howie in #3251
FEAT: support min/max_pixels params for vision model by @amumu96 in #3242
FEAT: support skywork-or1-preview by @Jun-Howie in #3274
FEAT: [UI] progress bar and functionality to cancel model launch. by @yiboyasss in #3276
FEAT:Add AWQ quantization support for InternVL3 by @Jun-Howie in #3285
FEAT: Support virtualenv for models by @qinxuye in #3241
FEAT: support qwen2.5-omni by @qinxuye in #3279

Enhancements

ENH: Compatible with latest xllamacpp by @codingl2k1 in #3181
ENH: Use xllamacpp by default by @codingl2k1 in #3198
ENH: update gradio interface for chat model by @amumu96 in #3265
ENH: Set gradio default concurrency to cpu count by @codingl2k1 in #3278
BLD: fix compatibility for mlx-lm>=0.22.3 by @qinxuye in #3195
BLD: upgrade gradio version for docker by @Minamiyama in #3197
BLD: fix docker build by @qinxuye in #3207
BLD: remove setuptools limitation in project.toml by @qinxuye in #3212
BLD: fix docker build by @amumu96 in #3289
REF: simplify transformers model registration with decorators. by @Minamiyama in #3191

Bug fixes

BUG: fix stop hang for vllm engine by @qinxuye in #3202
BUG: Fix qwq gguf model path by @codingl2k1 in #3232
BUG: Fix llama cpp backend load model with multiple parts by @codingl2k1 in #3261

Documentation

DOC: Add usage doc for kokoro by @codingl2k1 in #3192
DOC: add doc about virtual env & update models in README by @qinxuye in #3287

Full Changelog: v1.4.1...v1.5.0

@Jun-Howie

What's new in 1.4.1 (2025-04-03)

These are the changes in inference v1.4.1.

New features

FEAT: Support Fin-R1 model by @Jun-Howie in #3116
FEAT: distributed inference for vLLM by @qinxuye in #3120
FEAT: Support gptq(int4, int8) and fp8 for Fin-R1 model by @Jun-Howie in #3157
feat: fix the quantization parameter in the vLLM engine cannot work by @amumu96 in #3159
FEAT: sglang vision by @Minamiyama in #3150
FEAT: support max_completion_tokens by @amumu96 in #3168
FEAT: support DeepSeek-VL2 by @Jun-Howie in #3179

Enhancements

ENH: support for qwen2.5-vl-32b by @Minamiyama in #3119
ENH: sglang supports gptq int8 quantization now by @Minamiyama in #3149
ENH: Add validation of n_worker by @rexjm in #3166
ENH: add qwen2.5-vl-32b-awq supported, and fix 7b-awq download hub typo by @Minamiyama in #3169
BLD: use gptqmodel to replace auto-gptq by @qinxuye in #3147
BLD: resolve docker fail by @amumu96 in #3164

Bug fixes

BUG: Fix PyTorch TypeError: Make _ModelWrapper Inherit from nn.Module by @JamesFlare1212 in #3131
BUG: fix llm stream response by @amumu96 in #3115
BUG: prevent potential stop hang for distributed vllm inference by @qinxuye in #3180

Documentation

DOC: update models by @qinxuye in #3146

New Contributors

@JamesFlare1212 made their first contribution in #3131
@rexjm made their first contribution in #3166

Full Changelog: v1.4.0...v1.4.1

@zky001

What's new in 1.4.0 (2025-03-21)

These are the changes in inference v1.4.0.

New features

FEAT: Support gemma-3 text part by @zky001 in #3077
FEAT: Gemma-3-it that supports vision by @qinxuye in #3102
FEAT: add deepseek v3 function calling by @rogercloud in #3103

Enhancements

ENH: xllamacpp backend raise exception if failed by @codingl2k1 in #3053
ENH: [UI] change 'GPU Count' to 'GPU Count per Replica'. by @yiboyasss in #3078

Bug fixes

BUG: [UI] fix dark mode bugs. by @yiboyasss in #3028
BUG: fix Internvl2.5-mpo awq, fix model card info typo by @Minamiyama in #3067
BUG: fix max_tokens for MLX VL models. by @qinxuye in #3072
BUG:fix vLLM parameter "enable_prefix_caching" by @Gmgge in #3081
BUG: fix first token error and support deepseek stream api by @amumu96 in #3090

Documentation

DOC: add auth usage guide for http request by @Minamiyama in #3065
DOC: add xllamacpp related docs by @qinxuye in #3088

Others

FIX: [UI] remove the restriction of model_format on n_gpu for llama.cpp by @yiboyasss in #3050

New Contributors

@Gmgge made their first contribution in #3081
@zky001 made their first contribution in #3077
@rogercloud made their first contribution in #3103

Full Changelog: v1.3.1...v1.4.0

@amumu96

What's new in 1.3.1.post1 (2025-03-11)

These are the changes in inference v1.3.1.post1.

Bug fixes

BUG: Fix reasoning content parser for qwq-32b by @amumu96 in #3024
BUG: Failed to download model 'QwQ-32B' (size: 32, format: ggufv2) after multiple retries by @Jun-Howie in #3031

Documentation

DOC: added new models by @qinxuye in #3018

Full Changelog: v1.3.1...v1.3.1.post1

@Jun-Howie

What's new in 1.3.1 (2025-03-09)

These are the changes in inference v1.3.1.

New features

FEAT: Support qwen2.5-instruct-1m by @Jun-Howie in #2928
FEAT: Support moonlight-16b-a3b by @Jun-Howie in #2963
FEAT: create_embedding add field model_replica by @zhoudelong in #2779
FEAT: [UI] add the reasoning_content parameter. by @yiboyasss in #2980
FEAT: Support QwQ-32B by @cyhasuka in #3005
FEAT: all engine support reasoning_content by @amumu96 in #3013

Enhancements

ENH: InternVL2.5-MPO by @Minamiyama in #2913
ENH: [UI] add copy button by @Minamiyama in #2920
ENH: [UI] add model ability filtering feature to the audio model. by @yiboyasss in #2986
ENH: Support xllamacpp by @codingl2k1 in #2997
BLD: Install ffmpeg 6 for audio & video models by @phuchoang2603 in #2946
BLD: fix ffprobe library not imported by @phuchoang2603 in #2971
BLD: fix docker requirements for sglang by @qinxuye in #3015
REF: [UI] move featureModels to data.js by @yiboyasss in #3008

Bug fixes

BUG: fix qwen2.5-vl-7b cannot chat bug by @amumu96 in #2944
BUG: Fix modelscope model id on Qwen2.5-VL Added support for AWQ quantization format in Qwen2.5-VL by @Jun-Howie in #2943
BUG: fix Error while using Langchain-chatchat, because the parameter [max_tokens] passed is None by @William533036 in #2962
BUG: using jina-clip-v2, no attribute error when only text of image pass in by @Minamiyama in #2974
BUG: fix compatibility of mlx-lm v0.21.5 by @qinxuye in #2993
BUG: Fix tokenizer error in create_embedding by @shuaiqidezhong in #2992
BUG: wrong kwargs passing to encode method when using jina-clip-v2 by @Minamiyama in #2991
BUG: [UI] fix the white screen bug. by @yiboyasss in #3014

New Contributors

@phuchoang2603 made their first contribution in #2946
@William533036 made their first contribution in #2962
@zhoudelong made their first contribution in #2779

Full Changelog: v1.3.0.post2...v1.3.1

@amumu96

What's new in 1.3.0.post2 (2025-02-22)

These are the changes in inference v1.3.0.post2.

Bug fixes

BUG: fix cmdline bug by @amumu96 in #2915

Full Changelog: v1.3.0.post1...v1.3.0.post2

@Jun-Howie

What's new in 1.3.0.post1 (2025-02-21)

These are the changes in inference v1.3.0.post1.

New features

FEAT: Support qwen-2.5-instruct-1m by @Jun-Howie in #2841
FEAT: support deepseek-v3 and deepseek-r1 by @qinxuye in #2864
FEAT: [UI] additional parameter tip function. by @yiboyasss in #2876
FEAT: [UI] add featured models filtering function. by @yiboyasss in #2871
FEAT: [UI] support form parameters and command line conversion. by @yiboyasss in #2850
FEAT: support distributed inference for sglang by @qinxuye in #2877
FEAT: [UI] add n_worker parameter for model launch. by @yiboyasss in #2889
FEAT: InternVL 2.5 by @Minamiyama in #2776
FEAT: support vllm reasoning content by @amumu96 in #2905

Enhancements

enh: add gpu utilization info by @amumu96 in #2852
ENH: Update Kokoro model by @codingl2k1 in #2843
ENH: cmdline supports --n-worker, add --model-path and make it compatible with --model_path by @qinxuye in #2890
BLD: update sglang to v0.4.2.post4 and vllm to v0.7.2 by @qinxuye in #2838
BLD: fix flashinfer installation in dockerfile by @qinxuye in #2844

Bug fixes

BUG: Fix whisper CI by @codingl2k1 in #2822
BUG: fix FLUX when a scheduler is specified which is incompatible. by @shuaiqidezhong in #2897
BUG: [UI] fix the bug of missing hint during model running. by @yiboyasss in #2904
BUG: Clear dependency by @codingl2k1 in #2910

Tests

TST: Pin CI transformers<4.49 by @codingl2k1 in #2883
TST: fix lint error by @amumu96 in #2911

Documentation

DOC: distributed inference by @qinxuye in #2908

Others

CHORE: Xavier now supports vLLM >= 0.7.0, drops support for older versions by @ChengjieLi28 in #2886

New Contributors

@shuaiqidezhong made their first contribution in #2897

Full Changelog: v1.2.2...v1.3.0.post1

Releases: xorbitsai/inference

v1.5.1

What's new in 1.5.1 (2025-04-30)

New features

Enhancements

Bug fixes

Documentation

New Contributors

Contributors

Uh oh!

v1.5.0.post2

What's new in 1.5.0.post2 (2025-04-21)

Enhancements

Bug fixes

Contributors

Uh oh!

v1.5.0.post1

What's new in 1.5.0.post1 (2025-04-19)

Enhancements

Documentation

Contributors

Uh oh!

v1.5.0

What's new in 1.5.0 (2025-04-19)

New features

Enhancements

Bug fixes

Documentation

Contributors

Uh oh!

v1.4.1

What's new in 1.4.1 (2025-04-03)

New features

Enhancements

Bug fixes

Documentation

New Contributors

Contributors

Uh oh!

v1.4.0

What's new in 1.4.0 (2025-03-21)

New features

Enhancements

Bug fixes

Documentation

Others

New Contributors

Contributors

Uh oh!

v1.3.1.post1

What's new in 1.3.1.post1 (2025-03-11)

Bug fixes

Documentation

Contributors

Uh oh!

v1.3.1

What's new in 1.3.1 (2025-03-09)

New features

Enhancements

Bug fixes

New Contributors

Contributors

Uh oh!

v1.3.0.post2

What's new in 1.3.0.post2 (2025-02-22)

Bug fixes

Contributors

Uh oh!

v1.3.0.post1

What's new in 1.3.0.post1 (2025-02-21)

New features

Enhancements

Bug fixes

Tests

Documentation

Others

New Contributors

Contributors

Uh oh!