Releases: xorbitsai/inference

v1.5.1

30 Apr 14:00
1c11c60

What's new in 1.5.1 (2025-04-30)

These are the changes in inference v1.5.1.

New features

Enhancements

Bug fixes

  • BUG: fix sglang chat by @qinxuye in #3326
  • BUG: Show engine options on UI even if the specific engine is not installed by @ChengjieLi28 in #3331
  • BUG: fix failure of clearing resources when loading model failed by @qinxuye in #3361

Documentation

  • DOC: update troubleshooting.rst for the launch error caused by numpy by @qiulang in #3342

New Contributors

Full Changelog: v1.5.0...v1.5.1

v1.5.0.post2

21 Apr 11:11

What's new in 1.5.0.post2 (2025-04-21)

These are the changes in inference v1.5.0.post2.

Enhancements

Bug fixes

Full Changelog: v1.5.0.post1...v1.5.0.post2

v1.5.0.post1

19 Apr 15:58
2010508

What's new in 1.5.0.post1 (2025-04-19)

These are the changes in inference v1.5.0.post1.

Enhancements

Documentation

Full Changelog: v1.5.0...v1.5.0.post1

v1.5.0

19 Apr 12:40
ee8d025

What's new in 1.5.0 (2025-04-19)

These are the changes in inference v1.5.0.

New features

Enhancements

Bug fixes

Documentation

Full Changelog: v1.4.1...v1.5.0

v1.4.1

03 Apr 13:40
23260be

What's new in 1.4.1 (2025-04-03)

These are the changes in inference v1.4.1.

New features

Enhancements

Bug fixes

  • BUG: Fix PyTorch TypeError: make _ModelWrapper inherit from nn.Module by @JamesFlare1212 in #3131 (see the sketch after this list)
  • BUG: fix llm stream response by @amumu96 in #3115
  • BUG: prevent potential stop hang for distributed vllm inference by @qinxuye in #3180
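
For context on the first fix above: in PyTorch, a wrapper object that holds a model must itself subclass nn.Module, otherwise module-level methods such as .to() or .eval() are not available and calls can fail with a TypeError. The snippet below is only an illustrative sketch of that general pattern under assumed names; _ModelWrapper itself is internal to Xinference and is not reproduced here.

```python
import torch.nn as nn

# Hypothetical sketch (not Xinference's actual _ModelWrapper): a wrapper
# that holds a torch module should subclass nn.Module and call
# super().__init__(), so submodules register and nn.Module methods work.
class ModelWrapper(nn.Module):
    def __init__(self, model: nn.Module):
        super().__init__()          # required before assigning submodules
        self.model = model

    def forward(self, *args, **kwargs):
        return self.model(*args, **kwargs)

wrapped = ModelWrapper(nn.Linear(4, 2))
wrapped.eval()                      # available because we subclass nn.Module
```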

Documentation

New Contributors

Full Changelog: v1.4.0...v1.4.1

v1.4.0

21 Mar 07:17
ac88d42

What's new in 1.4.0 (2025-03-21)

These are the changes in inference v1.4.0.

New features

Enhancements

Bug fixes

Documentation

Others

  • FIX: [UI] remove the restriction of model_format on n_gpu for llama.cpp by @yiboyasss in #3050

New Contributors

Full Changelog: v1.3.1...v1.4.0

v1.3.1.post1

11 Mar 04:10
2ef99fb

What's new in 1.3.1.post1 (2025-03-11)

These are the changes in inference v1.3.1.post1.

Bug fixes

  • BUG: Fix reasoning content parser for qwq-32b by @amumu96 in #3024
  • BUG: Failed to download model 'QwQ-32B' (size: 32, format: ggufv2) after multiple retries by @Jun-Howie in #3031

Documentation

Full Changelog: v1.3.1...v1.3.1.post1

v1.3.1

09 Mar 04:39
5d6ec93

What's new in 1.3.1 (2025-03-09)

These are the changes in inference v1.3.1.

New features

Enhancements

Bug fixes

  • BUG: fix qwen2.5-vl-7b cannot chat bug by @amumu96 in #2944 (see the example after this list)
  • BUG: Fix modelscope model id on Qwen2.5-VL and add support for the AWQ quantization format in Qwen2.5-VL by @Jun-Howie in #2943
  • BUG: fix error when using Langchain-chatchat because the parameter [max_tokens] passed is None by @William533036 in #2962
  • BUG: fix attribute error in jina-clip-v2 when only text or only image is passed in by @Minamiyama in #2974
  • BUG: fix compatibility of mlx-lm v0.21.5 by @qinxuye in #2993
  • BUG: Fix tokenizer error in create_embedding by @shuaiqidezhong in #2992
  • BUG: fix wrong kwargs passed to the encode method when using jina-clip-v2 by @Minamiyama in #2991
  • BUG: [UI] fix the white screen bug by @yiboyasss in #3014
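
As a quick way to exercise the qwen2.5-vl chat fix above: Xinference exposes an OpenAI-compatible API, so a launched model can be queried with the standard openai client. This is a hedged sketch only; the base URL, port, api_key, and model UID below are assumptions for illustration, not values taken from the release notes.

```python
# Hedged sketch: talk to a model launched in Xinference through its
# OpenAI-compatible endpoint. base_url, api_key and the model UID are
# illustrative assumptions.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:9997/v1", api_key="not-needed")
resp = client.chat.completions.create(
    model="qwen2.5-vl-instruct",  # UID of the launched model (assumed)
    messages=[{"role": "user", "content": "Say hello in one short sentence."}],
)
print(resp.choices[0].message.content)
```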

New Contributors

Full Changelog: v1.3.0.post2...v1.3.1

v1.3.0.post2

22 Feb 15:30
378a47a

What's new in 1.3.0.post2 (2025-02-22)

These are the changes in inference v1.3.0.post2.

Bug fixes

Full Changelog: v1.3.0.post1...v1.3.0.post2

v1.3.0.post1

21 Feb 16:14
b2004d4

What's new in 1.3.0.post1 (2025-02-21)

These are the changes in inference v1.3.0.post1.

New features

Enhancements

  • ENH: add GPU utilization info by @amumu96 in #2852
  • ENH: Update Kokoro model by @codingl2k1 in #2843
  • ENH: cmdline supports --n-worker, add --model-path and make it compatible with --model_path by @qinxuye in #2890
  • BLD: update sglang to v0.4.2.post4 and vllm to v0.7.2 by @qinxuye in #2838
  • BLD: fix flashinfer installation in dockerfile by @qinxuye in #2844

Bug fixes

Tests

Documentation

Others

  • CHORE: Xavier now supports vLLM >= 0.7.0, drops support for older versions by @ChengjieLi28 in #2886

New Contributors

Full Changelog: v1.2.2...v1.3.0.post1