Releases: xorbitsai/inference
v0.6.2
What's new in 0.6.2 (2023-11-09)
These are the changes in inference v0.6.2.
New features
- FEAT: Support Yi Model by @ChengjieLi28 in #629
Enhancements
- ENH: cache status by @UranusSeven in #616
- ENH: Supports request limits for the model by @ChengjieLi28 in #596
- ENH: running model location & accelerators by @UranusSeven in #626
- ENH: Create completion restful api compatibility by @codingl2k1 in #622
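The per-model request limit added in #596 is essentially an admission-control gate on in-flight requests. A minimal sketch of the idea, using a non-blocking semaphore (illustrative only; the class name and shape are assumptions, not Xinference's actual implementation):

```python
import threading

class RequestLimiter:
    """Toy per-model admission gate: at most `limit` in-flight requests."""

    def __init__(self, limit: int):
        self._sem = threading.BoundedSemaphore(limit)

    def try_acquire(self) -> bool:
        # Non-blocking: returns False (reject) once the limit is reached.
        return self._sem.acquire(blocking=False)

    def release(self) -> None:
        # Call when a request finishes to free a slot.
        self._sem.release()

limiter = RequestLimiter(limit=2)
results = [limiter.try_acquire() for _ in range(3)]
# Two requests are admitted; the third is rejected.
```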
Bug fixes
- BUG: Compatible with openai 1.1 by @codingl2k1 in #619
- BUG: fix spec decoding by @UranusSeven in #628
- BUG: `No slot available` error for embedding and LLM model on one card by @ChengjieLi28 in #611
- BUG: Rotating log does not create a new one when recreating the xinference cluster by @ChengjieLi28 in #618
Full Changelog: v0.6.1...v0.6.2
v0.6.1
What's new in 0.6.1 (2023-11-06)
These are the changes in inference v0.6.1.
Enhancements
- ENH: add command xinference-local by @UranusSeven in #610
- ENH: Don't check dead nodes by @aresnow1 in #614
Full Changelog: v0.6.0...v0.6.1
v0.6.0
What's new in 0.6.0 (2023-11-03)
These are the changes in inference v0.6.0.
New features
- FEAT: Zephyr by @UranusSeven in #597
- FEAT: stable diffusion with controlnet by @codingl2k1 in #575
Enhancements
- ENH: increase heartbeat interval by @UranusSeven in #604
- ENH: Support more models downloading from modelscope by @aresnow1 in #595
- ENH: Supports rotating file log by @ChengjieLi28 in #590
- ENH: stateless supervisor and worker by @UranusSeven in #546
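The rotating file log added in #590 can be approximated with Python's standard `logging.handlers.RotatingFileHandler`, which caps the log file size and keeps a fixed number of backups. The file name, size limit, and backup count below are arbitrary illustration values, not Xinference's configuration:

```python
import logging
import logging.handlers
import os
import tempfile

# Illustrative values only: a tiny maxBytes so rotation is easy to observe.
log_dir = tempfile.mkdtemp()
log_path = os.path.join(log_dir, "xinference.log")

handler = logging.handlers.RotatingFileHandler(
    log_path, maxBytes=100, backupCount=2  # keep at most 2 rotated backups
)
logger = logging.getLogger("rotation-demo")
logger.setLevel(logging.INFO)
logger.addHandler(handler)

for i in range(20):
    logger.info("message %d", i)

# The directory now holds xinference.log plus rotated .1/.2 backups.
files = sorted(os.listdir(log_dir))
```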
Bug fixes
- BUG: Fix chat system messages by @codingl2k1 in #594
- BUG: fix transformers compatibility by @UranusSeven in #600
Tests
- TST: Compatible with `llama-cpp-python` 0.2.12 by @ChengjieLi28 in #603
Documentation
- DOC: Download model from ModelScope by @ChengjieLi28 in #553
- DOC: Stable Diffusion with ControlNet example by @codingl2k1 in #605
Full Changelog: v0.5.6...v0.6.0
v0.5.6
What's new in 0.5.6 (2023-10-30)
These are the changes in inference v0.5.6.
New features
- FEAT: launch embedding models by @Minamiyama in #582
- FEAT: chatglm3 by @UranusSeven in #587
Documentation
- DOC: update hot topics and fix docs by @UranusSeven in #584
Others
- CHORE: install setuptools in release actions by @aresnow1 in #588
- CHORE: Use python3.10 to build and release by @aresnow1 in #589
Full Changelog: v0.5.5...v0.5.6
v0.5.5
What's new in 0.5.5 (2023-10-26)
These are the changes in inference v0.5.5.
Enhancements
- ENH: display language tags by @Minamiyama in #558
- ENH: filter models by type by @Minamiyama in #559
- ENH: disable create embeddings using LLMs by @UranusSeven in #570
- ENH: benchmark latency by @UranusSeven in #576
- ENH: configurable `XINFERENCE_HOME` env by @ChengjieLi28 in #566
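The latency benchmarking added in #576 boils down to timing repeated calls and reporting summary statistics. A self-contained sketch of that pattern, where `fake_infer` is a stand-in for a real model call (names and numbers are illustrative, not the benchmark shipped in #576):

```python
import statistics
import time

def fake_infer(prompt: str) -> str:
    # Stand-in for a real model call.
    time.sleep(0.001)
    return prompt[::-1]

def benchmark(fn, prompt: str, n: int = 20) -> dict:
    latencies = []
    for _ in range(n):
        start = time.perf_counter()
        fn(prompt)
        latencies.append(time.perf_counter() - start)
    latencies.sort()
    return {
        "mean_s": statistics.mean(latencies),
        "p95_s": latencies[int(0.95 * len(latencies)) - 1],
    }

stats = benchmark(fake_infer, "hello")
```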
Bug fixes
- BUG: Fix `bge-base-zh` and `bge-large-zh` from ModelScope by @ChengjieLi28 in #571
- BUG: When changing the model revision, xinference still uses the previous model by @ChengjieLi28 in #573
- BUG: incorrect vLLM config by @UranusSeven in #579
- BUG: fix llama-2 stop words by @UranusSeven in #580
Documentation
- DOC: Incompatibility Between NVIDIA Driver and PyTorch Version by @onesuper in #551
- DOC: Examples and resources page by @onesuper in #561
Full Changelog: v0.5.4...v0.5.5
v0.5.4
What's new in 0.5.4 (2023-10-20)
These are the changes in inference v0.5.4.
New features
- FEAT: wizardcoder python by @UranusSeven in #539
- FEAT: Support grammar-based sampling for ggml models by @aresnow1 in #525
- FEAT: speculative decoding by @UranusSeven in #509
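Speculative decoding (#509) has a cheap draft model propose several tokens ahead, which the larger target model then verifies in one pass, keeping the longest agreed prefix. A toy sketch of one round of that scheme, with trivial deterministic functions standing in for the two models (purely illustrative; not the Xinference implementation):

```python
def speculative_step(draft_next, target_next, prefix, k=4):
    """One round: the draft model proposes k tokens, the target model keeps
    the longest prefix it agrees with, then adds one token of its own."""
    # Draft phase: k cheap proposals.
    proposal, ctx = [], list(prefix)
    for _ in range(k):
        tok = draft_next(ctx)
        proposal.append(tok)
        ctx.append(tok)
    # Verification phase: accept while the target model agrees.
    accepted, ctx = [], list(prefix)
    for tok in proposal:
        if target_next(ctx) != tok:
            break
        accepted.append(tok)
        ctx.append(tok)
    accepted.append(target_next(ctx))  # target always contributes one token
    return accepted

# Tiny deterministic stand-ins for the draft and target models.
draft = lambda ctx: len(ctx) % 3
target = lambda ctx: len(ctx) % 2
out = speculative_step(draft, target, prefix=[0], k=4)
```

Each round emits at least one token (the target's own), so the scheme never falls below plain decoding speed in tokens per model pass.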
Enhancements
- ENH: Download embedding models from ModelScope by @ChengjieLi28 in #532
- ENH: lock transformers version by @UranusSeven in #549
- ENH: Support downloading code-llama family models from ModelScope by @ChengjieLi28 in #557
- ENH: Add gguf format of codellama-instruct by @aresnow1 in #567
Bug fixes
- BUG: Fix stream not compatible with openai by @codingl2k1 in #524
- BUG: set trust_remote_code to true by default by @richzw in #555
- BUG: add quantization to valid file name by @richzw in #562
- BUG: remove "generate" ability from Baichuan-2-chat json config by @Minamiyama in #556
Documentation
- DOC: update pot files by @UranusSeven in #538
- DOC: Add Client API reference by @codingl2k1 in #543
- DOC: Add client doc to the user guide by @codingl2k1 in #547
New Contributors
- @richzw made their first contribution in #555
- @Minamiyama made their first contribution in #556
Full Changelog: v0.5.3...v0.5.4
v0.5.3
What's new in 0.5.3 (2023-10-13)
These are the changes in inference v0.5.3.
New features
- FEAT: Add BAAI/BGE v1.5 family models by @ChengjieLi28 in #522
- FEAT: Support Mistral & Mistral-Instruct by @Bojun-Feng in #510
- FEAT: Add --model-uid to launch sub command by @codingl2k1 in #529
- FEAT: Support stable diffusion by @codingl2k1 in #484
Enhancements
- REF: Use restful client as default client by @aresnow1 in #470
- REF: refactor client codes for xinference-client by @ChengjieLi28 in #528
Tests
- TST: fix tiny llama by @UranusSeven in #513
Documentation
- DOC: hardware specific installations by @UranusSeven in #517
- DOC: update installation by @UranusSeven in #527
Full Changelog: v0.5.2...v0.5.3
v0.5.2
What's new in 0.5.2 (2023-09-27)
These are the changes in inference v0.5.2.
Enhancements
- ENH: validate model URI on register by @UranusSeven in #476
- ENH: Skip download for embedding models by @aresnow1 in #499
- ENH: set `trust_remote_code` to true by @UranusSeven in #500
Full Changelog: v0.5.1...v0.5.2
v0.5.1
What's new in 0.5.1 (2023-09-26)
These are the changes in inference v0.5.1.
Enhancements
- ENH: Safe iterate stream of ggml model by @codingl2k1 in #449
- ENH: Skip download if model exists by @aresnow1 in #495
Documentation
- DOC: vLLM by @UranusSeven in #491
Full Changelog: v0.5.0...v0.5.1
v0.5.0
What's new in 0.5.0 (2023-09-22)
These are the changes in inference v0.5.0.
New features
- FEAT: incorporate vLLM by @UranusSeven in #445
- FEAT: add register model page for dashboard by @Bojun-Feng in #420
- FEAT: internlm 20b by @UranusSeven in #486
- FEAT: support glaive coder by @UranusSeven in #490
- FEAT: Support download models from modelscope by @aresnow1 in #475
Enhancements
- ENH: shorten OpenBuddy's desc by @UranusSeven in #471
- ENH: enable vLLM on Linux with cuda by @UranusSeven in #472
- ENH: vLLM engine supports more models by @UranusSeven in #477
- ENH: remove subpool on failure by @UranusSeven in #478
- ENH: support trust_remote_code when launching a model by @UranusSeven in #479
- ENH: vLLM auto tensor parallel by @UranusSeven in #480
Bug fixes
- BUG: llama-cpp version mismatch by @Bojun-Feng in #473
- BUG: incorrect endpoint on host 0.0.0.0 by @UranusSeven in #474
- BUG: prompt style not set as expected on web UI by @UranusSeven in #489
Full Changelog: v0.4.4...v0.5.0