Releases: xorbitsai/inference
v0.7.2
What's new in 0.7.2 (2023-12-15)
These are the changes in inference v0.7.2.
New features
- FEAT: Supports `qwen-chat` 1.8B by @ChengjieLi28 in #757
- FEAT: Support gorilla openfunctions v1 by @codingl2k1 in #760
- FEAT: qwen function call by @codingl2k1 in #763
Enhancements
- ENH: Handle tool call failed by @codingl2k1 in #767
Bug fixes
- BUG: [UI] Fix model size selection crash issue by @ChengjieLi28 in #764
Documentation
- DOC: Fix `model_uri` missing in Custom Models by @ChengjieLi28 in #759
Full Changelog: v0.7.1...v0.7.2
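The qwen function-call support added in #763 is surfaced through Xinference's OpenAI-compatible chat API. As a rough sketch (the endpoint path, default port 9997, and the `get_weather` tool are assumptions for illustration; the tool schema follows the OpenAI function-calling convention), a request payload might be assembled like this:

```python
import json

# Hypothetical tool definition in the OpenAI function-calling schema;
# "get_weather" and its parameters are made up for illustration.
tools = [
    {
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Look up the current weather for a city",
            "parameters": {
                "type": "object",
                "properties": {"city": {"type": "string"}},
                "required": ["city"],
            },
        },
    }
]

payload = {
    "model": "qwen-chat",  # model UID of a launched qwen-chat model
    "messages": [{"role": "user", "content": "What's the weather in Paris?"}],
    "tools": tools,
}

# This body would be POSTed as JSON to the (assumed) OpenAI-compatible
# endpoint, e.g. http://localhost:9997/v1/chat/completions.
body = json.dumps(payload)
```

If the model decides to call a tool, the response would carry the function name and JSON-encoded arguments instead of plain text, which the caller executes and feeds back as a follow-up message.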
v0.7.1
What's new in 0.7.1 (2023-12-12)
These are the changes in inference v0.7.1.
Enhancements
- ENH: [UI] Supports `model_uid` input when launching models by @ChengjieLi28 in #746
- ENH: Add more vllm supported models by @aresnow1 in #756
Bug fixes
- BUG: Fix `cached` tag on UI by @ChengjieLi28 in #748
- BUG: Fix stream arg for vllm backend by @aresnow1 in #758
Others
- Bugs: Fix emote encoding in streaming chat, fix missing `pad_token` for PyTorch tokenizers, and allow a system message as the latest message in chat by @AndiMajore in #747
New Contributors
- @AndiMajore made their first contribution in #747
Full Changelog: v0.7.0...v0.7.1
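The `model_uid` input added in #746 lets the caller pick the UID a model is launched under instead of receiving an auto-generated one. A minimal sketch of what a launch request might look like (the field names and endpoint are assumptions based on the REST convention; in the Python client this roughly corresponds to `launch_model(..., model_uid=...)`):

```python
import json

# Hypothetical launch request body; "my-llama" is a caller-chosen UID
# used to address the model afterwards, instead of a generated one.
payload = {
    "model_uid": "my-llama",
    "model_name": "llama-2-chat",
    "model_size_in_billions": 7,
    "quantization": "q4_0",
}
body = json.dumps(payload)
```

A stable, caller-chosen UID is useful when other services hard-code the model address and should survive relaunches.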
v0.7.0
What's new in 0.7.0 (2023-12-08)
These are the changes in inference v0.7.0.
Enhancements
- ENH: upgrade insecure requests when necessary by @waltcow in #712
- ENH: [UI] Using tab in running models by @ChengjieLi28 in #714
- ENH: [UI] supports launching rerank models by @ChengjieLi28 in #711
- ENH: [UI] Error can be shown on web UI directly via Snackbar by @ChengjieLi28 in #721
- ENH: [UI] Supports `n_gpu` config when launching LLM models on web UI by @ChengjieLi28 in #730
- ENH: [UI] `n_gpu` default value `auto` by @ChengjieLi28 in #738
- ENH: [UI] Support unregistering custom model on web UI by @ChengjieLi28 in #735
- ENH: Auto recover model actor by @codingl2k1 in #694
- ENH: allow rerank models run with LLM models on same device by @aresnow1 in #741
Bug fixes
- BUG: Auto patch trust remote code for embedding model by @codingl2k1 in #710
- BUG: Fix vLLM backend by @codingl2k1 in #728
Others
- Update builtin model list by @onesuper in #709
- Revert "ENH: upgrade insecure requests when necessary" by @qinxuye in #716
- CHORE: Format js file and check js code style by @ChengjieLi28 in #727
Full Changelog: v0.6.5...v0.7.0
v0.6.5
What's new in 0.6.5 (2023-12-01)
These are the changes in inference v0.6.5.
New features
- FEAT: Support jina embedding models by @aresnow1 in #704
- FEAT: Support Yi-chat by @aresnow1 in #700
- FEAT: Support qwen 72b by @aresnow1 in #705
- FEAT: ChatGLM3 tool calls by @codingl2k1 in #701
Enhancements
- ENH: Specify actor pool port for distributed deployment by @ChengjieLi28 in #688
- ENH: Remove `xorbits` dependency by @ChengjieLi28 in #699
- ENH: User can just specify a string for prompt style when registering custom LLM models by @ChengjieLi28 in #682
- ENH: Add more models supported by vllm by @aresnow1 in #706
Bug fixes
- BUG: Fix xinference startup failure when an invalid custom model is found by @codingl2k1 in #690
Documentation
- DOC: Fix some incorrect links in documentation by @aresnow1 in #684
- DOC: Update README by @aresnow1 in #687
- DOC: documentation for docker and k8s by @lynnleelhl in #661
New Contributors
- @lynnleelhl made their first contribution in #661
Full Changelog: v0.6.4...v0.6.5
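The jina embedding models added in #704 serve vectors that are typically compared with cosine similarity on the client side. A small sketch, assuming an OpenAI-style embeddings request (the model name is one of the jina model IDs; the endpoint path and response shape are assumptions):

```python
import json
import math

# Hypothetical request body for an (assumed) OpenAI-compatible
# /v1/embeddings endpoint.
payload = {
    "model": "jina-embeddings-v2-base-en",
    "input": ["Xinference serves models locally.", "Local model serving."],
}
body = json.dumps(payload)

def cosine(a, b):
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

# With two response vectors in hand, similarity is a one-liner
# (toy 3-dimensional vectors here, real embeddings are much longer):
similarity = cosine([1.0, 0.0, 1.0], [0.5, 0.5, 1.0])
```

Identical vectors score 1.0; unrelated ones drift toward 0, which makes cosine similarity a convenient ranking key for semantic search over the returned embeddings.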
v0.6.4
What's new in 0.6.4 (2023-11-24)
These are the changes in inference v0.6.4.
New features
- FEAT: Support registering custom embedding model by @ChengjieLi28 in #667
- FEAT: Supports `qwen.cpp` for `qwen-chat` with `ggml` format by @ChengjieLi28 in #675
- FEAT: Xverse by @fengsxy in #678
- FEAT: Support rerank models by @aresnow1 in #672
Enhancements
- ENH: Add `generate` interface for `chatglm` with `ggml` format by @ChengjieLi28 in #671
Bug fixes
- BUG: Fix custom model missing config json by @codingl2k1 in #674
- BUG: Fix http error is not raised by @codingl2k1 in #657
- BUG: Fix pip install xinference[all] by @codingl2k1 in #679
Documentation
- DOC: update pot files by @UranusSeven in #638
- DOC: Add a more detailed beginner's guide covering first-time usage for new users by @onesuper in #651
- DOC: documentation for using xinference by @fengsxy in #677
- DOC: Register custom embedding model by @ChengjieLi28 in #683
Others
- Add a "Why Xinference" section to the README comparing pivotal features with other projects by @onesuper in #652
- Fix README.md by @aresnow1 in #669
Full Changelog: v0.6.3...v0.6.4
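Rerank models (#672) take a query plus candidate documents and return a relevance score per document. A rough payload sketch (the model name, endpoint fields, and response shape follow common rerank-API conventions and are assumptions here, not the confirmed Xinference API):

```python
import json

# Hypothetical rerank request body; "bge-reranker-base" stands in for
# the UID of a launched rerank model.
payload = {
    "model": "bge-reranker-base",
    "query": "What is Xinference?",
    "documents": [
        "Xinference is a model serving framework.",
        "Bananas are rich in potassium.",
    ],
}
body = json.dumps(payload)

# A rerank response typically pairs each document index with a relevance
# score; sorting by score descending recovers the ranking. Scores below
# are fabricated for illustration.
fake_scores = [(0, 0.92), (1, 0.03)]
ranking = [idx for idx, _ in sorted(fake_scores, key=lambda p: p[1], reverse=True)]
```

A common pattern is to retrieve a broad candidate set with an embedding model and then rerank the top handful with a cross-encoder rerank model for precision.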
v0.6.3
What's new in 0.6.3 (2023-11-16)
These are the changes in inference v0.6.3.
New features
- FEAT: qwen-chat-14b by @UranusSeven in #494
- FEAT: Support gptq quantization by @codingl2k1 in #645
Bug fixes
- BUG: Fix slow RESTful API serialization by @codingl2k1 in #648
Tests
- TST: disable test_is_self_hosted by @UranusSeven in #641
Documentation
- DOC: About Logging in Xinference by @ChengjieLi28 in #631
- DOC: Init for Chinese doc by @ChengjieLi28 in #565
Full Changelog: v0.6.2...v0.6.3
v0.6.2
What's new in 0.6.2 (2023-11-09)
These are the changes in inference v0.6.2.
New features
- FEAT: Support Yi Model by @ChengjieLi28 in #629
Enhancements
- ENH: cache status by @UranusSeven in #616
- ENH: Supports request limits for the model by @ChengjieLi28 in #596
- ENH: running model location & accelerators by @UranusSeven in #626
- ENH: Create completion restful api compatibility by @codingl2k1 in #622
Bug fixes
- BUG: Compatible with openai 1.1 by @codingl2k1 in #619
- BUG: fix spec decoding by @UranusSeven in #628
- BUG: Fix `No slot available` error for embedding and LLM model on one card by @ChengjieLi28 in #611
- BUG: Rotating log does not create a new one when recreating the xinference cluster by @ChengjieLi28 in #618
Full Changelog: v0.6.1...v0.6.2
v0.6.1
What's new in 0.6.1 (2023-11-06)
These are the changes in inference v0.6.1.
Enhancements
- ENH: add command xinference-local by @UranusSeven in #610
- ENH: Don't check dead nodes by @aresnow1 in #614
Full Changelog: v0.6.0...v0.6.1
v0.6.0
What's new in 0.6.0 (2023-11-03)
These are the changes in inference v0.6.0.
New features
- FEAT: Zephyr by @UranusSeven in #597
- FEAT: stable diffusion with controlnet by @codingl2k1 in #575
Enhancements
- ENH: increase heartbeat interval by @UranusSeven in #604
- ENH: Support more models downloading from modelscope by @aresnow1 in #595
- ENH: Supports rotating file log by @ChengjieLi28 in #590
- ENH: stateless supervisor and worker by @UranusSeven in #546
Bug fixes
- BUG: Fix chat system messages by @codingl2k1 in #594
- BUG: fix transformers compatibility by @UranusSeven in #600
Tests
- TST: Compatible with `llama-cpp-python` 0.2.12 by @ChengjieLi28 in #603
Documentation
- DOC: Download model from ModelScope by @ChengjieLi28 in #553
- DOC: Stable Diffusion with ControlNet example by @codingl2k1 in #605
Full Changelog: v0.5.6...v0.6.0
v0.5.6
What's new in 0.5.6 (2023-10-30)
These are the changes in inference v0.5.6.
New features
- FEAT: launch embedding models by @Minamiyama in #582
- FEAT: chatglm3 by @UranusSeven in #587
Documentation
- DOC: update hot topics and fix docs by @UranusSeven in #584
Others
- CHORE: install setuptools in release actions by @aresnow1 in #588
- CHORE: Use python3.10 to build and release by @aresnow1 in #589
Full Changelog: v0.5.5...v0.5.6