Releases: xorbitsai/inference

v0.12.2

21 Jun 09:14
5cef7c3

What's new in 0.12.2 (2024-06-21)

These are the changes in inference v0.12.2.

New features

  • FEAT: Add tools support for Qwen series MoE models by @zhanghx0905 in #1642
  • FEAT: [UI] Modify the deletion function for custom models by @yiboyasss in #1656
  • FEAT: [UI] Present custom model JSON data and allow editing it by @yiboyasss in #1670
  • FEAT: Add rerank model token input/output usage by @wxiwnd in #1657

Enhancements

  • ENH: Continuous batching now supports all models with the transformers backend by @ChengjieLi28 in #1659 (see the sketch below)
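
A minimal sketch of what this enables, assuming the server was started with XINFERENCE_TRANSFORMERS_ENABLE_BATCHING=1 (treated here as the opt-in switch), that an LLM is already served through the transformers engine under the placeholder UID "my-llm", and that chat responses follow the OpenAI-compatible shape:

    from concurrent.futures import ThreadPoolExecutor
    from xinference.client import Client

    client = Client("http://127.0.0.1:9997")   # assumed local endpoint
    model = client.get_model("my-llm")         # placeholder model UID

    prompts = ["Hello!", "Explain continuous batching in one line.", "Write a haiku."]
    with ThreadPoolExecutor(max_workers=len(prompts)) as pool:
        # Concurrent requests against the same transformers-backed model can now
        # be batched on the server instead of being handled strictly one by one.
        replies = list(pool.map(model.chat, prompts))
    for reply in replies:
        print(reply["choices"][0]["message"]["content"])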

Bug fixes

  • BUG: Show an error when a user launches a quantized model without a supported device by @Minamiyama in #1645
  • BUG: Fix default rerank type by @codingl2k1 in #1649
  • BUG: Fix chat_completion not responding after errors occur more than 100 times by @liuzhenghua in #1663

Tests

Others

Full Changelog: v0.12.1...v0.12.2

v0.12.1

14 Jun 09:31
34a57df

What's new in 0.12.1 (2024-06-14)

These are the changes in inference v0.12.1.

New features

Enhancements

Bug fixes

Others

New Contributors

Full Changelog: v0.12.0...v0.12.1

v0.12.0

07 Jun 07:27
55c5636

What's new in 0.12.0 (2024-06-07)

These are the changes in inference v0.12.0.

New features

Enhancements

  • ENH: Make CogVLM2 support streaming output by @Minamiyama in #1572
  • BLD: Clean up all Docker images after building the image on the self-hosted machine by @ChengjieLi28 in #1595
  • BLD: Fix pip resolving multiple versions of some packages during installation by @ChengjieLi28 in #1603

Bug fixes

Documentation

New Contributors

Full Changelog: v0.11.3...v0.12.0

v0.11.3

31 May 09:28
69c09cd

What's new in 0.11.3 (2024-05-31)

These are the changes in inference v0.11.3.

New features

Enhancements

Bug fixes

  • BUG: Fix model launch error when using torch 2.3.0 by @amumu96 in #1543
  • BUG: Fix image path error for VL models by @amumu96 in #1559
  • BUG: Fix validation errors when defining a custom baichuan-chat LLM model by @buptzyf in #1557

Documentation

  • DOC: Update README and fix the description of model engine by @qinxuye in #1566

Others

New Contributors

Full Changelog: v0.11.2...v0.11.3

v0.11.2.post1

24 May 11:52
ac8f334

What's new in 0.11.2.post1 (2024-05-24)

These are the changes in inference v0.11.2.post1, a hotfix version of v0.11.2.

Bug fixes

  • BUG: Fix model launch error when using torch 2.3.0 by @amumu96 in #1543

Full Changelog: v0.11.2...v0.11.2.post1

v0.11.2

24 May 09:10
77e79f8

What's new in 0.11.2 (2024-05-24)

These are the changes in inference v0.11.2.

New features

Enhancements

Bug fixes

  • BUG: Fix worker startup failure due to a None device name by @codingl2k1 in #1539
  • BUG: Fix gpu_idx allocation error when replica > 1 by @amumu96 in #1528

Others

Full Changelog: v0.11.1...v0.11.2

v0.11.1

17 May 07:17
55a0200

What's new in 0.11.1 (2024-05-17)

These are the changes in inference v0.11.1.

New features

  • FEAT: Support Yi-1.5 series by @qinxuye in #1489
  • FEAT: [UI] Support specifying GPU or CPU for embedding and rerank models by @yiboyasss in #1491

Enhancements

Bug fixes

Documentation

New Contributors

Full Changelog: v0.11.0...v0.11.1

v0.11.0

11 May 09:41
21be5ab

What's new in 0.11.0 (2024-05-11)

These are the changes in inference v0.11.0.

Breaking Changes

v0.11.0 introduces a breaking change when launching models: model_engine must now be specified. Refer to Model Engine for more information.
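
For example, a minimal sketch of the new launch flow with the Python client; the endpoint, model name, and size below are placeholders, and the exact engine strings accepted depend on your installation:

    from xinference.client import Client

    client = Client("http://127.0.0.1:9997")   # assumed local endpoint
    model_uid = client.launch_model(
        model_name="qwen1.5-chat",             # placeholder built-in model
        model_engine="transformers",           # now required, e.g. transformers or vllm
        model_size_in_billions=4,              # placeholder size
    )
    print(model_uid)

On the command line, the same requirement is expressed through the --model-engine option of xinference launch.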

New features

Enhancements

Bug fixes

Tests

  • TST: Pin huggingface-hub to pass CI since it has some breaking changes by @ChengjieLi28 in #1427

Documentation

Others

  • BUG: Fix metrics being empty when calling /v1/chat/completions by @amumu96 in #1406

New Contributors

Full Changelog: v0.10.3...v0.11.0

v0.10.3

24 Apr 02:57
2ba72b0

What's new in 0.10.3 (2024-04-24)

These are the changes in inference v0.10.3.

New features

Enhancements

Bug fixes

  • BUG: Fix launching embedding or rerank models from the command line failing due to PEFT by @hainaweiben in #1343
  • BUG: Fix extra parameters issue when auto-recovering models by @ChengjieLi28 in #1348
  • BUG: Fix issue with old rerank models using the rerank flag by @codingl2k1 in #1350

Documentation

New Contributors

Full Changelog: v0.10.2.post1...v0.10.3

v0.10.2.post1

19 Apr 06:48
5001715

What's new in 0.10.2.post1 (2024-04-19)

These are the changes in inference v0.10.2.post1.

Bug fixes

Full Changelog: v0.10.2...v0.10.2.post1