Tags: gty111/gLLM

v0.0.6

Verified

This commit was created on GitHub.com and signed with GitHub’s verified signature.
Kernel refactor: use sgl_kernel (#181)

* Use sgl-kernel and flashinfer

* Simplify build

* Update requirements

* Add health check

* Fix shape

* Fix attention

* Fix

* Fix

* Fix

* Fix conv3d

* Support endpoint

* Fix stop string

* Abstract conv3d module

* Fix moe and add conv file

* Fix weight for conv3d

* Fix

* Fix fused moe

* Fix

* Fix for moe and model max len

* Fix padding block

* Bump up to v0.0.6

* Clean up

---------

Co-authored-by: instinctguo <instinctguo@tencent.com>

v0.0.5

Bump up to version 0.0.5 (#147)

v0.0.4

Add torchvision in requirements.txt (#120)

v0.0.3

Bump up to version 0.0.3 (#81)

v0.0.2

Support TP 🎉 (#72)

* Initial support for TP

* Use random initialization

* Fix PP forward

* Downgrade to torch 2.6.0

* Fix env setting for MAX_JOBS

* Downgrade to torch 2.5.1

* Fix TP group init

* Fix annotation

* Make llama compatible for tp

* Make chatglm compatible for TP

* Make Qwen3 compatible for TP

* Remove weight_loader in fused_moe

* Make fused_moe compatible for TP; Abstract weight load function

* Make qwen_moe compatible for tp

* Make mixtral compatible for TP

* Update readme

* Abstract module attention; Clean up code for TP attention; Clean up code for model weights loading for glm

* Add MoE tuning config for A100 PCIE 40GB

* Refactor scheduler.py and AllocatorID

* Refactor IDAllocator

* Refactor worker scheduler

* Update readme

* Make embed_tokens and lm_head compatible for TP

* Fix multi-node zmq_comm

* Bump version to 0.1.0

v0.0.1

Add pyproject.toml (#62)