Tensor-Parallel-PyTorch

Launch multi-process with torchrun

torchrun --nnodes=1 --nproc-per-node=2 test_torch_dtensor_tp.py
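For readers new to torchrun, the sketch below shows roughly what a script launched this way sees. It is a minimal illustration, not the contents of test_torch_dtensor_tp.py: the names `RankInfo`, `read_torchrun_env`, and `all_reduce_demo` are hypothetical, and the `gloo` backend is an assumption chosen so the example runs on CPU.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class RankInfo:
    rank: int        # global index of this process
    local_rank: int  # index of this process on its node
    world_size: int  # total number of processes


def read_torchrun_env(env=os.environ) -> RankInfo:
    """Read the per-process variables torchrun exports; fall back to one process."""
    return RankInfo(
        rank=int(env.get("RANK", 0)),
        local_rank=int(env.get("LOCAL_RANK", 0)),
        world_size=int(env.get("WORLD_SIZE", 1)),
    )


def all_reduce_demo() -> None:
    """Sum a small tensor across all ranks. Call this only under torchrun."""
    # Imported lazily so the helper above works without PyTorch installed.
    import torch
    import torch.distributed as dist

    info = read_torchrun_env()
    # The default env:// rendezvous uses MASTER_ADDR/MASTER_PORT set by torchrun.
    dist.init_process_group(backend="gloo")  # assumption: CPU-only backend
    t = torch.ones(4) * (info.rank + 1)
    dist.all_reduce(t, op=dist.ReduceOp.SUM)
    print(f"rank {info.rank}/{info.world_size}: {t.tolist()}")
    dist.destroy_process_group()
```

Saved to a hypothetical demo.py that ends with a call to `all_reduce_demo()`, this could be launched with the same torchrun flags as above.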

Potential issues

Performance without SHM-based all-reduce
MQA/GQA support
Support for an odd number of ranks
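One way to read the MQA/GQA and odd-rank items is as a divisibility problem: tensor parallelism usually splits attention heads across ranks, and that split is uneven when the head count is not a multiple of the rank count. The sketch below only illustrates that arithmetic under this assumption. The helper names are hypothetical and do not come from this repository.

```python
from typing import List


def heads_per_rank(num_heads: int, world_size: int) -> List[int]:
    """Split heads as evenly as possible; the first ranks absorb the remainder."""
    base, extra = divmod(num_heads, world_size)
    return [base + (1 if r < extra else 0) for r in range(world_size)]


def split_problems(num_q_heads: int, num_kv_heads: int, world_size: int) -> List[str]:
    """List the head groups that cannot be divided evenly across ranks."""
    problems = []
    if num_q_heads % world_size:
        problems.append("query heads do not divide evenly")
    if num_kv_heads % world_size:
        problems.append("key/value heads do not divide evenly")
    return problems


# GQA with 32 query heads and 8 KV heads on 2 ranks splits cleanly.
print(split_problems(32, 8, 2))
# MQA has a single KV head, so 2 ranks cannot each own a whole one.
print(split_problems(32, 1, 2))
# An odd rank count leaves one rank with fewer heads.
print(heads_per_rank(32, 3))
```

When the split is uneven, an implementation has to either replicate heads on several ranks or let ranks hold different shard sizes. Which approach fits this repository is not stated in the source.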

All-reduce JIT trace test case

Help

--trace_mode: 0 for JIT trace, 1 for torch.compile, 2 for neither (eager execution)
Set TORCH_COMPILE_DEBUG=1 to see the torch.compile debug log

torchrun --nnodes=1 --nproc-per-node=2 test_allreduce_jit_trace.py --trace_mode 0
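The three modes can be pictured as a small dispatch step. The sketch below is an assumption about how such a flag might be wired up, not the code in test_allreduce_jit_trace.py: `TraceMode`, `parse_args`, `prepare`, and the default value of 2 are all hypothetical.

```python
import argparse
from enum import IntEnum


class TraceMode(IntEnum):
    JIT = 0      # torch.jit.trace
    COMPILE = 1  # torch.compile
    EAGER = 2    # no JIT/trace


def parse_args(argv=None) -> argparse.Namespace:
    parser = argparse.ArgumentParser()
    parser.add_argument("--trace_mode", type=int, default=2,  # default is an assumption
                        choices=[m.value for m in TraceMode])
    parser.add_argument("--deepspeed", action="store_true")
    # Ignore extra arguments that a launcher may append to the command line.
    args, _unknown = parser.parse_known_args(argv)
    return args


def prepare(fn, example_inputs, mode: int):
    """Wrap fn according to the selected mode; example_inputs is a tuple of tensors."""
    import torch  # imported lazily so argument parsing works without PyTorch

    if mode == TraceMode.JIT:
        return torch.jit.trace(fn, example_inputs)
    if mode == TraceMode.COMPILE:
        return torch.compile(fn)
    return fn  # eager: run the function unchanged
```

Keeping the mode as an integer on the command line matches the documented flag, while the enum gives the values readable names inside the script.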

Run with DeepSpeed

deepspeed --bind_cores_to_rank test_allreduce_jit_trace.py --deepspeed --trace_mode 2
