README.md

Models

Common model zoos such as huggingface/transformers struggle when used with PyTorch-native model parallelism. Following the design principle of vLLM, verl keeps simple, parallelizable, highly optimized model implementations that take packed inputs.

Adding a New Huggingface Model

Step 1: Copy the model file from HF to verl

Add a new file under verl/models/hf
Copy ONLY the model file from huggingface/transformers/models to verl/models/hf

Step 2: Modify the model file to use packed inputs

Remove all the code related to inference (kv cache)
Modify the inputs to include only:
- input_ids (total_nnz,)
- cu_seqlens (batch_size + 1,)
- max_seqlen_in_batch: int
Here total_nnz is the total number of non-padding tokens in the batch. Note that this requires using flash attention with a causal mask.
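As a rough illustration of the packed layout, the sketch below builds the three inputs from a batch of variable-length sequences. It uses plain Python lists rather than tensors so it runs without dependencies. The helper name pack_sequences is hypothetical and not part of verl. It assumes cu_seqlens follows the flash-attention variable-length convention: a leading zero followed by one cumulative boundary per sequence.

```python
from itertools import accumulate
from typing import List, Tuple


def pack_sequences(sequences: List[List[int]]) -> Tuple[List[int], List[int], int]:
    """Pack variable-length token sequences into packed inputs.

    Hypothetical helper for illustration only; real code would return tensors.
    """
    if not sequences:
        raise ValueError("need at least one sequence")
    seq_lens = [len(seq) for seq in sequences]
    # Flatten all tokens into one 1-D list of length total_nnz (no padding).
    input_ids = [tok for seq in sequences for tok in seq]
    # Sequence i occupies input_ids[cu_seqlens[i]:cu_seqlens[i + 1]].
    cu_seqlens = [0] + list(accumulate(seq_lens))
    # Length of the longest sequence in the batch.
    max_seqlen_in_batch = max(seq_lens)
    return input_ids, cu_seqlens, max_seqlen_in_batch


batch = [[11, 12, 13], [21, 22], [31, 32, 33, 34]]
input_ids, cu_seqlens, max_seqlen_in_batch = pack_sequences(batch)
print(input_ids)            # [11, 12, 13, 21, 22, 31, 32, 33, 34]
print(cu_seqlens)           # [0, 3, 5, 9]
print(max_seqlen_in_batch)  # 4
```

Because the sequence boundaries travel in cu_seqlens, no padding tokens or attention mask need to be passed to the model.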

Step 2.5: Add tests

Add a test that compares the outputs of this version with those of the Hugging Face version.
Follow the existing test infrastructure and add the tests to tests/models/hf.
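One way such a comparison might be structured: the reference model produces padded (batch, seqlen) outputs, the packed model produces (total_nnz,) outputs, so the padded positions are dropped before comparing within a tolerance. The sketch below is an assumption about the test shape, not the actual test infrastructure. The helper names unpad and assert_outputs_match are hypothetical, and each token carries a single scalar for brevity where a real test would compare logits tensors.

```python
import math
from typing import List, Sequence


def unpad(padded: Sequence[Sequence[float]],
          attention_mask: Sequence[Sequence[int]]) -> List[float]:
    """Drop padded positions from a (batch, seqlen) output, row by row."""
    return [
        value
        for row, mask_row in zip(padded, attention_mask)
        for value, keep in zip(row, mask_row)
        if keep
    ]


def assert_outputs_match(padded_out, attention_mask, packed_out,
                         rel_tol: float = 1e-5, abs_tol: float = 1e-6) -> None:
    """Check that the packed output equals the unpadded reference output."""
    reference = unpad(padded_out, attention_mask)
    assert len(reference) == len(packed_out), "token counts differ"
    for i, (ref, got) in enumerate(zip(reference, packed_out)):
        assert math.isclose(ref, got, rel_tol=rel_tol, abs_tol=abs_tol), (
            f"mismatch at token {i}: {ref} vs {got}"
        )


# Stand-in values; a real test would run both models on the same weights.
reference_out = [[0.5, 1.5, 0.0], [2.5, 0.0, 0.0]]  # padded, (batch=2, seqlen=3)
mask = [[1, 1, 0], [1, 0, 0]]
packed_out = [0.5, 1.5, 2.5]                         # packed, (total_nnz=3,)
assert_outputs_match(reference_out, mask, packed_out)
```

The tolerances here are placeholders; suitable values depend on the dtype and attention kernels being compared.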

Step 3: Add a function to apply tensor parallelism

Please follow:
- https://pytorch.org/docs/stable/distributed.tensor.parallel.html
- https://pytorch.org/tutorials/intermediate/TP_tutorial.html

General comments:
- Tensor parallelism in native PyTorch is NOT auto-parallelism. You specify, through configs, how model parameters are sharded and how inputs/outputs are resharded. These configs are then registered as hooks that reshard inputs/outputs before/after the model forward.
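To make concrete what such a sharding config expresses, the sketch below simulates in plain Python the common pattern for a two-layer MLP: the first weight is split by columns, the second by rows, and the per-rank partial outputs are summed (the role an all-reduce plays). It does not call PyTorch. In real code this plan would be expressed with ColwiseParallel and RowwiseParallel from torch.distributed.tensor.parallel. All helper names are hypothetical, and weights use the mathematical x @ W layout rather than the nn.Linear (out, in) layout.

```python
from typing import List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def relu(a: Matrix) -> Matrix:
    return [[max(v, 0.0) for v in row] for row in a]


def split_columns(w: Matrix, world_size: int) -> List[Matrix]:
    step = len(w[0]) // world_size
    return [[row[r * step:(r + 1) * step] for row in w] for r in range(world_size)]


def split_rows(w: Matrix, world_size: int) -> List[Matrix]:
    step = len(w) // world_size
    return [w[r * step:(r + 1) * step] for r in range(world_size)]


def tensor_parallel_mlp(x: Matrix, w1: Matrix, w2: Matrix, world_size: int) -> Matrix:
    """Simulate each rank's local compute, then sum the partial outputs."""
    partials = []
    for w1_shard, w2_shard in zip(split_columns(w1, world_size),
                                  split_rows(w2, world_size)):
        hidden = relu(matmul(x, w1_shard))         # local to one rank, no communication
        partials.append(matmul(hidden, w2_shard))  # partial sum of the final output
    # Stand-in for the all-reduce that combines the row-wise partial results.
    out = partials[0]
    for part in partials[1:]:
        out = [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(out, part)]
    return out


x = [[1.0, 2.0]]                                     # replicated input, shape (1, 2)
w1 = [[1.0, 0.0, 2.0, 1.0], [0.0, 1.0, 1.0, 3.0]]    # shape (2, 4), column-sharded
w2 = [[1.0], [2.0], [3.0], [4.0]]                    # shape (4, 1), row-sharded
print(tensor_parallel_mlp(x, w1, w2, world_size=2))  # [[45.0]]
print(matmul(relu(matmul(x, w1)), w2))               # [[45.0]]
```

Pairing a column-wise split with a row-wise split means the intermediate activation never has to be gathered; only the final partial sums are communicated.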

Step 4: Add a function to apply data parallelism

Please use the FSDP2 APIs.
See the demo here: https://github.com/pytorch/torchtitan/blob/main/torchtitan/parallelisms/parallelize_llama.py#L413

Step 5: Add a function to apply pipeline parallelism

Comes in PyTorch 2.4.
Currently available only as alpha in the nightly version.
See torchtitan for more details.
