llama : add pipeline parallelism support #6017

Merged: 23 commits, Mar 13, 2024
Commits
822121f
llama : add pipeline parallelism support for batch processing with mu…
slaren Feb 13, 2024
1ac668e
server : add -ub, --ubatch-size parameter
slaren Mar 12, 2024
4ddccc2
fix server embedding test
slaren Mar 12, 2024
937966d
llama : fix Mamba inference for pipeline parallelism
compilade Mar 12, 2024
00a415d
llama : limit max batch size to n_batch
slaren Mar 12, 2024
89bfa1f
add LLAMA_SCHED_MAX_COPIES to configure the number of input copies fo…
slaren Mar 12, 2024
aa1e2f8
fix hip build
slaren Mar 12, 2024
deb3e24
Merge remote-tracking branch 'origin/master' into sl/pipeline-paralle…
slaren Mar 12, 2024
ead5c8b
fix sycl build (disable cpy_tensor_async)
slaren Mar 12, 2024
255c1ec
fix hip build
slaren Mar 12, 2024
4400153
llama : limit n_batch and n_ubatch to n_ctx during context creation
slaren Mar 13, 2024
9e7cecc
llama : fix norm backend
slaren Mar 13, 2024
b25a0f1
batched-bench : sync after decode
ggerganov Mar 13, 2024
529e749
swiftui : sync after decode
ggerganov Mar 13, 2024
54cdd47
ggml : allow ggml_get_rows to use multiple threads if they are available
slaren Mar 13, 2024
cda49d3
check n_ubatch >= n_tokens with non-casual attention
slaren Mar 13, 2024
015e1bf
llama : do not limit n_batch to n_ctx with non-casual attn
slaren Mar 13, 2024
0d934ee
server : construct batch with size of llama_n_batch
ggerganov Mar 13, 2024
3c38789
ggml_backend_cpu_graph_compute : fix return value when alloc fails
slaren Mar 13, 2024
9092883
llama : better n_batch and n_ubatch comment
slaren Mar 13, 2024
cb580a6
fix merge
slaren Mar 13, 2024
1f56481
small fix
slaren Mar 13, 2024
976176d
reduce default n_batch to 2048
slaren Mar 13, 2024
Merge remote-tracking branch 'origin/master' into sl/pipeline-parallelism
slaren committed Mar 12, 2024
commit deb3e245c203ad8fde27d004dba2246cdb97477e

This merge commit was added into this branch cleanly.

There are no new changes to show, but you can still view the diff.