High concurrency is not supported: how can response time under concurrent load be improved? #747

Open
Jay-tian opened this issue Sep 9, 2024 · 2 comments
Labels
documentation Improvements or additions to documentation

Comments

@Jay-tian

Jay-tian commented Sep 9, 2024

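With ten characters of input on a GPU (T4), a single request takes a little over 1 second. With two concurrent requests the average response time rises to 2-3 seconds, and with ten concurrent requests it exceeds 9 seconds, yet GPU utilization only reaches 40%. Is there any way to improve concurrency?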

@njzfw1024

Same question here.

@fumiama
Member

fumiama commented Sep 10, 2024

Use vLLM (experimental) or run multiple processes.
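For anyone trying the multi-process route, here is a minimal sketch of the general pattern, not this project's own API. `FakeModel`, `init_worker`, `handle`, and `serve` are hypothetical names made up for illustration; swap `FakeModel` for the real model loading and inference calls. Each worker process loads its own model once and then serves requests independently.

```python
import os
from concurrent.futures import ProcessPoolExecutor


class FakeModel:
    """Hypothetical stand-in for the real TTS model (an assumption)."""

    def infer(self, text: str) -> bytes:
        # Pretend to synthesize audio; here the text is just echoed as bytes.
        return text.encode("utf-8")


_model = None  # one model instance per worker process


def init_worker() -> None:
    """Load the model once, when the worker process starts."""
    global _model
    _model = FakeModel()


def handle(text: str) -> tuple[int, bytes]:
    """Run one request on this worker's model; return (worker pid, audio)."""
    return os.getpid(), _model.infer(text)


def serve(texts: list[str], workers: int = 2) -> list[tuple[int, bytes]]:
    """Spread requests over `workers` processes, each with its own model."""
    with ProcessPoolExecutor(max_workers=workers, initializer=init_worker) as pool:
        return list(pool.map(handle, texts))


if __name__ == "__main__":
    # Ten requests handled by two worker processes.
    for pid, audio in serve([f"request {i}" for i in range(10)], workers=2):
        print(pid, len(audio))
```

Every process holds its own copy of the model, so the number of workers is limited by GPU memory. Whether this actually raises throughput on a T4 depends on the model and would need to be measured.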

@fumiama fumiama added the documentation Improvements or additions to documentation label Sep 10, 2024