Serve PyTorch models as a REST API using Drogon; an example is included for a resnet18 model trained on ImageNet. Benchmarks show a ~6-10x improvement in throughput and latency for resnet18 at peak load compared with a FastAPI + PyTorch baseline.
# Create optimized models for your machine (sketched further below).
$ python3 optimize_model_for_inference.py
# Build and Run Server
$ docker compose run --service-ports blaze
- Add Docker to the CLion toolchain; this sets up all the necessary dependencies.
curl "localhost:8088/classify" -F "image=@images/cat.jpg"
# Drogon + libtorch
for i in {0..8}; do curl "localhost:8088/classify" -F "image=@images/cat.jpg"; done # Run once to warm up the server.
wrk -t8 -c100 -d60 -s benchmark/upload.lua "http://localhost:8088/classify" --latency
# FastAPI + pytorch
cd benchmark/python_fastapi
python3 -m venv env
source env/bin/activate
python3 -m pip install -r requirements.txt # Run just once to install dependencies into the virtual environment.
gunicorn main:app -w 2 -k uvicorn.workers.UvicornWorker --bind 127.0.0.1:8088 # 2 workers gave the best performance on my machine; 3 and 4 were also tried. A sketch of main:app follows below.
deactivate # Use after benchmarking is done and gunicorn is closed
cd ../.. # back to root folder
for i in {0..8}; do curl "localhost:8088/classify" -F "image=@images/cat.jpg"; done
wrk -t8 -c100 -d60 -s benchmark/fastapi_upload.lua "http://localhost:8088/classify" --latency
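For reference, the FastAPI baseline that gunicorn serves above looks roughly like the following. This is a minimal sketch, assuming a torchvision resnet18 and the /classify multipart route the benchmark targets; the real implementation lives in benchmark/python_fastapi/main.py.

# Assumed shape of the FastAPI baseline (sketch, not the actual main.py).
import io
import torch
import torchvision
from fastapi import FastAPI, File, UploadFile
from PIL import Image
from torchvision import transforms

app = FastAPI()
model = torchvision.models.resnet18(weights="IMAGENET1K_V1").eval()
preprocess = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])

@app.post("/classify")
async def classify(image: UploadFile = File(...)):
    # The benchmark uploads the file under the form field "image".
    img = Image.open(io.BytesIO(await image.read())).convert("RGB")
    with torch.inference_mode():
        logits = model(preprocess(img).unsqueeze(0))
    return {"class_id": int(logits.argmax(dim=1))}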
Drogon + libtorch
# OS: Ubuntu 21.10 x86_64
# Kernel: 5.15.14-xanmod1
# CPU: AMD Ryzen 9 5900X (24) @ 3.700GHz
# GPU: NVIDIA GeForce RTX 3070
Running 1m test @ http://localhost:8088/classify
8 threads and 100 connections
Thread Stats Avg Stdev Max +/- Stdev
Latency 39.30ms 10.96ms 95.51ms 70.50%
Req/Sec 306.58 28.78 390.00 70.92%
Latency Distribution
50% 37.40ms
75% 45.69ms
90% 54.57ms
99% 69.34ms
146612 requests in 1.00m, 30.34MB read
Requests/sec: 2441.60
Transfer/sec: 517.41KB
FastAPI + pytorch
# OS: Ubuntu 21.10 x86_64
# Kernel: 5.15.14-xanmod1
# CPU: AMD Ryzen 9 5900X (24) @ 3.700GHz
# GPU: NVIDIA GeForce RTX 3070
Running 1m test @ http://localhost:8088/classify
8 threads and 100 connections
Thread Stats Avg Stdev Max +/- Stdev
Latency 449.50ms 239.30ms 1.64s 70.39%
Req/Sec 33.97 26.41 121.00 83.46%
Latency Distribution
50% 454.64ms
75% 570.73ms
90% 743.54ms
99% 1.16s
12981 requests in 1.00m, 2.64MB read
Requests/sec: 216.13
Transfer/sec: 44.96KB
- API request handling and model pre-processing in the Drogon controller: controllers/ImageClass.cc
- Batched model inference logic & post-processing in lib/ModelBatchInference.cpp (the batching pattern is sketched after this list)
- Multithreaded batched inference
- FP16 Inference
- Uses C++20 coroutines for wait-free event-loop tasks
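The batching pattern behind the bullets above is sketched here in Python for readability; the actual implementation is C++ in lib/ModelBatchInference.cpp, and the queue name, batch size and timeout below are assumptions.

# Python illustration of dynamic batching + FP16 inference (the real code is C++).
import queue
import threading
import torch

MAX_BATCH, TIMEOUT_S = 8, 0.005                      # assumed values
pending = queue.Queue()                              # holds (input tensor, reply queue) pairs
model = torch.jit.load("models/resnet18_optimized.pt").eval()
if torch.cuda.is_available():
    model = model.to("cuda").half()                  # FP16 inference on GPU

def inference_worker():
    while True:
        # Block for the first request, then keep collecting until the batch is
        # full or no new request arrives within TIMEOUT_S.
        items = [pending.get()]
        try:
            while len(items) < MAX_BATCH:
                items.append(pending.get(timeout=TIMEOUT_S))
        except queue.Empty:
            pass
        batch = torch.stack([tensor for tensor, _ in items])
        if torch.cuda.is_available():
            batch = batch.to("cuda").half()
        with torch.inference_mode():
            out = model(batch).float().cpu()
        for (_, reply), row in zip(items, out):
            reply.put(row)                           # hand each result back to its caller

threading.Thread(target=inference_worker, daemon=True).start()

A request handler would push (tensor, reply_queue) onto pending and wait for its result; the bullet above suggests the C++ server handles that waiting with C++20 coroutines on the event loop rather than by blocking threads.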
- Add compiler optimizations to CMake.
- Benchmark optimizations like channels-last memory format, ONNX and TensorRT, and report which is faster.
- Pin the batched tensor used for inference to memory and re-use it at every inference. No improvement.
- Use Torch-TensorRT for inference, fastest on CUDA devices. Cuts inference down from 5ms to 1-2ms.
- Use Torch nvJPEG for faster image decoding; ~2ms is currently spent on this call with libjpeg-turbo.
- Int8 inference using FX Graph Mode post-training quantization; ResNet Int8 quantization example1, example2 (a rough sketch follows this list).
- Benchmark framework against mosec
- Use lock-free queues
- Separate pre-processing, inference and post-processing stages.
- Added address & memory leak sanitizers to CMake.
- Dockerize for easy usage.
- WIP: Just gets the job done for now, not production ready, though tested regularly.
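For the Int8 item above, a rough sketch of FX Graph Mode post-training static quantization for resnet18, following the standard torch.ao.quantization.quantize_fx workflow; the calibration data, backend and output path are assumptions, not values taken from the referenced examples.

# Sketch of FX Graph Mode post-training quantization for resnet18
# (calibration data, backend and output path are assumptions).
import torch
import torchvision
from torch.ao.quantization import get_default_qconfig_mapping
from torch.ao.quantization.quantize_fx import convert_fx, prepare_fx

model = torchvision.models.resnet18(weights="IMAGENET1K_V1").eval()
qconfig_mapping = get_default_qconfig_mapping("fbgemm")    # x86 server backend
example_inputs = (torch.rand(1, 3, 224, 224),)

prepared = prepare_fx(model, qconfig_mapping, example_inputs)
with torch.no_grad():
    for _ in range(32):                                    # calibration pass; real
        prepared(torch.rand(8, 3, 224, 224))               # ImageNet batches belong here
quantized = convert_fx(prepared)

torch.jit.script(quantized).save("models/resnet18_int8.pt")  # path is an assumption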