This example shows how to run TorchServe inference with a Torch-TensorRT model.
- Install CUDA and cuDNN. Verified with CUDA 11.7 and cuDNN 8.9.3.28
- Verified to be working with `tensorrt==8.5.3.1` and `torch-tensorrt==1.4.0`
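As a hedged example, the pinned versions above can be installed with `pip` (assuming the CUDA/cuDNN setup from the previous step is already in place):

```bash
# Illustrative install of the verified package versions; adjust to your environment.
pip install tensorrt==8.5.3.1 torch-tensorrt==1.4.0
```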
Change directory to the root of `serve`.
Ex: if `serve` is under `/home/ubuntu`, change directory to `/home/ubuntu/serve`
We use `float16` precision. TorchServe's base handler supports loading a Torch-TensorRT model with a `.pt` extension. Hence, the model is saved with a `.pt` extension.
Run the conversion script:

```bash
python examples/torch_tensorrt/resnet_tensorrt.py
```
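The conversion script compiles ResNet-50 with Torch-TensorRT at `float16` precision and serializes the TorchScript result with a `.pt` extension. The sketch below illustrates that flow; the model source and input shape are assumptions for illustration, so refer to `examples/torch_tensorrt/resnet_tensorrt.py` for the actual code.

```python
# Minimal sketch of a Torch-TensorRT fp16 conversion (illustrative, not the
# actual example script). Assumes torchvision's ResNet-50 and a 1x3x224x224 input.
import torch
import torch_tensorrt
import torchvision.models as models

model = models.resnet50(pretrained=True).eval().cuda()

# Compile with Torch-TensorRT, allowing fp16 kernels.
trt_model = torch_tensorrt.compile(
    model,
    inputs=[torch_tensorrt.Input((1, 3, 224, 224), dtype=torch.float32)],
    enabled_precisions={torch.half},
)

# Save with a .pt extension so TorchServe's base handler can load it.
torch.jit.save(trt_model, "res50_trt_fp16.pt")
```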
Create a model archive, move it into a model store, and start TorchServe:

```bash
torch-model-archiver --model-name res50-trt-fp16 --handler image_classifier --version 1.0 --serialized-file res50_trt_fp16.pt --extra-files ./examples/image_classifier/index_to_name.json
mkdir model_store
mv res50-trt-fp16.mar model_store/.
torchserve --start --model-store model_store --models res50-trt-fp16=res50-trt-fp16.mar --ncs
```
Run inference on a sample image:

```bash
curl http://127.0.0.1:8080/predictions/res50-trt-fp16 -T ./examples/image_classifier/kitten.jpg
```

This produces the output:

```json
{
  "tabby": 0.2723647356033325,
  "tiger_cat": 0.13748960196971893,
  "Egyptian_cat": 0.04659610986709595,
  "lynx": 0.00318642589263618,
  "lens_cap": 0.00224193069152534
}
```
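As an alternative to the `curl` call above, the same request can be sent from Python. This is a small sketch using the third-party `requests` package, which is not part of this example:

```python
# Sketch of a Python client for the TorchServe endpoint started above.
# Assumes `requests` is installed; mirrors the curl call.
import requests

with open("./examples/image_classifier/kitten.jpg", "rb") as f:
    response = requests.post(
        "http://127.0.0.1:8080/predictions/res50-trt-fp16",
        data=f.read(),
    )

print(response.json())
```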