Deprecation notice:

Dockerfile.neuron.dev has been deprecated. Please refer to the deep learning containers repository for neuron torchserve containers.

Dockerfile.dev has been deprecated. Please refer to Dockerfile for dev torchserve containers.

Contents of this Document

Prerequisites
Create TorchServe docker image
Create torch-model-archiver from container
Running TorchServe docker image in production

Prerequisites

docker - Refer to the official docker installation guide
git - Refer to the official git set-up guide
For base Ubuntu with GPU, install the following NVIDIA container toolkit and driver:
- NVIDIA container toolkit
- NVIDIA driver
NOTE - Dockerfiles have not been tested on the native Windows platform.

First things first

If you have not cloned the TorchServe source yet, run:

git clone https://github.com/pytorch/serve.git
cd serve/docker

Create TorchServe docker image

Use the build_image.sh script to build the docker images. The script can build production, dev and ci docker images; select the type with -bt.

Parameter	Description
-h, --help	Show script help
-b, --branch_name	Specify a branch name to use. Default: master
-g, --gpu	Build image with GPU based ubuntu base image
-bi, --baseimage	Specify base docker image. Example: nvidia/cuda:11.8.0-cudnn8-runtime-ubuntu20.04
-bt, --buildtype	Which type of docker image to build. Can be one of : production, dev, ci
-t, --tag	Tag name for image. If not specified, script uses torchserve default tag names.
-cv, --cudaversion	Specify the cuda version to use. Supported values `cu92`, `cu101`, `cu102`, `cu111`, `cu113`, `cu116`, `cu117`, `cu118`, `cu121`. Default `cu121`
-ipex, --build-with-ipex	Specify to build with intel_extension_for_pytorch. If not specified, script builds without intel_extension_for_pytorch.
-n, --nightly	Specify to build with TorchServe nightly.
-py, --pythonversion	Specify the python version to use. Supported values `3.8`, `3.9`, `3.10`, `3.11`. Default `3.9`
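
As a minimal sketch, a wrapper could check the -cv value against the supported list in the table before calling build_image.sh. The function names is_supported_cuda and print_gpu_build_command are hypothetical and not part of the TorchServe repository, and the list is copied from the table above, so it may lag behind the script itself.

```shell
# Hypothetical helper (not part of TorchServe): succeed only if the
# argument is one of the -cv values listed in the parameter table.
is_supported_cuda() {
    case "$1" in
        cu92|cu101|cu102|cu111|cu113|cu116|cu117|cu118|cu121) return 0 ;;
        *) return 1 ;;
    esac
}

# Print the build command for a GPU image, or an error for an unknown value.
# Printing instead of executing keeps this sketch free of side effects.
print_gpu_build_command() {
    if is_supported_cuda "$1"; then
        printf './build_image.sh -g -cv %s\n' "$1"
    else
        printf 'unsupported cuda version: %s\n' "$1" >&2
        return 1
    fi
}

print_gpu_build_command cu117
```

Rejecting an unknown value up front avoids starting a long image build that would fail partway through.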

PRODUCTION ENVIRONMENT IMAGES

Creates a docker image with publicly available torchserve and torch-model-archiver binaries installed.

To create a CPU based image

./build_image.sh

To create a GPU based image with cuda 11.7. Options are cu92, cu101, cu102, cu111, cu113, cu116, cu117, cu118, cu121
- GPU images are built with NVIDIA CUDA base image. If you want to use ONNX, please specify the base image as shown in the next section.

./build_image.sh -g -cv cu117

To create an image with a custom tag

./build_image.sh -t torchserve:1.0

NVIDIA CUDA RUNTIME BASE IMAGE

To make use of ONNX, we need to use the NVIDIA CUDA runtime as the base image. This will increase the size of your Docker image.

./build_image.sh -bi nvidia/cuda:11.7.0-cudnn8-runtime-ubuntu20.04 -g -cv cu117

DEVELOPER ENVIRONMENT IMAGES

Creates a docker image with torchserve and torch-model-archiver installed from source.

For creating CPU based image:

./build_image.sh -bt dev

For creating CPU based image with a different branch:

./build_image.sh -bt dev -b my_branch

For creating GPU based image with cuda version 11.3:

./build_image.sh -bt dev -g -cv cu113

For creating GPU based image with cuda version 11.1:

./build_image.sh -bt dev -g -cv cu111

For creating GPU based image with cuda version 10.2:

./build_image.sh -bt dev -g -cv cu102

For creating GPU based image with cuda version 10.1:

./build_image.sh -bt dev -g -cv cu101

For creating GPU based image with cuda version 9.2:

./build_image.sh -bt dev -g -cv cu92

For creating GPU based image with a different branch:

./build_image.sh -bt dev -g -cv cu113 -b my_branch

./build_image.sh -bt dev -g -cv cu111 -b my_branch

For creating image with a custom tag:

./build_image.sh -bt dev -t torchserve-dev:1.0

For creating image with Intel® Extension for PyTorch*:

./build_image.sh -bt dev -ipex -t torchserve-ipex:1.0

Start a container with a TorchServe image

The following examples start the container with ports 8080, 8081, 8082, 7070 and 7071 exposed to localhost.

Security Guideline

TorchServe's Dockerfile configures ports 8080, 8081, 8082, 7070 and 7071 to be exposed to the host by default.

When mapping these ports to the host, make sure to specify localhost or a specific IP address.
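
One way to avoid retyping the five mappings, and to keep them bound to a single address, is to generate the -p flags. This is only a sketch: torchserve_port_flags is a hypothetical helper name, and the port list is taken from the guideline above.

```shell
# Hypothetical helper: print -p flags that publish TorchServe's default
# ports (8080, 8081, 8082, 7070, 7071) on a single host address.
# The parenthesised body runs in a subshell, so variables stay local.
torchserve_port_flags() (
    bind_addr="${1:-127.0.0.1}"
    flags=""
    for port in 8080 8081 8082 7070 7071; do
        # Add a separating space only when flags is already non-empty.
        flags="${flags}${flags:+ }-p ${bind_addr}:${port}:${port}"
    done
    printf '%s\n' "$flags"
)

torchserve_port_flags 127.0.0.1

# Example use (commented out; the substitution is left unquoted on purpose
# so the shell splits the flags into separate words):
# docker run --rm -it $(torchserve_port_flags) pytorch/torchserve:latest
```

Defaulting the bind address to 127.0.0.1 means a forgotten argument still yields localhost-only mappings.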

Start CPU container

For the latest version, you can use the latest tag:

docker run --rm -it -p 127.0.0.1:8080:8080 -p 127.0.0.1:8081:8081 -p 127.0.0.1:8082:8082 -p 127.0.0.1:7070:7070 -p 127.0.0.1:7071:7071 pytorch/torchserve:latest

For specific versions you can pass in the specific tag to use (ex: pytorch/torchserve:0.1.1-cpu):

docker run --rm -it -p 127.0.0.1:8080:8080 -p 127.0.0.1:8081:8081 -p 127.0.0.1:8082:8082 -p 127.0.0.1:7070:7070 -p 127.0.0.1:7071:7071 pytorch/torchserve:0.1.1-cpu

Start CPU container with Intel® Extension for PyTorch*

docker run --rm -it -p 127.0.0.1:8080:8080 -p 127.0.0.1:8081:8081 -p 127.0.0.1:8082:8082 -p 127.0.0.1:7070:7070 -p 127.0.0.1:7071:7071  torchserve-ipex:1.0

Start GPU container

For GPU latest image with gpu devices 1 and 2:

docker run --rm -it --gpus '"device=1,2"' -p 127.0.0.1:8080:8080 -p 127.0.0.1:8081:8081 -p 127.0.0.1:8082:8082 -p 127.0.0.1:7070:7070 -p 127.0.0.1:7071:7071 pytorch/torchserve:latest-gpu

For specific versions you can pass in the specific tag to use (ex: 0.1.1-cuda10.1-cudnn7-runtime):

docker run --rm -it --gpus all -p 127.0.0.1:8080:8080 -p 127.0.0.1:8081:8081 -p 127.0.0.1:8082:8082 -p 127.0.0.1:7070:7070 -p 127.0.0.1:7071:7071 pytorch/torchserve:0.1.1-cuda10.1-cudnn7-runtime

For the latest version, you can use the latest-gpu tag:

docker run --rm -it --gpus all -p 127.0.0.1:8080:8080 -p 127.0.0.1:8081:8081 -p 127.0.0.1:8082:8082 -p 127.0.0.1:7070:7070 -p 127.0.0.1:7071:7071 pytorch/torchserve:latest-gpu

Accessing TorchServe APIs inside container

TorchServe's inference and management APIs can be accessed on localhost over ports 8080 and 8081 respectively. Example:

curl http://localhost:8080/ping
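
The container may need some time before the ping endpoint responds, so a script that starts it may want to retry. The sketch below is a generic retry loop; wait_until_ready is a hypothetical helper, and the attempt count and delay in the example call are arbitrary assumptions.

```shell
# Hypothetical helper: run a probe command up to <attempts> times,
# sleeping <delay> seconds after each failure. Returns 0 on first success.
# Usage: wait_until_ready <attempts> <delay> <probe command...>
wait_until_ready() (
    attempts="$1"
    delay="$2"
    shift 2
    i=1
    while [ "$i" -le "$attempts" ]; do
        if "$@" >/dev/null 2>&1; then
            return 0
        fi
        sleep "$delay"
        i=$((i + 1))
    done
    return 1
)

# Example call (commented out so the snippet has no side effects):
# wait_until_ready 30 2 curl --fail --silent http://localhost:8080/ping
```

Passing the probe as arguments keeps the loop independent of curl, so the same helper can probe the management port as well.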

Create torch-model-archiver from container

To create a mar (model archive) file for TorchServe deployment, use the following steps:

Start the container, sharing your local model-store directory (create it if it does not exist) and any directory containing custom or example mar contents.

docker run --rm -it -p 127.0.0.1:8080:8080 -p 127.0.0.1:8081:8081 --name mar -v $(pwd)/model-store:/home/model-server/model-store -v $(pwd)/examples:/home/model-server/examples pytorch/torchserve:latest

1.a. If starting the container with Intel® Extension for PyTorch*, add the following lines to config.properties to enable IPEX and the launcher with its default configuration.

ipex_enable=true
cpu_launcher_enable=true

docker run --rm -it -p 127.0.0.1:8080:8080 -p 127.0.0.1:8081:8081 --name mar -v $(pwd)/config.properties:/home/model-server/config.properties -v $(pwd)/model-store:/home/model-server/model-store -v $(pwd)/examples:/home/model-server/examples torchserve-ipex:1.0
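
The two settings above can be appended to config.properties without duplicating them on repeated runs. A minimal sketch, assuming the file uses plain key=value lines; ensure_property and enable_ipex are hypothetical helper names.

```shell
# Hypothetical helper: append key=value to a properties file unless a
# line starting with "key=" already exists. Creates the file if missing.
ensure_property() (
    file="$1"
    key="$2"
    value="$3"
    touch "$file"
    if ! grep -q "^${key}=" "$file"; then
        printf '%s=%s\n' "$key" "$value" >> "$file"
    fi
)

# Add the two IPEX settings listed above to the given properties file.
enable_ipex() {
    ensure_property "$1" ipex_enable true
    ensure_property "$1" cpu_launcher_enable true
}

# Example call (commented out so the snippet does not touch your files):
# enable_ipex "$(pwd)/config.properties"
```

Note that an existing line for the same key is left untouched, even if it carries a different value.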

List your containers, or skip this step if you know the container name.

docker ps

Attach to the running container and get a bash prompt.

docker exec -it <container_name> /bin/bash

You will land in /home/model-server/.

Download the model weights if you have not done so already (they are not part of the repo)

curl -o /home/model-server/examples/image_classifier/densenet161-8d451a50.pth https://download.pytorch.org/models/densenet161-8d451a50.pth

Now execute the torch-model-archiver command, e.g.

torch-model-archiver --model-name densenet161 --version 1.0 --model-file /home/model-server/examples/image_classifier/densenet_161/model.py --serialized-file /home/model-server/examples/image_classifier/densenet161-8d451a50.pth --export-path /home/model-server/model-store --extra-files /home/model-server/examples/image_classifier/index_to_name.json --handler image_classifier

Refer to torch-model-archiver for details.

The densenet161.mar file should now be present at /home/model-server/model-store.
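
A scripted version of this last check might look like the sketch below. mar_exists is a hypothetical helper; it only tests that the archive file exists and is non-empty, and does not validate its contents.

```shell
# Hypothetical helper: succeed if <model-store>/<model-name>.mar exists
# and is non-empty.
# Usage: mar_exists <model-store-dir> <model-name>
mar_exists() {
    [ -s "$1/$2.mar" ]
}

# Example call inside the container (commented out; paths are the ones
# used in the steps above):
# mar_exists /home/model-server/model-store densenet161 && echo "archive found"
```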

Running TorchServe in a Production Docker Environment.

You may want to consider the following aspects / docker options when deploying torchserve in Production with Docker.

Shared Memory Size
- --shm-size - The shm-size parameter allows you to specify the shared memory that a container can use. It enables memory-intensive containers to run faster by giving them more access to allocated memory.
User Limits for System Resources
- --ulimit memlock=-1 : Maximum locked-in-memory address space.
- --ulimit stack : Linux stack size
The current ulimit values can be viewed by executing ulimit -a. A more exhaustive set of options for constraining resources can be found in the Docker documentation.
Exposing specific ports / volumes between the host & docker env.
- -p 8080:8080 -p 8081:8081 -p 8082:8082 -p 7070:7070 -p 7071:7071 : TorchServe uses default ports 8080 / 8081 / 8082 for the REST based inference, management & metrics APIs and 7070 / 7071 for the gRPC APIs. You may want to expose these ports to the host for HTTP & gRPC requests between Docker & host. As noted in the Security Guideline, prefix each mapping with localhost or a specific IP address.
- The model store is passed to torchserve with the --model-store option. You may want to consider using a shared volume if you prefer pre-populating models in the model-store directory.

For example,

docker run --rm --shm-size=1g \
        --ulimit memlock=-1 \
        --ulimit stack=67108864 \
        -p 127.0.0.1:8080:8080 \
        -p 127.0.0.1:8081:8081 \
        -p 127.0.0.1:8082:8082 \
        -p 127.0.0.1:7070:7070 \
        -p 127.0.0.1:7071:7071 \
        --mount type=bind,source=/path/to/model/store,target=/tmp/models <container> torchserve --model-store=/tmp/models
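
To make the pieces of this command easier to vary, it can be assembled from parameters. The sketch below only prints the command and does not run Docker; print_production_run is a hypothetical helper, and the image name in the example call is an assumption. The stack value 67108864 used above equals 64 * 1024 * 1024.

```shell
# Hypothetical helper: print (not run) a production-style docker run command.
# $1 = host path of the model store, $2 = image name.
print_production_run() (
    model_store="$1"
    image="$2"
    stack_limit=$((64 * 1024 * 1024))   # 67108864, the value used above
    printf 'docker run --rm --shm-size=1g'
    printf ' --ulimit memlock=-1 --ulimit stack=%s' "$stack_limit"
    # Publish each default port on localhost only.
    for port in 8080 8081 8082 7070 7071; do
        printf ' -p 127.0.0.1:%s:%s' "$port" "$port"
    done
    printf ' --mount type=bind,source=%s,target=/tmp/models' "$model_store"
    printf ' %s torchserve --model-store=/tmp/models\n' "$image"
)

print_production_run /path/to/model/store pytorch/torchserve:latest
```

Printing the command first makes it easy to review the limits and port bindings before running it.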

Example showing serving model using Docker container

This is an example showing how to serve an MNIST model using Docker.
