Deprecation notice:

Dockerfile.neuron.dev has been deprecated. Please refer to the deep learning containers repository for Neuron TorchServe containers.

Dockerfile.dev has been deprecated. Please refer to Dockerfile for dev TorchServe containers.

Prerequisites

First things first

If you have not yet cloned the TorchServe source, run:

git clone https://github.com/pytorch/serve.git
cd serve/docker

Create TorchServe docker image

Use the build_image.sh script to build the Docker images. The script can build production, dev and ci images.

Parameter Description
-h, --help Show script help
-b, --branch_name Specify a branch name to use. Default: master
-g, --gpu Build image with GPU based ubuntu base image
-bi, --baseimage Specify the base Docker image. Example: nvidia/cuda:11.8.0-cudnn8-runtime-ubuntu20.04
-bt, --buildtype Type of Docker image to build. One of: production, dev, ci
-t, --tag Tag name for the image. If not specified, the script uses the TorchServe default tag names.
-cv, --cudaversion Specify the CUDA version to use. Supported values: cu92, cu101, cu102, cu111, cu113, cu116, cu117, cu118, cu121. Default: cu121
-ipex, --build-with-ipex Build with intel_extension_for_pytorch. If not specified, the script builds without intel_extension_for_pytorch.
-n, --nightly Build with the TorchServe nightly release.
-py, --pythonversion Specify the Python version to use. Supported values: 3.8, 3.9, 3.10, 3.11. Default: 3.9
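
The flags in the table can be combined. As a minimal sketch, the wrapper below checks a requested CUDA version against the supported values listed above before calling build_image.sh. The function names is_supported_cuda and build_gpu_image are hypothetical helpers for illustration, not part of TorchServe, and the sketch assumes it is run from serve/docker.

```shell
#!/bin/sh
# Hypothetical helper: succeed only if $1 is one of the -cv values listed in the table.
is_supported_cuda() {
    case "$1" in
        cu92|cu101|cu102|cu111|cu113|cu116|cu117|cu118|cu121) return 0 ;;
        *) return 1 ;;
    esac
}

# Hypothetical wrapper: build a GPU production image for a CUDA version ($1) and tag ($2).
# Assumes the current directory is serve/docker, where build_image.sh lives.
build_gpu_image() {
    cuda_version="$1"
    image_tag="$2"
    if ! is_supported_cuda "$cuda_version"; then
        echo "unsupported CUDA version: $cuda_version" >&2
        return 1
    fi
    ./build_image.sh -g -cv "$cuda_version" -t "$image_tag"
}
```

For example, build_gpu_image cu118 torchserve-gpu:1.0 would run ./build_image.sh -g -cv cu118 -t torchserve-gpu:1.0, while an unlisted value such as cu999 is rejected before the build starts.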

PRODUCTION ENVIRONMENT IMAGES

Creates a docker image with publicly available torchserve and torch-model-archiver binaries installed.

  • To create a CPU based image
./build_image.sh
  • To create a GPU based image with CUDA 11.7. Options are cu92, cu101, cu102, cu111, cu113, cu116, cu117, cu118, cu121

    • GPU images are built with NVIDIA CUDA base image. If you want to use ONNX, please specify the base image as shown in the next section.
./build_image.sh -g -cv cu117
  • To create an image with a custom tag
./build_image.sh -t torchserve:1.0

NVIDIA CUDA RUNTIME BASE IMAGE

To make use of ONNX, you need to use the NVIDIA CUDA runtime as the base image. This will increase the size of your Docker image.

  ./build_image.sh -bi nvidia/cuda:11.7.0-cudnn8-runtime-ubuntu20.04 -g -cv cu117

DEVELOPER ENVIRONMENT IMAGES

Creates a docker image with torchserve and torch-model-archiver installed from source.

  • For creating CPU based image :
./build_image.sh -bt dev
  • For creating CPU based image with a different branch:
./build_image.sh -bt dev -b my_branch
  • For creating GPU based image with cuda version 11.3:
./build_image.sh -bt dev -g -cv cu113
  • For creating GPU based image with cuda version 11.1:
./build_image.sh -bt dev -g -cv cu111
  • For creating GPU based image with cuda version 10.2:
./build_image.sh -bt dev -g -cv cu102
  • For creating GPU based image with cuda version 10.1:
./build_image.sh -bt dev -g -cv cu101
  • For creating GPU based image with cuda version 9.2:
./build_image.sh -bt dev -g -cv cu92
  • For creating GPU based image with a different branch:
./build_image.sh -bt dev -g -cv cu113 -b my_branch
./build_image.sh -bt dev -g -cv cu111 -b my_branch
  • For creating image with a custom tag:
./build_image.sh -bt dev -t torchserve-dev:1.0
  • For creating image with Intel® Extension for PyTorch*:
./build_image.sh -bt dev -ipex -t torchserve-ipex:1.0

Start a container with a TorchServe image

The following examples start the container with ports 8080, 8081, 8082, 7070 and 7071 exposed to localhost.

Security Guideline

TorchServe's Dockerfile configures ports 8080, 8081, 8082, 7070 and 7071 to be exposed to the host by default.

When mapping these ports to the host, make sure to specify localhost or a specific IP address; a mapping such as -p 8080:8080 with no address binds the port on all host interfaces.
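
To avoid retyping the five mappings, the guideline can be sketched as a small helper that prints one -p flag per TorchServe port, each bound to a single host address. The name port_flags is a hypothetical helper introduced here for illustration.

```shell
#!/bin/sh
# Hypothetical helper: print a "-p" flag for each TorchServe port, bound to one host address.
# $1 is the host address to bind to (defaults to 127.0.0.1).
port_flags() {
    bind_addr="${1:-127.0.0.1}"
    flags=""
    for port in 8080 8081 8082 7070 7071; do
        flags="$flags -p $bind_addr:$port:$port"
    done
    # Drop the leading space before printing.
    echo "${flags# }"
}

port_flags
# prints: -p 127.0.0.1:8080:8080 -p 127.0.0.1:8081:8081 -p 127.0.0.1:8082:8082 -p 127.0.0.1:7070:7070 -p 127.0.0.1:7071:7071
```

The output can be spliced into a command unquoted so the shell splits it into separate arguments, for example docker run --rm -it $(port_flags) pytorch/torchserve:latest.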

Start CPU container

For the latest version, you can use the latest tag:

docker run --rm -it -p 127.0.0.1:8080:8080 -p 127.0.0.1:8081:8081 -p 127.0.0.1:8082:8082 -p 127.0.0.1:7070:7070 -p 127.0.0.1:7071:7071 pytorch/torchserve:latest

For specific versions you can pass in the specific tag to use (ex: pytorch/torchserve:0.1.1-cpu):

docker run --rm -it -p 127.0.0.1:8080:8080 -p 127.0.0.1:8081:8081 -p 127.0.0.1:8082:8082 -p 127.0.0.1:7070:7070 -p 127.0.0.1:7071:7071 pytorch/torchserve:0.1.1-cpu

Start CPU container with Intel® Extension for PyTorch*

docker run --rm -it -p 127.0.0.1:8080:8080 -p 127.0.0.1:8081:8081 -p 127.0.0.1:8082:8082 -p 127.0.0.1:7070:7070 -p 127.0.0.1:7071:7071  torchserve-ipex:1.0

Start GPU container

For the latest GPU image with GPU devices 1 and 2:

docker run --rm -it --gpus '"device=1,2"' -p 127.0.0.1:8080:8080 -p 127.0.0.1:8081:8081 -p 127.0.0.1:8082:8082 -p 127.0.0.1:7070:7070 -p 127.0.0.1:7071:7071 pytorch/torchserve:latest-gpu

For specific versions you can pass in the specific tag to use (ex: 0.1.1-cuda10.1-cudnn7-runtime):

docker run --rm -it --gpus all -p 127.0.0.1:8080:8080 -p 127.0.0.1:8081:8081 -p 127.0.0.1:8082:8082 -p 127.0.0.1:7070:7070 -p 127.0.0.1:7071:7071 pytorch/torchserve:0.1.1-cuda10.1-cudnn7-runtime

For the latest version, you can use the latest-gpu tag:

docker run --rm -it --gpus all -p 127.0.0.1:8080:8080 -p 127.0.0.1:8081:8081 -p 127.0.0.1:8082:8082 -p 127.0.0.1:7070:7070 -p 127.0.0.1:7071:7071 pytorch/torchserve:latest-gpu
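
When selecting specific devices, the value passed to --gpus has to keep its inner double quotes, which is why the example above wraps it in single quotes. The following is a sketch of a helper that builds that value from a list of device indices; gpu_device_arg is a hypothetical name, not a Docker or TorchServe command.

```shell
#!/bin/sh
# Hypothetical helper: turn device indices into the value expected by --gpus,
# including the literal double quotes, e.g. 1 2 -> "device=1,2".
gpu_device_arg() {
    devices=""
    for index in "$@"; do
        # Join indices with commas, without a leading comma on the first one.
        devices="${devices:+$devices,}$index"
    done
    printf '"device=%s"\n' "$devices"
}

gpu_device_arg 1 2   # prints "device=1,2" (with the double quotes)
```

Quoting the substitution keeps the value as a single argument, for example docker run --rm -it --gpus "$(gpu_device_arg 1 2)" followed by the port mappings and pytorch/torchserve:latest-gpu.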

Accessing TorchServe APIs inside container

TorchServe's inference and management APIs can be accessed on localhost over ports 8080 and 8081 respectively. For example:

curl http://localhost:8080/ping
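
The container may need a moment before the APIs respond. As a rough sketch, the helper below polls a URL until curl reports success; wait_for_torchserve is a hypothetical name, and the default retry count and delay are arbitrary assumptions.

```shell
#!/bin/sh
# Hypothetical helper: poll a URL until it answers with a successful HTTP status.
# $1 = URL, $2 = maximum attempts (default 30), $3 = seconds between attempts (default 2).
wait_for_torchserve() {
    url="$1"
    attempts="${2:-30}"
    delay="${3:-2}"
    i=1
    while [ "$i" -le "$attempts" ]; do
        # -s silences progress output, -f makes curl fail on HTTP error statuses,
        # --max-time caps how long a single request may take.
        if curl -sf --max-time 5 "$url" >/dev/null 2>&1; then
            return 0
        fi
        sleep "$delay"
        i=$((i + 1))
    done
    echo "no healthy response from $url after $attempts attempts" >&2
    return 1
}
```

A typical use would be wait_for_torchserve http://localhost:8080/ping before sending requests to the inference API on port 8080 or the management API on port 8081.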

Create torch-model-archiver from container

To create a mar (model archive) file for TorchServe deployment, follow these steps:

  1. Start the container, mounting your local model-store directory (create it if it does not exist) and any directory containing custom or example model archive contents
docker run --rm -it -p 127.0.0.1:8080:8080 -p 127.0.0.1:8081:8081 --name mar -v $(pwd)/model-store:/home/model-server/model-store -v $(pwd)/examples:/home/model-server/examples pytorch/torchserve:latest

1.a. If starting the container with Intel® Extension for PyTorch*, add the following lines to config.properties to enable IPEX and the launcher with its default configuration:

ipex_enable=true
cpu_launcher_enable=true

Then start the container with config.properties mounted:

docker run --rm -it -p 127.0.0.1:8080:8080 -p 127.0.0.1:8081:8081 --name mar -v $(pwd)/config.properties:/home/model-server/config.properties -v $(pwd)/model-store:/home/model-server/model-store -v $(pwd)/examples:/home/model-server/examples torchserve-ipex:1.0
  2. List your containers, or skip this step if you already know the container name
docker ps
  3. Attach to the running container and open a bash prompt
docker exec -it <container_name> /bin/bash

You will land in /home/model-server/.

  4. Download the model weights if you have not done so already (they are not part of the repo)
curl -o /home/model-server/examples/image_classifier/densenet161-8d451a50.pth https://download.pytorch.org/models/densenet161-8d451a50.pth
  5. Now execute the torch-model-archiver command, e.g.
torch-model-archiver --model-name densenet161 --version 1.0 --model-file /home/model-server/examples/image_classifier/densenet_161/model.py --serialized-file /home/model-server/examples/image_classifier/densenet161-8d451a50.pth --export-path /home/model-server/model-store --extra-files /home/model-server/examples/image_classifier/index_to_name.json --handler image_classifier

Refer to torch-model-archiver for details.

  6. The densenet161.mar file should now be present in /home/model-server/model-store
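
The last step can be checked with a short script. This is only a sketch: check_mar is a hypothetical helper that confirms the archive exists and is not empty, and it does not validate the archive contents.

```shell
#!/bin/sh
# Hypothetical helper: confirm that <model-store>/<model-name>.mar exists and is not empty.
# $1 = model store directory, $2 = model name without the .mar extension.
check_mar() {
    model_store="$1"
    model_name="$2"
    mar_path="$model_store/$model_name.mar"
    if [ -s "$mar_path" ]; then
        echo "found $mar_path"
    else
        echo "missing or empty: $mar_path" >&2
        return 1
    fi
}
```

Inside the container this would be called as check_mar /home/model-server/model-store densenet161. Because the first step mounts $(pwd)/model-store into the container, the same check should also work on the host against that directory.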

Running TorchServe in a Production Docker Environment

You may want to consider the following aspects / docker options when deploying torchserve in Production with Docker.

  • Shared Memory Size

    • --shm-size: sets the size of the shared memory available to the container. Increasing it can help memory-intensive workloads that pass data through shared memory.
  • User Limits for System Resources

    • --ulimit memlock=-1: maximum locked-in-memory address space (-1 means unlimited).
    • --ulimit stack: maximum Linux stack size.

    The current ulimit values can be viewed by executing ulimit -a. A more exhaustive set of resource-constraint options can be found in the Docker documentation.

  • Exposing specific ports / volumes between the host & docker env.

    • -p 8080:8080 -p 8081:8081 -p 8082:8082 -p 7070:7070 -p 7071:7071: TorchServe uses default ports 8080, 8081 and 8082 for the REST-based inference, management and metrics APIs, and 7070 and 7071 for the gRPC APIs. You may want to expose these ports to the host for HTTP and gRPC requests between Docker and the host; as noted in the Security Guideline, bind them to localhost or a specific IP address.
    • The model store is passed to torchserve with the --model-store option. Consider using a shared volume if you prefer pre-populating models in the model-store directory.

For example,

docker run --rm --shm-size=1g \
        --ulimit memlock=-1 \
        --ulimit stack=67108864 \
        -p 127.0.0.1:8080:8080 \
        -p 127.0.0.1:8081:8081 \
        -p 127.0.0.1:8082:8082 \
        -p 127.0.0.1:7070:7070 \
        -p 127.0.0.1:7071:7071 \
        --mount type=bind,source=/path/to/model/store,target=/tmp/models <container> torchserve --model-store=/tmp/models
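
The stack limit in the example is easier to read when written as arithmetic: 67108864 is 64 * 1024 * 1024, that is, 64 MiB, assuming the value is interpreted in bytes as with the underlying Linux resource limit. The sketch below prints the resource-related flags with adjustable values; resource_flags is a hypothetical helper, and its defaults simply mirror the example above.

```shell
#!/bin/sh
# 64 MiB expressed in bytes; matches the 67108864 used in the example above.
stack_bytes=$((64 * 1024 * 1024))

# Hypothetical helper: print the resource-related docker run flags from the example.
# $1 = shared memory size (default 1g), $2 = stack limit in bytes (default 64 MiB).
resource_flags() {
    shm_size="${1:-1g}"
    stack_limit="${2:-$stack_bytes}"
    echo "--shm-size=$shm_size --ulimit memlock=-1 --ulimit stack=$stack_limit"
}

resource_flags
# prints: --shm-size=1g --ulimit memlock=-1 --ulimit stack=67108864
```

As with the other flags, the output can be spliced into docker run unquoted, for example docker run --rm $(resource_flags 2g) followed by the port mappings, the --mount option and the torchserve command.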

Example of serving a model using a Docker container

This is an example of serving an MNIST model using Docker.