This guide walks you through deploying Large Language Models (LLMs) with Ollama on Azure Kubernetes Service (AKS). The setup includes both the Ollama server (a REST API server for running LLM models) and the Open-WebUI client for easy interaction with your LLM.
- Azure CLI installed and configured
- kubectl installed
- SSH key pair generated
- zsh shell (bash commands may vary slightly)
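If you still need to generate the SSH key pair, or want to verify your tooling before starting, the following commands are one way to do it (the `my_ssh_key` file name simply matches the key referenced later in this guide):

```bash
# Check that the Azure CLI and kubectl are installed
az version
kubectl version --client

# Generate an SSH key pair if you don't already have one
# (creates my_ssh_key and my_ssh_key.pub in the current directory)
ssh-keygen -t rsa -b 4096 -f ./my_ssh_key -N ""
```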
```
KubernetesLLM/
├── bicep/                          # Bicep Infrastructure as Code files
│   ├── main.bicep                  # Main Bicep template for AKS deployment
│   ├── kubernetes-resources.bicep  # Bicep module for Kubernetes resources
│   └── main.parameters.json        # Parameters for Bicep deployment
├── images/                         # Documentation images
├── namespace.yaml                  # Kubernetes namespace manifest
├── ollama-service.yaml             # Ollama service manifest
├── ollama-statefulset.yaml         # Ollama StatefulSet manifest
├── webui-deployment.yaml           # Open-WebUI deployment manifest
├── webui-ingress.yaml              # Open-WebUI ingress manifest
├── webui-pvc.yaml                  # Open-WebUI persistent volume claim
├── webui-service.yaml              # Open-WebUI service manifest
├── deploy.sh                       # Deployment script for Bicep
├── my_ssh_key.pub                  # SSH public key for AKS nodes
└── README.md                       # This documentation
```
You can deploy this solution either with Azure CLI commands directly or with Azure Bicep for Infrastructure as Code.
Set the required environment variables:
```bash
export AKS_RG="llama3-aks-rg"
export AKS_NAME="llm-aks-cluster"
```

Create the resource group:

```bash
az group create -n $AKS_RG -l eastus2
```

Note: this guide uses the Standard_B2s VM size (2 vCPUs, 4 GB RAM) for small LLM testing.
Create the AKS cluster:

```bash
az aks create -n $AKS_NAME -g $AKS_RG \
  --network-plugin azure \
  --network-plugin-mode overlay \
  -k 1.30.3 \
  --node-count 1 \
  --node-vm-size Standard_B2s \
  --ssh-key-value ./my_ssh_key.pub
```

Retrieve the cluster credentials:

```bash
az aks get-credentials -n $AKS_NAME -g $AKS_RG --overwrite-existing
```

Verify that the node is ready:

```bash
kubectl get nodes
```

Deploy the Kubernetes manifests:

```bash
kubectl apply -f .
```

Check all resources in the ollama namespace:
```bash
kubectl get all,pv,pvc -n ollama
```

List the running Ollama processes:

```bash
kubectl exec ollama-0 -n ollama -it -- ollama ps
```

Install and run an LLM model (example using llama3.2:3b):

```bash
kubectl exec ollama-0 -n ollama -it -- ollama run llama3.2:3b
```

Get the public IP of the Open-WebUI service:

```bash
kubectl get svc -n ollama
```

You can now navigate to the public IP of the client service in a browser to chat with the model.
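Since Ollama exposes a REST API (as mentioned at the top of this guide), you can also query the model without the Web UI. The sketch below assumes the service is named `ollama` (as suggested by `ollama-service.yaml`) and listens on Ollama's default port 11434, and that the model has already been pulled:

```bash
# Forward the Ollama service to your local machine
# (assumes the Kubernetes service is named "ollama")
kubectl port-forward svc/ollama 11434:11434 -n ollama &

# Send a prompt to the Ollama REST API (assumes llama3.2:3b is already pulled)
curl http://localhost:11434/api/generate -d '{
  "model": "llama3.2:3b",
  "prompt": "Why is the sky blue?",
  "stream": false
}'
```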
Here are some example models from the Ollama library that you can run:
| Model | Parameters | Size | Download |
|---|---|---|---|
| Llama 3.1 | 8B | 4.7GB | ollama run llama3.1 |
| Llama 3.1 | 70B | 40GB | ollama run llama3.1:70b |
| Llama 3.1 | 405B | 231GB | ollama run llama3.1:405b |
| Phi 3 Mini | 3.8B | 2.3GB | ollama run phi3 |
| Phi 3 Medium | 14B | 7.9GB | ollama run phi3:medium |
| Gemma 2 | 2B | 1.6GB | ollama run gemma2:2b |
| Gemma 2 | 9B | 5.5GB | ollama run gemma2 |
| Gemma 2 | 27B | 16GB | ollama run gemma2:27b |
| Mistral | 7B | 4.1GB | ollama run mistral |
| Moondream 2 | 1.4B | 829MB | ollama run moondream |
| Neural Chat | 7B | 4.1GB | ollama run neural-chat |
| Starling | 7B | 4.1GB | ollama run starling-lm |
| Code Llama | 7B | 3.8GB | ollama run codellama |
| Llama 2 Uncensored | 7B | 3.8GB | ollama run llama2-uncensored |
| LLaVA | 7B | 4.5GB | ollama run llava |
| Solar | 10.7B | 6.1GB | ollama run solar |
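Any of these models can be pre-downloaded into the running pod without opening an interactive chat session, for example:

```bash
# Pre-download a model without starting an interactive session
kubectl exec ollama-0 -n ollama -it -- ollama pull phi3

# List the models currently stored in the Ollama pod
kubectl exec ollama-0 -n ollama -it -- ollama list
```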
- The `ollama` server runs on CPU only in this setup, but it can also run on a GPU or NPU.
- LLM model files are large, so it is recommended to use a VM with plenty of disk space.
- During inference, the model consumes a lot of memory and CPU, so a VM with ample memory and CPU is recommended (the commands after this list show how to check actual usage).
- The deployment uses Azure CNI networking with overlay mode.
- The minimum recommended VM size for testing small LLMs is Standard_B2s (2 vCPUs, 4 GB RAM).
- Adjust resources according to your LLM size and performance requirements.
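To see how much memory and CPU the model is actually consuming during inference, you can use the Kubernetes metrics commands below (assuming the metrics server is available in your cluster, which it is by default on AKS):

```bash
# CPU and memory usage of the pods in the ollama namespace
kubectl top pods -n ollama

# Overall node utilization
kubectl top nodes
```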
This project includes Azure Bicep templates in the `bicep/` directory for deploying the entire infrastructure in a repeatable, version-controlled way.
- Make sure you have the latest Azure CLI installed with Bicep support:

  ```bash
  az bicep install
  az bicep upgrade
  ```

- Review and customize the parameters in `bicep/main.parameters.json` if needed.

- Run the deployment script:

  ```bash
  ./deploy.sh
  ```
The script will:
- Create a resource group if it doesn't exist
- Read your SSH public key from the `my_ssh_key.pub` file
- Deploy the AKS cluster and Kubernetes resources using Bicep
- Configure kubectl to connect to your new cluster
The Bicep deployment consists of the following files:
- `bicep/main.bicep` - The main template that deploys the AKS cluster
  - Defines the AKS cluster with the specified VM size, node count, and Kubernetes version
  - Configures networking with Azure CNI in overlay mode
  - Sets up SSH access using the provided public key
- `bicep/kubernetes-resources.bicep` - A module that deploys the Kubernetes resources
  - Creates the ollama namespace
  - Deploys the Ollama StatefulSet and service
  - Sets up the Open-WebUI deployment, service, and persistent volume claim
  - Configures the necessary connections between components
- `bicep/main.parameters.json` - Parameters for the deployment
  - Defines default values for the AKS cluster name, location, VM size, etc.
  - Can be customized to match your requirements
- `deploy.sh` - A script in the root directory to simplify the deployment process (a rough sketch is shown after this list)
  - Creates the Azure resource group
  - Reads your SSH public key
  - Creates a temporary parameters file with your SSH key
  - Deploys the Bicep templates
  - Configures kubectl to connect to your cluster
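As a rough illustration, a script that performs these steps could look like the sketch below. This is not the exact content of `deploy.sh`; in particular, the `sshPublicKey` parameter name and the inline parameter override are assumptions, and the real script creates a temporary parameters file instead:

```bash
#!/bin/bash
set -euo pipefail

AKS_RG="llama3-aks-rg"
LOCATION="eastus2"

# Create the resource group if it doesn't exist
az group create -n "$AKS_RG" -l "$LOCATION"

# Read the SSH public key that will be injected into the AKS nodes
SSH_KEY=$(cat ./my_ssh_key.pub)

# Deploy the AKS cluster and Kubernetes resources with Bicep;
# "sshPublicKey" is an assumed parameter name -- check main.bicep for the real one
az deployment group create \
  -g "$AKS_RG" \
  -f bicep/main.bicep \
  -p bicep/main.parameters.json \
  -p sshPublicKey="$SSH_KEY"

# Configure kubectl to talk to the new cluster
az aks get-credentials -n llm-aks-cluster -g "$AKS_RG" --overwrite-existing
```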
You can customize the deployment by:
- Modifying the parameters in `bicep/main.parameters.json`
- Editing the `deploy.sh` script to change deployment variables
- Directly modifying the Bicep templates for more advanced customizations
For example, to deploy a larger VM size for running bigger LLM models, you can change the `nodeVmSize` parameter in the parameters file or deployment script.
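For instance, if you invoke the Bicep deployment directly rather than through the script, an override on the command line might look like this (`Standard_D8s_v3` is only an example size; choose one that fits your model):

```bash
# Deploy with a larger node size for bigger models (example value)
az deployment group create \
  -g llama3-aks-rg \
  -f bicep/main.bicep \
  -p bicep/main.parameters.json \
  -p nodeVmSize=Standard_D8s_v3
```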