This repo contains an instance of LocalAI launched using Docker Compose and comprises two containers:
- An Nginx gateway to perform TLS termination
- The LocalAI API container for model inference
See the LocalAI notebook for an example of using LangChain with a locally hosted LLM.
The containers were run on my personal setup:
- CPU: AMD Ryzen 9 7900X 12-Core Processor
- Memory: 64 GiB
- Disk: 1 TB NVMe
- GPU: NVIDIA GeForce RTX 4090 24 GB VRAM
Note: A GPU is not strictly necessary, but it is about an order of magnitude faster than a CPU for large language model inference.
These applications must be installed:
- Docker
- Docker Compose
- NVIDIA GPU drivers (GPU only)
- NVIDIA Container Toolkit (GPU only)
Note: Those marked (GPU only) are only necessary when running inference on a GPU
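If running on a GPU, you can optionally confirm that Docker can access it before continuing. This is a minimal sanity check; the CUDA image tag below is only an example, so substitute any tag available to you:
docker run --rm --gpus all nvidia/cuda:12.3.2-base-ubuntu22.04 nvidia-smi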
Prepare a .env file. API_KEY is used to secure the LocalAI endpoint. The .env file should look like this:
API_KEY=<RANDOM_ALPHANUMERIC_STRING>
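One way to generate a suitable random alphanumeric string, assuming openssl is available:
openssl rand -hex 32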
You can use this command to generate self-signed certificates in nginx/ssl.
Replace the Common Name (CN) and Subject Alternative Name (SAN) as necessary.
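Note that openssl does not create the output directory, so create nginx/ssl first if it does not already exist:
mkdir -p nginx/ssl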
openssl req -x509 -newkey rsa:4096 -sha256 -days 3650 \
-nodes -keyout nginx/ssl/ca.key -out nginx/ssl/ca.crt -subj "/CN=example.com" \
-addext "subjectAltName=DNS:example.com,DNS:*.example.com,IP:10.0.0.1"
Install the certificate using these commands on Ubuntu:
sudo apt-get install -y ca-certificates
sudo cp nginx/ssl/ca.crt /usr/local/share/ca-certificates
sudo update-ca-certificates
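As a quick sanity check, the CA should now be trusted by the system store; this should report OK:
openssl verify nginx/ssl/ca.crt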
Start the LocalAI containers in detached mode:
docker compose up --detach
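You can check that both containers came up and follow their logs with the standard Compose commands:
docker compose ps
docker compose logs --follow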
If you set API_KEY in docker-compose.yaml, you must set LOCALAI_API_KEY to the same value in order to connect to the endpoint:
export LOCALAI_API_KEY="<API_KEY>"
The server HTTP endpoint will be available at http://localhost:9998/
Get the list of models:
curl --header "X-API-Key: $LOCALAI_API_KEY" http://localhost:9998/models
The server HTTP endpoint will be available at https://localhost:9999/
Get the list of models. --insecure is needed since we are connecting to localhost, which the self-signed certificate does not cover:
curl --insecure --header "X-API-Key: $LOCALAI_API_KEY" https://localhost:9999/models
Try sending a prompt:
curl --insecure https://localhost:9999/v1/chat/completions \
-H "X-API-Key: $LOCALAI_API_KEY" \
-H "Content-Type: application/json" \
-d '{ "model": "mistral-7b-instruct-v0.3", "messages": [{"role": "user", "content": "How are you doing?", "temperature": 0.1}] }'
On the first query, the model mistral-7b-instruct-v0.3 is loaded into memory.
View GPU memory consumption using:
nvidia-smi
To shut down all containers:
docker compose down