This repo contains an instance of LocalAI launched using Docker Compose and comprises two containers:
- An Nginx gateway to perform TLS termination
- The LocalAI API container for model inference
See the LocalAI notebook for an example of using LangChain with a locally hosted LLM.
The containers were run on my personal setup:
- CPU: AMD Ryzen 9 7900X 12-Core Processor
- Memory: 64 GiB
- Disk: 1 TB NVMe
- GPU: NVIDIA GeForce RTX 4090 24 GB VRAM
Note: A GPU is not strictly necessary, but it is about an order of magnitude faster than a CPU for large language model inference.
These applications must be installed:
- Docker
- Docker Compose
- NVIDIA GPU drivers (GPU only)
- NVIDIA Container Toolkit (GPU only)
Note: Those marked (GPU only) are only necessary when running inference on a GPU
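If running on a GPU, you can optionally confirm that Docker can access it before continuing. This is a minimal sanity check; the CUDA image tag below is only an example, so substitute any tag available to you:
docker run --rm --gpus all nvidia/cuda:12.3.2-base-ubuntu22.04 nvidia-smi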
Prepare a .env file. API_KEY is used to secure the LocalAI endpoint. The .env file should look like this:
API_KEY=<RANDOM_ALPHANUMERIC_STRING>
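One way to generate a suitable random alphanumeric string, assuming openssl is available:
openssl rand -hex 32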
You can use this command to generate self-signed certificates in nginx/ssl.
Replace the Common Name (CN) and Subject Alternative Name (SAN) as necessary.
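Note that openssl does not create the output directory, so create nginx/ssl first if it does not already exist:
mkdir -p nginx/ssl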
openssl req -x509 -newkey rsa:4096 -sha256 -days 3650 \
-nodes -keyout nginx/ssl/ca.key -out nginx/ssl/ca.crt -subj "/CN=example.com" \
-addext "subjectAltName=DNS:example.com,DNS:*.example.com,IP:10.0.0.1"
Install the certificate using these commands on Ubuntu:
sudo apt-get install -y ca-certificates
sudo cp nginx/ssl/ca.crt /usr/local/share/ca-certificates
sudo update-ca-certificates
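As a quick sanity check, the CA should now be trusted by the system store; this should report OK:
openssl verify nginx/ssl/ca.crt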
Start the LocalAI containers in detached mode:
docker compose up --detach
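You can check that both containers came up and follow their logs with the standard Compose commands:
docker compose ps
docker compose logs --follow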
If you set API_KEY in docker-compose.yaml, you must set LOCALAI_API_KEY to the same value in order to connect to the endpoint:
export LOCALAI_API_KEY="<API_KEY>"
The server HTTP endpoint will be available at http://localhost:9998/
Get the list of models:
curl --header "X-API-Key: $LOCALAI_API_KEY" http://localhost:9998/models
The server HTTP endpoint will be available at https://localhost:9999/
Get the list of models. --insecure is needed since we are connecting to localhost, which the self-signed certificate does not cover:
curl --insecure --header "X-API-Key: $LOCALAI_API_KEY" https://localhost:9999/models
Try sending a prompt:
curl --insecure https://localhost:9999/v1/chat/completions \
-H "X-API-Key: $LOCALAI_API_KEY" \
-H "Content-Type: application/json" \
-d '{ "model": "mistral-7b-instruct-v0.3", "messages": [{"role": "user", "content": "How are you doing?", "temperature": 0.1}] }'
On the first query, the model mistral-7b-instruct-v0.3 is loaded into memory.
View GPU memory consumption using:
nvidia-smi
To shut down all containers:
docker compose down