Admin Guide
Docker Compose is used to orchestrate the frontend, API, workers, databases, and other services that make up the Discourse Analysis Tool Suite (DATS).
Installation
0. Requirements
- Machine with NVIDIA GPU
- Docker with NVIDIA Container Toolkit
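Before installing, you can verify that containers can actually reach the GPU. This is only a quick sanity check (the ubuntu image is just an example workload; the NVIDIA Container Toolkit injects nvidia-smi into the container at runtime):
nvidia-smi
docker run --rm --gpus all ubuntu nvidia-smi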
1. Clone the repository
git clone https://github.com/uhh-lt/dats.git
2. Run setup scripts
./bin/setup-envs.sh --project_name dats --port_prefix 101
./bin/setup-folders.sh
3. Start docker containers
docker compose -f compose.vllm.yml up -d
docker compose -f compose.ray.yml up -d
docker compose -f compose.docling.yml up -d
docker compose -f compose.yml -f compose.production.yml up --wait
4. Open DATS
Open https://localhost:10100/ in your browser
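If the page does not come up, the container status and logs are the first things to check. These are standard Docker Compose commands; dats-backend-api is the API service name used elsewhere in this guide:
docker compose -f compose.yml -f compose.production.yml ps
docker compose -f compose.yml -f compose.production.yml logs -f dats-backend-api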
Update DATS
First, locate the DATS directory on the machine and navigate to the docker directory. Then, get the newest code from git:
git switch main
git pull
1. Stop all containers
docker compose -f compose.yml -f compose.production.yml down
2. Update the /docker/.env file
You have to update the /docker/.env file manually. Compare it with the .env.example file to find all differences. Then, use nano to change the .env file. Most likely, you need to update the DATS_BACKEND_DOCKER_VERSION and DATS_FRONTEND_DOCKER_VERSION variables to the newest version.
git diff --no-index .env.example .env
nano .env
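For example, the relevant lines in /docker/.env might then look like this (the version tags below are placeholders; use the tags of the release you are deploying):
DATS_BACKEND_DOCKER_VERSION=1.2.3
DATS_FRONTEND_DOCKER_VERSION=1.2.3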
3. Pull the newest docker containers
docker compose -f compose.yml -f compose.production.yml pull
4. Start all containers
docker compose -f compose.yml -f compose.production.yml up --wait
Now, DATS is updated to the new version. Note that you may also need to update the Ray, vLLM, and Docling containers (see the following sections)!
Update Ray
Ray only needs to run once per machine. It should always be up-to-date!
1. Stop the ray container
docker compose -f compose.ray.yml down
2. Update the /docker/.env file
You have to manually set the DATS_RAY_DOCKER_VERSION environment variable to the newest version, for example with nano:
nano .env
3. Pull the new docker container
docker compose -f compose.ray.yml pull
4. Start Ray
docker compose -f compose.ray.yml up --wait
Now, Ray is updated to the new version. Note that Ray only needs to run once per machine!
Update vLLM
vLLM only needs to run once per machine. It should always be up-to-date! However, vLLM is not developed by the DATS team, and its version number does not match our DATS version. Sometimes, even after deploying a new DATS version, the vLLM version remains unchanged.
1. Stop the vLLM container
docker compose -f compose.vllm.yml down
2. Pull the new docker container
docker compose -f compose.vllm.yml pull
3. Start vLLM
docker compose -f compose.vllm.yml up --wait
Now, vLLM is updated to the new version. Note that vLLM only needs to run once per machine!
Update Docling
Docling only needs to run once per machine. It should always be up-to-date! However, Docling is not developed by the DATS team, and its version number does not match our DATS version. Sometimes, even after deploying a new DATS version, the Docling version remains unchanged.
1. Stop the Docling container
docker compose -f compose.docling.yml down
2. Pull the new Docling container
docker compose -f compose.docling.yml pull
3. Start Docling
docker compose -f compose.docling.yml up --wait
Now, Docling is updated to the new version. Note that Docling only needs to run once per machine!
Folder Structure
The script ./bin/setup-folders.sh creates multiple folders:
- /backend_repo - User data
- various cache directories (api, rq, ray, vllm)
- various backup directories (weaviate, elasticsearch, postgres, repo)
- data directories (elasticsearch, pg, redis, weaviate)
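A purely hypothetical sketch of what this layout can look like (the exact paths and names are defined in ./bin/setup-folders.sh, which is authoritative):
backend_repo/                                     # uploaded user data
cache/{api,rq,ray,vllm}/                          # cache directories
backups/{weaviate,elasticsearch,postgres,repo}/   # backup targets
data/{elasticsearch,pg,redis,weaviate}/           # database data directories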
Configuration
There are two main files to configure DATS in production mode:
- /docker/.env
- /backend/configs/production.yaml
The .env file overrides frequently changing variables of the production.yaml config.
It is strongly recommended to change the following settings in .env:
- SYSTEM_USER_EMAIL
- SYSTEM_USER_PASSWORD
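For example (placeholder values; choose your own credentials):
SYSTEM_USER_EMAIL=admin@example.org
SYSTEM_USER_PASSWORD=replace-me-with-a-long-random-password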
You can find some additional configuration in the following files. However, we do not expect these to be changed:
- /docker/compose.yml
- /docker/compose.production.yml
- /docker/configs/es/elasticsearch.yml - Special Elasticsearch configuration
- /docker/configs/frontend/nginx.conf - Special Frontend / NGINX configuration
- /backend/src/ray_model_worker/config_gpu.yaml - Configure ML models
Backups
We provide several scripts to automatically create backups of all databases and uploaded user data. This is the recommended backup process:
1. Stop backend and frontend
This ensures that the backup process cannot be interrupted by users.
docker compose -f compose.yml -f compose.production.yml stop dats-frontend dats-backend-api
2. Create backups
./bin/backup-postgres.sh
./bin/backup-repo.sh
./bin/backup-elasticsearch.sh
./bin/backup-weaviate.sh
3. Restart containers
docker compose -f compose.yml -f compose.production.yml up --wait
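These steps can be combined into a small wrapper script and scheduled, e.g. via cron. The following is only a sketch and not part of the repository; it assumes it is run from the same directory in which the commands above are executed:
#!/usr/bin/env bash
# nightly-backup.sh - sketch of an automated DATS backup run (not part of the repository).
set -euo pipefail

# Stop user-facing services so the backup cannot be interrupted.
docker compose -f compose.yml -f compose.production.yml stop dats-frontend dats-backend-api

# Back up all databases and the uploaded user data.
./bin/backup-postgres.sh
./bin/backup-repo.sh
./bin/backup-elasticsearch.sh
./bin/backup-weaviate.sh

# Bring everything back up.
docker compose -f compose.yml -f compose.production.yml up --wait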
Single Sign-On (SSO)
DATS supports SSO using OAuth2. We have tested it with Authentik as the identity provider.
We include a compose.authentik.yml to start an Authentik instance, but you can use any service that supports OAuth2/OpenID.
This section explains the setup using Authentik.
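If you want to use the bundled Authentik instance, it can presumably be started like the other services (this command only follows the pattern used throughout this guide; check compose.authentik.yml for any required variables):
docker compose -f compose.authentik.yml up --wait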
1. Configure Authentik
- First, a new application has to be created in Authentik.
- Use dats as the name and slug, then choose OAuth2/OpenID Provider as the Provider Type.
- Note the Client ID and Client secret. You will need them in the next step.
- It is important to leave the private key field empty, as Authlib does not currently support token decryption.
- Do not specify any groups. DATS does not support roles or groups.
Next, we need to find the OpenID configuration (metadata) URL. In Authentik, this can be found under Applications/Provider/dats.
2. Configure DATS
- Navigate to the docker directory and open the .env file
- Fill in the corresponding variables: OIDC_CLIENT_ID, OIDC_CLIENT_SECRET, and OIDC_SERVER_METADATA_URL. Also, set OIDC_ENABLED=True.
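An illustrative excerpt of the resulting /docker/.env (placeholder values; the metadata URL shown follows Authentik's usual /application/o/<slug>/ pattern, but verify it in the Authentik UI):
OIDC_ENABLED=True
OIDC_CLIENT_ID=<Client ID from Authentik>
OIDC_CLIENT_SECRET=<Client secret from Authentik>
OIDC_SERVER_METADATA_URL=https://authentik.example.org/application/o/dats/.well-known/openid-configuration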
Monitoring
DATS is a complex application that consists of various Docker containers that are managed with Docker Compose. A monitoring system that watches the Docker containers' state and health is important for running applications reliably in Docker and not relying on users to report outages. We use Uptime Kuma, a simple, self-hosted, UI-focused monitoring software. It is open-source and can be run as another Docker container. Kuma uses MariaDB to store its data.
1. Configure Uptime Kuma
Kuma is configured like every other Docker container in DATS via the /docker/.env file.
Modify the corresponding variables: KUMA_*, MARIA_* and DOCKER_GROUP_ID.
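An illustrative excerpt (only the variable names explicitly mentioned in this guide are shown; the full list of KUMA_* and MARIA_* variables is in .env.example):
DOCKER_GROUP_ID=999    # e.g. the GID reported by: getent group docker
KUMA_EXPOSED=10131     # hypothetical port on which the Kuma UI will be exposed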
2. Start Uptime Kuma
docker compose -f compose.kuma.yml up --wait
3. Configuration
Now, it is necessary to set up the monitoring manually.
- View http://localhost:<KUMA_EXPOSED>
- Set up monitoring
More info can be found in Kuma's Documentation.