
DigitalHub CORE


DigitalHub is an open-source platform for building and managing Data and AI applications and services. By bringing together the principles of DataOps, MLOps, DevOps and GitOps, and by integrating state-of-the-art open-source technologies and modern standards, DigitalHub extends your development process and operations to the whole application life-cycle, from exploration to deployment, monitoring, and evolution.

Overview

  • Explore. Use on-demand, interactive, scalable workspaces of your choice to code, explore, experiment and analyze data and ML models. Workspaces provide secure and isolated execution environments natively connected to the platform, such as Jupyter Notebooks, VS Code, or the Dremio distributed query engine. Use the extensibility mechanisms provided by the underlying workspace manager to bring your own workspace templates into the platform.
  • Process. Use persistent storage (data lake and relational DBs) to manage structured and unstructured data on top of the data abstraction layer. Elaborate data, perform data analysis activities and train AI models using frameworks and libraries of your choice (e.g., Python-based libraries, DBT, or arbitrary containers). Manage the supporting computational and storage resources in a declarative and transparent manner.
  • Execute. Delegate the code execution, image preparation, run-time operations and services to the underlying Kubernetes-based execution infrastructure, Serverless platform, and pipeline execution automation environment.
  • Integrate. Build new AI services and expose your data in a standard and interoperable manner, to facilitate the integration within different applications, systems, business intelligence tools and visualizations.

To support this functionality, the platform relies on a scalable Kubernetes platform and its extensions (operators), as well as on a modular architecture and functionality model that allows it to deal with arbitrary jobs, functions, frameworks and solutions without affecting your development workflow. The underlying methodology and management approach aim to facilitate the reuse, reproducibility, and portability of solutions across different contexts and settings.

Explore the full documentation at the link.

The full CHANGELOG is available in the root of the repo.

Quick start

DigitalHub Core is part of the DigitalHub platform and depends on external components to support the complete set of functionalities. To bootstrap the platform in its entirety, please see the documentation at the link.

Nevertheless, it is possible to run CORE standalone, by:

  • deploying the pre-built images in a local Kubernetes environment such as minikube
  • deploying the pre-built images in Docker/Podman
  • running the application from dist/source

Running locally is suitable only for testing and developing the application itself.

Running in Kubernetes

If you have a Kubernetes cluster available, deploy the latest core image using the files in hack/k8s. Core requires a service account with all the permissions needed to interact with the Kubernetes API.

kubectl apply -f hack/k8s

After configuring the environment and pulling the images, the core API and console should be accessible via HTTP from the service named core.

For example, with minikube:

minikube service core

|-----------|------|-------------|---------------------------|
| NAMESPACE | NAME | TARGET PORT |            URL            |
|-----------|------|-------------|---------------------------|
| default   | core | http/8080   | http://192.168.49.2:30180 |
|-----------|------|-------------|---------------------------|
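If the service is not directly reachable (for example on a remote cluster without NodePort access), a port-forward works as well. This is a minimal sketch, assuming the service created by hack/k8s is named core in the default namespace:

kubectl port-forward svc/core 8080:8080

The API and console are then available at http://localhost:8080 for as long as the port-forward is running.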

Running in Docker/Podman

If you have a Docker/Podman environment and want to test DigitalHub CORE, download the latest stable version from the repository and launch it via

docker pull ghcr.io/scc-digitalhub/digitalhub-core:latest
docker run -p 8080:8080 ghcr.io/scc-digitalhub/digitalhub-core:latest

After the bootstrap, the web console will be accessible at http://localhost:8080.

The application leverages an internal embedded database along with an internal index and query engine. To persist data between container restarts, mount a volume onto the data folder as follows:

mkdir data && chmod 777 data
docker run -p 8080:8080 -v $(pwd)/data:/app/data ghcr.io/scc-digitalhub/digitalhub-core:latest

If a Kubernetes cluster is accessible via '.kube/config', CORE will be able to launch tasks and runs, but the actual execution may end in error due to limited network connectivity between Docker and the containers running inside the Kubernetes cluster. Without Kubernetes, the application functionalities are limited to CRUD operations. If the local Docker environment is reachable via IP or FQDN from inside the Kubernetes cluster, set that address as DH_ENDPOINT to enable communication from inside Kubernetes towards CORE, and run the container with the host network. Do note that the kube config needs to have certificates embedded in order to be complete.

docker run -p 8080:8080 -e DH_ENDPOINT=http://172.17.0.1:8080 -e KUBECONFIG=/app/.kube/config -v $(pwd)/data:/app/data -v ~/.kube/config:/app/.kube/config --network=host ghcr.io/scc-digitalhub/digitalhub-core:latest

Configuration

The project configuration file is located in the application (core) module at resources/application.yml. This file specifies the configuration for the various sections.

Each section allows configurations to be written directly in the YAML file or through environment variables (ENV).
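For example, any key documented below can be overridden at deploy time by setting the corresponding environment variable on the container or on the Kubernetes deployment. This is a minimal sketch, assuming the deployment created from hack/k8s is named core; the keys shown are taken from the tables below:

# Docker/Podman: pass the variables at startup
docker run -p 8080:8080 -e DH_ENDPOINT=http://localhost:8080 -e JDBC_PLATFORM=postgresql ghcr.io/scc-digitalhub/digitalhub-core:latest

# Kubernetes: patch the deployment environment
kubectl set env deployment/core DH_ENDPOINT=http://localhost:8080 JDBC_PLATFORM=postgresql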

The full list of configurable parameters is in CONFIGURATION.

The minimal set of configuration parameters required to deploy CORE is listed below.

Server and networking

Core is a networked application which exposes an API and a user web console. It is fundamental that the application is accessible by every client, web or machine, via a public-facing address. To configure a split-horizon (unsupported!) system, override the DH_ENDPOINT variable in the Kubernetes env with the locally accessible address.

| KEY               | DEFAULT               | DESCRIPTION                                                                                                        |
|-------------------|-----------------------|--------------------------------------------------------------------------------------------------------------------|
| DH_ENDPOINT       | http://localhost:8080 | External user-facing address for core. The endpoint needs to be accessible from the cluster and from the outside. |
| DH_NAME           | dhcore                | The name of the application as shown in API responses and the user console                                        |
| DH_CONTACTS_EMAIL |                       | A contact email for the deployment. Blank by default                                                              |
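As an illustration, a public deployment might set the following (the address and email are placeholders):

DH_ENDPOINT=https://core.example.com
DH_NAME=dhcore
DH_CONTACTS_EMAIL=ops@example.com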

Database

While CORE can use the internal embedded H2 database, it is not suitable for production usage. Deploy a PostgreSQL (>12) database and provide the connection details as shown in the table below.

| KEY           | DEFAULT                         | DESCRIPTION                                        |
|---------------|---------------------------------|----------------------------------------------------|
| JDBC_PLATFORM | h2                              | Database platform. Valid values: h2 or postgresql  |
| JDBC_DIALECT  | org.hibernate.dialect.H2Dialect | Hibernate dialect                                  |
| JDBC_DRIVER   | org.h2.Driver                   | JDBC driver                                        |
| JDBC_PASS     | password                        | Password for connecting to the database            |
| JDBC_USER     | sa                              | User for connecting to the database                |
| JDBC_URL      | jdbc:h2:file:./data/db          | Database JDBC URL                                  |

An example with postgresql:

JDBC_DRIVER=org.postgresql.Driver
JDBC_DIALECT=org.hibernate.dialect.PostgreSQLDialect
JDBC_PASS=password
JDBC_USER=dhcore
JDBC_URL=jdbc:postgresql://localhost:5432/dhcore
JDBC_PLATFORM=postgresql

Security and Authentication

CORE implements a token-based (OAuth2) authorization mechanism based on the principles of minimal privilege and zero knowledge, where every job is provided with a set of unique, temporary credentials derived from the user's authentication context.

Credentials are issued as access tokens and refresh tokens: configure a duration suitable for job execution. Do note that the minimum supported duration is 2 hours for access tokens and 8 hours for refresh tokens. Any value under these thresholds may produce unexpected issues.

Certificates are auto-generated at every start for test purposes: to secure the application, please create a valid JSON Web Key and pass it to CORE.

| KEY                        | DEFAULT                     | DESCRIPTION                                                     |
|----------------------------|-----------------------------|-----------------------------------------------------------------|
| JWT_KEYSTORE_PATH          | classpath:/keystore.jwks    | Path for the JWKS file containing the keystore                  |
| JWT_KEYSTORE_KID           |                             | Key identifier for selecting the signing key from the keystore  |
| JWT_ACCESS_TOKEN_DURATION  | 28800                       | Access token duration (in seconds). Defaults to 8 hours         |
| JWT_REFRESH_TOKEN_DURATION | 2592000                     | Refresh token duration (in seconds). Defaults to 30 days        |
| JWT_CLIENT_ID              | ${security.basic.username}  | Client id                                                       |
| JWT_CLIENT_SECRET          | ${security.basic.password}  | Client secret. Not required                                     |
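For instance, a hardened deployment could mount a pre-generated JWKS file and reference it as follows; the path and key id below are placeholders, and the durations keep the defaults:

JWT_KEYSTORE_PATH=file:/app/keystore.jwks
JWT_KEYSTORE_KID=dhcore-signing-key
JWT_ACCESS_TOKEN_DURATION=28800
JWT_REFRESH_TOKEN_DURATION=2592000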

To fully activate the authorization system, administrators need to enable one of the authentication providers: basic (a shared username/password) or oidc (OpenID Connect). Only the latter (oidc) is suitable for production use.

| KEY                        | DEFAULT               | DESCRIPTION                                            |
|----------------------------|-----------------------|--------------------------------------------------------|
| DH_AUTH_BASIC_USER         | admin                 | Basic authentication username                          |
| DH_AUTH_BASIC_PASSWORD     |                       | Basic authentication password. Disabled by default     |
| DH_AUTH_OIDC_ISSUER_URI    |                       | OpenID Connect issuer to use as upstream IdP           |
| DH_AUTH_OIDC_CLIENT_NAME   | ${application.name}   | Client name used to connect to the upstream IdP        |
| DH_AUTH_OIDC_CLIENT_ID     |                       | Client id used to connect to the upstream IdP          |
| DH_AUTH_OIDC_CLIENT_SECRET |                       | Client secret used to connect to the upstream IdP      |
| DH_AUTH_OIDC_CLAIM         | roles                 | (ID token) claim used to extract user authorizations   |
| DH_AUTH_OIDC_USERNAME      | preferred_username    | (ID token) claim used to extract the username          |
| DH_AUTH_OIDC_SCOPE         | openid,email,profile  | Scopes used to connect to the upstream IdP             |
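As an example, an OpenID Connect provider could be wired in as follows; the issuer and client values are placeholders for your own IdP:

DH_AUTH_OIDC_ISSUER_URI=https://idp.example.com/realms/dhcore
DH_AUTH_OIDC_CLIENT_ID=dhcore
DH_AUTH_OIDC_CLIENT_SECRET=<client-secret>
DH_AUTH_OIDC_CLAIM=roles
DH_AUTH_OIDC_SCOPE=openid,email,profile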

Kubernetes

Core uses K8s as the compute engine and requires a set of valid credentials to interact with the API, for example in the form of a ServiceAccount with the right privileges. See hack/k8s for details. Furthermore, it is advisable to configure the integration with details regarding the namespaces, resources and features (logging, metrics, ...) of your cluster.

| KEY                            | DEFAULT                                                     | DESCRIPTION                                                                  |
|--------------------------------|-------------------------------------------------------------|------------------------------------------------------------------------------|
| K8S_NAMESPACE                  | default                                                     | K8s namespace to use                                                         |
| K8S_ENABLE_LOGS                | true                                                        | Enable log collection from pods                                              |
| K8S_ENABLE_METRICS             | true                                                        | Enable metrics collection from Metrics Server                                |
| K8S_ENABLE_RESULTS             | default                                                     | Enable collection of K8s resources created for runs, for logging/debugging   |
| K8S_SEC_DISABLE_ROOT           | false                                                       | Enforce non-root workloads by applying a security policy                     |
| K8S_SEC_USER                   |                                                             | User ID used for runAsUser                                                   |
| K8S_SEC_GROUP                  |                                                             | Group ID used for runAsGroup                                                 |
| K8S_IMAGE_PULL_POLICY          | IfNotPresent                                                | Image pull policy for containers                                             |
| K8S_INIT_IMAGE                 | ghcr.io/scc-digitalhub/digitalhub-core-builder-tool:latest  | Image used for init containers                                               |
| K8S_SERVICE_TYPE               | NodePort                                                    | Default service type used to deploy services                                 |
| K8S_RESOURCE_CPU_DEFAULT       | 100m                                                        | Default CPU requested quota                                                  |
| K8S_RESOURCE_CPU_LIMIT         |                                                             | Default CPU limit quota                                                      |
| K8S_RESOURCE_MEM_DEFAULT       | 64m                                                         | Default RAM requested quota                                                  |
| K8S_RESOURCE_MEM_LIMIT         |                                                             | Default RAM limit quota                                                      |
| K8S_RESOURCE_GPU_KEY           | nvidia.com/gpu                                              | Key used to define GPU resources                                             |
| K8S_RESOURCE_PVC_DEFAULT       | 2Gi                                                         | Default PVC size requested                                                   |
| K8S_RESOURCE_PVC_LIMIT         |                                                             | Default PVC size limit                                                       |
| K8S_RESOURCE_PVC_STORAGE_CLASS |                                                             | Default PVC storage class                                                    |
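An illustrative configuration for a dedicated namespace with explicit resource defaults might look like the following; all values are placeholders to adapt to your cluster:

K8S_NAMESPACE=digitalhub
K8S_SERVICE_TYPE=ClusterIP
K8S_RESOURCE_CPU_DEFAULT=100m
K8S_RESOURCE_CPU_LIMIT=2
K8S_RESOURCE_MEM_DEFAULT=256Mi
K8S_RESOURCE_MEM_LIMIT=4Gi
K8S_RESOURCE_PVC_STORAGE_CLASS=standard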

External services

Core can leverage externally deployed components for specific advanced functionalities, such as credentials generation, full-text search, event queues, etc.

  • Solr

    Solr is used to externalize indexing and full-text search. Pass the URL and the collection to enable it; when disabled, CORE uses an internal search engine based on Lucene. See the sketch after this list for an example configuration.

    Solr needs the collection and schema+fields initialized:

    • either pass ADMIN credentials to CORE to auto-create them, or
    • create them beforehand leveraging the schemas in solr/ and avoid passing ADMIN credentials
| KEY                 | DEFAULT       | DESCRIPTION                                                      |
|---------------------|---------------|------------------------------------------------------------------|
| SOLR_URL            | false         | URL of Solr                                                      |
| SOLR_USER           |               | Username for Solr authentication                                 |
| SOLR_PASSWORD       |               | Password for Solr authentication                                 |
| SOLR_ADMIN_USER     | SOLR_USER     | Admin username for Solr authentication                           |
| SOLR_ADMIN_PASSWORD | SOLR_PASSWORD | Admin password for Solr authentication                           |
| SOLR_COLLECTION     | dhcore        | Name of the collection                                           |
| SOLR_REINDEX        | never         | Set to always to reindex the whole repository at every restart   |
  • Argo

Argo is used to evaluate and orchestrate complex pipelines in K8s. Configure the parameters matching your Argo deployment to enable the integration.

| KEY                               | DEFAULT                     | DESCRIPTION |
|-----------------------------------|-----------------------------|-------------|
| ARGOWORKFLOWS_ARTIFACTS_CONFIGMAP | artifact-repositories       |             |
| ARGOWORKFLOWS_ARTIFACTS_KEY       | default-artifact-repository |             |
| ARGOWORKFLOWS_SERVICE_ACCOUNT     | default                     |             |
| ARGOWORKFLOWS_USER                | 1000                        |             |
  • S3

The platform supports S3 as an artifact store. In order to provide jobs and user tools (i.e., console, CLI, SDK) with credentials, it is possible to configure a static S3 provider which will distribute the same set of static credentials (see the sketch after this list). Do note that this configuration is aimed at testing and local deployments. Use a different provider or inject credentials outside CORE for production.

| KEY                     | DEFAULT | DESCRIPTION                              |
|-------------------------|---------|------------------------------------------|
| S3_CREDENTIALS_PROVIDER | false   | Set to enable to activate                |
| S3_ENDPOINT_URL         |         | Custom endpoint for the S3 service (if any) |
| S3_BUCKET               |         | Default bucket to use (if any)           |
| AWS_DEFAULT_REGION      |         | Default region (if any)                  |
| AWS_ACCESS_KEY          |         | Access key (required)                    |
| AWS_SECRET_KEY          |         | Secret key (required)                    |
| S3_PATH_STYLE_ACCESS    |         | Enable/disable path-style access         |
  • Minio

When the platform is deployed alongside Minio, administrators can configure the usage of dynamic credentials in place of static S3 credentials. This way, every run will obtain a different set of temporary credentials derived from the ARN role defined in the configuration. The flow used is AssumeRole: if available from the S3 provider, this configuration can be used with providers other than Minio.

| KEY                          | DEFAULT   | DESCRIPTION                                  |
|------------------------------|-----------|----------------------------------------------|
| MINIO_CREDENTIALS_PROVIDER   | false     | Set to enable to activate                    |
| MINIO_CREDENTIALS_ENDPOINT   |           | Custom endpoint for the S3 service (if any)  |
| MINIO_CREDENTIALS_REGION     | us-east-1 | Default region (if any)                      |
| MINIO_CREDENTIALS_BUCKET     |           | Default bucket to use (if any)               |
| MINIO_CREDENTIALS_ACCESS_KEY |           | Access key (required)                        |
| MINIO_CREDENTIALS_SECRET_KEY |           | Secret key (required)                        |
  • PostgreSQL

The platform supports storing data items such as tables and vectors inside a PostgreSQL instance, separate from the one dedicated to JDBC. In order to provide jobs and user tools (i.e., console, CLI, SDK) with DB credentials, it is possible to configure a dynamic provider which will exchange credentials to obtain a temporary (i.e., expiring) role in the DB. Configure the properties to enable it.

| KEY                     | DEFAULT | DESCRIPTION                           |
|-------------------------|---------|---------------------------------------|
| DB_CREDENTIALS_PROVIDER | false   | Set to enable to activate             |
| DB_CREDENTIALS_DATABASE |         | Name of the database to use           |
| DB_CREDENTIALS_ENDPOINT |         | Endpoint of the DB STS API            |
| DB_CREDENTIALS_ROLE     |         | Role to assume in the DB              |
| DB_CREDENTIALS_USER     |         | Username for authenticating with STS  |
| DB_CREDENTIALS_PASSWORD |         | Password for authenticating with STS  |
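As a minimal sketch, enabling the Solr index and the static S3 provider together could look like the following; all hostnames, credentials and bucket names below are placeholders:

SOLR_URL=http://solr.example.com:8983/solr
SOLR_COLLECTION=dhcore
SOLR_USER=dhcore
SOLR_PASSWORD=<password>

S3_CREDENTIALS_PROVIDER=enable
S3_ENDPOINT_URL=http://minio.example.com:9000
S3_BUCKET=datalake
AWS_DEFAULT_REGION=us-east-1
AWS_ACCESS_KEY=<access-key>
AWS_SECRET_KEY=<secret-key>
S3_PATH_STYLE_ACCESS=true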

Source code and modules

DigitalHub CORE is a multi-module Java application, where every component is defined by its scope and role in the project.

The project is essentially divided into the following parts:

  1. Core
  2. Modules
  3. Frontend

Core manages all the basic elements that are the foundation of the project, while the frontend integrates the user console.

Modules include:

  1. Common objects shared across the project and utilized by other modules.

  2. The authorization and authentication component

  3. Credentials providers for external data stores:

    • DB for obtaining dynamic, temporary database credentials
    • Minio for obtaining dynamic, temporary S3 credentials for Minio
  4. The Fsm module is the state machine integrated with Core that ensures a function run progresses correctly. This prevents a function's execution from ending in an invalid state.

  5. Files handles the access to file stores

  6. Frameworks are the execution engines, supporting the various execution stacks:

    • K8s executes tasks on Kubernetes. We currently support Job, CronJob, Deployment, and Serve frameworks, as well as Monitor functionality for our running objects
    • The Kaniko module, integrated with Core, provides the capability to build images for running specific functions.
    • Argo handles the execution of complex workflows via the Argo engine
  7. Runtimes are modules that transform user-defined functions into executables for the frameworks:

    • Dbt database transformation function
    • Kfp workflow execution
    • KubeAI
    • ML Model serving
    • Python functions
    • Container
  8. Triggers support automation for functions:

    • Scheduler is a cron-like trigger
    • Lifecycle triggers executions based on entity events, as managed by the FSM

Build from source

In order to build CORE from source, clone the project along with all its submodules into a folder.

# Download DH Core from GitHub
git clone https://github.com/scc-digitalhub/digitalhub-core.git && cd digitalhub-core

# Update all submodules
git submodule init
git submodule update

Building the application requires an environment with:

  • Java 21 or higher
  • Maven 3.8 or higher
  • Docker (to build a container image)

To install all the dependencies you can use SDKMAN:

curl -s "https://get.sdkman.io" | bash

and then

sdk install java
sdk install maven

Follow the installation docs for more information on initializing a suitable development environment.

CORE uses Maven as its build system and is split into multiple modules. While it is possible to build directly via mvn package, there is a build script optimized for the project at build.sh.

Run it to build a packaged distributable for the application:

sh build.sh

If you want to build without the script, it is advisable to build modules and frontend before building the core application:

./mvnw clean install -pl '!application'
./mvnw clean package -pl 'application'

Afterwards, you can run core via:

./mvnw spring-boot:run -pl application

Do note that running the application from source is advisable only for developing new features or debugging issues.

Build container images

To build a local container image, use the Dockerfile included with the project. Do note that this file uses the cache image published by the project: if you want to build everything locally, use Dockerfile-nocache as input.

docker build -t core:snapshot -f Dockerfile-nocache .

and then run it as container:

docker run -p 8080:8080 core:snapshot

Security Policy

The current release is the supported version. Security fixes are released together with all other fixes in each new release.

If you discover a security vulnerability in this project, please do not open a public issue.

Instead, report it privately by emailing us at digitalhub@fbk.eu. Include as much detail as possible to help us understand and address the issue quickly and responsibly.

Contributing

To report a bug or request a feature, please first check the existing issues to avoid duplicates. If none exist, open a new issue with a clear title and a detailed description, including any steps to reproduce if it's a bug.

To contribute code, start by forking the repository. Clone your fork locally and create a new branch for your changes. Make sure your commits follow the Conventional Commits v1.0 specification to keep history readable and consistent.

Once your changes are ready, push your branch to your fork and open a pull request against the main branch. Be sure to include a summary of what you changed and why. If your pull request addresses an issue, mention it in the description (e.g., “Closes #123”).

Please note that new contributors may be asked to sign a Contributor License Agreement (CLA) before their pull requests can be merged. This helps us ensure compliance with open source licensing standards.

We appreciate contributions and help in improving the project!

Authors

This project is developed and maintained by DSLab – Fondazione Bruno Kessler, with contributions from the open source community. A complete list of contributors is available in the project’s commit history and pull requests.

For questions or inquiries, please contact: digitalhub@fbk.eu

Copyright and license

Copyright © 2025 DSLab – Fondazione Bruno Kessler and individual contributors.

This project is licensed under the Apache License, Version 2.0. You may not use this file except in compliance with the License. Ownership of contributions remains with the original authors and is governed by the terms of the Apache 2.0 License, including the requirement to grant a license to the project.
