miniconnect/holodb


HoloDB – the on-the-fly relational database

Launch a scalable, writable, and consistent SQL database instantly from a configuration file - eliminating the need for persistent data storage.

Ideal for demos, automated testing, and CI/CD pipelines where you need a database-shaped system without the overhead of data management, migrations, or unnecessary resource waste.

💡 Why HoloDB?

  • Virtual Data Layer: Data is served from declarative rules instead of storage.
  • Instant Availability: No pre-generation scripts or time-consuming data imports.
  • Fully Functional: Searchable, writable, and compatible with JDBC, Docker, and JPA.
  • Deterministic: Using a seed ensures the exact same dataset across every environment.

🚀 Quick start

  1. Create config.yaml:
seed: 98765
schemas:
  - name: my_schema
    tables:
      - name: my_table
        writeable: true
        size: 1500
        columns:
          - { name: id, mode: counter }
          - { name: code, valuesPattern: '[A-F]{3}[0-9]{2}' }
          - { name: year, shuffleQuality: high, valuesRange: [1950, 2000] }
  2. Run the server:
docker run --rm -p 3430:3430 -v '/<path-to-your-file>/config.yaml:/app/config.yaml' miniconnect/holodb
  3. Query instantly

You can now connect to the server on port 3430.

$ micl

Welcome in miniConnect SQL REPL! - localhost:3430

SQL > USE my_schema;

Query was successfully executed!

SQL > SELECT * FROM my_table WHERE year = 1990 ORDER BY id LIMIT 5;

Query was successfully executed!

 ┌─────┬───────┬──────┐
 │ id  │ code  │ year │
 ├─────┼───────┼──────┤
 │ 125 │ ADB81 │ 1990 │
 │ 252 │ AEE24 │ 1990 │
 │ 280 │ BAC77 │ 1990 │
 │ 332 │ EFE77 │ 1990 │
 │ 371 │ BFF62 │ 1990 │
 └─────┴───────┴──────┘

For more info about the micl command-line REPL see the miniconnect-client project.

For other ways to use the server, and for embedded mode, see the sections below.

📚 More examples

Explore more complex and realistic examples here.

🧩 SQL features

Currently, the default query engine supports a limited subset of SQL. It lacks some relatively basic features such as grouping and filtering by arbitrary expressions. However, it is powerful enough to serve the queries of most ORM systems.

Visit the SQL guide to learn more about the SQL features supported by the default query engine. Alternatively, you can try the experimental integration with the Apache Calcite query planner.

📝 Configuration reference

On the top level these keys are supported:

| Key | Type | Description |
| --- | --- | --- |
| seed | LargeInteger | global random seed (global default: 0) |
| schemas | List | list of schemas (see below) |

The seed option defines a root value used in the hierarchical derivation of keys, which are then applied in randomized data generation at runtime. Modifying it remixes the entire content of the database.
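To make the idea of hierarchical key derivation concrete, here is a minimal sketch. It is not HoloDB's actual derivation scheme: the class name, the method names, and the choice of SHA-256 are all assumptions made for illustration. It only shows why changing the root seed changes every derived key, while each column still gets its own independent key.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical sketch: derives per-schema, per-table and per-column keys from a root seed.
// SHA-256 is an arbitrary choice here, not necessarily what HoloDB uses.
final class SeedDerivationSketch {

    // Derives a child key by hashing the parent key together with the child's name.
    static long deriveKey(long parentKey, String childName) {
        try {
            MessageDigest digest = MessageDigest.getInstance("SHA-256");
            digest.update(ByteBuffer.allocate(Long.BYTES).putLong(parentKey).array());
            digest.update(childName.getBytes(StandardCharsets.UTF_8));
            // Use the first 8 bytes of the hash as the derived key
            return ByteBuffer.wrap(digest.digest()).getLong();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-256 should be available on every Java platform", e);
        }
    }

    // Walks down the configuration tree: root seed -> schema -> table -> column.
    static long columnKey(long seed, String schema, String table, String column) {
        return deriveKey(deriveKey(deriveKey(seed, schema), table), column);
    }
}
```

With such a scheme, `columnKey(98765, "my_schema", "my_table", "code")` is stable across runs, while any change to the seed yields a different key for every column.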

For each schema, these subkeys are supported:

| Key | Type | Description |
| --- | --- | --- |
| name | String | name of the database schema |
| tables | List | list of tables in this schema, see below (global default: none) |

For each table, these subkeys are supported:

| Key | Type | Description |
| --- | --- | --- |
| name | String | name of the database table |
| writeable | boolean | writeable or not (global default: false) |
| size | LargeInteger | number of records in this table (global default: 50) |
| columns | List | list of columns in this table, see below (global default: none) |

If the writeable option is set to true, an additional layer is placed over the read-only table. This layer accepts and stores insertions, updates, and deletions, so the table behaves as writeable.

For each column, the supported subkeys fall into two groups: keys that define the base value set, and keys for additional settings (some of which may also affect the base value set). At most one of the base value set keys should be used. These keys all begin with the prefix values:

| Key | Type | Description |
| --- | --- | --- |
| values | Object[] | explicit list of possible values |
| valuesResource | String | name of a Java resource which contains the values line by line |
| valuesBundle | String | short name of a bundled value resource, otherwise similar to valuesResource (see below) |
| valuesRange | LargeInteger[] | start and end value of a numeric value range |
| valuesPattern | String | regex pattern for values (reverse indexed) |
| valuesDynamicPattern | String | regex pattern for values (not reverse indexed) |
| valuesTextKind | String | text kind for dummy text: phrase, title, sentence, paragraph, markdown, or html |
| valuesForeignColumn | String[] | use value set of a foreign column (ideal for ID-based foreign keys) |

The valuesDynamicPattern option is backed by Generex. It supports slightly more regex features than valuesPattern, but it is not indexed and generates individual strings more slowly. The native engine behind valuesPattern uses an efficient, purpose-built algorithm, and should be preferred in most cases.

The valuesTextKind key can be used to generate dummy text in various forms. These are based on randomly mixed words from the "lorem ipsum" text, supplemented with some English conjunctions. The available values are as follows:

| Text kind | Example | Description |
| --- | --- | --- |
| phrase | eiusmod an aliqua ullamco | a short phrase |
| title | The Nulla Sit Tempor | a title with capitalization |
| sentence | Some exercitation an occaecat anim the duis. | a sentence terminated with a period |
| paragraph | | a paragraph containing 3-6 sentences |
| markdown | | Markdown-formatted text containing a few headers and paragraphs |
| html | | HTML-formatted text containing a few headers and paragraphs |

A column defined using valuesTextKind will not be indexed (for an indexed column containing single words, use the valuesBundle key with the value lorem).

There are several possible values for valuesBundle:

| Bundle name | Description |
| --- | --- |
| cities | 100 major world cities |
| colors | 147 color names (from CSS3) |
| countries | 197 country names |
| female-forenames | 100 frequent English female forenames |
| forenames | 100 frequent English forenames (50 female, 50 male) |
| fruits | 26 of the best selling fruits |
| log-levels | the 6 common log levels (from log4j) |
| lorem | 49 lower-case words of the Lorem ipsum text |
| male-forenames | 100 frequent English male forenames |
| months | the 12 month names |
| surnames | 100 frequent English surnames |
| weekdays | the names of the 7 days of the week |

If used, the value of valuesForeignColumn must be an array of length 1, 2, or 3. The one-element version contains a column name in the same table. The two-element version contains a [<table>, <column>] pair in the same schema. The three-element version contains the full [<schema>, <table>, <column>] triplet.
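The three forms can be read as progressively more qualified references. The following sketch shows one way to normalize them to a full triplet; the class and method names are hypothetical and are not part of HoloDB's API.

```java
import java.util.List;

// Hypothetical helper: expands a valuesForeignColumn reference to [schema, table, column].
final class ForeignColumnRefSketch {

    // The current schema and table are those of the column that declares the reference.
    static List<String> resolve(List<String> ref, String currentSchema, String currentTable) {
        switch (ref.size()) {
            case 1: // [<column>]: same schema, same table
                return List.of(currentSchema, currentTable, ref.get(0));
            case 2: // [<table>, <column>]: same schema
                return List.of(currentSchema, ref.get(0), ref.get(1));
            case 3: // [<schema>, <table>, <column>]: fully qualified
                return List.of(ref.get(0), ref.get(1), ref.get(2));
            default:
                throw new IllegalArgumentException("Expected 1, 2 or 3 elements, got " + ref.size());
        }
    }
}
```

For example, a reference `["students", "id"]` declared inside a schema named `university` would resolve to `["university", "students", "id"]` (the table name `students` is made up for this example).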

The remaining column settings are as follows:

| Key | Type | Description |
| --- | --- | --- |
| name | String | name of the table column |
| type | String (Class<?>) | Java class name of column type |
| mode | String | filling mode: default, counter, fixed, or enum (global default: default) |
| nullCount | LargeInteger | count of null values (global default: 0) |
| distributionQuality | String | distribution quality: low, medium, or high (global default: medium) |
| shuffleQuality | String | shuffle quality: noop, very_low, low, medium, high, or very_high (global default: medium) |
| sourceFactory | String | Java class name of source factory (must implement hu.webarticum.holodb.spi.config.SourceFactory) |
| sourceFactoryData | any | data that will be passed to the source factory |
| defaultValue | any | default insert value for the column |
| seedKey | LargeInteger | custom seed key |

In most cases, type can be omitted. If the configuration loader cannot guess the type, the startup aborts with an error. The type can always be explicitly specified; the required conversion is applied automatically (it is even possible to generate numbers based on, say, a regular expression, though this may cause problems when sorting).

The meaning of mode values:

| Mode | Description |
| --- | --- |
| default | randomly distributed, non-unique values, indexed (except when valuesDynamicPattern is used) |
| counter | fill with increasing whole numbers starting from 1, unique, indexed (good choice for ID columns) |
| fixed | values will not be shuffled, the count of values must be equal to the table size, non-indexed |
| enum | similar to default, but with enum-specific rules for equality checks, sort order, and insertion/update |

In writable tables, columns using any mode other than enum also accept values that differ from the initial ones.

If nullCount is specified (even if 0), then the column will be nullable. Omit nullCount to make the column NOT NULL. With a custom sourceFactory, the column will be NOT NULL if and only if the source is an IndexedSource and has at least one null value.

Currently, for a fixed column, only values is supported.

In the case of counter mode, explicit setup of values will be ignored and should be omitted. The type of a counter column is always hu.webarticum.miniconnect.lang.LargeInteger.

If seedKey is specified, it will be explicitly used as a key for the sub-random-generator for the column. Setting or changing this value alters the data distribution, shuffling, etc. for this column without affecting other columns. If two columns of a table share the same non-null seedKey while they have the same settings (except for name), then they will provide the exact same values in the exact same order, effectively making them mirrors of each other. This also means that such a column can be renamed without remixing its content.

You can set default configuration for schemas, tables, and columns at any higher level in the configuration tree. Any value set at a lower level will override any value set at a higher level (and, of course, the global default).

| Key | Available in |
| --- | --- |
| schemaDefaults | root |
| tableDefaults | root, schemas.* |
| columnDefaults | root, schemas.*, schemas.*.tables.* |

For example:

tableDefaults:
  writeable: false
  size: 120
columnDefaults:
  shuffleQuality: noop
schemas:
  - name: schema_1
    tables:
      # ...
  - name: schema_2
    tableDefaults:
      writeable: true
    tables:
      # ...

Using this config, all tables with no explicit size will have the size 120, and all tables with no explicit writeable will be read-only in schema_1 and writeable in schema_2. Also, data shuffling is disabled by default.
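The override rule can be pictured as a lookup that walks from the most specific level to the least specific one. The sketch below illustrates that rule only; the class, the method, and the map-based representation are assumptions, not HoloDB's configuration loader.

```java
import java.util.List;
import java.util.Map;

// Hypothetical sketch of the "lower level overrides higher level" rule.
final class DefaultsCascadeSketch {

    // Levels are ordered from the most specific (e.g. the table itself)
    // to the least specific (e.g. the root tableDefaults).
    static Object resolve(String key, List<Map<String, Object>> levels, Object globalDefault) {
        for (Map<String, Object> level : levels) {
            if (level.containsKey(key)) {
                return level.get(key); // the first (most specific) match wins
            }
        }
        return globalDefault; // nothing set anywhere: fall back to the global default
    }
}
```

Applied to the example above, a table in schema_2 without its own settings would be looked up through the levels `[{}, {writeable: true}, {writeable: false, size: 120}]`, giving writeable = true and size = 120.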

⬇️ Getting the configuration JSON Schema

You can generate a JSON schema for this configuration data structure by executing the config:generateSchema Gradle task inside the holodb Gradle project. The generated schema file can then be found here:

projects/config/build/schemas/holodb-config.schema.json

Starting from version 7.0.0, the schema file is also published to the Maven repository.

📂 Loading values from resource

You can also use custom predefined value sets. To do this, create a file with one value per line and make it available to the Java classloader. If you use Docker, the easiest way to do this is to copy the file into the /app/resources directory:

FROM miniconnect/holodb:latest

COPY config.yaml /app/config.yaml
COPY my-car-brands.txt /app/resources/my-car-brands.txt

You can use a predefined value set resource with the valuesResource key in config.yaml:

          # ...
          - name: car_brand
            valuesResource: 'my-car-brands.txt'

You can also retrieve existing data from several sources, for example WikiData, JSONPlaceholder or Kaggle.

Here is an example, where we get data from WikiData, process it with jq, then save it to the docker image. To safely achieve this, we use a builder image:

FROM dwdraju/alpine-curl-jq:latest AS builder
RUN curl --get \
  --data-urlencode 'query=SELECT ?lemma WHERE \
    { ?lexemeId dct:language wd:Q1860; wikibase:lemma ?lemma. ?lexemeId wikibase:lexicalCategory wd:Q9788 } \
    ORDER BY ?lemma' \
  'https://query.wikidata.org/bigdata/namespace/wdq/sparql' \
  -H 'Accept: application/json' \
  | jq -r '.results.bindings[].lemma.value' \
  > en-letters.txt

FROM miniconnect/holodb:latest
COPY config.yaml /app/config.yaml
COPY --from=builder /en-letters.txt /app/resources/en-letters.txt

💽 Generating from an existing database

You can find an experimental Python script in the tools directory that creates a HoloDB configuration from an existing MySQL database.

Here is an example of how you can use it:

python3 mysql_scanner.py -u your_user -p your_password -d your_database -w

Use the -h or --help option for more details.

🗄️ Embedded mode via JDBC

You can use HoloDB as an embedded database.

To achieve this, first add the required dependency:

implementation "hu.webarticum.holodb:embedded:${holodbVersion}"

Set the JDBC connection URL, specifying a resource:

jdbc:holodb:embedded:resource://config.yaml

Or any file on the file system:

jdbc:holodb:embedded:file:///path/to/config.yaml

Or with selecting a specific schema:

jdbc:holodb:embedded:resource://config.yaml?schema=university

(Note: the number of slashes matters.)

If the driver class must be set explicitly, use hu.webarticum.holodb.embedded.HoloEmbeddedDriver.
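As a rough illustration, the embedded URL can be used with the standard java.sql API like any other JDBC URL. The sketch below assumes that the embedded dependency is on the classpath and that the classpath resource defines the table being queried; the class and method names are made up for this example.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

// Hypothetical usage sketch for embedded mode; only standard JDBC calls are used.
final class EmbeddedJdbcSketch {

    // Builds an embedded-mode URL for a classpath resource, optionally selecting a schema.
    static String resourceUrl(String resource, String schema) {
        String url = "jdbc:holodb:embedded:resource://" + resource;
        return schema == null ? url : url + "?schema=" + schema;
    }

    // Prints the first five rows of a table as comma-separated values.
    // Requires the HoloDB embedded driver on the classpath at runtime.
    static void printFirstRows(String url, String table) throws SQLException {
        try (Connection connection = DriverManager.getConnection(url);
                Statement statement = connection.createStatement();
                ResultSet resultSet = statement.executeQuery("SELECT * FROM " + table + " LIMIT 5")) {
            int columnCount = resultSet.getMetaData().getColumnCount();
            while (resultSet.next()) {
                StringBuilder row = new StringBuilder();
                for (int i = 1; i <= columnCount; i++) {
                    row.append(resultSet.getObject(i)).append(i < columnCount ? ", " : "");
                }
                System.out.println(row);
            }
        }
    }
}
```

With the quick start configuration available as a classpath resource, a call such as `printFirstRows(resourceUrl("config.yaml", "my_schema"), "my_table")` would be a natural first test.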

🔄 Client-server mode via JDBC

To connect to a running HoloDB server via JDBC, first add the required dependency:

implementation "hu.webarticum.miniconnect:jdbc:${miniConnectVersion}"

Set the JDBC connection URL, specifying the server host and port:

jdbc:miniconnect://localhost:3430

Or with selecting a specific schema:

jdbc:miniconnect://localhost:3430/university

In this case, use the hu.webarticum.miniconnect.jdbc.MiniJdbcDriver driver class if necessary.

📦 Mocking JPA entities

To use the annotations below, set the jpa-annotations subproject as a dependency:

implementation "hu.webarticum.holodb:jpa-annotations:${holodbVersion}"

If you want to use the service providers (e.g. SourceFactory), include the spi subproject too:

implementation "hu.webarticum.holodb:spi:${holodbVersion}"

Actually running it requires the jpa subproject instead of the jpa-annotations:

implementation "hu.webarticum.holodb:jpa:${holodbVersion}"

The jpa subproject has several dependencies (while jpa-annotations is nearly dependency-free). If you only use it for tests, define it as a test-only dependency.

Set this JDBC connection URL to use HoloDB as the database backend:

jdbc:holodb:jpa://

(Optionally, the schema can also be specified, e.g. jdbc:holodb:jpa:///my_schema_name.)

At the moment, schema construction is not fully automatic; the metamodel must be passed explicitly. For example, in Micronaut:

@Singleton
public class HoloInit {

    private final EntityManager entityManager;

    public HoloInit(EntityManager entityManager) {
        this.entityManager = entityManager;
    }

    @EventListener
    @Transactional
    public void onStartup(StartupEvent startupEvent) {
        JpaMetamodelDriver.setMetamodel(entityManager.getMetamodel());
    }

}

The solution should be similarly simple for Spring or other frameworks.

Now, all of your entities will be backed by HoloDB tables with automatic configuration. To fine-tune this configuration, you can use annotations on the entity classes. For example:

@Entity
@Table(name = "companies")
@HoloTable(size = 25)
@HoloVirtualColumn(name = "extracol", type = Integer.class, valuesRange = {10, 20})
public class Company {

    @Id
    @GeneratedValue(strategy = GenerationType.IDENTITY)
    private Long id;

    @Column(name = "birth_country", nullable = false)
    @HoloColumn(valuesBundle = "countries")
    private String country;

    // ...

}

🏷️ JPA annotations reference

These are the supported annotations:

| Annotation | Target | Description |
| --- | --- | --- |
| @HoloTable | class | Overrides table parameters (schema, name, writeable, size) |
| @HoloColumn | field, method | Overrides column parameters |
| @HoloIgnore | class, field, method | Ignores an entity or attribute |
| @HoloVirtualColumn | class | Defines an additional column for the entity (multiple occurrences allowed) |

@HoloColumn and @HoloVirtualColumn accept all column configuration keys (for @HoloVirtualColumn, name and type are mandatory).

Some numeric settings have two variants, one for usual and one for large values:

| Annotation | Usual field | Large field |
| --- | --- | --- |
| @HoloTable | size (long) | largeSize (String) |
| @HoloColumn | nullCount (long) | largeNullCount (String) |
| @HoloColumn | valuesRange (long[]) | largeValuesRange (String[]) |
| @HoloVirtualColumn | nullCount (long) | largeNullCount (String) |
| @HoloVirtualColumn | valuesRange (long[]) | largeValuesRange (String[]) |

Some settings accept custom data:

| Annotation | Annotation field | Type | Config field |
| --- | --- | --- | --- |
| @HoloColumn | sourceFactoryData | @HoloValue | sourceFactoryData |
| @HoloColumn | sourceFactoryDataMap | @HoloValue[] | sourceFactoryData |
| @HoloColumn | defaultValue | @HoloValue | defaultValue |
| @HoloVirtualColumn | sourceFactoryData | @HoloValue | sourceFactoryData |
| @HoloVirtualColumn | sourceFactoryDataMap | @HoloValue[] | sourceFactoryData |
| @HoloVirtualColumn | defaultValue | @HoloValue | defaultValue |

Fields ending with the 'Map' suffix accept an array of @HoloValue annotations; use @HoloValue.key to set the map entry key for each.

🧱 Overview of sub-projects

HoloDB consists of several Gradle subprojects located in the projects directory:

| Subproject | Description |
| --- | --- |
| ▶️ app | Standalone application |
| 🚀 bootstrap | Utility for initializing a database from configuration |
| 🛠️ config | Configuration model classes |
| 🔩 core | Core building blocks |
| 🗄️ embedded | Embedded mode drivers |
| 📒 jpa | JPA mocking drivers |
| 🏷️ jpa-annotations | JPA configuration annotations (lightweight dependency) |
| 🔢 regex | Regex-based value provider |
| 🔌 spi | Service provider interfaces |
| 💾 storage | MiniBase storage implementation |
| 🗃️ values | Predefined value sets |

⚙️ How does HoloDB work?

HoloDB is a flexible virtual relational database engine written in Java.

Like other relational database engines, HoloDB is a collection of tools built on top of a query engine layered over a structured data access API. The difference is that this API does not access a real pre-populated data storage, but dynamically computes values from configuration. Unlike simplistic SQL mocking techniques, results remain realistic and mutually consistent across queries, computed dynamically yet reproducibly.

Column data is typically produced in layered steps:

  1. Base set: an ordered, searchable, typically virtual collection of values (as simple as a numeric range or as complex as all strings matching a regex).
  2. Distribution: this base set is stretched over the required table size. The result is a monotonic, easily searchable list of values matching the configured characteristics.
  3. Shuffling: the permutation layer makes the data look realistically random. This is always invertible, so search queries remain efficient and reproducible.

Default method of providing column data

The base value set for a column is expected to be ordered and searchable. Such a value set can be as simple as a numerical range or as sophisticated as the huge space of strings matching a complex regular expression.

The simplest yet efficient distribution strategy is linear interpolation. However, a more fine-tuned distribution can be parameterized with value frequency and some level of pseudo-randomness. You can also explicitly configure the amount of null values mixed in.
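A minimal sketch of the linear interpolation idea follows. It is an illustration under simplifying assumptions (no nulls, no pseudo-randomness, values fitting in a long), and the class and method names are hypothetical. It shows both directions: which base value sits at a given position of the monotonic list, and which contiguous range of positions holds a given base value. The second direction is what keeps value lookups cheap.

```java
// Hypothetical sketch of stretching a base set of baseSize values over tableSize positions.
final class LinearDistributionSketch {

    // Maps a position in the monotonic (unshuffled) list to an index in the base set.
    static long valueIndexAt(long position, long baseSize, long tableSize) {
        return position * baseSize / tableSize;
    }

    // Returns {from, until}: the half-open range of positions holding the given base set index.
    // The range is empty (from == until) if the table is too small to contain the value.
    static long[] positionsOf(long valueIndex, long baseSize, long tableSize) {
        long from = ceilDiv(valueIndex * tableSize, baseSize);
        long until = ceilDiv((valueIndex + 1) * tableSize, baseSize);
        return new long[] { from, until };
    }

    // Integer division rounding up, for non-negative a and positive b.
    private static long ceilDiv(long a, long b) {
        return (a + b - 1) / b;
    }
}
```

For a base set of 51 values stretched over 1500 positions, the value with index 40 occupies positions 1177 to 1205, so an equality search never has to scan the table.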

The shuffling layer ensures realistic randomness through a pair of functions: a permutation and its inverse. High-quality implementations are typically based on a Feistel cipher and independently scalable hash functions. However, simpler and more performant implementations such as linear congruential methods often suffice. Exploring the trade-off between seemingly strong randomization and efficiency is one of the project's intriguing areas.
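The simpler end of this spectrum can be sketched as an affine (linear congruential) map over the indices 0..size-1. This is an illustrative sketch, not HoloDB's implementation: the class name and parameters are assumptions, and BigInteger is used only to keep the modular arithmetic overflow-free. The map is a permutation whenever the multiplier is coprime to the size, and its inverse is computed with the modular inverse of the multiplier.

```java
import java.math.BigInteger;

// Hypothetical sketch: index -> (multiplier * index + offset) mod size, with an exact inverse.
final class AffinePermutationSketch {

    private final BigInteger size;
    private final BigInteger multiplier;
    private final BigInteger inverseMultiplier;
    private final BigInteger offset;

    AffinePermutationSketch(long size, long multiplier, long offset) {
        this.size = BigInteger.valueOf(size);
        this.multiplier = BigInteger.valueOf(multiplier).mod(this.size);
        // modInverse throws ArithmeticException unless multiplier and size are coprime
        this.inverseMultiplier = this.multiplier.modInverse(this.size);
        this.offset = BigInteger.valueOf(offset).mod(this.size);
    }

    // Forward direction: where the given index ends up after shuffling.
    long at(long index) {
        return multiplier.multiply(BigInteger.valueOf(index)).add(offset).mod(size).longValueExact();
    }

    // Inverse direction: which index is mapped to the given position.
    long indexOf(long position) {
        return BigInteger.valueOf(position).subtract(offset)
                .multiply(inverseMultiplier).mod(size).longValueExact();
    }
}
```

Because the inverse is as cheap as the forward direction, a search can first locate positions in the monotonic list and then translate them back to row indices without scanning.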

Writable tables utilize a diff layer, transparently tracking inserts, updates, and deletions separate from the immutable virtual baseline. Concurrent modifications are managed by a lightweight transaction management layer, ideal for short-lived writable datasets.
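The diff layer idea can be sketched in a few lines. The following is a simplified, single-threaded illustration with hypothetical names; it ignores transactions and indexing, and assumes rows are never null. The baseline is a function that computes rows on demand, and only the differences are stored.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.function.LongFunction;

// Hypothetical sketch of a writable view over an immutable, computed baseline.
final class DiffLayerSketch<T> {

    private final LongFunction<T> baseline; // computes baseline rows on demand
    private final Map<Long, T> changedRows = new HashMap<>(); // updated and inserted rows
    private final Set<Long> deletedRows = new HashSet<>();
    private long nextIndex; // index to assign to the next inserted row

    DiffLayerSketch(LongFunction<T> baseline, long baselineSize) {
        this.baseline = baseline;
        this.nextIndex = baselineSize;
    }

    // Returns the current row, or null if it was deleted or never existed.
    T get(long index) {
        if (index < 0 || index >= nextIndex || deletedRows.contains(index)) {
            return null;
        }
        T changed = changedRows.get(index);
        return changed != null ? changed : baseline.apply(index);
    }

    // Stores a new version of an existing row; the baseline itself is never touched.
    void update(long index, T row) {
        if (get(index) == null) {
            throw new IllegalArgumentException("No such row: " + index);
        }
        changedRows.put(index, row);
    }

    // Appends a row after the baseline and returns its index.
    long insert(T row) {
        long index = nextIndex++;
        changedRows.put(index, row);
        return index;
    }

    // Marks a row as deleted and drops any stored version of it.
    void delete(long index) {
        changedRows.remove(index);
        deletedRows.add(index);
    }
}
```

The memory footprint of such a layer grows with the number of modifications rather than with the table size, which fits short-lived writable datasets.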

The on-the-fly computations rely heavily on arithmetic-centric operations rather than data storage and retrieval. For numeric efficiency, HoloDB introduces specialized types and algorithms, most of which can be used standalone too. For example, LargeInteger is an arbitrarily large numeric data type somewhat inspired by similar double-nature implementations such as SafeLong from the Spire library, BigInt from the Scala standard library, and others. Compared to these, LargeInteger is more efficient in case of frequent operations on smaller numbers.
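The double-nature idea can be illustrated as follows. This sketch is not the LargeInteger implementation; the class name and its small API are made up for the example. Values are kept in a primitive long while they fit, and only move to BigInteger when an operation overflows.

```java
import java.math.BigInteger;

// Hypothetical sketch of a "double-nature" integer: long when possible, BigInteger otherwise.
final class DualIntegerSketch {

    private final long small;
    private final BigInteger big; // null whenever the value fits into a long

    private DualIntegerSketch(long small, BigInteger big) {
        this.small = small;
        this.big = big;
    }

    static DualIntegerSketch of(long value) {
        return new DualIntegerSketch(value, null);
    }

    static DualIntegerSketch of(BigInteger value) {
        // bitLength() excludes the sign bit, so values of up to 63 bits fit into a long
        return value.bitLength() < Long.SIZE ? of(value.longValue()) : new DualIntegerSketch(0, value);
    }

    boolean isSmall() {
        return big == null;
    }

    BigInteger toBigInteger() {
        return big == null ? BigInteger.valueOf(small) : big;
    }

    DualIntegerSketch add(DualIntegerSketch other) {
        if (big == null && other.big == null) {
            long sum = small + other.small;
            // Overflow happened only if both operands differ in sign from the sum
            if (((small ^ sum) & (other.small ^ sum)) >= 0) {
                return of(sum); // fast path: pure long arithmetic
            }
        }
        return of(toBigInteger().add(other.toBigInteger())); // slow path, normalized back if possible
    }

    @Override
    public String toString() {
        return big == null ? Long.toString(small) : big.toString();
    }
}
```

Adding 1 to `Long.MAX_VALUE` switches to the BigInteger representation, and subtracting it again returns to the long-based one.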

📓 Changelog

See CHANGELOG.md.

About

Sketch declaratively, query immediately, reload on the go. Experience a full-fledged virtual database with an imperceptible resource footprint.
