A friendly, lightweight semantic layer for data analytics, inspired by Airbnb's Minerva.
Disclaimer
Mimir is a project born out of curiosity to explore the core ideas behind a metrics store. It's a playground for demonstrating architectural patterns and seeing what a "metrics-as-code" workflow can feel like.
So while it's built with solid engineering principles in mind, it is not (and probably will never be) meant for production use. Feel free to clone it, break it, learn from it, and have fun!
You can find more about the context and the purpose of this project in this Medium article.
Mimir is designed to separate the development and consumption of metrics. The CLI is the primary tool for the "Control Plane" (managing definitions), while the server components form the "Service Plane" (serving data to clients).
This separation allows analytics engineers to define and validate metrics in a version-controlled, git-native workflow, while data consumers (whether BI tools or ad-hoc users) can query a stable, unified API.
+-----------------------+ +----------------+ +-------------------+ +---------------------+
| Analytics Engineer |----->| Mimir CLI |----->| Git Repository |----->| CI/CD Pipeline |
| (on their laptop) | | (create, | | (Metrics as Code) | | (mimir validate) |
+-----------------------+ | validate, | +-------------------+ +----------+----------+
| query --dry-run) | |
+----------------+ | (Sync)
|
v
+-----------------------+ +----------------+ +-------------------+ +---------------------+
| BI Tool (Tableau) |----->| SQL Proxy |----->| Mimir Server |<-----| Config DB / S3 |
+-----------------------+ +----------------+ | (FastAPI) | | (Centralized Store) |
(MySQL Protocol) +---------^---------+ +---------------------+
|
+-----------------------+ | (HTTP API)
| Ad-hoc User |----->| Mimir Client |-----------------+
| (on their laptop) | | (Python/Jupyter)|
+-----------------------+ +----------------+
- API Server: A FastAPI server that accepts
Inquiryrequests and returns data in Apache Arrow format. - SQL Proxy: A MySQL-compatible proxy (using
mysql-mimic) that lets you query Mimir with the SQL you already know and love. It translates your queries into Mimir API requests behind the scenes. - Query Engine: The brains of the operation. It uses a split-apply-combine strategy: it splits an inquiry into atomic queries for each data source, executes them in parallel, and uses an in-memory DuckDB instance to perform the final join.
Ensure you have the following tools installed:
- Python 3.12+
- Docker and Docker Compose
makecurl(for installinguv)
This project uses uv for fast dependency management. The install-dev command will create a virtual environment, install all necessary packages, and set up pre-commit hooks.
make install-devBuild and start the Docker containers for the Mimir API, the SQL proxy, and a sample PostgreSQL database.
make example-upOnce the services are running, you can query Mimir in two ways:
Connect to the proxy using any MySQL-compatible client (e.g., mysql, DBeaver, DataGrip).
- Host:
localhost - Port:
3306 - User/Password: (anything you want, the example proxy has no auth)
-- This query will be translated into an API call
SELECT
dim_rental_category,
AGG(movies_rented),
AGG(rentals_revenue)
FROM
mimir.metrics
WHERE
dim_rental_category = 'Action'
GROUP BY
dim_rental_category;Note: The SQL Proxy is experimental and does not yet support CTEs, subqueries, or derived tables.
You can also talk to the API directly. The interactive docs are available at http://localhost:8090/docs.
When you're done exploring:
make example-downMimir is driven by YAML configuration files located in the configs directory.
Sources define the "denormalized layer" and point to your underlying data. They are defined as a dictionary in one or more YAML files.
| Key | Type | Required | Description |
|---|---|---|---|
name |
string | Yes | The unique name for the source. |
time_col |
string | Yes | The name of the primary timestamp/date column in the underlying table. |
connection_name |
string | Yes | The name of the secret file (w/o .json) in the secrets directory. |
sql |
string | Yes | A SQL query that defines the source, including joins and transformations. |
dimensions |
list | No | A list of dimension names that can be joined to this source. |
time_col_alias |
string | No | An optional alias for the time_col. |
description |
string | No | A human-readable description of the source. |
Metrics define the quantitative calculations for your business.
| Key | Type | Required | Description |
|---|---|---|---|
name |
string | Yes | The unique name for the metric. |
source_name |
string | Yes | The name of the source this metric is built on. |
sql |
string | Yes | The SQL aggregation expression (e.g., SUM(amount) as revenue). |
description |
string | No | A human-readable description of the metric. |
required_dimensions |
list | No | A list of dimensions that must be available when querying this metric. |
Dimensions define the categorical fields you can group by.
| Key | Type | Required | Description |
|---|---|---|---|
name |
string | Yes | The unique name for the dimension. |
source_name |
string | Yes | The name of the source this dimension applies to. |
sql |
string | Yes | The SQL expression for the dimension column (e.g., COALESCE(c.name, 'N/A')). |
description |
string | No | A human-readable description of the dimension. |
Mimir includes a powerful CLI for developing, validating, and querying your semantic layer. You can invoke it using uv run mimir -- [COMMAND].
init
Initializes a new Mimir project with the standard directory structure.
mkdir my-mimir-project
cd my-mimir-project
uv run mimir initcreate
Interactively create new metric or dimension definitions.
# Create a new metric
uv run mimir create metric
# Create a new dimension
uv run mimir create dimensionvalidate
Validates all configuration files in a specified directory to ensure they are syntactically correct and semantically valid.
uv run mimir validate --configs path/to/your/configslist
Lists all available definitions of a certain type.
# List all available data sources
uv run mimir list sources
# List all available metrics
uv run mimir list metrics
# List all available dimensions
uv run mimir list dimensionsdescribe
Shows detailed metadata for a single definition, including its description and underlying SQL.
uv run mimir describe <definition_name> <metric|dimension|source>query
Runs a query against the Mimir engine. This command can be run in two modes:
- Local Mode: Runs the query engine directly on your machine, using local config and secret files.
- Remote Mode: Acts as a client to a running Mimir API server.
Here's how you would run the same query from the SQL example using the CLI against a running Mimir server:
uv run mimir query \
--host http://localhost:8090 \
--metric movies_rented \
--metric rentals_revenue \
--dimension dim_rental_category \
--filter "dim_rental_category = 'Action'"You can also run it locally (without the --host flag) or use --dry-run to see the compiled SQL without executing it.
Mimir is built to be extended. If you don't like how something works, you can change it. Here are a couple of ideas to get you started.
Want to connect to a data source that isn't supported out-of-the-box?
- Create a new class that inherits from
mimir.api.connections.Connection. - Implement the
querymethod. This method just needs to accept a SQL string and return the results as apyarrow.Table.
Once you've done that, just register your new connection in the ConnectionFactory and Mimir will know how to use it.
Tired of YAML files? Want to load your metric definitions from a database, a Git repo, or an S3 bucket? Go for it.
- Create a new class that inherits from
mimir.api.loaders.BaseConfigLoader. - Implement the required methods (
get,get_all,get_secret).
Pass an instance of your new loader to the MimirEngine, and you're off to the races.
This project is a learning experiment, but contributions are absolutely welcome! Please feel free to open an issue or submit a pull request.
Here are some ideas for improvements. Help is wanted!
- Better config validation: Improve handling of malformed configurations and the CLI validate tool
- More Data Sources: Add native connection support for BigQuery, Snowflake, etc.
- Enhanced SQL Proxy: Add support for more complex queries and easier virtual tables discoverability.
- Web UI: A simple web interface for exploring and managing definitions.