
Conversation

@kradalby
Collaborator

@kradalby kradalby commented May 21, 2025

This PR is kind of a "stretch goal".

This PR is very much needed to address the long-standing issues with the database schema and data integrity.

It adds a schema.sql which should be the source-of-truth representation of our schema after all migrations are applied (for SQLite). This ensures you can look in one place to see how the current schema is defined, instead of replaying the patchwork of migrations in your head to get an overview.
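For readers unfamiliar with this kind of check: Headscale does it in Go with the squibble library, but the idea can be sketched with Python's stdlib sqlite3 (the table below is invented for illustration). Build one database by running the migrations, build another straight from schema.sql, and compare the normalized CREATE statements:

```python
import sqlite3

def schema_of(conn: sqlite3.Connection) -> set:
    """Collect normalized CREATE statements for all user objects."""
    rows = conn.execute(
        "SELECT sql FROM sqlite_master "
        "WHERE sql IS NOT NULL AND name NOT LIKE 'sqlite_%'"
    ).fetchall()
    # Collapse whitespace so formatting differences do not cause mismatches.
    return {" ".join(sql.split()) for (sql,) in rows}

# Database produced by running the migrations (hypothetical table).
migrated = sqlite3.connect(":memory:")
migrated.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")

# Database created directly from the schema.sql source of truth.
truth = sqlite3.connect(":memory:")
truth.executescript("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT);")

assert schema_of(migrated) == schema_of(truth), "schema drifted from schema.sql"
```

squibble does a much more thorough structural diff, but the principle is the same: the migrated database must end up byte-for-byte equivalent to the declared schema.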

Thanks to @nblock for discovering this.

How did we get here?

As far as I understand, this is a combination of several things: parts we did not know at the time, things we didn't think about, and naivety. I have identified at least the following:

SQLite requires foreign keys to be enabled on each connection
From the early days of Headscale, we added constraints to the schema via GORM, but for many early versions they were never enforced, because SQLite connections were opened without setting the PRAGMA foreign_keys, meaning SQLite simply ignored them all.
Running for years without any enforcement likely let all sorts of odd data relationships creep in, causing havoc when we eventually enabled it.

This has since been addressed, and from this PR on, we also enforce foreign keys during all migrations (we had to turn it off for some old migrations, as they are a bit all over the place).

All migrations were done in ONE function
Until Headscale 0.23.0, all migrations lived in one big function. What does that mean? It meant that no matter which migration version you started at, or whether parts of the migrations had been applied previously, Headscale had no clue and just tried to run them all. This is not very smart, and, let us say, it is amazing that it has not ended up worse than it has.

This has since been addressed: all changes are now versioned and run only if needed.
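What "versioned, run only if needed" means can be sketched in a few lines of Python (Headscale uses a Go migration library, and the migration IDs below are invented): each migration records its ID once applied, and a rerun skips everything already recorded.

```python
import sqlite3

# Invented migration IDs; the point is the bookkeeping, not the SQL.
MIGRATIONS = [
    ("202105210000-create-users",
     "CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)"),
    ("202507070000-add-display-name",
     "ALTER TABLE users ADD COLUMN display_name TEXT"),
]

def migrate(conn):
    # Remember which migrations have already been applied.
    conn.execute("CREATE TABLE IF NOT EXISTS migrations (id TEXT PRIMARY KEY)")
    done = {row[0] for row in conn.execute("SELECT id FROM migrations")}
    for mig_id, sql in MIGRATIONS:
        if mig_id in done:
            continue  # applied on a previous run; skip it
        conn.execute(sql)
        conn.execute("INSERT INTO migrations (id) VALUES (?)", (mig_id,))
    conn.commit()

conn = sqlite3.connect(":memory:")
migrate(conn)
migrate(conn)  # idempotent: the second run applies nothing
```

Contrast this with the old single-function approach, which would blindly re-run every step regardless of where the database actually was.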

GORM AutoMigrate is a bit of a trap
The ORM Headscale uses, GORM, has a potentially very nice migration helper called AutoMigrate. Simplified, it looks at the model (your Go struct) and the current table, calculates the difference, and creates a migration path.

The problem is that if you change a model multiple times over several versions and call AutoMigrate in the first migration, it will apply changes that are further ahead than it is supposed to. It does not use the "snapshot" of the model from the codebase at the time the migration was written. This is a bit of a minefield, because you can end up trying to apply a migration, in between two AutoMigrate calls, that depends on the data being in the shape of an older model, before upgrading to the final state.
I am sure we "used AutoMigrate wrong", and it is probably even covered in the docs, but since we often try to clean up old code, we do not want a lot of deprecated fields lingering on our models. It just does not fit our codebase. In addition, the opaqueness of AutoMigrate hides "what is actually changing" from someone who comes back to the migration later and needs to debug something or wants to understand what is going on.

To handle this, from this PR on we no longer accept the use of AutoMigrate, and all migrations have to be explicit.
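For contrast, an explicit migration spells out every statement, so the change is pinned to the schema shape of its day rather than to whatever the model struct looks like now. A hypothetical example (the columns are loosely inspired by, but not taken from, Headscale's real schema):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE nodes (id INTEGER PRIMARY KEY, hostname TEXT)")
conn.execute("INSERT INTO nodes (hostname) VALUES ('node-1')")

# An explicit migration: every statement is spelled out, so a reader
# can see exactly what changed without diffing Go structs across versions.
conn.executescript("""
    ALTER TABLE nodes ADD COLUMN given_name TEXT;
    UPDATE nodes SET given_name = hostname WHERE given_name IS NULL;
""")
```

Reading this a year later, both the structural change and the data backfill are visible in one place; with AutoMigrate, only the end state would be.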

Why is this PR 6000+ lines?

We are doing a very large migration that essentially rebuilds all of the SQLite tables to ensure they match the new source-of-truth schema on disk. Since some tables out there might still carry columns that are no longer used (AutoMigrate does not remove columns), we have opted for rebuilding.

To make sure this is safe, we have done the following:

  • For every version of Headscale from 0.2.0 to 0.26.1 we have

    • extracted the schema at the time
    • dumped a database (empty, but with partial migrations)
    • tested that it can be migrated to the newest schema and matches it
  • For the other databases we have gotten hold of with previously "broken" migrations

    • we test migration to the new version
    • we try to validate that the data is as expected once migration has finished

Note: None of these changes apply to Postgres, which is in "maintenance only" mode.

@ghost

ghost commented May 21, 2025

Pull Request Revisions

  • r12: GitHub workflow files reformatted. Multiple GitHub workflow YAML files were reformatted with improved line breaks and multiline string handling, primarily affecting indentation and line wrapping.
  • r11: Commented out gopls build configuration. Removed the active gopls override configuration, converting it to a commented-out block in the Nix flake configuration.
  • r10: No changes since last revision.
  • r9: Database migration SQL simplified. Removed redundant EXISTS checks in database migration SQL insert statements, streamlining the data copy process.
  • r8: Refactored GitHub Actions integration tests. Created a reusable integration test template and updated the workflow to split SQLite and PostgreSQL test configurations with separate job matrices.
  • r7: The changes are too large to summarize.
  • r6: Added detailed SQLite configuration documentation. Expanded inline documentation for SQLite configuration enums, explaining performance and durability tradeoffs for journal modes, auto-vacuum, and synchronous settings.
  • r5: Refactored database code with minor improvements. Made multiple small code quality improvements across database-related files, including error handling, constant definitions, test cleanup, and minor code style adjustments.
  • r4: The changes are too large to summarize.
  • r3: Complex SQLite schema migration implemented. Added robust SQLite schema migration with comprehensive data preservation, validation, and handling of legacy database structures, including routes table migration and index recreation.
  • r2: SQLite database schema migration improved. Implemented comprehensive SQLite schema migration with robust data preservation, foreign key validation, index recreation, and advanced error handling mechanisms.
  • r1: Added database schema validation logic. Introduced schema validation for the SQLite database using an embedded SQL schema file and the squibble library, with specific validation for migrations.


@kradalby kradalby force-pushed the kradalby/schema-truth branch 3 times, most recently from cdec2b5 to e8d7570 Compare July 3, 2025 09:49
@kradalby
Collaborator Author

kradalby commented Jul 3, 2025

I'm marking this as ready, but I would like to gather some more "broken" schemas, so I will post in Discord. Also posted in #2475 and #2597.

If you have a broken database, I would love to get a dump of it. The most valuable would be:

  • The older (or multiple steps along the way) the better (an old backup for example)
  • The more data the better
  • Scrub PII, auth keys, and API keys, but don't remove things that don't matter (you can leave the node, disco, and machine public keys)
  • A description of the type of error you got when it was bad, and possibly how you fixed it

A dump could be taken with:

sqlite3 <DATABASE PATH> ".dump" > database.sql

If you have one of these, you can email me an SQL dump; my email is in my GitHub profile.

@kradalby kradalby marked this pull request as ready for review July 3, 2025 10:25
@kradalby kradalby requested a review from juanfont as a code owner July 3, 2025 10:25
@kradalby kradalby force-pushed the kradalby/schema-truth branch from e8d7570 to 7dca4e9 Compare July 3, 2025 11:21
@kradalby kradalby force-pushed the kradalby/schema-truth branch from 7dca4e9 to f068b3f Compare July 3, 2025 12:30
@kradalby kradalby requested a review from nblock July 3, 2025 12:30
Collaborator

@nblock nblock left a comment


Tested this PR with several databases and they got migrated properly.

ctx, cancel := context.WithTimeout(context.Background(), contextTimeoutSecs*time.Second)
defer cancel()

if err := squibble.Validate(ctx, sqlConn, dbSchema); err != nil {
Collaborator


I deliberately modified the schema and it reported the difference. The error is reported as single line which makes it hard to read: FTL cmd/headscale/cli/serve.go:24 > Error initializing error="creating new headscale: init state: init database: validating schema: invalid schema (-got, +want):\n\n>> Modify index \"idx_policies_deleted_at\"\n@@ -1 +1 @@\n-CREATE INDEX idx_policies_deleted_at ON \"policies2\"(deleted_at)\n+CREATE INDEX idx_policies_deleted_at ON policies(deleted_at)\n\n>> Remove table \"policies2\"\n\n>> Add table \"policies\"\n+ CREATE TABLE policies(\n+ id integer PRIMARY KEY AUTOINCREMENT,\n+ data text,\n+ \n+ created_at datetime,\n+ updated_at datetime,\n+ deleted_at datetime\n+ )\n"

Is it possible to print the error on multiple lines?

Collaborator Author


It should now be printed more clearly in human mode; the logging library was not very keen on respecting newlines.

log.Info().Int("routes", len(routes)).Int("nodes_with_routes", len(tableData.RoutesData)).Msg("Collected routes data")
}

// Drop all existing tables to avoid foreign key constraint issues
Collaborator


This seems dangerous to me. If any of the restore steps below fail, we'd leave the user with an empty/half-filled database.

Have you considered the alternative:

  • Rename existing tables to $TABLE_old
  • Drop all indices if exists
  • Restore data
  • Drop old tables

Data could be inserted directly from the old table into the new one (in case no changes are required), and the data would still be there in the renamed tables if anything fails during restore.

The migration itself could probably not run a second time, because the old tables are already renamed, but at least the data is still there in case the user has no backup.
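The rename-first approach suggested above can be sketched like this (Python stdlib sqlite3 with an invented table; the real implementation is Go): the original data survives under users_old until the copy has committed, and only then is the backup dropped.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT);
    INSERT INTO users (name) VALUES ('alice');
""")

# Keep the original data around until the copy has fully succeeded:
# rename, recreate, copy, and commit before anything is destroyed.
conn.executescript("""
    BEGIN;
    ALTER TABLE users RENAME TO users_old;
    CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT);
    INSERT INTO users (id, name) SELECT id, name FROM users_old;
    COMMIT;
""")

# Only after everything above has committed is it safe to drop the backup.
conn.execute("DROP TABLE users_old")
```

If any step before the DROP fails, the transaction rolls back and the data is still intact under its original name, which is exactly the safety property the review asks for.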

Collaborator Author


Smarter, PTAL

@kradalby kradalby force-pushed the kradalby/schema-truth branch from f068b3f to cde5b80 Compare July 7, 2025 10:14
@kradalby kradalby requested a review from ohdearaugustin as a code owner July 7, 2025 10:14
@kradalby kradalby requested a review from nblock July 7, 2025 10:15
kradalby added 3 commits July 7, 2025 13:28
Signed-off-by: Kristoffer Dalby <kristoffer@tailscale.com>
Signed-off-by: Kristoffer Dalby <kristoffer@tailscale.com>
We are already being punished by github actions, there seem to be
little value in running all the tests for both databases, so only
run a few key tests to check postgres isnt broken.

Signed-off-by: Kristoffer Dalby <kristoffer@tailscale.com>
@kradalby kradalby force-pushed the kradalby/schema-truth branch from 48a4054 to 4d160a0 Compare July 7, 2025 11:28
@kradalby kradalby requested a review from nblock July 7, 2025 11:30
kradalby added 2 commits July 7, 2025 15:02
Signed-off-by: Kristoffer Dalby <kristoffer@tailscale.com>
Signed-off-by: Kristoffer Dalby <kristoffer@tailscale.com>
@kradalby kradalby merged commit 5ba7120 into juanfont:main Jul 7, 2025
83 of 87 checks passed
@kradalby kradalby deleted the kradalby/schema-truth branch July 7, 2025 14:48
