-
-
Notifications
You must be signed in to change notification settings - Fork 1.8k
db: add sqlite "source of truth" schema #2617
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Pull Request Revisions
7 more revisions
☑️ AI review skipped after 5 revisions, comment with `/review` to review again HelpReact with emojis to give feedback on AI-generated reviews:
We'd love to hear from you—reach out anytime at team@review.ai. |
be934bc to
50ecba7
Compare
cdec2b5 to
e8d7570
Compare
|
I'm marking this as ready, but I would like to gather some more "broken" schemas, so will post in discord. also posted on #2475 and #2597. If you have a broken database, I would love to get a dump of it. The most valuable would be:
A dump could be taken with: If you have one of these, you can email me an SQL dump, my email is in my Github profile. |
e8d7570 to
7dca4e9
Compare
7dca4e9 to
f068b3f
Compare
nblock
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Tested this PR with several databases and they got migrated properly.
| ctx, cancel := context.WithTimeout(context.Background(), contextTimeoutSecs*time.Second) | ||
| defer cancel() | ||
|
|
||
| if err := squibble.Validate(ctx, sqlConn, dbSchema); err != nil { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I deliberately modified the schema and it reported the difference. The error is reported as single line which makes it hard to read: FTL cmd/headscale/cli/serve.go:24 > Error initializing error="creating new headscale: init state: init database: validating schema: invalid schema (-got, +want):\n\n>> Modify index \"idx_policies_deleted_at\"\n@@ -1 +1 @@\n-CREATE INDEX idx_policies_deleted_at ON \"policies2\"(deleted_at)\n+CREATE INDEX idx_policies_deleted_at ON policies(deleted_at)\n\n>> Remove table \"policies2\"\n\n>> Add table \"policies\"\n+ CREATE TABLE policies(\n+ id integer PRIMARY KEY AUTOINCREMENT,\n+ data text,\n+ \n+ created_at datetime,\n+ updated_at datetime,\n+ deleted_at datetime\n+ )\n"
Is it possible to print the error on multiple lines?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should not be printed more clear in human mode, logging library was not very keen on respecting newlines.
hscontrol/db/db.go
Outdated
| log.Info().Int("routes", len(routes)).Int("nodes_with_routes", len(tableData.RoutesData)).Msg("Collected routes data") | ||
| } | ||
|
|
||
| // Drop all existing tables to avoid foreign key constraint issues |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This seems dangerous to me. If any of the restore steps below fail, we'd leave the user with an empty/half-filled database.
Have you considered the alternative:
- Rename existing tables to
$TABLE_old - Drop all indices if exists
- Restore data
- Drop old tables
Data could be directly inserted from old to the new table (in case no changes are required) and the data is still there in the renamed tables in case anything fails during restore.
The migration itself could probably not run a second time because old tables are renamed already but at least the data is still there in case the user has no backup.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Smarter, PTAL
f068b3f to
cde5b80
Compare
Signed-off-by: Kristoffer Dalby <kristoffer@tailscale.com>
Signed-off-by: Kristoffer Dalby <kristoffer@tailscale.com>
We are already being punished by github actions, there seem to be little value in running all the tests for both databases, so only run a few key tests to check postgres isnt broken. Signed-off-by: Kristoffer Dalby <kristoffer@tailscale.com>
48a4054 to
4d160a0
Compare
Signed-off-by: Kristoffer Dalby <kristoffer@tailscale.com>
Signed-off-by: Kristoffer Dalby <kristoffer@tailscale.com>
This PR is kind of like a "stretch goal".This PR is very much needed to try to address the longstanding issues the database schema and data integrity has.
This adds a
schema.sqlwhich should be the source of truth representation of our schema after all migration is applied (for sqlite). This is to ensure you can look one place to figure out how the current schema is defined and not have to do the patchwork of migrations in your head to get an overview.Thanks to @nblock for discovering this.
How did we get here?
As far as I have understood, this is a combination of several things, parts of things we did not know at the time, didn't think about and naivety. I have at least identified the following:
SQLite requires foreign key to be set on each connection
Essentially, from the early birth of Headscale, we added constraint to the schema via GORM, but for many many early versions, they were never enforced because SQLite was started without setting the
PRAGMA foreign_keys, meaning it will just ignore them all.Running for years without any enforcing likely let all sorts of odd relationships of data creep in, causing havoc when we eventually enabled it.
This has since been addressed, and from this PR, we will also enforce foreign keys for all migrations (we had to turn it off for some old migrations as they are a bit all over the place).
All migrations was done in ONE function
Until Headscale version
0.23.0, all migrations was just one big function, what does that mean? it meant that no matter which migrations version you started at, or if you had parts of the migrations applied previously, Headscale had no clue and just tried to do them all. This is not very smart, and let us say, it is amazing that it has not ended up worse than it has.This has since been addressed and all changes are now versioned and runs only if needed.
GORM AutoMigrate is a bit of a trap
The ORM Headscale uses, called GORM has a potentially very nice migration helper called AutoMigrate, simplified, it will look at the Model (your go struct) and the current table, calculate the difference and make a migration path.
The problem here is that if you change a model multiple times over several versions and call AutoMigrate in the first migration, then it will apply changes that are even further ahead than it is suppose to. It does not really use the "snap shot" of the Model from the time of the code base where the migration was written. This is kind of mind field, because you can end up trying to apply a migration in between two AutoMigrate call that depends on the data being in the shape of an older model, before upgrading to the final state.
I am sure we "used AutoMigrate wrong", and it is probably even covered in the docs, but since we often try to clean up old code, we do not want to have a lot of deprecated fields on our models. It just does not fit our code. In addition, the opaque-ness of using AutoMigrate hides away "what is actually changing" for someone who comes back to the migration later and needs to debug something or wants to understand what goes on.
To handle this, from this PR, we will no longer accept the use of AutoMigrate moving forward, and all migrations will have to be explicit.
Why is this PR 6000+ lines
We are doing a very large migration that essentially rebuilds all of the SQLite tables to ensure they are equal to the new truth schema on disk. As some tables out there might have undeleted columns that is not used anymore (AutoMigrate does not remove columns), we have opted for rebuilding.
To make sure this is safe, we have done the following:
For every version of headscale from 0.2.0 to 0.26.1 we have
For the other databases we have got hold of with previously "broken" migrations
Note: None of these changes will apply to Postgres which is in a "maintenance only" mode.