Skip to content

Store the progress & diff base in file and sqlite storages [v1]#1261

Closed
nolar wants to merge 6 commits into
mainfrom
claude/more-storages-DNoV8
Closed

Store the progress & diff base in file and sqlite storages [v1]#1261
nolar wants to merge 6 commits into
mainfrom
claude/more-storages-DNoV8

Conversation

@nolar
Copy link
Copy Markdown
Owner

@nolar nolar commented Mar 1, 2026

Summary

Add file-based and SQLite-based storage backends for progress and diffbase data, as alternatives to storing state on the Kubernetes object itself (annotations/status).

This should benefit users who do not want to store the state of the state-tracking handlers on the objects, e.g. to reduce the number of API PATCH requests to the cluster, but at the same time want something out of the box (instead of implementing it themselves).

These storages depend on StdLib alone and do not bring new dependencies. However, they cannot be made a new default (also out of the box), since each of them requires some external configuration — the persistent volume or a shared flilesystem in the container for all operator instances running (including the dev instances, if any). The storage in the object itself remains the default.

See also a PR to reduce those patches:

File-based storages (FileProgressStorage, FileDiffBaseStorage):

  • Each K8s resource gets its own YAML file on a shared filesystem or pod volume
  • Filenames use encoded/escaped namespace/name/uid for path safety
  • Common logic extracted into a FileNamingConvention mixin

SQLite-based storages (SQLiteProgressStorage, SQLiteDiffBaseStorage):

  • Progress stored as one row per handler with composite key (namespace, name, uid, handler_id)
  • Diffbase stored as one row per resource with composite key (namespace, name, uid)
  • Tables created optimistically on first write via try-except; reads return None if the table doesn't exist yet
  • Both storage types can share the same .db file (separate tables)
  • Uses only stdlib sqlite3 — no new dependencies

Both storage types still write a touch annotation to the K8s object to trigger watch events for delayed handler retries.

All new classes are exported from the top-level kopf namespace and documented in docs/configuration.rst.

https://claude.ai/code/session_01TMbVk9bPSsTFQmFWhrEk19

Comment thread tests/persistence/test_storing_of_file_diffbase.py Fixed
@@ -0,0 +1,211 @@
import json

Check notice

Code scanning / CodeQL

Unused import Note test

Import of 'json' is not used.

Copilot Autofix

AI 3 months ago

To fix an unused import, remove the import statement for the module that is not referenced anywhere in the file. This eliminates an unnecessary dependency and slightly improves readability and load time.

In this specific case, delete the import json line at the top of tests/persistence/test_storing_of_sqlite_diffbase.py. No other lines need to change, and no additional methods, imports, or definitions are required. The rest of the imports (sqlite3, pytest, and the kopf symbols) remain unchanged.

Suggested changeset 1
tests/persistence/test_storing_of_sqlite_diffbase.py

Autofix patch

Autofix patch
Run the following command in your local git repository to apply this patch
cat << 'EOF' | git apply
diff --git a/tests/persistence/test_storing_of_sqlite_diffbase.py b/tests/persistence/test_storing_of_sqlite_diffbase.py
--- a/tests/persistence/test_storing_of_sqlite_diffbase.py
+++ b/tests/persistence/test_storing_of_sqlite_diffbase.py
@@ -1,4 +1,3 @@
-import json
 import sqlite3
 
 import pytest
EOF
@@ -1,4 +1,3 @@
import json
import sqlite3

import pytest
Copilot is powered by AI and may make mistakes. Always verify output.
import json
import sqlite3

import pytest

Check notice

Code scanning / CodeQL

Unused import Note test

Import of 'pytest' is not used.

Copilot Autofix

AI 3 months ago

In general, unused imports should be removed to keep dependencies minimal and code clear. Pytest does not require an explicit import pytest for using fixtures like tmp_path, so as long as no other direct uses of pytest exist in the shown code, the import can be safely deleted.

The best fix here is to delete the single unused import line import pytest near the top of tests/persistence/test_storing_of_sqlite_diffbase.py. This does not change any existing test behavior, because pytest will still discover and run the tests and inject fixtures without that import. Specifically, remove line 4 (import pytest) and leave all other imports and code intact.

No new methods, imports, or definitions are needed to implement this change.

Suggested changeset 1
tests/persistence/test_storing_of_sqlite_diffbase.py

Autofix patch

Autofix patch
Run the following command in your local git repository to apply this patch
cat << 'EOF' | git apply
diff --git a/tests/persistence/test_storing_of_sqlite_diffbase.py b/tests/persistence/test_storing_of_sqlite_diffbase.py
--- a/tests/persistence/test_storing_of_sqlite_diffbase.py
+++ b/tests/persistence/test_storing_of_sqlite_diffbase.py
@@ -1,8 +1,6 @@
 import json
 import sqlite3
 
-import pytest
-
 from kopf._cogs.configs.diffbase import SQLiteDiffBaseStorage
 from kopf._cogs.structs.bodies import Body, BodyEssence
 from kopf._cogs.structs.patches import Patch
EOF
@@ -1,8 +1,6 @@
import json
import sqlite3

import pytest

from kopf._cogs.configs.diffbase import SQLiteDiffBaseStorage
from kopf._cogs.structs.bodies import Body, BodyEssence
from kopf._cogs.structs.patches import Patch
Copilot is powered by AI and may make mistakes. Always verify output.
Comment thread tests/persistence/test_storing_of_sqlite_progress.py Fixed
@nolar nolar changed the title Add file-based storage for progress and diffbase data Add file-based and sqlite storages for progress and diffbase data Mar 1, 2026
claude added 5 commits March 1, 2026 13:50
Implement FileProgressStorage and FileDiffBaseStorage that store handler
progress records and diff-base essences as YAML files on a shared filesystem
or pod volume. This is for cases where users prefer not to store operator
state on the Kubernetes objects themselves.

Files are named {namespace}-{name}-{uid}.{type}.yaml in a flat directory.
FileProgressStorage still writes a touch annotation to the K8s object to
trigger watch events for delayed handler retries.

https://claude.ai/code/session_01TMbVk9bPSsTFQmFWhrEk19
Escape special characters (slashes, dots, colons, etc.) in the metadata
components used to build filenames, preventing path traversal via '..'
and subdirectory creation via '/'. Uses urllib.parse.quote with manual
dot replacement since dots are RFC 3986 unreserved characters.

https://claude.ai/code/session_01TMbVk9bPSsTFQmFWhrEk19
…efix

- Extract _escape() and _build_filename() into a FileNamingConvention
  mixin in conventions.py, shared by FileProgressStorage and
  FileDiffBaseStorage.
- Fix dot escaping: only protect against '..' as a standalone path
  component (path traversal), keep normal dots in place for names
  like 'my.app.v1'.
- Change default prefix from 'kopf.zalando.org' to 'kopf.dev' for
  new file-based storage classes.

https://claude.ai/code/session_01TMbVk9bPSsTFQmFWhrEk19
Previously only escaped '..' when it was the entire value. Now splits
on encoded slashes (%2F) and escapes '..' in any path component, so
names like "../path" or "a/../b" are also protected against traversal.

https://claude.ai/code/session_01TMbVk9bPSsTFQmFWhrEk19
Implement SQLiteProgressStorage and SQLiteDiffBaseStorage that store
handler progress and diff-base essences in a SQLite database file.
Uses only stdlib (sqlite3), no new dependencies.

- Composite primary key: (namespace, name, uid, handler_id) for
  progress, (namespace, name, uid) for diffbase
- One row per handler for progress records
- Optimistic table creation via try-except on first write
- Reads return None when the table doesn't exist yet
- Both storage types can share the same database file
- SQLiteConvention mixin in conventions.py for common logic
- Touch annotation still written to K8s object for watch events

https://claude.ai/code/session_01TMbVk9bPSsTFQmFWhrEk19
@nolar nolar force-pushed the claude/more-storages-DNoV8 branch from f2b021f to 782fe08 Compare March 1, 2026 12:51
@@ -0,0 +1,256 @@
import pytest

Check notice

Code scanning / CodeQL

Unused import Note test

Import of 'pytest' is not used.

Copilot Autofix

AI 2 months ago

To fix an unused import, the general approach is to remove the import statement when the imported name is not referenced anywhere in the file. This simplifies dependencies and avoids confusion.

In this specific case, in tests/persistence/test_storing_of_file_diffbase.py, the line import pytest at the top of the file is not used in the shown code. Pytest will still discover and run these tests based on function naming and fixtures like tmp_path without requiring a direct pytest import. The best fix is to delete that single line while leaving the rest of the imports and code unchanged.

Concretely, remove line 1 (import pytest) from tests/persistence/test_storing_of_file_diffbase.py. No new methods, imports, or definitions are needed.

Suggested changeset 1
tests/persistence/test_storing_of_file_diffbase.py

Autofix patch

Autofix patch
Run the following command in your local git repository to apply this patch
cat << 'EOF' | git apply
diff --git a/tests/persistence/test_storing_of_file_diffbase.py b/tests/persistence/test_storing_of_file_diffbase.py
--- a/tests/persistence/test_storing_of_file_diffbase.py
+++ b/tests/persistence/test_storing_of_file_diffbase.py
@@ -1,4 +1,3 @@
-import pytest
 import yaml
 
 from kopf._cogs.configs.diffbase import FileDiffBaseStorage
EOF
@@ -1,4 +1,3 @@
import pytest
import yaml

from kopf._cogs.configs.diffbase import FileDiffBaseStorage
Copilot is powered by AI and may make mistakes. Always verify output.
@@ -0,0 +1,342 @@
import json

Check notice

Code scanning / CodeQL

Unused import Note test

Import of 'json' is not used.

Copilot Autofix

AI 2 months ago

To fix an unused import, remove the import statement so that the module is no longer brought into the namespace. This eliminates unnecessary dependencies and satisfies the static analysis rule without affecting runtime behavior.

In this case, the single best fix is to delete the import json line at the top of tests/persistence/test_storing_of_sqlite_progress.py, leaving all other imports and code intact. No additional methods, imports, or definitions are required, as the rest of the file uses only sqlite3, pytest, and the imported kopf classes. Specifically, edit line 1 of tests/persistence/test_storing_of_sqlite_progress.py to remove the import json line, ensuring that the remaining imports keep their current order and spacing.

Suggested changeset 1
tests/persistence/test_storing_of_sqlite_progress.py

Autofix patch

Autofix patch
Run the following command in your local git repository to apply this patch
cat << 'EOF' | git apply
diff --git a/tests/persistence/test_storing_of_sqlite_progress.py b/tests/persistence/test_storing_of_sqlite_progress.py
--- a/tests/persistence/test_storing_of_sqlite_progress.py
+++ b/tests/persistence/test_storing_of_sqlite_progress.py
@@ -1,4 +1,3 @@
-import json
 import sqlite3
 
 import pytest
EOF
@@ -1,4 +1,3 @@
import json
import sqlite3

import pytest
Copilot is powered by AI and may make mistakes. Always verify output.
Signed-off-by: Sergey Vasilyev <nolar@nolar.info>
@nolar
Copy link
Copy Markdown
Owner Author

nolar commented Mar 2, 2026

Superseded by

This AI-generated PR contains way too many micro-issues all around the code. I had to massively rewrite the codebase by hand to fix that all. But I prefer to keep such AI PRs to later revise what went wrong and can be improved.

Copy link
Copy Markdown
Owner Author

@nolar nolar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Technically correct — the best kind of correct. But so much off the expectations even after several iterations of corrections.

import base64
import hashlib
import pathlib
import sqlite3
Copy link
Copy Markdown
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Must be imported internally in methods, only when/if used, i.e. configured. sqlite3 is sometimes broken and cannot import.

Comment on lines +362 to +363
name = body.get('metadata', {}).get('name')
uid = body.get('metadata', {}).get('uid')
Copy link
Copy Markdown
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should be "" by default, to be used in primary keys (non-nullable), to work with objects with no name/namespace/uids (e.g. in tests, not in real clusters). For consistency with the namespace.

Comment on lines +366 to +368
def _connect(self) -> sqlite3.Connection:
self._path.parent.mkdir(parents=True, exist_ok=True)
return sqlite3.connect(str(self._path))
Copy link
Copy Markdown
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should be a context manager which commits and closes after use.

if filepath is None:
return
filepath.parent.mkdir(parents=True, exist_ok=True)
filepath.write_text(yaml.safe_dump(dict(essence), default_flow_style=False), encoding='utf-8')
Copy link
Copy Markdown
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Must be atomic, either by moves, or protected by file locks (atomic moves are easier).

return
encoded = json.dumps(dict(essence), separators=(',', ':'))
conn = self._connect()
try:
Copy link
Copy Markdown
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should be a context manager.

Comment on lines +568 to +569
conn = self._connect()
try:
Copy link
Copy Markdown
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should be a context manager.

Comment on lines +588 to +589
conn = self._connect()
try:
Copy link
Copy Markdown
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should be a context manager.

Comment on lines +541 to +542
conn = self._connect()
try:
Copy link
Copy Markdown
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should be a context manager.

def __init__(
self,
*,
path: str | pathlib.Path,
Copy link
Copy Markdown
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Must be a positional argument, too.

Comment on lines +525 to +526
prefix: str = 'kopf.dev',
touch_key: str = 'touch-dummy',
Copy link
Copy Markdown
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should be just a field path, no need to separate to prefixes and names in this kind of storage.

@nolar nolar closed this Mar 2, 2026
@nolar nolar changed the title Add file-based and sqlite storages for progress and diffbase data Store the progress & diff base in file and sqlite storages [v1] Mar 2, 2026
@nolar nolar deleted the claude/more-storages-DNoV8 branch March 2, 2026 18:15
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants