Store the progress & diff base in file and sqlite storages [v1]#1261
Store the progress & diff base in file and sqlite storages [v1]#1261nolar wants to merge 6 commits into
Conversation
| @@ -0,0 +1,211 @@ | |||
| import json | |||
Check notice
Code scanning / CodeQL
Unused import Note test
Show autofix suggestion
Hide autofix suggestion
Copilot Autofix
AI 3 months ago
To fix an unused import, remove the import statement for the module that is not referenced anywhere in the file. This eliminates an unnecessary dependency and slightly improves readability and load time.
In this specific case, delete the import json line at the top of tests/persistence/test_storing_of_sqlite_diffbase.py. No other lines need to change, and no additional methods, imports, or definitions are required. The rest of the imports (sqlite3, pytest, and the kopf symbols) remain unchanged.
| @@ -1,4 +1,3 @@ | ||
| import json | ||
| import sqlite3 | ||
|
|
||
| import pytest |
| import json | ||
| import sqlite3 | ||
|
|
||
| import pytest |
Check notice
Code scanning / CodeQL
Unused import Note test
Show autofix suggestion
Hide autofix suggestion
Copilot Autofix
AI 3 months ago
In general, unused imports should be removed to keep dependencies minimal and code clear. Pytest does not require an explicit import pytest for using fixtures like tmp_path, so as long as no other direct uses of pytest exist in the shown code, the import can be safely deleted.
The best fix here is to delete the single unused import line import pytest near the top of tests/persistence/test_storing_of_sqlite_diffbase.py. This does not change any existing test behavior, because pytest will still discover and run the tests and inject fixtures without that import. Specifically, remove line 4 (import pytest) and leave all other imports and code intact.
No new methods, imports, or definitions are needed to implement this change.
| @@ -1,8 +1,6 @@ | ||
| import json | ||
| import sqlite3 | ||
|
|
||
| import pytest | ||
|
|
||
| from kopf._cogs.configs.diffbase import SQLiteDiffBaseStorage | ||
| from kopf._cogs.structs.bodies import Body, BodyEssence | ||
| from kopf._cogs.structs.patches import Patch |
Implement FileProgressStorage and FileDiffBaseStorage that store handler
progress records and diff-base essences as YAML files on a shared filesystem
or pod volume. This is for cases where users prefer not to store operator
state on the Kubernetes objects themselves.
Files are named {namespace}-{name}-{uid}.{type}.yaml in a flat directory.
FileProgressStorage still writes a touch annotation to the K8s object to
trigger watch events for delayed handler retries.
https://claude.ai/code/session_01TMbVk9bPSsTFQmFWhrEk19
Escape special characters (slashes, dots, colons, etc.) in the metadata components used to build filenames, preventing path traversal via '..' and subdirectory creation via '/'. Uses urllib.parse.quote with manual dot replacement since dots are RFC 3986 unreserved characters. https://claude.ai/code/session_01TMbVk9bPSsTFQmFWhrEk19
…efix - Extract _escape() and _build_filename() into a FileNamingConvention mixin in conventions.py, shared by FileProgressStorage and FileDiffBaseStorage. - Fix dot escaping: only protect against '..' as a standalone path component (path traversal), keep normal dots in place for names like 'my.app.v1'. - Change default prefix from 'kopf.zalando.org' to 'kopf.dev' for new file-based storage classes. https://claude.ai/code/session_01TMbVk9bPSsTFQmFWhrEk19
Previously only escaped '..' when it was the entire value. Now splits on encoded slashes (%2F) and escapes '..' in any path component, so names like "../path" or "a/../b" are also protected against traversal. https://claude.ai/code/session_01TMbVk9bPSsTFQmFWhrEk19
Implement SQLiteProgressStorage and SQLiteDiffBaseStorage that store handler progress and diff-base essences in a SQLite database file. Uses only stdlib (sqlite3), no new dependencies. - Composite primary key: (namespace, name, uid, handler_id) for progress, (namespace, name, uid) for diffbase - One row per handler for progress records - Optimistic table creation via try-except on first write - Reads return None when the table doesn't exist yet - Both storage types can share the same database file - SQLiteConvention mixin in conventions.py for common logic - Touch annotation still written to K8s object for watch events https://claude.ai/code/session_01TMbVk9bPSsTFQmFWhrEk19
f2b021f to
782fe08
Compare
| @@ -0,0 +1,256 @@ | |||
| import pytest | |||
Check notice
Code scanning / CodeQL
Unused import Note test
Show autofix suggestion
Hide autofix suggestion
Copilot Autofix
AI 2 months ago
To fix an unused import, the general approach is to remove the import statement when the imported name is not referenced anywhere in the file. This simplifies dependencies and avoids confusion.
In this specific case, in tests/persistence/test_storing_of_file_diffbase.py, the line import pytest at the top of the file is not used in the shown code. Pytest will still discover and run these tests based on function naming and fixtures like tmp_path without requiring a direct pytest import. The best fix is to delete that single line while leaving the rest of the imports and code unchanged.
Concretely, remove line 1 (import pytest) from tests/persistence/test_storing_of_file_diffbase.py. No new methods, imports, or definitions are needed.
| @@ -1,4 +1,3 @@ | ||
| import pytest | ||
| import yaml | ||
|
|
||
| from kopf._cogs.configs.diffbase import FileDiffBaseStorage |
| @@ -0,0 +1,342 @@ | |||
| import json | |||
Check notice
Code scanning / CodeQL
Unused import Note test
Show autofix suggestion
Hide autofix suggestion
Copilot Autofix
AI 2 months ago
To fix an unused import, remove the import statement so that the module is no longer brought into the namespace. This eliminates unnecessary dependencies and satisfies the static analysis rule without affecting runtime behavior.
In this case, the single best fix is to delete the import json line at the top of tests/persistence/test_storing_of_sqlite_progress.py, leaving all other imports and code intact. No additional methods, imports, or definitions are required, as the rest of the file uses only sqlite3, pytest, and the imported kopf classes. Specifically, edit line 1 of tests/persistence/test_storing_of_sqlite_progress.py to remove the import json line, ensuring that the remaining imports keep their current order and spacing.
| @@ -1,4 +1,3 @@ | ||
| import json | ||
| import sqlite3 | ||
|
|
||
| import pytest |
Signed-off-by: Sergey Vasilyev <nolar@nolar.info>
|
Superseded by This AI-generated PR contains way too many micro-issues all around the code. I had to massively rewrite the codebase by hand to fix that all. But I prefer to keep such AI PRs to later revise what went wrong and can be improved. |
nolar
left a comment
There was a problem hiding this comment.
Technically correct — the best kind of correct. But so much off the expectations even after several iterations of corrections.
| import base64 | ||
| import hashlib | ||
| import pathlib | ||
| import sqlite3 |
There was a problem hiding this comment.
Must be imported internally in methods, only when/if used, i.e. configured. sqlite3 is sometimes broken and cannot import.
| name = body.get('metadata', {}).get('name') | ||
| uid = body.get('metadata', {}).get('uid') |
There was a problem hiding this comment.
Should be "" by default, to be used in primary keys (non-nullable), to work with objects with no name/namespace/uids (e.g. in tests, not in real clusters). For consistency with the namespace.
| def _connect(self) -> sqlite3.Connection: | ||
| self._path.parent.mkdir(parents=True, exist_ok=True) | ||
| return sqlite3.connect(str(self._path)) |
There was a problem hiding this comment.
Should be a context manager which commits and closes after use.
| if filepath is None: | ||
| return | ||
| filepath.parent.mkdir(parents=True, exist_ok=True) | ||
| filepath.write_text(yaml.safe_dump(dict(essence), default_flow_style=False), encoding='utf-8') |
There was a problem hiding this comment.
Must be atomic, either by moves, or protected by file locks (atomic moves are easier).
| return | ||
| encoded = json.dumps(dict(essence), separators=(',', ':')) | ||
| conn = self._connect() | ||
| try: |
There was a problem hiding this comment.
Should be a context manager.
| conn = self._connect() | ||
| try: |
There was a problem hiding this comment.
Should be a context manager.
| conn = self._connect() | ||
| try: |
There was a problem hiding this comment.
Should be a context manager.
| conn = self._connect() | ||
| try: |
There was a problem hiding this comment.
Should be a context manager.
| def __init__( | ||
| self, | ||
| *, | ||
| path: str | pathlib.Path, |
There was a problem hiding this comment.
Must be a positional argument, too.
| prefix: str = 'kopf.dev', | ||
| touch_key: str = 'touch-dummy', |
There was a problem hiding this comment.
Should be just a field path, no need to separate to prefixes and names in this kind of storage.
Summary
Add file-based and SQLite-based storage backends for progress and diffbase data, as alternatives to storing state on the Kubernetes object itself (annotations/status).
This should benefit users who do not want to store the state of the state-tracking handlers on the objects, e.g. to reduce the number of API PATCH requests to the cluster, but at the same time want something out of the box (instead of implementing it themselves).
These storages depend on StdLib alone and do not bring new dependencies. However, they cannot be made a new default (also out of the box), since each of them requires some external configuration — the persistent volume or a shared flilesystem in the container for all operator instances running (including the dev instances, if any). The storage in the object itself remains the default.
See also a PR to reduce those patches:
File-based storages (
FileProgressStorage,FileDiffBaseStorage):FileNamingConventionmixinSQLite-based storages (
SQLiteProgressStorage,SQLiteDiffBaseStorage):(namespace, name, uid, handler_id)(namespace, name, uid).dbfile (separate tables)sqlite3— no new dependenciesBoth storage types still write a touch annotation to the K8s object to trigger watch events for delayed handler retries.
All new classes are exported from the top-level
kopfnamespace and documented indocs/configuration.rst.https://claude.ai/code/session_01TMbVk9bPSsTFQmFWhrEk19