-
Notifications
You must be signed in to change notification settings - Fork 11
Implement migration from XML dictionaries to DDL #426
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
Reviewer's GuideAdds a new Sequence diagram for chadmin dictionary migrate commandsequenceDiagram
actor Admin
participant CLI as chadmin_dictionary_migrate
participant DictModule as dictionary_internal
participant ConfigUtils as config_utils
participant FS as Filesystem
participant CH as ClickHouseServer
Admin->>CLI: run chadmin dictionary migrate
CLI->>DictModule: migrate_dictionaries(ctx)
DictModule->>CH: execute_query CREATE DATABASE IF NOT EXISTS _dictionaries
CH-->>DictModule: ok
DictModule->>DictModule: config_glob_pattern = _get_dictionaries_config_pattern()
DictModule->>ConfigUtils: _load_config(CLICKHOUSE_SERVER_CONFIG_PATH)
ConfigUtils->>FS: read main config.xml
FS-->>ConfigUtils: config.xml content
ConfigUtils->>FS: list config.d directory
FS-->>ConfigUtils: config.d entries
ConfigUtils->>FS: read *.xml, *.yaml includes
FS-->>ConfigUtils: include contents
ConfigUtils-->>DictModule: merged config
DictModule->>DictModule: extract <clickhouse><dictionaries_config>
DictModule->>FS: glob(config_directory, config_glob_pattern)
FS-->>DictModule: dictionary config files
loop for each dictionary config file
DictModule->>DictModule: generate_ddl_dictionary_from_xml(config_file)
DictModule->>ConfigUtils: _load_config(config_file)
ConfigUtils->>FS: read XML
FS-->>ConfigUtils: parsed config dict
ConfigUtils-->>DictModule: dictionary config
DictModule->>DictModule: _build_dictionary_ddl_from_config(attrs)
DictModule->>DictModule: _build_source_block / _build_layout_block
DictModule->>DictModule: _build_lifetime_block
DictModule->>DictModule: _build_structure_and_primary_key
loop for each generated query
DictModule->>CH: execute_query CREATE DICTIONARY _dictionaries.name ...
CH-->>DictModule: ok or error
end
end
DictModule-->>CLI: migration completed
CLI-->>Admin: report success or error
File-Level Changes
Tips and commandsInteracting with Sourcery
Customizing Your ExperienceAccess your dashboard to:
Getting Help
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hey there - I've reviewed your changes - here's some feedback:
- The
_build_layout_blockimplementation looks incorrect:layout_typeis always a string key so theif isinstance(layout_type, str)branch makes the rest of the function unreachable, and the later use oflayout_paramsin the f-string is wrong (it prints the dict instead of the layout name); consider refactoring this to correctly handle layouts with/without parameters. - In
_build_lifetime_block, the branchif max_val is None and not isinstance(min_val, (str, int))returnsLIFETIME(({min_val})), which both formats a non-scalar object into SQL and adds extra parentheses; it would be safer either to validate and error here or normalizemin_valto the expected scalar string/int. - The dictionary migration CLI help text says "Migrate XML-dictionaries to DLL"; consider correcting the wording to "DDL" so the command description matches the behavior implemented in
migrate_dictionaries.
Prompt for AI Agents
Please address the comments from this code review:
## Overall Comments
- The `_build_layout_block` implementation looks incorrect: `layout_type` is always a string key so the `if isinstance(layout_type, str)` branch makes the rest of the function unreachable, and the later use of `layout_params` in the f-string is wrong (it prints the dict instead of the layout name); consider refactoring this to correctly handle layouts with/without parameters.
- In `_build_lifetime_block`, the branch `if max_val is None and not isinstance(min_val, (str, int))` returns `LIFETIME(({min_val}))`, which both formats a non-scalar object into SQL and adds extra parentheses; it would be safer either to validate and error here or normalize `min_val` to the expected scalar string/int.
- The dictionary migration CLI help text says "Migrate XML-dictionaries to DLL"; consider correcting the wording to "DDL" so the command description matches the behavior implemented in `migrate_dictionaries`.
## Individual Comments
### Comment 1
<location> `ch_tools/chadmin/internal/dictionary.py:176-185` </location>
<code_context>
+ layout = attrs.get("layout")
</code_context>
<issue_to_address>
**issue (bug_risk):** Layout parsing logic never handles parameters and appears to use the wrong variable in the final string.
Because `layout_type = list(layout.keys())[0]` is always a `str`, the `if isinstance(layout_type, str)` branch always runs and returns `LAYOUT(<TYPE>())`, so layouts with parameters (e.g. `<type ...>`) are never handled. In addition, in the parameterized case the final `return` currently uses `layout_params` instead of `layout_type`:
```python
return f"LAYOUT({layout_params}({str_params}))"
```
which will format the dict rather than the layout name.
You can simplify and fix both issues by validating the dict shape up front, unpacking via `layout_type, layout_params = next(iter(layout.items()))`, then:
- If `layout_params` is falsy, return `LAYOUT(<TYPE>())`.
- Otherwise, validate `layout_params` is a dict of `str` values, build `str_params = ", ".join(...)`, and return `LAYOUT(<LAYOUT_TYPE>({str_params}))` using the layout name, not the params dict.
</issue_to_address>
### Comment 2
<location> `ch_tools/chadmin/internal/dictionary.py:220-221` </location>
<code_context>
+ "At least one of <min> or <max> must be specified in <lifetime>"
+ )
+
+ if max_val is None and not isinstance(min_val, (str, int)):
+ return f"LIFETIME(({min_val}))"
+
+ if not isinstance(min_val, (str, int)) or not isinstance(max_val, (str, int)):
</code_context>
<issue_to_address>
**issue (bug_risk):** Lifetime block handling returns an invalid expression when `min` has an unexpected type.
In `_build_lifetime_block`, this branch:
```python
if max_val is None and not isinstance(min_val, (str, int)):
return f"LIFETIME(({min_val}))"
```
will emit a `LIFETIME` clause with an invalid value instead of failing fast, and the double parentheses look syntactically wrong.
Consider validating and raising on bad input instead, e.g.:
```python
if max_val is None:
if not isinstance(min_val, (str, int)):
raise RuntimeError("<min> in <lifetime> must be a string or integer")
return f"LIFETIME({min_val})"
```
while keeping the existing strict checks for the `min`/`max` pair.
</issue_to_address>
### Comment 3
<location> `ch_tools/chadmin/internal/dictionary.py:273-282` </location>
<code_context>
+ defaults = {
</code_context>
<issue_to_address>
**issue (bug_risk):** Default null value normalization can raise a KeyError for unsupported types.
In `_normalize_null_default_value`, when `null_value is None` and `attr_type` is missing from `defaults`, `defaults[attr_type]` will raise a `KeyError`, causing an unexpected hard failure for any unlisted type. Consider either using `defaults.get(attr_type, "NULL")` (or another appropriate fallback) or raising a more explicit error that clearly indicates the type is unsupported.
</issue_to_address>
### Comment 4
<location> `ch_tools/chadmin/cli/dictionary_group.py:64-66` </location>
<code_context>
+ """
+ Migrate XML-dictionaries to DLL.
+ """
</code_context>
<issue_to_address>
**nitpick (typo):** The command description says 'DLL' instead of 'DDL'.
This docstring should say "DDL" rather than "DLL" to match data definition language and avoid confusion in the CLI help output.
```suggestion
"""
Migrate XML-dictionaries to DDL.
"""
```
</issue_to_address>
### Comment 5
<location> `tests/steps/chadmin.py:84` </location>
<code_context>
+@then('dictionary "{dict_name}" returns expected values on {node:w}')
</code_context>
<issue_to_address>
**suggestion (bug_risk):** Assert on `SYSTEM RELOAD DICTIONARIES` exit code to avoid silently running checks against stale dictionaries
Right now the step ignores the exit code of `SYSTEM RELOAD DICTIONARIES`, so a failed reload could go unnoticed while subsequent lookups still run. Please capture the result of `container.exec_run('clickhouse-client -q "SYSTEM RELOAD DICTIONARIES"')` and assert `exit_code == 0` with a clear error message, similar to the existing assertions on `dictGet` queries.
```suggestion
reload_result = container.exec_run('clickhouse-client -q "SYSTEM RELOAD DICTIONARIES"')
assert (
reload_result.exit_code == 0
), (
f"SYSTEM RELOAD DICTIONARIES failed with exit code "
f"{reload_result.exit_code}, output:\n{reload_result.output.decode().strip()}"
)
```
</issue_to_address>
### Comment 6
<location> `tests/unit/chadmin/test_parse_xml_to_sql.py:9-18` </location>
<code_context>
+from ch_tools.chadmin.internal.dictionary import generate_ddl_dictionary_from_xml
+
+
+@pytest.fixture
+def test_data_dir(tmp_path: Path) -> Path:
+ data_dir = tmp_path / "test_dictionary.xml"
+ data_dir.mkdir()
+ return data_dir
+
+
+@pytest.fixture
+def create_xml_file(test_data_dir: Path) -> Callable[[Path, str], str]:
+ def _create(filename: Path, content: str) -> str:
+ filepath = test_data_dir / filename
+ filepath.write_text(content)
+ return str(filepath)
+
+ return _create
+
+
</code_context>
<issue_to_address>
**nitpick:** Align `create_xml_file` fixture type hints with how it is used in tests
The fixture is typed as `Callable[[Path, str], str]`, but `filename` is a `Path` in the implementation while all call sites pass a `str` (e.g. "test_dict.xml"). To match usage and improve tooling support, consider updating the type to `Callable[[str, str], str]` with `filename: str`, or use `Union[Path, str]` if both should be allowed.
</issue_to_address>
### Comment 7
<location> `tests/unit/chadmin/test_parse_xml_to_sql.py:284-293` </location>
<code_context>
+ ),
+ ],
+)
+def test_parse_xml_to_sql(
+ create_xml_file: Callable[[str, str], str], xml_content: str, expected_queries: str
+) -> None:
+ filepath = create_xml_file("test_dict.xml", xml_content)
+
+ result = generate_ddl_dictionary_from_xml(filepath)
+
+ assert len(result) == len(expected_queries)
+
+ for actual, expected in zip(result, expected_queries):
+ assert normalize_sql(actual) == normalize_sql(expected)
+
</code_context>
<issue_to_address>
**nitpick:** Fix type annotation of `expected_queries` parameter to reflect that it is a list of SQL strings
`expected_queries` is currently annotated as `str`, but the parametrization passes a list of SQL strings and the test iterates over it. Please change the type to `list[str]` (or `Sequence[str]`) so the signature matches actual usage and tools/readers aren’t misled.
</issue_to_address>
### Comment 8
<location> `tests/unit/chadmin/test_parse_xml_to_sql.py:26-35` </location>
<code_context>
+@pytest.mark.parametrize(
</code_context>
<issue_to_address>
**suggestion (testing):** Add negative tests for invalid or incomplete XML configurations to cover error paths in `generate_ddl_dictionary_from_xml`
`generate_ddl_dictionary_from_xml` has several branches that raise `RuntimeError` for malformed configs (e.g. missing `<dictionaries>` / `<dictionary>`, missing `<name>`, invalid `<source>` / `<structure>`, or both `<id>` and `<key>` present). Please add a parametrized test (e.g. `test_parse_xml_to_sql_invalid_configs`) that uses such broken XML snippets and asserts on the expected `RuntimeError` messages so these validation paths are covered and don’t regress.
Suggested implementation:
```python
@pytest.fixture
def create_xml_file(test_data_dir: Path) -> Callable[[Path, str], str]:
def _create(filename: Path, content: str) -> str:
filepath = test_data_dir / filename
filepath.write_text(content)
return str(filepath)
return _create
@pytest.mark.parametrize(
"xml_content,expected_error_substring",
[
pytest.param(
# Missing <dictionaries> root element
"<config></config>",
"dictionaries",
id="missing-dictionaries-root",
),
pytest.param(
# Missing <dictionary> element inside <dictionaries>
"<dictionaries></dictionaries>",
"dictionary",
id="missing-dictionary",
),
pytest.param(
# Missing <name> element in <dictionary>
"""
<dictionaries>
<dictionary>
<source>
<clickhouse>
<db>db</db>
<table>tbl</table>
</clickhouse>
</source>
<structure>
<attribute>
<name>value</name>
<type>String</type>
</attribute>
</structure>
</dictionary>
</dictionaries>
""",
"name",
id="missing-name",
),
pytest.param(
# Missing <source>/<structure> configuration
"""
<dictionaries>
<dictionary>
<name>dict_without_source</name>
</dictionary>
</dictionaries>
""",
"source",
id="missing-source-or-structure",
),
pytest.param(
# Both <id> and <key> present in <structure>
"""
<dictionaries>
<dictionary>
<name>dict_with_id_and_key</name>
<source>
<clickhouse>
<db>db</db>
<table>tbl</table>
</clickhouse>
</source>
<structure>
<id>
<name>id</name>
<type>UInt64</type>
</id>
<key>
<attribute>
<name>k</name>
<type>UInt64</type>
</attribute>
</key>
<attribute>
<name>value</name>
<type>String</type>
</attribute>
</structure>
</dictionary>
</dictionaries>
""",
"id and key",
id="id-and-key-present",
),
],
)
def test_parse_xml_to_sql_invalid_configs(
xml_content: str,
expected_error_substring: str,
create_xml_file: Callable[[Path, str], str],
) -> None:
# Use the existing helper to create an XML file on disk
xml_path = create_xml_file(Path("invalid_dictionary.xml"), xml_content)
with pytest.raises(RuntimeError) as excinfo:
generate_ddl_dictionary_from_xml(xml_path)
# Assert that the raised error contains the expected validation message
assert expected_error_substring in str(excinfo.value)
@pytest.mark.parametrize(
```
1. This edit assumes `generate_ddl_dictionary_from_xml` is already imported or defined in `tests/unit/chadmin/test_parse_xml_to_sql.py` (it likely is, since valid-config tests call it). If it is not, add the appropriate import at the top of this test file.
2. The `expected_error_substring` values (`"dictionaries"`, `"dictionary"`, `"name"`, `"source"`, `"id and key"`) should match substrings of the actual `RuntimeError` messages raised in `generate_ddl_dictionary_from_xml`. If the implementation uses different wording, adjust these substrings accordingly to match the real messages while keeping the tests focused on the specific validation branch they cover.
</issue_to_address>
### Comment 9
<location> `ch_tools/chadmin/internal/dictionary.py:203` </location>
<code_context>
lifetime = attrs.get("lifetime", None)
</code_context>
<issue_to_address>
**suggestion (code-quality):** Replace `dict.get(x, None)` with `dict.get(x)` ([`remove-none-from-default-get`](https://docs.sourcery.ai/Reference/Rules-and-In-Line-Suggestions/Python/Default-Rules/remove-none-from-default-get))
```suggestion
lifetime = attrs.get("lifetime")
```
<br/><details><summary>Explanation</summary>When using a dictionary's `get` method you can specify a default to return if
the key is not found. This defaults to `None`, so it is unnecessary to specify
`None` if this is the required behaviour. Removing the unnecessary argument
makes the code slightly shorter and clearer.
</details>
</issue_to_address>
### Comment 10
<location> `ch_tools/chadmin/internal/dictionary.py:85` </location>
<code_context>
def generate_ddl_dictionary_from_xml(config_path: str) -> list[str]:
"""
Parse XML dictionary config and generate CREATE DICTIONARY statements.
"""
config: dict[str, Any] = _load_config(config_path)
queries: list[str] = []
dictionaries = config.get("dictionaries")
if dictionaries is None:
raise RuntimeError("Dictionary config is missing <dictionaries> section")
dictionary_nodes = dictionaries.get("dictionary")
if dictionary_nodes is None:
raise RuntimeError(
"Dictionary config is missing <dictionaries><dictionary> section"
)
if not isinstance(dictionary_nodes, list) or isinstance(dictionary_nodes, dict):
dictionary_nodes = [dictionary_nodes]
for attrs in dictionary_nodes:
if isinstance(attrs, dict):
queries.append(_build_dictionary_ddl_from_config(attrs))
return queries
</code_context>
<issue_to_address>
**issue (code-quality):** We've found these issues:
- Move assignment closer to its usage within a block ([`move-assign-in-block`](https://docs.sourcery.ai/Reference/Default-Rules/refactorings/move-assign-in-block/))
- Convert for loop into list comprehension ([`list-comprehension`](https://docs.sourcery.ai/Reference/Default-Rules/refactorings/list-comprehension/))
</issue_to_address>
### Comment 11
<location> `ch_tools/chadmin/internal/dictionary.py:138-140` </location>
<code_context>
def _build_dictionary_ddl_from_config(attrs: dict[str, Any]) -> str:
name = _get_dictionary_name(attrs)
source = _build_source_block(attrs)
lifetime = _build_lifetime_block(attrs)
layout = _build_layout_block(attrs)
primary_key, structure = _build_structure_and_primary_key(attrs)
query = _build_create_dictionary_statement(
name, structure, primary_key, source, layout, lifetime, "", ""
)
return query
</code_context>
<issue_to_address>
**issue (code-quality):** Inline variable that is immediately returned ([`inline-immediately-returned-variable`](https://docs.sourcery.ai/Reference/Default-Rules/refactorings/inline-immediately-returned-variable/))
</issue_to_address>
### Comment 12
<location> `ch_tools/chadmin/internal/dictionary.py:273` </location>
<code_context>
def _normalize_null_default_value(attr_type: str, null_value: str) -> str:
defaults = {
"Int8": "0",
"Int16": "0",
"Int32": "0",
"Int64": "0",
"Int128": "0",
"Int256": "0",
"UInt8": "0",
"UInt16": "0",
"UInt32": "0",
"UInt64": "0",
"UInt128": "0",
"UInt256": "0",
"Float32": "0",
"Float64": "0",
"Decimal": "0",
"Decimal32": "0",
"Decimal64": "0",
"Decimal128": "0",
"Decimal256": "0",
"String": "''",
"FixedString": "''",
"Date": "1970-01-01",
"Date32": "1900-01-01",
"DateTime": "1970-01-01 00:00:00",
"DateTime64": "1970-01-01 00:00:00.000",
"Bool": "0",
"UUID": "00000000-0000-0000-0000-000000000000",
"IPv4": "0.0.0.0",
"IPv6": "::",
"Array": "[]",
"Tuple": "()",
"Map": "{}",
"Nullable": "NULL",
"Point": "(0, 0)",
"Ring": "[]",
"Polygon": "[]",
"MultiPolygon": "[]",
}
if null_value is None:
return defaults[attr_type]
if attr_type in ("String", "FixedString"):
return f"'{null_value}'"
return null_value
</code_context>
<issue_to_address>
**issue (code-quality):** We've found these issues:
- Move assignments closer to their usage ([`move-assign`](https://docs.sourcery.ai/Reference/Default-Rules/refactorings/move-assign/))
- Use set when checking membership of a collection of literals ([`collection-into-set`](https://docs.sourcery.ai/Reference/Default-Rules/refactorings/collection-into-set/))
</issue_to_address>Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.
tests/steps/chadmin.py
Outdated
| def step_check_dict_values(context: ContextT, dict_name: str, node: str) -> None: | ||
| container = get_container(context, node) | ||
|
|
||
| container.exec_run('clickhouse-client -q "SYSTEM RELOAD DICTIONARIES"') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
suggestion (bug_risk): Assert on SYSTEM RELOAD DICTIONARIES exit code to avoid silently running checks against stale dictionaries
Right now the step ignores the exit code of SYSTEM RELOAD DICTIONARIES, so a failed reload could go unnoticed while subsequent lookups still run. Please capture the result of container.exec_run('clickhouse-client -q "SYSTEM RELOAD DICTIONARIES"') and assert exit_code == 0 with a clear error message, similar to the existing assertions on dictGet queries.
| container.exec_run('clickhouse-client -q "SYSTEM RELOAD DICTIONARIES"') | |
| reload_result = container.exec_run('clickhouse-client -q "SYSTEM RELOAD DICTIONARIES"') | |
| assert ( | |
| reload_result.exit_code == 0 | |
| ), ( | |
| f"SYSTEM RELOAD DICTIONARIES failed with exit code " | |
| f"{reload_result.exit_code}, output:\n{reload_result.output.decode().strip()}" | |
| ) |
| @pytest.mark.parametrize( | ||
| "xml_content,expected_queries", | ||
| [ | ||
| pytest.param( | ||
| """ | ||
| <dictionaries> | ||
| <dictionary> | ||
| <name>test_dict1</name> | ||
| <source> | ||
| <clickhouse> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
suggestion (testing): Add negative tests for invalid or incomplete XML configurations to cover error paths in generate_ddl_dictionary_from_xml
generate_ddl_dictionary_from_xml has several branches that raise RuntimeError for malformed configs (e.g. missing <dictionaries> / <dictionary>, missing <name>, invalid <source> / <structure>, or both <id> and <key> present). Please add a parametrized test (e.g. test_parse_xml_to_sql_invalid_configs) that uses such broken XML snippets and asserts on the expected RuntimeError messages so these validation paths are covered and don’t regress.
Suggested implementation:
@pytest.fixture
def create_xml_file(test_data_dir: Path) -> Callable[[Path, str], str]:
def _create(filename: Path, content: str) -> str:
filepath = test_data_dir / filename
filepath.write_text(content)
return str(filepath)
return _create
@pytest.mark.parametrize(
"xml_content,expected_error_substring",
[
pytest.param(
# Missing <dictionaries> root element
"<config></config>",
"dictionaries",
id="missing-dictionaries-root",
),
pytest.param(
# Missing <dictionary> element inside <dictionaries>
"<dictionaries></dictionaries>",
"dictionary",
id="missing-dictionary",
),
pytest.param(
# Missing <name> element in <dictionary>
"""
<dictionaries>
<dictionary>
<source>
<clickhouse>
<db>db</db>
<table>tbl</table>
</clickhouse>
</source>
<structure>
<attribute>
<name>value</name>
<type>String</type>
</attribute>
</structure>
</dictionary>
</dictionaries>
""",
"name",
id="missing-name",
),
pytest.param(
# Missing <source>/<structure> configuration
"""
<dictionaries>
<dictionary>
<name>dict_without_source</name>
</dictionary>
</dictionaries>
""",
"source",
id="missing-source-or-structure",
),
pytest.param(
# Both <id> and <key> present in <structure>
"""
<dictionaries>
<dictionary>
<name>dict_with_id_and_key</name>
<source>
<clickhouse>
<db>db</db>
<table>tbl</table>
</clickhouse>
</source>
<structure>
<id>
<name>id</name>
<type>UInt64</type>
</id>
<key>
<attribute>
<name>k</name>
<type>UInt64</type>
</attribute>
</key>
<attribute>
<name>value</name>
<type>String</type>
</attribute>
</structure>
</dictionary>
</dictionaries>
""",
"id and key",
id="id-and-key-present",
),
],
)
def test_parse_xml_to_sql_invalid_configs(
xml_content: str,
expected_error_substring: str,
create_xml_file: Callable[[Path, str], str],
) -> None:
# Use the existing helper to create an XML file on disk
xml_path = create_xml_file(Path("invalid_dictionary.xml"), xml_content)
with pytest.raises(RuntimeError) as excinfo:
generate_ddl_dictionary_from_xml(xml_path)
# Assert that the raised error contains the expected validation message
assert expected_error_substring in str(excinfo.value)
@pytest.mark.parametrize(- This edit assumes
generate_ddl_dictionary_from_xmlis already imported or defined intests/unit/chadmin/test_parse_xml_to_sql.py(it likely is, since valid-config tests call it). If it is not, add the appropriate import at the top of this test file. - The
expected_error_substringvalues ("dictionaries","dictionary","name","source","id and key") should match substrings of the actualRuntimeErrormessages raised ingenerate_ddl_dictionary_from_xml. If the implementation uses different wording, adjust these substrings accordingly to match the real messages while keeping the tests focused on the specific validation branch they cover.
|
|
||
|
|
||
| @dictionary_group.command("migrate") | ||
| @pass_context |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IMO we need parameters here:
dry-runmode to check correctness of the generated sql- Options that allows to migrage selectively eg with regex
keep-going? Not sure but for consistency with other commands looks useful.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
keep-going looks bad for this command, it is better to fail as soon as possible before some dicts are converted, otherwise we will need to distinguish already migrated dictionaries from non-migrated ones on restart and make sure that you do not delete the file with the non-migrated dictionary
| Migrate external dictionaries to DDL. | ||
| """ | ||
| logging.info("Creating '_dictionaries' database") | ||
| execute_query(ctx, "CREATE DATABASE IF NOT EXISTS _dictionaries", format_=None) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Lets move the name of database into config
| return queries | ||
|
|
||
|
|
||
| def _get_dictionaries_config_pattern() -> str: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| def _get_dictionaries_config_pattern() -> str: | |
| def _get_dictionaries_config_paths_pattern() -> str: |
The best option I can think of, but the current name doesn't represent that we're getting the regexp for paths to XML files
And there are no function desc.
| config_path = Path(CLICKHOUSE_SERVER_CONFIG_PATH) | ||
| config_directory = config_path.parent | ||
| config_path_list = list(config_directory.glob(config_glob_pattern)) | ||
|
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The regexp in <dictionaries_config> also can represent abs paths.
| config_path_list = list(config_directory.glob(config_glob_pattern)) | ||
|
|
||
| logging.info("Starting external dictionaries migration") | ||
| for i, config_file in enumerate(config_path_list, start=1): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IMO execute_tasks_in_parallel can fit here very well, free parallelization
|
|
||
| def _get_dictionary_name(attrs: dict[str, Any]) -> str: | ||
| name = attrs.get("name") | ||
| if name is None or not isinstance(name, str): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| if name is None or not isinstance(name, str): | |
| if not isinstance(name, str): |
isinstance(None, str) is fine ihmo
| return name | ||
|
|
||
|
|
||
| def _build_source_block(attrs: dict[str, Any]) -> str: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Acutually lets add the xml snippets for such functions, because it is hard to understand without looking into the config example
| ) | ||
|
|
||
| if max_val is None: | ||
| if not isinstance(min_val, str): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why str? The min/max for lifetime is integer value
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
xmltodict parses all text as str
|
|
||
| parts = [name, attr_type] | ||
|
|
||
| if "null_value" in attr: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| if "null_value" in attr: | |
| if 'null_value' not in attr: | |
| raise ... |
Null value is reqired
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i think that it is not required
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
https://clickhouse.com/docs/sql-reference/dictionaries#attributes
According to the doc: it is required.
| parts = [name, attr_type] | ||
|
|
||
| if "null_value" in attr: | ||
| null_value = _normalize_null_default_value(attr_type, attr["null_value"]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What if in config the null value is specified as
<null_value />
Will it be equal to None?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
will be {"null_value": None}
… extend functionality
| @when("we execute command on {node:w}") | ||
| @then("we execute command on {node:w}") | ||
| def step_command(context: ContextT, node: str) -> None: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks like it is not used in tests. Given and when should also be enough in any case
| @when("we execute command on {node:w} expecting failure") | ||
| @then("we execute command on {node:w} expecting failure") | ||
| def step_execute_command_expecting_failure(context: ContextT, node: str) -> None: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This logic is already implemented like this
When we try to execute command on clickhouse01
"""
....
"""
Then it fails with response contains
"""
.....
"""
| except Exception as e: | ||
| raise RuntimeError(f"Error while removing '{config_file}'") from e | ||
| logging.info("Removing external dictionaries completed successfully") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reload can be added here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It also makes sense to add an option to decide if new SQL dictionaries should be loaded after the migration, in case lazy load is enabled
|
|
||
| attributes = structure.get("attribute", []) | ||
|
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This line was moved lower, but old line is node deleted here
| config_path = CLICKHOUSE_SERVER_CONFIG_PATH | ||
| parsed_config = load_config_file(CLICKHOUSE_SERVER_CONFIG_PATH) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
get_clickhouse_config can be reused here
| for config_file in config_path_list: | ||
| queries = _generate_ddl_dictionaries_from_xml(str(config_file), target_database) | ||
| for dict_name, query in queries: | ||
| all_dictionaries.append((config_file, dict_name, query)) | ||
|
|
||
| filtered_dictionaries = _filter_dictionaries( | ||
| all_dictionaries, include_pattern, exclude_pattern | ||
| ) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Passing filters to _generate_ddl_dictionaries_from_xml or filtering by file name before calling _generate_ddl_dictionaries_from_xml should be more optimal
| if include_pattern: | ||
| if not ( | ||
| fnmatch(dict_name, include_pattern) | ||
| or fnmatch(config_file_str, include_pattern) | ||
| or fnmatch(config_file.name, include_pattern) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess it can be simplified to matching config_file_str only. Whether to check only filename or abs path can be decided with a pattern itself
| def _get_dictionary_nodes_from_config(config: dict[str, Any]) -> list[dict[str, Any]]: | ||
| dictionary_node_list = [] | ||
| if not isinstance(config, dict): | ||
| return dictionary_node_list |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Such checks look redundant, config is already declared as dict[str, Any]
To run the migration, execute:
chadmin dictionary migrateSummary by Sourcery
Add a chadmin command to migrate ClickHouse external XML dictionaries into DDL dictionaries stored in a dedicated database, with parsing and conversion logic plus coverage tests.
New Features:
dictionary migratecommand to convert XML-defined external dictionaries into DDLCREATE DICTIONARYobjects in the_dictionariesdatabase.CREATE DICTIONARYSQL from ClickHouse dictionary configuration files, including support for various key structures, layouts, lifetimes, and attribute defaults.Enhancements:
Tests:
null_valuecases.chadmin dictionary migratecommand that validates migrated dictionaries against source table data.