
chore(inkless): remove FileMerger component and all related infrastructure #607

Merged: jeqo merged 2 commits into main from tvainika/remove-filemerger on May 22, 2026

@tvainika

Member

The FileMerger component was disabled in ReplicaManager due to hanging query issues and has been unused. This PR removes it entirely along with all supporting infrastructure.

Changes

Application code removed:

  • FileMerger, FileMergerMetrics, MergeBatchesInputStream, InputStreamWithPosition, BatchAndStream (entire merge package)
  • Control plane interface methods: getFileMergeWorkItem, commitFileMergeWorkItem, releaseFileMergeWorkItem
  • Control plane records: FileMergeWorkItem, MergedFileBatch, FileMergeWorkItemNotExist
  • Postgres job classes: GetFileMergeWorkItemJob, CommitFileMergeWorkItemJob, ReleaseFileMergeWorkItemJob
  • Config entries: file.merger.interval.ms, file.merger.temp.dir, file.merge.size.threshold.bytes, file.merge.lock.period.ms
  • Related metrics from PostgresControlPlaneMetrics and FileMergerMetrics
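
Since the four config entries above no longer exist, a leftover setting in a deployment's properties file refers to nothing. A minimal sketch of a check for such leftovers follows. The class `RemovedFileMergerKeys` and its method are hypothetical and not part of this PR; the key names are copied from the list above. Whether the keys carry a prefix in a real broker config is an assumption, so the sketch also matches any property name that ends with one of them.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Properties;
import java.util.TreeSet;

// Hypothetical helper for illustration only; not part of the Inkless codebase.
final class RemovedFileMergerKeys {
    // Key names as listed in the PR description.
    static final List<String> REMOVED_KEYS = List.of(
        "file.merger.interval.ms",
        "file.merger.temp.dir",
        "file.merge.size.threshold.bytes",
        "file.merge.lock.period.ms"
    );

    // Returns, in sorted order, every property name that equals a removed key
    // or ends with "." + key (assumption: real configs may prefix the keys).
    static List<String> findStale(Properties props) {
        List<String> stale = new ArrayList<>();
        for (String name : new TreeSet<>(props.stringPropertyNames())) {
            for (String key : REMOVED_KEYS) {
                if (name.equals(key) || name.endsWith("." + key)) {
                    stale.add(name);
                    break;
                }
            }
        }
        return stale;
    }

    public static void main(String[] args) {
        Properties props = new Properties();
        props.setProperty("file.merger.interval.ms", "60000");
        props.setProperty("some.other.setting", "value");
        // Prints the names of any stale file merger settings.
        System.out.println(findStale(props));
    }
}
```

An empty result means none of the removed keys is still set in the properties that were checked.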

Database:

  • New migration V12__Remove_file_merge.sql drops tables (file_merge_work_items, file_merge_work_item_files), functions, and types
  • delete_files_v1 function updated to remove reference to dropped table
  • jOOQ classes regenerated

Docs & config:

  • Updated metrics.rst, configs.rst, ARCHITECTURE.md, PERFORMANCE.md
  • Removed scheduler entry and import from ReplicaManager.scala

Other:

  • .gitattributes added to mark jOOQ generated files as linguist-generated (hides diffs in GH by default)
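
For readers unfamiliar with the mechanism, a `.gitattributes` entry of roughly this shape marks a directory as generated. The path is taken from the commit message in this PR; the exact glob pattern is an assumption, not a copy of the committed file.

```
# Assumed pattern; the committed .gitattributes may differ.
storage/inkless/src/main/jooq/** linguist-generated=true
```

GitHub collapses diffs for files carrying the `linguist-generated` attribute, and reviewers can still expand them manually.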

@tvainika tvainika force-pushed the tvainika/remove-filemerger branch from 2997f60 to f1ce79a Compare May 22, 2026 08:50
@tvainika tvainika marked this pull request as ready for review May 22, 2026 09:18

@jeqo jeqo left a comment

Contributor

LGTM, just a minor comment

Comment thread docs/inkless/PERFORMANCE.md Outdated
@jeqo

jeqo commented May 22, 2026

Contributor

Claude has found some references in other files:

  • docs/inkless/FAQ.md:228-230 — The FAQ still references file merging as if it exists ("efficiency depends on file merging", "How does file merging work?"). Since this PR removes the feature entirely, these FAQ entries are now stale and could confuse users. Consider updating or removing them.

    # Current (line 228):
        * **Trailing Reads:** For trailing reads, efficiency depends on file merging. If file merging has been performed, retrieving older data is more efficient. Otherwise, multiple object reads may be needed, potentially increasing latency.
    * **Q: How does file merging work?**
      * A: File merging is a background process ...
    
    # Suggested:
        * **Trailing Reads:** For trailing reads, multiple object reads may be needed since data from different partitions coexists in the same objects, potentially increasing latency.
  • docs/inkless/GLOSSARY.md:62-63 — The glossary entry for "File Merger" still describes it as an active background process. Should be removed or marked as removed.

  • docs/inkless/PERFORMANCE.md:272 — The line "File merging helps reduce this over time by reorganizing data for better partition locality in merged objects." reads as if merging is still available. This contradicts line 264 in the same file which correctly says "Currently disabled". Consider removing or updating line 272 to match.

  • storage/inkless/src/main/java/io/aiven/inkless/control_plane/AbstractControlPlaneConfig.java — This class is now essentially empty (an abstract class with a baseConfigDef() that creates an empty ConfigDef, and a single constructor). If there are no plans to add new shared config entries soon, this could be inlined into its subclasses. Low priority; fine to leave as-is if other config items are expected here.

tvainika and others added 2 commits May 22, 2026 15:56
Remove the disabled FileMerger component along with its supporting
control plane infrastructure:

- FileMerger, FileMergerMetrics, and helper classes (merge package)
- Control plane methods: getFileMergeWorkItem, commitFileMergeWorkItem,
  releaseFileMergeWorkItem
- FileMergeWorkItem, MergedFileBatch, FileMergeWorkItemNotExist records
- Postgres job classes for file merge operations
- File merger config entries from InklessConfig and control plane configs
- All related tests and documentation references
- Metrics for file merge operations
- SQL migration V12 to drop file_merge tables, functions, and types
- Regenerated jOOQ classes to reflect schema changes

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Add .gitattributes to mark generated JOOQ files as linguist-generated

This hides diffs for files under storage/inkless/src/main/jooq/
in GitHub pull requests by default.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@tvainika tvainika force-pushed the tvainika/remove-filemerger branch from f1ce79a to c2d85b6 Compare May 22, 2026 12:56
@tvainika tvainika requested a review from jeqo May 22, 2026 12:58
@jeqo jeqo merged commit 690a5a6 into main May 22, 2026
4 checks passed
@jeqo jeqo deleted the tvainika/remove-filemerger branch May 22, 2026 17:26
giuseppelillo pushed a commit that referenced this pull request May 29, 2026
…cture (#607)

Remove the disabled FileMerger component along with its supporting
control plane infrastructure:

- FileMerger, FileMergerMetrics, and helper classes (merge package)
- Control plane methods: getFileMergeWorkItem, commitFileMergeWorkItem,
  releaseFileMergeWorkItem
- FileMergeWorkItem, MergedFileBatch, FileMergeWorkItemNotExist records
- Postgres job classes for file merge operations
- File merger config entries from InklessConfig and control plane configs
- All related tests and documentation references
- Metrics for file merge operations
- SQL migration V12 to drop file_merge tables, functions, and types
- Regenerated jOOQ classes to reflect schema changes

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Add .gitattributes to mark generated JOOQ files as linguist-generated

This hides diffs for files under storage/inkless/src/main/jooq/
in GitHub pull requests by default.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>