Skip to content

feat: serialize index to binary#273

Merged
16bit-ykiko merged 7 commits into
mainfrom
serialize-index
Oct 7, 2025
Merged

feat: serialize index to binary#273
16bit-ykiko merged 7 commits into
mainfrom
serialize-index

Conversation

@16bit-ykiko

Copy link
Copy Markdown
Member

No description provided.

@gemini-code-assist

Copy link
Copy Markdown
Contributor

Summary of Changes

Hello @16bit-ykiko, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request significantly advances the project's indexing capabilities by implementing binary serialization for the MergedIndex. This allows the index to be efficiently stored and retrieved, improving performance and enabling persistence. The changes include core serialization logic using FlatBuffers, optimizations to underlying data structures for binary compatibility, and enhancements to the build system to ensure consistent dependency management. Additionally, the PR improves the robustness of semantic analysis by handling invalid inputs gracefully.

Highlights

  • Binary Index Serialization: Introduced the capability to serialize the MergedIndex data structure into a binary format using FlatBuffers, enabling persistent storage and efficient loading of the project's index.
  • Data Structure Optimization: Refactored the Relation struct in TUIndex.h to optimize storage by using std::bit_cast for definition_range and target_symbol, making it more suitable for binary serialization.
  • Robustness Improvements: Enhanced the SemanticVisitor by replacing assertions with early returns for invalid inputs in handleDeclOccurrence and handleRelation, preventing crashes and improving stability.
  • Data Integrity and Consistency: Implemented sorting and deduplication for relations and occurrences within FileIndex to ensure data consistency and prepare the index for reliable serialization.
  • Build System Configuration: Updated CMake configurations to explicitly link libuv using its target, enforce static library builds for dependencies, and disable test builds for libuv and roaring.
Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point by creating a comment using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands on the current page.

Feature Command Description
Code Review /gemini review Performs a code review for the current pull request in its current state.
Pull Request Summary /gemini summary Provides a summary of the current pull request in its current state.
Comment @gemini-code-assist Responds in comments when explicitly tagged, both in pull request comments and review comments.
Help /gemini help Displays a list of available commands.

Customization

To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here.

You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

@gemini-code-assist gemini-code-assist Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces binary serialization for the index using FlatBuffers. The changes span from build system adjustments in CMake to the core serialization logic. My review highlights opportunities to improve the CMake configuration's flexibility, enhance the new container-like API in CompilationDatabase with const correctness, and increase the robustness of the serialization logic by adding compile-time checks for struct layouts to prevent potential runtime issues. I've also suggested removing some commented-out test code.

Comment thread src/Index/MergedIndex.cpp
Comment thread src/Index/MergedIndex.cpp
Comment thread cmake/package.cmake
Comment thread include/Compiler/Command.h
Comment thread include/Index/TUIndex.h Outdated
Comment thread src/Index/MergedIndex.cpp Outdated
@16bit-ykiko

Copy link
Copy Markdown
Member Author

/gemini summary

@16bit-ykiko

Copy link
Copy Markdown
Member Author

/gemini review

@gemini-code-assist

Copy link
Copy Markdown
Contributor

Summary of Changes

This pull request significantly refactors the project's indexing system to introduce binary serialization for the MergedIndex using FlatBuffers. This change is crucial for enabling persistent storage and efficient loading of the index, which will improve overall performance and scalability. The work involved a substantial overhaul of existing index-related data structures, including their optimization for binary compatibility, and the removal of older, less efficient memory-based index components. Additionally, the build system has been updated to reflect new dependency management strategies, and the semantic analysis visitor has been made more robust by handling invalid inputs gracefully. The changes also include enhancements to the include graph and compilation database access, alongside expanded unit tests to cover new and existing functionalities.

Highlights

  • Binary Index Serialization: Introduced the capability to serialize the MergedIndex data structure into a binary format using FlatBuffers, enabling persistent storage and efficient loading of the project's index. This involved implementing serialize and deserialize methods for MergedIndex.
  • Data Structure Optimization: Refactored the Relation struct in TUIndex.h to optimize storage by replacing a union with std::bit_cast for definition_range and target_symbol, making it more suitable for binary serialization. Additionally, Symbol struct in TUIndex.h now includes SymbolKind and Bitmap reference_files.
  • Robustness Improvements: Enhanced the SemanticVisitor in include/AST/Semantic.h by replacing assertions with early returns for invalid inputs in handleDeclOccurrence and handleRelation, preventing crashes and improving stability.
  • Data Integrity and Consistency: Implemented sorting and deduplication for relations and occurrences within FileIndex in src/Index/TUIndex.cpp to ensure data consistency and prepare the index for reliable serialization.
  • Build System Configuration: Updated CMake configurations in CMakeLists.txt and cmake/package.cmake to explicitly link libuv using its target, enforce static library builds for dependencies, and disable test builds for libuv and roaring.
  • Index Structure Refactoring: Removed older, memory-based index structures like Contextual.h, HeaderIndex.h, RawIndex.h, and their corresponding implementation files, streamlining the indexing architecture.
  • Include Graph Enhancements: Modified IncludeGraph.h to rename getInclude to include_location_id and added a path_id method for easier retrieval of file path IDs, improving include graph navigation.
  • Compilation Database Access: Added commands_size(), begin(), and end() methods to the CompilationDatabase class in include/Compiler/Command.h, providing standard container-like access to compilation commands.
  • Unit Test Improvements: Refactored TUIndex unit tests to improve expect_select and introduce go_to_definition helpers, along with adding comprehensive tests for various C++ template constructs (Class, Function, Alias, Var, Concept templates).
Changelog
  • CMakeLists.txt
    • Updated libuv linking from uv_a to libuv::libuv.
  • cmake/package.cmake
    • Added LIBUV_BUILD_SHARED OFF and LIBUV_BUILD_TESTS OFF with FORCE for libuv.
    • Added FORCE to BUILD_SHARED_LIBS OFF.
    • Added FORCE to ENABLE_ROARING_TESTS OFF.
    • Formatted target_compile_definitions for spdlog.
  • include/AST/Semantic.h
    • Replaced assert with early return for null decl or invalid location in handleDeclOccurrence.
    • Replaced assert with early return for null decl, target, or invalid range in handleRelation.
  • include/Compiler/Command.h
    • Added commands_size(), begin(), and end() methods to CompilationDatabase for container-like access.
  • include/Index/Contextual.h
    • Removed file.
  • include/Index/HeaderIndex.h
    • Removed file.
  • include/Index/IncludeGraph.h
    • Renamed getInclude to include_location_id.
    • Added path_id method to retrieve the path ID for a given FileID.
  • include/Index/MergedIndex.h
    • Removed ROARING_EXCEPTIONS and ROARING_TERMINATE macros.
    • Included Support/Bitmap.h.
    • Moved llvm::DenseMapInfo specializations outside clice::index namespace.
    • Refactored MergedIndex and HeaderContexts structs, removing llvm::BumpPtrAllocator from MergedIndex.
    • Added serialize method to MergedIndex and deserialize method to MergedIndexView.
    • Added operator== for HeaderContexts and MergedIndex.
  • include/Index/RawIndex.h
    • Removed file.
  • include/Index/TUIndex.h
    • Included AST/SymbolKind.h and Support/Bitmap.h.
    • Modified Relation struct: replaced char padding[4] with std::uint32_t padding, removed union for definition_range and target_symbol, and added set_definition_range and definition_range methods using std::bit_cast.
    • Modified Symbol struct: added SymbolKind kind and Bitmap reference_files.
  • include/Support/Bitmap.h
    • Added new file defining clice::Bitmap as roaring::Roaring.
  • src/Index/HeaderIndex.cpp
    • Removed file.
  • src/Index/IncludeGraph.cpp
    • Modified addIncludeChain to handle invalid include_loc and use fid for unit.file_path.
    • Added graph.paths.emplace_back(unit.file_path(unit.interested_file())); in IncludeGraph::from.
  • src/Index/MergedIndex.cpp
    • Included schema_generated.h and llvm/Support/raw_os_ostream.h.
    • Removed allocator usage for canonical_cache string storage.
    • Implemented serialize and deserialize methods for MergedIndex using FlatBuffers.
  • src/Index/TUIndex.cpp
    • Removed redundant assert(decl && "Invalid decl") in handleDeclOccurrence.
    • Added logic to populate result.symbols with name and kind when a new symbol is encountered.
    • Updated Relation handling for DeclOrDef to use set_definition_range.
    • Added sorting and deduplication for relations and occurrences in FileIndex and updated reference_files for symbols.
  • tests/unit/Index/HeaderIndex.cpp
    • Removed file.
  • tests/unit/Index/MergedIndex.cpp
    • Added Async/Async.h include.
    • Modified Serialization test to build an index, merge it, serialize, deserialize, and assert equality.
  • tests/unit/Index/TUIndex.cpp
    • Refactored expect_select to return a vector of occurrences and added go_to_definition helper.
    • Added new tests for ClassTemplate, FunctionTemplate, AliasTemplate, VarTemplate, and Concept using go_to_definition.
  • tests/unit/Index/Template.cpp
    • Removed file.
Activity
  • gemini-code-assist[bot] provided an initial summary of changes.
  • gemini-code-assist[bot] left several review comments:
  • High priority: Noted fragility of reinterpret_cast for Occurrence and Relation structs in MergedIndex.cpp, suggesting static_asserts for memory layout verification.
  • Medium priority: Questioned the use of FORCE with set(CACHE ...) in cmake/package.cmake, recommending descriptive strings and avoiding FORCE unless strictly necessary.
  • Medium priority: Suggested adding const overloads for commands_size(), begin(), and end() methods in CompilationDatabase for better API usability.
  • Medium priority: Recommended renaming paddings to padding in TUIndex.h for clarity and consistency.
  • Medium priority: Advised removing commented-out test() function from tests/unit/Index/MergedIndex.cpp to maintain code cleanliness.
  • 16bit-ykiko requested a summary and a review from the bot.

@gemini-code-assist gemini-code-assist Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces a significant refactoring to serialize the index into a binary format using FlatBuffers. This includes removing the old HeaderIndex and RawIndex structures and introducing a new MergedIndex with serialization and deserialization capabilities. The data structures for indexing have been updated, and new tests have been added to verify the serialization logic and improve coverage for template indexing. Overall, this is a solid improvement. I have a few suggestions regarding robustness and type safety in the new serialization/deserialization code.

Comment thread src/Index/MergedIndex.cpp
Comment thread src/Index/MergedIndex.cpp
Comment thread src/Index/MergedIndex.cpp
@16bit-ykiko 16bit-ykiko merged commit 4c63c52 into main Oct 7, 2025
9 checks passed
@16bit-ykiko 16bit-ykiko deleted the serialize-index branch October 7, 2025 10:21
@16bit-ykiko 16bit-ykiko changed the title Serialize index to binary feat: serialize index to binary Apr 6, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant