Skip to content

feat: complete multi-process architecture, indexing, and cross-file features#362

Closed
16bit-ykiko wants to merge 13 commits into
mainfrom
phase6-advanced-features
Closed

feat: complete multi-process architecture, indexing, and cross-file features#362
16bit-ykiko wants to merge 13 commits into
mainfrom
phase6-advanced-features

Conversation

@16bit-ykiko

@16bit-ykiko 16bit-ykiko commented Mar 16, 2026

Copy link
Copy Markdown
Member

Summary

  • Module import detection: Fixed lexer-based scan() to detect C++20 import / export import directives, enabling proper module dependency resolution
  • CompileGraph fix: Dispatcher now reads actual file content for PCH/PCM compilation instead of scanning empty strings
  • Indexing pipeline: After each compilation, TUIndex is built and merged into ProjectIndex (global symbols) and per-file MergedIndex (positional queries)
  • Cross-file LSP features: GoToDefinition (with Declaration fallback), FindReferences, Rename, PrepareRename, WorkspaceSymbol, CallHierarchy (prepare/incoming/outgoing), TypeHierarchy (prepare/subtypes/supertypes)
  • WorkerPool integration: Multi-process mode gated on --workers CLI flag; single-process remains default
  • Infrastructure: ServerPathPool, CompileGraph, CacheManager, FuzzyGraph, IndexScheduler, MemoryMonitor, ProgressReporter, socket mode
  • Documentation: Design doc, implementation plan, and decision/issue notes in docs/agent/

Test plan

  • All 11 integration tests pass (6 existing + 5 new cross-file tests)
  • Build passes with cmake --build build/Debug
  • Single-process mode fully functional with all LSP features
  • GoToDefinition, DocumentSymbols, SemanticTokens, WorkspaceSymbol, Capabilities verified via integration tests

Made with Cursor

Summary by CodeRabbit

  • New Features

    • Comprehensive Language Server Protocol implementation with IDE features: hover, completion, go-to-definition, find references, rename, semantic tokens, document symbols, call hierarchy, and type hierarchy.
    • Support for C++ module import declarations.
    • Multi-process worker architecture for compilation and indexing.
    • Precompiled header and module caching system.
    • Server configuration via workspace settings.
    • Pipe and socket server transport modes.
  • Tests

    • Added cross-file integration tests for language server features.

16bit-ykiko and others added 13 commits March 16, 2026 21:43
Implement the core MasterServer (single-process pipe mode) with:
- CLI options parsing (--mode, --host, --port, --worker-memory-limit)
- Config loading from clice.toml with variable expansion
- LSP lifecycle: initialize, initialized, shutdown, exit
- Document sync: didOpen, didChange, didSave, didClose
- Feature requests: hover (with include directive fallback), completion,
  signature help, formatting, semantic tokens, document symbols,
  folding ranges, document links, inlay hints
- Build scheduling with per-document completion event synchronization
- Proper string ownership for coroutine URI parameters
- Resource directory initialization for compilation pipeline

All 6 integration tests pass (lifecycle + file operations).

Made-with: Cursor
Implement the StatelessWorker as an independent process that communicates
via bincode JSON-RPC over stdin/stdout. This lays the groundwork for the
multi-process architecture where the master server delegates one-shot
compilation tasks (PCH/PCM builds, completion, signature help, indexing)
to stateless worker processes.

- Define worker IPC protocol types (worker_protocol.h) with RequestTraits
  and NotificationTraits for bincode serialization
- Implement StatelessWorker class handling BuildPCH, BuildPCM, Completion,
  SignatureHelp, and Index requests with cancellation bridging
- Wire --mode=stateless-worker to run_stateless_worker_mode entry point
- Feature results use opaque JSON string forwarding pattern

Made-with: Cursor
Implement the StatefulWorker as an independent process that holds ASTs
per document and handles continuous feature requests. This enables the
multi-process architecture where stateful workers own document state
and process compile/hover/semanticTokens/symbols/folding/links/inlayHints.

- Implement StatefulWorker with per-document DocumentEntry holding AST
- Per-document strand mutex (et::mutex) for coroutine-level serialization
- LRU eviction with EvictedParams notification back to master
- Wire --mode=stateful-worker to run_stateful_worker_mode entry point
- All feature results use opaque JSON string forwarding pattern

Made-with: Cursor
Implement the unified WorkerPool that spawns and manages stateless and
stateful worker subprocesses, providing round-robin and path-based
routing with automatic crash recovery.

- WorkerHandle wraps process, transport, and BincodePeer per worker
- spawn_worker launches child processes with pipe-based IPC
- Stateless workers use round-robin request distribution
- Stateful workers use least-loaded assignment with LRU tracking
- monitor_worker detects crashes and auto-restarts workers
- collect_stderr relays worker logs to master spdlog output
- register_evicted_handler bridges worker EvictedParams notifications

Made-with: Cursor
Implement the compilation dependency management infrastructure:
- ServerPathPool: global path interning with BumpPtrAllocator and
  StringMap for O(1) lookup, mapping file paths to stable uint32_t IDs
- CompileGraph: DAG managing PCH/PCM dependencies with dirty tracking,
  cascading invalidation, cycle detection, and cancellation support.
  Uses a dispatcher callback to decouple from WorkerPool.
- CacheManager: on-disk PCH/PCM cache with xxhash-based deterministic
  paths and LRU eviction by access time

Made-with: Cursor
Implement the background indexing infrastructure:
- FuzzyGraph: approximate include graph with delta updates, tracking
  forward and backward include edges using path IDs. Supports
  transitive affected-file queries and in-degree computation for
  indexing priority.
- IndexScheduler: idle-time background indexer that pauses on user
  activity, resumes after 3s idle delay, and processes files by
  include-frequency priority. Integrates with CompileGraph for
  dependency compilation before indexing.

Made-with: Cursor
Complete the advanced features layer:
- Socket mode: TCP listener using eventide tcp_socket listen/accept,
  spawning a Server instance per connection for multi-client support
- ProgressReporter: LSP progress lifecycle management (begin/report/end)
  using LSPObject construction for WorkDoneProgress values
- MemoryMonitor: platform-specific process memory querying (macOS task_info,
  Linux /proc/statm) with periodic monitoring coroutine
- Wire socket mode into main entry point dispatch

Made-with: Cursor
… into Server

Wire the infrastructure components into the actual LSP server:
- Server now owns ServerPathPool, CompileGraph, CacheManager, FuzzyGraph
- on_initialize sets up CompileGraph dispatcher for PCH/PCM compilation
- scan_dependencies runs lexer-based scan on didOpen/didChange to detect
  module imports and include edges
- resolve_module_deps scans CDB to find module interface units and
  registers dependency edges in CompileGraph
- run_build calls compile_graph.compile_deps() before compilation to
  ensure PCH/PCM dependencies are built first
- make_compile_params injects cached PCH/PCM paths from CompileGraph
- compile_graph.update() cascades invalidation on file changes

Made-with: Cursor
…tion

- Fix scan() to detect C++20 import directives (cxx_import_decl,
  cxx_export_import_decl), populating ScanResult.modules correctly
- Fix CompileGraph dispatcher to read actual file content instead of
  scanning empty strings, enabling proper PCM/PCH compilation
- Connect indexing pipeline: build TUIndex after compilation, merge
  into ProjectIndex (global symbols) and per-file MergedIndex
- Implement all cross-file LSP features: GoToDefinition (with
  Declaration fallback), FindReferences, Rename, PrepareRename,
  WorkspaceSymbol, CallHierarchy (prepare/incoming/outgoing),
  TypeHierarchy (prepare/subtypes/supertypes)
- Integrate WorkerPool with --workers flag for multi-process mode
- Add module name cache for O(1) module resolution
- Add integration tests for cross-file features and capabilities
- Make Relation::definition_range() const-qualified

Made-with: Cursor
@coderabbitai

coderabbitai Bot commented Mar 17, 2026

Copy link
Copy Markdown
📝 Walkthrough

Walkthrough

Introduces a complete LSP server implementation for clice, including command-line option parsing, configuration loading, document synchronization, code intelligence features, multi-mode execution (pipe, socket, stateless/stateful workers), compilation graph management, background indexing, and worker pool infrastructure with IPC integration.

Changes

Cohort / File(s) Summary
Build & Configuration
.gitignore, CMakeLists.txt, cmake/package.cmake
Updated ignores for editor artifacts; refactored CMakeLists.txt to introduce clice-server static library containing all server sources and aliased clice::server target; renamed EVENTIDE_\* feature flags to ET_\* and added ET_ENABLE_RTTI/EXCEPTIONS.
Main Server Entry & Modes
src/clice.cc, src/server/options.h, src/server/options.cpp
Added program startup logic with logger initialization, argument parsing, and mode dispatch (Pipe, Socket, StatelessWorker, StatefulWorker); Options struct with command-line parsing and defaults computation.
Core LSP Server
src/server/server.h, src/server/server.cpp
Comprehensive LSP server implementation with full lifecycle handling (initialize, shutdown, exit), document synchronization, all standard code intelligence handlers (hover, completion, definition, references, symbols, etc.), build orchestration, dependency scanning, indexing, and cross-file navigation.
Socket & Pipe Transport
src/server/socket_mode.h, src/server/socket_mode.cpp
Socket mode entry point that listens for TCP connections and spawns per-connection LSP server instances via IPC.
Configuration Management
src/server/config.h, src/server/config.cpp
Configuration structures (Project, Rule) and loader that parses clice.toml with variable expansion for workspace paths and settings (cache, index, logging directories; compilation commands; clang-tidy flags).
Worker Infrastructure
src/server/stateless_worker.h, src/server/stateless_worker.cpp, src/server/stateful_worker.h, src/server/stateful_worker.cpp
Stateless and stateful worker implementations handling IPC-driven compilation, code completion, indexing, and document caching; includes per-document build orchestration, LRU eviction, and feature handlers.
Worker Pool & IPC Protocol
src/server/worker_pool.h, src/server/worker_pool.cpp, src/server/worker_protocol.h
Comprehensive worker pool managing spawning, monitoring, restart logic, load balancing, ownership tracking; defines IPC protocol structures for stateless/stateful requests (BuildPCH, BuildPCM, Completion, Index, Compile, Hover, SemanticTokens, etc.) with RequestTraits and NotificationTraits.
Compilation & Dependency Graphs
src/server/compile_graph.h, src/server/compile_graph.cpp, src/server/fuzzy_graph.h, src/server/fuzzy_graph.cpp, src/server/path_pool.h
CompileGraph managing compilation units, dependencies, and asynchronous dependency-aware compilation with circular detection; FuzzyGraph tracking file-to-include relationships bidirectionally; ServerPathPool for interning and resolving file paths.
Background Indexing & Scheduling
src/server/index_scheduler.h, src/server/index_scheduler.cpp
IndexScheduler orchestrating background file indexing with queue management, idle delay on user activity, per-file dependency compilation, and PSCMs collection.
Cache & Resource Management
src/server/cache_manager.h, src/server/cache_manager.cpp, src/server/memory_monitor.h, src/server/memory_monitor.cpp, src/server/progress.h, src/server/progress.cpp
CacheManager for PCH/PCM with deterministic paths, existence checks, and LRU eviction; MemoryMonitor with platform-specific process memory queries; ProgressReporter for LSP progress notifications.
Scanner Enhancement
src/syntax/scan.cpp
Added C++ import declaration handling (cxx_import_decl, cxx_export_import_decl) to scan loop, parsing import names and header paths alongside module names.
Type System
src/index/tu_index.h
Made Relation::definition_range() const-qualified.
Documentation
docs/agent/implement.md, docs/agent/note.md
Comprehensive phased implementation plan (Phase 0–6) detailing infrastructure, workers, compilation graph, indexing; extensive decision log covering design choices, workarounds, integration strategies.
Test Infrastructure & Data
tests/fixtures/client.py, tests/integration/test_cross_file.py, tests/integration/test_file_operation.py, tests/data/cross_file/*
Extended LSP test client with navigation methods (go_to_definition, find_references, prepare_rename, etc.); new integration tests for cross-file navigation, symbols, semantic tokens, workspace symbols, and capabilities; test data (helper.h, helper.cpp, main.cpp) for multi-file scenarios.

Sequence Diagram(s)

sequenceDiagram
    participant Client as LSP Client
    participant Server as LSP Server
    participant CG as CompileGraph
    participant Workers as WorkerPool
    participant Index as ProjectIndex

    Client->>Server: didOpen(document)
    activate Server
    Server->>Server: scan_dependencies()
    Server->>CG: register_unit(path_id)
    Server->>CG: update(path_id)
    activate CG
    CG->>Workers: send_stateless(BuildPCH)
    activate Workers
    Workers-->>CG: PCH result
    deactivate Workers
    CG->>Workers: send_stateless(BuildPCM)
    activate Workers
    Workers-->>CG: PCM results
    deactivate Workers
    deactivate CG
    Server->>Workers: send_stateless(Index)
    activate Workers
    Workers-->>Server: Index result
    deactivate Workers
    Server->>Index: merge(path_id, symbols)
    deactivate Server
Loading
sequenceDiagram
    participant Cli as CLI
    participant Server as Server
    participant Sched as IndexScheduler
    participant CG as CompileGraph
    participant Workers as WorkerPool

    Cli->>Server: initialize()
    activate Server
    Server->>Sched: build_initial_queue(cdb, graph)
    Server->>Sched: run()
    deactivate Server
    
    activate Sched
    loop For each queued file
        Sched->>CG: compile_deps(path_id)
        activate CG
        CG->>Workers: send_stateless(BuildPCH/PCM)
        CG-->>Sched: compilation complete
        deactivate CG
        Sched->>Workers: send_stateless(Index)
        Workers-->>Sched: index result
        Sched->>Sched: indexed_count++
    end
    deactivate Sched
Loading
sequenceDiagram
    participant Client as LSP Client
    participant Server as Socket Server
    participant Worker as StatefulWorker
    participant Comp as Compiler

    Client->>Server: TCP connect
    activate Server
    Server->>Server: spawn handle_connection
    Server->>Worker: create StatefulWorker
    activate Worker
    Worker->>Worker: register_callbacks()
    Worker-->>Client: initialize response
    deactivate Server
    
    Client->>Worker: didOpen(document)
    activate Worker
    Worker->>Comp: compile(params)
    Comp-->>Worker: diagnostics
    Worker->>Worker: touch_lru(uri)
    Worker->>Worker: shrink_if_over_limit()
    Worker-->>Client: publishDiagnostics
    deactivate Worker
    
    Client->>Worker: hover(position)
    activate Worker
    Worker->>Worker: find_document(uri)
    note over Worker: Acquire strand lock
    Worker->>Comp: query hover
    Comp-->>Worker: hover result
    note over Worker: Release strand lock
    Worker-->>Client: hover response
    deactivate Worker
Loading

Estimated code review effort

🎯 5 (Critical) | ⏱️ ~120 minutes

Possibly related PRs

Poem

🐰 Hop hop, the server springs to life,
With workers dancing, free from strife,
Compile graphs bloom and indices shine,
LSP magic, now it's mine!

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 8.42% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (2 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title clearly summarizes the main objective of the changeset: adding multi-process architecture, indexing, and cross-file features. It is concise, specific, and accurately reflects the primary changes.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
  • 📝 Generate docstrings (stacked PR)
  • 📝 Generate docstrings (commit on current branch)
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch phase6-advanced-features
📝 Coding Plan
  • Generate coding plan for human review comments

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 19

🧹 Nitpick comments (12)
docs/agent/implement.md (1)

567-567: Clarify worker log collection task lifetime management.

The collect_worker_log task is scheduled but there's no shown mechanism to cancel it when the worker is restarted (e.g., in restart_worker at line 621). Consider:

  1. Storing the task handle in WorkerProcess:
struct WorkerProcess {
    et::process proc;
    et::BincodePeer peer;
    std::unique_ptr<et::StreamTransport> transport;
    et::task_handle log_task;  // 新增:用于取消
};
  1. Cancelling the old log task before spawning a new worker:
et::task<> restart_worker(bool is_stateful, std::size_t index) {
    auto& worker = /* get worker */;
    worker.log_task.cancel();  // 取消旧的日志收集任务
    worker = co_await spawn_worker(options, mode, loop);
}

This prevents accumulation of orphaned log collection tasks.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@docs/agent/implement.md` at line 567, The scheduled collect_worker_log task
started via loop.schedule(std::move(spawn.stderr_pipe), worker_id) can become
orphaned when a worker is restarted; add a cancellable task handle (e.g.,
et::task_handle log_task) to the WorkerProcess struct and store the handle
returned by loop.schedule so you can call worker.log_task.cancel() (or
equivalent) before replacing the WorkerProcess in restart_worker and then
reassign the new log_task from the newly spawned worker (symbols:
collect_worker_log, loop.schedule, WorkerProcess, log_task, restart_worker,
spawn_worker).
cmake/package.cmake (1)

47-59: Consider pinning eventide to a specific commit or tag.

The eventide dependency uses GIT_TAG main, which can lead to non-reproducible builds if upstream changes. Other dependencies (spdlog, tomlplusplus, croaring, flatbuffers) are pinned to specific versions.

The ET_* flag changes and new RTTI/exceptions flags look appropriate.

Suggested change
 FetchContent_Declare(
     eventide
     GIT_REPOSITORY https://github.com/clice-io/eventide
-    GIT_TAG        main
+    GIT_TAG        v1.0.0  # Pin to a specific version/commit
     GIT_SHALLOW    TRUE
 )
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@cmake/package.cmake` around lines 47 - 59, The FetchContent_Declare call for
eventide currently uses GIT_TAG main which makes builds non-reproducible; update
the FetchContent_Declare entry for the eventide target to pin GIT_TAG to a
specific release tag or a commit hash (replace "main" with a stable tag or
commit) so the dependency is fixed, and keep the rest of the ET_* option toggles
and the FetchContent_MakeAvailable(eventide ...) invocation unchanged.
src/server/memory_monitor.cpp (1)

51-62: Placeholder implementation noted.

The run coroutine currently only logs without actual memory monitoring integration. The comments indicate this is intentional pending WorkerPool PID exposure. Consider adding a TODO comment or issue tracker reference to ensure this gets completed.

Would you like me to open an issue to track the completion of memory monitoring integration with WorkerPool?

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/server/memory_monitor.cpp` around lines 51 - 62, MemoryMonitor::run is a
placeholder that only logs instead of performing memory checks or integrating
with WorkerPool PIDs; update the function by adding a clear TODO comment with an
issue/ticket reference (or create one) and a short note on required work (expose
WorkerPool PIDs and query per-worker memory metrics), and if helpful add a
minimal assertion or metric-emission stub (e.g., emit a "not implemented"
metric) so the placeholder is discoverable at runtime; reference
MemoryMonitor::run, WorkerPool, and the periodic sleep loop when adding the TODO
and linking the issue.
src/server/config.cpp (1)

32-35: Hardcoded version strings may become stale.

The version and llvm_version variables are hardcoded. Consider sourcing these from CMake-generated defines or a version header to keep them synchronized with the actual build.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/server/config.cpp` around lines 32 - 35, The handling of var_name ==
"version" and "llvm_version" uses hardcoded strings via result.append("0.1.0")
and result.append("21"); replace those with build-generated macros or constants
(e.g. PROJECT_VERSION or LLVM_VERSION_STRING) produced by CMake/configure_file
and exposed via a version header; update the code to append the macro/constant
(e.g. result.append(PROJECT_VERSION)) and ensure CMake generates and installs
the header or defines the macros (e.g. add configure_file or add_definitions) so
the version and llvm_version remain in sync with the build.
src/server/progress.h (1)

33-38: Consider a shared token source.

Line 38 resets the counter for every ProgressReporter. If more than one reporter can emit over the same peer, token reuse will make concurrent progress streams collide. A peer- or process-scoped generator would avoid that edge.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/server/progress.h` around lines 33 - 38, The ProgressReporter class
initializes next_token_ to 0 per-instance, causing token reuse across multiple
ProgressReporter instances and collisions over the same peer; change the design
to use a shared token generator instead of the instance member next_token_. Move
token generation to a peer- or process-scoped source (for example attach a token
counter to et::ipc::JsonPeer or use a shared std::atomic<std::uint32_t> /
singleton generator) and replace uses of ProgressReporter::next_token_ with
calls to that shared getNextToken() so tokens are globally unique across peer_
and processes.
tests/integration/test_cross_file.py (3)

64-69: Assert capability support in a spec-compatible way.

These checks are both too strict and too loose: definitionProvider / referencesProvider can legitimately be option objects, while key-presence alone lets a false capability pass for call/type/workspace symbol support. Assert that each capability exists and is not False.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/integration/test_cross_file.py` around lines 64 - 69, Update the
capability assertions in tests/integration/test_cross_file.py to ensure each
capability key exists and is not False instead of strict boolean checks or mere
key-presence: for result["capabilities"] (caps) assert "definitionProvider" in
caps and caps["definitionProvider"] is not False, assert "referencesProvider" in
caps and caps["referencesProvider"] is not False, and similarly assert
"callHierarchyProvider" in caps and caps["callHierarchyProvider"] is not False,
"typeHierarchyProvider" in caps and caps["typeHierarchyProvider"] is not False,
and "workspaceSymbolProvider" in caps and caps["workspaceSymbolProvider"] is not
False.

7-18: Turn these into behavior checks, not smoke tests.

test_go_to_definition currently allows None, and test_workspace_symbol passes on an empty list. That won’t catch regressions in the new cross-file features; use a symbol defined in another file and assert the returned target URI/range or symbol name(s).

Also applies to: 49-56

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/integration/test_cross_file.py` around lines 7 - 18,
test_go_to_definition and the related workspace symbol tests are written as
smoke tests that allow None/empty results; change them into behavior checks that
assert cross-file resolution succeeds by using a symbol defined in another file.
Update test_go_to_definition to open both source files (e.g., main.cpp and the
file that defines the target symbol), call client.go_to_definition("main.cpp",
2, 4) and assert the returned Location/target URI and range point to the other
file and expected symbol span; similarly, modify the test_workspace_symbol case
(lines ~49-56) to call client.workspace_symbol with the cross-file symbol name
and assert the returned symbol entries contain the correct name and URI rather
than accepting an empty list. Refer to functions client.go_to_definition and
client.workspace_symbol to locate and change the assertions.

9-11: Replace the fixed 5s sleeps with a readiness wait.

Hard-coded delays make the suite slower and still flaky under CI load. Poll for the expected symbol/token/result, or expose an index-ready/progress signal from the fixture instead of sleeping blindly.

Also applies to: 23-25, 38-40, 51-53

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/integration/test_cross_file.py` around lines 9 - 11, Replace the fixed
await asyncio.sleep(5) calls with a readiness wait that polls for the actual
condition the test needs (e.g. the symbol/token/result to appear) or use an
index-ready/progress signal exposed by the test fixture; specifically, after
calling client.initialize(...) and client.did_open("main.cpp") replace the blind
sleeps with a loop or helper like wait_for_condition/await_until that repeatedly
queries the language server (e.g. a workspace/symbol or documentSymbol request,
diagnostics, or a custom client.is_index_ready()/fixture.index_ready_event)
until the expected symbol/token is present or a timeout is reached; apply the
same change for the other occurrences of await asyncio.sleep(5) in this test so
tests become deterministic and CI-friendly.
src/server/path_pool.h (1)

23-30: Avoid storing a second copy of every interned path.

llvm::StringMap already owns stable key storage, so the extra allocator copy doubles memory in the component meant to deduplicate path strings. If the LLVM version here guarantees getKey() stability for entries, you can reuse that key directly and drop the second allocation.

♻️ Suggested refactor
         if(inserted) {
-            auto data = allocator.Allocate<char>(path.size() + 1);
-            std::copy(path.begin(), path.end(), data);
-            data[path.size()] = '\0';
-            auto saved = llvm::StringRef(data, path.size());
-            paths.push_back(saved);
+            paths.push_back(it->getKey());
             it->second = static_cast<std::uint32_t>(paths.size() - 1);
         }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/server/path_pool.h` around lines 23 - 30, The code currently makes a
second heap copy of the interned path by allocating via allocator.Allocate and
saving that into paths; instead reuse the stable key storage inside the
llvm::StringMap entry returned from cache.try_emplace: after insertion, obtain
the map-stored key (via the entry's getKey() accessor, e.g. it->first.getKey()
or it->first->getKey() depending on your StringMap iterator type), construct an
llvm::StringRef from that key and push it into paths, update it->second
accordingly, and remove the allocator.Allocate copy and std::copy usage so you
no longer duplicate the string storage.
src/server/server.cpp (2)

720-772: O(N) module name resolution on every import.

In resolve_module_deps, when a module name is not in the cache, the code iterates through all files in the compilation database and reads/scans each one. For large projects, this could be expensive. The caching mechanism helps after the first lookup, but initial resolution could be slow.

Consider building the module name cache lazily on first use or during initialization to avoid repeated O(N) scans during editing sessions.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/server/server.cpp` around lines 720 - 772, resolve_module_deps currently
does an O(N) scan of cdb.files() each time a module name is missing from
module_name_cache (see resolve_module_deps, module_name_cache, cdb.files(),
scan), which is expensive for large projects; fix it by populating
module_name_cache once (either at Server startup or lazily on first miss) by
iterating cdb.files() a single time to map file_scan.module_name -> path_id (and
mark interface units), then change resolve_module_deps to consult that prebuilt
map and only fall back to a targeted scan if necessary; ensure
compile_graph.register_unit logic remains the same when inserting cached
entries.

672-692: Inconsistent indentation in the non-worker build branch.

Lines 685-691 appear to have different indentation compared to surrounding code. This looks like a formatting issue that should be corrected.

🔧 Suggested fix for indentation
         auto unit = compile(compile_params);

         auto it3 = documents.find(uri);
         if(it3 == documents.end())
             break;
         auto& doc3 = it3->second;
         if(doc3.generation != gen)
             continue;

-        auto diags = feature::diagnostics(unit);
-        doc3.unit = std::make_unique<CompilationUnit>(std::move(unit));
-        publish_diagnostics(uri, diags);
-
-        if(doc3.unit) {
-            update_index(doc3);
-        }
+            auto diags = feature::diagnostics(unit);
+            doc3.unit = std::make_unique<CompilationUnit>(std::move(unit));
+            publish_diagnostics(uri, diags);
+
+            if(doc3.unit) {
+                update_index(doc3);
+            }
         }
     }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/server/server.cpp` around lines 672 - 692, The block in the non-worker
build branch around the compile path (uses make_compile_params, compile,
documents.find/it3, doc3.generation, feature::diagnostics, publish_diagnostics,
update_index) has inconsistent indentation for the statements between the
if(doc3.generation != gen) check and the closing brace; reformat so all
statements (auto diags = ..., doc3.unit = ..., publish_diagnostics(...), and the
if(doc3.unit) update_index(doc3) block) are indented consistently with the
surrounding else-block and aligned with the opening brace of that scope, and
ensure the closing braces line up with the corresponding opening braces.
src/server/worker_pool.h (1)

37-51: Round-robin counter may overflow after extended use.

stateless.next is incremented unconditionally and will eventually overflow. While the modulo operation makes this functionally correct due to unsigned wraparound semantics, consider whether this is intentional or if you'd prefer to reset the counter periodically.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/server/worker_pool.h` around lines 37 - 51, The round-robin counter in
send_stateless (stateless.next) is incremented unboundedly and can overflow;
change the update so it does not grow without bound—e.g., compute the worker
index with stateless.next % stateless.workers.size() but update stateless.next
in a bounded way (set stateless.next = (stateless.next + 1) %
stateless.workers.size() or reset to stateless.next % stateless.workers.size()
when it exceeds a threshold) so the counter is kept small and avoids overflow
while preserving round-robin semantics; apply this change in the send_stateless
function where stateless.next is read/updated.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@docs/agent/implement.md`:
- Line 201: The SmallVector declaration in ServerPathPool is missing template
parameters; update the SmallVector usages (e.g., the members named path_ids or
paths inside the ServerPathPool implementation) to use the correct template form
llvm::SmallVector<ElementType, N> (for example llvm::SmallVector<std::uint32_t,
8> for stored IDs or llvm::SmallVector<llvm::StringRef, 8> for stored path
strings) so the definitions compile; ensure any forward declarations or typedefs
referencing SmallVector are similarly corrected to the templated form.
- Around line 426-458: Replace the manual doc.strand_mutex.lock()/unlock() in
StatefulWorker::on_compile with an RAII guard that acquires the mutex and
releases it on scope exit (use std::lock_guard/std::scoped_lock or the eventide
awaitable lock variant that returns a guard) so the mutex is always released
even if co_await et::queue throws; apply the same change to the on_hover example
(replace explicit lock/unlock with the same RAII guard for doc.strand_mutex) and
ensure the guard type is compatible with the async/await pattern used by
et::queue.

In `@docs/agent/note.md`:
- Around line 143-145: The note "Decision: Global queries not wired in this
phase" is confusing relative to later text that says indexing and cross-file
features are integrated; update the wording to clarify this is historical
context for a specific phase (or else reconcile the status). Edit the paragraph
referencing StatelssWorker `on_index` and IndexScheduler in the implement.md
note to explicitly state "at that phase" or "historical — before TUIndex
extraction was wired" (or flip to state the current integrated status if
indexing is done), so readers won't see a contradiction with the later section
that describes indexing/cross-file features as integrated.
- Line 161: Update the docs to correct the Linux procfs path used by
get_process_memory: replace the incorrect "/proc/statm" reference with the
per-process path "/proc/self/statm" (or "/proc/<pid>/statm") and note that
WorkerPool must expose worker PIDs before the periodic monitoring loop can read
those per-process statm files; ensure the description in get_process_memory
mentions the platform-specific use of macOS task_info and Linux /proc/self/statm
for clarity.

In `@src/server/cache_manager.cpp`:
- Around line 98-103: The loop in cache_manager.cpp decrements total_size
unconditionally after calling llvm::sys::fs::remove, which can make the cache
appear smaller even if remove failed; modify the loop to capture the remove
result (std::error_code ec = llvm::sys::fs::remove(entry.path)), only subtract
entry.size and emit LOG_INFO("Evicted...") when ec indicates success, and on
failure emit a LOG_WARN/LOG_ERROR including ec.message() and entry.path and skip
decrementing total_size so the eviction loop continues correctly until disk
usage is under max_size_; keep the break condition using total_size <= max_size_
as-is.
- Around line 24-25: The call to llvm::sys::fs::create_directories(cache_dir_)
currently ignores errors so unwritable paths are treated as ready; update the
CacheManager initialization to check the return status of
create_directories(cache_dir_), and if it fails log the error (include the error
code/message) instead of the successful LOG_INFO("Cache directory: {}"); then
disable caching (e.g. set a cache_enabled_ flag to false) or propagate the
failure/return early so subsequent PCH/PCM writes won't assume the directory
exists. Ensure you reference cache_dir_, create_directories(cache_dir_), and
replace the current LOG_INFO with a LOG_ERROR/LOG_WARN that includes the error
details.

In `@src/server/compile_graph.cpp`:
- Around line 33-61: The remove_unit implementation currently erases the unit
while coroutines (compile_deps()/compile_impl()) may be suspended waiting on the
unit's completion event, causing use-after-free; change remove_unit to first
mark the unit as canceled (e.g., set a removed/canceled flag on the Unit entry
and signal its completion/event to wake waiters) and do NOT erase the map entry
or destroy the event immediately — instead decrement/observe an in-flight
counter or attach a deferred cleanup list and only physically erase the units
map entry and remove from pch_cache/pcm_cache after all in-flight compiles
(tracked by an in_flight counter or waiter count on the Unit) have finished;
reference CompileGraph::remove_unit, Unit::completion (or completion),
compile_deps(), compile_impl(), and the pch_cache/pcm_cache removals when making
this change.
- Line 101: The cycle-detection stack is never armed because top-level calls to
compile_impl are scheduled with stack == nullptr; change the loop.schedule(...)
invocations that currently call loop.schedule(compile_impl(dep_id, loop)) so
they construct and pass a real stack object (e.g., a newly created shared_ptr or
unique_ptr to the stack/vector type used by compile_impl) instead of nullptr,
and ensure every internal call to compile_impl propagates that same stack
pointer so the guard inside compile_impl (the stack check and cycle-reporting
code) can run and break dependency cycles rather than leaving units waiting on
each other's completion event.

In `@src/server/config.cpp`:
- Around line 73-74: The code assigns
project["max_active_file"].value<int64_t>() directly to
config.project.max_active_file via static_cast<std::size_t> which will wrap
negative inputs; update the handling around the call that reads
"max_active_file" so you first check that the optional<int64_t> (from
value<int64_t>()) has a value and that *v >= 0 before casting, and if the value
is negative or missing, either set a safe default or return/log an error;
specifically modify the block referencing project["max_active_file"], the local
variable v (result of value<int64_t>()), and the assignment to
config.project.max_active_file to perform the non-negative validation and error/
default handling.

In `@src/server/index_scheduler.cpp`:
- Around line 102-110: IndexScheduler currently only logs failures after
co_await pool_.send_stateless(params) and never updates the in-memory index, so
successful background indexing doesn't become visible; after confirming
result.has_value() and result->success, persist the returned index data into
project_index_ (e.g., merge or replace via the appropriate ProjectIndex API) so
newly indexed symbols are available to cross-file queries, taking any required
locking/atomic update into account; locate this logic around
pool_.send_stateless and the result/result->success handling and add the
persistence step to project_index_.
- Around line 91-102: The indexing request only sets worker::IndexParams
params.file while pcm_vec, directory and arguments are left out, so
StatelessWorker::on_index cannot replay the compile context; update the code
that builds params (worker::IndexParams) to assign the compiled module list and
command-line bits: copy pcm_vec into params (e.g. params.pcms or equivalent
field), set params.directory from the compile_graph_.get_pch/get_pcms context or
path's directory, and populate params.arguments with the same arguments produced
by compile_deps() so the worker receives the full compile context for replaying
the command line and module artifacts.

In `@src/server/memory_monitor.cpp`:
- Around line 41-44: Replace the hardcoded 4096 multiplier when computing memory
from "pages" by querying the system page size with sysconf(_SC_PAGESIZE); in
memory_monitor.cpp, include <unistd.h> if needed, call sysconf to get page_size
(handle a -1 fallback if syscall fails) and multiply pages by the returned
page_size (cast to std::size_t) before returning; update the return expression
that currently uses "pages * 4096" to use the queried page_size instead.

In `@src/server/options.cpp`:
- Around line 50-89: The CLI currently ignores unknown flags and omits the
--workers switch from help; update the argument parsing loop (the for loop and
extract_value lambda) so that unrecognized arguments produce a clear
error/warning (e.g., LOG_ERROR or LOG_WARN and exit or set a failure flag)
instead of silently falling back to defaults, and add the --workers description
to the help output alongside the other options; specifically modify the help
printing block and the final else branch of the parsing chain to report unknown
flags (referencing extract_value, opts.enable_workers, and Mode handling) so
typos and unsupported flags are surfaced to the user.
- Around line 83-86: The current numeric parsing for --port and
--worker-memory-limit is unsafe; replace the std::atoi/std::stoull usage by
calling std::from_chars on the extracted string (from extract_value) for each
option, verify the parse succeeded (ec == std::errc()) and that the pointer
reached the end of the string to reject trailing garbage, for --port
additionally check the parsed integer is in the 1–65535 range before assigning
to opts.port, and for --worker_memory_limit parse into an unsigned long long but
first reject any leading '-' and ensure full consumption; on any parse or range
failure, handle it gracefully (e.g., log a descriptive error and abort startup
or set a safe default and return an error) rather than letting exceptions or
reinterpretation of negative values occur.

In `@src/server/path_pool.h`:
- Around line 3-9: The header uses std::copy (transitively relied upon) but
doesn't include <algorithm>, so add `#include` <algorithm> to this header (place
it with the other standard includes near the top) to make the header
self-contained and ensure std::copy is declared for any translation unit that
includes path_pool.h.

In `@src/server/server.cpp`:
- Around line 825-829: The Server::path_to_uri function naively prepends
"file://" and fails to percent-encode special characters and platform-specific
separators; update it to produce a valid file URI by normalizing path separators
(convert backslashes to '/'), ensuring the proper leading slash for Windows
drive letters, and percent-encoding reserved characters (spaces, non-ASCII,
percent, ?, #, etc.) per RFC 3986; implement or call a utility encoder (e.g., a
percentEncode function) and use it inside Server::path_to_uri so the returned
string is a correctly escaped file:// URI for all platforms.

In `@src/server/stateless_worker.cpp`:
- Around line 220-225: The code currently hardcodes
worker::IndexResult.result.success = true after calling compile(compile_params)
and does not populate the TU index; change this to determine success from the
compile(...) return (e.g., inspect the returned unit for errors or a status
field) and set result.success accordingly, populate result.TUIndex (or the
appropriate TU container) from the compilation unit when indexing is
implemented, and on compile failure set result.success = false and include
error/diagnostic info so callers can retry; update references around compile,
compile_params, worker::IndexResult, result.success and the TU index population
to reflect these checks and avoid reporting success for empty/failed parses.

In `@src/server/worker_pool.cpp`:
- Around line 52-63: The pool shutdown currently closes worker pipes but
monitor_worker unconditionally calls restart_worker, causing workers to be
respawned; add a shutdown flag (e.g., WorkerPool::shutting_down as an
atomic<bool>) set to true at the start of WorkerPool::stop() before closing
outputs, and update monitor_worker (and any other place that currently calls
restart_worker on exit) to check that flag and skip calling restart_worker when
shutting_down is true so workers drain instead of being restarted; ensure the
flag is used in both stateless and stateful monitoring paths and is thread-safe
(atomic) so no races occur.
- Around line 178-208: The assign_worker implementation can assign index 0 when
stateful.workers is empty; add an explicit check at the top of
WorkerPool::assign_worker for stateful.workers.empty() and handle it (e.g.,
throw a std::runtime_error or return an invalid sentinel like SIZE_MAX) instead
of proceeding to compute best_idx; this prevents inserting
stateful.owner[path_id] with a non-existent worker index and avoids later
out-of-bounds access when dereferencing stateful.workers. Ensure the chosen
behavior is consistent with callers (adjust caller handling if you choose to
throw) and reference stateful.workers, stateful.owner, stateful.owner_lru, and
stateful.owner_lru_index when making the change.

---

Nitpick comments:
In `@cmake/package.cmake`:
- Around line 47-59: The FetchContent_Declare call for eventide currently uses
GIT_TAG main which makes builds non-reproducible; update the
FetchContent_Declare entry for the eventide target to pin GIT_TAG to a specific
release tag or a commit hash (replace "main" with a stable tag or commit) so the
dependency is fixed, and keep the rest of the ET_* option toggles and the
FetchContent_MakeAvailable(eventide ...) invocation unchanged.

In `@docs/agent/implement.md`:
- Line 567: The scheduled collect_worker_log task started via
loop.schedule(std::move(spawn.stderr_pipe), worker_id) can become orphaned when
a worker is restarted; add a cancellable task handle (e.g., et::task_handle
log_task) to the WorkerProcess struct and store the handle returned by
loop.schedule so you can call worker.log_task.cancel() (or equivalent) before
replacing the WorkerProcess in restart_worker and then reassign the new log_task
from the newly spawned worker (symbols: collect_worker_log, loop.schedule,
WorkerProcess, log_task, restart_worker, spawn_worker).

In `@src/server/config.cpp`:
- Around line 32-35: The handling of var_name == "version" and "llvm_version"
uses hardcoded strings via result.append("0.1.0") and result.append("21");
replace those with build-generated macros or constants (e.g. PROJECT_VERSION or
LLVM_VERSION_STRING) produced by CMake/configure_file and exposed via a version
header; update the code to append the macro/constant (e.g.
result.append(PROJECT_VERSION)) and ensure CMake generates and installs the
header or defines the macros (e.g. add configure_file or add_definitions) so the
version and llvm_version remain in sync with the build.

In `@src/server/memory_monitor.cpp`:
- Around line 51-62: MemoryMonitor::run is a placeholder that only logs instead
of performing memory checks or integrating with WorkerPool PIDs; update the
function by adding a clear TODO comment with an issue/ticket reference (or
create one) and a short note on required work (expose WorkerPool PIDs and query
per-worker memory metrics), and if helpful add a minimal assertion or
metric-emission stub (e.g., emit a "not implemented" metric) so the placeholder
is discoverable at runtime; reference MemoryMonitor::run, WorkerPool, and the
periodic sleep loop when adding the TODO and linking the issue.

In `@src/server/path_pool.h`:
- Around line 23-30: The code currently makes a second heap copy of the interned
path by allocating via allocator.Allocate and saving that into paths; instead
reuse the stable key storage inside the llvm::StringMap entry returned from
cache.try_emplace: after insertion, obtain the map-stored key (via the entry's
getKey() accessor, e.g. it->first.getKey() or it->first->getKey() depending on
your StringMap iterator type), construct an llvm::StringRef from that key and
push it into paths, update it->second accordingly, and remove the
allocator.Allocate copy and std::copy usage so you no longer duplicate the
string storage.

In `@src/server/progress.h`:
- Around line 33-38: The ProgressReporter class initializes next_token_ to 0
per-instance, causing token reuse across multiple ProgressReporter instances and
collisions over the same peer; change the design to use a shared token generator
instead of the instance member next_token_. Move token generation to a peer- or
process-scoped source (for example attach a token counter to et::ipc::JsonPeer
or use a shared std::atomic<std::uint32_t> / singleton generator) and replace
uses of ProgressReporter::next_token_ with calls to that shared getNextToken()
so tokens are globally unique across peer_ and processes.

In `@src/server/server.cpp`:
- Around line 720-772: resolve_module_deps currently does an O(N) scan of
cdb.files() each time a module name is missing from module_name_cache (see
resolve_module_deps, module_name_cache, cdb.files(), scan), which is expensive
for large projects; fix it by populating module_name_cache once (either at
Server startup or lazily on first miss) by iterating cdb.files() a single time
to map file_scan.module_name -> path_id (and mark interface units), then change
resolve_module_deps to consult that prebuilt map and only fall back to a
targeted scan if necessary; ensure compile_graph.register_unit logic remains the
same when inserting cached entries.
- Around line 672-692: The block in the non-worker build branch around the
compile path (uses make_compile_params, compile, documents.find/it3,
doc3.generation, feature::diagnostics, publish_diagnostics, update_index) has
inconsistent indentation for the statements between the if(doc3.generation !=
gen) check and the closing brace; reformat so all statements (auto diags = ...,
doc3.unit = ..., publish_diagnostics(...), and the if(doc3.unit)
update_index(doc3) block) are indented consistently with the surrounding
else-block and aligned with the opening brace of that scope, and ensure the
closing braces line up with the corresponding opening braces.

In `@src/server/worker_pool.h`:
- Around line 37-51: The round-robin counter in send_stateless (stateless.next)
is incremented unboundedly and can overflow; change the update so it does not
grow without bound—e.g., compute the worker index with stateless.next %
stateless.workers.size() but update stateless.next in a bounded way (set
stateless.next = (stateless.next + 1) % stateless.workers.size() or reset to
stateless.next % stateless.workers.size() when it exceeds a threshold) so the
counter is kept small and avoids overflow while preserving round-robin
semantics; apply this change in the send_stateless function where stateless.next
is read/updated.

In `@tests/integration/test_cross_file.py`:
- Around line 64-69: Update the capability assertions in
tests/integration/test_cross_file.py to ensure each capability key exists and is
not False instead of strict boolean checks or mere key-presence: for
result["capabilities"] (caps) assert "definitionProvider" in caps and
caps["definitionProvider"] is not False, assert "referencesProvider" in caps and
caps["referencesProvider"] is not False, and similarly assert
"callHierarchyProvider" in caps and caps["callHierarchyProvider"] is not False,
"typeHierarchyProvider" in caps and caps["typeHierarchyProvider"] is not False,
and "workspaceSymbolProvider" in caps and caps["workspaceSymbolProvider"] is not
False.
- Around line 7-18: test_go_to_definition and the related workspace symbol tests
are written as smoke tests that allow None/empty results; change them into
behavior checks that assert cross-file resolution succeeds by using a symbol
defined in another file. Update test_go_to_definition to open both source files
(e.g., main.cpp and the file that defines the target symbol), call
client.go_to_definition("main.cpp", 2, 4) and assert the returned
Location/target URI and range point to the other file and expected symbol span;
similarly, modify the test_workspace_symbol case (lines ~49-56) to call
client.workspace_symbol with the cross-file symbol name and assert the returned
symbol entries contain the correct name and URI rather than accepting an empty
list. Refer to functions client.go_to_definition and client.workspace_symbol to
locate and change the assertions.
- Around line 9-11: Replace the fixed await asyncio.sleep(5) calls with a
readiness wait that polls for the actual condition the test needs (e.g. the
symbol/token/result to appear) or use an index-ready/progress signal exposed by
the test fixture; specifically, after calling client.initialize(...) and
client.did_open("main.cpp") replace the blind sleeps with a loop or helper like
wait_for_condition/await_until that repeatedly queries the language server (e.g.
a workspace/symbol or documentSymbol request, diagnostics, or a custom
client.is_index_ready()/fixture.index_ready_event) until the expected
symbol/token is present or a timeout is reached; apply the same change for the
other occurrences of await asyncio.sleep(5) in this test so tests become
deterministic and CI-friendly.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: fccc713e-57bf-4109-b6f7-ecc3755d38ce

📥 Commits

Reviewing files that changed from the base of the PR and between 73afcfb and a4a2596.

📒 Files selected for processing (43)
  • .gitignore
  • CMakeLists.txt
  • cmake/package.cmake
  • docs/agent/design.md
  • docs/agent/implement.md
  • docs/agent/note.md
  • src/clice.cc
  • src/index/tu_index.h
  • src/server/cache_manager.cpp
  • src/server/cache_manager.h
  • src/server/compile_graph.cpp
  • src/server/compile_graph.h
  • src/server/config.cpp
  • src/server/config.h
  • src/server/fuzzy_graph.cpp
  • src/server/fuzzy_graph.h
  • src/server/index_scheduler.cpp
  • src/server/index_scheduler.h
  • src/server/memory_monitor.cpp
  • src/server/memory_monitor.h
  • src/server/options.cpp
  • src/server/options.h
  • src/server/path_pool.h
  • src/server/progress.cpp
  • src/server/progress.h
  • src/server/server.cpp
  • src/server/server.h
  • src/server/socket_mode.cpp
  • src/server/socket_mode.h
  • src/server/stateful_worker.cpp
  • src/server/stateful_worker.h
  • src/server/stateless_worker.cpp
  • src/server/stateless_worker.h
  • src/server/worker_pool.cpp
  • src/server/worker_pool.h
  • src/server/worker_protocol.h
  • src/syntax/scan.cpp
  • tests/data/cross_file/helper.cpp
  • tests/data/cross_file/helper.h
  • tests/data/cross_file/main.cpp
  • tests/fixtures/client.py
  • tests/integration/test_cross_file.py
  • tests/integration/test_file_operation.py
💤 Files with no reviewable changes (1)
  • tests/integration/test_file_operation.py

Comment thread docs/agent/implement.md

**分步实现**:

1. **ServerPathPool**(`src/server/path_pool.h`):实现全局路径池(`llvm::BumpPtrAllocator` + `llvm::StringMap` + `llvm::SmallVector`),所有路径通过 `intern()` 获取 `uint32_t` ID,后续各模块统一使用 path_id

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Fix the incomplete llvm::SmallVector type.

The llvm::SmallVector requires template parameters. It should be llvm::SmallVector<Type, Size>, for example:

llvm::SmallVector<std::uint32_t, 8> path_ids;  // 存储路径ID

or for storing path strings:

llvm::SmallVector<llvm::StringRef, 8> paths;

This syntax error would prevent compilation if followed literally.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@docs/agent/implement.md` at line 201, The SmallVector declaration in
ServerPathPool is missing template parameters; update the SmallVector usages
(e.g., the members named path_ids or paths inside the ServerPathPool
implementation) to use the correct template form llvm::SmallVector<ElementType,
N> (for example llvm::SmallVector<std::uint32_t, 8> for stored IDs or
llvm::SmallVector<llvm::StringRef, 8> for stored path strings) so the
definitions compile; ensure any forward declarations or typedefs referencing
SmallVector are similarly corrected to the templated form.

Comment thread docs/agent/implement.md
Comment on lines +426 to +458
et::RequestResult<worker::CompileParams>
StatefulWorker::on_compile(auto& ctx, const worker::CompileParams& params) {
auto& doc = documents[params.uri];
doc.version = params.version;
doc.text = params.text;
doc.directory = params.directory;
doc.arguments = params.arguments;
doc.pch = params.pch;
doc.pcms = params.pcms;

touch_lru(params.uri);

// per-document 串行化
co_await doc.strand_mutex.lock();

auto result = co_await et::queue([&]() {
CompilationParams cp;
cp.kind = CompilationKind::Parse;
// ... 从 doc 构造编译参数 ...
cp.add_remapped_file(path, doc.text, doc.pch.second);
doc.unit = compile(cp);
doc.dirty = false;
return worker::CompileResult{
.uri = params.uri,
.version = params.version,
.diagnostics = feature::diagnostics(doc.unit),
.memory_usage = current_memory_usage()
};
}, loop);

doc.strand_mutex.unlock();
co_return result;
}

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🛠️ Refactor suggestion | 🟠 Major

Use RAII for mutex management to prevent deadlocks.

The manual lock() and unlock() pattern (lines 439, 456) is error-prone. If an exception occurs between lock and unlock (e.g., from co_await et::queue), the mutex will remain locked, causing deadlock.

Consider using RAII with std::lock_guard or eventide's equivalent:

-    co_await doc.strand_mutex.lock();
+    auto lock = co_await doc.strand_mutex.scoped_lock();

     auto result = co_await et::queue([&]() {
         // ... compilation work ...
     }, loop);

-    doc.strand_mutex.unlock();
     co_return result;

The same issue exists in the on_hover example at lines 473-484.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@docs/agent/implement.md` around lines 426 - 458, Replace the manual
doc.strand_mutex.lock()/unlock() in StatefulWorker::on_compile with an RAII
guard that acquires the mutex and releases it on scope exit (use
std::lock_guard/std::scoped_lock or the eventide awaitable lock variant that
returns a guard) so the mutex is always released even if co_await et::queue
throws; apply the same change to the on_hover example (replace explicit
lock/unlock with the same RAII guard for doc.strand_mutex) and ensure the guard
type is compatible with the async/await pattern used by et::queue.

Comment thread docs/agent/note.md
Comment on lines +143 to +145
### Decision: Global queries not wired in this phase
The implement.md shows FindReferences/GoToDef/Rename using the index, but the actual TUIndex extraction from workers is still a TODO (StatelessWorker `on_index` returns success without data). The IndexScheduler infrastructure is in place; wiring to real TUIndex data will happen when the indexing pipeline is complete.

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Clarify stale TODO versus completed indexing integration.

Line 143–145 says global index-backed queries are not wired yet, but Line 192–207 later says indexing and cross-file features are integrated. This reads as contradictory in the final doc state. Please mark this section as historical context (e.g., “at that phase”) or update wording to avoid confusion.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@docs/agent/note.md` around lines 143 - 145, The note "Decision: Global
queries not wired in this phase" is confusing relative to later text that says
indexing and cross-file features are integrated; update the wording to clarify
this is historical context for a specific phase (or else reconcile the status).
Edit the paragraph referencing StatelssWorker `on_index` and IndexScheduler in
the implement.md note to explicitly state "at that phase" or "historical —
before TUIndex extraction was wired" (or flip to state the current integrated
status if indexing is done), so readers won't see a contradiction with the later
section that describes indexing/cross-file features as integrated.

Comment thread docs/agent/note.md
The design.md 6.2 describes config file watching using `et::fs_event`. However, eventide does not expose an `fs_event` API. Configuration hot-reload will need to be implemented via polling or deferred until eventide adds file system event support.

### Decision: MemoryMonitor is informational placeholder
Full memory monitoring requires exposing worker PIDs from `WorkerPool`, which is not yet wired. The `get_process_memory()` function uses platform-specific APIs (macOS `task_info`, Linux `/proc/statm`). The periodic monitoring loop is in place and will be connected when the master server integrates all subsystems.

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

🌐 Web query:

On Linux, what is the correct /proc path to read memory usage for the current process: /proc/statm or /proc/self/statm?

💡 Result:

Use /proc/self/statm to read memory usage for the current process; it’s a symlink to that process’s /proc/<pid>/statm entry. (docs.kernel.org)

/proc/statm is not the correct path (the statm file is per-process under /proc/<pid>/, or via /proc/self/). (docs.kernel.org)

Citations:


🏁 Script executed:

head -165 docs/agent/note.md | tail -10

Repository: clice-io/clice

Length of output: 880


Fix Linux procfs path for per-process memory lookup.

Line 161 references /proc/statm, which is not a valid Linux path. For process memory, use /proc/self/statm (a symlink to /proc/<pid>/statm). The current wording could mislead implementation.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@docs/agent/note.md` at line 161, Update the docs to correct the Linux procfs
path used by get_process_memory: replace the incorrect "/proc/statm" reference
with the per-process path "/proc/self/statm" (or "/proc/<pid>/statm") and note
that WorkerPool must expose worker PIDs before the periodic monitoring loop can
read those per-process statm files; ensure the description in get_process_memory
mentions the platform-specific use of macOS task_info and Linux /proc/self/statm
for clarity.

Comment on lines +24 to +25
llvm::sys::fs::create_directories(cache_dir_);
LOG_INFO("Cache directory: {}", cache_dir_);

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Handle cache directory creation failures.

Line 24 ignores the returned error code, so an unwritable cache path still gets logged as ready and the first real failure shows up much later when PCH/PCM writes start. Please surface the error or disable caching here instead of proceeding silently.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/server/cache_manager.cpp` around lines 24 - 25, The call to
llvm::sys::fs::create_directories(cache_dir_) currently ignores errors so
unwritable paths are treated as ready; update the CacheManager initialization to
check the return status of create_directories(cache_dir_), and if it fails log
the error (include the error code/message) instead of the successful
LOG_INFO("Cache directory: {}"); then disable caching (e.g. set a cache_enabled_
flag to false) or propagate the failure/return early so subsequent PCH/PCM
writes won't assume the directory exists. Ensure you reference cache_dir_,
create_directories(cache_dir_), and replace the current LOG_INFO with a
LOG_ERROR/LOG_WARN that includes the error details.

Comment thread src/server/path_pool.h
Comment on lines +3 to +9
#include <cassert>
#include <cstdint>

#include "llvm/ADT/SmallVector.h"
#include "llvm/ADT/StringMap.h"
#include "llvm/ADT/StringRef.h"
#include "llvm/Support/Allocator.h"

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

cat -n src/server/path_pool.h

Repository: clice-io/clice

Length of output: 1870


Include <algorithm> directly in this header.

std::copy is used at line 26, but this header doesn't include <algorithm>. This makes the header non-self-contained and brittle—it will break if unrelated include-order changes remove the transitive <algorithm> dependency.

Suggested fix
+#include <algorithm>
 `#include` <cassert>
 `#include` <cstdint>
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
#include <cassert>
#include <cstdint>
#include "llvm/ADT/SmallVector.h"
#include "llvm/ADT/StringMap.h"
#include "llvm/ADT/StringRef.h"
#include "llvm/Support/Allocator.h"
`#include` <algorithm>
`#include` <cassert>
`#include` <cstdint>
`#include` "llvm/ADT/SmallVector.h"
`#include` "llvm/ADT/StringMap.h"
`#include` "llvm/ADT/StringRef.h"
`#include` "llvm/Support/Allocator.h"
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/server/path_pool.h` around lines 3 - 9, The header uses std::copy
(transitively relied upon) but doesn't include <algorithm>, so add `#include`
<algorithm> to this header (place it with the other standard includes near the
top) to make the header self-contained and ensure std::copy is declared for any
translation unit that includes path_pool.h.

Comment thread src/server/server.cpp
Comment on lines +825 to +829
std::string Server::path_to_uri(llvm::StringRef path) {
std::string uri = "file://";
uri += path;
return uri;
}

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

URI construction doesn't handle special characters.

The path_to_uri function simply prepends file:// to the path. On Windows and other platforms, paths with spaces or special characters should be percent-encoded for valid URIs.

🛠️ Consider using proper URI encoding
 std::string Server::path_to_uri(llvm::StringRef path) {
-    std::string uri = "file://";
-    uri += path;
-    return uri;
+    // Consider using a proper URI encoding library or implementing percent-encoding
+    // for special characters like spaces, #, ?, etc.
+    std::string uri = "file://";
+    // For now, basic implementation - may need enhancement for special characters
+    uri += path;
+    return uri;
 }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/server/server.cpp` around lines 825 - 829, The Server::path_to_uri
function naively prepends "file://" and fails to percent-encode special
characters and platform-specific separators; update it to produce a valid file
URI by normalizing path separators (convert backslashes to '/'), ensuring the
proper leading slash for Windows drive letters, and percent-encoding reserved
characters (spaces, non-ASCII, percent, ?, #, etc.) per RFC 3986; implement or
call a utility encoder (e.g., a percentEncode function) and use it inside
Server::path_to_uri so the returned string is a correctly escaped file:// URI
for all platforms.

Comment on lines +220 to +225
auto unit = compile(compile_params);

worker::IndexResult result;
result.success = true;
// TODO: extract TUIndex data from the compilation unit when indexing is fully implemented

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🔴 Critical

Don't report indexing success before an index is actually produced.

compile(compile_params) is ignored here, Line 223 hardcodes success = true, and the TODO leaves the TU index unpopulated. That makes failed or empty parses look like successful indexing runs, which will suppress retries and leave cross-file data stale.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/server/stateless_worker.cpp` around lines 220 - 225, The code currently
hardcodes worker::IndexResult.result.success = true after calling
compile(compile_params) and does not populate the TU index; change this to
determine success from the compile(...) return (e.g., inspect the returned unit
for errors or a status field) and set result.success accordingly, populate
result.TUIndex (or the appropriate TU container) from the compilation unit when
indexing is implemented, and on compile failure set result.success = false and
include error/diagnostic info so callers can retry; update references around
compile, compile_params, worker::IndexResult, result.success and the TU index
population to reflect these checks and avoid reporting success for empty/failed
parses.

Comment on lines +52 to +63
et::task<> WorkerPool::stop() {
for(auto& w : stateless.workers) {
if(w && w->alive) {
w->peer->close_output();
}
}
for(auto& w : stateful.workers) {
if(w && w->alive) {
w->peer->close_output();
}
}
co_return;

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🔴 Critical

Stopping the pool currently respawns every worker.

stop() closes the worker pipes, but monitor_worker() unconditionally calls restart_worker() on every exit. A normal shutdown therefore turns into a restart loop instead of draining the pool. Add a shutdown flag and skip restarts once shutdown begins.

Also applies to: 119-133

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/server/worker_pool.cpp` around lines 52 - 63, The pool shutdown currently
closes worker pipes but monitor_worker unconditionally calls restart_worker,
causing workers to be respawned; add a shutdown flag (e.g.,
WorkerPool::shutting_down as an atomic<bool>) set to true at the start of
WorkerPool::stop() before closing outputs, and update monitor_worker (and any
other place that currently calls restart_worker on exit) to check that flag and
skip calling restart_worker when shutting_down is true so workers drain instead
of being restarted; ensure the flag is used in both stateless and stateful
monitoring paths and is thread-safe (atomic) so no races occur.

Comment on lines +178 to +208
std::size_t WorkerPool::assign_worker(std::uint32_t path_id) {
auto it = stateful.owner.find(path_id);
if(it != stateful.owner.end()) {
auto lru_it = stateful.owner_lru_index.find(path_id);
if(lru_it != stateful.owner_lru_index.end()) {
stateful.owner_lru.erase(lru_it->second);
stateful.owner_lru.push_front(path_id);
lru_it->second = stateful.owner_lru.begin();
}
return it->second;
}

std::size_t min_count = SIZE_MAX;
std::size_t best_idx = 0;
for(std::size_t i = 0; i < stateful.workers.size(); ++i) {
std::size_t count = 0;
for(auto& [pid, wid] : stateful.owner) {
if(wid == i)
count++;
}
if(count < min_count) {
min_count = count;
best_idx = i;
}
}

stateful.owner[path_id] = best_idx;
stateful.owner_lru.push_front(path_id);
stateful.owner_lru_index[path_id] = stateful.owner_lru.begin();

return best_idx;

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Handle the zero-stateful-worker case explicitly.

If stateful.workers is empty, best_idx stays 0 and this method records ownership for a worker that does not exist. The first caller that dereferences that index will walk past the end of the worker array.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/server/worker_pool.cpp` around lines 178 - 208, The assign_worker
implementation can assign index 0 when stateful.workers is empty; add an
explicit check at the top of WorkerPool::assign_worker for
stateful.workers.empty() and handle it (e.g., throw a std::runtime_error or
return an invalid sentinel like SIZE_MAX) instead of proceeding to compute
best_idx; this prevents inserting stateful.owner[path_id] with a non-existent
worker index and avoids later out-of-bounds access when dereferencing
stateful.workers. Ensure the chosen behavior is consistent with callers (adjust
caller handling if you choose to throw) and reference stateful.workers,
stateful.owner, stateful.owner_lru, and stateful.owner_lru_index when making the
change.

@16bit-ykiko 16bit-ykiko deleted the phase6-advanced-features branch March 22, 2026 16:17
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant