feat: complete multi-process architecture, indexing, and cross-file features#362
feat: complete multi-process architecture, indexing, and cross-file features#36216bit-ykiko wants to merge 13 commits into
Conversation
Implement the core MasterServer (single-process pipe mode) with: - CLI options parsing (--mode, --host, --port, --worker-memory-limit) - Config loading from clice.toml with variable expansion - LSP lifecycle: initialize, initialized, shutdown, exit - Document sync: didOpen, didChange, didSave, didClose - Feature requests: hover (with include directive fallback), completion, signature help, formatting, semantic tokens, document symbols, folding ranges, document links, inlay hints - Build scheduling with per-document completion event synchronization - Proper string ownership for coroutine URI parameters - Resource directory initialization for compilation pipeline All 6 integration tests pass (lifecycle + file operations). Made-with: Cursor
Implement the StatelessWorker as an independent process that communicates via bincode JSON-RPC over stdin/stdout. This lays the groundwork for the multi-process architecture where the master server delegates one-shot compilation tasks (PCH/PCM builds, completion, signature help, indexing) to stateless worker processes. - Define worker IPC protocol types (worker_protocol.h) with RequestTraits and NotificationTraits for bincode serialization - Implement StatelessWorker class handling BuildPCH, BuildPCM, Completion, SignatureHelp, and Index requests with cancellation bridging - Wire --mode=stateless-worker to run_stateless_worker_mode entry point - Feature results use opaque JSON string forwarding pattern Made-with: Cursor
Implement the StatefulWorker as an independent process that holds ASTs per document and handles continuous feature requests. This enables the multi-process architecture where stateful workers own document state and process compile/hover/semanticTokens/symbols/folding/links/inlayHints. - Implement StatefulWorker with per-document DocumentEntry holding AST - Per-document strand mutex (et::mutex) for coroutine-level serialization - LRU eviction with EvictedParams notification back to master - Wire --mode=stateful-worker to run_stateful_worker_mode entry point - All feature results use opaque JSON string forwarding pattern Made-with: Cursor
Implement the unified WorkerPool that spawns and manages stateless and stateful worker subprocesses, providing round-robin and path-based routing with automatic crash recovery. - WorkerHandle wraps process, transport, and BincodePeer per worker - spawn_worker launches child processes with pipe-based IPC - Stateless workers use round-robin request distribution - Stateful workers use least-loaded assignment with LRU tracking - monitor_worker detects crashes and auto-restarts workers - collect_stderr relays worker logs to master spdlog output - register_evicted_handler bridges worker EvictedParams notifications Made-with: Cursor
Implement the compilation dependency management infrastructure: - ServerPathPool: global path interning with BumpPtrAllocator and StringMap for O(1) lookup, mapping file paths to stable uint32_t IDs - CompileGraph: DAG managing PCH/PCM dependencies with dirty tracking, cascading invalidation, cycle detection, and cancellation support. Uses a dispatcher callback to decouple from WorkerPool. - CacheManager: on-disk PCH/PCM cache with xxhash-based deterministic paths and LRU eviction by access time Made-with: Cursor
Implement the background indexing infrastructure: - FuzzyGraph: approximate include graph with delta updates, tracking forward and backward include edges using path IDs. Supports transitive affected-file queries and in-degree computation for indexing priority. - IndexScheduler: idle-time background indexer that pauses on user activity, resumes after 3s idle delay, and processes files by include-frequency priority. Integrates with CompileGraph for dependency compilation before indexing. Made-with: Cursor
Complete the advanced features layer: - Socket mode: TCP listener using eventide tcp_socket listen/accept, spawning a Server instance per connection for multi-client support - ProgressReporter: LSP progress lifecycle management (begin/report/end) using LSPObject construction for WorkDoneProgress values - MemoryMonitor: platform-specific process memory querying (macOS task_info, Linux /proc/statm) with periodic monitoring coroutine - Wire socket mode into main entry point dispatch Made-with: Cursor
… into Server Wire the infrastructure components into the actual LSP server: - Server now owns ServerPathPool, CompileGraph, CacheManager, FuzzyGraph - on_initialize sets up CompileGraph dispatcher for PCH/PCM compilation - scan_dependencies runs lexer-based scan on didOpen/didChange to detect module imports and include edges - resolve_module_deps scans CDB to find module interface units and registers dependency edges in CompileGraph - run_build calls compile_graph.compile_deps() before compilation to ensure PCH/PCM dependencies are built first - make_compile_params injects cached PCH/PCM paths from CompileGraph - compile_graph.update() cascades invalidation on file changes Made-with: Cursor
…tion - Fix scan() to detect C++20 import directives (cxx_import_decl, cxx_export_import_decl), populating ScanResult.modules correctly - Fix CompileGraph dispatcher to read actual file content instead of scanning empty strings, enabling proper PCM/PCH compilation - Connect indexing pipeline: build TUIndex after compilation, merge into ProjectIndex (global symbols) and per-file MergedIndex - Implement all cross-file LSP features: GoToDefinition (with Declaration fallback), FindReferences, Rename, PrepareRename, WorkspaceSymbol, CallHierarchy (prepare/incoming/outgoing), TypeHierarchy (prepare/subtypes/supertypes) - Integrate WorkerPool with --workers flag for multi-process mode - Add module name cache for O(1) module resolution - Add integration tests for cross-file features and capabilities - Make Relation::definition_range() const-qualified Made-with: Cursor
Made-with: Cursor
📝 WalkthroughWalkthroughIntroduces a complete LSP server implementation for clice, including command-line option parsing, configuration loading, document synchronization, code intelligence features, multi-mode execution (pipe, socket, stateless/stateful workers), compilation graph management, background indexing, and worker pool infrastructure with IPC integration. Changes
Sequence Diagram(s)sequenceDiagram
participant Client as LSP Client
participant Server as LSP Server
participant CG as CompileGraph
participant Workers as WorkerPool
participant Index as ProjectIndex
Client->>Server: didOpen(document)
activate Server
Server->>Server: scan_dependencies()
Server->>CG: register_unit(path_id)
Server->>CG: update(path_id)
activate CG
CG->>Workers: send_stateless(BuildPCH)
activate Workers
Workers-->>CG: PCH result
deactivate Workers
CG->>Workers: send_stateless(BuildPCM)
activate Workers
Workers-->>CG: PCM results
deactivate Workers
deactivate CG
Server->>Workers: send_stateless(Index)
activate Workers
Workers-->>Server: Index result
deactivate Workers
Server->>Index: merge(path_id, symbols)
deactivate Server
sequenceDiagram
participant Cli as CLI
participant Server as Server
participant Sched as IndexScheduler
participant CG as CompileGraph
participant Workers as WorkerPool
Cli->>Server: initialize()
activate Server
Server->>Sched: build_initial_queue(cdb, graph)
Server->>Sched: run()
deactivate Server
activate Sched
loop For each queued file
Sched->>CG: compile_deps(path_id)
activate CG
CG->>Workers: send_stateless(BuildPCH/PCM)
CG-->>Sched: compilation complete
deactivate CG
Sched->>Workers: send_stateless(Index)
Workers-->>Sched: index result
Sched->>Sched: indexed_count++
end
deactivate Sched
sequenceDiagram
participant Client as LSP Client
participant Server as Socket Server
participant Worker as StatefulWorker
participant Comp as Compiler
Client->>Server: TCP connect
activate Server
Server->>Server: spawn handle_connection
Server->>Worker: create StatefulWorker
activate Worker
Worker->>Worker: register_callbacks()
Worker-->>Client: initialize response
deactivate Server
Client->>Worker: didOpen(document)
activate Worker
Worker->>Comp: compile(params)
Comp-->>Worker: diagnostics
Worker->>Worker: touch_lru(uri)
Worker->>Worker: shrink_if_over_limit()
Worker-->>Client: publishDiagnostics
deactivate Worker
Client->>Worker: hover(position)
activate Worker
Worker->>Worker: find_document(uri)
note over Worker: Acquire strand lock
Worker->>Comp: query hover
Comp-->>Worker: hover result
note over Worker: Release strand lock
Worker-->>Client: hover response
deactivate Worker
Estimated code review effort🎯 5 (Critical) | ⏱️ ~120 minutes Possibly related PRs
Poem
🚥 Pre-merge checks | ✅ 2 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches
🧪 Generate unit tests (beta)
📝 Coding Plan
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 19
🧹 Nitpick comments (12)
docs/agent/implement.md (1)
567-567: Clarify worker log collection task lifetime management.The
collect_worker_logtask is scheduled but there's no shown mechanism to cancel it when the worker is restarted (e.g., inrestart_workerat line 621). Consider:
- Storing the task handle in
WorkerProcess:struct WorkerProcess { et::process proc; et::BincodePeer peer; std::unique_ptr<et::StreamTransport> transport; et::task_handle log_task; // 新增:用于取消 };
- Cancelling the old log task before spawning a new worker:
et::task<> restart_worker(bool is_stateful, std::size_t index) { auto& worker = /* get worker */; worker.log_task.cancel(); // 取消旧的日志收集任务 worker = co_await spawn_worker(options, mode, loop); }This prevents accumulation of orphaned log collection tasks.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@docs/agent/implement.md` at line 567, The scheduled collect_worker_log task started via loop.schedule(std::move(spawn.stderr_pipe), worker_id) can become orphaned when a worker is restarted; add a cancellable task handle (e.g., et::task_handle log_task) to the WorkerProcess struct and store the handle returned by loop.schedule so you can call worker.log_task.cancel() (or equivalent) before replacing the WorkerProcess in restart_worker and then reassign the new log_task from the newly spawned worker (symbols: collect_worker_log, loop.schedule, WorkerProcess, log_task, restart_worker, spawn_worker).cmake/package.cmake (1)
47-59: Consider pinning eventide to a specific commit or tag.The eventide dependency uses
GIT_TAG main, which can lead to non-reproducible builds if upstream changes. Other dependencies (spdlog, tomlplusplus, croaring, flatbuffers) are pinned to specific versions.The ET_* flag changes and new RTTI/exceptions flags look appropriate.
Suggested change
FetchContent_Declare( eventide GIT_REPOSITORY https://github.com/clice-io/eventide - GIT_TAG main + GIT_TAG v1.0.0 # Pin to a specific version/commit GIT_SHALLOW TRUE )🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@cmake/package.cmake` around lines 47 - 59, The FetchContent_Declare call for eventide currently uses GIT_TAG main which makes builds non-reproducible; update the FetchContent_Declare entry for the eventide target to pin GIT_TAG to a specific release tag or a commit hash (replace "main" with a stable tag or commit) so the dependency is fixed, and keep the rest of the ET_* option toggles and the FetchContent_MakeAvailable(eventide ...) invocation unchanged.src/server/memory_monitor.cpp (1)
51-62: Placeholder implementation noted.The
runcoroutine currently only logs without actual memory monitoring integration. The comments indicate this is intentional pending WorkerPool PID exposure. Consider adding a TODO comment or issue tracker reference to ensure this gets completed.Would you like me to open an issue to track the completion of memory monitoring integration with WorkerPool?
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/server/memory_monitor.cpp` around lines 51 - 62, MemoryMonitor::run is a placeholder that only logs instead of performing memory checks or integrating with WorkerPool PIDs; update the function by adding a clear TODO comment with an issue/ticket reference (or create one) and a short note on required work (expose WorkerPool PIDs and query per-worker memory metrics), and if helpful add a minimal assertion or metric-emission stub (e.g., emit a "not implemented" metric) so the placeholder is discoverable at runtime; reference MemoryMonitor::run, WorkerPool, and the periodic sleep loop when adding the TODO and linking the issue.src/server/config.cpp (1)
32-35: Hardcoded version strings may become stale.The
versionandllvm_versionvariables are hardcoded. Consider sourcing these from CMake-generated defines or a version header to keep them synchronized with the actual build.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/server/config.cpp` around lines 32 - 35, The handling of var_name == "version" and "llvm_version" uses hardcoded strings via result.append("0.1.0") and result.append("21"); replace those with build-generated macros or constants (e.g. PROJECT_VERSION or LLVM_VERSION_STRING) produced by CMake/configure_file and exposed via a version header; update the code to append the macro/constant (e.g. result.append(PROJECT_VERSION)) and ensure CMake generates and installs the header or defines the macros (e.g. add configure_file or add_definitions) so the version and llvm_version remain in sync with the build.src/server/progress.h (1)
33-38: Consider a shared token source.Line 38 resets the counter for every
ProgressReporter. If more than one reporter can emit over the same peer, token reuse will make concurrent progress streams collide. A peer- or process-scoped generator would avoid that edge.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/server/progress.h` around lines 33 - 38, The ProgressReporter class initializes next_token_ to 0 per-instance, causing token reuse across multiple ProgressReporter instances and collisions over the same peer; change the design to use a shared token generator instead of the instance member next_token_. Move token generation to a peer- or process-scoped source (for example attach a token counter to et::ipc::JsonPeer or use a shared std::atomic<std::uint32_t> / singleton generator) and replace uses of ProgressReporter::next_token_ with calls to that shared getNextToken() so tokens are globally unique across peer_ and processes.tests/integration/test_cross_file.py (3)
64-69: Assert capability support in a spec-compatible way.These checks are both too strict and too loose:
definitionProvider/referencesProvidercan legitimately be option objects, while key-presence alone lets afalsecapability pass for call/type/workspace symbol support. Assert that each capability exists and is notFalse.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/integration/test_cross_file.py` around lines 64 - 69, Update the capability assertions in tests/integration/test_cross_file.py to ensure each capability key exists and is not False instead of strict boolean checks or mere key-presence: for result["capabilities"] (caps) assert "definitionProvider" in caps and caps["definitionProvider"] is not False, assert "referencesProvider" in caps and caps["referencesProvider"] is not False, and similarly assert "callHierarchyProvider" in caps and caps["callHierarchyProvider"] is not False, "typeHierarchyProvider" in caps and caps["typeHierarchyProvider"] is not False, and "workspaceSymbolProvider" in caps and caps["workspaceSymbolProvider"] is not False.
7-18: Turn these into behavior checks, not smoke tests.
test_go_to_definitioncurrently allowsNone, andtest_workspace_symbolpasses on an empty list. That won’t catch regressions in the new cross-file features; use a symbol defined in another file and assert the returned target URI/range or symbol name(s).Also applies to: 49-56
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/integration/test_cross_file.py` around lines 7 - 18, test_go_to_definition and the related workspace symbol tests are written as smoke tests that allow None/empty results; change them into behavior checks that assert cross-file resolution succeeds by using a symbol defined in another file. Update test_go_to_definition to open both source files (e.g., main.cpp and the file that defines the target symbol), call client.go_to_definition("main.cpp", 2, 4) and assert the returned Location/target URI and range point to the other file and expected symbol span; similarly, modify the test_workspace_symbol case (lines ~49-56) to call client.workspace_symbol with the cross-file symbol name and assert the returned symbol entries contain the correct name and URI rather than accepting an empty list. Refer to functions client.go_to_definition and client.workspace_symbol to locate and change the assertions.
9-11: Replace the fixed 5s sleeps with a readiness wait.Hard-coded delays make the suite slower and still flaky under CI load. Poll for the expected symbol/token/result, or expose an index-ready/progress signal from the fixture instead of sleeping blindly.
Also applies to: 23-25, 38-40, 51-53
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/integration/test_cross_file.py` around lines 9 - 11, Replace the fixed await asyncio.sleep(5) calls with a readiness wait that polls for the actual condition the test needs (e.g. the symbol/token/result to appear) or use an index-ready/progress signal exposed by the test fixture; specifically, after calling client.initialize(...) and client.did_open("main.cpp") replace the blind sleeps with a loop or helper like wait_for_condition/await_until that repeatedly queries the language server (e.g. a workspace/symbol or documentSymbol request, diagnostics, or a custom client.is_index_ready()/fixture.index_ready_event) until the expected symbol/token is present or a timeout is reached; apply the same change for the other occurrences of await asyncio.sleep(5) in this test so tests become deterministic and CI-friendly.src/server/path_pool.h (1)
23-30: Avoid storing a second copy of every interned path.
llvm::StringMapalready owns stable key storage, so the extra allocator copy doubles memory in the component meant to deduplicate path strings. If the LLVM version here guaranteesgetKey()stability for entries, you can reuse that key directly and drop the second allocation.♻️ Suggested refactor
if(inserted) { - auto data = allocator.Allocate<char>(path.size() + 1); - std::copy(path.begin(), path.end(), data); - data[path.size()] = '\0'; - auto saved = llvm::StringRef(data, path.size()); - paths.push_back(saved); + paths.push_back(it->getKey()); it->second = static_cast<std::uint32_t>(paths.size() - 1); }🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/server/path_pool.h` around lines 23 - 30, The code currently makes a second heap copy of the interned path by allocating via allocator.Allocate and saving that into paths; instead reuse the stable key storage inside the llvm::StringMap entry returned from cache.try_emplace: after insertion, obtain the map-stored key (via the entry's getKey() accessor, e.g. it->first.getKey() or it->first->getKey() depending on your StringMap iterator type), construct an llvm::StringRef from that key and push it into paths, update it->second accordingly, and remove the allocator.Allocate copy and std::copy usage so you no longer duplicate the string storage.src/server/server.cpp (2)
720-772: O(N) module name resolution on every import.In
resolve_module_deps, when a module name is not in the cache, the code iterates through all files in the compilation database and reads/scans each one. For large projects, this could be expensive. The caching mechanism helps after the first lookup, but initial resolution could be slow.Consider building the module name cache lazily on first use or during initialization to avoid repeated O(N) scans during editing sessions.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/server/server.cpp` around lines 720 - 772, resolve_module_deps currently does an O(N) scan of cdb.files() each time a module name is missing from module_name_cache (see resolve_module_deps, module_name_cache, cdb.files(), scan), which is expensive for large projects; fix it by populating module_name_cache once (either at Server startup or lazily on first miss) by iterating cdb.files() a single time to map file_scan.module_name -> path_id (and mark interface units), then change resolve_module_deps to consult that prebuilt map and only fall back to a targeted scan if necessary; ensure compile_graph.register_unit logic remains the same when inserting cached entries.
672-692: Inconsistent indentation in the non-worker build branch.Lines 685-691 appear to have different indentation compared to surrounding code. This looks like a formatting issue that should be corrected.
🔧 Suggested fix for indentation
auto unit = compile(compile_params); auto it3 = documents.find(uri); if(it3 == documents.end()) break; auto& doc3 = it3->second; if(doc3.generation != gen) continue; - auto diags = feature::diagnostics(unit); - doc3.unit = std::make_unique<CompilationUnit>(std::move(unit)); - publish_diagnostics(uri, diags); - - if(doc3.unit) { - update_index(doc3); - } + auto diags = feature::diagnostics(unit); + doc3.unit = std::make_unique<CompilationUnit>(std::move(unit)); + publish_diagnostics(uri, diags); + + if(doc3.unit) { + update_index(doc3); + } } }🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/server/server.cpp` around lines 672 - 692, The block in the non-worker build branch around the compile path (uses make_compile_params, compile, documents.find/it3, doc3.generation, feature::diagnostics, publish_diagnostics, update_index) has inconsistent indentation for the statements between the if(doc3.generation != gen) check and the closing brace; reformat so all statements (auto diags = ..., doc3.unit = ..., publish_diagnostics(...), and the if(doc3.unit) update_index(doc3) block) are indented consistently with the surrounding else-block and aligned with the opening brace of that scope, and ensure the closing braces line up with the corresponding opening braces.src/server/worker_pool.h (1)
37-51: Round-robin counter may overflow after extended use.
stateless.nextis incremented unconditionally and will eventually overflow. While the modulo operation makes this functionally correct due to unsigned wraparound semantics, consider whether this is intentional or if you'd prefer to reset the counter periodically.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/server/worker_pool.h` around lines 37 - 51, The round-robin counter in send_stateless (stateless.next) is incremented unboundedly and can overflow; change the update so it does not grow without bound—e.g., compute the worker index with stateless.next % stateless.workers.size() but update stateless.next in a bounded way (set stateless.next = (stateless.next + 1) % stateless.workers.size() or reset to stateless.next % stateless.workers.size() when it exceeds a threshold) so the counter is kept small and avoids overflow while preserving round-robin semantics; apply this change in the send_stateless function where stateless.next is read/updated.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@docs/agent/implement.md`:
- Line 201: The SmallVector declaration in ServerPathPool is missing template
parameters; update the SmallVector usages (e.g., the members named path_ids or
paths inside the ServerPathPool implementation) to use the correct template form
llvm::SmallVector<ElementType, N> (for example llvm::SmallVector<std::uint32_t,
8> for stored IDs or llvm::SmallVector<llvm::StringRef, 8> for stored path
strings) so the definitions compile; ensure any forward declarations or typedefs
referencing SmallVector are similarly corrected to the templated form.
- Around line 426-458: Replace the manual doc.strand_mutex.lock()/unlock() in
StatefulWorker::on_compile with an RAII guard that acquires the mutex and
releases it on scope exit (use std::lock_guard/std::scoped_lock or the eventide
awaitable lock variant that returns a guard) so the mutex is always released
even if co_await et::queue throws; apply the same change to the on_hover example
(replace explicit lock/unlock with the same RAII guard for doc.strand_mutex) and
ensure the guard type is compatible with the async/await pattern used by
et::queue.
In `@docs/agent/note.md`:
- Around line 143-145: The note "Decision: Global queries not wired in this
phase" is confusing relative to later text that says indexing and cross-file
features are integrated; update the wording to clarify this is historical
context for a specific phase (or else reconcile the status). Edit the paragraph
referencing StatelssWorker `on_index` and IndexScheduler in the implement.md
note to explicitly state "at that phase" or "historical — before TUIndex
extraction was wired" (or flip to state the current integrated status if
indexing is done), so readers won't see a contradiction with the later section
that describes indexing/cross-file features as integrated.
- Line 161: Update the docs to correct the Linux procfs path used by
get_process_memory: replace the incorrect "/proc/statm" reference with the
per-process path "/proc/self/statm" (or "/proc/<pid>/statm") and note that
WorkerPool must expose worker PIDs before the periodic monitoring loop can read
those per-process statm files; ensure the description in get_process_memory
mentions the platform-specific use of macOS task_info and Linux /proc/self/statm
for clarity.
In `@src/server/cache_manager.cpp`:
- Around line 98-103: The loop in cache_manager.cpp decrements total_size
unconditionally after calling llvm::sys::fs::remove, which can make the cache
appear smaller even if remove failed; modify the loop to capture the remove
result (std::error_code ec = llvm::sys::fs::remove(entry.path)), only subtract
entry.size and emit LOG_INFO("Evicted...") when ec indicates success, and on
failure emit a LOG_WARN/LOG_ERROR including ec.message() and entry.path and skip
decrementing total_size so the eviction loop continues correctly until disk
usage is under max_size_; keep the break condition using total_size <= max_size_
as-is.
- Around line 24-25: The call to llvm::sys::fs::create_directories(cache_dir_)
currently ignores errors so unwritable paths are treated as ready; update the
CacheManager initialization to check the return status of
create_directories(cache_dir_), and if it fails log the error (include the error
code/message) instead of the successful LOG_INFO("Cache directory: {}"); then
disable caching (e.g. set a cache_enabled_ flag to false) or propagate the
failure/return early so subsequent PCH/PCM writes won't assume the directory
exists. Ensure you reference cache_dir_, create_directories(cache_dir_), and
replace the current LOG_INFO with a LOG_ERROR/LOG_WARN that includes the error
details.
In `@src/server/compile_graph.cpp`:
- Around line 33-61: The remove_unit implementation currently erases the unit
while coroutines (compile_deps()/compile_impl()) may be suspended waiting on the
unit's completion event, causing use-after-free; change remove_unit to first
mark the unit as canceled (e.g., set a removed/canceled flag on the Unit entry
and signal its completion/event to wake waiters) and do NOT erase the map entry
or destroy the event immediately — instead decrement/observe an in-flight
counter or attach a deferred cleanup list and only physically erase the units
map entry and remove from pch_cache/pcm_cache after all in-flight compiles
(tracked by an in_flight counter or waiter count on the Unit) have finished;
reference CompileGraph::remove_unit, Unit::completion (or completion),
compile_deps(), compile_impl(), and the pch_cache/pcm_cache removals when making
this change.
- Line 101: The cycle-detection stack is never armed because top-level calls to
compile_impl are scheduled with stack == nullptr; change the loop.schedule(...)
invocations that currently call loop.schedule(compile_impl(dep_id, loop)) so
they construct and pass a real stack object (e.g., a newly created shared_ptr or
unique_ptr to the stack/vector type used by compile_impl) instead of nullptr,
and ensure every internal call to compile_impl propagates that same stack
pointer so the guard inside compile_impl (the stack check and cycle-reporting
code) can run and break dependency cycles rather than leaving units waiting on
each other's completion event.
In `@src/server/config.cpp`:
- Around line 73-74: The code assigns
project["max_active_file"].value<int64_t>() directly to
config.project.max_active_file via static_cast<std::size_t> which will wrap
negative inputs; update the handling around the call that reads
"max_active_file" so you first check that the optional<int64_t> (from
value<int64_t>()) has a value and that *v >= 0 before casting, and if the value
is negative or missing, either set a safe default or return/log an error;
specifically modify the block referencing project["max_active_file"], the local
variable v (result of value<int64_t>()), and the assignment to
config.project.max_active_file to perform the non-negative validation and error/
default handling.
In `@src/server/index_scheduler.cpp`:
- Around line 102-110: IndexScheduler currently only logs failures after
co_await pool_.send_stateless(params) and never updates the in-memory index, so
successful background indexing doesn't become visible; after confirming
result.has_value() and result->success, persist the returned index data into
project_index_ (e.g., merge or replace via the appropriate ProjectIndex API) so
newly indexed symbols are available to cross-file queries, taking any required
locking/atomic update into account; locate this logic around
pool_.send_stateless and the result/result->success handling and add the
persistence step to project_index_.
- Around line 91-102: The indexing request only sets worker::IndexParams
params.file while pcm_vec, directory and arguments are left out, so
StatelessWorker::on_index cannot replay the compile context; update the code
that builds params (worker::IndexParams) to assign the compiled module list and
command-line bits: copy pcm_vec into params (e.g. params.pcms or equivalent
field), set params.directory from the compile_graph_.get_pch/get_pcms context or
path's directory, and populate params.arguments with the same arguments produced
by compile_deps() so the worker receives the full compile context for replaying
the command line and module artifacts.
In `@src/server/memory_monitor.cpp`:
- Around line 41-44: Replace the hardcoded 4096 multiplier when computing memory
from "pages" by querying the system page size with sysconf(_SC_PAGESIZE); in
memory_monitor.cpp, include <unistd.h> if needed, call sysconf to get page_size
(handle a -1 fallback if syscall fails) and multiply pages by the returned
page_size (cast to std::size_t) before returning; update the return expression
that currently uses "pages * 4096" to use the queried page_size instead.
In `@src/server/options.cpp`:
- Around line 50-89: The CLI currently ignores unknown flags and omits the
--workers switch from help; update the argument parsing loop (the for loop and
extract_value lambda) so that unrecognized arguments produce a clear
error/warning (e.g., LOG_ERROR or LOG_WARN and exit or set a failure flag)
instead of silently falling back to defaults, and add the --workers description
to the help output alongside the other options; specifically modify the help
printing block and the final else branch of the parsing chain to report unknown
flags (referencing extract_value, opts.enable_workers, and Mode handling) so
typos and unsupported flags are surfaced to the user.
- Around line 83-86: The current numeric parsing for --port and
--worker-memory-limit is unsafe; replace the std::atoi/std::stoull usage by
calling std::from_chars on the extracted string (from extract_value) for each
option, verify the parse succeeded (ec == std::errc()) and that the pointer
reached the end of the string to reject trailing garbage, for --port
additionally check the parsed integer is in the 1–65535 range before assigning
to opts.port, and for --worker_memory_limit parse into an unsigned long long but
first reject any leading '-' and ensure full consumption; on any parse or range
failure, handle it gracefully (e.g., log a descriptive error and abort startup
or set a safe default and return an error) rather than letting exceptions or
reinterpretation of negative values occur.
In `@src/server/path_pool.h`:
- Around line 3-9: The header uses std::copy (transitively relied upon) but
doesn't include <algorithm>, so add `#include` <algorithm> to this header (place
it with the other standard includes near the top) to make the header
self-contained and ensure std::copy is declared for any translation unit that
includes path_pool.h.
In `@src/server/server.cpp`:
- Around line 825-829: The Server::path_to_uri function naively prepends
"file://" and fails to percent-encode special characters and platform-specific
separators; update it to produce a valid file URI by normalizing path separators
(convert backslashes to '/'), ensuring the proper leading slash for Windows
drive letters, and percent-encoding reserved characters (spaces, non-ASCII,
percent, ?, #, etc.) per RFC 3986; implement or call a utility encoder (e.g., a
percentEncode function) and use it inside Server::path_to_uri so the returned
string is a correctly escaped file:// URI for all platforms.
In `@src/server/stateless_worker.cpp`:
- Around line 220-225: The code currently hardcodes
worker::IndexResult.result.success = true after calling compile(compile_params)
and does not populate the TU index; change this to determine success from the
compile(...) return (e.g., inspect the returned unit for errors or a status
field) and set result.success accordingly, populate result.TUIndex (or the
appropriate TU container) from the compilation unit when indexing is
implemented, and on compile failure set result.success = false and include
error/diagnostic info so callers can retry; update references around compile,
compile_params, worker::IndexResult, result.success and the TU index population
to reflect these checks and avoid reporting success for empty/failed parses.
In `@src/server/worker_pool.cpp`:
- Around line 52-63: The pool shutdown currently closes worker pipes but
monitor_worker unconditionally calls restart_worker, causing workers to be
respawned; add a shutdown flag (e.g., WorkerPool::shutting_down as an
atomic<bool>) set to true at the start of WorkerPool::stop() before closing
outputs, and update monitor_worker (and any other place that currently calls
restart_worker on exit) to check that flag and skip calling restart_worker when
shutting_down is true so workers drain instead of being restarted; ensure the
flag is used in both stateless and stateful monitoring paths and is thread-safe
(atomic) so no races occur.
- Around line 178-208: The assign_worker implementation can assign index 0 when
stateful.workers is empty; add an explicit check at the top of
WorkerPool::assign_worker for stateful.workers.empty() and handle it (e.g.,
throw a std::runtime_error or return an invalid sentinel like SIZE_MAX) instead
of proceeding to compute best_idx; this prevents inserting
stateful.owner[path_id] with a non-existent worker index and avoids later
out-of-bounds access when dereferencing stateful.workers. Ensure the chosen
behavior is consistent with callers (adjust caller handling if you choose to
throw) and reference stateful.workers, stateful.owner, stateful.owner_lru, and
stateful.owner_lru_index when making the change.
---
Nitpick comments:
In `@cmake/package.cmake`:
- Around line 47-59: The FetchContent_Declare call for eventide currently uses
GIT_TAG main which makes builds non-reproducible; update the
FetchContent_Declare entry for the eventide target to pin GIT_TAG to a specific
release tag or a commit hash (replace "main" with a stable tag or commit) so the
dependency is fixed, and keep the rest of the ET_* option toggles and the
FetchContent_MakeAvailable(eventide ...) invocation unchanged.
In `@docs/agent/implement.md`:
- Line 567: The scheduled collect_worker_log task started via
loop.schedule(std::move(spawn.stderr_pipe), worker_id) can become orphaned when
a worker is restarted; add a cancellable task handle (e.g., et::task_handle
log_task) to the WorkerProcess struct and store the handle returned by
loop.schedule so you can call worker.log_task.cancel() (or equivalent) before
replacing the WorkerProcess in restart_worker and then reassign the new log_task
from the newly spawned worker (symbols: collect_worker_log, loop.schedule,
WorkerProcess, log_task, restart_worker, spawn_worker).
In `@src/server/config.cpp`:
- Around line 32-35: The handling of var_name == "version" and "llvm_version"
uses hardcoded strings via result.append("0.1.0") and result.append("21");
replace those with build-generated macros or constants (e.g. PROJECT_VERSION or
LLVM_VERSION_STRING) produced by CMake/configure_file and exposed via a version
header; update the code to append the macro/constant (e.g.
result.append(PROJECT_VERSION)) and ensure CMake generates and installs the
header or defines the macros (e.g. add configure_file or add_definitions) so the
version and llvm_version remain in sync with the build.
In `@src/server/memory_monitor.cpp`:
- Around line 51-62: MemoryMonitor::run is a placeholder that only logs instead
of performing memory checks or integrating with WorkerPool PIDs; update the
function by adding a clear TODO comment with an issue/ticket reference (or
create one) and a short note on required work (expose WorkerPool PIDs and query
per-worker memory metrics), and if helpful add a minimal assertion or
metric-emission stub (e.g., emit a "not implemented" metric) so the placeholder
is discoverable at runtime; reference MemoryMonitor::run, WorkerPool, and the
periodic sleep loop when adding the TODO and linking the issue.
In `@src/server/path_pool.h`:
- Around line 23-30: The code currently makes a second heap copy of the interned
path by allocating via allocator.Allocate and saving that into paths; instead
reuse the stable key storage inside the llvm::StringMap entry returned from
cache.try_emplace: after insertion, obtain the map-stored key (via the entry's
getKey() accessor, e.g. it->first.getKey() or it->first->getKey() depending on
your StringMap iterator type), construct an llvm::StringRef from that key and
push it into paths, update it->second accordingly, and remove the
allocator.Allocate copy and std::copy usage so you no longer duplicate the
string storage.
In `@src/server/progress.h`:
- Around line 33-38: The ProgressReporter class initializes next_token_ to 0
per-instance, causing token reuse across multiple ProgressReporter instances and
collisions over the same peer; change the design to use a shared token generator
instead of the instance member next_token_. Move token generation to a peer- or
process-scoped source (for example attach a token counter to et::ipc::JsonPeer
or use a shared std::atomic<std::uint32_t> / singleton generator) and replace
uses of ProgressReporter::next_token_ with calls to that shared getNextToken()
so tokens are globally unique across peer_ and processes.
In `@src/server/server.cpp`:
- Around line 720-772: resolve_module_deps currently does an O(N) scan of
cdb.files() each time a module name is missing from module_name_cache (see
resolve_module_deps, module_name_cache, cdb.files(), scan), which is expensive
for large projects; fix it by populating module_name_cache once (either at
Server startup or lazily on first miss) by iterating cdb.files() a single time
to map file_scan.module_name -> path_id (and mark interface units), then change
resolve_module_deps to consult that prebuilt map and only fall back to a
targeted scan if necessary; ensure compile_graph.register_unit logic remains the
same when inserting cached entries.
- Around line 672-692: The block in the non-worker build branch around the
compile path (uses make_compile_params, compile, documents.find/it3,
doc3.generation, feature::diagnostics, publish_diagnostics, update_index) has
inconsistent indentation for the statements between the if(doc3.generation !=
gen) check and the closing brace; reformat so all statements (auto diags = ...,
doc3.unit = ..., publish_diagnostics(...), and the if(doc3.unit)
update_index(doc3) block) are indented consistently with the surrounding
else-block and aligned with the opening brace of that scope, and ensure the
closing braces line up with the corresponding opening braces.
In `@src/server/worker_pool.h`:
- Around line 37-51: The round-robin counter in send_stateless (stateless.next)
is incremented unboundedly and can overflow; change the update so it does not
grow without bound—e.g., compute the worker index with stateless.next %
stateless.workers.size() but update stateless.next in a bounded way (set
stateless.next = (stateless.next + 1) % stateless.workers.size() or reset to
stateless.next % stateless.workers.size() when it exceeds a threshold) so the
counter is kept small and avoids overflow while preserving round-robin
semantics; apply this change in the send_stateless function where stateless.next
is read/updated.
In `@tests/integration/test_cross_file.py`:
- Around line 64-69: Update the capability assertions in
tests/integration/test_cross_file.py to ensure each capability key exists and is
not False instead of strict boolean checks or mere key-presence: for
result["capabilities"] (caps) assert "definitionProvider" in caps and
caps["definitionProvider"] is not False, assert "referencesProvider" in caps and
caps["referencesProvider"] is not False, and similarly assert
"callHierarchyProvider" in caps and caps["callHierarchyProvider"] is not False,
"typeHierarchyProvider" in caps and caps["typeHierarchyProvider"] is not False,
and "workspaceSymbolProvider" in caps and caps["workspaceSymbolProvider"] is not
False.
- Around line 7-18: test_go_to_definition and the related workspace symbol tests
are written as smoke tests that allow None/empty results; change them into
behavior checks that assert cross-file resolution succeeds by using a symbol
defined in another file. Update test_go_to_definition to open both source files
(e.g., main.cpp and the file that defines the target symbol), call
client.go_to_definition("main.cpp", 2, 4) and assert the returned
Location/target URI and range point to the other file and expected symbol span;
similarly, modify the test_workspace_symbol case (lines ~49-56) to call
client.workspace_symbol with the cross-file symbol name and assert the returned
symbol entries contain the correct name and URI rather than accepting an empty
list. Refer to functions client.go_to_definition and client.workspace_symbol to
locate and change the assertions.
- Around line 9-11: Replace the fixed await asyncio.sleep(5) calls with a
readiness wait that polls for the actual condition the test needs (e.g. the
symbol/token/result to appear) or use an index-ready/progress signal exposed by
the test fixture; specifically, after calling client.initialize(...) and
client.did_open("main.cpp") replace the blind sleeps with a loop or helper like
wait_for_condition/await_until that repeatedly queries the language server (e.g.
a workspace/symbol or documentSymbol request, diagnostics, or a custom
client.is_index_ready()/fixture.index_ready_event) until the expected
symbol/token is present or a timeout is reached; apply the same change for the
other occurrences of await asyncio.sleep(5) in this test so tests become
deterministic and CI-friendly.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: fccc713e-57bf-4109-b6f7-ecc3755d38ce
📒 Files selected for processing (43)
.gitignoreCMakeLists.txtcmake/package.cmakedocs/agent/design.mddocs/agent/implement.mddocs/agent/note.mdsrc/clice.ccsrc/index/tu_index.hsrc/server/cache_manager.cppsrc/server/cache_manager.hsrc/server/compile_graph.cppsrc/server/compile_graph.hsrc/server/config.cppsrc/server/config.hsrc/server/fuzzy_graph.cppsrc/server/fuzzy_graph.hsrc/server/index_scheduler.cppsrc/server/index_scheduler.hsrc/server/memory_monitor.cppsrc/server/memory_monitor.hsrc/server/options.cppsrc/server/options.hsrc/server/path_pool.hsrc/server/progress.cppsrc/server/progress.hsrc/server/server.cppsrc/server/server.hsrc/server/socket_mode.cppsrc/server/socket_mode.hsrc/server/stateful_worker.cppsrc/server/stateful_worker.hsrc/server/stateless_worker.cppsrc/server/stateless_worker.hsrc/server/worker_pool.cppsrc/server/worker_pool.hsrc/server/worker_protocol.hsrc/syntax/scan.cpptests/data/cross_file/helper.cpptests/data/cross_file/helper.htests/data/cross_file/main.cpptests/fixtures/client.pytests/integration/test_cross_file.pytests/integration/test_file_operation.py
💤 Files with no reviewable changes (1)
- tests/integration/test_file_operation.py
|
|
||
| **分步实现**: | ||
|
|
||
| 1. **ServerPathPool**(`src/server/path_pool.h`):实现全局路径池(`llvm::BumpPtrAllocator` + `llvm::StringMap` + `llvm::SmallVector`),所有路径通过 `intern()` 获取 `uint32_t` ID,后续各模块统一使用 path_id |
There was a problem hiding this comment.
Fix the incomplete llvm::SmallVector type.
The llvm::SmallVector requires template parameters. It should be llvm::SmallVector<Type, Size>, for example:
llvm::SmallVector<std::uint32_t, 8> path_ids; // 存储路径IDor for storing path strings:
llvm::SmallVector<llvm::StringRef, 8> paths;This syntax error would prevent compilation if followed literally.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@docs/agent/implement.md` at line 201, The SmallVector declaration in
ServerPathPool is missing template parameters; update the SmallVector usages
(e.g., the members named path_ids or paths inside the ServerPathPool
implementation) to use the correct template form llvm::SmallVector<ElementType,
N> (for example llvm::SmallVector<std::uint32_t, 8> for stored IDs or
llvm::SmallVector<llvm::StringRef, 8> for stored path strings) so the
definitions compile; ensure any forward declarations or typedefs referencing
SmallVector are similarly corrected to the templated form.
| et::RequestResult<worker::CompileParams> | ||
| StatefulWorker::on_compile(auto& ctx, const worker::CompileParams& params) { | ||
| auto& doc = documents[params.uri]; | ||
| doc.version = params.version; | ||
| doc.text = params.text; | ||
| doc.directory = params.directory; | ||
| doc.arguments = params.arguments; | ||
| doc.pch = params.pch; | ||
| doc.pcms = params.pcms; | ||
|
|
||
| touch_lru(params.uri); | ||
|
|
||
| // per-document 串行化 | ||
| co_await doc.strand_mutex.lock(); | ||
|
|
||
| auto result = co_await et::queue([&]() { | ||
| CompilationParams cp; | ||
| cp.kind = CompilationKind::Parse; | ||
| // ... 从 doc 构造编译参数 ... | ||
| cp.add_remapped_file(path, doc.text, doc.pch.second); | ||
| doc.unit = compile(cp); | ||
| doc.dirty = false; | ||
| return worker::CompileResult{ | ||
| .uri = params.uri, | ||
| .version = params.version, | ||
| .diagnostics = feature::diagnostics(doc.unit), | ||
| .memory_usage = current_memory_usage() | ||
| }; | ||
| }, loop); | ||
|
|
||
| doc.strand_mutex.unlock(); | ||
| co_return result; | ||
| } |
There was a problem hiding this comment.
🛠️ Refactor suggestion | 🟠 Major
Use RAII for mutex management to prevent deadlocks.
The manual lock() and unlock() pattern (lines 439, 456) is error-prone. If an exception occurs between lock and unlock (e.g., from co_await et::queue), the mutex will remain locked, causing deadlock.
Consider using RAII with std::lock_guard or eventide's equivalent:
- co_await doc.strand_mutex.lock();
+ auto lock = co_await doc.strand_mutex.scoped_lock();
auto result = co_await et::queue([&]() {
// ... compilation work ...
}, loop);
- doc.strand_mutex.unlock();
co_return result;The same issue exists in the on_hover example at lines 473-484.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@docs/agent/implement.md` around lines 426 - 458, Replace the manual
doc.strand_mutex.lock()/unlock() in StatefulWorker::on_compile with an RAII
guard that acquires the mutex and releases it on scope exit (use
std::lock_guard/std::scoped_lock or the eventide awaitable lock variant that
returns a guard) so the mutex is always released even if co_await et::queue
throws; apply the same change to the on_hover example (replace explicit
lock/unlock with the same RAII guard for doc.strand_mutex) and ensure the guard
type is compatible with the async/await pattern used by et::queue.
| ### Decision: Global queries not wired in this phase | ||
| The implement.md shows FindReferences/GoToDef/Rename using the index, but the actual TUIndex extraction from workers is still a TODO (StatelessWorker `on_index` returns success without data). The IndexScheduler infrastructure is in place; wiring to real TUIndex data will happen when the indexing pipeline is complete. | ||
|
|
There was a problem hiding this comment.
Clarify stale TODO versus completed indexing integration.
Line 143–145 says global index-backed queries are not wired yet, but Line 192–207 later says indexing and cross-file features are integrated. This reads as contradictory in the final doc state. Please mark this section as historical context (e.g., “at that phase”) or update wording to avoid confusion.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@docs/agent/note.md` around lines 143 - 145, The note "Decision: Global
queries not wired in this phase" is confusing relative to later text that says
indexing and cross-file features are integrated; update the wording to clarify
this is historical context for a specific phase (or else reconcile the status).
Edit the paragraph referencing StatelssWorker `on_index` and IndexScheduler in
the implement.md note to explicitly state "at that phase" or "historical —
before TUIndex extraction was wired" (or flip to state the current integrated
status if indexing is done), so readers won't see a contradiction with the later
section that describes indexing/cross-file features as integrated.
| The design.md 6.2 describes config file watching using `et::fs_event`. However, eventide does not expose an `fs_event` API. Configuration hot-reload will need to be implemented via polling or deferred until eventide adds file system event support. | ||
|
|
||
| ### Decision: MemoryMonitor is informational placeholder | ||
| Full memory monitoring requires exposing worker PIDs from `WorkerPool`, which is not yet wired. The `get_process_memory()` function uses platform-specific APIs (macOS `task_info`, Linux `/proc/statm`). The periodic monitoring loop is in place and will be connected when the master server integrates all subsystems. |
There was a problem hiding this comment.
🧩 Analysis chain
🌐 Web query:
On Linux, what is the correct /proc path to read memory usage for the current process: /proc/statm or /proc/self/statm?
💡 Result:
Use /proc/self/statm to read memory usage for the current process; it’s a symlink to that process’s /proc/<pid>/statm entry. (docs.kernel.org)
/proc/statm is not the correct path (the statm file is per-process under /proc/<pid>/, or via /proc/self/). (docs.kernel.org)
Citations:
- 1: https://docs.kernel.org/filesystems/proc.html?utm_source=openai
- 2: https://docs.kernel.org/filesystems/proc.html?utm_source=openai
🏁 Script executed:
head -165 docs/agent/note.md | tail -10Repository: clice-io/clice
Length of output: 880
Fix Linux procfs path for per-process memory lookup.
Line 161 references /proc/statm, which is not a valid Linux path. For process memory, use /proc/self/statm (a symlink to /proc/<pid>/statm). The current wording could mislead implementation.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@docs/agent/note.md` at line 161, Update the docs to correct the Linux procfs
path used by get_process_memory: replace the incorrect "/proc/statm" reference
with the per-process path "/proc/self/statm" (or "/proc/<pid>/statm") and note
that WorkerPool must expose worker PIDs before the periodic monitoring loop can
read those per-process statm files; ensure the description in get_process_memory
mentions the platform-specific use of macOS task_info and Linux /proc/self/statm
for clarity.
| llvm::sys::fs::create_directories(cache_dir_); | ||
| LOG_INFO("Cache directory: {}", cache_dir_); |
There was a problem hiding this comment.
Handle cache directory creation failures.
Line 24 ignores the returned error code, so an unwritable cache path still gets logged as ready and the first real failure shows up much later when PCH/PCM writes start. Please surface the error or disable caching here instead of proceeding silently.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/server/cache_manager.cpp` around lines 24 - 25, The call to
llvm::sys::fs::create_directories(cache_dir_) currently ignores errors so
unwritable paths are treated as ready; update the CacheManager initialization to
check the return status of create_directories(cache_dir_), and if it fails log
the error (include the error code/message) instead of the successful
LOG_INFO("Cache directory: {}"); then disable caching (e.g. set a cache_enabled_
flag to false) or propagate the failure/return early so subsequent PCH/PCM
writes won't assume the directory exists. Ensure you reference cache_dir_,
create_directories(cache_dir_), and replace the current LOG_INFO with a
LOG_ERROR/LOG_WARN that includes the error details.
| #include <cassert> | ||
| #include <cstdint> | ||
|
|
||
| #include "llvm/ADT/SmallVector.h" | ||
| #include "llvm/ADT/StringMap.h" | ||
| #include "llvm/ADT/StringRef.h" | ||
| #include "llvm/Support/Allocator.h" |
There was a problem hiding this comment.
🧩 Analysis chain
🏁 Script executed:
cat -n src/server/path_pool.hRepository: clice-io/clice
Length of output: 1870
Include <algorithm> directly in this header.
std::copy is used at line 26, but this header doesn't include <algorithm>. This makes the header non-self-contained and brittle—it will break if unrelated include-order changes remove the transitive <algorithm> dependency.
Suggested fix
+#include <algorithm>
`#include` <cassert>
`#include` <cstdint>📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| #include <cassert> | |
| #include <cstdint> | |
| #include "llvm/ADT/SmallVector.h" | |
| #include "llvm/ADT/StringMap.h" | |
| #include "llvm/ADT/StringRef.h" | |
| #include "llvm/Support/Allocator.h" | |
| `#include` <algorithm> | |
| `#include` <cassert> | |
| `#include` <cstdint> | |
| `#include` "llvm/ADT/SmallVector.h" | |
| `#include` "llvm/ADT/StringMap.h" | |
| `#include` "llvm/ADT/StringRef.h" | |
| `#include` "llvm/Support/Allocator.h" |
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/server/path_pool.h` around lines 3 - 9, The header uses std::copy
(transitively relied upon) but doesn't include <algorithm>, so add `#include`
<algorithm> to this header (place it with the other standard includes near the
top) to make the header self-contained and ensure std::copy is declared for any
translation unit that includes path_pool.h.
| std::string Server::path_to_uri(llvm::StringRef path) { | ||
| std::string uri = "file://"; | ||
| uri += path; | ||
| return uri; | ||
| } |
There was a problem hiding this comment.
URI construction doesn't handle special characters.
The path_to_uri function simply prepends file:// to the path. On Windows and other platforms, paths with spaces or special characters should be percent-encoded for valid URIs.
🛠️ Consider using proper URI encoding
std::string Server::path_to_uri(llvm::StringRef path) {
- std::string uri = "file://";
- uri += path;
- return uri;
+ // Consider using a proper URI encoding library or implementing percent-encoding
+ // for special characters like spaces, #, ?, etc.
+ std::string uri = "file://";
+ // For now, basic implementation - may need enhancement for special characters
+ uri += path;
+ return uri;
}🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/server/server.cpp` around lines 825 - 829, The Server::path_to_uri
function naively prepends "file://" and fails to percent-encode special
characters and platform-specific separators; update it to produce a valid file
URI by normalizing path separators (convert backslashes to '/'), ensuring the
proper leading slash for Windows drive letters, and percent-encoding reserved
characters (spaces, non-ASCII, percent, ?, #, etc.) per RFC 3986; implement or
call a utility encoder (e.g., a percentEncode function) and use it inside
Server::path_to_uri so the returned string is a correctly escaped file:// URI
for all platforms.
| auto unit = compile(compile_params); | ||
|
|
||
| worker::IndexResult result; | ||
| result.success = true; | ||
| // TODO: extract TUIndex data from the compilation unit when indexing is fully implemented | ||
|
|
There was a problem hiding this comment.
Don't report indexing success before an index is actually produced.
compile(compile_params) is ignored here, Line 223 hardcodes success = true, and the TODO leaves the TU index unpopulated. That makes failed or empty parses look like successful indexing runs, which will suppress retries and leave cross-file data stale.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/server/stateless_worker.cpp` around lines 220 - 225, The code currently
hardcodes worker::IndexResult.result.success = true after calling
compile(compile_params) and does not populate the TU index; change this to
determine success from the compile(...) return (e.g., inspect the returned unit
for errors or a status field) and set result.success accordingly, populate
result.TUIndex (or the appropriate TU container) from the compilation unit when
indexing is implemented, and on compile failure set result.success = false and
include error/diagnostic info so callers can retry; update references around
compile, compile_params, worker::IndexResult, result.success and the TU index
population to reflect these checks and avoid reporting success for empty/failed
parses.
| et::task<> WorkerPool::stop() { | ||
| for(auto& w : stateless.workers) { | ||
| if(w && w->alive) { | ||
| w->peer->close_output(); | ||
| } | ||
| } | ||
| for(auto& w : stateful.workers) { | ||
| if(w && w->alive) { | ||
| w->peer->close_output(); | ||
| } | ||
| } | ||
| co_return; |
There was a problem hiding this comment.
Stopping the pool currently respawns every worker.
stop() closes the worker pipes, but monitor_worker() unconditionally calls restart_worker() on every exit. A normal shutdown therefore turns into a restart loop instead of draining the pool. Add a shutdown flag and skip restarts once shutdown begins.
Also applies to: 119-133
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/server/worker_pool.cpp` around lines 52 - 63, The pool shutdown currently
closes worker pipes but monitor_worker unconditionally calls restart_worker,
causing workers to be respawned; add a shutdown flag (e.g.,
WorkerPool::shutting_down as an atomic<bool>) set to true at the start of
WorkerPool::stop() before closing outputs, and update monitor_worker (and any
other place that currently calls restart_worker on exit) to check that flag and
skip calling restart_worker when shutting_down is true so workers drain instead
of being restarted; ensure the flag is used in both stateless and stateful
monitoring paths and is thread-safe (atomic) so no races occur.
| std::size_t WorkerPool::assign_worker(std::uint32_t path_id) { | ||
| auto it = stateful.owner.find(path_id); | ||
| if(it != stateful.owner.end()) { | ||
| auto lru_it = stateful.owner_lru_index.find(path_id); | ||
| if(lru_it != stateful.owner_lru_index.end()) { | ||
| stateful.owner_lru.erase(lru_it->second); | ||
| stateful.owner_lru.push_front(path_id); | ||
| lru_it->second = stateful.owner_lru.begin(); | ||
| } | ||
| return it->second; | ||
| } | ||
|
|
||
| std::size_t min_count = SIZE_MAX; | ||
| std::size_t best_idx = 0; | ||
| for(std::size_t i = 0; i < stateful.workers.size(); ++i) { | ||
| std::size_t count = 0; | ||
| for(auto& [pid, wid] : stateful.owner) { | ||
| if(wid == i) | ||
| count++; | ||
| } | ||
| if(count < min_count) { | ||
| min_count = count; | ||
| best_idx = i; | ||
| } | ||
| } | ||
|
|
||
| stateful.owner[path_id] = best_idx; | ||
| stateful.owner_lru.push_front(path_id); | ||
| stateful.owner_lru_index[path_id] = stateful.owner_lru.begin(); | ||
|
|
||
| return best_idx; |
There was a problem hiding this comment.
Handle the zero-stateful-worker case explicitly.
If stateful.workers is empty, best_idx stays 0 and this method records ownership for a worker that does not exist. The first caller that dereferences that index will walk past the end of the worker array.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/server/worker_pool.cpp` around lines 178 - 208, The assign_worker
implementation can assign index 0 when stateful.workers is empty; add an
explicit check at the top of WorkerPool::assign_worker for
stateful.workers.empty() and handle it (e.g., throw a std::runtime_error or
return an invalid sentinel like SIZE_MAX) instead of proceeding to compute
best_idx; this prevents inserting stateful.owner[path_id] with a non-existent
worker index and avoids later out-of-bounds access when dereferencing
stateful.workers. Ensure the chosen behavior is consistent with callers (adjust
caller handling if you choose to throw) and reference stateful.workers,
stateful.owner, stateful.owner_lru, and stateful.owner_lru_index when making the
change.
Summary
scan()to detect C++20import/export importdirectives, enabling proper module dependency resolution--workersCLI flag; single-process remains defaultdocs/agent/Test plan
cmake --build build/DebugMade with Cursor
Summary by CodeRabbit
New Features
Tests