Enhancement: idea to allow deferring full indexing? #180

Full indexing of my projects takes a long time and a lot of CPU. I also frequently create and delete worktrees in which I make small, superficial changes that don't need a full index. So I'm looking for ways to reduce indexing, or to defer it until it's actually needed. I understand that in such a mode, queries that rely on global reference information would return missing or incomplete results.

I'm aware of `initialBlacklist`, which is very useful in some situations, but I was thinking of something targeted more specifically at this problem.

One idea is a new configuration option that introduces a "distance" parameter. If the distance is 0 (the default), every file in the project is indexed. If it is 1, only the file visited in the editor is indexed (equivalent to blacklisting all files, if I understand that behavior correctly). If it is 2, visiting a file indexes that file plus the files one "distance" removed from it. And so on.

By "distance" I mean logical proximity to the visited file, which I would define as "the source files that define the declarations in a header file included by the original source file". For example, if we visit `foo.c`, which includes `bar.h`, and the classes or functions declared in `bar.h` are defined/implemented in `bar.c`, then `bar.c` is one "distance" removed from `foo.c`. If `bar.c` includes `blah.h`, whose classes or functions are defined in `blah.c`, then `blah.c` is one "distance" removed from `bar.c` and **two** "distances" removed from `foo.c`.
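To make the numbering concrete, here is a small Python sketch. It assumes the "one distance removed" relationships are already known as a hypothetical `neighbours` map (the real server would only discover them while indexing). It computes hop counts with a breadth-first search and shows that a setting of `d` covers the files fewer than `d` hops from the visited file, with 0 meaning the whole project. The function names are illustrative only.

```python
from collections import deque
from typing import Dict, Iterable, List, Set


def hop_counts(neighbours: Dict[str, List[str]], start: str) -> Dict[str, int]:
    """Breadth-first search: minimum number of 'distance' steps from start."""
    hops = {start: 0}
    queue = deque([start])
    while queue:
        current = queue.popleft()
        for nxt in neighbours.get(current, []):
            if nxt not in hops:  # first visit is the shortest path in BFS
                hops[nxt] = hops[current] + 1
                queue.append(nxt)
    return hops


def files_for_setting(setting: int, neighbours: Dict[str, List[str]],
                      start: str, project_files: Iterable[str]) -> Set[str]:
    """Files a given 'distance' setting would index (0 = whole project)."""
    if setting == 0:
        return set(project_files)
    hops = hop_counts(neighbours, start)
    # Setting 1 -> only the visited file (0 hops), setting 2 -> up to 1 hop, ...
    return {path for path, h in hops.items() if h < setting}


# The foo.c / bar.c / blah.c example from the text.
example = {"foo.c": ["bar.c"], "bar.c": ["blah.c"]}
project = ["foo.c", "bar.c", "blah.c", "unrelated.c"]
print(hop_counts(example, "foo.c"))  # {'foo.c': 0, 'bar.c': 1, 'blah.c': 2}
print(sorted(files_for_setting(2, example, "foo.c", project)))  # ['bar.c', 'foo.c']
```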

Of course, without indexing there's no reliable way to know which source file defines the classes or functions declared in a given header. However, I think file names are a reasonable heuristic: almost every project large enough to care about reduced indexing follows the convention that `xyzzy.c` (or `xyzzy.cpp` or `xyzzy.cc`) defines the classes and functions declared in `xyzzy.h` (or `xyzzy.hpp`).

So then, the algorithm for this parameter would be something like:

1. server receives a request for a new source file X
2. set `sources` to `[X]`
3. for (i = distance; i > 0; --i):
   * set `newsources` to `[]`
   * for each `file` in `sources`:
       * index `file`
       * for each header found while indexing `file`, check whether it has a corresponding source file (by name) that hasn't already been indexed or queued. If so, add it to `newsources`
   * set `sources` to `newsources`
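The loop above could be sketched in Python roughly as follows. This is not any server's actual API: `index_near`, `index_file` (a stand-in for the indexer that returns the headers a file includes) and `find_sources` (the name-based header-to-source lookup) are hypothetical hooks. It tracks files that are already queued as well as indexed, so a file found twice in one round is not indexed twice, and it skips the neighbour lookup in the final round since those files would never be indexed. Distance 0 ("index everything") is assumed to be handled separately.

```python
from typing import Callable, Iterable, List, Set


def index_near(visited: str, distance: int,
               index_file: Callable[[str], Iterable[str]],
               find_sources: Callable[[str], Iterable[str]]) -> List[str]:
    """Index `visited` plus companion sources up to distance - 1 hops away.

    Returns the files in the order they were indexed.
    """
    if distance < 1:
        raise ValueError("distance 0 means 'index everything'; handle it separately")
    indexed: List[str] = []
    seen: Set[str] = {visited}  # files already indexed *or* queued
    sources = [visited]
    for i in range(distance, 0, -1):
        if not sources:
            break  # nothing new was discovered; stop early
        newsources: List[str] = []
        for file in sources:
            headers = index_file(file)  # index, collecting included headers
            indexed.append(file)
            if i == 1:
                continue  # final round: neighbours would never be indexed
            for header in headers:
                for src in find_sources(header):
                    if src not in seen:
                        seen.add(src)
                        newsources.append(src)
        sources = newsources
    return indexed


# Hypothetical project: which headers each source includes, and header -> source.
includes = {"foo.c": ["bar.h", "stdio.h"], "bar.c": ["bar.h", "blah.h"], "blah.c": ["blah.h"]}
companions = {"bar.h": ["bar.c"], "blah.h": ["blah.c"]}


def fake_index_file(path: str) -> List[str]:
    return includes.get(path, [])


def fake_find_sources(header: str) -> List[str]:
    return companions.get(header, [])


print(index_near("foo.c", 2, fake_index_file, fake_find_sources))  # ['foo.c', 'bar.c']
```

With distance 3 the same call would also reach `blah.c`, matching the "two distances removed" example in the text.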

The check for a source file could be as simple as: strip the directory and the extension (everything after the final `.`) to get _base_, then add every file in the project named _base_ plus one of a list of source extensions (`.c`, `.cpp`, `.cc`). Indexing a few too many files isn't a big deal, and most projects won't reuse the same base name anyway, since that's confusing for developers (and for some build systems!).
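A sketch of that name-based check in Python, assuming the server has a flat list of project file paths; `candidate_sources` and its arguments are hypothetical names. Because the directory is stripped, a match anywhere in the project counts, so both `src/bar.c` and `lib/bar.cpp` are returned for `include/bar.h`, which is the "few too many files" case the proposal accepts.

```python
import os
from typing import Iterable, List, Set

# Source extensions listed in the proposal; extend as needed.
SOURCE_EXTENSIONS = (".c", ".cpp", ".cc")


def candidate_sources(header: str, project_files: Iterable[str],
                      indexed: Set[str]) -> List[str]:
    """Project source files whose name is the header's base name plus a source extension."""
    # Strip the directory, then everything after the final '.': include/bar.h -> bar
    base = os.path.splitext(os.path.basename(header))[0]
    wanted = {base + ext for ext in SOURCE_EXTENSIONS}
    return [
        path for path in project_files
        if os.path.basename(path) in wanted and path not in indexed
    ]


project = ["src/bar.c", "lib/bar.cpp", "src/foo.c", "test/bar_test.c", "include/bar.h"]
print(candidate_sources("include/bar.h", project, set()))  # ['src/bar.c', 'lib/bar.cpp']
```

Scanning the whole file list for every header is linear in project size; a server could instead build a base-name-to-files dictionary once when it enumerates the project, which makes each lookup cheap.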

Setting "distance" to 2 with this algorithm should be enough to support "jump to definition" for any symbol in a visited file, without indexing the whole project up front, at least in development environments that follow the most common organizational models. Of course, it wouldn't support accurate "list all callers of this method"-style queries unless you used distance 0 (or otherwise caused everything to be indexed).
