Enhancement: idea to allow deferring full indexing? #180

@madscientist

Because full indexing of my projects takes a long time and a lot of CPU, and because I quickly create and delete a number of worktrees where I make smaller, superficial changes that don't need to be fully indexed, I'm looking for ways to reduce or defer indexing until it's needed. I understand that in this mode the results of queries that rely on global reference information will be missing or incomplete.

I'm aware of initialBlacklist, which is very useful in some situations, but was thinking about something more specifically targeted at this problem.

An idea would be for a new configuration option introducing a "distance" parameter. If the distance value is 0 (the default), then all files in the project are indexed. If the value is 1, then only the file which is visited by the editor would be indexed (equivalent to blacklisting all files, if I understand that behavior correctly). If the value is 2, then when a file is visited that file plus files one "distance" removed from that file are indexed. Etc.

By "distance" I mean a logical proximity to the visited file. I would define this as "the source files that define the declarations in a header file included by the original source file". So for example if we visit a file foo.c and it includes a file bar.h which declares classes or functions and those classes or functions are defined/implemented in a file bar.c, then bar.c is one "distance" removed from foo.c. If bar.c includes blah.h and blah.h declares classes or functions which are defined in blah.c, then blah.c is one "distance" removed from bar.c and two "distances" removed from foo.c.

Of course there's no way to be sure, based on the current information available (without indexing!!), which source file defines the classes or functions declared in a given header file. However, I think it's a reasonable heuristic to use file names. Almost every project of any significant size (that would need to worry about reduced indexing) follows a convention that a file xyzzy.c (or xyzzy.cpp or xyzzy.cc) defines the classes / functions declared in xyzzy.h (or xyzzy.hpp).

So then, the algorithm for this parameter would be something like:

  1. server receives a request for a new source file X
  2. set sources to [X]
  3. for (i = distance; i > 0; --i):
    • set newsources to []
    • for each file in sources:
      • index file
      • for each header found while indexing file, check whether it has a corresponding source file (by name) that hasn't been indexed yet; if so, add it to newsources
    • set sources to newsources
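
The steps above could be sketched roughly as follows. This is only an illustration of the proposed loop, not anything the server implements: `index_with_distance`, `index_file` (assumed to index one file and return the headers it found) and `sources_for_header` (assumed to return candidate source files for a header) are all hypothetical names. "distance 0" (index everything) is assumed to be handled separately by the caller.

```python
from typing import Callable, List


def index_with_distance(
    start: str,
    distance: int,
    index_file: Callable[[str], List[str]],
    sources_for_header: Callable[[str], List[str]],
) -> List[str]:
    """Index `start` plus files up to `distance - 1` steps away.

    Returns the files indexed, in the order they were indexed.
    distance == 0 ("index the whole project") is not handled here.
    """
    if distance < 1:
        raise ValueError("distance 0 means full indexing; handle it elsewhere")

    indexed: List[str] = []
    sources = [start]
    for _ in range(distance):
        new_sources: List[str] = []
        for src in sources:
            if src in indexed:
                continue  # already indexed earlier in this or a previous round
            indexed.append(src)
            # index_file is assumed to return the headers seen while indexing.
            for header in index_file(src):
                for candidate in sources_for_header(header):
                    if candidate not in indexed and candidate not in new_sources:
                        new_sources.append(candidate)
        sources = new_sources
    return indexed
```

With the foo.c / bar.c / blah.c example from earlier, a distance of 1 would index only foo.c, 2 would add bar.c, and 3 would add blah.c.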

The check for a source file could be as simple as: strip the directory and the extension (everything after the final .) from the header's name to get base, then add every file in the project whose name is base plus one of a list of source extensions (.c, .cpp, .cc). If we index a few too many files it's not a big deal, and most projects won't have the same base name file multiple times anyway as it's confusing for developers (and some build systems!)
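
A minimal sketch of that name-based check, assuming the project's file list is already available as a list of paths. The function name `candidate_sources` and the `project_files` argument are hypothetical; the extension list is the one mentioned above.

```python
from pathlib import PurePosixPath
from typing import Iterable, List

# Source extensions to try, as listed in the proposal.
SOURCE_EXTENSIONS = (".c", ".cpp", ".cc")


def candidate_sources(header: str, project_files: Iterable[str]) -> List[str]:
    """Return project files that share the header's base name and have a
    known source extension. May return more than one file on purpose."""
    base = PurePosixPath(header).stem  # drop directory and final extension
    matches = []
    for path in project_files:
        p = PurePosixPath(path)
        if p.stem == base and p.suffix in SOURCE_EXTENSIONS:
            matches.append(path)
    return matches
```

For instance, given a header `include/bar.h`, this would pick up both `src/bar.c` and `lib/bar.cpp` if the project contains them, which matches the "a few too many files is fine" trade-off.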

Setting "distance" to "2" using this algorithm should be sufficient to allow "jump to definition" capability for any symbol in a file we visit, without requiring the full project to be indexed up-front, at least for development environments following the most common organizational models. It wouldn't allow accurate "list all callers of this method"-style capability of course, unless you used "distance 0" (or otherwise caused everything to be indexed).