Full indexing of my projects takes a long time and a lot of CPU. I also frequently create and delete worktrees in which I make small, superficial changes that don't need to be fully indexed. So I'm looking for ways to reduce indexing, or defer it until it's needed. I understand that in this mode, queries that rely on global reference information will be missing or incomplete.
I'm aware of initialBlacklist, which is very useful in some situations, but I was thinking of something more specifically targeted at this problem.
One idea is a new configuration option that introduces a "distance" parameter. If the distance is 0 (the default), all files in the project are indexed. If it is 1, only the file visited by the editor is indexed (equivalent to blacklisting all files, if I understand that behavior correctly). If it is 2, then when a file is visited, that file plus the files one "distance" removed from it are indexed, and so on.
By "distance" I mean logical proximity to the visited file, defined as "the source files that define the declarations in a header file included by the original source file". For example, if we visit foo.c and it includes bar.h, which declares classes or functions that are defined/implemented in bar.c, then bar.c is one "distance" removed from foo.c. If bar.c includes blah.h, and blah.h declares classes or functions defined in blah.c, then blah.c is one "distance" removed from bar.c and two "distances" removed from foo.c.
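To make the definition concrete, here is a small Python sketch that computes these hop counts for the foo.c example with a breadth-first search. `INCLUDES`, `DEFINED_IN`, and `distances_from` are hypothetical names, and the sketch assumes the header-to-defining-source mapping is already known, which in practice it is not without indexing.

```python
from collections import deque

# Toy model of the foo.c example (hypothetical data, not a real project):
# which headers each source includes, and which source defines each header.
INCLUDES = {"foo.c": ["bar.h"], "bar.c": ["blah.h"], "blah.c": []}
DEFINED_IN = {"bar.h": "bar.c", "blah.h": "blah.c"}


def distances_from(start: str) -> dict[str, int]:
    """Breadth-first search: number of include->definition hops from start."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        src = queue.popleft()
        for header in INCLUDES.get(src, []):
            target = DEFINED_IN.get(header)
            # Record each defining source the first time it is reached.
            if target is not None and target not in dist:
                dist[target] = dist[src] + 1
                queue.append(target)
    return dist


print(distances_from("foo.c"))  # {'foo.c': 0, 'bar.c': 1, 'blah.c': 2}
```

Under the proposed option, a setting of d > 0 would index exactly the files whose hop count is less than d.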
Of course, with the information currently available (without indexing!!) there's no way to be sure which source file defines the classes or functions declared in a given header. However, I think file names are a reasonable heuristic: almost every project of significant size (large enough to worry about reduced indexing) follows the convention that xyzzy.c (or xyzzy.cpp or xyzzy.cc) defines the classes/functions declared in xyzzy.h (or xyzzy.hpp).
So then, the algorithm for this parameter would be something like:
- server receives a request for a new source file `X`
- set `sources` to `[X]`
- for (i = distance; i > 0; --i):
  - set `newsources` to `[]`
  - for each `file` in `sources`:
    - index `file`
    - for each header found while indexing `file`, check to see if it has a corresponding source file (by name) which hasn't been indexed. If so, add it to `newsources`
  - set `sources` to `newsources`
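The loop above can be sketched in Python roughly as follows. This is a minimal sketch under assumptions: `index_near`, `index_file`, and `find_sources` are hypothetical names, not part of any existing server API, and the two callbacks stand in for the real indexer and for the name-based source lookup. It also remembers files already queued, so an include cycle can't cause a file to be indexed twice.

```python
from typing import Callable, Iterable


def index_near(
    visited: str,
    distance: int,
    index_file: Callable[[str], Iterable[str]],
    find_sources: Callable[[str], Iterable[str]],
) -> list[str]:
    """Index `visited` plus files fewer than `distance` hops away.

    Hypothetical hooks:
      index_file(path)     -> indexes one file, returns the headers it includes
      find_sources(header) -> source files matching the header by name
    distance == 0 ("index everything") is assumed to be handled elsewhere.
    Returns the files in the order they were indexed.
    """
    indexed: list[str] = []
    seen = {visited}  # files already indexed or queued for indexing
    sources = [visited]
    for _ in range(distance):
        new_sources: list[str] = []
        for path in sources:
            headers = index_file(path)
            indexed.append(path)
            for header in headers:
                for src in find_sources(header):
                    if src not in seen:
                        seen.add(src)
                        new_sources.append(src)
        sources = new_sources
    return indexed


# Demo with the foo.c / bar.c / blah.c example (toy data, not a real project).
INCLUDES = {"foo.c": ["bar.h"], "bar.c": ["blah.h"], "blah.c": []}


def demo_index(path: str) -> list[str]:
    return INCLUDES.get(path, [])


def demo_sources(header: str) -> list[str]:
    # Name-based guess: bar.h -> bar.c
    return [header.rsplit(".", 1)[0] + ".c"]


print(index_near("foo.c", 2, demo_index, demo_sources))  # ['foo.c', 'bar.c']
```

With distance 1 only foo.c is indexed, and with distance 3 blah.c is added as well. Called with distance 0, this function indexes nothing, so the "index everything" default would stay on the existing full-indexing path.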
The source-file check could be as simple as: strip the directory and the extension (everything after the final .) to get a base name, then add every file in the project whose name is that base name plus one of a list of source extensions (.c, .cpp, .cc). If we index a few too many files it's not a big deal, and most projects won't reuse the same base name anyway, since that confuses developers (and some build systems!).
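A minimal Python sketch of that check, assuming the server can walk the project directory once up front. `build_source_map`, `sources_for_header`, and `SOURCE_EXTENSIONS` are hypothetical names; the extension list is the one suggested above. Building the map once avoids rescanning the tree for every header.

```python
import os
from collections import defaultdict

# Source extensions suggested in the proposal (assumed, easy to extend).
SOURCE_EXTENSIONS = (".c", ".cpp", ".cc")


def build_source_map(root: str) -> dict[str, list[str]]:
    """Map each base name (file name minus its final extension) to sources."""
    by_base: dict[str, list[str]] = defaultdict(list)
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            # splitext splits at the final ".", e.g. "xyzzy.cc" -> ("xyzzy", ".cc")
            stem, ext = os.path.splitext(name)
            if ext in SOURCE_EXTENSIONS:
                by_base[stem].append(os.path.join(dirpath, name))
    return dict(by_base)


def sources_for_header(header: str, source_map: dict[str, list[str]]) -> list[str]:
    """Return every project source whose base name matches the header's."""
    stem = os.path.splitext(os.path.basename(header))[0]
    return source_map.get(stem, [])
```

Returning a list rather than a single path means that if two directories both contain a bar.c, both get indexed, which matches the "a few too many files is fine" trade-off.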
Setting "distance" to 2 with this algorithm should be enough to support "jump to definition" for any symbol in a visited file without indexing the whole project up front, at least in development environments that follow the most common organizational conventions. Of course, it wouldn't support accurate "list all callers of this method"-style queries unless you used distance 0 (or otherwise caused everything to be indexed).