[Performance] Do not perform unnecessary operations in makeblastdb #686
The SequenceServer MAKEBLASTDB wrapper was working in two steps:
First, #scan looked for unformatted DBs and DBs that may require reformatting, and stored them in instance variables.
Second, each operation relied on #scan being run beforehand to populate the instance variables, and used these values to perform listing, formatting and reformatting operations.
When SequenceServer.init was invoked (any time the web server starts or the CLI binary is launched), it called makeblastdb.scan regardless of whether it was going to format or reformat any databases. This was rather slow on large database dirs (I saw upwards of a minute on a large dir).
This change refactors the MAKEBLASTDB wrapper to scan for DBs to format or reformat only when it is actually about to perform one of these operations.
The class no longer relies on #scan being run beforehand. It invokes the data gathering methods lazily (i.e. only when the data is required), so it does not perform any slow operations when they are not necessary.
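For illustration, the lazy approach can be sketched roughly as below. This is not the actual SequenceServer code: the class name `LazyDatabaseFormatter`, its methods, and the `scan_count` counter are hypothetical, and the rule that a `.fa` file counts as formatted when a sibling `.nin` or `.pin` file exists is a simplifying assumption.

```ruby
# Hypothetical sketch of lazy, memoized scanning. Names are illustrative
# and do not mirror SequenceServer's real MAKEBLASTDB class.
class LazyDatabaseFormatter
  # Exposed only so the sketch can show how often the slow scan ran.
  attr_reader :scan_count

  # Constructing the object never touches the disk.
  def initialize(database_dir)
    @database_dir = database_dir
    @scan_count = 0
  end

  # Triggers the scan on first use, then reuses the cached result.
  def any_to_format?
    !unformatted_fastas.empty?
  end

  # Placeholder for the real formatting step; returns the list of paths.
  def format
    unformatted_fastas.each { |path| puts "would format #{path}" }
  end

  private

  # Memoized accessor: the directory walk happens at most once, and only
  # when some operation actually needs the list.
  def unformatted_fastas
    @unformatted_fastas ||= scan_for_unformatted
  end

  # The slow part: walk the directory tree looking for FASTA files that
  # have no index file next to them (simplified assumption).
  def scan_for_unformatted
    @scan_count += 1
    Dir.glob(File.join(@database_dir, '**', '*.fa')).reject do |fasta|
      File.exist?("#{fasta}.nin") || File.exist?("#{fasta}.pin")
    end
  end
end
```

With this shape, code that only constructs the object (for example during start-up) pays nothing for the scan, while `any_to_format?` and `format` share a single directory walk between them.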