commit-graph: add new config for changed-paths & recommend it in scalar #1983
Welcome to GitGitGadget
Hi @emilyyang-ms, and welcome to GitGitGadget, the GitHub App to send patch series to the Git mailing list from GitHub Pull Requests. Please make sure that either:
You can CC potential reviewers by adding a footer to the PR description with the following syntax:
NOTE: DO NOT copy/paste your CC list from a previous GGG PR's description, Also, it is a good idea to review the commit messages one last time, as the Git project expects them in a quite specific form:
It is in general a good idea to await the automated test ("Checks") in this Pull Request before contributing the patches, e.g. to avoid trivial issues such as unportable code.
Contributing the patches
Before you can contribute the patches, your GitHub username needs to be added to the list of permitted users. Any already-permitted user can do that, by adding a comment to your PR of the form Both the person who commented An alternative is the channel
Once on the list of permitted usernames, you can contribute the patches to the Git mailing list by adding a PR comment If you want to see what email(s) would be sent for a After you submit, GitGitGadget will respond with another comment that contains the link to the cover letter mail in the Git mailing list archive. Please make sure to monitor the discussion in that thread and to address comments and suggestions (while the comments and suggestions will be mirrored into the PR by GitGitGadget, you will still want to reply via mail). If you do not want to subscribe to the Git mailing list just to be able to respond to a mail, you can download the mbox from the Git mailing list archive (click the curl -g --user "<EMailAddress>:<Password>" \
--url "imaps://imap.gmail.com/INBOX" -T /path/to/raw.txt To iterate on your change, i.e. send a revised patch or patch series, you will first want to (force-)push to the same branch. You probably also want to modify your Pull Request description (or title). It is a good idea to summarize the revision by adding something like this to the cover letter (read: by editing the first comment on the PR, i.e. the PR description):
To send a new iteration, just add another PR comment with the contents:
Need help?
New contributors who want advice are encouraged to join git-mentoring@googlegroups.com, where volunteers who regularly contribute to Git are willing to answer newbie questions, give advice, or otherwise provide mentoring to interested contributors. You must join in order to post or view messages, but anyone can join. You may also be able to find help in real time in the developer IRC channel,
/allow
User emilyyang-ms is now allowed to use GitGitGadget. WARNING: emilyyang-ms has no public email address set on GitHub; GitGitGadget needs an email address to Cc: you on your contribution, so that you receive any feedback on the Git mailing list. Go to https://github.com/settings/profile to make your preferred email public to let GitGitGadget know which email address to use.
I left some comments in the file diff, and here is a copy of your commit message:
Commitgraph: add new config for changed-paths and recommend it in scalar
Changed-path bloom filters has been a stable feature for a few years and
it significantly improves performance of file history computation for
large repos. Currently it can be turned on by 'git commit-graph write
--changed-paths'. As one of the large repos, Microsoft Office MonoRepo
would like to get this feature on for repo developers with minimal
effort.
In this commit, we're proposing a new config option
'commitGraph.changedPaths', which acts like '--changed-paths' and always
respects command line precedence. We then add this new config as
recommended in scalar to benefit large repos.
This commit also adds corresponding doc and unit tests for the new
config.
Signed-off-by: Emily Yang <emilyyang@microsoft.com>
Mentored-by: Derrick Stolee <dstolee@microsoft.com>
A few things about this:
- Start your subject line with "commit-graph: add new..." to better match the commit-graph builtin.
- Your sign-off should be the very last line, signaling "I attest to everything before this line".
- Helped-by is more common than Mentored-by, but this is flexible and your call.
- "...filters has been" is a little awkward. Maybe "The changed-path Bloom filters feature has been stable for a few years..."
- "As one of the large repos, Microsoft Office MonoRepo would like to get this feature on for repo developers with minimal effort." This won't be an effective appeal. Instead, a more generic "Large monorepos using Git's background maintenance to build and update commit-graph files could use an easy switch to enable this feature without a foreground computation."
- Your config has a true/false/unset logic and it's similar to how the --changed-paths option has a --changed-paths/--no-changed-paths/not-present tri-state. It may be good to discuss how these work together (false config disables a previous true config but doesn't imply --no-changed-paths). You can reference the change 0087a87ba8 (commit-graph: persist existence of changed-paths, 2020-07-01) (exactly that format, as git log -1 --pretty=reference <oid> would output) for how this tri-state situation happened. It's good context for how the filters persist once they exist from a foreground write with --changed-paths.
- "This commit also adds corresponding doc and unit tests for the new config." This is a requirement of such a change, so doesn't need to be included in the message.
After another round of edits, we'll talk about getting your cover letter into shape as it will be part of your introduction to the mailing list and we'll want to CC the right folks.
@wilbaker may be interested in this PR and is another resource for getting help on Git contributions.
04b6ef3 to a42d730
There is a merge commit in this Pull Request:
Please rebase the branch and force-push.
a88f4ed to 7f486b7
I think the code change is ready to go. Just a few minor nits about the commit message and cover letter (which I expect we can pair on during our chat today):
- You'll want to use stolee@gmail.com for my email address as that's the one I use to communicate on the mailing list. (No way you'd know this in advance.)
- Your @microsoft.com email address will not be able to send messages to the mailing list or effectively read the mailing list. You'll want to set up a different, personal email address. It can be something like emily-git@gmail.com or something specific to interacting with the mailing list if you don't want to use your real address.
- We'll talk about going through the history of this area to find folks to add as cc: trailers at the end of your cover letter. I've populated a couple (the maintainer and myself) but we'll find more.
- Since you only have one change, the cover letter is going to appear as a bonus message below your commit message. It's a good opportunity to introduce yourself briefly. After you submit, I'll reply to help introduce you and will mention our pre-review.
The changed-path Bloom filters feature has proven stable and reliable over several years of use, delivering significant performance improvement for file history computation in large monorepos. Currently a user can opt-in to writing the changed-path Bloom filters using the "--changed-paths" option to "git commit-graph write". The filters will be persisted until the user drops the filters using the "--no-changed-paths" option.
Large monorepos using Git's background maintenance to build and update commit-graph files could use an easy switch to enable this feature without a foreground computation.
In this commit, we're proposing a new config option "commitGraph.changedPaths" - "true" value acts like "--changed-paths"; "false" disables a previous "true" config value but doesn't imply "--no-changed-paths". This config will always respect the precedence of command line option "--changed-paths" and "--no-changed-paths".
We also set this new config as optional recommended config in scalar to turn on this feature for large repos.
Helped-by: Derrick Stolee <stolee@gmail.com>
Signed-off-by: Emily Yang <emilyyang.git@gmail.com>
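One way to read the semantics in this commit message is as two tri-state inputs (config and command line) resolved into a single action. The sketch below is only an illustration of that reading: `resolve_filters`, the enums, and every name in it are hypothetical and are not taken from Git's source.

```c
#include <assert.h>

/* Hypothetical names for illustration; none of these exist in Git. */
enum tristate { TS_UNSET = -1, TS_FALSE = 0, TS_TRUE = 1 };

enum filter_action {
	FILTERS_KEEP_EXISTING, /* no request either way: filters on disk persist */
	FILTERS_WRITE,         /* compute and write changed-path filters */
	FILTERS_DROP           /* explicitly drop existing filters */
};

/*
 * Resolve what a write should do, following the commit message:
 *  - a command-line option, if present, always decides;
 *  - otherwise config "true" acts like --changed-paths;
 *  - config "false" only cancels an earlier "true" and does NOT drop
 *    filters, so it resolves the same way as an unset config.
 */
enum filter_action resolve_filters(enum tristate config, enum tristate cli)
{
	assert(config >= TS_UNSET && config <= TS_TRUE);
	assert(cli >= TS_UNSET && cli <= TS_TRUE);

	if (cli == TS_TRUE)
		return FILTERS_WRITE;
	if (cli == TS_FALSE)
		return FILTERS_DROP;
	if (config == TS_TRUE)
		return FILTERS_WRITE;
	return FILTERS_KEEP_EXISTING;
}
```

Under this reading, `resolve_filters(TS_FALSE, TS_UNSET)` keeps existing filters rather than dropping them, which is the "doesn't imply --no-changed-paths" case.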
7f486b7 to 90b271e
/preview
Preview email sent as pull.1983.git.1760038995218.gitgitgadget@gmail.com
/submit
Submitted as pull.1983.git.1760043710502.gitgitgadget@gmail.com To fetch this version into
To fetch this version to local tag
On the Git mailing list, Junio C Hamano wrote:
"Emily Yang via GitGitGadget" <gitgitgadget@gmail.com> writes:
> From: Emily Yang <emilyyang.git@gmail.com>
>
> The changed-path Bloom filters feature has proven stable and reliable
> over several years of use, delivering significant performance
> improvement for file history computation in large monorepos. Currently
> a user can opt-in to writing the changed-path Bloom filters using the
> "--changed-paths" option to "git commit-graph write". The filters will
> be persisted until the user drops the filters using the
> "--no-changed-paths" option.
Makes sense.
> Large monorepos using Git's background maintenance to build and update
> commit-graph files could use an easy switch to enable this feature
> without a foreground computation.
Again makes sense.
> In this commit, we're proposing a new
> config option "commitGraph.changedPaths" - "true" value acts like
> "--changed-paths"; "false" disables a previous "true" config value but
> doesn't imply "--no-changed-paths".
The way the above is phrased is so unusual that I am afraid it would
confuse readers.
A configuration variable gives users an opportunity to override the
hardcoded default (in this case, --no-changed-paths has been the
traditional default, and commitGraph.changedPaths=true would make
us pretend as if --changed-paths were given from the command line).
So if we were to have this configuration variable, setting it false
MUST make it pretend as if --no-changed-paths were given from the
command line, and MUST continue to do so even if in some future we
changed the hardcoded default to be "true" (i.e., unless the user
says commitGraph.changedPaths=false in the configuration and/or declines
with "--no-changed-paths" from the command line, we will record the
changed paths filter by default).
Setting commitGraph.changedPaths to true should mean that the
"git commit-graph write" command behaves as if --changed-paths
were given immediately after that "write", so that an end-user
command
$ git commit-graph write
should behave as if it was written like this
$ git commit-graph write --changed-paths
and
$ git commit-graph write --no-changed-paths
should behave as if it was written like this
$ git commit-graph write --changed-paths --no-changed-paths
i.e. allowing the command line --no-changed-paths to override it.
Setting commitGraph.changedPaths to false should similarly mean that
"--no-changed-paths" implicitly is added immediately after "write",
meaning that
$ git commit-graph write
should behave as if it was written like this
$ git commit-graph write --no-changed-paths
As it is the default not to write changed-paths filter, this has no
effect, but I would say it still "implies" --no-changed-paths, and I
hope you'd agree once you imagine a hypothetical future in which the
default for "git commit-graph write" is to write changed-paths
filter by default.
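The "behaves as if the option were given immediately after write" rule can be pictured as rewriting the argument list before normal option parsing. The following is a minimal sketch under that assumption; `insert_implied_option` is a hypothetical helper written for this illustration, not something Git provides.

```c
#include <assert.h>
#include <string.h>

/*
 * Hypothetical helper, not Git code. Builds the "as if" argument list:
 * the option implied by the config value is placed right after the
 * "write" subcommand, so any option the user typed comes later and wins
 * under ordinary last-one-wins parsing.
 *
 * config: 1 = true, 0 = false, -1 = unset.
 * out must have room for argc + 1 entries. Returns the new count.
 */
int insert_implied_option(int config, int argc, const char **argv,
			  const char **out)
{
	int n = 0, i;

	assert(argc >= 1 && !strcmp(argv[0], "write"));

	out[n++] = argv[0];
	if (config == 1)
		out[n++] = "--changed-paths";
	else if (config == 0)
		out[n++] = "--no-changed-paths";
	/* The user's own options follow and therefore take precedence. */
	for (i = 1; i < argc; i++)
		out[n++] = argv[i];
	return n;
}
```

With the config set to true and a user command of `write --no-changed-paths`, the result is `write --changed-paths --no-changed-paths`, matching the example in the text.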
> This config will always respect the
> precedence of command line option "--changed-paths" and
> "--no-changed-paths".
This is a bit of an unusual way to phrase this, but I think it makes sense
for the configuration variable to be overridden by the command line
option, as that is the bog-standard way configuration variables and
command line options interact with each other; it is so standard
that it is probably not even worth saying it.
> We also set this new config as optional recommended config in scalar to
> turn on this feature for large repos.
Great. Yes, from the start of the description above, anybody who is
aware of the "scalar" effort would be anticipating this conclusion.
> Helped-by: Derrick Stolee <stolee@gmail.com>
> Signed-off-by: Emily Yang <emilyyang.git@gmail.com>
> ---
> commit-graph: add new config for changed-paths & recommend it in scalar
>
> Hello,
>
> I'm Emily and I'm interested in contributing to Git. This is my first
> contribution to Git, super excited!
>
> I'm from Microsoft and spend most of my time working in the Office
> MonoRepo (OMR, one of the largest repos in the world). Recently I've
> been working with Derrick Stolee on Git performance related topics. We'd
> love to propose a small enhancement on the existing changed-paths Bloom
> filters feature to benefit large repos like OMR. Please kindly review
> the code and provide your feedback!
>
> Thanks, Emily
>
> Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-1983%2Femilyyang-ms%2Fchanged-paths-config-v1
> Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-1983/emilyyang-ms/changed-paths-config-v1
> Pull-Request: https://github.com/gitgitgadget/git/pull/1983
>
> Documentation/config/commitgraph.adoc | 8 +++++
> builtin/commit-graph.c | 2 ++
> scalar.c | 1 +
> t/t5318-commit-graph.sh | 44 +++++++++++++++++++++++++++
> 4 files changed, 55 insertions(+)
>
> diff --git a/Documentation/config/commitgraph.adoc b/Documentation/config/commitgraph.adoc
> index 7f8c9d6638..c540e8a43d 100644
> --- a/Documentation/config/commitgraph.adoc
> +++ b/Documentation/config/commitgraph.adoc
> @@ -8,6 +8,14 @@ commitGraph.maxNewFilters::
> Specifies the default value for the `--max-new-filters` option of `git
> commit-graph write` (c.f., linkgit:git-commit-graph[1]).
>
> +commitGraph.changedPaths::
> + If true, then `git commit-graph write` will compute and write
> + changed-path Bloom filters by default, equivalent to passing
> + `--changed-paths`. If false or unset, changed-path Bloom filters
> + will only be written when explicitly requested via `--changed-paths`.
> + Command-line options always take precedence over this configuration.
> + Defaults to unset.
> +
> commitGraph.readChangedPaths::
> Deprecated. Equivalent to commitGraph.changedPathsVersion=-1 if true, and
> commitGraph.changedPathsVersion=0 if false. (If commitGraph.changedPathVersion
> diff --git a/builtin/commit-graph.c b/builtin/commit-graph.c
> index fe3ebaadad..d62005edc0 100644
> --- a/builtin/commit-graph.c
> +++ b/builtin/commit-graph.c
> @@ -210,6 +210,8 @@ static int git_commit_graph_write_config(const char *var, const char *value,
> {
> if (!strcmp(var, "commitgraph.maxnewfilters"))
> write_opts.max_new_filters = git_config_int(var, value, ctx->kvi);
> + else if (!strcmp(var, "commitgraph.changedpaths"))
> + opts.enable_changed_paths = git_config_bool(var, value) ? 1 : -1;
This is iffy.
Unless the way the existing command line parser figures out if the user
wants or does not want to use the feature is so screwed up, you
shouldn't have to do any such thing.
Why do you need to special case 'false' this way? The usual
practice is
* First, you initialize the variable "enable_changed_paths" with the
hardcoded default. In this case, as changed-paths is not written
by default, you'd initialize it to 0 (not -1).
* Then you read from the configuration variables to update it. If
you see commitgraph.changedPaths configuration, you take its
value (either 0 or 1 as it is a Boolean variable) and overwrite
the hardcoded default in the "enable_changed_paths" variable. Otherwise
you leave "enable_changed_paths" as-is.
* If you also have environment variable override, then you see if
there is the environment variable you care about, and if so,
override "enable_changed_paths" with its value. Otherwise you leave
"enable_changed_paths" as-is.
* Finally you read from the command line options using
parse_options(). If there are command line options given,
"enable_changed_paths" would be overridden again.
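The four steps above can be sketched as one function that applies each layer in order, with later layers overwriting earlier ones. This is a simplified, self-contained illustration: `parse_bool_simple` and `resolve_enable_changed_paths` are hypothetical names, the Boolean parsing is much narrower than Git's, and the environment value is passed in as a plain string rather than read from any real variable.

```c
#include <assert.h>
#include <string.h>

/* Simplified stand-in for Boolean parsing; an assumption, not Git's parser. */
int parse_bool_simple(const char *s)
{
	return !strcmp(s, "true") || !strcmp(s, "yes") ||
	       !strcmp(s, "on") || !strcmp(s, "1");
}

/*
 * Hypothetical sketch of the layering: default, then config, then
 * environment, then command line. A NULL config_value or env_value
 * means "not set", so that layer leaves the value as-is.
 */
int resolve_enable_changed_paths(const char *config_value,
				 const char *env_value,
				 int argc, const char **argv)
{
	int enable = 0; /* 1. hardcoded default: filters are not written */
	int i;

	assert(argc >= 0);

	if (config_value) /* 2. configuration overrides the default */
		enable = parse_bool_simple(config_value);
	if (env_value)    /* 3. environment overrides configuration */
		enable = parse_bool_simple(env_value);
	for (i = 0; i < argc; i++) { /* 4. command line overrides everything */
		if (!strcmp(argv[i], "--changed-paths"))
			enable = 1;
		else if (!strcmp(argv[i], "--no-changed-paths"))
			enable = 0;
	}
	return enable;
}
```

Because every layer writes a plain 0 or 1 into the same variable, no special value such as -1 is needed to represent a "false" configuration in this scheme.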
If the way the existing parser sets up enable_changed_paths is
screwed up and does not follow the above pattern (I didn't check),
perhaps you'd need a preliminary clean-up patch before adding this
new feature.
Thanks.
Hello,
I'm Emily and I'm interested in contributing to Git. This is my first contribution to Git, super excited!
I'm from Microsoft and spend most of my time working in the Office MonoRepo (OMR, one of the largest repos in the world). Recently I've been working with Derrick Stolee on Git performance related topics. We'd love to propose a small enhancement on the existing changed-paths Bloom filters feature to benefit large repos like OMR. Please kindly review the code and provide your feedback!
Thanks,
Emily
cc: gitster@pobox.com
cc: stolee@gmail.com
cc: me@ttaylorr.com
cc: ps@pks.im
cc: newren@gmail.com