@emilyyang-ms emilyyang-ms commented Oct 7, 2025

Hello,

I'm Emily and I'm interested in contributing to Git. This is my first contribution to Git, super excited!

I'm from Microsoft and spend most of my time working in the Office MonoRepo (OMR, one of the largest repos in the world). Recently I've been working with Derrick Stolee on Git performance related topics. We'd love to propose a small enhancement on the existing changed-paths Bloom filters feature to benefit large repos like OMR. Please kindly review the code and provide your feedback!

Thanks,
Emily

cc: gitster@pobox.com
cc: stolee@gmail.com
cc: me@ttaylorr.com
cc: ps@pks.im
cc: newren@gmail.com

gitgitgadget bot commented Oct 7, 2025

Welcome to GitGitGadget

Hi @emilyyang-ms, and welcome to GitGitGadget, the GitHub App to send patch series to the Git mailing list from GitHub Pull Requests.

Please make sure that either:

  • Your Pull Request has a good description, if it consists of multiple commits, as it will be used as cover letter.
  • Your Pull Request description is empty, if it consists of a single commit, as the commit message should be descriptive enough by itself.

You can CC potential reviewers by adding a footer to the PR description with the following syntax:

CC: Revi Ewer <revi.ewer@example.com>, Ill Takalook <ill.takalook@example.net>

NOTE: DO NOT copy/paste your CC list from a previous GGG PR's description,
because it will result in a malformed CC list on the mailing list. See
example.

Also, it is a good idea to review the commit messages one last time, as the Git project expects them in a quite specific form:

  • the lines should not exceed 76 columns,
  • the first line should be like a header and typically start with a prefix like "tests:" or "revisions:" to state which subsystem the change is about, and
  • the commit messages' body should be describing the "why?" of the change.
  • Finally, the commit messages should end in a Signed-off-by: line matching the commits' author.

It is in general a good idea to await the automated test ("Checks") in this Pull Request before contributing the patches, e.g. to avoid trivial issues such as unportable code.

Contributing the patches

Before you can contribute the patches, your GitHub username needs to be added to the list of permitted users. Any already-permitted user can do that, by adding a comment to your PR of the form /allow. A good way to find other contributors is to locate recent pull requests where someone has been /allowed:

Both the person who commented /allow and the PR author are able to /allow you.

An alternative is the channel #git-devel on the Libera Chat IRC network:

<newcontributor> I've just created my first PR, could someone please /allow me? https://github.com/gitgitgadget/git/pull/12345
<veteran> newcontributor: it is done
<newcontributor> thanks!

Once on the list of permitted usernames, you can contribute the patches to the Git mailing list by adding a PR comment /submit.

If you want to see what email(s) would be sent for a /submit request, add a PR comment /preview to have the email(s) sent to you. You must have a public GitHub email address for this. Note that any reviewers CC'd via the list in the PR description will not actually be sent emails.

After you submit, GitGitGadget will respond with another comment that contains the link to the cover letter mail in the Git mailing list archive. Please make sure to monitor the discussion in that thread and to address comments and suggestions (while the comments and suggestions will be mirrored into the PR by GitGitGadget, you will still want to reply via mail).

If you do not want to subscribe to the Git mailing list just to be able to respond to a mail, you can download the mbox from the Git mailing list archive (click the (raw) link), then import it into your mail program. If you use GMail, you can do this via:

curl -g --user "<EMailAddress>:<Password>" \
    --url "imaps://imap.gmail.com/INBOX" -T /path/to/raw.txt

To iterate on your change, i.e. send a revised patch or patch series, you will first want to (force-)push to the same branch. You probably also want to modify your Pull Request description (or title). It is a good idea to summarize the revision by adding something like this to the cover letter (read: by editing the first comment on the PR, i.e. the PR description):

Changes since v1:
- Fixed a typo in the commit message (found by ...)
- Added a code comment to ... as suggested by ...
...

To send a new iteration, just add another PR comment with the contents: /submit.

Need help?

New contributors who want advice are encouraged to join git-mentoring@googlegroups.com, where volunteers who regularly contribute to Git are willing to answer newbie questions, give advice, or otherwise provide mentoring to interested contributors. You must join in order to post or view messages, but anyone can join.

You may also be able to find help in real time in the developer IRC channel, #git-devel on Libera Chat. Remember that IRC does not support offline messaging, so if you send someone a private message and log out, they cannot respond to you. The scrollback of #git-devel is archived, though.

@gitgitgadget gitgitgadget bot added the new user label Oct 7, 2025

dscho commented Oct 7, 2025

/allow

gitgitgadget bot commented Oct 7, 2025

User emilyyang-ms is now allowed to use GitGitGadget.

WARNING: emilyyang-ms has no public email address set on GitHub; GitGitGadget needs an email address to Cc: you on your contribution, so that you receive any feedback on the Git mailing list. Go to https://github.com/settings/profile to make your preferred email public to let GitGitGadget know which email address to use.

@derrickstolee derrickstolee left a comment

I left some comments in the file diff, and here is a copy of your commit message:

Commitgraph: add new config for changed-paths and recommend it in scalar

Changed-path bloom filters has been a stable feature for a few years and
it significantly improves performance of file history computation for
large repos. Currently it can be turned on by 'git commit-graph write
--changed-paths'. As one of the large repos, Microsoft Office MonoRepo
would like to get this feature on for repo developers with minimal
effort.

In this commit, we're proposing a new config option
'commitGraph.changedPaths', which acts like '--changed-paths' and always
respects command line precedence. We then add this new config as
recommended in scalar to benefit large repos.

This commit also adds corresponding doc and unit tests for the new
config.

Signed-off-by: Emily Yang <emilyyang@microsoft.com>
Mentored-by: Derrick Stolee <dstolee@microsoft.com>

A few things about this:

  • Start your subject line with "commit-graph: add new..." to better match the commit-graph builtin.
  • Your sign-off should be the very last line, signaling "I attest to everything before this line".
  • Helped-by is more common than Mentored-by, but this is flexible and your call.
  • "...filters has been" is a little awkward. Maybe "The changed-path Bloom filters feature has been stable for a few years..."
  • "As one of the large repos, Microsoft Office MonoRepo would like to get this feature on for repo developers with minimal effort." This won't be an effective appeal. Instead, try something more generic, such as "Large monorepos using Git's background maintenance to build and update commit-graph files could use an easy switch to enable this feature without a foreground computation."
  • Your config has a true/false/unset logic and it's similar to how the --changed-paths option has a --changed-paths/--no-changed-paths/not-present tri-state. It may be good to discuss how these work together (false config disables a previous true config but doesn't imply --no-changed-paths). You can reference the change 0087a87ba8 (commit-graph: persist existence of changed-paths, 2020-07-01) (exactly that format, as git log -1 --pretty=reference <oid> would output) for how this tri-state situation happened. It's good context for how the filters persist once they exist from a foreground write with --changed-paths.
  • "This commit also adds corresponding doc and unit tests for the new
    config." This is a requirement of such a change, so doesn't need to be included in the message.

After another round of edits, we'll talk about getting your cover letter into shape as it will be part of your introduction to the mailing list and we'll want to CC the right folks.

@derrickstolee

@wilbaker may be interested in this PR and is another resource for getting help on Git contributions.

@emilyyang-ms emilyyang-ms force-pushed the changed-paths-config branch 2 times, most recently from 04b6ef3 to a42d730, on October 8, 2025 18:23
@emilyyang-ms emilyyang-ms changed the title from "Commitgraph: add new config for changed-paths and recommend it in scalar" to "commit-graph: add new config for changed-paths & recommend it in scalar" on Oct 8, 2025

gitgitgadget bot commented Oct 8, 2025

There is a merge commit in this Pull Request:

a88f4edeeef4c79f8db791058414032614c36dbd

Please rebase the branch and force-push.

@derrickstolee derrickstolee left a comment

I think the code change is ready to go. Just a few minor nits about the commit message and cover letter (which I expect we can pair on during our chat today):

  • You'll want to use stolee@gmail.com for my email address as that's the one I use to communicate on the mailing list. (No way you'd know this in advance.)
  • Your @microsoft.com email address will not be able to send messages to the mailing list or effectively read the mailing list. You'll want to set up a different, personal email address. It can be something like emily-git@gmail.com or something specific to interacting with the mailing list if you don't want to use your real address.
  • We'll talk about going through the history of this area to find folks to add as cc: trailers at the end of your cover letter. I've populated a couple (the maintainer and myself) but we'll find more.
  • Since you only have one change, the cover letter is going to appear as a bonus message below your commit message. It's a good opportunity to introduce yourself briefly. After you submit, I'll reply to help introduce you and will mention our pre-review.

The changed-path Bloom filters feature has proven stable and reliable
over several years of use, delivering significant performance
improvement for file history computation in large monorepos. Currently
a user can opt-in to writing the changed-path Bloom filters using the
"--changed-paths" option to "git commit-graph write". The filters will
be persisted until the user drops the filters using the
"--no-changed-paths" option.

Large monorepos using Git's background maintenance to build and update
commit-graph files could use an easy switch to enable this feature
without a foreground computation. In this commit, we're proposing a new
config option "commitGraph.changedPaths" - "true" value acts like
"--changed-paths"; "false" disables a previous "true" config value but
doesn't imply "--no-changed-paths". This config will always respect the
precedence of command line option "--changed-paths" and
"--no-changed-paths".

We also set this new config as optional recommended config in scalar to
turn on this feature for large repos.

Helped-by: Derrick Stolee <stolee@gmail.com>
Signed-off-by: Emily Yang <emilyyang.git@gmail.com>
@emilyyang-ms

/preview

gitgitgadget bot commented Oct 9, 2025

Preview email sent as pull.1983.git.1760038995218.gitgitgadget@gmail.com

@emilyyang-ms

/submit

gitgitgadget bot commented Oct 9, 2025

Submitted as pull.1983.git.1760043710502.gitgitgadget@gmail.com

To fetch this version into FETCH_HEAD:

git fetch https://github.com/gitgitgadget/git/ pr-1983/emilyyang-ms/changed-paths-config-v1

To fetch this version to local tag pr-1983/emilyyang-ms/changed-paths-config-v1:

git fetch --no-tags https://github.com/gitgitgadget/git/ tag pr-1983/emilyyang-ms/changed-paths-config-v1

gitgitgadget bot commented Oct 9, 2025

On the Git mailing list, Junio C Hamano wrote (reply to this):

"Emily Yang via GitGitGadget" <gitgitgadget@gmail.com> writes:

> From: Emily Yang <emilyyang.git@gmail.com>
>
> The changed-path Bloom filters feature has proven stable and reliable
> over several years of use, delivering significant performance
> improvement for file history computation in large monorepos. Currently
> a user can opt-in to writing the changed-path Bloom filters using the
> "--changed-paths" option to "git commit-graph write". The filters will
> be persisted until the user drops the filters using the
> "--no-changed-paths" option.

Makes sense.

> Large monorepos using Git's background maintenance to build and update
> commit-graph files could use an easy switch to enable this feature
> without a foreground computation.

Again makes sense.

> In this commit, we're proposing a new
> config option "commitGraph.changedPaths" - "true" value acts like
> "--changed-paths"; "false" disables a previous "true" config value but
> doesn't imply "--no-changed-paths".

The way the above is phrased is so unusual that I am afraid it would
confuse readers.

A configuration variable gives users an opportunity to override the
hardcoded default (in this case, --no-changed-paths has been the
traditional default, and commitGraph.changedPaths=true would make
us pretend as if --changed-paths were given from the command line).
So if we were to have this configuration variable, setting it false
MUST make it pretend as if --no-changed-paths were given from the
command line, and MUST continue to do so even if in some future we
changed the hardcoded default to be "true" (i.e., unless the user
says commitGraph.changedPaths=false in the configuration and/or
declines with "--no-changed-paths" from the command line, we will
record the changed paths filter by default).

Setting commitGraph.changedPaths to true should mean that the
"git commit-graph write" command behaves as if --changed-paths
were given immediately after that "write", so that an end-user
command

    $ git commit-graph write

should behave as if it was written like this

    $ git commit-graph write --changed-paths

and

    $ git commit-graph write --no-changed-paths

should behave as if it was written like this

    $ git commit-graph write --changed-paths --no-changed-paths

i.e. allowing the command line --no-changed-paths to override it.

Setting commitGraph.changedPaths to false should similarly mean that
"--no-changed-paths" implicitly is added immediately after "write",
meaning that 

    $ git commit-graph write

should behave as if it was written like this

    $ git commit-graph write --no-changed-paths

As it is the default not to write changed-paths filters, this has no
effect, but I would say it still "implies" --no-changed-paths, and I
hope you'd agree once you imagine a hypothetical future in which
"git commit-graph write" writes changed-paths filters by default.

> This config will always respect the
> precedence of command line option "--changed-paths" and
> "--no-changed-paths".

This is a bit of an unusual way to phrase it, but I think it makes
sense for the configuration variable to be overridden by the command
line option, as that is the bog-standard way configuration variables
and command line options interact with each other; it is so standard
that it is probably not even worth saying it.

> We also set this new config as optional recommended config in scalar to
> turn on this feature for large repos.

Great.  Yes, from the start of the description above, anybody who is
aware of the "scalar" effort would be anticipating this conclusion.

> Helped-by: Derrick Stolee <stolee@gmail.com>
> Signed-off-by: Emily Yang <emilyyang.git@gmail.com>
> ---
>     commit-graph: add new config for changed-paths & recommend it in scalar
>     
>     Hello,
>     
>     I'm Emily and I'm interested in contributing to Git. This is my first
>     contribution to Git, super excited!
>     
>     I'm from Microsoft and spend most of my time working in the Office
>     MonoRepo (OMR, one of the largest repos in the world). Recently I've
>     been working with Derrick Stolee on Git performance related topics. We'd
>     love to propose a small enhancement on the existing changed-paths Bloom
>     filters feature to benefit large repos like OMR. Please kindly review
>     the code and provide your feedback!
>     
>     Thanks, Emily
>
> Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-1983%2Femilyyang-ms%2Fchanged-paths-config-v1
> Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-1983/emilyyang-ms/changed-paths-config-v1
> Pull-Request: https://github.com/gitgitgadget/git/pull/1983
>
>  Documentation/config/commitgraph.adoc |  8 +++++
>  builtin/commit-graph.c                |  2 ++
>  scalar.c                              |  1 +
>  t/t5318-commit-graph.sh               | 44 +++++++++++++++++++++++++++
>  4 files changed, 55 insertions(+)
>
> diff --git a/Documentation/config/commitgraph.adoc b/Documentation/config/commitgraph.adoc
> index 7f8c9d6638..c540e8a43d 100644
> --- a/Documentation/config/commitgraph.adoc
> +++ b/Documentation/config/commitgraph.adoc
> @@ -8,6 +8,14 @@ commitGraph.maxNewFilters::
>  	Specifies the default value for the `--max-new-filters` option of `git
>  	commit-graph write` (c.f., linkgit:git-commit-graph[1]).
>  
> +commitGraph.changedPaths::
> +	If true, then `git commit-graph write` will compute and write
> +	changed-path Bloom filters by default, equivalent to passing
> +	`--changed-paths`. If false or unset, changed-path Bloom filters
> +	will only be written when explicitly requested via `--changed-paths`.
> +	Command-line options always take precedence over this configuration.
> +	Defaults to unset.
> +
>  commitGraph.readChangedPaths::
>  	Deprecated. Equivalent to commitGraph.changedPathsVersion=-1 if true, and
>  	commitGraph.changedPathsVersion=0 if false. (If commitGraph.changedPathVersion
> diff --git a/builtin/commit-graph.c b/builtin/commit-graph.c
> index fe3ebaadad..d62005edc0 100644
> --- a/builtin/commit-graph.c
> +++ b/builtin/commit-graph.c
> @@ -210,6 +210,8 @@ static int git_commit_graph_write_config(const char *var, const char *value,
>  {
>  	if (!strcmp(var, "commitgraph.maxnewfilters"))
>  		write_opts.max_new_filters = git_config_int(var, value, ctx->kvi);
> +	else if (!strcmp(var, "commitgraph.changedpaths"))
> +		opts.enable_changed_paths = git_config_bool(var, value) ? 1 : -1;

This is iffy.

Unless the way the existing command line parser figures out if the
user wants or does not want to use the feature is so screwed up, you
shouldn't have to do any such thing.

Why do you need to special case 'false' this way?  The usual
practice is

 * First, you initialize the variable "enable_changed_paths" with the
   hardcoded default.  In this case, as changed-paths is not written
   by default, you'd initialize it to 0 (not -1).

 * Then you read from the configuration variables to update it.  If
   you see commitgraph.changedPaths configuration, you take its
   value (either 0 or 1 as it is a Boolean variable) and overwrite
   the hardcoded default in the "enable_changed_paths" variable.  Otherwise
   you leave "enable_changed_paths" as-is.

 * If you also have an environment variable override, then you see if
   there is the environment variable you care about, and if so,
   override "enable_changed_paths" with its value.  Otherwise you leave
   "enable_changed_paths" as-is.

 * Finally you read from the command line options using
   parse_options().  If there are command line options given,
   "enable_changed_paths" would be overridden again.

If the way the existing parser sets up enable_changed_paths is
screwed up and does not follow the above pattern (I didn't check),
perhaps you'd need a preliminary clean-up patch before adding this
new feature.

Thanks.
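
The layering described in the review above (hardcoded default, then configuration, then environment, then command line) can be sketched roughly as follows. This is a minimal, self-contained illustration and not Git's actual code: `resolve_changed_paths()` and `parse_bool()` are hypothetical names, the string parsing is a simplified stand-in for Git's config and option parsers, and the environment override is included only because the review mentions it as a possibility.

```c
#include <assert.h>
#include <string.h>

/*
 * Simplified stand-in for a Boolean parser (hypothetical helper):
 * only "true" and "1" count as true here.
 */
int parse_bool(const char *s)
{
	return !strcmp(s, "true") || !strcmp(s, "1");
}

/*
 * Hypothetical sketch of the usual precedence chain. config_value and
 * env_value are NULL when unset; argv holds only the option strings.
 * Returns 1 if changed-path filters should be written, 0 otherwise.
 */
int resolve_changed_paths(const char *config_value, const char *env_value,
			  int argc, const char **argv)
{
	int enable = 0;	/* 1. hardcoded default: do not write filters */
	int i;

	assert(argc == 0 || argv != NULL);

	/* 2. configuration overrides the default, whether true or false */
	if (config_value)
		enable = parse_bool(config_value);

	/* 3. environment (if such an override existed) overrides config */
	if (env_value)
		enable = parse_bool(env_value);

	/* 4. command line overrides everything; the last option wins */
	for (i = 0; i < argc; i++) {
		if (!strcmp(argv[i], "--changed-paths"))
			enable = 1;
		else if (!strcmp(argv[i], "--no-changed-paths"))
			enable = 0;
	}

	return enable;
}
```

In this model there is no special case for a false configuration value: it simply stores 0, the same result `--no-changed-paths` would produce, and a later `--changed-paths` on the command line still overrides it.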
