Skip to content

Conversation

ezekielnewren
Copy link

@ezekielnewren ezekielnewren commented Sep 7, 2025

Changes since v5.

  • Address review feedback on commit messages.
  • Drop commit "xdiff: delete rchg aliasing"
  • Use DISCARD/KEEP/INVESTIGATE instead of NONE/SOME/TOO_MANY
  • Fix the word wrapping in the comments of xprepare.c

Cleanup of the functions xdl_cleanup_records() and xdl_clean_mmatch() is out of scope for this patch series. The changes to them are incidental to explaining why 'char rchg' is refactored to 'bool changed'.

Changes since v4.

  • Make it clear that the field xdfile_t.rchg (now 'xdfile_t.changed') is distinct from the local variables dis1, dis2 (now 'matches1', 'matches2').
  • Use NONE, SOME, TOO_MANY instead of NO, YES, MAYBE.
  • Use bool literals for xdfile_t.changed.

Changes since v3.

  • Address review feedback.
  • Split the deletion of xdl_get_rec() into 2 commits.
  • Move NO, YES, MAYBE into xprepare.c, and use bool literals.
  • refactor 'char rchg' to 'bool changed'

Changes since v2.

  • No patch changes, just resending to get patch 9 to show up on the mailing list.
  • A few tweaks to the cover letter.

Changes since v1, to address review feedback.

  • Only include the clean up patches; The remaining patches will be split into a separate series.
  • Commit message clarifications.
  • Minor style cleanups.
  • Performance impacts included in commit message of patch 8.

Relevant part of the original cover letter follows:

Before:

typedef struct s_xrecord {
	struct s_xrecord *next;
	char const *ptr;
	long size;
	unsigned long ha;
} xrecord_t;

typedef struct s_xdfile {
	chastore_t rcha;
	long nrec;
	unsigned int hbits;
	xrecord_t **rhash;
	long dstart, dend;
	xrecord_t **recs;
	char *rchg;
	long *rindex;
	long nreff;
	unsigned long *ha;
} xdfile_t;

After cleanup:

typedef struct s_xrecord {
	char const *ptr;
	long size;
	unsigned long ha;
} xrecord_t;

typedef struct s_xdfile {
	xrecord_t *recs;
	long nrec;
	long dstart, dend;
	bool *changed;
	long *rindex;
	long nreff;
} xdfile_t;

===

cc: Elijah Newren newren@gmail.com
cc: Phillip Wood phillip.wood123@gmail.com
cc: Ben Knoble ben.knoble@gmail.com
cc: Jeff King peff@peff.net
cc: "Kristoffer Haugsbakk" kristofferhaugsbakk@fastmail.com

Copy link

There are issues in commit d322e13:
xdiff: delete local variables and initialize/free xdfile_t directly
Commit not signed off

Copy link

There are issues in commit 2e3f0c1:
xdiff: delete xdl_get_rec() in xemit
Commit not signed off

Copy link

There are issues in commit 97c2453:
xdiff: delete struct diffdata_t
Commit not signed off

Copy link

There are issues in commit e852992:
xdiff: delete redundant array xdfile_t.ha
Commit not signed off

Copy link

There are issues in commit 3e5fc55:
xdiff: delete fields ha, line, size in xdlclass_t in favor of an xrecord_t
Commit checks stopped - the message is too short
Commit not signed off

Copy link

There are issues in commit 7ad3e4d:
xdiff: delete chastore from xdfile_t, view with --color-words
Commit not signed off

Copy link

There are issues in commit 85e79ce:
xdiff: treat xdfile_t.rchg like an enum
Commit not signed off

Copy link

There are issues in commit 5286faa:
compat/rust_types.h: define rust primitive types
Commit not signed off

Copy link

There are issues in commit ce227c5:
xdiff: include compat/rust_types.h
Commit checks stopped - the message is too short
Commit not signed off

Copy link

There are issues in commit 3e7f722:
xdiff: make xrecord_t.ptr a u8 instead of char
Commit checks stopped - the message is too short
Commit not signed off

Copy link

There are issues in commit 6269cd6:
xdiff: make xrecord_t.size a usize instead of long
Commit checks stopped - the message is too short
Commit not signed off

Copy link

There are issues in commit 20cbe5d:
xdiff: split xrecord_t.ha into line_hash and minimal_perfect_hash
Commit checks stopped - the message is too short
Commit not signed off

Copy link

There are issues in commit e7b31e7:
xdiff: make xdfile_t.nrec a usize instead of long
Commit checks stopped - the message is too short
Commit not signed off

Copy link

There are issues in commit 8871ee0:
xdiff: make xdfile_t.nreff a usize instead of long
Commit checks stopped - the message is too short
Commit not signed off

Copy link

There are issues in commit eb77b2a:
xdiff: change the types of dstart, dend, rchg, and rindex in xdfile_t
Commit checks stopped - the message is too short
Commit not signed off

@ezekielnewren ezekielnewren force-pushed the use_rust_types_in_xdiff branch from eb77b2a to 00401e7 Compare September 7, 2025 18:50
@ezekielnewren
Copy link
Author

/submit

Copy link

Submitted as pull.2048.git.git.1757274320.gitgitgadget@gmail.com

To fetch this version into FETCH_HEAD:

git fetch https://github.com/gitgitgadget/git/ pr-git-2048/ezekielnewren/use_rust_types_in_xdiff-v1

To fetch this version to local tag pr-git-2048/ezekielnewren/use_rust_types_in_xdiff-v1:

git fetch --no-tags https://github.com/gitgitgadget/git/ tag pr-git-2048/ezekielnewren/use_rust_types_in_xdiff-v1

@@ -0,0 +1,28 @@
#ifndef COMPAT_RUST_TYPES_H

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the Git mailing list, Junio C Hamano wrote (reply to this):

"Ezekiel Newren via GitGitGadget" <gitgitgadget@gmail.com> writes:

> +/*
> + * A typedef for bool is not needed because C bool and Rust bool are
> + * the same if #include <stdbool.h> is used.
> + */
> +
> +typedef uint8_t   u8;
> +typedef uint16_t  u16;
> +typedef uint32_t  u32;
> +typedef uint64_t  u64;
> +
> +typedef int8_t    i8;
> +typedef int16_t   i16;
> +typedef int32_t   i32;
> +typedef int64_t   i64;

The standard guarantees that these are all of the above are exactly
N-bits wide, so I can buy the above types.  But before I can buy the
above typedefs, don't we need to rename existing variables that
squat on these names?

    $ git grep -n -E -e '\<[ui](8|16|32|64)\>'

gives some hits, like

    reftable/record.c:678:	uint8_t i64[8];
    t/helper/test-parse-options.c:123:	uint16_t u16 = 0;
    t/helper/test-parse-options.c:148:		OPT_UNSIGNED(0, "u16", &u16, "get a 16 bit unsigned integer"),

to avoid confusion?  There are handful other hits.

> +typedef float     f32;
> +typedef double    f64;

It may be that they can be used interchangeably in practice on
popular platforms, but are these guaranteed to be equivalent by some
standard?  C only cares about the minimum required range and
precision, so you may have allocated enough bytes thinking you can
fit a f32 but your float may not fit there.

Or does Rust care only about platforms with IEEE 754 and would
refuse to port to other exotic architectures so the above worries
would not apply?

> +typedef size_t    usize;
> +typedef ptrdiff_t isize;

Interesting.  I would have expected, "isize" that is a signed
variant of "usize" to be aliased out of ssize_t (simply because it
is declared that "usize" corresponds to "size_t" on the previous
line), not using ptrdiff_t.

> +typedef uint32_t  rust_char;
> +
> +#endif /* COMPAT_RUST_TYPES_H */

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the Git mailing list, Ezekiel Newren wrote (reply to this):

On Mon, Sep 8, 2025 at 9:08 AM Junio C Hamano <gitster@pobox.com> wrote:
> The standard guarantees that these are all of the above are exactly
> N-bits wide, so I can buy the above types.  But before I can buy the
> above typedefs, don't we need to rename existing variables that
> squat on these names?
>
>     $ git grep -n -E -e '\<[ui](8|16|32|64)\>'
>
> gives some hits, like
>
>     reftable/record.c:678:      uint8_t i64[8];
>     t/helper/test-parse-options.c:123:  uint16_t u16 = 0;
>     t/helper/test-parse-options.c:148:          OPT_UNSIGNED(0, "u16", &u16, "get a 16 bit unsigned integer"),
>
> to avoid confusion?  There are handful other hits.

Those places should be cleaned up, but it's not an immediate problem
because compat/rust_types.h is not auto included anywhere. These
typedefs live in git-compat-util.h in my "Introduce Rust" patch series
and I never had a problem with compilation or testing. One reason I
included compat/posix.h instead of git-compat-util.h is because it
includes stdbool.h where git-compat-util.h doesn't. I'll be happy to
do that cleanup.

> > +typedef float     f32;
> > +typedef double    f64;
>
> It may be that they can be used interchangeably in practice on
> popular platforms, but are these guaranteed to be equivalent by some
> standard?  C only cares about the minimum required range and
> precision, so you may have allocated enough bytes thinking you can
> fit a f32 but your float may not fit there.
>
> Or does Rust care only about platforms with IEEE 754 and would
> refuse to port to other exotic architectures so the above worries
> would not apply?

If the typedefs in compat/rust_types.h are incorrect for a
platform/target then compat/posix.h or compat/rust_types.h should be
updated rather than relying on Rust's core::ffi's guess as what it is.

On the Rust side: core::ffi assumes that a C 'float' is the same as
f32 and that a C 'double' is an f64[1,2,3]. So I am making the same
assumption that the Rust maintainers are _and_ I'm keeping ambiguity
on the C side rather than eroding Rust's type precision. If a platform
does not follow these assumptions then the compat/rust_types.h should
warn those using that platform or fail to build entirely.

[1] https://doc.rust-lang.org/1.89.0/core/ffi/type.c_float.html
[2] https://doc.rust-lang.org/1.63.0/src/core/ffi/mod.rs.html#80-81
[3] https://doc.rust-lang.org/1.89.0/src/core/ffi/primitives.rs.html#34-35

long flags;
} xdlclassifier_t;


Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the Git mailing list, Elijah Newren wrote (reply to this):

On Sun, Sep 7, 2025 at 12:45 PM Ezekiel Newren via GitGitGadget
<gitgitgadget@gmail.com> wrote:
>
> From: Ezekiel Newren <ezekielnewren@gmail.com>
>
> Move xdl_prepare_env() later in the file to avoid the need
> for static forward declarations.
>
> Signed-off-by: Ezekiel Newren <ezekielnewren@gmail.com>
> ---
>  xdiff/xprepare.c | 116 ++++++++++++++++++++---------------------------
>  1 file changed, 50 insertions(+), 66 deletions(-)
>
> diff --git a/xdiff/xprepare.c b/xdiff/xprepare.c
> index e1d4017b2d..a45c5ee208 100644
> --- a/xdiff/xprepare.c
> +++ b/xdiff/xprepare.c
> @@ -53,21 +53,6 @@ typedef struct s_xdlclassifier {
>
>
>
> -static int xdl_init_classifier(xdlclassifier_t *cf, long size, long flags);
> -static void xdl_free_classifier(xdlclassifier_t *cf);
> -static int xdl_classify_record(unsigned int pass, xdlclassifier_t *cf, xrecord_t **rhash,
> -                              unsigned int hbits, xrecord_t *rec);
> -static int xdl_prepare_ctx(unsigned int pass, mmfile_t *mf, long narec, xpparam_t const *xpp,
> -                          xdlclassifier_t *cf, xdfile_t *xdf);
> -static void xdl_free_ctx(xdfile_t *xdf);
> -static int xdl_clean_mmatch(char const *dis, long i, long s, long e);
> -static int xdl_cleanup_records(xdlclassifier_t *cf, xdfile_t *xdf1, xdfile_t *xdf2);
> -static int xdl_trim_ends(xdfile_t *xdf1, xdfile_t *xdf2);
> -static int xdl_optimize_ctxs(xdlclassifier_t *cf, xdfile_t *xdf1, xdfile_t *xdf2);
> -
> -
> -
> -
>  static int xdl_init_classifier(xdlclassifier_t *cf, long size, long flags) {
>         cf->flags = flags;
>
> @@ -242,57 +227,6 @@ static void xdl_free_ctx(xdfile_t *xdf) {
>  }
>
>
> -int xdl_prepare_env(mmfile_t *mf1, mmfile_t *mf2, xpparam_t const *xpp,
> -                   xdfenv_t *xe) {
> -       long enl1, enl2, sample;
> -       xdlclassifier_t cf;
> -
> -       memset(&cf, 0, sizeof(cf));
> -
> -       /*
> -        * For histogram diff, we can afford a smaller sample size and
> -        * thus a poorer estimate of the number of lines, as the hash
> -        * table (rhash) won't be filled up/grown. The number of lines
> -        * (nrecs) will be updated correctly anyway by
> -        * xdl_prepare_ctx().
> -        */
> -       sample = (XDF_DIFF_ALG(xpp->flags) == XDF_HISTOGRAM_DIFF
> -                 ? XDL_GUESS_NLINES2 : XDL_GUESS_NLINES1);
> -
> -       enl1 = xdl_guess_lines(mf1, sample) + 1;
> -       enl2 = xdl_guess_lines(mf2, sample) + 1;
> -
> -       if (xdl_init_classifier(&cf, enl1 + enl2 + 1, xpp->flags) < 0)
> -               return -1;
> -
> -       if (xdl_prepare_ctx(1, mf1, enl1, xpp, &cf, &xe->xdf1) < 0) {
> -
> -               xdl_free_classifier(&cf);
> -               return -1;
> -       }
> -       if (xdl_prepare_ctx(2, mf2, enl2, xpp, &cf, &xe->xdf2) < 0) {
> -
> -               xdl_free_ctx(&xe->xdf1);
> -               xdl_free_classifier(&cf);
> -               return -1;
> -       }
> -
> -       if ((XDF_DIFF_ALG(xpp->flags) != XDF_PATIENCE_DIFF) &&
> -           (XDF_DIFF_ALG(xpp->flags) != XDF_HISTOGRAM_DIFF) &&
> -           xdl_optimize_ctxs(&cf, &xe->xdf1, &xe->xdf2) < 0) {
> -
> -               xdl_free_ctx(&xe->xdf2);
> -               xdl_free_ctx(&xe->xdf1);
> -               xdl_free_classifier(&cf);
> -               return -1;
> -       }
> -
> -       xdl_free_classifier(&cf);
> -
> -       return 0;
> -}
> -
> -
>  void xdl_free_env(xdfenv_t *xe) {
>
>         xdl_free_ctx(&xe->xdf2);
> @@ -460,3 +394,53 @@ static int xdl_optimize_ctxs(xdlclassifier_t *cf, xdfile_t *xdf1, xdfile_t *xdf2
>
>         return 0;
>  }
> +
> +int xdl_prepare_env(mmfile_t *mf1, mmfile_t *mf2, xpparam_t const *xpp,
> +                   xdfenv_t *xe) {
> +       long enl1, enl2, sample;
> +       xdlclassifier_t cf;
> +
> +       memset(&cf, 0, sizeof(cf));
> +
> +       /*
> +        * For histogram diff, we can afford a smaller sample size and
> +        * thus a poorer estimate of the number of lines, as the hash
> +        * table (rhash) won't be filled up/grown. The number of lines
> +        * (nrecs) will be updated correctly anyway by
> +        * xdl_prepare_ctx().
> +        */
> +       sample = (XDF_DIFF_ALG(xpp->flags) == XDF_HISTOGRAM_DIFF
> +                 ? XDL_GUESS_NLINES2 : XDL_GUESS_NLINES1);
> +
> +       enl1 = xdl_guess_lines(mf1, sample) + 1;
> +       enl2 = xdl_guess_lines(mf2, sample) + 1;
> +
> +       if (xdl_init_classifier(&cf, enl1 + enl2 + 1, xpp->flags) < 0)
> +               return -1;
> +
> +       if (xdl_prepare_ctx(1, mf1, enl1, xpp, &cf, &xe->xdf1) < 0) {
> +
> +               xdl_free_classifier(&cf);
> +               return -1;
> +       }
> +       if (xdl_prepare_ctx(2, mf2, enl2, xpp, &cf, &xe->xdf2) < 0) {
> +
> +               xdl_free_ctx(&xe->xdf1);
> +               xdl_free_classifier(&cf);
> +               return -1;
> +       }
> +
> +       if ((XDF_DIFF_ALG(xpp->flags) != XDF_PATIENCE_DIFF) &&
> +           (XDF_DIFF_ALG(xpp->flags) != XDF_HISTOGRAM_DIFF) &&
> +           xdl_optimize_ctxs(&cf, &xe->xdf1, &xe->xdf2) < 0) {
> +
> +               xdl_free_ctx(&xe->xdf2);
> +               xdl_free_ctx(&xe->xdf1);
> +               xdl_free_classifier(&cf);
> +               return -1;
> +           }
> +
> +       xdl_free_classifier(&cf);
> +
> +       return 0;
> +}
> --
> gitgitgadget

Viewing this with --color-moved makes it clear that the changes
exactly match what you summarize in the commit message.

Copy link

User Elijah Newren <newren@gmail.com> has been added to the cc: list.

rhash[hi] = rec;

return 0;
}

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the Git mailing list, Elijah Newren wrote (reply to this):

On Sun, Sep 7, 2025 at 12:48 PM Ezekiel Newren via GitGitGadget
<gitgitgadget@gmail.com> wrote:
>
> From: Ezekiel Newren <ezekielnewren@gmail.com>
>
> xdl_prepare_ctx() uses local variables and assigns them to the
> corresponding xdfile_t fields if there are no errors. Delete them and
> use the fields of xdfile_t directly.

In particular, those local variables are essentially a hand-rolled
additional implementation of xdl_free_ctx() inlined into
xdl_prepare_ctx().  You're just modifying the code to use the existing
xdl_free_ctx() function so we don't have two ways to free such
variables (especially since one of those two was an ugly inlining).

> Signed-off-by: Ezekiel Newren <ezekielnewren@gmail.com>
> ---
>  xdiff/xprepare.c | 79 +++++++++++++++++++-----------------------------
>  1 file changed, 31 insertions(+), 48 deletions(-)
>
> diff --git a/xdiff/xprepare.c b/xdiff/xprepare.c
> index a45c5ee208..2ed1785b09 100644
> --- a/xdiff/xprepare.c
> +++ b/xdiff/xprepare.c
> @@ -134,99 +134,82 @@ static int xdl_classify_record(unsigned int pass, xdlclassifier_t *cf, xrecord_t
>  }
>
>
> +static void xdl_free_ctx(xdfile_t *xdf)
> +{
> +

unnecessary blank line here

> +       xdl_free(xdf->rhash);
> +       xdl_free(xdf->rindex);
> +       xdl_free(xdf->rchg - 1);
> +       xdl_free(xdf->ha);
> +       xdl_free(xdf->recs);
> +       xdl_cha_free(&xdf->rcha);
> +}
> +
> +
>  static int xdl_prepare_ctx(unsigned int pass, mmfile_t *mf, long narec, xpparam_t const *xpp,
>                            xdlclassifier_t *cf, xdfile_t *xdf) {
> -       unsigned int hbits;
> -       long nrec, hsize, bsize;
> +       long bsize;
>         unsigned long hav;
>         char const *blk, *cur, *top, *prev;
>         xrecord_t *crec;
> -       xrecord_t **recs;
> -       xrecord_t **rhash;
> -       unsigned long *ha;
> -       char *rchg;
> -       long *rindex;
>
> -       ha = NULL;
> -       rindex = NULL;
> -       rchg = NULL;
> -       rhash = NULL;
> -       recs = NULL;
> +       xdf->ha = NULL;
> +       xdf->rindex = NULL;
> +       xdf->rchg = NULL;
> +       xdf->rhash = NULL;
> +       xdf->recs = NULL;
>
>         if (xdl_cha_init(&xdf->rcha, sizeof(xrecord_t), narec / 4 + 1) < 0)
>                 goto abort;
> -       if (!XDL_ALLOC_ARRAY(recs, narec))
> +       if (!XDL_ALLOC_ARRAY(xdf->recs, narec))
>                 goto abort;
>
> -       hbits = xdl_hashbits((unsigned int) narec);
> -       hsize = 1 << hbits;
> -       if (!XDL_CALLOC_ARRAY(rhash, hsize))
> +       xdf->hbits = xdl_hashbits((unsigned int) narec);
> +       if (!XDL_CALLOC_ARRAY(xdf->rhash, 1 << xdf->hbits))
>                 goto abort;
>
> -       nrec = 0;
> +       xdf->nrec = 0;
>         if ((cur = blk = xdl_mmfile_first(mf, &bsize))) {
>                 for (top = blk + bsize; cur < top; ) {
>                         prev = cur;
>                         hav = xdl_hash_record(&cur, top, xpp->flags);
> -                       if (XDL_ALLOC_GROW(recs, nrec + 1, narec))
> +                       if (XDL_ALLOC_GROW(xdf->recs, xdf->nrec + 1, narec))
>                                 goto abort;
>                         if (!(crec = xdl_cha_alloc(&xdf->rcha)))
>                                 goto abort;
>                         crec->ptr = prev;
>                         crec->size = (long) (cur - prev);
>                         crec->ha = hav;
> -                       recs[nrec++] = crec;
> -                       if (xdl_classify_record(pass, cf, rhash, hbits, crec) < 0)
> +                       xdf->recs[xdf->nrec++] = crec;
> +                       if (xdl_classify_record(pass, cf, xdf->rhash, xdf->hbits, crec) < 0)
>                                 goto abort;
>                 }
>         }
>
> -       if (!XDL_CALLOC_ARRAY(rchg, nrec + 2))
> +       if (!XDL_CALLOC_ARRAY(xdf->rchg, xdf->nrec + 2))
>                 goto abort;
>
>         if ((XDF_DIFF_ALG(xpp->flags) != XDF_PATIENCE_DIFF) &&
>             (XDF_DIFF_ALG(xpp->flags) != XDF_HISTOGRAM_DIFF)) {
> -               if (!XDL_ALLOC_ARRAY(rindex, nrec + 1))
> +               if (!XDL_ALLOC_ARRAY(xdf->rindex, xdf->nrec + 1))
>                         goto abort;
> -               if (!XDL_ALLOC_ARRAY(ha, nrec + 1))
> +               if (!XDL_ALLOC_ARRAY(xdf->ha, xdf->nrec + 1))
>                         goto abort;
>         }
>
> -       xdf->nrec = nrec;
> -       xdf->recs = recs;
> -       xdf->hbits = hbits;
> -       xdf->rhash = rhash;
> -       xdf->rchg = rchg + 1;
> -       xdf->rindex = rindex;
> +       xdf->rchg += 1;
>         xdf->nreff = 0;
> -       xdf->ha = ha;
>         xdf->dstart = 0;
> -       xdf->dend = nrec - 1;
> +       xdf->dend = xdf->nrec - 1;
>
>         return 0;
>
>  abort:
> -       xdl_free(ha);
> -       xdl_free(rindex);
> -       xdl_free(rchg);
> -       xdl_free(rhash);
> -       xdl_free(recs);
> -       xdl_cha_free(&xdf->rcha);
> +       xdl_free_ctx(xdf);
>         return -1;
>  }
>
>
> -static void xdl_free_ctx(xdfile_t *xdf) {
> -
> -       xdl_free(xdf->rhash);
> -       xdl_free(xdf->rindex);
> -       xdl_free(xdf->rchg - 1);
> -       xdl_free(xdf->ha);
> -       xdl_free(xdf->recs);
> -       xdl_cha_free(&xdf->rcha);
> -}
> -
> -
>  void xdl_free_env(xdfenv_t *xe) {
>
>         xdl_free_ctx(&xe->xdf2);
> --
> gitgitgadget

Looks good.

xdl_free(cf->rcrecs);
xdl_free(cf->rchash);
xdl_cha_free(&cf->ncha);
}

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the Git mailing list, Elijah Newren wrote (reply to this):

On Sun, Sep 7, 2025 at 12:45 PM Ezekiel Newren via GitGitGadget
<gitgitgadget@gmail.com> wrote:
>
> From: Ezekiel Newren <ezekielnewren@gmail.com>
>
> xrecord_t.next, xdfile_t.hbits, xdfile_t.rhash are initialized,
> but never used for anything by the code. Remove them.
>
> Signed-off-by: Ezekiel Newren <ezekielnewren@gmail.com>
> ---
>  xdiff/xprepare.c | 15 ++-------------
>  xdiff/xtypes.h   |  3 ---
>  2 files changed, 2 insertions(+), 16 deletions(-)
>
> diff --git a/xdiff/xprepare.c b/xdiff/xprepare.c
> index 2ed1785b09..91b0ed54e0 100644
> --- a/xdiff/xprepare.c
> +++ b/xdiff/xprepare.c
> @@ -91,8 +91,7 @@ static void xdl_free_classifier(xdlclassifier_t *cf) {
>  }
>
>
> -static int xdl_classify_record(unsigned int pass, xdlclassifier_t *cf, xrecord_t **rhash,
> -                              unsigned int hbits, xrecord_t *rec) {
> +static int xdl_classify_record(unsigned int pass, xdlclassifier_t *cf, xrecord_t *rec) {
>         long hi;
>         char const *line;
>         xdlclass_t *rcrec;
> @@ -126,10 +125,6 @@ static int xdl_classify_record(unsigned int pass, xdlclassifier_t *cf, xrecord_t
>
>         rec->ha = (unsigned long) rcrec->idx;
>
> -       hi = (long) XDL_HASHLONG(rec->ha, hbits);
> -       rec->next = rhash[hi];
> -       rhash[hi] = rec;
> -
>         return 0;
>  }
>
> @@ -137,7 +132,6 @@ static int xdl_classify_record(unsigned int pass, xdlclassifier_t *cf, xrecord_t
>  static void xdl_free_ctx(xdfile_t *xdf)
>  {
>
> -       xdl_free(xdf->rhash);
>         xdl_free(xdf->rindex);
>         xdl_free(xdf->rchg - 1);
>         xdl_free(xdf->ha);
> @@ -156,7 +150,6 @@ static int xdl_prepare_ctx(unsigned int pass, mmfile_t *mf, long narec, xpparam_
>         xdf->ha = NULL;
>         xdf->rindex = NULL;
>         xdf->rchg = NULL;
> -       xdf->rhash = NULL;
>         xdf->recs = NULL;
>
>         if (xdl_cha_init(&xdf->rcha, sizeof(xrecord_t), narec / 4 + 1) < 0)
> @@ -164,10 +157,6 @@ static int xdl_prepare_ctx(unsigned int pass, mmfile_t *mf, long narec, xpparam_
>         if (!XDL_ALLOC_ARRAY(xdf->recs, narec))
>                 goto abort;
>
> -       xdf->hbits = xdl_hashbits((unsigned int) narec);
> -       if (!XDL_CALLOC_ARRAY(xdf->rhash, 1 << xdf->hbits))
> -               goto abort;
> -
>         xdf->nrec = 0;
>         if ((cur = blk = xdl_mmfile_first(mf, &bsize))) {
>                 for (top = blk + bsize; cur < top; ) {
> @@ -181,7 +170,7 @@ static int xdl_prepare_ctx(unsigned int pass, mmfile_t *mf, long narec, xpparam_
>                         crec->size = (long) (cur - prev);
>                         crec->ha = hav;
>                         xdf->recs[xdf->nrec++] = crec;
> -                       if (xdl_classify_record(pass, cf, xdf->rhash, xdf->hbits, crec) < 0)
> +                       if (xdl_classify_record(pass, cf, crec) < 0)
>                                 goto abort;
>                 }
>         }
> diff --git a/xdiff/xtypes.h b/xdiff/xtypes.h
> index 8442bd436e..8b8467360e 100644
> --- a/xdiff/xtypes.h
> +++ b/xdiff/xtypes.h
> @@ -39,7 +39,6 @@ typedef struct s_chastore {
>  } chastore_t;
>
>  typedef struct s_xrecord {
> -       struct s_xrecord *next;
>         char const *ptr;
>         long size;
>         unsigned long ha;
> @@ -48,8 +47,6 @@ typedef struct s_xrecord {
>  typedef struct s_xdfile {
>         chastore_t rcha;
>         long nrec;
> -       unsigned int hbits;
> -       xrecord_t **rhash;
>         long dstart, dend;
>         xrecord_t **recs;
>         char *rchg;
> --
> gitgitgadget

Always nice to see unused fields get removed.

* Davide Libenzi <davidel@xmailserver.org>
*
*/

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the Git mailing list, Elijah Newren wrote (reply to this):

On Sun, Sep 7, 2025 at 12:45 PM Ezekiel Newren via GitGitGadget
<gitgitgadget@gmail.com> wrote:
>
> From: Ezekiel Newren <ezekielnewren@gmail.com>
>
> This function aliases the fields of xrecord_t, which makes it harder
> to track the usages of those fields. Delete it.
>
> Signed-off-by: Ezekiel Newren <ezekielnewren@gmail.com>
> ---
>  xdiff/xemit.c | 38 +++++++++++++-------------------------
>  1 file changed, 13 insertions(+), 25 deletions(-)
>
> diff --git a/xdiff/xemit.c b/xdiff/xemit.c
> index 1d40c9cb40..2161ac3cd0 100644
> --- a/xdiff/xemit.c
> +++ b/xdiff/xemit.c
> @@ -22,23 +22,13 @@
>
>  #include "xinclude.h"
>
> -static long xdl_get_rec(xdfile_t *xdf, long ri, char const **rec) {
>

Can we remove this line too, to simplify the diff?  (i.e. make there
only be one blank line between the include of xinclude.h and
xdl_emit_record?

> -       *rec = xdf->recs[ri]->ptr;
> -
> -       return xdf->recs[ri]->size;
> -}
> -
> -
> -static int xdl_emit_record(xdfile_t *xdf, long ri, char const *pre, xdemitcb_t *ecb) {
> -       long size, psize = strlen(pre);
> -       char const *rec;
> -
> -       size = xdl_get_rec(xdf, ri, &rec);
> -       if (xdl_emit_diffrec(rec, size, pre, psize, ecb) < 0) {
> +static int xdl_emit_record(xdfile_t *xdf, long ri, char const *pre, xdemitcb_t *ecb)
> +{

The change of the opening curly brace to be on a new line does match
our general coding guidelines, but cleanups like this should be in a
separate patch.

> +       xrecord_t *rec = xdf->recs[ri];
>
> +       if (xdl_emit_diffrec(rec->ptr, rec->size, pre, strlen(pre), ecb) < 0)
>                 return -1;
> -       }

While in this case you were modifying the line in question and thus
fixing the code to also not use curly braces around a single
statement, which is more justified, it still makes the patch slightly
harder for reviewers to read since you are doing multiple things (what
you said in the commit message, plus cleaning up style "violations").
It'd be better to leave the existing style violations in place, or fix
them in a separate patch.

>
>         return 0;
>  }
> @@ -120,11 +110,11 @@ static long def_ff(const char *rec, long len, char *buf, long sz)
>  static long match_func_rec(xdfile_t *xdf, xdemitconf_t const *xecfg, long ri,
>                            char *buf, long sz)
>  {
> -       const char *rec;
> -       long len = xdl_get_rec(xdf, ri, &rec);
> +       xrecord_t *rec = xdf->recs[ri];
> +
>         if (!xecfg->find_func)
> -               return def_ff(rec, len, buf, sz);
> -       return xecfg->find_func(rec, len, buf, sz, xecfg->find_func_priv);
> +               return def_ff(rec->ptr, rec->size, buf, sz);
> +       return xecfg->find_func(rec->ptr, rec->size, buf, sz, xecfg->find_func_priv);
>  }
>
>  static int is_func_rec(xdfile_t *xdf, xdemitconf_t const *xecfg, long ri)
> @@ -160,14 +150,12 @@ static long get_func_line(xdfenv_t *xe, xdemitconf_t const *xecfg,
>
>  static int is_empty_rec(xdfile_t *xdf, long ri)
>  {
> -       const char *rec;
> -       long len = xdl_get_rec(xdf, ri, &rec);
> +       xrecord_t *rec = xdf->recs[ri];
> +       long i = 0;
>
> -       while (len > 0 && XDL_ISSPACE(*rec)) {
> -               rec++;
> -               len--;
> -       }
> -       return !len;
> +       for (; i < rec->size && XDL_ISSPACE(rec->ptr[i]); i++);
> +
> +       return i == rec->size;
>  }

I agree that the code is easier to follow without the aliasing.


/*
* Rule: "Divide et Impera" (divide & conquer). Recursively split the box in
* sub-boxes by calling the box splitting function. Note that the real job

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the Git mailing list, Elijah Newren wrote (reply to this):

On Sun, Sep 7, 2025 at 12:45 PM Ezekiel Newren via GitGitGadget
<gitgitgadget@gmail.com> wrote:
>
> From: Ezekiel Newren <ezekielnewren@gmail.com>
>
> Every field in this struct is an alias for a certain field in xdfile_t.
>
> diffdata_t.nrec   -> xdfile_t.nreff
> diffdata_t.ha     -> xdfile_t.ha
> diffdata_t.rindex -> xdfile_t.rindex
> diffdata_t.rchg   -> xdfile_t.recharge
>
> Signed-off-by: Ezekiel Newren <ezekielnewren@gmail.com>
> ---
>  xdiff/xdiffi.c | 32 ++++++++------------------------
>  xdiff/xdiffi.h | 11 ++---------
>  2 files changed, 10 insertions(+), 33 deletions(-)
>
> diff --git a/xdiff/xdiffi.c b/xdiff/xdiffi.c
> index 5a96e36dfb..bbf0161f84 100644
> --- a/xdiff/xdiffi.c
> +++ b/xdiff/xdiffi.c
> @@ -257,10 +257,10 @@ static long xdl_split(unsigned long const *ha1, long off1, long lim1,
>   * sub-boxes by calling the box splitting function. Note that the real job
>   * (marking changed lines) is done in the two boundary reaching checks.
>   */
> -int xdl_recs_cmp(diffdata_t *dd1, long off1, long lim1,
> -                diffdata_t *dd2, long off2, long lim2,
> +int xdl_recs_cmp(xdfile_t *xdf1, long off1, long lim1,
> +                xdfile_t *xdf2, long off2, long lim2,
>                  long *kvdf, long *kvdb, int need_min, xdalgoenv_t *xenv) {
> -       unsigned long const *ha1 = dd1->ha, *ha2 = dd2->ha;
> +       unsigned long const *ha1 = xdf1->ha, *ha2 = xdf2->ha;
>
>         /*
>          * Shrink the box by walking through each diagonal snake (SW and NE).
> @@ -273,17 +273,11 @@ int xdl_recs_cmp(diffdata_t *dd1, long off1, long lim1,
>          * be obviously changed.
>          */
>         if (off1 == lim1) {
> -               char *rchg2 = dd2->rchg;
> -               long *rindex2 = dd2->rindex;
> -
>                 for (; off2 < lim2; off2++)
> -                       rchg2[rindex2[off2]] = 1;
> +                       xdf2->rchg[xdf2->rindex[off2]] = 1;
>         } else if (off2 == lim2) {
> -               char *rchg1 = dd1->rchg;
> -               long *rindex1 = dd1->rindex;
> -
>                 for (; off1 < lim1; off1++)
> -                       rchg1[rindex1[off1]] = 1;
> +                       xdf1->rchg[xdf1->rindex[off1]] = 1;
>         } else {
>                 xdpsplit_t spl;
>                 spl.i1 = spl.i2 = 0;
> @@ -300,9 +294,9 @@ int xdl_recs_cmp(diffdata_t *dd1, long off1, long lim1,
>                 /*
>                  * ... et Impera.
>                  */
> -               if (xdl_recs_cmp(dd1, off1, spl.i1, dd2, off2, spl.i2,
> +               if (xdl_recs_cmp(xdf1, off1, spl.i1, xdf2, off2, spl.i2,
>                                  kvdf, kvdb, spl.min_lo, xenv) < 0 ||
> -                   xdl_recs_cmp(dd1, spl.i1, lim1, dd2, spl.i2, lim2,
> +                   xdl_recs_cmp(xdf1, spl.i1, lim1, xdf2, spl.i2, lim2,
>                                  kvdf, kvdb, spl.min_hi, xenv) < 0) {
>
>                         return -1;
> @@ -318,7 +312,6 @@ int xdl_do_diff(mmfile_t *mf1, mmfile_t *mf2, xpparam_t const *xpp,
>         long ndiags;
>         long *kvd, *kvdf, *kvdb;
>         xdalgoenv_t xenv;
> -       diffdata_t dd1, dd2;
>         int res;
>
>         if (xdl_prepare_env(mf1, mf2, xpp, xe) < 0)
> @@ -357,16 +350,7 @@ int xdl_do_diff(mmfile_t *mf1, mmfile_t *mf2, xpparam_t const *xpp,
>         xenv.snake_cnt = XDL_SNAKE_CNT;
>         xenv.heur_min = XDL_HEUR_MIN_COST;
>
> -       dd1.nrec = xe->xdf1.nreff;
> -       dd1.ha = xe->xdf1.ha;
> -       dd1.rchg = xe->xdf1.rchg;
> -       dd1.rindex = xe->xdf1.rindex;
> -       dd2.nrec = xe->xdf2.nreff;
> -       dd2.ha = xe->xdf2.ha;
> -       dd2.rchg = xe->xdf2.rchg;
> -       dd2.rindex = xe->xdf2.rindex;
> -
> -       res = xdl_recs_cmp(&dd1, 0, dd1.nrec, &dd2, 0, dd2.nrec,
> +       res = xdl_recs_cmp(&xe->xdf1, 0, xe->xdf1.nreff, &xe->xdf2, 0, xe->xdf2.nreff,
>                            kvdf, kvdb, (xpp->flags & XDF_NEED_MINIMAL) != 0,
>                            &xenv);
>         xdl_free(kvd);
> diff --git a/xdiff/xdiffi.h b/xdiff/xdiffi.h
> index 126c9d8ff4..49e52c67f9 100644
> --- a/xdiff/xdiffi.h
> +++ b/xdiff/xdiffi.h
> @@ -24,13 +24,6 @@
>  #define XDIFFI_H
>
>
> -typedef struct s_diffdata {
> -       long nrec;
> -       unsigned long const *ha;
> -       long *rindex;
> -       char *rchg;
> -} diffdata_t;
> -
>  typedef struct s_xdalgoenv {
>         long mxcost;
>         long snake_cnt;
> @@ -46,8 +39,8 @@ typedef struct s_xdchange {
>
>
>
> -int xdl_recs_cmp(diffdata_t *dd1, long off1, long lim1,
> -                diffdata_t *dd2, long off2, long lim2,
> +int xdl_recs_cmp(xdfile_t *xdf1, long off1, long lim1,
> +                xdfile_t *xdf2, long off2, long lim2,
>                  long *kvdf, long *kvdb, int need_min, xdalgoenv_t *xenv);
>  int xdl_do_diff(mmfile_t *mf1, mmfile_t *mf2, xpparam_t const *xpp,
>                 xdfenv_t *xe);
> --
> gitgitgadget

Viewing this commit with --color-moved helps highlight in the code
what you say in the commit message.  Makes sense.

* Davide Libenzi <davidel@xmailserver.org>
*
*/

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the Git mailing list, Elijah Newren wrote (reply to this):

On Sun, Sep 7, 2025 at 12:46 PM Ezekiel Newren via GitGitGadget
<gitgitgadget@gmail.com> wrote:
>
> From: Ezekiel Newren <ezekielnewren@gmail.com>
>
> When 0 <= i < xdfile_t.nreff the following is true:
> xdfile_t.ha[i] == xdfile_t.recs[xdfile_t.rindex[i]]

I like getting rid of redundant stuff.  One thing to note here is that
you're replacing a single indirection with two...

>
> Signed-off-by: Ezekiel Newren <ezekielnewren@gmail.com>
> ---
>  xdiff/xdiffi.c   | 24 ++++++++++++++----------
>  xdiff/xprepare.c | 12 ++----------
>  xdiff/xtypes.h   |  1 -
>  3 files changed, 16 insertions(+), 21 deletions(-)
>
> diff --git a/xdiff/xdiffi.c b/xdiff/xdiffi.c
> index bbf0161f84..11cd090b53 100644
> --- a/xdiff/xdiffi.c
> +++ b/xdiff/xdiffi.c
> @@ -22,6 +22,11 @@
>
>  #include "xinclude.h"
>
> +static unsigned long get_hash(xdfile_t *xdf, long index)
> +{
> +       return xdf->recs[xdf->rindex[index]]->ha;
> +}
> +
>  #define XDL_MAX_COST_MIN 256
>  #define XDL_HEUR_MIN_COST 256
>  #define XDL_LINE_MAX (long)((1UL << (CHAR_BIT * sizeof(long) - 1)) - 1)
> @@ -42,8 +47,8 @@ typedef struct s_xdpsplit {
>   * using this algorithm, so a little bit of heuristic is needed to cut the
>   * search and to return a suboptimal point.
>   */
> -static long xdl_split(unsigned long const *ha1, long off1, long lim1,
> -                     unsigned long const *ha2, long off2, long lim2,
> +static long xdl_split(xdfile_t *xdf1, long off1, long lim1,
> +                     xdfile_t *xdf2, long off2, long lim2,
>                       long *kvdf, long *kvdb, int need_min, xdpsplit_t *spl,
>                       xdalgoenv_t *xenv) {
>         long dmin = off1 - lim2, dmax = lim1 - off2;
> @@ -87,7 +92,7 @@ static long xdl_split(unsigned long const *ha1, long off1, long lim1,
>                                 i1 = kvdf[d + 1];
>                         prev1 = i1;
>                         i2 = i1 - d;
> -                       for (; i1 < lim1 && i2 < lim2 && ha1[i1] == ha2[i2]; i1++, i2++);
> +                       for (; i1 < lim1 && i2 < lim2 && get_hash(xdf1, i1) == get_hash(xdf2, i2); i1++, i2++);

You're not going to be happy with me asking, so sorry in advance, but
I'm really curious...we are now replacing a single indirection with a
double-indirection inside a for loop, which is nested within two other
for loops.  Three levels of for-loops to me suggests it might be a hot
codepath.  Does this double indirection in these codepaths affect
performance?

>                         if (i1 - prev1 > xenv->snake_cnt)
>                                 got_snake = 1;
>                         kvdf[d] = i1;
> @@ -124,7 +129,7 @@ static long xdl_split(unsigned long const *ha1, long off1, long lim1,
>                                 i1 = kvdb[d + 1] - 1;
>                         prev1 = i1;
>                         i2 = i1 - d;
> -                       for (; i1 > off1 && i2 > off2 && ha1[i1 - 1] == ha2[i2 - 1]; i1--, i2--);
> +                       for (; i1 > off1 && i2 > off2 && get_hash(xdf1, i1 - 1) == get_hash(xdf2, i2 - 1); i1--, i2--);
>                         if (prev1 - i1 > xenv->snake_cnt)
>                                 got_snake = 1;
>                         kvdb[d] = i1;
> @@ -159,7 +164,7 @@ static long xdl_split(unsigned long const *ha1, long off1, long lim1,
>                                 if (v > XDL_K_HEUR * ec && v > best &&
>                                     off1 + xenv->snake_cnt <= i1 && i1 < lim1 &&
>                                     off2 + xenv->snake_cnt <= i2 && i2 < lim2) {
> -                                       for (k = 1; ha1[i1 - k] == ha2[i2 - k]; k++)
> +                                       for (k = 1; get_hash(xdf1, i1 - k) == get_hash(xdf2, i2 - k); k++)
>                                                 if (k == xenv->snake_cnt) {
>                                                         best = v;
>                                                         spl->i1 = i1;
> @@ -183,7 +188,7 @@ static long xdl_split(unsigned long const *ha1, long off1, long lim1,
>                                 if (v > XDL_K_HEUR * ec && v > best &&
>                                     off1 < i1 && i1 <= lim1 - xenv->snake_cnt &&
>                                     off2 < i2 && i2 <= lim2 - xenv->snake_cnt) {
> -                                       for (k = 0; ha1[i1 + k] == ha2[i2 + k]; k++)
> +                                       for (k = 0; get_hash(xdf1, i1 + k) == get_hash(xdf2, i2 + k); k++)
>                                                 if (k == xenv->snake_cnt - 1) {
>                                                         best = v;
>                                                         spl->i1 = i1;
> @@ -260,13 +265,12 @@ static long xdl_split(unsigned long const *ha1, long off1, long lim1,
>  int xdl_recs_cmp(xdfile_t *xdf1, long off1, long lim1,
>                  xdfile_t *xdf2, long off2, long lim2,
>                  long *kvdf, long *kvdb, int need_min, xdalgoenv_t *xenv) {
> -       unsigned long const *ha1 = xdf1->ha, *ha2 = xdf2->ha;
>
>         /*
>          * Shrink the box by walking through each diagonal snake (SW and NE).
>          */
> -       for (; off1 < lim1 && off2 < lim2 && ha1[off1] == ha2[off2]; off1++, off2++);
> -       for (; off1 < lim1 && off2 < lim2 && ha1[lim1 - 1] == ha2[lim2 - 1]; lim1--, lim2--);
> +       for (; off1 < lim1 && off2 < lim2 && get_hash(xdf1, off1) == get_hash(xdf2, off2); off1++, off2++);
> +       for (; off1 < lim1 && off2 < lim2 && get_hash(xdf1, lim1 - 1) == get_hash(xdf2, lim2 - 1); lim1--, lim2--);
>
>         /*
>          * If one dimension is empty, then all records on the other one must
> @@ -285,7 +289,7 @@ int xdl_recs_cmp(xdfile_t *xdf1, long off1, long lim1,
>                 /*
>                  * Divide ...
>                  */
> -               if (xdl_split(ha1, off1, lim1, ha2, off2, lim2, kvdf, kvdb,
> +               if (xdl_split(xdf1, off1, lim1, xdf2, off2, lim2, kvdf, kvdb,
>                               need_min, &spl, xenv) < 0) {
>
>                         return -1;
> diff --git a/xdiff/xprepare.c b/xdiff/xprepare.c
> index 91b0ed54e0..59730989a3 100644
> --- a/xdiff/xprepare.c
> +++ b/xdiff/xprepare.c
> @@ -134,7 +134,6 @@ static void xdl_free_ctx(xdfile_t *xdf)
>
>         xdl_free(xdf->rindex);
>         xdl_free(xdf->rchg - 1);
> -       xdl_free(xdf->ha);
>         xdl_free(xdf->recs);
>         xdl_cha_free(&xdf->rcha);
>  }
> @@ -147,7 +146,6 @@ static int xdl_prepare_ctx(unsigned int pass, mmfile_t *mf, long narec, xpparam_
>         char const *blk, *cur, *top, *prev;
>         xrecord_t *crec;
>
> -       xdf->ha = NULL;
>         xdf->rindex = NULL;
>         xdf->rchg = NULL;
>         xdf->recs = NULL;
> @@ -182,8 +180,6 @@ static int xdl_prepare_ctx(unsigned int pass, mmfile_t *mf, long narec, xpparam_
>             (XDF_DIFF_ALG(xpp->flags) != XDF_HISTOGRAM_DIFF)) {
>                 if (!XDL_ALLOC_ARRAY(xdf->rindex, xdf->nrec + 1))
>                         goto abort;
> -               if (!XDL_ALLOC_ARRAY(xdf->ha, xdf->nrec + 1))
> -                       goto abort;
>         }
>
>         xdf->rchg += 1;
> @@ -301,9 +297,7 @@ static int xdl_cleanup_records(xdlclassifier_t *cf, xdfile_t *xdf1, xdfile_t *xd
>              i <= xdf1->dend; i++, recs++) {
>                 if (dis1[i] == 1 ||
>                     (dis1[i] == 2 && !xdl_clean_mmatch(dis1, i, xdf1->dstart, xdf1->dend))) {
> -                       xdf1->rindex[nreff] = i;
> -                       xdf1->ha[nreff] = (*recs)->ha;
> -                       nreff++;
> +                       xdf1->rindex[nreff++] = i;
>                 } else
>                         xdf1->rchg[i] = 1;
>         }
> @@ -313,9 +307,7 @@ static int xdl_cleanup_records(xdlclassifier_t *cf, xdfile_t *xdf1, xdfile_t *xd
>              i <= xdf2->dend; i++, recs++) {
>                 if (dis2[i] == 1 ||
>                     (dis2[i] == 2 && !xdl_clean_mmatch(dis2, i, xdf2->dstart, xdf2->dend))) {
> -                       xdf2->rindex[nreff] = i;
> -                       xdf2->ha[nreff] = (*recs)->ha;
> -                       nreff++;
> +                       xdf2->rindex[nreff++] = i;
>                 } else
>                         xdf2->rchg[i] = 1;
>         }
> diff --git a/xdiff/xtypes.h b/xdiff/xtypes.h
> index 8b8467360e..85848f1685 100644
> --- a/xdiff/xtypes.h
> +++ b/xdiff/xtypes.h
> @@ -52,7 +52,6 @@ typedef struct s_xdfile {
>         char *rchg;
>         long *rindex;
>         long nreff;
> -       unsigned long *ha;
>  } xdfile_t;
>
>  typedef struct s_xdfenv {
> --
> gitgitgadget

Other than the performance question, it looks like you've made a
straightforward mechanical change as highlighted in the commit
message.

#define XDL_GUESS_NLINES1 256
#define XDL_GUESS_NLINES2 20


Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the Git mailing list, Elijah Newren wrote (reply to this):

On Sun, Sep 7, 2025 at 12:46 PM Ezekiel Newren via GitGitGadget
<gitgitgadget@gmail.com> wrote:
>
> From: Ezekiel Newren <ezekielnewren@gmail.com>
>
> Signed-off-by: Ezekiel Newren <ezekielnewren@gmail.com>
> ---
>  xdiff/xprepare.c | 16 ++++++----------
>  1 file changed, 6 insertions(+), 10 deletions(-)
>
> diff --git a/xdiff/xprepare.c b/xdiff/xprepare.c
> index 59730989a3..6f1d4b4725 100644
> --- a/xdiff/xprepare.c
> +++ b/xdiff/xprepare.c
> @@ -32,9 +32,7 @@
>
>  typedef struct s_xdlclass {
>         struct s_xdlclass *next;
> -       unsigned long ha;
> -       char const *line;
> -       long size;
> +       xrecord_t rec;
>         long idx;
>         long len1, len2;
>  } xdlclass_t;
> @@ -93,14 +91,12 @@ static void xdl_free_classifier(xdlclassifier_t *cf) {
>
>  static int xdl_classify_record(unsigned int pass, xdlclassifier_t *cf, xrecord_t *rec) {
>         long hi;
> -       char const *line;
>         xdlclass_t *rcrec;
>
> -       line = rec->ptr;
>         hi = (long) XDL_HASHLONG(rec->ha, cf->hbits);
>         for (rcrec = cf->rchash[hi]; rcrec; rcrec = rcrec->next)
> -               if (rcrec->ha == rec->ha &&
> -                               xdl_recmatch(rcrec->line, rcrec->size,
> +               if (rcrec->rec.ha == rec->ha &&
> +                               xdl_recmatch(rcrec->rec.ptr, rcrec->rec.size,
>                                         rec->ptr, rec->size, cf->flags))
>                         break;
>
> @@ -113,9 +109,9 @@ static int xdl_classify_record(unsigned int pass, xdlclassifier_t *cf, xrecord_t
>                 if (XDL_ALLOC_GROW(cf->rcrecs, cf->count, cf->alloc))
>                                 return -1;
>                 cf->rcrecs[rcrec->idx] = rcrec;
> -               rcrec->line = line;
> -               rcrec->size = rec->size;
> -               rcrec->ha = rec->ha;
> +               rcrec->rec.ptr = rec->ptr;
> +               rcrec->rec.size = rec->size;
> +               rcrec->rec.ha = rec->ha;
>                 rcrec->len1 = rcrec->len2 = 0;
>                 rcrec->next = cf->rchash[hi];
>                 cf->rchash[hi] = rcrec;
> --
> gitgitgadget

I can see the changes match the one-line summary.  And I think the
point is simplification or reducing redundancy or something...but
could a single sentence motivation (stating which of these purposes is
at play) be added to the commit message?

*/

#include "xinclude.h"

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the Git mailing list, Elijah Newren wrote (reply to this):

On Sun, Sep 7, 2025 at 12:46 PM Ezekiel Newren via GitGitGadget
<gitgitgadget@gmail.com> wrote:
>
> From: Ezekiel Newren <ezekielnewren@gmail.com>

My personal bias is that things like "view with --color-words" makes
more sense to include near the end of the commit message, just before
the sign-offs.  Not sure if others agree on that.

> The chastore_t type is very unfriendly to Rust FFI. It's also redundant
> since 'recs' is a vector type that grows every time an xrecord_t is
> added.

The second sentence seems to presume the reader knows what chastore_t
type is for, and about the confusing dual layering between it and
recs.its confusing dual layering.  I liked your more extended
explanation in https://lore.kernel.org/git/7ea2dccd71fc502f20614ce217fc9885d1b17413.1756496539.git.gitgitgadget@gmail.com/;
could some of that be used here?

>
> Signed-off-by: Ezekiel Newren <ezekielnewren@gmail.com>
> ---
>  xdiff/xdiffi.c     | 24 ++++++++++----------
>  xdiff/xemit.c      |  6 ++---
>  xdiff/xhistogram.c |  2 +-
>  xdiff/xmerge.c     | 56 +++++++++++++++++++++++-----------------------
>  xdiff/xpatience.c  | 10 ++++-----
>  xdiff/xprepare.c   | 19 ++++++----------
>  xdiff/xtypes.h     |  3 +--
>  xdiff/xutils.c     | 12 +++++-----
>  8 files changed, 63 insertions(+), 69 deletions(-)
>
> diff --git a/xdiff/xdiffi.c b/xdiff/xdiffi.c
> index 11cd090b53..a66125d44a 100644
> --- a/xdiff/xdiffi.c
> +++ b/xdiff/xdiffi.c
> @@ -24,7 +24,7 @@
>
>  static unsigned long get_hash(xdfile_t *xdf, long index)
>  {
> -       return xdf->recs[xdf->rindex[index]]->ha;
> +       return xdf->recs[xdf->rindex[index]].ha;
>  }
>
>  #define XDL_MAX_COST_MIN 256
> @@ -489,13 +489,13 @@ static void measure_split(const xdfile_t *xdf, long split,
>                 m->indent = -1;
>         } else {
>                 m->end_of_file = 0;
> -               m->indent = get_indent(xdf->recs[split]);
> +               m->indent = get_indent(&xdf->recs[split]);
>         }
>
>         m->pre_blank = 0;
>         m->pre_indent = -1;
>         for (i = split - 1; i >= 0; i--) {
> -               m->pre_indent = get_indent(xdf->recs[i]);
> +               m->pre_indent = get_indent(&xdf->recs[i]);
>                 if (m->pre_indent != -1)
>                         break;
>                 m->pre_blank += 1;
> @@ -508,7 +508,7 @@ static void measure_split(const xdfile_t *xdf, long split,
>         m->post_blank = 0;
>         m->post_indent = -1;
>         for (i = split + 1; i < xdf->nrec; i++) {
> -               m->post_indent = get_indent(xdf->recs[i]);
> +               m->post_indent = get_indent(&xdf->recs[i]);
>                 if (m->post_indent != -1)
>                         break;
>                 m->post_blank += 1;
> @@ -752,7 +752,7 @@ static inline int group_previous(xdfile_t *xdf, struct xdlgroup *g)
>  static int group_slide_down(xdfile_t *xdf, struct xdlgroup *g)
>  {
>         if (g->end < xdf->nrec &&
> -           recs_match(xdf->recs[g->start], xdf->recs[g->end])) {
> +           recs_match(&xdf->recs[g->start], &xdf->recs[g->end])) {
>                 xdf->rchg[g->start++] = 0;
>                 xdf->rchg[g->end++] = 1;
>
> @@ -773,7 +773,7 @@ static int group_slide_down(xdfile_t *xdf, struct xdlgroup *g)
>  static int group_slide_up(xdfile_t *xdf, struct xdlgroup *g)
>  {
>         if (g->start > 0 &&
> -           recs_match(xdf->recs[g->start - 1], xdf->recs[g->end - 1])) {
> +           recs_match(&xdf->recs[g->start - 1], &xdf->recs[g->end - 1])) {
>                 xdf->rchg[--g->start] = 1;
>                 xdf->rchg[--g->end] = 0;
>
> @@ -988,16 +988,16 @@ static void xdl_mark_ignorable_lines(xdchange_t *xscr, xdfenv_t *xe, long flags)
>
>         for (xch = xscr; xch; xch = xch->next) {
>                 int ignore = 1;
> -               xrecord_t **rec;
> +               xrecord_t *rec;
>                 long i;
>
>                 rec = &xe->xdf1.recs[xch->i1];
>                 for (i = 0; i < xch->chg1 && ignore; i++)
> -                       ignore = xdl_blankline(rec[i]->ptr, rec[i]->size, flags);
> +                       ignore = xdl_blankline(rec[i].ptr, rec[i].size, flags);
>
>                 rec = &xe->xdf2.recs[xch->i2];
>                 for (i = 0; i < xch->chg2 && ignore; i++)
> -                       ignore = xdl_blankline(rec[i]->ptr, rec[i]->size, flags);
> +                       ignore = xdl_blankline(rec[i].ptr, rec[i].size, flags);
>
>                 xch->ignore = ignore;
>         }
> @@ -1021,7 +1021,7 @@ static void xdl_mark_ignorable_regex(xdchange_t *xscr, const xdfenv_t *xe,
>         xdchange_t *xch;
>
>         for (xch = xscr; xch; xch = xch->next) {
> -               xrecord_t **rec;
> +               xrecord_t *rec;
>                 int ignore = 1;
>                 long i;
>
> @@ -1033,11 +1033,11 @@ static void xdl_mark_ignorable_regex(xdchange_t *xscr, const xdfenv_t *xe,
>
>                 rec = &xe->xdf1.recs[xch->i1];
>                 for (i = 0; i < xch->chg1 && ignore; i++)
> -                       ignore = record_matches_regex(rec[i], xpp);
> +                       ignore = record_matches_regex(&rec[i], xpp);
>
>                 rec = &xe->xdf2.recs[xch->i2];
>                 for (i = 0; i < xch->chg2 && ignore; i++)
> -                       ignore = record_matches_regex(rec[i], xpp);
> +                       ignore = record_matches_regex(&rec[i], xpp);
>
>                 xch->ignore = ignore;
>         }
> diff --git a/xdiff/xemit.c b/xdiff/xemit.c
> index 2161ac3cd0..b2f1f30cd3 100644
> --- a/xdiff/xemit.c
> +++ b/xdiff/xemit.c
> @@ -25,7 +25,7 @@
>
>  static int xdl_emit_record(xdfile_t *xdf, long ri, char const *pre, xdemitcb_t *ecb)
>  {
> -       xrecord_t *rec = xdf->recs[ri];
> +       xrecord_t *rec = &xdf->recs[ri];
>
>         if (xdl_emit_diffrec(rec->ptr, rec->size, pre, strlen(pre), ecb) < 0)
>                 return -1;
> @@ -110,7 +110,7 @@ static long def_ff(const char *rec, long len, char *buf, long sz)
>  static long match_func_rec(xdfile_t *xdf, xdemitconf_t const *xecfg, long ri,
>                            char *buf, long sz)
>  {
> -       xrecord_t *rec = xdf->recs[ri];
> +       xrecord_t *rec = &xdf->recs[ri];
>
>         if (!xecfg->find_func)
>                 return def_ff(rec->ptr, rec->size, buf, sz);
> @@ -150,7 +150,7 @@ static long get_func_line(xdfenv_t *xe, xdemitconf_t const *xecfg,
>
>  static int is_empty_rec(xdfile_t *xdf, long ri)
>  {
> -       xrecord_t *rec = xdf->recs[ri];
> +       xrecord_t *rec = &xdf->recs[ri];
>         long i = 0;
>
>         for (; i < rec->size && XDL_ISSPACE(rec->ptr[i]); i++);
> diff --git a/xdiff/xhistogram.c b/xdiff/xhistogram.c
> index 040d81e0bc..4d857e8ae2 100644
> --- a/xdiff/xhistogram.c
> +++ b/xdiff/xhistogram.c
> @@ -86,7 +86,7 @@ struct region {
>         ((LINE_MAP(index, ptr))->cnt)
>
>  #define REC(env, s, l) \
> -       (env->xdf##s.recs[l - 1])
> +       (&env->xdf##s.recs[l - 1])
>
>  static int cmp_recs(xrecord_t *r1, xrecord_t *r2)
>  {
> diff --git a/xdiff/xmerge.c b/xdiff/xmerge.c
> index af40c88a5b..fd600cbb5d 100644
> --- a/xdiff/xmerge.c
> +++ b/xdiff/xmerge.c
> @@ -97,12 +97,12 @@ static int xdl_merge_cmp_lines(xdfenv_t *xe1, int i1, xdfenv_t *xe2, int i2,
>                 int line_count, long flags)
>  {
>         int i;
> -       xrecord_t **rec1 = xe1->xdf2.recs + i1;
> -       xrecord_t **rec2 = xe2->xdf2.recs + i2;
> +       xrecord_t *rec1 = xe1->xdf2.recs + i1;
> +       xrecord_t *rec2 = xe2->xdf2.recs + i2;
>
>         for (i = 0; i < line_count; i++) {
> -               int result = xdl_recmatch(rec1[i]->ptr, rec1[i]->size,
> -                       rec2[i]->ptr, rec2[i]->size, flags);
> +               int result = xdl_recmatch(rec1[i].ptr, rec1[i].size,
> +                       rec2[i].ptr, rec2[i].size, flags);
>                 if (!result)
>                         return -1;
>         }
> @@ -111,7 +111,7 @@ static int xdl_merge_cmp_lines(xdfenv_t *xe1, int i1, xdfenv_t *xe2, int i2,
>
>  static int xdl_recs_copy_0(int use_orig, xdfenv_t *xe, int i, int count, int needs_cr, int add_nl, char *dest)
>  {
> -       xrecord_t **recs;
> +       xrecord_t *recs;
>         int size = 0;
>
>         recs = (use_orig ? xe->xdf1.recs : xe->xdf2.recs) + i;
> @@ -119,12 +119,12 @@ static int xdl_recs_copy_0(int use_orig, xdfenv_t *xe, int i, int count, int nee
>         if (count < 1)
>                 return 0;
>
> -       for (i = 0; i < count; size += recs[i++]->size)
> +       for (i = 0; i < count; size += recs[i++].size)
>                 if (dest)
> -                       memcpy(dest + size, recs[i]->ptr, recs[i]->size);
> +                       memcpy(dest + size, recs[i].ptr, recs[i].size);
>         if (add_nl) {
> -               i = recs[count - 1]->size;
> -               if (i == 0 || recs[count - 1]->ptr[i - 1] != '\n') {
> +               i = recs[count - 1].size;
> +               if (i == 0 || recs[count - 1].ptr[i - 1] != '\n') {
>                         if (needs_cr) {
>                                 if (dest)
>                                         dest[size] = '\r';
> @@ -160,22 +160,22 @@ static int is_eol_crlf(xdfile_t *file, int i)
>
>         if (i < file->nrec - 1)
>                 /* All lines before the last *must* end in LF */
> -               return (size = file->recs[i]->size) > 1 &&
> -                       file->recs[i]->ptr[size - 2] == '\r';
> +               return (size = file->recs[i].size) > 1 &&
> +                       file->recs[i].ptr[size - 2] == '\r';
>         if (!file->nrec)
>                 /* Cannot determine eol style from empty file */
>                 return -1;
> -       if ((size = file->recs[i]->size) &&
> -                       file->recs[i]->ptr[size - 1] == '\n')
> +       if ((size = file->recs[i].size) &&
> +                       file->recs[i].ptr[size - 1] == '\n')
>                 /* Last line; ends in LF; Is it CR/LF? */
>                 return size > 1 &&
> -                       file->recs[i]->ptr[size - 2] == '\r';
> +                       file->recs[i].ptr[size - 2] == '\r';
>         if (!i)
>                 /* The only line has no eol */
>                 return -1;
>         /* Determine eol from second-to-last line */
> -       return (size = file->recs[i - 1]->size) > 1 &&
> -               file->recs[i - 1]->ptr[size - 2] == '\r';
> +       return (size = file->recs[i - 1].size) > 1 &&
> +               file->recs[i - 1].ptr[size - 2] == '\r';
>  }
>
>  static int is_cr_needed(xdfenv_t *xe1, xdfenv_t *xe2, xdmerge_t *m)
> @@ -334,22 +334,22 @@ static int recmatch(xrecord_t *rec1, xrecord_t *rec2, unsigned long flags)
>  static void xdl_refine_zdiff3_conflicts(xdfenv_t *xe1, xdfenv_t *xe2, xdmerge_t *m,
>                 xpparam_t const *xpp)
>  {
> -       xrecord_t **rec1 = xe1->xdf2.recs, **rec2 = xe2->xdf2.recs;
> +       xrecord_t *rec1 = xe1->xdf2.recs, *rec2 = xe2->xdf2.recs;
>         for (; m; m = m->next) {
>                 /* let's handle just the conflicts */
>                 if (m->mode)
>                         continue;
>
>                 while(m->chg1 && m->chg2 &&
> -                     recmatch(rec1[m->i1], rec2[m->i2], xpp->flags)) {
> +                     recmatch(&rec1[m->i1], &rec2[m->i2], xpp->flags)) {
>                         m->chg1--;
>                         m->chg2--;
>                         m->i1++;
>                         m->i2++;
>                 }
>                 while (m->chg1 && m->chg2 &&
> -                      recmatch(rec1[m->i1 + m->chg1 - 1],
> -                               rec2[m->i2 + m->chg2 - 1], xpp->flags)) {
> +                      recmatch(&rec1[m->i1 + m->chg1 - 1],
> +                               &rec2[m->i2 + m->chg2 - 1], xpp->flags)) {
>                         m->chg1--;
>                         m->chg2--;
>                 }
> @@ -381,12 +381,12 @@ static int xdl_refine_conflicts(xdfenv_t *xe1, xdfenv_t *xe2, xdmerge_t *m,
>                  * This probably does not work outside git, since
>                  * we have a very simple mmfile structure.
>                  */
> -               t1.ptr = (char *)xe1->xdf2.recs[m->i1]->ptr;
> -               t1.size = xe1->xdf2.recs[m->i1 + m->chg1 - 1]->ptr
> -                       + xe1->xdf2.recs[m->i1 + m->chg1 - 1]->size - t1.ptr;
> -               t2.ptr = (char *)xe2->xdf2.recs[m->i2]->ptr;
> -               t2.size = xe2->xdf2.recs[m->i2 + m->chg2 - 1]->ptr
> -                       + xe2->xdf2.recs[m->i2 + m->chg2 - 1]->size - t2.ptr;
> +               t1.ptr = (char *)xe1->xdf2.recs[m->i1].ptr;
> +               t1.size = xe1->xdf2.recs[m->i1 + m->chg1 - 1].ptr
> +                       + xe1->xdf2.recs[m->i1 + m->chg1 - 1].size - t1.ptr;
> +               t2.ptr = (char *)xe2->xdf2.recs[m->i2].ptr;
> +               t2.size = xe2->xdf2.recs[m->i2 + m->chg2 - 1].ptr
> +                       + xe2->xdf2.recs[m->i2 + m->chg2 - 1].size - t2.ptr;
>                 if (xdl_do_diff(&t1, &t2, xpp, &xe) < 0)
>                         return -1;
>                 if (xdl_change_compact(&xe.xdf1, &xe.xdf2, xpp->flags) < 0 ||
> @@ -440,8 +440,8 @@ static int line_contains_alnum(const char *ptr, long size)
>  static int lines_contain_alnum(xdfenv_t *xe, int i, int chg)
>  {
>         for (; chg; chg--, i++)
> -               if (line_contains_alnum(xe->xdf2.recs[i]->ptr,
> -                               xe->xdf2.recs[i]->size))
> +               if (line_contains_alnum(xe->xdf2.recs[i].ptr,
> +                               xe->xdf2.recs[i].size))
>                         return 1;
>         return 0;
>  }
> diff --git a/xdiff/xpatience.c b/xdiff/xpatience.c
> index 77dc411d19..bf69a58527 100644
> --- a/xdiff/xpatience.c
> +++ b/xdiff/xpatience.c
> @@ -88,9 +88,9 @@ static int is_anchor(xpparam_t const *xpp, const char *line)
>  static void insert_record(xpparam_t const *xpp, int line, struct hashmap *map,
>                           int pass)
>  {
> -       xrecord_t **records = pass == 1 ?
> +       xrecord_t *records = pass == 1 ?
>                 map->env->xdf1.recs : map->env->xdf2.recs;
> -       xrecord_t *record = records[line - 1];
> +       xrecord_t *record = &records[line - 1];
>         /*
>          * After xdl_prepare_env() (or more precisely, due to
>          * xdl_classify_record()), the "ha" member of the records (AKA lines)
> @@ -121,7 +121,7 @@ static void insert_record(xpparam_t const *xpp, int line, struct hashmap *map,
>                 return;
>         map->entries[index].line1 = line;
>         map->entries[index].hash = record->ha;
> -       map->entries[index].anchor = is_anchor(xpp, map->env->xdf1.recs[line - 1]->ptr);
> +       map->entries[index].anchor = is_anchor(xpp, map->env->xdf1.recs[line - 1].ptr);
>         if (!map->first)
>                 map->first = map->entries + index;
>         if (map->last) {
> @@ -246,8 +246,8 @@ static int find_longest_common_sequence(struct hashmap *map, struct entry **res)
>
>  static int match(struct hashmap *map, int line1, int line2)
>  {
> -       xrecord_t *record1 = map->env->xdf1.recs[line1 - 1];
> -       xrecord_t *record2 = map->env->xdf2.recs[line2 - 1];
> +       xrecord_t *record1 = &map->env->xdf1.recs[line1 - 1];
> +       xrecord_t *record2 = &map->env->xdf2.recs[line2 - 1];
>         return record1->ha == record2->ha;
>  }
>
> diff --git a/xdiff/xprepare.c b/xdiff/xprepare.c
> index 6f1d4b4725..92f9845003 100644
> --- a/xdiff/xprepare.c
> +++ b/xdiff/xprepare.c
> @@ -131,7 +131,6 @@ static void xdl_free_ctx(xdfile_t *xdf)
>         xdl_free(xdf->rindex);
>         xdl_free(xdf->rchg - 1);
>         xdl_free(xdf->recs);
> -       xdl_cha_free(&xdf->rcha);
>  }
>
>
> @@ -146,8 +145,6 @@ static int xdl_prepare_ctx(unsigned int pass, mmfile_t *mf, long narec, xpparam_
>         xdf->rchg = NULL;
>         xdf->recs = NULL;
>
> -       if (xdl_cha_init(&xdf->rcha, sizeof(xrecord_t), narec / 4 + 1) < 0)
> -               goto abort;
>         if (!XDL_ALLOC_ARRAY(xdf->recs, narec))
>                 goto abort;
>
> @@ -158,12 +155,10 @@ static int xdl_prepare_ctx(unsigned int pass, mmfile_t *mf, long narec, xpparam_
>                         hav = xdl_hash_record(&cur, top, xpp->flags);
>                         if (XDL_ALLOC_GROW(xdf->recs, xdf->nrec + 1, narec))
>                                 goto abort;
> -                       if (!(crec = xdl_cha_alloc(&xdf->rcha)))
> -                               goto abort;
> +                       crec = &xdf->recs[xdf->nrec++];
>                         crec->ptr = prev;
>                         crec->size = (long) (cur - prev);
>                         crec->ha = hav;
> -                       xdf->recs[xdf->nrec++] = crec;
>                         if (xdl_classify_record(pass, cf, crec) < 0)
>                                 goto abort;
>                 }
> @@ -263,7 +258,7 @@ static int xdl_clean_mmatch(char const *dis, long i, long s, long e) {
>   */
>  static int xdl_cleanup_records(xdlclassifier_t *cf, xdfile_t *xdf1, xdfile_t *xdf2) {
>         long i, nm, nreff, mlim;
> -       xrecord_t **recs;
> +       xrecord_t *recs;
>         xdlclass_t *rcrec;
>         char *dis, *dis1, *dis2;
>         int need_min = !!(cf->flags & XDF_NEED_MINIMAL);
> @@ -276,7 +271,7 @@ static int xdl_cleanup_records(xdlclassifier_t *cf, xdfile_t *xdf1, xdfile_t *xd
>         if ((mlim = xdl_bogosqrt(xdf1->nrec)) > XDL_MAX_EQLIMIT)
>                 mlim = XDL_MAX_EQLIMIT;
>         for (i = xdf1->dstart, recs = &xdf1->recs[xdf1->dstart]; i <= xdf1->dend; i++, recs++) {
> -               rcrec = cf->rcrecs[(*recs)->ha];
> +               rcrec = cf->rcrecs[recs->ha];
>                 nm = rcrec ? rcrec->len2 : 0;
>                 dis1[i] = (nm == 0) ? 0: (nm >= mlim && !need_min) ? 2: 1;
>         }
> @@ -284,7 +279,7 @@ static int xdl_cleanup_records(xdlclassifier_t *cf, xdfile_t *xdf1, xdfile_t *xd
>         if ((mlim = xdl_bogosqrt(xdf2->nrec)) > XDL_MAX_EQLIMIT)
>                 mlim = XDL_MAX_EQLIMIT;
>         for (i = xdf2->dstart, recs = &xdf2->recs[xdf2->dstart]; i <= xdf2->dend; i++, recs++) {
> -               rcrec = cf->rcrecs[(*recs)->ha];
> +               rcrec = cf->rcrecs[recs->ha];
>                 nm = rcrec ? rcrec->len1 : 0;
>                 dis2[i] = (nm == 0) ? 0: (nm >= mlim && !need_min) ? 2: 1;
>         }
> @@ -320,13 +315,13 @@ static int xdl_cleanup_records(xdlclassifier_t *cf, xdfile_t *xdf1, xdfile_t *xd
>   */
>  static int xdl_trim_ends(xdfile_t *xdf1, xdfile_t *xdf2) {
>         long i, lim;
> -       xrecord_t **recs1, **recs2;
> +       xrecord_t *recs1, *recs2;
>
>         recs1 = xdf1->recs;
>         recs2 = xdf2->recs;
>         for (i = 0, lim = XDL_MIN(xdf1->nrec, xdf2->nrec); i < lim;
>              i++, recs1++, recs2++)
> -               if ((*recs1)->ha != (*recs2)->ha)
> +               if (recs1->ha != recs2->ha)
>                         break;
>
>         xdf1->dstart = xdf2->dstart = i;
> @@ -334,7 +329,7 @@ static int xdl_trim_ends(xdfile_t *xdf1, xdfile_t *xdf2) {
>         recs1 = xdf1->recs + xdf1->nrec - 1;
>         recs2 = xdf2->recs + xdf2->nrec - 1;
>         for (lim -= i, i = 0; i < lim; i++, recs1--, recs2--)
> -               if ((*recs1)->ha != (*recs2)->ha)
> +               if (recs1->ha != recs2->ha)
>                         break;
>
>         xdf1->dend = xdf1->nrec - i - 1;
> diff --git a/xdiff/xtypes.h b/xdiff/xtypes.h
> index 85848f1685..3d26cbf1ec 100644
> --- a/xdiff/xtypes.h
> +++ b/xdiff/xtypes.h
> @@ -45,10 +45,9 @@ typedef struct s_xrecord {
>  } xrecord_t;
>
>  typedef struct s_xdfile {
> -       chastore_t rcha;
> +       xrecord_t *recs;
>         long nrec;
>         long dstart, dend;
> -       xrecord_t **recs;
>         char *rchg;
>         long *rindex;
>         long nreff;
> diff --git a/xdiff/xutils.c b/xdiff/xutils.c
> index 444a108f87..332982b509 100644
> --- a/xdiff/xutils.c
> +++ b/xdiff/xutils.c
> @@ -416,12 +416,12 @@ int xdl_fall_back_diff(xdfenv_t *diff_env, xpparam_t const *xpp,
>         mmfile_t subfile1, subfile2;
>         xdfenv_t env;
>
> -       subfile1.ptr = (char *)diff_env->xdf1.recs[line1 - 1]->ptr;
> -       subfile1.size = diff_env->xdf1.recs[line1 + count1 - 2]->ptr +
> -               diff_env->xdf1.recs[line1 + count1 - 2]->size - subfile1.ptr;
> -       subfile2.ptr = (char *)diff_env->xdf2.recs[line2 - 1]->ptr;
> -       subfile2.size = diff_env->xdf2.recs[line2 + count2 - 2]->ptr +
> -               diff_env->xdf2.recs[line2 + count2 - 2]->size - subfile2.ptr;
> +       subfile1.ptr = (char *)diff_env->xdf1.recs[line1 - 1].ptr;
> +       subfile1.size = diff_env->xdf1.recs[line1 + count1 - 2].ptr +
> +               diff_env->xdf1.recs[line1 + count1 - 2].size - subfile1.ptr;
> +       subfile2.ptr = (char *)diff_env->xdf2.recs[line2 - 1].ptr;
> +       subfile2.size = diff_env->xdf2.recs[line2 + count2 - 2].ptr +
> +               diff_env->xdf2.recs[line2 + count2 - 2].size - subfile2.ptr;
>         if (xdl_do_diff(&subfile1, &subfile2, xpp, &env) < 0)
>                 return -1;
>
> --
> gitgitgadget

You weren't kidding with the --color-words callout; there's an awful
lot of places where you only change one or two characters (e.g. '->'
becoming '.'); that's much easier to see when viewing the diff with
that flag.

Anyway, looks good.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the Git mailing list, Phillip Wood wrote (reply to this):

On 09/09/2025 09:58, Elijah Newren wrote:
> On Sun, Sep 7, 2025 at 12:46 PM Ezekiel Newren via GitGitGadget
> <gitgitgadget@gmail.com> wrote:
>>
>> From: Ezekiel Newren <ezekielnewren@gmail.com>
>> The chastore_t type is very unfriendly to Rust FFI. It's also redundant
>> since 'recs' is a vector type that grows every time an xrecord_t is
>> added.
> > The second sentence seems to presume the reader knows what chastore_t
> type is for, and about the confusing dual layering between it and
> recs.its confusing dual layering.  I liked your more extended
> explanation in https://lore.kernel.org/git/7ea2dccd71fc502f20614ce217fc9885d1b17413.1756496539.git.gitgitgadget@gmail.com/;
> could some of that be used here?

I agree that's a better explaination. I also think it would be helpful to spell out the implications of this change. If I understand the change correctly we now store all the records in a contiguous array, rather than having the records in a arena and storing a separate array of pointers to those records. As sizeof(xrecord_t) is pretty small the change to contiguous storage hopefully wont cause any allocation issues, though I guess it does mean we end up copying more data as we grow the array compared to using an arena.

Overall these first few patches look like a really nice cleanup.

Thanks

Phillip

>>
>> Signed-off-by: Ezekiel Newren <ezekielnewren@gmail.com>
>> ---
>>   xdiff/xdiffi.c     | 24 ++++++++++----------
>>   xdiff/xemit.c      |  6 ++---
>>   xdiff/xhistogram.c |  2 +-
>>   xdiff/xmerge.c     | 56 +++++++++++++++++++++++-----------------------
>>   xdiff/xpatience.c  | 10 ++++-----
>>   xdiff/xprepare.c   | 19 ++++++----------
>>   xdiff/xtypes.h     |  3 +--
>>   xdiff/xutils.c     | 12 +++++-----
>>   8 files changed, 63 insertions(+), 69 deletions(-)
>>
>> diff --git a/xdiff/xdiffi.c b/xdiff/xdiffi.c
>> index 11cd090b53..a66125d44a 100644
>> --- a/xdiff/xdiffi.c
>> +++ b/xdiff/xdiffi.c
>> @@ -24,7 +24,7 @@
>>
>>   static unsigned long get_hash(xdfile_t *xdf, long index)
>>   {
>> -       return xdf->recs[xdf->rindex[index]]->ha;
>> +       return xdf->recs[xdf->rindex[index]].ha;
>>   }
>>
>>   #define XDL_MAX_COST_MIN 256
>> @@ -489,13 +489,13 @@ static void measure_split(const xdfile_t *xdf, long split,
>>                  m->indent = -1;
>>          } else {
>>                  m->end_of_file = 0;
>> -               m->indent = get_indent(xdf->recs[split]);
>> +               m->indent = get_indent(&xdf->recs[split]);
>>          }
>>
>>          m->pre_blank = 0;
>>          m->pre_indent = -1;
>>          for (i = split - 1; i >= 0; i--) {
>> -               m->pre_indent = get_indent(xdf->recs[i]);
>> +               m->pre_indent = get_indent(&xdf->recs[i]);
>>                  if (m->pre_indent != -1)
>>                          break;
>>                  m->pre_blank += 1;
>> @@ -508,7 +508,7 @@ static void measure_split(const xdfile_t *xdf, long split,
>>          m->post_blank = 0;
>>          m->post_indent = -1;
>>          for (i = split + 1; i < xdf->nrec; i++) {
>> -               m->post_indent = get_indent(xdf->recs[i]);
>> +               m->post_indent = get_indent(&xdf->recs[i]);
>>                  if (m->post_indent != -1)
>>                          break;
>>                  m->post_blank += 1;
>> @@ -752,7 +752,7 @@ static inline int group_previous(xdfile_t *xdf, struct xdlgroup *g)
>>   static int group_slide_down(xdfile_t *xdf, struct xdlgroup *g)
>>   {
>>          if (g->end < xdf->nrec &&
>> -           recs_match(xdf->recs[g->start], xdf->recs[g->end])) {
>> +           recs_match(&xdf->recs[g->start], &xdf->recs[g->end])) {
>>                  xdf->rchg[g->start++] = 0;
>>                  xdf->rchg[g->end++] = 1;
>>
>> @@ -773,7 +773,7 @@ static int group_slide_down(xdfile_t *xdf, struct xdlgroup *g)
>>   static int group_slide_up(xdfile_t *xdf, struct xdlgroup *g)
>>   {
>>          if (g->start > 0 &&
>> -           recs_match(xdf->recs[g->start - 1], xdf->recs[g->end - 1])) {
>> +           recs_match(&xdf->recs[g->start - 1], &xdf->recs[g->end - 1])) {
>>                  xdf->rchg[--g->start] = 1;
>>                  xdf->rchg[--g->end] = 0;
>>
>> @@ -988,16 +988,16 @@ static void xdl_mark_ignorable_lines(xdchange_t *xscr, xdfenv_t *xe, long flags)
>>
>>          for (xch = xscr; xch; xch = xch->next) {
>>                  int ignore = 1;
>> -               xrecord_t **rec;
>> +               xrecord_t *rec;
>>                  long i;
>>
>>                  rec = &xe->xdf1.recs[xch->i1];
>>                  for (i = 0; i < xch->chg1 && ignore; i++)
>> -                       ignore = xdl_blankline(rec[i]->ptr, rec[i]->size, flags);
>> +                       ignore = xdl_blankline(rec[i].ptr, rec[i].size, flags);
>>
>>                  rec = &xe->xdf2.recs[xch->i2];
>>                  for (i = 0; i < xch->chg2 && ignore; i++)
>> -                       ignore = xdl_blankline(rec[i]->ptr, rec[i]->size, flags);
>> +                       ignore = xdl_blankline(rec[i].ptr, rec[i].size, flags);
>>
>>                  xch->ignore = ignore;
>>          }
>> @@ -1021,7 +1021,7 @@ static void xdl_mark_ignorable_regex(xdchange_t *xscr, const xdfenv_t *xe,
>>          xdchange_t *xch;
>>
>>          for (xch = xscr; xch; xch = xch->next) {
>> -               xrecord_t **rec;
>> +               xrecord_t *rec;
>>                  int ignore = 1;
>>                  long i;
>>
>> @@ -1033,11 +1033,11 @@ static void xdl_mark_ignorable_regex(xdchange_t *xscr, const xdfenv_t *xe,
>>
>>                  rec = &xe->xdf1.recs[xch->i1];
>>                  for (i = 0; i < xch->chg1 && ignore; i++)
>> -                       ignore = record_matches_regex(rec[i], xpp);
>> +                       ignore = record_matches_regex(&rec[i], xpp);
>>
>>                  rec = &xe->xdf2.recs[xch->i2];
>>                  for (i = 0; i < xch->chg2 && ignore; i++)
>> -                       ignore = record_matches_regex(rec[i], xpp);
>> +                       ignore = record_matches_regex(&rec[i], xpp);
>>
>>                  xch->ignore = ignore;
>>          }
>> diff --git a/xdiff/xemit.c b/xdiff/xemit.c
>> index 2161ac3cd0..b2f1f30cd3 100644
>> --- a/xdiff/xemit.c
>> +++ b/xdiff/xemit.c
>> @@ -25,7 +25,7 @@
>>
>>   static int xdl_emit_record(xdfile_t *xdf, long ri, char const *pre, xdemitcb_t *ecb)
>>   {
>> -       xrecord_t *rec = xdf->recs[ri];
>> +       xrecord_t *rec = &xdf->recs[ri];
>>
>>          if (xdl_emit_diffrec(rec->ptr, rec->size, pre, strlen(pre), ecb) < 0)
>>                  return -1;
>> @@ -110,7 +110,7 @@ static long def_ff(const char *rec, long len, char *buf, long sz)
>>   static long match_func_rec(xdfile_t *xdf, xdemitconf_t const *xecfg, long ri,
>>                             char *buf, long sz)
>>   {
>> -       xrecord_t *rec = xdf->recs[ri];
>> +       xrecord_t *rec = &xdf->recs[ri];
>>
>>          if (!xecfg->find_func)
>>                  return def_ff(rec->ptr, rec->size, buf, sz);
>> @@ -150,7 +150,7 @@ static long get_func_line(xdfenv_t *xe, xdemitconf_t const *xecfg,
>>
>>   static int is_empty_rec(xdfile_t *xdf, long ri)
>>   {
>> -       xrecord_t *rec = xdf->recs[ri];
>> +       xrecord_t *rec = &xdf->recs[ri];
>>          long i = 0;
>>
>>          for (; i < rec->size && XDL_ISSPACE(rec->ptr[i]); i++);
>> diff --git a/xdiff/xhistogram.c b/xdiff/xhistogram.c
>> index 040d81e0bc..4d857e8ae2 100644
>> --- a/xdiff/xhistogram.c
>> +++ b/xdiff/xhistogram.c
>> @@ -86,7 +86,7 @@ struct region {
>>          ((LINE_MAP(index, ptr))->cnt)
>>
>>   #define REC(env, s, l) \
>> -       (env->xdf##s.recs[l - 1])
>> +       (&env->xdf##s.recs[l - 1])
>>
>>   static int cmp_recs(xrecord_t *r1, xrecord_t *r2)
>>   {
>> diff --git a/xdiff/xmerge.c b/xdiff/xmerge.c
>> index af40c88a5b..fd600cbb5d 100644
>> --- a/xdiff/xmerge.c
>> +++ b/xdiff/xmerge.c
>> @@ -97,12 +97,12 @@ static int xdl_merge_cmp_lines(xdfenv_t *xe1, int i1, xdfenv_t *xe2, int i2,
>>                  int line_count, long flags)
>>   {
>>          int i;
>> -       xrecord_t **rec1 = xe1->xdf2.recs + i1;
>> -       xrecord_t **rec2 = xe2->xdf2.recs + i2;
>> +       xrecord_t *rec1 = xe1->xdf2.recs + i1;
>> +       xrecord_t *rec2 = xe2->xdf2.recs + i2;
>>
>>          for (i = 0; i < line_count; i++) {
>> -               int result = xdl_recmatch(rec1[i]->ptr, rec1[i]->size,
>> -                       rec2[i]->ptr, rec2[i]->size, flags);
>> +               int result = xdl_recmatch(rec1[i].ptr, rec1[i].size,
>> +                       rec2[i].ptr, rec2[i].size, flags);
>>                  if (!result)
>>                          return -1;
>>          }
>> @@ -111,7 +111,7 @@ static int xdl_merge_cmp_lines(xdfenv_t *xe1, int i1, xdfenv_t *xe2, int i2,
>>
>>   static int xdl_recs_copy_0(int use_orig, xdfenv_t *xe, int i, int count, int needs_cr, int add_nl, char *dest)
>>   {
>> -       xrecord_t **recs;
>> +       xrecord_t *recs;
>>          int size = 0;
>>
>>          recs = (use_orig ? xe->xdf1.recs : xe->xdf2.recs) + i;
>> @@ -119,12 +119,12 @@ static int xdl_recs_copy_0(int use_orig, xdfenv_t *xe, int i, int count, int nee
>>          if (count < 1)
>>                  return 0;
>>
>> -       for (i = 0; i < count; size += recs[i++]->size)
>> +       for (i = 0; i < count; size += recs[i++].size)
>>                  if (dest)
>> -                       memcpy(dest + size, recs[i]->ptr, recs[i]->size);
>> +                       memcpy(dest + size, recs[i].ptr, recs[i].size);
>>          if (add_nl) {
>> -               i = recs[count - 1]->size;
>> -               if (i == 0 || recs[count - 1]->ptr[i - 1] != '\n') {
>> +               i = recs[count - 1].size;
>> +               if (i == 0 || recs[count - 1].ptr[i - 1] != '\n') {
>>                          if (needs_cr) {
>>                                  if (dest)
>>                                          dest[size] = '\r';
>> @@ -160,22 +160,22 @@ static int is_eol_crlf(xdfile_t *file, int i)
>>
>>          if (i < file->nrec - 1)
>>                  /* All lines before the last *must* end in LF */
>> -               return (size = file->recs[i]->size) > 1 &&
>> -                       file->recs[i]->ptr[size - 2] == '\r';
>> +               return (size = file->recs[i].size) > 1 &&
>> +                       file->recs[i].ptr[size - 2] == '\r';
>>          if (!file->nrec)
>>                  /* Cannot determine eol style from empty file */
>>                  return -1;
>> -       if ((size = file->recs[i]->size) &&
>> -                       file->recs[i]->ptr[size - 1] == '\n')
>> +       if ((size = file->recs[i].size) &&
>> +                       file->recs[i].ptr[size - 1] == '\n')
>>                  /* Last line; ends in LF; Is it CR/LF? */
>>                  return size > 1 &&
>> -                       file->recs[i]->ptr[size - 2] == '\r';
>> +                       file->recs[i].ptr[size - 2] == '\r';
>>          if (!i)
>>                  /* The only line has no eol */
>>                  return -1;
>>          /* Determine eol from second-to-last line */
>> -       return (size = file->recs[i - 1]->size) > 1 &&
>> -               file->recs[i - 1]->ptr[size - 2] == '\r';
>> +       return (size = file->recs[i - 1].size) > 1 &&
>> +               file->recs[i - 1].ptr[size - 2] == '\r';
>>   }
>>
>>   static int is_cr_needed(xdfenv_t *xe1, xdfenv_t *xe2, xdmerge_t *m)
>> @@ -334,22 +334,22 @@ static int recmatch(xrecord_t *rec1, xrecord_t *rec2, unsigned long flags)
>>   static void xdl_refine_zdiff3_conflicts(xdfenv_t *xe1, xdfenv_t *xe2, xdmerge_t *m,
>>                  xpparam_t const *xpp)
>>   {
>> -       xrecord_t **rec1 = xe1->xdf2.recs, **rec2 = xe2->xdf2.recs;
>> +       xrecord_t *rec1 = xe1->xdf2.recs, *rec2 = xe2->xdf2.recs;
>>          for (; m; m = m->next) {
>>                  /* let's handle just the conflicts */
>>                  if (m->mode)
>>                          continue;
>>
>>                  while(m->chg1 && m->chg2 &&
>> -                     recmatch(rec1[m->i1], rec2[m->i2], xpp->flags)) {
>> +                     recmatch(&rec1[m->i1], &rec2[m->i2], xpp->flags)) {
>>                          m->chg1--;
>>                          m->chg2--;
>>                          m->i1++;
>>                          m->i2++;
>>                  }
>>                  while (m->chg1 && m->chg2 &&
>> -                      recmatch(rec1[m->i1 + m->chg1 - 1],
>> -                               rec2[m->i2 + m->chg2 - 1], xpp->flags)) {
>> +                      recmatch(&rec1[m->i1 + m->chg1 - 1],
>> +                               &rec2[m->i2 + m->chg2 - 1], xpp->flags)) {
>>                          m->chg1--;
>>                          m->chg2--;
>>                  }
>> @@ -381,12 +381,12 @@ static int xdl_refine_conflicts(xdfenv_t *xe1, xdfenv_t *xe2, xdmerge_t *m,
>>                   * This probably does not work outside git, since
>>                   * we have a very simple mmfile structure.
>>                   */
>> -               t1.ptr = (char *)xe1->xdf2.recs[m->i1]->ptr;
>> -               t1.size = xe1->xdf2.recs[m->i1 + m->chg1 - 1]->ptr
>> -                       + xe1->xdf2.recs[m->i1 + m->chg1 - 1]->size - t1.ptr;
>> -               t2.ptr = (char *)xe2->xdf2.recs[m->i2]->ptr;
>> -               t2.size = xe2->xdf2.recs[m->i2 + m->chg2 - 1]->ptr
>> -                       + xe2->xdf2.recs[m->i2 + m->chg2 - 1]->size - t2.ptr;
>> +               t1.ptr = (char *)xe1->xdf2.recs[m->i1].ptr;
>> +               t1.size = xe1->xdf2.recs[m->i1 + m->chg1 - 1].ptr
>> +                       + xe1->xdf2.recs[m->i1 + m->chg1 - 1].size - t1.ptr;
>> +               t2.ptr = (char *)xe2->xdf2.recs[m->i2].ptr;
>> +               t2.size = xe2->xdf2.recs[m->i2 + m->chg2 - 1].ptr
>> +                       + xe2->xdf2.recs[m->i2 + m->chg2 - 1].size - t2.ptr;
>>                  if (xdl_do_diff(&t1, &t2, xpp, &xe) < 0)
>>                          return -1;
>>                  if (xdl_change_compact(&xe.xdf1, &xe.xdf2, xpp->flags) < 0 ||
>> @@ -440,8 +440,8 @@ static int line_contains_alnum(const char *ptr, long size)
>>   static int lines_contain_alnum(xdfenv_t *xe, int i, int chg)
>>   {
>>          for (; chg; chg--, i++)
>> -               if (line_contains_alnum(xe->xdf2.recs[i]->ptr,
>> -                               xe->xdf2.recs[i]->size))
>> +               if (line_contains_alnum(xe->xdf2.recs[i].ptr,
>> +                               xe->xdf2.recs[i].size))
>>                          return 1;
>>          return 0;
>>   }
>> diff --git a/xdiff/xpatience.c b/xdiff/xpatience.c
>> index 77dc411d19..bf69a58527 100644
>> --- a/xdiff/xpatience.c
>> +++ b/xdiff/xpatience.c
>> @@ -88,9 +88,9 @@ static int is_anchor(xpparam_t const *xpp, const char *line)
>>   static void insert_record(xpparam_t const *xpp, int line, struct hashmap *map,
>>                            int pass)
>>   {
>> -       xrecord_t **records = pass == 1 ?
>> +       xrecord_t *records = pass == 1 ?
>>                  map->env->xdf1.recs : map->env->xdf2.recs;
>> -       xrecord_t *record = records[line - 1];
>> +       xrecord_t *record = &records[line - 1];
>>          /*
>>           * After xdl_prepare_env() (or more precisely, due to
>>           * xdl_classify_record()), the "ha" member of the records (AKA lines)
>> @@ -121,7 +121,7 @@ static void insert_record(xpparam_t const *xpp, int line, struct hashmap *map,
>>                  return;
>>          map->entries[index].line1 = line;
>>          map->entries[index].hash = record->ha;
>> -       map->entries[index].anchor = is_anchor(xpp, map->env->xdf1.recs[line - 1]->ptr);
>> +       map->entries[index].anchor = is_anchor(xpp, map->env->xdf1.recs[line - 1].ptr);
>>          if (!map->first)
>>                  map->first = map->entries + index;
>>          if (map->last) {
>> @@ -246,8 +246,8 @@ static int find_longest_common_sequence(struct hashmap *map, struct entry **res)
>>
>>   static int match(struct hashmap *map, int line1, int line2)
>>   {
>> -       xrecord_t *record1 = map->env->xdf1.recs[line1 - 1];
>> -       xrecord_t *record2 = map->env->xdf2.recs[line2 - 1];
>> +       xrecord_t *record1 = &map->env->xdf1.recs[line1 - 1];
>> +       xrecord_t *record2 = &map->env->xdf2.recs[line2 - 1];
>>          return record1->ha == record2->ha;
>>   }
>>
>> diff --git a/xdiff/xprepare.c b/xdiff/xprepare.c
>> index 6f1d4b4725..92f9845003 100644
>> --- a/xdiff/xprepare.c
>> +++ b/xdiff/xprepare.c
>> @@ -131,7 +131,6 @@ static void xdl_free_ctx(xdfile_t *xdf)
>>          xdl_free(xdf->rindex);
>>          xdl_free(xdf->rchg - 1);
>>          xdl_free(xdf->recs);
>> -       xdl_cha_free(&xdf->rcha);
>>   }
>>
>>
>> @@ -146,8 +145,6 @@ static int xdl_prepare_ctx(unsigned int pass, mmfile_t *mf, long narec, xpparam_
>>          xdf->rchg = NULL;
>>          xdf->recs = NULL;
>>
>> -       if (xdl_cha_init(&xdf->rcha, sizeof(xrecord_t), narec / 4 + 1) < 0)
>> -               goto abort;
>>          if (!XDL_ALLOC_ARRAY(xdf->recs, narec))
>>                  goto abort;
>>
>> @@ -158,12 +155,10 @@ static int xdl_prepare_ctx(unsigned int pass, mmfile_t *mf, long narec, xpparam_
>>                          hav = xdl_hash_record(&cur, top, xpp->flags);
>>                          if (XDL_ALLOC_GROW(xdf->recs, xdf->nrec + 1, narec))
>>                                  goto abort;
>> -                       if (!(crec = xdl_cha_alloc(&xdf->rcha)))
>> -                               goto abort;
>> +                       crec = &xdf->recs[xdf->nrec++];
>>                          crec->ptr = prev;
>>                          crec->size = (long) (cur - prev);
>>                          crec->ha = hav;
>> -                       xdf->recs[xdf->nrec++] = crec;
>>                          if (xdl_classify_record(pass, cf, crec) < 0)
>>                                  goto abort;
>>                  }
>> @@ -263,7 +258,7 @@ static int xdl_clean_mmatch(char const *dis, long i, long s, long e) {
>>    */
>>   static int xdl_cleanup_records(xdlclassifier_t *cf, xdfile_t *xdf1, xdfile_t *xdf2) {
>>          long i, nm, nreff, mlim;
>> -       xrecord_t **recs;
>> +       xrecord_t *recs;
>>          xdlclass_t *rcrec;
>>          char *dis, *dis1, *dis2;
>>          int need_min = !!(cf->flags & XDF_NEED_MINIMAL);
>> @@ -276,7 +271,7 @@ static int xdl_cleanup_records(xdlclassifier_t *cf, xdfile_t *xdf1, xdfile_t *xd
>>          if ((mlim = xdl_bogosqrt(xdf1->nrec)) > XDL_MAX_EQLIMIT)
>>                  mlim = XDL_MAX_EQLIMIT;
>>          for (i = xdf1->dstart, recs = &xdf1->recs[xdf1->dstart]; i <= xdf1->dend; i++, recs++) {
>> -               rcrec = cf->rcrecs[(*recs)->ha];
>> +               rcrec = cf->rcrecs[recs->ha];
>>                  nm = rcrec ? rcrec->len2 : 0;
>>                  dis1[i] = (nm == 0) ? 0: (nm >= mlim && !need_min) ? 2: 1;
>>          }
>> @@ -284,7 +279,7 @@ static int xdl_cleanup_records(xdlclassifier_t *cf, xdfile_t *xdf1, xdfile_t *xd
>>          if ((mlim = xdl_bogosqrt(xdf2->nrec)) > XDL_MAX_EQLIMIT)
>>                  mlim = XDL_MAX_EQLIMIT;
>>          for (i = xdf2->dstart, recs = &xdf2->recs[xdf2->dstart]; i <= xdf2->dend; i++, recs++) {
>> -               rcrec = cf->rcrecs[(*recs)->ha];
>> +               rcrec = cf->rcrecs[recs->ha];
>>                  nm = rcrec ? rcrec->len1 : 0;
>>                  dis2[i] = (nm == 0) ? 0: (nm >= mlim && !need_min) ? 2: 1;
>>          }
>> @@ -320,13 +315,13 @@ static int xdl_cleanup_records(xdlclassifier_t *cf, xdfile_t *xdf1, xdfile_t *xd
>>    */
>>   static int xdl_trim_ends(xdfile_t *xdf1, xdfile_t *xdf2) {
>>          long i, lim;
>> -       xrecord_t **recs1, **recs2;
>> +       xrecord_t *recs1, *recs2;
>>
>>          recs1 = xdf1->recs;
>>          recs2 = xdf2->recs;
>>          for (i = 0, lim = XDL_MIN(xdf1->nrec, xdf2->nrec); i < lim;
>>               i++, recs1++, recs2++)
>> -               if ((*recs1)->ha != (*recs2)->ha)
>> +               if (recs1->ha != recs2->ha)
>>                          break;
>>
>>          xdf1->dstart = xdf2->dstart = i;
>> @@ -334,7 +329,7 @@ static int xdl_trim_ends(xdfile_t *xdf1, xdfile_t *xdf2) {
>>          recs1 = xdf1->recs + xdf1->nrec - 1;
>>          recs2 = xdf2->recs + xdf2->nrec - 1;
>>          for (lim -= i, i = 0; i < lim; i++, recs1--, recs2--)
>> -               if ((*recs1)->ha != (*recs2)->ha)
>> +               if (recs1->ha != recs2->ha)
>>                          break;
>>
>>          xdf1->dend = xdf1->nrec - i - 1;
>> diff --git a/xdiff/xtypes.h b/xdiff/xtypes.h
>> index 85848f1685..3d26cbf1ec 100644
>> --- a/xdiff/xtypes.h
>> +++ b/xdiff/xtypes.h
>> @@ -45,10 +45,9 @@ typedef struct s_xrecord {
>>   } xrecord_t;
>>
>>   typedef struct s_xdfile {
>> -       chastore_t rcha;
>> +       xrecord_t *recs;
>>          long nrec;
>>          long dstart, dend;
>> -       xrecord_t **recs;
>>          char *rchg;
>>          long *rindex;
>>          long nreff;
>> diff --git a/xdiff/xutils.c b/xdiff/xutils.c
>> index 444a108f87..332982b509 100644
>> --- a/xdiff/xutils.c
>> +++ b/xdiff/xutils.c
>> @@ -416,12 +416,12 @@ int xdl_fall_back_diff(xdfenv_t *diff_env, xpparam_t const *xpp,
>>          mmfile_t subfile1, subfile2;
>>          xdfenv_t env;
>>
>> -       subfile1.ptr = (char *)diff_env->xdf1.recs[line1 - 1]->ptr;
>> -       subfile1.size = diff_env->xdf1.recs[line1 + count1 - 2]->ptr +
>> -               diff_env->xdf1.recs[line1 + count1 - 2]->size - subfile1.ptr;
>> -       subfile2.ptr = (char *)diff_env->xdf2.recs[line2 - 1]->ptr;
>> -       subfile2.size = diff_env->xdf2.recs[line2 + count2 - 2]->ptr +
>> -               diff_env->xdf2.recs[line2 + count2 - 2]->size - subfile2.ptr;
>> +       subfile1.ptr = (char *)diff_env->xdf1.recs[line1 - 1].ptr;
>> +       subfile1.size = diff_env->xdf1.recs[line1 + count1 - 2].ptr +
>> +               diff_env->xdf1.recs[line1 + count1 - 2].size - subfile1.ptr;
>> +       subfile2.ptr = (char *)diff_env->xdf2.recs[line2 - 1].ptr;
>> +       subfile2.size = diff_env->xdf2.recs[line2 + count2 - 2].ptr +
>> +               diff_env->xdf2.recs[line2 + count2 - 2].size - subfile2.ptr;
>>          if (xdl_do_diff(&subfile1, &subfile2, xpp, &env) < 0)
>>                  return -1;
>>
>> --
>> gitgitgadget
> > You weren't kidding with the --color-words callout; there's an awful
> lot of places where you only change one or two characters (e.g. '->'
> becoming '.'); that's much easier to see when viewing the diff with
> that flag.
> > Anyway, looks good.
> 

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the Git mailing list, Junio C Hamano wrote (reply to this):

Elijah Newren <newren@gmail.com> writes:

> On Sun, Sep 7, 2025 at 12:46 PM Ezekiel Newren via GitGitGadget
> <gitgitgadget@gmail.com> wrote:
>>
>> From: Ezekiel Newren <ezekielnewren@gmail.com>
>
> My personal bias is that things like "view with --color-words" makes
> more sense to include near the end of the commit message, just before
> the sign-offs.  Not sure if others agree on that.

FWIW, I am with you.  Certainly not on the title.  It is even fine
immediately below the three-dash line before the diffstat.

Thanks.  I am enjoying to follow along the series by reading your
reviews.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the Git mailing list, Ben Knoble wrote (reply to this):

> Le 9 sept. 2025 à 04:58, Elijah Newren <newren@gmail.com> a écrit :
> 
> On Sun, Sep 7, 2025 at 12:46 PM Ezekiel Newren via GitGitGadget
> <gitgitgadget@gmail.com> wrote:
>> 
>> From: Ezekiel Newren <ezekielnewren@gmail.com>
> 
> My personal bias is that things like "view with --color-words" makes
> more sense to include near the end of the commit message, just before
> the sign-offs.  Not sure if others agree on that.

I’ve been using a Best-viewed-with trailer; I saw Peff do something similar once, I think.

#define XDIFF_H

#ifdef __cplusplus
extern "C" {

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the Git mailing list, Elijah Newren wrote (reply to this):

On Sun, Sep 7, 2025 at 12:46 PM Ezekiel Newren via GitGitGadget
<gitgitgadget@gmail.com> wrote:
>
> From: Ezekiel Newren <ezekielnewren@gmail.com>
>
> Define macros NO(0), YES(1), MAYBE(2) as the enum values for rchg to
> make the code easier to follow. Perhaps 'rchg' should be renamed to
> 'changed'?
>
> Signed-off-by: Ezekiel Newren <ezekielnewren@gmail.com>
> ---
>  xdiff/xdiff.h      |  4 ++++
>  xdiff/xdiffi.c     | 29 ++++++++++++++---------------
>  xdiff/xhistogram.c |  8 ++++----
>  xdiff/xpatience.c  |  8 ++++----
>  xdiff/xprepare.c   | 24 ++++++++++++------------
>  5 files changed, 38 insertions(+), 35 deletions(-)
>
> diff --git a/xdiff/xdiff.h b/xdiff/xdiff.h
> index 2cecde5afe..7092879829 100644
> --- a/xdiff/xdiff.h
> +++ b/xdiff/xdiff.h
> @@ -27,6 +27,10 @@
>  extern "C" {
>  #endif /* #ifdef __cplusplus */
>
> +#define NO 0
> +#define YES 1
> +#define MAYBE 2
> +
>  /* xpparm_t.flags */
>  #define XDF_NEED_MINIMAL (1 << 0)
>
> diff --git a/xdiff/xdiffi.c b/xdiff/xdiffi.c
> index a66125d44a..44fd27823a 100644
> --- a/xdiff/xdiffi.c
> +++ b/xdiff/xdiffi.c
> @@ -278,10 +278,10 @@ int xdl_recs_cmp(xdfile_t *xdf1, long off1, long lim1,
>          */
>         if (off1 == lim1) {
>                 for (; off2 < lim2; off2++)
> -                       xdf2->rchg[xdf2->rindex[off2]] = 1;
> +                       xdf2->rchg[xdf2->rindex[off2]] = YES;
>         } else if (off2 == lim2) {
>                 for (; off1 < lim1; off1++)
> -                       xdf1->rchg[xdf1->rindex[off1]] = 1;
> +                       xdf1->rchg[xdf1->rindex[off1]] = YES;
>         } else {
>                 xdpsplit_t spl;
>                 spl.i1 = spl.i2 = 0;
> @@ -708,7 +708,7 @@ struct xdlgroup {
>  static void group_init(xdfile_t *xdf, struct xdlgroup *g)
>  {
>         g->start = g->end = 0;
> -       while (xdf->rchg[g->end])
> +       while (xdf->rchg[g->end] == YES)

You've got a few places like this where the old code would have
behaved differently if there were some MAYBE values.  I presume you've
carefully vetted that those can't happen at these points in the code,
but it might be worth calling that out in the commit message for
reviewers who'll otherwise wonder if there's a behavior change that
has occurred.

>                 g->end++;
>  }
>
> @@ -722,7 +722,7 @@ static inline int group_next(xdfile_t *xdf, struct xdlgroup *g)
>                 return -1;
>
>         g->start = g->end + 1;
> -       for (g->end = g->start; xdf->rchg[g->end]; g->end++)
> +       for (g->end = g->start; xdf->rchg[g->end] == YES; g->end++)

Here's another of those where you assume MAYBE isn't possible.

>                 ;
>
>         return 0;
> @@ -738,7 +738,7 @@ static inline int group_previous(xdfile_t *xdf, struct xdlgroup *g)
>                 return -1;
>
>         g->end = g->start - 1;
> -       for (g->start = g->end; xdf->rchg[g->start - 1]; g->start--)
> +       for (g->start = g->end; xdf->rchg[g->start - 1] == YES; g->start--)

...and another.

>                 ;
>
>         return 0;
> @@ -753,10 +753,10 @@ static int group_slide_down(xdfile_t *xdf, struct xdlgroup *g)
>  {
>         if (g->end < xdf->nrec &&
>             recs_match(&xdf->recs[g->start], &xdf->recs[g->end])) {
> -               xdf->rchg[g->start++] = 0;
> -               xdf->rchg[g->end++] = 1;
> +               xdf->rchg[g->start++] = NO;
> +               xdf->rchg[g->end++] = YES;
>
> -               while (xdf->rchg[g->end])
> +               while (xdf->rchg[g->end] == YES)

...and another.

>                         g->end++;
>
>                 return 0;
> @@ -774,10 +774,10 @@ static int group_slide_up(xdfile_t *xdf, struct xdlgroup *g)
>  {
>         if (g->start > 0 &&
>             recs_match(&xdf->recs[g->start - 1], &xdf->recs[g->end - 1])) {
> -               xdf->rchg[--g->start] = 1;
> -               xdf->rchg[--g->end] = 0;
> +               xdf->rchg[--g->start] = YES;
> +               xdf->rchg[--g->end] = NO;
>
> -               while (xdf->rchg[g->start - 1])
> +               while (xdf->rchg[g->start - 1] == YES)

...and another.

>                         g->start--;
>
>                 return 0;
> @@ -932,16 +932,15 @@ int xdl_change_compact(xdfile_t *xdf, xdfile_t *xdfo, long flags) {
>
>  int xdl_build_script(xdfenv_t *xe, xdchange_t **xscr) {
>         xdchange_t *cscr = NULL, *xch;
> -       char *rchg1 = xe->xdf1.rchg, *rchg2 = xe->xdf2.rchg;
>         long i1, i2, l1, l2;
>
>         /*
>          * Trivial. Collects "groups" of changes and creates an edit script.
>          */
>         for (i1 = xe->xdf1.nrec, i2 = xe->xdf2.nrec; i1 >= 0 || i2 >= 0; i1--, i2--)
> -               if (rchg1[i1 - 1] || rchg2[i2 - 1]) {
> -                       for (l1 = i1; rchg1[i1 - 1]; i1--);
> -                       for (l2 = i2; rchg2[i2 - 1]; i2--);
> +               if (xe->xdf1.rchg[i1 - 1] || xe->xdf2.rchg[i2 - 1]) {
> +                       for (l1 = i1; xe->xdf1.rchg[i1 - 1]; i1--);
> +                       for (l2 = i2; xe->xdf2.rchg[i2 - 1]; i2--);

The changes in this xdl_build_script() function appear to be
orthogonal to what was described in the commit message.  If it's a
separate cleanup, perhaps justify it in a separate patch?

>
>                         if (!(xch = xdl_add_change(cscr, i1, i2, l1 - i1, l2 - i2))) {
>                                 xdl_free_script(cscr);
> diff --git a/xdiff/xhistogram.c b/xdiff/xhistogram.c
> index 4d857e8ae2..c2e85b8ab9 100644
> --- a/xdiff/xhistogram.c
> +++ b/xdiff/xhistogram.c
> @@ -318,11 +318,11 @@ redo:
>
>         if (!count1) {
>                 while(count2--)
> -                       env->xdf2.rchg[line2++ - 1] = 1;
> +                       env->xdf2.rchg[line2++ - 1] = YES;
>                 return 0;
>         } else if (!count2) {
>                 while(count1--)
> -                       env->xdf1.rchg[line1++ - 1] = 1;
> +                       env->xdf1.rchg[line1++ - 1] = YES;
>                 return 0;
>         }
>
> @@ -335,9 +335,9 @@ redo:
>         else {
>                 if (lcs.begin1 == 0 && lcs.begin2 == 0) {
>                         while (count1--)
> -                               env->xdf1.rchg[line1++ - 1] = 1;
> +                               env->xdf1.rchg[line1++ - 1] = YES;
>                         while (count2--)
> -                               env->xdf2.rchg[line2++ - 1] = 1;
> +                               env->xdf2.rchg[line2++ - 1] = YES;
>                         result = 0;
>                 } else {
>                         result = histogram_diff(xpp, env,
> diff --git a/xdiff/xpatience.c b/xdiff/xpatience.c
> index bf69a58527..20cda5e258 100644
> --- a/xdiff/xpatience.c
> +++ b/xdiff/xpatience.c
> @@ -331,11 +331,11 @@ static int patience_diff(xpparam_t const *xpp, xdfenv_t *env,
>         /* trivial case: one side is empty */
>         if (!count1) {
>                 while(count2--)
> -                       env->xdf2.rchg[line2++ - 1] = 1;
> +                       env->xdf2.rchg[line2++ - 1] = YES;
>                 return 0;
>         } else if (!count2) {
>                 while(count1--)
> -                       env->xdf1.rchg[line1++ - 1] = 1;
> +                       env->xdf1.rchg[line1++ - 1] = YES;
>                 return 0;
>         }
>
> @@ -347,9 +347,9 @@ static int patience_diff(xpparam_t const *xpp, xdfenv_t *env,
>         /* are there any matching lines at all? */
>         if (!map.has_matches) {
>                 while(count1--)
> -                       env->xdf1.rchg[line1++ - 1] = 1;
> +                       env->xdf1.rchg[line1++ - 1] = YES;
>                 while(count2--)
> -                       env->xdf2.rchg[line2++ - 1] = 1;
> +                       env->xdf2.rchg[line2++ - 1] = YES;
>                 xdl_free(map.entries);
>                 return 0;
>         }
> diff --git a/xdiff/xprepare.c b/xdiff/xprepare.c
> index 92f9845003..36437f91bb 100644
> --- a/xdiff/xprepare.c
> +++ b/xdiff/xprepare.c
> @@ -215,9 +215,9 @@ static int xdl_clean_mmatch(char const *dis, long i, long s, long e) {
>          * current line (i) is already a multimatch line.
>          */
>         for (r = 1, rdis0 = 0, rpdis0 = 1; (i - r) >= s; r++) {
> -               if (!dis[i - r])
> +               if (dis[i - r] == NO)
>                         rdis0++;
> -               else if (dis[i - r] == 2)
> +               else if (dis[i - r] == MAYBE)
>                         rpdis0++;
>                 else
>                         break;
> @@ -231,9 +231,9 @@ static int xdl_clean_mmatch(char const *dis, long i, long s, long e) {
>         if (rdis0 == 0)
>                 return 0;
>         for (r = 1, rdis1 = 0, rpdis1 = 1; (i + r) <= e; r++) {
> -               if (!dis[i + r])
> +               if (dis[i + r] == NO)
>                         rdis1++;
> -               else if (dis[i + r] == 2)
> +               else if (dis[i + r] == MAYBE)
>                         rpdis1++;
>                 else
>                         break;
> @@ -273,7 +273,7 @@ static int xdl_cleanup_records(xdlclassifier_t *cf, xdfile_t *xdf1, xdfile_t *xd
>         for (i = xdf1->dstart, recs = &xdf1->recs[xdf1->dstart]; i <= xdf1->dend; i++, recs++) {
>                 rcrec = cf->rcrecs[recs->ha];
>                 nm = rcrec ? rcrec->len2 : 0;
> -               dis1[i] = (nm == 0) ? 0: (nm >= mlim && !need_min) ? 2: 1;
> +               dis1[i] = (nm == 0) ? NO: (nm >= mlim && !need_min) ? MAYBE: YES;
>         }
>
>         if ((mlim = xdl_bogosqrt(xdf2->nrec)) > XDL_MAX_EQLIMIT)
> @@ -281,26 +281,26 @@ static int xdl_cleanup_records(xdlclassifier_t *cf, xdfile_t *xdf1, xdfile_t *xd
>         for (i = xdf2->dstart, recs = &xdf2->recs[xdf2->dstart]; i <= xdf2->dend; i++, recs++) {
>                 rcrec = cf->rcrecs[recs->ha];
>                 nm = rcrec ? rcrec->len1 : 0;
> -               dis2[i] = (nm == 0) ? 0: (nm >= mlim && !need_min) ? 2: 1;
> +               dis2[i] = (nm == 0) ? NO: (nm >= mlim && !need_min) ? MAYBE: YES;
>         }
>
>         for (nreff = 0, i = xdf1->dstart, recs = &xdf1->recs[xdf1->dstart];
>              i <= xdf1->dend; i++, recs++) {
> -               if (dis1[i] == 1 ||
> -                   (dis1[i] == 2 && !xdl_clean_mmatch(dis1, i, xdf1->dstart, xdf1->dend))) {
> +               if (dis1[i] == YES ||
> +                   (dis1[i] == MAYBE && !xdl_clean_mmatch(dis1, i, xdf1->dstart, xdf1->dend))) {
>                         xdf1->rindex[nreff++] = i;
>                 } else
> -                       xdf1->rchg[i] = 1;
> +                       xdf1->rchg[i] = YES;
>         }
>         xdf1->nreff = nreff;
>
>         for (nreff = 0, i = xdf2->dstart, recs = &xdf2->recs[xdf2->dstart];
>              i <= xdf2->dend; i++, recs++) {
> -               if (dis2[i] == 1 ||
> -                   (dis2[i] == 2 && !xdl_clean_mmatch(dis2, i, xdf2->dstart, xdf2->dend))) {
> +               if (dis2[i] == YES ||
> +                   (dis2[i] == MAYBE && !xdl_clean_mmatch(dis2, i, xdf2->dstart, xdf2->dend))) {
>                         xdf2->rindex[nreff++] = i;
>                 } else
> -                       xdf2->rchg[i] = 1;
> +                       xdf2->rchg[i] = YES;
>         }
>         xdf2->nreff = nreff;
>
> --
> gitgitgadget

Everything else looks like the straightforward translation you called
out in your commit message.

Copy link

User Phillip Wood <phillip.wood123@gmail.com> has been added to the cc: list.

Every field in this struct is an alias for a certain field in xdfile_t.

diffdata_t.nrec   -> xdfile_t.nreff
diffdata_t.ha     -> xdfile_t.ha
diffdata_t.rindex -> xdfile_t.rindex
diffdata_t.rchg   -> xdfile_t.rchg

I think this struct existed before xdfile_t, and was kept for backward
compatibility reasons. I think xdiffi should have been refactored to
use the new (xdfile_t) struct, but was easier to alias it instead.

The local variables rchg* and rindex* don't shorten the lines by much,
nor do they really need to be there to make the code more readable.
Delete them.

Signed-off-by: Ezekiel Newren <ezekielnewren@gmail.com>
When 0 <= i < xdfile_t.nreff the following is true:
xdfile_t.ha[i] == xdfile_t.recs[xdfile_t.rindex[i]]

This makes the code about 5% slower. The fields rindex and ha are
specific to the classic diff (myers and minimal). I plan on creating a
struct for classic diff, but there's a lot of cleanup that needs to be
done before that can happen and leaving ha in would make those cleanups
harder to follow.

A subsequent commit will delete the chastore cha from xdfile_t. That
later commit will investigate deleting ha and cha independently and
together.

Signed-off-by: Ezekiel Newren <ezekielnewren@gmail.com>
…ord_t

The fields from xdlclass_t are aliases of xrecord_t:
xdlclass_t.line -> xrecord_t.ptr
xdlclass_t.size -> xrecord_t.size
xdlclass_t.ha   -> xrecord_t.ha

xdlclass_t carries a copy of the data in xrecord_t, but instead of
embedding xrecord_t it duplicates the individual fields. A future
commit will change the types used in xrecord_t so embed it in
xdlclass_t first, so we don't have to remember to change the types
here as well.

Best-viewed-with: --color-words
Helped-by: Phillip Wood <phillip.wood123@gmail.com>
Signed-off-by: Ezekiel Newren <ezekielnewren@gmail.com>
xdfile_t currently uses chastore_t which is an arena allocator. I
think that xrecord_t used to be a linked list and recs didn't exist
originally. When recs was added I think they forgot to remove
xdfile_t.next, but was overlooked. This dual data structure setup
makes the code somewhat confusing.

Additionally the C type chastore_t isn't FFI friendly, and provides
little to no performance benefit over using realloc to grow an array.

Performance impact of deleting fields from xdfile_t:
Deleting ha is about 5% slower.
Deleting cha is about 5% faster.

Delete ha, but keep cha
  time hyperfine --warmup 3 -L exe build_v2.51.0/git,build_delete_ha/git '{exe} log --oneline --shortstat --diff-algorithm=myers -3000 v2.39.1 >/dev/null'
  Benchmark 1: build_v2.51.0/git log --oneline --shortstat --diff-algorithm=myers -3000 v2.39.1 >/dev/null
    Time (mean ± σ):      1.269 s ±  0.017 s    [User: 1.135 s, System: 0.128 s]
    Range (min … max):    1.249 s …  1.286 s    10 runs

  Benchmark 2: build_delete_ha/git log --oneline --shortstat --diff-algorithm=myers -3000 v2.39.1 >/dev/null
    Time (mean ± σ):      1.339 s ±  0.017 s    [User: 1.234 s, System: 0.099 s]
    Range (min … max):    1.320 s …  1.358 s    10 runs

  Summary
    build_v2.51.0/git log --oneline --shortstat --diff-algorithm=myers -3000 v2.39.1 >/dev/null ran
      1.06 ± 0.02 times faster than build_delete_ha/git log --oneline --shortstat --diff-algorithm=myers -3000 v2.39.1 >/dev/null

Delete cha, but keep ha
  time hyperfine --warmup 3 -L exe build_v2.51.0/git,build_delete_chastore/git '{exe} log --oneline --shortstat --diff-algorithm=myers -3000 v2.39.1 >/dev/null'
  Benchmark 1: build_v2.51.0/git log --oneline --shortstat --diff-algorithm=myers -3000 v2.39.1 >/dev/null
    Time (mean ± σ):      1.290 s ±  0.001 s    [User: 1.154 s, System: 0.130 s]
    Range (min … max):    1.288 s …  1.292 s    10 runs

  Benchmark 2: build_delete_chastore/git log --oneline --shortstat --diff-algorithm=myers -3000 v2.39.1 >/dev/null
    Time (mean ± σ):      1.232 s ±  0.017 s    [User: 1.105 s, System: 0.121 s]
    Range (min … max):    1.205 s …  1.249 s    10 runs

  Summary
    build_delete_chastore/git log --oneline --shortstat --diff-algorithm=myers -3000 v2.39.1 >/dev/null ran
      1.05 ± 0.01 times faster than build_v2.51.0/git log --oneline --shortstat --diff-algorithm=myers -3000 v2.39.1 >/dev/null

Delete ha AND chastore
  time hyperfine --warmup 3 -L exe build_v2.51.0/git,build_delete_ha_and_chastore/git '{exe} log --oneline --shortstat --diff-algorithm=myers -3000 v2.39.1 >/dev/null'
  Benchmark 1: build_v2.51.0/git log --oneline --shortstat --diff-algorithm=myers -3000 v2.39.1 >/dev/null
    Time (mean ± σ):      1.291 s ±  0.002 s    [User: 1.156 s, System: 0.129 s]
    Range (min … max):    1.287 s …  1.295 s    10 runs

  Benchmark 2: build_delete_ha_and_chastore/git log --oneline --shortstat --diff-algorithm=myers -3000 v2.39.1 >/dev/null
    Time (mean ± σ):      1.306 s ±  0.001 s    [User: 1.195 s, System: 0.105 s]
    Range (min … max):    1.305 s …  1.308 s    10 runs

  Summary
    build_v2.51.0/git log --oneline --shortstat --diff-algorithm=myers -3000 v2.39.1 >/dev/null ran
      1.01 ± 0.00 times faster than build_delete_ha_and_chastore/git log --oneline --shortstat --diff-algorithm=myers -3000 v2.39.1 >/dev/null

Best-viewed-with: --color-words
Signed-off-by: Ezekiel Newren <ezekielnewren@gmail.com>
The field rchg (now 'changed') declares if a line in a file is changed
or not. A later commit will change it's type from 'char' to 'bool'
to make its purpose even more clear.

Best-viewed-with: --color-words
Signed-off-by: Ezekiel Newren <ezekielnewren@gmail.com>
This commit is refactor-only; no behavior is changed. A future commit
will use bool literals for changed[i].

The functions xdl_clean_mmatch() and xdl_cleanup_records() will be
cleaned up more in a future patch series. The changes to
xdl_cleanup_records(), in this patch, is just to make it clear why
`char rchg` is refactored to `bool changed`.

Rename dis* to action* and replace literal numericals with macros.
The old names came from when dis* (which I think was short for discard)
was treated like a boolean, but over time it grew into a ternary state
machine. The result was confusing because dis* and rchg* both used 0/1
values with different meanings.

The new names and macros make the states explicit. nm is short for
number of matches, and mlim is a heuristic limit:

  nm == 0       -> action[i] = DISCARD     -> changed[i] = true
  0 < nm < mlim -> action[i] = KEEP        -> changed[i] = false
  nm >= mlim    -> action[i] = INVESTIGATE -> changed[i] = xdl_clean_mmatch()

When need_min is true, only DISCARD and KEEP occur because the limit
is effectively infinite.

Best-viewed-with: --color-words
Signed-off-by: Ezekiel Newren <ezekielnewren@gmail.com>
@ezekielnewren ezekielnewren force-pushed the use_rust_types_in_xdiff branch 2 times, most recently from a9b9c9f to 226a04a Compare September 26, 2025 20:47
The only values possible for 'changed' is 1 and 0, which exactly maps
to a bool type. It might not look like this because action1 and action2
(which use to be dis1, and dis2) were also of type char and were
assigned numerical values within a few lines of 'changed' (what used to
be rchg).

Using DISCARD/KEEP/INVESTIGATE for action1[i]/action2[j], and true/false
for changed[k] makes it clear to future readers that these are
logically separate concepts.

Best-viewed-with: --color-words
Signed-off-by: Ezekiel Newren <ezekielnewren@gmail.com>
@ezekielnewren ezekielnewren force-pushed the use_rust_types_in_xdiff branch from 226a04a to 83e1ace Compare September 26, 2025 21:47
@ezekielnewren
Copy link
Author

/submit

Copy link

Submitted as pull.2048.v6.git.git.1758926520.gitgitgadget@gmail.com

To fetch this version into FETCH_HEAD:

git fetch https://github.com/gitgitgadget/git/ pr-git-2048/ezekielnewren/use_rust_types_in_xdiff-v6

To fetch this version to local tag pr-git-2048/ezekielnewren/use_rust_types_in_xdiff-v6:

git fetch --no-tags https://github.com/gitgitgadget/git/ tag pr-git-2048/ezekielnewren/use_rust_types_in_xdiff-v6

Copy link

This patch series was integrated into seen via 989d856.

Copy link

This patch series was integrated into seen via f2e5abc.

Copy link

There was a status update in the "Cooking" section about the branch en/xdiff-cleanup on the Git mailing list:

A lot of code clean-up of xdiff.
Split out of a larger topic.

Will merge to 'next'?
source: <pull.2048.v6.git.git.1758926520.gitgitgadget@gmail.com>

Copy link

This patch series was integrated into seen via a95b449.

@@ -22,23 +22,14 @@

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the Git mailing list, "Kristoffer Haugsbakk" wrote (reply to this):

On Tue, Sep 23, 2025, at 23:24, Ezekiel Newren via GitGitGadget wrote:
> From: Ezekiel Newren <ezekielnewren@gmail.com>
>
> When xrecord_t was a linked list, and recs didn't exist, I assume this
> function walked the list until it found the right record. Accessing
> a contiguous array is so trival that this function is now superfluous.

s/trival/trivial/

> Delete it.
>
> Signed-off-by: Ezekiel Newren <ezekielnewren@gmail.com>
> ---

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the Git mailing list, Ezekiel Newren wrote (reply to this):

On Tue, Sep 30, 2025 at 7:31 AM Kristoffer Haugsbakk
<kristofferhaugsbakk@fastmail.com> wrote:
> > When xrecord_t was a linked list, and recs didn't exist, I assume this
> > function walked the list until it found the right record. Accessing
> > a contiguous array is so trival that this function is now superfluous.
>
> s/trival/trivial/

I think that other than this typo this patch series is ready to be
merged in. I would prefer that Junio fix this typo, so I don't spam
the mailing list with such a small change.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the Git mailing list, Junio C Hamano wrote (reply to this):

Ezekiel Newren <ezekielnewren@gmail.com> writes:

> On Tue, Sep 30, 2025 at 7:31 AM Kristoffer Haugsbakk
> <kristofferhaugsbakk@fastmail.com> wrote:
>> > When xrecord_t was a linked list, and recs didn't exist, I assume this
>> > function walked the list until it found the right record. Accessing
>> > a contiguous array is so trival that this function is now superfluous.
>>
>> s/trival/trivial/
>
> I think that other than this typo this patch series is ready to be
> merged in. I would prefer that Junio fix this typo, so I don't spam
> the mailing list with such a small change.

If it is only this instance, I can "rebase -i" it away, sure.

Thanks.

Copy link

User "Kristoffer Haugsbakk" <kristofferhaugsbakk@fastmail.com> has been added to the cc: list.

Copy link

This patch series was integrated into seen via 8b3dcb6.

Copy link

This patch series was integrated into seen via 9e98a08.

Copy link

On the Git mailing list, Phillip Wood wrote (reply to this):

Hi Ezekiel

On 26/09/2025 23:41, Ezekiel Newren via GitGitGadget wrote:
> Changes since v5.
> >   * Address review feedback on commit messages.
>   * Drop commit "xdiff: delete rchg aliasing"
>   * Use DISCARD/KEEP/INVESTIGATE instead of NONE/SOME/TOO_MANY
>   * Fix the word wrapping in the comments of xprepare.c

Thanks for expanding the commit messages, I think the range diff looks good. There's a typo in patch 12 (see below) but its not worth re-rolling just for that.

> Range-diff vs v5:
> [...] >   12:  08a0fceb72 ! 11:  f08782a977 xdiff: use enum macros NONE(0), SOME(1), TOO_MANY(2) in xprepare.c
>       @@ Metadata
>        Author: Ezekiel Newren <ezekielnewren@gmail.com>
>        >         ## Commit message ##
>       -    xdiff: use enum macros NONE(0), SOME(1), TOO_MANY(2) in xprepare.c
>       +    xdiff: add macros DISCARD(0), KEEP(1), INVESTIGATE(2) in xprepare.c
>        >       -    Rename dis1, dis2 to matches1, matches2.
>       +    This commit is refactor-only; no behavior is changed. A future commit
>       +    will use bool literals for changed[i].
>        >       -    Define macros NONE(0), SOME(1), TOO_MANY(2) as the enum values for
>       -    matches1 and matches2. These states will influence whether changed[i]
>       -    is set to 1 or kept as 0.
>       +    The functions xdl_clean_mmatch() and xdl_cleanup_records() will be
>       +    cleaned up more in a future patch series. The changes to
>       +    xdl_cleanup_records(), in this patch, is just to make it clear why

Not worth a re-roll on its own s/is/are/

Thanks for working on this

Phillip

>       +    `char rchg` is refactored to `bool changed`.
>       +
>       +    Rename dis* to action* and replace literal numericals with macros.
>       +    The old names came from when dis* (which I think was short for discard)
>       +    was treated like a boolean, but over time it grew into a ternary state
>       +    machine. The result was confusing because dis* and rchg* both used 0/1
>       +    values with different meanings.
>       +
>       +    The new names and macros make the states explicit. nm is short for
>       +    number of matches, and mlim is a heuristic limit:
>       +
>       +      nm == 0       -> action[i] = DISCARD     -> changed[i] = true
>       +      0 < nm < mlim -> action[i] = KEEP        -> changed[i] = false
>       +      nm >= mlim    -> action[i] = INVESTIGATE -> changed[i] = xdl_clean_mmatch()
>       +
>       +    When need_min is true, only DISCARD and KEEP occur because the limit
>       +    is effectively infinite.
>        >            Best-viewed-with: --color-words
>            Signed-off-by: Ezekiel Newren <ezekielnewren@gmail.com>
>       @@ xdiff/xprepare.c
>         #define XDL_GUESS_NLINES1 256
>         #define XDL_GUESS_NLINES2 20
>         >       -+#define NONE 0
>       -+#define SOME 1
>       -+#define TOO_MANY 2
>       ++#define DISCARD 0
>       ++#define KEEP 1
>       ++#define INVESTIGATE 2
>         >         typedef struct s_xdlclass {
>         	struct s_xdlclass *next;
>       @@ xdiff/xprepare.c: void xdl_free_env(xdfenv_t *xe) {
>         >         >        -static int xdl_clean_mmatch(char const *dis, long i, long s, long e) {
>       -+static bool xdl_clean_mmatch(uint8_t const *matches, long i, long s, long e) {
>       ++static bool xdl_clean_mmatch(uint8_t const *action, long i, long s, long e) {
>         	long r, rdis0, rpdis0, rdis1, rpdis1;
>         >         	/*
>        -	 * Limits the window the is examined during the similar-lines
>        -	 * scan. The loops below stops when dis[i - r] == 1 (line that
>       +-	 * has no match), but there are corner cases where the loop
>       +-	 * proceed all the way to the extremities by causing huge
>       +-	 * performance penalties in case of big files.
>        +	 * Limits the window that is examined during the similar-lines
>       -+	 * scan. The loops below stops when matches[i - r] == SOME (line that
>       - 	 * has no match), but there are corner cases where the loop
>       - 	 * proceed all the way to the extremities by causing huge
>       - 	 * performance penalties in case of big files.
>       ++	 * scan. The loops below stops when action[i - r] == KEEP
>       ++	 * (line that has no match), but there are corner cases where
>       ++	 * the loop proceed all the way to the extremities by causing
>       ++	 * huge performance penalties in case of big files.
>       + 	 */
>       + 	if (i - s > XDL_SIMSCAN_WINDOW)
>       + 		s = i - XDL_SIMSCAN_WINDOW;
>        @@ xdiff/xprepare.c: static int xdl_clean_mmatch(char const *dis, long i, long s, long e) {
>         >         	/*
>         	 * Scans the lines before 'i' to find a run of lines that either
>        -	 * have no match (dis[j] == 0) or have multiple matches (dis[j] > 1).
>        -	 * Note that we always call this function with dis[i] > 1, so the
>       -+	 * have no match (matches[j] == NONE) or have multiple matches (matches[j] == TOO_MANY).
>       -+	 * Note that we always call this function with matches[i] == TOO_MANY, so the
>       - 	 * current line (i) is already a multimatch line.
>       +-	 * current line (i) is already a multimatch line.
>       ++	 * have no match (action[j] == DISCARD) or have multiple matches
>       ++	 * (action[j] == INVESTIGATE). Note that we always call this
>       ++	 * function with action[i] == INVESTIGATE, so the current line
>       ++	 * (i) is already a multimatch line.
>         	 */
>         	for (r = 1, rdis0 = 0, rpdis0 = 1; (i - r) >= s; r++) {
>        -		if (!dis[i - r])
>       -+		if (matches[i - r] == NONE)
>       ++		if (action[i - r] == DISCARD)
>         			rdis0++;
>        -		else if (dis[i - r] == 2)
>       -+		else if (matches[i - r] == TOO_MANY)
>       ++		else if (action[i - r] == INVESTIGATE)
>         			rpdis0++;
>        -		else
>       -+		else if (matches[i - r] == SOME)
>       ++		else if (action[i - r] == KEEP)
>         			break;
>        +		else
>       -+			BUG("Illegal value for matches[i - r]");
>       ++			BUG("Illegal value for action[i - r]");
>         	}
>         	/*
>       - 	 * If the run before the line 'i' found only multimatch lines, we
>       +-	 * If the run before the line 'i' found only multimatch lines, we
>        -	 * return 0 and hence we don't make the current line (i) discarded.
>       -+	 * return false and hence we don't make the current line (i) discarded.
>       - 	 * We want to discard multimatch lines only when they appear in the
>       +-	 * We want to discard multimatch lines only when they appear in the
>        -	 * middle of runs with nomatch lines (dis[j] == 0).
>       -+	 * middle of runs with nomatch lines (matches[j] == NONE).
>       ++	 * If the run before the line 'i' found only multimatch lines,
>       ++	 * we return false and hence we don't make the current line (i)
>       ++	 * discarded. We want to discard multimatch lines only when
>       ++	 * they appear in the middle of runs with nomatch lines
>       ++	 * (action[j] == DISCARD).
>         	 */
>         	if (rdis0 == 0)
>         		return 0;
>         	for (r = 1, rdis1 = 0, rpdis1 = 1; (i + r) <= e; r++) {
>        -		if (!dis[i + r])
>       -+		if (matches[i + r] == NONE)
>       ++		if (action[i + r] == DISCARD)
>         			rdis1++;
>        -		else if (dis[i + r] == 2)
>       -+		else if (matches[i + r] == TOO_MANY)
>       ++		else if (action[i + r] == INVESTIGATE)
>         			rpdis1++;
>        -		else
>       -+		else if (matches[i + r] == SOME)
>       ++		else if (action[i + r] == KEEP)
>         			break;
>        +		else
>       -+			BUG("Illegal value for matches[i + r]");
>       ++			BUG("Illegal value for action[i + r]");
>         	}
>         	/*
>       - 	 * If the run after the line 'i' found only multimatch lines, we
>       +-	 * If the run after the line 'i' found only multimatch lines, we
>        -	 * return 0 and hence we don't make the current line (i) discarded.
>       -+	 * return false and hence we don't make the current line (i) discarded.
>       ++	 * If the run after the line 'i' found only multimatch lines,
>       ++	 * we return false and hence we don't make the current line (i)
>       ++	 * discarded.
>         	 */
>         	if (rdis1 == 0)
>        -		return 0;
>       @@ xdiff/xprepare.c: static int xdl_clean_mmatch(char const *dis, long i, long s, l
>         	xdlclass_t *rcrec;
>        -	char *dis, *dis1, *dis2;
>        -	int need_min = !!(cf->flags & XDF_NEED_MINIMAL);
>       -+	uint8_t *matches1, *matches2;
>       -+	int status = 0;
>       ++	uint8_t *action1 = NULL, *action2 = NULL;
>        +	bool need_min = !!(cf->flags & XDF_NEED_MINIMAL);
>       ++	int ret = 0;
>         >        -	if (!XDL_CALLOC_ARRAY(dis, xdf1->nrec + xdf2->nrec + 2))
>        -		return -1;
>        -	dis1 = dis;
>        -	dis2 = dis1 + xdf1->nrec + 1;
>       -+	matches1 = NULL;
>       -+	matches2 = NULL;
>       -+
>        +	/*
>        +	 * Create temporary arrays that will help us decide if
>        +	 * changed[i] should remain 0 or become 1.
>        +	 */
>       -+	if (!XDL_CALLOC_ARRAY(matches1, xdf1->nrec + 1)) {
>       -+		status = -1;
>       ++	if (!XDL_CALLOC_ARRAY(action1, xdf1->nrec + 1)) {
>       ++		ret = -1;
>        +		goto cleanup;
>        +	}
>       -+	if (!XDL_CALLOC_ARRAY(matches2, xdf2->nrec + 1)) {
>       -+		status = -1;
>       ++	if (!XDL_CALLOC_ARRAY(action2, xdf2->nrec + 1)) {
>       ++		ret = -1;
>        +		goto cleanup;
>        +	}
>         >        +	/*
>       -+	 * Initialize temporary arrays with NONE, SOME, or TOO_MANY.
>       ++	 * Initialize temporary arrays with DISCARD, KEEP, or INVESTIGATE.
>        +	 */
>         	if ((mlim = xdl_bogosqrt(xdf1->nrec)) > XDL_MAX_EQLIMIT)
>         		mlim = XDL_MAX_EQLIMIT;
>       @@ xdiff/xprepare.c: static int xdl_clean_mmatch(char const *dis, long i, long s, l
>         		rcrec = cf->rcrecs[recs->ha];
>         		nm = rcrec ? rcrec->len2 : 0;
>        -		dis1[i] = (nm == 0) ? 0: (nm >= mlim && !need_min) ? 2: 1;
>       -+		matches1[i] = (nm == 0) ? NONE: (nm >= mlim && !need_min) ? TOO_MANY: SOME;
>       ++		action1[i] = (nm == 0) ? DISCARD: (nm >= mlim && !need_min) ? INVESTIGATE: KEEP;
>         	}
>         >         	if ((mlim = xdl_bogosqrt(xdf2->nrec)) > XDL_MAX_EQLIMIT)
>       @@ xdiff/xprepare.c: static int xdl_cleanup_records(xdlclassifier_t *cf, xdfile_t *
>         		rcrec = cf->rcrecs[recs->ha];
>         		nm = rcrec ? rcrec->len1 : 0;
>        -		dis2[i] = (nm == 0) ? 0: (nm >= mlim && !need_min) ? 2: 1;
>       -+		matches2[i] = (nm == 0) ? NONE: (nm >= mlim && !need_min) ? TOO_MANY: SOME;
>       ++		action2[i] = (nm == 0) ? DISCARD: (nm >= mlim && !need_min) ? INVESTIGATE: KEEP;
>         	}
>         >        +	/*
>       @@ xdiff/xprepare.c: static int xdl_cleanup_records(xdlclassifier_t *cf, xdfile_t *
>         	     i <= xdf1->dend; i++, recs++) {
>        -		if (dis1[i] == 1 ||
>        -		    (dis1[i] == 2 && !xdl_clean_mmatch(dis1, i, xdf1->dstart, xdf1->dend))) {
>       -+		if (matches1[i] == SOME ||
>       -+		    (matches1[i] == TOO_MANY && !xdl_clean_mmatch(matches1, i, xdf1->dstart, xdf1->dend))) {
>       ++		if (action1[i] == KEEP ||
>       ++		    (action1[i] == INVESTIGATE && !xdl_clean_mmatch(action1, i, xdf1->dstart, xdf1->dend))) {
>         			xdf1->rindex[nreff++] = i;
>       -+			/* changed[i] remains 0 */
>       ++			/* changed[i] remains 0, i.e. keep */
>         		} else
>         			xdf1->changed[i] = 1;
>       ++			/* i.e. discard */
>         	}
>       -@@ xdiff/xprepare.c: static int xdl_cleanup_records(xdlclassifier_t *cf, xdfile_t *xdf1, xdfile_t *xd
>       + 	xdf1->nreff = nreff;
>         >         	for (nreff = 0, i = xdf2->dstart, recs = &xdf2->recs[xdf2->dstart];
>         	     i <= xdf2->dend; i++, recs++) {
>        -		if (dis2[i] == 1 ||
>        -		    (dis2[i] == 2 && !xdl_clean_mmatch(dis2, i, xdf2->dstart, xdf2->dend))) {
>       -+		if (matches2[i] == SOME ||
>       -+		    (matches2[i] == TOO_MANY && !xdl_clean_mmatch(matches2, i, xdf2->dstart, xdf2->dend))) {
>       ++		if (action2[i] == KEEP ||
>       ++		    (action2[i] == INVESTIGATE && !xdl_clean_mmatch(action2, i, xdf2->dstart, xdf2->dend))) {
>         			xdf2->rindex[nreff++] = i;
>       -+			/* changed[i] remains 0 */
>       ++			/* changed[i] remains 0, i.e. keep */
>         		} else
>         			xdf2->changed[i] = 1;
>       ++			/* i.e. discard */
>         	}
>         	xdf2->nreff = nreff;
>         >        -	xdl_free(dis);
>        +cleanup:
>       -+	xdl_free(matches1);
>       -+	xdl_free(matches2);
>       ++	xdl_free(action1);
>       ++	xdl_free(action2);
>         >        -	return 0;
>       -+	return status;
>       ++	return ret;
>         }
>         >         >   13:  975e845bfa ! 12:  83e1ace5bd xdiff: change type of xdfile_t.changed from char to bool
>       @@ Commit message
>            xdiff: change type of xdfile_t.changed from char to bool
>        >            The only values possible for 'changed' is 1 and 0, which exactly maps
>       -    to a bool type. It might not look like this is the case because
>       -    matches1 and matches2 (which use to be dis1, and dis2) were also char
>       -    and were assigned numerical values within a few lines of 'changed'
>       -    (what used to be rchg).
>       +    to a bool type. It might not look like this because action1 and action2
>       +    (which use to be dis1, and dis2) were also of type char and were
>       +    assigned numerical values within a few lines of 'changed' (what used to
>       +    be rchg).
>        >       -    Using NONE, SOME, TOO_MANY for matches1[i]/matches2[j], and true/false
>       +    Using DISCARD/KEEP/INVESTIGATE for action1[i]/action2[j], and true/false
>            for changed[k] makes it clear to future readers that these are
>            logically separate concepts.
>        >       @@ xdiff/xdiffi.c: static int group_slide_up(xdfile_t *xdf, struct xdlgroup *g)
>         >         		while (xdf->changed[g->start - 1])
>         			g->start--;
>       +@@ xdiff/xdiffi.c: int xdl_change_compact(xdfile_t *xdf, xdfile_t *xdfo, long flags) {
>       +
>       + int xdl_build_script(xdfenv_t *xe, xdchange_t **xscr) {
>       + 	xdchange_t *cscr = NULL, *xch;
>       +-	char *changed1 = xe->xdf1.changed, *changed2 = xe->xdf2.changed;
>       ++	bool *changed1 = xe->xdf1.changed, *changed2 = xe->xdf2.changed;
>       + 	long i1, i2, l1, l2;
>       +
>       + 	/*
>        >         ## xdiff/xhistogram.c ##
>        @@ xdiff/xhistogram.c: redo:
>       @@ xdiff/xprepare.c: static int xdl_cleanup_records(xdlclassifier_t *cf, xdfile_t *
>        -	 * changed[i] should remain 0 or become 1.
>        +	 * changed[i] should remain false, or become true.
>         	 */
>       - 	if (!XDL_CALLOC_ARRAY(matches1, xdf1->nrec + 1)) {
>       - 		status = -1;
>       + 	if (!XDL_CALLOC_ARRAY(action1, xdf1->nrec + 1)) {
>       + 		ret = -1;
>        @@ xdiff/xprepare.c: static int xdl_cleanup_records(xdlclassifier_t *cf, xdfile_t *xdf1, xdfile_t *xd
>         >         	/*
>       @@ xdiff/xprepare.c: static int xdl_cleanup_records(xdlclassifier_t *cf, xdfile_t *
>         	 */
>         	for (nreff = 0, i = xdf1->dstart, recs = &xdf1->recs[xdf1->dstart];
>         	     i <= xdf1->dend; i++, recs++) {
>       - 		if (matches1[i] == SOME ||
>       - 		    (matches1[i] == TOO_MANY && !xdl_clean_mmatch(matches1, i, xdf1->dstart, xdf1->dend))) {
>       + 		if (action1[i] == KEEP ||
>       + 		    (action1[i] == INVESTIGATE && !xdl_clean_mmatch(action1, i, xdf1->dstart, xdf1->dend))) {
>         			xdf1->rindex[nreff++] = i;
>       --			/* changed[i] remains 0 */
>       -+			/* changed[i] remains false */
>       +-			/* changed[i] remains 0, i.e. keep */
>       ++			/* changed[i] remains false, i.e. keep */
>         		} else
>        -			xdf1->changed[i] = 1;
>        +			xdf1->changed[i] = true;
>       + 			/* i.e. discard */
>         	}
>         	xdf1->nreff = nreff;
>       -
>        @@ xdiff/xprepare.c: static int xdl_cleanup_records(xdlclassifier_t *cf, xdfile_t *xdf1, xdfile_t *xd
>       - 		if (matches2[i] == SOME ||
>       - 		    (matches2[i] == TOO_MANY && !xdl_clean_mmatch(matches2, i, xdf2->dstart, xdf2->dend))) {
>       + 		if (action2[i] == KEEP ||
>       + 		    (action2[i] == INVESTIGATE && !xdl_clean_mmatch(action2, i, xdf2->dstart, xdf2->dend))) {
>         			xdf2->rindex[nreff++] = i;
>       --			/* changed[i] remains 0 */
>       -+			/* changed[i] remains false */
>       +-			/* changed[i] remains 0, i.e. keep */
>       ++			/* changed[i] remains false, i.e. keep */
>         		} else
>        -			xdf2->changed[i] = 1;
>        +			xdf2->changed[i] = true;
>       + 			/* i.e. discard */
>         	}
>         	xdf2->nreff = nreff;
>       -
>        >         ## xdiff/xtypes.h ##
>        @@ xdiff/xtypes.h: typedef struct s_xdfile {
> 

Copy link

This patch series was integrated into seen via 3f3e724.

Copy link

This patch series was integrated into seen via 3375e63.

Copy link

This patch series was integrated into seen via c5a7f67.

Copy link

There was a status update in the "Cooking" section about the branch en/xdiff-cleanup on the Git mailing list:

A lot of code clean-up of xdiff.
Split out of a larger topic.

Will merge to 'next'?
source: <pull.2048.v6.git.git.1758926520.gitgitgadget@gmail.com>

Copy link

This patch series was integrated into seen via c157ffe.

Copy link

This patch series was integrated into seen via b6d9543.

Copy link

This patch series was integrated into next via 5bc21d0.

@gitgitgadget-git gitgitgadget-git bot added the next label Oct 8, 2025
Copy link

There was a status update in the "Cooking" section about the branch en/xdiff-cleanup on the Git mailing list:

A lot of code clean-up of xdiff.
Split out of a larger topic.

Will merge to 'master'.
source: <pull.2048.v6.git.git.1758926520.gitgitgadget@gmail.com>

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants