Cleanup xdfile_t and xrecord_t in xdiff. #2048

ezekielnewren · 2025-09-07T18:35:27Z

Changes since v5.

Address review feedback on commit messages.
Drop commit "xdiff: delete rchg aliasing"
Use DISCARD/KEEP/INVESTIGATE instead of NONE/SOME/TOO_MANY
Fix the word wrapping in the comments of xprepare.c

Cleanup of the functions xdl_cleanup_records() and xdl_clean_mmatch() is out of scope for this patch series. The changes to them are incidental to explaining why 'char rchg' is refactored to 'bool changed'.

Changes since v4.

Make it clear that the field xdfile_t.rchg (now 'xdfile_t.changed') is distinct from the local variables dis1, dis2 (now 'matches1', 'matches2').
Use NONE, SOME, TOO_MANY instead of NO, YES, MAYBE.
Use bool literals for xdfile_t.changed.

Changes since v3.

Address review feedback.
Split the deletion of xdl_get_rec() into 2 commits.
Move NO, YES, MAYBE into xprepare.c, and use bool literals.
refactor 'char rchg' to 'bool changed'

Changes since v2.

No patch changes, just resending to get patch 9 to show up on the mailing list.
A few tweaks to the cover letter.

Changes since v1, to address review feedback.

Only include the clean up patches; The remaining patches will be split into a separate series.
Commit message clarifications.
Minor style cleanups.
Performance impacts included in commit message of patch 8.

Relevant part of the original cover letter follows:

Before:

typedef struct s_xrecord {
	struct s_xrecord *next;
	char const *ptr;
	long size;
	unsigned long ha;
} xrecord_t;

typedef struct s_xdfile {
	chastore_t rcha;
	long nrec;
	unsigned int hbits;
	xrecord_t **rhash;
	long dstart, dend;
	xrecord_t **recs;
	char *rchg;
	long *rindex;
	long nreff;
	unsigned long *ha;
} xdfile_t;

After cleanup:

typedef struct s_xrecord {
	char const *ptr;
	long size;
	unsigned long ha;
} xrecord_t;

typedef struct s_xdfile {
	xrecord_t *recs;
	long nrec;
	long dstart, dend;
	bool *changed;
	long *rindex;
	long nreff;
} xdfile_t;

===

cc: Elijah Newren newren@gmail.com
cc: Phillip Wood phillip.wood123@gmail.com
cc: Ben Knoble ben.knoble@gmail.com
cc: Jeff King peff@peff.net
cc: "Kristoffer Haugsbakk" kristofferhaugsbakk@fastmail.com

gitgitgadget-git · 2025-09-07T18:35:57Z

There are issues in commit d322e13:
xdiff: delete local variables and initialize/free xdfile_t directly
Commit not signed off

gitgitgadget-git · 2025-09-07T18:35:58Z

There are issues in commit 2e3f0c1:
xdiff: delete xdl_get_rec() in xemit
Commit not signed off

gitgitgadget-git · 2025-09-07T18:35:59Z

There are issues in commit 97c2453:
xdiff: delete struct diffdata_t
Commit not signed off

gitgitgadget-git · 2025-09-07T18:35:59Z

There are issues in commit e852992:
xdiff: delete redundant array xdfile_t.ha
Commit not signed off

gitgitgadget-git · 2025-09-07T18:36:00Z

There are issues in commit 3e5fc55:
xdiff: delete fields ha, line, size in xdlclass_t in favor of an xrecord_t
Commit checks stopped - the message is too short
Commit not signed off

gitgitgadget-git · 2025-09-07T18:36:00Z

There are issues in commit 7ad3e4d:
xdiff: delete chastore from xdfile_t, view with --color-words
Commit not signed off

gitgitgadget-git · 2025-09-07T18:36:01Z

There are issues in commit 85e79ce:
xdiff: treat xdfile_t.rchg like an enum
Commit not signed off

gitgitgadget-git · 2025-09-07T18:36:01Z

There are issues in commit 5286faa:
compat/rust_types.h: define rust primitive types
Commit not signed off

gitgitgadget-git · 2025-09-07T18:36:02Z

There are issues in commit ce227c5:
xdiff: include compat/rust_types.h
Commit checks stopped - the message is too short
Commit not signed off

gitgitgadget-git · 2025-09-07T18:36:03Z

There are issues in commit 3e7f722:
xdiff: make xrecord_t.ptr a u8 instead of char
Commit checks stopped - the message is too short
Commit not signed off

gitgitgadget-git · 2025-09-07T18:36:03Z

There are issues in commit 6269cd6:
xdiff: make xrecord_t.size a usize instead of long
Commit checks stopped - the message is too short
Commit not signed off

gitgitgadget-git · 2025-09-07T18:36:04Z

There are issues in commit 20cbe5d:
xdiff: split xrecord_t.ha into line_hash and minimal_perfect_hash
Commit checks stopped - the message is too short
Commit not signed off

gitgitgadget-git · 2025-09-07T18:36:04Z

There are issues in commit e7b31e7:
xdiff: make xdfile_t.nrec a usize instead of long
Commit checks stopped - the message is too short
Commit not signed off

gitgitgadget-git · 2025-09-07T18:36:05Z

There are issues in commit 8871ee0:
xdiff: make xdfile_t.nreff a usize instead of long
Commit checks stopped - the message is too short
Commit not signed off

gitgitgadget-git · 2025-09-07T18:36:05Z

There are issues in commit eb77b2a:
xdiff: change the types of dstart, dend, rchg, and rindex in xdfile_t
Commit checks stopped - the message is too short
Commit not signed off

ezekielnewren · 2025-09-07T19:44:29Z

/submit

gitgitgadget-git · 2025-09-07T19:46:51Z

Submitted as pull.2048.git.git.1757274320.gitgitgadget@gmail.com

To fetch this version into FETCH_HEAD:

git fetch https://github.com/gitgitgadget/git/ pr-git-2048/ezekielnewren/use_rust_types_in_xdiff-v1

To fetch this version to local tag pr-git-2048/ezekielnewren/use_rust_types_in_xdiff-v1:

git fetch --no-tags https://github.com/gitgitgadget/git/ tag pr-git-2048/ezekielnewren/use_rust_types_in_xdiff-v1

gitgitgadget-git · 2025-09-08T15:10:34Z

compat/rust_types.h

@@ -0,0 +1,28 @@
+#ifndef COMPAT_RUST_TYPES_H


On the Git mailing list, Junio C Hamano wrote (reply to this):

"Ezekiel Newren via GitGitGadget" <gitgitgadget@gmail.com> writes: > +/* > + * A typedef for bool is not needed because C bool and Rust bool are > + * the same if #include <stdbool.h> is used. > + */ > + > +typedef uint8_t u8; > +typedef uint16_t u16; > +typedef uint32_t u32; > +typedef uint64_t u64; > + > +typedef int8_t i8; > +typedef int16_t i16; > +typedef int32_t i32; > +typedef int64_t i64; The standard guarantees that these are all of the above are exactly N-bits wide, so I can buy the above types. But before I can buy the above typedefs, don't we need to rename existing variables that squat on these names? $ git grep -n -E -e '\<[ui](8|16|32|64)\>' gives some hits, like reftable/record.c:678: uint8_t i64[8]; t/helper/test-parse-options.c:123: uint16_t u16 = 0; t/helper/test-parse-options.c:148: OPT_UNSIGNED(0, "u16", &u16, "get a 16 bit unsigned integer"), to avoid confusion? There are handful other hits. > +typedef float f32; > +typedef double f64; It may be that they can be used interchangeably in practice on popular platforms, but are these guaranteed to be equivalent by some standard? C only cares about the minimum required range and precision, so you may have allocated enough bytes thinking you can fit a f32 but your float may not fit there. Or does Rust care only about platforms with IEEE 754 and would refuse to port to other exotic architectures so the above worries would not apply? > +typedef size_t usize; > +typedef ptrdiff_t isize; Interesting. I would have expected, "isize" that is a signed variant of "usize" to be aliased out of ssize_t (simply because it is declared that "usize" corresponds to "size_t" on the previous line), not using ptrdiff_t. > +typedef uint32_t rust_char; > + > +#endif /* COMPAT_RUST_TYPES_H */

On the Git mailing list, Ezekiel Newren wrote (reply to this):

On Mon, Sep 8, 2025 at 9:08 AM Junio C Hamano <gitster@pobox.com> wrote: > The standard guarantees that these are all of the above are exactly > N-bits wide, so I can buy the above types. But before I can buy the > above typedefs, don't we need to rename existing variables that > squat on these names? > > $ git grep -n -E -e '\<[ui](8|16|32|64)\>' > > gives some hits, like > > reftable/record.c:678: uint8_t i64[8]; > t/helper/test-parse-options.c:123: uint16_t u16 = 0; > t/helper/test-parse-options.c:148: OPT_UNSIGNED(0, "u16", &u16, "get a 16 bit unsigned integer"), > > to avoid confusion? There are handful other hits. Those places should be cleaned up, but it's not an immediate problem because compat/rust_types.h is not auto included anywhere. These typedefs live in git-compat-util.h in my "Introduce Rust" patch series and I never had a problem with compilation or testing. One reason I included compat/posix.h instead of git-compat-util.h is because it includes stdbool.h where git-compat-util.h doesn't. I'll be happy to do that cleanup. > > +typedef float f32; > > +typedef double f64; > > It may be that they can be used interchangeably in practice on > popular platforms, but are these guaranteed to be equivalent by some > standard? C only cares about the minimum required range and > precision, so you may have allocated enough bytes thinking you can > fit a f32 but your float may not fit there. > > Or does Rust care only about platforms with IEEE 754 and would > refuse to port to other exotic architectures so the above worries > would not apply? If the typedefs in compat/rust_types.h are incorrect for a platform/target then compat/posix.h or compat/rust_types.h should be updated rather than relying on Rust's core::ffi's guess as what it is. On the Rust side: core::ffi assumes that a C 'float' is the same as f32 and that a C 'double' is an f64[1,2,3]. So I am making the same assumption that the Rust maintainers are _and_ I'm keeping ambiguity on the C side rather than eroding Rust's type precision. If a platform does not follow these assumptions then the compat/rust_types.h should warn those using that platform or fail to build entirely. [1] https://doc.rust-lang.org/1.89.0/core/ffi/type.c_float.html [2] https://doc.rust-lang.org/1.63.0/src/core/ffi/mod.rs.html#80-81 [3] https://doc.rust-lang.org/1.89.0/src/core/ffi/primitives.rs.html#34-35

gitgitgadget-git · 2025-09-09T09:19:59Z

xdiff/xprepare.c

 	long flags;
 } xdlclassifier_t;




On the Git mailing list, Elijah Newren wrote (reply to this):

On Sun, Sep 7, 2025 at 12:45 PM Ezekiel Newren via GitGitGadget <gitgitgadget@gmail.com> wrote: > > From: Ezekiel Newren <ezekielnewren@gmail.com> > > Move xdl_prepare_env() later in the file to avoid the need > for static forward declarations. > > Signed-off-by: Ezekiel Newren <ezekielnewren@gmail.com> > --- > xdiff/xprepare.c | 116 ++++++++++++++++++++--------------------------- > 1 file changed, 50 insertions(+), 66 deletions(-) > > diff --git a/xdiff/xprepare.c b/xdiff/xprepare.c > index e1d4017b2d..a45c5ee208 100644 > --- a/xdiff/xprepare.c > +++ b/xdiff/xprepare.c > @@ -53,21 +53,6 @@ typedef struct s_xdlclassifier { > > > > -static int xdl_init_classifier(xdlclassifier_t *cf, long size, long flags); > -static void xdl_free_classifier(xdlclassifier_t *cf); > -static int xdl_classify_record(unsigned int pass, xdlclassifier_t *cf, xrecord_t **rhash, > - unsigned int hbits, xrecord_t *rec); > -static int xdl_prepare_ctx(unsigned int pass, mmfile_t *mf, long narec, xpparam_t const *xpp, > - xdlclassifier_t *cf, xdfile_t *xdf); > -static void xdl_free_ctx(xdfile_t *xdf); > -static int xdl_clean_mmatch(char const *dis, long i, long s, long e); > -static int xdl_cleanup_records(xdlclassifier_t *cf, xdfile_t *xdf1, xdfile_t *xdf2); > -static int xdl_trim_ends(xdfile_t *xdf1, xdfile_t *xdf2); > -static int xdl_optimize_ctxs(xdlclassifier_t *cf, xdfile_t *xdf1, xdfile_t *xdf2); > - > - > - > - > static int xdl_init_classifier(xdlclassifier_t *cf, long size, long flags) { > cf->flags = flags; > > @@ -242,57 +227,6 @@ static void xdl_free_ctx(xdfile_t *xdf) { > } > > > -int xdl_prepare_env(mmfile_t *mf1, mmfile_t *mf2, xpparam_t const *xpp, > - xdfenv_t *xe) { > - long enl1, enl2, sample; > - xdlclassifier_t cf; > - > - memset(&cf, 0, sizeof(cf)); > - > - /* > - * For histogram diff, we can afford a smaller sample size and > - * thus a poorer estimate of the number of lines, as the hash > - * table (rhash) won't be filled up/grown. The number of lines > - * (nrecs) will be updated correctly anyway by > - * xdl_prepare_ctx(). > - */ > - sample = (XDF_DIFF_ALG(xpp->flags) == XDF_HISTOGRAM_DIFF > - ? XDL_GUESS_NLINES2 : XDL_GUESS_NLINES1); > - > - enl1 = xdl_guess_lines(mf1, sample) + 1; > - enl2 = xdl_guess_lines(mf2, sample) + 1; > - > - if (xdl_init_classifier(&cf, enl1 + enl2 + 1, xpp->flags) < 0) > - return -1; > - > - if (xdl_prepare_ctx(1, mf1, enl1, xpp, &cf, &xe->xdf1) < 0) { > - > - xdl_free_classifier(&cf); > - return -1; > - } > - if (xdl_prepare_ctx(2, mf2, enl2, xpp, &cf, &xe->xdf2) < 0) { > - > - xdl_free_ctx(&xe->xdf1); > - xdl_free_classifier(&cf); > - return -1; > - } > - > - if ((XDF_DIFF_ALG(xpp->flags) != XDF_PATIENCE_DIFF) && > - (XDF_DIFF_ALG(xpp->flags) != XDF_HISTOGRAM_DIFF) && > - xdl_optimize_ctxs(&cf, &xe->xdf1, &xe->xdf2) < 0) { > - > - xdl_free_ctx(&xe->xdf2); > - xdl_free_ctx(&xe->xdf1); > - xdl_free_classifier(&cf); > - return -1; > - } > - > - xdl_free_classifier(&cf); > - > - return 0; > -} > - > - > void xdl_free_env(xdfenv_t *xe) { > > xdl_free_ctx(&xe->xdf2); > @@ -460,3 +394,53 @@ static int xdl_optimize_ctxs(xdlclassifier_t *cf, xdfile_t *xdf1, xdfile_t *xdf2 > > return 0; > } > + > +int xdl_prepare_env(mmfile_t *mf1, mmfile_t *mf2, xpparam_t const *xpp, > + xdfenv_t *xe) { > + long enl1, enl2, sample; > + xdlclassifier_t cf; > + > + memset(&cf, 0, sizeof(cf)); > + > + /* > + * For histogram diff, we can afford a smaller sample size and > + * thus a poorer estimate of the number of lines, as the hash > + * table (rhash) won't be filled up/grown. The number of lines > + * (nrecs) will be updated correctly anyway by > + * xdl_prepare_ctx(). > + */ > + sample = (XDF_DIFF_ALG(xpp->flags) == XDF_HISTOGRAM_DIFF > + ? XDL_GUESS_NLINES2 : XDL_GUESS_NLINES1); > + > + enl1 = xdl_guess_lines(mf1, sample) + 1; > + enl2 = xdl_guess_lines(mf2, sample) + 1; > + > + if (xdl_init_classifier(&cf, enl1 + enl2 + 1, xpp->flags) < 0) > + return -1; > + > + if (xdl_prepare_ctx(1, mf1, enl1, xpp, &cf, &xe->xdf1) < 0) { > + > + xdl_free_classifier(&cf); > + return -1; > + } > + if (xdl_prepare_ctx(2, mf2, enl2, xpp, &cf, &xe->xdf2) < 0) { > + > + xdl_free_ctx(&xe->xdf1); > + xdl_free_classifier(&cf); > + return -1; > + } > + > + if ((XDF_DIFF_ALG(xpp->flags) != XDF_PATIENCE_DIFF) && > + (XDF_DIFF_ALG(xpp->flags) != XDF_HISTOGRAM_DIFF) && > + xdl_optimize_ctxs(&cf, &xe->xdf1, &xe->xdf2) < 0) { > + > + xdl_free_ctx(&xe->xdf2); > + xdl_free_ctx(&xe->xdf1); > + xdl_free_classifier(&cf); > + return -1; > + } > + > + xdl_free_classifier(&cf); > + > + return 0; > +} > -- > gitgitgadget Viewing this with --color-moved makes it clear that the changes exactly match what you summarize in the commit message.

gitgitgadget-git · 2025-09-09T09:20:01Z

User Elijah Newren <newren@gmail.com> has been added to the cc: list.

gitgitgadget-git · 2025-09-09T09:20:24Z

xdiff/xprepare.c

 	rhash[hi] = rec;

 	return 0;
 }


On the Git mailing list, Elijah Newren wrote (reply to this):

On Sun, Sep 7, 2025 at 12:48 PM Ezekiel Newren via GitGitGadget <gitgitgadget@gmail.com> wrote: > > From: Ezekiel Newren <ezekielnewren@gmail.com> > > xdl_prepare_ctx() uses local variables and assigns them to the > corresponding xdfile_t fields if there are no errors. Delete them and > use the fields of xdfile_t directly. In particular, those local variables are essentially a hand-rolled additional implementation of xdl_free_ctx() inlined into xdl_prepare_ctx(). You're just modifying the code to use the existing xdl_free_ctx() function so we don't have two ways to free such variables (especially since one of those two was an ugly inlining). > Signed-off-by: Ezekiel Newren <ezekielnewren@gmail.com> > --- > xdiff/xprepare.c | 79 +++++++++++++++++++----------------------------- > 1 file changed, 31 insertions(+), 48 deletions(-) > > diff --git a/xdiff/xprepare.c b/xdiff/xprepare.c > index a45c5ee208..2ed1785b09 100644 > --- a/xdiff/xprepare.c > +++ b/xdiff/xprepare.c > @@ -134,99 +134,82 @@ static int xdl_classify_record(unsigned int pass, xdlclassifier_t *cf, xrecord_t > } > > > +static void xdl_free_ctx(xdfile_t *xdf) > +{ > + unnecessary blank line here > + xdl_free(xdf->rhash); > + xdl_free(xdf->rindex); > + xdl_free(xdf->rchg - 1); > + xdl_free(xdf->ha); > + xdl_free(xdf->recs); > + xdl_cha_free(&xdf->rcha); > +} > + > + > static int xdl_prepare_ctx(unsigned int pass, mmfile_t *mf, long narec, xpparam_t const *xpp, > xdlclassifier_t *cf, xdfile_t *xdf) { > - unsigned int hbits; > - long nrec, hsize, bsize; > + long bsize; > unsigned long hav; > char const *blk, *cur, *top, *prev; > xrecord_t *crec; > - xrecord_t **recs; > - xrecord_t **rhash; > - unsigned long *ha; > - char *rchg; > - long *rindex; > > - ha = NULL; > - rindex = NULL; > - rchg = NULL; > - rhash = NULL; > - recs = NULL; > + xdf->ha = NULL; > + xdf->rindex = NULL; > + xdf->rchg = NULL; > + xdf->rhash = NULL; > + xdf->recs = NULL; > > if (xdl_cha_init(&xdf->rcha, sizeof(xrecord_t), narec / 4 + 1) < 0) > goto abort; > - if (!XDL_ALLOC_ARRAY(recs, narec)) > + if (!XDL_ALLOC_ARRAY(xdf->recs, narec)) > goto abort; > > - hbits = xdl_hashbits((unsigned int) narec); > - hsize = 1 << hbits; > - if (!XDL_CALLOC_ARRAY(rhash, hsize)) > + xdf->hbits = xdl_hashbits((unsigned int) narec); > + if (!XDL_CALLOC_ARRAY(xdf->rhash, 1 << xdf->hbits)) > goto abort; > > - nrec = 0; > + xdf->nrec = 0; > if ((cur = blk = xdl_mmfile_first(mf, &bsize))) { > for (top = blk + bsize; cur < top; ) { > prev = cur; > hav = xdl_hash_record(&cur, top, xpp->flags); > - if (XDL_ALLOC_GROW(recs, nrec + 1, narec)) > + if (XDL_ALLOC_GROW(xdf->recs, xdf->nrec + 1, narec)) > goto abort; > if (!(crec = xdl_cha_alloc(&xdf->rcha))) > goto abort; > crec->ptr = prev; > crec->size = (long) (cur - prev); > crec->ha = hav; > - recs[nrec++] = crec; > - if (xdl_classify_record(pass, cf, rhash, hbits, crec) < 0) > + xdf->recs[xdf->nrec++] = crec; > + if (xdl_classify_record(pass, cf, xdf->rhash, xdf->hbits, crec) < 0) > goto abort; > } > } > > - if (!XDL_CALLOC_ARRAY(rchg, nrec + 2)) > + if (!XDL_CALLOC_ARRAY(xdf->rchg, xdf->nrec + 2)) > goto abort; > > if ((XDF_DIFF_ALG(xpp->flags) != XDF_PATIENCE_DIFF) && > (XDF_DIFF_ALG(xpp->flags) != XDF_HISTOGRAM_DIFF)) { > - if (!XDL_ALLOC_ARRAY(rindex, nrec + 1)) > + if (!XDL_ALLOC_ARRAY(xdf->rindex, xdf->nrec + 1)) > goto abort; > - if (!XDL_ALLOC_ARRAY(ha, nrec + 1)) > + if (!XDL_ALLOC_ARRAY(xdf->ha, xdf->nrec + 1)) > goto abort; > } > > - xdf->nrec = nrec; > - xdf->recs = recs; > - xdf->hbits = hbits; > - xdf->rhash = rhash; > - xdf->rchg = rchg + 1; > - xdf->rindex = rindex; > + xdf->rchg += 1; > xdf->nreff = 0; > - xdf->ha = ha; > xdf->dstart = 0; > - xdf->dend = nrec - 1; > + xdf->dend = xdf->nrec - 1; > > return 0; > > abort: > - xdl_free(ha); > - xdl_free(rindex); > - xdl_free(rchg); > - xdl_free(rhash); > - xdl_free(recs); > - xdl_cha_free(&xdf->rcha); > + xdl_free_ctx(xdf); > return -1; > } > > > -static void xdl_free_ctx(xdfile_t *xdf) { > - > - xdl_free(xdf->rhash); > - xdl_free(xdf->rindex); > - xdl_free(xdf->rchg - 1); > - xdl_free(xdf->ha); > - xdl_free(xdf->recs); > - xdl_cha_free(&xdf->rcha); > -} > - > - > void xdl_free_env(xdfenv_t *xe) { > > xdl_free_ctx(&xe->xdf2); > -- > gitgitgadget Looks good.

gitgitgadget-git · 2025-09-09T09:20:57Z

xdiff/xprepare.c

 	xdl_free(cf->rcrecs);
 	xdl_free(cf->rchash);
 	xdl_cha_free(&cf->ncha);
 }


On the Git mailing list, Elijah Newren wrote (reply to this):

On Sun, Sep 7, 2025 at 12:45 PM Ezekiel Newren via GitGitGadget <gitgitgadget@gmail.com> wrote: > > From: Ezekiel Newren <ezekielnewren@gmail.com> > > xrecord_t.next, xdfile_t.hbits, xdfile_t.rhash are initialized, > but never used for anything by the code. Remove them. > > Signed-off-by: Ezekiel Newren <ezekielnewren@gmail.com> > --- > xdiff/xprepare.c | 15 ++------------- > xdiff/xtypes.h | 3 --- > 2 files changed, 2 insertions(+), 16 deletions(-) > > diff --git a/xdiff/xprepare.c b/xdiff/xprepare.c > index 2ed1785b09..91b0ed54e0 100644 > --- a/xdiff/xprepare.c > +++ b/xdiff/xprepare.c > @@ -91,8 +91,7 @@ static void xdl_free_classifier(xdlclassifier_t *cf) { > } > > > -static int xdl_classify_record(unsigned int pass, xdlclassifier_t *cf, xrecord_t **rhash, > - unsigned int hbits, xrecord_t *rec) { > +static int xdl_classify_record(unsigned int pass, xdlclassifier_t *cf, xrecord_t *rec) { > long hi; > char const *line; > xdlclass_t *rcrec; > @@ -126,10 +125,6 @@ static int xdl_classify_record(unsigned int pass, xdlclassifier_t *cf, xrecord_t > > rec->ha = (unsigned long) rcrec->idx; > > - hi = (long) XDL_HASHLONG(rec->ha, hbits); > - rec->next = rhash[hi]; > - rhash[hi] = rec; > - > return 0; > } > > @@ -137,7 +132,6 @@ static int xdl_classify_record(unsigned int pass, xdlclassifier_t *cf, xrecord_t > static void xdl_free_ctx(xdfile_t *xdf) > { > > - xdl_free(xdf->rhash); > xdl_free(xdf->rindex); > xdl_free(xdf->rchg - 1); > xdl_free(xdf->ha); > @@ -156,7 +150,6 @@ static int xdl_prepare_ctx(unsigned int pass, mmfile_t *mf, long narec, xpparam_ > xdf->ha = NULL; > xdf->rindex = NULL; > xdf->rchg = NULL; > - xdf->rhash = NULL; > xdf->recs = NULL; > > if (xdl_cha_init(&xdf->rcha, sizeof(xrecord_t), narec / 4 + 1) < 0) > @@ -164,10 +157,6 @@ static int xdl_prepare_ctx(unsigned int pass, mmfile_t *mf, long narec, xpparam_ > if (!XDL_ALLOC_ARRAY(xdf->recs, narec)) > goto abort; > > - xdf->hbits = xdl_hashbits((unsigned int) narec); > - if (!XDL_CALLOC_ARRAY(xdf->rhash, 1 << xdf->hbits)) > - goto abort; > - > xdf->nrec = 0; > if ((cur = blk = xdl_mmfile_first(mf, &bsize))) { > for (top = blk + bsize; cur < top; ) { > @@ -181,7 +170,7 @@ static int xdl_prepare_ctx(unsigned int pass, mmfile_t *mf, long narec, xpparam_ > crec->size = (long) (cur - prev); > crec->ha = hav; > xdf->recs[xdf->nrec++] = crec; > - if (xdl_classify_record(pass, cf, xdf->rhash, xdf->hbits, crec) < 0) > + if (xdl_classify_record(pass, cf, crec) < 0) > goto abort; > } > } > diff --git a/xdiff/xtypes.h b/xdiff/xtypes.h > index 8442bd436e..8b8467360e 100644 > --- a/xdiff/xtypes.h > +++ b/xdiff/xtypes.h > @@ -39,7 +39,6 @@ typedef struct s_chastore { > } chastore_t; > > typedef struct s_xrecord { > - struct s_xrecord *next; > char const *ptr; > long size; > unsigned long ha; > @@ -48,8 +47,6 @@ typedef struct s_xrecord { > typedef struct s_xdfile { > chastore_t rcha; > long nrec; > - unsigned int hbits; > - xrecord_t **rhash; > long dstart, dend; > xrecord_t **recs; > char *rchg; > -- > gitgitgadget Always nice to see unused fields get removed.

gitgitgadget-git · 2025-09-09T09:21:40Z

xdiff/xemit.c

 *  Davide Libenzi <davidel@xmailserver.org>
 *
 */



On the Git mailing list, Elijah Newren wrote (reply to this):

On Sun, Sep 7, 2025 at 12:45 PM Ezekiel Newren via GitGitGadget <gitgitgadget@gmail.com> wrote: > > From: Ezekiel Newren <ezekielnewren@gmail.com> > > This function aliases the fields of xrecord_t, which makes it harder > to track the usages of those fields. Delete it. > > Signed-off-by: Ezekiel Newren <ezekielnewren@gmail.com> > --- > xdiff/xemit.c | 38 +++++++++++++------------------------- > 1 file changed, 13 insertions(+), 25 deletions(-) > > diff --git a/xdiff/xemit.c b/xdiff/xemit.c > index 1d40c9cb40..2161ac3cd0 100644 > --- a/xdiff/xemit.c > +++ b/xdiff/xemit.c > @@ -22,23 +22,13 @@ > > #include "xinclude.h" > > -static long xdl_get_rec(xdfile_t *xdf, long ri, char const **rec) { > Can we remove this line too, to simplify the diff? (i.e. make there only be one blank line between the include of xinclude.h and xdl_emit_record? > - *rec = xdf->recs[ri]->ptr; > - > - return xdf->recs[ri]->size; > -} > - > - > -static int xdl_emit_record(xdfile_t *xdf, long ri, char const *pre, xdemitcb_t *ecb) { > - long size, psize = strlen(pre); > - char const *rec; > - > - size = xdl_get_rec(xdf, ri, &rec); > - if (xdl_emit_diffrec(rec, size, pre, psize, ecb) < 0) { > +static int xdl_emit_record(xdfile_t *xdf, long ri, char const *pre, xdemitcb_t *ecb) > +{ The change of the opening curly brace to be on a new line does match our general coding guidelines, but cleanups like this should be in a separate patch. > + xrecord_t *rec = xdf->recs[ri]; > > + if (xdl_emit_diffrec(rec->ptr, rec->size, pre, strlen(pre), ecb) < 0) > return -1; > - } While in this case you were modifying the line in question and thus fixing the code to also not use curly braces around a single statement, which is more justified, it still makes the patch slightly harder for reviewers to read since you are doing multiple things (what you said in the commit message, plus cleaning up style "violations"). It'd be better to leave the existing style violations in place, or fix them in a separate patch. > > return 0; > } > @@ -120,11 +110,11 @@ static long def_ff(const char *rec, long len, char *buf, long sz) > static long match_func_rec(xdfile_t *xdf, xdemitconf_t const *xecfg, long ri, > char *buf, long sz) > { > - const char *rec; > - long len = xdl_get_rec(xdf, ri, &rec); > + xrecord_t *rec = xdf->recs[ri]; > + > if (!xecfg->find_func) > - return def_ff(rec, len, buf, sz); > - return xecfg->find_func(rec, len, buf, sz, xecfg->find_func_priv); > + return def_ff(rec->ptr, rec->size, buf, sz); > + return xecfg->find_func(rec->ptr, rec->size, buf, sz, xecfg->find_func_priv); > } > > static int is_func_rec(xdfile_t *xdf, xdemitconf_t const *xecfg, long ri) > @@ -160,14 +150,12 @@ static long get_func_line(xdfenv_t *xe, xdemitconf_t const *xecfg, > > static int is_empty_rec(xdfile_t *xdf, long ri) > { > - const char *rec; > - long len = xdl_get_rec(xdf, ri, &rec); > + xrecord_t *rec = xdf->recs[ri]; > + long i = 0; > > - while (len > 0 && XDL_ISSPACE(*rec)) { > - rec++; > - len--; > - } > - return !len; > + for (; i < rec->size && XDL_ISSPACE(rec->ptr[i]); i++); > + > + return i == rec->size; > } I agree that the code is easier to follow without the aliasing.

gitgitgadget-git · 2025-09-09T09:22:30Z

xdiff/xdiffi.c


 /*
 * Rule: "Divide et Impera" (divide & conquer). Recursively split the box in
 * sub-boxes by calling the box splitting function. Note that the real job


On the Git mailing list, Elijah Newren wrote (reply to this):

On Sun, Sep 7, 2025 at 12:45 PM Ezekiel Newren via GitGitGadget <gitgitgadget@gmail.com> wrote: > > From: Ezekiel Newren <ezekielnewren@gmail.com> > > Every field in this struct is an alias for a certain field in xdfile_t. > > diffdata_t.nrec -> xdfile_t.nreff > diffdata_t.ha -> xdfile_t.ha > diffdata_t.rindex -> xdfile_t.rindex > diffdata_t.rchg -> xdfile_t.recharge > > Signed-off-by: Ezekiel Newren <ezekielnewren@gmail.com> > --- > xdiff/xdiffi.c | 32 ++++++++------------------------ > xdiff/xdiffi.h | 11 ++--------- > 2 files changed, 10 insertions(+), 33 deletions(-) > > diff --git a/xdiff/xdiffi.c b/xdiff/xdiffi.c > index 5a96e36dfb..bbf0161f84 100644 > --- a/xdiff/xdiffi.c > +++ b/xdiff/xdiffi.c > @@ -257,10 +257,10 @@ static long xdl_split(unsigned long const *ha1, long off1, long lim1, > * sub-boxes by calling the box splitting function. Note that the real job > * (marking changed lines) is done in the two boundary reaching checks. > */ > -int xdl_recs_cmp(diffdata_t *dd1, long off1, long lim1, > - diffdata_t *dd2, long off2, long lim2, > +int xdl_recs_cmp(xdfile_t *xdf1, long off1, long lim1, > + xdfile_t *xdf2, long off2, long lim2, > long *kvdf, long *kvdb, int need_min, xdalgoenv_t *xenv) { > - unsigned long const *ha1 = dd1->ha, *ha2 = dd2->ha; > + unsigned long const *ha1 = xdf1->ha, *ha2 = xdf2->ha; > > /* > * Shrink the box by walking through each diagonal snake (SW and NE). > @@ -273,17 +273,11 @@ int xdl_recs_cmp(diffdata_t *dd1, long off1, long lim1, > * be obviously changed. > */ > if (off1 == lim1) { > - char *rchg2 = dd2->rchg; > - long *rindex2 = dd2->rindex; > - > for (; off2 < lim2; off2++) > - rchg2[rindex2[off2]] = 1; > + xdf2->rchg[xdf2->rindex[off2]] = 1; > } else if (off2 == lim2) { > - char *rchg1 = dd1->rchg; > - long *rindex1 = dd1->rindex; > - > for (; off1 < lim1; off1++) > - rchg1[rindex1[off1]] = 1; > + xdf1->rchg[xdf1->rindex[off1]] = 1; > } else { > xdpsplit_t spl; > spl.i1 = spl.i2 = 0; > @@ -300,9 +294,9 @@ int xdl_recs_cmp(diffdata_t *dd1, long off1, long lim1, > /* > * ... et Impera. > */ > - if (xdl_recs_cmp(dd1, off1, spl.i1, dd2, off2, spl.i2, > + if (xdl_recs_cmp(xdf1, off1, spl.i1, xdf2, off2, spl.i2, > kvdf, kvdb, spl.min_lo, xenv) < 0 || > - xdl_recs_cmp(dd1, spl.i1, lim1, dd2, spl.i2, lim2, > + xdl_recs_cmp(xdf1, spl.i1, lim1, xdf2, spl.i2, lim2, > kvdf, kvdb, spl.min_hi, xenv) < 0) { > > return -1; > @@ -318,7 +312,6 @@ int xdl_do_diff(mmfile_t *mf1, mmfile_t *mf2, xpparam_t const *xpp, > long ndiags; > long *kvd, *kvdf, *kvdb; > xdalgoenv_t xenv; > - diffdata_t dd1, dd2; > int res; > > if (xdl_prepare_env(mf1, mf2, xpp, xe) < 0) > @@ -357,16 +350,7 @@ int xdl_do_diff(mmfile_t *mf1, mmfile_t *mf2, xpparam_t const *xpp, > xenv.snake_cnt = XDL_SNAKE_CNT; > xenv.heur_min = XDL_HEUR_MIN_COST; > > - dd1.nrec = xe->xdf1.nreff; > - dd1.ha = xe->xdf1.ha; > - dd1.rchg = xe->xdf1.rchg; > - dd1.rindex = xe->xdf1.rindex; > - dd2.nrec = xe->xdf2.nreff; > - dd2.ha = xe->xdf2.ha; > - dd2.rchg = xe->xdf2.rchg; > - dd2.rindex = xe->xdf2.rindex; > - > - res = xdl_recs_cmp(&dd1, 0, dd1.nrec, &dd2, 0, dd2.nrec, > + res = xdl_recs_cmp(&xe->xdf1, 0, xe->xdf1.nreff, &xe->xdf2, 0, xe->xdf2.nreff, > kvdf, kvdb, (xpp->flags & XDF_NEED_MINIMAL) != 0, > &xenv); > xdl_free(kvd); > diff --git a/xdiff/xdiffi.h b/xdiff/xdiffi.h > index 126c9d8ff4..49e52c67f9 100644 > --- a/xdiff/xdiffi.h > +++ b/xdiff/xdiffi.h > @@ -24,13 +24,6 @@ > #define XDIFFI_H > > > -typedef struct s_diffdata { > - long nrec; > - unsigned long const *ha; > - long *rindex; > - char *rchg; > -} diffdata_t; > - > typedef struct s_xdalgoenv { > long mxcost; > long snake_cnt; > @@ -46,8 +39,8 @@ typedef struct s_xdchange { > > > > -int xdl_recs_cmp(diffdata_t *dd1, long off1, long lim1, > - diffdata_t *dd2, long off2, long lim2, > +int xdl_recs_cmp(xdfile_t *xdf1, long off1, long lim1, > + xdfile_t *xdf2, long off2, long lim2, > long *kvdf, long *kvdb, int need_min, xdalgoenv_t *xenv); > int xdl_do_diff(mmfile_t *mf1, mmfile_t *mf2, xpparam_t const *xpp, > xdfenv_t *xe); > -- > gitgitgadget Viewing this commit with --color-moved helps highlight in the code what you say in the commit message. Makes sense.

gitgitgadget-git · 2025-09-09T09:23:28Z

xdiff/xdiffi.c

 *  Davide Libenzi <davidel@xmailserver.org>
 *
 */



On the Git mailing list, Elijah Newren wrote (reply to this):

On Sun, Sep 7, 2025 at 12:46 PM Ezekiel Newren via GitGitGadget <gitgitgadget@gmail.com> wrote: > > From: Ezekiel Newren <ezekielnewren@gmail.com> > > When 0 <= i < xdfile_t.nreff the following is true: > xdfile_t.ha[i] == xdfile_t.recs[xdfile_t.rindex[i]] I like getting rid of redundant stuff. One thing to note here is that you're replacing a single indirection with two... > > Signed-off-by: Ezekiel Newren <ezekielnewren@gmail.com> > --- > xdiff/xdiffi.c | 24 ++++++++++++++---------- > xdiff/xprepare.c | 12 ++---------- > xdiff/xtypes.h | 1 - > 3 files changed, 16 insertions(+), 21 deletions(-) > > diff --git a/xdiff/xdiffi.c b/xdiff/xdiffi.c > index bbf0161f84..11cd090b53 100644 > --- a/xdiff/xdiffi.c > +++ b/xdiff/xdiffi.c > @@ -22,6 +22,11 @@ > > #include "xinclude.h" > > +static unsigned long get_hash(xdfile_t *xdf, long index) > +{ > + return xdf->recs[xdf->rindex[index]]->ha; > +} > + > #define XDL_MAX_COST_MIN 256 > #define XDL_HEUR_MIN_COST 256 > #define XDL_LINE_MAX (long)((1UL << (CHAR_BIT * sizeof(long) - 1)) - 1) > @@ -42,8 +47,8 @@ typedef struct s_xdpsplit { > * using this algorithm, so a little bit of heuristic is needed to cut the > * search and to return a suboptimal point. > */ > -static long xdl_split(unsigned long const *ha1, long off1, long lim1, > - unsigned long const *ha2, long off2, long lim2, > +static long xdl_split(xdfile_t *xdf1, long off1, long lim1, > + xdfile_t *xdf2, long off2, long lim2, > long *kvdf, long *kvdb, int need_min, xdpsplit_t *spl, > xdalgoenv_t *xenv) { > long dmin = off1 - lim2, dmax = lim1 - off2; > @@ -87,7 +92,7 @@ static long xdl_split(unsigned long const *ha1, long off1, long lim1, > i1 = kvdf[d + 1]; > prev1 = i1; > i2 = i1 - d; > - for (; i1 < lim1 && i2 < lim2 && ha1[i1] == ha2[i2]; i1++, i2++); > + for (; i1 < lim1 && i2 < lim2 && get_hash(xdf1, i1) == get_hash(xdf2, i2); i1++, i2++); You're not going to be happy with me asking, so sorry in advance, but I'm really curious...we are now replacing a single indirection with a double-indirection inside a for loop, which is nested within two other for loops. Three levels of for-loops to me suggests it might be a hot codepath. Does this double indirection in these codepaths affect performance? > if (i1 - prev1 > xenv->snake_cnt) > got_snake = 1; > kvdf[d] = i1; > @@ -124,7 +129,7 @@ static long xdl_split(unsigned long const *ha1, long off1, long lim1, > i1 = kvdb[d + 1] - 1; > prev1 = i1; > i2 = i1 - d; > - for (; i1 > off1 && i2 > off2 && ha1[i1 - 1] == ha2[i2 - 1]; i1--, i2--); > + for (; i1 > off1 && i2 > off2 && get_hash(xdf1, i1 - 1) == get_hash(xdf2, i2 - 1); i1--, i2--); > if (prev1 - i1 > xenv->snake_cnt) > got_snake = 1; > kvdb[d] = i1; > @@ -159,7 +164,7 @@ static long xdl_split(unsigned long const *ha1, long off1, long lim1, > if (v > XDL_K_HEUR * ec && v > best && > off1 + xenv->snake_cnt <= i1 && i1 < lim1 && > off2 + xenv->snake_cnt <= i2 && i2 < lim2) { > - for (k = 1; ha1[i1 - k] == ha2[i2 - k]; k++) > + for (k = 1; get_hash(xdf1, i1 - k) == get_hash(xdf2, i2 - k); k++) > if (k == xenv->snake_cnt) { > best = v; > spl->i1 = i1; > @@ -183,7 +188,7 @@ static long xdl_split(unsigned long const *ha1, long off1, long lim1, > if (v > XDL_K_HEUR * ec && v > best && > off1 < i1 && i1 <= lim1 - xenv->snake_cnt && > off2 < i2 && i2 <= lim2 - xenv->snake_cnt) { > - for (k = 0; ha1[i1 + k] == ha2[i2 + k]; k++) > + for (k = 0; get_hash(xdf1, i1 + k) == get_hash(xdf2, i2 + k); k++) > if (k == xenv->snake_cnt - 1) { > best = v; > spl->i1 = i1; > @@ -260,13 +265,12 @@ static long xdl_split(unsigned long const *ha1, long off1, long lim1, > int xdl_recs_cmp(xdfile_t *xdf1, long off1, long lim1, > xdfile_t *xdf2, long off2, long lim2, > long *kvdf, long *kvdb, int need_min, xdalgoenv_t *xenv) { > - unsigned long const *ha1 = xdf1->ha, *ha2 = xdf2->ha; > > /* > * Shrink the box by walking through each diagonal snake (SW and NE). > */ > - for (; off1 < lim1 && off2 < lim2 && ha1[off1] == ha2[off2]; off1++, off2++); > - for (; off1 < lim1 && off2 < lim2 && ha1[lim1 - 1] == ha2[lim2 - 1]; lim1--, lim2--); > + for (; off1 < lim1 && off2 < lim2 && get_hash(xdf1, off1) == get_hash(xdf2, off2); off1++, off2++); > + for (; off1 < lim1 && off2 < lim2 && get_hash(xdf1, lim1 - 1) == get_hash(xdf2, lim2 - 1); lim1--, lim2--); > > /* > * If one dimension is empty, then all records on the other one must > @@ -285,7 +289,7 @@ int xdl_recs_cmp(xdfile_t *xdf1, long off1, long lim1, > /* > * Divide ... > */ > - if (xdl_split(ha1, off1, lim1, ha2, off2, lim2, kvdf, kvdb, > + if (xdl_split(xdf1, off1, lim1, xdf2, off2, lim2, kvdf, kvdb, > need_min, &spl, xenv) < 0) { > > return -1; > diff --git a/xdiff/xprepare.c b/xdiff/xprepare.c > index 91b0ed54e0..59730989a3 100644 > --- a/xdiff/xprepare.c > +++ b/xdiff/xprepare.c > @@ -134,7 +134,6 @@ static void xdl_free_ctx(xdfile_t *xdf) > > xdl_free(xdf->rindex); > xdl_free(xdf->rchg - 1); > - xdl_free(xdf->ha); > xdl_free(xdf->recs); > xdl_cha_free(&xdf->rcha); > } > @@ -147,7 +146,6 @@ static int xdl_prepare_ctx(unsigned int pass, mmfile_t *mf, long narec, xpparam_ > char const *blk, *cur, *top, *prev; > xrecord_t *crec; > > - xdf->ha = NULL; > xdf->rindex = NULL; > xdf->rchg = NULL; > xdf->recs = NULL; > @@ -182,8 +180,6 @@ static int xdl_prepare_ctx(unsigned int pass, mmfile_t *mf, long narec, xpparam_ > (XDF_DIFF_ALG(xpp->flags) != XDF_HISTOGRAM_DIFF)) { > if (!XDL_ALLOC_ARRAY(xdf->rindex, xdf->nrec + 1)) > goto abort; > - if (!XDL_ALLOC_ARRAY(xdf->ha, xdf->nrec + 1)) > - goto abort; > } > > xdf->rchg += 1; > @@ -301,9 +297,7 @@ static int xdl_cleanup_records(xdlclassifier_t *cf, xdfile_t *xdf1, xdfile_t *xd > i <= xdf1->dend; i++, recs++) { > if (dis1[i] == 1 || > (dis1[i] == 2 && !xdl_clean_mmatch(dis1, i, xdf1->dstart, xdf1->dend))) { > - xdf1->rindex[nreff] = i; > - xdf1->ha[nreff] = (*recs)->ha; > - nreff++; > + xdf1->rindex[nreff++] = i; > } else > xdf1->rchg[i] = 1; > } > @@ -313,9 +307,7 @@ static int xdl_cleanup_records(xdlclassifier_t *cf, xdfile_t *xdf1, xdfile_t *xd > i <= xdf2->dend; i++, recs++) { > if (dis2[i] == 1 || > (dis2[i] == 2 && !xdl_clean_mmatch(dis2, i, xdf2->dstart, xdf2->dend))) { > - xdf2->rindex[nreff] = i; > - xdf2->ha[nreff] = (*recs)->ha; > - nreff++; > + xdf2->rindex[nreff++] = i; > } else > xdf2->rchg[i] = 1; > } > diff --git a/xdiff/xtypes.h b/xdiff/xtypes.h > index 8b8467360e..85848f1685 100644 > --- a/xdiff/xtypes.h > +++ b/xdiff/xtypes.h > @@ -52,7 +52,6 @@ typedef struct s_xdfile { > char *rchg; > long *rindex; > long nreff; > - unsigned long *ha; > } xdfile_t; > > typedef struct s_xdfenv { > -- > gitgitgadget Other than the performance question, it looks like you've made a straightforward mechanical change as highlighted in the commit message.

gitgitgadget-git · 2025-09-09T09:24:36Z

xdiff/xprepare.c

 #define XDL_GUESS_NLINES1 256
 #define XDL_GUESS_NLINES2 20




On the Git mailing list, Elijah Newren wrote (reply to this):

On Sun, Sep 7, 2025 at 12:46 PM Ezekiel Newren via GitGitGadget <gitgitgadget@gmail.com> wrote: > > From: Ezekiel Newren <ezekielnewren@gmail.com> > > Signed-off-by: Ezekiel Newren <ezekielnewren@gmail.com> > --- > xdiff/xprepare.c | 16 ++++++---------- > 1 file changed, 6 insertions(+), 10 deletions(-) > > diff --git a/xdiff/xprepare.c b/xdiff/xprepare.c > index 59730989a3..6f1d4b4725 100644 > --- a/xdiff/xprepare.c > +++ b/xdiff/xprepare.c > @@ -32,9 +32,7 @@ > > typedef struct s_xdlclass { > struct s_xdlclass *next; > - unsigned long ha; > - char const *line; > - long size; > + xrecord_t rec; > long idx; > long len1, len2; > } xdlclass_t; > @@ -93,14 +91,12 @@ static void xdl_free_classifier(xdlclassifier_t *cf) { > > static int xdl_classify_record(unsigned int pass, xdlclassifier_t *cf, xrecord_t *rec) { > long hi; > - char const *line; > xdlclass_t *rcrec; > > - line = rec->ptr; > hi = (long) XDL_HASHLONG(rec->ha, cf->hbits); > for (rcrec = cf->rchash[hi]; rcrec; rcrec = rcrec->next) > - if (rcrec->ha == rec->ha && > - xdl_recmatch(rcrec->line, rcrec->size, > + if (rcrec->rec.ha == rec->ha && > + xdl_recmatch(rcrec->rec.ptr, rcrec->rec.size, > rec->ptr, rec->size, cf->flags)) > break; > > @@ -113,9 +109,9 @@ static int xdl_classify_record(unsigned int pass, xdlclassifier_t *cf, xrecord_t > if (XDL_ALLOC_GROW(cf->rcrecs, cf->count, cf->alloc)) > return -1; > cf->rcrecs[rcrec->idx] = rcrec; > - rcrec->line = line; > - rcrec->size = rec->size; > - rcrec->ha = rec->ha; > + rcrec->rec.ptr = rec->ptr; > + rcrec->rec.size = rec->size; > + rcrec->rec.ha = rec->ha; > rcrec->len1 = rcrec->len2 = 0; > rcrec->next = cf->rchash[hi]; > cf->rchash[hi] = rcrec; > -- > gitgitgadget I can see the changes match the one-line summary. And I think the point is simplification or reducing redundancy or something...but could a single sentence motivation (stating which of these purposes is at play) be added to the commit message?

gitgitgadget-git · 2025-09-09T09:25:53Z

xdiff/xdiffi.c

 */

 #include "xinclude.h"



On the Git mailing list, Elijah Newren wrote (reply to this):

On Sun, Sep 7, 2025 at 12:46 PM Ezekiel Newren via GitGitGadget <gitgitgadget@gmail.com> wrote: > > From: Ezekiel Newren <ezekielnewren@gmail.com> My personal bias is that things like "view with --color-words" makes more sense to include near the end of the commit message, just before the sign-offs. Not sure if others agree on that. > The chastore_t type is very unfriendly to Rust FFI. It's also redundant > since 'recs' is a vector type that grows every time an xrecord_t is > added. The second sentence seems to presume the reader knows what chastore_t type is for, and about the confusing dual layering between it and recs.its confusing dual layering. I liked your more extended explanation in https://lore.kernel.org/git/7ea2dccd71fc502f20614ce217fc9885d1b17413.1756496539.git.gitgitgadget@gmail.com/; could some of that be used here? > > Signed-off-by: Ezekiel Newren <ezekielnewren@gmail.com> > --- > xdiff/xdiffi.c | 24 ++++++++++---------- > xdiff/xemit.c | 6 ++--- > xdiff/xhistogram.c | 2 +- > xdiff/xmerge.c | 56 +++++++++++++++++++++++----------------------- > xdiff/xpatience.c | 10 ++++----- > xdiff/xprepare.c | 19 ++++++---------- > xdiff/xtypes.h | 3 +-- > xdiff/xutils.c | 12 +++++----- > 8 files changed, 63 insertions(+), 69 deletions(-) > > diff --git a/xdiff/xdiffi.c b/xdiff/xdiffi.c > index 11cd090b53..a66125d44a 100644 > --- a/xdiff/xdiffi.c > +++ b/xdiff/xdiffi.c > @@ -24,7 +24,7 @@ > > static unsigned long get_hash(xdfile_t *xdf, long index) > { > - return xdf->recs[xdf->rindex[index]]->ha; > + return xdf->recs[xdf->rindex[index]].ha; > } > > #define XDL_MAX_COST_MIN 256 > @@ -489,13 +489,13 @@ static void measure_split(const xdfile_t *xdf, long split, > m->indent = -1; > } else { > m->end_of_file = 0; > - m->indent = get_indent(xdf->recs[split]); > + m->indent = get_indent(&xdf->recs[split]); > } > > m->pre_blank = 0; > m->pre_indent = -1; > for (i = split - 1; i >= 0; i--) { > - m->pre_indent = get_indent(xdf->recs[i]); > + m->pre_indent = get_indent(&xdf->recs[i]); > if (m->pre_indent != -1) > break; > m->pre_blank += 1; > @@ -508,7 +508,7 @@ static void measure_split(const xdfile_t *xdf, long split, > m->post_blank = 0; > m->post_indent = -1; > for (i = split + 1; i < xdf->nrec; i++) { > - m->post_indent = get_indent(xdf->recs[i]); > + m->post_indent = get_indent(&xdf->recs[i]); > if (m->post_indent != -1) > break; > m->post_blank += 1; > @@ -752,7 +752,7 @@ static inline int group_previous(xdfile_t *xdf, struct xdlgroup *g) > static int group_slide_down(xdfile_t *xdf, struct xdlgroup *g) > { > if (g->end < xdf->nrec && > - recs_match(xdf->recs[g->start], xdf->recs[g->end])) { > + recs_match(&xdf->recs[g->start], &xdf->recs[g->end])) { > xdf->rchg[g->start++] = 0; > xdf->rchg[g->end++] = 1; > > @@ -773,7 +773,7 @@ static int group_slide_down(xdfile_t *xdf, struct xdlgroup *g) > static int group_slide_up(xdfile_t *xdf, struct xdlgroup *g) > { > if (g->start > 0 && > - recs_match(xdf->recs[g->start - 1], xdf->recs[g->end - 1])) { > + recs_match(&xdf->recs[g->start - 1], &xdf->recs[g->end - 1])) { > xdf->rchg[--g->start] = 1; > xdf->rchg[--g->end] = 0; > > @@ -988,16 +988,16 @@ static void xdl_mark_ignorable_lines(xdchange_t *xscr, xdfenv_t *xe, long flags) > > for (xch = xscr; xch; xch = xch->next) { > int ignore = 1; > - xrecord_t **rec; > + xrecord_t *rec; > long i; > > rec = &xe->xdf1.recs[xch->i1]; > for (i = 0; i < xch->chg1 && ignore; i++) > - ignore = xdl_blankline(rec[i]->ptr, rec[i]->size, flags); > + ignore = xdl_blankline(rec[i].ptr, rec[i].size, flags); > > rec = &xe->xdf2.recs[xch->i2]; > for (i = 0; i < xch->chg2 && ignore; i++) > - ignore = xdl_blankline(rec[i]->ptr, rec[i]->size, flags); > + ignore = xdl_blankline(rec[i].ptr, rec[i].size, flags); > > xch->ignore = ignore; > } > @@ -1021,7 +1021,7 @@ static void xdl_mark_ignorable_regex(xdchange_t *xscr, const xdfenv_t *xe, > xdchange_t *xch; > > for (xch = xscr; xch; xch = xch->next) { > - xrecord_t **rec; > + xrecord_t *rec; > int ignore = 1; > long i; > > @@ -1033,11 +1033,11 @@ static void xdl_mark_ignorable_regex(xdchange_t *xscr, const xdfenv_t *xe, > > rec = &xe->xdf1.recs[xch->i1]; > for (i = 0; i < xch->chg1 && ignore; i++) > - ignore = record_matches_regex(rec[i], xpp); > + ignore = record_matches_regex(&rec[i], xpp); > > rec = &xe->xdf2.recs[xch->i2]; > for (i = 0; i < xch->chg2 && ignore; i++) > - ignore = record_matches_regex(rec[i], xpp); > + ignore = record_matches_regex(&rec[i], xpp); > > xch->ignore = ignore; > } > diff --git a/xdiff/xemit.c b/xdiff/xemit.c > index 2161ac3cd0..b2f1f30cd3 100644 > --- a/xdiff/xemit.c > +++ b/xdiff/xemit.c > @@ -25,7 +25,7 @@ > > static int xdl_emit_record(xdfile_t *xdf, long ri, char const *pre, xdemitcb_t *ecb) > { > - xrecord_t *rec = xdf->recs[ri]; > + xrecord_t *rec = &xdf->recs[ri]; > > if (xdl_emit_diffrec(rec->ptr, rec->size, pre, strlen(pre), ecb) < 0) > return -1; > @@ -110,7 +110,7 @@ static long def_ff(const char *rec, long len, char *buf, long sz) > static long match_func_rec(xdfile_t *xdf, xdemitconf_t const *xecfg, long ri, > char *buf, long sz) > { > - xrecord_t *rec = xdf->recs[ri]; > + xrecord_t *rec = &xdf->recs[ri]; > > if (!xecfg->find_func) > return def_ff(rec->ptr, rec->size, buf, sz); > @@ -150,7 +150,7 @@ static long get_func_line(xdfenv_t *xe, xdemitconf_t const *xecfg, > > static int is_empty_rec(xdfile_t *xdf, long ri) > { > - xrecord_t *rec = xdf->recs[ri]; > + xrecord_t *rec = &xdf->recs[ri]; > long i = 0; > > for (; i < rec->size && XDL_ISSPACE(rec->ptr[i]); i++); > diff --git a/xdiff/xhistogram.c b/xdiff/xhistogram.c > index 040d81e0bc..4d857e8ae2 100644 > --- a/xdiff/xhistogram.c > +++ b/xdiff/xhistogram.c > @@ -86,7 +86,7 @@ struct region { > ((LINE_MAP(index, ptr))->cnt) > > #define REC(env, s, l) \ > - (env->xdf##s.recs[l - 1]) > + (&env->xdf##s.recs[l - 1]) > > static int cmp_recs(xrecord_t *r1, xrecord_t *r2) > { > diff --git a/xdiff/xmerge.c b/xdiff/xmerge.c > index af40c88a5b..fd600cbb5d 100644 > --- a/xdiff/xmerge.c > +++ b/xdiff/xmerge.c > @@ -97,12 +97,12 @@ static int xdl_merge_cmp_lines(xdfenv_t *xe1, int i1, xdfenv_t *xe2, int i2, > int line_count, long flags) > { > int i; > - xrecord_t **rec1 = xe1->xdf2.recs + i1; > - xrecord_t **rec2 = xe2->xdf2.recs + i2; > + xrecord_t *rec1 = xe1->xdf2.recs + i1; > + xrecord_t *rec2 = xe2->xdf2.recs + i2; > > for (i = 0; i < line_count; i++) { > - int result = xdl_recmatch(rec1[i]->ptr, rec1[i]->size, > - rec2[i]->ptr, rec2[i]->size, flags); > + int result = xdl_recmatch(rec1[i].ptr, rec1[i].size, > + rec2[i].ptr, rec2[i].size, flags); > if (!result) > return -1; > } > @@ -111,7 +111,7 @@ static int xdl_merge_cmp_lines(xdfenv_t *xe1, int i1, xdfenv_t *xe2, int i2, > > static int xdl_recs_copy_0(int use_orig, xdfenv_t *xe, int i, int count, int needs_cr, int add_nl, char *dest) > { > - xrecord_t **recs; > + xrecord_t *recs; > int size = 0; > > recs = (use_orig ? xe->xdf1.recs : xe->xdf2.recs) + i; > @@ -119,12 +119,12 @@ static int xdl_recs_copy_0(int use_orig, xdfenv_t *xe, int i, int count, int nee > if (count < 1) > return 0; > > - for (i = 0; i < count; size += recs[i++]->size) > + for (i = 0; i < count; size += recs[i++].size) > if (dest) > - memcpy(dest + size, recs[i]->ptr, recs[i]->size); > + memcpy(dest + size, recs[i].ptr, recs[i].size); > if (add_nl) { > - i = recs[count - 1]->size; > - if (i == 0 || recs[count - 1]->ptr[i - 1] != '\n') { > + i = recs[count - 1].size; > + if (i == 0 || recs[count - 1].ptr[i - 1] != '\n') { > if (needs_cr) { > if (dest) > dest[size] = '\r'; > @@ -160,22 +160,22 @@ static int is_eol_crlf(xdfile_t *file, int i) > > if (i < file->nrec - 1) > /* All lines before the last *must* end in LF */ > - return (size = file->recs[i]->size) > 1 && > - file->recs[i]->ptr[size - 2] == '\r'; > + return (size = file->recs[i].size) > 1 && > + file->recs[i].ptr[size - 2] == '\r'; > if (!file->nrec) > /* Cannot determine eol style from empty file */ > return -1; > - if ((size = file->recs[i]->size) && > - file->recs[i]->ptr[size - 1] == '\n') > + if ((size = file->recs[i].size) && > + file->recs[i].ptr[size - 1] == '\n') > /* Last line; ends in LF; Is it CR/LF? */ > return size > 1 && > - file->recs[i]->ptr[size - 2] == '\r'; > + file->recs[i].ptr[size - 2] == '\r'; > if (!i) > /* The only line has no eol */ > return -1; > /* Determine eol from second-to-last line */ > - return (size = file->recs[i - 1]->size) > 1 && > - file->recs[i - 1]->ptr[size - 2] == '\r'; > + return (size = file->recs[i - 1].size) > 1 && > + file->recs[i - 1].ptr[size - 2] == '\r'; > } > > static int is_cr_needed(xdfenv_t *xe1, xdfenv_t *xe2, xdmerge_t *m) > @@ -334,22 +334,22 @@ static int recmatch(xrecord_t *rec1, xrecord_t *rec2, unsigned long flags) > static void xdl_refine_zdiff3_conflicts(xdfenv_t *xe1, xdfenv_t *xe2, xdmerge_t *m, > xpparam_t const *xpp) > { > - xrecord_t **rec1 = xe1->xdf2.recs, **rec2 = xe2->xdf2.recs; > + xrecord_t *rec1 = xe1->xdf2.recs, *rec2 = xe2->xdf2.recs; > for (; m; m = m->next) { > /* let's handle just the conflicts */ > if (m->mode) > continue; > > while(m->chg1 && m->chg2 && > - recmatch(rec1[m->i1], rec2[m->i2], xpp->flags)) { > + recmatch(&rec1[m->i1], &rec2[m->i2], xpp->flags)) { > m->chg1--; > m->chg2--; > m->i1++; > m->i2++; > } > while (m->chg1 && m->chg2 && > - recmatch(rec1[m->i1 + m->chg1 - 1], > - rec2[m->i2 + m->chg2 - 1], xpp->flags)) { > + recmatch(&rec1[m->i1 + m->chg1 - 1], > + &rec2[m->i2 + m->chg2 - 1], xpp->flags)) { > m->chg1--; > m->chg2--; > } > @@ -381,12 +381,12 @@ static int xdl_refine_conflicts(xdfenv_t *xe1, xdfenv_t *xe2, xdmerge_t *m, > * This probably does not work outside git, since > * we have a very simple mmfile structure. > */ > - t1.ptr = (char *)xe1->xdf2.recs[m->i1]->ptr; > - t1.size = xe1->xdf2.recs[m->i1 + m->chg1 - 1]->ptr > - + xe1->xdf2.recs[m->i1 + m->chg1 - 1]->size - t1.ptr; > - t2.ptr = (char *)xe2->xdf2.recs[m->i2]->ptr; > - t2.size = xe2->xdf2.recs[m->i2 + m->chg2 - 1]->ptr > - + xe2->xdf2.recs[m->i2 + m->chg2 - 1]->size - t2.ptr; > + t1.ptr = (char *)xe1->xdf2.recs[m->i1].ptr; > + t1.size = xe1->xdf2.recs[m->i1 + m->chg1 - 1].ptr > + + xe1->xdf2.recs[m->i1 + m->chg1 - 1].size - t1.ptr; > + t2.ptr = (char *)xe2->xdf2.recs[m->i2].ptr; > + t2.size = xe2->xdf2.recs[m->i2 + m->chg2 - 1].ptr > + + xe2->xdf2.recs[m->i2 + m->chg2 - 1].size - t2.ptr; > if (xdl_do_diff(&t1, &t2, xpp, &xe) < 0) > return -1; > if (xdl_change_compact(&xe.xdf1, &xe.xdf2, xpp->flags) < 0 || > @@ -440,8 +440,8 @@ static int line_contains_alnum(const char *ptr, long size) > static int lines_contain_alnum(xdfenv_t *xe, int i, int chg) > { > for (; chg; chg--, i++) > - if (line_contains_alnum(xe->xdf2.recs[i]->ptr, > - xe->xdf2.recs[i]->size)) > + if (line_contains_alnum(xe->xdf2.recs[i].ptr, > + xe->xdf2.recs[i].size)) > return 1; > return 0; > } > diff --git a/xdiff/xpatience.c b/xdiff/xpatience.c > index 77dc411d19..bf69a58527 100644 > --- a/xdiff/xpatience.c > +++ b/xdiff/xpatience.c > @@ -88,9 +88,9 @@ static int is_anchor(xpparam_t const *xpp, const char *line) > static void insert_record(xpparam_t const *xpp, int line, struct hashmap *map, > int pass) > { > - xrecord_t **records = pass == 1 ? > + xrecord_t *records = pass == 1 ? > map->env->xdf1.recs : map->env->xdf2.recs; > - xrecord_t *record = records[line - 1]; > + xrecord_t *record = &records[line - 1]; > /* > * After xdl_prepare_env() (or more precisely, due to > * xdl_classify_record()), the "ha" member of the records (AKA lines) > @@ -121,7 +121,7 @@ static void insert_record(xpparam_t const *xpp, int line, struct hashmap *map, > return; > map->entries[index].line1 = line; > map->entries[index].hash = record->ha; > - map->entries[index].anchor = is_anchor(xpp, map->env->xdf1.recs[line - 1]->ptr); > + map->entries[index].anchor = is_anchor(xpp, map->env->xdf1.recs[line - 1].ptr); > if (!map->first) > map->first = map->entries + index; > if (map->last) { > @@ -246,8 +246,8 @@ static int find_longest_common_sequence(struct hashmap *map, struct entry **res) > > static int match(struct hashmap *map, int line1, int line2) > { > - xrecord_t *record1 = map->env->xdf1.recs[line1 - 1]; > - xrecord_t *record2 = map->env->xdf2.recs[line2 - 1]; > + xrecord_t *record1 = &map->env->xdf1.recs[line1 - 1]; > + xrecord_t *record2 = &map->env->xdf2.recs[line2 - 1]; > return record1->ha == record2->ha; > } > > diff --git a/xdiff/xprepare.c b/xdiff/xprepare.c > index 6f1d4b4725..92f9845003 100644 > --- a/xdiff/xprepare.c > +++ b/xdiff/xprepare.c > @@ -131,7 +131,6 @@ static void xdl_free_ctx(xdfile_t *xdf) > xdl_free(xdf->rindex); > xdl_free(xdf->rchg - 1); > xdl_free(xdf->recs); > - xdl_cha_free(&xdf->rcha); > } > > > @@ -146,8 +145,6 @@ static int xdl_prepare_ctx(unsigned int pass, mmfile_t *mf, long narec, xpparam_ > xdf->rchg = NULL; > xdf->recs = NULL; > > - if (xdl_cha_init(&xdf->rcha, sizeof(xrecord_t), narec / 4 + 1) < 0) > - goto abort; > if (!XDL_ALLOC_ARRAY(xdf->recs, narec)) > goto abort; > > @@ -158,12 +155,10 @@ static int xdl_prepare_ctx(unsigned int pass, mmfile_t *mf, long narec, xpparam_ > hav = xdl_hash_record(&cur, top, xpp->flags); > if (XDL_ALLOC_GROW(xdf->recs, xdf->nrec + 1, narec)) > goto abort; > - if (!(crec = xdl_cha_alloc(&xdf->rcha))) > - goto abort; > + crec = &xdf->recs[xdf->nrec++]; > crec->ptr = prev; > crec->size = (long) (cur - prev); > crec->ha = hav; > - xdf->recs[xdf->nrec++] = crec; > if (xdl_classify_record(pass, cf, crec) < 0) > goto abort; > } > @@ -263,7 +258,7 @@ static int xdl_clean_mmatch(char const *dis, long i, long s, long e) { > */ > static int xdl_cleanup_records(xdlclassifier_t *cf, xdfile_t *xdf1, xdfile_t *xdf2) { > long i, nm, nreff, mlim; > - xrecord_t **recs; > + xrecord_t *recs; > xdlclass_t *rcrec; > char *dis, *dis1, *dis2; > int need_min = !!(cf->flags & XDF_NEED_MINIMAL); > @@ -276,7 +271,7 @@ static int xdl_cleanup_records(xdlclassifier_t *cf, xdfile_t *xdf1, xdfile_t *xd > if ((mlim = xdl_bogosqrt(xdf1->nrec)) > XDL_MAX_EQLIMIT) > mlim = XDL_MAX_EQLIMIT; > for (i = xdf1->dstart, recs = &xdf1->recs[xdf1->dstart]; i <= xdf1->dend; i++, recs++) { > - rcrec = cf->rcrecs[(*recs)->ha]; > + rcrec = cf->rcrecs[recs->ha]; > nm = rcrec ? rcrec->len2 : 0; > dis1[i] = (nm == 0) ? 0: (nm >= mlim && !need_min) ? 2: 1; > } > @@ -284,7 +279,7 @@ static int xdl_cleanup_records(xdlclassifier_t *cf, xdfile_t *xdf1, xdfile_t *xd > if ((mlim = xdl_bogosqrt(xdf2->nrec)) > XDL_MAX_EQLIMIT) > mlim = XDL_MAX_EQLIMIT; > for (i = xdf2->dstart, recs = &xdf2->recs[xdf2->dstart]; i <= xdf2->dend; i++, recs++) { > - rcrec = cf->rcrecs[(*recs)->ha]; > + rcrec = cf->rcrecs[recs->ha]; > nm = rcrec ? rcrec->len1 : 0; > dis2[i] = (nm == 0) ? 0: (nm >= mlim && !need_min) ? 2: 1; > } > @@ -320,13 +315,13 @@ static int xdl_cleanup_records(xdlclassifier_t *cf, xdfile_t *xdf1, xdfile_t *xd > */ > static int xdl_trim_ends(xdfile_t *xdf1, xdfile_t *xdf2) { > long i, lim; > - xrecord_t **recs1, **recs2; > + xrecord_t *recs1, *recs2; > > recs1 = xdf1->recs; > recs2 = xdf2->recs; > for (i = 0, lim = XDL_MIN(xdf1->nrec, xdf2->nrec); i < lim; > i++, recs1++, recs2++) > - if ((*recs1)->ha != (*recs2)->ha) > + if (recs1->ha != recs2->ha) > break; > > xdf1->dstart = xdf2->dstart = i; > @@ -334,7 +329,7 @@ static int xdl_trim_ends(xdfile_t *xdf1, xdfile_t *xdf2) { > recs1 = xdf1->recs + xdf1->nrec - 1; > recs2 = xdf2->recs + xdf2->nrec - 1; > for (lim -= i, i = 0; i < lim; i++, recs1--, recs2--) > - if ((*recs1)->ha != (*recs2)->ha) > + if (recs1->ha != recs2->ha) > break; > > xdf1->dend = xdf1->nrec - i - 1; > diff --git a/xdiff/xtypes.h b/xdiff/xtypes.h > index 85848f1685..3d26cbf1ec 100644 > --- a/xdiff/xtypes.h > +++ b/xdiff/xtypes.h > @@ -45,10 +45,9 @@ typedef struct s_xrecord { > } xrecord_t; > > typedef struct s_xdfile { > - chastore_t rcha; > + xrecord_t *recs; > long nrec; > long dstart, dend; > - xrecord_t **recs; > char *rchg; > long *rindex; > long nreff; > diff --git a/xdiff/xutils.c b/xdiff/xutils.c > index 444a108f87..332982b509 100644 > --- a/xdiff/xutils.c > +++ b/xdiff/xutils.c > @@ -416,12 +416,12 @@ int xdl_fall_back_diff(xdfenv_t *diff_env, xpparam_t const *xpp, > mmfile_t subfile1, subfile2; > xdfenv_t env; > > - subfile1.ptr = (char *)diff_env->xdf1.recs[line1 - 1]->ptr; > - subfile1.size = diff_env->xdf1.recs[line1 + count1 - 2]->ptr + > - diff_env->xdf1.recs[line1 + count1 - 2]->size - subfile1.ptr; > - subfile2.ptr = (char *)diff_env->xdf2.recs[line2 - 1]->ptr; > - subfile2.size = diff_env->xdf2.recs[line2 + count2 - 2]->ptr + > - diff_env->xdf2.recs[line2 + count2 - 2]->size - subfile2.ptr; > + subfile1.ptr = (char *)diff_env->xdf1.recs[line1 - 1].ptr; > + subfile1.size = diff_env->xdf1.recs[line1 + count1 - 2].ptr + > + diff_env->xdf1.recs[line1 + count1 - 2].size - subfile1.ptr; > + subfile2.ptr = (char *)diff_env->xdf2.recs[line2 - 1].ptr; > + subfile2.size = diff_env->xdf2.recs[line2 + count2 - 2].ptr + > + diff_env->xdf2.recs[line2 + count2 - 2].size - subfile2.ptr; > if (xdl_do_diff(&subfile1, &subfile2, xpp, &env) < 0) > return -1; > > -- > gitgitgadget You weren't kidding with the --color-words callout; there's an awful lot of places where you only change one or two characters (e.g. '->' becoming '.'); that's much easier to see when viewing the diff with that flag. Anyway, looks good.

On the Git mailing list, Phillip Wood wrote (reply to this):

On 09/09/2025 09:58, Elijah Newren wrote: > On Sun, Sep 7, 2025 at 12:46 PM Ezekiel Newren via GitGitGadget > <gitgitgadget@gmail.com> wrote: >> >> From: Ezekiel Newren <ezekielnewren@gmail.com> >> The chastore_t type is very unfriendly to Rust FFI. It's also redundant >> since 'recs' is a vector type that grows every time an xrecord_t is >> added. > > The second sentence seems to presume the reader knows what chastore_t > type is for, and about the confusing dual layering between it and > recs.its confusing dual layering. I liked your more extended > explanation in https://lore.kernel.org/git/7ea2dccd71fc502f20614ce217fc9885d1b17413.1756496539.git.gitgitgadget@gmail.com/; > could some of that be used here? I agree that's a better explaination. I also think it would be helpful to spell out the implications of this change. If I understand the change correctly we now store all the records in a contiguous array, rather than having the records in a arena and storing a separate array of pointers to those records. As sizeof(xrecord_t) is pretty small the change to contiguous storage hopefully wont cause any allocation issues, though I guess it does mean we end up copying more data as we grow the array compared to using an arena. Overall these first few patches look like a really nice cleanup. Thanks Phillip >> >> Signed-off-by: Ezekiel Newren <ezekielnewren@gmail.com> >> --- >> xdiff/xdiffi.c | 24 ++++++++++---------- >> xdiff/xemit.c | 6 ++--- >> xdiff/xhistogram.c | 2 +- >> xdiff/xmerge.c | 56 +++++++++++++++++++++++----------------------- >> xdiff/xpatience.c | 10 ++++----- >> xdiff/xprepare.c | 19 ++++++---------- >> xdiff/xtypes.h | 3 +-- >> xdiff/xutils.c | 12 +++++----- >> 8 files changed, 63 insertions(+), 69 deletions(-) >> >> diff --git a/xdiff/xdiffi.c b/xdiff/xdiffi.c >> index 11cd090b53..a66125d44a 100644 >> --- a/xdiff/xdiffi.c >> +++ b/xdiff/xdiffi.c >> @@ -24,7 +24,7 @@ >> >> static unsigned long get_hash(xdfile_t *xdf, long index) >> { >> - return xdf->recs[xdf->rindex[index]]->ha; >> + return xdf->recs[xdf->rindex[index]].ha; >> } >> >> #define XDL_MAX_COST_MIN 256 >> @@ -489,13 +489,13 @@ static void measure_split(const xdfile_t *xdf, long split, >> m->indent = -1; >> } else { >> m->end_of_file = 0; >> - m->indent = get_indent(xdf->recs[split]); >> + m->indent = get_indent(&xdf->recs[split]); >> } >> >> m->pre_blank = 0; >> m->pre_indent = -1; >> for (i = split - 1; i >= 0; i--) { >> - m->pre_indent = get_indent(xdf->recs[i]); >> + m->pre_indent = get_indent(&xdf->recs[i]); >> if (m->pre_indent != -1) >> break; >> m->pre_blank += 1; >> @@ -508,7 +508,7 @@ static void measure_split(const xdfile_t *xdf, long split, >> m->post_blank = 0; >> m->post_indent = -1; >> for (i = split + 1; i < xdf->nrec; i++) { >> - m->post_indent = get_indent(xdf->recs[i]); >> + m->post_indent = get_indent(&xdf->recs[i]); >> if (m->post_indent != -1) >> break; >> m->post_blank += 1; >> @@ -752,7 +752,7 @@ static inline int group_previous(xdfile_t *xdf, struct xdlgroup *g) >> static int group_slide_down(xdfile_t *xdf, struct xdlgroup *g) >> { >> if (g->end < xdf->nrec && >> - recs_match(xdf->recs[g->start], xdf->recs[g->end])) { >> + recs_match(&xdf->recs[g->start], &xdf->recs[g->end])) { >> xdf->rchg[g->start++] = 0; >> xdf->rchg[g->end++] = 1; >> >> @@ -773,7 +773,7 @@ static int group_slide_down(xdfile_t *xdf, struct xdlgroup *g) >> static int group_slide_up(xdfile_t *xdf, struct xdlgroup *g) >> { >> if (g->start > 0 && >> - recs_match(xdf->recs[g->start - 1], xdf->recs[g->end - 1])) { >> + recs_match(&xdf->recs[g->start - 1], &xdf->recs[g->end - 1])) { >> xdf->rchg[--g->start] = 1; >> xdf->rchg[--g->end] = 0; >> >> @@ -988,16 +988,16 @@ static void xdl_mark_ignorable_lines(xdchange_t *xscr, xdfenv_t *xe, long flags) >> >> for (xch = xscr; xch; xch = xch->next) { >> int ignore = 1; >> - xrecord_t **rec; >> + xrecord_t *rec; >> long i; >> >> rec = &xe->xdf1.recs[xch->i1]; >> for (i = 0; i < xch->chg1 && ignore; i++) >> - ignore = xdl_blankline(rec[i]->ptr, rec[i]->size, flags); >> + ignore = xdl_blankline(rec[i].ptr, rec[i].size, flags); >> >> rec = &xe->xdf2.recs[xch->i2]; >> for (i = 0; i < xch->chg2 && ignore; i++) >> - ignore = xdl_blankline(rec[i]->ptr, rec[i]->size, flags); >> + ignore = xdl_blankline(rec[i].ptr, rec[i].size, flags); >> >> xch->ignore = ignore; >> } >> @@ -1021,7 +1021,7 @@ static void xdl_mark_ignorable_regex(xdchange_t *xscr, const xdfenv_t *xe, >> xdchange_t *xch; >> >> for (xch = xscr; xch; xch = xch->next) { >> - xrecord_t **rec; >> + xrecord_t *rec; >> int ignore = 1; >> long i; >> >> @@ -1033,11 +1033,11 @@ static void xdl_mark_ignorable_regex(xdchange_t *xscr, const xdfenv_t *xe, >> >> rec = &xe->xdf1.recs[xch->i1]; >> for (i = 0; i < xch->chg1 && ignore; i++) >> - ignore = record_matches_regex(rec[i], xpp); >> + ignore = record_matches_regex(&rec[i], xpp); >> >> rec = &xe->xdf2.recs[xch->i2]; >> for (i = 0; i < xch->chg2 && ignore; i++) >> - ignore = record_matches_regex(rec[i], xpp); >> + ignore = record_matches_regex(&rec[i], xpp); >> >> xch->ignore = ignore; >> } >> diff --git a/xdiff/xemit.c b/xdiff/xemit.c >> index 2161ac3cd0..b2f1f30cd3 100644 >> --- a/xdiff/xemit.c >> +++ b/xdiff/xemit.c >> @@ -25,7 +25,7 @@ >> >> static int xdl_emit_record(xdfile_t *xdf, long ri, char const *pre, xdemitcb_t *ecb) >> { >> - xrecord_t *rec = xdf->recs[ri]; >> + xrecord_t *rec = &xdf->recs[ri]; >> >> if (xdl_emit_diffrec(rec->ptr, rec->size, pre, strlen(pre), ecb) < 0) >> return -1; >> @@ -110,7 +110,7 @@ static long def_ff(const char *rec, long len, char *buf, long sz) >> static long match_func_rec(xdfile_t *xdf, xdemitconf_t const *xecfg, long ri, >> char *buf, long sz) >> { >> - xrecord_t *rec = xdf->recs[ri]; >> + xrecord_t *rec = &xdf->recs[ri]; >> >> if (!xecfg->find_func) >> return def_ff(rec->ptr, rec->size, buf, sz); >> @@ -150,7 +150,7 @@ static long get_func_line(xdfenv_t *xe, xdemitconf_t const *xecfg, >> >> static int is_empty_rec(xdfile_t *xdf, long ri) >> { >> - xrecord_t *rec = xdf->recs[ri]; >> + xrecord_t *rec = &xdf->recs[ri]; >> long i = 0; >> >> for (; i < rec->size && XDL_ISSPACE(rec->ptr[i]); i++); >> diff --git a/xdiff/xhistogram.c b/xdiff/xhistogram.c >> index 040d81e0bc..4d857e8ae2 100644 >> --- a/xdiff/xhistogram.c >> +++ b/xdiff/xhistogram.c >> @@ -86,7 +86,7 @@ struct region { >> ((LINE_MAP(index, ptr))->cnt) >> >> #define REC(env, s, l) \ >> - (env->xdf##s.recs[l - 1]) >> + (&env->xdf##s.recs[l - 1]) >> >> static int cmp_recs(xrecord_t *r1, xrecord_t *r2) >> { >> diff --git a/xdiff/xmerge.c b/xdiff/xmerge.c >> index af40c88a5b..fd600cbb5d 100644 >> --- a/xdiff/xmerge.c >> +++ b/xdiff/xmerge.c >> @@ -97,12 +97,12 @@ static int xdl_merge_cmp_lines(xdfenv_t *xe1, int i1, xdfenv_t *xe2, int i2, >> int line_count, long flags) >> { >> int i; >> - xrecord_t **rec1 = xe1->xdf2.recs + i1; >> - xrecord_t **rec2 = xe2->xdf2.recs + i2; >> + xrecord_t *rec1 = xe1->xdf2.recs + i1; >> + xrecord_t *rec2 = xe2->xdf2.recs + i2; >> >> for (i = 0; i < line_count; i++) { >> - int result = xdl_recmatch(rec1[i]->ptr, rec1[i]->size, >> - rec2[i]->ptr, rec2[i]->size, flags); >> + int result = xdl_recmatch(rec1[i].ptr, rec1[i].size, >> + rec2[i].ptr, rec2[i].size, flags); >> if (!result) >> return -1; >> } >> @@ -111,7 +111,7 @@ static int xdl_merge_cmp_lines(xdfenv_t *xe1, int i1, xdfenv_t *xe2, int i2, >> >> static int xdl_recs_copy_0(int use_orig, xdfenv_t *xe, int i, int count, int needs_cr, int add_nl, char *dest) >> { >> - xrecord_t **recs; >> + xrecord_t *recs; >> int size = 0; >> >> recs = (use_orig ? xe->xdf1.recs : xe->xdf2.recs) + i; >> @@ -119,12 +119,12 @@ static int xdl_recs_copy_0(int use_orig, xdfenv_t *xe, int i, int count, int nee >> if (count < 1) >> return 0; >> >> - for (i = 0; i < count; size += recs[i++]->size) >> + for (i = 0; i < count; size += recs[i++].size) >> if (dest) >> - memcpy(dest + size, recs[i]->ptr, recs[i]->size); >> + memcpy(dest + size, recs[i].ptr, recs[i].size); >> if (add_nl) { >> - i = recs[count - 1]->size; >> - if (i == 0 || recs[count - 1]->ptr[i - 1] != '\n') { >> + i = recs[count - 1].size; >> + if (i == 0 || recs[count - 1].ptr[i - 1] != '\n') { >> if (needs_cr) { >> if (dest) >> dest[size] = '\r'; >> @@ -160,22 +160,22 @@ static int is_eol_crlf(xdfile_t *file, int i) >> >> if (i < file->nrec - 1) >> /* All lines before the last *must* end in LF */ >> - return (size = file->recs[i]->size) > 1 && >> - file->recs[i]->ptr[size - 2] == '\r'; >> + return (size = file->recs[i].size) > 1 && >> + file->recs[i].ptr[size - 2] == '\r'; >> if (!file->nrec) >> /* Cannot determine eol style from empty file */ >> return -1; >> - if ((size = file->recs[i]->size) && >> - file->recs[i]->ptr[size - 1] == '\n') >> + if ((size = file->recs[i].size) && >> + file->recs[i].ptr[size - 1] == '\n') >> /* Last line; ends in LF; Is it CR/LF? */ >> return size > 1 && >> - file->recs[i]->ptr[size - 2] == '\r'; >> + file->recs[i].ptr[size - 2] == '\r'; >> if (!i) >> /* The only line has no eol */ >> return -1; >> /* Determine eol from second-to-last line */ >> - return (size = file->recs[i - 1]->size) > 1 && >> - file->recs[i - 1]->ptr[size - 2] == '\r'; >> + return (size = file->recs[i - 1].size) > 1 && >> + file->recs[i - 1].ptr[size - 2] == '\r'; >> } >> >> static int is_cr_needed(xdfenv_t *xe1, xdfenv_t *xe2, xdmerge_t *m) >> @@ -334,22 +334,22 @@ static int recmatch(xrecord_t *rec1, xrecord_t *rec2, unsigned long flags) >> static void xdl_refine_zdiff3_conflicts(xdfenv_t *xe1, xdfenv_t *xe2, xdmerge_t *m, >> xpparam_t const *xpp) >> { >> - xrecord_t **rec1 = xe1->xdf2.recs, **rec2 = xe2->xdf2.recs; >> + xrecord_t *rec1 = xe1->xdf2.recs, *rec2 = xe2->xdf2.recs; >> for (; m; m = m->next) { >> /* let's handle just the conflicts */ >> if (m->mode) >> continue; >> >> while(m->chg1 && m->chg2 && >> - recmatch(rec1[m->i1], rec2[m->i2], xpp->flags)) { >> + recmatch(&rec1[m->i1], &rec2[m->i2], xpp->flags)) { >> m->chg1--; >> m->chg2--; >> m->i1++; >> m->i2++; >> } >> while (m->chg1 && m->chg2 && >> - recmatch(rec1[m->i1 + m->chg1 - 1], >> - rec2[m->i2 + m->chg2 - 1], xpp->flags)) { >> + recmatch(&rec1[m->i1 + m->chg1 - 1], >> + &rec2[m->i2 + m->chg2 - 1], xpp->flags)) { >> m->chg1--; >> m->chg2--; >> } >> @@ -381,12 +381,12 @@ static int xdl_refine_conflicts(xdfenv_t *xe1, xdfenv_t *xe2, xdmerge_t *m, >> * This probably does not work outside git, since >> * we have a very simple mmfile structure. >> */ >> - t1.ptr = (char *)xe1->xdf2.recs[m->i1]->ptr; >> - t1.size = xe1->xdf2.recs[m->i1 + m->chg1 - 1]->ptr >> - + xe1->xdf2.recs[m->i1 + m->chg1 - 1]->size - t1.ptr; >> - t2.ptr = (char *)xe2->xdf2.recs[m->i2]->ptr; >> - t2.size = xe2->xdf2.recs[m->i2 + m->chg2 - 1]->ptr >> - + xe2->xdf2.recs[m->i2 + m->chg2 - 1]->size - t2.ptr; >> + t1.ptr = (char *)xe1->xdf2.recs[m->i1].ptr; >> + t1.size = xe1->xdf2.recs[m->i1 + m->chg1 - 1].ptr >> + + xe1->xdf2.recs[m->i1 + m->chg1 - 1].size - t1.ptr; >> + t2.ptr = (char *)xe2->xdf2.recs[m->i2].ptr; >> + t2.size = xe2->xdf2.recs[m->i2 + m->chg2 - 1].ptr >> + + xe2->xdf2.recs[m->i2 + m->chg2 - 1].size - t2.ptr; >> if (xdl_do_diff(&t1, &t2, xpp, &xe) < 0) >> return -1; >> if (xdl_change_compact(&xe.xdf1, &xe.xdf2, xpp->flags) < 0 || >> @@ -440,8 +440,8 @@ static int line_contains_alnum(const char *ptr, long size) >> static int lines_contain_alnum(xdfenv_t *xe, int i, int chg) >> { >> for (; chg; chg--, i++) >> - if (line_contains_alnum(xe->xdf2.recs[i]->ptr, >> - xe->xdf2.recs[i]->size)) >> + if (line_contains_alnum(xe->xdf2.recs[i].ptr, >> + xe->xdf2.recs[i].size)) >> return 1; >> return 0; >> } >> diff --git a/xdiff/xpatience.c b/xdiff/xpatience.c >> index 77dc411d19..bf69a58527 100644 >> --- a/xdiff/xpatience.c >> +++ b/xdiff/xpatience.c >> @@ -88,9 +88,9 @@ static int is_anchor(xpparam_t const *xpp, const char *line) >> static void insert_record(xpparam_t const *xpp, int line, struct hashmap *map, >> int pass) >> { >> - xrecord_t **records = pass == 1 ? >> + xrecord_t *records = pass == 1 ? >> map->env->xdf1.recs : map->env->xdf2.recs; >> - xrecord_t *record = records[line - 1]; >> + xrecord_t *record = &records[line - 1]; >> /* >> * After xdl_prepare_env() (or more precisely, due to >> * xdl_classify_record()), the "ha" member of the records (AKA lines) >> @@ -121,7 +121,7 @@ static void insert_record(xpparam_t const *xpp, int line, struct hashmap *map, >> return; >> map->entries[index].line1 = line; >> map->entries[index].hash = record->ha; >> - map->entries[index].anchor = is_anchor(xpp, map->env->xdf1.recs[line - 1]->ptr); >> + map->entries[index].anchor = is_anchor(xpp, map->env->xdf1.recs[line - 1].ptr); >> if (!map->first) >> map->first = map->entries + index; >> if (map->last) { >> @@ -246,8 +246,8 @@ static int find_longest_common_sequence(struct hashmap *map, struct entry **res) >> >> static int match(struct hashmap *map, int line1, int line2) >> { >> - xrecord_t *record1 = map->env->xdf1.recs[line1 - 1]; >> - xrecord_t *record2 = map->env->xdf2.recs[line2 - 1]; >> + xrecord_t *record1 = &map->env->xdf1.recs[line1 - 1]; >> + xrecord_t *record2 = &map->env->xdf2.recs[line2 - 1]; >> return record1->ha == record2->ha; >> } >> >> diff --git a/xdiff/xprepare.c b/xdiff/xprepare.c >> index 6f1d4b4725..92f9845003 100644 >> --- a/xdiff/xprepare.c >> +++ b/xdiff/xprepare.c >> @@ -131,7 +131,6 @@ static void xdl_free_ctx(xdfile_t *xdf) >> xdl_free(xdf->rindex); >> xdl_free(xdf->rchg - 1); >> xdl_free(xdf->recs); >> - xdl_cha_free(&xdf->rcha); >> } >> >> >> @@ -146,8 +145,6 @@ static int xdl_prepare_ctx(unsigned int pass, mmfile_t *mf, long narec, xpparam_ >> xdf->rchg = NULL; >> xdf->recs = NULL; >> >> - if (xdl_cha_init(&xdf->rcha, sizeof(xrecord_t), narec / 4 + 1) < 0) >> - goto abort; >> if (!XDL_ALLOC_ARRAY(xdf->recs, narec)) >> goto abort; >> >> @@ -158,12 +155,10 @@ static int xdl_prepare_ctx(unsigned int pass, mmfile_t *mf, long narec, xpparam_ >> hav = xdl_hash_record(&cur, top, xpp->flags); >> if (XDL_ALLOC_GROW(xdf->recs, xdf->nrec + 1, narec)) >> goto abort; >> - if (!(crec = xdl_cha_alloc(&xdf->rcha))) >> - goto abort; >> + crec = &xdf->recs[xdf->nrec++]; >> crec->ptr = prev; >> crec->size = (long) (cur - prev); >> crec->ha = hav; >> - xdf->recs[xdf->nrec++] = crec; >> if (xdl_classify_record(pass, cf, crec) < 0) >> goto abort; >> } >> @@ -263,7 +258,7 @@ static int xdl_clean_mmatch(char const *dis, long i, long s, long e) { >> */ >> static int xdl_cleanup_records(xdlclassifier_t *cf, xdfile_t *xdf1, xdfile_t *xdf2) { >> long i, nm, nreff, mlim; >> - xrecord_t **recs; >> + xrecord_t *recs; >> xdlclass_t *rcrec; >> char *dis, *dis1, *dis2; >> int need_min = !!(cf->flags & XDF_NEED_MINIMAL); >> @@ -276,7 +271,7 @@ static int xdl_cleanup_records(xdlclassifier_t *cf, xdfile_t *xdf1, xdfile_t *xd >> if ((mlim = xdl_bogosqrt(xdf1->nrec)) > XDL_MAX_EQLIMIT) >> mlim = XDL_MAX_EQLIMIT; >> for (i = xdf1->dstart, recs = &xdf1->recs[xdf1->dstart]; i <= xdf1->dend; i++, recs++) { >> - rcrec = cf->rcrecs[(*recs)->ha]; >> + rcrec = cf->rcrecs[recs->ha]; >> nm = rcrec ? rcrec->len2 : 0; >> dis1[i] = (nm == 0) ? 0: (nm >= mlim && !need_min) ? 2: 1; >> } >> @@ -284,7 +279,7 @@ static int xdl_cleanup_records(xdlclassifier_t *cf, xdfile_t *xdf1, xdfile_t *xd >> if ((mlim = xdl_bogosqrt(xdf2->nrec)) > XDL_MAX_EQLIMIT) >> mlim = XDL_MAX_EQLIMIT; >> for (i = xdf2->dstart, recs = &xdf2->recs[xdf2->dstart]; i <= xdf2->dend; i++, recs++) { >> - rcrec = cf->rcrecs[(*recs)->ha]; >> + rcrec = cf->rcrecs[recs->ha]; >> nm = rcrec ? rcrec->len1 : 0; >> dis2[i] = (nm == 0) ? 0: (nm >= mlim && !need_min) ? 2: 1; >> } >> @@ -320,13 +315,13 @@ static int xdl_cleanup_records(xdlclassifier_t *cf, xdfile_t *xdf1, xdfile_t *xd >> */ >> static int xdl_trim_ends(xdfile_t *xdf1, xdfile_t *xdf2) { >> long i, lim; >> - xrecord_t **recs1, **recs2; >> + xrecord_t *recs1, *recs2; >> >> recs1 = xdf1->recs; >> recs2 = xdf2->recs; >> for (i = 0, lim = XDL_MIN(xdf1->nrec, xdf2->nrec); i < lim; >> i++, recs1++, recs2++) >> - if ((*recs1)->ha != (*recs2)->ha) >> + if (recs1->ha != recs2->ha) >> break; >> >> xdf1->dstart = xdf2->dstart = i; >> @@ -334,7 +329,7 @@ static int xdl_trim_ends(xdfile_t *xdf1, xdfile_t *xdf2) { >> recs1 = xdf1->recs + xdf1->nrec - 1; >> recs2 = xdf2->recs + xdf2->nrec - 1; >> for (lim -= i, i = 0; i < lim; i++, recs1--, recs2--) >> - if ((*recs1)->ha != (*recs2)->ha) >> + if (recs1->ha != recs2->ha) >> break; >> >> xdf1->dend = xdf1->nrec - i - 1; >> diff --git a/xdiff/xtypes.h b/xdiff/xtypes.h >> index 85848f1685..3d26cbf1ec 100644 >> --- a/xdiff/xtypes.h >> +++ b/xdiff/xtypes.h >> @@ -45,10 +45,9 @@ typedef struct s_xrecord { >> } xrecord_t; >> >> typedef struct s_xdfile { >> - chastore_t rcha; >> + xrecord_t *recs; >> long nrec; >> long dstart, dend; >> - xrecord_t **recs; >> char *rchg; >> long *rindex; >> long nreff; >> diff --git a/xdiff/xutils.c b/xdiff/xutils.c >> index 444a108f87..332982b509 100644 >> --- a/xdiff/xutils.c >> +++ b/xdiff/xutils.c >> @@ -416,12 +416,12 @@ int xdl_fall_back_diff(xdfenv_t *diff_env, xpparam_t const *xpp, >> mmfile_t subfile1, subfile2; >> xdfenv_t env; >> >> - subfile1.ptr = (char *)diff_env->xdf1.recs[line1 - 1]->ptr; >> - subfile1.size = diff_env->xdf1.recs[line1 + count1 - 2]->ptr + >> - diff_env->xdf1.recs[line1 + count1 - 2]->size - subfile1.ptr; >> - subfile2.ptr = (char *)diff_env->xdf2.recs[line2 - 1]->ptr; >> - subfile2.size = diff_env->xdf2.recs[line2 + count2 - 2]->ptr + >> - diff_env->xdf2.recs[line2 + count2 - 2]->size - subfile2.ptr; >> + subfile1.ptr = (char *)diff_env->xdf1.recs[line1 - 1].ptr; >> + subfile1.size = diff_env->xdf1.recs[line1 + count1 - 2].ptr + >> + diff_env->xdf1.recs[line1 + count1 - 2].size - subfile1.ptr; >> + subfile2.ptr = (char *)diff_env->xdf2.recs[line2 - 1].ptr; >> + subfile2.size = diff_env->xdf2.recs[line2 + count2 - 2].ptr + >> + diff_env->xdf2.recs[line2 + count2 - 2].size - subfile2.ptr; >> if (xdl_do_diff(&subfile1, &subfile2, xpp, &env) < 0) >> return -1; >> >> -- >> gitgitgadget > > You weren't kidding with the --color-words callout; there's an awful > lot of places where you only change one or two characters (e.g. '->' > becoming '.'); that's much easier to see when viewing the diff with > that flag. > > Anyway, looks good. >

On the Git mailing list, Junio C Hamano wrote (reply to this):

Elijah Newren <newren@gmail.com> writes: > On Sun, Sep 7, 2025 at 12:46 PM Ezekiel Newren via GitGitGadget > <gitgitgadget@gmail.com> wrote: >> >> From: Ezekiel Newren <ezekielnewren@gmail.com> > > My personal bias is that things like "view with --color-words" makes > more sense to include near the end of the commit message, just before > the sign-offs. Not sure if others agree on that. FWIW, I am with you. Certainly not on the title. It is even fine immediately below the three-dash line before the diffstat. Thanks. I am enjoying to follow along the series by reading your reviews.

On the Git mailing list, Ben Knoble wrote (reply to this):

> Le 9 sept. 2025 à 04:58, Elijah Newren <newren@gmail.com> a écrit : > > On Sun, Sep 7, 2025 at 12:46 PM Ezekiel Newren via GitGitGadget > <gitgitgadget@gmail.com> wrote: >> >> From: Ezekiel Newren <ezekielnewren@gmail.com> > > My personal bias is that things like "view with --color-words" makes > more sense to include near the end of the commit message, just before > the sign-offs. Not sure if others agree on that. I’ve been using a Best-viewed-with trailer; I saw Peff do something similar once, I think.

gitgitgadget-git · 2025-09-09T09:27:20Z

xdiff/xdiff.h

 #define XDIFF_H

 #ifdef __cplusplus
 extern "C" {


On the Git mailing list, Elijah Newren wrote (reply to this):

On Sun, Sep 7, 2025 at 12:46 PM Ezekiel Newren via GitGitGadget <gitgitgadget@gmail.com> wrote: > > From: Ezekiel Newren <ezekielnewren@gmail.com> > > Define macros NO(0), YES(1), MAYBE(2) as the enum values for rchg to > make the code easier to follow. Perhaps 'rchg' should be renamed to > 'changed'? > > Signed-off-by: Ezekiel Newren <ezekielnewren@gmail.com> > --- > xdiff/xdiff.h | 4 ++++ > xdiff/xdiffi.c | 29 ++++++++++++++--------------- > xdiff/xhistogram.c | 8 ++++---- > xdiff/xpatience.c | 8 ++++---- > xdiff/xprepare.c | 24 ++++++++++++------------ > 5 files changed, 38 insertions(+), 35 deletions(-) > > diff --git a/xdiff/xdiff.h b/xdiff/xdiff.h > index 2cecde5afe..7092879829 100644 > --- a/xdiff/xdiff.h > +++ b/xdiff/xdiff.h > @@ -27,6 +27,10 @@ > extern "C" { > #endif /* #ifdef __cplusplus */ > > +#define NO 0 > +#define YES 1 > +#define MAYBE 2 > + > /* xpparm_t.flags */ > #define XDF_NEED_MINIMAL (1 << 0) > > diff --git a/xdiff/xdiffi.c b/xdiff/xdiffi.c > index a66125d44a..44fd27823a 100644 > --- a/xdiff/xdiffi.c > +++ b/xdiff/xdiffi.c > @@ -278,10 +278,10 @@ int xdl_recs_cmp(xdfile_t *xdf1, long off1, long lim1, > */ > if (off1 == lim1) { > for (; off2 < lim2; off2++) > - xdf2->rchg[xdf2->rindex[off2]] = 1; > + xdf2->rchg[xdf2->rindex[off2]] = YES; > } else if (off2 == lim2) { > for (; off1 < lim1; off1++) > - xdf1->rchg[xdf1->rindex[off1]] = 1; > + xdf1->rchg[xdf1->rindex[off1]] = YES; > } else { > xdpsplit_t spl; > spl.i1 = spl.i2 = 0; > @@ -708,7 +708,7 @@ struct xdlgroup { > static void group_init(xdfile_t *xdf, struct xdlgroup *g) > { > g->start = g->end = 0; > - while (xdf->rchg[g->end]) > + while (xdf->rchg[g->end] == YES) You've got a few places like this where the old code would have behaved differently if there were some MAYBE values. I presume you've carefully vetted that those can't happen at these points in the code, but it might be worth calling that out in the commit message for reviewers who'll otherwise wonder if there's a behavior change that has occurred. > g->end++; > } > > @@ -722,7 +722,7 @@ static inline int group_next(xdfile_t *xdf, struct xdlgroup *g) > return -1; > > g->start = g->end + 1; > - for (g->end = g->start; xdf->rchg[g->end]; g->end++) > + for (g->end = g->start; xdf->rchg[g->end] == YES; g->end++) Here's another of those where you assume MAYBE isn't possible. > ; > > return 0; > @@ -738,7 +738,7 @@ static inline int group_previous(xdfile_t *xdf, struct xdlgroup *g) > return -1; > > g->end = g->start - 1; > - for (g->start = g->end; xdf->rchg[g->start - 1]; g->start--) > + for (g->start = g->end; xdf->rchg[g->start - 1] == YES; g->start--) ...and another. > ; > > return 0; > @@ -753,10 +753,10 @@ static int group_slide_down(xdfile_t *xdf, struct xdlgroup *g) > { > if (g->end < xdf->nrec && > recs_match(&xdf->recs[g->start], &xdf->recs[g->end])) { > - xdf->rchg[g->start++] = 0; > - xdf->rchg[g->end++] = 1; > + xdf->rchg[g->start++] = NO; > + xdf->rchg[g->end++] = YES; > > - while (xdf->rchg[g->end]) > + while (xdf->rchg[g->end] == YES) ...and another. > g->end++; > > return 0; > @@ -774,10 +774,10 @@ static int group_slide_up(xdfile_t *xdf, struct xdlgroup *g) > { > if (g->start > 0 && > recs_match(&xdf->recs[g->start - 1], &xdf->recs[g->end - 1])) { > - xdf->rchg[--g->start] = 1; > - xdf->rchg[--g->end] = 0; > + xdf->rchg[--g->start] = YES; > + xdf->rchg[--g->end] = NO; > > - while (xdf->rchg[g->start - 1]) > + while (xdf->rchg[g->start - 1] == YES) ...and another. > g->start--; > > return 0; > @@ -932,16 +932,15 @@ int xdl_change_compact(xdfile_t *xdf, xdfile_t *xdfo, long flags) { > > int xdl_build_script(xdfenv_t *xe, xdchange_t **xscr) { > xdchange_t *cscr = NULL, *xch; > - char *rchg1 = xe->xdf1.rchg, *rchg2 = xe->xdf2.rchg; > long i1, i2, l1, l2; > > /* > * Trivial. Collects "groups" of changes and creates an edit script. > */ > for (i1 = xe->xdf1.nrec, i2 = xe->xdf2.nrec; i1 >= 0 || i2 >= 0; i1--, i2--) > - if (rchg1[i1 - 1] || rchg2[i2 - 1]) { > - for (l1 = i1; rchg1[i1 - 1]; i1--); > - for (l2 = i2; rchg2[i2 - 1]; i2--); > + if (xe->xdf1.rchg[i1 - 1] || xe->xdf2.rchg[i2 - 1]) { > + for (l1 = i1; xe->xdf1.rchg[i1 - 1]; i1--); > + for (l2 = i2; xe->xdf2.rchg[i2 - 1]; i2--); The changes in this xdl_build_script() function appear to be orthogonal to what was described in the commit message. If it's a separate cleanup, perhaps justify it in a separate patch? > > if (!(xch = xdl_add_change(cscr, i1, i2, l1 - i1, l2 - i2))) { > xdl_free_script(cscr); > diff --git a/xdiff/xhistogram.c b/xdiff/xhistogram.c > index 4d857e8ae2..c2e85b8ab9 100644 > --- a/xdiff/xhistogram.c > +++ b/xdiff/xhistogram.c > @@ -318,11 +318,11 @@ redo: > > if (!count1) { > while(count2--) > - env->xdf2.rchg[line2++ - 1] = 1; > + env->xdf2.rchg[line2++ - 1] = YES; > return 0; > } else if (!count2) { > while(count1--) > - env->xdf1.rchg[line1++ - 1] = 1; > + env->xdf1.rchg[line1++ - 1] = YES; > return 0; > } > > @@ -335,9 +335,9 @@ redo: > else { > if (lcs.begin1 == 0 && lcs.begin2 == 0) { > while (count1--) > - env->xdf1.rchg[line1++ - 1] = 1; > + env->xdf1.rchg[line1++ - 1] = YES; > while (count2--) > - env->xdf2.rchg[line2++ - 1] = 1; > + env->xdf2.rchg[line2++ - 1] = YES; > result = 0; > } else { > result = histogram_diff(xpp, env, > diff --git a/xdiff/xpatience.c b/xdiff/xpatience.c > index bf69a58527..20cda5e258 100644 > --- a/xdiff/xpatience.c > +++ b/xdiff/xpatience.c > @@ -331,11 +331,11 @@ static int patience_diff(xpparam_t const *xpp, xdfenv_t *env, > /* trivial case: one side is empty */ > if (!count1) { > while(count2--) > - env->xdf2.rchg[line2++ - 1] = 1; > + env->xdf2.rchg[line2++ - 1] = YES; > return 0; > } else if (!count2) { > while(count1--) > - env->xdf1.rchg[line1++ - 1] = 1; > + env->xdf1.rchg[line1++ - 1] = YES; > return 0; > } > > @@ -347,9 +347,9 @@ static int patience_diff(xpparam_t const *xpp, xdfenv_t *env, > /* are there any matching lines at all? */ > if (!map.has_matches) { > while(count1--) > - env->xdf1.rchg[line1++ - 1] = 1; > + env->xdf1.rchg[line1++ - 1] = YES; > while(count2--) > - env->xdf2.rchg[line2++ - 1] = 1; > + env->xdf2.rchg[line2++ - 1] = YES; > xdl_free(map.entries); > return 0; > } > diff --git a/xdiff/xprepare.c b/xdiff/xprepare.c > index 92f9845003..36437f91bb 100644 > --- a/xdiff/xprepare.c > +++ b/xdiff/xprepare.c > @@ -215,9 +215,9 @@ static int xdl_clean_mmatch(char const *dis, long i, long s, long e) { > * current line (i) is already a multimatch line. > */ > for (r = 1, rdis0 = 0, rpdis0 = 1; (i - r) >= s; r++) { > - if (!dis[i - r]) > + if (dis[i - r] == NO) > rdis0++; > - else if (dis[i - r] == 2) > + else if (dis[i - r] == MAYBE) > rpdis0++; > else > break; > @@ -231,9 +231,9 @@ static int xdl_clean_mmatch(char const *dis, long i, long s, long e) { > if (rdis0 == 0) > return 0; > for (r = 1, rdis1 = 0, rpdis1 = 1; (i + r) <= e; r++) { > - if (!dis[i + r]) > + if (dis[i + r] == NO) > rdis1++; > - else if (dis[i + r] == 2) > + else if (dis[i + r] == MAYBE) > rpdis1++; > else > break; > @@ -273,7 +273,7 @@ static int xdl_cleanup_records(xdlclassifier_t *cf, xdfile_t *xdf1, xdfile_t *xd > for (i = xdf1->dstart, recs = &xdf1->recs[xdf1->dstart]; i <= xdf1->dend; i++, recs++) { > rcrec = cf->rcrecs[recs->ha]; > nm = rcrec ? rcrec->len2 : 0; > - dis1[i] = (nm == 0) ? 0: (nm >= mlim && !need_min) ? 2: 1; > + dis1[i] = (nm == 0) ? NO: (nm >= mlim && !need_min) ? MAYBE: YES; > } > > if ((mlim = xdl_bogosqrt(xdf2->nrec)) > XDL_MAX_EQLIMIT) > @@ -281,26 +281,26 @@ static int xdl_cleanup_records(xdlclassifier_t *cf, xdfile_t *xdf1, xdfile_t *xd > for (i = xdf2->dstart, recs = &xdf2->recs[xdf2->dstart]; i <= xdf2->dend; i++, recs++) { > rcrec = cf->rcrecs[recs->ha]; > nm = rcrec ? rcrec->len1 : 0; > - dis2[i] = (nm == 0) ? 0: (nm >= mlim && !need_min) ? 2: 1; > + dis2[i] = (nm == 0) ? NO: (nm >= mlim && !need_min) ? MAYBE: YES; > } > > for (nreff = 0, i = xdf1->dstart, recs = &xdf1->recs[xdf1->dstart]; > i <= xdf1->dend; i++, recs++) { > - if (dis1[i] == 1 || > - (dis1[i] == 2 && !xdl_clean_mmatch(dis1, i, xdf1->dstart, xdf1->dend))) { > + if (dis1[i] == YES || > + (dis1[i] == MAYBE && !xdl_clean_mmatch(dis1, i, xdf1->dstart, xdf1->dend))) { > xdf1->rindex[nreff++] = i; > } else > - xdf1->rchg[i] = 1; > + xdf1->rchg[i] = YES; > } > xdf1->nreff = nreff; > > for (nreff = 0, i = xdf2->dstart, recs = &xdf2->recs[xdf2->dstart]; > i <= xdf2->dend; i++, recs++) { > - if (dis2[i] == 1 || > - (dis2[i] == 2 && !xdl_clean_mmatch(dis2, i, xdf2->dstart, xdf2->dend))) { > + if (dis2[i] == YES || > + (dis2[i] == MAYBE && !xdl_clean_mmatch(dis2, i, xdf2->dstart, xdf2->dend))) { > xdf2->rindex[nreff++] = i; > } else > - xdf2->rchg[i] = 1; > + xdf2->rchg[i] = YES; > } > xdf2->nreff = nreff; > > -- > gitgitgadget Everything else looks like the straightforward translation you called out in your commit message.

gitgitgadget-git · 2025-09-09T13:53:13Z

User Phillip Wood <phillip.wood123@gmail.com> has been added to the cc: list.

Every field in this struct is an alias for a certain field in xdfile_t. diffdata_t.nrec -> xdfile_t.nreff diffdata_t.ha -> xdfile_t.ha diffdata_t.rindex -> xdfile_t.rindex diffdata_t.rchg -> xdfile_t.rchg I think this struct existed before xdfile_t, and was kept for backward compatibility reasons. I think xdiffi should have been refactored to use the new (xdfile_t) struct, but was easier to alias it instead. The local variables rchg* and rindex* don't shorten the lines by much, nor do they really need to be there to make the code more readable. Delete them. Signed-off-by: Ezekiel Newren <ezekielnewren@gmail.com>

When 0 <= i < xdfile_t.nreff the following is true: xdfile_t.ha[i] == xdfile_t.recs[xdfile_t.rindex[i]] This makes the code about 5% slower. The fields rindex and ha are specific to the classic diff (myers and minimal). I plan on creating a struct for classic diff, but there's a lot of cleanup that needs to be done before that can happen and leaving ha in would make those cleanups harder to follow. A subsequent commit will delete the chastore cha from xdfile_t. That later commit will investigate deleting ha and cha independently and together. Signed-off-by: Ezekiel Newren <ezekielnewren@gmail.com>

…ord_t The fields from xdlclass_t are aliases of xrecord_t: xdlclass_t.line -> xrecord_t.ptr xdlclass_t.size -> xrecord_t.size xdlclass_t.ha -> xrecord_t.ha xdlclass_t carries a copy of the data in xrecord_t, but instead of embedding xrecord_t it duplicates the individual fields. A future commit will change the types used in xrecord_t so embed it in xdlclass_t first, so we don't have to remember to change the types here as well. Best-viewed-with: --color-words Helped-by: Phillip Wood <phillip.wood123@gmail.com> Signed-off-by: Ezekiel Newren <ezekielnewren@gmail.com>

xdfile_t currently uses chastore_t which is an arena allocator. I think that xrecord_t used to be a linked list and recs didn't exist originally. When recs was added I think they forgot to remove xdfile_t.next, but was overlooked. This dual data structure setup makes the code somewhat confusing. Additionally the C type chastore_t isn't FFI friendly, and provides little to no performance benefit over using realloc to grow an array. Performance impact of deleting fields from xdfile_t: Deleting ha is about 5% slower. Deleting cha is about 5% faster. Delete ha, but keep cha time hyperfine --warmup 3 -L exe build_v2.51.0/git,build_delete_ha/git '{exe} log --oneline --shortstat --diff-algorithm=myers -3000 v2.39.1 >/dev/null' Benchmark 1: build_v2.51.0/git log --oneline --shortstat --diff-algorithm=myers -3000 v2.39.1 >/dev/null Time (mean ± σ): 1.269 s ± 0.017 s [User: 1.135 s, System: 0.128 s] Range (min … max): 1.249 s … 1.286 s 10 runs Benchmark 2: build_delete_ha/git log --oneline --shortstat --diff-algorithm=myers -3000 v2.39.1 >/dev/null Time (mean ± σ): 1.339 s ± 0.017 s [User: 1.234 s, System: 0.099 s] Range (min … max): 1.320 s … 1.358 s 10 runs Summary build_v2.51.0/git log --oneline --shortstat --diff-algorithm=myers -3000 v2.39.1 >/dev/null ran 1.06 ± 0.02 times faster than build_delete_ha/git log --oneline --shortstat --diff-algorithm=myers -3000 v2.39.1 >/dev/null Delete cha, but keep ha time hyperfine --warmup 3 -L exe build_v2.51.0/git,build_delete_chastore/git '{exe} log --oneline --shortstat --diff-algorithm=myers -3000 v2.39.1 >/dev/null' Benchmark 1: build_v2.51.0/git log --oneline --shortstat --diff-algorithm=myers -3000 v2.39.1 >/dev/null Time (mean ± σ): 1.290 s ± 0.001 s [User: 1.154 s, System: 0.130 s] Range (min … max): 1.288 s … 1.292 s 10 runs Benchmark 2: build_delete_chastore/git log --oneline --shortstat --diff-algorithm=myers -3000 v2.39.1 >/dev/null Time (mean ± σ): 1.232 s ± 0.017 s [User: 1.105 s, System: 0.121 s] Range (min … max): 1.205 s … 1.249 s 10 runs Summary build_delete_chastore/git log --oneline --shortstat --diff-algorithm=myers -3000 v2.39.1 >/dev/null ran 1.05 ± 0.01 times faster than build_v2.51.0/git log --oneline --shortstat --diff-algorithm=myers -3000 v2.39.1 >/dev/null Delete ha AND chastore time hyperfine --warmup 3 -L exe build_v2.51.0/git,build_delete_ha_and_chastore/git '{exe} log --oneline --shortstat --diff-algorithm=myers -3000 v2.39.1 >/dev/null' Benchmark 1: build_v2.51.0/git log --oneline --shortstat --diff-algorithm=myers -3000 v2.39.1 >/dev/null Time (mean ± σ): 1.291 s ± 0.002 s [User: 1.156 s, System: 0.129 s] Range (min … max): 1.287 s … 1.295 s 10 runs Benchmark 2: build_delete_ha_and_chastore/git log --oneline --shortstat --diff-algorithm=myers -3000 v2.39.1 >/dev/null Time (mean ± σ): 1.306 s ± 0.001 s [User: 1.195 s, System: 0.105 s] Range (min … max): 1.305 s … 1.308 s 10 runs Summary build_v2.51.0/git log --oneline --shortstat --diff-algorithm=myers -3000 v2.39.1 >/dev/null ran 1.01 ± 0.00 times faster than build_delete_ha_and_chastore/git log --oneline --shortstat --diff-algorithm=myers -3000 v2.39.1 >/dev/null Best-viewed-with: --color-words Signed-off-by: Ezekiel Newren <ezekielnewren@gmail.com>

The field rchg (now 'changed') declares if a line in a file is changed or not. A later commit will change it's type from 'char' to 'bool' to make its purpose even more clear. Best-viewed-with: --color-words Signed-off-by: Ezekiel Newren <ezekielnewren@gmail.com>

This commit is refactor-only; no behavior is changed. A future commit will use bool literals for changed[i]. The functions xdl_clean_mmatch() and xdl_cleanup_records() will be cleaned up more in a future patch series. The changes to xdl_cleanup_records(), in this patch, is just to make it clear why `char rchg` is refactored to `bool changed`. Rename dis* to action* and replace literal numericals with macros. The old names came from when dis* (which I think was short for discard) was treated like a boolean, but over time it grew into a ternary state machine. The result was confusing because dis* and rchg* both used 0/1 values with different meanings. The new names and macros make the states explicit. nm is short for number of matches, and mlim is a heuristic limit: nm == 0 -> action[i] = DISCARD -> changed[i] = true 0 < nm < mlim -> action[i] = KEEP -> changed[i] = false nm >= mlim -> action[i] = INVESTIGATE -> changed[i] = xdl_clean_mmatch() When need_min is true, only DISCARD and KEEP occur because the limit is effectively infinite. Best-viewed-with: --color-words Signed-off-by: Ezekiel Newren <ezekielnewren@gmail.com>

The only values possible for 'changed' is 1 and 0, which exactly maps to a bool type. It might not look like this because action1 and action2 (which use to be dis1, and dis2) were also of type char and were assigned numerical values within a few lines of 'changed' (what used to be rchg). Using DISCARD/KEEP/INVESTIGATE for action1[i]/action2[j], and true/false for changed[k] makes it clear to future readers that these are logically separate concepts. Best-viewed-with: --color-words Signed-off-by: Ezekiel Newren <ezekielnewren@gmail.com>

ezekielnewren · 2025-09-26T22:40:56Z

/submit

gitgitgadget-git · 2025-09-26T22:43:17Z

Submitted as pull.2048.v6.git.git.1758926520.gitgitgadget@gmail.com

To fetch this version into FETCH_HEAD:

git fetch https://github.com/gitgitgadget/git/ pr-git-2048/ezekielnewren/use_rust_types_in_xdiff-v6

To fetch this version to local tag pr-git-2048/ezekielnewren/use_rust_types_in_xdiff-v6:

git fetch --no-tags https://github.com/gitgitgadget/git/ tag pr-git-2048/ezekielnewren/use_rust_types_in_xdiff-v6

gitgitgadget-git · 2025-09-26T23:31:53Z

This patch series was integrated into seen via 989d856.

gitgitgadget-git · 2025-09-29T20:10:41Z

This patch series was integrated into seen via f2e5abc.

gitgitgadget-git · 2025-09-29T22:27:44Z

There was a status update in the "Cooking" section about the branch en/xdiff-cleanup on the Git mailing list:

A lot of code clean-up of xdiff.
Split out of a larger topic.

Will merge to 'next'?
source: <pull.2048.v6.git.git.1758926520.gitgitgadget@gmail.com>

gitgitgadget-git · 2025-09-29T22:45:42Z

This patch series was integrated into seen via a95b449.

gitgitgadget-git · 2025-09-30T13:48:54Z

xdiff/xemit.c

@@ -22,23 +22,14 @@



On the Git mailing list, "Kristoffer Haugsbakk" wrote (reply to this):

On Tue, Sep 23, 2025, at 23:24, Ezekiel Newren via GitGitGadget wrote: > From: Ezekiel Newren <ezekielnewren@gmail.com> > > When xrecord_t was a linked list, and recs didn't exist, I assume this > function walked the list until it found the right record. Accessing > a contiguous array is so trival that this function is now superfluous. s/trival/trivial/ > Delete it. > > Signed-off-by: Ezekiel Newren <ezekielnewren@gmail.com> > ---

On the Git mailing list, Ezekiel Newren wrote (reply to this):

On Tue, Sep 30, 2025 at 7:31 AM Kristoffer Haugsbakk <kristofferhaugsbakk@fastmail.com> wrote: > > When xrecord_t was a linked list, and recs didn't exist, I assume this > > function walked the list until it found the right record. Accessing > > a contiguous array is so trival that this function is now superfluous. > > s/trival/trivial/ I think that other than this typo this patch series is ready to be merged in. I would prefer that Junio fix this typo, so I don't spam the mailing list with such a small change.

On the Git mailing list, Junio C Hamano wrote (reply to this):

Ezekiel Newren <ezekielnewren@gmail.com> writes: > On Tue, Sep 30, 2025 at 7:31 AM Kristoffer Haugsbakk > <kristofferhaugsbakk@fastmail.com> wrote: >> > When xrecord_t was a linked list, and recs didn't exist, I assume this >> > function walked the list until it found the right record. Accessing >> > a contiguous array is so trival that this function is now superfluous. >> >> s/trival/trivial/ > > I think that other than this typo this patch series is ready to be > merged in. I would prefer that Junio fix this typo, so I don't spam > the mailing list with such a small change. If it is only this instance, I can "rebase -i" it away, sure. Thanks.

gitgitgadget-git · 2025-09-30T13:48:56Z

User "Kristoffer Haugsbakk" <kristofferhaugsbakk@fastmail.com> has been added to the cc: list.

gitgitgadget-git · 2025-09-30T22:19:16Z

This patch series was integrated into seen via 8b3dcb6.

gitgitgadget-git · 2025-10-02T21:13:41Z

This patch series was integrated into seen via 9e98a08.

gitgitgadget-git · 2025-10-03T13:49:27Z

On the Git mailing list, Phillip Wood wrote (reply to this):

Hi Ezekiel

On 26/09/2025 23:41, Ezekiel Newren via GitGitGadget wrote:
> Changes since v5.
> >   * Address review feedback on commit messages.
>   * Drop commit "xdiff: delete rchg aliasing"
>   * Use DISCARD/KEEP/INVESTIGATE instead of NONE/SOME/TOO_MANY
>   * Fix the word wrapping in the comments of xprepare.c

Thanks for expanding the commit messages, I think the range diff looks good. There's a typo in patch 12 (see below) but its not worth re-rolling just for that.

> Range-diff vs v5:
> [...] >   12:  08a0fceb72 ! 11:  f08782a977 xdiff: use enum macros NONE(0), SOME(1), TOO_MANY(2) in xprepare.c
>       @@ Metadata
>        Author: Ezekiel Newren <ezekielnewren@gmail.com>
>        >         ## Commit message ##
>       -    xdiff: use enum macros NONE(0), SOME(1), TOO_MANY(2) in xprepare.c
>       +    xdiff: add macros DISCARD(0), KEEP(1), INVESTIGATE(2) in xprepare.c
>        >       -    Rename dis1, dis2 to matches1, matches2.
>       +    This commit is refactor-only; no behavior is changed. A future commit
>       +    will use bool literals for changed[i].
>        >       -    Define macros NONE(0), SOME(1), TOO_MANY(2) as the enum values for
>       -    matches1 and matches2. These states will influence whether changed[i]
>       -    is set to 1 or kept as 0.
>       +    The functions xdl_clean_mmatch() and xdl_cleanup_records() will be
>       +    cleaned up more in a future patch series. The changes to
>       +    xdl_cleanup_records(), in this patch, is just to make it clear why

Not worth a re-roll on its own s/is/are/

Thanks for working on this

Phillip

>       +    `char rchg` is refactored to `bool changed`.
>       +
>       +    Rename dis* to action* and replace literal numericals with macros.
>       +    The old names came from when dis* (which I think was short for discard)
>       +    was treated like a boolean, but over time it grew into a ternary state
>       +    machine. The result was confusing because dis* and rchg* both used 0/1
>       +    values with different meanings.
>       +
>       +    The new names and macros make the states explicit. nm is short for
>       +    number of matches, and mlim is a heuristic limit:
>       +
>       +      nm == 0       -> action[i] = DISCARD     -> changed[i] = true
>       +      0 < nm < mlim -> action[i] = KEEP        -> changed[i] = false
>       +      nm >= mlim    -> action[i] = INVESTIGATE -> changed[i] = xdl_clean_mmatch()
>       +
>       +    When need_min is true, only DISCARD and KEEP occur because the limit
>       +    is effectively infinite.
>        >            Best-viewed-with: --color-words
>            Signed-off-by: Ezekiel Newren <ezekielnewren@gmail.com>
>       @@ xdiff/xprepare.c
>         #define XDL_GUESS_NLINES1 256
>         #define XDL_GUESS_NLINES2 20
>         >       -+#define NONE 0
>       -+#define SOME 1
>       -+#define TOO_MANY 2
>       ++#define DISCARD 0
>       ++#define KEEP 1
>       ++#define INVESTIGATE 2
>         >         typedef struct s_xdlclass {
>         	struct s_xdlclass *next;
>       @@ xdiff/xprepare.c: void xdl_free_env(xdfenv_t *xe) {
>         >         >        -static int xdl_clean_mmatch(char const *dis, long i, long s, long e) {
>       -+static bool xdl_clean_mmatch(uint8_t const *matches, long i, long s, long e) {
>       ++static bool xdl_clean_mmatch(uint8_t const *action, long i, long s, long e) {
>         	long r, rdis0, rpdis0, rdis1, rpdis1;
>         >         	/*
>        -	 * Limits the window the is examined during the similar-lines
>        -	 * scan. The loops below stops when dis[i - r] == 1 (line that
>       +-	 * has no match), but there are corner cases where the loop
>       +-	 * proceed all the way to the extremities by causing huge
>       +-	 * performance penalties in case of big files.
>        +	 * Limits the window that is examined during the similar-lines
>       -+	 * scan. The loops below stops when matches[i - r] == SOME (line that
>       - 	 * has no match), but there are corner cases where the loop
>       - 	 * proceed all the way to the extremities by causing huge
>       - 	 * performance penalties in case of big files.
>       ++	 * scan. The loops below stops when action[i - r] == KEEP
>       ++	 * (line that has no match), but there are corner cases where
>       ++	 * the loop proceed all the way to the extremities by causing
>       ++	 * huge performance penalties in case of big files.
>       + 	 */
>       + 	if (i - s > XDL_SIMSCAN_WINDOW)
>       + 		s = i - XDL_SIMSCAN_WINDOW;
>        @@ xdiff/xprepare.c: static int xdl_clean_mmatch(char const *dis, long i, long s, long e) {
>         >         	/*
>         	 * Scans the lines before 'i' to find a run of lines that either
>        -	 * have no match (dis[j] == 0) or have multiple matches (dis[j] > 1).
>        -	 * Note that we always call this function with dis[i] > 1, so the
>       -+	 * have no match (matches[j] == NONE) or have multiple matches (matches[j] == TOO_MANY).
>       -+	 * Note that we always call this function with matches[i] == TOO_MANY, so the
>       - 	 * current line (i) is already a multimatch line.
>       +-	 * current line (i) is already a multimatch line.
>       ++	 * have no match (action[j] == DISCARD) or have multiple matches
>       ++	 * (action[j] == INVESTIGATE). Note that we always call this
>       ++	 * function with action[i] == INVESTIGATE, so the current line
>       ++	 * (i) is already a multimatch line.
>         	 */
>         	for (r = 1, rdis0 = 0, rpdis0 = 1; (i - r) >= s; r++) {
>        -		if (!dis[i - r])
>       -+		if (matches[i - r] == NONE)
>       ++		if (action[i - r] == DISCARD)
>         			rdis0++;
>        -		else if (dis[i - r] == 2)
>       -+		else if (matches[i - r] == TOO_MANY)
>       ++		else if (action[i - r] == INVESTIGATE)
>         			rpdis0++;
>        -		else
>       -+		else if (matches[i - r] == SOME)
>       ++		else if (action[i - r] == KEEP)
>         			break;
>        +		else
>       -+			BUG("Illegal value for matches[i - r]");
>       ++			BUG("Illegal value for action[i - r]");
>         	}
>         	/*
>       - 	 * If the run before the line 'i' found only multimatch lines, we
>       +-	 * If the run before the line 'i' found only multimatch lines, we
>        -	 * return 0 and hence we don't make the current line (i) discarded.
>       -+	 * return false and hence we don't make the current line (i) discarded.
>       - 	 * We want to discard multimatch lines only when they appear in the
>       +-	 * We want to discard multimatch lines only when they appear in the
>        -	 * middle of runs with nomatch lines (dis[j] == 0).
>       -+	 * middle of runs with nomatch lines (matches[j] == NONE).
>       ++	 * If the run before the line 'i' found only multimatch lines,
>       ++	 * we return false and hence we don't make the current line (i)
>       ++	 * discarded. We want to discard multimatch lines only when
>       ++	 * they appear in the middle of runs with nomatch lines
>       ++	 * (action[j] == DISCARD).
>         	 */
>         	if (rdis0 == 0)
>         		return 0;
>         	for (r = 1, rdis1 = 0, rpdis1 = 1; (i + r) <= e; r++) {
>        -		if (!dis[i + r])
>       -+		if (matches[i + r] == NONE)
>       ++		if (action[i + r] == DISCARD)
>         			rdis1++;
>        -		else if (dis[i + r] == 2)
>       -+		else if (matches[i + r] == TOO_MANY)
>       ++		else if (action[i + r] == INVESTIGATE)
>         			rpdis1++;
>        -		else
>       -+		else if (matches[i + r] == SOME)
>       ++		else if (action[i + r] == KEEP)
>         			break;
>        +		else
>       -+			BUG("Illegal value for matches[i + r]");
>       ++			BUG("Illegal value for action[i + r]");
>         	}
>         	/*
>       - 	 * If the run after the line 'i' found only multimatch lines, we
>       +-	 * If the run after the line 'i' found only multimatch lines, we
>        -	 * return 0 and hence we don't make the current line (i) discarded.
>       -+	 * return false and hence we don't make the current line (i) discarded.
>       ++	 * If the run after the line 'i' found only multimatch lines,
>       ++	 * we return false and hence we don't make the current line (i)
>       ++	 * discarded.
>         	 */
>         	if (rdis1 == 0)
>        -		return 0;
>       @@ xdiff/xprepare.c: static int xdl_clean_mmatch(char const *dis, long i, long s, l
>         	xdlclass_t *rcrec;
>        -	char *dis, *dis1, *dis2;
>        -	int need_min = !!(cf->flags & XDF_NEED_MINIMAL);
>       -+	uint8_t *matches1, *matches2;
>       -+	int status = 0;
>       ++	uint8_t *action1 = NULL, *action2 = NULL;
>        +	bool need_min = !!(cf->flags & XDF_NEED_MINIMAL);
>       ++	int ret = 0;
>         >        -	if (!XDL_CALLOC_ARRAY(dis, xdf1->nrec + xdf2->nrec + 2))
>        -		return -1;
>        -	dis1 = dis;
>        -	dis2 = dis1 + xdf1->nrec + 1;
>       -+	matches1 = NULL;
>       -+	matches2 = NULL;
>       -+
>        +	/*
>        +	 * Create temporary arrays that will help us decide if
>        +	 * changed[i] should remain 0 or become 1.
>        +	 */
>       -+	if (!XDL_CALLOC_ARRAY(matches1, xdf1->nrec + 1)) {
>       -+		status = -1;
>       ++	if (!XDL_CALLOC_ARRAY(action1, xdf1->nrec + 1)) {
>       ++		ret = -1;
>        +		goto cleanup;
>        +	}
>       -+	if (!XDL_CALLOC_ARRAY(matches2, xdf2->nrec + 1)) {
>       -+		status = -1;
>       ++	if (!XDL_CALLOC_ARRAY(action2, xdf2->nrec + 1)) {
>       ++		ret = -1;
>        +		goto cleanup;
>        +	}
>         >        +	/*
>       -+	 * Initialize temporary arrays with NONE, SOME, or TOO_MANY.
>       ++	 * Initialize temporary arrays with DISCARD, KEEP, or INVESTIGATE.
>        +	 */
>         	if ((mlim = xdl_bogosqrt(xdf1->nrec)) > XDL_MAX_EQLIMIT)
>         		mlim = XDL_MAX_EQLIMIT;
>       @@ xdiff/xprepare.c: static int xdl_clean_mmatch(char const *dis, long i, long s, l
>         		rcrec = cf->rcrecs[recs->ha];
>         		nm = rcrec ? rcrec->len2 : 0;
>        -		dis1[i] = (nm == 0) ? 0: (nm >= mlim && !need_min) ? 2: 1;
>       -+		matches1[i] = (nm == 0) ? NONE: (nm >= mlim && !need_min) ? TOO_MANY: SOME;
>       ++		action1[i] = (nm == 0) ? DISCARD: (nm >= mlim && !need_min) ? INVESTIGATE: KEEP;
>         	}
>         >         	if ((mlim = xdl_bogosqrt(xdf2->nrec)) > XDL_MAX_EQLIMIT)
>       @@ xdiff/xprepare.c: static int xdl_cleanup_records(xdlclassifier_t *cf, xdfile_t *
>         		rcrec = cf->rcrecs[recs->ha];
>         		nm = rcrec ? rcrec->len1 : 0;
>        -		dis2[i] = (nm == 0) ? 0: (nm >= mlim && !need_min) ? 2: 1;
>       -+		matches2[i] = (nm == 0) ? NONE: (nm >= mlim && !need_min) ? TOO_MANY: SOME;
>       ++		action2[i] = (nm == 0) ? DISCARD: (nm >= mlim && !need_min) ? INVESTIGATE: KEEP;
>         	}
>         >        +	/*
>       @@ xdiff/xprepare.c: static int xdl_cleanup_records(xdlclassifier_t *cf, xdfile_t *
>         	     i <= xdf1->dend; i++, recs++) {
>        -		if (dis1[i] == 1 ||
>        -		    (dis1[i] == 2 && !xdl_clean_mmatch(dis1, i, xdf1->dstart, xdf1->dend))) {
>       -+		if (matches1[i] == SOME ||
>       -+		    (matches1[i] == TOO_MANY && !xdl_clean_mmatch(matches1, i, xdf1->dstart, xdf1->dend))) {
>       ++		if (action1[i] == KEEP ||
>       ++		    (action1[i] == INVESTIGATE && !xdl_clean_mmatch(action1, i, xdf1->dstart, xdf1->dend))) {
>         			xdf1->rindex[nreff++] = i;
>       -+			/* changed[i] remains 0 */
>       ++			/* changed[i] remains 0, i.e. keep */
>         		} else
>         			xdf1->changed[i] = 1;
>       ++			/* i.e. discard */
>         	}
>       -@@ xdiff/xprepare.c: static int xdl_cleanup_records(xdlclassifier_t *cf, xdfile_t *xdf1, xdfile_t *xd
>       + 	xdf1->nreff = nreff;
>         >         	for (nreff = 0, i = xdf2->dstart, recs = &xdf2->recs[xdf2->dstart];
>         	     i <= xdf2->dend; i++, recs++) {
>        -		if (dis2[i] == 1 ||
>        -		    (dis2[i] == 2 && !xdl_clean_mmatch(dis2, i, xdf2->dstart, xdf2->dend))) {
>       -+		if (matches2[i] == SOME ||
>       -+		    (matches2[i] == TOO_MANY && !xdl_clean_mmatch(matches2, i, xdf2->dstart, xdf2->dend))) {
>       ++		if (action2[i] == KEEP ||
>       ++		    (action2[i] == INVESTIGATE && !xdl_clean_mmatch(action2, i, xdf2->dstart, xdf2->dend))) {
>         			xdf2->rindex[nreff++] = i;
>       -+			/* changed[i] remains 0 */
>       ++			/* changed[i] remains 0, i.e. keep */
>         		} else
>         			xdf2->changed[i] = 1;
>       ++			/* i.e. discard */
>         	}
>         	xdf2->nreff = nreff;
>         >        -	xdl_free(dis);
>        +cleanup:
>       -+	xdl_free(matches1);
>       -+	xdl_free(matches2);
>       ++	xdl_free(action1);
>       ++	xdl_free(action2);
>         >        -	return 0;
>       -+	return status;
>       ++	return ret;
>         }
>         >         >   13:  975e845bfa ! 12:  83e1ace5bd xdiff: change type of xdfile_t.changed from char to bool
>       @@ Commit message
>            xdiff: change type of xdfile_t.changed from char to bool
>        >            The only values possible for 'changed' is 1 and 0, which exactly maps
>       -    to a bool type. It might not look like this is the case because
>       -    matches1 and matches2 (which use to be dis1, and dis2) were also char
>       -    and were assigned numerical values within a few lines of 'changed'
>       -    (what used to be rchg).
>       +    to a bool type. It might not look like this because action1 and action2
>       +    (which use to be dis1, and dis2) were also of type char and were
>       +    assigned numerical values within a few lines of 'changed' (what used to
>       +    be rchg).
>        >       -    Using NONE, SOME, TOO_MANY for matches1[i]/matches2[j], and true/false
>       +    Using DISCARD/KEEP/INVESTIGATE for action1[i]/action2[j], and true/false
>            for changed[k] makes it clear to future readers that these are
>            logically separate concepts.
>        >       @@ xdiff/xdiffi.c: static int group_slide_up(xdfile_t *xdf, struct xdlgroup *g)
>         >         		while (xdf->changed[g->start - 1])
>         			g->start--;
>       +@@ xdiff/xdiffi.c: int xdl_change_compact(xdfile_t *xdf, xdfile_t *xdfo, long flags) {
>       +
>       + int xdl_build_script(xdfenv_t *xe, xdchange_t **xscr) {
>       + 	xdchange_t *cscr = NULL, *xch;
>       +-	char *changed1 = xe->xdf1.changed, *changed2 = xe->xdf2.changed;
>       ++	bool *changed1 = xe->xdf1.changed, *changed2 = xe->xdf2.changed;
>       + 	long i1, i2, l1, l2;
>       +
>       + 	/*
>        >         ## xdiff/xhistogram.c ##
>        @@ xdiff/xhistogram.c: redo:
>       @@ xdiff/xprepare.c: static int xdl_cleanup_records(xdlclassifier_t *cf, xdfile_t *
>        -	 * changed[i] should remain 0 or become 1.
>        +	 * changed[i] should remain false, or become true.
>         	 */
>       - 	if (!XDL_CALLOC_ARRAY(matches1, xdf1->nrec + 1)) {
>       - 		status = -1;
>       + 	if (!XDL_CALLOC_ARRAY(action1, xdf1->nrec + 1)) {
>       + 		ret = -1;
>        @@ xdiff/xprepare.c: static int xdl_cleanup_records(xdlclassifier_t *cf, xdfile_t *xdf1, xdfile_t *xd
>         >         	/*
>       @@ xdiff/xprepare.c: static int xdl_cleanup_records(xdlclassifier_t *cf, xdfile_t *
>         	 */
>         	for (nreff = 0, i = xdf1->dstart, recs = &xdf1->recs[xdf1->dstart];
>         	     i <= xdf1->dend; i++, recs++) {
>       - 		if (matches1[i] == SOME ||
>       - 		    (matches1[i] == TOO_MANY && !xdl_clean_mmatch(matches1, i, xdf1->dstart, xdf1->dend))) {
>       + 		if (action1[i] == KEEP ||
>       + 		    (action1[i] == INVESTIGATE && !xdl_clean_mmatch(action1, i, xdf1->dstart, xdf1->dend))) {
>         			xdf1->rindex[nreff++] = i;
>       --			/* changed[i] remains 0 */
>       -+			/* changed[i] remains false */
>       +-			/* changed[i] remains 0, i.e. keep */
>       ++			/* changed[i] remains false, i.e. keep */
>         		} else
>        -			xdf1->changed[i] = 1;
>        +			xdf1->changed[i] = true;
>       + 			/* i.e. discard */
>         	}
>         	xdf1->nreff = nreff;
>       -
>        @@ xdiff/xprepare.c: static int xdl_cleanup_records(xdlclassifier_t *cf, xdfile_t *xdf1, xdfile_t *xd
>       - 		if (matches2[i] == SOME ||
>       - 		    (matches2[i] == TOO_MANY && !xdl_clean_mmatch(matches2, i, xdf2->dstart, xdf2->dend))) {
>       + 		if (action2[i] == KEEP ||
>       + 		    (action2[i] == INVESTIGATE && !xdl_clean_mmatch(action2, i, xdf2->dstart, xdf2->dend))) {
>         			xdf2->rindex[nreff++] = i;
>       --			/* changed[i] remains 0 */
>       -+			/* changed[i] remains false */
>       +-			/* changed[i] remains 0, i.e. keep */
>       ++			/* changed[i] remains false, i.e. keep */
>         		} else
>        -			xdf2->changed[i] = 1;
>        +			xdf2->changed[i] = true;
>       + 			/* i.e. discard */
>         	}
>         	xdf2->nreff = nreff;
>       -
>        >         ## xdiff/xtypes.h ##
>        @@ xdiff/xtypes.h: typedef struct s_xdfile {
>

gitgitgadget-git · 2025-10-03T19:15:29Z

This patch series was integrated into seen via 3f3e724.

gitgitgadget-git · 2025-10-05T23:03:07Z

This patch series was integrated into seen via 3375e63.

gitgitgadget-git · 2025-10-06T18:53:30Z

This patch series was integrated into seen via c5a7f67.

gitgitgadget-git · 2025-10-06T23:43:57Z

There was a status update in the "Cooking" section about the branch en/xdiff-cleanup on the Git mailing list:

A lot of code clean-up of xdiff.
Split out of a larger topic.

Will merge to 'next'?
source: <pull.2048.v6.git.git.1758926520.gitgitgadget@gmail.com>

gitgitgadget-git · 2025-10-07T22:33:10Z

This patch series was integrated into seen via c157ffe.

gitgitgadget-git · 2025-10-08T21:03:41Z

This patch series was integrated into seen via b6d9543.

gitgitgadget-git · 2025-10-08T21:03:42Z

This patch series was integrated into next via 5bc21d0.

gitgitgadget-git · 2025-10-09T01:57:58Z

There was a status update in the "Cooking" section about the branch en/xdiff-cleanup on the Git mailing list:

A lot of code clean-up of xdiff.
Split out of a larger topic.

Will merge to 'master'.
source: <pull.2048.v6.git.git.1758926520.gitgitgadget@gmail.com>

ezekielnewren force-pushed the use_rust_types_in_xdiff branch from eb77b2a to 00401e7 Compare September 7, 2025 18:50

gitgitgadget-git bot reviewed Sep 8, 2025

View reviewed changes

gitgitgadget-git bot reviewed Sep 9, 2025

View reviewed changes

ezekielnewren added 6 commits September 26, 2025 11:07

ezekielnewren force-pushed the use_rust_types_in_xdiff branch 2 times, most recently from a9b9c9f to 226a04a Compare September 26, 2025 20:47

ezekielnewren force-pushed the use_rust_types_in_xdiff branch from 226a04a to 83e1ace Compare September 26, 2025 21:47

gitgitgadget-git bot reviewed Sep 30, 2025

View reviewed changes

gitgitgadget-git bot added the next label Oct 8, 2025

Cleanup xdfile_t and xrecord_t in xdiff. #2048

Are you sure you want to change the base?

Cleanup xdfile_t and xrecord_t in xdiff. #2048

Uh oh!

Conversation

ezekielnewren commented Sep 7, 2025 • edited by gitgitgadget-git bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Relevant part of the original cover letter follows:

Uh oh!

gitgitgadget-git bot commented Sep 7, 2025

Uh oh!

gitgitgadget-git bot commented Sep 7, 2025

Uh oh!

gitgitgadget-git bot commented Sep 7, 2025

Uh oh!

gitgitgadget-git bot commented Sep 7, 2025

Uh oh!

gitgitgadget-git bot commented Sep 7, 2025

Uh oh!

gitgitgadget-git bot commented Sep 7, 2025

Uh oh!

gitgitgadget-git bot commented Sep 7, 2025

Uh oh!

gitgitgadget-git bot commented Sep 7, 2025

Uh oh!

gitgitgadget-git bot commented Sep 7, 2025

Uh oh!

gitgitgadget-git bot commented Sep 7, 2025

Uh oh!

gitgitgadget-git bot commented Sep 7, 2025

Uh oh!

gitgitgadget-git bot commented Sep 7, 2025

Uh oh!

gitgitgadget-git bot commented Sep 7, 2025

Uh oh!

gitgitgadget-git bot commented Sep 7, 2025

Uh oh!

gitgitgadget-git bot commented Sep 7, 2025

Uh oh!

ezekielnewren commented Sep 7, 2025

Uh oh!

gitgitgadget-git bot commented Sep 7, 2025

Uh oh!

gitgitgadget-git bot Sep 8, 2025

Choose a reason for hiding this comment

Uh oh!

gitgitgadget-git bot Sep 8, 2025

Choose a reason for hiding this comment

Uh oh!

gitgitgadget-git bot Sep 9, 2025

Choose a reason for hiding this comment

Uh oh!

gitgitgadget-git bot commented Sep 9, 2025

Uh oh!

gitgitgadget-git bot Sep 9, 2025

Choose a reason for hiding this comment

Uh oh!

gitgitgadget-git bot Sep 9, 2025

Choose a reason for hiding this comment

Uh oh!

gitgitgadget-git bot Sep 9, 2025

Choose a reason for hiding this comment

Uh oh!

gitgitgadget-git bot Sep 9, 2025

Choose a reason for hiding this comment

Uh oh!

gitgitgadget-git bot Sep 9, 2025

Choose a reason for hiding this comment

Uh oh!

gitgitgadget-git bot Sep 9, 2025

Choose a reason for hiding this comment

Uh oh!

gitgitgadget-git bot Sep 9, 2025

Choose a reason for hiding this comment

Uh oh!

gitgitgadget-git bot Sep 9, 2025

Choose a reason for hiding this comment

Uh oh!

gitgitgadget-git bot Sep 9, 2025

Choose a reason for hiding this comment

ezekielnewren commented Sep 7, 2025 •

edited by gitgitgadget-git bot

Loading