-
Notifications
You must be signed in to change notification settings - Fork 26.8k
Cleanup xdfile_t and xrecord_t in xdiff. #2048
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
There are issues in commit d322e13: |
There are issues in commit 2e3f0c1: |
There are issues in commit 97c2453: |
There are issues in commit e852992: |
There are issues in commit 3e5fc55: |
There are issues in commit 7ad3e4d: |
There are issues in commit 85e79ce: |
There are issues in commit 5286faa: |
There are issues in commit ce227c5: |
There are issues in commit 3e7f722: |
There are issues in commit 6269cd6: |
There are issues in commit 20cbe5d: |
There are issues in commit e7b31e7: |
There are issues in commit 8871ee0: |
There are issues in commit eb77b2a: |
eb77b2a
to
00401e7
Compare
/submit |
Submitted as pull.2048.git.git.1757274320.gitgitgadget@gmail.com To fetch this version into
To fetch this version to local tag
|
compat/rust_types.h
Outdated
@@ -0,0 +1,28 @@ | |||
#ifndef COMPAT_RUST_TYPES_H |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
On the Git mailing list, Junio C Hamano wrote (reply to this):
"Ezekiel Newren via GitGitGadget" <gitgitgadget@gmail.com> writes:
> +/*
> + * A typedef for bool is not needed because C bool and Rust bool are
> + * the same if #include <stdbool.h> is used.
> + */
> +
> +typedef uint8_t u8;
> +typedef uint16_t u16;
> +typedef uint32_t u32;
> +typedef uint64_t u64;
> +
> +typedef int8_t i8;
> +typedef int16_t i16;
> +typedef int32_t i32;
> +typedef int64_t i64;
The standard guarantees that these are all of the above are exactly
N-bits wide, so I can buy the above types. But before I can buy the
above typedefs, don't we need to rename existing variables that
squat on these names?
$ git grep -n -E -e '\<[ui](8|16|32|64)\>'
gives some hits, like
reftable/record.c:678: uint8_t i64[8];
t/helper/test-parse-options.c:123: uint16_t u16 = 0;
t/helper/test-parse-options.c:148: OPT_UNSIGNED(0, "u16", &u16, "get a 16 bit unsigned integer"),
to avoid confusion? There are handful other hits.
> +typedef float f32;
> +typedef double f64;
It may be that they can be used interchangeably in practice on
popular platforms, but are these guaranteed to be equivalent by some
standard? C only cares about the minimum required range and
precision, so you may have allocated enough bytes thinking you can
fit a f32 but your float may not fit there.
Or does Rust care only about platforms with IEEE 754 and would
refuse to port to other exotic architectures so the above worries
would not apply?
> +typedef size_t usize;
> +typedef ptrdiff_t isize;
Interesting. I would have expected, "isize" that is a signed
variant of "usize" to be aliased out of ssize_t (simply because it
is declared that "usize" corresponds to "size_t" on the previous
line), not using ptrdiff_t.
> +typedef uint32_t rust_char;
> +
> +#endif /* COMPAT_RUST_TYPES_H */
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
On the Git mailing list, Ezekiel Newren wrote (reply to this):
On Mon, Sep 8, 2025 at 9:08 AM Junio C Hamano <gitster@pobox.com> wrote:
> The standard guarantees that these are all of the above are exactly
> N-bits wide, so I can buy the above types. But before I can buy the
> above typedefs, don't we need to rename existing variables that
> squat on these names?
>
> $ git grep -n -E -e '\<[ui](8|16|32|64)\>'
>
> gives some hits, like
>
> reftable/record.c:678: uint8_t i64[8];
> t/helper/test-parse-options.c:123: uint16_t u16 = 0;
> t/helper/test-parse-options.c:148: OPT_UNSIGNED(0, "u16", &u16, "get a 16 bit unsigned integer"),
>
> to avoid confusion? There are handful other hits.
Those places should be cleaned up, but it's not an immediate problem
because compat/rust_types.h is not auto included anywhere. These
typedefs live in git-compat-util.h in my "Introduce Rust" patch series
and I never had a problem with compilation or testing. One reason I
included compat/posix.h instead of git-compat-util.h is because it
includes stdbool.h where git-compat-util.h doesn't. I'll be happy to
do that cleanup.
> > +typedef float f32;
> > +typedef double f64;
>
> It may be that they can be used interchangeably in practice on
> popular platforms, but are these guaranteed to be equivalent by some
> standard? C only cares about the minimum required range and
> precision, so you may have allocated enough bytes thinking you can
> fit a f32 but your float may not fit there.
>
> Or does Rust care only about platforms with IEEE 754 and would
> refuse to port to other exotic architectures so the above worries
> would not apply?
If the typedefs in compat/rust_types.h are incorrect for a
platform/target then compat/posix.h or compat/rust_types.h should be
updated rather than relying on Rust's core::ffi's guess as what it is.
On the Rust side: core::ffi assumes that a C 'float' is the same as
f32 and that a C 'double' is an f64[1,2,3]. So I am making the same
assumption that the Rust maintainers are _and_ I'm keeping ambiguity
on the C side rather than eroding Rust's type precision. If a platform
does not follow these assumptions then the compat/rust_types.h should
warn those using that platform or fail to build entirely.
[1] https://doc.rust-lang.org/1.89.0/core/ffi/type.c_float.html
[2] https://doc.rust-lang.org/1.63.0/src/core/ffi/mod.rs.html#80-81
[3] https://doc.rust-lang.org/1.89.0/src/core/ffi/primitives.rs.html#34-35
long flags; | ||
} xdlclassifier_t; | ||
|
||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
On the Git mailing list, Elijah Newren wrote (reply to this):
On Sun, Sep 7, 2025 at 12:45 PM Ezekiel Newren via GitGitGadget
<gitgitgadget@gmail.com> wrote:
>
> From: Ezekiel Newren <ezekielnewren@gmail.com>
>
> Move xdl_prepare_env() later in the file to avoid the need
> for static forward declarations.
>
> Signed-off-by: Ezekiel Newren <ezekielnewren@gmail.com>
> ---
> xdiff/xprepare.c | 116 ++++++++++++++++++++---------------------------
> 1 file changed, 50 insertions(+), 66 deletions(-)
>
> diff --git a/xdiff/xprepare.c b/xdiff/xprepare.c
> index e1d4017b2d..a45c5ee208 100644
> --- a/xdiff/xprepare.c
> +++ b/xdiff/xprepare.c
> @@ -53,21 +53,6 @@ typedef struct s_xdlclassifier {
>
>
>
> -static int xdl_init_classifier(xdlclassifier_t *cf, long size, long flags);
> -static void xdl_free_classifier(xdlclassifier_t *cf);
> -static int xdl_classify_record(unsigned int pass, xdlclassifier_t *cf, xrecord_t **rhash,
> - unsigned int hbits, xrecord_t *rec);
> -static int xdl_prepare_ctx(unsigned int pass, mmfile_t *mf, long narec, xpparam_t const *xpp,
> - xdlclassifier_t *cf, xdfile_t *xdf);
> -static void xdl_free_ctx(xdfile_t *xdf);
> -static int xdl_clean_mmatch(char const *dis, long i, long s, long e);
> -static int xdl_cleanup_records(xdlclassifier_t *cf, xdfile_t *xdf1, xdfile_t *xdf2);
> -static int xdl_trim_ends(xdfile_t *xdf1, xdfile_t *xdf2);
> -static int xdl_optimize_ctxs(xdlclassifier_t *cf, xdfile_t *xdf1, xdfile_t *xdf2);
> -
> -
> -
> -
> static int xdl_init_classifier(xdlclassifier_t *cf, long size, long flags) {
> cf->flags = flags;
>
> @@ -242,57 +227,6 @@ static void xdl_free_ctx(xdfile_t *xdf) {
> }
>
>
> -int xdl_prepare_env(mmfile_t *mf1, mmfile_t *mf2, xpparam_t const *xpp,
> - xdfenv_t *xe) {
> - long enl1, enl2, sample;
> - xdlclassifier_t cf;
> -
> - memset(&cf, 0, sizeof(cf));
> -
> - /*
> - * For histogram diff, we can afford a smaller sample size and
> - * thus a poorer estimate of the number of lines, as the hash
> - * table (rhash) won't be filled up/grown. The number of lines
> - * (nrecs) will be updated correctly anyway by
> - * xdl_prepare_ctx().
> - */
> - sample = (XDF_DIFF_ALG(xpp->flags) == XDF_HISTOGRAM_DIFF
> - ? XDL_GUESS_NLINES2 : XDL_GUESS_NLINES1);
> -
> - enl1 = xdl_guess_lines(mf1, sample) + 1;
> - enl2 = xdl_guess_lines(mf2, sample) + 1;
> -
> - if (xdl_init_classifier(&cf, enl1 + enl2 + 1, xpp->flags) < 0)
> - return -1;
> -
> - if (xdl_prepare_ctx(1, mf1, enl1, xpp, &cf, &xe->xdf1) < 0) {
> -
> - xdl_free_classifier(&cf);
> - return -1;
> - }
> - if (xdl_prepare_ctx(2, mf2, enl2, xpp, &cf, &xe->xdf2) < 0) {
> -
> - xdl_free_ctx(&xe->xdf1);
> - xdl_free_classifier(&cf);
> - return -1;
> - }
> -
> - if ((XDF_DIFF_ALG(xpp->flags) != XDF_PATIENCE_DIFF) &&
> - (XDF_DIFF_ALG(xpp->flags) != XDF_HISTOGRAM_DIFF) &&
> - xdl_optimize_ctxs(&cf, &xe->xdf1, &xe->xdf2) < 0) {
> -
> - xdl_free_ctx(&xe->xdf2);
> - xdl_free_ctx(&xe->xdf1);
> - xdl_free_classifier(&cf);
> - return -1;
> - }
> -
> - xdl_free_classifier(&cf);
> -
> - return 0;
> -}
> -
> -
> void xdl_free_env(xdfenv_t *xe) {
>
> xdl_free_ctx(&xe->xdf2);
> @@ -460,3 +394,53 @@ static int xdl_optimize_ctxs(xdlclassifier_t *cf, xdfile_t *xdf1, xdfile_t *xdf2
>
> return 0;
> }
> +
> +int xdl_prepare_env(mmfile_t *mf1, mmfile_t *mf2, xpparam_t const *xpp,
> + xdfenv_t *xe) {
> + long enl1, enl2, sample;
> + xdlclassifier_t cf;
> +
> + memset(&cf, 0, sizeof(cf));
> +
> + /*
> + * For histogram diff, we can afford a smaller sample size and
> + * thus a poorer estimate of the number of lines, as the hash
> + * table (rhash) won't be filled up/grown. The number of lines
> + * (nrecs) will be updated correctly anyway by
> + * xdl_prepare_ctx().
> + */
> + sample = (XDF_DIFF_ALG(xpp->flags) == XDF_HISTOGRAM_DIFF
> + ? XDL_GUESS_NLINES2 : XDL_GUESS_NLINES1);
> +
> + enl1 = xdl_guess_lines(mf1, sample) + 1;
> + enl2 = xdl_guess_lines(mf2, sample) + 1;
> +
> + if (xdl_init_classifier(&cf, enl1 + enl2 + 1, xpp->flags) < 0)
> + return -1;
> +
> + if (xdl_prepare_ctx(1, mf1, enl1, xpp, &cf, &xe->xdf1) < 0) {
> +
> + xdl_free_classifier(&cf);
> + return -1;
> + }
> + if (xdl_prepare_ctx(2, mf2, enl2, xpp, &cf, &xe->xdf2) < 0) {
> +
> + xdl_free_ctx(&xe->xdf1);
> + xdl_free_classifier(&cf);
> + return -1;
> + }
> +
> + if ((XDF_DIFF_ALG(xpp->flags) != XDF_PATIENCE_DIFF) &&
> + (XDF_DIFF_ALG(xpp->flags) != XDF_HISTOGRAM_DIFF) &&
> + xdl_optimize_ctxs(&cf, &xe->xdf1, &xe->xdf2) < 0) {
> +
> + xdl_free_ctx(&xe->xdf2);
> + xdl_free_ctx(&xe->xdf1);
> + xdl_free_classifier(&cf);
> + return -1;
> + }
> +
> + xdl_free_classifier(&cf);
> +
> + return 0;
> +}
> --
> gitgitgadget
Viewing this with --color-moved makes it clear that the changes
exactly match what you summarize in the commit message.
User |
rhash[hi] = rec; | ||
|
||
return 0; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
On the Git mailing list, Elijah Newren wrote (reply to this):
On Sun, Sep 7, 2025 at 12:48 PM Ezekiel Newren via GitGitGadget
<gitgitgadget@gmail.com> wrote:
>
> From: Ezekiel Newren <ezekielnewren@gmail.com>
>
> xdl_prepare_ctx() uses local variables and assigns them to the
> corresponding xdfile_t fields if there are no errors. Delete them and
> use the fields of xdfile_t directly.
In particular, those local variables are essentially a hand-rolled
additional implementation of xdl_free_ctx() inlined into
xdl_prepare_ctx(). You're just modifying the code to use the existing
xdl_free_ctx() function so we don't have two ways to free such
variables (especially since one of those two was an ugly inlining).
> Signed-off-by: Ezekiel Newren <ezekielnewren@gmail.com>
> ---
> xdiff/xprepare.c | 79 +++++++++++++++++++-----------------------------
> 1 file changed, 31 insertions(+), 48 deletions(-)
>
> diff --git a/xdiff/xprepare.c b/xdiff/xprepare.c
> index a45c5ee208..2ed1785b09 100644
> --- a/xdiff/xprepare.c
> +++ b/xdiff/xprepare.c
> @@ -134,99 +134,82 @@ static int xdl_classify_record(unsigned int pass, xdlclassifier_t *cf, xrecord_t
> }
>
>
> +static void xdl_free_ctx(xdfile_t *xdf)
> +{
> +
unnecessary blank line here
> + xdl_free(xdf->rhash);
> + xdl_free(xdf->rindex);
> + xdl_free(xdf->rchg - 1);
> + xdl_free(xdf->ha);
> + xdl_free(xdf->recs);
> + xdl_cha_free(&xdf->rcha);
> +}
> +
> +
> static int xdl_prepare_ctx(unsigned int pass, mmfile_t *mf, long narec, xpparam_t const *xpp,
> xdlclassifier_t *cf, xdfile_t *xdf) {
> - unsigned int hbits;
> - long nrec, hsize, bsize;
> + long bsize;
> unsigned long hav;
> char const *blk, *cur, *top, *prev;
> xrecord_t *crec;
> - xrecord_t **recs;
> - xrecord_t **rhash;
> - unsigned long *ha;
> - char *rchg;
> - long *rindex;
>
> - ha = NULL;
> - rindex = NULL;
> - rchg = NULL;
> - rhash = NULL;
> - recs = NULL;
> + xdf->ha = NULL;
> + xdf->rindex = NULL;
> + xdf->rchg = NULL;
> + xdf->rhash = NULL;
> + xdf->recs = NULL;
>
> if (xdl_cha_init(&xdf->rcha, sizeof(xrecord_t), narec / 4 + 1) < 0)
> goto abort;
> - if (!XDL_ALLOC_ARRAY(recs, narec))
> + if (!XDL_ALLOC_ARRAY(xdf->recs, narec))
> goto abort;
>
> - hbits = xdl_hashbits((unsigned int) narec);
> - hsize = 1 << hbits;
> - if (!XDL_CALLOC_ARRAY(rhash, hsize))
> + xdf->hbits = xdl_hashbits((unsigned int) narec);
> + if (!XDL_CALLOC_ARRAY(xdf->rhash, 1 << xdf->hbits))
> goto abort;
>
> - nrec = 0;
> + xdf->nrec = 0;
> if ((cur = blk = xdl_mmfile_first(mf, &bsize))) {
> for (top = blk + bsize; cur < top; ) {
> prev = cur;
> hav = xdl_hash_record(&cur, top, xpp->flags);
> - if (XDL_ALLOC_GROW(recs, nrec + 1, narec))
> + if (XDL_ALLOC_GROW(xdf->recs, xdf->nrec + 1, narec))
> goto abort;
> if (!(crec = xdl_cha_alloc(&xdf->rcha)))
> goto abort;
> crec->ptr = prev;
> crec->size = (long) (cur - prev);
> crec->ha = hav;
> - recs[nrec++] = crec;
> - if (xdl_classify_record(pass, cf, rhash, hbits, crec) < 0)
> + xdf->recs[xdf->nrec++] = crec;
> + if (xdl_classify_record(pass, cf, xdf->rhash, xdf->hbits, crec) < 0)
> goto abort;
> }
> }
>
> - if (!XDL_CALLOC_ARRAY(rchg, nrec + 2))
> + if (!XDL_CALLOC_ARRAY(xdf->rchg, xdf->nrec + 2))
> goto abort;
>
> if ((XDF_DIFF_ALG(xpp->flags) != XDF_PATIENCE_DIFF) &&
> (XDF_DIFF_ALG(xpp->flags) != XDF_HISTOGRAM_DIFF)) {
> - if (!XDL_ALLOC_ARRAY(rindex, nrec + 1))
> + if (!XDL_ALLOC_ARRAY(xdf->rindex, xdf->nrec + 1))
> goto abort;
> - if (!XDL_ALLOC_ARRAY(ha, nrec + 1))
> + if (!XDL_ALLOC_ARRAY(xdf->ha, xdf->nrec + 1))
> goto abort;
> }
>
> - xdf->nrec = nrec;
> - xdf->recs = recs;
> - xdf->hbits = hbits;
> - xdf->rhash = rhash;
> - xdf->rchg = rchg + 1;
> - xdf->rindex = rindex;
> + xdf->rchg += 1;
> xdf->nreff = 0;
> - xdf->ha = ha;
> xdf->dstart = 0;
> - xdf->dend = nrec - 1;
> + xdf->dend = xdf->nrec - 1;
>
> return 0;
>
> abort:
> - xdl_free(ha);
> - xdl_free(rindex);
> - xdl_free(rchg);
> - xdl_free(rhash);
> - xdl_free(recs);
> - xdl_cha_free(&xdf->rcha);
> + xdl_free_ctx(xdf);
> return -1;
> }
>
>
> -static void xdl_free_ctx(xdfile_t *xdf) {
> -
> - xdl_free(xdf->rhash);
> - xdl_free(xdf->rindex);
> - xdl_free(xdf->rchg - 1);
> - xdl_free(xdf->ha);
> - xdl_free(xdf->recs);
> - xdl_cha_free(&xdf->rcha);
> -}
> -
> -
> void xdl_free_env(xdfenv_t *xe) {
>
> xdl_free_ctx(&xe->xdf2);
> --
> gitgitgadget
Looks good.
xdl_free(cf->rcrecs); | ||
xdl_free(cf->rchash); | ||
xdl_cha_free(&cf->ncha); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
On the Git mailing list, Elijah Newren wrote (reply to this):
On Sun, Sep 7, 2025 at 12:45 PM Ezekiel Newren via GitGitGadget
<gitgitgadget@gmail.com> wrote:
>
> From: Ezekiel Newren <ezekielnewren@gmail.com>
>
> xrecord_t.next, xdfile_t.hbits, xdfile_t.rhash are initialized,
> but never used for anything by the code. Remove them.
>
> Signed-off-by: Ezekiel Newren <ezekielnewren@gmail.com>
> ---
> xdiff/xprepare.c | 15 ++-------------
> xdiff/xtypes.h | 3 ---
> 2 files changed, 2 insertions(+), 16 deletions(-)
>
> diff --git a/xdiff/xprepare.c b/xdiff/xprepare.c
> index 2ed1785b09..91b0ed54e0 100644
> --- a/xdiff/xprepare.c
> +++ b/xdiff/xprepare.c
> @@ -91,8 +91,7 @@ static void xdl_free_classifier(xdlclassifier_t *cf) {
> }
>
>
> -static int xdl_classify_record(unsigned int pass, xdlclassifier_t *cf, xrecord_t **rhash,
> - unsigned int hbits, xrecord_t *rec) {
> +static int xdl_classify_record(unsigned int pass, xdlclassifier_t *cf, xrecord_t *rec) {
> long hi;
> char const *line;
> xdlclass_t *rcrec;
> @@ -126,10 +125,6 @@ static int xdl_classify_record(unsigned int pass, xdlclassifier_t *cf, xrecord_t
>
> rec->ha = (unsigned long) rcrec->idx;
>
> - hi = (long) XDL_HASHLONG(rec->ha, hbits);
> - rec->next = rhash[hi];
> - rhash[hi] = rec;
> -
> return 0;
> }
>
> @@ -137,7 +132,6 @@ static int xdl_classify_record(unsigned int pass, xdlclassifier_t *cf, xrecord_t
> static void xdl_free_ctx(xdfile_t *xdf)
> {
>
> - xdl_free(xdf->rhash);
> xdl_free(xdf->rindex);
> xdl_free(xdf->rchg - 1);
> xdl_free(xdf->ha);
> @@ -156,7 +150,6 @@ static int xdl_prepare_ctx(unsigned int pass, mmfile_t *mf, long narec, xpparam_
> xdf->ha = NULL;
> xdf->rindex = NULL;
> xdf->rchg = NULL;
> - xdf->rhash = NULL;
> xdf->recs = NULL;
>
> if (xdl_cha_init(&xdf->rcha, sizeof(xrecord_t), narec / 4 + 1) < 0)
> @@ -164,10 +157,6 @@ static int xdl_prepare_ctx(unsigned int pass, mmfile_t *mf, long narec, xpparam_
> if (!XDL_ALLOC_ARRAY(xdf->recs, narec))
> goto abort;
>
> - xdf->hbits = xdl_hashbits((unsigned int) narec);
> - if (!XDL_CALLOC_ARRAY(xdf->rhash, 1 << xdf->hbits))
> - goto abort;
> -
> xdf->nrec = 0;
> if ((cur = blk = xdl_mmfile_first(mf, &bsize))) {
> for (top = blk + bsize; cur < top; ) {
> @@ -181,7 +170,7 @@ static int xdl_prepare_ctx(unsigned int pass, mmfile_t *mf, long narec, xpparam_
> crec->size = (long) (cur - prev);
> crec->ha = hav;
> xdf->recs[xdf->nrec++] = crec;
> - if (xdl_classify_record(pass, cf, xdf->rhash, xdf->hbits, crec) < 0)
> + if (xdl_classify_record(pass, cf, crec) < 0)
> goto abort;
> }
> }
> diff --git a/xdiff/xtypes.h b/xdiff/xtypes.h
> index 8442bd436e..8b8467360e 100644
> --- a/xdiff/xtypes.h
> +++ b/xdiff/xtypes.h
> @@ -39,7 +39,6 @@ typedef struct s_chastore {
> } chastore_t;
>
> typedef struct s_xrecord {
> - struct s_xrecord *next;
> char const *ptr;
> long size;
> unsigned long ha;
> @@ -48,8 +47,6 @@ typedef struct s_xrecord {
> typedef struct s_xdfile {
> chastore_t rcha;
> long nrec;
> - unsigned int hbits;
> - xrecord_t **rhash;
> long dstart, dend;
> xrecord_t **recs;
> char *rchg;
> --
> gitgitgadget
Always nice to see unused fields get removed.
* Davide Libenzi <davidel@xmailserver.org> | ||
* | ||
*/ | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
On the Git mailing list, Elijah Newren wrote (reply to this):
On Sun, Sep 7, 2025 at 12:45 PM Ezekiel Newren via GitGitGadget
<gitgitgadget@gmail.com> wrote:
>
> From: Ezekiel Newren <ezekielnewren@gmail.com>
>
> This function aliases the fields of xrecord_t, which makes it harder
> to track the usages of those fields. Delete it.
>
> Signed-off-by: Ezekiel Newren <ezekielnewren@gmail.com>
> ---
> xdiff/xemit.c | 38 +++++++++++++-------------------------
> 1 file changed, 13 insertions(+), 25 deletions(-)
>
> diff --git a/xdiff/xemit.c b/xdiff/xemit.c
> index 1d40c9cb40..2161ac3cd0 100644
> --- a/xdiff/xemit.c
> +++ b/xdiff/xemit.c
> @@ -22,23 +22,13 @@
>
> #include "xinclude.h"
>
> -static long xdl_get_rec(xdfile_t *xdf, long ri, char const **rec) {
>
Can we remove this line too, to simplify the diff? (i.e. make there
only be one blank line between the include of xinclude.h and
xdl_emit_record?
> - *rec = xdf->recs[ri]->ptr;
> -
> - return xdf->recs[ri]->size;
> -}
> -
> -
> -static int xdl_emit_record(xdfile_t *xdf, long ri, char const *pre, xdemitcb_t *ecb) {
> - long size, psize = strlen(pre);
> - char const *rec;
> -
> - size = xdl_get_rec(xdf, ri, &rec);
> - if (xdl_emit_diffrec(rec, size, pre, psize, ecb) < 0) {
> +static int xdl_emit_record(xdfile_t *xdf, long ri, char const *pre, xdemitcb_t *ecb)
> +{
The change of the opening curly brace to be on a new line does match
our general coding guidelines, but cleanups like this should be in a
separate patch.
> + xrecord_t *rec = xdf->recs[ri];
>
> + if (xdl_emit_diffrec(rec->ptr, rec->size, pre, strlen(pre), ecb) < 0)
> return -1;
> - }
While in this case you were modifying the line in question and thus
fixing the code to also not use curly braces around a single
statement, which is more justified, it still makes the patch slightly
harder for reviewers to read since you are doing multiple things (what
you said in the commit message, plus cleaning up style "violations").
It'd be better to leave the existing style violations in place, or fix
them in a separate patch.
>
> return 0;
> }
> @@ -120,11 +110,11 @@ static long def_ff(const char *rec, long len, char *buf, long sz)
> static long match_func_rec(xdfile_t *xdf, xdemitconf_t const *xecfg, long ri,
> char *buf, long sz)
> {
> - const char *rec;
> - long len = xdl_get_rec(xdf, ri, &rec);
> + xrecord_t *rec = xdf->recs[ri];
> +
> if (!xecfg->find_func)
> - return def_ff(rec, len, buf, sz);
> - return xecfg->find_func(rec, len, buf, sz, xecfg->find_func_priv);
> + return def_ff(rec->ptr, rec->size, buf, sz);
> + return xecfg->find_func(rec->ptr, rec->size, buf, sz, xecfg->find_func_priv);
> }
>
> static int is_func_rec(xdfile_t *xdf, xdemitconf_t const *xecfg, long ri)
> @@ -160,14 +150,12 @@ static long get_func_line(xdfenv_t *xe, xdemitconf_t const *xecfg,
>
> static int is_empty_rec(xdfile_t *xdf, long ri)
> {
> - const char *rec;
> - long len = xdl_get_rec(xdf, ri, &rec);
> + xrecord_t *rec = xdf->recs[ri];
> + long i = 0;
>
> - while (len > 0 && XDL_ISSPACE(*rec)) {
> - rec++;
> - len--;
> - }
> - return !len;
> + for (; i < rec->size && XDL_ISSPACE(rec->ptr[i]); i++);
> +
> + return i == rec->size;
> }
I agree that the code is easier to follow without the aliasing.
|
||
/* | ||
* Rule: "Divide et Impera" (divide & conquer). Recursively split the box in | ||
* sub-boxes by calling the box splitting function. Note that the real job |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
On the Git mailing list, Elijah Newren wrote (reply to this):
On Sun, Sep 7, 2025 at 12:45 PM Ezekiel Newren via GitGitGadget
<gitgitgadget@gmail.com> wrote:
>
> From: Ezekiel Newren <ezekielnewren@gmail.com>
>
> Every field in this struct is an alias for a certain field in xdfile_t.
>
> diffdata_t.nrec -> xdfile_t.nreff
> diffdata_t.ha -> xdfile_t.ha
> diffdata_t.rindex -> xdfile_t.rindex
> diffdata_t.rchg -> xdfile_t.recharge
>
> Signed-off-by: Ezekiel Newren <ezekielnewren@gmail.com>
> ---
> xdiff/xdiffi.c | 32 ++++++++------------------------
> xdiff/xdiffi.h | 11 ++---------
> 2 files changed, 10 insertions(+), 33 deletions(-)
>
> diff --git a/xdiff/xdiffi.c b/xdiff/xdiffi.c
> index 5a96e36dfb..bbf0161f84 100644
> --- a/xdiff/xdiffi.c
> +++ b/xdiff/xdiffi.c
> @@ -257,10 +257,10 @@ static long xdl_split(unsigned long const *ha1, long off1, long lim1,
> * sub-boxes by calling the box splitting function. Note that the real job
> * (marking changed lines) is done in the two boundary reaching checks.
> */
> -int xdl_recs_cmp(diffdata_t *dd1, long off1, long lim1,
> - diffdata_t *dd2, long off2, long lim2,
> +int xdl_recs_cmp(xdfile_t *xdf1, long off1, long lim1,
> + xdfile_t *xdf2, long off2, long lim2,
> long *kvdf, long *kvdb, int need_min, xdalgoenv_t *xenv) {
> - unsigned long const *ha1 = dd1->ha, *ha2 = dd2->ha;
> + unsigned long const *ha1 = xdf1->ha, *ha2 = xdf2->ha;
>
> /*
> * Shrink the box by walking through each diagonal snake (SW and NE).
> @@ -273,17 +273,11 @@ int xdl_recs_cmp(diffdata_t *dd1, long off1, long lim1,
> * be obviously changed.
> */
> if (off1 == lim1) {
> - char *rchg2 = dd2->rchg;
> - long *rindex2 = dd2->rindex;
> -
> for (; off2 < lim2; off2++)
> - rchg2[rindex2[off2]] = 1;
> + xdf2->rchg[xdf2->rindex[off2]] = 1;
> } else if (off2 == lim2) {
> - char *rchg1 = dd1->rchg;
> - long *rindex1 = dd1->rindex;
> -
> for (; off1 < lim1; off1++)
> - rchg1[rindex1[off1]] = 1;
> + xdf1->rchg[xdf1->rindex[off1]] = 1;
> } else {
> xdpsplit_t spl;
> spl.i1 = spl.i2 = 0;
> @@ -300,9 +294,9 @@ int xdl_recs_cmp(diffdata_t *dd1, long off1, long lim1,
> /*
> * ... et Impera.
> */
> - if (xdl_recs_cmp(dd1, off1, spl.i1, dd2, off2, spl.i2,
> + if (xdl_recs_cmp(xdf1, off1, spl.i1, xdf2, off2, spl.i2,
> kvdf, kvdb, spl.min_lo, xenv) < 0 ||
> - xdl_recs_cmp(dd1, spl.i1, lim1, dd2, spl.i2, lim2,
> + xdl_recs_cmp(xdf1, spl.i1, lim1, xdf2, spl.i2, lim2,
> kvdf, kvdb, spl.min_hi, xenv) < 0) {
>
> return -1;
> @@ -318,7 +312,6 @@ int xdl_do_diff(mmfile_t *mf1, mmfile_t *mf2, xpparam_t const *xpp,
> long ndiags;
> long *kvd, *kvdf, *kvdb;
> xdalgoenv_t xenv;
> - diffdata_t dd1, dd2;
> int res;
>
> if (xdl_prepare_env(mf1, mf2, xpp, xe) < 0)
> @@ -357,16 +350,7 @@ int xdl_do_diff(mmfile_t *mf1, mmfile_t *mf2, xpparam_t const *xpp,
> xenv.snake_cnt = XDL_SNAKE_CNT;
> xenv.heur_min = XDL_HEUR_MIN_COST;
>
> - dd1.nrec = xe->xdf1.nreff;
> - dd1.ha = xe->xdf1.ha;
> - dd1.rchg = xe->xdf1.rchg;
> - dd1.rindex = xe->xdf1.rindex;
> - dd2.nrec = xe->xdf2.nreff;
> - dd2.ha = xe->xdf2.ha;
> - dd2.rchg = xe->xdf2.rchg;
> - dd2.rindex = xe->xdf2.rindex;
> -
> - res = xdl_recs_cmp(&dd1, 0, dd1.nrec, &dd2, 0, dd2.nrec,
> + res = xdl_recs_cmp(&xe->xdf1, 0, xe->xdf1.nreff, &xe->xdf2, 0, xe->xdf2.nreff,
> kvdf, kvdb, (xpp->flags & XDF_NEED_MINIMAL) != 0,
> &xenv);
> xdl_free(kvd);
> diff --git a/xdiff/xdiffi.h b/xdiff/xdiffi.h
> index 126c9d8ff4..49e52c67f9 100644
> --- a/xdiff/xdiffi.h
> +++ b/xdiff/xdiffi.h
> @@ -24,13 +24,6 @@
> #define XDIFFI_H
>
>
> -typedef struct s_diffdata {
> - long nrec;
> - unsigned long const *ha;
> - long *rindex;
> - char *rchg;
> -} diffdata_t;
> -
> typedef struct s_xdalgoenv {
> long mxcost;
> long snake_cnt;
> @@ -46,8 +39,8 @@ typedef struct s_xdchange {
>
>
>
> -int xdl_recs_cmp(diffdata_t *dd1, long off1, long lim1,
> - diffdata_t *dd2, long off2, long lim2,
> +int xdl_recs_cmp(xdfile_t *xdf1, long off1, long lim1,
> + xdfile_t *xdf2, long off2, long lim2,
> long *kvdf, long *kvdb, int need_min, xdalgoenv_t *xenv);
> int xdl_do_diff(mmfile_t *mf1, mmfile_t *mf2, xpparam_t const *xpp,
> xdfenv_t *xe);
> --
> gitgitgadget
Viewing this commit with --color-moved helps highlight in the code
what you say in the commit message. Makes sense.
* Davide Libenzi <davidel@xmailserver.org> | ||
* | ||
*/ | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
On the Git mailing list, Elijah Newren wrote (reply to this):
On Sun, Sep 7, 2025 at 12:46 PM Ezekiel Newren via GitGitGadget
<gitgitgadget@gmail.com> wrote:
>
> From: Ezekiel Newren <ezekielnewren@gmail.com>
>
> When 0 <= i < xdfile_t.nreff the following is true:
> xdfile_t.ha[i] == xdfile_t.recs[xdfile_t.rindex[i]]
I like getting rid of redundant stuff. One thing to note here is that
you're replacing a single indirection with two...
>
> Signed-off-by: Ezekiel Newren <ezekielnewren@gmail.com>
> ---
> xdiff/xdiffi.c | 24 ++++++++++++++----------
> xdiff/xprepare.c | 12 ++----------
> xdiff/xtypes.h | 1 -
> 3 files changed, 16 insertions(+), 21 deletions(-)
>
> diff --git a/xdiff/xdiffi.c b/xdiff/xdiffi.c
> index bbf0161f84..11cd090b53 100644
> --- a/xdiff/xdiffi.c
> +++ b/xdiff/xdiffi.c
> @@ -22,6 +22,11 @@
>
> #include "xinclude.h"
>
> +static unsigned long get_hash(xdfile_t *xdf, long index)
> +{
> + return xdf->recs[xdf->rindex[index]]->ha;
> +}
> +
> #define XDL_MAX_COST_MIN 256
> #define XDL_HEUR_MIN_COST 256
> #define XDL_LINE_MAX (long)((1UL << (CHAR_BIT * sizeof(long) - 1)) - 1)
> @@ -42,8 +47,8 @@ typedef struct s_xdpsplit {
> * using this algorithm, so a little bit of heuristic is needed to cut the
> * search and to return a suboptimal point.
> */
> -static long xdl_split(unsigned long const *ha1, long off1, long lim1,
> - unsigned long const *ha2, long off2, long lim2,
> +static long xdl_split(xdfile_t *xdf1, long off1, long lim1,
> + xdfile_t *xdf2, long off2, long lim2,
> long *kvdf, long *kvdb, int need_min, xdpsplit_t *spl,
> xdalgoenv_t *xenv) {
> long dmin = off1 - lim2, dmax = lim1 - off2;
> @@ -87,7 +92,7 @@ static long xdl_split(unsigned long const *ha1, long off1, long lim1,
> i1 = kvdf[d + 1];
> prev1 = i1;
> i2 = i1 - d;
> - for (; i1 < lim1 && i2 < lim2 && ha1[i1] == ha2[i2]; i1++, i2++);
> + for (; i1 < lim1 && i2 < lim2 && get_hash(xdf1, i1) == get_hash(xdf2, i2); i1++, i2++);
You're not going to be happy with me asking, so sorry in advance, but
I'm really curious...we are now replacing a single indirection with a
double-indirection inside a for loop, which is nested within two other
for loops. Three levels of for-loops to me suggests it might be a hot
codepath. Does this double indirection in these codepaths affect
performance?
> if (i1 - prev1 > xenv->snake_cnt)
> got_snake = 1;
> kvdf[d] = i1;
> @@ -124,7 +129,7 @@ static long xdl_split(unsigned long const *ha1, long off1, long lim1,
> i1 = kvdb[d + 1] - 1;
> prev1 = i1;
> i2 = i1 - d;
> - for (; i1 > off1 && i2 > off2 && ha1[i1 - 1] == ha2[i2 - 1]; i1--, i2--);
> + for (; i1 > off1 && i2 > off2 && get_hash(xdf1, i1 - 1) == get_hash(xdf2, i2 - 1); i1--, i2--);
> if (prev1 - i1 > xenv->snake_cnt)
> got_snake = 1;
> kvdb[d] = i1;
> @@ -159,7 +164,7 @@ static long xdl_split(unsigned long const *ha1, long off1, long lim1,
> if (v > XDL_K_HEUR * ec && v > best &&
> off1 + xenv->snake_cnt <= i1 && i1 < lim1 &&
> off2 + xenv->snake_cnt <= i2 && i2 < lim2) {
> - for (k = 1; ha1[i1 - k] == ha2[i2 - k]; k++)
> + for (k = 1; get_hash(xdf1, i1 - k) == get_hash(xdf2, i2 - k); k++)
> if (k == xenv->snake_cnt) {
> best = v;
> spl->i1 = i1;
> @@ -183,7 +188,7 @@ static long xdl_split(unsigned long const *ha1, long off1, long lim1,
> if (v > XDL_K_HEUR * ec && v > best &&
> off1 < i1 && i1 <= lim1 - xenv->snake_cnt &&
> off2 < i2 && i2 <= lim2 - xenv->snake_cnt) {
> - for (k = 0; ha1[i1 + k] == ha2[i2 + k]; k++)
> + for (k = 0; get_hash(xdf1, i1 + k) == get_hash(xdf2, i2 + k); k++)
> if (k == xenv->snake_cnt - 1) {
> best = v;
> spl->i1 = i1;
> @@ -260,13 +265,12 @@ static long xdl_split(unsigned long const *ha1, long off1, long lim1,
> int xdl_recs_cmp(xdfile_t *xdf1, long off1, long lim1,
> xdfile_t *xdf2, long off2, long lim2,
> long *kvdf, long *kvdb, int need_min, xdalgoenv_t *xenv) {
> - unsigned long const *ha1 = xdf1->ha, *ha2 = xdf2->ha;
>
> /*
> * Shrink the box by walking through each diagonal snake (SW and NE).
> */
> - for (; off1 < lim1 && off2 < lim2 && ha1[off1] == ha2[off2]; off1++, off2++);
> - for (; off1 < lim1 && off2 < lim2 && ha1[lim1 - 1] == ha2[lim2 - 1]; lim1--, lim2--);
> + for (; off1 < lim1 && off2 < lim2 && get_hash(xdf1, off1) == get_hash(xdf2, off2); off1++, off2++);
> + for (; off1 < lim1 && off2 < lim2 && get_hash(xdf1, lim1 - 1) == get_hash(xdf2, lim2 - 1); lim1--, lim2--);
>
> /*
> * If one dimension is empty, then all records on the other one must
> @@ -285,7 +289,7 @@ int xdl_recs_cmp(xdfile_t *xdf1, long off1, long lim1,
> /*
> * Divide ...
> */
> - if (xdl_split(ha1, off1, lim1, ha2, off2, lim2, kvdf, kvdb,
> + if (xdl_split(xdf1, off1, lim1, xdf2, off2, lim2, kvdf, kvdb,
> need_min, &spl, xenv) < 0) {
>
> return -1;
> diff --git a/xdiff/xprepare.c b/xdiff/xprepare.c
> index 91b0ed54e0..59730989a3 100644
> --- a/xdiff/xprepare.c
> +++ b/xdiff/xprepare.c
> @@ -134,7 +134,6 @@ static void xdl_free_ctx(xdfile_t *xdf)
>
> xdl_free(xdf->rindex);
> xdl_free(xdf->rchg - 1);
> - xdl_free(xdf->ha);
> xdl_free(xdf->recs);
> xdl_cha_free(&xdf->rcha);
> }
> @@ -147,7 +146,6 @@ static int xdl_prepare_ctx(unsigned int pass, mmfile_t *mf, long narec, xpparam_
> char const *blk, *cur, *top, *prev;
> xrecord_t *crec;
>
> - xdf->ha = NULL;
> xdf->rindex = NULL;
> xdf->rchg = NULL;
> xdf->recs = NULL;
> @@ -182,8 +180,6 @@ static int xdl_prepare_ctx(unsigned int pass, mmfile_t *mf, long narec, xpparam_
> (XDF_DIFF_ALG(xpp->flags) != XDF_HISTOGRAM_DIFF)) {
> if (!XDL_ALLOC_ARRAY(xdf->rindex, xdf->nrec + 1))
> goto abort;
> - if (!XDL_ALLOC_ARRAY(xdf->ha, xdf->nrec + 1))
> - goto abort;
> }
>
> xdf->rchg += 1;
> @@ -301,9 +297,7 @@ static int xdl_cleanup_records(xdlclassifier_t *cf, xdfile_t *xdf1, xdfile_t *xd
> i <= xdf1->dend; i++, recs++) {
> if (dis1[i] == 1 ||
> (dis1[i] == 2 && !xdl_clean_mmatch(dis1, i, xdf1->dstart, xdf1->dend))) {
> - xdf1->rindex[nreff] = i;
> - xdf1->ha[nreff] = (*recs)->ha;
> - nreff++;
> + xdf1->rindex[nreff++] = i;
> } else
> xdf1->rchg[i] = 1;
> }
> @@ -313,9 +307,7 @@ static int xdl_cleanup_records(xdlclassifier_t *cf, xdfile_t *xdf1, xdfile_t *xd
> i <= xdf2->dend; i++, recs++) {
> if (dis2[i] == 1 ||
> (dis2[i] == 2 && !xdl_clean_mmatch(dis2, i, xdf2->dstart, xdf2->dend))) {
> - xdf2->rindex[nreff] = i;
> - xdf2->ha[nreff] = (*recs)->ha;
> - nreff++;
> + xdf2->rindex[nreff++] = i;
> } else
> xdf2->rchg[i] = 1;
> }
> diff --git a/xdiff/xtypes.h b/xdiff/xtypes.h
> index 8b8467360e..85848f1685 100644
> --- a/xdiff/xtypes.h
> +++ b/xdiff/xtypes.h
> @@ -52,7 +52,6 @@ typedef struct s_xdfile {
> char *rchg;
> long *rindex;
> long nreff;
> - unsigned long *ha;
> } xdfile_t;
>
> typedef struct s_xdfenv {
> --
> gitgitgadget
Other than the performance question, it looks like you've made a
straightforward mechanical change as highlighted in the commit
message.
#define XDL_GUESS_NLINES1 256 | ||
#define XDL_GUESS_NLINES2 20 | ||
|
||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
On the Git mailing list, Elijah Newren wrote (reply to this):
On Sun, Sep 7, 2025 at 12:46 PM Ezekiel Newren via GitGitGadget
<gitgitgadget@gmail.com> wrote:
>
> From: Ezekiel Newren <ezekielnewren@gmail.com>
>
> Signed-off-by: Ezekiel Newren <ezekielnewren@gmail.com>
> ---
> xdiff/xprepare.c | 16 ++++++----------
> 1 file changed, 6 insertions(+), 10 deletions(-)
>
> diff --git a/xdiff/xprepare.c b/xdiff/xprepare.c
> index 59730989a3..6f1d4b4725 100644
> --- a/xdiff/xprepare.c
> +++ b/xdiff/xprepare.c
> @@ -32,9 +32,7 @@
>
> typedef struct s_xdlclass {
> struct s_xdlclass *next;
> - unsigned long ha;
> - char const *line;
> - long size;
> + xrecord_t rec;
> long idx;
> long len1, len2;
> } xdlclass_t;
> @@ -93,14 +91,12 @@ static void xdl_free_classifier(xdlclassifier_t *cf) {
>
> static int xdl_classify_record(unsigned int pass, xdlclassifier_t *cf, xrecord_t *rec) {
> long hi;
> - char const *line;
> xdlclass_t *rcrec;
>
> - line = rec->ptr;
> hi = (long) XDL_HASHLONG(rec->ha, cf->hbits);
> for (rcrec = cf->rchash[hi]; rcrec; rcrec = rcrec->next)
> - if (rcrec->ha == rec->ha &&
> - xdl_recmatch(rcrec->line, rcrec->size,
> + if (rcrec->rec.ha == rec->ha &&
> + xdl_recmatch(rcrec->rec.ptr, rcrec->rec.size,
> rec->ptr, rec->size, cf->flags))
> break;
>
> @@ -113,9 +109,9 @@ static int xdl_classify_record(unsigned int pass, xdlclassifier_t *cf, xrecord_t
> if (XDL_ALLOC_GROW(cf->rcrecs, cf->count, cf->alloc))
> return -1;
> cf->rcrecs[rcrec->idx] = rcrec;
> - rcrec->line = line;
> - rcrec->size = rec->size;
> - rcrec->ha = rec->ha;
> + rcrec->rec.ptr = rec->ptr;
> + rcrec->rec.size = rec->size;
> + rcrec->rec.ha = rec->ha;
> rcrec->len1 = rcrec->len2 = 0;
> rcrec->next = cf->rchash[hi];
> cf->rchash[hi] = rcrec;
> --
> gitgitgadget
I can see the changes match the one-line summary. And I think the
point is simplification or reducing redundancy or something...but
could a single sentence motivation (stating which of these purposes is
at play) be added to the commit message?
*/ | ||
|
||
#include "xinclude.h" | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
On the Git mailing list, Elijah Newren wrote (reply to this):
On Sun, Sep 7, 2025 at 12:46 PM Ezekiel Newren via GitGitGadget
<gitgitgadget@gmail.com> wrote:
>
> From: Ezekiel Newren <ezekielnewren@gmail.com>
My personal bias is that things like "view with --color-words" makes
more sense to include near the end of the commit message, just before
the sign-offs. Not sure if others agree on that.
> The chastore_t type is very unfriendly to Rust FFI. It's also redundant
> since 'recs' is a vector type that grows every time an xrecord_t is
> added.
The second sentence seems to presume the reader knows what chastore_t
type is for, and about the confusing dual layering between it and
recs.its confusing dual layering. I liked your more extended
explanation in https://lore.kernel.org/git/7ea2dccd71fc502f20614ce217fc9885d1b17413.1756496539.git.gitgitgadget@gmail.com/;
could some of that be used here?
>
> Signed-off-by: Ezekiel Newren <ezekielnewren@gmail.com>
> ---
> xdiff/xdiffi.c | 24 ++++++++++----------
> xdiff/xemit.c | 6 ++---
> xdiff/xhistogram.c | 2 +-
> xdiff/xmerge.c | 56 +++++++++++++++++++++++-----------------------
> xdiff/xpatience.c | 10 ++++-----
> xdiff/xprepare.c | 19 ++++++----------
> xdiff/xtypes.h | 3 +--
> xdiff/xutils.c | 12 +++++-----
> 8 files changed, 63 insertions(+), 69 deletions(-)
>
> diff --git a/xdiff/xdiffi.c b/xdiff/xdiffi.c
> index 11cd090b53..a66125d44a 100644
> --- a/xdiff/xdiffi.c
> +++ b/xdiff/xdiffi.c
> @@ -24,7 +24,7 @@
>
> static unsigned long get_hash(xdfile_t *xdf, long index)
> {
> - return xdf->recs[xdf->rindex[index]]->ha;
> + return xdf->recs[xdf->rindex[index]].ha;
> }
>
> #define XDL_MAX_COST_MIN 256
> @@ -489,13 +489,13 @@ static void measure_split(const xdfile_t *xdf, long split,
> m->indent = -1;
> } else {
> m->end_of_file = 0;
> - m->indent = get_indent(xdf->recs[split]);
> + m->indent = get_indent(&xdf->recs[split]);
> }
>
> m->pre_blank = 0;
> m->pre_indent = -1;
> for (i = split - 1; i >= 0; i--) {
> - m->pre_indent = get_indent(xdf->recs[i]);
> + m->pre_indent = get_indent(&xdf->recs[i]);
> if (m->pre_indent != -1)
> break;
> m->pre_blank += 1;
> @@ -508,7 +508,7 @@ static void measure_split(const xdfile_t *xdf, long split,
> m->post_blank = 0;
> m->post_indent = -1;
> for (i = split + 1; i < xdf->nrec; i++) {
> - m->post_indent = get_indent(xdf->recs[i]);
> + m->post_indent = get_indent(&xdf->recs[i]);
> if (m->post_indent != -1)
> break;
> m->post_blank += 1;
> @@ -752,7 +752,7 @@ static inline int group_previous(xdfile_t *xdf, struct xdlgroup *g)
> static int group_slide_down(xdfile_t *xdf, struct xdlgroup *g)
> {
> if (g->end < xdf->nrec &&
> - recs_match(xdf->recs[g->start], xdf->recs[g->end])) {
> + recs_match(&xdf->recs[g->start], &xdf->recs[g->end])) {
> xdf->rchg[g->start++] = 0;
> xdf->rchg[g->end++] = 1;
>
> @@ -773,7 +773,7 @@ static int group_slide_down(xdfile_t *xdf, struct xdlgroup *g)
> static int group_slide_up(xdfile_t *xdf, struct xdlgroup *g)
> {
> if (g->start > 0 &&
> - recs_match(xdf->recs[g->start - 1], xdf->recs[g->end - 1])) {
> + recs_match(&xdf->recs[g->start - 1], &xdf->recs[g->end - 1])) {
> xdf->rchg[--g->start] = 1;
> xdf->rchg[--g->end] = 0;
>
> @@ -988,16 +988,16 @@ static void xdl_mark_ignorable_lines(xdchange_t *xscr, xdfenv_t *xe, long flags)
>
> for (xch = xscr; xch; xch = xch->next) {
> int ignore = 1;
> - xrecord_t **rec;
> + xrecord_t *rec;
> long i;
>
> rec = &xe->xdf1.recs[xch->i1];
> for (i = 0; i < xch->chg1 && ignore; i++)
> - ignore = xdl_blankline(rec[i]->ptr, rec[i]->size, flags);
> + ignore = xdl_blankline(rec[i].ptr, rec[i].size, flags);
>
> rec = &xe->xdf2.recs[xch->i2];
> for (i = 0; i < xch->chg2 && ignore; i++)
> - ignore = xdl_blankline(rec[i]->ptr, rec[i]->size, flags);
> + ignore = xdl_blankline(rec[i].ptr, rec[i].size, flags);
>
> xch->ignore = ignore;
> }
> @@ -1021,7 +1021,7 @@ static void xdl_mark_ignorable_regex(xdchange_t *xscr, const xdfenv_t *xe,
> xdchange_t *xch;
>
> for (xch = xscr; xch; xch = xch->next) {
> - xrecord_t **rec;
> + xrecord_t *rec;
> int ignore = 1;
> long i;
>
> @@ -1033,11 +1033,11 @@ static void xdl_mark_ignorable_regex(xdchange_t *xscr, const xdfenv_t *xe,
>
> rec = &xe->xdf1.recs[xch->i1];
> for (i = 0; i < xch->chg1 && ignore; i++)
> - ignore = record_matches_regex(rec[i], xpp);
> + ignore = record_matches_regex(&rec[i], xpp);
>
> rec = &xe->xdf2.recs[xch->i2];
> for (i = 0; i < xch->chg2 && ignore; i++)
> - ignore = record_matches_regex(rec[i], xpp);
> + ignore = record_matches_regex(&rec[i], xpp);
>
> xch->ignore = ignore;
> }
> diff --git a/xdiff/xemit.c b/xdiff/xemit.c
> index 2161ac3cd0..b2f1f30cd3 100644
> --- a/xdiff/xemit.c
> +++ b/xdiff/xemit.c
> @@ -25,7 +25,7 @@
>
> static int xdl_emit_record(xdfile_t *xdf, long ri, char const *pre, xdemitcb_t *ecb)
> {
> - xrecord_t *rec = xdf->recs[ri];
> + xrecord_t *rec = &xdf->recs[ri];
>
> if (xdl_emit_diffrec(rec->ptr, rec->size, pre, strlen(pre), ecb) < 0)
> return -1;
> @@ -110,7 +110,7 @@ static long def_ff(const char *rec, long len, char *buf, long sz)
> static long match_func_rec(xdfile_t *xdf, xdemitconf_t const *xecfg, long ri,
> char *buf, long sz)
> {
> - xrecord_t *rec = xdf->recs[ri];
> + xrecord_t *rec = &xdf->recs[ri];
>
> if (!xecfg->find_func)
> return def_ff(rec->ptr, rec->size, buf, sz);
> @@ -150,7 +150,7 @@ static long get_func_line(xdfenv_t *xe, xdemitconf_t const *xecfg,
>
> static int is_empty_rec(xdfile_t *xdf, long ri)
> {
> - xrecord_t *rec = xdf->recs[ri];
> + xrecord_t *rec = &xdf->recs[ri];
> long i = 0;
>
> for (; i < rec->size && XDL_ISSPACE(rec->ptr[i]); i++);
> diff --git a/xdiff/xhistogram.c b/xdiff/xhistogram.c
> index 040d81e0bc..4d857e8ae2 100644
> --- a/xdiff/xhistogram.c
> +++ b/xdiff/xhistogram.c
> @@ -86,7 +86,7 @@ struct region {
> ((LINE_MAP(index, ptr))->cnt)
>
> #define REC(env, s, l) \
> - (env->xdf##s.recs[l - 1])
> + (&env->xdf##s.recs[l - 1])
>
> static int cmp_recs(xrecord_t *r1, xrecord_t *r2)
> {
> diff --git a/xdiff/xmerge.c b/xdiff/xmerge.c
> index af40c88a5b..fd600cbb5d 100644
> --- a/xdiff/xmerge.c
> +++ b/xdiff/xmerge.c
> @@ -97,12 +97,12 @@ static int xdl_merge_cmp_lines(xdfenv_t *xe1, int i1, xdfenv_t *xe2, int i2,
> int line_count, long flags)
> {
> int i;
> - xrecord_t **rec1 = xe1->xdf2.recs + i1;
> - xrecord_t **rec2 = xe2->xdf2.recs + i2;
> + xrecord_t *rec1 = xe1->xdf2.recs + i1;
> + xrecord_t *rec2 = xe2->xdf2.recs + i2;
>
> for (i = 0; i < line_count; i++) {
> - int result = xdl_recmatch(rec1[i]->ptr, rec1[i]->size,
> - rec2[i]->ptr, rec2[i]->size, flags);
> + int result = xdl_recmatch(rec1[i].ptr, rec1[i].size,
> + rec2[i].ptr, rec2[i].size, flags);
> if (!result)
> return -1;
> }
> @@ -111,7 +111,7 @@ static int xdl_merge_cmp_lines(xdfenv_t *xe1, int i1, xdfenv_t *xe2, int i2,
>
> static int xdl_recs_copy_0(int use_orig, xdfenv_t *xe, int i, int count, int needs_cr, int add_nl, char *dest)
> {
> - xrecord_t **recs;
> + xrecord_t *recs;
> int size = 0;
>
> recs = (use_orig ? xe->xdf1.recs : xe->xdf2.recs) + i;
> @@ -119,12 +119,12 @@ static int xdl_recs_copy_0(int use_orig, xdfenv_t *xe, int i, int count, int nee
> if (count < 1)
> return 0;
>
> - for (i = 0; i < count; size += recs[i++]->size)
> + for (i = 0; i < count; size += recs[i++].size)
> if (dest)
> - memcpy(dest + size, recs[i]->ptr, recs[i]->size);
> + memcpy(dest + size, recs[i].ptr, recs[i].size);
> if (add_nl) {
> - i = recs[count - 1]->size;
> - if (i == 0 || recs[count - 1]->ptr[i - 1] != '\n') {
> + i = recs[count - 1].size;
> + if (i == 0 || recs[count - 1].ptr[i - 1] != '\n') {
> if (needs_cr) {
> if (dest)
> dest[size] = '\r';
> @@ -160,22 +160,22 @@ static int is_eol_crlf(xdfile_t *file, int i)
>
> if (i < file->nrec - 1)
> /* All lines before the last *must* end in LF */
> - return (size = file->recs[i]->size) > 1 &&
> - file->recs[i]->ptr[size - 2] == '\r';
> + return (size = file->recs[i].size) > 1 &&
> + file->recs[i].ptr[size - 2] == '\r';
> if (!file->nrec)
> /* Cannot determine eol style from empty file */
> return -1;
> - if ((size = file->recs[i]->size) &&
> - file->recs[i]->ptr[size - 1] == '\n')
> + if ((size = file->recs[i].size) &&
> + file->recs[i].ptr[size - 1] == '\n')
> /* Last line; ends in LF; Is it CR/LF? */
> return size > 1 &&
> - file->recs[i]->ptr[size - 2] == '\r';
> + file->recs[i].ptr[size - 2] == '\r';
> if (!i)
> /* The only line has no eol */
> return -1;
> /* Determine eol from second-to-last line */
> - return (size = file->recs[i - 1]->size) > 1 &&
> - file->recs[i - 1]->ptr[size - 2] == '\r';
> + return (size = file->recs[i - 1].size) > 1 &&
> + file->recs[i - 1].ptr[size - 2] == '\r';
> }
>
> static int is_cr_needed(xdfenv_t *xe1, xdfenv_t *xe2, xdmerge_t *m)
> @@ -334,22 +334,22 @@ static int recmatch(xrecord_t *rec1, xrecord_t *rec2, unsigned long flags)
> static void xdl_refine_zdiff3_conflicts(xdfenv_t *xe1, xdfenv_t *xe2, xdmerge_t *m,
> xpparam_t const *xpp)
> {
> - xrecord_t **rec1 = xe1->xdf2.recs, **rec2 = xe2->xdf2.recs;
> + xrecord_t *rec1 = xe1->xdf2.recs, *rec2 = xe2->xdf2.recs;
> for (; m; m = m->next) {
> /* let's handle just the conflicts */
> if (m->mode)
> continue;
>
> while(m->chg1 && m->chg2 &&
> - recmatch(rec1[m->i1], rec2[m->i2], xpp->flags)) {
> + recmatch(&rec1[m->i1], &rec2[m->i2], xpp->flags)) {
> m->chg1--;
> m->chg2--;
> m->i1++;
> m->i2++;
> }
> while (m->chg1 && m->chg2 &&
> - recmatch(rec1[m->i1 + m->chg1 - 1],
> - rec2[m->i2 + m->chg2 - 1], xpp->flags)) {
> + recmatch(&rec1[m->i1 + m->chg1 - 1],
> + &rec2[m->i2 + m->chg2 - 1], xpp->flags)) {
> m->chg1--;
> m->chg2--;
> }
> @@ -381,12 +381,12 @@ static int xdl_refine_conflicts(xdfenv_t *xe1, xdfenv_t *xe2, xdmerge_t *m,
> * This probably does not work outside git, since
> * we have a very simple mmfile structure.
> */
> - t1.ptr = (char *)xe1->xdf2.recs[m->i1]->ptr;
> - t1.size = xe1->xdf2.recs[m->i1 + m->chg1 - 1]->ptr
> - + xe1->xdf2.recs[m->i1 + m->chg1 - 1]->size - t1.ptr;
> - t2.ptr = (char *)xe2->xdf2.recs[m->i2]->ptr;
> - t2.size = xe2->xdf2.recs[m->i2 + m->chg2 - 1]->ptr
> - + xe2->xdf2.recs[m->i2 + m->chg2 - 1]->size - t2.ptr;
> + t1.ptr = (char *)xe1->xdf2.recs[m->i1].ptr;
> + t1.size = xe1->xdf2.recs[m->i1 + m->chg1 - 1].ptr
> + + xe1->xdf2.recs[m->i1 + m->chg1 - 1].size - t1.ptr;
> + t2.ptr = (char *)xe2->xdf2.recs[m->i2].ptr;
> + t2.size = xe2->xdf2.recs[m->i2 + m->chg2 - 1].ptr
> + + xe2->xdf2.recs[m->i2 + m->chg2 - 1].size - t2.ptr;
> if (xdl_do_diff(&t1, &t2, xpp, &xe) < 0)
> return -1;
> if (xdl_change_compact(&xe.xdf1, &xe.xdf2, xpp->flags) < 0 ||
> @@ -440,8 +440,8 @@ static int line_contains_alnum(const char *ptr, long size)
> static int lines_contain_alnum(xdfenv_t *xe, int i, int chg)
> {
> for (; chg; chg--, i++)
> - if (line_contains_alnum(xe->xdf2.recs[i]->ptr,
> - xe->xdf2.recs[i]->size))
> + if (line_contains_alnum(xe->xdf2.recs[i].ptr,
> + xe->xdf2.recs[i].size))
> return 1;
> return 0;
> }
> diff --git a/xdiff/xpatience.c b/xdiff/xpatience.c
> index 77dc411d19..bf69a58527 100644
> --- a/xdiff/xpatience.c
> +++ b/xdiff/xpatience.c
> @@ -88,9 +88,9 @@ static int is_anchor(xpparam_t const *xpp, const char *line)
> static void insert_record(xpparam_t const *xpp, int line, struct hashmap *map,
> int pass)
> {
> - xrecord_t **records = pass == 1 ?
> + xrecord_t *records = pass == 1 ?
> map->env->xdf1.recs : map->env->xdf2.recs;
> - xrecord_t *record = records[line - 1];
> + xrecord_t *record = &records[line - 1];
> /*
> * After xdl_prepare_env() (or more precisely, due to
> * xdl_classify_record()), the "ha" member of the records (AKA lines)
> @@ -121,7 +121,7 @@ static void insert_record(xpparam_t const *xpp, int line, struct hashmap *map,
> return;
> map->entries[index].line1 = line;
> map->entries[index].hash = record->ha;
> - map->entries[index].anchor = is_anchor(xpp, map->env->xdf1.recs[line - 1]->ptr);
> + map->entries[index].anchor = is_anchor(xpp, map->env->xdf1.recs[line - 1].ptr);
> if (!map->first)
> map->first = map->entries + index;
> if (map->last) {
> @@ -246,8 +246,8 @@ static int find_longest_common_sequence(struct hashmap *map, struct entry **res)
>
> static int match(struct hashmap *map, int line1, int line2)
> {
> - xrecord_t *record1 = map->env->xdf1.recs[line1 - 1];
> - xrecord_t *record2 = map->env->xdf2.recs[line2 - 1];
> + xrecord_t *record1 = &map->env->xdf1.recs[line1 - 1];
> + xrecord_t *record2 = &map->env->xdf2.recs[line2 - 1];
> return record1->ha == record2->ha;
> }
>
> diff --git a/xdiff/xprepare.c b/xdiff/xprepare.c
> index 6f1d4b4725..92f9845003 100644
> --- a/xdiff/xprepare.c
> +++ b/xdiff/xprepare.c
> @@ -131,7 +131,6 @@ static void xdl_free_ctx(xdfile_t *xdf)
> xdl_free(xdf->rindex);
> xdl_free(xdf->rchg - 1);
> xdl_free(xdf->recs);
> - xdl_cha_free(&xdf->rcha);
> }
>
>
> @@ -146,8 +145,6 @@ static int xdl_prepare_ctx(unsigned int pass, mmfile_t *mf, long narec, xpparam_
> xdf->rchg = NULL;
> xdf->recs = NULL;
>
> - if (xdl_cha_init(&xdf->rcha, sizeof(xrecord_t), narec / 4 + 1) < 0)
> - goto abort;
> if (!XDL_ALLOC_ARRAY(xdf->recs, narec))
> goto abort;
>
> @@ -158,12 +155,10 @@ static int xdl_prepare_ctx(unsigned int pass, mmfile_t *mf, long narec, xpparam_
> hav = xdl_hash_record(&cur, top, xpp->flags);
> if (XDL_ALLOC_GROW(xdf->recs, xdf->nrec + 1, narec))
> goto abort;
> - if (!(crec = xdl_cha_alloc(&xdf->rcha)))
> - goto abort;
> + crec = &xdf->recs[xdf->nrec++];
> crec->ptr = prev;
> crec->size = (long) (cur - prev);
> crec->ha = hav;
> - xdf->recs[xdf->nrec++] = crec;
> if (xdl_classify_record(pass, cf, crec) < 0)
> goto abort;
> }
> @@ -263,7 +258,7 @@ static int xdl_clean_mmatch(char const *dis, long i, long s, long e) {
> */
> static int xdl_cleanup_records(xdlclassifier_t *cf, xdfile_t *xdf1, xdfile_t *xdf2) {
> long i, nm, nreff, mlim;
> - xrecord_t **recs;
> + xrecord_t *recs;
> xdlclass_t *rcrec;
> char *dis, *dis1, *dis2;
> int need_min = !!(cf->flags & XDF_NEED_MINIMAL);
> @@ -276,7 +271,7 @@ static int xdl_cleanup_records(xdlclassifier_t *cf, xdfile_t *xdf1, xdfile_t *xd
> if ((mlim = xdl_bogosqrt(xdf1->nrec)) > XDL_MAX_EQLIMIT)
> mlim = XDL_MAX_EQLIMIT;
> for (i = xdf1->dstart, recs = &xdf1->recs[xdf1->dstart]; i <= xdf1->dend; i++, recs++) {
> - rcrec = cf->rcrecs[(*recs)->ha];
> + rcrec = cf->rcrecs[recs->ha];
> nm = rcrec ? rcrec->len2 : 0;
> dis1[i] = (nm == 0) ? 0: (nm >= mlim && !need_min) ? 2: 1;
> }
> @@ -284,7 +279,7 @@ static int xdl_cleanup_records(xdlclassifier_t *cf, xdfile_t *xdf1, xdfile_t *xd
> if ((mlim = xdl_bogosqrt(xdf2->nrec)) > XDL_MAX_EQLIMIT)
> mlim = XDL_MAX_EQLIMIT;
> for (i = xdf2->dstart, recs = &xdf2->recs[xdf2->dstart]; i <= xdf2->dend; i++, recs++) {
> - rcrec = cf->rcrecs[(*recs)->ha];
> + rcrec = cf->rcrecs[recs->ha];
> nm = rcrec ? rcrec->len1 : 0;
> dis2[i] = (nm == 0) ? 0: (nm >= mlim && !need_min) ? 2: 1;
> }
> @@ -320,13 +315,13 @@ static int xdl_cleanup_records(xdlclassifier_t *cf, xdfile_t *xdf1, xdfile_t *xd
> */
> static int xdl_trim_ends(xdfile_t *xdf1, xdfile_t *xdf2) {
> long i, lim;
> - xrecord_t **recs1, **recs2;
> + xrecord_t *recs1, *recs2;
>
> recs1 = xdf1->recs;
> recs2 = xdf2->recs;
> for (i = 0, lim = XDL_MIN(xdf1->nrec, xdf2->nrec); i < lim;
> i++, recs1++, recs2++)
> - if ((*recs1)->ha != (*recs2)->ha)
> + if (recs1->ha != recs2->ha)
> break;
>
> xdf1->dstart = xdf2->dstart = i;
> @@ -334,7 +329,7 @@ static int xdl_trim_ends(xdfile_t *xdf1, xdfile_t *xdf2) {
> recs1 = xdf1->recs + xdf1->nrec - 1;
> recs2 = xdf2->recs + xdf2->nrec - 1;
> for (lim -= i, i = 0; i < lim; i++, recs1--, recs2--)
> - if ((*recs1)->ha != (*recs2)->ha)
> + if (recs1->ha != recs2->ha)
> break;
>
> xdf1->dend = xdf1->nrec - i - 1;
> diff --git a/xdiff/xtypes.h b/xdiff/xtypes.h
> index 85848f1685..3d26cbf1ec 100644
> --- a/xdiff/xtypes.h
> +++ b/xdiff/xtypes.h
> @@ -45,10 +45,9 @@ typedef struct s_xrecord {
> } xrecord_t;
>
> typedef struct s_xdfile {
> - chastore_t rcha;
> + xrecord_t *recs;
> long nrec;
> long dstart, dend;
> - xrecord_t **recs;
> char *rchg;
> long *rindex;
> long nreff;
> diff --git a/xdiff/xutils.c b/xdiff/xutils.c
> index 444a108f87..332982b509 100644
> --- a/xdiff/xutils.c
> +++ b/xdiff/xutils.c
> @@ -416,12 +416,12 @@ int xdl_fall_back_diff(xdfenv_t *diff_env, xpparam_t const *xpp,
> mmfile_t subfile1, subfile2;
> xdfenv_t env;
>
> - subfile1.ptr = (char *)diff_env->xdf1.recs[line1 - 1]->ptr;
> - subfile1.size = diff_env->xdf1.recs[line1 + count1 - 2]->ptr +
> - diff_env->xdf1.recs[line1 + count1 - 2]->size - subfile1.ptr;
> - subfile2.ptr = (char *)diff_env->xdf2.recs[line2 - 1]->ptr;
> - subfile2.size = diff_env->xdf2.recs[line2 + count2 - 2]->ptr +
> - diff_env->xdf2.recs[line2 + count2 - 2]->size - subfile2.ptr;
> + subfile1.ptr = (char *)diff_env->xdf1.recs[line1 - 1].ptr;
> + subfile1.size = diff_env->xdf1.recs[line1 + count1 - 2].ptr +
> + diff_env->xdf1.recs[line1 + count1 - 2].size - subfile1.ptr;
> + subfile2.ptr = (char *)diff_env->xdf2.recs[line2 - 1].ptr;
> + subfile2.size = diff_env->xdf2.recs[line2 + count2 - 2].ptr +
> + diff_env->xdf2.recs[line2 + count2 - 2].size - subfile2.ptr;
> if (xdl_do_diff(&subfile1, &subfile2, xpp, &env) < 0)
> return -1;
>
> --
> gitgitgadget
You weren't kidding with the --color-words callout; there's an awful
lot of places where you only change one or two characters (e.g. '->'
becoming '.'); that's much easier to see when viewing the diff with
that flag.
Anyway, looks good.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
On the Git mailing list, Phillip Wood wrote (reply to this):
On 09/09/2025 09:58, Elijah Newren wrote:
> On Sun, Sep 7, 2025 at 12:46 PM Ezekiel Newren via GitGitGadget
> <gitgitgadget@gmail.com> wrote:
>>
>> From: Ezekiel Newren <ezekielnewren@gmail.com>
>> The chastore_t type is very unfriendly to Rust FFI. It's also redundant
>> since 'recs' is a vector type that grows every time an xrecord_t is
>> added.
> > The second sentence seems to presume the reader knows what chastore_t
> type is for, and about the confusing dual layering between it and
> recs.its confusing dual layering. I liked your more extended
> explanation in https://lore.kernel.org/git/7ea2dccd71fc502f20614ce217fc9885d1b17413.1756496539.git.gitgitgadget@gmail.com/;
> could some of that be used here?
I agree that's a better explaination. I also think it would be helpful to spell out the implications of this change. If I understand the change correctly we now store all the records in a contiguous array, rather than having the records in a arena and storing a separate array of pointers to those records. As sizeof(xrecord_t) is pretty small the change to contiguous storage hopefully wont cause any allocation issues, though I guess it does mean we end up copying more data as we grow the array compared to using an arena.
Overall these first few patches look like a really nice cleanup.
Thanks
Phillip
>>
>> Signed-off-by: Ezekiel Newren <ezekielnewren@gmail.com>
>> ---
>> xdiff/xdiffi.c | 24 ++++++++++----------
>> xdiff/xemit.c | 6 ++---
>> xdiff/xhistogram.c | 2 +-
>> xdiff/xmerge.c | 56 +++++++++++++++++++++++-----------------------
>> xdiff/xpatience.c | 10 ++++-----
>> xdiff/xprepare.c | 19 ++++++----------
>> xdiff/xtypes.h | 3 +--
>> xdiff/xutils.c | 12 +++++-----
>> 8 files changed, 63 insertions(+), 69 deletions(-)
>>
>> diff --git a/xdiff/xdiffi.c b/xdiff/xdiffi.c
>> index 11cd090b53..a66125d44a 100644
>> --- a/xdiff/xdiffi.c
>> +++ b/xdiff/xdiffi.c
>> @@ -24,7 +24,7 @@
>>
>> static unsigned long get_hash(xdfile_t *xdf, long index)
>> {
>> - return xdf->recs[xdf->rindex[index]]->ha;
>> + return xdf->recs[xdf->rindex[index]].ha;
>> }
>>
>> #define XDL_MAX_COST_MIN 256
>> @@ -489,13 +489,13 @@ static void measure_split(const xdfile_t *xdf, long split,
>> m->indent = -1;
>> } else {
>> m->end_of_file = 0;
>> - m->indent = get_indent(xdf->recs[split]);
>> + m->indent = get_indent(&xdf->recs[split]);
>> }
>>
>> m->pre_blank = 0;
>> m->pre_indent = -1;
>> for (i = split - 1; i >= 0; i--) {
>> - m->pre_indent = get_indent(xdf->recs[i]);
>> + m->pre_indent = get_indent(&xdf->recs[i]);
>> if (m->pre_indent != -1)
>> break;
>> m->pre_blank += 1;
>> @@ -508,7 +508,7 @@ static void measure_split(const xdfile_t *xdf, long split,
>> m->post_blank = 0;
>> m->post_indent = -1;
>> for (i = split + 1; i < xdf->nrec; i++) {
>> - m->post_indent = get_indent(xdf->recs[i]);
>> + m->post_indent = get_indent(&xdf->recs[i]);
>> if (m->post_indent != -1)
>> break;
>> m->post_blank += 1;
>> @@ -752,7 +752,7 @@ static inline int group_previous(xdfile_t *xdf, struct xdlgroup *g)
>> static int group_slide_down(xdfile_t *xdf, struct xdlgroup *g)
>> {
>> if (g->end < xdf->nrec &&
>> - recs_match(xdf->recs[g->start], xdf->recs[g->end])) {
>> + recs_match(&xdf->recs[g->start], &xdf->recs[g->end])) {
>> xdf->rchg[g->start++] = 0;
>> xdf->rchg[g->end++] = 1;
>>
>> @@ -773,7 +773,7 @@ static int group_slide_down(xdfile_t *xdf, struct xdlgroup *g)
>> static int group_slide_up(xdfile_t *xdf, struct xdlgroup *g)
>> {
>> if (g->start > 0 &&
>> - recs_match(xdf->recs[g->start - 1], xdf->recs[g->end - 1])) {
>> + recs_match(&xdf->recs[g->start - 1], &xdf->recs[g->end - 1])) {
>> xdf->rchg[--g->start] = 1;
>> xdf->rchg[--g->end] = 0;
>>
>> @@ -988,16 +988,16 @@ static void xdl_mark_ignorable_lines(xdchange_t *xscr, xdfenv_t *xe, long flags)
>>
>> for (xch = xscr; xch; xch = xch->next) {
>> int ignore = 1;
>> - xrecord_t **rec;
>> + xrecord_t *rec;
>> long i;
>>
>> rec = &xe->xdf1.recs[xch->i1];
>> for (i = 0; i < xch->chg1 && ignore; i++)
>> - ignore = xdl_blankline(rec[i]->ptr, rec[i]->size, flags);
>> + ignore = xdl_blankline(rec[i].ptr, rec[i].size, flags);
>>
>> rec = &xe->xdf2.recs[xch->i2];
>> for (i = 0; i < xch->chg2 && ignore; i++)
>> - ignore = xdl_blankline(rec[i]->ptr, rec[i]->size, flags);
>> + ignore = xdl_blankline(rec[i].ptr, rec[i].size, flags);
>>
>> xch->ignore = ignore;
>> }
>> @@ -1021,7 +1021,7 @@ static void xdl_mark_ignorable_regex(xdchange_t *xscr, const xdfenv_t *xe,
>> xdchange_t *xch;
>>
>> for (xch = xscr; xch; xch = xch->next) {
>> - xrecord_t **rec;
>> + xrecord_t *rec;
>> int ignore = 1;
>> long i;
>>
>> @@ -1033,11 +1033,11 @@ static void xdl_mark_ignorable_regex(xdchange_t *xscr, const xdfenv_t *xe,
>>
>> rec = &xe->xdf1.recs[xch->i1];
>> for (i = 0; i < xch->chg1 && ignore; i++)
>> - ignore = record_matches_regex(rec[i], xpp);
>> + ignore = record_matches_regex(&rec[i], xpp);
>>
>> rec = &xe->xdf2.recs[xch->i2];
>> for (i = 0; i < xch->chg2 && ignore; i++)
>> - ignore = record_matches_regex(rec[i], xpp);
>> + ignore = record_matches_regex(&rec[i], xpp);
>>
>> xch->ignore = ignore;
>> }
>> diff --git a/xdiff/xemit.c b/xdiff/xemit.c
>> index 2161ac3cd0..b2f1f30cd3 100644
>> --- a/xdiff/xemit.c
>> +++ b/xdiff/xemit.c
>> @@ -25,7 +25,7 @@
>>
>> static int xdl_emit_record(xdfile_t *xdf, long ri, char const *pre, xdemitcb_t *ecb)
>> {
>> - xrecord_t *rec = xdf->recs[ri];
>> + xrecord_t *rec = &xdf->recs[ri];
>>
>> if (xdl_emit_diffrec(rec->ptr, rec->size, pre, strlen(pre), ecb) < 0)
>> return -1;
>> @@ -110,7 +110,7 @@ static long def_ff(const char *rec, long len, char *buf, long sz)
>> static long match_func_rec(xdfile_t *xdf, xdemitconf_t const *xecfg, long ri,
>> char *buf, long sz)
>> {
>> - xrecord_t *rec = xdf->recs[ri];
>> + xrecord_t *rec = &xdf->recs[ri];
>>
>> if (!xecfg->find_func)
>> return def_ff(rec->ptr, rec->size, buf, sz);
>> @@ -150,7 +150,7 @@ static long get_func_line(xdfenv_t *xe, xdemitconf_t const *xecfg,
>>
>> static int is_empty_rec(xdfile_t *xdf, long ri)
>> {
>> - xrecord_t *rec = xdf->recs[ri];
>> + xrecord_t *rec = &xdf->recs[ri];
>> long i = 0;
>>
>> for (; i < rec->size && XDL_ISSPACE(rec->ptr[i]); i++);
>> diff --git a/xdiff/xhistogram.c b/xdiff/xhistogram.c
>> index 040d81e0bc..4d857e8ae2 100644
>> --- a/xdiff/xhistogram.c
>> +++ b/xdiff/xhistogram.c
>> @@ -86,7 +86,7 @@ struct region {
>> ((LINE_MAP(index, ptr))->cnt)
>>
>> #define REC(env, s, l) \
>> - (env->xdf##s.recs[l - 1])
>> + (&env->xdf##s.recs[l - 1])
>>
>> static int cmp_recs(xrecord_t *r1, xrecord_t *r2)
>> {
>> diff --git a/xdiff/xmerge.c b/xdiff/xmerge.c
>> index af40c88a5b..fd600cbb5d 100644
>> --- a/xdiff/xmerge.c
>> +++ b/xdiff/xmerge.c
>> @@ -97,12 +97,12 @@ static int xdl_merge_cmp_lines(xdfenv_t *xe1, int i1, xdfenv_t *xe2, int i2,
>> int line_count, long flags)
>> {
>> int i;
>> - xrecord_t **rec1 = xe1->xdf2.recs + i1;
>> - xrecord_t **rec2 = xe2->xdf2.recs + i2;
>> + xrecord_t *rec1 = xe1->xdf2.recs + i1;
>> + xrecord_t *rec2 = xe2->xdf2.recs + i2;
>>
>> for (i = 0; i < line_count; i++) {
>> - int result = xdl_recmatch(rec1[i]->ptr, rec1[i]->size,
>> - rec2[i]->ptr, rec2[i]->size, flags);
>> + int result = xdl_recmatch(rec1[i].ptr, rec1[i].size,
>> + rec2[i].ptr, rec2[i].size, flags);
>> if (!result)
>> return -1;
>> }
>> @@ -111,7 +111,7 @@ static int xdl_merge_cmp_lines(xdfenv_t *xe1, int i1, xdfenv_t *xe2, int i2,
>>
>> static int xdl_recs_copy_0(int use_orig, xdfenv_t *xe, int i, int count, int needs_cr, int add_nl, char *dest)
>> {
>> - xrecord_t **recs;
>> + xrecord_t *recs;
>> int size = 0;
>>
>> recs = (use_orig ? xe->xdf1.recs : xe->xdf2.recs) + i;
>> @@ -119,12 +119,12 @@ static int xdl_recs_copy_0(int use_orig, xdfenv_t *xe, int i, int count, int nee
>> if (count < 1)
>> return 0;
>>
>> - for (i = 0; i < count; size += recs[i++]->size)
>> + for (i = 0; i < count; size += recs[i++].size)
>> if (dest)
>> - memcpy(dest + size, recs[i]->ptr, recs[i]->size);
>> + memcpy(dest + size, recs[i].ptr, recs[i].size);
>> if (add_nl) {
>> - i = recs[count - 1]->size;
>> - if (i == 0 || recs[count - 1]->ptr[i - 1] != '\n') {
>> + i = recs[count - 1].size;
>> + if (i == 0 || recs[count - 1].ptr[i - 1] != '\n') {
>> if (needs_cr) {
>> if (dest)
>> dest[size] = '\r';
>> @@ -160,22 +160,22 @@ static int is_eol_crlf(xdfile_t *file, int i)
>>
>> if (i < file->nrec - 1)
>> /* All lines before the last *must* end in LF */
>> - return (size = file->recs[i]->size) > 1 &&
>> - file->recs[i]->ptr[size - 2] == '\r';
>> + return (size = file->recs[i].size) > 1 &&
>> + file->recs[i].ptr[size - 2] == '\r';
>> if (!file->nrec)
>> /* Cannot determine eol style from empty file */
>> return -1;
>> - if ((size = file->recs[i]->size) &&
>> - file->recs[i]->ptr[size - 1] == '\n')
>> + if ((size = file->recs[i].size) &&
>> + file->recs[i].ptr[size - 1] == '\n')
>> /* Last line; ends in LF; Is it CR/LF? */
>> return size > 1 &&
>> - file->recs[i]->ptr[size - 2] == '\r';
>> + file->recs[i].ptr[size - 2] == '\r';
>> if (!i)
>> /* The only line has no eol */
>> return -1;
>> /* Determine eol from second-to-last line */
>> - return (size = file->recs[i - 1]->size) > 1 &&
>> - file->recs[i - 1]->ptr[size - 2] == '\r';
>> + return (size = file->recs[i - 1].size) > 1 &&
>> + file->recs[i - 1].ptr[size - 2] == '\r';
>> }
>>
>> static int is_cr_needed(xdfenv_t *xe1, xdfenv_t *xe2, xdmerge_t *m)
>> @@ -334,22 +334,22 @@ static int recmatch(xrecord_t *rec1, xrecord_t *rec2, unsigned long flags)
>> static void xdl_refine_zdiff3_conflicts(xdfenv_t *xe1, xdfenv_t *xe2, xdmerge_t *m,
>> xpparam_t const *xpp)
>> {
>> - xrecord_t **rec1 = xe1->xdf2.recs, **rec2 = xe2->xdf2.recs;
>> + xrecord_t *rec1 = xe1->xdf2.recs, *rec2 = xe2->xdf2.recs;
>> for (; m; m = m->next) {
>> /* let's handle just the conflicts */
>> if (m->mode)
>> continue;
>>
>> while(m->chg1 && m->chg2 &&
>> - recmatch(rec1[m->i1], rec2[m->i2], xpp->flags)) {
>> + recmatch(&rec1[m->i1], &rec2[m->i2], xpp->flags)) {
>> m->chg1--;
>> m->chg2--;
>> m->i1++;
>> m->i2++;
>> }
>> while (m->chg1 && m->chg2 &&
>> - recmatch(rec1[m->i1 + m->chg1 - 1],
>> - rec2[m->i2 + m->chg2 - 1], xpp->flags)) {
>> + recmatch(&rec1[m->i1 + m->chg1 - 1],
>> + &rec2[m->i2 + m->chg2 - 1], xpp->flags)) {
>> m->chg1--;
>> m->chg2--;
>> }
>> @@ -381,12 +381,12 @@ static int xdl_refine_conflicts(xdfenv_t *xe1, xdfenv_t *xe2, xdmerge_t *m,
>> * This probably does not work outside git, since
>> * we have a very simple mmfile structure.
>> */
>> - t1.ptr = (char *)xe1->xdf2.recs[m->i1]->ptr;
>> - t1.size = xe1->xdf2.recs[m->i1 + m->chg1 - 1]->ptr
>> - + xe1->xdf2.recs[m->i1 + m->chg1 - 1]->size - t1.ptr;
>> - t2.ptr = (char *)xe2->xdf2.recs[m->i2]->ptr;
>> - t2.size = xe2->xdf2.recs[m->i2 + m->chg2 - 1]->ptr
>> - + xe2->xdf2.recs[m->i2 + m->chg2 - 1]->size - t2.ptr;
>> + t1.ptr = (char *)xe1->xdf2.recs[m->i1].ptr;
>> + t1.size = xe1->xdf2.recs[m->i1 + m->chg1 - 1].ptr
>> + + xe1->xdf2.recs[m->i1 + m->chg1 - 1].size - t1.ptr;
>> + t2.ptr = (char *)xe2->xdf2.recs[m->i2].ptr;
>> + t2.size = xe2->xdf2.recs[m->i2 + m->chg2 - 1].ptr
>> + + xe2->xdf2.recs[m->i2 + m->chg2 - 1].size - t2.ptr;
>> if (xdl_do_diff(&t1, &t2, xpp, &xe) < 0)
>> return -1;
>> if (xdl_change_compact(&xe.xdf1, &xe.xdf2, xpp->flags) < 0 ||
>> @@ -440,8 +440,8 @@ static int line_contains_alnum(const char *ptr, long size)
>> static int lines_contain_alnum(xdfenv_t *xe, int i, int chg)
>> {
>> for (; chg; chg--, i++)
>> - if (line_contains_alnum(xe->xdf2.recs[i]->ptr,
>> - xe->xdf2.recs[i]->size))
>> + if (line_contains_alnum(xe->xdf2.recs[i].ptr,
>> + xe->xdf2.recs[i].size))
>> return 1;
>> return 0;
>> }
>> diff --git a/xdiff/xpatience.c b/xdiff/xpatience.c
>> index 77dc411d19..bf69a58527 100644
>> --- a/xdiff/xpatience.c
>> +++ b/xdiff/xpatience.c
>> @@ -88,9 +88,9 @@ static int is_anchor(xpparam_t const *xpp, const char *line)
>> static void insert_record(xpparam_t const *xpp, int line, struct hashmap *map,
>> int pass)
>> {
>> - xrecord_t **records = pass == 1 ?
>> + xrecord_t *records = pass == 1 ?
>> map->env->xdf1.recs : map->env->xdf2.recs;
>> - xrecord_t *record = records[line - 1];
>> + xrecord_t *record = &records[line - 1];
>> /*
>> * After xdl_prepare_env() (or more precisely, due to
>> * xdl_classify_record()), the "ha" member of the records (AKA lines)
>> @@ -121,7 +121,7 @@ static void insert_record(xpparam_t const *xpp, int line, struct hashmap *map,
>> return;
>> map->entries[index].line1 = line;
>> map->entries[index].hash = record->ha;
>> - map->entries[index].anchor = is_anchor(xpp, map->env->xdf1.recs[line - 1]->ptr);
>> + map->entries[index].anchor = is_anchor(xpp, map->env->xdf1.recs[line - 1].ptr);
>> if (!map->first)
>> map->first = map->entries + index;
>> if (map->last) {
>> @@ -246,8 +246,8 @@ static int find_longest_common_sequence(struct hashmap *map, struct entry **res)
>>
>> static int match(struct hashmap *map, int line1, int line2)
>> {
>> - xrecord_t *record1 = map->env->xdf1.recs[line1 - 1];
>> - xrecord_t *record2 = map->env->xdf2.recs[line2 - 1];
>> + xrecord_t *record1 = &map->env->xdf1.recs[line1 - 1];
>> + xrecord_t *record2 = &map->env->xdf2.recs[line2 - 1];
>> return record1->ha == record2->ha;
>> }
>>
>> diff --git a/xdiff/xprepare.c b/xdiff/xprepare.c
>> index 6f1d4b4725..92f9845003 100644
>> --- a/xdiff/xprepare.c
>> +++ b/xdiff/xprepare.c
>> @@ -131,7 +131,6 @@ static void xdl_free_ctx(xdfile_t *xdf)
>> xdl_free(xdf->rindex);
>> xdl_free(xdf->rchg - 1);
>> xdl_free(xdf->recs);
>> - xdl_cha_free(&xdf->rcha);
>> }
>>
>>
>> @@ -146,8 +145,6 @@ static int xdl_prepare_ctx(unsigned int pass, mmfile_t *mf, long narec, xpparam_
>> xdf->rchg = NULL;
>> xdf->recs = NULL;
>>
>> - if (xdl_cha_init(&xdf->rcha, sizeof(xrecord_t), narec / 4 + 1) < 0)
>> - goto abort;
>> if (!XDL_ALLOC_ARRAY(xdf->recs, narec))
>> goto abort;
>>
>> @@ -158,12 +155,10 @@ static int xdl_prepare_ctx(unsigned int pass, mmfile_t *mf, long narec, xpparam_
>> hav = xdl_hash_record(&cur, top, xpp->flags);
>> if (XDL_ALLOC_GROW(xdf->recs, xdf->nrec + 1, narec))
>> goto abort;
>> - if (!(crec = xdl_cha_alloc(&xdf->rcha)))
>> - goto abort;
>> + crec = &xdf->recs[xdf->nrec++];
>> crec->ptr = prev;
>> crec->size = (long) (cur - prev);
>> crec->ha = hav;
>> - xdf->recs[xdf->nrec++] = crec;
>> if (xdl_classify_record(pass, cf, crec) < 0)
>> goto abort;
>> }
>> @@ -263,7 +258,7 @@ static int xdl_clean_mmatch(char const *dis, long i, long s, long e) {
>> */
>> static int xdl_cleanup_records(xdlclassifier_t *cf, xdfile_t *xdf1, xdfile_t *xdf2) {
>> long i, nm, nreff, mlim;
>> - xrecord_t **recs;
>> + xrecord_t *recs;
>> xdlclass_t *rcrec;
>> char *dis, *dis1, *dis2;
>> int need_min = !!(cf->flags & XDF_NEED_MINIMAL);
>> @@ -276,7 +271,7 @@ static int xdl_cleanup_records(xdlclassifier_t *cf, xdfile_t *xdf1, xdfile_t *xd
>> if ((mlim = xdl_bogosqrt(xdf1->nrec)) > XDL_MAX_EQLIMIT)
>> mlim = XDL_MAX_EQLIMIT;
>> for (i = xdf1->dstart, recs = &xdf1->recs[xdf1->dstart]; i <= xdf1->dend; i++, recs++) {
>> - rcrec = cf->rcrecs[(*recs)->ha];
>> + rcrec = cf->rcrecs[recs->ha];
>> nm = rcrec ? rcrec->len2 : 0;
>> dis1[i] = (nm == 0) ? 0: (nm >= mlim && !need_min) ? 2: 1;
>> }
>> @@ -284,7 +279,7 @@ static int xdl_cleanup_records(xdlclassifier_t *cf, xdfile_t *xdf1, xdfile_t *xd
>> if ((mlim = xdl_bogosqrt(xdf2->nrec)) > XDL_MAX_EQLIMIT)
>> mlim = XDL_MAX_EQLIMIT;
>> for (i = xdf2->dstart, recs = &xdf2->recs[xdf2->dstart]; i <= xdf2->dend; i++, recs++) {
>> - rcrec = cf->rcrecs[(*recs)->ha];
>> + rcrec = cf->rcrecs[recs->ha];
>> nm = rcrec ? rcrec->len1 : 0;
>> dis2[i] = (nm == 0) ? 0: (nm >= mlim && !need_min) ? 2: 1;
>> }
>> @@ -320,13 +315,13 @@ static int xdl_cleanup_records(xdlclassifier_t *cf, xdfile_t *xdf1, xdfile_t *xd
>> */
>> static int xdl_trim_ends(xdfile_t *xdf1, xdfile_t *xdf2) {
>> long i, lim;
>> - xrecord_t **recs1, **recs2;
>> + xrecord_t *recs1, *recs2;
>>
>> recs1 = xdf1->recs;
>> recs2 = xdf2->recs;
>> for (i = 0, lim = XDL_MIN(xdf1->nrec, xdf2->nrec); i < lim;
>> i++, recs1++, recs2++)
>> - if ((*recs1)->ha != (*recs2)->ha)
>> + if (recs1->ha != recs2->ha)
>> break;
>>
>> xdf1->dstart = xdf2->dstart = i;
>> @@ -334,7 +329,7 @@ static int xdl_trim_ends(xdfile_t *xdf1, xdfile_t *xdf2) {
>> recs1 = xdf1->recs + xdf1->nrec - 1;
>> recs2 = xdf2->recs + xdf2->nrec - 1;
>> for (lim -= i, i = 0; i < lim; i++, recs1--, recs2--)
>> - if ((*recs1)->ha != (*recs2)->ha)
>> + if (recs1->ha != recs2->ha)
>> break;
>>
>> xdf1->dend = xdf1->nrec - i - 1;
>> diff --git a/xdiff/xtypes.h b/xdiff/xtypes.h
>> index 85848f1685..3d26cbf1ec 100644
>> --- a/xdiff/xtypes.h
>> +++ b/xdiff/xtypes.h
>> @@ -45,10 +45,9 @@ typedef struct s_xrecord {
>> } xrecord_t;
>>
>> typedef struct s_xdfile {
>> - chastore_t rcha;
>> + xrecord_t *recs;
>> long nrec;
>> long dstart, dend;
>> - xrecord_t **recs;
>> char *rchg;
>> long *rindex;
>> long nreff;
>> diff --git a/xdiff/xutils.c b/xdiff/xutils.c
>> index 444a108f87..332982b509 100644
>> --- a/xdiff/xutils.c
>> +++ b/xdiff/xutils.c
>> @@ -416,12 +416,12 @@ int xdl_fall_back_diff(xdfenv_t *diff_env, xpparam_t const *xpp,
>> mmfile_t subfile1, subfile2;
>> xdfenv_t env;
>>
>> - subfile1.ptr = (char *)diff_env->xdf1.recs[line1 - 1]->ptr;
>> - subfile1.size = diff_env->xdf1.recs[line1 + count1 - 2]->ptr +
>> - diff_env->xdf1.recs[line1 + count1 - 2]->size - subfile1.ptr;
>> - subfile2.ptr = (char *)diff_env->xdf2.recs[line2 - 1]->ptr;
>> - subfile2.size = diff_env->xdf2.recs[line2 + count2 - 2]->ptr +
>> - diff_env->xdf2.recs[line2 + count2 - 2]->size - subfile2.ptr;
>> + subfile1.ptr = (char *)diff_env->xdf1.recs[line1 - 1].ptr;
>> + subfile1.size = diff_env->xdf1.recs[line1 + count1 - 2].ptr +
>> + diff_env->xdf1.recs[line1 + count1 - 2].size - subfile1.ptr;
>> + subfile2.ptr = (char *)diff_env->xdf2.recs[line2 - 1].ptr;
>> + subfile2.size = diff_env->xdf2.recs[line2 + count2 - 2].ptr +
>> + diff_env->xdf2.recs[line2 + count2 - 2].size - subfile2.ptr;
>> if (xdl_do_diff(&subfile1, &subfile2, xpp, &env) < 0)
>> return -1;
>>
>> --
>> gitgitgadget
> > You weren't kidding with the --color-words callout; there's an awful
> lot of places where you only change one or two characters (e.g. '->'
> becoming '.'); that's much easier to see when viewing the diff with
> that flag.
> > Anyway, looks good.
>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
On the Git mailing list, Junio C Hamano wrote (reply to this):
Elijah Newren <newren@gmail.com> writes:
> On Sun, Sep 7, 2025 at 12:46 PM Ezekiel Newren via GitGitGadget
> <gitgitgadget@gmail.com> wrote:
>>
>> From: Ezekiel Newren <ezekielnewren@gmail.com>
>
> My personal bias is that things like "view with --color-words" makes
> more sense to include near the end of the commit message, just before
> the sign-offs. Not sure if others agree on that.
FWIW, I am with you. Certainly not on the title. It is even fine
immediately below the three-dash line before the diffstat.
Thanks. I am enjoying to follow along the series by reading your
reviews.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
On the Git mailing list, Ben Knoble wrote (reply to this):
> Le 9 sept. 2025 à 04:58, Elijah Newren <newren@gmail.com> a écrit :
>
> On Sun, Sep 7, 2025 at 12:46 PM Ezekiel Newren via GitGitGadget
> <gitgitgadget@gmail.com> wrote:
>>
>> From: Ezekiel Newren <ezekielnewren@gmail.com>
>
> My personal bias is that things like "view with --color-words" makes
> more sense to include near the end of the commit message, just before
> the sign-offs. Not sure if others agree on that.
I’ve been using a Best-viewed-with trailer; I saw Peff do something similar once, I think.
#define XDIFF_H | ||
|
||
#ifdef __cplusplus | ||
extern "C" { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
On the Git mailing list, Elijah Newren wrote (reply to this):
On Sun, Sep 7, 2025 at 12:46 PM Ezekiel Newren via GitGitGadget
<gitgitgadget@gmail.com> wrote:
>
> From: Ezekiel Newren <ezekielnewren@gmail.com>
>
> Define macros NO(0), YES(1), MAYBE(2) as the enum values for rchg to
> make the code easier to follow. Perhaps 'rchg' should be renamed to
> 'changed'?
>
> Signed-off-by: Ezekiel Newren <ezekielnewren@gmail.com>
> ---
> xdiff/xdiff.h | 4 ++++
> xdiff/xdiffi.c | 29 ++++++++++++++---------------
> xdiff/xhistogram.c | 8 ++++----
> xdiff/xpatience.c | 8 ++++----
> xdiff/xprepare.c | 24 ++++++++++++------------
> 5 files changed, 38 insertions(+), 35 deletions(-)
>
> diff --git a/xdiff/xdiff.h b/xdiff/xdiff.h
> index 2cecde5afe..7092879829 100644
> --- a/xdiff/xdiff.h
> +++ b/xdiff/xdiff.h
> @@ -27,6 +27,10 @@
> extern "C" {
> #endif /* #ifdef __cplusplus */
>
> +#define NO 0
> +#define YES 1
> +#define MAYBE 2
> +
> /* xpparm_t.flags */
> #define XDF_NEED_MINIMAL (1 << 0)
>
> diff --git a/xdiff/xdiffi.c b/xdiff/xdiffi.c
> index a66125d44a..44fd27823a 100644
> --- a/xdiff/xdiffi.c
> +++ b/xdiff/xdiffi.c
> @@ -278,10 +278,10 @@ int xdl_recs_cmp(xdfile_t *xdf1, long off1, long lim1,
> */
> if (off1 == lim1) {
> for (; off2 < lim2; off2++)
> - xdf2->rchg[xdf2->rindex[off2]] = 1;
> + xdf2->rchg[xdf2->rindex[off2]] = YES;
> } else if (off2 == lim2) {
> for (; off1 < lim1; off1++)
> - xdf1->rchg[xdf1->rindex[off1]] = 1;
> + xdf1->rchg[xdf1->rindex[off1]] = YES;
> } else {
> xdpsplit_t spl;
> spl.i1 = spl.i2 = 0;
> @@ -708,7 +708,7 @@ struct xdlgroup {
> static void group_init(xdfile_t *xdf, struct xdlgroup *g)
> {
> g->start = g->end = 0;
> - while (xdf->rchg[g->end])
> + while (xdf->rchg[g->end] == YES)
You've got a few places like this where the old code would have
behaved differently if there were some MAYBE values. I presume you've
carefully vetted that those can't happen at these points in the code,
but it might be worth calling that out in the commit message for
reviewers who'll otherwise wonder if there's a behavior change that
has occurred.
> g->end++;
> }
>
> @@ -722,7 +722,7 @@ static inline int group_next(xdfile_t *xdf, struct xdlgroup *g)
> return -1;
>
> g->start = g->end + 1;
> - for (g->end = g->start; xdf->rchg[g->end]; g->end++)
> + for (g->end = g->start; xdf->rchg[g->end] == YES; g->end++)
Here's another of those where you assume MAYBE isn't possible.
> ;
>
> return 0;
> @@ -738,7 +738,7 @@ static inline int group_previous(xdfile_t *xdf, struct xdlgroup *g)
> return -1;
>
> g->end = g->start - 1;
> - for (g->start = g->end; xdf->rchg[g->start - 1]; g->start--)
> + for (g->start = g->end; xdf->rchg[g->start - 1] == YES; g->start--)
...and another.
> ;
>
> return 0;
> @@ -753,10 +753,10 @@ static int group_slide_down(xdfile_t *xdf, struct xdlgroup *g)
> {
> if (g->end < xdf->nrec &&
> recs_match(&xdf->recs[g->start], &xdf->recs[g->end])) {
> - xdf->rchg[g->start++] = 0;
> - xdf->rchg[g->end++] = 1;
> + xdf->rchg[g->start++] = NO;
> + xdf->rchg[g->end++] = YES;
>
> - while (xdf->rchg[g->end])
> + while (xdf->rchg[g->end] == YES)
...and another.
> g->end++;
>
> return 0;
> @@ -774,10 +774,10 @@ static int group_slide_up(xdfile_t *xdf, struct xdlgroup *g)
> {
> if (g->start > 0 &&
> recs_match(&xdf->recs[g->start - 1], &xdf->recs[g->end - 1])) {
> - xdf->rchg[--g->start] = 1;
> - xdf->rchg[--g->end] = 0;
> + xdf->rchg[--g->start] = YES;
> + xdf->rchg[--g->end] = NO;
>
> - while (xdf->rchg[g->start - 1])
> + while (xdf->rchg[g->start - 1] == YES)
...and another.
> g->start--;
>
> return 0;
> @@ -932,16 +932,15 @@ int xdl_change_compact(xdfile_t *xdf, xdfile_t *xdfo, long flags) {
>
> int xdl_build_script(xdfenv_t *xe, xdchange_t **xscr) {
> xdchange_t *cscr = NULL, *xch;
> - char *rchg1 = xe->xdf1.rchg, *rchg2 = xe->xdf2.rchg;
> long i1, i2, l1, l2;
>
> /*
> * Trivial. Collects "groups" of changes and creates an edit script.
> */
> for (i1 = xe->xdf1.nrec, i2 = xe->xdf2.nrec; i1 >= 0 || i2 >= 0; i1--, i2--)
> - if (rchg1[i1 - 1] || rchg2[i2 - 1]) {
> - for (l1 = i1; rchg1[i1 - 1]; i1--);
> - for (l2 = i2; rchg2[i2 - 1]; i2--);
> + if (xe->xdf1.rchg[i1 - 1] || xe->xdf2.rchg[i2 - 1]) {
> + for (l1 = i1; xe->xdf1.rchg[i1 - 1]; i1--);
> + for (l2 = i2; xe->xdf2.rchg[i2 - 1]; i2--);
The changes in this xdl_build_script() function appear to be
orthogonal to what was described in the commit message. If it's a
separate cleanup, perhaps justify it in a separate patch?
>
> if (!(xch = xdl_add_change(cscr, i1, i2, l1 - i1, l2 - i2))) {
> xdl_free_script(cscr);
> diff --git a/xdiff/xhistogram.c b/xdiff/xhistogram.c
> index 4d857e8ae2..c2e85b8ab9 100644
> --- a/xdiff/xhistogram.c
> +++ b/xdiff/xhistogram.c
> @@ -318,11 +318,11 @@ redo:
>
> if (!count1) {
> while(count2--)
> - env->xdf2.rchg[line2++ - 1] = 1;
> + env->xdf2.rchg[line2++ - 1] = YES;
> return 0;
> } else if (!count2) {
> while(count1--)
> - env->xdf1.rchg[line1++ - 1] = 1;
> + env->xdf1.rchg[line1++ - 1] = YES;
> return 0;
> }
>
> @@ -335,9 +335,9 @@ redo:
> else {
> if (lcs.begin1 == 0 && lcs.begin2 == 0) {
> while (count1--)
> - env->xdf1.rchg[line1++ - 1] = 1;
> + env->xdf1.rchg[line1++ - 1] = YES;
> while (count2--)
> - env->xdf2.rchg[line2++ - 1] = 1;
> + env->xdf2.rchg[line2++ - 1] = YES;
> result = 0;
> } else {
> result = histogram_diff(xpp, env,
> diff --git a/xdiff/xpatience.c b/xdiff/xpatience.c
> index bf69a58527..20cda5e258 100644
> --- a/xdiff/xpatience.c
> +++ b/xdiff/xpatience.c
> @@ -331,11 +331,11 @@ static int patience_diff(xpparam_t const *xpp, xdfenv_t *env,
> /* trivial case: one side is empty */
> if (!count1) {
> while(count2--)
> - env->xdf2.rchg[line2++ - 1] = 1;
> + env->xdf2.rchg[line2++ - 1] = YES;
> return 0;
> } else if (!count2) {
> while(count1--)
> - env->xdf1.rchg[line1++ - 1] = 1;
> + env->xdf1.rchg[line1++ - 1] = YES;
> return 0;
> }
>
> @@ -347,9 +347,9 @@ static int patience_diff(xpparam_t const *xpp, xdfenv_t *env,
> /* are there any matching lines at all? */
> if (!map.has_matches) {
> while(count1--)
> - env->xdf1.rchg[line1++ - 1] = 1;
> + env->xdf1.rchg[line1++ - 1] = YES;
> while(count2--)
> - env->xdf2.rchg[line2++ - 1] = 1;
> + env->xdf2.rchg[line2++ - 1] = YES;
> xdl_free(map.entries);
> return 0;
> }
> diff --git a/xdiff/xprepare.c b/xdiff/xprepare.c
> index 92f9845003..36437f91bb 100644
> --- a/xdiff/xprepare.c
> +++ b/xdiff/xprepare.c
> @@ -215,9 +215,9 @@ static int xdl_clean_mmatch(char const *dis, long i, long s, long e) {
> * current line (i) is already a multimatch line.
> */
> for (r = 1, rdis0 = 0, rpdis0 = 1; (i - r) >= s; r++) {
> - if (!dis[i - r])
> + if (dis[i - r] == NO)
> rdis0++;
> - else if (dis[i - r] == 2)
> + else if (dis[i - r] == MAYBE)
> rpdis0++;
> else
> break;
> @@ -231,9 +231,9 @@ static int xdl_clean_mmatch(char const *dis, long i, long s, long e) {
> if (rdis0 == 0)
> return 0;
> for (r = 1, rdis1 = 0, rpdis1 = 1; (i + r) <= e; r++) {
> - if (!dis[i + r])
> + if (dis[i + r] == NO)
> rdis1++;
> - else if (dis[i + r] == 2)
> + else if (dis[i + r] == MAYBE)
> rpdis1++;
> else
> break;
> @@ -273,7 +273,7 @@ static int xdl_cleanup_records(xdlclassifier_t *cf, xdfile_t *xdf1, xdfile_t *xd
> for (i = xdf1->dstart, recs = &xdf1->recs[xdf1->dstart]; i <= xdf1->dend; i++, recs++) {
> rcrec = cf->rcrecs[recs->ha];
> nm = rcrec ? rcrec->len2 : 0;
> - dis1[i] = (nm == 0) ? 0: (nm >= mlim && !need_min) ? 2: 1;
> + dis1[i] = (nm == 0) ? NO: (nm >= mlim && !need_min) ? MAYBE: YES;
> }
>
> if ((mlim = xdl_bogosqrt(xdf2->nrec)) > XDL_MAX_EQLIMIT)
> @@ -281,26 +281,26 @@ static int xdl_cleanup_records(xdlclassifier_t *cf, xdfile_t *xdf1, xdfile_t *xd
> for (i = xdf2->dstart, recs = &xdf2->recs[xdf2->dstart]; i <= xdf2->dend; i++, recs++) {
> rcrec = cf->rcrecs[recs->ha];
> nm = rcrec ? rcrec->len1 : 0;
> - dis2[i] = (nm == 0) ? 0: (nm >= mlim && !need_min) ? 2: 1;
> + dis2[i] = (nm == 0) ? NO: (nm >= mlim && !need_min) ? MAYBE: YES;
> }
>
> for (nreff = 0, i = xdf1->dstart, recs = &xdf1->recs[xdf1->dstart];
> i <= xdf1->dend; i++, recs++) {
> - if (dis1[i] == 1 ||
> - (dis1[i] == 2 && !xdl_clean_mmatch(dis1, i, xdf1->dstart, xdf1->dend))) {
> + if (dis1[i] == YES ||
> + (dis1[i] == MAYBE && !xdl_clean_mmatch(dis1, i, xdf1->dstart, xdf1->dend))) {
> xdf1->rindex[nreff++] = i;
> } else
> - xdf1->rchg[i] = 1;
> + xdf1->rchg[i] = YES;
> }
> xdf1->nreff = nreff;
>
> for (nreff = 0, i = xdf2->dstart, recs = &xdf2->recs[xdf2->dstart];
> i <= xdf2->dend; i++, recs++) {
> - if (dis2[i] == 1 ||
> - (dis2[i] == 2 && !xdl_clean_mmatch(dis2, i, xdf2->dstart, xdf2->dend))) {
> + if (dis2[i] == YES ||
> + (dis2[i] == MAYBE && !xdl_clean_mmatch(dis2, i, xdf2->dstart, xdf2->dend))) {
> xdf2->rindex[nreff++] = i;
> } else
> - xdf2->rchg[i] = 1;
> + xdf2->rchg[i] = YES;
> }
> xdf2->nreff = nreff;
>
> --
> gitgitgadget
Everything else looks like the straightforward translation you called
out in your commit message.
User |
Every field in this struct is an alias for a certain field in xdfile_t. diffdata_t.nrec -> xdfile_t.nreff diffdata_t.ha -> xdfile_t.ha diffdata_t.rindex -> xdfile_t.rindex diffdata_t.rchg -> xdfile_t.rchg I think this struct existed before xdfile_t, and was kept for backward compatibility reasons. I think xdiffi should have been refactored to use the new (xdfile_t) struct, but was easier to alias it instead. The local variables rchg* and rindex* don't shorten the lines by much, nor do they really need to be there to make the code more readable. Delete them. Signed-off-by: Ezekiel Newren <ezekielnewren@gmail.com>
When 0 <= i < xdfile_t.nreff the following is true: xdfile_t.ha[i] == xdfile_t.recs[xdfile_t.rindex[i]] This makes the code about 5% slower. The fields rindex and ha are specific to the classic diff (myers and minimal). I plan on creating a struct for classic diff, but there's a lot of cleanup that needs to be done before that can happen and leaving ha in would make those cleanups harder to follow. A subsequent commit will delete the chastore cha from xdfile_t. That later commit will investigate deleting ha and cha independently and together. Signed-off-by: Ezekiel Newren <ezekielnewren@gmail.com>
…ord_t The fields from xdlclass_t are aliases of xrecord_t: xdlclass_t.line -> xrecord_t.ptr xdlclass_t.size -> xrecord_t.size xdlclass_t.ha -> xrecord_t.ha xdlclass_t carries a copy of the data in xrecord_t, but instead of embedding xrecord_t it duplicates the individual fields. A future commit will change the types used in xrecord_t so embed it in xdlclass_t first, so we don't have to remember to change the types here as well. Best-viewed-with: --color-words Helped-by: Phillip Wood <phillip.wood123@gmail.com> Signed-off-by: Ezekiel Newren <ezekielnewren@gmail.com>
xdfile_t currently uses chastore_t which is an arena allocator. I think that xrecord_t used to be a linked list and recs didn't exist originally. When recs was added I think they forgot to remove xdfile_t.next, but was overlooked. This dual data structure setup makes the code somewhat confusing. Additionally the C type chastore_t isn't FFI friendly, and provides little to no performance benefit over using realloc to grow an array. Performance impact of deleting fields from xdfile_t: Deleting ha is about 5% slower. Deleting cha is about 5% faster. Delete ha, but keep cha time hyperfine --warmup 3 -L exe build_v2.51.0/git,build_delete_ha/git '{exe} log --oneline --shortstat --diff-algorithm=myers -3000 v2.39.1 >/dev/null' Benchmark 1: build_v2.51.0/git log --oneline --shortstat --diff-algorithm=myers -3000 v2.39.1 >/dev/null Time (mean ± σ): 1.269 s ± 0.017 s [User: 1.135 s, System: 0.128 s] Range (min … max): 1.249 s … 1.286 s 10 runs Benchmark 2: build_delete_ha/git log --oneline --shortstat --diff-algorithm=myers -3000 v2.39.1 >/dev/null Time (mean ± σ): 1.339 s ± 0.017 s [User: 1.234 s, System: 0.099 s] Range (min … max): 1.320 s … 1.358 s 10 runs Summary build_v2.51.0/git log --oneline --shortstat --diff-algorithm=myers -3000 v2.39.1 >/dev/null ran 1.06 ± 0.02 times faster than build_delete_ha/git log --oneline --shortstat --diff-algorithm=myers -3000 v2.39.1 >/dev/null Delete cha, but keep ha time hyperfine --warmup 3 -L exe build_v2.51.0/git,build_delete_chastore/git '{exe} log --oneline --shortstat --diff-algorithm=myers -3000 v2.39.1 >/dev/null' Benchmark 1: build_v2.51.0/git log --oneline --shortstat --diff-algorithm=myers -3000 v2.39.1 >/dev/null Time (mean ± σ): 1.290 s ± 0.001 s [User: 1.154 s, System: 0.130 s] Range (min … max): 1.288 s … 1.292 s 10 runs Benchmark 2: build_delete_chastore/git log --oneline --shortstat --diff-algorithm=myers -3000 v2.39.1 >/dev/null Time (mean ± σ): 1.232 s ± 0.017 s [User: 1.105 s, System: 0.121 s] Range (min … max): 1.205 s … 1.249 s 10 runs Summary build_delete_chastore/git log --oneline --shortstat --diff-algorithm=myers -3000 v2.39.1 >/dev/null ran 1.05 ± 0.01 times faster than build_v2.51.0/git log --oneline --shortstat --diff-algorithm=myers -3000 v2.39.1 >/dev/null Delete ha AND chastore time hyperfine --warmup 3 -L exe build_v2.51.0/git,build_delete_ha_and_chastore/git '{exe} log --oneline --shortstat --diff-algorithm=myers -3000 v2.39.1 >/dev/null' Benchmark 1: build_v2.51.0/git log --oneline --shortstat --diff-algorithm=myers -3000 v2.39.1 >/dev/null Time (mean ± σ): 1.291 s ± 0.002 s [User: 1.156 s, System: 0.129 s] Range (min … max): 1.287 s … 1.295 s 10 runs Benchmark 2: build_delete_ha_and_chastore/git log --oneline --shortstat --diff-algorithm=myers -3000 v2.39.1 >/dev/null Time (mean ± σ): 1.306 s ± 0.001 s [User: 1.195 s, System: 0.105 s] Range (min … max): 1.305 s … 1.308 s 10 runs Summary build_v2.51.0/git log --oneline --shortstat --diff-algorithm=myers -3000 v2.39.1 >/dev/null ran 1.01 ± 0.00 times faster than build_delete_ha_and_chastore/git log --oneline --shortstat --diff-algorithm=myers -3000 v2.39.1 >/dev/null Best-viewed-with: --color-words Signed-off-by: Ezekiel Newren <ezekielnewren@gmail.com>
The field rchg (now 'changed') declares if a line in a file is changed or not. A later commit will change it's type from 'char' to 'bool' to make its purpose even more clear. Best-viewed-with: --color-words Signed-off-by: Ezekiel Newren <ezekielnewren@gmail.com>
This commit is refactor-only; no behavior is changed. A future commit will use bool literals for changed[i]. The functions xdl_clean_mmatch() and xdl_cleanup_records() will be cleaned up more in a future patch series. The changes to xdl_cleanup_records(), in this patch, is just to make it clear why `char rchg` is refactored to `bool changed`. Rename dis* to action* and replace literal numericals with macros. The old names came from when dis* (which I think was short for discard) was treated like a boolean, but over time it grew into a ternary state machine. The result was confusing because dis* and rchg* both used 0/1 values with different meanings. The new names and macros make the states explicit. nm is short for number of matches, and mlim is a heuristic limit: nm == 0 -> action[i] = DISCARD -> changed[i] = true 0 < nm < mlim -> action[i] = KEEP -> changed[i] = false nm >= mlim -> action[i] = INVESTIGATE -> changed[i] = xdl_clean_mmatch() When need_min is true, only DISCARD and KEEP occur because the limit is effectively infinite. Best-viewed-with: --color-words Signed-off-by: Ezekiel Newren <ezekielnewren@gmail.com>
a9b9c9f
to
226a04a
Compare
The only values possible for 'changed' is 1 and 0, which exactly maps to a bool type. It might not look like this because action1 and action2 (which use to be dis1, and dis2) were also of type char and were assigned numerical values within a few lines of 'changed' (what used to be rchg). Using DISCARD/KEEP/INVESTIGATE for action1[i]/action2[j], and true/false for changed[k] makes it clear to future readers that these are logically separate concepts. Best-viewed-with: --color-words Signed-off-by: Ezekiel Newren <ezekielnewren@gmail.com>
226a04a
to
83e1ace
Compare
/submit |
Submitted as pull.2048.v6.git.git.1758926520.gitgitgadget@gmail.com To fetch this version into
To fetch this version to local tag
|
This patch series was integrated into seen via 989d856. |
This patch series was integrated into seen via f2e5abc. |
There was a status update in the "Cooking" section about the branch A lot of code clean-up of xdiff. Split out of a larger topic. Will merge to 'next'? source: <pull.2048.v6.git.git.1758926520.gitgitgadget@gmail.com> |
This patch series was integrated into seen via a95b449. |
@@ -22,23 +22,14 @@ | |||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
On the Git mailing list, "Kristoffer Haugsbakk" wrote (reply to this):
On Tue, Sep 23, 2025, at 23:24, Ezekiel Newren via GitGitGadget wrote:
> From: Ezekiel Newren <ezekielnewren@gmail.com>
>
> When xrecord_t was a linked list, and recs didn't exist, I assume this
> function walked the list until it found the right record. Accessing
> a contiguous array is so trival that this function is now superfluous.
s/trival/trivial/
> Delete it.
>
> Signed-off-by: Ezekiel Newren <ezekielnewren@gmail.com>
> ---
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
On the Git mailing list, Ezekiel Newren wrote (reply to this):
On Tue, Sep 30, 2025 at 7:31 AM Kristoffer Haugsbakk
<kristofferhaugsbakk@fastmail.com> wrote:
> > When xrecord_t was a linked list, and recs didn't exist, I assume this
> > function walked the list until it found the right record. Accessing
> > a contiguous array is so trival that this function is now superfluous.
>
> s/trival/trivial/
I think that other than this typo this patch series is ready to be
merged in. I would prefer that Junio fix this typo, so I don't spam
the mailing list with such a small change.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
On the Git mailing list, Junio C Hamano wrote (reply to this):
Ezekiel Newren <ezekielnewren@gmail.com> writes:
> On Tue, Sep 30, 2025 at 7:31 AM Kristoffer Haugsbakk
> <kristofferhaugsbakk@fastmail.com> wrote:
>> > When xrecord_t was a linked list, and recs didn't exist, I assume this
>> > function walked the list until it found the right record. Accessing
>> > a contiguous array is so trival that this function is now superfluous.
>>
>> s/trival/trivial/
>
> I think that other than this typo this patch series is ready to be
> merged in. I would prefer that Junio fix this typo, so I don't spam
> the mailing list with such a small change.
If it is only this instance, I can "rebase -i" it away, sure.
Thanks.
User |
This patch series was integrated into seen via 8b3dcb6. |
This patch series was integrated into seen via 9e98a08. |
On the Git mailing list, Phillip Wood wrote (reply to this): Hi Ezekiel
On 26/09/2025 23:41, Ezekiel Newren via GitGitGadget wrote:
> Changes since v5.
> > * Address review feedback on commit messages.
> * Drop commit "xdiff: delete rchg aliasing"
> * Use DISCARD/KEEP/INVESTIGATE instead of NONE/SOME/TOO_MANY
> * Fix the word wrapping in the comments of xprepare.c
Thanks for expanding the commit messages, I think the range diff looks good. There's a typo in patch 12 (see below) but its not worth re-rolling just for that.
> Range-diff vs v5:
> [...] > 12: 08a0fceb72 ! 11: f08782a977 xdiff: use enum macros NONE(0), SOME(1), TOO_MANY(2) in xprepare.c
> @@ Metadata
> Author: Ezekiel Newren <ezekielnewren@gmail.com>
> > ## Commit message ##
> - xdiff: use enum macros NONE(0), SOME(1), TOO_MANY(2) in xprepare.c
> + xdiff: add macros DISCARD(0), KEEP(1), INVESTIGATE(2) in xprepare.c
> > - Rename dis1, dis2 to matches1, matches2.
> + This commit is refactor-only; no behavior is changed. A future commit
> + will use bool literals for changed[i].
> > - Define macros NONE(0), SOME(1), TOO_MANY(2) as the enum values for
> - matches1 and matches2. These states will influence whether changed[i]
> - is set to 1 or kept as 0.
> + The functions xdl_clean_mmatch() and xdl_cleanup_records() will be
> + cleaned up more in a future patch series. The changes to
> + xdl_cleanup_records(), in this patch, is just to make it clear why
Not worth a re-roll on its own s/is/are/
Thanks for working on this
Phillip
> + `char rchg` is refactored to `bool changed`.
> +
> + Rename dis* to action* and replace literal numericals with macros.
> + The old names came from when dis* (which I think was short for discard)
> + was treated like a boolean, but over time it grew into a ternary state
> + machine. The result was confusing because dis* and rchg* both used 0/1
> + values with different meanings.
> +
> + The new names and macros make the states explicit. nm is short for
> + number of matches, and mlim is a heuristic limit:
> +
> + nm == 0 -> action[i] = DISCARD -> changed[i] = true
> + 0 < nm < mlim -> action[i] = KEEP -> changed[i] = false
> + nm >= mlim -> action[i] = INVESTIGATE -> changed[i] = xdl_clean_mmatch()
> +
> + When need_min is true, only DISCARD and KEEP occur because the limit
> + is effectively infinite.
> > Best-viewed-with: --color-words
> Signed-off-by: Ezekiel Newren <ezekielnewren@gmail.com>
> @@ xdiff/xprepare.c
> #define XDL_GUESS_NLINES1 256
> #define XDL_GUESS_NLINES2 20
> > -+#define NONE 0
> -+#define SOME 1
> -+#define TOO_MANY 2
> ++#define DISCARD 0
> ++#define KEEP 1
> ++#define INVESTIGATE 2
> > typedef struct s_xdlclass {
> struct s_xdlclass *next;
> @@ xdiff/xprepare.c: void xdl_free_env(xdfenv_t *xe) {
> > > -static int xdl_clean_mmatch(char const *dis, long i, long s, long e) {
> -+static bool xdl_clean_mmatch(uint8_t const *matches, long i, long s, long e) {
> ++static bool xdl_clean_mmatch(uint8_t const *action, long i, long s, long e) {
> long r, rdis0, rpdis0, rdis1, rpdis1;
> > /*
> - * Limits the window the is examined during the similar-lines
> - * scan. The loops below stops when dis[i - r] == 1 (line that
> +- * has no match), but there are corner cases where the loop
> +- * proceed all the way to the extremities by causing huge
> +- * performance penalties in case of big files.
> + * Limits the window that is examined during the similar-lines
> -+ * scan. The loops below stops when matches[i - r] == SOME (line that
> - * has no match), but there are corner cases where the loop
> - * proceed all the way to the extremities by causing huge
> - * performance penalties in case of big files.
> ++ * scan. The loops below stops when action[i - r] == KEEP
> ++ * (line that has no match), but there are corner cases where
> ++ * the loop proceed all the way to the extremities by causing
> ++ * huge performance penalties in case of big files.
> + */
> + if (i - s > XDL_SIMSCAN_WINDOW)
> + s = i - XDL_SIMSCAN_WINDOW;
> @@ xdiff/xprepare.c: static int xdl_clean_mmatch(char const *dis, long i, long s, long e) {
> > /*
> * Scans the lines before 'i' to find a run of lines that either
> - * have no match (dis[j] == 0) or have multiple matches (dis[j] > 1).
> - * Note that we always call this function with dis[i] > 1, so the
> -+ * have no match (matches[j] == NONE) or have multiple matches (matches[j] == TOO_MANY).
> -+ * Note that we always call this function with matches[i] == TOO_MANY, so the
> - * current line (i) is already a multimatch line.
> +- * current line (i) is already a multimatch line.
> ++ * have no match (action[j] == DISCARD) or have multiple matches
> ++ * (action[j] == INVESTIGATE). Note that we always call this
> ++ * function with action[i] == INVESTIGATE, so the current line
> ++ * (i) is already a multimatch line.
> */
> for (r = 1, rdis0 = 0, rpdis0 = 1; (i - r) >= s; r++) {
> - if (!dis[i - r])
> -+ if (matches[i - r] == NONE)
> ++ if (action[i - r] == DISCARD)
> rdis0++;
> - else if (dis[i - r] == 2)
> -+ else if (matches[i - r] == TOO_MANY)
> ++ else if (action[i - r] == INVESTIGATE)
> rpdis0++;
> - else
> -+ else if (matches[i - r] == SOME)
> ++ else if (action[i - r] == KEEP)
> break;
> + else
> -+ BUG("Illegal value for matches[i - r]");
> ++ BUG("Illegal value for action[i - r]");
> }
> /*
> - * If the run before the line 'i' found only multimatch lines, we
> +- * If the run before the line 'i' found only multimatch lines, we
> - * return 0 and hence we don't make the current line (i) discarded.
> -+ * return false and hence we don't make the current line (i) discarded.
> - * We want to discard multimatch lines only when they appear in the
> +- * We want to discard multimatch lines only when they appear in the
> - * middle of runs with nomatch lines (dis[j] == 0).
> -+ * middle of runs with nomatch lines (matches[j] == NONE).
> ++ * If the run before the line 'i' found only multimatch lines,
> ++ * we return false and hence we don't make the current line (i)
> ++ * discarded. We want to discard multimatch lines only when
> ++ * they appear in the middle of runs with nomatch lines
> ++ * (action[j] == DISCARD).
> */
> if (rdis0 == 0)
> return 0;
> for (r = 1, rdis1 = 0, rpdis1 = 1; (i + r) <= e; r++) {
> - if (!dis[i + r])
> -+ if (matches[i + r] == NONE)
> ++ if (action[i + r] == DISCARD)
> rdis1++;
> - else if (dis[i + r] == 2)
> -+ else if (matches[i + r] == TOO_MANY)
> ++ else if (action[i + r] == INVESTIGATE)
> rpdis1++;
> - else
> -+ else if (matches[i + r] == SOME)
> ++ else if (action[i + r] == KEEP)
> break;
> + else
> -+ BUG("Illegal value for matches[i + r]");
> ++ BUG("Illegal value for action[i + r]");
> }
> /*
> - * If the run after the line 'i' found only multimatch lines, we
> +- * If the run after the line 'i' found only multimatch lines, we
> - * return 0 and hence we don't make the current line (i) discarded.
> -+ * return false and hence we don't make the current line (i) discarded.
> ++ * If the run after the line 'i' found only multimatch lines,
> ++ * we return false and hence we don't make the current line (i)
> ++ * discarded.
> */
> if (rdis1 == 0)
> - return 0;
> @@ xdiff/xprepare.c: static int xdl_clean_mmatch(char const *dis, long i, long s, l
> xdlclass_t *rcrec;
> - char *dis, *dis1, *dis2;
> - int need_min = !!(cf->flags & XDF_NEED_MINIMAL);
> -+ uint8_t *matches1, *matches2;
> -+ int status = 0;
> ++ uint8_t *action1 = NULL, *action2 = NULL;
> + bool need_min = !!(cf->flags & XDF_NEED_MINIMAL);
> ++ int ret = 0;
> > - if (!XDL_CALLOC_ARRAY(dis, xdf1->nrec + xdf2->nrec + 2))
> - return -1;
> - dis1 = dis;
> - dis2 = dis1 + xdf1->nrec + 1;
> -+ matches1 = NULL;
> -+ matches2 = NULL;
> -+
> + /*
> + * Create temporary arrays that will help us decide if
> + * changed[i] should remain 0 or become 1.
> + */
> -+ if (!XDL_CALLOC_ARRAY(matches1, xdf1->nrec + 1)) {
> -+ status = -1;
> ++ if (!XDL_CALLOC_ARRAY(action1, xdf1->nrec + 1)) {
> ++ ret = -1;
> + goto cleanup;
> + }
> -+ if (!XDL_CALLOC_ARRAY(matches2, xdf2->nrec + 1)) {
> -+ status = -1;
> ++ if (!XDL_CALLOC_ARRAY(action2, xdf2->nrec + 1)) {
> ++ ret = -1;
> + goto cleanup;
> + }
> > + /*
> -+ * Initialize temporary arrays with NONE, SOME, or TOO_MANY.
> ++ * Initialize temporary arrays with DISCARD, KEEP, or INVESTIGATE.
> + */
> if ((mlim = xdl_bogosqrt(xdf1->nrec)) > XDL_MAX_EQLIMIT)
> mlim = XDL_MAX_EQLIMIT;
> @@ xdiff/xprepare.c: static int xdl_clean_mmatch(char const *dis, long i, long s, l
> rcrec = cf->rcrecs[recs->ha];
> nm = rcrec ? rcrec->len2 : 0;
> - dis1[i] = (nm == 0) ? 0: (nm >= mlim && !need_min) ? 2: 1;
> -+ matches1[i] = (nm == 0) ? NONE: (nm >= mlim && !need_min) ? TOO_MANY: SOME;
> ++ action1[i] = (nm == 0) ? DISCARD: (nm >= mlim && !need_min) ? INVESTIGATE: KEEP;
> }
> > if ((mlim = xdl_bogosqrt(xdf2->nrec)) > XDL_MAX_EQLIMIT)
> @@ xdiff/xprepare.c: static int xdl_cleanup_records(xdlclassifier_t *cf, xdfile_t *
> rcrec = cf->rcrecs[recs->ha];
> nm = rcrec ? rcrec->len1 : 0;
> - dis2[i] = (nm == 0) ? 0: (nm >= mlim && !need_min) ? 2: 1;
> -+ matches2[i] = (nm == 0) ? NONE: (nm >= mlim && !need_min) ? TOO_MANY: SOME;
> ++ action2[i] = (nm == 0) ? DISCARD: (nm >= mlim && !need_min) ? INVESTIGATE: KEEP;
> }
> > + /*
> @@ xdiff/xprepare.c: static int xdl_cleanup_records(xdlclassifier_t *cf, xdfile_t *
> i <= xdf1->dend; i++, recs++) {
> - if (dis1[i] == 1 ||
> - (dis1[i] == 2 && !xdl_clean_mmatch(dis1, i, xdf1->dstart, xdf1->dend))) {
> -+ if (matches1[i] == SOME ||
> -+ (matches1[i] == TOO_MANY && !xdl_clean_mmatch(matches1, i, xdf1->dstart, xdf1->dend))) {
> ++ if (action1[i] == KEEP ||
> ++ (action1[i] == INVESTIGATE && !xdl_clean_mmatch(action1, i, xdf1->dstart, xdf1->dend))) {
> xdf1->rindex[nreff++] = i;
> -+ /* changed[i] remains 0 */
> ++ /* changed[i] remains 0, i.e. keep */
> } else
> xdf1->changed[i] = 1;
> ++ /* i.e. discard */
> }
> -@@ xdiff/xprepare.c: static int xdl_cleanup_records(xdlclassifier_t *cf, xdfile_t *xdf1, xdfile_t *xd
> + xdf1->nreff = nreff;
> > for (nreff = 0, i = xdf2->dstart, recs = &xdf2->recs[xdf2->dstart];
> i <= xdf2->dend; i++, recs++) {
> - if (dis2[i] == 1 ||
> - (dis2[i] == 2 && !xdl_clean_mmatch(dis2, i, xdf2->dstart, xdf2->dend))) {
> -+ if (matches2[i] == SOME ||
> -+ (matches2[i] == TOO_MANY && !xdl_clean_mmatch(matches2, i, xdf2->dstart, xdf2->dend))) {
> ++ if (action2[i] == KEEP ||
> ++ (action2[i] == INVESTIGATE && !xdl_clean_mmatch(action2, i, xdf2->dstart, xdf2->dend))) {
> xdf2->rindex[nreff++] = i;
> -+ /* changed[i] remains 0 */
> ++ /* changed[i] remains 0, i.e. keep */
> } else
> xdf2->changed[i] = 1;
> ++ /* i.e. discard */
> }
> xdf2->nreff = nreff;
> > - xdl_free(dis);
> +cleanup:
> -+ xdl_free(matches1);
> -+ xdl_free(matches2);
> ++ xdl_free(action1);
> ++ xdl_free(action2);
> > - return 0;
> -+ return status;
> ++ return ret;
> }
> > > 13: 975e845bfa ! 12: 83e1ace5bd xdiff: change type of xdfile_t.changed from char to bool
> @@ Commit message
> xdiff: change type of xdfile_t.changed from char to bool
> > The only values possible for 'changed' is 1 and 0, which exactly maps
> - to a bool type. It might not look like this is the case because
> - matches1 and matches2 (which use to be dis1, and dis2) were also char
> - and were assigned numerical values within a few lines of 'changed'
> - (what used to be rchg).
> + to a bool type. It might not look like this because action1 and action2
> + (which use to be dis1, and dis2) were also of type char and were
> + assigned numerical values within a few lines of 'changed' (what used to
> + be rchg).
> > - Using NONE, SOME, TOO_MANY for matches1[i]/matches2[j], and true/false
> + Using DISCARD/KEEP/INVESTIGATE for action1[i]/action2[j], and true/false
> for changed[k] makes it clear to future readers that these are
> logically separate concepts.
> > @@ xdiff/xdiffi.c: static int group_slide_up(xdfile_t *xdf, struct xdlgroup *g)
> > while (xdf->changed[g->start - 1])
> g->start--;
> +@@ xdiff/xdiffi.c: int xdl_change_compact(xdfile_t *xdf, xdfile_t *xdfo, long flags) {
> +
> + int xdl_build_script(xdfenv_t *xe, xdchange_t **xscr) {
> + xdchange_t *cscr = NULL, *xch;
> +- char *changed1 = xe->xdf1.changed, *changed2 = xe->xdf2.changed;
> ++ bool *changed1 = xe->xdf1.changed, *changed2 = xe->xdf2.changed;
> + long i1, i2, l1, l2;
> +
> + /*
> > ## xdiff/xhistogram.c ##
> @@ xdiff/xhistogram.c: redo:
> @@ xdiff/xprepare.c: static int xdl_cleanup_records(xdlclassifier_t *cf, xdfile_t *
> - * changed[i] should remain 0 or become 1.
> + * changed[i] should remain false, or become true.
> */
> - if (!XDL_CALLOC_ARRAY(matches1, xdf1->nrec + 1)) {
> - status = -1;
> + if (!XDL_CALLOC_ARRAY(action1, xdf1->nrec + 1)) {
> + ret = -1;
> @@ xdiff/xprepare.c: static int xdl_cleanup_records(xdlclassifier_t *cf, xdfile_t *xdf1, xdfile_t *xd
> > /*
> @@ xdiff/xprepare.c: static int xdl_cleanup_records(xdlclassifier_t *cf, xdfile_t *
> */
> for (nreff = 0, i = xdf1->dstart, recs = &xdf1->recs[xdf1->dstart];
> i <= xdf1->dend; i++, recs++) {
> - if (matches1[i] == SOME ||
> - (matches1[i] == TOO_MANY && !xdl_clean_mmatch(matches1, i, xdf1->dstart, xdf1->dend))) {
> + if (action1[i] == KEEP ||
> + (action1[i] == INVESTIGATE && !xdl_clean_mmatch(action1, i, xdf1->dstart, xdf1->dend))) {
> xdf1->rindex[nreff++] = i;
> -- /* changed[i] remains 0 */
> -+ /* changed[i] remains false */
> +- /* changed[i] remains 0, i.e. keep */
> ++ /* changed[i] remains false, i.e. keep */
> } else
> - xdf1->changed[i] = 1;
> + xdf1->changed[i] = true;
> + /* i.e. discard */
> }
> xdf1->nreff = nreff;
> -
> @@ xdiff/xprepare.c: static int xdl_cleanup_records(xdlclassifier_t *cf, xdfile_t *xdf1, xdfile_t *xd
> - if (matches2[i] == SOME ||
> - (matches2[i] == TOO_MANY && !xdl_clean_mmatch(matches2, i, xdf2->dstart, xdf2->dend))) {
> + if (action2[i] == KEEP ||
> + (action2[i] == INVESTIGATE && !xdl_clean_mmatch(action2, i, xdf2->dstart, xdf2->dend))) {
> xdf2->rindex[nreff++] = i;
> -- /* changed[i] remains 0 */
> -+ /* changed[i] remains false */
> +- /* changed[i] remains 0, i.e. keep */
> ++ /* changed[i] remains false, i.e. keep */
> } else
> - xdf2->changed[i] = 1;
> + xdf2->changed[i] = true;
> + /* i.e. discard */
> }
> xdf2->nreff = nreff;
> -
> > ## xdiff/xtypes.h ##
> @@ xdiff/xtypes.h: typedef struct s_xdfile {
> |
This patch series was integrated into seen via 3f3e724. |
This patch series was integrated into seen via 3375e63. |
This patch series was integrated into seen via c5a7f67. |
There was a status update in the "Cooking" section about the branch A lot of code clean-up of xdiff. Split out of a larger topic. Will merge to 'next'? source: <pull.2048.v6.git.git.1758926520.gitgitgadget@gmail.com> |
This patch series was integrated into seen via c157ffe. |
This patch series was integrated into seen via b6d9543. |
This patch series was integrated into next via 5bc21d0. |
There was a status update in the "Cooking" section about the branch A lot of code clean-up of xdiff. Split out of a larger topic. Will merge to 'master'. source: <pull.2048.v6.git.git.1758926520.gitgitgadget@gmail.com> |
Changes since v5.
Cleanup of the functions xdl_cleanup_records() and xdl_clean_mmatch() is out of scope for this patch series. The changes to them are incidental to explaining why 'char rchg' is refactored to 'bool changed'.
Changes since v4.
Changes since v3.
Changes since v2.
Changes since v1, to address review feedback.
Relevant part of the original cover letter follows:
Before:
After cleanup:
===
cc: Elijah Newren newren@gmail.com
cc: Phillip Wood phillip.wood123@gmail.com
cc: Ben Knoble ben.knoble@gmail.com
cc: Jeff King peff@peff.net
cc: "Kristoffer Haugsbakk" kristofferhaugsbakk@fastmail.com