Skip to content

Conversation

@keithc-ca
Copy link
Contributor

@keithc-ca keithc-ca commented May 12, 2022

We need to inform the compiler that our uses of dcbz clobber memory, otherwise the optimizer can reasonably assume, e.g. in omrcpu_startup(), that buf[0] == 255 at the beginning of the for loop.

:"r"((void *) &buf[512]));
memset(buf, 255, sizeof(buf));

__asm__ __volatile__(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suspect the __volatile__ isn't needed. The port library copy doesn't use it.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah nm, I see you just added it here. I guess it doesn't hurt.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It was used in both places in j9memclr.cpp and won't hurt. In fact, it may save us from a future compiler optimization.

@keithc-ca keithc-ca marked this pull request as ready for review May 13, 2022 01:57
@keithc-ca keithc-ca requested a review from mstoodle as a code owner May 13, 2022 01:57
@pshipton
Copy link
Contributor

jenkins build all

@pshipton
Copy link
Contributor

pshipton commented May 16, 2022

@jdmpapin @dsouzai is there a committer that can look at this?

@jdmpapin
Copy link
Contributor

The changes LGTM, but can the cache line size detection be factored out into a single place?

@keithc-ca
Copy link
Contributor Author

The changes LGTM, but can the cache line size detection be factored out into a single place?

I'll look into whether there's a clear path to that.

@keithc-ca keithc-ca force-pushed the ppc_cache_line_size branch from df47457 to 18814e6 Compare May 16, 2022 18:24
@keithc-ca
Copy link
Contributor Author

Updated as suggested, reusing getCacheLineSize() in omrcpu_startup() and using uint32_t as the storage type for the cache line size everywhere.

@jdmpapin
Copy link
Contributor

The description of the functional fix (i.e. the memory clobber) seems to have gone missing from the commit message. It's probably more important now than it was before, since there are more changes

The compiler needs to be told that our uses of dcbz clobber memory,
otherwise the optimizer can reasonably assume in getCacheLineSize(),
that each element of buf still has the value (255) assigned by memset().

* reuse getCacheLineSize() in omrcpu_startup()
* use uint32_t consistently for cache line size

Signed-off-by: Keith W. Campbell <keithc@ca.ibm.com>
@keithc-ca keithc-ca force-pushed the ppc_cache_line_size branch from 18814e6 to 8b3bf5b Compare May 16, 2022 19:20
@keithc-ca
Copy link
Contributor Author

That description was never in the commit message itself (only in the description here); it's now also in the commit message.

@jdmpapin
Copy link
Contributor

Jenkins build all

@jdmpapin jdmpapin merged commit c67d754 into eclipse-omr:master May 16, 2022
@keithc-ca keithc-ca deleted the ppc_cache_line_size branch May 17, 2022 12:17
keithc-ca added a commit to keithc-ca/omr that referenced this pull request Jun 23, 2022
This allows the compiler to do the widening operation once,
rather than repeatedly in loops, fixing a performance regression
introduced in eclipse-omr#6515.

Signed-off-by: Keith W. Campbell <keithc@ca.ibm.com>
keithc-ca added a commit to keithc-ca/omr that referenced this pull request Jun 23, 2022
This allows the compiler to do the widening operation once,
rather than repeatedly in loops, fixing a performance regression
introduced in eclipse-omr#6515.

Signed-off-by: Keith W. Campbell <keithc@ca.ibm.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants