build: Make G_FUZZING constexpr, require -DBUILD_FOR_FUZZING=ON to fuzz #31191

Open
wants to merge 2 commits into
base: master

maflcko
Member

@maflcko maflcko commented Oct 31, 2024

g_fuzzing is used inside Assume at runtime, causing significant overhead in hot paths. See #31178

One could simply remove the g_fuzzing check from the Assume, but this would make fuzzing a bit less useful. Also, it would be unclear if g_fuzzing adds a runtime overhead in other code paths today or in the future.

Fix all issues by making G_FUZZING equal to the build option BUILD_FOR_FUZZING, and for consistency in fuzzing, require it to be set when executing any fuzz target.
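As a rough illustration of the idea (not the actual patch), the following sketch shows how a compile-time flag can replace a runtime global. The macro name `SKETCH_BUILD_FOR_FUZZING` and the helper `AssumeSketch` are hypothetical and exist only for this example; the real `Assume` also aborts in debug builds, which the sketch leaves out.

```cpp
#include <cstdlib>
#include <utility>

// Hypothetical macro name for this sketch; a build system would define it
// (for example through a compile definition) when configured for fuzzing.
#ifdef SKETCH_BUILD_FOR_FUZZING
inline constexpr bool G_FUZZING{true};
#else
inline constexpr bool G_FUZZING{false};
#endif

// Simplified stand-in for an Assume-style identity check. The argument is
// always evaluated by the caller; only the abort decision depends on G_FUZZING.
template <typename T>
constexpr T&& AssumeSketch(T&& val)
{
    if constexpr (G_FUZZING) {
        // Fuzz builds: a failed assumption is fatal so the fuzzer reports it.
        if (!val) std::abort();
    }
    // Other builds: the branch above is discarded at compile time, so no
    // runtime flag has to be loaded or tested here.
    return std::forward<T>(val);
}
```

Because the flag is a constant expression, the compiler can drop the check entirely in non-fuzz builds instead of testing a global on every call.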

Fixes #31178

Temporarily this drops fuzzing from two CI tasks, but they can be re-added in a follow-up with something like #31073

MarcoFalke added 2 commits October 31, 2024 13:51
The fuzz binary is still compiled. This is required for the next commit.
@DrahtBot
Contributor

DrahtBot commented Oct 31, 2024

The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

Code Coverage & Benchmarks

For details see: https://corecheck.dev/bitcoin/bitcoin/pulls/31191.

Reviews

See the guideline for information on the review process.

Type Reviewers
ACK dergoegge, marcofleon, davidgumberg, ryanofsky

If your review is incorrectly listed, please react with 👎 to this comment and the bot will ignore it on the next update.

Conflicts

Reviewers, this pull request conflicts with the following ones:

  • #31161 (cmake: Set top-level target output locations by hebasto)

If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first.

@maflcko maflcko changed the title from "Make G_FUZZING constexpr, require -DBUILD_FOR_FUZZING=ON to fuzz" to "build: Make G_FUZZING constexpr, require -DBUILD_FOR_FUZZING=ON to fuzz" on Oct 31, 2024
Member

@dergoegge dergoegge left a comment

utACK fafbf8a

src/util/check.h (review thread resolved)
@DrahtBot
Contributor

🚧 At least one of the CI tasks failed.
Debug: https://github.com/bitcoin/bitcoin/runs/32337137613

Hints

Try to run the tests locally, according to the documentation. However, a CI failure may still
happen due to a number of reasons, for example:

  • Possibly due to a silent merge conflict (the changes in this pull request being
    incompatible with the current code in the target branch). If so, make sure to rebase on the latest
    commit of the target branch.

  • A sanitizer issue, which can only be found by compiling with the sanitizer and running the
    affected test.

  • An intermittent issue.

Leave a comment here, if you need help tracking down a confusing failure.

@marcofleon
Contributor

Tested ACK fafbf8a

@davidgumberg
Contributor

davidgumberg commented Oct 31, 2024

Tested ACK fafbf8a

This solves #31178

./build/src/bench/bench_bitcoin -filter=LinearizeOptimallyExample11 -min-time=30000

| branch | ns/op | op/s | err% | ins/op | cyc/op | IPC | bra/op | miss% | total | benchmark |
|:---|---:|---:|---:|---:|---:|---:|---:|---:|---:|:---|
| master (f07a533) | 380,031,566.62 | 2.63 | 0.3% | 7,324,884,836.62 | 1,321,463,005.62 | 5.543 | 898,388,779.12 | 0.2% | 32.69 | LinearizeOptimallyExample11 |
| branch (fafbf8a) | 301,497,436.82 | 3.32 | 0.2% | 4,990,499,128.40 | 1,046,022,020.91 | 4.771 | 352,054,052.20 | 0.3% | 32.53 | LinearizeOptimallyExample11 |

It also seems to reasonably solve #30950 & #31057 which were the motivation of #31093 which introduced the regression.

I don't have a view on this, but just want to document that this PR does not address the described use case of building for fuzzing without building the fuzz binary (#31057 (comment)) which is what motivated making g_fuzzing a runtime check.

BUILD_FOR_FUZZING:BOOL=OFF
BUILD_FUZZ_BINARY:BOOL=ON

I build with these options to be able to know if changes not related to fuzzing will break the build.

@maflcko
Member Author

maflcko commented Nov 1, 2024

I don't have a view on this, but just want to document that this PR does not address the described use case of building for fuzzing without building the fuzz binary (#31057 (comment)) which is what motivated making g_fuzzing a runtime check.

BUILD_FOR_FUZZING:BOOL=OFF
BUILD_FUZZ_BINARY:BOOL=ON

I build with these options to be able to know if changes not related to fuzzing will break the build.

Can you explain this a bit better? This pull request does not change anything about being able to build with these options and be able to see if the fuzz binary build breaks.

Specifically,

  • Everything in an if constexpr is still looked at by the compiler and any code issues should be detected, even with the above build options, before and after this pull request. See also refactor: Compile unreachable walletdb code #29315 for a different change using if constexpr for that purpose (as opposed to preprocessor directives).
  • All fuzz pre-run options such as PRINT_ALL_FUZZ_TARGETS_AND_ABORT, or WRITE_ALL_FUZZ_TARGETS_AND_ABORT, etc will still work normally before and after this pull request.
  • Only executing a fuzz target with the above build options is not possible, which is the reason for the first temporary CI commit.
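
The first point can be illustrated with a small sketch. All names here (`SKETCH_FUZZING`, `SKETCH_FUZZING_MACRO`, `FuzzOnlyLabel`, `DescribeBuild`) are hypothetical and stand in for real code; the example only contrasts `if constexpr` in a non-template function with a preprocessor guard.

```cpp
#include <string>

inline constexpr bool SKETCH_FUZZING{false}; // hypothetical stand-in for G_FUZZING

// Hypothetical helper that only matters in fuzz builds.
inline std::string FuzzOnlyLabel(int seed) { return "fuzz-" + std::to_string(seed); }

inline std::string DescribeBuild(int seed)
{
    if constexpr (SKETCH_FUZZING) {
        // This function is not a template, so the discarded branch is still
        // parsed and type-checked. Renaming FuzzOnlyLabel or changing its
        // signature would break the build here although the branch never runs.
        return FuzzOnlyLabel(seed);
    } else {
        return "normal";
    }
}

inline std::string DescribeBuildPreprocessor(int seed)
{
#ifdef SKETCH_FUZZING_MACRO // hypothetical macro, not defined in this sketch
    // Removed before compilation: mistakes in here go unnoticed in normal builds.
    return FuzzOnlyLabel(seed);
#else
    (void)seed;
    return "normal";
#endif
}
```

Both functions behave the same in a normal build, but only the `if constexpr` version keeps the fuzz-only branch under the compiler's eyes.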

Maybe I am misunderstanding, so it would help to share exact steps to reproduce of the use case that you are claiming is not addressed.

@ryanofsky
Contributor

Code review ACK fafbf8a but approach -0, because this approach means libraries built for fuzz testing do not function correctly if used in a release, and libraries built for releases are mostly useless for fuzz testing. So I would like to at least consider other solutions to this problem even if we go with this one.

#31178 makes it pretty clear that if we want to be able to write Assume() statements in hot paths, we need to be able to compile them out in release builds to avoid impacting performance, and leave them compiled into builds that are used for fuzzing. So I do think the build system should support compiling libraries differently for fuzz testing and releases. But I don't think the build system should go so far as to make libraries built for fuzz testing and libraries built for releases mutually incompatible.

Also I don't really buy the idea that if you put an Assume() statement in a hot path, you just get to assume it has no cost and that the compiler will optimize it out. I think the majority of Assume() statements are not in hot paths, and the main use-case for Assume() is not to be a faster Assert() that you use in performance-critical code, but to be a safer Assert() that you can use to catch bugs during development, but will not crash the entire program and provide a terrible user experience when bugs (inevitably) occur in production.
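
A minimal sketch of that "safer Assert()" pattern, assuming a simplified `AssumeSketch` that models only the release behaviour (a debug or fuzz build would abort on a false condition instead). `AssumeSketch` and `GetOrFallback` are hypothetical names used only for this example.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-in for Assume(): returns the checked value unchanged.
// Only the release behaviour is modelled, where execution continues.
inline bool AssumeSketch(bool val) { return val; }

// Look up an element, treating an out-of-range index as a bug that release
// builds survive by returning a fallback value instead of crashing.
inline int GetOrFallback(const std::vector<int>& values, std::size_t pos, int fallback)
{
    if (!AssumeSketch(pos < values.size())) {
        return fallback; // Bug path: degrade gracefully in production.
    }
    return values[pos];
}
```

The check documents the invariant for developers and fuzzers, while the explicit fallback keeps production behaviour defined when the invariant is violated.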

So I think a better alternative to this PR might just be to provide a better alternative to Assume(). I think most uses of Assume() are fine as they are, but a few really are on hot paths, and I tried adding a simple counter to identify them in the linearizeoptimallyexample11 benchmark. If I disable these checks, the speedup is equivalent to this PR without the drawbacks of this PR. The change I would propose based on this is:

diff

--- a/CMakeLists.txt
+++ b/CMakeLists.txt
@@ -225,6 +225,7 @@ if(BUILD_FOR_FUZZING)
   set(BUILD_GUI_TESTS OFF)
   set(BUILD_BENCH OFF)
   set(BUILD_FUZZ_BINARY ON)
+  target_compile_definitions(core_interface INTERFACE ENABLE_SLOWCHECK)
 endif()
 
 include(ProcessConfigurations)
--- a/cmake/module/ProcessConfigurations.cmake
+++ b/cmake/module/ProcessConfigurations.cmake
@@ -126,6 +126,7 @@ target_compile_definitions(core_interface_debug INTERFACE
   DEBUG_LOCKCONTENTION
   RPC_DOC_CHECK
   ABORT_ON_FAILED_ASSUME
+  ENABLE_SLOWCHECK
 )
 # We leave assertions on.
 if(MSVC)
--- a/src/cluster_linearize.h
+++ b/src/cluster_linearize.h
@@ -337,7 +337,7 @@ struct SetInfo
     /** Add a transaction to this SetInfo (which must not yet be in it). */
     void Set(const DepGraph<SetType>& depgraph, ClusterIndex pos) noexcept
     {
-        Assume(!transactions[pos]);
+        SLOWCHECK(!transactions[pos]);
         transactions.Set(pos);
         feerate += depgraph.FeeRate(pos);
     }
--- a/src/util/bitset.h
+++ b/src/util/bitset.h
@@ -281,7 +281,7 @@ class MultiIntBitSet
         /** Progress to the next 1 bit (only if != IteratorEnd). */
         constexpr Iterator& operator++() noexcept
         {
-            Assume(m_idx < N);
+            SLOWCHECK(m_idx < N);
             m_val &= m_val - I{1U};
             if (m_val == 0) {
                 while (true) {
@@ -301,7 +301,7 @@ class MultiIntBitSet
         /** Get the current bit position (only if != IteratorEnd). */
         constexpr unsigned operator*() const noexcept
         {
-            Assume(m_idx < N);
+            SLOWCHECK(m_idx < N);
             return m_pos;
         }
     };
@@ -316,7 +316,7 @@ public:
     /** Set a bit to 1. */
     void constexpr Set(unsigned pos) noexcept
     {
-        Assume(pos < MAX_SIZE);
+        SLOWCHECK(pos < MAX_SIZE);
         m_val[pos / LIMB_BITS] |= I{1U} << (pos % LIMB_BITS);
     }
     /** Set a bit to the specified value. */
--- a/src/util/check.h
+++ b/src/util/check.h
@@ -81,6 +81,12 @@ constexpr T&& inline_assertion_check(LIFETIMEBOUND T&& val, [[maybe_unused]] con
 /**
  * Assume is the identity function.
  *
+ * Assume() should be used instead of Assert() in cases where if the condition
+ * is not true, it indicates there is a bug, and you want to handle the bug by
+ * crashing in debug builds and in fuzz tests, but want to avoid unnecessary
+ * crashes in release builds and handle it with logging, warning, or other
+ * fallback behavior.
+ *
  * - Should be used to run non-fatal checks. In debug builds it behaves like
  *   Assert()/assert() to notify developers and testers about non-fatal errors.
  *   In production it doesn't warn or log anything.
@@ -90,6 +96,24 @@ constexpr T&& inline_assertion_check(LIFETIMEBOUND T&& val, [[maybe_unused]] con
  */
 #define Assume(val) inline_assertion_check<false>(val, __FILE__, __LINE__, __func__, #val)
 
+/**
+ * SLOWCHECK() can be used to perform checks that are enabled in debug and fuzz
+ * builds but are skipped in release builds. It is meant to be used for checks
+ * that are slow, either because they occur in hot code paths, or because the
+ * checks themselves are expensive. Assert() should be preferred for critical
+ * checks which are not in performance-sensitive code.
+ *
+ * SLOWCHECK() is basically equivalent to the traditional C/C++ assert() macro
+ * For historic reasons, Bitcoin Core cannot be compiled with NDEBUG so there is
+ * no way to skip assert() checks in release builds, and SLOWCHECK() restores
+ * this lost functionality.
+ */
+#ifdef ENABLE_SLOWCHECK
+#define SLOWCHECK(val) assert(val);
+#else
+#define SLOWCHECK(val) assert(1 || (val));
+#endif
+
 /**
  * NONFATAL_UNREACHABLE() is a macro that is used to mark unreachable code. It throws a NonFatalCheckError.
  */
--- a/src/util/vecdeque.h
+++ b/src/util/vecdeque.h
@@ -74,7 +74,7 @@ class VecDeque
     /** What index in the buffer does logical entry number pos have? */
     size_t BufferIndex(size_t pos) const noexcept
     {
-        Assume(pos < m_capacity);
+        SLOWCHECK(pos < m_capacity);
         // The expression below is used instead of the more obvious (pos + m_offset >= m_capacity),
         // because the addition there could in theory overflow with very large deques.
         if (pos >= m_capacity - m_offset) {

I would be curious to know from @sipa and others if they think this approach makes sense and does not add too much of a burden.

Specifically results I saw testing this with the linearizeoptimallyexample11 benchmark were 1.42 op/sec on master, 1.77 op/sec reverting 9f243cd (#31093), 1.77 op/sec cherry-picking fafbf8a from this PR, and 1.83 op/sec with the proposed change.
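
To make the disabled branch of the SLOWCHECK macro in the diff above easier to follow, here is a small sketch of the `assert(1 || (val))` idiom. The macro names `SKIPPED_CHECK` and `ACTIVE_CHECK` and the counter are hypothetical, and the sketch assumes `NDEBUG` is not defined, so that `assert()` is active.

```cpp
#include <cassert>

inline int g_evaluations{0}; // counts how often the checked expression runs

inline bool CountedCheck()
{
    ++g_evaluations;
    return true;
}

// Same shape as the disabled SLOWCHECK branch: the expression is still
// compiled, but `1 ||` short-circuits so it is never evaluated.
#define SKIPPED_CHECK(val) assert(1 || (val))
// Same shape as the enabled branch.
#define ACTIVE_CHECK(val) assert(val)

inline int RunSkipped()
{
    g_evaluations = 0;
    SKIPPED_CHECK(CountedCheck()); // CountedCheck() is not called
    return g_evaluations;
}

inline int RunActive()
{
    g_evaluations = 0;
    ACTIVE_CHECK(CountedCheck()); // called once while assert() is enabled
    return g_evaluations;
}
```

Keeping the expression inside the macro means it still has to compile in release builds, so a skipped check cannot silently rot.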

@maflcko
Member Author

maflcko commented Nov 1, 2024

this approach means libraries built for fuzz testing do not function correctly if used in a release, and libraries built for releases are mostly useless for fuzz testing.

I don't think it is a supported use-case to take libraries from one build (with different build options) and drop them into another build (with different build options) and expect it to work. This has also been the case up until two weeks ago, before commit 9f243cd. Also your SLOWCHECK suggestion doesn't seem to change the fact that mix-matching differently compiled libraries can result in an unsafe or less useful binary using those libraries.

Also I don't really buy the idea that if you put an Assume() statement in a hot path, you just get to assume it has no cost and that the compiler will optimize it out.

I don't think I've claimed this. It has always been the assumption that Assume(expr), Assert(expr), and assert(expr) have a runtime cost of at least (void)(expr). This is also explained in the dev notes: "the expression is always evaluated." On current master, as explained by you in #31178 (comment), the runtime cost of Assume(expr) in some cases increased by an additional check of bool{g_fuzzing}. This change here is simply restoring what has been the assumption in all previous releases of Bitcoin Core.
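
The "always evaluated" point can be shown with a short sketch. `AssumeSketch`, `ExpensiveInvariant`, and the counter are hypothetical; the sketch only demonstrates that an identity-function check cannot avoid evaluating its argument, because the argument is computed before the call.

```cpp
inline int g_calls{0}; // counts evaluations of the checked expression

inline bool ExpensiveInvariant()
{
    ++g_calls;
    return true;
}

// Identity function in the spirit of Assume(): no check at all is performed,
// yet the argument expression has already run by the time we get here.
template <typename T>
constexpr T&& AssumeSketch(T&& val)
{
    return static_cast<T&&>(val);
}

inline int CallsAfterAssume()
{
    g_calls = 0;
    (void)AssumeSketch(ExpensiveInvariant());
    return g_calls;
}
```

An optimizer may still remove the evaluation when it can prove the expression has no side effects, but that is a property of the expression, not a guarantee of the check.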

I think the majority of Assume() statements are not in hot paths, and the main use-case for Assume() is not to be a faster Assert() that you use in performance-critical code

I agree and I don't think I've claimed otherwise. I just don't see the use-case to mix-match libraries from different builds and I think this change makes sense even if you go ahead with your SLOWCHECK idea.

SLOWCHECK

It may be best to submit this as a separate pull request. I think it is fine if this one keeps sitting for a few more weeks and it can be closed/merged, possibly depending on the result of your SLOWCHECK pull request.

@ryanofsky
Contributor

I don't think it is a supported use-case to take libraries from one build (with different build options) and drop them into another build

Agree it should not be a generally supported use case. And I do think it's nice to have a BUILD_FOR_FUZZING option that enables the best options for builds specifically doing fuzzing.

But for normal builds, I don't think the whole codebase should need to be recompiled with different options just to get a useful fuzz binary. And I don't think it is good for BUILD_FOR_FUZZING option to create strange libraries that seem functional but skip proof of work checks. I did ACK this PR in case we think these problems are worth having so Assume() can be optimized out in hot code paths. But I think these are negative consequences of this PR, even if worth the tradeoff.

Also I don't really buy the idea that if you put an Assume() statement in a hot path, you just get to assume it has no cost and that the compiler will optimize it out.

I don't think I've claimed this.

Yes sorry, the comments above about Assume() usage were meant for @sipa who wrote "I have been using Assume in many places assuming it'll be optimized out entirely in production code" and similar things in the other issues.

It may be best to submit this as a separate pull request.

Yes, will do if it seems like something that is helpful.

@maflcko
Member Author

maflcko commented Nov 1, 2024

And I don't think it is good for BUILD_FOR_FUZZING option to create strange libraries that seem functional but skip proof of work checks.

I don't think this conceptual issue is addressed by your alternative SLOWCHECK suggestion. If this was a supported use-case, someone could link a fuzzing-compiled SLOWCHECK library that seems functional, but is possibly slow, or may even crash in production, when normally it should not. So your use-case would also be a reason against SLOWCHECK. I don't think your use-case is possible to support, so I don't think it should be used as a reason for or against a change.

@davidgumberg
Contributor

  • Everything in an if constexpr is still looked at by the compiler and any code issues should be detected.

Didn't think of this, that makes sense to me. I also may have misunderstood or misrepresented the use-case/concern I tried to describe, but it's not my own and I don't have any view on it, so it seems silly that I even mentioned it.

I still ACK fafbf8a for fixing the regression measured in #31178.

@ryanofsky
Contributor

ryanofsky commented Nov 4, 2024

The use-case which I think this breaks and should be supported is the ability to use fuzzing in a normal build instead of a dedicated build.

As an analogy, if you want to use a debugger, the best place to use it is in a dedicated debug build, but you should also be able to generate debug symbols and use debuggers with some limitations in release builds. Or if you want to check memory safety, the best way to do it may be to run MSan or valgrind in dedicated builds, but it should also be possible to use ASan in normal builds. Similarly, if you want to run fuzz tests, the best way to run them is in a dedicated build with -DBUILD_FOR_FUZZING=ON, but I think you should also be able to run fuzz tests in a normal build with other functioning executables. This flexibility lets you have a single build supporting most developer tools, even if it is not the ideal build for running every tool. I think it lowers the barrier to start trying new tools if using them only requires toggling build options not creating entirely new build configurations.

AFAICT before #31093, libraries with BUILD_FOR_FUZZING did not function normally because they skipped proof of work checks, and libraries without BUILD_FOR_FUZZING were not very useful for fuzzing because they skipped most Assume() checks. #31093 fixed both of those issues and this PR breaks them again. Maybe this is acceptable, but it doesn't seem good, and an alternate approach like the one in #31191 (comment) might be better. I'd be happy to open a PR with that approach if it makes sense, or to learn that I am wrong and it doesn't make sense, or to get feedback that it is not a good solution for other reasons.

I also think this PR is ok, but I don't like the tradeoff it is making, so I'm just pointing out the disadvantages I see here.

@marcofleon
Contributor

I think it lowers the barrier to start trying new tools if using them only requires toggling build options not creating entirely new build configurations.

I see what you're saying about tools being more accessible if they're able to be used within a single build. However, I do think with cmake it's straightforward to maintain separate builds for different purposes. And fuzz testing might be involved enough to warrant its own build.

I don't see many advantages to fuzzing in a normal build that isn't optimized for it. If we're still able to verify that the fuzz binary builds properly and run pre-fuzz checks, as mentioned in #31191 (comment), then to me it feels reasonable to require BUILD_FOR_FUZZING when we intend to do some actual fuzz testing.

I also think it's cleaner to isolate fuzz-specific code paths (like skipping the PoW check) to a dedicated build. Of course, I'm open to reviewing other potential solutions.
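
As a toy illustration of isolating a fuzz-specific code path behind a compile-time flag, consider the sketch below. It is not Bitcoin Core's proof-of-work code: `SKETCH_FUZZING`, `Hash`, `CheckWorkSketch`, and the "trailing zero bytes" rule are all hypothetical simplifications chosen for this example.

```cpp
#include <array>
#include <cstdint>

inline constexpr bool SKETCH_FUZZING{false}; // hypothetical stand-in for G_FUZZING

using Hash = std::array<std::uint8_t, 32>;

// Toy work check: outside fuzz builds, require a number of zero bytes at the
// end of the hash. In fuzz builds, accept on a single cheap bit test so that a
// fuzzer can get past this gate without doing real work.
inline bool CheckWorkSketch(const Hash& hash, unsigned required_zero_bytes)
{
    if constexpr (SKETCH_FUZZING) {
        return (hash[31] & 0x80) == 0;
    } else {
        unsigned zeros{0};
        for (auto it = hash.rbegin(); it != hash.rend() && *it == 0; ++it) ++zeros;
        return zeros >= required_zero_bytes;
    }
}
```

With the flag fixed at compile time, a normal build cannot end up on the relaxed path at runtime, which is the isolation argued for above; the cost is that fuzzing needs its own build.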

Development

Successfully merging this pull request may close these issues.

bench: linearizeoptimallyexample11 benchmark now running 4x slow than previously
6 participants