nix flake show SIGABRTs via std::terminate when a GC OOM happens inside the noexcept Value::mkFailed (finalizer registration for Failed : gc_cleanup) #16023

@ncaq

Describe the bug

On Nix 2.34.7 I intermittently get hard SIGABRT crashes from nix flake show. The core dumps all share the same signature: the Boehm GC OOM handler throws std::bad_alloc while it is being called from inside Value::mkFailed, which is declared noexcept. Since an exception escapes a noexcept function, the runtime calls std::terminate → abort → SIGABRT.

The crash originates from the "failed value" / recoverable-error path introduced in 2.34.0 (#15286), so it is specific to evaluations that go through handleEvalExceptionForThunk — which nix flake show does heavily, since it deliberately keeps evaluating attributes that throw.

Stack trace (from a real core dump, nix 2.34.7)

#2  abort (libc.so.6)
#3  nix::(anonymous namespace)::onTerminate()                      [std::set_terminate handler]
#4  __cxxabiv1::__terminate(...)
#5  __cxa_call_terminate
#6  __gxx_personality_v0
#7  _Unwind_RaiseException_Phase2
#8  _Unwind_RaiseException
#9  __cxa_throw
#10 nix::oomHandler(unsigned long)                                 [throws std::bad_alloc]
#11 GC_register_finalizer_inner                                    [Boehm GC]
#12 nix::Value::mkFailed(std::exception_ptr, nix::Value*)          [declared noexcept]
#13 nix::EvalState::handleEvalExceptionForThunk(...)
#14 nix::ExprVar::eval(...)
#15 nix::ExprOpHasAttr::eval(...)
...

The unwinder reaches onTerminate directly from __cxa_throw in oomHandler — i.e. the exception is being thrown from a context that cannot propagate it (a noexcept frame), so it terminates instead of unwinding to a handler.

Root cause

The chain is, in 2.34.7:

  1. oomHandler is registered as the Boehm GC out-of-memory callback and throws a C++ exception:
    https://github.com/NixOS/nix/blob/2.34.7/src/libexpr/eval-gc.cc#L43

    static void * oomHandler(size_t requested)
    {
        /* Convert this to a proper C++ exception. */
        throw std::bad_alloc();
    }
  2. Value::mkFailed is noexcept but allocates a Value::Failed on the GC heap:
    https://github.com/NixOS/nix/blob/2.34.7/src/libexpr/include/nix/expr/value.hh#L1288

    inline void mkFailed(std::exception_ptr e, Value * recovery) noexcept
    {
        setStorage(new Value::Failed(e, recovery));
    }
  3. Value::Failed derives from gc_cleanup, so constructing it implicitly calls GC_register_finalizer (GC_register_finalizer_inner), which itself allocates from the GC heap:
    https://github.com/NixOS/nix/blob/2.34.7/src/libexpr/include/nix/expr/value.hh#L431

    struct Failed : gc_cleanup
    {
        std::exception_ptr ex;
        Value * recoveryValue;
        ...
    };
  4. EvalState::handleEvalExceptionForThunk calls mkFailed for every thunk that threw during evaluation:
    https://github.com/NixOS/nix/blob/2.34.7/src/libexpr/eval.cc#L2188

So when the GC heap is exhausted at the moment mkFailed registers the finalizer, oomHandler throws std::bad_alloc through the noexcept boundary and the process aborts instead of surfacing a normal "out of memory" eval error.
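
The chain above can be modelled without Nix or Boehm GC. The following is a minimal sketch for Linux/POSIX, not the actual Nix code: fakeOomHandler, fakeRegisterFinalizer, FakeFailed, fakeMkFailed and signalFromMkFailed are all hypothetical stand-ins, and the "heap exhausted" condition is a plain boolean. It runs the noexcept function in a forked child so the parent can observe which signal killed it.

```cpp
#include <csignal>
#include <exception>
#include <new>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Stand-in for nix::oomHandler: turns an allocation failure into an exception.
static void fakeOomHandler()
{
    throw std::bad_alloc();
}

// Stand-in for finalizer registration, which may need heap memory of its own.
static void fakeRegisterFinalizer(bool heapExhausted)
{
    if (heapExhausted)
        fakeOomHandler();
}

// Stand-in for Value::Failed : gc_cleanup, whose constructor registers a finalizer.
struct FakeFailed
{
    explicit FakeFailed(bool heapExhausted) { fakeRegisterFinalizer(heapExhausted); }
};

// Stand-in for Value::mkFailed: declared noexcept, yet it reaches a throwing path.
static void fakeMkFailed(bool heapExhausted) noexcept
{
    FakeFailed failed(heapExhausted);
    (void) failed;
}

// Runs fakeMkFailed in a child process and returns the signal that killed it
// (0 if the child exited normally, -1 if fork or waitpid failed).
int signalFromMkFailed(bool heapExhausted)
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        fakeMkFailed(heapExhausted);
        _exit(0);
    }
    int status = 0;
    if (waitpid(pid, &status, 0) != pid)
        return -1;
    return WIFSIGNALED(status) ? WTERMSIG(status) : 0;
}
```

In this model, signalFromMkFailed(true) should report SIGABRT, matching the std::terminate path in the trace, while signalFromMkFailed(false) reports a normal exit.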

The core problem

Value::mkFailed is marked noexcept, but it transitively performs GC allocations (new Value::Failed → gc_cleanup ctor → GC_register_finalizer → internal GC malloc) that can invoke oomHandler, which throws. A function that can reach a throwing allocation path should not be noexcept, or it must not allocate in a way that can call oomHandler. Either the noexcept is wrong, or the allocation must be made non-throwing/pre-reserved.
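
As an illustration of how such a mismatch can be checked at compile time, the noexcept operator reports whether an expression is potentially throwing. This sketch uses the standard global operator new and a hypothetical Payload type; it does not inspect Boehm's operator new overloads or the real Value::Failed, and whether those are declared noexcept is not established here.

```cpp
#include <new>

// Hypothetical payload type. Its constructor is noexcept, so only the
// allocation function decides the results below.
struct Payload
{
    int tag;
    explicit Payload(int t) noexcept : tag(t) {}
};

// A plain new-expression uses the throwing operator new, so the noexcept
// operator reports that it may throw. The operand is not evaluated.
constexpr bool plainNewIsNoexcept = noexcept(new Payload(1));

// The std::nothrow form uses the noexcept allocation function, which
// returns nullptr on failure instead of throwing.
constexpr bool nothrowNewIsNoexcept = noexcept(new (std::nothrow) Payload(1));

static_assert(!plainNewIsNoexcept, "plain new is potentially throwing");
static_assert(nothrowNewIsNoexcept, "nothrow new with a noexcept ctor cannot throw");
```

A similar static_assert placed next to a noexcept function body would flag an allocation that can throw, provided every function on the path carries an accurate exception specification.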

Reproduction

This is intermittent by nature — it requires the GC OOM to land specifically inside the finalizer registration within mkFailed. I was not able to construct a deterministic minimal flake that reliably hits this exact instant (capping GC_MAXIMUM_HEAP_SIZE and forcing many throwing attributes through nix flake show produces normal exit-1 eval errors, not the abort).

In the wild it recurs every couple of days. Every one of the ~22 core dumps I have collected over the past ~10 days comes from the same kind of command (nix flake show --experimental-features 'nix-command flakes' --json -v --legacy path:/nix/store/...-source against various large flakes), and they all show the trace above. nix flake show is a natural trigger because it intentionally keeps evaluating attributes that throw, exercising the handleEvalExceptionForThunk → mkFailed path heavily.

If a maintainer can suggest a way to deterministically force an OOM inside GC_register_finalizer_inner, I'm happy to provide a self-contained reproducer.
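
One generic technique for landing a failure on a specific allocation is countdown fault injection: fail exactly the Nth allocation request, then sweep N over every possible position. The sketch below shows only the idea. FailNthAllocation and sweepFailurePoints are hypothetical names, nothing here is wired into Boehm GC, and whether the GC exposes a suitable hook inside GC_register_finalizer_inner is an open question.

```cpp
#include <cstddef>
#include <functional>
#include <new>

// Hypothetical fault injector: fails exactly the Nth allocation request.
struct FailNthAllocation
{
    std::size_t failAt;   // 1-based index of the request to fail
    std::size_t seen = 0; // requests observed so far

    void onAllocate()
    {
        if (++seen == failAt)
            throw std::bad_alloc();
    }
};

// Runs `work` once per candidate failure point and counts how many runs
// surfaced std::bad_alloc. `work` is expected to call injector.onAllocate()
// before each allocation it wants covered.
std::size_t sweepFailurePoints(
    std::size_t maxAllocations,
    const std::function<void(FailNthAllocation &)> & work)
{
    std::size_t failures = 0;
    for (std::size_t n = 1; n <= maxAllocations; ++n) {
        FailNthAllocation injector{n};
        try {
            work(injector);
        } catch (const std::bad_alloc &) {
            ++failures;
        }
    }
    return failures;
}
```

With a workload that performs three covered allocations, a sweep over five positions should surface three failures, one per allocation site. If one of those sites sat inside a noexcept function, that run would abort instead of being counted.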

Relationship to #15990

#15990 ("Don't memoise Interrupted errors", merged 2026-06-08) reworks exactly this area: it stops struct Failed from inheriting gc_cleanup and moves the exception_ptr into a separate ExceptionRef, explicitly to avoid finalizer cycles that Boehm warns about. That PR is not in 2.34.7 (tagged 2026-05-04) and, as of writing, has not yet landed on the 2.34-maintenance branch despite carrying the backport 2.34-maintenance label. It is unclear to me whether the #15990 refactor fully removes the noexcept-throwing-allocation hazard (the replacement ExceptionRef still derives from gc_cleanup), so I'm filing this so the abort-on-OOM path is tracked explicitly rather than as a side effect of the Interrupted-memoisation fix.

Versions / environment

  • Nix 2.34.7 (upstream CppNix, as shipped by NixOS 26.05)
  • NixOS 26.05, x86_64-linux
  • 94 GiB RAM, no ulimit -v and no systemd MemoryMax limit (the abort happens well below physical memory)

Possible fixes

  • Remove noexcept from Value::mkFailed (and audit callers), or
  • Make the Failed/ExceptionRef allocation not go through a path that can call the throwing oomHandler (e.g. pre-reserve, or use a non-throwing allocation here), or
  • Have oomHandler not throw when invoked from within finalizer registration.
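
The first two options can be sketched side by side. This is an illustration under assumptions, not a patch: TestableFailed, g_heapExhausted, mkFailedThrowing, mkFailedNothrow and tryMkFailedThrowing are hypothetical, and the class-specific operator new stands in for a GC allocator whose OOM callback throws. The third option is not covered.

```cpp
#include <cstddef>
#include <cstdlib>
#include <exception>
#include <new>
#include <utility>

// Test knob standing in for "the GC heap is exhausted".
bool g_heapExhausted = false;

// Hypothetical stand-in for Value::Failed with a controllable allocator.
struct TestableFailed
{
    std::exception_ptr ex;

    explicit TestableFailed(std::exception_ptr e) noexcept : ex(std::move(e)) {}

    // Throwing form: mirrors an allocator whose OOM callback throws.
    static void * operator new(std::size_t size)
    {
        void * p = g_heapExhausted ? nullptr : std::malloc(size);
        if (!p)
            throw std::bad_alloc();
        return p;
    }

    // Non-throwing form: reports failure by returning nullptr.
    static void * operator new(std::size_t size, const std::nothrow_t &) noexcept
    {
        return g_heapExhausted ? nullptr : std::malloc(size);
    }

    static void operator delete(void * p) noexcept { std::free(p); }
    static void operator delete(void * p, const std::nothrow_t &) noexcept { std::free(p); }
};

// Option 1: drop noexcept so std::bad_alloc can propagate to a handler.
TestableFailed * mkFailedThrowing(std::exception_ptr e)
{
    return new TestableFailed(std::move(e));
}

// Option 2: keep noexcept, allocate without throwing, and report failure
// through the return value (nullptr means out of memory).
TestableFailed * mkFailedNothrow(std::exception_ptr e) noexcept
{
    return new (std::nothrow) TestableFailed(std::move(e));
}

// Caller for option 1: turns the allocation failure into an ordinary error.
bool tryMkFailedThrowing(std::exception_ptr e)
{
    try {
        delete mkFailedThrowing(std::move(e));
        return true;
    } catch (const std::bad_alloc &) {
        return false;
    }
}
```

Option 1 pushes the handling onto every caller, which is why the bullet mentions an audit. Option 2 keeps the noexcept contract but needs a defined fallback for the nullptr case.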
