Fix all deadlock issues known so far #11

Ollrogge · 2025-12-18T16:40:39Z

This PR fixes the deadlocks we’ve seen by switching LibAFL multiprocessing from fork()-based clients to an exec-based Launcher mode (re-execing the same binary for each client).

With a Go target linked via -buildmode=c-archive, the Go runtime is initialized automatically via a global constructor (.init_array) before main. That turns the process into a multi-threaded program whose runtime (GC workers, sysmon/netpoll, cgo init, etc.) relies on background threads and synchronization primitives to make progress. If the Launcher spawns clients using fork(), the child inherits a snapshot of this runtime state but not the threads that are supposed to drive it, so waits on condvars and channels can never be satisfied and the clients deadlock (e.g. in cgo runtime init barriers, GC worker startup, and exec.Command internals).
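To illustrate the failure mode outside of Go, here is a minimal Python sketch (not code from this PR, and not how the Go runtime is implemented). It assumes a POSIX system; the function name `fork_snapshot_demo` and the counter-based worker are hypothetical stand-ins for a runtime background thread. It shows that after fork() the child sees the copied state but the worker thread that was updating it no longer exists there.

```python
import os
import threading
import time
import warnings


def fork_snapshot_demo(observe_seconds: float = 0.2) -> tuple[bool, bool]:
    """Return (parent_saw_progress, child_saw_progress) for a worker thread
    that was started before fork()."""
    counter = [0]
    running = [True]

    def worker() -> None:
        # Hypothetical stand-in for a runtime background thread (e.g. a GC worker).
        while running[0]:
            counter[0] += 1
            time.sleep(0.005)

    threading.Thread(target=worker, daemon=True).start()
    while counter[0] == 0:  # make sure the worker is running before fork()
        time.sleep(0.001)

    read_fd, write_fd = os.pipe()
    with warnings.catch_warnings():
        # Newer CPython versions warn about fork() in a multi-threaded process.
        warnings.simplefilter("ignore", DeprecationWarning)
        pid = os.fork()

    if pid == 0:
        # Child: memory was copied, but only the forking thread exists here,
        # so nothing advances the counter any more.
        before = counter[0]
        time.sleep(observe_seconds)
        os.write(write_fd, b"1" if counter[0] != before else b"0")
        os._exit(0)

    # Parent: the worker thread is still alive and keeps making progress.
    os.close(write_fd)
    before = counter[0]
    time.sleep(observe_seconds)
    parent_saw_progress = counter[0] != before
    child_saw_progress = os.read(read_fd, 1) == b"1"
    os.close(read_fd)
    os.waitpid(pid, 0)
    running[0] = False  # let the worker thread finish
    return parent_saw_progress, child_saw_progress
```

On a POSIX system the call is expected to return `(True, False)`: the parent observes progress, the child does not. Any code in the child that blocks until such a thread signals it would wait forever, which is the shape of the deadlocks described above.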

Using exec-based clients avoids inheriting a partially-initialized Go runtime across fork(). Each fuzzing client now starts from a fresh process image and initializes Go normally, eliminating these deadlocks.
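The exec-based idea can be sketched in a few lines of Python. This is an illustration under stated assumptions, not LibAFL's Launcher API: `launch_exec_clients`, `CLIENT_CODE`, and the `DEMO_CLIENT_ID` environment variable are hypothetical names, and the client body is a toy that starts its own background thread. Each client is a freshly exec'd process image, so it sets up its own threads instead of inheriting a snapshot without them.

```python
import os
import subprocess
import sys

# Toy client: starts its own background worker, then reports whether that
# worker made progress. Hypothetical stand-in for a fuzzing client.
CLIENT_CODE = r"""
import os, threading, time
counter = [0]
def worker():
    while True:
        counter[0] += 1
        time.sleep(0.005)
threading.Thread(target=worker, daemon=True).start()
time.sleep(0.2)
print(os.environ["DEMO_CLIENT_ID"], counter[0] > 0)
"""


def launch_exec_clients(num_clients: int) -> list[list[str]]:
    """Start each client as a new process image and collect its report."""
    procs = []
    for client_id in range(num_clients):
        # The client learns its identity from the environment (assumed
        # convention for this sketch), not from inherited memory.
        env = dict(os.environ, DEMO_CLIENT_ID=str(client_id))
        procs.append(
            subprocess.Popen(
                [sys.executable, "-c", CLIENT_CODE],
                env=env,
                stdout=subprocess.PIPE,
                text=True,
            )
        )
    reports = []
    for proc in procs:
        out, _ = proc.communicate(timeout=30)
        reports.append(out.split())
    return reports
```

`launch_exec_clients(2)` is expected to return `[["0", "True"], ["1", "True"]]`: every client sees its own worker thread running. The trade-off is that each client pays the full startup and initialization cost again, since nothing is shared from the parent's memory.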

In very limited testing there is also a speed improvement. I think it comes from the fuzzing client no longer having to restart when the memory limits we set are hit, and possibly also from internal improvements in LibAFL.

Running the caddy harness with 4 clients for 1 minute before the change:

[UserStats #2] run time: 1m-1s, clients: 5, corpus: 13503, objectives: 0, executions: 3916841, exec/sec: 64.10k, stability: 1653/1809 (91%), edges: 2087/297052 (0%)
[UserStats #2] run time: 1m-1s, clients: 5, corpus: 13503, objectives: 0, executions: 3916841, exec/sec: 64.10k, stability: 1653/1809 (91%), edges: 2087/297052 (0%)
[UserStats #4] run time: 1m-1s, clients: 5, corpus: 13503, objectives: 0, executions: 3919274, exec/sec: 64.10k, stability: 1809/1928 (93%), edges: 2087/297052 (0%)
[UserStats #4] run time: 1m-1s, clients: 5, corpus: 13503, objectives: 0, executions: 3919274, exec/sec: 64.10k, stability: 1809/1928 (93%), edges: 2087/297052 (0%)
[UserStats #1] run time: 1m-1s, clients: 5, corpus: 13503, objectives: 0, executions: 3920682, exec/sec: 64.11k, stability: 1988/2031 (97%), edges: 2087/297052 (0%)
[UserStats #4] run time: 1m-1s, clients: 5, corpus: 13503, objectives: 0, executions: 3920799, exec/sec: 64.11k, stability: 1809/1928 (93%), edges: 2087/297052 (0%)
[Testcase #4] run time: 1m-1s, clients: 5, corpus: 13506, objectives: 0, executions: 3920799, exec/sec: 64.11k, stability: 1809/1928 (93%), edges: 2087/297052 (0%)

After the change (roughly 6.8k exec/s faster, about 11%):

[UserStats #2] run time: 1m-0s, clients: 5, corpus: 15160, objectives: 0, executions: 4303506, exec/sec: 70.93k, edges: 1943/297051 (0%), edges_stability: 1839/1900 (96%)
[Testcase #2] run time: 1m-0s, clients: 5, corpus: 15166, objectives: 0, executions: 4303506, exec/sec: 70.93k, edges: 1943/297051 (0%), edges_stability: 1839/1900 (96%)
[UserStats #1] run time: 1m-0s, clients: 5, corpus: 15166, objectives: 0, executions: 4303770, exec/sec: 70.92k, edges: 1943/297051 (0%), edges_stability: 1881/1942 (96%)
[UserStats #5] run time: 1m-0s, clients: 5, corpus: 15166, objectives: 0, executions: 4305268, exec/sec: 70.94k, edges_stability: 1813/1874 (96%), edges: 1943/297051 (0%)
[UserStats #5] run time: 1m-0s, clients: 5, corpus: 15166, objectives: 0, executions: 4305268, exec/sec: 70.94k, edges_stability: 1813/1874 (96%), edges: 1943/297051 (0%)
[UserStats #2] run time: 1m-0s, clients: 5, corpus: 15166, objectives: 0, executions: 4305880, exec/sec: 70.91k, edges: 1943/297051 (0%), edges_stability: 1839/1900 (96%)
[Testcase #2] run time: 1m-0s, clients: 5, corpus: 15167, objectives: 0, executions: 4305880, exec/sec: 70.91k, edges: 1943/297051 (0%), edges_stability: 1839/1900 (96%)

Note: Using a commit hash from LibAFL for now, since the fork runtime flag has only recently been added. Will change this to a proper version once a new release is out.

Fix all deadlock issues known so far

7f1970e

Ollrogge merged commit d765676 into main Dec 18, 2025
1 check passed

Ollrogge mentioned this pull request Dec 18, 2025

Calling a subprocess and capturing it's output freezes the fuzzer. #8

Closed
