fix: resolve memory leak in VPI callback cleanup by Dragon-Git · Pull Request #5547 · cocotb/cocotb

Dragon-Git · 2026-04-24T15:33:49Z

This PR resolves the linear RSS memory leak when repeatedly using Timer triggers on Synopsys VCS and Cadence Xcelium.

codecov · 2026-04-24T18:07:16Z

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 80.94%. Comparing base (af0150d) to head (9f0ec6a).
⚠️ Report is 11 commits behind head on master.

Additional details and impacted files

@@            Coverage Diff             @@
##           master    #5547      +/-   ##
==========================================
+ Coverage   80.86%   80.94%   +0.08%     
==========================================
  Files          65       65              
  Lines        9422     9425       +3     
  Branches     2468     2463       -5     
==========================================
+ Hits         7619     7629      +10     
+ Misses       1312     1304       -8     
- Partials      491      492       +1

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:

❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

ktbarrett · 2026-04-25T14:57:33Z

Only thing missing is a newsfragment. See docs/source/newsfragments/README.rst for some details.

ktbarrett · 2026-04-25T15:17:10Z

These memory leak tests take quite a long time. Can they still do a decent job if the number of iterations is cut to 200k or 500k?

ktbarrett · 2026-04-25T16:21:46Z

The tests are so long they are blowing up the coverage tool.

Dragon-Git · 2026-04-26T03:14:15Z

The testing time is indeed too long. I reduced the number of iterations to 100K.

Dragon-Git · 2026-04-26T03:38:38Z

I observed a failure in the last CI run. Specifically, the NextTimeStep test failed when executed with Verilator 5.046 on the Ubuntu. Memory consumption reached 128KB. I tested it on the Verilator 5.046 macOS platform, and the diff is 0. I don't have an Ubuntu environment. Do we need to survey the environment, or do we need to feed this back upstream?

ktbarrett · 2026-04-27T22:06:16Z

It seems like these tests are a bit flakey now that they are smaller. Is that expected? The Riviera tests weren't failing before and now they all are. Even Xcelium with the fix is failing in this run.

I'm wondering if we keep this test at a larger size, like 200k or 500k, but run it as part of a different regression set. That isn't something you would have to do, just throwing it out there to see what you think.

Dragon-Git · 2026-04-28T00:29:29Z

It seems that the noise is higher than I expected, but I think 100K tests are sufficient. We need to raise the threshold. Assuming that each leak is the size of the callback structure, that is, on a 32-bit system, the minimum occupancy is 28 Bytes, and 100k calls, the leak will reach 2.67 MB. It is reasonable for us to set the threshold to 2 MB.

typedef struct t_cb_data
{
  PLI_INT32    reason;                        /* callback reason */
  PLI_INT32    (*cb_rtn)(struct t_cb_data *); /* call routine */
  vpiHandle    obj;                           /* trigger object */
  p_vpi_time   time;                          /* callback time */
  p_vpi_value  value;                         /* trigger object value */
  PLI_INT32    index;                         /* index of the memory word or
                                                 var select that changed */
  PLI_BYTE8   *user_data;
} s_cb_data, *p_cb_data;

Dragon-Git · 2026-04-28T02:38:32Z

Could you please share the Questa and Riviera test logs for this PR?
I’d like to check the exact RSS diff values to verify whether there is a genuine memory leak or just normal memory fluctuation.

ktbarrett · 2026-04-28T15:46:58Z

riviera.log

The questa failure has to do with it not consistently supporting NextTimeStep. It's kind of a finicky trigger. The issue is that the Clock and the Timer you have "finish" at the same time and so there are no active events and that causes this trigger to segfault. Playing with the Timer to not fall on the same time as the clock may fix it. Otherwise feel free to skip it.

Dragon-Git · 2026-04-28T17:15:07Z

I skipped the test of next_step_time on Questa. Riviera seems that there is a memory leak. I tried to add a vpi_free_object to it.

ktbarrett · 2026-04-29T23:00:36Z

Looks like Riviera also need to modify VpiValueCbHdl::run.

Dragon-Git · 2026-04-29T23:32:46Z

vpi_remove_cb is now called in the VpiValueCbHdl::run, There should be no more memory leaks. Based on the last log, we should also consider raising the threshold for Riviera.

Dragon-Git · 2026-04-30T00:20:32Z

Could you please share the Riviera test logs for this CI?

ktbarrett · 2026-04-30T01:38:24Z

log looks like Questa has the same leak of the callback, the define is MODELSIM.

Also, I noticed the test name is test_triger_memory_leak, there's a missing 'g' there.

Dragon-Git · 2026-04-30T13:43:52Z

I noticed that for Riviera and ModelSim, both the rising edge callback and the falling edge callback consume more memory than the value change callback. I guess for these two emulators, we need to manually remove the callback or release the handle on each value change callback.

ktbarrett · 2026-04-30T17:12:34Z

Riviera segfaults on the rising_edge test.

Questa:

# 802999.00ns WARNING  cocotb.regression                  test_trigger_memory_leak.test_raise_edge_leak failed
#                                                         Traceback (most recent call last):
#                                                           File "cocotb/tests/test_cases/test_trigger_memory_leak/test_trigger_memory_leak.py", line 117, in test_raise_edge_leak
#                                                             assert diff <= MEMORY_LEAK_TH, "Memory leak"
#                                                         AssertionError: Memory leak
#                                                         assert 11403264 <= 4194304
# 802999.00ns INFO     cocotb.regression                  running test_trigger_memory_leak.test_falling_edge_leak (5/8)
# 1002998.50ns WARNING  cocotb.regression                  test_trigger_memory_leak.test_falling_edge_leak failed
#                                                         Traceback (most recent call last):
#                                                           File "cocotb/tests/test_cases/test_trigger_memory_leak/test_trigger_memory_leak.py", line 132, in test_falling_edge_leak
#                                                             assert diff <= MEMORY_LEAK_TH, "Memory leak"
#                                                         AssertionError: Memory leak
#                                                         assert 12189696 <= 4194304
# 1002998.50ns INFO     cocotb.regression                  running test_trigger_memory_leak.test_value_change_leak (6/8)
# 1152998.00ns WARNING  cocotb.regression                  test_trigger_memory_leak.test_value_change_leak failed
#                                                         Traceback (most recent call last):
#                                                           File "cocotb/tests/test_cases/test_trigger_memory_leak/test_trigger_memory_leak.py", line 147, in test_value_change_leak
#                                                             assert diff <= MEMORY_LEAK_TH, "Memory leak"
#                                                         AssertionError: Memory leak
#                                                         assert 4718592 <= 4194304

Dragon-Git · 2026-05-02T06:54:12Z

The attempt failed, but the above phenomenon still deserves attention. I reverted to the previous version, and skipped the value change related tests on questa. Do you have any ideas about this phenomenon?

ktbarrett · 2026-05-02T12:18:11Z

About rising edge and falling edge using more? Yeah that's because the VPI only uses "value change" and we check to see if the value after change is 1 (rising edge) or 0 (falling edge) and if the current value change is to a different value, we keep going. If you are measuring clock cycles you'll get two edges per actual rising edge where the other is an ignored falling edge.

Dragon-Git · 2026-05-02T13:09:23Z

Yes, I've test in Verilator, and confirmed that as long as we don't remove the callback, the callback's vpiHandle remains the same, so no memory leak occurs. However, for Questa and Riviera, I suspect that a different vpiHandle is generated each time, which causes memory leaks for half of the callbacks. So how should we handle this? Should we add dedicated handling for these two simulators when pass == false, by remove the callback and re-registering a new one? This may impact performance, or should we report this issue upstream instead?

Dragon-Git · 2026-05-02T14:13:25Z

I have tested the scheme of unregistering and re-registering callbacks in Verilator. The performance impact is negligible.

In the test with 10M rising-edge trigger, both approaches finish in around 30 seconds.
For the 20M rising-edge trigger, the baseline scheme without unregistering/re-registering takes approximately 58 seconds, while the scheme with callback unregistration followed by re-registration takes about 60 seconds.

If the tests in Questa and Riviera pass, we can add this change, but we need to provide feedback to the upstream.

Dragon-Git · 2026-05-02T16:05:43Z

Is the diff still consistent with before?

ktbarrett · 2026-05-14T14:01:20Z

Sorry, been very distracted the past couple weeks. Let's try to get this merged this week. I appreciate the effort.

Dragon-Git · 2026-05-14T14:21:46Z

Thanks for the feedback! Regarding the memory issue with Questa/Riviera-PRO during rising/falling edge tests, I've temporarily skipped the related test on Questa due to limited local debugging environment and reliance on CI for iteration. To prioritize landing this PR this week, should we:

Option A: Merge this and address the skipped test in a follow-up? (Ideal if the issue is isolated)

Option B: Debug deeper before merging? (If critical to resolve upfront)

Please advise your preference—I'm ready to adjust accordingly. Appreciate your guidance!

ktbarrett · 2026-05-14T14:32:02Z

I want a deeper debug. I have access to the machine with the commercial tools so I can runs diagnostics more quickly.

Just to summarize where we are:

Questa is failing on these tests and we are uncertain if the issue is that is also needs the vpi_free_object or if it's creating a new callback on every iteration meaning we should be using vpi_remove_cb?

Dragon-Git · 2026-05-14T14:38:38Z

Yes, I have tested both of these ideas in previous CIs, but they have failed. We may need to use valgrind to track memory. Riviera and Questa both have a similar phenomenon, but Questa use more memory.

ktbarrett · 2026-05-14T14:52:19Z

It's possible it's a leak in Questa, or maybe removing a cb and adding it back in the same cycle triggers it. I wonder if it's repeating like Verilator and if so if we can just cache it and disable/re-enable it. cocotb had a system like that previously but it was broken over the years partially because it wasn't easy to understand the way it was written and totally uncommented.

ktbarrett · 2026-05-15T02:44:56Z

Yeah this is just due to Questa not supporting vpi_remove_cb while the callback is executing. This will require a hefty refactor. Let's just get what we have in.

Dragon-Git marked this pull request as ready for review April 25, 2026 03:13

Dragon-Git force-pushed the fix/5545-vpi-timer-memory-leak branch from d4a5a41 to 49ce17e Compare April 25, 2026 04:45

Dragon-Git added 3 commits April 26, 2026 12:43

fix: resolve memory leak in VPI callback cleanup for VCS/Xcelium

70cd301

Add triger memory leak test

2dd23f2

Apply review

cbba87f

Dragon-Git force-pushed the fix/5545-vpi-timer-memory-leak branch from 8cc487f to cbba87f Compare April 26, 2026 04:44

Raise the threshold to 2 MB.

8714df1

Try to fix ci

ca102e8

Dragon-Git force-pushed the fix/5545-vpi-timer-memory-leak branch from a109270 to ca102e8 Compare April 28, 2026 17:18

ktbarrett reviewed Apr 29, 2026

View reviewed changes

Comment thread tests/test_cases/test_triger_memory_leak/test_triger_memory_leak.py Outdated

Apply review

d3df7c5

fixed typo

a3b85ce

Dragon-Git force-pushed the fix/5545-vpi-timer-memory-leak branch from cee2ae1 to 4df88d3 Compare April 30, 2026 14:21

Dragon-Git force-pushed the fix/5545-vpi-timer-memory-leak branch from 4df88d3 to b6373a7 Compare May 2, 2026 06:41

Dragon-Git force-pushed the fix/5545-vpi-timer-memory-leak branch from b6373a7 to 369d4e0 Compare May 2, 2026 14:04

ktbarrett added this to the 2.1 milestone May 13, 2026

skip RisingEdge/FallingEdge/ValueChange test on questa

9f0ec6a

Dragon-Git force-pushed the fix/5545-vpi-timer-memory-leak branch from 369d4e0 to 9f0ec6a Compare May 14, 2026 13:55

ktbarrett changed the title ~~fix: resolve memory leak in VPI callback cleanup for VCS/Xcelium~~ fix: resolve memory leak in VPI callback cleanup May 14, 2026

Touch ups

539108f

ktbarrett force-pushed the fix/5545-vpi-timer-memory-leak branch from e07a7bd to 539108f Compare May 15, 2026 02:47

Conversation

Dragon-Git commented Apr 24, 2026

Uh oh!

codecov Bot commented Apr 24, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Codecov Report

Uh oh!

ktbarrett commented Apr 25, 2026

Uh oh!

ktbarrett commented Apr 25, 2026

Uh oh!

ktbarrett commented Apr 25, 2026

Uh oh!

Dragon-Git commented Apr 26, 2026

Uh oh!

Dragon-Git commented Apr 26, 2026

Uh oh!

ktbarrett commented Apr 27, 2026

Uh oh!

Dragon-Git commented Apr 28, 2026

Uh oh!

Dragon-Git commented Apr 28, 2026

Uh oh!

ktbarrett commented Apr 28, 2026

Uh oh!

Dragon-Git commented Apr 28, 2026

Uh oh!

Uh oh!

ktbarrett commented Apr 29, 2026

Uh oh!

Dragon-Git commented Apr 29, 2026

Uh oh!

Dragon-Git commented Apr 30, 2026

Uh oh!

ktbarrett commented Apr 30, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Dragon-Git commented Apr 30, 2026

Uh oh!

ktbarrett commented Apr 30, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Dragon-Git commented May 2, 2026

Uh oh!

ktbarrett commented May 2, 2026

Uh oh!

Dragon-Git commented May 2, 2026

Uh oh!

Dragon-Git commented May 2, 2026

Uh oh!

Dragon-Git commented May 2, 2026

Uh oh!

ktbarrett commented May 14, 2026

Uh oh!

Dragon-Git commented May 14, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

ktbarrett commented May 14, 2026

Uh oh!

Dragon-Git commented May 14, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

ktbarrett commented May 14, 2026

Uh oh!

ktbarrett commented May 15, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

codecov Bot commented Apr 24, 2026 •

edited

Loading

ktbarrett commented Apr 30, 2026 •

edited

Loading

ktbarrett commented Apr 30, 2026 •

edited

Loading

Dragon-Git commented May 14, 2026 •

edited

Loading

Dragon-Git commented May 14, 2026 •

edited

Loading

ktbarrett commented May 15, 2026 •

edited

Loading