fix: resolve memory leak in VPI callback cleanup#5547
Conversation
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## master #5547 +/- ##
==========================================
+ Coverage 80.86% 80.94% +0.08%
==========================================
Files 65 65
Lines 9422 9425 +3
Branches 2468 2463 -5
==========================================
+ Hits 7619 7629 +10
+ Misses 1312 1304 -8
- Partials 491 492 +1 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
d4a5a41 to
49ce17e
Compare
|
Only thing missing is a newsfragment. See docs/source/newsfragments/README.rst for some details. |
|
These memory leak tests take quite a long time. Can they still do a decent job if the number of iterations is cut to 200k or 500k? |
|
The tests are so long they are blowing up the coverage tool. |
|
The testing time is indeed too long. I reduced the number of iterations to 100K. |
|
I observed a failure in the last CI run. Specifically, the |
8cc487f to
cbba87f
Compare
|
It seems like these tests are a bit flakey now that they are smaller. Is that expected? The Riviera tests weren't failing before and now they all are. Even Xcelium with the fix is failing in this run. I'm wondering if we keep this test at a larger size, like 200k or 500k, but run it as part of a different regression set. That isn't something you would have to do, just throwing it out there to see what you think. |
|
It seems that the noise is higher than I expected, but I think 100K tests are sufficient. We need to raise the threshold. Assuming that each leak is the size of the callback structure, that is, on a 32-bit system, the minimum occupancy is 28 Bytes, and 100k calls, the leak will reach 2.67 MB. It is reasonable for us to set the threshold to 2 MB. typedef struct t_cb_data
{
PLI_INT32 reason; /* callback reason */
PLI_INT32 (*cb_rtn)(struct t_cb_data *); /* call routine */
vpiHandle obj; /* trigger object */
p_vpi_time time; /* callback time */
p_vpi_value value; /* trigger object value */
PLI_INT32 index; /* index of the memory word or
var select that changed */
PLI_BYTE8 *user_data;
} s_cb_data, *p_cb_data; |
|
Could you please share the Questa and Riviera test logs for this PR? |
|
The questa failure has to do with it not consistently supporting NextTimeStep. It's kind of a finicky trigger. The issue is that the Clock and the Timer you have "finish" at the same time and so there are no active events and that causes this trigger to segfault. Playing with the Timer to not fall on the same time as the clock may fix it. Otherwise feel free to skip it. |
|
I skipped the test of |
a109270 to
ca102e8
Compare
|
Looks like Riviera also need to modify |
|
|
|
Could you please share the Riviera test logs for this CI? |
|
log looks like Questa has the same leak of the callback, the define is Also, I noticed the test name is |
|
I noticed that for Riviera and ModelSim, both the rising edge callback and the falling edge callback consume more memory than the value change callback. I guess for these two emulators, we need to manually remove the callback or release the handle on each value change callback. |
cee2ae1 to
4df88d3
Compare
|
Riviera segfaults on the rising_edge test. Questa: |
4df88d3 to
b6373a7
Compare
|
The attempt failed, but the above phenomenon still deserves attention. I reverted to the previous version, and skipped the value change related tests on questa. Do you have any ideas about this phenomenon? |
|
About rising edge and falling edge using more? Yeah that's because the VPI only uses "value change" and we check to see if the value after change is 1 (rising edge) or 0 (falling edge) and if the current value change is to a different value, we keep going. If you are measuring clock cycles you'll get two edges per actual rising edge where the other is an ignored falling edge. |
|
Yes, I've test in Verilator, and confirmed that as long as we don't remove the callback, the callback's vpiHandle remains the same, so no memory leak occurs. However, for Questa and Riviera, I suspect that a different vpiHandle is generated each time, which causes memory leaks for half of the callbacks. So how should we handle this? Should we add dedicated handling for these two simulators when |
b6373a7 to
369d4e0
Compare
|
I have tested the scheme of unregistering and re-registering callbacks in Verilator. The performance impact is negligible. In the test with 10M rising-edge trigger, both approaches finish in around 30 seconds. If the tests in Questa and Riviera pass, we can add this change, but we need to provide feedback to the upstream. |
|
Is the diff still consistent with before? |
369d4e0 to
9f0ec6a
Compare
|
Sorry, been very distracted the past couple weeks. Let's try to get this merged this week. I appreciate the effort. |
|
Thanks for the feedback! Regarding the memory issue with Questa/Riviera-PRO during rising/falling edge tests, I've temporarily skipped the related test on Questa due to limited local debugging environment and reliance on CI for iteration. To prioritize landing this PR this week, should we: Option A: Merge this and address the skipped test in a follow-up? (Ideal if the issue is isolated) Option B: Debug deeper before merging? (If critical to resolve upfront) Please advise your preference—I'm ready to adjust accordingly. Appreciate your guidance! |
|
I want a deeper debug. I have access to the machine with the commercial tools so I can runs diagnostics more quickly. Just to summarize where we are: Questa is failing on these tests and we are uncertain if the issue is that is also needs the |
|
Yes, I have tested both of these ideas in previous CIs, but they have failed. We may need to use valgrind to track memory. Riviera and Questa both have a similar phenomenon, but Questa use more memory. |
|
It's possible it's a leak in Questa, or maybe removing a cb and adding it back in the same cycle triggers it. I wonder if it's repeating like Verilator and if so if we can just cache it and disable/re-enable it. cocotb had a system like that previously but it was broken over the years partially because it wasn't easy to understand the way it was written and totally uncommented. |
|
Yeah this is just due to Questa not supporting |
e07a7bd to
539108f
Compare
Fixes #5545
This PR resolves the linear RSS memory leak when repeatedly using
Timertriggers on Synopsys VCS and Cadence Xcelium.