ZynqMP GEM (Ethernet) Driver #681
Branch updated from ab041d3 to 511ed94.
This looks fine to me, and is in line with how we normally handle multiple hardware queues; see my comments in the review.
Your UDP results look fine to me. I wouldn't be too concerned about the differences in RTT; I don't think they are particularly significant. For example, the board you are testing on could have higher IRQ processing times. This would explain the results: at lower throughputs you receive proportionately more IRQs (since the driver can ACK each one), whereas at higher throughputs the driver starts to lag behind with ACKing, so it processes more packets per IRQ and the effect of slower IRQs diminishes. I doubt it is due to Tx IRQ coalescing, as we have other NICs (meson) that also don't have coalescing, and they end up with RTTs fairly similar to the IMX; see here.
This also looks good to me.
I can't comment on this, as I am unfamiliar with the benchmarking setup. While the UDP performance looks great to me, the TCP performance is slightly concerning. I am not entirely sure why the throughput is so low, and it is not in line with results from our other boards, which can generally reach > 700 Mb/s. Also, if you look through the results in the Google Sheet I linked, our boards typically get fairly close to 100% CPU utilisation to reach these throughputs, whereas your testing shows only about 50%. To me this indicates that your driver is lagging, since the other network components have plenty of idle time. I would fix the issues I mentioned in my review and see if that makes a difference. If not, I would recommend looking into it further.
Branch updated from bf5c023 to 355bd5c.
I have addressed all the PR comments now, but unfortunately TCP performance remains unchanged. CPU utilisation for both UDP and TCP is also slightly lower, peaking at 55.7% and 50.4% for UDP and TCP respectively.
This could also be due to the way we connect to the Kria internally, using a custom Ethernet adapter (RJ45 to ribbon cable), since we only use UDP. However, I am not familiar enough with that adapter hardware to say definitively. Some testing on a ZCU102 could help confirm this.
@SeedRizvi @Courtney3141 What's the current state of this PR? This platform is useful to support, so I think it's worth making sure we get this in before it becomes too stale.
Hello! I am no longer at Skykraft, so I haven't made any progress on this, but it is possible that the TCP issues were specific to our hardware configuration, as internally we only used UDP. @potanin might be able to help with testing if needed.
No worries at all. Given that this driver is useful and appears to work, perhaps we can move towards merging it and leave debugging the TCP performance for an issue? @Courtney3141 do you have any concerns with merging this? |
Hi @SeedRizvi, @omeh-a Yes, that is definitely my intention! I will try to get to this next week. Very sorry for the delay on my end, and thanks for addressing all the comments.
No worries!
ethernet driver following structure of the other network drivers Signed-off-by: SeedRizvi <syedasadrazarizvi1@gmail.com>
GEM3 address for both kria and zcu102 Signed-off-by: SeedRizvi <syedasadrazarizvi1@gmail.com>
commented out in zcu102 as I have only tested on kria Signed-off-by: SeedRizvi <syedasadrazarizvi1@gmail.com>
adds kria as a supported board; not zcu102 as it is untested Signed-off-by: SeedRizvi <syedasadrazarizvi1@gmail.com>
Misleading name for unused #define + clearer comment Signed-off-by: SeedRizvi <syedasadrazarizvi1@gmail.com>
Fix documented undefined behaviour of priority queue 1 Please see function doc for disable_pq1() Signed-off-by: SeedRizvi <syedasadrazarizvi1@gmail.com>
Add zynqmp as eth driver configured for hw checksum Signed-off-by: SeedRizvi <syedasadrazarizvi1@gmail.com>
remove whitespace Signed-off-by: SeedRizvi <syedasadrazarizvi1@gmail.com>
add config.json Signed-off-by: SeedRizvi <syedasadrazarizvi1@gmail.com>
fix linting issues Signed-off-by: SeedRizvi <syedasadrazarizvi1@gmail.com>
Make the difference between Rx and Tx descriptors clear Signed-off-by: SeedRizvi <syedasadrazarizvi1@gmail.com>
not needed due to typecasts, also removes unused assignments Signed-off-by: SeedRizvi <syedasadrazarizvi1@gmail.com>
fixes issue with fence prior to finishing config in rx Signed-off-by: SeedRizvi <syedasadrazarizvi1@gmail.com>
No longer mask to fix lower bits in rx_provide Signed-off-by: SeedRizvi <syedasadrazarizvi1@gmail.com>
No longer mask to fix lower bits in rx_provide Signed-off-by: SeedRizvi <syedasadrazarizvi1@gmail.com>
Previously unchecked status regs now checked and cleared Signed-off-by: SeedRizvi <syedasadrazarizvi1@gmail.com>
Build cfg register and write it in one go Signed-off-by: SeedRizvi <syedasadrazarizvi1@gmail.com>
Now lets rx_provide handle init Signed-off-by: SeedRizvi <syedasadrazarizvi1@gmail.com>
Assertion for upper addr bits in rx/tx queue Signed-off-by: SeedRizvi <syedasadrazarizvi1@gmail.com>
clang format Signed-off-by: SeedRizvi <syedasadrazarizvi1@gmail.com>
Signed-off-by: Courtney Darville <courtneydarville94@outlook.com>
Signed-off-by: Courtney Darville <courtneydarville94@outlook.com>
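The commit list above mentions disable_pq1(), whose body isn't reproduced in this thread. A frequently used way to park an unused GEM priority queue is to point its Tx and Rx base registers at a single dummy descriptor that the DMA engine always sees as already used, so it never fetches or writes a buffer through that queue. The sketch below illustrates only that idea and is not the code in ethernet.c. Assumptions: the 32-bit (non-extended) Cadence GEM descriptor layout and its bit positions, the 8-byte alignment check, and the names struct pq1_regs, dummy_tx/dummy_rx and disable_pq1_sketch() are all hypothetical and should be checked against the ZynqMP TRM.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Assumed bit masks, based on the Cadence GEM 32-bit (non-extended)
 * descriptor format; verify against the ZynqMP TRM before relying on them. */
#define TX_USED (1u << 31) /* word 1: owned by software, DMA will not transmit */
#define TX_WRAP (1u << 30) /* word 1: last descriptor, wrap to the ring base */
#define TX_LAST (1u << 15) /* word 1: last buffer of the frame */
#define RX_USED (1u << 0)  /* word 0: buffer already used, DMA will not write */
#define RX_WRAP (1u << 1)  /* word 0: last descriptor, wrap to the ring base */

struct gem_desc {
    uint32_t word0; /* buffer address (Rx: plus ownership/wrap bits) */
    uint32_t word1; /* control/status */
};

/* Hypothetical stand-in for the queue-1 Tx/Rx base address registers. */
struct pq1_regs {
    volatile uint32_t tx_q1_base;
    volatile uint32_t rx_q1_base;
};

/* One-entry "terminated" rings for the unused queue. */
static _Alignas(64) struct gem_desc dummy_tx;
static _Alignas(64) struct gem_desc dummy_rx;

/* Hypothetical sketch: park priority queue 1 on descriptors the DMA engine
 * always sees as used. tx_paddr/rx_paddr are the physical (DMA) addresses of
 * dummy_tx/dummy_rx, which the caller must obtain for its platform. */
static void disable_pq1_sketch(struct pq1_regs *regs, uint32_t tx_paddr, uint32_t rx_paddr)
{
    /* Assumed: descriptor lists must be at least 8-byte aligned. */
    assert((tx_paddr & 0x7u) == 0 && (rx_paddr & 0x7u) == 0);

    dummy_tx.word0 = 0;
    dummy_tx.word1 = TX_USED | TX_WRAP | TX_LAST; /* nothing to send, ever */
    dummy_rx.word0 = RX_USED | RX_WRAP;           /* no buffer: never written */
    dummy_rx.word1 = 0;

    /* Stands in for the platform's DMA write barrier. A C11 fence alone may
     * not order normal-memory stores against device accesses on every CPU,
     * and cached descriptors would also need cache maintenance. */
    atomic_thread_fence(memory_order_seq_cst);

    regs->tx_q1_base = tx_paddr;
    regs->rx_q1_base = rx_paddr;
}
```

Using a single descriptor with both the used and wrap bits set means the queue behaves as an empty ring that immediately wraps onto itself, which is why it is a safe target even if the queue is accidentally enabled.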
Branch updated from 355bd5c to e62e4c2.
General
This PR adds the GEM driver for the ZynqMP platform, tested on the Kria K26 board. Made in collaboration with @potanin.
Below are some key configurations set in the driver.
Regarding disabling "priority queue 1", I would appreciate feedback on whether the approach is reasonable (please see ethernet.c/disable_pq1()).
Benchmark
Rx and Tx throughput are combined, as they are virtually identical. Compare the results against Xilinx: ZynqMP lwIP Performance.
UDP
TCP
Clearly, there's an interesting bottleneck somewhere. All checksum calculations have been offloaded to hardware, and I believe I enabled the appropriate sDDF setting in include/sddf/network/constants.h. While debugging, I also ran Xilinx's own lwip_tcp_perf_server (outside the seL4 world) and observed a maximum TCP throughput of 270 Mbps, so by comparison the GEM driver performs better in sDDF. I also used the same lwIP settings as that repo and observed no change. The only way to get anywhere near the Xilinx TCP performance figures with lwip_tcp_perf_server is to use parallel TCP clients (see the table below), which I cannot currently test in sDDF.
Results
See RAW data here:
output-UDP.csv
output-TCP.csv
Comparing these results against the IMX Maaxboard results (12/11/2025, thanks @Ivan-Velickovic!), the GEM driver has significantly higher mean and median RTTs.
Using UDP results
The GEM median RTT (and its standard deviation) is a fair bit higher than on the IMX Maaxboard. The IMX driver configures 128-frame Tx interrupt coalescing, which is not available on the ZynqMP GEM. Instead, a Tx completion interrupt fires for each transmitted frame, consuming a larger share of the driver's scheduling budget.
The RTTs of both drivers converging at around 800-900 Mbps should support this, since at that point both drivers are processing packets back-to-back and the per-interrupt overhead is amortised across larger bursts (more IRQs processed per context switch). If the gap were caused by a driver bug or a hardware limitation rather than interrupt overhead, I would expect it to persist (or worsen) at high loads rather than disappear (disregarding, of course, the saturation at Gigabit speeds).
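To make the amortisation argument concrete, here is a toy model (not measured data) of how a fixed per-interrupt cost is spread over the frames each interrupt covers. The function names and the per-IRQ cost used below are hypothetical. The model ignores everything except the interrupt count, so it only illustrates why the gap should shrink as load rises, not the actual RTTs. The backlog term reflects the point from the review that a lagging driver handles more packets per IRQ even without hardware coalescing.

```c
#include <assert.h>
#include <stdint.h>

/* Toy model: Tx-completion IRQs needed for `frames` frames when each IRQ
 * covers `frames_per_irq` completions (1 means no coalescing). */
static uint64_t tx_irqs(uint64_t frames, uint64_t frames_per_irq)
{
    assert(frames_per_irq > 0);
    return (frames + frames_per_irq - 1) / frames_per_irq; /* ceiling division */
}

/* Frames actually handled per IRQ: the coalescing threshold, or the backlog
 * that built up while the driver was busy, whichever is larger. */
static uint64_t effective_batch(uint64_t coalesce_threshold, uint64_t backlog)
{
    return backlog > coalesce_threshold ? backlog : coalesce_threshold;
}

/* Average interrupt cost charged to each frame, given a fixed cost per IRQ. */
static double irq_overhead_per_frame_ns(uint64_t frames, uint64_t frames_per_irq,
                                        double irq_cost_ns)
{
    assert(frames > 0);
    return (double)tx_irqs(frames, frames_per_irq) * irq_cost_ns / (double)frames;
}
```

For example, with a hypothetical 2000 ns per IRQ and 1280 frames, one IRQ per frame costs 2000 ns of overhead per frame, while 128-frame coalescing brings it down to 15.625 ns. Under load, a backlog of 32 frames gives the uncoalesced GEM effective_batch(1, 32) = 32 frames per IRQ, and the overhead drops to 62.5 ns per frame.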
Other
I've refrained from adding the ZCU102 as a network-supported board for the moment, since I haven't tested on that board, but it would be great to do so as part of this PR as well.