-
Notifications
You must be signed in to change notification settings - Fork 1.3k
Inline function related changes #3144
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
Codecov Report
|
|
Hi Joey, Please rename commit "qcom: show percent of covered pcs in funcs" to "pkg/cover: show percent of covered pcs in funcs". |
|
I've reviewed only the first 2 commits for now: They look good to me. The other look more complex. |
Split into #3157 |
nm or llvm-nm cannot show inline function and its address, size.
However, dwarf info contains attributes for inline function.
For example:
static int do_pcm_hwsync(struct snd_pcm_substream *substream);
Function do_pcm_hwsync above has such entry in elf.
0x07adfc27: DW_TAG_inlined_subroutine
DW_AT_abstract_origin (0x07ae83cb "do_pcm_hwsync")
DW_AT_ranges (0x00b68bc0
[0xffffffd01160b27c, 0xffffffd01160b2d4)
[0xffffffd01160b9ec, 0xffffffd01160ba4c)
[0xffffffd01160be50, 0xffffffd01160be60))
DW_AT_call_file ("xxx/sound/core/pcm_native.c")
DW_AT_call_line (2934)
DW_AT_call_column (0x08)
So, we can use the info as address range to help coverage display.
With it, subsystem coverage is more reasonble now:
- Total PCs are align with __sanitizer_cov_trace_pc count in elf file
- Covered PCs are filtered only for the current subsystem paths.
(while previously header files like in include/linux/xx.h are also included.)
pkg/cover/report.go
Outdated
| if pc < s.Start || pc > s.End { | ||
| idx-- | ||
| var s *backend.Symbol | ||
| for j := idx; j < len(rg.Ranges); j++ { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This function was apparently meant to be very fast (therefore there was only a binary search and a map access). With the newly introduced loop such a guarantee no longer exists.
Can you please explain why you needed the loop?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the rg.Ranges are sorted here, 1st by End address, then by start address. The idea is to let inline function appear before normal function.
sort.Slice(allRanges, func(i, j int) bool {
if allRanges[i].End != allRanges[j].End {
return allRanges[i].End < allRanges[j].End
}
return allRanges[i].Start > allRanges[j].Start
})
So the sort.Search only return the idx first satisfy pc >= rg.Ranges[i].Start (which might be an inline function), but we still need to continue the search from idx if the pc is inside normal function.
In average in Android kernel case, there are less than 10 inline functions for each function. So the j loop maximum will iterate 10 times, but if the pc belongs to inline function, the binary search will be enough and the j loop will only iterate once.
I didn't find any better solution than this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
idx := sort.Search(len(rg.Ranges), func(i int) bool {
return pc < rg.Ranges[i].Start
})
This will find the first index when rg.Ranges[i].Start is bigger than pc (assuming that rg.Ranges are sorted by Start in an increasing order).
One point to verify -- is the precondition valid? From your sorting loop I see that it's only guaranteed that it's ordered by End. The existing code removes overlapping symbols, so if it's sorted by one end, it's also sorted by the other. But the new code, as I see, keeps everything - and therefore it's not correct to assume that we can binary-search over Start.
Maybe it makes sense to preprocess the ranges array so that it also only contains non-overlapping segments? But instead of just deleting ones that overlap you can (and should, as it seems) split big ranges into smaller ones, giving priority to those segments that are smaller. So, as a result, in findSymbol you will be able to just binary-search the segment that contains the PC and take a symbol pointer from it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I tried to implement as you suggested, but I don't think it can overcome this problem to do binary search once.
Basically it works (split into ranges and sort) if normal function contains only one level inline functions.
However, it won't work if 1st level inline function contains 2nd level inline functions, even further level of inline functions. For this case, we still need to use binary search + linear search.
So I think sort range by End and then by Start => do binary search + limit linear search is still better. I didn't find any performance issue with this for 1 million pcs. The buildSymbol function finishes within 1 second.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I will bring back the previous code and do some further opimization to limit subprograms to only those in text section in next push.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
updated the patch, pls check again.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here is my understanding of your PR and comments, please correct me where I'm wrong.
allSymbolsinmakeDWARFUnsafecontains both normal functions and various inlined functions.- For each function, we have a number of PC ranges that belong to them.
- For normal functions, their ranges also include PCs that belong to functions inlined to them.
- Inlined functions can also have other inlined functions in them, and their PC ranges would also include PC of those inlined-inlined functions.
From the points above it follows that ranges are either completely contained within each other or don't intersect at all. De-facto there's just some kind of tree-like structure behind it.
And in findSymbol we either want to find the covering range that has the minimum width (end PC - start PC) and covers the requested PC. Or, in a tree interpretation, we want to each a leaf node.
But also the current approach to do it is fast enough. Because if you find e.g. the first End of a range that's bigger than the requested PC, then in the worst case you need to traverse all nested inline functions whose ranges end on the same End PC. Which just cannot be a too big number.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Basically yes, let me put more info here in case future we or someone need to revisit this.
There are some issues on current way of elf parsing:
- Functions are reading from .symbtab section and are assumed that the function is in continous address space. But the truth is not. I observed some functions are placed in ranges. For example asan.module_ctor below.
0000000000000000 l .text 0000000000000000 $x.3
0000000000000638 l F .text 0000000000000024 asan.module_ctor
000000000000065c l F .text 0000000000000024 asan.module_dtor
0000000000000000 l d .text 0000000000000000 .text
0000000000000680 l .text 0000000000000000 $x.1
0000000000000680 l F .text 000000000000000c asan.module_ctor
- No inline function info is present and so header files are assumed to have 1 total pc.
- Functions from .init.text or .exit.text sections are not excluded, the function start address for some symbols can be same. So during findSymbol, it might find incorrect one. (correctly me if it's already considered)
And the PR is to leverage the inline function info from dwarf records.
- yes for Alex's point 1, allSymbols contains both normal functions and various inlined functions. In dwarf presentation, each normal function (aka subprogram) might contain several inline functions. So for each inline function, we know which normal function it belongs, but we don't know if it's a child of another inline function easily.
- Yes for Alex's point 2. To avoid the relationship building of normal function and inline function, even inline function of another inline function, we use dwarf pc range (allRanges) to first sort by range.End and then range.Start, so that we can do binary search + linear search later to find which function the pc belongs to in buildSymbols.
- Yes for Alex's point 3. Normal functions ranges cover their inline function ranges. However, 1 pc either belongs to one normal function or one inline function. In this way, we can later do statics in /cover or /subsystemcover to say how many covered pcs in normal functions, and how many covered pcs in all functions including inline functions.
- Yes for Alex's point 4. In current PR implementation, wo don't need to walk inline function recursively.
- There might be a tree structure, but when I decoded the debuginfo, the edge info is missing at least for inline function of another inline function. Also, if there is a tree, then it's tree of DIE while not tree of ranges. So the address layout can have many senorios, for example function A range 1 + function a range 1 + function A range 2 + function a range 2 + function b range1... It seems harder to construct back to a tree.
- Yes, the current approach is pretty fast, 1 second for 1 million pcs in buildSymbols. So it shouldn't be a problem during findSymbol operations.
- Meanwhile, only functions in .text section are considered, I think it's more syzkaller friendly. Because kcov process doesn't collect any coverage from functions in .init.text and .exit.text sections.
The overall performance is not too bad, I think. The makeDWARFUnsafe only runs once which takes 1min20s, while existing way of decoding from symbtab takes ~1min if I remember correctly (on normal PC with 4 cores). For blade servers with many cores, I didn't see much impact. For refreshing /cover or /subsystemcover page, they finished in seconds.
Thanks
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Any further consideration?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry for the delay, I'll take one more look with a fresh mind.
joeyjiaojg
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
update
c686e16 to
72e2f7a
Compare
|
I tried to compare the upstream syzkaller vs this PR using v5.18.1 Linux (0047d57e6c91177bb731bed5ada6c211868bc27c, compiled using this config with gcc 11.2.0) I applied this simple patch to measure time. For the current syzkaller code: For this PR something went wrong. |
you are right, only tested linux kernel compiled by clang. let me check. I think gcc encodes into different gwarf info. |
|
Verified gcc compiled kernel won't work, as the dwarf is encoded into different format. DW_AT_sibling is used, and cap_task_setnice doesn't have range att Below shows clang compiled kernel can have range info. |
|
Syzkaller should definitely be able to adequately handle both gcc- and clang- compiled kernels. |
Right, I'm recoding some of the logic, it works for both gcc and clang now, but still need some tests before pushing. |
Performance testing result for upstream linux compiled by clang:
2022/06/08 11:21:33 initializing coverage information...
2022/06/08 11:22:08 readSymbolsFromDwarf took 33.243243324s for 536142 functions
2022/06/08 11:22:10 buildSymbols for 2221532 coverPoints, 525856 ranges
2022/06/08 11:22:10 buildSymbols took 383.339292ms
|
@a-nogikh pushed the new code, please review again Tested on linux master with both gcc-9, gcc-11, clang-14. |
We know when addr2line decodes one PC which has multiple frames, we should show these multiple frames to explictly show that the function is hit too. The reason behind is for the inline function, it can have multiple callers, and multiple repensentation in dwarf section with different ranges. For Some examples that seemed a bit confusing to me. For audit_log_format, the same case for L1600. It's more reasonable to say there is a PC can be reachable from L1600. Different sets of covered coverage points (black) Hope it answers all the doubts. |
The PR is to introduce inline function decoding from debug_info info compared to previous symbols from symbol table.
Meanwhile, some fixes to fit for the inline function change.
With these changes,