Skip to content

Implement more arguments for exporting symbols#974

Merged
mati865 merged 1 commit into
wild-linker:mainfrom
mati865:push-qmtvyzyykkxp
Jul 30, 2025
Merged

Implement more arguments for exporting symbols#974
mati865 merged 1 commit into
wild-linker:mainfrom
mati865:push-qmtvyzyykkxp

Conversation

@mati865
Copy link
Copy Markdown
Member

@mati865 mati865 commented Jul 1, 2025

@mati865 mati865 force-pushed the push-qmtvyzyykkxp branch 2 times, most recently from b84b171 to bcbf0fd Compare July 5, 2025 18:26
@mati865 mati865 changed the title Implemnet --export-dynamic-symbol and --export-dynamic-symbol-list Implement --export-dynamic-symbol and --export-dynamic-symbol-list Jul 7, 2025
@mati865 mati865 force-pushed the push-qmtvyzyykkxp branch from 4ed0f0f to 29eec02 Compare July 14, 2025 18:31
Copy link
Copy Markdown
Member Author

@mati865 mati865 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is more code repetition than I'd like, but apart from the missing unit tests, this is quite serviceable now. I don't know if this impacts performance yet.

Comment thread libwild/src/input_data.rs Outdated
Comment thread libwild/src/input_data.rs Outdated
Comment thread libwild/src/layout.rs Outdated
if !export_all_dynamic
&& !&symbol_db
.explicitly_explicitly_export_symbol_list_data_list
.contains(&UnversionedSymbolName::prehashed(
Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

prehashed is called quite often now:

❯ rg "::prehashed\("
libwild/src/version_script.rs
77:                self.exact.insert(UnversionedSymbolName::prehashed(exact));
98:                .contains(&UnversionedSymbolName::prehashed(demangled_name.as_bytes()))
532:        let Some(m) = script.find_match(&UnversionedSymbolName::prehashed(name.as_bytes())) else {

libwild/src/diagnostics.rs
64:            .get_unversioned(&UnversionedSymbolName::prehashed(name.as_bytes()));

libwild/src/symbol_db.rs
1119:            let name = UnversionedSymbolName::prehashed(info.name_bytes);
1231:            let name = UnversionedSymbolName::prehashed(name_bytes);
1553:        Self::from_prehashed(symbol_id, UnversionedSymbolName::prehashed(name))
1572:            name: VersionedSymbolName::prehashed(name, version),

libwild/src/symbol.rs
86:        let name = UnversionedSymbolName::prehashed(name_info.name_bytes);
88:            PreHashedSymbolName::Versioned(VersionedSymbolName::prehashed(name, version))

libwild/src/layout.rs
3173:                .get_unversioned(&UnversionedSymbolName::prehashed(
4644:            .contains(&UnversionedSymbolName::prehashed(

libwild/src/resolution.rs
627:        .get_unversioned(&UnversionedSymbolName::prehashed(name))

libwild/src/elf_writer.rs
2807:                            .version_for_symbol(&UnversionedSymbolName::prehashed(sym_def.name))

but I don't know if this is avoidable.

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Might be something that can be included into refactoring as mentioned in #994.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

While multiple calls for the same symbol could be bad for performance, I'd say what perhaps matters most here is whether we actually make that call most of the time. I'd assume that in the vast majority of linker usages, the list of explicit exports will be empty. So, if we can avoid calling prehashed in that case, then we're probably OK.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was wondering if it would be possible and sensible to cache symbol_id -> symbol name lookups in additional hashmap/BTreeMap.
All the calls of prehashed in this PR should be avoidable unless we are given names of the symbols to export.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could. I'm unsure if it would be a performance win or not though. How many times do we hash a symbol name in the common case? I'd kind hope that it would still be 1. For the extra calls that we make in less common cases, I guess we could run some benchmarks that hit those cases and see how much of a problem it is.

It may be possible to get rid of some unnecessary calls via other means. e.g. the call you're adding in compute_value_flags - that function is called from load_symbols which is the primary place where symbol name hashes are computed. So in that case, I could imagine passing a &mut SymbolInfo to compute_value_flags where SymbolInfo contains the &crate::elf::Symbol that we currently pass, but also an Option<Prehashed<UnversionedSymbolName>>.

Copy link
Copy Markdown
Member Author

@mati865 mati865 Jul 23, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Based on samply profile (with 10 000 Hz) when linking Clang using main branch with a single thread, prehashed() is called from:

  • libwild::symbol::UnversionedSymbolName::prehashed inside libwild::symbol::PreHashedSymbolName::from_raw called by libwild::resolution::resolve_symbol inside libwild::resolution::resolve_symbols - 7.9% of samples
  • libwild::symbol::UnversionedSymbolName::prehashed called by libwild::symbol_db::SymbolLoader::load_symbols - 3.2% of samples
  • libwild::symbol::UnversionedSymbolName::prehashed called by libwild::symbol_db::SymbolLoader::load_symbols (again, looks like a different object given to Rayon?) - 1.6% of samples
    • libwild::symbol::UnversionedSymbolName::prehashed called by libwild::symbol_db::SymbolLoader::load_symbols (again, looks like a different object given to Rayon?) - 0.8% of samples
  • ... and so on with libwild::symbol_db::SymbolLoader::load_symbols

So, looks like 2 calls sites. One of them is parallel already and the other one will be after #1003

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for looking at that. Those two call sites should in theory amount to calling once per symtab entry. The call in resolve_symbol should only be for undefined symbols (references) and the call in load_symbols should only be for definitions.

@mati865 mati865 changed the title Implement --export-dynamic-symbol and --export-dynamic-symbol-list Implement arguments for exporting dynamic symbols Jul 15, 2025
@mati865 mati865 force-pushed the push-qmtvyzyykkxp branch from 729c917 to d7bbb12 Compare July 15, 2025 19:24
Comment thread libwild/src/export_symbol_list.rs Outdated
@mati865 mati865 force-pushed the push-qmtvyzyykkxp branch from 8c3347d to 44dbb8d Compare July 20, 2025 21:15
@mati865
Copy link
Copy Markdown
Member Author

mati865 commented Jul 20, 2025

This should be probably ready for the review, but I want to test it with Qt first.

If somebody has ideas for better naming of these things, don't hesitate to leave a comment.

Comment thread libwild/src/input_data.rs Outdated
Comment thread libwild/src/export_symbol_list.rs Outdated
return Ok(ParsedSymbolMatcher::CxxMatchers(matchers));
}

// TODO: This is the only significant divergence between `parse_matcher` from this file and version_script.rs
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this a TODO that you intend to resolve before merging, or something for later?

Copy link
Copy Markdown
Member Author

@mati865 mati865 Jul 21, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was kind of "loud thinking" in case I or somebody who does the review gets any idea on how to improve it.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would changing the version in version_script.rs to accept from_arg be an option?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, while not useful for version scripts, it deduplicates things a lot. So, probably worth the tradeoff.

Comment thread libwild/src/symbol_db.rs Outdated
version_script_data: Option<VersionScriptData<'data>>,
args: &'data Args,
linker_scripts: &[InputLinkerScript<'data>],
explicitly_explicitly_export_symbol_list_data_list_data: Option<ExportListData<'data>>,
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This name repeats "explicitly" and "list_data" twice. The field above also has "explicitly" twice. I haven't thought deeply about the name, so feel free to disregard, but what about just export_list, explicit_export_list, explicit_symbol_exports or some similar variant?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I had misclicked something in Helix and didn't notice the change (yay, feels like the early days in Vim). I have fixed it before seeing your comment, I think I'll go ahead with export_list.

Comment thread libwild/src/layout.rs Outdated
if !export_all_dynamic
&& !&symbol_db
.explicitly_explicitly_export_symbol_list_data_list
.contains(&UnversionedSymbolName::prehashed(
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

While multiple calls for the same symbol could be bad for performance, I'd say what perhaps matters most here is whether we actually make that call most of the time. I'd assume that in the vast majority of linker usages, the list of explicit exports will be empty. So, if we can avoid calling prehashed in that case, then we're probably OK.

Comment thread libwild/src/symbol_db.rs Outdated
Comment thread libwild/src/symbol_db.rs Outdated
@mati865
Copy link
Copy Markdown
Member Author

mati865 commented Jul 21, 2025

Turns out Qt also uses extern "C" syntax, so I have added it here but I'll probably split it out to a new PR.
https://github.com/qt/qtbase/blob/3babea74b2acfae64c9db2a42f33ce2c4cfad445/src/corelib/QtCore.dynlist

@mati865 mati865 force-pushed the push-qmtvyzyykkxp branch from 760a176 to b522808 Compare July 24, 2025 22:04
@mati865 mati865 changed the title Implement arguments for exporting dynamic symbols Implement more arguments for exporting symbols Jul 24, 2025
@mati865 mati865 marked this pull request as ready for review July 24, 2025 22:05
@mati865 mati865 force-pushed the push-qmtvyzyykkxp branch 3 times, most recently from bd48b4c to 22d159e Compare July 24, 2025 23:10
struct MatchRules<'data> {
general: BasicMatchRules<'data>,
cxx: BasicMatchRules<'data>,
pub(crate) struct MatchRules<'data> {
Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Dunno if we should somehow reorganise or perhaps extract the code that is shared between version scripts and export list.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since there is other work around this area that is right now blocked by this PR, let's leave this discussion for later.

@mati865 mati865 force-pushed the push-qmtvyzyykkxp branch from 22d159e to 31d9982 Compare July 29, 2025 22:49
@mati865 mati865 requested review from davidlattimore and marxin July 30, 2025 00:51
Copy link
Copy Markdown
Member

@davidlattimore davidlattimore left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me! Only one very minor comment, that you could ignore if you wanted.

Comment thread libwild/src/version_script.rs Outdated
@mati865 mati865 force-pushed the push-qmtvyzyykkxp branch from 31d9982 to 430fb26 Compare July 30, 2025 19:15
@mati865 mati865 merged commit 8288947 into wild-linker:main Jul 30, 2025
23 checks passed
@mati865 mati865 deleted the push-qmtvyzyykkxp branch July 30, 2025 19:19
@mati865
Copy link
Copy Markdown
Member Author

mati865 commented Jul 30, 2025

Thanks for the reviews.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Add support for -dynamic-list

3 participants