@heatd heatd commented Sep 6, 2025

Scheduler fixes, mm fixes

This helps debug uninitialized memory issues, and may improve security by wiping old
page contents. There is, however, a performance penalty.
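
For illustration only, a minimal sketch of what zero-on-free can look like; the page
descriptor, field names and page size below are placeholders, not the actual Onyx
structures:

#include <string.h>

#define PAGE_SIZE 4096UL

/* Hypothetical page descriptor for this sketch; the kernel's real
 * struct page looks different. */
struct page {
    void *vaddr; /* direct-mapped virtual address of the frame */
};

/* Scrub a page as it goes back to the allocator: stale contents never
 * leak to the next user, and uninitialized-memory bugs read zeroes
 * instead of leftover data. The memset is the performance penalty. */
static void wipe_page_on_free(struct page *page)
{
    memset(page->vaddr, 0, PAGE_SIZE);
}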

Signed-off-by: Pedro Falcato <pedro.falcato@gmail.com>
This fixes a lockup found when an IRQ woke a thread up while a thread was
being spawned and added to the queue.

Signed-off-by: Pedro Falcato <pedro.falcato@gmail.com>
If rw_lock_tryread() fails in rw_lock_spin_read(), we previously fell
through the loop without actually reloading the counter. This resulted
in us spinning on weird, inconsistent state, such as NULL threads.

The fix is very simple: just retry the whole loop if tryread() fails.
This re-reads the counter, so we should see consistent state.
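
As an illustration, a minimal sketch of the corrected retry shape using C11
atomics; the counter encoding, writer bit and names below are assumptions for
the sketch, not the real Onyx rwlock:

#include <stdatomic.h>
#include <stdbool.h>

#define RW_WRITER_BIT (1UL << 63) /* sketch assumes 64-bit long */

struct rwlock {
    _Atomic unsigned long counter;
};

/* Take a read lock by bumping the reader count, given the counter value
 * we last observed. Fails if the counter changed under us. */
static bool rw_lock_tryread(struct rwlock *lock, unsigned long expected)
{
    return atomic_compare_exchange_strong(&lock->counter, &expected,
                                          expected + 1);
}

static void rw_lock_spin_read(struct rwlock *lock)
{
    for (;;) {
        unsigned long counter = atomic_load(&lock->counter);

        if (counter & RW_WRITER_BIT)
            continue; /* writer present: go around and re-read */

        if (rw_lock_tryread(lock, counter))
            return;

        /* tryread failed, so the counter we hold is stale. Retry the
         * whole loop so it is re-read, instead of falling through and
         * spinning on inconsistent state. */
    }
}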

Signed-off-by: Pedro Falcato <pedro.falcato@gmail.com>
With the old logic, we could observe wrong states coming from sequences
like:

/* prev->active_mm = mm */
if (prev->active_mm) {
	cpumask_unset_atomic(&mm->active_mask, cpu);
	/* cpu no longer in mm->active_mask */
}

if (mm != thread->owner->get_aspace()) {
	/* skipped, never runs, we get the correct mm, but we are no
longer in the active_mask, and will miss any TLB shootdowns */
}

This resulted in easy-to-reproduce crashes caused by TLB inconsistency
with regard to user thread stacks when spawning lots of threads. First
found by fio with posixaio, and later reproduced with a very simple C++
test program:

#include <thread>

int main()
{
    for (;;) {
        std::thread t{[](){}};
        t.detach();
    }
}

Fix the problem by reordering the two operations. We now load the new
mm, if required, and only later drop ourselves (the active_mm).
Unsetting ourselves from the mm's active_mask only happens if we indeed
switched mm's, and it has an ordering requirement: it NEEDS to happen
after arch_load_process, i.e. after the new mm was already loaded. It's
okay that we are in two active_masks at once.
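
Sketched in the same style as the snippet above (variable names and the
arch_load_process arguments are placeholders), the corrected ordering is:

struct mm_address_space *old_mm = prev->active_mm;
struct mm_address_space *new_mm = thread->owner->get_aspace();

if (old_mm != new_mm) {
	/* Load the new mm first; the new address space is live before
	 * we stop listening for the old one. */
	arch_load_process(...);

	/* Only now drop this CPU from the old mm's active_mask. If the
	 * mm did not change we never leave the mask, so we keep seeing
	 * its TLB shootdowns. Briefly being in both masks is fine. */
	cpumask_unset_atomic(&old_mm->active_mask, cpu);
}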

Signed-off-by: Pedro Falcato <pedro.falcato@gmail.com>
Before aee3360, simply holding the vm_object's lock meant that
vm_obj_remove_page could exclude itself from anyone trying to grab a
page. After that commit, vmo_get stopped taking the lock. This
manifested itself as breakage, with reclaim racing against filemap code
trying to grab a page from the cache.

To fix that, we introduce the concept of a "frozen page", which is
simply a page with refcount = 0. The code in vmo_get() already
recognized such a concept, with page_try_get() bailing out if refcount =
0. This concept is used in a change of semantics: vm_obj_remove_page()
itself freezes the page before removing it and returning (note that
freezing is an atomic operation and may fail).
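
A minimal sketch of the two sides of the frozen-page idea, using C11 atomics;
the struct layout and names here are placeholders, not the actual page cache
code:

#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical page descriptor; only the refcount matters for the idea. */
struct page {
    _Atomic unsigned long ref;
};

/* Lookup side (vmo_get-like): refuse to take a reference on a frozen
 * page, i.e. one whose refcount has already dropped to 0 and which is
 * on its way out of the object. */
static bool page_try_get(struct page *page)
{
    unsigned long ref = atomic_load(&page->ref);

    do {
        if (ref == 0)
            return false; /* frozen: treat it as already removed */
    } while (!atomic_compare_exchange_weak(&page->ref, &ref, ref + 1));

    return true;
}

/* Removal side (vm_obj_remove_page-like): atomically freeze the page by
 * swinging the refcount from the expected value down to 0. This can
 * fail if someone else still holds (or just grabbed) a reference. */
static bool page_try_freeze(struct page *page, unsigned long expected)
{
    return atomic_compare_exchange_strong(&page->ref, &expected, 0);
}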

Fixes: aee3360 ("filemap: Remove the locking in vmo_get")
Signed-off-by: Pedro Falcato <pedro.falcato@gmail.com>
Releasing the file is actively wrong (and a UAF) if it destroys the
inode before we get to releasing the vm_obj.
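
In outline (all names here are hypothetical; the point is only the ordering
the message describes), the safe teardown drops the vm_obj before the file
reference:

/* Sketch only; vm_obj_unref, fd_put and 'region' are placeholder names. */
vm_obj_unref(region->vm_obj);  /* may still reach into the inode */
fd_put(region->file);          /* only now may the inode be destroyed */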

Signed-off-by: Pedro Falcato <pedro.falcato@gmail.com>
Signed-off-by: Pedro Falcato <pedro.falcato@gmail.com>
@heatd heatd merged commit c922fa1 into master Sep 6, 2025
1 of 3 checks passed