@heatd heatd commented Sep 6, 2025

Scheduler fixes, mm fixes

This helps debug uninitialized memory issues, and may improve security by wiping old
page contents. There is, however, a performance penalty.
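
For illustration only, a minimal sketch of what zero-on-free can look like; the page
descriptor, field names and page size below are placeholders, not the actual Onyx
structures:

#include <string.h>

#define PAGE_SIZE 4096UL

/* Hypothetical page descriptor for this sketch; the kernel's real
 * struct page looks different. */
struct page {
    void *vaddr; /* direct-mapped virtual address of the frame */
};

/* Scrub a page as it goes back to the allocator: stale contents never
 * leak to the next user, and uninitialized-memory bugs read zeroes
 * instead of leftover data. The memset is the performance penalty. */
static void wipe_page_on_free(struct page *page)
{
    memset(page->vaddr, 0, PAGE_SIZE);
}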

Signed-off-by: Pedro Falcato <pedro.falcato@gmail.com>
This fixes a lockup found when an IRQ woke a thread up while a thread was
being spawned and added to the queue.

Signed-off-by: Pedro Falcato <pedro.falcato@gmail.com>
If rw_lock_tryread() fails in rw_lock_spin_read(), we previously fell
through the loop without actually reloading the counter. This resulted
in us spinning on weird, inconsistent state, such as NULL threads.

The fix is very simple: just retry the whole loop if tryread() fails.
This re-reads the counter, so we should see consistent state.
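
As an illustration, a minimal sketch of the corrected retry shape using C11
atomics; the counter encoding, writer bit and names below are assumptions for
the sketch, not the real Onyx rwlock:

#include <stdatomic.h>
#include <stdbool.h>

#define RW_WRITER_BIT (1UL << 63) /* sketch assumes 64-bit long */

struct rwlock {
    _Atomic unsigned long counter;
};

/* Take a read lock by bumping the reader count, given the counter value
 * we last observed. Fails if the counter changed under us. */
static bool rw_lock_tryread(struct rwlock *lock, unsigned long expected)
{
    return atomic_compare_exchange_strong(&lock->counter, &expected,
                                          expected + 1);
}

static void rw_lock_spin_read(struct rwlock *lock)
{
    for (;;) {
        unsigned long counter = atomic_load(&lock->counter);

        if (counter & RW_WRITER_BIT)
            continue; /* writer present: go around and re-read */

        if (rw_lock_tryread(lock, counter))
            return;

        /* tryread failed, so the counter we hold is stale. Retry the
         * whole loop so it is re-read, instead of falling through and
         * spinning on inconsistent state. */
    }
}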

Signed-off-by: Pedro Falcato <pedro.falcato@gmail.com>
With the old logic, we could observe wrong states coming from sequences
like:

/* prev->active_mm = mm */
if (prev->active_mm) {
	cpumask_unset_atomic(&mm->active_mask, cpu);
	/* cpu no longer in mm->active_mask */
}

if (mm != thread->owner->get_aspace()) {
	/* skipped, never runs, we get the correct mm, but we are no
longer in the active_mask, and will miss any TLB shootdowns */
}

This resulted in easy-to-reproduce crashes caused by TLB inconsistency
with regard to user thread stacks when spawning lots of threads. First
found by fio with posixaio, and later reproduced with a very simple C++
test program:

#include <thread>

int main()
{
    for (;;) {
        std::thread t{[](){}};
        t.detach();
    }
}

Fix the problem by reordering the two operations. We now load the new
mm, if required, and only later drop ourselves (the active_mm).
Unsetting ourselves from the mm's active_mask only happens if we indeed
switched mm's, and it has an ordering requirement: it NEEDS to happen
after arch_load_process, i.e. after the new mm was already loaded. It's
okay that we are in two active_masks at once.
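
Sketched in the same style as the snippet above (variable names and the
arch_load_process arguments are placeholders), the corrected ordering is:

struct mm_address_space *old_mm = prev->active_mm;
struct mm_address_space *new_mm = thread->owner->get_aspace();

if (old_mm != new_mm) {
	/* Load the new mm first; the new address space is live before
	 * we stop listening for the old one. */
	arch_load_process(...);

	/* Only now drop this CPU from the old mm's active_mask. If the
	 * mm did not change we never leave the mask, so we keep seeing
	 * its TLB shootdowns. Briefly being in both masks is fine. */
	cpumask_unset_atomic(&old_mm->active_mask, cpu);
}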

Signed-off-by: Pedro Falcato <pedro.falcato@gmail.com>
Before aee3360, simply holding the vm_object's lock meant that
vm_obj_remove_page could exclude itself from anyone trying to grab a
page. After that commit, vmo_get stopped taking the lock. This
manifested itself as breakage, with reclaim racing against filemap code
trying to grab a page from the cache.

To fix that, we introduce the concept of a "frozen page", which is
simply a page with refcount = 0. The code in vmo_get() already
recognized such a concept, with page_try_get() bailing out if refcount =
0. This concept is used in a change of semantics: vm_obj_remove_page()
itself freezes the page before removing it and returning (note that
freezing is an atomic operation and may fail).
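
A minimal sketch of the two sides of the frozen-page idea, using C11 atomics;
the struct layout and names here are placeholders, not the actual page cache
code:

#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical page descriptor; only the refcount matters for the idea. */
struct page {
    _Atomic unsigned long ref;
};

/* Lookup side (vmo_get-like): refuse to take a reference on a frozen
 * page, i.e. one whose refcount has already dropped to 0 and which is
 * on its way out of the object. */
static bool page_try_get(struct page *page)
{
    unsigned long ref = atomic_load(&page->ref);

    do {
        if (ref == 0)
            return false; /* frozen: treat it as already removed */
    } while (!atomic_compare_exchange_weak(&page->ref, &ref, ref + 1));

    return true;
}

/* Removal side (vm_obj_remove_page-like): atomically freeze the page by
 * swinging the refcount from the expected value down to 0. This can
 * fail if someone else still holds (or just grabbed) a reference. */
static bool page_try_freeze(struct page *page, unsigned long expected)
{
    return atomic_compare_exchange_strong(&page->ref, &expected, 0);
}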

Fixes: aee3360 ("filemap: Remove the locking in vmo_get")
Signed-off-by: Pedro Falcato <pedro.falcato@gmail.com>
Releasing the file is actively wrong (and a UAF) if it destroys the
inode before we get to releasing the vm_obj.
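
In outline (all names here are hypothetical; the point is only the ordering
the message describes), the safe teardown drops the vm_obj before the file
reference:

/* Sketch only; vm_obj_unref, fd_put and 'region' are placeholder names. */
vm_obj_unref(region->vm_obj);  /* may still reach into the inode */
fd_put(region->file);          /* only now may the inode be destroyed */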

Signed-off-by: Pedro Falcato <pedro.falcato@gmail.com>
Signed-off-by: Pedro Falcato <pedro.falcato@gmail.com>
@heatd heatd merged commit c922fa1 into master Sep 6, 2025
1 of 3 checks passed