kernel: schedule upcall, bool -> Result#2656
Conversation
aad91a9 to
89af0c0
Compare
I'm reasonably confident that we, at least somewhere in Tock 2.0 development, decided to not tell capsules whether an Currently, |
|
For reference, reasons schedule() can fail:
In practice, (1) cannot happen, because the grant.enter() would fail first, and the process has to remain valid while the grant is entered, and (4) won't happen, because that would be a kernel bug that must be fixed and everything would break immediately. For (3), subscribe_num is almost certainly going to be a hardcoded value (all instances today are hardcoded), and I don't see any value in a runtime check. For (2) I agree capsules shouldn't do anything in response to a null upcall. Then (5) is the only case I could see returning an error, but I don't see what a capsule could do with that information. |
Here again I think IPC might be the interesting kid on the block. In particular, if process A does something, and wants to notify process B via IPC, it does so, enqueues a task and all is happy. If A keeps running, and during this same quanta (or just generally before B gets to run, due to process priorities, etc) and does another something which requires notifying B via IPC, then [eventually] B will need to run and drain IPC events. If A finds it is generating many events that B never responds to, it may wish to |
|
@bradjc I generally agree with your assessment, although I'm not sure about (3). You're right in that the subscribe number is hardcoded in current capsules, but given we're providing interfaces to downstream developers and this parameter could well be determined at runtime -- regardless of whether there is a good reason to do so -- I'm of the opinion we should return an error in this case. If we are returning an error in one case, we might as well inform the capsule about all cases except (4), as that would indicate a serious kernel bug and might even justify a panic, and (2), which is to say that a null-callback is defined as a valid callback which is simply never delivered or from the capsules point of view ignored by the process. |
|
I can certainly see the argument "wait I passed in an invalid subscribe_num and didn't get an error?!?". I propose then returning enum UpcallError {
InvalidSubscribeNum,
QueueFull,
// Process is inactive (can never happen) or process.tasks was empty (won't ever happen).
// If either _did_ happen, well that would be a kernel error.
// We do this in ProcessError to solve this same problem, and make it clear
// that users of the error type shouldn't try to handle this case.
KernelError,
} |
|
@bradjc I like that approach. I've taken at stab at implementing it here: ce9c084. If @ppannuto doesn't mind I can push it to this branch. I wasn't sure whether |
The |
I say push, we can always remove the commit.
I would say upcall.rs, and that upcall things being in grant.rs is the exception we would otherwise like to avoid. |
|
I'm marking this 2.0 because I think we either do it now while we are thinking about it and change capsules, or we leave it |
89af0c0 to
ce9c084
Compare
Pushed my changes. This still does not address the unused
Makes sense. Will move it there. |
| /// A kernel-internal invariant has been violated. | ||
| /// | ||
| /// This error should never happen. It can be returned if the | ||
| /// process is inactive (which should be caught by | ||
| /// [`Grant::enter`]) or `process.tasks` was taken. | ||
| /// | ||
| /// These cases cannot be reasonably handled. | ||
| KernelError, |
There was a problem hiding this comment.
Is this error worth including? I am imaging me as a handler of this code -- what do I do? The options are
- Just ignore the error and hope (same as if an error had not been returned)
panic!, the world is broken
I think that making capsule authors choose between these will likely just result in a lot of arbitrary coin flips between these two options.
There was a problem hiding this comment.
I suppose the issue is that panic!ing in the kernel is not an option (even if these error cases were to occur, the system might not be in an entirely unrecoverable state), whereas returning Ok(()) as if the callback was scheduled is obviously misleading.
This is the cheap way out, blaming the capsule code for either panic!ing or misbehaving in response to this error. :)
There was a problem hiding this comment.
It's convenient to include the error, because rust forces us to handle cases that can "never" happen. I think it is better to have the error than do something like hardcode panics into the core kernel binary which will never be triggered, but still take up space.
From a capsule author's point of view, I think it's pretty clear that it's not an error they should handle, but could report upstream to us if they ever saw it.
However, we do not want capsule authors to include a debug or panic, because that just re-creates the size issue (and probably worse). But that is also true of handling this return value at all, unless a capsule can do something meaningful it should just ignore the return value.
I agree. I am worried that making this a |
Well, the boilerplate is only adding
No capsules use the return value today. However, if we make change the return to Result now, then we can update the capsule interface now as part of 2.0, and if we expand the error types later we don't have to go back and change any capsules that do not need to handle schedule errors. |
I think I am Is there any increase in code size? I've become a bit wary of flexibility that we might need in the future that increases code size now. If the increase is minimal then I'm on board. |
02ab15b to
9359719
Compare
b06daed to
6f7fd27
Compare
I think it saves space. Looking through https://github.com/tock/tock/pull/2656/checks?check_run_id=3030271308 a lot of the numbers look smaller. |
21d1568 to
78af01a
Compare
|
I think this is waiting on @ppannuto to review his own PR with the latest changes. |
ppannuto
left a comment
There was a problem hiding this comment.
You go away for one long weekend and your PR flies on its own :P
Looks good to me overall; one small interface question
Previously this function only returned a boolean, indicating whether enqueueing the task worked. Users of this function might be interested in the precise reason why enqueueing the task failed. In particular, whether the process is no longer alive, the process' task queue is full, or some kernel-internal error occurred. Signed-off-by: Leon Schuermann <leon@is.currently.online>
Return a Result on GrantUpcallTable::schedule_upcall, indicating whether an error occurred and the precise reason for the error. Users of this API may utilize this information to, for example, adjust the rate at which Upcalls are scheduled or reschedule an Upcall which has been lost due to the process' task queue being full. This commit just changes the kernel interfaces, it does not address all of the new unused Result warnings at all of the capsule scheduling call sites. Co-authored-by: Leon Schuermann <leon@is.currently.online>
No cases where we need to handle a failed schedule.
78af01a to
0f09cd2
Compare
|
bors r+ |
2656: kernel: schedule upcall, bool -> Result r=bradjc a=ppannuto ### Pull Request Overview Following up on tock#2639 (comment), this looks at changing the signature of scheduling upcalls to return a `Result` rather than a simple `bool`. To start, this just looks at the kernel interfaces (i.e. this introduces 110 "unused Result" warnings in capsule code right now). One interesting thing surfaced is that IPC does use the result of upcall scheduling to report to the IPC sender whether the receiver was scheduled successfully or not. Previously, this treated scheduling the null upcall as a failure. I'm not 100% sure that's right though -- a receiver has every right (although it would be odd) to ignore notifications. I used the new error code granularity to separate out this case right now; though maybe the best solution is to simply bubble up the error all the way to userspace for IPC. --- This commit just changes the kernel interfaces, it does not address all of the new unused Result warnings at all of the capsule scheduling call sites. ### Testing Strategy N/A yet [compiling] ### TODO or Help Wanted Do these error code assignments make sense? Should IPC treat scheduling a null upcall as an error or success? Should I move forward with the rest of the capsules? ### Documentation Updated - [x] Updated the relevant files in `/docs`, or no updates are required. ### Formatting - [] Ran `make prepush`. Co-authored-by: Leon Schuermann <leon@is.currently.online> Co-authored-by: Pat Pannuto <pat.pannuto@gmail.com> Co-authored-by: Brad Campbell <bradjc5@gmail.com>
Pull Request Overview
Following up on #2639 (comment), this looks at changing the signature of scheduling upcalls to return a
Resultrather than a simplebool.To start, this just looks at the kernel interfaces (i.e. this introduces 110 "unused Result" warnings in capsule code right now).
One interesting thing surfaced is that IPC does use the result of upcall scheduling to report to the IPC sender whether the receiver was scheduled successfully or not. Previously, this treated scheduling the null upcall as a failure. I'm not 100% sure that's right though -- a receiver has every right (although it would be odd) to ignore notifications. I used the new error code granularity to separate out this case right now; though maybe the best solution is to simply bubble up the error all the way to userspace for IPC.
This commit just changes the kernel interfaces, it does not address
all of the new unused Result warnings at all of the capsule scheduling
call sites.
Testing Strategy
N/A yet [compiling]
TODO or Help Wanted
Do these error code assignments make sense?
Should IPC treat scheduling a null upcall as an error or success?
Should I move forward with the rest of the capsules?
Documentation Updated
/docs, or no updates are required.Formatting
make prepush.