Exception handling for privileged mode code
Jeffrey Lee (213) 6048 posts |
There are a number of situations in which the kernel/OS flattens some or all of the privileged mode stacks – when recovering from unhandled aborts, when the Error or Exit environment handlers are invoked, when new programs start, etc. However, there’s no simple way for code to determine when these events occur, and for some of the cases code can only discover that the stack has been flattened after it’s already happened. This causes serious problems for privileged mode code which wants to use mutexes/semaphores – if a SWI locks a mutex, and then something crashes before the SWI is able to unlock the mutex, that mutex is likely going to stay locked forever and break a significant portion of the system. As we continue to make the OS multicore-friendly this is going to become a major problem.
To try and solve this problem, I’ve implemented a simple form of stack unwinding that allows special stack frames to be inserted into the stack. Each of these frames will contain a pointer to a handler function and an arbitrary number of arguments. Before a stack gets flattened, the kernel will walk the stack and call the handlers in turn, to allow them to perform any relevant recovery actions.
The OS doesn’t enforce any specific privileged mode stack frame format, and the toolchain/source code doesn’t allow for the generation of stack unwind tables which would allow for unwinding of arbitrary frames. Also, the OS allows arbitrary third-party code to run in privileged modes and put things on the privileged mode stacks, so if we were to mandate the use of a standard stack frame layout or unwind tables then we’d have to break compatibility with all current third-party code. So to allow the exception handler stack frames to be reliably located, I’ve gone with the approach of having them all join with each other in a linked list, with the pointer to the head node stored near the base (low address) of the stack, next to where the C relocation offsets are stored.
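A minimal C sketch of the linked-list scheme described above, simulated in ordinary user code: each frame carries a handler pointer and an argument, new frames are pushed at the head, and the unwinder calls handlers newest-first before the stack would be flattened. The node layout and the names seh_node, seh_push, seh_pop, seh_unwind and swi_with_mutex are all hypothetical; a static variable stands in for the head-pointer word near the stack base, and the real kernel node format is not given here.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical SEH node layout (NOT the real kernel format). */
typedef struct seh_node seh_node;
typedef void (*seh_handler)(seh_node *node);

struct seh_node {
    seh_node   *next;     /* next-older node, further up the stack, or NULL */
    seh_handler handler;  /* recovery action run before the stack is flattened */
    void       *arg;      /* handler-specific argument */
};

/* Stands in for the head-pointer word kept near the (low) stack base. */
static seh_node *seh_head = NULL;

static void seh_push(seh_node *n) { n->next = seh_head; seh_head = n; }

static void seh_pop(seh_node *n) {
    assert(seh_head == n);   /* nodes must be popped in LIFO order */
    seh_head = n->next;
}

/* What the kernel does before flattening: call handlers newest-first. */
static void seh_unwind(void) {
    while (seh_head != NULL) {
        seh_node *n = seh_head;
        seh_head = n->next;  /* unlink before calling, so the list always shrinks */
        n->handler(n);
    }
}

/* Example client: a mutex that must not stay locked after a crash. */
typedef struct { int locked; } mutex;

static void unlock_on_unwind(seh_node *n) { ((mutex *)n->arg)->locked = 0; }

/* Simulates a SWI that locks a mutex. If 'crash' is set, the kernel
   unwinds before the SWI can unlock it; in reality control would not
   come back here afterwards, because the stack would then be flattened. */
static int swi_with_mutex(mutex *m, int crash) {
    seh_node node = { NULL, unlock_on_unwind, m };
    m->locked = 1;
    seh_push(&node);
    if (crash) {
        seh_unwind();        /* kernel path: stack about to be flattened */
        return -1;
    }
    /* ... normal work ... */
    seh_pop(&node);
    m->locked = 0;
    return 0;
}
```

Calling swi_with_mutex(&m, 1) models the crash case: the handler runs during unwinding, so the mutex ends up unlocked instead of staying locked forever.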
There’s currently no way for a handler to stop the stack unwinding halfway (e.g. like a C++ “catch” handler is able to) – but this could potentially be added if we need it. For some reason I call this system by the acronym “SEH”, which probably stands for “Stack-based Exception Handlers”. This kernel merge request contains the main code:
Other places in the OS which flatten stacks (which I’ve found) are:
FPEmulator & VFPSupport have been tackled indirectly by updating them to use the new OS_ClaimProcessorVector, since that allows the stack flattening (which occurs in the undefined instruction handler) to be moved out of the modules and into the kernel. The WIP changes to the SMP module also include some basic SEH support (triggering unwinding when flattening the stacks). FileSwitch & RTSupport haven’t been updated yet, but should be pretty straightforward. What are people’s thoughts on this approach? Are there any issues with the current implementation? I’ve mainly been focusing on getting things working for the SMP module, so I haven’t yet checked if the code causes any problems for normal day-to-day use. But I figured it would be a good idea to ask for feedback now, before I spend too long polishing it only to be told that there’s a much better way of doing things. |
David Pitt (3386) 1248 posts |
Whether SEH is a ‘good idea’ or not is a trifle beyond my competence, but Titanium and Pi ROMs containing the SEH Kernel have been built and are running just fine. Hope that may be of some use. |
Sprow (202) 1161 posts |
This sounds interesting. I can’t spot any flaws in the proposal, but I do have three queries (one of which I suspect the answer to is “no”):
|
Jeffrey Lee (213) 6048 posts |
There is SeriousErrorV, which exists to allow the Debugger to capture the stacks, and to allow RTSupport to detect when the kernel flattens the stacks (because when the kernel flattens the stacks and resets IRQsema it’s essentially forcing a context switch out of the active RTSupport thread and into the foreground thread). Of course, SeriousErrorV is only invoked for serious errors (unhandled aborts), not for any of the other stack-flattening cases. Maybe we’ll need service calls/vectors for those other cases, but I think that in most cases the only reason something will care about the stacks being flattened will be if something important is stored on the stack (either some data or the execution state of a function) – in which case it should use SEH to listen for the flattening, not a service call/vector.
Probably – I just chucked it into OS_RSI 6 because that was easiest. Another thing that’s probably worth changing is the amount of “implementation detail” exposed – anything that uses SEH needs to have hardcoded knowledge of the node size, and at least partial knowledge of the node format. And anything that creates new privileged mode stacks needs to know that a single word at a specific (OS_RSI 6 defined) offset needs to be zeroed. More of that implementation detail could be hidden away inside the kernel, providing greater flexibility to change the implementation in the future. The only thing which we can’t really get rid of is the hardcoded knowledge of the node size, because that’ll make it a lot more awkward to use from C code.
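A rough illustration of the “single word needs to be zeroed” requirement for code that creates new privileged mode stacks. The offset is an assumption: the post says it is defined via OS_RSI 6, and a later post puts the head pointer at stack base + 8, so SEH_HEAD_OFFSET is simply hardcoded to 8 here for illustration. init_priv_stack and read_seh_head are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* ASSUMPTION: the real offset would be read via OS_ReadSysInfo 6;
   8 matches the "stack base + 8" figure given in the thread. */
#define SEH_HEAD_OFFSET 8u

/* Mark a freshly allocated privileged mode stack as having no SEH nodes
   by zeroing the one word that holds the list head. */
static void init_priv_stack(uint8_t *stack_base) {
    uint32_t zero = 0;
    assert(stack_base != NULL);
    memcpy(stack_base + SEH_HEAD_OFFSET, &zero, sizeof zero);
}

/* Read the head word back (memcpy avoids alignment/aliasing issues). */
static uint32_t read_seh_head(const uint8_t *stack_base) {
    uint32_t head;
    memcpy(&head, stack_base + SEH_HEAD_OFFSET, sizeof head);
    return head;
}
```

Hiding this behind a kernel call would remove the need for every stack creator to know SEH_HEAD_OFFSET at all, which is the flexibility argued for above.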
IIRC there’s a document somewhere which says that the bottom N words of the stack are reserved – I’ll have to try and find it again to see exactly what it says. |
Jeffrey Lee (213) 6048 posts |
PRM 4, talking about the usermode chunked stack: The seven words above the stack chunk structure are reserved to Acorn. The stack-limit register points 512 bytes above this (ie 560 bytes above the base of the stack chunk). I.e.:
The LibInit SWI calls document that “stack base + 20” (i.e. SL-540) is the library static data offset and “stack base + 24” (SL-536) is the client static data offset.
For modules, things are a bit different because there’s no stack chunking or stack chunk structures. However, the SharedCLibrary_LibInitModule documentation says: Note: You must save the words at offsets +20 and +24 from the returned stack base. Module veneers set the SL register to stack base + 540, which means that the relocation offsets are always at the same place (SL-540 & SL-536) regardless of CPU mode.
I’m not seeing anything obvious in the PRMs which mentions any restrictions on addresses from SL to SL-532 (for privileged mode stacks). However, both TaskWindow and RTSupport assume that the seven words at the base of the SVC stack contain thread-local values which need preserving when they copy the stacks around during context switches. I.e. the notes in the PRM about SL-540 to SL-512 being reserved apply to both usermode chunked stacks and privileged mode fixed stacks.
SEH currently stores the node head pointer at stack base + 8, i.e. right after the relocation offsets. |
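The layout arithmetic can be checked with a few constants. This sketch assumes a 20-byte stack chunk structure (inferred from the quoted figures: base + 20 is the first reserved word, and SL is 560 bytes above the chunk base) and the module-veneer rule SL = stack base + 540; the macro and function names are made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

#define RESERVED_WORDS     7u     /* "reserved to Acorn" */
#define CHUNK_HEADER_BYTES 20u    /* ASSUMPTION: inferred from base + 20 */
#define SL_HEADROOM        512u   /* SL is 512 bytes above the reserved words */

/* Reserved slots, expressed as byte offsets below SL. */
#define SL_LIB_RELOC    540u      /* library static data offset */
#define SL_CLIENT_RELOC 536u      /* client static data offset */
#define SL_SEH_HEAD     532u      /* SEH head pointer (stack base + 8) */

_Static_assert(CHUNK_HEADER_BYTES + RESERVED_WORDS * 4u + SL_HEADROOM == 560u,
               "PRM: SL is 560 bytes above the chunk base");

/* Usermode chunked stack: SL relative to the chunk base. */
static uint32_t chunk_sl(uint32_t chunk_base) {
    return chunk_base + CHUNK_HEADER_BYTES + RESERVED_WORDS * 4u + SL_HEADROOM;
}

/* Privileged mode fixed stack: module veneers set SL = stack base + 540,
   i.e. the same 7 reserved words + 512 bytes, just without a chunk header. */
static uint32_t module_sl(uint32_t stack_base) {
    return stack_base + RESERVED_WORDS * 4u + SL_HEADROOM;
}
```

Because both layouts put the seven reserved words directly below SL-512, the relocation offsets (and the SEH head word) sit at the same SL-relative addresses in either case.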
Simon Willcocks (1499) 540 posts |
FWIW, a lock-unwinding system sounds good to me, but of course the memory they were protecting could still be corrupted. |
Sprow (202) 1161 posts |
You’ve done better than me at finding details, so the takeaways here are: there are 7 words, 2 are definitely used, 5 currently unused.
Could the node size include a version flag perhaps, or could the node size (discovered via an API) be used to indirectly encode the node format? That works just as long as any future changes to the node format result in it changing size too. Exporting function pointers via OS_ReadSysInfo 6 looks a bit odd; is it worth a SWI instead? I found OS_TaskControl 0 tucked away at the bottom of the Select docs, so maybe use some higher reason codes there? That at least avoids allocating a new SWI number in the precious < 0x100 block. |
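One way to read the “size doubles as version” idea: every node starts with a common header holding a next pointer and the node size, and the unwinder dispatches on that size, skipping sizes it doesn’t recognise. The two layouts, seh_hdr, seh_walk and seh_demo are hypothetical, and the scheme only holds if each format change really does change the node size, which the _Static_assert checks.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef void (*seh_handler)(void *arg);

/* Common header at the start of every node (hypothetical layout). */
typedef struct seh_hdr {
    struct seh_hdr *next;   /* next-older node, or NULL */
    uint32_t        size;   /* size of the whole node; doubles as its version */
} seh_hdr;

/* Format 1: a handler and one argument. */
typedef struct { seh_hdr hdr; seh_handler handler; void *arg; } seh_node_v1;

/* Format 2: an imagined successor with one extra word. */
typedef struct { seh_hdr hdr; seh_handler handler; void *arg; uint32_t extra; } seh_node_v2;

/* The scheme breaks if two formats ever end up the same size. */
_Static_assert(sizeof(seh_node_v1) != sizeof(seh_node_v2),
               "node formats must differ in size");

/* Walk the list, running handlers for known formats. Returns the number
   of handlers run; *unknown counts nodes whose size wasn't recognised. */
static int seh_walk(seh_hdr *head, int *unknown) {
    int ran = 0;
    assert(unknown != NULL);
    *unknown = 0;
    for (seh_hdr *h = head; h != NULL; h = h->next) {
        if (h->size == sizeof(seh_node_v1)) {
            seh_node_v1 *n = (seh_node_v1 *)h;
            n->handler(n->arg);
            ran++;
        } else if (h->size == sizeof(seh_node_v2)) {
            seh_node_v2 *n = (seh_node_v2 *)h;
            n->handler(n->arg);
            ran++;
        } else {
            (*unknown)++;   /* can't safely interpret it, but 'next' is still usable */
        }
    }
    return ran;
}

/* Demo handler: increments the int its argument points at. */
static void count_call(void *arg) { ++*(int *)arg; }

/* Builds a three-node list (v1 -> v2 -> unrecognised) and walks it. */
static int seh_demo(int *calls, int *unknown) {
    seh_hdr odd = { NULL, 3u };
    seh_node_v2 b = { { &odd, (uint32_t)sizeof b }, count_call, calls, 0u };
    seh_node_v1 a = { { &b.hdr, (uint32_t)sizeof a }, count_call, calls };
    return seh_walk(&a.hdr, unknown);
}
```

Keeping the next pointer in the shared header is what lets an older unwinder step over a node it doesn’t understand rather than stopping at it.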
Jon Abbott (1421) 2654 posts |
This sounds like a sensible proposal.
Is this via the kernel SeriousErrorV handler? This may not be relevant to your proposal, but I flatten the stacks when terminating tasks or handling fatal Aborts triggered by tasks being monitored by ADFFS. This is done either when SeriousErrorV Collect is triggered or via the IRQ hardware vector when CTRL-SHIFT-F12 is detected. I’d previously coded this in a cooperative way using the Collect/Recovery process, but I think at the time SeriousErrorV wasn’t fully implemented so I opted for my code to take over the recovery at the Collect stage and not pass it back to the kernel. With this proposal, I probably need to recode this to use the correct SeriousErrorV process, with a stack frame inserted. Once the way the routines are exposed is agreed, could you post a link to the documentation please?
PRM4-249 – Stack chunk format. PRM4-243 also mentions two types of stack frames, one used by Pascal/Modula-2 and the other by C. Should the former be deprecated, if it hasn’t been already? Whilst talking about C, does something need to be done about C’s nasty hack for finding the bottom of the stack – BIC Rx, R13, … I think? Replace it with an SWI perhaps? |
Rick Murray (539) 13908 posts |
A SWI to flatten important stacks… Hmm…
That’s not the MOVs with LSR#20 then LSL#20? |
André Timmermans (100) 656 posts |
No, a SWI to get the address of the code to flatten the stack. |
Jon Abbott (1421) 2654 posts |
Undocumented SWI – need I say any more?
Yes, that sounds like it |
Steve Pampling (1551) 8198 posts |
I’m not clear whether it exists anywhere other than the Select fork, but the documents over there do cover it.
General application space execution changes
Common usages of this form of operation are:
OS_TaskControl 0 (Read address of stacks reset code) |
Jeffrey Lee (213) 6048 posts |
changing is the amount of “implementation detail” exposed […] The only thing which we can’t really get rid of is the hardcoded knowledge of the node size, because that’ll make it a lot more awkward to use from C code. Instead of having the kernel tell programs “this is the node size you must use”, it might be better to go in the opposite direction, where programs tell the kernel “this is the node size/ABI version I can support”. I.e. if we’re hiding the details of the node format from applications, then any program which wants to use SEH will have to call a SWI to get pointers to the functions that are used for pushing/popping nodes or unwinding the stack. If that SWI accepts a node size/ABI version argument, then the kernel will be able to select an appropriate implementation of the functions (or complain if the version isn’t supported). We just need to make sure that each node has a header indicating the size/version so that the unwinding code knows how to deal with each node it sees.
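A sketch of the inverted arrangement: the caller names the ABI version it was built for, and the kernel side returns a table of push/pop functions plus the node size, or nothing if that version is unsupported. Each node also carries a small version header, as suggested. seh_get_ops, seh_ops, knode1 and guarded_call are hypothetical and correspond to no existing SWI; malloc stands in for the stack space a real caller would reserve.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

typedef void (*seh_handler)(void *arg);

/* What a hypothetical "get SEH functions" SWI might return. */
typedef struct {
    size_t node_size;                              /* bytes per node */
    void (*push)(void *node, seh_handler h, void *arg);
    void (*pop)(void *node);
} seh_ops;

/* Kernel-private v1 layout. Callers never see this; the leading abi
   field is the per-node header that tells the unwinder how to read it. */
typedef struct knode1 {
    uint32_t       abi;
    struct knode1 *next;
    seh_handler    handler;
    void          *arg;
} knode1;

static knode1 *seh_head = NULL;   /* stands in for the per-stack head word */

static void push_v1(void *node, seh_handler h, void *arg) {
    knode1 *n = node;
    n->abi = 1u;
    n->next = seh_head;
    n->handler = h;
    n->arg = arg;
    seh_head = n;
}

static void pop_v1(void *node) {
    assert(node == seh_head);      /* LIFO discipline */
    seh_head = ((knode1 *)node)->next;
}

static const seh_ops ops_v1 = { sizeof(knode1), push_v1, pop_v1 };

/* Caller passes the ABI version it understands; NULL = not supported. */
static const seh_ops *seh_get_ops(uint32_t abi_version) {
    switch (abi_version) {
    case 1u: return &ops_v1;
    default: return NULL;
    }
}

/* Example caller: negotiate, then bracket some work with push/pop. */
static int guarded_call(uint32_t abi_version, seh_handler h, void *arg) {
    const seh_ops *ops = seh_get_ops(abi_version);
    if (ops == NULL)
        return -1;                          /* version not supported */
    void *node = malloc(ops->node_size);    /* real code: stack space */
    if (node == NULL)
        return -2;
    ops->push(node, h, arg);
    /* ... work that might be interrupted by a stack flatten ... */
    ops->pop(node);
    free(node);
    return 0;
}
```

The kernel can then add a v2 layout later and keep serving v1 tables to older binaries, while the per-node abi field lets a single unwinder cope with a mix of both on the same stack.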
OS_TaskControl looks like a good fit, yes – I’d never spotted that SWI before. Before a stack gets flattened, the kernel will walk the stack and call the handlers in turn, to allow them to perform any relevant recovery actions No, calling SeriousErrorV doesn’t trigger unwinding. For serious errors raised by the kernel, the SEH unwinding occurs just after the SeriousErrorV_Collect call, just before the SVC stack gets flattened.
Sure
I think the stack frames (& stack chunk header format) are the same, the only difference is which stack extension routines the languages use?
Only if you want to break compatibility with every current C module binary. The APCS-R spec in the PRM mandates that the privileged mode stacks are aligned to megabyte boundaries: In SVC and IRQ modes (collectively called module mode) SL_LWM is implicit in sp: it |
David Pitt (3386) 1248 posts |
It is only the SWI number defined, no actual code. |
Rick Murray (539) 13908 posts |
Like NVMemory (wasn’t that originally NCOS?), it’s quite possible that it is defined so that it is known not to be “free for use”. |
Jon Abbott (1421) 2654 posts |
This would certainly be useful for me if implemented.
Are you saying that if the C compiler was changed today to compile code that uses a legal means to obtain the stack base, it will break existing C Modules? I can understand that the stack base can’t be moved from the megabyte boundary, to keep existing code working, but I don’t understand why we’re persisting with non-legal means to obtain the stack base for newly compiled code. |
Stuart Swales (8827) 1367 posts |
It’s a guarantee. |
Jeffrey Lee (213) 6048 posts |
Are you saying that if the C compiler was changed today to compile code that uses a legal means to obtain the stack base, it will break existing C Modules? Ah, I thought you were suggesting that the stack base should be moved from the megabyte boundary. In which case, my answer is: Rounding down R13 to a megabyte boundary is the legal means for obtaining the stack base. It’s ugly and horrible and inflexible but it’s legal. But, having said that, one of the changes that the SMP kernel introduces is a data structure which contains the reset addresses of each of the privileged mode stacks. This allows the system to properly cope with different threads having different SVC stacks, and different cores having different ABT/UND/IRQ stacks (each core has its own instance of the structure). It’d be easy enough to also add the stack base addresses to the structure, although as I’ve said it’d be a bit pointless since if we move the stacks away from the megabyte boundary then it’ll break all existing C modules (and some assembler ones which also try to check free stack space). |
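For reference, the “round R13 down to a megabyte boundary” rule can be written either as a mask (what the BIC-based sequence computes) or as the LSR #20 / LSL #20 shift pair mentioned earlier; both give the same result. The helper names and example addresses are hypothetical, and the free-space calculation assumes the module-veneer convention SL = stack base + 540 discussed earlier in the thread.

```c
#include <assert.h>
#include <stdint.h>

#define MB_SHIFT 20u
#define MB_MASK  ((UINT32_C(1) << MB_SHIFT) - 1u)   /* 0x000FFFFF */

/* Mask form: clear the low 20 bits of SP. */
static uint32_t stack_base_bic(uint32_t sp) { return sp & ~MB_MASK; }

/* Shift form: LSR #20 followed by LSL #20. */
static uint32_t stack_base_shift(uint32_t sp) { return (sp >> MB_SHIFT) << MB_SHIFT; }

/* Free space before hitting SL, assuming SL = stack base + 540. */
static uint32_t stack_free_bytes(uint32_t sp) {
    uint32_t sl = stack_base_bic(sp) + 540u;
    assert(sp >= sl);   /* below SL means the stack has already overflowed */
    return sp - sl;
}
```

This also shows why moving the stacks is a compatibility problem: if a stack’s real base were not megabyte aligned, stack_base_bic would silently return the wrong address, and every existing binary that relies on it would compute the wrong SL.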
André Timmermans (100) 656 posts |
Hm, having implementation independent stack handling code would allow the use of more efficient implementations on a 64-bit system. |
Jon Abbott (1421) 2654 posts |
Legal to me means explicitly asking the OS for the base – not trying to figure it out from R13.
It just seems like a wasted opportunity to me: SMP-safe Modules/code need to be explicitly compiled and must advertise their safe status to the OS, so they could ask for the stack base from the outset. If the OS is to implement thread isolation/security in the future, randomising the location of thread stacks should be on the agenda. |