Exception handling for privileged mode code

20 posts, 9 voices

Mar 1, 2022 11:13pm Jeffrey Lee (213) 6048 posts	There are a number of situations in which the kernel/OS flattens some or all of the privileged mode stacks – when recovering from unhandled aborts, when the Error or Exit environment handlers are invoked, when new programs start, etc. However there’s no simple way for code to determine when these events occur, and for some of the cases code can only discover that the stack has been flattened after it’s already happened. This causes serious problems for privileged mode code which wants to use mutexes/semaphores – if a SWI locks a mutex, and then something crashes before the SWI is able to unlock the mutex, that mutex is likely going to stay locked forever and break a significant portion of the system. As we continue to make the OS multicore-friendly this is going to become a major problem. To try and solve this problem, I’ve implemented a simple form of stack unwinding that allows special stack frames to be inserted into the stack. Each of these frames will contain a pointer to a handler function and an arbitrary number of arguments. Before a stack gets flattened, the kernel will walk the stack and call the handlers in turn, to allow them to perform any relevant recovery actions. The OS doesn’t enforce any specific privileged mode stack frame format, and the toolchain/source code doesn’t allow for the generation of stack unwind tables which would allow for unwinding of arbitrary frames. Also, the OS allows arbitrary third-party code to run in privileged modes and put things on the privileged mode stacks, so if we were to mandate the use of a standard stack frame layout or unwind tables then we’d have to break compatibility with all current third-party code. 
So to allow the exception handler stack frames to be reliably located, I’ve gone with the approach of having them all join with each other in a linked list, with the pointer to the head node stored near the base (low address) of the stack, next to where the C relocation offsets are stored. There’s currently no way for a handler to stop the stack unwinding halfway (e.g. like a C++ “catch” handler is able to) – but this could potentially be added if we need it. For some reason I call this system by the acronym “SEH”, which probably stands for “Stack-based Exception Handlers”.

This kernel merge request contains the main code:

- Definitions for the stack frame format (in hdr.KernelWS & h.SEH)
- Unwinding code (c.SEH)
- Assembler macros & C functions for pushing/popping frames
- Kernel stack flattening locations will now perform SEH unwinding
- The key routines & values are exposed via OS_ReadSysInfo 6, to allow other components to use SEH

Other places in the OS which flatten stacks (which I’ve found) are:

- FileSwitch Run_UndatedFile
- RTSupport return_to_foreground
- FPEmulator & VFPSupport

FPEmulator & VFPSupport have been tackled indirectly by updating them to use the new OS_ClaimProcessorVector, since that allows the stack flattening (which occurs in the undefined instruction handler) to be moved out of the modules and into the kernel. The WIP changes to the SMP module also include some basic SEH support (triggering unwinding when flattening the stacks). FileSwitch & RTSupport haven’t been updated yet, but should be pretty straightforward.

What are people’s thoughts on this approach? Are there any issues with the current implementation? I’ve mainly been focusing on getting things working for the SMP module, so I haven’t yet checked if the code causes any problems for normal day-to-day use. But I figured it would be a good idea to ask for feedback now, before I spend too long polishing it only to be told that there’s a much better way of doing things.
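For readers who want a concrete picture of the linked-list scheme described in the post above, here is a minimal sketch in portable C. It is not the code from the merge request: the node layout, the `seh_stack` wrapper standing in for the head pointer near the stack base, and every function name are assumptions made purely for illustration. It shows a frame being pushed around a locked (toy) mutex, and the unwinder releasing that mutex when the "SWI" never reaches its own unlock.

```c
#include <stddef.h>

/* Hypothetical node layout for illustration; the real format is defined in
   hdr.KernelWS and h.SEH and is not shown in this thread. */
typedef void (*seh_handler)(void *arg);

typedef struct seh_node {
    struct seh_node *next;  /* next-older frame in the list */
    seh_handler handler;    /* called before the stack is flattened */
    void *arg;              /* one argument here; real frames may carry more */
} seh_node;

/* Stand-in for the head pointer stored near the base of the stack. */
typedef struct {
    seh_node *head;
} seh_stack;

/* Link a caller-allocated frame in as the newest handler. */
void seh_push(seh_stack *s, seh_node *n, seh_handler fn, void *arg)
{
    n->handler = fn;
    n->arg = arg;
    n->next = s->head;
    s->head = n;
}

/* Normal exit path: drop the newest frame without calling it. */
void seh_pop(seh_stack *s)
{
    if (s->head != NULL)
        s->head = s->head->next;
}

/* Run before flattening: call every handler, newest first.
   Returns the number of handlers called. */
int seh_unwind(seh_stack *s)
{
    int called = 0;
    while (s->head != NULL) {
        seh_node *n = s->head;
        s->head = n->next;   /* unlink before calling the handler */
        n->handler(n->arg);
        called++;
    }
    return called;
}

/* Toy mutex and recovery handler used by the demo below. */
typedef struct { int locked; } toy_mutex;

void toy_unlock(void *arg)
{
    ((toy_mutex *)arg)->locked = 0;
}

/* Simulates a SWI that locks a mutex and "crashes" before unlocking it.
   Returns the mutex state after unwinding. */
int seh_demo(void)
{
    seh_stack stack = { NULL };
    toy_mutex m = { 0 };
    seh_node frame;

    /* Normal path: push, do work, pop, unlock by hand. */
    m.locked = 1;
    seh_push(&stack, &frame, toy_unlock, &m);
    seh_pop(&stack);
    m.locked = 0;

    /* Crash path: the frame is still linked when the unwind happens. */
    m.locked = 1;
    seh_push(&stack, &frame, toy_unlock, &m);
    seh_unwind(&stack);

    return m.locked;
}
```

Because the frames live on the stack that is about to be discarded, the sketch unlinks each node before calling its handler, so a handler never sees itself still on the list.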

Mar 2, 2022 11:26am David Pitt (3386) 1248 posts	Whether SEH is a ‘good idea’ or not is a trifle beyond my competence, but Titanium and Pi ROMs containing the SEH Kernel have been built and are running just fine. Hope that may be of some use.

Mar 2, 2022 1:34pm Sprow (202) 1168 posts	This sounds interesting, I can’t spot any flaws in the proposal, but do have 3 queries (1 of which I suspect the answer is “no”):

1. Is it worth having a service call or vector to pre-announce that an SEH list is about to be walked? And/or perhaps when the list is done and the stack is flattened. I realise there’s a paradox: if you’re flattening the stack because of a problem, doing a service call has the potential to make things worse! The post flattened one would be pretty safe though.
2. Exporting function pointers via OS_ReadSysInfo 6 looks a bit odd, is it worth a SWI instead? I realise there’s precedent with the HAL TX/RX functions, but for the most part RSI 6 is just a table of internal kernel workspaces, rather than entry points into the kernel.
3. The module relocation offsets are part of a dark and mysterious bottom-of-stack place I’ve never really understood. Is it documented anywhere? How do we know it’s safe to allocate offset X rather than Y?

Mar 2, 2022 2:38pm Jeffrey Lee (213) 6048 posts
> Is it worth having a service call or vector to pre-announce that an SEH list is about to be walked? And/or perhaps when the list is done and the stack is flattened.

There is SeriousErrorV, which exists to allow the Debugger to capture the stacks, and to allow RTSupport to detect when the kernel flattens the stacks (because when the kernel flattens the stacks and resets IRQsema it’s essentially forcing a context switch out of the active RTSupport thread and into the foreground thread).

Of course, SeriousErrorV is only invoked for serious errors (unhandled aborts), not for any of the other stack-flattening cases. Maybe we’ll need service calls/vectors for those other cases, but I think that in most cases the only reason something will care about the stacks being flattened will be if something important is stored on the stack (either some data or the execution state of a function) – in which case it should use SEH to listen for the flattening, not a service call/vector.

> Exporting function pointers via OS_ReadSysInfo 6 looks a bit odd, is it worth a SWI instead?

Probably – I just chucked it into OS_RSI 6 because that was easiest. Another thing that’s probably worth changing is the amount of “implementation detail” exposed – anything that uses SEH needs to have hardcoded knowledge of the node size, and at least partial knowledge of the node format. And anything that creates new privileged mode stacks needs to know that a single word at a specific (OS_RSI 6 defined) offset needs to be zeroed. More of that implementation detail could be hidden away inside the kernel, providing greater flexibility to change the implementation in the future. The only thing which we can’t really get rid of is the hardcoded knowledge of the node size, because that’ll make it a lot more awkward to use from C code.

> The module relocation offsets are part of a dark and mysterious bottom-of-stack place I’ve never really understood. Is it documented anywhere? How do we know it’s safe to allocate offset X rather than Y?

IIRC there’s a document somewhere which says that the bottom N words of the stack are reserved – I’ll have to try and find it again to see exactly what it says.

Mar 2, 2022 10:26pm Jeffrey Lee (213) 6048 posts
> IIRC there’s a document somewhere which says that the bottom N words of the stack are reserved – I’ll have to try and find it again to see exactly what it says.

PRM 4, talking about the usermode chunked stack:

> The seven words above the stack chunk structure are reserved to Acorn. The stack-limit register points 512 bytes above this (ie 560 bytes above the base of the stack chunk).

I.e.:

- SL-560 is the base of the allocated memory for the stack
- SL-540 is the start of the seven reserved words
- SL-512 is the address of the stack chunk structure
- SL is (nominally) the lowest usable address (the APCS doc supplied with the DDE says that SL to SL-256 may be used; this extra space reduces the overhead of stack limit checking, e.g. by allowing leaf functions which use <= 256 bytes of stack space to not perform limit checking)

The LibInit SWI calls document that “stack base + 20” (i.e. SL-540) is the library static data offset and “stack base + 24” (SL-536) is the client static data offset.

For modules, things are a bit different because there’s no stack chunking or stack chunk structures. However, the SharedCLibrary_LibInitModule documentation says:

> Note: You must save the words at offsets +20 and +24 from the returned stack base. You must do this before exiting your module initialisation code. These words contain the shared libraries static data offset and the client static data offset (the offset you must use when accessing your static data). These must be restored in the static data offset locations at offsets +00 and +04 from the base of the SVC stack when you are re-entering the module in SVC mode (e.g. in a SWI handler). When restoring the static data offsets you must save the previous static data offsets around the module entry.

Module veneers set the SL register to stack base + 540, which means that the relocation offsets are always at the same place (SL-540 & SL-536) regardless of CPU mode.

I’m not seeing anything obvious in the PRMs which mentions any restrictions on addresses from SL to SL-532 (for privileged mode stacks). However, both TaskWindow and RTSupport assume that the seven words at the base of the SVC stack contain thread-local values which need preserving when they copy the stacks around during context switches. I.e. the notes in the PRM about SL-540 to SL-512 being reserved apply to both usermode chunked stacks and privileged mode fixed stacks.

SEH currently stores the node head pointer at stack base + 8, i.e. right after the relocation offsets.
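The offset arithmetic in the post above can be checked with a few lines of C. This is only a worked example of the numbers quoted from the PRM and the LibInit documentation; the constant and function names are invented here, and addresses are modelled as plain 32-bit integers rather than real pointers.

```c
#include <stdint.h>

/* Offsets quoted in the post; all names are hypothetical. */
enum {
    USER_SL_ABOVE_BASE   = 560, /* usermode: SL is 560 bytes above the chunk base */
    MODULE_SL_ABOVE_BASE = 540, /* module veneers: SL = stack base + 540 */
    USER_LIB_OFFSET      = 20,  /* usermode: library static data offset */
    USER_CLIENT_OFFSET   = 24,  /* usermode: client static data offset */
    MODULE_LIB_OFFSET    = 0,   /* privileged: library static data offset */
    MODULE_CLIENT_OFFSET = 4,   /* privileged: client static data offset */
    SEH_HEAD_OFFSET      = 8    /* privileged: SEH head pointer (current proposal) */
};

uint32_t user_stack_base(uint32_t sl)   { return sl - USER_SL_ABOVE_BASE; }
uint32_t module_stack_base(uint32_t sl) { return sl - MODULE_SL_ABOVE_BASE; }

/* Usermode chunked stack: offsets are relative to the chunk base. */
uint32_t user_lib_offset_addr(uint32_t sl)
{
    return user_stack_base(sl) + USER_LIB_OFFSET;
}

uint32_t user_client_offset_addr(uint32_t sl)
{
    return user_stack_base(sl) + USER_CLIENT_OFFSET;
}

/* Privileged mode fixed stack: offsets are relative to the stack base. */
uint32_t module_lib_offset_addr(uint32_t sl)
{
    return module_stack_base(sl) + MODULE_LIB_OFFSET;
}

uint32_t module_client_offset_addr(uint32_t sl)
{
    return module_stack_base(sl) + MODULE_CLIENT_OFFSET;
}

/* Where the proposal currently keeps the SEH list head. */
uint32_t seh_head_addr(uint32_t sl)
{
    return module_stack_base(sl) + SEH_HEAD_OFFSET;
}
```

With any value of SL, the usermode and privileged-mode relocation offsets both come out at SL-540 and SL-536, and the SEH head pointer at SL-532, which matches the figures in the post.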

Mar 3, 2022 12:11am Simon Willcocks (1499) 552 posts	FWIW, a lock-unwinding system sounds good to me, but of course the memory they were protecting could still be corrupted.

Mar 3, 2022 10:49am Sprow (202) 1168 posts
> However, both TaskWindow and RTSupport assume that the seven words at the base of the SVC stack contain thread-local values which need preserving when they copy the stacks around during context switches.

You’ve done better than me at finding details, so the takeaways here are: there are 7 words, 2 are definitely used, 5 currently unused.

> changing is the amount of “implementation detail” exposed […] The only thing which we can’t really get rid of is the hardcoded knowledge of the node size, because that’ll make it a lot more awkward to use from C code.

Could the node size include a version flag perhaps, or the node size (discovered via an API) be used to indirectly encode the node format? That works just as long as any future changes to the node format result in it changing size too.

>> Exporting function pointers via OS_ReadSysInfo 6 looks a bit odd, is it worth a SWI instead?
>
> Probably – I just chucked it into OS_RSI 6 because that was easiest.

I found OS_TaskControl 0 tucked away at the bottom of the Select docs, maybe using some higher up reason codes there? At least avoids allocating a new SWI number in the precious < 0x100 block.

Mar 3, 2022 11:29am Jon Abbott (1421) 2661 posts	This sounds like a sensible proposal.

> Before a stack gets flattened, the kernel will walk the stack and call the handlers in turn, to allow them to perform any relevant recovery actions

Is this via the kernel SeriousErrorV handler?

This may not be relevant to your proposal, but I flatten the stacks when terminating tasks or handling fatal Aborts triggered by tasks being monitored by ADFFS. This is done either when SeriousErrorV Collect is triggered or via the IRQ hardware vector when CTRL-SHIFT-F12 is detected. I’d previously coded this in a cooperative way using the Collect/Recovery process, but I think at the time SeriousErrorV wasn’t fully implemented so I opted for my code to take over the recovery at the Collect stage and not pass it back to the kernel. With this proposal, I probably need to recode this to use the correct SeriousErrorV process, with a stack frame inserted.

Once the way the routines are exposed is agreed, could you post a link to the documentation please?

> PRM 4, talking about the usermode chunked stack

PRM4-249 – Stack chunk Format

PRM4-243 also mentions two types of stack frames, one used by Pascal/Modula-2 and the other by C. Should the former be deprecated, if it hasn’t already?

Whilst talking about C, does something need to be done about C’s nasty hack for finding the bottom of the stack – BIC Rx, R13, … I think? Replace it with an SWI perhaps?

Mar 3, 2022 1:12pm Rick Murray (539) 14048 posts
> I found OS_TaskControl 0 tucked away at the bottom of the Select docs

A SWI to flatten important stacks… Hmm…

> C’s nasty hack for finding the bottom of the stack – BIC Rx, R13, … I think?

That’s not the MOVs with LSR#20 then LSL#20?

Mar 3, 2022 4:26pm André Timmermans (100) 658 posts
> A SWI to flatten important stacks… Hmm…

No, a SWI to get the address of the code to flatten the stack. You call it in your initialization code and save it somewhere, then call the code when required.

Mar 3, 2022 4:31pm Jon Abbott (1421) 2661 posts
> I found OS_TaskControl 0 tucked away at the bottom of the Select docs

Undocumented SWI – need I say any more?

> That’s not the MOVs with LSR#20 then LSL#20?

Yes, that sounds like it

Mar 3, 2022 6:30pm Steve Pampling (1551) 8272 posts
> Undocumented SWI – need I say any more?

I’m not clear whether it exists anywhere other than the Select fork, but the documents over there do cover it. I think Jeffrey was proposing the same.

> General application space execution changes
> -------------------------------------------
> In the past it has been the practice to flatten the SVC stack on application execution by knowing the stack top address. This is a specialised operation, and one which is not expected to be required by most clients. With RISC OS 4 a defined mechanism was provided for reading the SVC stack top. With the modern versions of RISC OS, starting with version 4.42, this operation is now forbidden for all external clients. RISC OS internal components will be changed as soon as possible to remove this operation.
>
> Common usages of this form of operation are :
>
> - On aborts, to ensure that the stack state is safe (internal handlers)
> - On application start (OS_FSControl 4, OS_Module 2)
> - Within pre-emption of system calls (TaskWindow and others)
> - Within context-switching operations (WindowManager and others)
>
> A new API has been created to perform this operation :
>
> OS_TaskControl 0 (Read address of stacks reset code)
> => R0 = 0 (reason code)
> <= R0 = pointer to code to call to reset the SVC and IRQ stack state

Mar 3, 2022 7:48pm Jeffrey Lee (213) 6048 posts
>> changing is the amount of “implementation detail” exposed […] The only thing which we can’t really get rid of is the hardcoded knowledge of the node size, because that’ll make it a lot more awkward to use from C code.
>
> Could the node size include a version flag perhaps, or the node size (discovered via an API) be used to indirectly encode the node format? That works just as long as any future changes to the node format result in it changing size too.

Instead of having the kernel tell programs “this is the node size you must use”, it might be better to go in the opposite direction, where programs tell the kernel “this is the node size/ABI version I can support”. I.e. if we’re hiding the details of the node format from applications, then any program which wants to use SEH will have to call a SWI to get pointers to the functions that are used for pushing/popping nodes or unwinding the stack. If that SWI accepts a node size/ABI version argument then the kernel will be able to select an appropriate implementation of the functions (or complain if the version isn’t supported). We just need to make sure that each node has a header indicating the size/version so that the unwinding code knows how to deal with each node it sees.

> I found OS_TaskControl 0 tucked away at the bottom of the Select docs, maybe using some higher up reason codes there? At least avoids allocating a new SWI number in the precious < 0x100 block.

OS_TaskControl looks like a good fit, yes – I’d never spotted that SWI before.

>> Before a stack gets flattened, the kernel will walk the stack and call the handlers in turn, to allow them to perform any relevant recovery actions
>
> Is this via the kernel SeriousErrorV handler?

No, calling SeriousErrorV doesn’t trigger unwinding. For serious errors raised by the kernel, the SEH unwinding occurs just after the SeriousErrorV_Collect call, just before the SVC stack gets flattened.

> Once the way the routines are exposed is agreed, could you post a link to the documentation please?

Sure

> PRM4-243 also mentions two types of stack frames, one used by Pascal/Modula-2 and the other by C. Should the former be deprecated, if it hasn’t already?

I think the stack frames (& stack chunk header format) are the same, the only difference is which stack extension routines the languages use?

> Whilst talking about C, does something need to be done about C’s nasty hack for finding the bottom of the stack – BIC Rx, R13, … I think? Replace it with an SWI perhaps?

Only if you want to break compatibility with every current C module binary. The APCS-R spec in the PRM mandates that the privileged mode stacks are aligned to megabyte boundaries:

> In SVC and IRQ modes (collectively called module mode) SL_LWM is implicit in sp: it is the next megabyte boundary below sp. Even though the SVC-mode and IRQ-mode stacks are not extensible, sl still points 512 bytes above a skeleton stack-chunk descriptor (stored just above the megabyte boundary).
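The "programs tell the kernel which version they support" idea from the post above might look roughly like the following. This is a speculative sketch in portable C, not a proposed interface: the version numbers, node sizes, struct layouts and function names are all made up for the example, and in a real system the lookup would sit behind a SWI rather than a direct function call.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical table of node formats the kernel understands. */
typedef struct {
    uint32_t abi_version;  /* version the client asks for */
    uint32_t node_size;    /* bytes the client must reserve per node */
} seh_abi;

static const seh_abi seh_supported[] = {
    { 1, 16 },
    { 2, 24 },
};

/* Client to kernel: "this is the ABI version I can support".
   Returns the matching entry, or NULL if the version is unsupported. */
const seh_abi *seh_negotiate(uint32_t abi_version)
{
    size_t i;
    for (i = 0; i < sizeof seh_supported / sizeof seh_supported[0]; i++) {
        if (seh_supported[i].abi_version == abi_version)
            return &seh_supported[i];
    }
    return NULL;
}

/* Every node starts with a header, so the unwinder can cope with nodes
   of different versions sitting on the same stack. */
typedef struct {
    uint32_t abi_version;
} seh_node_header;

/* Unwinder side: size of a node found on the stack, or 0 if unrecognised. */
uint32_t seh_node_size(const seh_node_header *hdr)
{
    const seh_abi *abi = seh_negotiate(hdr->abi_version);
    return abi != NULL ? abi->node_size : 0;
}
```

A client built against version 1 keeps working after version 2 is added, because the kernel side looks up the format from the version it was given instead of assuming the newest one.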

Mar 3, 2022 8:07pm David Pitt (3386) 1248 posts
> OS_TaskControl looks like a good fit, yes – I’d never spotted that SWI before.

Found in OS5. It is only the SWI number defined, no actual code.

Mar 3, 2022 8:20pm Rick Murray (539) 14048 posts	Like NVMemory (wasn’t that originally NCOS?), it’s quite possible that it is defined so that it is known not to be “free for use”.

Mar 3, 2022 9:22pm Jon Abbott (1421) 2661 posts
> OS_TaskControl 0 (Read address of stacks reset code)

This would certainly be useful for me if implemented.

> Only if you want to break compatibility with every current C module binary

Are you saying that if the C compiler was changed today to compile code that uses a legal means to obtain the stack base, it will break existing C Modules? I can understand the stack base can’t be moved from a boundary to keep existing code working, but I don’t understand why we’re persisting with non-legal means to obtain the stack for newly compiled code.

Mar 3, 2022 9:27pm Stuart Swales (8827) 1384 posts
> non-legal means

It’s a guarantee.

Mar 3, 2022 11:32pm Jeffrey Lee (213) 6048 posts
> Are you saying that if the C compiler was changed today to compile code that uses a legal means to obtain the stack base, it will break existing C Modules? I can understand the stack base can’t be moved from a boundary to keep existing code working, but I don’t understand why we’re persisting with non-legal means to obtain the stack for newly compiled code.

Ah, I thought you were suggesting that the stack base should be moved from the megabyte boundary. In which case, my answer is: Rounding down R13 to a megabyte boundary is the legal means for obtaining the stack base. It’s ugly and horrible and inflexible but it’s legal.

But, having said that, one of the changes that the SMP kernel introduces is a data structure which contains the reset addresses of each of the privileged mode stacks. This allows the system to properly cope with different threads having different SVC stacks, and different cores having different ABT/UND/IRQ stacks (each core has its own instance of the structure). It’d be easy enough to also add the stack base addresses to the structure, although as I’ve said it’d be a bit pointless since if we move the stacks away from the megabyte boundary then it’ll break all existing C modules (and some assembler ones which also try to check free stack space).
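The "round R13 down to a megabyte boundary" rule discussed above is a single shift pair, which is why so much existing code has it inlined. A small C rendering of the arithmetic follows; the function names and the sample stack pointer used afterwards are invented for illustration, and addresses are modelled as 32-bit integers.

```c
#include <stdint.h>

#define MEGABYTE_SHIFT 20u

/* Same arithmetic as shifting R13 right by 20 bits and then left by 20 bits:
   clears the low 20 bits, giving the megabyte boundary below sp. */
uint32_t privileged_stack_base(uint32_t sp)
{
    return (sp >> MEGABYTE_SHIFT) << MEGABYTE_SHIFT;
}

/* Bytes between sp and that boundary. This is only an upper bound on the
   free space, since the reserved words at the base are not subtracted. */
uint32_t bytes_above_base(uint32_t sp)
{
    return sp - privileged_stack_base(sp);
}
```

For a made-up stack pointer of 0xFA207F40 this gives a base of 0xFA200000 with 0x7F40 bytes above it. Any code that has this calculation baked in will compute the wrong base if a stack is ever placed away from a megabyte boundary, which is the compatibility problem described in the post.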

Mar 4, 2022 10:19am André Timmermans (100) 658 posts
> It’d be easy enough to also add the stack base addresses to the structure, although as I’ve said it’d be a bit pointless since if we move the stacks away from the megabyte boundary then it’ll break all existing C modules (and some assembler ones which also try to check free stack space).

Hm, having implementation-independent stack handling code would allow the use of more efficient implementations on a 64-bit system.

Mar 4, 2022 12:20pm Jon Abbott (1421) 2661 posts
> It’s ugly and horrible and inflexible but it’s legal.

Legal to me means explicitly asking the OS for the base – not trying to figure it out from R13.

> it’d be a bit pointless since if we move the stacks away from the megabyte boundary then it’ll break all existing C modules (and some assembler ones which also try to check free stack space).

It just seems like a wasted opportunity to me: SMP-safe modules/code need to be explicitly compiled and advertise their safe status to the OS, so they could ask for the stack base from the outset. If the OS is to implement thread isolation/security in the future, randomising the location of thread stacks should be on the agenda.
