Debug hooks
Jeffrey Lee
There are a couple of debug hooks that I’ve been making use of on a fairly regular basis, and I think it’s about time that they were made available to other people:
The problem I’m facing with enabling them for general use is that they’re very low-level: to do anything useful with a hook, you need a certain amount of low-level knowledge about the particular OS build you’re running on.

For example, the obvious way of exposing the unhandled exception debug hook would be to make it a vector which gets called in UND/ABT mode, before the SVC stack gets reset. (Currently the hook moves the SVC stack pointer to near the base of the stack so that it can call the HangWatch SWI.) But that would place a lot of restrictions on what the handler can do: it won’t be able to call any SWIs unless it resets the SVC stack itself. For HangWatch this is fine, since it just dumps everything out to the serial port, but for a more user-friendly system I’d imagine you’d want to save the stack dumps to disc. That means you’d have to do something like copy the stacks and register dump into a preallocated buffer, then set a flag which your module polls from the foreground.

There’s also no proper way of getting state data from other systems, e.g. a dump of all the RTSupport threads and their stacks, the current Wimp task, etc. So maybe we’d want some kind of broadcast message (a service call? a vector?) to be sent out to the system, so that everything with something important to report can dump it into one buffer, with a header on each blob of data describing what it is. Or maybe a linked list of buffers would be collected instead, so that the owner of each piece of data manages allocating its own buffer and can make sure there’s always enough memory available to store it. (How would the kernel, the debug module, or whatever else know how big to make a single shared buffer so that everything fits?)

As for the SWI error logging, the problem there is that the return address of the SWI instruction isn’t always that useful.
If the SWI was called from C, the return address will generally point into _swi or _kernel_swi rather than into the C function which called the SWI. So we’d want some kind of vector or service call that the error logger can use to unwind the stack and find the true caller. That would need to work not only with built-in components like the shared C library and BASIC (with complications such as ROM CLib still being in use even after a softloaded version has been loaded), but also with third-party runtimes like UnixLib, Lua, Python, Charm, etc. Although I guess we only really need to deal with languages in which modules can be implemented: if the SWI was called from an application, then you’d hopefully have access to the application’s source code, and could add your own source-level debugging to find out why a particular error is occurring.

There’s also the question of how much support for backtraces we want to build directly into the OS. For example, it would be useful if the kernel could detect certain important markers on the stack (SWI calls, service calls, interrupts, etc.) and annotate them, so that the debug module doesn’t need detailed knowledge of the internal structure of the particular OS version it’s running on (routine addresses shift around depending on the machine type, and the stack structure is subject to change over time).

Anyone have any thoughts on how this kind of thing should be handled, preferably without it taking all year to implement? :-) I realise that, functionality-wise, there’ll probably be a lot of overlap with RISC OS 6’s diagnostic stuff, so I should probably read up on that and see how applicable it would be to RISC OS 5.
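A minimal C sketch of the "linked list of owner-allocated buffers" option, assuming each owner registers a preallocated blob at init time (when allocation is still safe) and supplies a callback that copies its state in when a crash happens. Each blob carries a small header (a four-character tag plus a length) describing its contents, and the collector sets a flag for a foreground poller instead of trying to write to disc itself. All names here (dump_blob, dump_register, dump_collect, DUMP_TAG, fill_task) are hypothetical rather than an existing RISC OS API, and a real handler running in UND/ABT mode would be assembler or very carefully constrained C.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: pack a four-character tag such as DUMP_TAG('T','A','S','K'). */
#define DUMP_TAG(a, b, c, d) \
    ((uint32_t)(a) | ((uint32_t)(b) << 8) | ((uint32_t)(c) << 16) | ((uint32_t)(d) << 24))

/* Hypothetical: one blob of crash-time state, preallocated by its owner so
   the exception path never has to allocate memory. */
typedef struct dump_blob {
    struct dump_blob *next;  /* next registered blob */
    uint32_t tag;            /* header: what kind of data this is */
    size_t capacity;         /* size of data[], fixed when registered */
    size_t used;             /* header: bytes filled in at crash time */
    unsigned char *data;     /* owner-allocated storage */
    /* Owner's callback: copy current state into buf, return bytes written.
       It runs on the crash path, so it must not call SWIs or allocate. */
    size_t (*fill)(unsigned char *buf, size_t capacity);
} dump_blob;

static dump_blob *dump_chain = NULL;   /* all registered blobs */
static volatile int dump_pending = 0;  /* set on the crash path, polled by the foreground */

/* Called by each owner at init time. */
static void dump_register(dump_blob *b) {
    assert(b->data != NULL && b->fill != NULL);
    b->used = 0;
    b->next = dump_chain;
    dump_chain = b;
}

/* Called from the exception path: just fill the preallocated buffers. */
static void dump_collect(void) {
    for (dump_blob *b = dump_chain; b != NULL; b = b->next) {
        size_t n = b->fill(b->data, b->capacity);
        b->used = n < b->capacity ? n : b->capacity;  /* clamp a misbehaving callback */
    }
    dump_pending = 1;  /* foreground code saves the chain to disc later */
}

/* Example provider (hypothetical): records the name of the current Wimp task. */
static const char *current_task = "Edit";
static size_t fill_task(unsigned char *buf, size_t capacity) {
    size_t n = strlen(current_task);
    if (n > capacity) n = capacity;
    memcpy(buf, current_task, n);
    return n;
}
```

Because each owner sizes its own buffer, the collector never has to guess how much space the whole dump needs; the trade-off is that an owner which under-sizes its blob gets truncated data rather than a failed dump.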
Jeffrey Lee
I’ve been working on the exception dump stuff a bit for the past few days. Rather than try and come up with an all-singing, all-dancing system which will work with everything, I’m focusing on the basics which will help with 90% of cases. So I’m expecting the eventual result to be something like the following:
The current stack annotation code is able to spot SWI calls and most function calls. I’m also hoping to add some support for detecting interrupts, although I don’t think I’ll be able to detect when an interrupt handler switches from IRQ mode to SVC mode (there generally won’t be any obvious markers left on the stack), so making sense of the stack dumps is still going to be a bit tricky.

I’m also not sure yet whether I’ll be able to do anything useful to weed out the red herrings found in regular stack dumps. I think it should be possible if I teach the code to group the annotations by module boundary, so that e.g. it won’t get confused if it sees an old Wimp call in the middle of a run of FileCore function calls. But even if it thinks something is a red herring, it will still include it in the dump, just in case it got it wrong.

Doing the stack dump annotation on the machine that crashed is the best way of getting accurate stack frame information (since you need to be able to probe memory for instruction sequences), but it has the downside that there generally won’t be any debug symbols available. So to start with, I expect the code will just report the module name + offset for any functions it finds (or the function name, for code built with function name embedding enabled). For developers, I’ll probably add an extra command to profanal that reads in the text dump and adds extra annotations (since I’ve already got the logic there for loading the ROM debug symbols).

Later on I might look at some way of squeezing the debug symbols into the ROM itself, maybe as a big data blob after all the modules? If we limit it to just the address + filename + symbol name for each function/symbol, then I’d expect there to be enough space to fit everything in, especially if duplicate strings are eliminated.
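One way the "address + filename + symbol name" blob could be laid out, sketched under the assumption of one fixed-size entry per symbol, sorted by address, with all strings held in a single NUL-separated pool where each distinct string is stored once (so a filename shared by many functions costs its bytes only once). Looking up a faulting address is a binary search for the last entry at or below it, which yields "symbol + offset". The format, the fixed limits and the names (sym_table, pool_intern, sym_add, sym_lookup) are invented for illustration; a ROM build would presumably generate the blob offline rather than construct it at run time.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical symbol entry: offsets index into the shared string pool. */
typedef struct {
    uint32_t address;   /* start address of the function/symbol */
    uint32_t file_off;  /* offset of the filename in pool[] */
    uint32_t name_off;  /* offset of the symbol name in pool[] */
} sym_entry;

/* Hypothetical blob: fixed sizes keep the sketch simple. */
typedef struct {
    char pool[4096];          /* NUL-separated, deduplicated strings */
    uint32_t pool_used;
    sym_entry entries[256];   /* kept sorted by address */
    uint32_t count;
} sym_table;

/* Return the pool offset of s, appending it only if not already present. */
static uint32_t pool_intern(sym_table *t, const char *s) {
    uint32_t off = 0;
    while (off < t->pool_used) {
        if (strcmp(t->pool + off, s) == 0) return off;  /* duplicate eliminated */
        off += (uint32_t)strlen(t->pool + off) + 1;
    }
    size_t len = strlen(s) + 1;                 /* include the terminator */
    assert(t->pool_used + len <= sizeof t->pool);
    memcpy(t->pool + t->pool_used, s, len);
    t->pool_used += (uint32_t)len;
    return off;                                 /* == old pool_used */
}

/* Add a symbol; callers must supply addresses in ascending order. */
static void sym_add(sym_table *t, uint32_t addr, const char *file, const char *name) {
    assert(t->count < sizeof t->entries / sizeof t->entries[0]);
    assert(t->count == 0 || t->entries[t->count - 1].address < addr);
    sym_entry *e = &t->entries[t->count++];
    e->address = addr;
    e->file_off = pool_intern(t, file);
    e->name_off = pool_intern(t, name);
}

/* Find the last symbol whose address is <= pc; NULL if pc precedes them all. */
static const sym_entry *sym_lookup(const sym_table *t, uint32_t pc) {
    uint32_t lo = 0, hi = t->count;  /* search the half-open range [lo, hi) */
    while (lo < hi) {
        uint32_t mid = lo + (hi - lo) / 2;
        if (t->entries[mid].address <= pc) lo = mid + 1;
        else hi = mid;
    }
    return lo == 0 ? NULL : &t->entries[lo - 1];  /* lo = first entry above pc */
}
```

The same lookup would also cover the module name + offset fallback if the entries held module base addresses and titles instead of function addresses.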
Rick Murray
An ability to unwind C stack frames?
Jeffrey Lee
Yep, APCS stack frames are detected. They also help quite a bit with red herring detection, since each stack frame contains a pointer to the next frame in the chain (the caller’s frame).
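A minimal sketch of walking an APCS frame chain over a stack snapshot copied out at crash time. It assumes 32-bit APCS-R frames created by the usual entry sequence `MOV ip, sp` / `STMFD sp!, {..., fp, ip, lr, pc}` / `SUB fp, ip, #4`, so fp points at the saved pc, [fp-4] holds the return address, [fp-8] the caller's sp and [fp-12] the caller's fp (any extra saved registers sit below these and don't change the offsets). The outermost frame has fp = 0. The "caller's frame must be at a higher address" test is one simple way the frame-to-frame pointer can reject stale frames; real code would also have to cope with leaf functions and assembler that don't set up fp. The stack_snapshot type and function names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical: a copy of (part of) the stack, as saved by the crash handler. */
typedef struct {
    uint32_t base;          /* address that words[0] was copied from */
    const uint32_t *words;  /* the copied stack contents */
    size_t nwords;
} stack_snapshot;

/* Read the word at addr; returns 0 if addr is unaligned or outside the snapshot. */
static int snap_read(const stack_snapshot *s, uint32_t addr, uint32_t *out) {
    if (addr % 4 != 0 || addr < s->base) return 0;
    size_t idx = (addr - s->base) / 4;
    if (idx >= s->nwords) return 0;
    *out = s->words[idx];
    return 1;
}

/* Follow the fp chain, storing up to max return addresses (innermost first).
   Returns the number of frames found. */
static size_t apcs_backtrace(const stack_snapshot *s, uint32_t fp,
                             uint32_t *ret, size_t max) {
    size_t n = 0;
    while (fp != 0 && n < max) {
        uint32_t lr, prev_fp;
        if (!snap_read(s, fp - 4, &lr) || !snap_read(s, fp - 12, &prev_fp))
            break;                       /* frame points outside the stack */
        ret[n++] = lr;
        /* Stacks are full-descending, so a caller's frame must lie above this
           one; anything else suggests a corrupt or stale frame. */
        if (prev_fp != 0 && prev_fp <= fp)
            break;
        fp = prev_fp;
    }
    return n;
}
```

Stopping at the first backwards-pointing frame keeps the walk from looping forever on a corrupt chain; the raw stack dump would still be included alongside, so nothing is lost if the check is too strict.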