Is it possible to detect a system crash?
Rick Murray (539) 13857 posts
Is it possible to tell if RISC OS has crashed? What comes to mind is to set up a callback and see if the callback fires. Would this work? [1]

What are the requirements for RISC OS to process outstanding callbacks? Specifically, what does “when RISC OS is threaded but idle” actually mean? It looks like USR mode with interrupts enabled. How often does this happen? Or does it describe the state upon returning to a user program from something like Wimp_Poll? [Well, when Wimp_Poll exits, it should exit to the caller in USR mode with IRQs enabled, but I’m not sure I’d call this situation “idle”!]

[1] Not as strange a question as it seems. I crashed the Beagle (trashed R14 in module code) while developing a module, yet the blinky light via BeagLEDs kept on blinking, so TickerV was still active even if nothing else seemed to be! I’m guessing OS_CallAfter/OS_CallEvery will likewise behave differently from OS_AddCallBack: CallAfter/Every will use TickerV, while callbacks depend on some other aspect of RISC OS’s operation?
Rick Murray (539) 13857 posts
I should probably throw together a small module to fire off callbacks to make a beep every ten seconds, and then try various ways of (ab)using the system to see how it behaves.
Jeffrey Lee (213) 6048 posts
Depends on the level of crash you’re interested in dealing with.
Some of those you can detect in software and try to recover from, others (if you can find suitable places to insert a heartbeat signal) are best dealt with by a hardware watchdog timer that will reset the machine when it all goes tits up.
Checking whether a callback fires will detect if callbacks are working. But there are many ways of crashing a system such that callbacks continue to function :)
Off the top of my head, callbacks will fire in two situations:
CallAfter/CallEvery are tied directly into the 100Hz ticker interrupt. So as long as the timer is running and the OS isn’t completely trashed, your code will get called, including in situations where the CPU never returns to user mode (or doesn’t return for a long time).

Callbacks, on the other hand, rely on the OS returning to user mode, and so are probably a better method of determining whether the system has crashed (since most interesting software you’d want to use runs in user mode).

Of course, if the system has crashed you won’t receive the callback, so you’ll need to use some kind of watchdog to check for an absence of callbacks and go “yep, the system’s crashed”. E.g. a CallEvery, a HAL timer configured to generate FIQs (which is what HangWatch uses), or a proper hardware watchdog timer.
Jeffrey Lee (213) 6048 posts
Sprow recently prompted me to uncover this beauty:
10 SYS "OS_ChangeEnvironment",6,511<<20
20 !0=0
RUN
Rick Murray (539) 13857 posts
Exactly this. The Pi’s BCM2-something-something-something has a hardware watchdog that expires in about 16 seconds, so prodding it once per second should be plenty.

Note to observant readers: Jeffrey quotes a number of things not present in my message. I posted, read chunks of the PRMs on program environment, then updated the original several times. Oh, for a preview facility in Beast!

I worked this one out for myself. But it was interesting to read all the same. What worries me is printing: I recall that printing on my A5000 took forever. Thankfully, the printing mechanism is similar to Wimp redraws, so there should be hope of getting callbacks while printing is in progress.