Is it possible to detect a system crash?

5 posts, 2 voices

Feb 24, 2014 8:41pm Rick Murray (539) 13857 posts	Is it possible to tell if RISC OS has crashed? What comes to mind is to set up a callback and see if the callback fires. Would this work?¹

What are the requirements for RISC OS to process outstanding callbacks – specifically, what does “when RISC OS is threaded but idle” actually mean? It looks like USR mode with interrupts enabled. How often does this happen? Or does it describe the state upon returning to a user program from something like Wimp_Poll? [Well, when Wimp_Poll exits, it should exit to the caller in USR mode with IRQs enabled; but I’m not sure I’d call this situation “idle”!]

¹ Not as strange a question as it seems – I crashed the Beagle (trashed R14 in module code) while developing a module, yet the blinky light via BeagLEDs kept on blinking – so TickerV was still active even if nothing else seemed to be! I’m guessing OS_CallAfter/OS_CallEvery will likewise differ from OS_AddCallBack in behaviour: CallAfter/CallEvery will use TickerV, whereas a callback will rely on some aspect of RISC OS’s operation?

Feb 24, 2014 9:04pm Rick Murray (539) 13857 posts	I should probably throw together a small module to fire off callbacks to make a beep every ten seconds, and then try various ways of (ab)using the system to see how it behaves. But not tonight. Time to brew some linguini carbonara, mmmmmmmm! :-)

Feb 24, 2014 9:09pm Jeffrey Lee (213) 6048 posts	“Is it possible to tell if RISC OS has crashed?”

Depends on the level of crash you’re interested in dealing with. OS still responding to interrupts, but foreground task sat in a loop being unresponsive? Foreground task running, but interrupts not working? Infinite abort loop? Everything completely ground to a halt? Interrupts working and Wimp running, but something bad has happened and you can’t interact with the machine? (e.g. USB drivers killed, task manager killed, etc.)

Some of those you can detect in software and try to recover from, others (if you can find suitable places to insert a heartbeat signal) are best dealt with by a hardware watchdog timer that will reset the machine when it all goes tits up.

“What comes to mind is to set up a callback and see if the callback fires. Would this work?”

It’ll detect if callbacks are working. But there are many ways of crashing a system such that callbacks continue to function :)

“What are the requirements for RISC OS to process outstanding callbacks – specifically, what does “when RISC OS is threaded but idle” actually mean?”

Off the top of my head, callbacks will fire in two situations:

1. On return from a SWI call, if (a) the SWI didn’t generate an error, and (b) the OS is returning to user mode, and (c) the SVC stack is empty. I.e. when the OS is returning to the foreground task.
2. On return from an interrupt, if (a) the OS isn’t already processing callbacks, and (b) the OS is returning to user mode, and (c) the SVC stack is empty.

“Is there a difference between AddCallBack and CallEvery in this respect?”

CallAfter/CallEvery are tied directly into the 100Hz ticker interrupt. So as long as the timer is running and the OS isn’t completely trashed, your code will get called, including in situations where the CPU never returns to user mode (or doesn’t return for a long time).

Callbacks, on the other hand, rely on the OS returning to user mode, and so are probably a better method of determining whether the system has crashed (since most interesting software you’d want to use runs in user mode). Of course if the system has crashed you won’t receive the callback, so you’ll need to use some kind of watchdog to check for an absence of callbacks and go “yep, the system’s crashed”. E.g. a CallEvery, a HAL timer configured to generate FIQs (which is what HangWatch uses), or a proper hardware watchdog timer.
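
The scheme described in the post above (callbacks as a heartbeat, with a ticker routine watching for their absence) can be sketched as a small simulation. This is plain Python, not RISC OS code: it only models the conditions Jeffrey lists from memory, and every name in it (CpuState, HangDetector, simulate, the 300-tick limit) is a hypothetical chosen for illustration. On a real machine the two handlers would presumably be registered with OS_AddCallBack and OS_CallEvery.

```python
from dataclasses import dataclass


@dataclass
class CpuState:
    """Simplified snapshot of the machine at the moment a SWI or IRQ returns."""
    returning_to_user_mode: bool
    svc_stack_empty: bool
    swi_returned_error: bool = False
    already_in_callbacks: bool = False


def callbacks_fire_on_swi_return(s: CpuState) -> bool:
    # Situation 1 in the post: no error, returning to user mode, SVC stack empty.
    return (not s.swi_returned_error) and s.returning_to_user_mode and s.svc_stack_empty


def callbacks_fire_on_irq_return(s: CpuState) -> bool:
    # Situation 2 in the post: not already in callbacks, returning to user mode,
    # SVC stack empty.
    return (not s.already_in_callbacks) and s.returning_to_user_mode and s.svc_stack_empty


class HangDetector:
    """Ticker-side watchdog that counts ticks since the last callback ran."""

    def __init__(self, limit_ticks: int):
        self.limit_ticks = limit_ticks
        self.ticks_since_callback = 0
        self.hung = False

    def on_callback(self) -> None:
        # Stands in for the callback handler: proof the OS reached user mode.
        self.ticks_since_callback = 0

    def on_tick(self) -> None:
        # Stands in for the 100Hz ticker routine, which runs even when the
        # CPU never returns to user mode.
        self.ticks_since_callback += 1
        if self.ticks_since_callback > self.limit_ticks:
            self.hung = True


def simulate(states, limit_ticks):
    """Feed one CpuState per ticker interrupt; return the tick at which a
    hang is declared, or None if callbacks kept arriving in time."""
    detector = HangDetector(limit_ticks)
    for tick, state in enumerate(states, start=1):
        detector.on_tick()
        if callbacks_fire_on_irq_return(state):
            detector.on_callback()
        if detector.hung:
            return tick
    return None


healthy = [CpuState(True, True)] * 500    # keeps returning to user mode
stuck = [CpuState(False, False)] * 500    # never leaves SVC mode
print(simulate(healthy, 300), simulate(stuck, 300))
```

With a limit of 300 ticks (three seconds at 100Hz), the healthy run never trips the detector, while the stuck run trips it on the first tick past the limit. The limit is the design choice that matters: too short and a long-running SWI is reported as a crash, too long and a genuine hang goes unnoticed for a while.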

Feb 24, 2014 9:16pm Jeffrey Lee (213) 6048 posts	“I should probably throw together a small module to fire off callbacks to make a beep every ten seconds, and then try various ways of (ab)using the system to see how it behaves.”

Sprow recently prompted me to uncover this beauty:

10 SYS "OS_ChangeEnvironment",6,511<<20
20 !0=0
RUN

Feb 24, 2014 9:39pm Rick Murray (539) 13857 posts	“or a proper hardware watchdog timer.”

Exactly this, the Pi’s BCM2-something-something-something has a hardware watchdog that expires in about 16 seconds, so prodding it once per second should be plenty. Just, you know, would like to try to be able to detect when the machine has actually crashed, as opposed to “doing something tedious” (like printing). That said, it is more aimed at unattended uses, where a crash may occur without a user nearby to poke the reset button.

Note to observant readers: Jeffrey quotes a number of things not present in my message. I posted, read chunks of the PRMs on program environment, then updated the original several times. Oh, for a preview facility in Beast! Specifically:

“CallAfter/CallEvery are tied directly into the 100Hz ticker interrupt.”

I worked this one out for myself. But it was interesting to read all the same – what worries me is printing. I recall that printing on my A5000 took forever. Thankfully, the printing mechanism is similar to Wimp redraws, so there should be hope for getting callbacks during printing.
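
To make the timing in the post above concrete, here is a rough Python model of a hardware watchdog that is prodded once per second, but only when a callback has been seen in that second. It is a simulation under stated assumptions, not a driver for the Pi's watchdog: the 16-second timeout is the approximate figure quoted in the post, and HardwareWatchdog and run are hypothetical names.

```python
class HardwareWatchdog:
    """Counts down in whole seconds; flags a reset when it reaches zero."""

    def __init__(self, timeout_s: int = 16):
        self.timeout_s = timeout_s
        self.remaining_s = timeout_s
        self.reset_fired = False

    def kick(self) -> None:
        # "Prodding" the watchdog reloads the full timeout.
        self.remaining_s = self.timeout_s

    def advance(self, seconds: int = 1) -> None:
        self.remaining_s -= seconds
        if self.remaining_s <= 0:
            self.reset_fired = True


def run(callback_seen_each_second, timeout_s: int = 16):
    """Each list entry says whether a callback ran during that second.
    Return the second at which the reset fires, or None if it never does."""
    wd = HardwareWatchdog(timeout_s)
    for second, seen in enumerate(callback_seen_each_second, start=1):
        if seen:
            wd.kick()      # only prod when the system proved it is alive
        wd.advance(1)
        if wd.reset_fired:
            return second
    return None


# Crash after five healthy seconds: no more callbacks, so no more kicks.
print(run([True] * 5 + [False] * 30))
# A ten-second stall (a slow operation, say) that recovers before the timeout.
print(run([True] * 5 + [False] * 10 + [True] * 5))
```

In this model a stall shorter than the timeout is survived, while any stall that outlasts it resets the machine whether or not the system would have recovered. That is the printing worry in a nutshell: the scheme is only safe if the slow operation still lets callbacks through at least once per timeout period.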
