RISC OS Open: Forum: Machine hang in USR mode C

Jun 7, 2018 8:07pm

Jeffrey Lee (213) 6048 posts

Pretty much game over, although exactly how that hangs the box in this specific case is not yet clear.

The OS calls the (data/prefetch/instruction) abort environment handler in a privileged CPU mode. The stub code which gets called has been overwritten with &0 (essentially NOPs), so execution continues until it falls off the end of the application slot (or hits some non-zeroed instruction which causes some other type of abort). Repeat ad infinitum.
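Here is a small host-side C model (not RISC OS code) of why zeroed memory behaves like NOPs and why the hang repeats. It assumes the A32 data-processing encoding and treats the application slot as a plain array. `is_effective_nop` and `simulate_runaway` are hypothetical helpers written for illustration. The model ignores everything the real kernel does between aborts.

```c
#include <stddef.h>
#include <stdint.h>

/* Returns 1 if `w` is an A32 "AND<cond> Rd, Rd, Rd" (register operand,
   LSL #0, no S bit, Rd != PC). Whether or not its condition passes, such
   an instruction leaves every register and flag unchanged. The all-zero
   word decodes as ANDEQ r0, r0, r0, so it qualifies. */
static int is_effective_nop(uint32_t w) {
    unsigned cond = (w >> 28) & 0xFu;  /* bits 31-28: condition code */
    unsigned cls  = (w >> 25) & 0x7u;  /* bits 27-25: 000 = data-processing, register operand */
    unsigned op   = (w >> 21) & 0xFu;  /* bits 24-21: 0000 = AND */
    unsigned s    = (w >> 20) & 0x1u;  /* bit 20: update flags */
    unsigned rn   = (w >> 16) & 0xFu;  /* bits 19-16: first operand */
    unsigned rd   = (w >> 12) & 0xFu;  /* bits 15-12: destination */
    unsigned sh   = (w >> 4) & 0xFFu;  /* bits 11-4: shift; 0 = LSL #0 */
    unsigned rm   = w & 0xFu;          /* bits 3-0: second operand */
    return cond != 0xFu && cls == 0 && op == 0 && s == 0 && sh == 0 &&
           rn == rd && rm == rd && rd != 15;
}

/* Simplified model of the hang: start executing at the handler stub and
   walk forward through the slot. Whenever execution falls off the end,
   take an "abort" whose handler is the same zeroed stub. Returns the
   number of aborts taken before reaching `max_aborts` (the model's safety
   cap; the real machine has none). It stops early if it meets a word
   that is not an effective no-op. */
static unsigned simulate_runaway(const uint32_t *slot, size_t nwords,
                                 size_t stub_index, unsigned max_aborts) {
    unsigned aborts = 0;
    size_t pc = stub_index;
    while (aborts < max_aborts) {
        if (pc >= nwords) {      /* fell off the end of the application slot */
            aborts++;
            pc = stub_index;     /* abort re-enters the (zeroed) handler stub */
            continue;
        }
        if (!is_effective_nop(slot[pc]))
            break;               /* something else would happen here */
        pc++;
    }
    return aborts;
}
```

Take an all-zero 16-word slot with the stub at index 4: `simulate_runaway` hits its cap of 100 aborts. A single non-NOP word after the stub makes it stop before the first abort.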

Can someone explain why those handlers couldn’t be in a DA?

I can’t say why they can’t, but I can say why they shouldn’t: Logical address space exhaustion.

Rather than simply moving the code elsewhere and hoping that it won’t get overwritten by a different stray pointer, a better proposition would be to protect the code by making the page read-only. Better yet, make all of the code pages for the program read-only (and make the data pages non-executable).
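This sketch shows the read-only protection idea. It is written against POSIX `mmap`/`mprotect` on a host system, purely to demonstrate the principle: once the page holding the stub is read-only, a stray store faults instead of silently zeroing it. This is not how RISC OS would do it (its page-access mechanisms differ and are not shown). `stray_store_faults` is a hypothetical test helper.

```c
#define _DEFAULT_SOURCE
#include <setjmp.h>
#include <signal.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

#ifndef MAP_ANONYMOUS
#define MAP_ANONYMOUS MAP_ANON
#endif

static sigjmp_buf fault_jmp;

static void on_fault(int sig) {
    (void)sig;
    siglongjmp(fault_jmp, 1);  /* escape from the faulting store */
}

/* Copy `len` bytes of "stub code" into a fresh page, then drop write
   permission and attempt a stray store into it. Returns 1 if the store
   faulted and the stub is intact, 0 if the store got through, and -1 on
   setup failure. */
static int stray_store_faults(const unsigned char *stub, size_t len) {
    long pagesz = sysconf(_SC_PAGESIZE);
    if (pagesz <= 0 || len > (size_t)pagesz) return -1;
    unsigned char *page = mmap(NULL, (size_t)pagesz, PROT_READ | PROT_WRITE,
                               MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (page == MAP_FAILED) return -1;
    memcpy(page, stub, len);
    /* Read-only here; real code pages would be PROT_READ | PROT_EXEC. */
    if (mprotect(page, (size_t)pagesz, PROT_READ) != 0) {
        munmap(page, (size_t)pagesz);
        return -1;
    }

    struct sigaction sa, old_segv, old_bus;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_fault;
    sigemptyset(&sa.sa_mask);
    sigaction(SIGSEGV, &sa, &old_segv);
    sigaction(SIGBUS, &sa, &old_bus);  /* some systems report SIGBUS instead */

    volatile int faulted = 0;
    if (sigsetjmp(fault_jmp, 1) == 0) {
        volatile unsigned char *p = page;
        p[0] = 0;                      /* the "stray pointer" zeroing the stub */
    } else {
        faulted = 1;
    }

    sigaction(SIGSEGV, &old_segv, NULL);
    sigaction(SIGBUS, &old_bus, NULL);
    int stub_intact = memcmp(page, stub, len) == 0;
    munmap(page, (size_t)pagesz);
    return faulted && stub_intact;
}
```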

Jun 7, 2018 8:43pm

Jeffrey Lee (213) 6048 posts

Anyway, whilst experimenting, I’ve discovered that a simple busy loop, executed within a TaskWindow, does not multitask?!

Patching a SWI OS_NewLine into the compiled program just before the wait loop fixes the problem, which suggests that the start of the new application has left TaskWindow in a funny state.

I’ve seen this behaviour for a long time – programs which run within task windows typically only start multitasking (and responding to Escape!) once the first character is output.

Well it does in RO4, using GCC.

UnixLib or SCL? On RISC OS 5, UnixLib does multitask (SCL doesn’t), and neither responds to Escape.

I’ll continue investigating.

Jun 7, 2018 8:54pm

nemo (145) 2556 posts

-MSTUBS – so SCL

But as Adrian said, `&8000 B &8000` works fine. So my guess is it’s nowt to do with TW and something horrible to do with optimisation of the SCL runtime leaving interrupts off or being in the wrong mode.

But that’s just which end of the string I’d start pulling.

address space exhaustion

Oh do get a grip. Even if it were one page per CLib client (and it wouldn’t be) that’s not going to… come on now.

Rather than simply moving the code elsewhere and hoping that it won’t get overwritten by a different stray pointer

It’s not a ‘stray pointer’ (as in completely random and terribly bad luck) it’s literally next to writeable memory. That isn’t going to happen somewhere else.

I don’t want to look into SharedCLibrary, I really don’t, but I presume we are talking about these four stubs in particular:

trapHandler
uncaughtTrapHandler
eventHandler
unhandledEventHandler

Jun 7, 2018 9:08pm

Jeffrey Lee (213) 6048 posts

So my guess is it’s nowt to do with TW and something horrible to do with optimisation of the SCL runtime leaving interrupts off or being in the wrong mode.

Nope – it’s definitely in user mode, IRQs enabled.

Jun 7, 2018 9:12pm

nemo (145) 2556 posts

Uggh. Yet `B &8000`… rather you than me.

Jun 7, 2018 9:28pm

Jeffrey Lee (213) 6048 posts

address space exhaustion

Oh do get a grip. Even if it were one page per CLib client (and it wouldn’t be) that’s not going to… come on now.

There’s this thing called memory fragmentation, maybe you’ve heard of it? UnixLib likes to use dynamic areas for its heap, which has caused problems in the past (prior to the introduction of PMPs the ARMX6 would have only had a few hundred MB of space available for program-created DAs, thanks to the 2GB free pool mapping). Admittedly I’m not sure how much of that Otter problem was down to lack of free space vs. lack of free contiguous space. But it doesn’t change the fact that if CLib was to start dynamically creating DAs then our chances of troublesome levels of fragmentation would increase.
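To make the distinction between free space and contiguous free space concrete, here is a hypothetical C model. It scans an address range containing a few small reserved regions and reports both figures. The numbers in the usage below are illustrative units, not measurements of the ARMX6 or any other machine.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model: an address range [base, base + size) with some
   regions already reserved (sorted by start address). */
typedef struct { uint64_t start, size; } region;
typedef struct { uint64_t total_free, largest_hole; } free_report;

/* Walk the gaps between reserved regions, summing them and tracking
   the biggest single gap. */
static free_report scan_free(uint64_t base, uint64_t size,
                             const region *used, size_t n) {
    free_report r = {0, 0};
    uint64_t cursor = base;
    for (size_t i = 0; i <= n; i++) {
        uint64_t next = (i < n) ? used[i].start : base + size;
        if (next > cursor) {
            uint64_t hole = next - cursor;
            r.total_free += hole;
            if (hole > r.largest_hole) r.largest_hole = hole;
        }
        if (i < n && used[i].start + used[i].size > cursor)
            cursor = used[i].start + used[i].size;  /* skip past this region */
    }
    return r;
}
```

Spread four 1-unit regions through a 512-unit range and 508 units remain free, but the largest hole is only 111. A single request for 200 would therefore fail.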

The stubs are only accessed when the task is paged in. To place the stubs outside of application space would be inappropriate.

Jun 7, 2018 10:00pm

nemo (145) 2556 posts

Yeah, except I didn’t say put the heap in a DA, I said put those four stubs in a DA.

Jun 7, 2018 10:22pm

Jeffrey Lee (213) 6048 posts

I know.

Jun 7, 2018 11:21pm

nemo (145) 2556 posts

So the vulnerability seems to be that the addresses in the language descriptor are (typically) of those stubs, so are indirected through the application slot. Whereas CLib could have seen that they are the standard addresses and avoided that indirection.

It’s not possible to stop the application following pointers it has corrupted, but it is possible to stop CLib from relying on pointers in the appslot.

This is analogous to the long-standing MessageTrans misdesign, that had it relying on the integrity of a linked list of blocks of memory belonging to its clients. The fix is similar – copy the important things away from where the client might trample them.

Language descriptors are very small. Fears of address space exhaustion are frankly incredible.
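The "copy the important things away" approach can be modelled minimally, assuming a two-entry handler table. `lang_descriptor`, `register_descriptor` and `dispatch_trap` are hypothetical names, not SharedCLibrary's actual structures. The library copies the client's descriptor at registration time, so later corruption of the client's copy does not change which handler is dispatched.

```c
#include <stddef.h>
#include <string.h>

typedef int (*handler_fn)(int);

/* Hypothetical client-side handler table, living in client-writable memory. */
typedef struct {
    handler_fn trap;
    handler_fn event;
} lang_descriptor;

/* Library-private copy, outside the client's reach. */
static lang_descriptor registered;

static void register_descriptor(const lang_descriptor *client) {
    memcpy(&registered, client, sizeof registered);  /* copy, don't keep the pointer */
}

static int dispatch_trap(int code) {
    return registered.trap ? registered.trap(code) : -1;
}

/* Example handlers for the usage below. */
static int default_trap(int code)  { return code + 1000; }
static int default_event(int code) { return code + 2000; }
```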

Jun 8, 2018 5:36am

Adrian Lees (1349) 122 posts

Re the hang, my guess would be that TaskWindow notices calls to OS_ChangeEnvironment and that causes the behaviour change? In all cases note that I have been invoking my test programs from the CLI within a TaskWindow, so text output has already occurred, just not from my application. I confirm that just introducing ‘swix(OSWriteI+10,0)’ does restore multitasking and Escape functionality, and I vaguely recall having observed this myself once a number of years ago.

For clarity, it’s the ‘_stub_XYHandlerInData’ registered by ‘_kernel_init’ (‘risc_oslib.kernel.s.k_body’) at which I have been looking, rather than the C language-specific support code. I can’t see a way to reorder the C$$data and Stub$$Data areas using the linker as a quick-and-dirty workaround (-first and -last are insufficiently flexible).

However…If my reasoning is correct, those stubs don’t need to exist anyway, since they assume the validity of R13, and can thus vacate enough registers to discover the location of the static data simply by calling OS_ChangeEnvironment to pick up a pointer to a buffer that has been registered with one of the other handlers (s.k_body/InstallHandlers wants pretty much everything!). Then use that to locate the static base?

Or the OS_ChangeEnvironment API could be modified to declare R2/R3 as stored and returned for the low-level exception handlers. I note that the kernel code (‘AdjustOurSet’) already does this, and is just unable to supply either value in a register to the handler code. (Perhaps this approach could be problematic if something tries to provide an alternative implementation; TaskWindow or Wimp perhaps?)

Jun 8, 2018 7:19am

Jon Abbott (1421) 2651 posts

If USER code is allowed to install privileged handlers, is a fix even possible? Would a better solution not be getting CTRL-BREAK to work when in an Abort loop, so a locked machine can be recovered, or putting a check in the OS exception handlers to prevent abort loops?

Edit: CTRL-BREAK should read ALT-BREAK
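Here is one possible shape for the suggested "check in the OS exception handlers", as a hypothetical C sketch. It counts aborts taken without an intervening clean return. Once a limit is reached, it stops calling the client's (possibly trashed) environment handler and routes to the default one instead. The limit of 8 and the notion of a "clean return" are assumptions, and how a real kernel would detect a clean return is not addressed here.

```c
/* Hypothetical threshold for consecutive aborts before giving up on
   the client's handler. */
#define ABORT_LOOP_LIMIT 8u

typedef enum { USE_CLIENT_HANDLER, USE_DEFAULT_HANDLER } abort_route;

typedef struct {
    unsigned consecutive;  /* aborts taken since the last clean return */
} abort_guard;

static abort_route on_abort(abort_guard *g) {
    if (g->consecutive < ABORT_LOOP_LIMIT) {
        g->consecutive++;
        return USE_CLIENT_HANDLER;
    }
    return USE_DEFAULT_HANDLER;  /* client handler is presumably looping */
}

/* Called when control gets back to a known-good state (e.g. user mode
   after a handled abort), showing the previous abort did not loop. */
static void on_clean_return(abort_guard *g) { g->consecutive = 0; }
```

With the limit of 8, the first eight aborts go to the client handler and the ninth goes to the default. A clean return resets the count.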

Jun 8, 2018 10:57am

Jeffrey Lee (213) 6048 posts

Re the hang, my guess would be that TaskWindow notices calls to OS_ChangeEnvironment and that causes the behaviour change?

There is definitely some logic in there which tries to keep track of the environment handlers, but I haven’t pulled it apart yet to see where it’s going wrong.

For clarity, it’s the ‘_stub_XYHandlerInData’ registered by ‘_kernel_init’ (‘risc_oslib.kernel.s.k_body’) at which I have been looking

Yep, those are the ones.

However…If my reasoning is correct, those stubs don’t need to exist anyway, since they assume the validity of R13, and can thus vacate enough registers to discover the location of the static data simply by calling OS_ChangeEnvironment to pick up a pointer to a buffer that has been registered with one of the other handlers (s.k_body/InstallHandlers wants pretty much everything!).

It’s dangerous for abort handlers to call SWIs, at least until they’ve determined that it’s safe to do so (e.g. the abort could have been due to SVC stack overflow).

(Sigh – CLib calls SWI FPEmulator_Abort from within a data abort handler. Good luck recovering from SVC stack overflows if a C app is active.)
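Here is a sketch of the kind of guard described, under two assumptions. First, the SVC stack is full-descending and lies between `limit` (lowest address) and `top`. Second, `needed` is a hypothetical figure for how much stack a SWI might use. The handler would only consider calling a SWI if the saved SVC stack pointer is inside the stack with enough headroom.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* The stack grows downwards from `top` towards `limit`; an empty stack
   has sp_svc == top. */
static bool svc_stack_has_headroom(uintptr_t sp_svc, uintptr_t limit,
                                   uintptr_t top, size_t needed) {
    if (sp_svc <= limit || sp_svc > top)
        return false;                   /* outside the stack: overflowed or corrupt */
    return (size_t)(sp_svc - limit) >= needed;
}
```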

However, if CLib is able to detect when its abort/environment handlers are being swapped out, then it could just use a small patch of global memory to store the details of the active client (since currently, there can only be one application client “active” at a time).
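A hypothetical C model of that suggestion keeps a single library-global record of the active client. The record is filled in by copying when the task is swapped in and cleared when it is swapped out, so the abort path never follows a pointer held in the application slot. The `on_swap_in`/`on_swap_out` hooks stand in for however CLib would actually detect handler swaps, which is the open question here.

```c
#include <stddef.h>

/* Hypothetical per-client details the abort path would need. */
typedef struct {
    int task_id;
    void *static_base;  /* e.g. the client's static data base */
} client_info;

static client_info active_client;  /* only one application client active at a time */
static int have_active_client = 0;

static void on_swap_in(const client_info *c) {
    active_client = *c;             /* copied, not referenced */
    have_active_client = 1;
}

static void on_swap_out(void) { have_active_client = 0; }

/* What an abort handler would consult; NULL means no client is paged in,
   so the OS default handling should apply. */
static const client_info *current_client(void) {
    return have_active_client ? &active_client : NULL;
}
```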

Would a better solution not be getting CTRL-BREAK to work when in an Abort loop,

Possibly the kernel could re-enable IRQs if the abort occurred with IRQs enabled – but I’m not sure we’d gain much, considering Ctrl-Break is just a quicker way for someone to reset the machine than go reaching for the power button/reset switch.
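The condition mentioned reduces to a test on the saved program status register. In the ARM PSR, bit 7 is the I (IRQ disable) bit, which is set when IRQs are masked. The sketch below assumes the abort handler has the aborting code's SPSR to hand. `may_reenable_irqs` is a hypothetical name.

```c
#include <stdbool.h>
#include <stdint.h>

#define PSR_I_BIT (1u << 7)  /* set = IRQs masked */

/* Only unmask IRQs in the abort path if the code that aborted was itself
   running with IRQs enabled; never enable them for code that had
   deliberately masked them. */
static bool may_reenable_irqs(uint32_t saved_spsr) {
    return (saved_spsr & PSR_I_BIT) == 0;
}
```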

Jun 8, 2018 11:38am

Adrian Lees (1349) 122 posts

It’s dangerous for abort handlers to call SWIs, at least until they’ve determined that it’s safe to do so (e.g. the abort could have been due to SVC stack overflow).

Fair point. I was under the false impression that those handlers were at a higher level with SVC stack already flattened. And to correct a couple of other errors on my part: (i) they are not called in SVC mode now (I consulted only PRM1), rather the appropriate privileged mode, and (ii) there are no stored R2/R3 values for the ‘hardware vector’ handlers at present; I skim-read that code too quickly.

The abort handler in kernel/clib actually stores to R13_svc even before executing its first SWI. I think I’d argue an overflowed SVC stack cannot and should not be for clib, or any environment handler, to worry about. If it happens, there’s no way that it can be resolved without calling a SWI (no go) or putting yet more low-level hackery into clib (et al) which has already strayed a long way from being user-level application support code. A job for the kernel abort handling code, instead?

Jon: My concerns are two-fold really; not just system stability, but also helping coders track down their programming errors during development. It’s hard enough writing code on RISC OS without machine hangs flummoxing and demoralising application writers. Abort handling and reporting really needs to be the most robust and least vulnerable code in the system, IMHO. (Yes, I’m looking at you, DDT!)

Jun 8, 2018 12:04pm

Jon Abbott (1421) 2651 posts

Possibly the kernel could re-enable IRQs if the abort occurred with IRQs enabled – but I’m not sure we’d gain much, considering Ctrl-Break is just a quicker way for someone to reset the machine than go reaching for the power button/reset switch.

What’s the key for breaking into an app? Could have sworn it was CTRL-BREAK…I don’t have a machine to hand to try.

My concerns are two-fold really; not just system stability, but also helping coders track down their programming errors during development. It’s hard enough writing code on RISC OS without machine hangs flummoxing and demoralising application writers.

Tell me about it, I’ve spent all week debugging code trying to figure out why the machine is locking solid. I must have rebooted my pi-top several hundred times trying random code changes as it’s impossible to diagnose a fault when the machine stiffs.

What’s exacerbating the issue further is the screen is blanking, so the debug info I’m writing to screen is also useless. All I’ve figured out in a week of debugging is that sound IRQs are involved – I’m no closer to figuring out how or why that’s blanking the screen and stiffing the machine!

Jun 8, 2018 12:26pm

Steffen Huber (91) 1953 posts

What’s the key for breaking into an app?

Alt+Break.

Jun 8, 2018 12:40pm

Rick Murray (539) 13850 posts

as it’s impossible to diagnose a fault when the machine stiffs.

I don’t know if it works as it is years old, but I made a hack of DADebug that would periodically spit info to the HAL debug serial port. That might help, if interrupts (and the tickers) are still working.

Jun 8, 2018 12:47pm

Jeffrey Lee (213) 6048 posts

Fair point. I was under the false impression that those handlers were at a higher level with SVC stack already flattened.

My memory of them was a bit fuzzy too (it doesn’t help that the PRM doesn’t really give a good explanation of them, and likewise our wiki).

The precedence is:

1. Processor vector (should just branch to the next stage)
2. OS_ClaimProcessorVector (FPEmulator, VFPSupport, etc. sit here)
3. Kernel pre-veneer (prefetch & data abort handlers for lazy task swapping & ARMv7 cache maintenance aborts)
4. Environment handler
5. Error handlers (Service_Error, error environment handler, etc.)

The first four occur in the corresponding abort mode. The default environment handler in the kernel is responsible for setting the OS back to a safe state (flattening SVC & IRQ stacks) and then raising the default abort error message. I haven’t looked yet to see why exactly CLib wants to get in on the action (does it allow aborts to be recovered from, or is it just so it can customise the error handling?), but it does seem to be a bit of a flaw in the OS that the environment handler has to take on so much responsibility.
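The precedence above can be modelled as a chain of responsibility: each stage claims the abort or passes it on, and the first claimant wins. This is a hypothetical C sketch, and the stage behaviour and the address range the pre-veneer claims are invented for illustration. It also shows why a trashed environment handler is so damaging: anything the earlier stages pass on ends up there.

```c
#include <stddef.h>

typedef enum { PASS_ON, CLAIMED } stage_result;

typedef struct {
    const char *name;
    stage_result (*handle)(unsigned fault_addr);
} abort_stage;

/* Illustrative stage behaviour only; the address range is made up. */
static stage_result processor_vector(unsigned a) { (void)a; return PASS_ON; }  /* just branches on */
static stage_result claimed_vectors(unsigned a)  { (void)a; return PASS_ON; }  /* no claimant interested */
static stage_result kernel_preveneer(unsigned a) {                             /* e.g. lazily swapped page */
    return (a >= 0x8000u && a < 0x10000u) ? CLAIMED : PASS_ON;
}
static stage_result environment(unsigned a)      { (void)a; return CLAIMED; }  /* clean up, raise error */

static const abort_stage chain[] = {
    { "processor vector",        processor_vector },
    { "OS_ClaimProcessorVector", claimed_vectors  },
    { "kernel pre-veneer",       kernel_preveneer },
    { "environment handler",     environment      },
};

/* Returns the index of the stage that claimed the abort, or -1. */
static int dispatch_abort(const abort_stage *stages, size_t n, unsigned fault_addr) {
    for (size_t i = 0; i < n; i++)
        if (stages[i].handle(fault_addr) == CLAIMED)
            return (int)i;
    return -1;
}
```

`dispatch_abort(chain, 4, 0x9000u)` returns 2 because the pre-veneer claims it. An address outside that made-up range falls through to index 3, the environment handler.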

What’s the key for breaking into an app?

Alt-break. Which I think relies on callbacks to function, so unless we can kick the abort handling down to user mode, isn’t likely to be very helpful.

Jun 8, 2018 1:49pm

Jon Abbott (1421) 2651 posts

2. Kernel pre-veneer (deals with lazy task swapping)
3. OS_ClaimProcessorVector (FPEmulator, VFPSupport, etc. sit here)

That’s interesting, I’ve had to add explicit code to handle Aborts raised by PMP’d pages at 3, it doesn’t look like they should be reaching my code if they’re handled at 2.

Alt-break. Which I think relies on callbacks

No wonder it’s next to useless, I implemented a handler for CTRL-SHIFT-F12 in the IRQ handler so apps could be terminated…doesn’t help when the Abort handler gets in a loop with IRQ disabled though!!

Jun 8, 2018 6:05pm

Rick Murray (539) 13850 posts

You might think that OS_SetColour sets the colour. It does, but it can also Get the colour. Surprise! When it is reading the colour (flags b7) the other flags magically change meaning.

Just looking at this, nothing better to do. :-)

I think grumpy cat might have missed a few things?

The official documentation (PRM), the one I have, says nothing about reading colours (and specifies bits 6-31 must be zero).
With this in mind, that it does potentially weird stuff when reading is the fault of the person using a Set SWI to read. That it can is not relevant, it is not documented as being able to. For that you need ReadVduVariables.

Oh, and the documentation I’ve seen (wiki here, StrongHelp) that does mention the ability to read notes: When reading the colours, text colours are returned as a colour number in R1, but you must supply a pattern block to read the graphics colours.

Yeah… VduVars is a lot simpler…

Jun 9, 2018 8:55am

nemo (145) 2556 posts

I think grumpy cat might have missed a few things?

Unlikely.

The official documentation

You keep using that word. I do not think it means what you think it means.

it is not documented as being able to … the documentation I’ve seen

No further questions m’lud.

Jun 9, 2018 9:24am

Rick Murray (539) 13850 posts

You keep using that word. I do not think it means what you think it means

So clearly you want me to think that the PRMs don’t count as documentation; as I’m pretty certain that Acorn called it the Programmer’s REFERENCE for a reason.

No further questions m’lud.

So… There’s no “documentation” to be relied upon and the “documentation” available is wrong? Maybe this is why there aren’t many developers?

Jun 9, 2018 9:28am

Rick Murray (539) 13850 posts

BTW, you still have not replied to the noted comments that the description of SetColour makes no mention of reading (so one shouldn’t be using this call to read) and the third-party descriptions say that if you do read the graphics colour you need to specify a memory block … so, no, the SWI only writes to random locations if you use it incorrectly and don’t pay attention to what others have written about it…

Jun 9, 2018 9:51am

Colin Ferris (399) 1818 posts

To Jon
Could this be added to the RO download?

CTRL-SHIFT-F12 in the IRQ handler – to quit errant Apps.

Jun 9, 2018 10:47am

nemo (145) 2556 posts

So clearly you want me to think that the PRMs don’t count as documentation

Of course they do… right up until there’s a change. Then it becomes out of date documentation.

the “documentation” available is wrong?

Have I not mentioned this on an almost weekly basis? :-)

you still have not replied to the noted comments that the description of SetColour makes no mention of reading

Oh sorry, missed that. It’s certainly an unfortunate SWI name, but that’s because it was introduced in RISC OS 3, and could only set the colour. The ability to read the colour was added in 3.50 – so it’s unfortunate, but when things change, the existing documentation becomes out of date.

As I demonstrated with OS_ReadLine, did I not?

one shouldn’t be using this call to read

Sorry, you’ve misunderstood. The documentation you are looking at is out of date. That’s what happens to stuff printed on paper.

Jun 9, 2018 12:02pm

Steve Pampling (1551) 8172 posts

The documentation you are looking at is out of date. That’s what happens to stuff ~~printed on paper~~.

Fixed that for you.

It’s been happening to us too, although I’ve noted at work that newer often fails to be better. Apparently working brains are an optional extra these days.

Note: If it’s written correctly it is fully in date forever for the stated use. Acorn didn’t define that, so its use in a modern framework may be at odds with the actuality.
