Forums → Bugs →

Machine hang in USR mode C

69 posts, 9 voices

Pages: 1 2 3

Jun 9, 2018 12:45pm Jon Abbott (1421) 2651 posts	2. Kernel pre-veneer (deals with lazy task swapping) 3. OS_ClaimProcessorVector (FPEmulator, VFPSupport, etc. sit here) That’s interesting, I’ve had to add explicit code to handle Aborts raised by PMP’d pages at 3, it doesn’t look like they should be reaching my code if they’re handled at 2. I’ve just tried turning off the code that touches every page to prevent PMP aborts and then start seeing aborts I shouldn’t see. Are you sure PMP aborts are handled before OS_ClaimProcessorVector? Where’s the source so I can see what it’s doing when an abort is raised and how it retries the instruction? Could this be added to the RO download? CTRL-SHIFT-F12 in the IRQ handler – to quit errant Apps. Not easily, its for shutting down tasks running under the ADFFS JIT. Its something that could easily be implemented as ALT-BREAK within the OS however, its not exactly rocket science as the OS already knows about every entry point into appspace (voices, vectors, environment handlers etc) and just needs to kill them before collapsing the stacks and doing an OS_Exit.

Jun 9, 2018 12:58pm Rick Murray (539) 13850 posts	Then it becomes out of date documentation. Okay, it’s a fair point. So I looked in the 5a PRM, and it says: For graphics colours you must supply a pattern block, whereas text colours are returned in R1 as colour numbers. The point being that it shouldn’t be writing to unknown memory locations. Something documented somewhere does make mention… Have I not mentioned this on an almost weekly basis? :-) Yup. That plus random insanity. Thank you for breaking your brain on this stuff. Saves us from wandering too far into the abyss.

Jun 9, 2018 1:03pm nemo (145) 2552 posts	the OS already knows about every entry point into appspace (voices, vectors, environment handlers etc) Indeed. It is unfortunate that all that has not been pulled together into a controlling API (a kind of all-flavours DelinkApplication) coordinated through a Service Call (or something more modern). Because although you say ‘every entry point’ that’s certain to not be the case. Have a controlling protocol though, and any other systems the Kernel is unaware of could be tied in too.

Jun 9, 2018 4:13pm Jeffrey Lee (213) 6048 posts	2. Kernel pre-veneer (deals with lazy task swapping) 3. OS_ClaimProcessorVector (FPEmulator, VFPSupport, etc. sit here) That’s interesting, I’ve had to add explicit code to handle Aborts raised by PMP’d pages at 3, it doesn’t look like they should be reaching my code if they’re handled at 2. I’ve just tried turning off the code that touches every page to prevent PMP aborts and then start seeing aborts I shouldn’t see. Are you sure PMP aborts are handled before OS_ClaimProcessorVector? Looks like I got it wrong; the kernel pre-veneers occur after OS_ClaimProcessorVector. I’ll correct the post.

Jun 10, 2018 7:58pm Jeffrey Lee (213) 6048 posts	Anyway, whilst experimenting, I’ve discovered that a simple busy loop, executed within a TaskWindow, does not multitask?! Patching a SWI OS_NewLine into the compiled program just before the wait loop fixes the problem, which suggests that the start of the new application has left TaskWindow in a funny state. Actually, it looks like it’s the kernel that’s been left in the funny state. It looks like the last SWI that’s executed by the C runtime startup is returning an error, causing the callback postpone flag to be set. And since all that main() does is sit in a loop, no other SWIs are executed to trigger that postpone flag to be cleared, blocking TaskWindow’s task switch callback (and all other callbacks) from occurring.

Jun 10, 2018 11:10pm Jeffrey Lee (213) 6048 posts	I’ve now submitted a change to CLib to try and prevent callbacks being blocked – but it’s not a great solution for a number of reasons (e.g. errors returned by _swix will still block them, since that doesn’t use CopyError). But it is at least something that can be softloaded on older machines. For a proper fix, I’m thinking of going straight to the source. Either by adding a safety timer to the postponement flag (so callbacks can only be postponed for so many centiseconds), or changing the kernel to make a copy of the error block in some kernel workspace. However that second option would also require the contents of that error block to be preserved by anything which performs task swapping, which hits the “we don’t have a proper DelinkApplication call” issue that nemo mentions.

Jun 11, 2018 6:44am Jon Abbott (1421) 2651 posts	I’ve now submitted a change to CLib to try and prevent callbacks being blocked If I understand correctly, an SWI X that returns with V set, blocks all further callbacks until another SWI is successful? Are they blocked system wide, or just for the current task? If the reasoning behind that is to prevent an error block being overwritten, why not allocate a new error block from the RMA, track the RMA allocations and remove them on the next SWI success, without blocking callbacks? …or is there more to it? Is this behaviour documented somewhere, there’s no mention of it in the CallBack Handler documentation.

Jun 11, 2018 9:15am nemo (145) 2552 posts	an error, causing the callback postpone flag to be set Either by adding a safety timer to the postponement flag s/stop thing from exploding/stop thing from exploding for a short time/ or changing the kernel to make a copy of the error block in some kernel workspace That’s the one. The perceived problem is that the error block that will be returned to the caller after the callbacks could be corrupted by the callbacks. I’m not terribly impressed by the could there – copying all error blocks because of theoretical error buffer reuse is a fix in the wrong place, but it is a fix that will work. However that second option would also require the contents of that error block to be preserved by anything which performs task swapping There are a number of scenarios: R0 is in ROM – do nothing. R0 is in appslot – do nothing. SWI is XOS_GenerateError – do nothing. No pending callbacks – do nothing. Otherwise, copy the error into a cyclic array of error buffers, and update R0. As for ‘task swapping’, that was already the case – error buffers that can be reused during callbacks can definitely be reused by another task.

Jun 11, 2018 10:01am Rick Murray (539) 13850 posts	It looks like the last SWI that’s executed by the C runtime startup is returning an error, Just out of interest… What SWI is failing?

Jun 11, 2018 1:12pm Jeffrey Lee (213) 6048 posts	If I understand correctly, an SWI X that returns with V set, blocks all further callbacks until another SWI is successful? Are they blocked system wide, or just for the current task? It’s a bit more complicated than that. The ground rules: Callbacks can be triggered in the following scenarios: A SWI returns to USR mode with interrupts enabled An interrupt returns to USR mode with interrupts enabled Just before the error environment handler is invoked Transient callbacks will occur whenever those criteria are met, non-transient callbacks have the extra requirement that the SVC stack must be empty If an X SWI returns an error to user mode with interrupts enabled, and the error pointer isn’t in ROM, and callbacks aren’t already postponed, and there isn’t a pending callback, the postpone flag will be set If the postpone flag is set: The next SWI which returns to user mode with interrupts enabled (and without returning an error) will clear the flag, and any postponed callbacks will be triggered Callback generation by interrupts will be postponed without affecting the flag The next invocation of the error handler will clear the flag, and any postponed callbacks will be triggered So to clear the flag you either need to call a SWI (from USR mode with interrupts enabled) which doesn’t return an error, or call a non-X SWI which will cause an error to be generated. However, as outlined in the CallbackChange document the behaviour in earlier OS versions has been different. The flag isn’t explicitly cleared when swapping task, but since swapping tasks usually involves calling a number of SWIs (most/all of which should succeed) I think it’s safe to say that the flag should be cleared as a side-effect. But maybe it’s worth investigating just to make sure there aren’t any edge cases. If the reasoning behind that is to prevent an error block being overwritten, why not allocate a new error block from the RMA, track the RMA allocations and remove them on the next SWI success, without blocking callbacks? …or is there more to it? I don’t think there’s any need to use an RMA block (compared to the kernel just having a static buffer somewhere), although doing so might help reduce the amount of data that needs copying during task switches. Is this behaviour documented somewhere, there’s no mention of it in the CallBack Handler documentation. Judging by the comments in the CallbackChange document, no it wasn’t/isn’t documented anywhere. The perceived problem is that the error block that will be returned to the caller after the callbacks could be corrupted by the callbacks. I’m not terribly impressed by the could there MessageTrans is responsible for handing out most of the error blocks that programs use. It has one pool of error buffers which are used by interrupt handlers, and another pool which is used by foreground tasks. But as far as it’s concerned a callback is a foreground task, so any callbacks that occur will run the risk of overwriting errors which the foreground program hasn’t had a chance to examine yet. Things could be tweaked so that the system is more reliable (e.g. it should be possible for the kernel to expose a flag for whether you’re in a transient callback or not), potentially that would be enough to produce a reliable solution to the problem. Except, all bets are off for anything that dynamically generates errors but doesn’t use MessageTrans to manage the blocks. Just out of interest… What SWI is failing? OS_Args 2, because it was getting called with a file handle of 0.

Jun 11, 2018 4:24pm Jon Abbott (1421) 2651 posts	Could you not allocate a page of memory that’s way above appspace, but is swapped along with it, to hold that tasks error block?

Jun 11, 2018 5:24pm nemo (145) 2552 posts	I don’t know why various people are talking about task swapping, it’s completely irrelevant.

Jun 12, 2018 7:31am Jon Abbott (1421) 2651 posts	I don’t know why various people are talking about task swapping, it’s completely irrelevant. Because you want the contents of memory pointers returned by the OS to be persistent on a per task basis, without reserving large chunks of kernel space for cyclic buffers? This goes for Error blocks, MODE specifiers and countless other transient items. Is there a way to programmatically determine if CallBacks are blocked?

Jun 12, 2018 10:12am nemo (145) 2552 posts	Because you want the contents of memory pointers returned by the OS to be persistent on a per task basis In what way do you think anything has changed? The perceived vulnerability is that a TaskWindow client could be pre-empted before it has finished with an error block – normal Wimp tasks wouldn’t dream of holding onto an error pointer over a call to Wimp_Poll. So in the case of a TaskWindow client, is it your belief that error pointers are currently longer-lasting than they would be if copied to a cyclic buffer? Because that’s not true. It is already the case that having received an error pointer returned from an XSWI (with callbacks avoided) the TaskWindow client may well call some other SWI (in fact it’s almost inevitable) before it has finished with the error pointer. Assuming that this SWI does not return an error, callbacks will be triggered, and the TaskWindow client may be pre-empted. This has always been the case. MessageTrans uses cyclic buffers – 32 if memory serves (for foreground callers) – but any non-ROM module may only have a single error buffer which could be overwritten by the actions of other tasks before TaskWindow gets its idle poll back again. So the suggested change only makes error pointers more reliable, not less, and if the SWI Handler had cyclic error block copying, then MessageTrans could probably do away with its similar scheme. (MessageTrans uses a cyclic array of buffers, but a truly cyclic buffer would be much more memory efficient). Don’t get me wrong, I do believe that task management should be much nearer to the Kernel than it is at the moment (and I’ve argued that it should be removed from the Wimp and bundled into TaskManager, but that’s a detail). There are things that would benefit from spotting context switches (via some mechanism more efficient than a Service Call) – a system-wide DelinkApplication would be one of the benefits. And error buffering during the callbacks would be neater if integrated into that task swap… but it’s not on a per-task basis, but a per-pre-emption basis. It ought to be TaskWindow’s responsibility, not the Wimp’s, for example. But this is finessing, it is not necessary. A cyclic buffer is already a huge improvement. Ultimately the extent of the cyclic buffer is dictated by how many TaskWindow clients have been simultaneously (serially) pre-empted during their error handling… so I’m going to stick my neck out and say MessageTrans’ choice of 32 does not look recklessly small.

Jun 12, 2018 12:38pm Jeffrey Lee (213) 6048 posts	So the suggested change only makes error pointers more reliable, not less Which suggested change? MessageTrans uses cyclic buffers – 32 if memory serves (for foreground callers) Under RISC OS 5, MessageTrans uses 4 buffers for interrupt handlers and 16 for “foreground” code.

Jun 12, 2018 3:43pm nemo (145) 2552 posts	Which suggested change? Copying of error buffer (if there is one) around callbacks (if there are any). RISC OS 5, MessageTrans uses 4 buffers for interrupt handlers and 16 for “foreground” code Ah so. RO4 is (I’ve just checked) 8 and 32. Either because “it goes up to 11” or post hoc ergo propter hoc – I know not which.

Jun 18, 2018 12:11pm Rick Murray (539) 13850 posts	There seems to be a certain degree of inherent shonkiness in CLib. I have a module (written in C) which is reading its data wrongly. Something that should have zero indexed items is claiming to have 208. I’m not sure why yet, in the middle of debugging it. However… It was setting a backpointer to a previous index entry. Which didn’t exist. So it was returning the pointer as a file offset to 538976288. The C code was attempting to fseek() to this location, and then fwrite() if the seek succeeded. This would invariably trigger a crash with an undefined instruction around &30 or so (I’m using a low vector image). If you were lucky (rare) you’d get “FileCore in use”. Otherwise, the usual result would be the foreground processing would freeze, the keyboard would cease responding, the mouse might freeze, but tickers would keep on going (as evidenced by my OLED’s clock updating). I don’t know if it’s RISC OS or CLib at fault. I do know that passing a crap file offset (no matter how out of range) to fseek() should never muck up the system that badly. [my current anti-stiffage fix is to simply bail out if the previous message pointer (514MB) is greater than the file size (64 bytes)] Edit: Oops. I was looking up array offsets using the non-consequential array references rather than the actual array offsets. So, oops, trying to read index value 60 from an array with maybe 15 elements ain’t gonna work. ;-)

Jun 18, 2018 6:27pm nemo (145) 2552 posts	I had the pleasure of mentoring a graduate in C, who had been taught nothing but Java. “Array bounds checking? What’s that?” he infamously asked.

Jun 18, 2018 7:33pm Steffen Huber (91) 1953 posts	I had the pleasure of mentoring a graduate in C, who had been taught nothing but Java. “Array bounds checking? What’s that?” he infamously asked. Hopefully you answered “It’s the mechanism that throws those IndexOutOfBoundsExceptions”.

Pages: 1 2 3

Reply

To post replies, please first log in.

Forums → Bugs →

Machine hang in USR mode C

Reply

Search forums

Social

ROOL Store

Donate! Why?

RISC OS IPR

Description

Voices

Options