OS_ClaimProcessorVector again
Jon Abbott (1421) 2651 posts |
I feel there’s some major scope creep here, Jeffrey. I meant in terms of the proposed change to OS_ClaimProcessorVector, not your todo list. Once you start mentioning other potentially linked things, it blurs what is actually going to be changed as part of this change. Given this is a fundamental change that will affect both existing usage and all future SMP usage, we should probably keep it as minimal and precise as possible.
Could you expand on this please. The Abort vectors are normally entered in Abort mode, at which point, if the abort occurred in FIQ or User mode, you would either temporarily switch mode to get the FIQ registers, or use STM^ to get the User registers. Could you confirm which mode the proposed abort handler will be entered in, and whether the parameter block passed contains the actual aborting register set?

Will FSR also be included? And if so, in native format or de-“implementation-defined”? I’m thinking of the extended abort type field on some CPUs and the split IFSR/DFSR. Will FAR also be included?

What happens if an Abort handler subsequently generates an Abort? Would that be handled correctly? I feel we should make reentrancy handling a requirement of all future Abort handler code, as, for example, an Abort handler that attempts instruction recovery or rollback might itself trigger an Abort. In this scenario, the handler either needs to be aware that it triggered the Abort and recover, or pass the subsequent Abort on to the next handler. I appreciate recursive Aborts get messy real fast, so we possibly want to cap the level of recursive aborts at a sensible level before the OS takes over and reports the Abort, without passing it to the handlers. |
Jeffrey Lee (213) 6048 posts |
LDR   temp, [context, #15*4]   ; SPSR from register dump
MSR   SPSR_cxsf, temp
LDMIA context, {r0-r13, r15}^  ; R0-R14 from register dump

Since the new OS_ClaimProcessorVector handlers are invoked after the old handlers, the above claims are only valid if the old handlers haven’t changed things. E.g. an old data abort handler that triggers a recursive data abort without preserving DFAR/DFSR/etc. will cause the new handler to see the wrong values when it’s entered.

Because handlers are given copies of the relevant coprocessor registers, recursive exceptions are fine (apart from on ARM6/7, see below), providing the handler takes the necessary basic precautions. E.g. a data abort handler running in ABT32 must be aware that any recursive data/prefetch abort will corrupt R14.

For ARM6/7, DFAR & DFSR are read-only. I haven’t decided yet what to do about that (e.g. just ignore the problem and allow the data abort environment handler to receive corrupt values, or maybe check to see if they’ve been modified and then raise a “recursive abort” error).

All the CP15 registers will be in native format.

I’ll probably add support for allowing handlers to return errors, in a similar way to OS_AbortTrap (exit with V set and R0 containing an error pointer; if it’s a hardware error (bit 31 of the error number set) it’ll be raised immediately via OS_GenerateError, otherwise the error will be ignored and the next handler will be tried).

Another thing you can’t do at the moment when claiming the exception is specify new R14/SPSR values for the exception handler mode; i.e. if some ABT-mode code raises a data abort while executing “LDR R14,[somewhere]” then there’s no way for the handler to specify the new R14_abt value that should be used on return from the exception. I’m not sure yet whether this is significant enough to warrant extending the API & rewriting things.
Exception recovery (e.g. C++ try/catch blocks) is a complicated issue. The modern way of doing it is to ensure all your code has unwinding tables generated so that the exception handler in the kernel/runtime can examine the tables and work out how to unwind the call stack, restore the relevant registers, and execute the relevant ‘catch’ block. For RISC OS this is obviously a problem because large chunks of the OS sources are assembler, the C compiler doesn’t generate unwinding tables, and there are oodles of interfaces through which control could pass from OS code to external code which might not have unwinding tables of its own. Exception recovery is going to be a major issue for SMP (you don’t want important mutexes to be stuck in a locked state forever if some code crashes while the mutex is held), so it’ll have to be tackled at some point, but probably not right now.
At the moment I’m not planning to add any code to limit the number of recursive aborts or protect against stack overflows. |
Jon Abbott (1421) 2651 posts |
The entry modes you’ve described above are the hardware-level entry states. Surely you’re not proposing that the SWI replacement also enter in these states? I would expect any replacement to abstract the CPU mode and always enter in System.
Two options: drop all pre-ARMv5 support from the OS, or read them before any abort handler is entered and pass them as parameters. Personally, I think we need to give serious consideration to dropping older CPU support, as those CPUs are holding back OS development. They don’t have half the features required for SMP, for starters.
The problem with this approach is that handlers will have to be aware of the CPU model they’re running on, which I think is going to be problematic. Should they not be abstracted, so that the handlers are agnostic to the CPU model regardless of the CPU?
If it avoids system locks, I’d consider extending it. If this is going to be the model going forward, it should make every attempt to avoid system locks and crashes caused by “user” code. In any modern OS, Abort handlers supplied by users should essentially be untrusted by the OS.
Indeed, SMP tasks will need to recover from aborts or be forcefully terminated by the OS if they’ve hung.
Even if you don’t code it up now, I’d include it in design so it’s covered. Any replacement Abort handler needs to be watertight and avoid locks/stack overflows at all costs. |
Jeffrey Lee (213) 6048 posts |
Why not? Is there software that needs to rewrite the stack of the handler mode? (e.g. something that needs to intercept SWIs and arbitrarily rewrite the SVC stack)
That would definitely rule out using the new interface on ARMv3.

“For ARM6/7, DFAR & DFSR are read-only. I haven’t decided yet what to do about that”

The new abort handlers do receive them as parameters. The problem is that if the new handlers don’t deal with the abort, the environment handler will be invoked, and the abort environment handlers don’t accept any parameters – they expect all of the registers to be in their initial states, as if control came straight from the processor vector.
Yeah, there are a number of limitations with older CPUs which are going to make life difficult as we try to develop new features. But for OS_ClaimProcessorVector, I don’t think there’s anything serious enough to warrant completely dropping support for older CPUs. Worst case, we can just make it so that the new API is only available on new CPUs, and old CPUs stick to the old API.

“All the CP15 registers will be in native format”

It’d certainly be nice to abstract them, but in the interest of keeping the changes “minimal and precise” I decided to take the easy route and not abstract over things ;-)

“At the moment I’m not planning to add any code to limit the number of recursive aborts or protect against stack overflows.”

Depending on how watertight you want things, that could either be very easy, or very hard! (halting problem, anyone?) I believe the “industry standard” way of dealing with stack overflows is to just let them happen and then take recovery action afterwards (e.g. extend the stack if possible, or raise an exception which a handler further up the stack can catch, or kill whatever process/thread caused the overflow, or for extreme cases go to BSOD). RISC OS is more unusual in that a lot of (application) software checks the stack limit itself and will take action before the limit is reached (throw an error, or extend the stack). Since we don’t have very good ways of recovering from stack overflows, maybe a reasonable thing to do would be to provide each handler with an SL value on entry, so that the handlers are able to perform limit checking and avoid triggering the overflow to begin with.
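A rough sketch of that SL-based limit check, assuming a full-descending stack (the function name, the enum, and the headroom margin are all hypothetical, not any real RISC OS API):

```c
#include <stdint.h>
#include <stddef.h>

/* Hypothetical sketch of the SL-on-entry idea: the handler checks its
 * headroom against the stack limit before doing anything stack-hungry,
 * and passes the abort on instead of overflowing. */
enum { HANDLE_HERE, PASS_ON };

static int check_headroom(uintptr_t sp, uintptr_t sl, size_t need)
{
    /* Full-descending stack: sp must stay above sl, with 'need'
     * bytes spare for this handler's own frame. */
    if (sp > sl && (size_t)(sp - sl) >= need)
        return HANDLE_HERE;
    return PASS_ON;
}
```

A handler could run this check on entry and return “not handled” when it would otherwise risk triggering the very overflow it is supposed to deal with.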
Other options I can think of would be to prevent abort handlers from being allowed to directly re-enter themselves, or to define some kind of priority order for the handlers so that code which aborts can only have that abort processed by a handler which has a higher priority (which would also have some impact on how people write the handlers, since they’d have to be a lot more aware of which parts of the system the handler is interacting with). |
Kuemmel (439) 384 posts |
@Jeffrey: Might be a bit off-topic, but in the context of multithreaded low-level coding: in x86 assembler I used e.g. I found somebody suggesting to transfer this to aarch32 by Is that correct, or would you do it differently?
@EDIT: Further reading made me stumble over a possible need for a data barrier instruction like ‘DMB’ as well? |
Jeffrey Lee (213) 6048 posts |
Correct :-)
Yes, that’s correct.
Yes, in most cases you’ll also want barrier instructions somewhere. By themselves, LDREX/STREX only guarantee that the byte/word/whatever targeted by the instruction has been updated in an atomic manner. They provide no guarantees on the ordering of other memory accesses in the system. So for something like a mutex or spinlock which protects a region of memory, you’d want to use barrier instruction(s) to make sure that out-of-order execution or speculative reads won’t cause the protected region to be accessed while the lock isn’t held. Getting everything right can be tricky (see the “barrier litmus tests” chapter of the ARMv7 ARM or ARMv8 ARM), so it’s probably best to use an existing library (e.g. SyncLib) instead of trying to write your own. Or at the least, look at how existing libraries work and then copy them ;-) |
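As an illustration of the pattern Jeffrey describes, here is a minimal C11 sketch of a spinlock using acquire/release atomics; on AArch32, compilers lower these to LDREX/STREX loops plus barriers. All names are illustrative, and a library like SyncLib would be the practical choice on RISC OS:

```c
#include <stdatomic.h>

/* Minimal spinlock sketch.  The acquire ordering stops accesses to the
 * protected region being pulled before the lock is taken; the release
 * ordering stops them leaking out after it is dropped. */
static atomic_flag lock_flag = ATOMIC_FLAG_INIT;
static long shared_counter = 0;

static void spin_lock(void)
{
    /* spin until the flag was previously clear */
    while (atomic_flag_test_and_set_explicit(&lock_flag,
                                             memory_order_acquire))
        ;
}

static void spin_unlock(void)
{
    atomic_flag_clear_explicit(&lock_flag, memory_order_release);
}

static void bump(int times)
{
    for (int i = 0; i < times; i++) {
        spin_lock();
        shared_counter++;   /* protected region */
        spin_unlock();
    }
}
```

Each thread contending for the counter would call `bump`; the orderings are what stand in for the explicit DMB instructions in hand-written assembler.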
Clive Semmens (2335) 3276 posts |
Ooh – that sounds nice! I wonder who wrote that? It warn’t there in the ARMv7 ARM in 2007 when I was responsible for it, and it wouldn’t half have been handy… Edit: I do have a copy of the later edition, and am now reading the litmus chapter… can’t tell who wrote it; I’m pretty sure it’s no-one I know, neither one of the engineers I knew nor a tech author I knew… |
Stefan Fröhling (7826) 167 posts |
This sounds very good! Wish you good luck with the implementation! |
Stefan Fröhling (7826) 167 posts |
We don’t need multi-threading for the past, we need it for the future of RISC OS. I don’t see the point in spending time, and maybe even restricting the new multi-threading features, to cover 500-ish RISC OS computers whose owners may not even want to use the new features. |
Jon Abbott (1421) 2651 posts |
“Surely you’re not proposing that the SWI replacement also enter in these states?”

Sorry, that was poorly worded on my part. I was referring to the replacement OS_ClaimProcessorVector not abstracting the CPU mode for the Abort vectors specifically. I think we need to handle the vectors differently, as the following vectors should not be abstracted as they are going to be used for low-level things like hypervisors. Ordinary user-level code should not be able to claim them:
The Abort vectors meanwhile should be abstracted and entered in System as they’re going to be claimed by just about every C app and consequently should be untrusted:
They should have application-specific abort stacks, so they can’t break other apps or the OS, and should only be entered if the Abort is within their address space. The OS should handle Aborts within the OS, shutting down offending apps that trigger them. As for the interrupt vectors:
IRQ should be abstracted; arguably the code should be entered in either System or User to avoid code elevation. FIQ – I’d argue that only system-level device drivers should have access to claim FIQ. Ideally they should be isolated tasks.
If we’re implementing a completely new programming model, we probably want to only do it the once and future-proof it.
Are you sure? I would expect modern standards to dictate avoiding overflows, and almost certainly to check for race conditions and forcibly terminate tasks that get into that state.
Personally I think we should freeze the current codebase, bump to RO7 and drop all legacy processors and hardware. Everything prior to the Pi is just going to hold back future development and will add unnecessary work to migrating the OS to C. |
Rick Murray (539) 13840 posts |
While I don’t disagree (there are other problems, such as the Iyonix and earlier having no hardware FP), I would ask “why?”. Would it not be possible for the HAL to say “no, can’t” and just have SMP tasks simply not work on those machines? Want to do that? Upgrade.
Never ever use “industry standard” as a justification for anything. Wouldn’t it be better to proactively monitor the stack (especially during development) to try to ensure that (as best possible) everything works as expected? Surely that’s better than letting it happen and trying to pick up the pieces afterwards? Just off the top of my head, I’m wondering if the abort handler should fail noisily if reentered. If you abort during an abort, it seems to me that something worse is happening than a simple stack overflow.
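A minimal sketch of that “fail noisily if reentered” guard. Everything here is hypothetical: a real RISC OS abort handler is not a C function like this, and on SMP the flag would need to be per-core:

```c
/* Guard flag set while the handler is active; a second entry while it
 * is set means an abort happened during abort handling. */
enum { HANDLED, FATAL_REENTRY };

static volatile int in_abort = 0;
static int nested_result = -1;

static int abort_handler(void (*do_recovery)(void))
{
    if (in_abort)
        return FATAL_REENTRY;   /* abort during an abort: give up loudly */
    in_abort = 1;
    do_recovery();              /* recovery work that might itself abort */
    in_abort = 0;
    return HANDLED;
}

/* A recovery action that completes cleanly... */
static void noop_recovery(void) { }

/* ...and one that simulates a recursive abort by re-entering the handler. */
static void recursing_recovery(void)
{
    nested_result = abort_handler(noop_recovery);
}
```

The point of the sketch is only the control flow: the nested entry is refused loudly rather than silently re-running the handler on a corrupted state.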
I believe this one gets temporarily claimed when testing for a processor feature (like the existence of SWP).
C claims those? If so, why those and not undefined instruction?
Hmm, I wonder how many TickerV and CallAfter/CallEvery handlers that would affect.
Mostly agreed. I don’t think the current codebase should be frozen and the version bumped to 7.x. Legacy support should be a service, a gracious one at that. Not a roadblock or neck noose. When the former becomes the latter, it’s time to cauterise. 1

1 We may mock the BSOD, but at least the user sees something. We could do with some of that… only without the dumb unhappy smiley. |
Jon Abbott (1421) 2651 posts |
ARMv4 cache maintenance is slow and messy, it’s uni-core, there’s no virtualisation support, vectors are low, and memory cannot be flagged as non-executable, to name but a few issues.
I disagree; OS development has matured enough for RISC OS to follow industry standards. In this instance I’m pretty sure the “industry standard” would be to prevent overflows. Overflows are bad for security, for starters.
Nothing should fail; reentrant aborts should either be handled or passed on. For example, in the case of ADFFS: if the Abort is from userspace, it handles it; if it’s from within its own Abort handler, it handles it if it was caused by misalignment, otherwise it passes it on. Provided the OS doesn’t then get into a race condition, ADFFS will produce a full-screen Abort report and forcibly kill the triggering process.
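That triage order can be sketched roughly as follows (illustrative only; the function, its inputs and the names are hypothetical, not the actual ADFFS code):

```c
/* Decide whether this handler deals with an abort or passes it on,
 * following the handle/pass-on order described above. */
enum { HANDLE, PASS_ON_NEXT };

static int triage(int from_userspace, int from_own_handler,
                  int caused_by_misalignment)
{
    if (from_userspace)
        return HANDLE;          /* userspace abort: handle it */
    if (from_own_handler && caused_by_misalignment)
        return HANDLE;          /* own handler's misalignment: fix up */
    return PASS_ON_NEXT;        /* anything else: pass to next handler */
}
```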
User level code should not be claiming hardware vectors, there are legal OS calls to establish system features.
I can’t speak for modern C, but certainly the majority of C code I’ve looked at in games claim them via the Environment handlers.
You’re talking about downstream events, not the IRQ hardware vector. They can remain as-is for the moment. That said, in a modern OS user code should not be entered in a privileged CPU state, so it would need to be addressed at some point.
My reasoning behind freezing 5.x and bumping to 7.x is so there’s a distinct cut-off point for ARMv4 support. Acorn did the same thing with 3.1x, so there is precedent. If we move to 7.x, all the legacy CPU and hardware support can be removed from the OS kernel without concern for the Iyonix etc., as they all just stick with 5.x. Application changes can still filter back to 5.x, but the kernel stops being developed. There’s already precedent for this in how !System is updated on earlier OSes. |
Jeffrey Lee (213) 6048 posts |
You’re aware that System is a privileged CPU mode, right? Those comments make me think that you think it’s unprivileged. Apart from using the user-mode register file (and having no SPSR), it behaves exactly like any other privileged CPU mode. The only options for allowing untrusted code to run safely are to run the code in user mode, or in some kind of VM (e.g. BPF is used quite a bit in Unix-likes to allow untrusted code to hook into system-critical packet/event streams).

“I believe the ‘industry standard’ way of dealing with stack overflows is to just let them happen and then take recovery action afterwards”

Focusing on x86_64 Linux (you can do your own homework on other OSes & architectures if you want), Google tells me that:
“C claims those?”

CLib sets up undefined instruction, prefetch abort, data abort, and address exception environment handlers. It doesn’t touch OS_ClaimProcessorVector at all. OS_ChangeEnvironment is certainly due a reckoning, but I’d say that’s a discussion for another thread. The processor vector environment handlers are invoked after the OS_ClaimProcessorVector handlers, so the design/implementation of OS_ChangeEnvironment should have very little impact on the design/implementation of OS_ClaimProcessorVector. |
Rick Murray (539) 13840 posts |
Actually, I meant “Why?” as in “why abandon a machine because it can’t do SMP, when it could be made to simply not even try?”. But, thanks for clarifying what I thought the reasons were.
As is the OMAP3 and the Pi 1/Zero. Ideally, an SMP system ought to work from 1 core upwards 1, so things don’t need to make assumptions about how many cores are actually available.
<glances at The Register in the other tab> Yup.
In a perfect world, nothing would. In reality however…. I notice you mention ADFFS will show a fault report and kill the errant task. That’s more than the OS itself manages.
These days, yes. But if you have older code that tries probing for SWP in an ARM250, that’s kind of how it worked – well, it uses ChangeEnvironment to hook into the undefined instruction vector rather than attempting to hijack it at hardware level.
Oh, god, I agree SO much. It would be my hope (and joy) to have a flag added to the module header flags word that indicates to the OS that it can safely demote the processor to USR mode before calling the SWI handler. Okay, it might need a little more work in actually implementing (register preservation, and how to handle returning to the OS on the way out), but really, using SVC mode should be reserved for stuff that talks to hardware or DAs marked as inaccessible to user mode, and the like. It shouldn’t be “everything that’s a SWI”, because a huge amount of that could run quite happily outside of SVC mode. We’ll have to leave the event/vector/service handlers as they are, as they have a pass-on mechanism, so switching mode is non-viable.
Where would you draw the line? Anything <= Iyonix?
Um…

1 Albeit slightly suboptimally if it’s running multi-core stuff on a single core. |
David J. Ruck (33) 1635 posts |
I would go as far as to say ARMv7 should be the earliest supported architecture, unless anyone can come up with a really good justification for ARMv6. |
Chris Evans (457) 1614 posts |
Just in case it’s useful for your deliberations, from our contact with many RISC OS users, most of whom do not contribute to the forum and I expect don’t read it either: N.B. I’m not aware of anyone using IOMD RISC OS 5 as their main RISC OS machine |
Steve Fryatt (216) 2105 posts |
RPCEmu (including RISC OS Direct on that)? Maybe not “main machine”, but it’s still essential here. |
David J. Ruck (33) 1635 posts |
I’m sure RPCemu could be updated to emulate an ARMv7 Pi, as qemu can. |
Clive Semmens (2335) 3276 posts |
But how fast would it be? The whole reason to want the huge RAM is for speed – not having to go to storage to access items randomly from huge lists. |
George T. Greenfield (154) 748 posts |
? The clue’s in the name, surely? |
Clive Semmens (2335) 3276 posts |
You might call it “repurposed” or “modified” rather than “updated,” but the meaning is clear enough anyway. From my point of view it’s whether an emulator can compete with the real thing for speed. And anyway, real 8GB Pis exist and are not ludicrously expensive. |
George T. Greenfield (154) 748 posts |
I agree. Actually, the existence of RPCEmu should make this step easier: on any half-decent Wintel laptop or desktop it won’t feel much different from an Iyonix or Pi1/2 speed-wise, and a good deal quicker than any RiscPC. So anyone wanting or needing to use software which only runs on pre-ARMv7 OS versions can utilise RPCEmu in conjunction with RO4, 5 or 6 without a significant penalty. For myself on the other hand, I greatly look forward to the day when RISC OS can take fuller advantage of the huge performance advances embodied by recent ARM chippery. |
Matthew Harris (1462) 36 posts |
Would this not preclude the use of the earlier Raspberry Pi and current Raspberry Pi Zero models which, from memory, are ARMv6? |
Clive Semmens (2335) 3276 posts |
Your memory is correct, Matthew. Not that later Pis are expensive; replacing an early Pi with a later one is not a big deal. There might be other reasons not to want to change a Zero, of course. |
Rick Murray (539) 13840 posts |
Yes, but only the Pi1 and 0 (Pi2v1 is ARMv7). With the slower processor and more restricted memory, they’re fine for RISC OS but might struggle a bit with software such as the new browsers. Running Otter on a 256MB Pi1 was an exercise in pain. |