OS_ClaimProcessorVector again
Jeffrey Lee (213) 6048 posts |
Long ago I moaned about some design flaws in OS_ClaimProcessorVector, and I’m here to moan about them again. This time it’s from the perspective of trying to make OS_ClaimProcessorVector SMP-safe (or maybe introduce an alternate API). For SMP safety, there are two aspects of the current OS_ClaimProcessorVector that will cause problems:
Plus of course it would be nice to fix the “handlers must be removed in the correct order” problem. Although there are a few potential solutions to these problems, I’m thinking that the most sensible solution would involve making the vector/handler chain a bit more high-level. I.e. instead of each routine being entered with “virgin” register state and being expected to either return from the exception or directly pass on to the next handler, the kernel would now be in charge of stepping through the vector chain, with the routines receiving a pointer to a register dump as input, and returning a flag on exit to indicate whether the exception was handled or whether it should be passed to the next handler.
With the kernel in charge of stepping through the routines, it’ll be trivial to add spinlocks/mutexes to make adding & removing handlers fully multi-core/multi-thread safe.
So with that decided, we now need to work out what the entry/exit parameters are for each of the processor vectors. This is where things start to get messy, because there are many different options.
Fully virgin
With enough effort, the kernel could restore the original registers before calling each handler. The handler would then return to the kernel via a pre-known return address, like the current “previous routine” address that’s returned when registering a handler. To indicate the action the kernel should take (return from exception or pass on to next handler), the PSR flags could be used, or perhaps there could be two return addresses that the handler could exit via. This would be rather ugly to implement, and probably quite inefficient, since there’d be a lot of register stacking & unstacking going on for entry & exit from each routine.
Minimum (SWI/vector like)
On entry, R0-R9 will be virgin, and R10-R12 will have been modified by the kernel, e.g.:
R10 = pointer to context dump containing the original R10-R14 & SPSR
On exit, the routine can update R0-R9 and the values in the context, and the PSR flags or R11/R12 could be used to indicate whether to return from the exception or pass on to the next handler. This approach is better than the “fully virgin” approach, but with only one register spare on entry, pretty much every handler is going to need to stack/unstack at least a couple of registers, so it’s not particularly efficient.
Full context
Each handler will be given a pointer to a register context dump containing R0-R15 & PSR from the aborting mode. Since all the registers have been preserved in the context, this will allow the handlers to corrupt any registers they want. As with the “minimum” approach, each handler can have a private word/workspace pointer passed to it.
Different for each
Different vector types need access to different registers. E.g. undefined instruction handlers typically want a full register dump. Data abort handlers will also want a full register dump, and the data abort CP15 registers. Prefetch abort handlers (assuming they’re used for typical memory mapping scenarios) only care about the abort address, PSR, and prefetch abort CP15 registers. IRQ handlers shouldn’t need access to the original registers at all. RISC OS-style SWI handlers typically only need R0-R9 and the PSR, but sometimes more registers will be needed (e.g. OS_CallASWI, or maybe for handling unusual SWIs from other OS’s).
So it might make sense to give each handler type its own entry/exit parameters. E.g. undefined instruction, data abort & prefetch abort handlers would be given “full” register dumps (R0-R15 & PSR, from the aborting mode).
IRQ handlers don’t need anything, so won’t be passed a context pointer on input, but since the RISC OS 5 OS_ClaimDeviceVector is APCS-compliant, it’d probably simplify things if the new OS_ClaimProcessorVector IRQ handler interface was also APCS-compliant, allowing handlers to corrupt R0-R3 & R12. To ensure compatibility with all SWIs, SWI handlers would also need to be given a full register dump. As with the “minimum” approach, each handler can have a private word/workspace pointer passed to it.
Since many of the vector types require a full register dump, this approach is very similar to the “full context” one. The addition of the relevant CP15 registers, and more optimal handling of IRQ & prefetch handlers, should make it better than the “full context” approach, but I’m also paranoid that there could be some situations where a handler routine needs access to a register which has been saved by the kernel, but wasn’t included in the context dump. There’s also some variety in which CP15 registers are relevant for each handler type (e.g. some architecture versions have more registers than others, and the registers needed may vary depending on the sub-type of the exception). So it might be better to stick with the current situation where it’s the responsibility of each routine to manage the CP15 registers.
APCS
Only the APCS caller-save registers will be preserved, i.e. R0-R3 & R12 (and provided to the handler routine in a context dump). However this isn’t very practical, since if you’re writing a handler in a high-level language, and need access to one of the callee-save registers, the handler would have to be able to fetch the values from its own stack frame, which isn’t going to be easy.
Out of the above options, I’m thinking that providing each handler a full context would be best. So I guess what I’m proposing is the following:
Can anyone see any problems with the above, or have any other thoughts? |
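For readers who want something concrete to poke at, here is a minimal host-side C sketch of the "full context" idea from the post above: the kernel walks the chain, passes each routine a pointer to a register dump plus a workspace pointer, and stops when a routine reports the exception as handled. Everything named here (pv_context, pv_handler, pv_claim, pv_release, pv_dispatch) is hypothetical and not part of any RISC OS API, and the spinlock is simply held for the whole walk, which a real kernel would probably want to refine.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical context dump: R0-R15 and PSR from the aborting mode. */
typedef struct {
    uint32_t r[16];
    uint32_t psr;
} pv_context;

/* Hypothetical handler signature: return true if the exception was handled,
   false to pass it on to the next handler in the chain. */
typedef bool (*pv_handler)(pv_context *ctx, void *workspace);

typedef struct pv_node {
    pv_handler fn;
    void *workspace;
    struct pv_node *next;
} pv_node;

typedef struct {
    atomic_flag lock;   /* simple spinlock guarding the chain */
    pv_node *head;
} pv_chain;

static void pv_lock(pv_chain *c)
{
    while (atomic_flag_test_and_set_explicit(&c->lock, memory_order_acquire)) {
        /* spin until the lock is free */
    }
}

static void pv_unlock(pv_chain *c)
{
    atomic_flag_clear_explicit(&c->lock, memory_order_release);
}

/* Add a handler at the head of the chain (most recent claimant runs first). */
static void pv_claim(pv_chain *c, pv_node *n)
{
    pv_lock(c);
    n->next = c->head;
    c->head = n;
    pv_unlock(c);
}

/* Remove a handler from anywhere in the chain, in any order. */
static bool pv_release(pv_chain *c, pv_node *n)
{
    bool found = false;
    pv_lock(c);
    for (pv_node **p = &c->head; *p != NULL; p = &(*p)->next) {
        if (*p == n) {
            *p = n->next;
            found = true;
            break;
        }
    }
    pv_unlock(c);
    return found;
}

/* The "kernel" steps through the chain until a handler claims the exception. */
static bool pv_dispatch(pv_chain *c, pv_context *ctx)
{
    bool handled = false;
    pv_lock(c);
    for (pv_node *n = c->head; n != NULL && !handled; n = n->next)
        handled = n->fn(ctx, n->workspace);
    pv_unlock(c);
    return handled;
}

/* Example handlers: one counts calls and passes on, one "fixes" the PC. */
static bool pass_on(pv_context *ctx, void *ws) { (void)ctx; ++*(int *)ws; return false; }
static bool skip_insn(pv_context *ctx, void *ws) { (void)ws; ctx->r[15] += 4; return true; }

static int pv_demo(void)
{
    pv_chain chain = { ATOMIC_FLAG_INIT, NULL };
    int calls = 0;
    pv_node a = { skip_insn, NULL, NULL };
    pv_node b = { pass_on, &calls, NULL };
    pv_context ctx = { {0}, 0 };

    pv_claim(&chain, &a);
    pv_claim(&chain, &b);               /* b now runs before a */
    ctx.r[15] = 0x8000;
    assert(pv_dispatch(&chain, &ctx));  /* b passes on, a handles it */
    assert(ctx.r[15] == 0x8004);
    assert(pv_release(&chain, &a));     /* a was claimed first, released first */
    assert(!pv_dispatch(&chain, &ctx)); /* only b is left, nobody handles it */
    return calls;
}
```

Because the chain is a list owned by the kernel rather than a set of "previous routine" addresses stored by each claimant, removal order stops mattering in this sketch.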
Charles Ferguson (8243) 427 posts |
I’d be more tempted to merely provide an explicit interface for the SWI and Undefined instruction handlers only and be done with it.
So the only ones that actually need to be handled (assuming the existing device vector and abort trap have already been made nice and tidy for you), are the undefined instruction (which is required) and SWI (which is ‘nice to have’, or ‘as much of an abomination as OS_EnterOS’, depending on your perspective).
Given the interface currently allows for (255-7) vectors, there’s no need to use the magic +&280, I think. You can get away with just using new vector numbers. Vector &x11 => Undefined instruction claim/release, in your new format. I don’t really see that it’s necessary to jump up to the high ranged numbers when there is space in the lower range which will be faulted by existing systems. However, the positioning of those interfaces may not be so important really.
The interface you’ve defined allows for pre-trapping, but not post-trapping. For example, if a SWI handler claimant wished to perform the usual operation and then mangle the results that came back, that’s not possible with the interface you’ve defined, but is possible with the existing interface. Maybe returning an address in R0 would mean ‘pass to the next handler, and push this address on to a stack of functions to call as it returns’. I’m relatively certain that the SWI exception is the only one that would use this in general, but I can envisage that the other handlers might use it for logging of the exception vectors. On the other hand, maybe that’s a bad idea if the SWI call might not return through the normal channels (sigh).
There are some small simplifications that can be made for some entry points too – passing the SWI number directly to the handler, or the instruction that was being executed for undefined instructions, would make them a tiny-tiny bit easier, as that’s what they’re going to do anyhow. But meh… depends on what sort of thing you want to run on those vectors. |
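The post-trapping suggestion above can be sketched in C roughly as follows. This is only an illustration under assumptions: the types and names (swi_context, pre_fn, post_fn, swi_dispatch) are invented for the example, every handler passes on rather than claiming, and the sketch assumes the operation always returns through the dispatcher, which is exactly the caveat raised in the post.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical register dump handed to each handler. */
typedef struct {
    uint32_t r[16];
    uint32_t psr;
} swi_context;

/* A post-trap callback, run after the real operation has completed. */
typedef void (*post_fn)(swi_context *ctx);

/* A pre-trap handler: returns NULL to simply pass on, or a post-trap
   callback to be pushed on a stack and called on the way back out. */
typedef post_fn (*pre_fn)(swi_context *ctx);

enum { MAX_POST = 8 };

static void swi_dispatch(const pre_fn *chain, size_t count,
                         void (*operation)(swi_context *), swi_context *ctx)
{
    post_fn pending[MAX_POST];
    size_t depth = 0;

    /* Walk the chain, collecting any post-trap callbacks.
       Callbacks beyond MAX_POST are dropped in this sketch. */
    for (size_t i = 0; i < count; i++) {
        post_fn post = chain[i](ctx);
        if (post != NULL && depth < MAX_POST)
            pending[depth++] = post;
    }

    operation(ctx);                 /* the "usual operation" */

    /* Unwind in reverse order, like nested function returns. */
    while (depth > 0)
        pending[--depth](ctx);
}

static void add_one(swi_context *ctx)   { ctx->r[0] += 1; }
static void times_ten(swi_context *ctx) { ctx->r[0] *= 10; }
static void double_r0(swi_context *ctx) { ctx->r[0] *= 2; }

static post_fn outer_pre(swi_context *ctx) { (void)ctx; return add_one; }
static post_fn plain_pre(swi_context *ctx) { (void)ctx; return NULL; }
static post_fn inner_pre(swi_context *ctx) { (void)ctx; return times_ten; }

static uint32_t post_trap_demo(void)
{
    const pre_fn chain[] = { outer_pre, plain_pre, inner_pre };
    swi_context ctx = { {0}, 0 };

    ctx.r[0] = 3;
    swi_dispatch(chain, 3, double_r0, &ctx);
    assert(ctx.r[0] == 61);         /* ((3 * 2) * 10) + 1 */
    return ctx.r[0];
}
```

The last handler to see the call on the way in is the first to see the result on the way out, which matches what the existing "call the previous routine yourself" interface gives claimants today.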
Charles Ferguson (8243) 427 posts |
Oh, I didn’t say it but I approve of the idea of a more controlled interface using the full context and with chaining behaviour. Keeps the Kernel in control but allows clients to blow their legs off if they want. Might want to consider how to enumerate the chain, too – for diagnostics purposes. |
David J. Ruck (33) 1629 posts |
I agree that any solution should be APCS compliant, as we want to be using HLL for everything now, and the fewer assembler shims the better. Also any new way of doing things should have one eye on 64 bit, so it’s easily transferable to the new architecture and we don’t have to learn two new ways of doing things in quick succession. |
Jeffrey Lee (213) 6048 posts |
That’s certainly tempting. RISC OS 5 currently lacks OS_AbortTrap, but there are some features on the roadmap which would probably require data/prefetch abort handling to become a lot more sophisticated than the current system. So if we don’t support claiming of the data/prefetch vectors in the new OS_ClaimProcessorVector, then that should help ensure that the new higher-level kernel APIs actually get developed instead of constantly being kicked further into the backlog.
Processor vectors & exception handling in AArch64 are completely different to AArch32. Both for native AArch64 exceptions, and AArch32 exceptions that get handled by AArch64. So the only way of creating handlers which “just work” when running in AArch64 mode would be to make them sufficiently high-level. OS_AbortTrap is a good example of this – all the messy stuff of decoding the aborting instruction and copying values in/out of registers is handled by the kernel, so apart from any changes to the handler function signature, an AArch32 abort trap handler will work just fine in an AArch64 OS (or MIPS, or RISC-V, or x86, etc.).
For undefined instructions, you’d also need to find a way of abstracting over the register access. With the current OS_ClaimProcessorVector there’s no abstraction (the code needs to read/write the registers itself). The new OS_ClaimProcessorVector will provide mid-level abstraction (R0-R15 & PSR provided in a context dump), but some instructions may require more than that (e.g. banked registers from other modes, or maybe some coprocessor registers). Potentially we could make the context dump used by AArch32 RISC OS bigger so that the same context dump can be used in AArch64 RISC OS, but that might end up being unwanted bloat for the AArch32 OS (and for coprocessor registers, how can we know which ones a handler will require?). So a more practical solution would probably be to have different context dump formats for handlers running in AArch32 mode vs. AArch64 mode, and leave it down to the handler author to deal with the differences themselves (e.g. using functions/macros to abstract over accessing the state, so e.g. AArch32 builds can access banked registers directly, while AArch64 builds access them via the context struct).
SWIs also have some differences to overcome, the biggest one being the wider registers in AArch64. I get the feeling that every AArch64 SWI handler (i.e. normal SWI handlers in modules) is going to have to be aware of whether the call came from AArch32 or AArch64, so it knows whether certain registers need to be zero-extended or sign-extended, whether it’s safe to return 64 bit pointers, how to pack/unpack FP registers, etc. I’m also hoping that we’ll take the opportunity to fix some flaws with error handling – e.g. make it so that the SWI caller must provide a pointer to memory to be used for the error block, so that we’re no longer trying to force everything through MessageTrans’s shared buffers (for calls from AArch32 mode, the kernel can easily take care of this difference). |
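As a rough illustration of the "aware of whether the call came from AArch32 or AArch64" point, here is a small C sketch of accessor functions a SWI handler might use. The register block layout and every name in it (swi_regs64, caller_state, swi_arg_unsigned, swi_arg_signed, swi_return_pointer) are assumptions made up for this example; no such interface has been defined for RISC OS.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical tag recording which execution state made the SWI call. */
typedef enum { CALLER_AARCH32, CALLER_AARCH64 } caller_state;

/* Hypothetical register block passed to an AArch64 SWI handler. */
typedef struct {
    caller_state caller;
    uint64_t x[31];
} swi_regs64;

/* Read an unsigned argument: zero-extend if the caller was 32-bit,
   so stale upper bits cannot leak into the value. */
static uint64_t swi_arg_unsigned(const swi_regs64 *regs, unsigned n)
{
    if (regs->caller == CALLER_AARCH32)
        return (uint64_t)(uint32_t)regs->x[n];
    return regs->x[n];
}

/* Read a signed argument: sign-extend from bit 31 for 32-bit callers.
   The narrowing casts rely on the usual two's complement behaviour. */
static int64_t swi_arg_signed(const swi_regs64 *regs, unsigned n)
{
    if (regs->caller == CALLER_AARCH32)
        return (int64_t)(int32_t)(uint32_t)regs->x[n];
    return (int64_t)regs->x[n];
}

/* Return a pointer to the caller, refusing addresses a 32-bit caller
   would be unable to represent. */
static bool swi_return_pointer(swi_regs64 *regs, unsigned n, uint64_t address)
{
    if (regs->caller == CALLER_AARCH32 && address > UINT32_MAX)
        return false;
    regs->x[n] = address;
    return true;
}

static int abi_demo(void)
{
    swi_regs64 regs = { CALLER_AARCH32, {0} };

    regs.x[0] = 0xDEADBEEFFFFFFFFFull;  /* junk in the top half */
    assert(swi_arg_unsigned(&regs, 0) == 0xFFFFFFFFull);
    assert(swi_arg_signed(&regs, 0) == -1);
    assert(!swi_return_pointer(&regs, 1, 0x100000000ull));

    regs.caller = CALLER_AARCH64;
    assert(swi_arg_unsigned(&regs, 0) == 0xDEADBEEFFFFFFFFFull);
    assert(swi_return_pointer(&regs, 1, 0x100000000ull));
    return (int)(regs.x[1] >> 32);
}
```

Keeping these decisions behind accessors is one way a handler could be built for both worlds from the same source, in the spirit of the functions/macros idea mentioned for undefined instruction handlers.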
Colin Ferris (399) 1809 posts |
Has there been a way found to go between 64bit and 32bit and back? As a bit of info – is there a way to go from Login back to where one was reading? |
Rick Murray (539) 13805 posts |
You can only switch from a 64 bit world. A 32 bit world cannot switch up to 64 bit (that’s why accidentally trashing bit 4 of the PSR results in an exception).
Given discussion elsewhere about a processor with Cortex-M cores, it might be worth thinking about whether it is feasible to include Thumb in this. What happens if a Thumb core hits an exception?
Stay where you were reading and open login in a new tab/window. Once you have logged in, go to the URL bar of where you were reading and press Enter to refetch the page. Do not click the refresh button as that will use the old, stale, cookie and you’ll end up logged out. :-/ |
Jeffrey Lee (213) 6048 posts |
Probably the same as AArch64: Either develop a high-level API that abstracts over the low-level architectural differences, or accept the fact that you’ll need different code for each. |
Alan Adams (2486) 1147 posts |
Or alternatively click the “login to comment” link, login, click back twice, then refresh. If you already have other pages open, refresh will allow you to comment on them too. |
Rick Murray (539) 13805 posts |
Which browser? Firefox, when used like that, will load the pages from cache (and hence be using stale cookies). |
Jon Abbott (1421) 2641 posts |
I assume your concern is around the time gap between pointing at the new routine and it storing the address of the next routine from the returned parameters? I’ve moaned in the past that there’s no way to read the current handler without claiming it.
What is the concern with the released handler running its course? Dumb question: Are the handlers core specific or do they cover all cores? I’ve not really looked into how ARMv5+ works at low-level when more than one core is active.
FAR/FSR need to be included in the register dump? I’m all for abstracting the hardware vectors, but I would still like a (new if need be) method to take over the vectors ahead of the OS for ADFFS. It needs to sit directly on the SWI, Data Abort, IRQ and Undefined Instruction to work at any speed and avoid crashes. If/when RISCOS can handle multi-core, I’ll modify the code accordingly and spin out emulation threads/VM’s to the other cores. |
Jeffrey Lee (213) 6048 posts |
Correct. The time window might be small, but I’d rather be safe than sorry.
The program that installed the handler could free/overwrite the memory while another core/thread is still executing it. For multi-core code it’s impossible for a routine to maintain an accurate “is the routine running?” flag itself; it has to be external code which sets the flag, calls the routine, then clears the flag.
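A minimal sketch of that "external code sets the flag, calls the routine, then clears the flag" pattern, using C11 atomics. The structure and function names (guarded_handler, guarded_call, guarded_release) are hypothetical, a counter is used instead of a single flag so that several cores can be inside the routine at once, and the demo only exercises it from one thread.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical wrapper the kernel keeps around each registered routine. */
typedef struct {
    void (*fn)(void *workspace);
    void *workspace;
    atomic_int in_flight;    /* number of cores currently inside fn */
    atomic_bool installed;   /* cleared when the owner releases the handler */
} guarded_handler;

/* Called by the dispatcher, never by the routine itself. The counter is
   raised before the installed check so a concurrent release cannot miss
   a call that is about to start. */
static bool guarded_call(guarded_handler *h)
{
    bool ran = false;

    atomic_fetch_add(&h->in_flight, 1);
    if (atomic_load(&h->installed)) {
        h->fn(h->workspace);
        ran = true;
    }
    atomic_fetch_sub(&h->in_flight, 1);
    return ran;
}

/* Called when deregistering: once this returns, no core is executing the
   routine, so its code and workspace can safely be freed or overwritten. */
static void guarded_release(guarded_handler *h)
{
    atomic_store(&h->installed, false);
    while (atomic_load(&h->in_flight) != 0) {
        /* spin until every in-progress call has finished */
    }
}

static void bump(void *workspace) { ++*(int *)workspace; }

static int guard_demo(void)
{
    int runs = 0;
    guarded_handler h = { bump, &runs, 0, true };

    assert(guarded_call(&h));    /* runs while installed */
    guarded_release(&h);         /* returns at once: nothing is in flight */
    assert(!guarded_call(&h));   /* no longer called after release */
    return runs;
}
```

The routine cannot do this bookkeeping itself because there is always a window between being entered and setting its own flag, and between clearing the flag and actually returning.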
The aim of the OS_ClaimProcessorVector changes will be to make all the cores use the same handlers. On a technical level, we can give different cores different handlers if we want, and you can use CP15 registers to change the address of the processor vectors on a per-core basis.
Yeah, now that I’ve thought about it some more it’s probably for the best to include them. If we’re going to end up with lots of abort handlers then there’ll be too many opportunities for recursive aborts to occur for each handler to be able to reliably save/restore the registers itself – so having the kernel capture them at the start of each abort and store them in the register dump will be the safest option. |
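To show why capturing the fault registers up front helps, here is a small simulation in C. The two globals stand in for the CP15 fault address/status registers (on real hardware they would be read with coprocessor instructions, which this host-side sketch does not attempt), and the context layout and function names are assumptions for illustration only.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical abort context: the usual register dump plus a snapshot of
   the fault address and fault status taken by the kernel on entry. */
typedef struct {
    uint32_t r[16];
    uint32_t psr;
    uint32_t fault_address;
    uint32_t fault_status;
} abort_context;

/* Stand-ins for the hardware fault registers; every abort overwrites them. */
static uint32_t hw_fault_address;
static uint32_t hw_fault_status;

static void raise_abort(uint32_t address, uint32_t status)
{
    hw_fault_address = address;
    hw_fault_status = status;
}

/* What the kernel would do first thing on abort entry. */
static void kernel_capture(abort_context *ctx)
{
    ctx->fault_address = hw_fault_address;
    ctx->fault_status = hw_fault_status;
}

/* A handler that itself triggers a nested abort before passing on. */
static bool clumsy_handler(abort_context *ctx)
{
    abort_context nested = { {0}, 0, 0, 0 };

    (void)ctx;
    raise_abort(0x2000u, 0x7u);   /* recursive abort clobbers the hardware state */
    kernel_capture(&nested);
    return false;
}

/* A later handler that relies on the original fault address. */
static bool mapping_handler(abort_context *ctx)
{
    return ctx->fault_address == 0x1000u;
}

static uint32_t fault_demo(void)
{
    abort_context ctx = { {0}, 0, 0, 0 };
    bool handled;

    raise_abort(0x1000u, 0x5u);
    kernel_capture(&ctx);

    handled = clumsy_handler(&ctx) || mapping_handler(&ctx);
    assert(handled);
    assert(hw_fault_address == 0x2000u);  /* live register no longer matches */
    return ctx.fault_address;             /* snapshot still holds the original */
}
```

If the second handler had read the live register instead of the snapshot, it would have seen the nested abort's address and declined to handle the original fault.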
Steve Pampling (1551) 8154 posts |
Not here. Another tab – yes. |
Chris Mahoney (1684) 2165 posts |
It depends on the browser. In Safari on my Mac, I can open the login page in a new tab then reload the old tab. In Edge on Windows, reloads trigger logouts. |
Alan Adams (2486) 1147 posts |
I do this every day, with Firefox. On Windows, in case it’s different on other platforms. It works here. |
Jon Abbott (1421) 2641 posts |
I think you’re fighting a losing battle there. Without recoding every existing handler, OS_ClaimProcessorVector can never be atomic. Personally I wouldn’t try to fix it, just deprecate its use and replace it with a more core friendly version that’s atomic. I’d also give the option of the handler being core specific or all cores.
Existing code written for cooperative multi-tasking won’t know how to handle aborts from other cores. |
Jeffrey Lee (213) 6048 posts |
That’s essentially what I’m proposing. The old API will still work as-is, but there’ll be an extra flag or new vector numbers to select the new API when making the call to register/deregister a handler.
Sorry, I wasn’t quite clear there. Any handlers registered using the new API will be used across all the cores. Handlers registered using the old API will (probably) only be used with the primary core, which will (hopefully) be where any code which relies on those handlers is restricted to running. |
Steve Pampling (1551) 8154 posts |
Bear in mind that Rick is using an elderly (vintage even) version of Firefox on Android |
Rick Murray (539) 13805 posts |
60.0.2, as in “before they broke it”. Put it like this, the forum sets cookies because I have whitelisted it. The Wiki here runs scripting because I have whitelisted it. Other sites can set cookies (too much stuff breaks without) but said cookies are automatically erased after 90 seconds even if the tab is open unless the site is whitelisted. Aggressive content blocking (including any media over 256K without being whitelisted). It’s actually quite pleasant using the web on my phone. When I use Chrome (FlightRadar24 doesn’t load properly on mobile Firefox) or the times when I used to use Safari on the iPad, it is like culture shock. What the hell is with all this advertising!? So many pop-ups. So much bull, no you didn’t detect 17 viruses on my phone, and no I’m not going to pay you €10 to remove them (and clicking any button will cause a text from Orange saying that the Internet+ payment failed because I haven’t set that up….thieving bastards!). The new improved Firefox is a massive step backwards. It’s no use to me. However, the cookie/Refresh issue. Maybe it’s an Android difference? Who knows… |
Charles Ferguson (8243) 427 posts |
David Ruck said:
In my reading of this, I had pretty much assumed the implementation, other than a small amount of setup for the actual vector entry and SWI veneer would all be written in C from the outset. Your comment has made me question my expectations and, looking at the ROOL Kernel source, there doesn’t seem to be any C code. (facepalm) I think you know what I’m going to say, so I’ll just save myself a couple of hours of writing it. |
Stefan Fröhling (7826) 167 posts |
I cannot help here really with technical comments. But this comment from Jeffrey gave me an idea that the multi-threading module could give us a sidedoor entry to a new better kernel. About task control, task abortion and maybe debugging? In the future new applications could only use the multi-threading module/API that will provide better control over the applications?! |
Jeffrey Lee (213) 6048 posts |
That’s kind of what’s happening, yeah. For some things I’m improving the existing kernel (e.g. new OS_ClaimProcessorVector, implementing OS_AbortTrap). For other things I’m developing what’s essentially a new kernel (e.g. the thread/process management in the SMP module). Progress update on the OS_ClaimProcessorVector stuff:
|
Stefan Fröhling (7826) 167 posts |
Well that sounds good so far! What about enhanced memory protection / virus protection? For example I imagine that a “secure” RISC OS could only allow new applications to be run by the multi-threading module and therefore be separated from the existing base system so that they cannot corrupt the kernel or claim any security relevant vectors? |
Jon Abbott (1421) 2641 posts |
I feel there’s some major scope creep here Jeffrey. The original proposal was to fix the issues around claiming and releasing the vectors. They weren’t a major issue to start with until multi-core is viable for all tasks and now you’re mentioning instruction emulation.
This isn’t really related to the OP, but if RISCOS is to be updated for mainstream market use, backward compatibility needs to be dropped from the OS and a whole new programming model put in place that starts from the premise of task isolation, multithreading and multicore. I’m not sure many are ready to make that leap of faith just yet as it would mean all existing apps would need to sit under an OS emulation layer or be dropped. |
Jeffrey Lee (213) 6048 posts |
At the moment my plan is to develop & release things in the following order:
Since it’s inconvenient for me to maintain lots of forks of things, I’ll probably aim to get the SMP-safe FPEmulator & VFPSupport merged back into the main sources as soon as possible. This means the SMP-safe OS_ClaimProcessorVector will probably get merged in at the same time (there’s not much point merging it in sooner, since as Jon says, the flaws with the current API only seriously affect SMP code). I’m using this merge request to provide a list of all the SMP related forks and the current state of the project, so keep an eye on that for all the alpha/beta quality changes. In terms of timelines, I’ll hopefully have OS_AbortTrap & OS_ClaimProcessorVector released sometime this month. Then probably July for FPEmulator & VFPSupport, and maybe August for when there’s initial C11 thread & atomics support in the Shared C Library. However other things have been eating into my spare time a lot recently, so don’t be surprised if the dates slip a month or two.
Not from the perspective of my todo list – there’s multicore (which wants a SMP-safe OS_ClaimProcessorVector), hardware watchpoints (which wants load/store instruction decoding & emulation), hardware breakpoints / improving the Debugger’s woefully inadequate single-stepper (which wants full instruction set decoding & emulation), long descriptor page table support (which wants load/store instruction decoding & emulation), 64-bit future-proofing (which wants high-level APIs instead of low-level ones, full instruction set decoding & emulation for devices which lack AArch32 support, and most code to be written in high-level languages), and some GraphicsV/video improvements (which want either the ability to track page reads/writes, or load/store instruction decoding & emulation). Implementing OS_AbortTrap (and abortable DAs) does add a lot of extra work to the original task of “fix OS_ClaimProcessorVector”, but when looking at the wider picture it makes sense, especially since it could allow some third-party software to migrate away from OS_ClaimProcessorVector on to a higher-level API which is easier to work with and more future-proof. |