OS_ClaimProcessorVector again

147 posts, 30 voices

May 2, 2021 4:51pm Jeffrey Lee (213) 6048 posts	Long ago I moaned about some design flaws in OS_ClaimProcessorVector, and I’m here to moan about them again. This time it’s from the perspective of trying to make OS_ClaimProcessorVector SMP-safe (or maybe introduce an alternate API).
For SMP safety, there are two problems with the current OS_ClaimProcessorVector that will cause problems:
- Claiming vectors – as mentioned in the previous thread, this currently isn’t even atomic for the single-core case, let alone the multi-core case.
- Releasing vectors – when a program removes a handler from a vector, there’s no (easy) way for it to check that all the cores have finished executing the handler code.
Plus of course it would be nice to fix the “handlers must be removed in the correct order” problem.
Although there are a few potential solutions to these problems, I’m thinking that the most sensible solution would involve making the vector/handler chain a bit more high-level. I.e. instead of each routine being entered with “virgin” register state and being expected to either return from the exception or directly pass on to the next handler, the kernel would now be in charge of stepping through the vector chain, with the routines receiving a pointer to a register dump as input, and returning a flag on exit to indicate whether the exception was handled or whether it should be passed to the next handler. With the kernel in charge of stepping through the routines, it’ll be trivial to add spinlocks/mutexes to make adding & removing handlers fully multi-core/multi-thread safe.
So with that decided, we now need to work out what the entry/exit parameters are for each of the processor vectors. This is where things start to get messy, because there are many different options.
Fully virgin
With enough effort, the kernel could restore the original registers before calling each handler. The handler would then return to the kernel via a pre-known return address, like the current “previous routine” address that’s returned when registering a handler. To indicate the action the kernel should take (return from exception or pass on to next handler), the PSR flags could be used, or perhaps there could be two return addresses that the handler could exit via. This would be rather ugly to implement, and probably quite inefficient, since there’d be a lot of register stacking & unstacking going on for entry & exit from each routine.
Minimum (SWI/vector like)
On entry, R0-R9 will be virgin, and R10-R12 will have been modified by the kernel, e.g.:
- R10 = pointer to context dump containing the original R10-R14 & SPSR
- R11 = unused (temp register for use by handler)
- R12 = per-handler private word/workspace pointer (no more futzing around copying handlers to the RMA so you can get your workspace pointer from a PC-relative location!)
On exit, the routine can update R0-R9 and the values in the context, and the PSR flags or R11/R12 could be used to indicate whether to return from the exception or pass on to the next handler. This approach is better than the “fully virgin” approach, but with only one register spare on entry, pretty much every handler is going to need to stack/unstack at least a couple of registers, so it’s not particularly efficient.
Full context
Each handler will be given a pointer to a register context dump containing R0-R15 & PSR from the aborting mode. Since all the registers have been preserved in the context, this will allow the handlers to corrupt any registers they want. As with the “minimum” approach, each handler can have a private word/workspace pointer passed to it.
Different for each
Different vector types need access to different registers. E.g. undefined instruction handlers typically want a full register dump. Data abort handlers will also want a full register dump, and the data abort CP15 registers. Prefetch abort handlers (assuming they’re used for typical memory mapping scenarios) only care about the abort address, PSR, and prefetch abort CP15 registers. IRQ handlers shouldn’t need access to the original registers at all. RISC OS-style SWI handlers typically only need R0-R9 and the PSR, but sometimes more registers will be needed (e.g. OS_CallASWI, or maybe for handling unusual SWIs from other OS’s).
So it might make sense to give each handler type its own entry/exit parameters. E.g. undefined instruction, data abort & prefetch abort handlers would be given “full” register dumps (R0-R15 & PSR, from the aborting mode). IRQ handlers don’t need anything, so won’t be passed a context pointer on input, but since the RISC OS 5 OS_ClaimDeviceVector is APCS-compliant, it’d probably simplify things if the new OS_ClaimProcessorVector IRQ handler interface was also APCS-compliant, allowing handlers to corrupt R0-R3 & R12. To ensure compatibility with all SWIs, SWI handlers would also need to be given a full register dump. As with the “minimum” approach, each handler can have a private word/workspace pointer passed to it.
Since many of the vector types require a full register dump, this approach is very similar to the “full context” one. The addition of the relevant CP15 registers, and more optimal handling of IRQ & prefetch handlers, should make it better than the “full context” approach, but I’m also paranoid that there could be some situations where a handler routine needs access to a register which has been saved by the kernel, but wasn’t included in the context dump. There’s also some variety in which CP15 registers are relevant for each handler type (e.g. some architecture versions have more registers than others, and the registers needed may vary depending on the sub-type of the exception). So it might be better to stick with the current situation where it’s the responsibility of each routine to manage the CP15 registers.
APCS
Only the APCS caller-save registers will be preserved, i.e. R0-R3 & R12 (and provided to the handler routine in a context dump). However this isn’t very practical, since if you’re writing a handler in a high-level language, and need access to one of the callee-save registers, the handler would have to be able to fetch the values from its own stack frame, which isn’t going to be easy.
Out of the above options, I’m thinking that providing each handler a full context would be best. So I guess what I’m proposing is the following:
- Vectors can be claimed by calling OS_ClaimProcessorVector with R0 = vector number + &380, R1 = handler routine, R2 = handler R12
- Vectors can be released by calling OS_ClaimProcessorVector with R0 = vector number + &280, R1 = handler routine, R2 = handler R12
- The OS_ClaimProcessorVector flags in R0 now break down as follows:
  - Bit 7 enables the “extended API”; any call that uses bits 9+ needs to have this set (to work around the problem that old kernels don’t generate errors if bits 9+ are set)
  - Bit 8 marks the claim operation
  - Bit 9 marks that the new handler routine interface is in use
  - Bits 10+ are then free for any other API extensions we want to introduce in the future
- Handlers will be entered with R0 pointing to a register dump containing R0-R15 & PSR, and R12 with the “handler R12” value
- Handlers can return with R0 = 0 to return from the exception (restoring the registers from the register dump), or 1 to pass on to the next handler
- All other registers (R1-R12, R14, PSR flags) can be corrupted
- The old-style handlers will continue to be supported, but will perhaps only be called for code running on the primary core
- The new-style handlers will get called after the old-style handlers
Can anyone see any problems with the above, or have any other thoughts?
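As a rough illustration of the R0 encoding proposed above, here is a minimal C sketch. The macro and function names are hypothetical (they are not part of any RISC OS header), and it assumes the vector number fits below bit 7 now that bit 7 is used as a flag.

```c
#include <assert.h>
#include <stdint.h>

/* Flag bits in R0, following the breakdown in the proposal.
   All names here are made up for illustration. */
#define CPV_EXTENDED_API  (1u << 7)  /* bit 7: enables the "extended API"      */
#define CPV_CLAIM         (1u << 8)  /* bit 8: set = claim, clear = release    */
#define CPV_NEW_INTERFACE (1u << 9)  /* bit 9: new handler routine interface   */

/* Build the R0 value for claiming a vector with a new-style handler. */
static uint32_t cpv_claim_r0(uint32_t vector)
{
    assert(vector < 0x80);  /* assumption: vector number sits below bit 7 */
    return vector | CPV_EXTENDED_API | CPV_CLAIM | CPV_NEW_INTERFACE;
}

/* Build the R0 value for releasing a new-style handler. */
static uint32_t cpv_release_r0(uint32_t vector)
{
    assert(vector < 0x80);
    return vector | CPV_EXTENDED_API | CPV_NEW_INTERFACE;
}
```

With these definitions, the three flag bits sum to &380 for a claim and &280 for a release, matching the "vector number + &380" and "vector number + &280" forms in the proposal.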

May 2, 2021 7:00pm Charles Ferguson (8243) 438 posts	I’d be more tempted to merely provide an explicit interface for the SWI and Undefined instruction handlers only and be done with it. The Undefined instruction handler is required for FPE + (I presume) other instruction set use like VFP. The SWI vector is manipulated by a few things , so some interface to access that is probably useful (albeit with such huge caveats). The address exception is not used any more. The IRQ vector is handled by the OS and ClaimDeviceVector already provides interfaces for clients of that. The prefetch abort and data abort exceptions are passed to OS_AbortTrap, so they already have a mechanism for being handled. (actually now I think about it, Prefetch abort isn’t handled by OS_AbortTrap, so maybe that needs handling) And branch through 0 is irrelevant. So the only ones that actually need to be handled (assuming the existing device vector and abort trap have already been made nice and tidy for you), are the undefined instruction (which is required) and SWI (which is ‘nice to have’ , or ‘as much of an abomination as OS_EnterOS’, depending on your perspective). Given the interface currently allows for (255-7) vectors, there’s no need to use the magic +&280, I think. You can get away with just using new vector numbers. Vector &x11 => Undefined instruction claim/release, in your new format. Vector &x12 => SWI claim/release, in your new format. Vector &x13 => Prefetch abort, in your new format. (although new reason codes could be added to the OS_AbortTrap interface to provide better control over these, in line with the existing interface) I don’t really see that it’s necessary to jump up to the high ranged numbers when there is space in the lower range which will be faulted by existing systems. However, the positioning of those interfaces may not be so important really. The interface you’ve defined allows for pre-trapping, but not post-trapping. 
For example, if a SWI handler claimant wished to perform the usual operation and then mangle the results that came back, that’s not possible with the interface you’ve defined, but is possible with the existing interface. Maybe returning an address in R0 would mean ‘pass to the next handler, and push this address on to a stack of functions to call as it returns’. I’m relatively certain that the SWI exception is the only one that would use this in general, but I can envisage that the other handlers might use it for logging of the exception vectors. On the other hand, maybe that’s a bad idea if the SWI call might not return through the normal channels (sigh). There are some small simplifications that can be made for some entry points too – passing the SWI number directly to the handler, or the instruction that was being executed for undefined instructions, would make them a tiny-tiny bit easier, as that’s what they’re going to do anyhow. But meh… depends on what sort of thing you want to run on those vectors.
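The post-trapping idea could be sketched roughly as follows, in C rather than at the register level. This is only an illustration under assumptions: every type and function name is hypothetical, the "address in R0" is modelled as an optional function pointer returned alongside the pass-on action, and the post-hooks are run in reverse order once the chain walk stops.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct { uint32_t r[16]; uint32_t psr; } context;  /* R0-R15 & PSR */
typedef void (*post_hook)(context *ctx, void *ws);

typedef enum { HANDLED, PASS_ON } action;
typedef struct { action act; post_hook post; } result;     /* post may be NULL */

typedef result (*handler_fn)(context *ctx, void *ws);
typedef struct { handler_fn fn; void *ws; } handler;

#define MAX_CHAIN 8

/* Walk the chain until a handler claims the exception, remembering any
   post-hooks requested on the way, then call those hooks in reverse order.
   Returns 1 if some handler claimed the exception, 0 otherwise. */
static int dispatch(const handler *chain, size_t n, context *ctx)
{
    struct { post_hook fn; void *ws; } pending[MAX_CHAIN];
    size_t depth = 0;
    int handled = 0;

    assert(n <= MAX_CHAIN);
    for (size_t i = 0; i < n && !handled; i++) {
        result res = chain[i].fn(ctx, chain[i].ws);
        if (res.act == HANDLED) {
            handled = 1;
        } else if (res.post != NULL) {
            pending[depth].fn = res.post;
            pending[depth].ws = chain[i].ws;
            depth++;
        }
    }
    while (depth > 0) {
        depth--;
        pending[depth].fn(ctx, pending[depth].ws);
    }
    return handled;
}

/* Example post-hook: mangle the result after the usual operation has run. */
static void add_one_after(context *ctx, void *ws)
{
    (void)ws;
    ctx->r[0] += 1;
}

/* Example claimant: pass on, but ask to be called back on the way out. */
static result mangler(context *ctx, void *ws)
{
    (void)ctx; (void)ws;
    return (result){ PASS_ON, add_one_after };
}

/* Example claimant: perform the "usual operation" and claim the exception. */
static result base(context *ctx, void *ws)
{
    (void)ws;
    ctx->r[0] = 41;
    return (result){ HANDLED, NULL };
}
```

Note that this sketch shares the weakness raised above: if the operation never returns through the normal channels, the pending hooks are never called.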

May 2, 2021 7:02pm Charles Ferguson (8243) 438 posts	Oh, I didn’t say it but I approve of the idea of a more controlled interface using the full context and with chaining behaviour. Keeps the Kernel in control but allows clients to blow their legs off if they want. Might want to consider how to enumerate the chain, too – for diagnostics purposes.

May 2, 2021 8:37pm David J. Ruck (33) 1696 posts	I agree that any solution should be APCS compliant, as we want to be using HLL for everything now, and the fewer assembler shims the better. Also any new way of doing things should have one eye on 64 bit, so it’s easily transferable to the new architecture and we don’t have to learn two new ways of doing things in quick succession.

May 3, 2021 12:22pm Jeffrey Lee (213) 6048 posts	I’d be more tempted to merely provide an explicit interface for the SWI and Undefined instruction handlers only and be done with it. That’s certainly tempting. RISC OS 5 currently lacks OS_AbortTrap, but there are some features on the roadmap which would probably require data/prefetch abort handling to become a lot more sophisticated than the current system. So if we don’t support claiming of the data/prefetch vectors in the new OS_ClaimProcessorVector, then that should help ensure that the new higher-level kernel APIs actually get developed instead of constantly being kicked further into the backlog. Also any new way of doing things should have one eye on 64 bit, so its easily transferable to the new architecture and we don’t have to learn two new ways of doing things in quick succession. Processor vectors & exception handling in AArch64 is completely different to AArch32. Both for native AArch64 exceptions, and AArch32 exceptions that get handled by AArch64. So the only way of creating handlers which “just work” when running in AArch64 mode would be to make them sufficiently high-level. OS_AbortTrap is a good example of this – all the messy stuff of decoding the aborting instruction and copying values in/out of registers is handled by the kernel, so apart from any changes to the handler function signature, an AArch32 abort trap handler will work just fine in an AArch64 OS (or MIPS, or RISC-V, or x86, etc.). For undefined instructions, you’d also need to find a way of abstracting over the register access. With the current OS_ClaimProcessorVector there’s no abstraction (the code needs to read/write the registers itself). The new OS_ClaimProcessorVector will provide mid-level abstraction (R0-R15 & PSR provided in a context dump), but some instructions may require more than that (e.g. banked registers from other modes, or maybe some coprocessor registers). 
Potentially we could make the context dump used by AArch32 RISC OS bigger so that the same context dump can be used in AArch64 RISC OS, but that might end up being unwanted bloat for the AArch32 OS (and for coprocessor registers, how can we know which ones a handler will require?). So a more practical solution would probably be to have different context dump formats for handlers running in AArch32 mode vs. AArch64 mode, and leave it down to the handler author to deal with the differences themselves (e.g. using functions/macros to abstract over accessing the state, so e.g. AArch32 builds can access banked registers directly, while AArch64 builds access them via the context struct) SWIs also have some differences to overcome, the biggest one being the wider registers in AArch64. I get the feeling that every AArch64 SWI handler (i.e. normal SWI handlers in modules) is going to have to be aware of whether the call came from AArch32 or AArch64, so it knows whether certain registers need to be zero-extended or sign-extended, whether it’s safe to return 64 bit pointers, how to pack/unpack FP registers, etc. I’m also hoping that we’ll take the opportunity to fix some flaws with error handling – e.g. make it so that the SWI caller must provide a pointer to memory to be used for the error block, so that we’re no longer trying to force everything through MessageTrans’s shared buffers (for calls from AArch32 mode, the kernel can easily take care of this difference).
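The "functions/macros to abstract over accessing the state" idea might look something like the sketch below. Everything in it is an assumption made for illustration: the context layout, the `HANDLER_AARCH32_NATIVE` build switch, and the `hw_read_r13_svc` assembler helper are all hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical context dump for a build where the kernel has to save the
   banked SVC registers itself (e.g. a handler hosted on an AArch64 OS). */
typedef struct {
    uint32_t r[16];
    uint32_t psr;
    uint32_t r13_svc;
    uint32_t r14_svc;
} exc_context;

#ifdef HANDLER_AARCH32_NATIVE
/* Native AArch32 build: read the banked register from the hardware.
   hw_read_r13_svc is a hypothetical helper written in assembler. */
extern uint32_t hw_read_r13_svc(void);
#define CTX_R13_SVC(ctx) ((void)(ctx), hw_read_r13_svc())
#else
/* Other builds: the value comes from the context dump. */
#define CTX_R13_SVC(ctx) ((ctx)->r13_svc)
#endif

/* Handler logic is written once against the accessor, so the same source
   builds either way. Here: check the SVC stack pointer is 8-byte aligned. */
static int svc_stack_is_aligned(const exc_context *ctx)
{
    assert(ctx != NULL);
    return (CTX_R13_SVC(ctx) & 7u) == 0;
}
```

The handler author still has to maintain both branches of the accessor, which is the "deal with the differences themselves" cost mentioned above.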

May 3, 2021 12:40pm Colin Ferris (399) 1847 posts	Has there been a way found to go between 64bit and 32bit and back? As a bit of info – is there a way to go from Login back to where one was reading?

May 3, 2021 3:38pm Rick Murray (539) 14047 posts	Has there been a way found to go between 64bit and 32bit and back? You can only switch from a 64 bit world. A 32 bit world cannot switch up to 64 bit (that’s why accidentally trashing bit 4 of the PSR results in an exception). Processor vectors & exception handling in AArch64 is completely different to AArch32. Given discussion elsewhere about a processor with Cortex-M cores, it might be worth thinking about whether it is feasible to include Thumb in this. What happens if a Thumb core hits an exception? As a bit of info – is there a way to go from Login back to where one was reading? Stay where you were reading and open login in a new tab/window. Once you have logged in, go to the URL bar of where you were reading and press Enter to refetch the page. Do not click the refresh button as that will use the old, stale, cookie and you’ll end up logged out. :-/

May 3, 2021 3:47pm Jeffrey Lee (213) 6048 posts	Given discussion elsewhere about a processor with Cortex-M cores, it might be worth thinking about whether it is feasible to include Thumb in this. What happens if a Thumb core hits an exception? Probably the same as AArch64: Either develop a high-level API that abstracts over the low-level architectural differences, or accept the fact that you’ll need different code for each.

May 3, 2021 3:58pm Alan Adams (2486) 1155 posts	Stay where you were reading and open login in a new tab/window. Or alternatively click the “login to comment” link, login, click back twice, then refresh. If you already have other pages open, refresh will allow you to comment on them too.

May 3, 2021 6:08pm Rick Murray (539) 14047 posts	Or alternatively click the “login to comment” link, login, click back twice, then refresh. Which browser? Firefox, when used like that, will load the pages from cache (and hence be using stale cookies). There’s a reason I said to press enter in the URL bar and not use Refresh!

May 3, 2021 6:28pm Jon Abbott (1421) 2661 posts	Claiming vectors – as mentioned in the previous thread, this currently isn’t even atomic for the single-core case, let alone the multi-core case I assume your concern is around the time gap between pointing at the new routine and it storing the address of the next routine from the returned parameters? I’ve moaned in the past that there’s no way to read the current handler without claiming it. Releasing vectors – when a program removes a handler from a vector, there’s no (easy) way for it to check that all the cores have finished executing the handler code. What is the concern with the released handler running its course? Dumb question: Are the handlers core specific or do they cover all cores? I’ve not really looked into how ARMv5+ works at low-level when more than one core is active. Can anyone see any problems with the above, or have any other thoughts? FAR/FSR need to be included in the register dump? I’m all for abstracting the hardware vectors, but I would still like a (new if need be) method to take over the vectors ahead of the OS for ADFFS. It needs to sit directly on the SWI, Data Abort, IRQ and Undefined Instruction to work at any speed and avoid crashes. If/when RISCOS can handle multi-core, I’ll modify the code accordingly and spin out emulation threads/VM’s to the other cores.

May 3, 2021 7:59pm Jeffrey Lee (213) 6048 posts	I assume your concern is around the time gap between pointing at the new routine and it storing the address of the next routine from the returned parameters? Correct. The time window might be small, but I’d rather be safe than sorry. What is the concern with the released handler running its course? The program that installed the handler could free/overwrite the memory while another core/thread is still executing it. For multi-core code it’s impossible for a routine to maintain an accurate “is the routine running?” flag itself; it has to be external code which sets the flag, calls the routine, then clears the flag. Dumb question: Are the handlers core specific or do they cover all cores? I’ve not really looked into how ARMv5+ works at low-level when more than one core is active. The aim of the OS_ClaimProcessorVector changes will be to make all the cores use the same handlers. On a technical level, we can give different cores different handlers if we want, and you can use CP15 registers to change the address of the processor vectors on a per-core basis. FAR/FSR need to be included in the register dump? Yeah, now that I’ve thought about it some more it’s probably for the best to include them. If we’re going to end up with lots of abort handlers then there’ll be too many opportunities for recursive aborts to occur for each handler to be able to reliably save/restore the registers itself – so having the kernel capture them at the start of each abort and store them in the register dump will be the safest option.
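The point about external code maintaining the "is the routine running?" state can be sketched with C11 atomics. This is a minimal illustration, not the kernel's actual mechanism: the `slot` type and the function names are hypothetical, and a counter is used in place of a single flag so that several cores can be inside the handler at once.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

typedef int (*handler_fn)(void *context, void *workspace);

typedef struct {
    _Atomic(handler_fn) fn;   /* NULL once the handler has been released */
    void *workspace;
    atomic_int in_flight;     /* number of cores currently inside the call path */
} slot;

/* Install a handler. Only safe on a slot no other core can see yet. */
static void slot_claim(slot *s, handler_fn fn, void *workspace)
{
    s->workspace = workspace;
    atomic_init(&s->in_flight, 0);
    atomic_init(&s->fn, fn);
}

/* Caller side (the "external code"): mark entry, call the handler, mark exit.
   Returns 1 and stores the handler's result if a handler was called. */
static int slot_call(slot *s, void *context, int *result)
{
    int called = 0;

    assert(result != NULL);
    atomic_fetch_add(&s->in_flight, 1);
    handler_fn fn = atomic_load(&s->fn);
    if (fn != NULL) {
        *result = fn(context, s->workspace);
        called = 1;
    }
    atomic_fetch_sub(&s->in_flight, 1);
    return called;
}

/* Release side: unpublish the handler, then wait until no core is still
   inside it. Only after this returns may the handler's memory be reused. */
static void slot_release(slot *s)
{
    atomic_store(&s->fn, (handler_fn)0);
    while (atomic_load(&s->in_flight) != 0) {
        /* spin; a real implementation would back off or yield */
    }
}

/* Example handler: always asks for the exception to be passed on. */
static int demo_handler(void *context, void *workspace)
{
    (void)context; (void)workspace;
    return 1;
}
```

The default sequentially consistent ordering is what makes this work: a caller that saw a non-NULL handler must have incremented the counter before the release side reads it. A handler that tries to release itself from inside its own call would spin forever in this sketch.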

May 3, 2021 8:57pm Steve Pampling (1551) 8272 posts	Stay where you were reading and open login in a new tab/window. Once you have logged in, go to the URL bar of where you were reading and press Enter to refetch the page. Do not click the refresh button as that will use the old, stale, cookie and you’ll end up logged out. :-/ Not here. Another tab – yes. Then simply switch back to the previous tab and hit F5, or the loop arrow refresh

May 3, 2021 9:30pm Chris Mahoney (1684) 2177 posts	It depends on the browser. In Safari on my Mac, I can open the login page in a new tab then reload the old tab. In Edge on Windows, reloads trigger logouts.

May 4, 2021 12:12pm Alan Adams (2486) 1155 posts	Firefox, when used like that, will load the pages from cache (and hence be using stale cookies). I do this every day, with Firefox. On Windows, in case it’s different on other platforms. It works here.

May 4, 2021 6:14pm Jon Abbott (1421) 2661 posts	Correct. The time window might be small, but I’d rather be safe than sorry. I think you’re fighting a losing battle there. Without recoding every existing handler, OS_ClaimProcessorVector can never be atomic. Personally I wouldn’t try to fix it, just deprecate its use and replace it with a more core friendly version that’s atomic. I’d also give the option of the handler being core specific or all cores. The aim of the OS_ClaimProcessorVector changes will be to make all the cores use the same handlers. Existing code written for cooperative multi-tasking won’t know how to handle aborts from other cores.

May 4, 2021 6:35pm Jeffrey Lee (213) 6048 posts	Personally I wouldn’t try to fix it, just deprecate its use and replace it with a more core friendly version that’s atomic. That’s essentially what I’m proposing. The old API will still work as-is, but there’ll be an extra flag or new vector numbers to select the new API when making the call to register/deregister a handler. Existing code written for cooperative multi-tasking won’t know how to handle aborts from other cores. Sorry, I wasn’t quite clear there. Any handlers registered using the new API will be used across all the cores. Handlers registered using the old API will (probably) only be used with the primary core, which will (hopefully) be where any code which relies on those handlers is restricted to running.

May 4, 2021 6:51pm Steve Pampling (1551) 8272 posts	I do this every day, with Firefox. On Windows, in case it’s different on other platforms. It works here. Bear in mind that Rick is using an elderly (vintage even) version of Firefox on Android

May 4, 2021 8:20pm Rick Murray (539) 14047 posts	Bear in mind that Rick is using an elderly (vintage even) version of Firefox on Android 60.0.2, as in “before they broke it”. Put it like this, the forum sets cookies because I have whitelisted it. The Wiki here runs scripting because I have whitelisted it. Other sites can set cookies (too much stuff breaks without) but said cookies are automatically erased after 90 seconds even if the tab is open unless the site is whitelisted. Aggressive content blocking (including any media over 256K without being whitelisted). It’s actually quite pleasant using the web on my phone. When I use Chrome (FlightRadar24 doesn’t load properly on mobile Firefox) or the times when I used to use Safari on the iPad, it is like culture shock. What the hell is with all this advertising!? So many pop-ups. So much bull, no you didn’t detect 17 viruses on my phone, and no I’m not going to pay you €10 to remove them (and clicking any button will cause a text from Orange saying that the Internet+ payment failed because I haven’t set that up….thieving bastards!). The new improved Firefox is a massive step backwards. It’s no use to me. However, the cookie/Refresh issue. Maybe it’s an Android difference? Who knows…

May 4, 2021 9:17pm Charles Ferguson (8243) 438 posts	David Ruck said: I agree that any solution should be APCS compliant, as we want to be using HLL for everything now, and the fewer assembler shims the better. In my reading of this, I had pretty much assumed the implementation, other than a small amount of setup for the actual vector entry and SWI veneer would all be written in C from the outset. Your comment has made me question my expectations and, looking at the ROOL Kernel source, there doesn’t seem to be any C code. (facepalm) I think you know what I’m going to say, so I’ll just save myself a couple of hours of writing it.

Jun 1, 2021 5:46pm Stefan Fröhling (7826) 169 posts	That’s certainly tempting. RISC OS 5 currently lacks OS_AbortTrap, but there are some features on the roadmap which would probably require data/prefetch abort handling to become a lot more sophisticated than the current system. So if we don’t support claiming of the data/prefetch vectors in the new OS_ClaimProcessorVector, then that should help ensure that the new higher-level kernel APIs actually get developed instead of constantly being kicked further into the backlog. I cannot help here really with technical comments. But this comment from Jeffrey gave me an idea that the multi-threading module could give us a sidedoor entry to a new better kernel. About task control, task abortion and maybe debugging? In the future new applications could only use the multi-threading module/API that will provide better control over the applications?!

Jun 2, 2021 11:32am Jeffrey Lee (213) 6048 posts	That’s kind of what’s happening, yeah. For some things I’m improving the existing kernel (e.g. new OS_ClaimProcessorVector, implementing OS_AbortTrap). For other things I’m developing what’s essentially a new kernel (e.g. the thread/process management in the SMP module). Progress update on the OS_ClaimProcessorVector stuff: I’ve settled on an implementation where the register context contains all the registers from the handler mode (e.g. R0-R14 & SPSR, reflecting the state on entry to the handler). This is different to what I was originally aiming for (R0-R15 & PSR from the abort source), since I realised that would have made it impossible for the handlers to interact with the registers that the abort handler had modified. E.g. for a data abort from FIQ mode, the old interface would have given the handler the FIQ version of R8-R12, causing problems if the handler needed to access the non-FIQ versions of those registers (the kernel would have buried them somewhere on the ABT stack). There are still some problems if you want to use CMHG veneers as handlers (since the veneers force a switch to SVC mode, which would then prevent your handler from accessing the SVC registers that the veneer is touching), but that’s not my problem™. I’ve resurrected the code I wrote a decade ago (!) for rotated load/store emulation and expanded it into an OS_AbortTrap implementation. There’s still a lot of work to do (adding the VFP/NEON instructions, writing a whole load of tests, expanding the AbortTrap API to support LDREX/STREX & prefetch aborts, abortable DAs, seeing what can be done for the base-updated abort model, etc.), but it’ll get there eventually. 
The AbortTrap implementation should hopefully solve the problem of the long descriptor page table format lacking support for the “user read, svc read/write” access privilege (AP 1); the kernel can just map the memory as “user none, svc read/write” and AbortTrap will step in to handle any reads from usermode. There’d still be a problem if people are trying to execute code from such a region in user mode, but I’m hoping that’s rare enough that we can just say “not supported”! Another thing the AbortTrap implementation should be useful for is hardware watchpoint support, which requires software to manually work out which watchpoint was triggered by decoding the instruction. I’m not planning on implementing support for this right away, but it should bring us a lot closer to being able to provide a Debugger API for using them. It’s also possible the code could be used as a base for building a full CPU emulator, which would be useful for single-stepping in the debugger / hardware breakpoints, and maybe for AArch32 emulation on AArch64-only (or non-ARM?) CPUs. But that’s not likely to be something I’d tackle any time soon.
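The access-privilege workaround described above boils down to a small policy decision on each abort. The sketch below is an assumption-laden illustration in C: the names are hypothetical, and the treatment of privileged-mode aborts as genuine faults is a guess rather than anything stated in the thread.

```c
#include <assert.h>
#include <stdbool.h>

typedef enum { ACCESS_READ, ACCESS_WRITE, ACCESS_EXECUTE } access_kind;
typedef enum { ABORT_EMULATE, ABORT_FAULT } abort_action;

/* Decide what to do with an abort on a page that is logically
   "user read, svc read/write" but is really mapped as
   "user none, svc read/write". */
static abort_action ap1_page_abort(bool from_user_mode, access_kind kind)
{
    if (!from_user_mode) {
        /* Assumption: the real mapping already allows privileged access,
           so an abort from a privileged mode is a genuine fault. */
        return ABORT_FAULT;
    }
    switch (kind) {
    case ACCESS_READ:
        return ABORT_EMULATE;  /* AbortTrap steps in and performs the read */
    case ACCESS_WRITE:
        return ABORT_FAULT;    /* user mode was never allowed to write */
    case ACCESS_EXECUTE:
        return ABORT_FAULT;    /* the "not supported" case */
    }
    assert(0 && "unknown access kind");
    return ABORT_FAULT;
}
```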

Jun 3, 2021 6:19pm Stefan Fröhling (7826) 169 posts	Well that sounds good so far! When do you expect something working for doing first test or beta status? What about enhanced memory protection / virus protection? For example I imagine that a “secure” RISC OS could only allow new application to be run by the multi-threading module and therefore be separated from the existing base system so that they cannot corrupt the kernel or claim any security relevant vectors?

Jun 5, 2021 6:54am Jon Abbott (1421) 2661 posts	I feel there’s some major scope creep here Jeffrey. The original proposal was to fix the issues around claiming and releasing the vectors. They weren’t a major issue to start with, at least until multi-core is viable for all tasks, and now you’re mentioning instruction emulation.
What about enhanced memory protection / virus protection? For example I imagine that a “secure” RISC OS could only allow new application to be run by the multi-threading module and therefore be separated from the existing base system so that they cannot corrupt the kernel or claim any security relevant vectors?
This isn’t really related to the OP, but if RISCOS is to be updated for mainstream market use, backward compatibility needs to be dropped from the OS and a whole new programming model put in place that starts from the premise of task isolation, multithreading and multicore. I’m not sure many are ready to make that leap of faith just yet as it would mean all existing apps would need to sit under an OS emulation layer or be dropped.

Jun 5, 2021 11:44am Jeffrey Lee (213) 6048 posts	When do you expect something working for doing first test or beta status?
At the moment my plan is to develop & release things in the following order:
1. OS_AbortTrap & abortable DAs in the main kernel
2. SMP-safe OS_ClaimProcessorVector in my SMP branch of the kernel
3. SMP-safe forks of FPEmulator & VFPSupport (this is where the requirement for a SMP-safe OS_ClaimProcessorVector came from)
4. Resume work on thread support for ordinary C programs (which is where the SMP-safe FPEmulator & VFPSupport requirement came from)
5. Resume stability testing & feature development of the SMP module/kernel
Since it’s inconvenient for me to maintain lots of forks of things, I’ll probably aim to get the SMP-safe FPEmulator & VFPSupport merged back into the main sources as soon as possible. This means the SMP-safe OS_ClaimProcessorVector will probably get merged in at the same time (there’s not much point merging it in sooner, since as Jon says, the flaws with the current API only seriously affect SMP code). I’m using this merge request to provide a list of all the SMP related forks and the current state of the project, so keep an eye on that for all the alpha/beta quality changes.
In terms of timelines, I’ll hopefully have OS_AbortTrap & OS_ClaimProcessorVector released sometime this month. Then probably July for FPEmulator & VFPSupport, and maybe August for when there’s initial C11 thread & atomics support in the Shared C Library. However other things have been eating in to my spare time a lot recently, so don’t be surprised if the dates slip a month or two.
I feel there’s some major scope creep here Jeffrey.
Not from the perspective of my todo list – there’s multicore (which wants a SMP-safe OS_ClaimProcessorVector), hardware watchpoints (which wants load/store instruction decoding & emulation), hardware breakpoints / improving the Debugger’s woefully inadequate single-stepper (which wants full instruction set decoding & emulation), long descriptor page table support (which wants load/store instruction decoding & emulation), 64-bit future-proofing (which wants high-level APIs instead of low-level ones, full instruction set decoding & emulation for devices which lack AArch32 support, and most code to be written in high-level languages), and some GraphicsV/video improvements (which want either the ability to track page reads/writes, or load/store instruction decoding & emulation). Implementing OS_AbortTrap (and abortable DAs) does add a lot of extra work to the original task of “fix OS_ClaimProcessorVector”, but when looking at the wider picture it makes sense, especially since it could allow some third-party software to migrate away from OS_ClaimProcessorVector on to a higher-level API which is easier to work with and more future-proof.