OS_ClaimProcessorVector again
Rick Murray (539) 13805 posts |
Yup, I was being sarcastic. ;-) However, a modern PC can put in a pretty decent emulation of an ARM machine, plus with the ability to cross-compile one isn’t stuck with things like the DDE and Zap/StrongEd but can use modern development tools (hmm, like CoPilot to steal the code for you!). I think nemo has, in the past, made his feelings about this clear. It is an option that people use, so even though it’s a bit of a drag with respect to modernity, I don’t think we’re in a position to ditch the RiscPC class just yet, as that’s our RISC-OS-on-something-else option. We need a better emulator¹. But “Who?” (GOTO 10) ;)
¹ And woe is you if said emulator doesn’t run on Windows (back to XP), Linux (all forty-seven distros), MacOS, and Sarah’s granny’s smart-toaster. And that one weird person that will want it for Minix. |
||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Stuart Swales (8827) 1348 posts |
I think there are a number of devs who will spin up an RPCEmu instance to quickly recompile/test something rather than switching their ARM hardware on (or it may not even be to hand). |
||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Jeffrey Lee (213) 6048 posts |
If anyone’s wondering what happened to this, I implemented it on this branch of the kernel and then spent a while updating FPEmulator and VFPSupport to use it (and then further updates and testing to fulfil the goal of getting FPA+VFP working with the SMP module). The documentation is currently incomplete, but this post has a summary of the entry state for the handlers. My goal for the next few weeks (/months) is to get this tidied up and submitted. There are a couple of pre-requisites which need merging first, so it might take a while. |
||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Paolo Fabio Zaino (28) 1851 posts |
@ Jeffrey
THANK YOU soooo much! I was hoping you would start implementing support for the VFP (and you know why), this is AWESOME news! :D I know you’re busy, so my apologies for this request, but as in the past could I have a build of it please? I really want to port Genann to support your SMP work, but, as you know well, I am “locked” into using floating point (this is normal in ML). Thanks in advance for your time, answer and work! |
||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Jon Abbott (1421) 2641 posts |
Thanks for the update, Jeffrey. A couple of questions:
Once the documentation is available, I’ll have a better idea of what (if any) changes I need to make to the Abort handlers in ADFFS and if I need to consider cutting support for earlier OS versions. |
||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Jeffrey Lee (213) 6048 posts |
Sure. There are a couple of tweaks I want to make, and then I’ll email you a new build.
There are a couple of pre-requisites which need finalising and merging first, so it’ll take at least a few weeks before the OS_ClaimProcessorVector changes go in.
Either today, or some time next week. I’ll post here once it’s available.
Yes, the new API works on all machines currently supported by RISC OS 5. The old API is still there and can still be used, so there’s no immediate requirement for software to switch over to it. |
||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
David Pitt (3386) 1248 posts |
A Pi ROM build of the WIP: A useful SMP threading system came a bit unstuck. All seven SMP modules, including SyncLib 0.05, were added to the current Dev beta and built with DDE30d.

Kernel (Sources.Kernel)...
amu -E install_rom INSTDIR=ADFS::ROOL.$.Rpi-SMP.BCM2835.RiscOS.Install.ROOL.BCM2835.RISC_OS COMPONENT=Kernel TARGET=Kernel ASFLAGS="-PD \"CMOS_Override SETS \\\"= FileLangCMOS,fsnumber_SDFS,CDROMFSCMOS,&C0\\\"\""
do mkdir -p bin
SetEval KernelBase "4" + STR ( 227858432 + ( HALSize LEFT ( LEN HALSize - 1 ) ) * 1024 )
Do link -aif -base <KernelBase> -RW-base 0xff000000 -bin -d -o bin.Kernel_aif GetAll.o SEH.o support.o aborttrap.o atarm.o atcontext.o atinstr.o aterrors.o atmem.o C:SyncLib.o.SyncLib
ARM Linker: (Warning) Attribute conflict between AREA SEH(C$$code) and image code.
ARM Linker: (attribute difference = {NO_SW_STACK_CHECK}).
ARM Linker: (Error) Relocated value too big for instruction sequence.
ARM Linker: (at 0x24 in barrier(Asm$$Code): offset/value = 0x2fb4b58 bytes)
ARM Linker: (Error) Relocated value too big for instruction sequence.
ARM Linker: (at 0x28 in barrier(Asm$$Code): offset/value = 0x2fb4b58 bytes)
ARM Linker: (Error) Relocated value too big for instruction sequence.
ARM Linker: (at 0x30 in barrier(Asm$$Code): offset/value = 0x2fb4b4c bytes)
ARM Linker: (Error) Relocated value too big for instruction sequence.
ARM Linker: (at 0x34 in barrier(Asm$$Code): offset/value = 0x2fb4b4c bytes)
ARM Linker: garbage output file bin.Kernel_aif removed
ARM Linker: finished, 5 informational, 1 warning and 4 error messages.
AMU: *** exit (1) ***

It is work in progress, but I report this in case it is unexpected. Oddly, or not, a Titanium build of the same did complete and did run, but the SMP module reported -1 cores. This may be because the WIP has not got as far as the Titanium. |
||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Jeffrey Lee (213) 6048 posts |
Thanks for the info. I’ve just pushed a fork of SyncLib which should fix the issue. (I’ve been using that fork for a while; I just didn’t realise that it might be necessary for things to build correctly.) The Titanium and PineA64 HALs need updating to support the SMP module, but the other components should all work (if not, then it’s a bug). |
||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
David Pitt (3386) 1248 posts |
Many thanks, it has fixed the issue. The Pi’s four cores can now be seen. |
||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Jeffrey Lee (213) 6048 posts |
Documentation:

OS_ClaimProcessorVector

In:
r0 = vector and flags:
  Bits 0-6: vector number
    0 = 'Branch through 0' vector
    1 = Undefined instruction
    2 = SWI
    3 = Prefetch abort
    4 = Data abort
    5 = Address exception (only on ARM 2 & 3)
    6 = IRQ
    7+ = reserved for future use
  Bit 7: 0 = no flags in bits 9-31, 1 = flags in bits 9-31
  Bit 8: 0 = release, 1 = claim
  Bit 9: 0 = old API, 1 = new API
  Bits 10-31: reserved (set to 0)

Old API:
In: r1 = replacement value
    r2 = value which should currently be on vector (only needed for release)
Out: r1 = value which has been replaced (only returned on claim)

New API:
In: r1 = handler address
    r2 = handler R12 value
Out: All regs preserved

Old versions of the kernel ignored bits 9-31 of R0. To provide room for future expansion, bit 7 is now interpreted as a flag to say that bits 9-31 are present. On old kernels, setting this bit will cause the OS_ClaimProcessorVector call to fail (the kernel will think you’re trying to claim/release an invalid vector number). This means that in order to select the new API, bits 7 and 9 must both be set.

Old API
The old API can be used for vector numbers 0-6 inclusive. On entry, the register state is the same as when the CPU took the exception, except that CPSR_fs may have been corrupted by a previous handler in the chain. To claim the exception, the appropriate “exception return” operation should be performed in order to return to the foreground, with whatever register values are deemed necessary. To pass on the call to the next handler, the handler must preserve all registers (except CPSR_fs) and branch to the address of the previous handler (which was returned in R1 by the OS_ClaimProcessorVector claim call). On future SMP versions of RISC OS, it’s expected that old-API handlers will only be called for code running on the primary core.

New API
The new API can only be used for vector numbers 1-4 and 6. All the old handlers will be called before any of the new handlers.
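To make the R0 encoding above concrete, here is a minimal C sketch that packs the vector number and flag bits and checks a value against the documented rules (old API: vectors 0-6; new API: vectors 1-4 and 6 only, with bits 7 and 9 both set and bits 10-31 zero). All the macro, enum and function names are hypothetical, not taken from any RISC OS header. The old-kernel check assumes, as the text implies but does not spell out, that old kernels read bits 0-7 as the vector number.

```c
#include <stdbool.h>
#include <stdint.h>

/* R0 bit layout for OS_ClaimProcessorVector, as documented above.
 * Names are hypothetical, not from a RISC OS header. */
#define CPV_VECTOR_MASK   0x7Fu        /* bits 0-6: vector number         */
#define CPV_FLAGS_PRESENT (1u << 7)    /* bit 7: bits 9-31 carry flags    */
#define CPV_CLAIM         (1u << 8)    /* bit 8: 1 = claim, 0 = release   */
#define CPV_NEW_API       (1u << 9)    /* bit 9: 1 = new API, 0 = old API */
#define CPV_RESERVED_MASK 0xFFFFFC00u  /* bits 10-31: reserved, must be 0 */

enum cpv_vector {
    CPV_BRANCH_THROUGH_0      = 0,
    CPV_UNDEFINED_INSTRUCTION = 1,
    CPV_SWI                   = 2,
    CPV_PREFETCH_ABORT        = 3,
    CPV_DATA_ABORT            = 4,
    CPV_ADDRESS_EXCEPTION     = 5,  /* ARM2 and ARM3 only */
    CPV_IRQ                   = 6
};

/* Build an R0 value; selecting the new API sets bits 7 and 9 together. */
uint32_t cpv_encode(unsigned vector, bool claim, bool new_api)
{
    uint32_t r0 = vector & CPV_VECTOR_MASK;
    if (claim)
        r0 |= CPV_CLAIM;
    if (new_api)
        r0 |= CPV_FLAGS_PRESENT | CPV_NEW_API;
    return r0;
}

/* Validity check as a kernel that understands bit 7 might apply it. */
bool cpv_valid(uint32_t r0)
{
    unsigned vector = r0 & CPV_VECTOR_MASK;

    if (!(r0 & CPV_FLAGS_PRESENT))      /* bits 9-31 ignored: old API */
        return vector <= 6;
    if (r0 & CPV_RESERVED_MASK)         /* reserved bits must be zero */
        return false;
    if (r0 & CPV_NEW_API)               /* new API: vectors 1-4 and 6 */
        return (vector >= 1 && vector <= 4) || vector == 6;
    return vector <= 6;                 /* flags present, old API */
}

/* Assumption: an old kernel treats bits 0-7 as the vector number, so
 * setting bit 7 makes it see vector 128 or above and reject the call. */
bool cpv_valid_on_old_kernel(uint32_t r0)
{
    return (r0 & 0xFFu) <= 6;
}
```

For example, claiming the undefined instruction vector with the new API gives R0 = &381, which an old kernel would reject as vector &81. On a real system this value would go in R0 alongside the R1/R2 values listed under Old API or New API; the sketch only covers the bit packing and the rules, not the SWI call itself.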
All vector types follow the same pattern for entry/exit:

Entry:
R0 = pointer to vector-specific register dump
R12 = "handler R12 value" that was passed to OS_ClaimProcessorVector
R13 = full-descending stack
R14 = return address
Processor is in the relevant exception handler CPU mode
IRQ+FIQ state is unchanged from exception entry

Exit:
R0 = handler result:
  0 = claim the exception
  1 = pass on to next handler
  Other values are interpreted as an error block pointer
R1-R12, R14, CPSR_sf can be corrupted
The CP15 registers which hold the exception state can be corrupted (e.g. by triggering a recursive exception)

Handlers must pass control back to the kernel by returning to the return address in R14.

When claiming exceptions, handlers can (and usually must) update the R0-R14 & SPSR values in the register dump. It’s these values that the kernel will restore when returning from the exception: R0-R13 will be the new R0-R13 values for the exception handler mode, the SPSR will be restored to the CPSR, and execution will continue from the R14 value. The other registers stored in the register dump (e.g. CP15 DFAR) are not restored.

If a handler returns a non-serious error (bit 31 of error number not set), behaviour is as if the handler passed on the exception to the next handler in the list. However, it’s possible that future versions of the kernel, or special builds, will log this error somewhere for diagnostics.

If a handler returns a serious error (bit 31 of error number set), this will cause the kernel to raise the error. The exact behaviour depends on the exception type.

On future SMP versions of RISC OS, new-API handlers will be called for exceptions that occur on any CPU core, not just the primary core.

Undefined instruction handler
SWI handler
Prefetch abort handler
Data abort handler
IRQs
|
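The new-API entry/exit contract in the documentation above (R0 = 0 to claim, 1 to pass on, anything else an error block whose bit 31 decides whether it is serious) can be modelled in plain C. This is a minimal sketch of a dispatch loop over a handler chain, not the kernel's implementation: the `regdump` layout, the `dispatch` function and the example handlers are all hypothetical, and the real register dump is vector-specific. The `os_error` struct only mirrors the shape of a RISC OS error block.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of the new-API handler contract. Layout and names
 * are illustrative; the real register dump is vector-specific. */
struct regdump {
    uint32_t r[15];   /* R0-R14: restored if the exception is claimed */
    uint32_t spsr;    /* restored to the CPSR if claimed */
};

/* Same shape as a RISC OS error block: number, then message text. */
struct os_error {
    uint32_t errnum;
    char errmess[252];
};

#define HANDLER_CLAIM   ((uintptr_t)0)
#define HANDLER_PASS_ON ((uintptr_t)1)
#define ERROR_SERIOUS   (1u << 31)

/* Entry: R0 = register dump, R12 = value given when claiming. */
typedef uintptr_t (*vector_handler)(struct regdump *dump, void *r12);

struct claim {
    vector_handler handler;
    void *r12;
};

enum dispatch_result { DISPATCH_CLAIMED, DISPATCH_UNCLAIMED, DISPATCH_ERROR };

/* Walk the chain until a handler claims or returns a serious error.
 * A non-serious error is treated exactly like "pass on". */
enum dispatch_result dispatch(const struct claim *chain, size_t n,
                              struct regdump *dump,
                              const struct os_error **err_out)
{
    for (size_t i = 0; i < n; i++) {
        uintptr_t result = chain[i].handler(dump, chain[i].r12);
        if (result == HANDLER_CLAIM)
            return DISPATCH_CLAIMED;
        if (result == HANDLER_PASS_ON)
            continue;
        const struct os_error *err = (const struct os_error *)result;
        if (err->errnum & ERROR_SERIOUS) {
            if (err_out != NULL)
                *err_out = err;
            return DISPATCH_ERROR;
        }
        /* Non-serious: move on to the next handler. */
    }
    return DISPATCH_UNCLAIMED;
}

/* Example handlers (hypothetical). */
static const struct os_error minor_error = { 0x100u, "Minor problem" };
static const struct os_error serious_error = { ERROR_SERIOUS | 0x101u, "Serious problem" };

uintptr_t pass_on_handler(struct regdump *dump, void *r12)
{
    (void)dump; (void)r12;
    return HANDLER_PASS_ON;
}

uintptr_t minor_error_handler(struct regdump *dump, void *r12)
{
    (void)dump; (void)r12;
    return (uintptr_t)&minor_error;
}

uintptr_t serious_error_handler(struct regdump *dump, void *r12)
{
    (void)dump; (void)r12;
    return (uintptr_t)&serious_error;
}

/* Claims after copying a value from its workspace (the R12 value) into
 * the dumped R0, which the kernel would then restore on return. */
uintptr_t claim_handler(struct regdump *dump, void *r12)
{
    dump->r[0] = *(const uint32_t *)r12;
    return HANDLER_CLAIM;
}
```

Because 0 and 1 can never be valid addresses of an error block, a single word in R0 can carry all three outcomes, which is presumably why the API encodes them this way.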
||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Timothy Baldwin (184) 242 posts |
The implementation requires interrupts to be disabled on exit from the handler, to protect SPSR_svc from modification by interrupt handlers, but the documentation does not state this. I propose the following to abstract away the processor modes:
A compromise between the original faulting mode register and the current interface would be to store non-FIQ R0 to R12 and R13 to R15 from the faulting code, and R13 from the exception handling mode (or just define or export a stack frame size?). However, that would complicate the SVC and IRQ handlers needlessly. The problem of wrapping SWI calls needs work, perhaps by endorsing relocating the stack to introduce an additional stack frame; this is complicated by the presence of the exception handler block. Perhaps a routine to do that should be exported. I am, however, concerned that this approach is slow, with a lock on every SWI instruction (and said lock implemented with a lock instead of using LDREX and STREX directly). I’ve measured RISC OS taking 251906 SWI calls to boot to the desktop with unmodified HardDisc4, taking about 0.25 seconds with a Neoverse N1 in Graviton 2. That is about 1,000,000 SWI calls per second; compiling RISC OS is about 300,000 SWI calls per second. I’ve benchmarked contended atomic increments at 20 nanoseconds using ARMv8 32-bit instructions and 38 nanoseconds using ARMv7 instructions, so that would suggest a 4% slowdown on a load as SWI-call-heavy as booting. I tried adding an atomic counter increment into the SWI handler:
With that change I timed the “export_libs”, “resources” and “rom” phases of an IOMD ROM build across 4 cores running on RISC OS on Linux 6.1.1 on an AWS c6g.xlarge, using the time command of GNU bash. With the atomic counter private to each core, times were:
With the atomic counter shared:
That is a 2% slowdown (user CPU time), just from the inter-core latency. I’ve uploaded these changes to the Atomic-SWI-Test tag of my git repository. This performance loss can be removed by removing the mutex and instead having OS_ClaimProcessorVector provide synchronisation by using an inter-processor interrupt to execute a memory barrier on all cores, which would synchronise with vector dispatch because interrupts are disabled whilst it is running. To be continued… |
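The 4% estimate in the post above is simply the SWI rate multiplied by the cost of one contended atomic increment. A minimal C sketch of that arithmetic follows, using the quoted boot figures (251906 SWIs in about 0.25 s). It assumes each SWI pays the full atomic cost with nothing overlapping it; the function names are made up for illustration.

```c
/* Back-of-envelope model: assumes each SWI pays the full cost of one
 * contended atomic increment and that nothing overlaps that cost. */

/* SWI calls per second from a call count and the elapsed time. */
double swi_rate(double calls, double seconds)
{
    return calls / seconds;
}

/* Fraction of CPU time added by one atomic op of atomic_ns nanoseconds
 * per SWI, at calls_per_second SWIs per second. */
double atomic_overhead(double calls_per_second, double atomic_ns)
{
    return calls_per_second * atomic_ns * 1e-9;
}
```

With those inputs the rate comes out at roughly 1,007,600 SWIs per second. At 38 ns (the ARMv7 figure) that is about 3.8% extra time, which rounds to the 4% quoted, and at 20 ns (ARMv8) about 2.0%. At the roughly 300,000 SWIs per second of a RISC OS build, the ARMv7 figure gives about 1.1%.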
||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Jeffrey Lee (213) 6048 posts |
Documentation error – thanks. I was meant to say that CPSR_sf can be corrupted, but wrote _cf instead.
That’s something I hadn’t considered doing. I guess it makes sense, to allow easier use of higher-level languages, or to abstract over the differences between whether the AArch32 exception is handled in AArch32 or AArch64 (or some other architecture for emulators). But obviously any register access via a function is going to be slower than direct access ;-)
For single-core machines that could be solved by disabling the spinlocks (which would also get rid of the exception handler block). For the multicore work I’ve been doing, optimisation is still very much in the “I’ll do it later” category, mainly because I haven’t yet hit the milestone of regular C apps being able to use threads via a standard API (e.g. C11 threads). (Last time I was working on it, I ran into a roadblock with the way I was handling OS_CallBack/etc. when the primary core is in the idle thread.) But once that’s done, and I’ve got easy access to a reliable high-resolution time source (soon!), I’ll be in a much better position to start identifying and fixing performance issues or other faults, and getting the code to a mergeable state. At the moment one of the big problems is that half the code is in the SMP module, which does some nasty things to hook itself into the running OS/kernel, so there are limits to what can be done with that implementation (e.g. the kernel doesn’t know that the other cores exist, so there’s no mechanism for it to send an interrupt or message to them). |
||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Simon Willcocks (1499) 509 posts |
How about we don’t let user programs just take over the whole processor? Jeez. |
||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Rick Murray (539) 13805 posts |
Well, that kind of is the entire situation. Arthur evolved from the BBC MOS, and RISC OS evolved from Arthur, but at its heart it’s extremely similar. A single process, single context operating system. As Charles says in the other thread, pretty much all of the ways of starting an application boil down to variations of ways of calling FSControl 4 to get FileSwitch to actually start up a program; the commands just have different behaviours for desktop/non-desktop use, or are legacy things (does anybody use Couple that with the infamous SWI Take all of the above and throw in that the system happily runs unvetted third-party add-ons (modules) with most of the calls happening in a privileged mode, and you’ll see that the situation isn’t fixable. It would take a ground-up rewrite and redesign to resolve the massive architectural problems. This is why I am very against the idea of any potential rewrite² maintaining anything that resembles the current API. The current API is functionally broken. It’s something that was acceptable at the end of the 80s home computer era, not something applicable to today’s connected world. Recreating the API will imply recreating the mistakes of the past. Anyway, as Jeffrey says below,
¹ Thinking about it, *Go doesn’t actually start a program or process, it just passes control to something loaded into memory accessible by the current program. It’s basically equivalent to CALL.
² Not that I believe, for one moment, that such a thing will ever actually happen. |
||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Jeffrey Lee (213) 6048 posts |
OS_ClaimProcessorVector isn’t that bad, AFAIK. Obviously yes it could be used to take over the system, but in terms of actual use (within the OS) there are only a handful of things that use it – possibly just FPEmulator & VFPSupport (to hook onto the undefined instruction vector) and the PCI module (to trap potential data aborts while scanning the bus). And in the wild I can’t imagine that much user software uses it. If you’ve got a version of the OS that can work out whether a SWI is coming from a user program or a trusted OS component, then you could probably quite easily lock OS_ClaimProcessorVector away so that only trusted code can use it, without breaking anything important (except the BASIC test code I’ve written) |
||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Rick Murray (539) 13805 posts |
Define “trusted OS component”? Or, rather, define “trusted OS component when EnterOS exists and one presumably would like to retain the ability to load updated/enhanced modules”¹.
¹ So restricting the test to “code in a ROM address” can’t be used, as it prevents updated modules from being used. |
||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Ronald (387) 195 posts |
The Wimp uses smoke and mirrors
You are using both the Wimp and Desktop terms; does one equal the other? Does one or the other, or both, do time sharing? Sometimes things like that are stated at the start of documentation, and then it is known what the reference is throughout. |
||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Rick Murray (539) 13805 posts |
The Wimp is the window manager that provides the desktop environment. When I use the term “desktop”, I’m referring to it conceptually. The part of RISC OS with windows and menus and multiple applications running at the same time. This, of course, isn’t helped by there being a module called “Desktop” (that starts up the environment, pops up the banner (in the old days), and gets the autobooted ROM apps going). But since that’s just a startup kind of thing, one can generally say that referring to “the desktop” means the multitasking user interface as a whole.
Not a valid comparison. The desktop (high level, metaphor) works via the Wimp (low level, SWI calls), so they’re different views of the same thing. Oh, and it’s the Wimp that handles switching tasks.
PRM book 3, first section “The desktop”, first chapter “The Window Manager”, first sentence: This chapter describes the Window Manager. It provides the facilities you need to write applications that work in the Desktop windowing environment that RISC OS provides. :-) |
||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Jon Abbott (1421) 2641 posts |
If one of the goals is to lock down OS_ClaimProcessorVector to trusted code, we’re going to need code-signing. There would also need to be a way to allow untrusted code for developers and people transitioning. As you say, there’s probably not much 3rd party software that relies on taking over the hardware vectors. I just had a look at ADFFS – it takes over all of them, except PreFetch and Address Exception when 26-bit is running. I suspect it’s mostly going to be debuggers and instruction emulation that would be impacted or could make use of any changes. ADFFS couldn’t make use of the new method as it’s too late in the call chain: ADFFS acts as a hypervisor and needs to be first in the chain, in front of FPEmulator and the OS. Provided there’s a way for trusted code to use the old method, I don’t see any issues. |
||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
David J. Ruck (33) 1629 posts |
My !SWIstat SWI call monitoring application has to hook into SWIV using OS_ClaimProcessorVector; I’d like to keep that working. |
||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Jeffrey Lee (213) 6048 posts |
For clarity: I’m not planning on locking down OS_ClaimProcessorVector. I just mentioned locking it down as it’s something Simon could opt to do in his reimplementation of the kernel. |
||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Timothy Baldwin (184) 242 posts |
I don’t see any invocation of ISB before calling the vector routines. An ISB is needed between synchronising the caches and executing new code; OS_SynchroniseCodeAreas does this for the processor on which it runs, but it is needed on every processor.
Well, that’s one reason for OS_ClaimProcessorVector to define a user-mode-only behaviour, but the one I’m most interested in is my Linux port, which runs in user mode. Also, the existence of OS_EnterOS/OS_LeaveOS does make things more complicated… The following discussion applies to undefined instruction, data abort and prefetch abort handlers only. I intend to mostly remove my para-virtualization of CPU modes, leaving only the switching of user/supervisor R13 and R14. This means the aborted code’s R13 and R14 need to go in the structure. Placing them after R12 seems logical, and the R0 to R15 then CPSR format is used by Linux, BSD, Windows and in parts of RISC OS. This makes the native return sequence:
This almost matches the Linux ABI; alas, Linux puts IFSR/DFSR below R0, which rather messes up our layout. In my opinion it is better to abstract the remaining differences by providing routines to access the structure than to try to make the structure the same.
The same could be done to save and restore the aborting mode’s R13 and R14. On a RISC OS port that runs all AArch32 code in user mode, these 4 routines would do nothing. Some things that are implied, but should be made explicit:
Is there any problem with any exception handler entered in user mode being prohibited from switching stacks by calling OS_LeaveOS or otherwise? |
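Timothy’s suggested frame order (R0-R15, then CPSR, with the aborted code’s R13 and R14 directly after R12) and the idea of hiding it behind access routines could look something like this minimal C sketch. The struct, enum and function names are invented for illustration; this is not an existing RISC OS or Linux interface, and it does not attempt the mode-specific R13/R14 save/restore routines mentioned in the post.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical abort frame in the "R0-R15, then CPSR" order proposed
 * above. The aborted code's R13 and R14 follow R12 directly. */
struct abort_frame {
    uint32_t r[16];   /* R0-R12, aborted R13 (SP), R14 (LR), R15 (PC) */
    uint32_t cpsr;    /* aborted code's CPSR */
};

_Static_assert(offsetof(struct abort_frame, cpsr) == 64,
               "CPSR follows the sixteen core registers");

enum { FRAME_SP = 13, FRAME_LR = 14, FRAME_PC = 15, FRAME_CPSR = 16 };

/* Accessor routines (hypothetical): handlers call these instead of using
 * the raw layout, so a port could store the registers differently.
 * Indices 0-15 select R0-R15; any index of 16 or above selects CPSR. */
uint32_t frame_get(const struct abort_frame *f, unsigned n)
{
    return n < 16 ? f->r[n] : f->cpsr;
}

void frame_set(struct abort_frame *f, unsigned n, uint32_t value)
{
    if (n < 16)
        f->r[n] = value;
    else
        f->cpsr = value;
}
```

The point of the accessors is that a handler written against `frame_get`/`frame_set` would not need to change if a port (for example one running everything in user mode) kept the registers somewhere else; the cost, as noted earlier in the thread, is that access through a function is slower than direct access.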