Norcroft 64
Colin Ferris (399) 1814 posts |
Looks like the Norcroft C compiler can output ARM 64bit code :-) |
Rick Murray (539) 13840 posts |
Brilliant! Now we just need to tweak the MakeFiles and RISC OS will be 64 bit… …oh, wait. |
Stuart Swales (8827) 1357 posts |
“This was once revealed to me in a dream” |
Cameron Cawley (3514) 157 posts |
I’m guessing that the main focus is to allow AArch64 applications to run within AArch32 RISC OS on existing platforms like the Pi 4. That feels like a more achievable goal (especially if we stick to ILP32 so that we don’t need as many API changes), while still laying foundations for Cortex-A76 support later on. That said, I don’t have any insight into ROOL’s plans (and the Icon Bar article is a bit light on details), but it’s good to see that progress is being made. |
Simon Willcocks (1499) 513 posts |
That’s not how it works; a 32-bit core can’t run 64-bit code. Updating Norcroft seems like a waste of effort, to me, given gcc. |
Steffen Huber (91) 1953 posts |
Probably the last mover amongst the currently available C compilers to add an AArch64 backend. |
Michael Grunditz (8594) 259 posts |
@Simon But a 64-bit core can. Just send off your Norcroft-compiled 64-bit program to a core that is running 64-bit. Perfectly doable, but not very useful as it is now. Requires a bit of infrastructure. |
Cameron Cawley (3514) 157 posts |
Fair enough. Could you elaborate a bit on how interactions between 32-bit and 64-bit mode work? |
Simon Willcocks (1499) 513 posts |
OK, so basically (it’s been a few years since I played with it), 64-bit EL2 can manage 32-bit code by giving it a memory space that looks to the 32-bit code like physical memory. The 32-bit code can manage that memory as if it were real. Interrupts, aborts, etc. come up to the 64-bit code that can pass them back down to 32-bit code. What can potentially be done is have 64-bit code request services (like Wimp_Poll) from 32-bit code. At the moment, I’m hoping that we can transition to processor agnostic communication mechanisms (pipes and task queues) that allow for multiple cores co-operating with 32 and 64 bit code. |
Paolo Fabio Zaino (28) 1882 posts |
AFAIR, a core can only change state at reset (at least on ARMv8 and v9). However, for the famous AArch64 in kernel mode and AArch32 in user mode, a different rule applies: “an AArch64 entity can host AArch32 children, but not the other way round”. This means that an AArch64 hypervisor can host an AArch32 VM, but an AArch32 hypervisor cannot host an AArch64 VM. Note the terms hypervisor and VMs (so EL2 and EL1/EL0). AFAIU, this is how RISC OS on Linux should make it work on the Pi 5 to run RISC OS in user-mode AArch32, while Linux runs in privileged mode AArch64. However, as mentioned by Michael, you can send the AArch64 code to a core that IS in AArch64 mode (set in such a mode at boot or reset). However, as also pointed out by Michael, that code will probably need a kernel or something too (to do something useful). So, what could possibly work in such a situation is an approach similar to the one used on Hydra, where minimal kernels were used on the extra CPUs. But that would only allow execution of things like math processing etc. So, still limited. HTH |
Michael Grunditz (8594) 259 posts |
You can PCIRamAlloc memory and use some of it as a workspace shared with RISC OS and the rest for 64-bit code. That scenario could easily work for a game, although with this simple setup the game needs to poll input from its workspace. |
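The polling arrangement described above might look something like the following. This is only a sketch under stated assumptions: the `input_mailbox` layout, the field names, and the `mailbox_post`/`mailbox_poll` helpers are all hypothetical, the block would really live in memory obtained from the PCI allocator (not shown here), and cache and barrier handling between a 32-bit and a 64-bit core would need far more care than C11 atomics on a single host suggest.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical shared workspace layout. Fixed-width types keep the
   layout identical for 32-bit and 64-bit code. */
typedef struct {
    _Atomic uint32_t seq;   /* bumped by the producer after each update */
    uint32_t keys;          /* hypothetical bitmask of pressed keys */
    int32_t  mouse_x;
    int32_t  mouse_y;
} input_mailbox;

/* Producer side (the RISC OS task): write the data, then publish it. */
void mailbox_post(input_mailbox *mb, uint32_t keys, int32_t x, int32_t y)
{
    mb->keys = keys;
    mb->mouse_x = x;
    mb->mouse_y = y;
    atomic_fetch_add_explicit(&mb->seq, 1, memory_order_release);
}

/* Consumer side (the 64-bit game): returns true only if something new
   was posted since the last successful poll. This simple version does
   not protect against the producer overwriting the fields mid-read. */
bool mailbox_poll(input_mailbox *mb, uint32_t *last_seq,
                  uint32_t *keys, int32_t *x, int32_t *y)
{
    uint32_t s = atomic_load_explicit(&mb->seq, memory_order_acquire);
    if (s == *last_seq)
        return false;
    *keys = mb->keys;
    *x = mb->mouse_x;
    *y = mb->mouse_y;
    *last_seq = s;
    return true;
}
```

The game would call `mailbox_poll` once per frame; a sequence counter rather than a plain "dirty" flag means neither side ever has to clear state written by the other.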
Rick Murray (539) 13840 posts |
<shrug> Every RISC OS machine that isn’t a dinosaur has three 32 bit cores doing exactly nothing, the compiler screws everybody with the emulated FP rather than the onboard FP that’s been around on RISC OS hardware for a decade and a half, no SIMD without resorting to assembler… |
Colin Ferris (399) 1814 posts |
Didn’t Stuart (the one who spills tea when I mention 512Mb for RPC) do a hardware FPU library for C users? |
Stuart Swales (8827) 1357 posts |
There are quite useful gains to be had by using said library with Norcroft, but that’s only a halfway house. A compiler with true VFP support would knock it out of the park for properly FP intensive work. Some authors might be willing to support two builds of applications – one just for ARMv7 w/VFP and one for ARMv3. I probably would fall into that camp. |
Rick Murray (539) 13840 posts |
As Stuart says, proper VFP support would be the way forward. |
Colin Ferris (399) 1814 posts |
I don’t know how Stuart has worked it – but some C progs load like a module at startup – I wonder if that system could be used here. |
Stuart Swales (8827) 1357 posts |
I haven’t gone to that extreme. If the system has VFP, then it will be used, otherwise it falls back to using the equivalent FPA instructions in most cases. [See https://www.riscosopen.org/forum/forums/2/topics/3457?page=12#posts-146247] |
Rick Murray (539) 13840 posts |
Ideally Norcroft would have the following options:
Question is, how do you handle the data? Annoyingly the byte order of the words is back to front (VFP vs FPA). |
Stuart Swales (8827) 1357 posts |
It’s the word order that’s different in VFP (and in most other IEEE754 double implementations)
This would produce bonkers code if done natively by the compiler; you’d really need an implementation like mine. |
Rick Murray (539) 13840 posts |
Hmm, I knew something was the other way around…
Not necessarily.
The choose option will always be slower (at least a branch [1] followed by a load [2] followed by a compare [3] followed by the op) but it’ll be far quicker than assuming FPA and it’ll work on old and new systems alike by just “doing the useful thing”. Yes, it would lead to crazy code if the compiler tried to sort it out for itself every time. So it shouldn’t. In essence, it’s like using your library, but doing so while writing normal C code…
[1] To the library.
[2] The “which FP to use” variable.
[3] To choose what to do, one would expect to then branch to FPA and fall through to VFP for speed. |
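As a rough illustration of the "choose" option described above, the library entry point could look like this. It is a sketch, not how Norcroft or any existing library does it: every name here (`fp_select`, `fp_add`, the backend stubs and the call counters) is hypothetical, the two backends are plain C stand-ins for what would really be FPA and VFP assembler, and the hardware detection that sets the flag is not shown.

```c
#include <stdbool.h>

/* Hypothetical "which FP to use" variable, set once at startup. */
static bool fp_have_vfp = false;

/* Counters so the dispatch can be observed in this sketch. */
static unsigned fpa_calls = 0;
static unsigned vfp_calls = 0;

/* Stand-ins for the real backends, which would be assembler. */
static double add_fpa(double a, double b) { fpa_calls++; return a + b; }
static double add_vfp(double a, double b) { vfp_calls++; return a + b; }

/* Called once after probing the hardware (probing not shown). */
void fp_select(bool have_vfp) { fp_have_vfp = have_vfp; }

/* Library entry point: the compiler emits a branch here, then the
   flag is loaded and compared. FPA is the taken branch and VFP is
   the fall-through path. */
double fp_add(double a, double b)
{
    if (!fp_have_vfp)
        return add_fpa(a, b);
    return add_vfp(a, b);
}
```

The per-operation cost is the call, the load of the flag, and the compare, which matches the three footnoted steps; the application itself is written as ordinary C and never sees the choice.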
Graeme (8815) 106 posts |
You can use two VLDR/VSTR instructions with 32-bit (S) registers to store in the reverse order because D0-D15 are the same registers as S0-S31. |
Stuart Swales (8827) 1357 posts |
Helpfully in the -apcs /softfp world (as indeed with FPA since a long time) floating function parameters (and return value) are passed in ARM registers, so all you do is twizzle during double loading/saving.
    VMOV d0, r1, r0 ; x in {r0,r1}, FPA ordered
    VMOV d1, r3, r2 ; y in {r2,r3}, FPA ordered
    ...
    VMOV r1, r0, d0 ; result in {r0,r1}, FPA ordered |
David J. Ruck (33) 1635 posts |
Obviously you want to use the default word ordering for whichever FP system you are using at runtime for the best performance, however you do have a problem if the program either stores doubles as part of a file format or sends them over the network in binary. The saver/sender and loader/receiver could be using different word ordering, which will make life rather interesting as there are only a few values out of the 64 bit space which are not valid when the ordering is wrong, although it will be very obvious to the user. |
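A small sketch of the word-order problem for stored doubles, assuming IEEE 754 doubles on the host. The helper names (`double_bits`, `bits_double`, `swap_words`) are made up for illustration; a real file format would simply pick one order and swap on whichever side does not match it.

```c
#include <stdint.h>
#include <string.h>

/* Copy a double's 64-bit image out without aliasing problems. */
uint64_t double_bits(double d)
{
    uint64_t b;
    memcpy(&b, &d, sizeof b);
    return b;
}

/* Rebuild a double from a 64-bit image. */
double bits_double(uint64_t b)
{
    double d;
    memcpy(&d, &b, sizeof d);
    return d;
}

/* Exchange the two 32-bit words, i.e. convert between the FPA and
   VFP word orders. Applying it twice gives back the original. */
uint64_t swap_words(uint64_t bits)
{
    return (bits << 32) | (bits >> 32);
}
```

With IEEE 754 doubles, 1.0 has the image 0x3FF0000000000000. Swapped, it becomes 0x000000003FF00000, which is still a valid (tiny, subnormal) double rather than an error, which is why a mismatch between saver and loader shows up as obviously wrong numbers and not as a failure.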
Rick Murray (539) 13840 posts |
Is it possible to get VFP to swap word order on loading FP values from memory [1], or must it be done via ARM registers?
[1] Sounds like the sort of weird things that NEON can do. |
Martin Philips (9013) 48 posts |
Is there an ARM64 assembler that uses the same syntax as the DDE assembler? |