Neoverse N1 32 bit user mode CPSR bit 5 = 0
Pages: 1 2
Timothy Baldwin (184) 242 posts |
On a Neoverse N1 r3p1, as used in an Amazon Web Services m6g instance, running this GNU C test program:
in 32-bit user mode results in:
On ARMv3 to ARMv4 that value would indicate 26-bit user mode. On the CPUs that RISC OS runs on, 32-bit mode would be indicated by bit 4 being set. This breaks programs that read CPSR to determine whether they are running in a 32-bit mode, such as the SWI exit code in TerritoryManager and many other modules, which crash with an undefined instruction exception:
|
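The check that goes wrong here can be sketched in C as a plain decode of the mode field. This is only a minimal sketch, assuming the classic PSR layout in which the mode field occupies bits 4 to 0 and bit 4 (M[4]) is set in every 32-bit mode. The function names and the sample values used with them are hypothetical; they are not the output of the test program above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define PSR_MODE_MASK 0x1Fu  /* M[4:0], the mode field */
#define PSR_M4        0x10u  /* bit 4: set in all 32-bit modes */
#define MODE_USR26    0x00u  /* 26-bit user mode encoding */
#define MODE_USR32    0x10u  /* 32-bit user mode encoding */

static_assert((MODE_USR32 & PSR_M4) != 0, "USR32 must have M[4] set");

/* Extract the mode field from a PSR value (hypothetical helper). */
uint32_t psr_mode(uint32_t psr)
{
    return psr & PSR_MODE_MASK;
}

/* The kind of test the affected code performs: trust bit 4 of whatever
 * MRS returned. If the low bits read as zero, this reports "26-bit". */
bool psr_claims_32bit(uint32_t psr)
{
    return (psr & PSR_M4) != 0;
}
```

With a hypothetical reading of 0x60000010 the check reports a 32-bit mode. With 0x60000000 (same flags, low bits zero) it reports 26-bit user mode, which is the misdetection described above.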
Jeffrey Lee (213) 6048 posts |
This behaviour is documented in the ARMv8 ARM, in the description for MRS:
No obvious mention of that in the ARMv7 ARM, so I guess that’s a new behaviour we’ll have to watch out for. Of course in your case, you’re only running into a problem with SWI exit handlers because you’re running the code in user mode. Replacing the troublesome code sequences with macros (so the 26bit path can be removed in 32bit builds) would be a good solution, although I’m not sure if we have any ready-to-go macros for “flag-preserving SWI exit on 26bit systems, flag-destroying SWI exit on 32bit systems” |
Colin Ferris (399) 1831 posts |
Does this mean that – MRS r0,CPSR now doesn’t work in User mode? |
Timothy Baldwin (184) 242 posts |
In part A (Application Level Architecture), section A2.4 of the ARMv7 ARM, the register is called APSR and bits 15 to 0 are reserved. You have to look in part B (System Level Architecture) for a complete description of CPSR; section B1.3.3 states:
After fixing a few ROM modules, I can boot to the Desktop, build RISC OS, and run all the third party programs I’ve tried. RISCOSmark5 gives:
I suspect it is also to do with the fact that the Neoverse N1 has no privileged 32-bit modes. The Shared C Library stubs also use CPSR to detect 32-bitness, which is then ignored by the Shared C Library.
For returning using LR there is RETURNVS and RETURNVC but not any other combination.
You can read DIT (Data Independent Timing), SSBS (Speculative Store Bypass Safe), and the condition flags N, Z, C, V, Q, GE0, GE1, GE2, and GE3. You can’t necessarily read the current mode, the interrupt and asynchronous abort disable bits, or the current endianness. |
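The list of readable fields can be written down as a mask. This is a rough sketch, assuming the AArch32 PSR bit positions as I read them in the Arm ARM (N, Z, C, V, Q in bits 31 to 27, SSBS in bit 23, DIT in bit 21, GE[3:0] in bits 19 to 16); check them against the manual before relying on them. The macro and function names are made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed AArch32 PSR bit positions; verify against the Arm ARM. */
#define PSR_N        (1u << 31)
#define PSR_Z        (1u << 30)
#define PSR_C        (1u << 29)
#define PSR_V        (1u << 28)
#define PSR_Q        (1u << 27)
#define PSR_SSBS     (1u << 23)
#define PSR_DIT      (1u << 21)
#define PSR_GE_SHIFT 16
#define PSR_GE_MASK  (0xFu << PSR_GE_SHIFT)

/* Everything a user mode MRS can be relied on to return. */
#define APSR_USER_VISIBLE \
    (PSR_N | PSR_Z | PSR_C | PSR_V | PSR_Q | PSR_SSBS | PSR_DIT | PSR_GE_MASK)

static_assert(APSR_USER_VISIBLE == 0xF8AF0000u, "unexpected mask value");

/* Discard the bits whose value is unknown when read from user mode. */
uint32_t apsr_visible(uint32_t raw)
{
    return raw & APSR_USER_VISIBLE;
}

/* Extract the four GE flags as a value from 0 to 15. */
unsigned apsr_ge(uint32_t raw)
{
    return (raw & PSR_GE_MASK) >> PSR_GE_SHIFT;
}
```

Masking the value before using it means code cannot accidentally depend on the mode, interrupt disable, or endianness bits.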
Jeffrey Lee (213) 6048 posts |
Yeah, that makes sense. I can understand them wanting to remove the info, but it is a bit annoying that they didn’t at least hardwire the mode bits to USR32.
Hmm, so they do. I wonder how much other user mode software does the same thing. Thankfully we can just use |
Jeffrey Lee (213) 6048 posts |
Actually it is there – I was just looking at the original “DDI0406C” version instead of the more recent C.c, C.d, etc. errata versions. Although it does smack a bit of a copy & paste error, since the wording suggests that it returns an unknown value for all architectures described by the manual (ARMv4-ARMv7), which sounds like a big thing for them to have forgotten to document in previous manuals. |
Jeffrey Lee (213) 6048 posts |
Ugh, while updating the wiki I’ve realised that because the MRS technically returns an “unknown” value, it means there’s no longer any mode/CPU-agnostic way of determining whether you’re in a privileged CPU mode or not. I guess the best thing you could do is read the PSR, then try switching to a couple of different privileged modes, and see if any of the banked registers change, then restore the original PSR. Switching into the banked modes will be a no-op if you’re in user mode. There is a risk it’ll break if later AArch32 implementations start making use of the low 16 bits of the APSR for something, but at the moment it’s the best I can think of. |
André Timmermans (100) 656 posts |
Was not TEQ R0,R0 : TEQ PC,PC the official way ? |
Jeffrey Lee (213) 6048 posts |
Yes, my mistake. TEQ PC,#0 is a bit useless in this context (it’s not guaranteed to set any flags) |
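Why the TEQ R0,R0 : TEQ PC,PC pair works can be modelled in C. This is a simplified model, not code from the thread. It assumes the 26-bit R15 layout (flags and interrupt disables in bits 31 to 26, PC in bits 25 to 2, mode in bits 1 to 0) and the rule that, in 26-bit modes, R15 reads without the PSR bits as the first operand but with them as the second. The function name and sample addresses are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define R15_PC_MASK  0x03FFFFFCu      /* PC field of a 26-bit R15 */
#define R15_PSR_MASK (~R15_PC_MASK)   /* flags, I/F and mode bits */
#define FLAG_Z       (1u << 30)       /* set by the preceding TEQ R0,R0 */

/* Models TEQ PC,PC and returns true when the result is EQ.
 * psr_bits are the PSR bits held in R15 (only used in 26-bit mode). */
bool teq_pc_pc_is_eq(bool in_32bit_mode, uint32_t pc, uint32_t psr_bits)
{
    if (in_32bit_mode) {
        /* Both operands read the same PC value, so the XOR is zero. */
        return true;
    }
    assert((pc & R15_PSR_MASK) == 0);      /* pc must fit the PC field */
    assert((psr_bits & R15_PC_MASK) == 0); /* psr_bits must not overlap it */

    uint32_t rn  = pc;             /* first operand: PC only */
    uint32_t op2 = pc | psr_bits;  /* second operand: PC plus PSR bits */
    return (rn ^ op2) == 0;
}
```

In this model the 26-bit case only gives NE because Z is already set. With all PSR bits clear the comparison would come out EQ, which is why the sequence starts with TEQ R0,R0.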
Kuemmel (439) 384 posts |
Hi Timothy, interesting CPU! Could you also run my good old 4 Fractal programs on it, as you did back then on that Cortex-A72 machine? How many MHz does that thing run at? |
Timothy Baldwin (184) 242 posts |
-----------------------------------
32 Bit Fixed Point Integer Fractal using 64-Bit SMULL by M.Kuebel
-----------------------------------
Time taken in [s]............: 0.59
Total Iterations.............: 179217040
Million Iterations per second: 303.757
---------------------
NEON Single Precision Fractal by M.Kuebel
---------------------
Time [s].....................: 0.16
Iterations...................: 177944574
Million Iterations per second: 1112.153
---------------------
VFP Double Precision Fractal by M.Kuebel
---------------------
Time [s].....................: 0.58
Iterations...................: 177936814
Million Iterations per second: 306.787
---------------------
VFP Single Precision Fractal by M.Kuebel
---------------------
Time [s].....................: 0.58
Iterations...................: 177944574
Million Iterations per second: 306.8
|
Jeffrey Lee (213) 6048 posts |
If you’re interested in more benchmarks, I’ve just uploaded a little benchmark (“superscalar”) for measuring performance of scalar/superscalar/out-of-order workloads. It should also give a rough measurement of the CPU speed (the “scalar” result). |
Kuemmel (439) 384 posts |
@Timothy: Thanks! One hell of a CPU, even clock for clock around 50% faster than an RPI4! |
David Feugey (2125) 2709 posts |
On the other hand, the Pi 4 is Cortex-A72, and since then we have had Cortex-A73 (+30%), A75 (+32%), A76 (+30%), now A77 (+29%) and soon A78 (+20%), with big changes in terms of speed for each generation. The Snapdragon 865’s main processor is a Kryo 585 Prime @ 2.84 GHz (Cortex-A77 based). Kryo 485 did compete with Core i5: Kryo 585 should compete with Core i7. Note: Graviton 2 single-thread performance is between the latest Epyc and Xeon. |
David Feugey (2125) 2709 posts |
Back on topic, to have a solution for a “RISC OS on demand on the Cloud” would be great. |
Jan Rinze (235) 369 posts |
@Jeffrey Lee
On an Nvidia Xavier AGX: scalar: 2585.3952 MIPS
On my RPI4 in Linux RISC OS Port: scalar: 1222.2464 MIPS
On RPI4 running RISC OS native:
Now I am very curious how to get that super5 performance :-)
(A hint here: set the cwd to the folder with superscalar, use F12 for the command line, and type: superscalar { > benchresult } ) |
Kuemmel (439) 384 posts |
@Timothy: Don’t know if you read my benchmarking post regarding RISC OS vs Linux 64 in the Aldershot section. If you still have access to that Neoverse CPU and have some time, it would be interesting to see how my code (direct link here ) for RISC OS and Linux 64 performs on it, if you have Linux 64 running there by chance… EDIT: Meanwhile I also updated my single precision benchmark for RISC OS. It’s still experimental, but shows big gains for modern cores on RISC OS. I’ll write about it soon when I’m completely done, as it might be relevant for any kind of coding. You can find the 3 versions here. It would be interesting to see whether the gains are also visible on that Neoverse under RISC OS running on Linux. |
Jan Rinze (235) 369 posts |
Running on Apple M1 in Linux RISCOS Port using qemu-arm: Since the benchmark is a tight loop the ‘JIT’ of QEMU will help tremendously. Depending on the type of code the results can vary quite a bit on QEMU. |
Jan Rinze (235) 369 posts |
Another interesting platform for RISC OS is the Lenovo X13s ARM laptop. The laptop comes with Windows 11 for ARM by default. |
Clive Semmens (2335) 3276 posts |
Do they provide privileged modes in their 32-bit world though? |
André Timmermans (100) 656 posts |
No, AArch32 is available at EL0 only, so user mode only. |
Kuemmel (439) 384 posts |
I recently signed up for Amazon Web Services to check out the Linux multi-core performance of the latest Graviton 3 CPU. It’s a Cortex X level CPU with 64 cores at 2.6 GHz :-) I’ll post some benchmarks in Aldershot some day soon. You can also still try Graviton 2 (Neoverse N1, 64 cores) and Graviton 1 (Cortex A72 level, 16 cores). The cost depends on how many cores you launch; the Graviton 3 “c7g” instance with 64 cores is about 2.3 dollars/hour. |
Clive Semmens (2335) 3276 posts |
That’s what I thought – which surely makes them useless for RISCOS? |
Timothy Baldwin (184) 242 posts |
Alas that’s not the only disappearing instruction. Also on the Neoverse N1 the instruction sequence below results in an undefined instruction exception:
That’s because LDM with CPSR writeback is an unconditional undefined instruction. It’s escaped my notice because FPEmulator is skipping any instruction with a false conditional. Perhaps FPEmulator should not. I will try patching out those instructions in an exception handler, and see if it increases performance. |
Jeffrey Lee (213) 6048 posts |
I guess we could expect any/all of the “system instructions” listed in the ARMv7 ARM to fail in a similar manner. (I would say the ARMv8 ARM, but that seems to have dropped the “system instructions” category, making them a bit harder to find)
Yeah, that could probably do with fixing! |