
More ARMv8 Linux adventures

5 posts, 3 voices

Jun 26, 2022 7:51am Kuemmel (439) 384 posts	Some may remember my post here from last year, when I ported my Mandelbrot NEON benchmark to Linux for a single core. Meanwhile I managed to make it multithreaded with the POSIX Threads library. Once you know how that works it’s kind of a piece of cake.

To max out speed on all cores, the multithreading code assigns one Mandelbrot set line at a time to each available core. When a core finishes its line, it increments the global line counter and takes the next unprocessed line, until the set is complete. This ensures that no core ever sits idle, as each line might take a different time to calculate due to the iterative nature of the algorithm, especially on big/little cores. So the parallelisation reaches something like 99%. You can get the code and see a table of results here.

Some findings on the side: I got hold of a Firefly-RK3588S Cortex-A76/A55 board. Results included. I guess the next RPi5 will be something like that. That thing is really fast, has a direct PCIe slot for an M.2 SSD and, thanks to the 8nm process, still consumes quite little power.

As I did the needed update of the global/atomic variables in assembler, I found that ARMv8.2 offers a memory add instruction…since when is ARM not a load-calc-store architecture any more ;-) !? So with ARMv8.2 you can do

`LDADDAL X4,XZR,[X1]`

…while with ARMv8 you have to

`try_again_update_global_iteration_counter:`
`LDAXR X9,[X1]`
`ADD X6,X9,X4`
`STLXR W7,X6,[X1]`
`CBNZ W7,try_again_update_global_iteration_counter`

In my code, though, the ARMv8.2 variant gives no speedup, as I don’t use that update very often during runtime.

If anybody has an interesting ARM device running Linux for which I don’t have results listed, and can spare some time running the benchmark, it would be nice to get in contact. Check the readme.txt for my email address.
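For readers who would rather see the line-at-a-time scheduling in C than in assembler, here is a minimal sketch of the idea described above. It is not the benchmark's actual code: `render_line`, `render_image`, `fetch_add_retry` and the size limits are hypothetical names chosen for illustration, and the Mandelbrot kernel is replaced by a placeholder. On AArch64, a compiler will typically turn `atomic_fetch_add` into either a single atomic add instruction or a load-exclusive/store-exclusive retry loop depending on the target flags; the second function spells out the retry-loop shape by hand.

```c
#include <pthread.h>
#include <stdatomic.h>

#define MAX_LINES   1024   /* hypothetical image height limit */
#define MAX_THREADS 16     /* hypothetical worker limit */

static atomic_int next_line;            /* global line counter shared by all workers */
static int total_lines;                 /* number of lines in the current image */
static int times_rendered[MAX_LINES];   /* how often each line was rendered */

/* Placeholder for the real per-line Mandelbrot kernel (assumption). */
static void render_line(int y)
{
    times_rendered[y]++;   /* no race: each y is claimed by exactly one worker */
}

/* Worker: keep claiming lines until none are left. */
static void *worker_main(void *arg)
{
    int *lines_done = arg;
    for (;;) {
        /* Atomically read the counter and bump it; the old value is our line. */
        int y = atomic_fetch_add(&next_line, 1);
        if (y >= total_lines)
            break;
        render_line(y);
        (*lines_done)++;
    }
    return NULL;
}

/* Render `lines` lines on `threads` workers.
   Returns the number of lines rendered, or -1 on bad arguments or if no thread started. */
int render_image(int threads, int lines)
{
    pthread_t tid[MAX_THREADS];
    int done[MAX_THREADS] = {0};
    int started = 0, sum = 0;

    if (threads < 1 || threads > MAX_THREADS || lines < 0 || lines > MAX_LINES)
        return -1;

    atomic_store(&next_line, 0);
    total_lines = lines;
    for (int y = 0; y < lines; y++)
        times_rendered[y] = 0;

    for (int i = 0; i < threads; i++) {
        if (pthread_create(&tid[i], NULL, worker_main, &done[i]) != 0)
            break;
        started++;
    }
    for (int i = 0; i < started; i++) {
        pthread_join(tid[i], NULL);
        sum += done[i];
    }
    return started > 0 ? sum : -1;
}

/* Returns 1 if every line was rendered exactly once, else 0. */
int every_line_rendered_once(int lines)
{
    for (int y = 0; y < lines; y++)
        if (times_rendered[y] != 1)
            return 0;
    return 1;
}

/* The same update as an explicit retry loop, similar in spirit to LDAXR/STLXR. */
int fetch_add_retry(atomic_int *counter, int inc)
{
    int old = atomic_load(counter);
    /* On failure `old` is refreshed with the current value, so just try again. */
    while (!atomic_compare_exchange_weak(counter, &old, old + inc))
        ;
    return old;
}
```

Calling `render_image(4, 100)` from a `main` function (built with `-pthread`) would spread 100 lines over four workers. Because a worker only asks for a new line when it has finished the previous one, slow lines and slow cores do not hold up the rest.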

Jun 26, 2022 11:28pm David J. Ruck (33) 1635 posts	“since when is ARM not a load-calc-store architecture” Sacrilege!

Jun 27, 2022 10:03am Rick Murray (539) 13840 posts	Hmmm, the x86 became more RISC (internally), and the ARM is becoming more CISC. The best option is probably a bit of both. In other words, the 6502 was right all along. ;)

Jun 27, 2022 11:46am Kuemmel (439) 384 posts	@Rick: I’d think so, too. The simple benefit of having a memory ADD or similar is that you use fewer registers, especially when you are dealing with constants from memory. Back in the day I thought a memory ADD would be slower in an inner loop, so I tried to put as many of those constants into registers beforehand as I could. But on x86 that’s no longer the case, as far as I can tell, for the newer generations of CPU cores, maybe since Core Duo or so. I don’t know what it means from the standpoint of a CPU designer, but as a programmer I want to have that in ARM, too.

Jun 27, 2022 12:25pm Rick Murray (539) 13840 posts	I think the issue is that if you do that sort of thing, the processor grinds to a halt as it needs to fetch values from memory and write them back. Unless this one is aimed at spinlocks and such, the data ought to be preloaded by the many tricks (out of order/speculative execution, etc) which should reduce the impact of direct memory access. As for stacking up the registers, that’s why RISC has lots that are (excepting architecture things like R14 and calling protocol things like SP, FP, etc) completely unrestricted in use (unlike “this is a loop counter” and “this is where the results of calculations end up”).
