RPi performance
David Gee (1833) 268 posts |
Are there any stats that show the relative performance of the various marks of RPi (up to 3B+) on RISC OS? I’m aware that the newer Pis are faster, but much of this comes from having multiple cores which RO can’t use. |
Stuart Painting (5389) 714 posts |
Chris Hall has some RISC OS benchmarks at http://www.svrsig.org/images/Page36.htm which include most of the Pi models. |
David Feugey (2125) 2709 posts |
https://riscos.fr/utilisez.html |
Rick Murray (539) 13840 posts |
While multiple cores will certainly give a speed boost to capable systems, it’s worth noting that there have been increases in the clock speed of the ARM core and changes in architecture. Oh, and I think the RAM access has sped up along the way, which would make a difference. Certainly, for building RISC OS, my ARMv7 Pi2 is quite a bit nippier than the Pi1, more than a 200MHz difference in clock speed would imply. It’s not just that, it’s also ARM11 → Cortex-A7. The ARM11 wasn’t terribly fast. The Pi2 feels like twice as fast. It also feels faster than the Beagle-xM (Cortex-A8) despite being 100MHz slower in raw clock speed. But perhaps that is due to the ability to push off some of the video handling to the GPU to manage? The Pi3 clocks a mere 1.2GHz with an A53 processor, and the Pi4 has an A72 at 1.5GHz. |
Jeff Blyther (1856) 47 posts |
I’m probably hijacking this thread, but I’ve just got RO up and running on a pi4 and I’ve been surprised by its performance compared to the pi3b+ (an increased clock rate of 100MHz), as one of my ARM code programs is running at nearly double the speed. I can only assume that the out-of-order core is rearranging my very badly written code and turning it into something sensible. But some of the more sanely written code has only seen an increase of about 14%. |
George T. Greenfield (154) 748 posts |
Is that whilst running RISC OS? Over the years I’ve collated !Firebench results on the various machines I’ve owned or tested – as follows. !Firebench is a programme written by Michael Kubel to calculate 16,000 iterations of a 320 × 100 pixel fire, calculating the new pixels from the existing 8 surrounding pixels; in other words, a straight test of processor power. Results are:
Iyo = 40.02 secs [36 secs on re-test]
Pi1 [default] = 16.1 secs
Pi2 [default] = 11.62 secs
Pi3 [default] = 5.14 secs
IGEP = 3.66 secs
RPCEmu 0.9.2 [Win7, Intel i3, 2.6GHz] = 19.65 secs
‘Default’ means standard CPU/Core/RAM clock settings. |
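The neighbourhood calculation described above can be sketched roughly as follows. This is only an illustration, not Michael Kubel's actual code: the function name `fire_step`, the flat one-value-per-pixel buffer, the plain averaging of the 8 neighbours and the zeroed border are all assumptions made for the example.

```python
def fire_step(src, width, height):
    """One iteration: each interior pixel becomes the mean of its 8 neighbours.

    src is a flat list of width * height pixel values (assumed layout).
    Border pixels are simply left at 0 in this sketch.
    """
    dst = [0] * (width * height)
    for y in range(1, height - 1):
        for x in range(1, width - 1):
            i = y * width + x
            total = (src[i - width - 1] + src[i - width] + src[i - width + 1]
                     + src[i - 1] + src[i + 1]
                     + src[i + width - 1] + src[i + width] + src[i + width + 1])
            dst[i] = total >> 3  # divide by 8 with a shift
    return dst


# Tiny usage example: a 3 x 3 buffer has exactly one interior pixel.
frame = fire_step([8] * 9, 3, 3)
```

Each output pixel needs eight reads and one write, so a loop like this moves a fair amount of data for very little arithmetic.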
Jeff Blyther (1856) 47 posts |
I just downloaded !Firebench. Unfortunately it seems the size is now 512 × 256, not the 320 × 100 at which your tests were done, but the iterations have been changed to compensate (16000 down to 4096), so the results should be sameish (probably a dangerous assumption!). Anyway, I ran the test on 3 nearby Pis and got the following results:- Running another program by Michael Kubel, Fixed point integer fractal, gave the following results :- |
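As a rough check on the "sameish" assumption, the total number of pixel updates in the two versions can be compared directly. The sketch below uses only the sizes and iteration counts quoted in this thread; the variable names are illustrative.

```python
# Total pixel updates = width * height * iterations.
old_updates = 320 * 100 * 16000   # original !Firebench settings
new_updates = 512 * 256 * 4096    # updated !Firebench settings

# Ratio of work in the new version relative to the old one.
ratio = new_updates / old_updates

print(old_updates, new_updates, ratio)
```

On pixel count alone the new version does about 5% more work. The frame is also roughly four times larger (131,072 pixels against 32,000), so cache behaviour may differ between the two versions; that part is a guess rather than something measured here.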
Kuemmel (439) 384 posts |
Hi there…it’s me, the author (nickname Kuemmel)…yes, the pi4 was surprisingly slower for the FixFrac, and I don’t actually know why. It’s hard to find good cpu cycle tables from ARM that could give a hint. But you can check the versions for FPU and NEON; here it’s faster, of course when you overclock it, but also clock for clock. At some point I updated !FireBench as it finished too fast. It was written back in the day on my StrongARM. Whereas FixFrac and the VFP/NEON versions (check my website here for the latest versions) are true cpu math crunching benchmarks with no effect from the memory subsystem, !FireBench is more of a memory benchmark as it needs to shuffle a lot of data, but of course some adding/shifting is also needed to compute the pixels. If I have some time I’ll publish a version using NEON to do the fire that’s an order of magnitude faster :-) |
Jeff Blyther (1856) 47 posts |
Hi Kuemmel, Yes, your VFP/NEON fractal versions do beat the pi3b+ A53. Maybe on the A72 we don’t get shifts (LSR/LSL) for free anymore? (although that was probably the case after ARM3) |
Kuemmel (439) 384 posts |
Hi Jeff…it’s hard to tell, I wouldn’t expect that they did anything ‘bad’ regarding the shifts. Unfortunately I could never find cycle timings for the Cortex A53, while for the A72 there’s very good data (Link). Maybe a bit of instruction reordering makes a difference? (I don’t have my RPI4 set up yet, so I can’t test it myself.) You could try e.g. moving “MOV R3,R3,LSR#(fl%)” at the two positions in the code between “SMULL R5,R6,R0,R0” and “SMULL R8,R2,R0,R1”. |
George T. Greenfield (154) 748 posts |
I stand corrected! But that would explain why the Pi1 (and RPCEmu) is twice as fast as the Iyonix, despite all three having very similar CPU performance as measured by RISCOSmark (Iyo 260%, Pi1 253%, RPCEmu 238% [baseline S/Arm RPC = 100%]); and why overclocking the Core and RAM rates on the Pi1 and 2 has quite a dramatic effect on !FireBench performance. |
Jeff Blyther (1856) 47 posts |
Done the code shuffle, but the pi4 gave the same answer (I assume the reorder buffer is optimising the order of code anyway), but it slowed down the pi3b+ :-( |
Kuemmel (439) 384 posts |
Okay…that behaviour of the RPI3 is kind of weird…I always thought that its dual issue pipeline would benefit a bit too…the mysteries of modern cpu internals…meanwhile I reordered everything to the max (you can get it here). Does that help for the pi4? As you said, I wouldn’t expect it, as the reorder buffer should do the same job. |
Jeff Blyther (1856) 47 posts |
As you guessed, the pi4 stays the same, but you are making my A53 slower! Your original code runs the best! |
Kuemmel (439) 384 posts |
Meanwhile I had time to polish my Fire NEON code. Before I release it officially on my website, could you give it a test run on the RPI4? Here’s the link. It’s actually not pixel-for-pixel the same as the old FireBench. In the old one I calculated from the 8 surrounding pixels; now, to make use of NEON parallel computing, I go for a 3×3 pixel block. With the help of NEON long adds and especially the pretty VEXT command, things speed up like hell. It’s roughly 3 times faster than the traditional ARM approach and much easier to code, with no unmasking of “fire” bytes at all :-) P.S.: Is there still no solution to that login problem on this website? I constantly have to delete all browser data (using Chrome on Windows 10). |
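The 3×3 block idea can be sketched in plain Python as below. This is not the NEON code and not Kuemmel's implementation; it only illustrates how a 3×3 sum can be built from shifted copies of a row (loosely what VEXT provides on vectors) followed by adding three rows together. The function names and the list-of-rows image layout are assumptions for the example, and Python integers stand in for the widened results of NEON long adds.

```python
def row_sum3(row):
    """Horizontal 3-tap sum: left + centre + right for each interior pixel."""
    # Three views of the same row, each shifted by one pixel.
    left, centre, right = row[:-2], row[1:-1], row[2:]
    return [a + b + c for a, b, c in zip(left, centre, right)]


def block_sum3x3(image):
    """Return the 3x3 block sum for every interior pixel.

    image is a list of equal-length rows (assumed layout).
    """
    h = [row_sum3(r) for r in image]  # horizontal pass, row by row
    return [[a + b + c for a, b, c in zip(h[y - 1], h[y], h[y + 1])]
            for y in range(1, len(image) - 1)]  # vertical pass


# Usage: a 3 x 3 image has one interior pixel, whose block covers everything.
sums = block_sum3x3([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
```

Because the block includes the centre pixel, no per-pixel masking or special-casing is needed, which fits the remark that this version is easier to code.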
Jeff Blyther (1856) 47 posts |
Well that’s interesting... pi4 2.52 sec. The A53 beats the A72 again! Just had a thought: could it be that the pipelines on the A72 are longer than those on the A53? So when the core is maxed out, the shorter pipeline is going to win. |
Jeff Blyther (1856) 47 posts |
While I’m logged on (using NetSurf on pi4, I can’t log on using Safari on my Mac) I must say I’m most impressed with RISC OS on pi4 at such an early stage. Although I’ve only been running it for a day, I can’t seem to make the system go wrong using my pi as I use it in my work environment (spreadsheet + own progs), and it feels real nippy as well! |
David Pitt (3386) 1248 posts |
!FireBenchNeon
Titanium 1.96s
RPi4 2.38s
RPi3B+ 2.17s |
Jeff Blyther (1856) 47 posts |
Phew… my pi4 is not dodgy then. |
Rick Murray (539) 13840 posts |
Am I the only person who thinks that, made full screen and left to run in a loop, it would make a great screensaver? |
Kuemmel (439) 384 posts |
Hm, two ideas on the RPI4 being slower…is there still some cache setting not set or enabled, or are your RPI4s having throttling problems, as there’s maybe no fan? @Jeffrey: Hope you read that, maybe you can say something on the cache setting as this benchmark uses lots of memory bandwidth, or do you have other suspects? @Rick: I coded something for full screen with that blur algorithm…I’ll publish it soon with the !FireBench update. But somebody with more OS-coding talent has to convert it to a screen saver ;-) |
Jeff Blyther (1856) 47 posts |
The pi is not throttling, but I have found a problem with the pi4. So somewhere between these dates a change has been made that’s not good for pi4s. Update. |
Jeff Blyther (1856) 47 posts |
I just had a little play around to see if I could find out why my pi4 is slower on the new ROM. But after doing a reset my benchmarking tests are saying it’s not. So something in my setup is causing the problem; I will do some more investigating tonight. |
Chris Gransden (337) 1207 posts |
Do you have !CpuClock? You can use it to lock the CPU to the highest clock speed. Sometimes single-tasking programs run at the lower clock speed of 600MHz instead of 1500MHz. |
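To get a feel for how much a run at the lower clock could distort a benchmark, here is a small sketch. It assumes the run is purely CPU-bound, so that time scales inversely with clock speed; a memory-heavy benchmark would not scale this cleanly. The function name `scaled_time` is just for illustration.

```python
def scaled_time(seconds, measured_mhz, actual_mhz):
    """Estimate run time at actual_mhz from a time measured at measured_mhz.

    Assumes a purely CPU-bound run, so time is inversely proportional
    to clock speed.
    """
    return seconds * measured_mhz / actual_mhz


# Factor by which a run stretches when the clock drops from 1500MHz to 600MHz.
slowdown = scaled_time(1.0, 1500, 600)
print(slowdown)
```

Under that assumption, a run stuck at 600MHz would take up to two and a half times as long as the same run at 1500MHz.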
Chris Johnson (125) 825 posts |
Yes, you need to set the low speed to the maximum. I used to have it so that you could never set the slow speed to the max possible speed, but after user request I removed that inhibition 8) |