OS_SynchroniseCodeAreas speed issue
Jon Abbott (1421) 2651 posts
This example takes between 9 cs and 2 seconds on my Pi3 at an F12 prompt:
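The code itself wasn’t quoted here, so this is a sketch only: a plausible reconstruction of the kind of call being timed, assuming a plain full sync issued from BASIC.

    REM Assumed reconstruction, not the original code
    REM R0=0 asks OS_SynchroniseCodeAreas to synchronise everything
    T% = TIME
    SYS "OS_SynchroniseCodeAreas",0
    PRINT "Took ";TIME-T%;"cs"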
This second example takes between 4.5 and 62.5 seconds:
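Again the exact code is an assumption, but from the discussion below it appears to have been a ranged clean covering essentially the whole address map, along the lines of:

    REM Assumed reconstruction, not the original code
    REM R0 bit 0 set = clean the range R1 (start) to R2 (end, inclusive)
    T% = TIME
    SYS "OS_SynchroniseCodeAreas",1,0,&FFFFFFFF
    PRINT "Took ";TIME-T%;"cs"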
Why does the CPU speed have such an effect on invalidating the cache on a Pi3? And why doesn’t OS_SynchroniseCodeAreas set the CPU speed to full before doing whatever it’s doing that’s so CPU-intensive?
Jeffrey Lee (213) 6048 posts
I believe the CPU is already at full speed when you’re at the command line (unless it’s waiting for keyboard input). Besides, the difference between 9 cs and 2 seconds is a factor of ~20, and between 4.5 seconds and 62.5 seconds a factor of ~14. IIRC the default min & max speeds for a Pi 3 are 600MHz and 1200MHz – so unless you’ve set your min speed down to 60MHz (or have overclocked to 12GHz), there’s clearly more to this than just CPU speed.

Assuming you’re on a recent OS version (July 8 2018 or newer), OS_SynchroniseCodeAreas for large areas on a multi-core machine will be slower than before, because there’s no (trivial) SMP-friendly full data cache flush operation. On a single-core machine, large clean operations can be simplified to a full d-cache flush followed by a full i-cache invalidate; on multi-core machines the OS must instead walk the entire address range, flushing each MVA from the d-cache in turn, before the full i-cache invalidate.

To stop OS_SynchroniseCodeAreas 0 taking forever, it’s been modified so that on SMP machines it’ll only do a ranged clean of the RMA and application space (the two areas where code has historically been most likely to appear). So execution speed will mostly depend on how large your RMA & appslot are.

Additionally, on ARMv7 and newer, cache maintenance operations can trigger data aborts if they target unmapped memory. The kernel has a handler for this which causes it to skip the faulting instruction – but the cache maintenance loops are still pretty dumb, so it’ll keep trying all the other cache lines within that page. Your second example, which cleans pretty much every address in the system, will therefore be generating tens or hundreds of millions of aborts, since ~80% of the memory map is empty.
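The practical upshot of this is that passing the SWI the exact range you’ve modified is far cheaper than asking for a full sync. A small illustration (the buffer and its size are invented for the example, not taken from the thread):

    REM Sketch: synchronise only a freshly written code buffer
    DIM code% 255 : REM claim 256 bytes for generated code
    REM ... assemble or poke instructions into code% here ...
    SYS "OS_SynchroniseCodeAreas",1,code%,code%+255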
Jon Abbott (1421) 2651 posts
You can’t assume the CPU speed when you’re about to do something CPU-intensive. From the tests I’ve done, the CPU speed appears to be either high or low when you press F12 from the desktop; under a Task window, it fluctuates.
I have the low speed set to 100MHz, to fix the blanking issue.
Build date is 15th Feb 2019. Daft question, but why is it worried about the other cores when the OS is running on one? Shouldn’t it only invalidate the current core’s caches and any areas shared between cores? Shouldn’t it also be extended to have an explicit flag for flushing all cores? Any current uses of the SWI will be specific to the core the app/OS is running on.
Jeffrey Lee (213) 6048 posts
The Wimp should set the CPU back to a high speed when entering the command line – so that sounds like a bug.
For OS_SynchroniseCodeAreas 0, I was too lazy/paranoid to make the logic change only take effect once the other cores are started.
With SMP, “any areas shared between cores” is typically almost every page in the system. You’ll want the RMA and other dynamic areas to be shared so that programs don’t have to worry about which core they’re running on before they try to access the code/data held there. And any multi-threaded app will want its application space to be shareable so that it gets full multi-core performance instead of being restricted to running on a single core at a time.

For single-threaded apps it may be feasible to mark the wimpslot non-shareable, but that feels like an optimisation that we should only look into once we’ve got the basics up and running.