ARM generic timer
Cameron Cawley (3514) 157 posts |
Looks good, and I’ll have a go at adding support for this in SDL once this is merged. A couple of comments, though:
- Am I correct in assuming that OS_TimerControl is only useful in modules and single-tasking programs, or can I use it in Wimp applications as well? |
Jeffrey Lee (213) 6048 posts |
Correct – it has the same behaviour as OS_CallAfter/OS_CallEvery.
What type of functionality are you interested in? A version of Wimp_PollIdle which uses the 64 bit timer would be pretty straightforward, but if you want more advanced stuff (calling a specific routine, or waking up your task without waiting for the current task to yield) then that’d be trickier.
Yes, that’s a good idea, and one I was considering (along with an API to allow the hardware timer to be read at its native frequency). But on Cortex-A9 the timer frequency is affected by the CPU clock speed. The kernel can adjust to that fairly easily (since it already has scaling code to convert to microseconds, and to allow the RTC module to fine-tune the OS’s clock rate), but for user software which is expecting a fixed frequency it’d be a bit awkward. Maybe the raw APIs simply won’t be available there, or maybe the kernel/HAL could find the greatest common divisor of all the possible clock rates and claim that the timer runs at that rate. |
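The greatest-common-divisor idea is easy to sketch. The rates below are made-up examples (on Cortex-A9 the private timer runs at a fraction of the CPU clock, so each CPU speed gives a different timer rate); the kernel would advertise the GCD, so every real rate is a whole multiple of what user software sees:

```c
#include <assert.h>
#include <stdint.h>

/* Euclid's algorithm. */
static uint64_t gcd64(uint64_t a, uint64_t b) {
    while (b != 0) { uint64_t t = a % b; a = b; b = t; }
    return a;
}

/* GCD of every possible timer rate: the advertised "fixed" frequency. */
static uint64_t advertised_rate(const uint64_t *rates, int n) {
    uint64_t g = rates[0];
    for (int i = 1; i < n; i++) g = gcd64(g, rates[i]);
    return g;
}
```

With hypothetical rates of 500MHz, 400MHz and 150MHz the advertised rate would be 50MHz; the downside is that the claimed resolution can be much coarser than any of the real rates.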
Stuart Swales (8827) 1357 posts |
I like the idea of presenting a uniform ns-resolution interface, with a QueryPerformanceFrequency-like interface that could be used if needed to gauge the timer granularity on the platform a program is being run on, rather than as a simple divisor. |
Sprow (202) 1158 posts |
I maintain that having the CallAfter/CallEvery type stuff on a similarly named SWI would be better, if only so that they then appear together alphabetically in the StrongHelp manuals. Maybe lump the new registrations into OS_CallTime64 (name TBD)? You wouldn’t need a shadow of OS_RemoveTickerEvent, since that already figures out which list to delete from based on the code address/R12, so it could search the OS_CallTime64 list too. That leaves OS_TimerControl dealing only with things which control the timer configuration.

My earlier 0xWIDE suggestion didn’t stick for OS_CallAfter and OS_CallEvery; I guess because, having looked at the kernel implementation, it doesn’t do any sanity checking on the code address, so it would silently succeed on an old kernel…then explode when the timer elapsed (hmm, the OS should probably do at least a word-aligned check on R1!), so scrap that. How’s about using 0xWIDE as a new input arg for OS_ReadMonotonicTime to save a SWI entry?

Another way of looking at it is: the last core SWI, 0x7F, was added in 2002, and we have 32 spare SWI slots left; using 2 of those would mean that at that rate we’ll run out in 304 years. If truly desperate, I think it’d be safe to recycle the slot for OS_VIDCDivider, since that was only ever on STBs. So my options would be
Nanoseconds to match |
Rick Murray (539) 13839 posts |
This. If a programmer asks to be called in 4.7µs, whatever that is in ns, then the OS will round that up if necessary depending on the actual resolution of the timer. The programmer doesn’t need to do the maths themselves (implementation issues should be an OS problem, not an application one).
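That round-up is a single ceiling division. A minimal sketch, where `timer_hz` is just an illustrative parameter rather than any OS API:

```c
#include <assert.h>
#include <stdint.h>

/* Convert a requested nanosecond delay into hardware timer ticks,
   rounding up so the caller never sleeps for less than it asked.
   Beware: ns * timer_hz can overflow 64 bits for very long delays
   at high rates; a real implementation would split the multiply. */
static uint64_t ns_to_ticks(uint64_t ns, uint64_t timer_hz) {
    return (ns * timer_hz + 999999999ull) / 1000000000ull;
}
```

For example, 4700ns on a 24MHz timer is 112.8 ticks, which rounds up to 113, so the caller waits slightly longer than requested rather than slightly less.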
Agreed. Because it helps to separate the timer control from the timer use, and it also makes it clear that such a facility exists rather than being buried within a SWI whose name isn’t immediately evident as providing such a function.
Does UpdateMEMC do anything these days, or was it quietly retired once machines no longer had an MEMC? Pretty much the only user use that I’m aware of was the infamous UpdateMEMC 64, 64 to speed things up, a trick that only worked on RISC OS 2. Anyway, I like this one. It saves a SWI.
But I wonder if the magic word should be something like “NANO” instead? “WIDE” is, I believe, already used by some palette SWIs. TimerControl could also have a reason code to return info on the platform timers, rather than using OS_Hardware to ask the HAL. |
Martin Avison (27) 1494 posts |
I have used Druck’s TimerMod for years in a couple of modules to provide a microsecond timer, as it provides a SWI Timer_Value which returns monotonic time in seconds (in R0) and microseconds (in R1). Hence my interest in your proposed OS_ReadMonotonicTime64. While I admit I have never measured the time taken by Timer_Value, I was surprised to see that OS_ReadMonotonicTime64 takes nearly 5 times as long as OS_ReadMonotonicTime – although still only 696ns! I would be interested to hear any explanation of the increase. |
Rick Murray (539) 13839 posts |
Off the top of my head having not looked at any code, I would imagine the older SWI just grabs and returns the value of MetroGnome that’s stored in page zero (wherever that happens to be) while the newer 64 bit SWI might actually have to go and ask the HAL, plus potentially fiddle the result so it can return a consistent value. Just a guess… |
Steve Drain (222) 1620 posts |
+1 I found that way of returning the time very easy to work with, although it is not a true 64-bit value. |
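For reference, folding Timer_Value’s two-register result into one true 64-bit count is a one-liner (a sketch of how calling code might do it, not part of TimerMod’s API):

```c
#include <assert.h>
#include <stdint.h>

/* Combine Timer_Value's R0 (seconds) and R1 (microseconds) into a
   single 64-bit microsecond count. */
static uint64_t timer_pair_to_us(uint32_t secs, uint32_t usecs) {
    return (uint64_t)secs * 1000000u + usecs;
}
```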
Martin Avison (27) 1494 posts |
If so, there must be a reason for not creating a MetroGnome64!? |
Stuart Swales (8827) 1357 posts |
At present, MetroGnome is simply updated on the 100Hz clock. It wouldn’t be much use to have a MetroGnome64 updated at that same rate :-). Hence the need to actually interact with the hardware timer on each call (presumably, I haven’t taken a dive into Jeffrey’s code). |
Jeffrey Lee (213) 6048 posts |
It’s still implemented, but the only thing it really does is blank/unblank the screen in response to the “video/cursor DMA enable” bit changing (and when reading the register, the kernel will make sure that bit reflects the blank/unblank state).
Rick’s guess is correct; all that OS_ReadMonotonicTime does is load a word from kernel workspace. So about 140 of those 147ns will be the base overhead of SWI dispatch/return and my test loop. Meanwhile OS_ReadMonotonicTime64 needs to call through to the HAL to read the value from the hardware timer (which for OMAP also requires expanding the timer value from 32 to 64 bits), and then the kernel has to rescale it to microseconds (a couple of long multiplies, shifts, and additions). There’ll also be some extra overheads because the code is spread over a few functions instead of being in one big linear block. Maybe once I’ve got all the platforms working I’ll take a closer look at the performance (especially on IOMD) and see what improvements can be made. |
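The “couple of long multiplies, shifts, and additions” can be sketched as a 32.32 fixed-point scale. Treat this as an illustration of the technique, not the kernel’s actual code:

```c
#include <assert.h>
#include <stdint.h>

/* Rescale a 64-bit tick count to microseconds using a precomputed
   32.32 fixed-point factor of (1e6 / timer_hz) scaled by 2^32.
   Splitting the operand into high and low words keeps each multiply
   within 64 bits for realistic tick counts. */
static uint64_t ticks_to_us(uint64_t ticks, uint64_t factor_32_32) {
    uint64_t hi = (ticks >> 32) * factor_32_32;               /* whole 2^32 chunks */
    uint64_t lo = ((ticks & 0xffffffffull) * factor_32_32) >> 32;
    return hi + lo;
}
```

For a 24MHz timer the factor is (1 << 32) / 24 ≈ 178956971, and one second’s worth of ticks (24,000,000) comes out as 1,000,000µs.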
Cameron Cawley (3514) 157 posts |
In that case, I think it might be best if I stick to the current approach of using pthreads for the timer instead.
Would it be an OK compromise to only scale the timer values on the Cortex-A9? In addition to the benefits I’ve already mentioned, calling code would need to rescale the values anyway if OS_ReadMonotonicTime64 is used to implement SDL_GetPerformanceCounter and SDL_GetPerformanceFrequency, so it would be nice to avoid extra rescaling on lower-end hardware if it causes performance overhead. One additional question: Would it be possible for OS_ReadMonotonicTime64 to be provided via the CallASWI module for older versions of RISC OS? |
Jeffrey Lee (213) 6048 posts |
I’ve been experimenting with the Cortex-A9 timer today. The timer has a built-in clock divider, so you can program it with a value equivalent to the clock multiplier being used to generate the CPU clock, adjusting the timer rate to match the new CPU speed. But that approach still loses or gains some cycles, because there’ll always be a delay between the code updating the CPU clock and the code updating the timer divider. With a test program that constantly flips between high and low CPU speeds, the best I can get the error down to is the timer running about 5% fast. So I think I’m going to go with the approach that FreeBSD uses: one of the normal timers as the main time source (I can use similar code to OMAP3 to extend a timer to 64 bits), with the A9 timer used only for timed interrupts. That way it doesn’t matter too much if the A9 timer has a variable frequency, or gains or loses ticks every so often, because whenever the CPU clock speed changes we can use the normal timer to check how long it’s meant to be until the next interrupt and then reprogram the A9 timer with the right value. So all that to say: the OS should be able to report a fixed timer frequency on Cortex-A9, because it will now be using a fixed-frequency timer.
I think so, yeah. |
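The “extend a timer to 64 bits” technique mentioned above is, in essence, software wrap tracking. A sketch, assuming the function is called (e.g. from a periodic timer interrupt) at least once per counter wrap so no wrap is ever missed:

```c
#include <assert.h>
#include <stdint.h>

/* Extend a free-running 32-bit hardware counter to 64 bits in software.
   Not reentrant as written: real kernel code would read and update the
   state with interrupts disabled. */
static uint64_t timer_hi;      /* accumulated upper word    */
static uint32_t timer_last;    /* last raw reading observed */

static uint64_t extend_to_64(uint32_t raw) {
    if (raw < timer_last)          /* counter wrapped past 2^32 */
        timer_hi += 1ull << 32;
    timer_last = raw;
    return timer_hi | raw;
}
```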
Jeffrey Lee (213) 6048 posts |
I’ve pushed the changes to switch it over to nanosecond resolution instead of microsecond, along with calls to read the raw timer frequency & value, and a few other tweaks/fixes (and i.MX6 support a few days ago). Next will be OMAP4 (a hybrid of the OMAP3 & i.MX6 code), and then flipping a coin over whether I go for all the platforms which use the ARM Generic Timer (which should all be pretty easy), or the two harder platforms (IOMD & Iyonix, where I’ll have headaches due to time loss).

I was surprised to see that OS_ReadMonotonicTime64 takes nearly 5 times as long as OS_ReadMonotonicTime – although still only 696ns! I would be interested to hear any explanation of the increase.

Another data point: the SWI which just calls through to the HAL device and returns the raw, unscaled 64-bit timer takes around 619ns per call. That probably executes only a third of the number of instructions of OS_ReadMonotonicTime64, but it’s only 80ns quicker. So chances are that most of the time is spent in the single instruction that reads the timer value. The interconnect the timer is on is a lot slower than the CPU (100MHz or lower? I can’t find a quick reference for a typical frequency), and there’ll be extra delays due to CPU pipelining, synchronisation between different clock domains, etc. |