ARM generic timer
Jeffrey Lee (213) 6048 posts |
RISC OS currently has a very old-fashioned way of dealing with timers. The HAL timers are expected to be count-down timers which generate interrupts at a fixed (but programmable) rate, and are able to report how long it is until the next interrupt occurs. This is fine as long as all you want is interrupts which occur at a fixed rate, but as soon as you want to try adjusting the rate of the interrupt, or you want to read the raw timer value (to get a high-precision timestamp), things get very complicated because there’s often no way of doing those things without introducing a risk of getting inaccurate results. E.g. if you want to get a high-precision timestamp, you’d somehow need to be able to perform an atomic read of both the timer counter register (HAL_CounterRead), and the OS-maintained “number of timer interrupts seen” counter (OS_ReadMonotonicTime). This is tricky, if not impossible. E.g. if interrupts are currently disabled, there’s no way for the OS to deal with a pending timer interrupt, potentially causing your high-precision timestamp to be 1cs lower than it should be.

This is where the ARM generic timer (and other timers like it) comes into the picture. In brief, the spec requires the timer to be able to run for at least 40 years before wrapping, and for it to operate at a fixed frequency somewhere around 1-50MHz. This makes it a good source for high-precision timestamps, whether you’re measuring short durations or long ones (although the spec also warns that the clock drift can be +/- 10 seconds per 24 hour period, so a software correction layer is likely to be required). And instead of generating interrupts at a fixed frequency, interrupts are generated based around a compare value. So it’s easy for the OS to support both periodic interrupts (just keep adding a constant onto the old compare value) and more variable ones (e.g. OS_CallAfter-style).
There’s also a feature to allow WFE/SEV events to be generated (at a fixed rate), which ARM say could be used as a way of adding timeouts to spinlocks and the like – which is something we’ll surely want to tackle as the SMP work progresses. But it could also be used as a way of solving one of my bugbears (HAL_CounterDelay sits in a busy-wait loop burning CPU cycles because there’s no way for it to schedule an interrupt to wake it up – but if it knows a WFE event is going to arrive at a fixed rate it can sleep with WFE for most of the time and only spin for the last few timer cycles).

There have been a few mentions of FastTickerV recently – my view is that high-frequency periodic interrupts are bad, because a lot of the time the routines which hang off of the timer don’t actually do anything. E.g. MIDI – a 1kHz TickerV is a complete waste when 99.9% of those ticks don’t correspond to any notes being output. A high-precision OS_CallAfter (or more usefully, OS_CallAt) would be a much more useful interface, and timers like the ARM generic timer would make implementation of such functionality trivial.

Potential issues:
|
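The compare-value scheme described above can be sketched in C against a toy model of the hardware. The `model_timer` type and all function names here are hypothetical and are not part of the HAL or the ARM architecture; the only behaviour taken from the thread is that the interrupt is asserted while the counter is at or beyond the compare value.

```c
#include <stdint.h>
#include <stdbool.h>

/* Hypothetical model of a free-running up-counter with a compare register.
   None of these names come from the HAL or from ARM's documentation. */
typedef struct {
    uint64_t count;    /* current counter value */
    uint64_t compare;  /* interrupt is asserted while count >= compare */
} model_timer;

static bool timer_irq_pending(const model_timer *t)
{
    return t->count >= t->compare;
}

/* Periodic tick: step on from the OLD compare value, not from the current
   count, so that interrupt latency never accumulates as drift. */
static void timer_rearm_periodic(model_timer *t, uint64_t period)
{
    t->compare += period;
}

/* One-shot (OS_CallAfter-style): fire 'delay' counts from now. */
static void timer_arm_oneshot(model_timer *t, uint64_t delay)
{
    t->compare = t->count + delay;
}
```

If the handler runs 30 counts late on a 1000-count period, re-arming from the old compare value still lands the next interrupt exactly 1000 counts after the previous deadline.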
nemo (145) 2545 posts |
Then we’re not using RISC OS.
Can that timer be reset? If so, that’s an occasional responsibility of the CallAfter-like functionality. Presumably there will be a modest upper limit on the required delay – even a year is silly, never mind 40. |
Jeffrey Lee (213) 6048 posts |
You can’t reset the timer, but you can adjust the offset which the virtual timer uses (which I’m assuming means you can use that as a reset mechanism). |
Steve Pampling (1551) 8170 posts |
So it wouldn’t work on my venerable Beagle or the not quite v. Pi and not a chance on the amazingly not dead RPC1 – well, there you go sh** happens and I have to look at buying a new board or machine to play with the feature. 1 Last time it was switched off was when we moved house, last time it lost power was a power cut a few weeks ago. As I said not dead yet and the rust is still spinning (lightning bolt of fate starts moving) |
Jeffrey Lee (213) 6048 posts |
A quick look at the OMAP3 TRM reveals that its timers could be used in a similar way to the ARM generic timer. They’re only 32 bits wide, so they’ll wrap more often (ARM timer is at least 56 bits wide), but they do have the all-important compare functionality. Also a little extra care is needed when programming it because the compare needs an exact match on the value for the interrupt state to change, unlike the ARM timer which checks timer_value >= compare_value. The Pi is pretty similar – it does have a full 64 bit timer, but the compare value only checks the low 32 bits (and needs an exact match). Iyonix & IOMD are the tricky ones since they have bare-bones count-down timers which wrap frequently, even with max reload value (Iyonix: about 5.7 minutes, IOMD: 0.03 seconds). |
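As a rough illustration of the "extra care" needed with a wrapping 32-bit counter and an exact-match compare, here is a sketch in C. The `match_timer` model and both function names are made up for the example, and it assumes deadlines are never more than 2^31 counts away from the current count.

```c
#include <stdint.h>
#include <stdbool.h>

/* True if 'now' has reached or passed 'target' on a wrapping 32-bit counter.
   Only valid while the two values are less than 2^31 counts apart.
   (The unsigned-to-signed cast assumes two's complement, as on ARM.) */
static bool reached32(uint32_t now, uint32_t target)
{
    return (int32_t)(now - target) >= 0;
}

/* Hypothetical model of an exact-match timer: the interrupt only fires at
   the instant count == match. */
typedef struct {
    volatile uint32_t count;
    volatile uint32_t match;
} match_timer;

/* Program a match value. Returns false if the counter had already reached
   the target by the time the write landed, in which case the caller should
   treat the event as due now rather than wait for the counter to wrap all
   the way round to the match value again. */
static bool program_match(match_timer *t, uint32_t target)
{
    t->match = target;
    /* Re-read the counter AFTER the write to close the race. */
    return !reached32(t->count, target);
}
```

A timer that checks `timer_value >= compare_value` in hardware does not need the re-read, which is the difference being pointed out above.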
Tristan M. (2946) 1039 posts |
Jeffrey, I agree with what you are saying. Guess which piece of HAL implementation I keep trying to write, get angry, then delete? The HAL_Counter code. The blocking nature of it bothered me. I even tried temporarily removing one of the timers from my HAL and using it as a counter. The basic idea was to set the desired value and a completion interrupt, then park the core with a WFI. Whenever it was woken up the code would check whether the interrupt was for the counter. If it wasn’t it’d go back to sleep. It didn’t quite work. I could see a lot of reasons why it mightn’t, but it was worth a try. Regarding the Pi timer, I looked at that one too, trying to find a template that I felt comfortable with. I was kind of surprised it only used the bottom 32 bits. Trouble is I’m rubbish at arithmetic and couldn’t do any better. I toyed with the idea of passing the values over to the VFP as 64 bit ints and letting it do the legwork, but I’m not sure that’s a great idea. I’m just stating my observations here. Making something reasonably accurate that doesn’t mess with other things is a bit of an issue. |
Tristan M. (2946) 1039 posts |
On thinking about it I opted to use the generic counter for my HAL_Counter and just made the low 32 available to HAL_CounterRead and made HAL_CounterDelay use the full 64 bits. I just opted to do it the simple way of calculating the 64 bit delta, adding it to the initial counter value and polling until the counter reaches / bypasses it. I’m not too concerned about issues with counter wraparound. |
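A polling delay along these lines might look like the following C sketch. It is a variant rather than the code described above: instead of comparing against an absolute end value it compares elapsed counts, which stays correct across a counter wrap. `read_counter_fn`, `counter_delay` and the simulated counter are all hypothetical names for illustration.

```c
#include <stdint.h>

/* Hypothetical: returns the full 64-bit counter value. */
typedef uint64_t (*read_counter_fn)(void);

/* Busy-wait for at least 'microseconds', given the counter rate in Hz. */
static void counter_delay(read_counter_fn read_counter,
                          uint32_t freq_hz, uint32_t microseconds)
{
    uint64_t start = read_counter();
    /* A 32x32 -> 64 multiply cannot overflow; round up so that the wait
       is never shorter than requested. */
    uint64_t ticks = ((uint64_t)freq_hz * microseconds + 999999u) / 1000000u;

    /* Unsigned subtraction gives the elapsed count even if the counter
       has wrapped since 'start' was read. */
    while (read_counter() - start < ticks) {
        /* spin */
    }
}

/* Stand-in counter for trying this on a desktop: advances one count per read. */
static uint64_t sim_count;
static uint64_t sim_read(void) { return sim_count++; }
```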
Jeffrey Lee (213) 6048 posts |
Adding, subtracting & comparing 64bit (or wider) numbers is pretty straightforward with ARM. Multiplication is the only tricky one (although for the machine you’re working on, you will at least have 32×32 → 64 multiply & accumulate. So it’s only true 64×64 multiply which will cause problems, if you need it).

    ADDS  RdLo,RnLo,RmLo
    ADC   RdHi,RnHi,RmHi

    SUBS  RdLo,RnLo,RmLo
    SBC   RdHi,RnHi,RmHi

    ; comparison of unsigned numbers
    CMP   RnHi,RmHi
    CMPEQ RnLo,RmLo

    ; comparison of signed numbers - this essentially adds a bias of 1<<63, essentially making it an unsigned comparison (and requires you to use the condition codes that would be used as if it were an unsigned comparison)
    ; this is necessary because although a signed compare of the high half is possible, a signed compare of the low half would be wrong because the low half doesn't store a sign bit
    ; (n.b. untested!)
    SUBS  temp,RmHi,RnHi
    CMPNE temp,#&80000000
    CMPEQ RnLo,RmLo
|
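The same ideas can be checked on a desktop by writing them out in C on pairs of 32-bit words. This is only an illustrative model of the carry/borrow and bias techniques described above, not a translation of the assembler; the `u64_pair` type and function names are invented for the example.

```c
#include <stdint.h>
#include <stdbool.h>

typedef struct { uint32_t lo, hi; } u64_pair;

static u64_pair split64(uint64_t v)
{
    u64_pair r = { (uint32_t)v, (uint32_t)(v >> 32) };
    return r;
}

static uint64_t join64(u64_pair p)
{
    return ((uint64_t)p.hi << 32) | p.lo;
}

/* Like ADDS/ADC: add the low words, then carry into the high words. */
static u64_pair add64(u64_pair a, u64_pair b)
{
    u64_pair r;
    r.lo = a.lo + b.lo;
    r.hi = a.hi + b.hi + (r.lo < a.lo);   /* carry out of the low word */
    return r;
}

/* Like SUBS/SBC: subtract the low words, then borrow from the high words. */
static u64_pair sub64(u64_pair a, u64_pair b)
{
    u64_pair r;
    r.lo = a.lo - b.lo;
    r.hi = a.hi - b.hi - (a.lo < b.lo);   /* borrow from the high word */
    return r;
}

/* Unsigned a < b: the high words decide unless they are equal. */
static bool less_u64(u64_pair a, u64_pair b)
{
    return a.hi != b.hi ? a.hi < b.hi : a.lo < b.lo;
}

/* Signed a < b: flipping the top bit of each high word applies the 1<<63
   bias, after which an unsigned comparison gives the signed ordering. */
static bool less_s64(u64_pair a, u64_pair b)
{
    a.hi ^= 0x80000000u;
    b.hi ^= 0x80000000u;
    return less_u64(a, b);
}
```

Comparing the results against the compiler's native 64-bit operators is a quick way to test any hand-written multi-word routine before porting it to assembler.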
nemo (145) 2545 posts |
Yup. |
Tristan M. (2946) 1039 posts |
Jeffrey, that’s more or less what I did before my previous post. I’m not going to post it (although it is on my Git repo) because there are some silly mistakes which I realised. When I was reading the ARM ARM it appears that accessing the generic counter via the coprocessor interface is preferable to doing it via the memory mapped interface, so that’s what I did. Definitely not pioneering code :) I do still feel that busy waiting is kind of awful. Using the counter can’t fix that. Using one of the system timers or one of the CPU core timers with WFE as Jeffrey suggested or WFI does seem much friendlier if done right. |
nemo (145) 2545 posts |
Correction came there none. For the record, a 64bit signed comparison requires only two instructions and a temporary register:
A 96bit comparison extrapolates the principle:
|
Tristan M. (2946) 1039 posts |
Me? Haven’t had time to do anything useful beyond getting everything to build again. Haven’t even pushed that yet. Your example is logical. I knew mine was clunky and strange. It’s what I’d call a first pass or rough draft. |
nemo (145) 2545 posts |
Not you mate. |
Jeffrey Lee (213) 6048 posts |
Considerations for designing/implementing the API:
|
Jeffrey Lee (213) 6048 posts |
After thinking about this last night & checking over the docs, I think it should be just about possible to support the required functionality on IOMD & Iyonix. As previously stated, both platforms use simple count-down timers which trigger an interrupt on hitting zero and then reload with a pre-programmed reload value. But you can also force them both to reload while they’re in the middle of counting down.

For Iyonix, you can trigger a reload by directly writing to the counter value register. So we could leave the reload register at maximum, and then manually poke the counter register with whatever new countdown value we want to program. With the reload register set at maximum this would give the OS 20 seconds to react and reprogram the timer before it hits zero again, so there shouldn’t be any danger of interrupts being lost. All that’s needed is some logic that can track the “lost” clock cycles that occur when reprogramming the timer – if it reads the timer just before it writes the new value then that should be enough for the code to make a good guess as to how many cycles might be lost between the read & write operation (either we can work out the instruction cycle timing and hardcode a value, or we can try using another time source like the cycle count performance counter).

For IOMD, a reload can be manually triggered by writing to the GO register. But it’ll only reload with the value that was programmed in the reload register, so for short timer periods there’s a chance that the OS will react too slowly and miss some interrupts. So for safety we’ll probably want to limit how low the reload value can be set – perhaps 1ms, since a 1kHz timer seems to be a common request. For detecting how many clock cycles are “lost” during the reprogramming phase, we can rely on the fact that the timers are continuously running, and peek at the other timer to see how much time it thinks has passed.
Also both Iyonix & IOMD allow the CPU to detect if there’s a pending timer interrupt, so it should be possible for the timers to be used to provide a high-frequency 64bit monotonic timer. (although since IOMD will always be using a low reload value, it’ll be susceptible to inaccuracies if the interrupt handler is blocked for too long) |
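One way the pending-interrupt check could be used to build a 64-bit monotonic value from a count-down timer is sketched below in C. Everything here is an assumption for illustration: the `countdown_hw` model, the convention that the counter reads between 1 and `period`, and the rule that no more than one interrupt is ever outstanding (which is exactly the IOMD caveat mentioned above).

```c
#include <stdint.h>
#include <stdbool.h>

/* Hypothetical model of the hardware registers. */
typedef struct {
    volatile uint32_t countdown;  /* counts remaining until the next reload */
    volatile bool     pending;    /* timer interrupt raised, not yet serviced */
} countdown_hw;

typedef struct {
    uint32_t period;  /* counts per reload period */
    uint64_t base;    /* counts in all periods the handler has accounted for */
} soft_clock;

/* Interrupt handler: account for one completed period. */
static void clock_irq(soft_clock *c, countdown_hw *hw)
{
    c->base += c->period;
    hw->pending = false;
}

/* Read the 64-bit time; intended to be safe with interrupts disabled. */
static uint64_t clock_read(const soft_clock *c, const countdown_hw *hw)
{
    bool before, after;
    uint32_t remaining;

    /* Sample the pending flag either side of the counter read. If the flag
       changed, the timer reloaded mid-read, so sample again. */
    do {
        before    = hw->pending;
        remaining = hw->countdown;
        after     = hw->pending;
    } while (before != after);

    uint64_t now = c->base + (c->period - remaining);
    if (after)
        now += c->period;  /* a full period elapsed that 'base' has not seen */
    return now;
}
```

If the handler is blocked for more than one period the pending flag can no longer say how many reloads were missed, and the value will fall behind.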
Andrew Conroy (370) 740 posts |
Please excuse the possibly stupid question from a mere user, but would this high resolution timer be available to users from e.g. BASIC? There’s currently the TimerMod module by Druck which claims to offer a microsecond timer for when code needs a greater resolution than centiseconds. Would this timer be replacing that? |
Jeffrey Lee (213) 6048 posts |
Yes – there’ll be a standard SWI interface to complement OS_ReadMonotonicTime, OS_CallAfter, etc.
The aim is for it to be an alternative, rather than a replacement. Each machine has one or more timers which are exposed by the HAL; timer 0 is used by the OS for the centisecond timer, while the others are free for other programs (with the drawback that RISC OS 5 doesn’t have a way for programs to coordinate access to the extra timers). The plan is for the new timer system to also use timer 0 (with the usual centisecond timer derived from it), keeping the other HAL timers free for other programs to continue to use. |
Andrew Conroy (370) 740 posts |
Great, thanks! |
nemo (145) 2545 posts |
I’ve been thinking about this, and thought I must have misread it, but I can’t think of anything that would be affected by timer 0 being a different frequency as long as the 100Hz Ticker and (simulated) VSync stay the same. To what were you referring? |
Jeffrey Lee (213) 6048 posts |
The NetTime / RTC modules fine-tune the reload value in order to ease the soft RTC back in to sync with real time – that obviously won’t work very well if the reload/compare value is being changed by the kernel on every timer interrupt. For third-party software, the HAL counter / timer APIs are included in the set of HAL entry points that were publicly documented by Castle around the release of the Iyonix. So any software which uses HAL_TimerReadCountdown (for timer 0) or HAL_CounterRead to get sub-centisecond timing values is going to get confused when the timer starts to get used for more than just the 100Hz ticker. The USB drivers contain code which does this, so it is at least feasible that third-party software might try doing the same as well (although I suspect most would just rely on TimerMod since that’ll work on more OS versions). |
Andrew Conroy (370) 740 posts |
I’m guilty of doing this when I wanted something that was stand-alone and didn’t rely on a third-party module! It was fairly rough, but good enough. |
Rick Murray (539) 13839 posts |
Would it not be possible to run the underlying ticker at, say, 1000Hz and simply synthesise the 100Hz ticker by incrementing every ten fast ticks? After all, the point of a HAL is to present a unified API abstracted from the physical hardware, so as long as the 100Hz stuff appears to run at 100Hz…?

BTW, shouldn’t that USB code be calling HAL_CounterRate instead of just assuming the following?

    /* conversion to ns, assume counter is for 1 cs */
    ns_factor = 10000000 / max_count;
|
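The divide-by-ten idea is small enough to sketch in C. The `tick_divider` type and `fast_tick` function are hypothetical names, and the 1000Hz rate is just the figure suggested in the question.

```c
#include <stdint.h>
#include <stdbool.h>

#define FAST_TICKS_PER_CS 10u   /* assumed 1000Hz fast tick -> 100Hz ticker */

typedef struct {
    uint32_t fast_ticks;    /* fast ticks seen in the current centisecond */
    uint32_t monotonic_cs;  /* synthesised centisecond counter */
} tick_divider;

/* Call on every fast tick. Returns true when a 100Hz tick is due, i.e.
   when the centisecond-rate routines should be run. */
static bool fast_tick(tick_divider *d)
{
    if (++d->fast_ticks < FAST_TICKS_PER_CS)
        return false;
    d->fast_ticks = 0;
    d->monotonic_cs++;
    return true;
}
```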
Jeffrey Lee (213) 6048 posts |
It should be trivial for the HAL or OS to provide an implementation of the counter API that conforms to the current spec. But since the current spec is a bit light on details (e.g. no guarantees as to how long the counter will run until wrapping), any code which wants to do “long” duration timing (like the USB drivers) is likely to assume that the counter is derived from timer 0 (and that timer 0 is only being used for TickerV). So changing how the counter is implemented, or changing how the OS uses timer 0, is likely to break that code (e.g. the code from the USB drivers – it’s only getting by without needing to call HAL_CounterRate because it’s making the assumption that counter == timer 0 == OS_ReadMonotonicTime).

Emulating that kind of counter behaviour is likely to require some help from the OS, since the OS will be the only component that knows when the 100Hz ticks occur. Not a deal breaker, just a bit of extra compatibility baggage for the kernel to carry around. |
Jon Abbott (1421) 2651 posts |
Why not virtualise HAL timer 0 and return the values it would ordinarily return? I’m assuming the timer API will allow multiple timers. On IOMD we have 2MHz countdown timers, so we could just go for the lowest common denominator and provide something similar but at a lower rate. It possibly doesn’t matter if they’re count up or count down, as the main use is probably to trigger code (OS_CallAfter, OS_CallEvery) or allow RTSupport to offer more granularity. For reading the time since the last timer event, you could just have a separate universal counter like OS_ReadMonotonicTime. For the timers themselves, you can probably get away with only being able to read/set the time to the next event. Depending on how concerned we are about accurate timing, there might be some value in passing the current time in registers when a timer event is triggered. This saves issuing a SWI to get the current time, if that’s something that timer driven code needs. One of RTSupport’s quirks is the fact you have to use OS_ReadMonotonicTime to reschedule, and with multiple claimants those SWIs soon add up. |
Jeffrey Lee (213) 6048 posts |
That’s basically what we’ll have to do, yes.
I have a feeling that most routines won’t care about what the time currently is, only about what the time should have been (i.e. the time the event was scheduled). So I’m thinking that on entry the event routine will be passed the time the event was scheduled to occur, and on exit it can return the time the event should occur next (or zero to disable the event). This means that TickerV could just add The interrupt handler will just sit in a loop peeking at the front of the event queue and processing events until it runs out of ones which are in the past. So if the system is temporarily overloaded, you’ll get a few events occurring back-to-back as it catches up (e.g. TickerV might get called several times). This also means the code will be reading the current time before it calls each event handler – so we could easily pass in both the current time and the scheduled time, and let the event decide which ones it wants to make use of. |
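The queue-draining loop described above could be sketched in C roughly as follows. The types, the function names and the convention of a sorted singly linked list are all assumptions made for the example; the only parts taken from the post are that a handler receives the scheduled time (and the current time) and returns the next time it wants to run, or zero to disable itself.

```c
#include <stdint.h>
#include <stddef.h>

typedef struct timed_event timed_event;

/* Returns the time the event should next occur, or 0 to disable it. */
typedef uint64_t (*event_fn)(timed_event *ev, uint64_t scheduled, uint64_t now);

struct timed_event {
    uint64_t     when;   /* time the event is scheduled for */
    event_fn     fn;
    void        *ctx;
    timed_event *next;
};

/* Insert an event, keeping the queue sorted with the earliest first. */
static void queue_insert(timed_event **head, timed_event *ev)
{
    while (*head && (*head)->when <= ev->when)
        head = &(*head)->next;
    ev->next = *head;
    *head = ev;
}

/* Interrupt-handler body: run every event whose time is in the past,
   re-reading the current time before each call. */
static void queue_dispatch(timed_event **head, uint64_t (*read_time)(void))
{
    for (;;) {
        timed_event *ev = *head;
        uint64_t now = read_time();
        if (!ev || ev->when > now)
            break;
        *head = ev->next;
        uint64_t next = ev->fn(ev, ev->when, now);
        if (next != 0) {
            ev->when = next;
            queue_insert(head, ev);
        }
    }
}

/* Example handler: a periodic ticker that reschedules itself relative to
   the time it was SCHEDULED for, so it catches up after an overload. */
typedef struct { uint64_t period; unsigned calls; } ticker_ctx;

static uint64_t ticker_handler(timed_event *ev, uint64_t scheduled, uint64_t now)
{
    ticker_ctx *t = ev->ctx;
    (void)now;
    t->calls++;
    return scheduled + t->period;
}

/* Stand-in time source for experimenting on a desktop. */
static uint64_t sim_now;
static uint64_t sim_time(void) { return sim_now; }
```

With a 10-count ticker first due at time 10 and the clock already at 35, a single dispatch calls the handler three times back-to-back and leaves it queued for time 40.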