RTC puzzle
Jeffrey Lee (213) 6048 posts |
After observing some undesirable behaviour, I’ve been spending some time trying to decipher the logic that the RTC module uses to adjust the frequency of the 100Hz ticker. The algorithm it uses is here – which looks like it was cribbed from NetTime, which in turn cribbed it from RTCAdjust. However, none of them document exactly why the algorithm is the way it is, so to work out what it’s doing (and why it’s wrong in some situations) I ended up having to derive the equations myself and simulate the behaviour in BASIC.
The basic aim of the algorithm is to “close the gap” between the OS’s idea of the time (which I’ll call time_OS) and the real time (which I’ll call time_RTC). That’s what the (P+C)/P term is for: C is the current difference between time_OS and time_RTC, and P is the duration (in OS clock ticks) that you want to spend closing the gap. Given those inputs, it’ll output a value which acts as a multiplier for the timer period. If the clock crystal used to generate the 100Hz ticker was fully accurate, (P+C)/P is the only term you’d need – your algorithm for calculating the new timer period can just be new_period = default_period * (P+C)/P.
But the problem is that the clock crystal used for the 100Hz ticker isn’t guaranteed to be accurate (and that’s the problem that RTCAdjust was created to solve). It may run faster or slower than the period you’ve programmed it for, and it’ll probably vary over time in response to temperature. This is where the P’/(P’-C+W) term comes in. A clearer way of writing it would be delta_OS / delta_RTC: it’s the speed of the OS’s clock relative to the RTC, which you can then use to correct for the error in the clock crystal used for the 100Hz ticker.
When the algorithm is executed it knows the value of delta_OS (it’s the time since the algorithm last ran – P’). delta_RTC isn’t tracked directly – instead it’s calculated from P’, the current error value (C), and the error value from P’ ticks in the past (W).
With those values, delta_RTC is just the number of OS clock ticks plus the difference in the two error values (i.e. P’+(W-C), or P’-C+W after rearranging). Also, with the delta_OS / delta_RTC term added to the equation, we no longer want to be using default_period as the base ticker value – we want all our adjustments to be made relative to the previously programmed period value (because we’re measuring everything relative to that value, instead of relative to the assumed-to-be-accurate default_period). Now that that’s out of the way, we can discuss the flaws with how the algorithm has been integrated into the RTC module.
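As a rough illustration of the two terms described above, here is a minimal Python sketch (not the module’s actual C code). The function and parameter names are made up for this example, the unit of the period value is left abstract, and it assumes C and W are positive when the OS clock is ahead of the RTC, which is the sign convention implied by delta_RTC = P’-C+W.

```python
def next_timer_period(prev_period, c, w, p_elapsed, p_target):
    """Combine the gap-closing term and the crystal-correction term.

    prev_period -- previously programmed timer period (unit is an assumption)
    c           -- current error, time_OS - time_RTC, in OS clock ticks (C)
    w           -- error measured p_elapsed ticks ago, in OS clock ticks (W)
    p_elapsed   -- OS ticks since the algorithm last ran (P')
    p_target    -- OS ticks over which the gap should be closed (P)
    """
    delta_os = p_elapsed
    delta_rtc = p_elapsed - c + w  # P'-C+W
    if p_target <= 0 or delta_rtc <= 0:
        raise ValueError("periods must be positive")
    close_gap = (p_target + c) / p_target     # (P+C)/P
    crystal_correction = delta_os / delta_rtc  # delta_OS / delta_RTC
    return prev_period * close_gap * crystal_correction


HOUR = 360_000  # one hour expressed in 100Hz ticks

# No error and no drift: the period is left alone.
print(next_timer_period(10_000, 0, 0, HOUR, HOUR))
# OS is 1 second ahead but not drifting: lengthen the period slightly.
print(next_timer_period(10_000, 100, 100, HOUR, HOUR))
# OS gained 36 ticks over the last hour: both terms lengthen the period.
print(next_timer_period(10_000, 36, 0, HOUR, HOUR))
```

A longer period means a slower ticker, so both terms push the same way when the OS clock is ahead and running fast.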
I think the easiest solutions to these problems would be as follows:
Can anyone spot any flaws in the above? |
Steve Pampling (1551) 8172 posts |
Can you check the logic in the NTP module in FreeBSD or similar, because I could swear NTP doesn’t do large scale corrections in one go. I believe it introduces a proportionate correction value that will probably take a number of iterations (double digits) to achieve near parity on a minute or so delta. Clear priority of sources would be good and NTP time source should be highest, but failure to connect to a server should mark the source as invalid. |
Chris Hall (132) 3558 posts |
I have a set up where the time is derived from a GPS signal as the only external reference. This has some rather strange behaviour in that some NMEA messages include only the time (in GMT) and others include both time and date (in ddmmyy format, note with no century). If the GPS battery has gone flat then it starts counting from midnight on 6th January 2080 (which it might think is 6th January 1980) but if the battery is healthy, then the time and date will be correct. You can only be sure it is correct after it has confirmed a satellite position fix (which can be inferred from other fields in the NMEA message). My SatNav software carefully excludes any date with bit 39 of the 5 byte time set (i.e. after 04:28:56 18-Mar-2074) and if this bit is not set will set the RISC OS time and will generate Wimp messages about clock error (between RISC OS time and satellite time) in centiseconds. Satellite time is derived from extremely accurate atomic clocks.
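The bit 39 check described above can be sketched as follows. This is a minimal Python illustration, not the SatNav software’s code: the helper names are hypothetical, and it assumes the usual RISC OS 5-byte time format (little-endian centiseconds since 00:00:00 on 1 January 1900, UTC).

```python
from datetime import datetime, timedelta

RISCOS_EPOCH = datetime(1900, 1, 1)  # assumed epoch of the 5-byte time (UTC)


def from_datetime(dt):
    """Encode a naive UTC datetime as a 5-byte little-endian centisecond count."""
    delta = dt - RISCOS_EPOCH
    cs = (delta.days * 86_400 + delta.seconds) * 100 + delta.microseconds // 10_000
    return cs.to_bytes(5, "little")


def to_datetime(raw):
    """Decode a 5-byte time back into a naive UTC datetime."""
    if len(raw) != 5:
        raise ValueError("expected exactly 5 bytes")
    return RISCOS_EPOCH + timedelta(milliseconds=10 * int.from_bytes(raw, "little"))


def bit39_clear(raw):
    """True if the top bit of the 5-byte value is clear (date accepted)."""
    return (int.from_bytes(raw, "little") >> 39) & 1 == 0


# First value that the filter rejects, decoded in UTC.
print(to_datetime((1 << 39).to_bytes(5, "little")))
print(bit39_clear(from_datetime(datetime(2019, 3, 6))))  # a sane date
print(bit39_clear(from_datetime(datetime(2080, 1, 6))))  # flat-battery default
```

The decoded threshold is in UTC, so its time of day can differ from a locally displayed figure. Note that this test alone would not reject a date of 6th January 1980, which is why the position fix remains the real confirmation.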
The Witty Pi 2 has a RISC OS-compatible RTC chip fitted but has a slight idiosyncrasy: an unexpected power on will set the clock to January 2100. Hope this helps.
“NTP time source should be highest”
Arguably GPS time source should be highest. |
Rick Murray (539) 13850 posts |
Minor correction: should mark the source as invalid but will retry later. Something that really irritates me with Android’s email client is that if there is a single connection error (not unlikely if it syncs every 15 minutes using the mobile network) then rather than flagging an error and trying again later, it’ll throw in the towel completely and tell you your login password is wrong. Grrrrrrrrrr! Software not built for resilience! |
David Pitt (3386) 1248 posts |
A puzzle can be seen on my Titanium, which has a not noticeably accurate battery backed RTC. Network time is configured but the downside is that the initial correction applied by !Boot is in the wrong direction. The next time check 30 minutes later gets it right and spends the next 30 minutes undoing the first 30 minutes’ error. Given that a second NetTime call is correct and to avoid this wasted hour of even more inaccurate time my boot sequence issues a delayed
This example is from OS5.24 without my correction bodge. The Titanium was switched off overnight. Previous OS5’s and OS5.27 all do the same.
*FX0
RISC OS 5.24 (16 Apr 2018)
*
*NetTime_Status
Current time: Wednesday, 6 March 2019 07:01:10.09
Status: Sleeping
Last adjustment: 25 seconds ago
Last delta: 90.321594 seconds fast
Last server: ntp.plus.net
Last protocol: SNTP
Poll interval: 30 minutes
*NetTime_Status
Current time: Wednesday, 6 March 2019 07:32:23.42
Status: Sleeping
Last adjustment: 1 minute 38 seconds ago
Last delta: 126.139601 seconds fast
Last server: ntp.plus.net
Last protocol: SNTP
Poll interval: 30 minutes
*NetTime_Status
Current time: Wednesday, 6 March 2019 08:05:24.93
Status: Sleeping
Last adjustment: 4 minutes 40 seconds ago
Last delta: 89.944136 seconds fast
Last server: ntp.plus.net
Last protocol: SNTP
Poll interval: 30 minutes
*NetTime_Status
Current time: Wednesday, 6 March 2019 08:43:09.46
Status: Sleeping
Last adjustment: 12 minutes 24 seconds ago
Last delta: 53.729533 seconds fast
Last server: ntp.plus.net
Last protocol: SNTP
Poll interval: 30 minutes |
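To make the pattern in those figures easier to see, here is a small Python sketch that takes the four “Last delta” values from the output above and prints the change between successive polls. The variable and function names are just for illustration.

```python
# "Last delta" values from the four *NetTime_Status reports, in seconds fast.
deltas_fast_s = [90.321594, 126.139601, 89.944136, 53.729533]


def successive_changes(deltas):
    """Change in the reported delta from one poll to the next.

    A positive change means the clock ended up further ahead of network
    time than before; a negative change means the gap is closing.
    """
    return [later - earlier for earlier, later in zip(deltas, deltas[1:])]


for change in successive_changes(deltas_fast_s):
    direction = "further ahead" if change > 0 else "closing the gap"
    print(f"{change:+.3f} s over one poll interval ({direction})")
```

The first interval moves the clock roughly 36 seconds further ahead, and each later interval pulls it back by about the same amount, which matches the description of the first correction going the wrong way.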
Jeffrey Lee (213) 6048 posts |
“the second invocation of the algorithm uses data from NetTime to try and correct for a 10 minute difference between the network time and the OS’s time”
RISC OS’s RTC module will jump immediately to the new time if the correction to be applied is larger than ~5 minutes (so saying “10 minute difference” in my example wasn’t a good choice). Otherwise it’ll try to apply the correction over the ‘P’ period (1 hour for the RTCAdjust-alike logic, half an hour for NetTime?).
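The step-or-slew decision described above could be sketched like this in Python. The threshold constant, function name and return format are all assumptions made for the example; the real module’s threshold is only described here as “~5 minutes”.

```python
# Assumed threshold: "~5 minutes", expressed in centiseconds (100Hz ticks).
STEP_THRESHOLD_CS = 5 * 60 * 100


def plan_correction(error_cs, slew_period_cs):
    """Decide how to apply a clock correction of error_cs centiseconds.

    Returns ("step", error_cs) when the error is too large to slew, or
    ("slew", per_tick) where per_tick is the fraction of a tick that has
    to be absorbed on every tick of the slew period.
    """
    if slew_period_cs <= 0:
        raise ValueError("slew period must be positive")
    if abs(error_cs) > STEP_THRESHOLD_CS:
        return ("step", error_cs)
    return ("slew", error_cs / slew_period_cs)


HOUR_CS = 60 * 60 * 100

print(plan_correction(10 * 60 * 100, HOUR_CS))  # 10 minutes out: jump
print(plan_correction(90 * 100, HOUR_CS))       # 90 seconds out: slew over P
```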
I’ve seen similar inaccuracies on my BeagleBoard – I have a feeling that all OMAP/TI machines have inaccurate battery-backed RTCs. So an extra thing I thought of last night would be to add the ability to flag the battery-backed RTC as “low accuracy”, so that instead of having the RTC module try to adjust the 100Hz ticker to match the frequency of the RTC, the system can work in the other direction: Adjust the frequency of the RTC to try and match the 100Hz ticker. I’d have to check the docs, but I think the RTCs do include fine-tuning controls to allow the OS to do this.
Probably some combination of problems 1-3 from my post. (edit: plus there’s currently a bug which can allow low-priority time sources to override high-priority ones – although I’m not sure if it’ll be triggered in your case) We may also want to review the situations in which the “correct” time value is written back to the battery-backed RTC – currently I feel that it doesn’t do it often enough. |
Rick Murray (539) 13850 posts |
It’s a shame there’s no way for NetTime to force a time correction at boot, so if one can contact the server the machine can start off with a good known value.
? There’s no RTC on my Beagle xM, it’s an add on. Which raises an interesting question – what’s going wrong? I don’t expect old machines to keep exact time, but my A5000 (etc) wasn’t ever synced to anything and the time wasn’t so far out that it would cause annoyance. Are these things using cheap Chinese crystals? Is this the timing equivalent of capacitor blight? |
Sprow (202) 1158 posts |
You might want to read up on PID control loops; I’m pretty sure that’ll unlock what the original (as you say, literally translated from RTCAdjust) is trying to do. That’ll also explain things like P’ (which I read as pea-prime) notation; that’s a differential term. That said, it’s possible I didn’t adapt it to cover some of the edge cases you highlight. The priority logic should generally latch onto the highest “quality” clock source and ignore all the others, so there shouldn’t be a great deal of switching in and out, which in turn would be why it seemed to behave like RTCAdjust did (except written in C) when tested. Suggestions 1-5 in your list look sensible. Priority levels are intended to be unique, allocated values. |
Jeffrey Lee (213) 6048 posts |
“I have a feeling that all OMAP/TI machines have inaccurate battery-backed RTCs.”
The RTC is always there (it’s part of the power control chip), the battery is the add-on. Without a battery fitted, the OS will ignore the RTC time on boot (it knows it’s garbage), but it will program it when e.g. NetTime sends the (massive) clock correction request to the RTC module. So if you were to then hit the reset button on the board then the OS should retain that time value. I guess this also means that even people who aren’t explicitly trying to use the onboard RTC may be punished by its clock drift. Although maybe the RTC uses a different clock crystal if main power is available (i.e. the large drift only occurs when it’s running from the battery).
Thanks – I wasn’t really sure what search term to use. |
Rick Murray (539) 13850 posts |
Ah yes, I remember now – thinking that missing a little battery was an odd omission….
Is there no 32kHz xtal around? I can’t look, I’m standing in a field in the rain on the way to feed a furry… :-) At any rate there really ought to be just the one time source. Anything else would be “complicated”. It’s a shame it’s such rubbish at keeping time. |
Steve Pampling (1551) 8172 posts |
“but failure to connect to a server should mark the source as invalid.”
Yup, I was feeding cats and diving out of the house and missed the end of the required text: “to retry after a defined interval” should have finished it.
I thought Android had a recheck interval much longer than that. It does use the *.pool.ntp.org setup [1] though, as opposed to the Apple insistence on time.apple.com and variants in the 17.0.0.0/8 range.
[1] I think it’s actually something like 2.android.pool.ntp.org |
Steve Pampling (1551) 8172 posts |
The highest quality source would almost always be NTP. The exceptions being when the server is unreachable because the network stack isn’t yet operational or the network connection is down for some other reason.
Perhaps the NTP source being valid should be reason to mark everything else as invalid? |
Jeffrey Lee (213) 6048 posts |
“large drift only occurs when it’s running from the battery)”
According to the docs, the RTC is driven by an external 32kHz crystal (or an external digital 32kHz clock – but the BB-xM uses the crystal). There’s also mention of it being synchronised to the main high-frequency clock, when available (i.e. when main power is available). There’s a drift compensation register which can allow the RTC to adjust for +/- 1 second of error every hour – which should be more than adequate. The crystal used in the BB-xM is accurate to +/-20ppm at nominal temperature (about 1.7 seconds per day), so drifting by 2 or more seconds per day away from nominal temperature isn’t out of the question. I’m not sure if that’s enough to account for the drift that I’ve seen (bear in mind that this is half-remembered events from years ago). Meanwhile, the 26MHz crystal which drives the main clocks is (nominally) accurate to +/-25ppm, and seems to age worse – so in theory the 32kHz crystal should be the more accurate one, if only by a small margin. |
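For reference, the ppm figures above convert to seconds of drift as in this short Python sketch. The function names are illustrative; the only inputs are the numbers quoted in the post (+/-20ppm, +/-25ppm, and a compensation range of +/-1 second per hour).

```python
SECONDS_PER_DAY = 86_400
SECONDS_PER_HOUR = 3_600


def drift_seconds_per_day(ppm):
    """Worst-case drift per day for a crystal with the given ppm tolerance."""
    return ppm * 1e-6 * SECONDS_PER_DAY


def ppm_from_seconds_per_hour(seconds):
    """Express an error of `seconds` per hour as parts per million."""
    return seconds / SECONDS_PER_HOUR * 1e6


print(f"20 ppm -> {drift_seconds_per_day(20):.3f} s/day")  # 32kHz crystal
print(f"25 ppm -> {drift_seconds_per_day(25):.3f} s/day")  # 26MHz crystal
print(f"1 s/hour -> {ppm_from_seconds_per_hour(1):.1f} ppm")  # compensation range
```

A compensation range of 1 second per hour is well over ten times the crystal tolerance, which is consistent with the “more than adequate” remark.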
Dave Higton (1515) 3534 posts |
Jeffrey: I power up and use my BBxM for typically about 4 hours daily, but I notice that, at power-up, Alarm shows a time roughly a minute in advance of reality. This is gradually corrected (I couldn’t tell you how soon it comes back, though), and long before time to power down, it’s correct. Every day it all repeats. Do you see anything like that? How would I diagnose whether the RTC is running ridiculously fast, or whether the clock is being set wrongly at power-up? |
Jeffrey Lee (213) 6048 posts |
The best way of diagnosing what’s going on with the hardware RTC would probably be to use a custom ROM which spits the raw (BCD-encoded) data out over the serial port whenever the RTC is read/written. Then a second machine (with a trustworthy clock) can add its own timestamps to each line of data as it comes in. From that it should be possible to see whether any errors are down to the RTC or the OS. There’ll also be an error of up to 1 second when reading the RTC, and a drift of up to 1 second when writing it, because 1 second is the smallest unit that’s exposed over the I2C interface. This is the same as with the chips used in Acorn-era machines, but is something worth bearing in mind when thinking about keeping the clocks accurate. So for accurate RTC timestamps we’d either need to resort to polling loops, or the alarm interrupt feature (if supported), which would allow us to record monotonic time that corresponds to when a new RTC second starts. |
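The polling-loop option mentioned above might look something like the following Python sketch. `read_rtc_seconds` is a hypothetical callable standing in for whatever reads the seconds field from the RTC; nothing here is taken from an existing driver.

```python
import time


def wait_for_rtc_second_edge(read_rtc_seconds, monotonic=time.monotonic, timeout=2.0):
    """Poll the RTC until its seconds value changes.

    Returns (new_value, lower, upper): the start of the new RTC second lies
    somewhere between the monotonic timestamps `lower` and `upper`.
    """
    deadline = monotonic() + timeout
    lower = monotonic()            # taken before the first read
    last_value = read_rtc_seconds()
    while True:
        before = monotonic()
        value = read_rtc_seconds()
        after = monotonic()
        if value != last_value:
            # The previous read still saw the old second, so the edge is
            # no earlier than `lower` and no later than `after`.
            return value, lower, after
        if after >= deadline:
            raise TimeoutError("RTC seconds value did not change")
        lower = before
```

The width of the returned interval is bounded by how long two reads take, so over I2C it would be far smaller than the 1 second resolution of the register itself. Passing `monotonic` as a parameter also makes it easy to exercise the loop with a fake clock.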
Rick Murray (539) 13850 posts |
If it’s as bad as some seem to imply, even with an error margin of a second… ;-) Does the OMAP’s RTC not have a clock output? I recall the PCF-thingy output a square wave at 32.768kHz, 1.024kHz (and some slower ones) which could be fed into an oscilloscope to tweak the capacitor to calibrate the oscillator. There doesn’t appear to be hardware calibration (big-ass varicaps don’t look good on a board filled with surface-mount!), however the TPS65950 appears to have drift compensation and time correction – so that could be a place to look, perhaps? Damn – just looking at the TPS65950 datasheet. It does a lot. There’s a 32KCLKOUT signal that is connected to the OMAP’s SYS_32K so it doesn’t look directly accessible. If I understand the datasheet, the chip uses its internal 32kHz source when powered, reverting to the external crystal when not powered… Is this correct? Or does RISC OS set it to always use the external crystal? Ah – here’s a PDF on the drift compensation – http://www.ti.com/lit/an/swca024/swca024.pdf |
Jeffrey Lee (213) 6048 posts |
SYS_32K is the source for the OMAP’s 32kHz counter, so using that to perform the drift compensation calculations should be relatively straightforward.
My reading of the docs suggests that it always uses the external 32kHz source. When main power is available, the (digitised) 32kHz clock will be synchronised with a higher-frequency clock, but I believe the only purpose that serves is to stop the 32kHz clock from transitioning in the middle of a high-frequency clock tick – the synchronisation doesn’t do anything to correct for inaccuracies in the clock. |
Dave Higton (1515) 3534 posts |
You need something other than an oscilloscope to calibrate frequency that accurately – specifically a frequency counter. I have two ‘scopes, one of which is a crystal controlled DSO, but I don’t have a counter. (And even a DSO with crystal sampling timebase can’t be used for this job.) |
Dave Higton (1515) 3534 posts |
I just looked at the BBxM schematic that I have. The 32 kHz clock circuit is as most of these sorts of things are: a 32 kHz crystal and two capacitors connected to the main chip. The capacitors are fixed. The oscillator will be a simple little stage internal to the main chip. There is a buffered 32 kHz output, which is useful as of course connecting a probe to either side of the crystal would shift the oscillation frequency significantly. |
Jeffrey Lee (213) 6048 posts |
The RTC module fixes are now in CVS. They should result in a noticeable improvement in clock accuracy when NetTime is in use – previously the first clock correction would typically be completely wrong (e.g. over-correcting, or going in the wrong direction), meaning it would take an hour for the system to arrive at the correct time (i.e. the end of the second adjustment period). Problems which haven’t been fixed:
|
Chris Hall (132) 3558 posts |
If, for example, a hardware RTC chip sees its battery discharge so that it adopts its own default time when the computer is next started up (which can be Jan 1970, Jan 2100, Jan 2080 or Jan 1900 depending on the chip) then how well does RISC OS and/or the new RTC module handle this? |
Jeffrey Lee (213) 6048 posts |
The behaviour hasn’t been changed in this new version. The behaviour will also vary depending on what type of RTC you have. Some of the drivers will check for the clock being stopped/invalid and return an error (causing the OS to ignore the RTC and use the kernel’s default of 1970 until a new time gets written to the chip). Other drivers don’t check for the clock being stopped/invalid, so you’ll end up with whatever default value the RTC chip uses (these drivers should probably be fixed, since for most/all of the chips I think it is possible to detect when the time is invalid). |
Chris Evans (457) 1614 posts |
When this happens could the OS spot such dates, give a non-blocking error message and set the year to something more sensible? e.g. 11 years prior to the OS’s build date. I say 11 years as I think you still want it to be obviously wrong. |
Rick Murray (539) 13850 posts |
Please tell me you don’t click up, up, up, up (repeat a thousand times) in Alarm?!?
|
Chris Evans (457) 1614 posts |
Thanks for that Rick. I knew you could set everything from BASIC but it is so easy to get the syntax slightly wrong and everything is then ignored. I’m hoping now that *Set Sys$Year 2019 is easy enough to remember, though as I may not need to use it for several months…. |