RISC OS Open: Forum: Is there a known issue with OS_CallAfter?

Aug 10, 2015 7:25am

Rick Murray (539) 13840 posts

I have a problem where a module that is supposed to do something at periodic intervals…doesn’t.
It is almost as if sometimes OS_CallAfter forgets.

Has anybody come across something like this before?

Aug 10, 2015 12:06pm

Jon Abbott (1421) 2651 posts

You can’t rely on callbacks to work as they depend on many factors, such as (IIRC) the Supervisor stack being clear, IRQsema being clear, IRQs being enabled and the SWI handler exiting to User mode.

I had no end of issues with it in ADFFS and had to write my own callback handler in the end. You could use TickerV, OS_CallEvery (I think this hangs off TickerV) or RTSupport instead.

Aug 10, 2015 12:19pm

Rick Murray (539) 13840 posts

CallAfter, not CallBack. This is the “happens on time but don’t mess with anything or the OS will blow up” version.

Aug 10, 2015 12:31pm

Jeffrey Lee (213) 6048 posts

Are you checking for any errors returned by OS_CallAfter? If you’re calling it from an interrupt context then it’s possible there isn’t enough free memory in the RMA (or is it the system heap?) for it to add the entry (and you can’t resize dynamic areas from IRQ handlers).

If the SWI isn’t returning an error, then it’s possible you’re encountering this TickerV corruption bug, which I’m yet to get to the bottom of (I tried adding some debug code to check for it, but either my initial test case wasn’t very good or it’s a very timing-sensitive issue and the addition of the debug code has caused it to not happen).

I’m also starting to wonder whether some of the issues people are reporting with NetTime – like your case of it being > 1 day since the last correction – are also a symptom of the TickerV corruption bug.

If you have an easy-to-repro test case (i.e. one that happens after a few days rather than one that happens after a few weeks) then I’d love to hear about it!

Aug 10, 2015 1:22pm

Rick Murray (539) 13840 posts

My server module (started at boot) often fails to respond until it has had a “kick”, though sometimes the kick fails. Thing is, it is an entire application. I’ve not tried to narrow it down.

In your other post, you said “It looks like somehow the kernel’s ticker event chain is becoming corrupted. […] so both it and everything that was located after it in the schedule weren’t being processed.” Do you have a tool to dump the contents of the ticker chain? If the thing stops responding (or NetTime), I could look at the chain to see if it made sense? If it does, then the problem lies elsewhere…

Aug 10, 2015 8:05pm

Jeffrey Lee (213) 6048 posts

Do you have a tool to dump the contents of the ticker chain?

DebugTools has a *Tickers command for that very purpose. However the output wasn’t particularly human-readable (it was basically just a raw dump of the internal ticker chain format), so I’ve now tweaked it to make it a bit better. Grab a build of the latest version of the module here

If the ‘Repeat’ time for an entry is lower than the ‘In …’ heading then that’s a sign that you’ve been struck by the bug.
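That rule of thumb can be written as a small check. This is only a sketch: the struct and function names are hypothetical (they are not part of DebugTools), and it assumes the ‘In’ and ‘Repeat’ figures have already been read off the *Tickers output as unsigned centisecond counts.

```c
#include <stdbool.h>
#include <stdint.h>

/* One *Tickers entry, reduced to the two numbers that matter.
   Hypothetical type for illustration; not a DebugTools structure. */
typedef struct {
    uint32_t due_in_cs;  /* the "In N cs" heading */
    uint32_t repeat_cs;  /* "Repeat - Every N cs", or 0 for a one-shot CallAfter */
} ticker_entry;

/* A repeating entry should never be further away than its own interval.
   One-shot entries (repeat_cs == 0) cannot be judged by this rule. */
bool ticker_entry_looks_corrupt(const ticker_entry *e)
{
    return e->repeat_cs != 0 && e->due_in_cs > e->repeat_cs;
}
```

With figures posted later in this thread, an entry of "In 4294551774 cs" with "Repeat - Every 2 cs" is flagged, while "In 1 cs" with "Every 2 cs" is not.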

Aug 10, 2015 8:39pm

Rick Murray (539) 13840 posts

This is strange.

My module isn’t listed with *Tickers. There are two CallAfter events that are always running. One every 50cs to check for incoming connections. I know this is running as the server module just answered a connection from my phone. The other, some temporary code to ‘kick’ the listener socket every 180000cs (30 minutes). I can see this has been working (it reports to DADebug when it ‘kicks’).

Called with code like:

reply = SetPending((int)module_kickcallafter, (int)wsp, 180000);
kick_ca_ispending = TRUE;

(wsp is a copy of the “pw” (privateword) passed during the module initialisation)

and:

; Entry: R0 = Pointer to routine
;        R1 = Module private word
;        R2 = Delay (centiseconds)
SetPending
       STMFD   R13!, {R1-R3, R14}
       MOV     R3, R2     ; remember delay
       MOV     R2, R1     ; private word in R2
       MOV     R1, R0     ; address in R1
       MOV     R0, R3     ; delay in R0
       SWI     XOS_CallAfter
       MOVVC   R0, #0
       LDMFD   R13!, {R1-R3, PC}

I have added some extra code to write to DADebug if the reply from CallAfter is non-NULL; though this is normally called in either SVC or USR mode – with the sole exception of the kick routine, everything else needs filesystem access; so the CallAfter schedules a CallBack from which it is safe to do stuff without the dreaded FileCore in use.
Anyway, no error messages (as yet).

So, interesting. Why isn’t my module showing up in the list? ;-)

Aug 10, 2015 9:58pm

Martin Avison (27) 1494 posts

DebugTools has a *Tickers

Just tried that on Iyo with RO5.22, and the last entry was
In 649151161 cs:
Code – Address &FC055018 is at offset &00001554 in module TerritoryManager
Wkspc – &20002114

and I was wondering why on earth TerritoryManager would need a call after about 375 days ?!

Aug 10, 2015 10:15pm

Jeffrey Lee (213) 6048 posts

and I was wondering why on earth TerritoryManager would need a call after about 375 days ?!

Daylight savings time. (And unless I’m mistaken, that value comes out as 75 days, not 375)
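For anyone wanting to check the arithmetic, a minimal sketch of the conversion follows. The function name is made up for illustration; the only fact used is that there are 100 centiseconds in a second.

```c
#include <stdint.h>

/* Centiseconds in one day: 100 cs/s * 60 s * 60 min * 24 h. */
#define CS_PER_DAY 8640000u

/* Whole days represented by a centisecond count (rounds down). */
uint32_t cs_to_whole_days(uint32_t cs)
{
    return cs / CS_PER_DAY;
}
```

Feeding in 649151161 gives 75 whole days, which matches the figure above.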

Aug 10, 2015 11:16pm

Rick Murray (539) 13840 posts

Well, that didn’t take long.

*Tickers
Ticker event claimants...
 In 4294551774 cs:
  Code   - Address &FC3A8494 is at offset &000004F8 in module EtherUSB
  Wkspc  - &20148854
  Repeat - Every 2 cs
 In 4294551777 cs:
  Code   - Address &FC2D31A4 is at offset &00000704 in module ScreenBlanker
  Wkspc  - &20040E14
  Repeat - Every 20 cs
 In 4294551778 cs:
  Code   - Address &20252C84 is at offset &000103CC in CoolSwitch module's workspace
  Wkspc  - &FB407C84
 In 4294551783 cs:
  Code   - Address &FC1E3AC8 is at offset &000002F0 in module DWCDriver
  Wkspc  - &20015DA0
  Repeat - Every 11 cs
 In 4294551794 cs:
  Code   - Address &201DE288 is at offset &000007F4 in module CJEPower
  Wkspc  - &2013BC74
 In 4294551803 cs:
  Code   - Address &2032685C is at offset &000E3FA4 in CoolSwitch module's workspace
  Wkspc  - &FB407DEC
  Repeat - Every 101 cs
 In 4294551833 cs:
  Code   - Address &FC14E6D8 is at offset &00003E90 in module WindowManager
  Wkspc  - &20003B94
  Repeat - Every 100 cs
 In 4294552194 cs:
  Code   - Address &FC39404C is at offset &000006A8 in module LanManFS
  Wkspc  - &FB407644
  Repeat - Every 4500 cs
 In 4294554676 cs:
  Code   - Address &FC1D8100 is at offset &000006F8 in module USBDriver
  Wkspc  - &2000D0B4
  Repeat - Every 3001 cs
 In 4294578517 cs:
  Code   - Address &2024635C is at offset &00003AA4 in CoolSwitch module's workspace
  Wkspc  - &FB407BBC
 In 4294670114 cs:
  Code   - Address &20252C54 is at offset &0001039C in CoolSwitch module's workspace
  Wkspc  - &FB407C84
 In 4294893971 cs:
  Code   - Address &FC2D1A84 is at offset &0000029C in module RTC
  Wkspc  - &FB406F8C
  Repeat - Every 360000 cs

The non-existent server was also ‘dead’; kicking it revived it and appears to have revived part of CoolSwitch, although some of the earlier when-it-worked timings are a bit odd.

This is sort of what it should look like, from around 9pm this evening:

*Tickers
Ticker event claimants...
 In 1 cs:
  Code   - Address &FC3A8494 is at offset &000004F8 in module EtherUSB
  Wkspc  - &20148854
  Repeat - Every 2 cs
 In 2 cs:
  Code   - Address &FC2D31A4 is at offset &00000704 in module ScreenBlanker
  Wkspc  - &20040E14
  Repeat - Every 20 cs
 In 4 cs:
  Code   - Address &FC1E3AC8 is at offset &000002F0 in module DWCDriver
  Wkspc  - &20015DA0
  Repeat - Every 11 cs
 In 8 cs:
  Code   - Address &201DE288 is at offset &000007F4 in module CJEPower
  Wkspc  - &2013BC74
 In 36 cs:
  Code   - Address &20252D04 is at offset &0001044C in CoolSwitch module's workspace
  Wkspc  - &FB407C84
 In 38 cs:
  Code   - Address &FC14E6D8 is at offset &00003E90 in module WindowManager
  Wkspc  - &20003B94
  Repeat - Every 100 cs
 In 76 cs:
  Code   - Address &2032685C is at offset &000E3FA4 in CoolSwitch module's workspace
  Wkspc  - &FB407DEC
  Repeat - Every 101 cs
 In 267 cs:
  Code   - Address &FC1D8100 is at offset &000006F8 in module USBDriver
  Wkspc  - &2000D0B4
  Repeat - Every 3001 cs
 In 999 cs:
  Code   - Address &FC39404C is at offset &000006A8 in module LanManFS
  Wkspc  - &FB407644
  Repeat - Every 4500 cs
 In 96992 cs:
  Code   - Address &20252CD4 is at offset &0001041C in CoolSwitch module's workspace
  Wkspc  - &FB407C84
 In 124589 cs:
  Code   - Address &2024635C is at offset &00003AA4 in CoolSwitch module's workspace
  Wkspc  - &FB407BBC
 In 261776 cs:
  Code   - Address &FC2D1A84 is at offset &0000029C in module RTC
  Wkspc  - &FB406F8C
  Repeat - Every 360000 cs

Aug 10, 2015 11:21pm

Rick Murray (539) 13840 posts

BTW, would this explain why sometimes it seems as if Wimp_PollIdle “gives up” and just behaves like regular Wimp_Poll?

Aug 11, 2015 12:26pm

Rick Murray (539) 13840 posts

Okay, I have “found” my module in the tickers list:

*Tickers
Ticker event claimants...
 In 1 cs:
  Code   - Address &FC3A8494 is at offset &000004F8 in module EtherUSB
  Wkspc  - &20148854
  Repeat - Every 2 cs
 In 10 cs:
  Code   - Address &FC2D31A4 is at offset &00000704 in module ScreenBlanker
  Wkspc  - &20040E14
  Repeat - Every 20 cs
 In 10 cs:
  Code   - Address &FC1E3AC8 is at offset &000002F0 in module DWCDriver
  Wkspc  - &20015DA0
  Repeat - Every 11 cs
 In 12 cs:
  Code   - Address &FC14E6D8 is at offset &00003E90 in module WindowManager
  Wkspc  - &20003B94
  Repeat - Every 100 cs
 In 27 cs:
  Code   - Address &201DE2E8 is at offset &000007F4 in module CJEPower
  Wkspc  - &2013BC74
 In 46 cs:
  Code   - Address &202DBB24 is at offset &000E474C in CoolSwitch module's workspace
  Wkspc  - &FB407DC4
 In 1908 cs:
  Code   - Address &FC1D8100 is at offset &000006F8 in module USBDriver
  Wkspc  - &2000D0B4
  Repeat - Every 3001 cs
 In 2639 cs:
  Code   - Address &202DBAF4 is at offset &000E471C in CoolSwitch module's workspace
  Wkspc  - &FB407DC4
 In 4072 cs:
  Code   - Address &FC39404C is at offset &000006A8 in module LanManFS
  Wkspc  - &FB407644
  Repeat - Every 4500 cs
 In 46571 cs:
  Code   - Address &2024AC3C is at offset &00053864 in CoolSwitch module's workspace
  Wkspc  - &FB407BE4
 In 224343 cs:
  Code   - Address &FC2D1A84 is at offset &0000029C in module RTC
  Wkspc  - &FB406F8C
  Repeat - Every 360000 cs
*MyTickerDebug
Addresses:
  Socket CallAfter         = &202DBB24
  Socket CallBack          = &202DBB0C
  LTask  CallAfter         = &202DBB18
  LTask  CallBack          = &202DBB00
  Kicker CallAfter         = &202DBAF4
Status:
  Socket CallAfter pending = Yes
  Socket CallBack pending  = No
  LTask  CallAfter pending = No
  LTask  CallBack pending  = No
  Kicker CallAfter pending = Yes
*

Interesting. Zap just apologised that it didn’t have enough memory to provide me with a copy of CoolSwitch’s workspace (I have about 180MiB free).
I have killed CoolSwitch, and my module is appearing correctly in the tickers list now.

Aug 11, 2015 12:34pm

Rick Murray (539) 13840 posts

Right, I’ve removed CoolSwitch from my boot and my module is showing up in the tickers list. I’ll leave it running a while to see if the ticker chain times muck up.

While I’m doing this, I might change the current behaviour (a CallAfter that schedules a CallBack; a CallBack that does the work and then schedules a new CallAfter at the end) to simply be a CallEvery that will schedule a CallBack if one isn’t already pending…
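The "CallEvery that schedules a CallBack if one isn’t already pending" idea might look roughly like this. It is a portable sketch, not the module’s code: all names are hypothetical, and request_callback() is a stand-in that only counts requests where a real module would ask the OS for a callback (for example via OS_AddCallBack).

```c
#include <stdbool.h>

/* Set while a callback has been requested but has not yet run. */
volatile bool callback_pending = false;

/* Counts requests so the sketch can be exercised off-target.
   Stand-in only; a real module would request an OS callback here. */
int callbacks_requested = 0;

void request_callback(void)
{
    callbacks_requested++;
}

/* Called on every CallEvery tick: keep it tiny, only queue the work. */
void ticker_handler(void)
{
    if (!callback_pending) {
        callback_pending = true;
        request_callback();
    }
}

/* Runs later, when it is safe to touch the filesystem. */
void callback_handler(void)
{
    /* ... do the real work here ... */
    callback_pending = false;
}
```

The flag means that ticks arriving while a callback is still outstanding are simply dropped rather than queued up.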

BTW – there is a clash with the DDE and DebugTools. If the module is loaded at boot prior to the DDE being ‘seen’ by the filer, the DDE will fail to boot, syntax error re. *Canonical.

Aug 11, 2015 12:58pm

Jeffrey Lee (213) 6048 posts

Well, that didn’t take long.

Yep, that looks pretty broken!

Not sure why I didn’t spot these issues when I first looked at the kernel’s CallAfter/CallEvery code, but here are two I’ve spotted:

There’s a potentially dangerous interrupt hole in the code that deals with CallEvery’s (line 172-ish). If the user’s routine is naughty and returns back to the kernel with IRQs enabled then it will re-insert the ticker node correctly (due to InsertTickerEvent disabling IRQs locally), but then when it goes to check to see if there’s another event which needs firing it will be doing that check with IRQs enabled – potentially leading to bad things if there’s another timer interrupt during that time.
I think the change in 2013 to treat the times as unsigned will have made this problem even worse – if there are two events which are due to fire at the same time, then when the first event is removed from the list in order to be executed, the second event will be left there, with a TickNodeLeft of 0 (internally the ticker chain stores the delta time between each entry in the list – so that the kernel only has to update the time remaining of the first list entry instead of walking through the whole list each tick). If the first event re-enables IRQs during its processing, and a timer IRQ fires, then the code at line 148 will subtract 1 from 0 (the TickNodeLeft of the second event), store back the result of &FFFFFFFF, and then exit without firing the event. I think that’s a pretty safe bet for what happened to you last night (But unfortunately I don’t think it’s what happened to me when I was able to reproduce it – the times I was seeing from *Tickers weren’t anywhere near as high).

If my condition codes are correct then swapping the MOVNE pc, lr for MOVHI pc, lr should fix the unsigned wrap around problem. And the interrupt hole should be fixable by moving the “IRQ’s off again” at line 193 down to after the 10 label (for something like this it’s best for the kernel to play it safe and not assume the user’s routine restored interrupt state correctly).
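The difference between the two condition codes can be modelled in a few lines of C. This is a toy model of the decrement-and-test step as described in this post, not the kernel source, and the function names are invented. It relies on the ARM rule that after a flag-setting subtract, NE means "result non-zero" and HI means "no borrow and result non-zero".

```c
#include <stdbool.h>
#include <stdint.h>

/* Result of taking one tick off the head node, with unsigned wrap:
   0 - 1 becomes 0xFFFFFFFF. */
uint32_t tick_decrement(uint32_t left)
{
    return left - 1u;
}

/* Old test, modelling MOVNE pc, lr: return early unless the result is zero. */
bool returns_early_ne(uint32_t left)
{
    return tick_decrement(left) != 0u;
}

/* Proposed test, modelling MOVHI pc, lr: return early only if the subtract
   did not borrow (carry set) and the result is non-zero. */
bool returns_early_hi(uint32_t left)
{
    bool carry_set = left >= 1u;  /* no borrow when subtracting 1 */
    return carry_set && tick_decrement(left) != 0u;
}
```

With a head value of 0, the NE version returns early (so nothing fires and the value wraps), whereas the HI version does not return early and carries on to the event-firing path. For any value of 1 or more the two tests agree.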

AFAIK the issues with SmartReflex/TickerV only started getting reported in 2014, after the unsigned time changes in 2013 – so rather than doing any more exhaustive testing myself, maybe it would be enough to make those changes, check them in and wait to see if people report any further issues. (The other alternative I can think of would be to make a testbed containing a copy of the kernel’s ticker code and manually call it recursively at various points to see what issues pop up – but if I’m fixing the only interrupt hole I can spot then I don’t really know where else I’d be putting the checks!)

BTW, would this explain why sometimes it seems as if Wimp_PollIdle “gives up” and just behaves like regular Wimp_Poll?

Not sure – I think the Wimp maintains the time itself for Wimp_PollIdle.

Aug 11, 2015 1:04pm

Jeffrey Lee (213) 6048 posts

BTW – there is a clash with the DDE and DebugTools. If the module is loaded at boot prior to the DDE being ‘seen’ by the filer, the DDE will fail to boot, syntax error re. *Canonical.

Interesting – perhaps the DDE provides its own version of that command?

There’s also *Where, which is now present in both the Debugger and DebugTools (Debugger’s version is better).

Aug 11, 2015 1:31pm

Rick Murray (539) 13840 posts

Yep, that looks pretty broken!

;-) For the moment, it is behaving. For the moment.

That said, stuff thinking CoolSwitch had an infinite workspace indicates that something was wrong!

Not sure why I didn’t spot these issues when I first looked at the kernel’s CallAfter/CallEvery code,

Okay, hands up every programmer that has had a moment like that.
<raises hand>

Sometimes, the best way to deal with (potentially) problematic code is to leave it, do other stuff, and come back a while later.

If my condition codes are correct then swapping […]

Thanks. I’ll keep an eye on the CVS and do a diff to modify my kernel. I’m running a build from last October (as I never got CVS to work for me and it is really time consuming unpacking the gzip) because I depend upon CE(S)T in the UK territory, and CLib dealing with it correctly.
Which reminds me, I really ought to build some test case software to bash localtime() on every OS version I can lay my hands upon.

Interesting – perhaps the DDE provides its own version of that command?

AcornC/C++.!SetPaths.Lib32.canonical (utility)

Aug 11, 2015 2:53pm

Colin (478) 2433 posts

I never got CVS to work for me

I have a directory containing !cvs and the taskobey file ‘cvsfetch’ as shown below

dir <obey$dir>
set UnixEnv$cvs$sfix ""
set alias$cvs <obey$dir>.!CVS.Bin.cvs %%0
echo "Started"
cvs -d :pserver:anonymous:@riscosopen.org:/home/rool/cvsroot co mixed/RiscOS/Sources/HWSupport/USB/NetBSD
echo "Finished"

The path after ‘co’ is the path of the directory you want to download. Just double click on cvsfetch and the path is downloaded to the same directory as cvsfetch.

Aug 11, 2015 3:03pm

Jeffrey Lee (213) 6048 posts

I think the recommendation is to use ‘cvs -z9’ to ensure the connection is compressed (to save ROOL on some bandwidth costs).

With a bit of work it’s also possible to get ROOL’s Perl CVS scripts working under RISC OS: https://www.riscosopen.org/forum/forums/5/topics/313?page=3#posts-5842

Aug 11, 2015 3:18pm

Rick Murray (539) 13840 posts

Thanks. I wrapped that in a TaskWindow call, and added “-z 9”, and it worked. ;-)

But… A question and an observation. The question – can I download the Pi build, or does this mean checking out everything piece-by-piece-by-piece?

The observation – remember when I said it was not entirely realistic to end the ZeroPain trial period at the end of the year? Well, “cvs” crashes unless alignment faults are disabled, so such old software as that is still kicking around.

Aug 11, 2015 3:27pm

Jeffrey Lee (213) 6048 posts

The question – can I download the Pi build, or does this mean checking out everything piece-by-piece-by-piece?

That’s what the Perl scripts are for – see here for some slightly crappy instructions on how to get them (this guide is perhaps a bit better, but it assumes you’re setting up for write access)

If you have the Perl scripts you can just do “checkout BCM2835Dev” to grab all the components (although, updating an existing source tree is a bit trickier)

Aug 11, 2015 3:28pm

Colin (478) 2433 posts

I think you need Jeffrey’s Perl scripts for that. I think they read the components file and module database that !builder uses to automate a build fetch. I just fetch the odd part via CVS.

Aug 11, 2015 4:49pm

Rick Murray (539) 13840 posts

Just had a quick look at the ticker handler code and I noticed that the handler is called using BLX – does this suggest that the code could be written in Thumb?

Aug 11, 2015 5:13pm

Jeffrey Lee (213) 6048 posts

Yes, although there are no guarantees that it will work, or which OS versions it will work on – the BLX was added there as a micro-optimisation to save one instruction, not with the goal of enabling Thumb support.

Even without the BLX, Thumb could work on ARMv7 anyway due to all branches being interworking.

Aug 12, 2015 8:29am

Rick Murray (539) 13840 posts

Yep, that looks pretty broken!

Happened again this morning, about half an hour after (yet another brief) power cut.

The huge values – while BASIC and RISC OS itself very helpfully say “Number too big” when attempting to convert 4294551774 into hex, going the other way via OS_ConvertCardinal4 tells me that &FFFFFFFF is 4294967295 – so if they all have values “about like this” as their trigger time, it appears as if the times somehow got erroneously calculated away from zero, to give a massive number.
Would it not be possible to raise some behaviour for delta values that are excessively high (over &7FFFFFFF), as that would equate to something like 250 days until the next repeat?

I’m going to apply your changes to my kernel, I’ll get back to you on its behaviour. Fingers crossed! ;-)

Aug 12, 2015 8:50am

Colin (478) 2433 posts

Would it not be possible to raise some behaviour for delta values that are excessively high (over &7FFFFFFF), as that would equate to something like 250 days until the next repeat.

There’s no need with the changes. The high values were caused when the head of the list had a delta value of 0. After 1 was taken from it it became &FFFFFFFF – a valid high value. The seemingly random high numbers you are getting are the &FFFFFFFF value being counted down at cs intervals. Jeffrey’s changes will avoid this happening.
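If that explanation is right, the figures in a broken *Tickers dump also say roughly when the wrap happened. A small sketch, assuming the head value has simply been decremented once per centisecond since it became &FFFFFFFF (the function name is hypothetical):

```c
#include <stdint.h>

/* Centiseconds elapsed since a head value wrapped from 0 to 0xFFFFFFFF,
   assuming one decrement per centisecond tick since then. */
uint32_t cs_since_wrap(uint32_t head_now)
{
    return UINT32_MAX - head_now;
}
```

For the first entry in the broken dump (4294551774 cs) this gives 415521 cs, a little over 69 minutes, so under that assumption the chain went wrong about 69 minutes before the dump was taken.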

Aug 10, 2015 7:25am Rick Murray (539) 13840 posts	I have a problem where a module that is supposed to do something at periodic intervals…doesn’t. It is almost as if sometimes OS_CallAfter forgets . Has anybody come across something like this before?

Aug 10, 2015 12:06pm Jon Abbott (1421) 2651 posts	You can’t rely on callbacks to work as they’re depended on many factors, such as (IIRC) the Supervisor stack being clear, IRQsema being clear, IRQ’s being enabled and the SWI handler exiting to User mode. I had no end of issues with it in ADFFS and had to write my own callback handler in the end. You could use TickerV, OS_CallEvery (I think this hangs off TickerV) or RTSupport instead.

Aug 10, 2015 12:19pm Rick Murray (539) 13840 posts	CallAfter, not CallBack. This is the “happens on time but don’t mess with anything or the OS will blow up” version.

Aug 10, 2015 12:31pm Jeffrey Lee (213) 6048 posts	Are you checking for any errors returned by OS_CallAfter? If you’re calling it from an interrupt context then it’s possible there isn’t enough free memory in the RMA (or is it the system heap?) for it to add the entry (and you can’t resize dynamic areas from IRQ handlers). If the SWI isn’t returning an error, then it’s possible you’re encountering this TickerV corruption bug, which I’m yet to get to the bottom of (I tried adding some debug code to check for it, but either my initial test case wasn’t very good or it’s a very timing-sensitive issue and the addition of the debug code has caused it to not happen). I’m also starting to wonder whether some of the issues people are reporting with NetTime – like your case of it being > 1 day since the last correction – are also a symptom of the TickerV corruption bug. If you have an easy-to-repro test case (i.e. one that happens after a few days rather than one that happens after a few weeks) then I’d love to hear about it!

Aug 10, 2015 1:22pm Rick Murray (539) 13840 posts	My server module (started at boot) often fails to respond until it has had a “kick”, though sometimes the kick fails. Thing is, it is an entire application. I’ve not tried to narrow it down. In your other post, you said “It looks like somehow the kernel’s ticker event chain is becoming corrupted. […] so both it and everything that was located after it in the schedule weren’t being processed.” Do you have a tool to dump the contents of the ticker chain? If the thing stops responding (or NetTime), I could look at the chain to see if it made sense? If it does, then the problem lies elsewhere…

Aug 10, 2015 8:05pm Jeffrey Lee (213) 6048 posts	Do you have a tool to dump the contents of the ticker chain? DebugTools has a *Tickers command for that very purpose. However the output wasn’t particularly human-readable (it was basically just a raw dump of the internal ticker chain format), so I’ve now tweaked it to make it a bit better. Grab a build of the latest version of the module here If the ‘Repeat’ time for an entry is lower than the ‘In …’ heading then that’s a sign that you’ve been struck by the bug.

Aug 10, 2015 8:39pm Rick Murray (539) 13840 posts	This is strange. My module isn’t listed with `Tickers`. There are two CallAfter events that are always running. One every 50cs to check for incoming connections. I know this is running as the server module just answered a connection from my phone. The other, some temporary code to ‘kick’ the listener socket every 180000cs (30 minutes). I can see this has been working (it reports to DADebug when it ‘kicks’). Called with code like: `reply = SetPending((int)module_kickcallafter, (int)wsp, 180000); kick_ca_ispending = TRUE;` (wsp* is a copy of the “pw” (privateword) passed during the module initialisation) and: `; Entry: R0 = Pointer to routine ; R1 = Module private word ; R2 = Delay (centiseconds) SetPending STMFD R13!, {R1-R3, R14} MOV R3, R2 ; remember delay MOV R2, R1 ; private word in R2 MOV R1, R0 ; address in R1 MOV R0, R3 ; delay in R0 SWI XOS_CallAfter MOVVC R0, #0 LDMFD R13!, {R1-R3, PC}` I have added some extra code to write to DADebug if the reply from CallAfter is non-NULL; though this is normally called in either SVC or USR mode – with the sole exception of the kick routine, everything else needs filesystem access; so the CallAfter schedules a CallBack from which it is safe to do stuff without the dreaded FileCore in use. Anyway, no error messages (as yet). So, interesting. Why isn’t my module showing up in the list? ;-)

Aug 10, 2015 9:58pm Martin Avison (27) 1494 posts	DebugTools has a Tickers Just tried that on Iyo with RO5.22, and the last entry was In 649151161 cs: Code – Address &FC055018 is at offset &00001554 in module TerritoryManager Wkspc – &20002114 and I was wondering why on earth TerritoryManager would need a call after about 375 days* ?!

Aug 10, 2015 10:15pm Jeffrey Lee (213) 6048 posts	and I was wondering why on earth TerritoryManager would need a call after about 375 days ?! Daylight savings time. (And unless I’m mistaken, that value comes out as 75 days, not 375)

Aug 10, 2015 11:16pm Rick Murray (539) 13840 posts	Well, that didn’t take long. Tickers Ticker event claimants... In 4294551774 cs: Code - Address &FC3A8494 is at offset &000004F8 in module EtherUSB Wkspc - &20148854 Repeat - Every 2 cs In 4294551777 cs: Code - Address &FC2D31A4 is at offset &00000704 in module ScreenBlanker Wkspc - &20040E14 Repeat - Every 20 cs In 4294551778 cs: Code - Address &20252C84 is at offset &000103CC in CoolSwitch module's workspace Wkspc - &FB407C84 In 4294551783 cs: Code - Address &FC1E3AC8 is at offset &000002F0 in module DWCDriver Wkspc - &20015DA0 Repeat - Every 11 cs In 4294551794 cs: Code - Address &201DE288 is at offset &000007F4 in module CJEPower Wkspc - &2013BC74 In 4294551803 cs: Code - Address &2032685C is at offset &000E3FA4 in CoolSwitch module's workspace Wkspc - &FB407DEC Repeat - Every 101 cs In 4294551833 cs: Code - Address &FC14E6D8 is at offset &00003E90 in module WindowManager Wkspc - &20003B94 Repeat - Every 100 cs In 4294552194 cs: Code - Address &FC39404C is at offset &000006A8 in module LanManFS Wkspc - &FB407644 Repeat - Every 4500 cs In 4294554676 cs: Code - Address &FC1D8100 is at offset &000006F8 in module USBDriver Wkspc - &2000D0B4 Repeat - Every 3001 cs In 4294578517 cs: Code - Address &2024635C is at offset &00003AA4 in CoolSwitch module's workspace Wkspc - &FB407BBC In 4294670114 cs: Code - Address &20252C54 is at offset &0001039C in CoolSwitch module's workspace Wkspc - &FB407C84 In 4294893971 cs: Code - Address &FC2D1A84 is at offset &0000029C in module RTC Wkspc - &FB406F8C Repeat - Every 360000 cs The non-existant server was also ‘dead’, kicking it revived it and appears to have revived part of CoolSwitch, although some of the earlier when-it-worked timings are a bit odd. This is sort of what it should look like, from around 9pm this evening: Tickers Ticker event claimants... 
In 1 cs: Code - Address &FC3A8494 is at offset &000004F8 in module EtherUSB Wkspc - &20148854 Repeat - Every 2 cs In 2 cs: Code - Address &FC2D31A4 is at offset &00000704 in module ScreenBlanker Wkspc - &20040E14 Repeat - Every 20 cs In 4 cs: Code - Address &FC1E3AC8 is at offset &000002F0 in module DWCDriver Wkspc - &20015DA0 Repeat - Every 11 cs In 8 cs: Code - Address &201DE288 is at offset &000007F4 in module CJEPower Wkspc - &2013BC74 In 36 cs: Code - Address &20252D04 is at offset &0001044C in CoolSwitch module's workspace Wkspc - &FB407C84 In 38 cs: Code - Address &FC14E6D8 is at offset &00003E90 in module WindowManager Wkspc - &20003B94 Repeat - Every 100 cs In 76 cs: Code - Address &2032685C is at offset &000E3FA4 in CoolSwitch module's workspace Wkspc - &FB407DEC Repeat - Every 101 cs In 267 cs: Code - Address &FC1D8100 is at offset &000006F8 in module USBDriver Wkspc - &2000D0B4 Repeat - Every 3001 cs In 999 cs: Code - Address &FC39404C is at offset &000006A8 in module LanManFS Wkspc - &FB407644 Repeat - Every 4500 cs In 96992 cs: Code - Address &20252CD4 is at offset &0001041C in CoolSwitch module's workspace Wkspc - &FB407C84 In 124589 cs: Code - Address &2024635C is at offset &00003AA4 in CoolSwitch module's workspace Wkspc - &FB407BBC In 261776 cs: Code - Address &FC2D1A84 is at offset &0000029C in module RTC Wkspc - &FB406F8C Repeat - Every 360000 cs

Aug 10, 2015 11:21pm Rick Murray (539) 13840 posts	BTW, would this explain why sometimes it seems as if Wimp_PollIdle “gives up” and just behaves like regular Wimp_Poll?

Aug 11, 2015 12:26pm Rick Murray (539) 13840 posts	Okay, I have “found” my module in the tickers list: Tickers Ticker event claimants... In 1 cs: Code - Address &FC3A8494 is at offset &000004F8 in module EtherUSB Wkspc - &20148854 Repeat - Every 2 cs In 10 cs: Code - Address &FC2D31A4 is at offset &00000704 in module ScreenBlanker Wkspc - &20040E14 Repeat - Every 20 cs In 10 cs: Code - Address &FC1E3AC8 is at offset &000002F0 in module DWCDriver Wkspc - &20015DA0 Repeat - Every 11 cs In 12 cs: Code - Address &FC14E6D8 is at offset &00003E90 in module WindowManager Wkspc - &20003B94 Repeat - Every 100 cs In 27 cs: Code - Address &201DE2E8 is at offset &000007F4 in module CJEPower Wkspc - &2013BC74 In 46 cs: Code - Address &202DBB24 is at offset &000E474C in CoolSwitch module's workspace Wkspc - &FB407DC4 In 1908 cs: Code - Address &FC1D8100 is at offset &000006F8 in module USBDriver Wkspc - &2000D0B4 Repeat - Every 3001 cs In 2639 cs: Code - Address &202DBAF4 is at offset &000E471C in CoolSwitch module's workspace Wkspc - &FB407DC4 In 4072 cs: Code - Address &FC39404C is at offset &000006A8 in module LanManFS Wkspc - &FB407644 Repeat - Every 4500 cs In 46571 cs: Code - Address &2024AC3C is at offset &00053864 in CoolSwitch module's workspace Wkspc - &FB407BE4 In 224343 cs: Code - Address &FC2D1A84 is at offset &0000029C in module RTC Wkspc - &FB406F8C Repeat - Every 360000 cs MyTickerDebug Addresses: Socket CallAfter = &202DBB24 Socket CallBack = &202DBB0C LTask CallAfter = &202DBB18 LTask CallBack = &202DBB00 Kicker CallAfter = &202DBAF4 Status: Socket CallAfter pending = Yes Socket CallBack pending = No LTask CallAfter pending = No LTask CallBack pending = No Kicker CallAfter pending = Yes * Interesting. Zap just apologised that it didn’t have enough memory to provide me with a copy of CoolSwitch’s workspace (I have about 180MiB free). I have killed CoolSwitch, and my module is appearing correctly in the tickers list now.

Aug 11, 2015 12:34pm Rick Murray (539) 13840 posts	Right, I’ve removed CoolSwitch from my boot and my module is showing up in the tickers list. I’ll leave it running awhile too see if the ticker chain times muck up. While I’m doing this, I might change the current behaviour (a CallAfter that schedules a CallBack; a CallBack that does the work and then schedules a new CallAfter at the end) to simply be a CallEvery that will schedule a CallBack if one isn’t already pending… BTW – there is a clash with the DDE and DebugTools. If the module is loaded at boot prior to the DDE being ‘seen’ by the filer, the DDE will fail to boot, syntax error re. *Canonical.

Aug 11, 2015 12:58pm Jeffrey Lee (213) 6048 posts	Well, that didn’t take long. Yep, that looks pretty broken! Not sure why I didn’t spot these issues when I first looked at the kernel’s CallAfter/CallEvery code, but here are two I’ve spotted: There’s a potentially dangerous interrupt hole in the code that deals with CallEvery’s (line 172-ish). It the user’s routine is naughty and returns back to the kernel with IRQs enabled then it will re-insert the ticker node correctly (due to InsertTickerEvent disabling iRQs locally), but then when it goes to check to see if there’s another event which needs firing it will be doing that check with IRQs enabled – potentially leading to bad things if there’s another timer interrupt during that time. I think the change in 2013 to treat the times as unsigned will have made this problem even worse – if there are two events which are due to fire at the same time, then when the first event is removed from the list in order to be executed, the second event will be left there, with a TickNodeLeft of 0 (internally the ticker chain stores the delta time between each entry in the list – so that the kernel only has to update the time remaining of the first list entry instead of walking through the whole list each tick). If the first event re-enables IRQs during its processing, and a timer IRQ fires, then the code at line 148 will subtract 1 from 0 (the TickNodeLeft of the second event), store back the result of &FFFFFFFF, and then exit without firing the event. I think that’s a pretty safe bet for what happened to you last night (But unfortunately I don’t think it’s what happened to me when I was able to reproduce it – the times I was seeing from *Tickers weren’t anywhere near as high). If my condition codes are correct then swapping the MOVNE pc, lr for MOVHI pc, lr should fix the unsigned wrap around problem. 
And the interrupt hole should be fixable by moving the “IRQ’s off again” at line 193 down to after the 10 label (for something like this it’s best for the kernel to play it safe and not assume the user’s routine restored interrupt state correctly). AFAIK the issues with SmartReflex/TickerV only started getting reported in 2014, after the unsigned time changes in 2013 – so rather than doing any more exhaustive testing myself, maybe it would be enough to make those changes, check them in and wait to see if people report any further issues. (The other alternative I can think of would be to make a testbed containing a copy of the kernel’s ticker code and manually call it recursively at various points to see what issues pop up – but if I’m fixing the only interrupt hole I can spot then I don’t really know where else I’d be putting the checks!) BTW, would this explain why sometimes it seems as if Wimp_PollIdle “gives up” and just behaves like regular Wimp_Poll? Not sure – I think the Wimp maintains the time itself for Wimp_PollIdle.
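The wrap-around described in this post can be modelled in a few lines of C. This is only a sketch of the arithmetic, not the kernel's code: the `TickNode` struct and both function names are made up for illustration, and the two functions mimic what the MOVNE and MOVHI tests would decide after subtracting 1 from an unsigned delta.

```c
#include <stdint.h>
#include <stddef.h>

/* Hypothetical model of a delta-encoded ticker chain: each node stores the
   ticks remaining AFTER the node in front of it has fired. */
typedef struct TickNode {
    uint32_t left;            /* delta ticks, in the role of TickNodeLeft */
    struct TickNode *next;    /* next event in the chain (unused here) */
} TickNode;

/* One timer tick with the "exit if the result is non-zero" test (MOVNE).
   Returns 1 if the head event should fire, 0 otherwise. */
static int tick_ne(TickNode *head)
{
    if (head == NULL) return 0;
    head->left -= 1u;         /* 0 - 1 wraps to 0xFFFFFFFF */
    return head->left == 0u;  /* a wrapped value is non-zero, so no fire */
}

/* One timer tick with the unsigned "higher" test (MOVHI): keep waiting only
   if the old value was greater than 1. */
static int tick_hi(TickNode *head)
{
    uint32_t old;
    if (head == NULL) return 0;
    old = head->left;
    head->left = old - 1u;
    return old <= 1u;         /* an old value of 0 now fires straight away */
}
```

With a head delta of 0, `tick_ne` stores &FFFFFFFF and reports "not due", which matches the symptom of events appearing to be scheduled hundreds of days away; `tick_hi` reports "due" for the same input.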

Aug 11, 2015 1:04pm Jeffrey Lee (213) 6048 posts	BTW – there is a clash with the DDE and DebugTools. If the module is loaded at boot prior to the DDE being ‘seen’ by the filer, the DDE will fail to boot, syntax error re. *Canonical. Interesting – perhaps the DDE provides its own version of that command? There’s also *Where, which is now present in both the Debugger and DebugTools (Debugger’s version is better).

Aug 11, 2015 1:31pm Rick Murray (539) 13840 posts	Yep, that looks pretty broken! ;-) For the moment, it is behaving. For the moment. That said, stuff thinking CoolSwitch had an infinite workspace indicates that something was wrong! Not sure why I didn’t spot these issues when I first looked at the kernel’s CallAfter/CallEvery code, Okay, hands up every programmer that has had a moment like that. <raises hand> Sometimes, the best way to deal with (potentially) problematic code is to leave it, do other stuff, and come back a while later. If my condition codes are correct then swapping […] Thanks. I’ll keep an eye on the CVS and do a diff to modify my kernel. I’m running a build from last October (as I never got CVS to work for me and it is really time consuming unpacking the gzip) because I depend upon CE(S)T in the UK territory, and CLib dealing with it correctly. Which reminds me, I really ought to build some test case software to bash localtime() on every OS version I can lay my hands upon. Interesting – perhaps the DDE provides its own version of that command? AcornC/C++.!SetPaths.Lib32.canonical (utility)

Aug 11, 2015 2:53pm Colin (478) 2433 posts	I never got CVS to work for me I have a directory containing !cvs and the taskobey file ‘cvsfetch’ as shown below:
`dir <obey$dir>`
`set UnixEnv$cvs$sfix ""`
`set alias$cvs <obey$dir>.!CVS.Bin.cvs %%0`
`echo "Started"`
`cvs -d :pserver:anonymous:@riscosopen.org:/home/rool/cvsroot co mixed/RiscOS/Sources/HWSupport/USB/NetBSD`
`echo "Finished"`
The path after ‘co’ is the path of the directory you want to download. Just double click on cvsfetch and the path is downloaded to the same directory as cvsfetch.

Aug 11, 2015 3:03pm Jeffrey Lee (213) 6048 posts	I think the recommendation is to use ‘cvs -z9’ to ensure the connection is compressed (to save ROOL on some bandwidth costs). With a bit of work it’s also possible to get ROOL’s Perl CVS scripts working under RISC OS: https://www.riscosopen.org/forum/forums/5/topics/313?page=3#posts-5842

Aug 11, 2015 3:18pm Rick Murray (539) 13840 posts	Thanks. I wrapped that in a TaskWindow call, and added “-z 9”, and it worked. ;-) But… A question and an observation. The question – can I download the Pi build, or does this mean checking out everything piece-by-piece-by-piece? The observation – remember when I said it was not entirely realistic to end the ZeroPain trial period at the end of the year? Well, “cvs” crashes unless alignment faults are disabled, so such old software as that is still kicking around.

Aug 11, 2015 3:27pm Jeffrey Lee (213) 6048 posts	The question – can I download the Pi build, or does this mean checking out everything piece-by-piece-by-piece? That’s what the Perl scripts are for – see here for some slightly crappy instructions on how to get them (this guide is perhaps a bit better, but it assumes you’re setting up for write access). If you have the Perl scripts you can just do “checkout BCM2835Dev” to grab all the components (although updating an existing source tree is a bit trickier).

Aug 11, 2015 3:28pm Colin (478) 2433 posts	I think you need Jeffrey’s Perl scripts for that. I think they read the components file and module database that !builder uses to automate a build fetch. I just fetch the odd part via CVS.

Aug 11, 2015 4:49pm Rick Murray (539) 13840 posts	Just had a quick look at the ticker handler code and I noticed that the handler is called using BLX – does this suggest that the code could be written in Thumb?

Aug 11, 2015 5:13pm Jeffrey Lee (213) 6048 posts	Yes, although there are no guarantees that it will work, or which OS versions it will work on – the BLX was added there as a micro-optimisation to save one instruction, not with the goal of enabling Thumb support. Even without the BLX, Thumb could work on ARMv7 anyway due to all branches being interworking.

Aug 12, 2015 8:29am Rick Murray (539) 13840 posts	Yep, that looks pretty broken! Happened again this morning, about half an hour after (yet another brief) power cut. The huge values – while BASIC and RISC OS itself very helpfully say “Number too big” when attempting to convert 4294551774 into hex, going the other way via OS_ConvertCardinal4 tells me that &FFFFFFFF is 4294967295 – so if they all have values “about like this” as their trigger time, it appears as if the times somehow got erroneously calculated away from zero, to give a massive number. Would it not be possible to raise some behaviour for delta values that are excessively high (over &7FFFFFFF), as that would equate to something like 250 days until the next repeat. I’m going to apply your changes to my kernel, I’ll get back to you on its behaviour. Fingers crossed! ;-)

Aug 12, 2015 8:50am Colin (478) 2433 posts	Would it not be possible to raise some behaviour for delta values that are excessively high (over &7FFFFFFF), as that would equate to something like 250 days until the next repeat. There’s no need with the changes. The high values were caused when the head of the list had a delta value of 0. After 1 was taken from it it became &FFFFFFFF – a valid high value. The seemingly random high numbers you are getting are the &FFFFFFFF value being counted down at cs intervals. Jeffrey’s changes will avoid this happening.
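The arithmetic behind those numbers can be checked with a small C sketch. The function names are made up for illustration, and the first one assumes the value in question really did wrap to &FFFFFFFF and has been decremented once per centisecond tick since.

```c
#include <stdint.h>

/* Centisecond ticks elapsed since the delta read 0xFFFFFFFF, assuming it has
   been decremented once per tick ever since. */
static uint32_t ticks_after_wrap(uint32_t observed)
{
    return UINT32_MAX - observed;
}

/* Convert centiseconds to whole days (100 cs per second, 86400 s per day). */
static uint32_t centiseconds_to_days(uint32_t cs)
{
    return cs / (100u * 86400u);
}
```

For the 4294551774 quoted above, `ticks_after_wrap` gives 415521 centiseconds, a little over 69 minutes of counting down. `centiseconds_to_days(0x7FFFFFFF)` gives 248, in line with the "something like 250 days" estimate.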