OS_Byte 13 event state wraps
Jon Abbott (1421) 2651 posts |
Tested on Pi3 and SA build date 19-08-16 using event 4 If the event is already disabled, OS_Byte 13,4 turns the event on indefinitely, it’s then impossible to turn the event off. Repro:
The final state should be 0, but is instead 255. |
Jeffrey Lee (213) 6048 posts |
255 is treated as a failsafe value – if there are so many systems interested in an event that the 8-bit counter for that event is in danger of wrapping then then the counter will lock itself at the maximum value, with neither OS_Byte 13 nor 14 having an effect. This allows you to do things like call OS_Byte 14 300 times and then call OS_Byte 13 299 times without the event being becoming disabled at any point. Of course if you called OS_Byte 13 once more then the event would still be enabled, but having the event enabled when it isn’t needed is safer than having it disabled when it is needed. The bug here is that you’re causing the counter to wrap by decrementing it too many times – although the OS could trap that and clamp it to zero, really it’s a bug in your code, because you’re not balancing the event enable & disable calls. And when faced with imbalanced calls, maybe the counter being locked to 255 is a good thing – it prevents the buggy code from doing any further damage (if it was clamped to zero, it might mean that the event gets disabled while someone else is still using it) Of course, other people may disagree with me on this! |
Jon Abbott (1421) 2651 posts |
If you’re decreasing an event below zero, you’d expect the event to shut off, not lock on to the point you have to restart the system. It wouldn’t be so bad if OS_Byte 14 corrected the situation. Up until RO5, the behaviour was to leave the event disabled, locking it on makes no logical sense. This change in behaviour is not mentioned on the OS_Byte 13 wiki page. Whoever implemented the change failed to update the documentation to state going negative will require a reboot to correct. |
Jeffrey Lee (213) 6048 posts |
It was here where the behaviour was changed. Another bug fix is that once the event semaphores had saturated at 255, OS_Byte 13 was still happy to decrement the semaphore, so for example 256 enables followed by 255 disables would have disabled the event. If you read the PRM, you’ll see that OS_Byte 14 and OS_Byte 13 exist in order to reduce CPU load, by allowing the OS to avoid distributing events that it knows that no-one is interested in. You can’t rely on the behaviour of OS_Byte 13 to disable event generation – apart from the failsafe behaviour there’s also the possibility that another piece of software (e.g. running under IRQ) could come along and enable the event that you just disabled. Also if you read the full commit message for the above change then you’ll note that it’s only event numbers 0-31 which have counters available – event numbers outside that range have never been influenced by OS_Byte 13/14. OS_Byte 13 is just a hint to allow reduction of CPU load, not a commandment that the event must be silenced. |
Jon Abbott (1421) 2651 posts |
Got caught out by this issue again, having found it to be the cause of machine locks. I don’t use events on RO5 preferring RTSupport instead, so the locks are being caused by other software not initialising before claiming EventV. I’m not sure if there was/is any developer guidance on how to claim EventV, but I’ve seen many poorly written code sequences that claim EventV, setup variables etc, then enable the event – a sure recipe for a page zero access, abort or lock! Ignoring poorly written software for the minute, I fully understand the reasoning behind locking the event semaphore to 255 if an event is turned on too many times. I don’t however understand why its allowed to wrap below zero, it just shouts bug to me. |
Jeffrey Lee (213) 6048 posts |
I suspect the justification for that is that (for RISC OS 5) the only time when it could possibly wrap below zero is if buggy code has called OS_Byte 13 too many times. In which case it’s probably safer to leave the event on than to disable it. (Or maybe it’s just an edge case which the programmer didn’t consider at the time) |
nemo (145) 2546 posts |
If there’s more disables than enables than the one thing that is certain is that something has gone wrong. Leaving the event enabled is therefore the only sensible response – the counting has failed, stop counting. It is, as Jeffrey has explained, only an optimisation anyway, and as we all know there’s only two rules for optimisation:
|
Jon Abbott (1421) 2651 posts |
I find the whole change odd, if the bug fix was to handle more than 255, I personally would have increased the semaphores to 32bit internally and only put 255 in the OS Vars if it went higher. I would have left the previous behaviour of ignoring a decrease below zero.
The key thing with semaphores is the caller must track how many times they’ve increased the semaphore. When a shutdown occurs either on purpose or due to an Error, Exit etc the respective number of calls should be made to reduce the semaphore. I’ve yet to see a single piece of code do that, they all assume they’ve called OS_Byte 14 once, so issue one OS_Byte 13 regardless. An issue I can see with global semaphores is when processes are terminated, unless the OS is tracking them on a per process basis they’re immediately indeterminate once a process is terminated. Another issue is programmers that try to force an event off by repeatedly calling OS_Byte 13 (eg Deeva), instead of implementing internal semaphores to track if they’re capturing an event, or where they assume events aren’t occurring (eg Ibix the Viking, Hoverbod) It’s a pity the original PRM didn’t include a code example, stating your code must track how many times its increased a semaphore and reduce accordingly when it terminates. |
Rick Murray (539) 13840 posts |
? Is it wrong to assume that if you turn an event on once, you turn it off once? |
nemo (145) 2546 posts |
Broken.
Broken. Both these scenarios are simple bugs. They were wrong when they were released, they were wrong on RISC OS, they were wrong on Arthur, they were wrong on MOS (Beeb¹).
The only way you can decrease below zero is that something is trying to disable what it did not enable. By definition that can cause other systems to malfunction (as it could disable the event the other system is relying on). Consequently disabling the optimisation is the only sensible recourse. As for 32bit semaphores, we never have anything like 100 claimants as it is. That’s not the problem. ¹ The BBC AUG says:
In other words, there are ALWAYS other EventV claimants, any of which might have enabled any of the events. Assuming anything else is dead wrong and always has been. |
nemo (145) 2546 posts |
On my machine, which has been running for some days now, I have: Event Enables Name 4 1 VSync 11 3 Key transition 19 3 Internet socket 21 2 Misc (RO4 scroll wheel) By these claimants: EventV DeepKeys+5D0 TaskWindow+14BC WindowManager+A0D0 ShareFS+C98 Freeway+304 ZeroConf+268 WindowScroll+3F8 ScreenBlanker+788 UtilityModule+1C5D4 "EventProcessScrolls" Kernel+126C (default handler) EventV is a poor way of sharing a vector, and by far the worst usage is Vsync (I’m not bothered about WindowScroll getting called every time I press a key, I am irritated that ShareFS is getting called 50 times a second). All these things should have their own vector, which is what VectorExtend enables. I have been delaying the (long overdue) release because of RO6.2’s gargantuan SWI and vector handlers… but I think I will do without 6.20 compatibility and wait until someone asks really nicely. |
Rick Murray (539) 13840 posts |
Common sense?
One shouldn’t claim anything that results in the OS calling into your code (which you should assume could happen the instant you enable the event) until you’re in a state where that will work.
The bug is surely something turning it off too many times? If this is the case, we will have – in the process – reached a point where it has been turned off behind the back of something that was still using it.
Indeed. EventV is “all the other stuff”. Buffers full/empty, ADC conversion, VSync, Escape, Econet, mouse/key transition, a new bar (and not one that serves Stella Artois), MIDI, Internet, battery… With that in mind, it is foolish to think that there aren’t other services using some of these events. Just wrote a quick program to report the number of claims, and on my machine it’s eight for the key transition and four for the Internet event. Surprised nothing is hanging on to VSync.
Too many (disparate) things all lumped together.
It could have been worse, it could have been sitting on TickerV. ;-) Surely using VSync means that ShareFS’s responsiveness depends upon the monitor refresh rate?!? |
nemo (145) 2546 posts |
Let’s be very clear: Your code will get called regardless of whether you enabled any particular event. Your code is on EventV, which is called if anyone has enabled any event. Your code will therefore be called, by definition, under completely unpredictable circumstances. |
nemo (145) 2546 posts |
You misunderstand. ShareFS wants to know about an Internet Socket Event, so it claims EventV and enables Event 19. Somebody else wants VSync, so claims EventV and enables Event 4. Now, whenever there is an internet socket event or a VSync, both claimants get called. This was a regrettable compromise on the Beeb, sharing one of the very few vectors between a couple of Sideways ROMs. On RISC OS, with ~200 modules running, it’s very silly indeed. It is analogous to Service Calls, which are just about the worst API in RISC OS – again, inherited from the BBC Micro which might, if your spent money and got your soldering iron out, have ten sideways ROMs listening for twenty service calls. I have about 200 modules listening to well over 200 service calls. In RISC OS the Service Call mechanism is the primary victim of the lack of vectors. Even the extensive RISC OS 4 service call ‘optimisation’ could only damp the flames down a bit (which is why ImageFileRender_Artworks is called every time there’s a MimeMap request). |
Colin (478) 2433 posts |
The UpCallV is bad as well I doubt it was envisaged to be the ‘best’ way to use USB when it was conceived. Most of the other stuff on the vector is notification of occasional events. But then again I don’t think that USB should be on a vector at all. |
nemo (145) 2546 posts |
Actually I disagree, I think every USB device should have its own vector, plus an overall USB admin vector. USB device inserted, a vector is dynamically allocated for it and announced on the fixed USB vector. Thereafter all device-specific events occur on the device’s own vector. But I like vectors. They are what makes RISC OS so flamboyantly insecure. |
Colin (478) 2433 posts |
Don’t mind that, just don’t like using vectors for USB transfer callbacks. |
nemo (145) 2546 posts |
No, agreed. Vectors for transactions, direct (vector-negotiated) API for data transfer. |