RPi 4B with RISC-OS 5.28 & 5.29 Lockups
Daniel Garrod (9459) 34 posts |
Hi everyone! I run The Jolly Roger BBS on my Pi4, and just lately I have been experiencing hard lockups running RISC OS. I have tried both versions, 5.28 and 5.29. Kind regards,
Colin Ferris (399) 1818 posts |
What programs are you running?
Martin Avison (27) 1494 posts |
Does Alt-Break show a program to stop, and does RISC OS then start running again? Or is a complete restart required?
Daniel Garrod (9459) 34 posts |
The Pi4 is currently running RISC OS 5.29 (06.01.23) with the ArmBBS BBS software and a Telnet server; Alarm is also used for backup tasks, and nothing much else is running. I am using the nightly HardDisc4 boot sequence. I have tried the 5.28 SD image as well, and it locks up very randomly on both versions. When it locks up, the only way out is a reset: I hold the power button down until it goes off, then power on again.
Jon Abbott (1421) 2651 posts |
Not a Pi4, but I was seeing them on my Pi3. Since that post I haven’t used the Pi3 long enough to see whether it still occurs with 5.29/5.30.
Stuart Swales (8827) 1357 posts |
Though there was this: “Out-of-spec USB devices can cause a system crash (Pi 4 only)” on https://www.riscosopen.org/wiki/documentation/show/RISC%20OS%20bugs%20specific%20to%20the%20Raspberry%20Pi. Wasn’t there also some issue with early Pi 4 firmware that could cause a GPU lockup?
Andrew McCarthy (3688) 605 posts |
RE: Out-of-spec USB devices. I assumed all disc adapters were equal; I discovered they are not. Here are a couple of links to help, if you think there may be an issue: https://jamesachambers.com/raspberry-pi-4-usb-boot-config-guide-for-ssd-flash-drives/ An out-of-date Line Editor module will cause lockups if you use the task window. I have a Pi 4, and I haven’t seen a lockup for some time now. Is it possible that one of your programs has a memory leak? If so, you may need to set up a remote debug session. Or can you trace it to a specific action or event? I see you say you are using a nightly build; try one of the stable releases instead. A final thought: how is the disc drive? Have you run a verify, or checked it with DiskKnight?
Stuart Swales (8827) 1357 posts |
I’ve experienced a ‘user lockout’: I have a PS/2 keyboard and mouse attached to an old KVM, which is wired via a PS/2-to-USB adaptor into a Pi 4. About every other day, on switching over to the Pi, it appears to be unresponsive to keyboard and mouse input, but it ‘gets better’ when I disconnect and then reconnect the USB lead at the Pi end.
Andrew McCarthy (3688) 605 posts |
That reminds me: my Pi 3 gave me a few headaches (lockups), which I traced to a dodgy cabled mouse. I’ve had no more issues since I switched to a wireless mouse with a USB dongle.
Paul Sprangers (346) 525 posts |
I have exactly the same experience, albeit with a wireless keyboard/mouse and a new KVM. But could it be related to Daniel’s lockups?
Colin Ferris (399) 1818 posts |
Is there a way of logging the PC to a circular buffer, or some other way of narrowing down the fault?
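To illustrate the circular-buffer idea, here is a minimal sketch in Python. It assumes some hypothetical sampler (for example a periodic ticker handler, not shown) hands over the current PC value each time; only the last few samples are kept, so after a fault the buffer holds the most recent addresses. `PCLog`, `record` and `dump` are made-up names, and on a real machine the buffer would also have to be readable after the lockup, which this sketch does not address.

```python
from __future__ import annotations

from collections import deque


class PCLog:
    """Hypothetical ring buffer keeping only the most recent PC samples."""

    def __init__(self, size: int = 8) -> None:
        # A deque with maxlen discards its oldest entry when a new one is added.
        self._samples: deque[int] = deque(maxlen=size)

    def record(self, pc: int) -> None:
        """Store one sample, truncated to 32 bits like an ARM register."""
        self._samples.append(pc & 0xFFFFFFFF)

    def dump(self) -> list[int]:
        """Return the retained samples, oldest first."""
        return list(self._samples)


log = PCLog(size=4)
for pc in range(0x8000, 0x8000 + 6 * 4, 4):  # six fake samples
    log.record(pc)
print([hex(pc) for pc in log.dump()])  # ['0x8008', '0x800c', '0x8010', '0x8014']
```

A fixed-size buffer keeps the cost of each sample constant, which matters if the sampler runs often.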
Daniel Garrod (9459) 34 posts |
Just got back from going out. I tried to log in via VNC to check on the BBS, but neither ping nor VNC works, so it has locked up again :( I am running the Pi4 headless, with no keyboard or mouse connected.
Martin Avison (27) 1494 posts |
You may find that my Reporter gives some clues about the lockup, as it logs any command activity and may even catch an error. You would need Logging turned on to capture details that survive a lockup, but that may slow things down a little. See my website for the download, then !Help for details.
Daniel Garrod (9459) 34 posts |
Thanks Martin, I will try it out and see what happens…
Andrew McCarthy (3688) 605 posts |
My interpretation of what you’ve said above: it had been running with no problems for a while, and now it has started to lock up; you have made no changes and applied no updates; there is plenty of disc space and no errors. Am I correct? I wouldn’t use nightly builds on a live service unless you have good backups, or a nightly build fixed a particular issue. Based on what you’ve said, I would want to create logs: local first, then remote.
Daniel Garrod (9459) 34 posts |
I originally had the Pi4 running 5.28, and it would lock up at random times, which is why I moved to 5.29; that didn’t solve the issue. I have applied the latest updates to the Pi4 in the hope that they would help, but they haven’t… I have started collecting data using Martin’s Reporter tool, so we will see what comes of it.
Rick Murray (539) 13850 posts |
For mine, I don’t run anything like VNC. The sysop tools give access to the command line (currently it’s only possible to log in as a sysop from a local LAN address). For anything more than that, I need to get up and turn the monitor on. ;) My issue is completely different. There are no lockups, but after a while (usually several days) my other thing, the weather station logger, simply stops working. The app is still running, but it no longer does its five-minute update. Does anybody know how long it takes the centisecond ticker to hit negative numbers? Maybe that’s part of the reason? Try yours without VNC, or Alarm, or anything other than the BBS server and the Telnet gateway.
Paolo Fabio Zaino (28) 1882 posts |
@ Rick
I can’t speak for the specifics here, Rick, but I use a PollIdle time of -1 (so the negative value you mention) when my Launchpad has to redraw a lot of icons (hundreds and hundreds). In terms of user experience, that makes the return to the app for the next Null event almost immediate, almost like a single-tasking app. However, since Null events have low priority, it doesn’t make the app behave fully like a single-tasking app: everything else stays responsive, though CPU usage obviously grows. This allows a really fast background (multitasking) redraw of a large number of icons (note that icon redraw in Launchpad is multitasking, i.e. it doesn’t lock the desktop, no matter how many icons one has installed). IIRC, the only exception I have noticed is that, with very low (or negative) values, if the app is doing I/O (for instance accessing a file), it effectively becomes a single-tasking app: the other apps on the desktop freeze for a while, until the I/O completes. This happens even when the I/O is done in a multitasking way, as I do in Launchpad: read a portion of a file, return control to the desktop, read the next portion on the next Null event, and so on. Hope this helps; again, I’m not sure how useful this info is in this specific case.
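The multitasking redraw pattern described above (do a slice of work, hand control back to the desktop, continue on the next Null event) can be sketched as a Python generator. This is a hypothetical illustration, not Launchpad’s code: `redraw_in_batches` and the icon names are made up, and the `for` loop stands in for the Wimp delivering Null events.

```python
from __future__ import annotations

from typing import Iterator, Sequence


def redraw_in_batches(icons: Sequence[str], batch_size: int) -> Iterator[list[str]]:
    """Hypothetical: 'draw' a small slice of icons, then yield control back.

    Each resumption of the generator stands in for one Null event from the Wimp.
    """
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(icons), batch_size):
        batch = list(icons[start:start + batch_size])
        # Real code would plot these icons here before returning to the Wimp.
        yield batch


icons = [f"icon{i}" for i in range(7)]
for null_event, batch in enumerate(redraw_in_batches(icons, batch_size=3)):
    print(null_event, batch)
```

A smaller batch keeps each slice short and the desktop more responsive, at the cost of needing more Null events to finish the whole redraw.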
Stuart Swales (8827) 1357 posts |
‘A decision that I later came to bitterly regret’ Doesn’t it ‘go negative’ after about six months of uptime? Yeah, 124^H^H^H 248 days. So after that point, -1 will suddenly transform from ‘a long time ago’ to ‘waaay in the future’. Waaay back I did have a module that incremented MetroGnome with an additional number (asm-time determined) of ticks per TickerV to get times to advance faster than usual.
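The 248-day figure follows from a 32-bit counter that increments once per centisecond from zero at startup (the assumption here): bit 31 becomes set after 2**31 ticks, which is when a signed reading turns negative, and the counter wraps back to zero after 2**32 ticks. A quick check in Python:

```python
CS_PER_DAY = 100 * 60 * 60 * 24  # 8,640,000 centiseconds per day


def days_for_ticks(ticks: int) -> float:
    """Convert a count of centisecond ticks into days."""
    return ticks / CS_PER_DAY


# Bit 31 becomes set: a signed 32-bit reading goes negative here (about 248.55 days).
print(f"signed overflow after {days_for_ticks(2**31):.2f} days")
# The unsigned 32-bit counter wraps back to zero here (about 497.10 days).
print(f"unsigned wrap after {days_for_ticks(2**32):.2f} days")
```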
Steve Fryatt (216) 2105 posts |
According to the docs, it will just return a Null event immediately. The PRM has a specific comment on this in its remarks on the SWI.
What’s the intent of that, though? The value passed in R2 is an absolute monotonic time, so -1 is 0xffffffff, the end of the monotonic time range. So in effect I think you’re saying “don’t return a Null event to me until around 500 days after the machine booted, or the same number of days after the time last wrapped around”? I suppose that as the machine gets close to the 497-plus-a-bit days, there’s a window where applications might find the call returning somewhat sooner than expected, if adding their intended delay to the current time value overflows.
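The early-return case can be sketched in Python under two assumptions: the caller computes its absolute time as the current monotonic time plus a delay, truncated to 32 bits as an ARM register would be, and the wait is judged with a plain unsigned “now >= target” comparison (how the Wimp itself compares is not shown here). `deadline` is a hypothetical helper.

```python
MASK32 = 0xFFFFFFFF  # emulate a 32-bit ARM register


def deadline(now: int, delay_cs: int) -> int:
    """Hypothetical helper: absolute wake-up time = now + delay, truncated to 32 bits."""
    return (now + delay_cs) & MASK32


now = 0xFFFFFF00             # 256 cs (about 2.5 s) before the counter wraps
target = deadline(now, 500)  # intended delay: 5 seconds
print(hex(target))           # 0xf4: the sum has wrapped to a small value
# A plain unsigned comparison now thinks the target is already in the past,
# so a caller relying on it would get its Null event straight away.
print(now >= target)         # True
```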
Steve Fryatt (216) 2105 posts |
248, isn’t it?
Please tell me that we’re treating the output of OS_ReadMonotonicTime as an unsigned 32-bit int…?
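A sketch of why signedness matters, in Python, emulating 32-bit registers: `as_signed32` reinterprets a 32-bit pattern as two’s complement (so &FFFFFFFF reads as -1), and `reached` compares two times by the sign of their 32-bit difference, a common wrap-safe idiom that holds as long as the two values are less than 2**31 ticks apart. Both helpers are hypothetical; this is not a claim about how the Wimp does it.

```python
MASK32 = 0xFFFFFFFF


def as_signed32(value: int) -> int:
    """Reinterpret a 32-bit pattern as a two's-complement signed integer."""
    value &= MASK32
    return value - (1 << 32) if value & 0x80000000 else value


def reached(now: int, target: int) -> bool:
    """Wrap-safe check that `now` is at or past `target`.

    Valid only while the two times are less than 2**31 ticks apart.
    """
    return as_signed32(now - target) >= 0


print(as_signed32(0xFFFFFFFF))  # -1
# Early on, a signed -1 lies 'in the past' relative to a positive now...
print(-1 < as_signed32(0x00001000))  # True
# ...but once bit 31 of the counter is set, a signed now is very negative,
# so -1 suddenly lies 'in the future'.
print(-1 < as_signed32(0x80000010))  # False
# The difference-based test copes with the counter wrapping past zero:
print(reached(0x00000010, 0xFFFFFFF0))  # True: 0x20 ticks after the target
print(reached(0xFFFFFFF0, 0x00000010))  # False: still 0x20 ticks before it
```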
Paolo Fabio Zaino (28) 1882 posts |
On Launchpad there is a config file that lets you change this. The default is 0, but on my local system I have experimented with -1. The option to set is the following:
RefreshPriority: 0
It’s in a file called Config, located in !DeskCfg:Gadgets.Iconbar.Launchpad
Rick Murray (539) 13850 posts |
What’s an unsigned int in BASIC? There’s this grey area where stuff like “&FFFFFFFF is -1” exists. ;)
Paolo Fabio Zaino (28) 1882 posts |
And that is what I thought when I tested it; however, the results of my tests are quite the opposite. It is possible that the value is either transformed into 0 (the default value above) or that it triggers something else (I haven’t had time to dig into the sources yet, tbh). People can run their own experiments once it’s fully available. In the test, what I observed was a visible reduction in the “latency” at the start of the multitasking icon redraw, compared to, for example, using 1 as the value.
Possible, but we should also check how the value is used when negative. I’m not 100% sure it is simply passed straight through, but (again) I could be wrong; I haven’t had time to check ROOL’s sources yet.
Paolo Fabio Zaino (28) 1882 posts |
In BBC BASIC calling SYS? I doubt it.
This, lol, so true! And yes, it drove me nuts at the beginning when I got back into BBC BASIC, together with having to “reset” my brain to use a language without data structures and with a peculiar concept of “pointers”. XD
If someone has time, can we please check how Wimp_PollIdle treats such a value, pleaseee? I am still working and it’s a busy day here, so I won’t be able to check RISC OS stuff until very late tonight, thanks! :)