RPi 4B with RISC-OS 5.28 & 5.29 Lockups

95 posts, 24 voices

Jan 23, 2023 7:41pm Dave Higton (1515) 3526 posts	You guys need to brush up on your 2’s complement arithmetic. Really.

Jan 23, 2023 8:21pm Rick Murray (539) 13840 posts	If someone has time, can we please check how PollIdle treats such a value, pleaseee? Should work, as -20 plus 10 is -10, which will be the same as going from a big hex number to a slightly larger hex number (I’m writing this on my phone so can’t ask BASIC to `PRINT ~-10` or anything). That being said, the Wimp Poll/Idle source is, uh, interesting. A Byzantine maze of exceptions. So I’m not going to go looking to see if it’s a signed or unsigned comparison…

Jan 23, 2023 8:24pm Frank de Bruijn (160) 228 posts	If someone has time, can we please check how PollIdle treats such a value, pleaseee? Already tested that, ages ago, when I still thought giving it a large unsigned integer (a.k.a. a negative signed one) would work. It didn’t (i.e. it returned immediately).

Jan 23, 2023 8:29pm Steve Fryatt (216) 2105 posts	What’s an unsigned int in BASIC? In BBC BASIC calling SYS? I doubt it. For the purposes of OS_ReadMonotonicTime and Wimp_PollIdle, where we’re just doing additions and subtractions and testing relative values, it shouldn’t matter. If you never compare two time values directly, but always subtract one from the other and compare the result to zero, things cancel out just fine. That’s why you get constructs like this one, paraphrased a bit from the PRM (page 3-185, in the description of Wimp_PollIdle): `SYS "OS_ReadMonotonicTime" TO now%` `WHILE (now% - next_time%) > 0 : next_time% += 100 : ENDWHILE` `SYS "Wimp_PollIdle", 0, b%, next_time%` Assuming, of course, that internally, both OS_ReadMonotonicTime and Wimp_PollIdle are trading in unsigned 32-bit ints. The PRM doesn’t explicitly say they are, but the OS_ReadMonotonicTime description implies it.
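
The PRM-style catch-up loop above can be sketched in Python to show the arithmetic. This is only an illustration, assuming the monotonic timer is a 32-bit centisecond counter that wraps to zero; `signed_diff` and `next_wakeup` are hypothetical helper names, not anything from RISC OS or the PRM.

```python
MASK = 0xFFFFFFFF  # assumption: the timer is a 32-bit counter


def signed_diff(a, b):
    """Return (a - b) wrapped to 32 bits and read as a signed value."""
    d = (a - b) & MASK
    return d - 0x100000000 if d & 0x80000000 else d


def next_wakeup(now, next_time, step=100):
    """Advance next_time in whole steps until it is no longer in the past."""
    while signed_diff(now, next_time) > 0:
        next_time = (next_time + step) & MASK
    return next_time


if __name__ == "__main__":
    # A stale target is stepped forward past "now".
    print(next_wakeup(1000, 750))
    # The same loop keeps working when the steps cross the 32-bit wrap.
    print(next_wakeup(0xFFFFFFF0, 0xFFFFFF00))
```

Only the sign of the wrapped difference is tested, never the two raw values against each other, which is why the loop behaves the same on either side of the wrap.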

Jan 23, 2023 8:32pm Frank de Bruijn (160) 228 posts	And Wimp_PollIdle doesn’t. Tested extensively, some time during the 2010s.

Jan 23, 2023 8:34pm Dave Higton (1515) 3526 posts	I’m not going to go looking to see if it’s a signed or unsigned comparison… The result is treated as signed. It doesn’t matter whether the numbers being compared are signed or unsigned, so long as (a) they are both of the same type, and (b) the difference is less than half the range of the numbers. Like I say, brush up on your 2’s complement arithmetic. IIRC there used to be a problem in the Wimp at the wrap-round of the monotonic timer, but it was fixed decades ago.

Jan 23, 2023 8:46pm Steve Fryatt (216) 2105 posts	Already done that, ages ago, when I still thought giving it a large unsigned integer (a.k.a. a negative signed one) would work. It didn’t (i.e. it returned immediately). The relevant code seems to be `SWI XOS_ReadMonotonicTime` `CMP R0,R2` `BPL returnnull ; time's up! (use PL not CS)` from hereabouts down to the `02` label. Unless anyone knows different – I’ve just grepped the sources for likely looking flag constants and skimmed the code from there. That looks to be very similar to the `IF (now% - return_time%) > 0 THEN ...` construct from the Wimp_PollIdle PRM entry? If “now” is greater than “return time” then return? PS. That’s got to be a contender for a “most idiotic comment” award, hasn’t it? I can see that you’re using `PL` and not `CS`, because you’ve written “`BPL`” and not “`BCS`”. Perhaps if the comment said why you’d chosen `PL` over `CS`, it might shed some light on the thought process behind it. But then, it wouldn’t be the RISC OS sources if the comments were useful, I suppose.
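
One possible reading of the `(use PL not CS)` comment, offered as an assumption since the source does not state the reasoning: `BCS` would act on an unsigned comparison of the two timer values, while `BPL` acts on the sign of their wrapped difference. The Python sketch below models only the N and C flags after `CMP R0,R2`; the function names are hypothetical.

```python
MASK = 0xFFFFFFFF


def cmp_flags(r0, r2):
    """Model the N and C flags after ARM `CMP R0,R2` on 32-bit registers."""
    r0 &= MASK
    r2 &= MASK
    result = (r0 - r2) & MASK
    n = bool(result & 0x80000000)  # N: bit 31 of the wrapped difference
    c = r0 >= r2                   # C: set when the subtraction does not borrow
    return n, c


def bpl_taken(now, target):
    """BPL branches when N is clear."""
    n, _ = cmp_flags(now, target)
    return not n


def bcs_taken(now, target):
    """BCS branches when C is set."""
    _, c = cmp_flags(now, target)
    return c


if __name__ == "__main__":
    # Target just before the wrap, timer just after it: the time has passed.
    print(bpl_taken(0x00000010, 0xFFFFFFF0), bcs_taken(0x00000010, 0xFFFFFFF0))
    # Timer just before the wrap, target just after it: still waiting.
    print(bpl_taken(0xFFFFFFF0, 0x00000010), bcs_taken(0xFFFFFFF0, 0x00000010))
```

In both cases straddling the wrap, the PL test gives the answer a task would expect and the CS test gives the opposite one; away from the wrap the two agree.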

Jan 23, 2023 9:35pm nemo (145) 2546 posts	Gentlemen, there’s a degree of confusion here.
1. The time parameter for Wimp_PollIdle is an absolute MonotonicTime. If you want to yield for one second, do `SYS"OS_ReadMonotonicTime"TOnow%: SYS"Wimp_PollIdle",mask%,q%,now%+100` – do not use fixed numbers; numbers believed to be magic; or numbers believed to be relative.
2. (Related point) Don’t do anything just because you tried it and it didn’t obviously catch fire.
3. If you want to be called back immediately, don’t use PollIdle – just use Wimp_Poll as the gods intended.
The code in question is this simple (my labels) – the pink bit: Task_R2 is the MonotonicTime the task asked to idle until. It will therefore NOT receive a null event until MonotonicTime is greater than or equal to that number, by subtracting it and comparing with zero. But also note that null events are the lowest priority event, and your task will be called back regardless of the idle time if anything else at all happens. For the record, the events are delivered in this priority order (i.e. first one that happens gets delivered):
1. ‘High priority’ pollword is nonzero (see bit 23)
2. A Message has been queued or has bounced (events 17-19)
3. A window must be redrawn
4. A normal priority pollword is nonzero
5. A drag has occurred
6. A window has opened or moved
7. A window should be closed
8. The pointer has left a window
9. The pointer has entered a window
10. A menu has been selected
11. The mouse has been clicked
12. A keypress has been queued or detected
13. None of the above, null events are not masked, and if there’s a PollIdle time it’s now or in the ‘recent’ past
HTH

Jan 23, 2023 9:40pm nemo (145) 2546 posts	most idiotic comment In context, it gets credit for someone considering whether a signed comparison was required. Regrettably we have not always been so lucky in that regard – see VDU-12345678 in a TaskWindow for example.

Jan 23, 2023 9:45pm nemo (145) 2546 posts	Incidentally, the above priority list is why it is a Very Bad Idea™ for a task to send a message to itself – Wimp_Poll will just immediately return without a context switch. Do not mistake the WindowManager for any kind of multitasking scheduler. There is no concept of time-slicing or time-starvation. It’s a very simple hack of the single-tasking Wimp and it’s a testament to legions (generations, even) of Wimp programmers that the desktop works as well as it does.

Jan 23, 2023 9:49pm Steve Fryatt (216) 2105 posts	I wonder if the confusion here is coming from people not realising that there’s a limit to the time delay which can be applied to Null events by Wimp_PollIdle, due to the wrap-around of the arithmetic? The delay can only be half of the total timespan allowed by OS_ReadMonotonicTime: &FFFFFFFF centiseconds is just over 497 days, so we can only delay by 248-and-a-half days before the comparison wraps around from “in the future” to “in the past”. Given this, Paolo’s -1 will be “in the past” until the machine has been running for more than 248 days, and so for that time it will return immediately with a Null event (just as if Wimp_Poll were used). However, after 248 days, it will start to cause Wimp_PollIdle to block Null events until 497 days have elapsed. Then there will be another 248 days where it returns immediately, and so on. In a similar vein, a “large unsigned integer” is more likely to be “in the past” than “in the future” for the first few days after the system has booted. Testing the Wimp’s behaviour is tricky unless the monotonic timer is tampered with.
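
The figures above can be checked with a short Python sketch. It assumes a 32-bit centisecond timer and the sign-of-difference test described earlier in the thread; `days`, `null_due` and `first_blocking_time` are hypothetical names used only for this illustration.

```python
CS_PER_DAY = 100 * 60 * 60 * 24  # centiseconds in one day
WRAP = 1 << 32                   # assumption: the timer wraps after 2^32 ticks
MASK = WRAP - 1


def days(centiseconds):
    """Convert a centisecond count to days."""
    return centiseconds / CS_PER_DAY


def null_due(now, target):
    """True when (now - target), wrapped to 32 bits, has its sign bit clear."""
    return ((now - target) & MASK) < 0x80000000


def first_blocking_time(target):
    """Earliest timer value at which a fixed target stops looking 'in the past'."""
    return (target + 0x80000000) & MASK


if __name__ == "__main__":
    print(round(days(WRAP), 2), round(days(WRAP // 2), 2))
    fixed = 0xFFFFFFFF  # a fixed -1 passed as the idle time
    start_block = first_blocking_time(fixed)
    print(hex(start_block), round(days(start_block), 2))
```

For a fixed idle time of -1, `null_due` is true from boot until the timer reaches &7FFFFFFF (about 248.55 days), then false until the timer reaches &FFFFFFFF, matching the alternating pattern described in the post.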

Jan 23, 2023 10:12pm Paolo Fabio Zaino (28) 1882 posts	First of all, thank you everyone for checking this! :) @ Steve That looks to be very similar to the IF (now% - return_time%) > 0 THEN … Not quite, the BPL instruction uses the N flag (Negative), so the branch is taken if N is not set, which includes a zero result… hence your BBC BASIC code should be: `IF (now% - return_time%) >= 0 THEN ...` Also, I need to make an apology: I have mentioned -1, but I do not use -1 as a magic number. Sorry, I was in a hurry due to other things going on at the same time and cut my content way too much! For my test, I used: now% + user_delay% where user_delay% in my test was set to -1 and normally is set to 0 for a redraw, while it’s set to 7 otherwise. Which in my understanding should return immediately; as explained by nemo, I do not consider RO as an OS with a real multi-tasking scheduler (as I have mentioned billions of times, for me it’s an Acorn MOS pumped with steroids). So using an immediately expired time sounds like it should just return immediately to me. In other words, the difference between +0 and +(-1) should be along the lines of: with +0 there might be something else happening (like trying to switch to another task), while with +(-1) there is no chance of that and it just returns straight to my task and, in my case, executes the next chunk of work. I guess I am wrong then. Again thanks for checking it :)

Jan 23, 2023 10:17pm Stuart Swales (8827) 1357 posts	after 248 days, it will start to cause Wimp_PollIdle to block Null events Yes, one of the first clients that was updated to use Wimp_PollIdle (and incorrectly) was the internal MailMan at Acorn. After the requisite interval had elapsed on one of the manglement desktops, it no longer polled for mail. Ironically, this particular mangler had wangled one of the early A440 production systems so he could boot either RISC OS or RISCiX and was ‘going to be doing that all the time’. Clearly not for 248 days, they hadn’t… [Edit: as Nemo infers below, they did in fact barely use the system.]

Jan 23, 2023 10:31pm nemo (145) 2546 posts	Steve garbled we can only delay by 248-and-a-half seconds Days. If you manage to 1. Use RISC OS without crashing or resetting for eight months; and 2. Write a program that does so very little that the idle null poll is the first thing it hears about; then you win a special prize from the RISC OS faeries.

Jan 25, 2023 2:58pm Daniel Garrod (9459) 34 posts	Sorry guys, but I am lost, what have all these messages to each other got to do with my locking up issue? Daniel.

Jan 25, 2023 3:09pm nemo (145) 2546 posts	It moved off onto the orthogonal topic of tasks ceasing to perform their function (“locking up”) because of a failure to understand Wimp_PollIdle.

Jan 25, 2023 5:33pm Paolo Fabio Zaino (28) 1882 posts	@ Daniel It started with people trying to understand why you are experiencing that locking up, but unfortunately it has moved off onto a side discussion because some people have made quite an enormous set of assumptions, which led to confusing conclusions. In your case (and even in the others mentioned here) I don’t think that even a mistaken use of Wimp_PollIdle is causing the issue. Here is why:
1) Some people mentioned that using now% - 1 would cause problems at some point in the future. This is not quite right (it can happen only in an extremely remote condition). The monotonic timer will reset after a while: in the end it is a 32-bit number, so when it reaches its maximum it will restart from zero. This seems to be the sole element evaluated by those who think it will cause problems in the future, but there is more to consider:
a) The Wimp_PollIdle interval only decides which NULL events are sent to our task; it doesn't preclude other events and messages. This is absolutely crucial to consider, because in the rare case of "locking up" by using -1, as soon as a new event comes in the interval will be reset to proper numbers and so everything will be back to normal. So I agree with Nemo, someone is definitely confused about how Wimp_PollIdle works and affects RO; hopefully this will help.
b) NULL events are the lowest priority AFAIR, so any other event will reach a task, and that will also trigger a reset of the now% - 1, which will then solve the problem.
c) To actually get into the case where the number produced is going to lock a task, we need to execute the now% - 1 just as the timer is resetting, and that task must be accepting only NULL events. That requires some very specific configuration, which I have never seen done tbh, also because a task should accept at the very minimum a signal to quit. But even in this case, on modern RO there is a chance to kill that specific task using [ALT]+[BREAK].
2) Someone suggested using Wimp_Poll in cases where we wish to receive ALL the NULL events, and that is true, but not quite the same. In fact using Wimp_Poll should equate to using Wimp_PollIdle with now% + 0 or something, not the same as now% - 1, which will literally return immediately to the original task; no process "swap" will happen at all.
In your case, if the system is locking up, it’s most likely something else causing the problem, and I would start by investigating which modules you have loaded etc. Hope this helps, [edit] Sorry for the use of pre, but for some reason my bullet points were being joined together otherwise, no idea why… [/edit]

Jan 25, 2023 6:22pm nemo (145) 2546 posts	Paolo monospaced now% - 1 would cause problems at some point No. now%-1 is pointless (use Wimp_Poll) but harmless – wrap-around is never an issue in that case. The problem is in the theoretical case of using a fixed number, whether -1, 0 or RND, which in the worst case might not return for eight months. In fact using Wimp_Poll should equate to using Wimp_PollIdle with now% + 0 or something Yes. not the same as now% - 1, which will literally return immediately to the original task, no process “swap” will happen at all. No. There’s no difference. I posted the lines of code above – the Wimp will return to your task if MonotonicTime is >= your idle time. This will happen only if there is no other event to be delivered anywhere, but will happen¹ regardless of whether your idle time was now%, now%-1 or now%-2147483647. As Dave has pointed out, this is a simple matter of twos-complement arithmetic. ¹ Your task will also only be called back immediately if there are no other tasks waiting for nulls – they’re delivered in round-robin manner to avoid one task monopolising things.
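
The claim that now%, now%-1 and now%-2147483647 are all treated alike can be checked with a short Python sketch. It assumes the Wimp's test is the sign of the 32-bit wrapped difference, as discussed above; `null_due` and `classify` are hypothetical names.

```python
MASK = 0xFFFFFFFF


def null_due(now, idle_time):
    """True when (now - idle_time), wrapped to 32 bits, has its sign bit clear."""
    return ((now - idle_time) & MASK) < 0x80000000


def classify(now, offsets):
    """Map each offset to whether an idle time of now + offset is already due."""
    return {off: null_due(now, (now + off) & MASK) for off in offsets}


if __name__ == "__main__":
    offsets = [0, -1, -2147483647, 100]
    # The outcome is the same just after boot and just before the wrap.
    print(classify(5, offsets))
    print(classify(0xFFFFFFFE, offsets))
```

Every idle time from now% back to now%-2147483647 counts as due, wherever the timer currently sits, while now%+100 does not; this holds only for offsets relative to the current time, not for fixed numbers.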

Jan 25, 2023 7:23pm Paolo Fabio Zaino (28) 1882 posts	There’s no difference. I posted the lines of code above – the Wimp will return to your task if MonotonicTime is >= your idle time. That was, originally, my understanding as well, but I swear I have seen my Launchpad redrawing the icons faster than when using now% + 0. I’ll re-test again tonight (it may have been just some peculiar combination of things). Your task will also only be called back immediately if there are no other tasks waiting for nulls – they’re delivered in round-robin manner to avoid one task monopolising things. In my test, that was probably the case, as it was the only user-executed task running. In any case, if I see that visible difference again I’ll make a video and post it somewhere.

Jan 27, 2023 3:20pm Daniel Garrod (9459) 34 posts	Hi Paolo, You mention about testing modules, What is the best way and is there a copy of !Boot that has minimum amount of modules loaded? Thanks. Daniel.

Jan 28, 2023 6:26pm Paolo Fabio Zaino (28) 1882 posts	Hey Daniel, sorry for the late reply, but yesterday I had quite a busy day. What is the best way and is there a copy of !Boot that has minimum amount of modules loaded? Here are a few ideas that may help you find the root cause of the problem you are experiencing: You can quickly check all the loaded modules by rebooting your system, then opening a TaskWindow and typing `*modules`. AFAIR, ROM modules should start at `FCxxxxxx`, while modules that get loaded during your boot sequence should have addresses not starting with `FC`. Take note of those. If you are using an editor like !StrongEd you can copy them to the clipboard and then paste them into a text file; it’s quick. Then download a clean UniBoot from ROOL (if you are using RO 5); you can find one here I think: https://www.riscosopen.org/content/downloads/common (scroll down that page until you find “Disc based components” and download the HardDisc4 image; if you have a way to unzip it manually on your RISC OS system then you can download the zip file, if you don’t then download the self-extracting one). Unzip the HardDisc4 image, rename your original !Boot (something like XBoot will be fine) and then copy the !Boot you’ll find in your unzipped HardDisc4 onto the root directory of your SD card (where your old !Boot was, basically). Reboot and run your software, and see what happens. If it works fine and the problem you had disappears, then the problem was caused by something in your previous boot sequence. At this point, if you need to recover things from your old UniBoot I would suggest proceeding with caution and copying one item at a time from your old !Boot to your new one. Every time you copy an item from the old !Boot to the new one I would retest everything: reboot, launch your app and make sure no problem happens. If no problem happens, then add another item from the old !Boot to the new one and repeat. 
In case the problem you had also happens with the clean new !Boot, then I would look for bugs in the App you’re using. RISC OS has a very, very vintage architecture (as mentioned above, it’s just an Acorn MOS – the original OS for the BBC Micro – improved, ported to ARM and with an added Desktop that was designed for machines running at 8MHz, that is all it is), so it offers no protection against modules that misbehave or have bugs, no protection against applications that have bugs, and it’s an OS in which both the App can access the kernel space and the Kernel can access (directly) the App space, so a system freeze can happen for almost any issue (technically one can “freeze” a RISC OS system just by running an infinite loop in an App). In recent years ROOL has put some effort into making the situation a bit better and they have created RO “kernels” that, for example, do not use page 0, so offer a bit more resilience against apps that may be trying to access memory on page 0 by mistake (this can happen for a number of reasons). So, another suggestion is, if your App works fine with such kernels (and it should tbh), then use those; that will help gain a bit more resilience. Another quick test you could do to rule out any issues caused by the hardware, for example, is to plug your SD card into another Raspberry Pi and see if the problem also happens on the new hardware. If it does, then try the tricks described above; if it doesn’t, then it could be either something caused by the hardware of your previous Raspberry Pi or by the Pi firmware version installed on your previous Raspberry Pi. Hope this helps, good luck! :)

Jan 28, 2023 7:25pm Rick Murray (539) 13840 posts	At this point, if you need to recover things from your old UniBoot I would suggest proceeding with caution and copying one item at a time from your old !Boot to your new one. If it’s a Pi we’re talking about, you’ll want to be VERY careful with the big “Loader” file within !Boot. This should never be copied, nor deleted. And I’m not sure I’d move it either. Why? It’s sort-of-not-quite a real file. It’s a bit of magic that covers the FAT boot partition (used by the bootloader) so the RISC OS filesystem doesn’t mess with that part of the disc. A potentially less problematic way to test boot stuff is: Open `$.!Boot.Choices.Boot` and create a directory called “OldStuff”. Move Desktop, PreDesk, and Tasks into there. Find the same three things (Desktop, PreDesk, and Tasks) in a virgin copy of the HardDisc4 installation. Copy them over. When you reboot, the standard stuff will run, and any of your startup customisations won’t be done. Note that this will reset your monitor setup, as that’s part of PreDesk. Do NOT run the server during the test period. Let’s check the machine itself is working correctly first. If things do not work, then try renaming `$.!Boot.Choices` as something else (like “_Choices”) and installing the default from HardDisc4. Note that this will nuke loads of things like your network settings, and it’ll need more delving to work out where the problem lies. However, if things do work, then you can – as Paolo suggests – copy stuff back bit by bit. Start with the things in PreDesk, then add the Desktop file, and finally Tasks. Some things ought to be skippable. I’d suggest that you don’t need to worry about individually testing Fat32Fs, RPFS, or !!DeepKeys. Other files, you’ll see, are just options. Like “SetUpNet” which just says “go run !Internet”. Once the system is running and seems stable, you can then add in the server. How are you running ARMbbs and the telnet gateway? Are you doing it via Aemulor?

Jan 28, 2023 10:33pm nemo (145) 2546 posts	Renaming your `!Boot` to be `old!Boot` doesn’t move anything on disc, so worry not about magic files therein. Some things ought to be skippable “Start RISC OS in Safe Mode” perhaps? There was a time when Shift-Boot did that. Those were the days. Mind you there was also a time when holding Shift once the boot started would result in loads of files being loaded instead of run. Like I said, those were the days.

Jan 29, 2023 6:48am Rick Murray (539) 13840 posts	I wasn’t thinking so much the moving out of the way as the putting back…

Jan 29, 2023 6:49am Rick Murray (539) 13840 posts	As for a safe mode RISC OS, doesn’t spamming the Escape key at boot do that?