Wi-Fi in 5.30
Chris Gransden (337) 1202 posts |
I did some testing with the CPU @600MHz for downloading over a wired and wifi network from a local server. Using a local server there’s next to no latency affecting RISC OS. On RISC OS the wired speed dropped from about 80MB/s to 30MB/s downloading to a RAM disk. Wifi doesn’t seem to be affected as the download speed is so low already. With Raspberry Pi OS @600MHz Wifi still gets about 10MB/s and wired varies between 95MB/s and 100MB/s. Then I thought, the RPi 4 has four CPUs but RISC OS can only use one. |
Thomas Milius (7848) 116 posts |
I agree with Chris. RISC OS Ethernet speed on my local networks isn’t slow. The old BBxM with a NAS from around 2008 achieves up to 8MB/s, e.g. when loading a big sprite. RPi 3B+s and a new NAS from 2020 achieve much more, perhaps 24MB/s, and the new CM4 without the USB limitations is faster with the same NAS. OK, loading something from the internet is a “bit” slower with ADSL and a router from around 2005. ;-) |
David Feugey (2125) 2709 posts |
Ah, OK. Cool!
So do I.
Perhaps I was a bit optimistic on this one. |
Chris Mahoney (1684) 2165 posts |
I’ve seen the same sort of behaviour. Local servers are tolerable, but the further away you get, the slower things go. When working on HTTPLib I found that larger block sizes can result in faster downloads, but it’s a double-edged sword because making the block “too big” will make each polling loop slow. HTTPLib has some logic in it to find a nice equilibrium. |
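The trade-off described in the post above can be sketched roughly as follows. This is not HTTPLib's actual logic, which isn't shown in this thread; the function name, the 50 ms poll budget and the size limits are all assumptions made up for illustration.

```python
def next_block_size(current, poll_seconds, budget=0.05,
                    smallest=4 * 1024, largest=1024 * 1024):
    """Pick the block size for the next read.

    Hypothetical heuristic (not taken from HTTPLib): double the block while
    a poll stays well inside the time budget, halve it when a poll overruns,
    otherwise keep it. The result is clamped to [smallest, largest].
    """
    if poll_seconds > budget:
        proposed = current // 2      # poll loop too slow: back off
    elif poll_seconds < budget / 2:
        proposed = current * 2       # plenty of headroom: grow
    else:
        proposed = current           # close to the budget: hold steady
    return max(smallest, min(largest, proposed))


if __name__ == "__main__":
    size = 16 * 1024
    for elapsed in (0.01, 0.01, 0.03, 0.08):
        size = next_block_size(size, elapsed)
        print(size)
```

With those made-up timings the block grows twice, holds once, then halves when a poll overruns the budget.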
Thomas Milius (7848) 116 posts |
In my measurements MB means megabyte. The CM4 loaded a Sprite file of 40144952 bytes into Paint from my old NAS in less than 1.5 seconds (measured manually). I am entirely satisfied with this and, as mentioned, the new NAS seems to be faster. Of course, loading the same Sprite file from my NVMe SSD on the CM4 is faster still. |
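For reference, the figures in the post above work out as below. This is only the arithmetic on the quoted file size and the hand-timed 1.5 seconds; since the load took less than that, the result is a lower bound. The function name is made up for the example.

```python
def throughput(num_bytes, seconds):
    """Return (MB/s, MiB/s) for a transfer of num_bytes taking `seconds`."""
    bytes_per_second = num_bytes / seconds
    return bytes_per_second / 1_000_000, bytes_per_second / (1024 * 1024)


if __name__ == "__main__":
    # 40144952 bytes in (just under) 1.5 seconds, as reported in the post
    mb, mib = throughput(40_144_952, 1.5)
    print(f"at least {mb:.1f} MB/s ({mib:.1f} MiB/s)")
```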
Jon Abbott (1421) 2641 posts |
From reports I’m getting and my own testing, there are certainly common issues with both WiFi solutions. If I leave games supported by ADFFS running on their demo loops, they randomly hang when a WiFi stack is loaded. I’ve also seen random app crashes and lock-ups if I leave RISC OS at the desktop after booting, which didn’t occur previously. That’s with the 15th May 24 RISC OS 5.30 build. I did a bit of a deeper dive with the original Pac-mania and the Learning Curve release, as the original hung quickly and the Learning Curve version didn’t, despite the code being near identical. I found the Music Tracker was being called via an OS_CallEvery in the Learning Curve version, and via an OS_Claim of EventV that checked for event 4 in the original. I moved the Tracker to hang off the ChannelHandler 1 fill and the hangs no longer occur. That would suggest to me that something in the WiFi stack, or OS code it relies on, might be enabling IRQ when it shouldn’t and causing IRQ re-entrancy issues for IRQ handlers not coded to handle re-entrancy. That’s just a guess though; it’s going to be really hard to pin down the exact root cause. |
Rick Murray (539) 13806 posts |
Just a tuppence here, when leaving the machine in the desktop, with the RODev stack and WiFi it can go happily for days [1]. The resets and shutdowns are mine, not the stack/WiFi crashing.
[1] I think the longest was a week and a half, but this time of year there are frequent thunderstorm warnings… |
André Timmermans (100) 655 posts |
IIRC there were some discussions between John Ballance and Ben Avison some time ago about IRQ issues in some of John’s merge requests, but IIRC Ben was not convinced of the origin of the problem. |
Jon Abbott (1421) 2641 posts |
I believe you’re referring to the conversation in this merge request. The implication being there might be a fundamental issue in the OS somewhere that both WiFi stacks trigger. EDIT: I forgot to add that it’s not a lock-up but probably a race condition. RTSupport routines and Sound Channel fill handlers are still being called when it appears to hang, but the OS stops exiting back to Appspace. |
Jon Abbott (1421) 2641 posts |
If it’s any help, when it gets into a race condition it’s bouncing around between SWI Dispatch, the following sequence in WLanBWFM:

    STR     R14,[R13,#-4]!
    TEQ     PC,PC
    TEQNEP  PC,#0
    MRSEQ   R3,CPSR
    BICEQ   R3,R3,#&CF
    MSREQ   CPSR_c,R3
    SWI     XOS_LeaveOS
    MOVVS   R0,#0
    MOVVS   R1,#1
    SWIVS   XOS_Byte
    SWI     XOS_EnterOS
    TEQ     PC,PC
    LDMNEIA R13!,{PC}^
    LDR     PC,[R13],#4

and +860 in SDIODriver, which is the code sequence that includes the first occurrence of RT_Yield. I’m not sure where that is in the C source code though.
EDIT: I should add that SWI Dispatch probably isn’t relevant, as it triggers the bulk of IRQ when it restores the caller’s IRQ state.
EDIT2: Why is that code sequence in WLanBWFM 26bit neutral? Was WLanBWFM compiled with the wrong flags? Or does the library code not take account of the CPU compiler flags? |
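As a side note on the WLanBWFM sequence quoted above, the effect of the `BICEQ R3,R3,#&CF` / `MSREQ CPSR_c,R3` pair on the 32-bit path can be worked out from the standard ARM CPSR layout (I bit 7, F bit 6, mode in bits 4..0). The sketch below only does that bit arithmetic; the helper name is made up, and reading the routine as "drop to user mode with interrupts enabled, then re-enter SVC" is an interpretation that nobody in the thread has confirmed.

```python
I_BIT = 0x80      # IRQ disable bit in the CPSR control byte
F_BIT = 0x40      # FIQ disable bit
MODE_MASK = 0x1F  # processor mode field, M4..M0

MODE_NAMES = {0x10: "USR32", 0x11: "FIQ32", 0x12: "IRQ32", 0x13: "SVC32"}


def after_bic_cf(cpsr):
    """Apply BIC Rn,Rn,#&CF to a CPSR value and describe the control byte."""
    new = cpsr & ~0xCF  # clears I, F and mode bits M3..M0; M4 is untouched
    return {
        "cpsr_c": new & 0xFF,
        "mode": MODE_NAMES.get(new & MODE_MASK, hex(new & MODE_MASK)),
        "irq_enabled": not (new & I_BIT),
        "fiq_enabled": not (new & F_BIT),
    }


if __name__ == "__main__":
    # &93 = SVC32 with IRQs disabled, used here purely as an example input
    print(after_bic_cf(0x93))
```

For a control byte of &93 the result is &10, i.e. USR32 with both IRQ and FIQ enabled.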
Jon Abbott (1421) 2641 posts |
The more I think about it, the more confused I am about why WLanBWFM is 26bit neutral. The Pi can’t run 26bit code and the Module only covers the Pi, so why is it compiled as 26bit neutral and in the 350/360 system folders? I’m sure there’s some logical explanation. Is the source code for WLanBWFM available? I can’t seem to find it in the OS source.
I’ve tried for three days to produce a repro for the race condition, but can’t. Although I can get it into a race condition easily enough, I can’t seem to pin down what’s triggering it. It’s odd, as RTSupport is still calling other routines, Sound fills still occur, and possibly event calls and Environment handlers as well (I’ve not checked). User code seems to stop being called. The keyboard still works, for example, but if you’re in BASIC, it doesn’t receive the key. At a guess it’s stuck in a CallBack loop and SWI dispatch never gets to exit back to User?
I’ve not tried the ROD stack since its recent update, but as it previously suffered exactly the same issue in the same places, I’m going to presume the issue isn’t necessarily with either WiFi stack, but elsewhere, and WiFi just seems to trigger it somehow. |
Jon Abbott (1421) 2641 posts |
I feel there are a couple of issues with SDIODriver, which are now coming to light with WiFi.
SDIOLib makes use of UpCall 6 when blocking in the four SD ops functions, however they might be called from a background operation. I believe that is what’s causing User code to stop executing when it’s in a background driven spin-lock. As I don’t have the WiFi source, I can’t tell if its background operations are threaded via RTSupport or running as a separate User task. I’d expect an RTSupport thread, however due to SDIOLib’s use of UpCall 6 it’s doing more than just blocking the background thread – it’s also unintentionally blocking the foreground app that is currently paged in. If WiFi is using RTSupport for its background processes, I’d expect an RT_Yield and no use of UpCall 6. Unless I’m misunderstanding the documentation (ignoring the fact the documentation states “The calling task must be running in a task window”), UpCall 6 should only be used where a foreground process is blocked. I’m not sure what the solution is, short of changing WiFi/SDIOLib so they only block the background process and not the foreground when they’re performing background specific operations.
The second issue is WiFi operations causing SDIOLib to get stuck in a spin-lock. That would imply a couple of possible causes, such as the Bus/Slot not being freed correctly, or the SDIOOp never completing. If the SDIOOp isn’t completing, that would indicate there’s an issue with how both WiFi implementations are interacting with the WiFi hardware or, worse, an issue in SDIODriver.
The third potential issue is general use of UpCall 6 when the Wimp is shelled out and the app hasn’t installed an UpCall handler that claims UpCall 6. Are the OS UpCall handlers aware the Wimp is shelled out, and do they act accordingly by doing nothing? |
Rick Murray (539) 13806 posts |
Why would NetSurf be limited to 2MiB/sec? For what it is worth, downloading Hardisc4 from here got me around 1.83MiB/sec using RISC OS 5.29 (from March) on a 3B+ using WiFi and the RODev stack (v7.05). For comparison, my phone managed 8.5MiB/sec. |
Jon Abbott (1421) 2641 posts |
I’ve just spotted another issue with the ROOL WiFi. If I RMKill it via an Obey from the desktop, it will randomly fail, causing two Page Zero accesses in MBufManager:

Time: Fri Jun 14 19:28:14 2024
Location: Offset 00000f24 in module MbufManager
Current Wimp task: Unknown
Last app to start: AcornSSL

R0  = 20203104 R1  = 0000000e R2  = 00000000 R3  = 00000002
R4  = 00000010 R5  = 00000158 R6  = 20202cec R7  = 00800040
R8  = 202041e8 R9  = 20000113 R10 = fa20021c R11 = fa207bb0
R12 = 00000000 R13 = fa207b60 R14 = 20000193 R15 = fc3a055c
DFAR = 00000010 Mode SVC32 Flags Nzcv If PSR = 80000193

fc3a0514 : 1a000019 : BNE &FC3A0580
fc3a0518 : 0affffd8 : BEQ &FC3A0480
fc3a051c : e3510080 : CMP R1,#&80 ; ="€"
fc3a0520 : da00000a : BLE &FC3A0550
fc3a0524 : e3510c06 : CMP R1,#&0600 ; =1536
fc3a0528 : c3a00000 : MOVGT R0,#0
fc3a052c : d59c0014 : LDRLE R0,[R12,#20]
fc3a0530 : e3300000 : TEQ R0,#0
fc3a0534 : 0affffd1 : BEQ &FC3A0480
fc3a0538 : e5904000 : LDR R4,[R0,#0]
fc3a053c : e58c4014 : STR R4,[R12,#20]
fc3a0540 : e580100c : STR R1,[R0,#12]
fc3a0544 : e3a04000 : MOV R4,#0
fc3a0548 : e5804000 : STR R4,[R0,#0]
fc3a054c : ea00000b : B &FC3A0580
fc3a0550 : e28c4010 : ADD R4,R12,#&10 ; =16
fc3a0554 * e5940000 * LDR R0,[R4,#0]
fc3a0558 : e3300000 : TEQ R0,#0
fc3a055c : 028c4014 : ADDEQ R4,R12,#&14 ; =20
fc3a0560 : 05940000 : LDREQ R0,[R4,#0]
fc3a0564 : 03300000 : TEQEQ R0,#0
fc3a0568 : 15905000 : LDRNE R5,[R0,#0]
fc3a056c : 15845000 : STRNE R5,[R4,#0]
fc3a0570 : 13a05000 : MOVNE R5,#0
fc3a0574 : 15805000 : STRNE R5,[R0,#0]
fc3a0578 : 0affffc0 : BEQ &FC3A0480
fc3a057c : e580100c : STR R1,[R0,#12]
fc3a0580 : e129f009 : MSR CPSR_cf,R9
fc3a0584 : e89d000e : LDMIA R13,{R1-R3}
fc3a0588 : e3130004 : TST R3,#4
fc3a058c : 0a00000e : BEQ &FC3A05CC
fc3a0590 : e52d0004 : STR R0,[R13,#-4]!
--------------------------------------------------------------------------------
Time: Fri Jun 14 19:28:14 2024
Location: Offset 00000f24 in module MbufManager
Current Wimp task: Unknown
Last app to start: AcornSSL

R0  = 20203104 R1  = 0000000e R2  = 00000000 R3  = 00000002
R4  = 00000010 R5  = 0000016c R6  = 20202cec R7  = 00800040
R8  = 202041e8 R9  = 20000113 R10 = fa20021c R11 = fa207ed8
R12 = 00000000 R13 = fa207e88 R14 = 20000193 R15 = fc3a055c
DFAR = 00000010 Mode SVC32 Flags Nzcv If PSR = 80000193

fc3a0514 : 1a000019 : BNE &FC3A0580
fc3a0518 : 0affffd8 : BEQ &FC3A0480
fc3a051c : e3510080 : CMP R1,#&80 ; ="€"
fc3a0520 : da00000a : BLE &FC3A0550
fc3a0524 : e3510c06 : CMP R1,#&0600 ; =1536
fc3a0528 : c3a00000 : MOVGT R0,#0
fc3a052c : d59c0014 : LDRLE R0,[R12,#20]
fc3a0530 : e3300000 : TEQ R0,#0
fc3a0534 : 0affffd1 : BEQ &FC3A0480
fc3a0538 : e5904000 : LDR R4,[R0,#0]
fc3a053c : e58c4014 : STR R4,[R12,#20]
fc3a0540 : e580100c : STR R1,[R0,#12]
fc3a0544 : e3a04000 : MOV R4,#0
fc3a0548 : e5804000 : STR R4,[R0,#0]
fc3a054c : ea00000b : B &FC3A0580
fc3a0550 : e28c4010 : ADD R4,R12,#&10 ; =16
fc3a0554 * e5940000 * LDR R0,[R4,#0]
fc3a0558 : e3300000 : TEQ R0,#0
fc3a055c : 028c4014 : ADDEQ R4,R12,#&14 ; =20
fc3a0560 : 05940000 : LDREQ R0,[R4,#0]
fc3a0564 : 03300000 : TEQEQ R0,#0
fc3a0568 : 15905000 : LDRNE R5,[R0,#0]
fc3a056c : 15845000 : STRNE R5,[R4,#0]
fc3a0570 : 13a05000 : MOVNE R5,#0
fc3a0574 : 15805000 : STRNE R5,[R0,#0]
fc3a0578 : 0affffc0 : BEQ &FC3A0480
fc3a057c : e580100c : STR R1,[R0,#12]
fc3a0580 : e129f009 : MSR CPSR_cf,R9
fc3a0584 : e89d000e : LDMIA R13,{R1-R3}
fc3a0588 : e3130004 : TST R3,#4
fc3a058c : 0a00000e : BEQ &FC3A05CC
fc3a0590 : e52d0004 : STR R0,[R13,#-4]! |
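The two MbufManager logs above can be cross-checked by hand: the marked instruction at fc3a0554 loads from R4, which the preceding ADD sets to R12 + &10, and both logs show R12 = 00000000. The sketch below decodes just those two instruction words using the standard ARM encodings and reproduces the logged DFAR. The helper names are made up for the example, and it says nothing about why R12 is zero at that point.

```python
def decode_data_processing_imm(word):
    """Decode Rn, Rd and the rotated 8-bit immediate of an ARM
    data-processing instruction with an immediate operand."""
    rn = (word >> 16) & 0xF
    rd = (word >> 12) & 0xF
    rot = ((word >> 8) & 0xF) * 2
    imm8 = word & 0xFF
    imm = ((imm8 >> rot) | (imm8 << (32 - rot))) & 0xFFFFFFFF if rot else imm8
    return rn, rd, imm


def decode_ldr_imm(word):
    """Decode Rn, Rd and the signed 12-bit offset of an ARM LDR/STR
    with an immediate offset."""
    rn = (word >> 16) & 0xF
    rd = (word >> 12) & 0xF
    offset = word & 0xFFF
    if not (word >> 23) & 1:   # U bit clear means the offset is subtracted
        offset = -offset
    return rn, rd, offset


registers = {12: 0x00000000}   # R12 as logged in both dumps

rn, rd, imm = decode_data_processing_imm(0xE28C4010)   # ADD R4,R12,#&10
registers[rd] = (registers[rn] + imm) & 0xFFFFFFFF

rn, rd, offset = decode_ldr_imm(0xE5940000)             # LDR R0,[R4,#0]
fault_address = (registers[rn] + offset) & 0xFFFFFFFF
print(f"{fault_address:08x}")                           # matches DFAR = 00000010
```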
DownUnderROUser (1587) 124 posts |
OK, I have been doing some testing and it appears that Jon is on to something with regard to the SDIO driver, as I have had issues with both Wifi implementations…
Testing so far… (won’t be able to do more for a while unfortunately)
Brand new install:
- Checked that wifi drivers were not present – correct.
- Installed the following network reliant software:
- Also installed the following other software:
- Also installed the following modules / boot software:
Ran the computer for over 12 hours with all network software up and running, browsing the net, listening to internet radio, reading news feeds etc., and ran it all from my Windows 10 machine using a VNC server connection. Left it running overnight and came back to it in the morning, still running faultlessly. The only problem I have is with Alarm – it does not show the correct time; it is approx 3.5 hours out (I am using Australian Central time zone, no DST). No other faults.
So back up the ‘working’ SSD image and proceed to install wifi drivers… Decided to use the ROD ones:
Noted that in the time and date settings in Configure the NTP try status was ‘busy’ the whole time I was testing. Was using it for about an hour, then the VNC session went down. Is this the ‘spin-lock’ that Jon has been referring to? Will repeat sessions using alternative ethernet cabled and wifi connections and report if the same occurs (expect that it will)… Hope this may assist in diagnosing the issue…
Whilst the ROD drivers seem more stable (e.g. they allow the VNC connection to function), both the ROOL wifi drivers (from previous tests) and the above use of the ROD stack appear to have problems… The conclusion being that neither wifi solution is really usable for me as a ‘daily driver’ at present. Andrew R did mention a new Beta ROD network stack – I am happy to give this a go on the current set-up to see if it improves things. |
DownUnderROUser (1587) 124 posts |
@ Jon re Partition Manager… |
DownUnderROUser (1587) 124 posts |
repeat test: |
DownUnderROUser (1587) 124 posts |
didn’t use VNC to connect to RPi for repeat test above (VNC server is started at boot though) |
DownUnderROUser (1587) 124 posts |
Might be obvious, but the system lock-up above was identified by netradio stopping playing – i.e. no audio output |
DownUnderROUser (1587) 124 posts |
2nd repeat test – yes, same issue – lock-up after approx 2 hours.
Re time issue – time is out by exactly 8.5 hours (i.e. it will say 05:06 when real local time is 13:36). As soon as I click ‘set’ on the time & date configure panel the correct time is displayed in Alarm – there must be some problem here. Note the time difference between here (Adelaide, South Australia) and GMT / London time is 8.5 hours at present (as you have BST active; otherwise it would be 9.5 hours (standard) or 10.5 hours when we have daylight saving time). So the problem looks to be with adjusting the time for the local territory. Also the status of the NTP server reports as ‘busy’ when using wifi networking (this is not the case when using wired ethernet, although the time difference problem persists). |
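The 8.5 hour figure in the post above can be checked with fixed UTC offsets. The sketch below takes one instant and shows it at UTC+1 (BST) and UTC+9:30 (Adelaide standard time); the calendar date is arbitrary, as the post gives only the clock times, and the helper name is made up. It only illustrates that the wrong reading matches a UK summer-time offset; it is not a diagnosis of where the offset goes wrong.

```python
from datetime import datetime, timedelta, timezone

UTC = timezone.utc
BST = timezone(timedelta(hours=1), "BST")                # UK summer time
ACST = timezone(timedelta(hours=9, minutes=30), "ACST")  # Adelaide standard time


def as_seen_in(moment, zone):
    """Format an aware datetime as HH:MM in the given fixed-offset zone."""
    return moment.astimezone(zone).strftime("%H:%M")


if __name__ == "__main__":
    # 04:06 UTC is the instant implied by the post; the date itself is arbitrary.
    now_utc = datetime(2024, 6, 15, 4, 6, tzinfo=UTC)
    print(as_seen_in(now_utc, BST))   # 05:06, the time Alarm displayed
    print(as_seen_in(now_utc, ACST))  # 13:36, the real local time
```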
Steve Pampling (1551) 8155 posts |
NTP always delivers UTC (approx. GMT for humans) so the error is in the offset wherever that is done. |
Chris Gransden (337) 1202 posts |
Does *nettime_status show the correct time? Alarm has a habit of not refreshing its display. Pressing F12 and then Return should force it to update. |
Chris Hughes (2123) 336 posts |
In Configuration > Time and date, what do you have the locality set to? I stopped using Alarm many years ago and use the excellent paid-for version of Organizer instead. Regarding the Wi-Fi, remember it is still in beta, and Pi performance on Wi-Fi will be slower than on a wired connection. It is also quite possible there are issues within the SDIO driver, as previously explained elsewhere. |
Chris Gransden (337) 1202 posts |
If you leave the Wifi interface unconfigured the Wifi module doesn’t get loaded. PartMgr already does this so no need to re-image. |
DownUnderROUser (1587) 124 posts |
Chris H – Yes, I acknowledge both Wifi modules are still in beta. I guess I have been waiting so long for Wifi that I was initially excited and then subsequently disappointed that it is not yet reliable enough for me.
Chris G – thanks for clarifying. I wasn’t sure if the wifi module would be loaded in the background or otherwise, so I just wanted to eliminate its possible presence/influence, but I also wanted to be able to remove it completely so I could try the ROD version in isolation. Anyway, thanks for making it clear how it works (i.e. it loads only when selected).
Steve / Chris G and Chris H – I have fixed the Alarm clock display issue. It has something to do with the latest 5.31 image. I’m not 100% sure what the issue is, but I replaced the contents of !Boot.Loader with an older install. So I do not know which file or setting is the issue, and it may well be a Pi firmware issue (as the newer firmware also introduced the BCM issue). |