XHCI driver on Pi4
Colin Ferris (399) 1813 posts |
I seem to have missed what the error was! |
Ralph Barrett (1603) 154 posts |
Err… indeed. Ralph |
Ralph Barrett (1603) 154 posts |
The XHCI driver on the Pi4 generates an abort when the Pi is connected to a USB switch-box and the switch-box is switched over (or disconnected?). My USB switch-box contains a USB2 hub. There might also be other similar XHCI driver issues triggered by different types of USB hubs? Ralph |
Stuart Swales (8827) 1357 posts |
An imperfect fix is better than having systems reliably lock up. |
Ralph Barrett (1603) 154 posts |
Here is a crash dump generated by XHCI on RO5.30. This crash dump is slightly different to the dump above posted by Andrew.

Error block: 80000002 Internal error: abort on data transfer at &FC223FB4
R15 = fc223fbc = XHCIDriver +4178 = xhci_intr +2490
IRQ stack:
R14_svc = fc2242c0 = XHCIDriver +447c = kmem_free +24
SVC stack:
R14_usr = fc1b2cb0 = BASIC +9138 = SYSCALLTO +c
USR stack:
End of dump |
David Pitt (9872) 363 posts |
The Titanium also uses XHCI and shows the same issues as the RPi4. As I understand it, XHCI is a strict USB implementation and is therefore unforgiving of less-than-compliant USB devices. My impression is that the Titanium is stricter than the RPi4, with the devices I have. I have now tried the updated XHCIDriver on the Titanium; that does stop the abort being reported but instead stiffs the Titanium. It is not just USB that has failed, VNC doesn’t work either. |
George T. Greenfield (154) 748 posts |
I’ve been having exactly this issue for about four years, since getting my first Pi4 in fact. I have a Rextron DVI switchbox connecting the KVM input of the Pi and a Wintel laptop to a single monitor/mouse/keyboard setup, and allowing switching between them. Keyboard and mouse become unresponsive on the Pi if the Wintel machine is ‘switched in’ for more than a few seconds, necessitating a reboot. Toggling between the two devices quickly does not generally provoke this behaviour, which leads me to think it possibly relates to polling on the Pi, but my knowledge of RISC OS (or Windows for that matter) is limited. |
David Pitt (9872) 363 posts |
To avoid XHCI on the rear ports of the RPi4, the DWC controller on the USB C port can be used instead. A USB C hub with power pass-through is required. This method has been reported as working with KVMs on this forum. It is working here and initially seems robust. |
Ralph Barrett (1603) 154 posts |
I dug out a USB C hub with power passthrough tonight, and confirm that this method seems to survive multiple switch-overs of my KVM USB 2.0 hub :-) When the USB cable from the USB 2.0 hub is moved into the ‘normal’ RPi4 USB sockets, I soon get RO5.30 XHCI driver aborts when I switch over the KVM USB 2.0 hub. This made using my RPi4 with RISC OS for day-to-day working unviable. In summary, my KVM USB 2.0 hub is now plugged into a ‘spare’ USB A socket on my USB C hub. This USB C hub is also supplying power to the RPi4 using power passthrough. Cool… Ralph |
David Pitt (9872) 363 posts |
There is a bit of an impasse brewing here. The developer is confident that the merge request is good. I thought it was good too on the RPi4: the abort on disconnecting the wireless keyboard and mouse dongle with a USB two-way switch no longer happened, and the RPi4 appeared not to stiff subsequently. That was until the RPi4 was found stiffed the next morning, at which point I withdrew the test ROM. Trying again today, the RPi4 has not aborted or stiffed with a test ROM, yet. Ralph’s RPi4 does stiff. The merge request does do what it is intended to do in preventing the abort, but there may be more to it than just that. The Titanium seems to be impervious to this merge request and continues to stiff. |
Ralph Barrett (1603) 154 posts |
I’m currently using RO Direct 531 on my RPi4, to have a play with Iris and Ovation Pro. ROD 531 comes with XHCIDriver 0.31 (11 May 2024) Disconnect_Abort_Fix.1. I confirm that I can get my RPi4 to ‘freeze’ (i.e. no response to keyboard/mouse) by toggling the switch on my USB 2.0 KVM switch. However, it takes slightly longer (more switch-box key presses) to get the USB ‘freeze’ than it did with XHCIDriver 0.30. The symptoms are also different in that I no longer get an abort. Based upon the slightly different symptoms, this issue could have a different (and more obscure?) root cause in the XHCI driver firmware? So the developer might be correct: he or she might have fixed the original issue? Ralph |
Dave Higton (1515) 3525 posts |
The feedback here suggests that the merge request makes things better but doesn’t provide a complete solution. If that’s true, it appears to me that the merge should be approved. |
André Timmermans (100) 655 posts |
I see tests of the fix made by switching the KVM, but has anyone tested whether there is a regression (i.e. stiffing without even touching the switch)? |
Ralph Barrett (1603) 154 posts |
I repeated my brief USB 2.0 KVM switchover tests last night. I managed to get a lockup much quicker, after just 4 (double) USB KVM switch-overs. Note that I’m slowly moving the mouse during these switch-overs to generate USB traffic. I also connected up a Dell USB keyboard to the USB C port, which allowed me to perform *USBDevINfo. Whilst in the lock-up state, USB devices 8 and 9 are no longer present in the *USBDevices list. I also found that when in the lock-up state I could NOT connect the Dell keyboard to the USB C port. The Dell keyboard had to be connected before the lockup occurred.
1. Before Lock Up
2. After Lock Up
Edit: Removed the line “This seems to imply that the USB device detection mechanism has stopped working on all USB ports.” due to subsequent testing. |
tymaja (278) 174 posts |
I don’t think I’ve ever had this problem, but then I use my Pi 4 via an ethernet cable and VNC. I do a fair amount of plugging / unplugging USB flash drives though, but they are 32GB USB2.0 ones (got a load as they are often better for OS installs). The location of the crash is interesting:
FC1F537C : ,0ì : E593302C : LDR R3,[R3,#44]
Probably the first instruction. If so, then it is doing a data abort loading from just over &8000. Is there some issue with either page swapping, or page permissions etc, causing the module to read data from somewhere it no longer has access to (e.g. if the module is responding to an interrupt or service call when devices are plugged / unplugged?) |
David Pitt (9872) 363 posts |
I managed to replicate lockups here using a moving mouse, with similar diagnostics. |
Ralph Barrett (1603) 154 posts |
Note: In my previous check for whether the Dell keyboard was working, I was pressing the ‘F12’ key on the Dell keyboard to attempt to get to the RISC OS command line. I re-checked this test, and confirm that the F12 key does not work when the Dell keyboard is plugged into the USB C port, after the XHCI driver ‘lock-up’. However, when I repeated this test for the second time, I also tried CTRL-BREAK. CTRL-BREAK works as normal. So the USB communications to the Dell keyboard must be working via the USB C port. Ralph |
David Pitt (9872) 363 posts |
There is further discussion in the Abort Fix merge request relating to a further way that things might go wrong. The full diff is:

--- NVMe::NVMe.$.ROOL.Pi.BCM2835.RiscOS.Sources.HWSupport.USB.Controllers.XHCIDriver.c.xhci00 2024-10-29 11:14:50.0 +0000
+++ NVMe::NVMe.$.ROOL.Pi.BCM2835.RiscOS.Sources.HWSupport.USB.Controllers.XHCIDriver.c.xhci 2024-10-29 11:14:50.0 +0000
@@ -1465,5 +1465,3 @@
 xfer->status = status; /* make software ignore it */
-#ifndef RISCOS
- callout_stop(&xfer->callout);
-#endif
+ callout_stop(&xfer->timeout_handle);
 usb_transfer_complete(xfer);
@@ -1474,5 +1472,3 @@
 xfer->status = status;
-#ifndef RISCOS
- callout_stop(&xfer->callout);
-#endif
+ callout_stop(&xfer->timeout_handle);
 usb_transfer_complete(xfer);
@@ -3744,3 +3740,3 @@
 KASSERT(xfer->pipe->intrxfer == xfer);
-#ifndef BRANCH_NHUSB
+#ifdef BRANCH_NHUSB
 xhci_abort_xfer(xfer, USBD_CANCELLED);

I would post that ROM but the wired LAN to the RISC OS devices has gone horribly wrong and I cannot upload. Bizarrely this may have happened on updating the iMacPro to Sequoia 15.1. The short version is that we may have a fix, subject to further testing. |
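A note on the first two hunks of that diff: one plausible reading (an assumption, not something stated in the merge request) is that a transfer's timeout needs to be cancelled before the transfer is completed, otherwise the timer could fire later against a transfer that has already been handed back. The toy model below only illustrates that ordering. It is not the driver code, and `struct toy_xfer`, `toy_callout_stop`, `toy_complete` and the other names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical toy transfer, NOT the real usbd/xhci structure. */
struct toy_xfer {
    bool timeout_armed;  /* a timeout callback is still pending */
    bool completed;      /* transfer has been handed back to its owner */
};

/* Stand-in for callout_stop(): cancel the pending timeout. */
void toy_callout_stop(struct toy_xfer *x)
{
    assert(x != NULL);
    x->timeout_armed = false;
}

/* Stand-in for usb_transfer_complete(): hand the transfer back. */
void toy_complete(struct toy_xfer *x)
{
    assert(x != NULL);
    x->completed = true;
}

/* True if a pending timeout could still run against a completed transfer. */
bool toy_stale_timeout(const struct toy_xfer *x)
{
    return x->completed && x->timeout_armed;
}

/* Old RISC OS path in this model: the stop call is compiled out. */
bool finish_without_stop(void)
{
    struct toy_xfer x = { .timeout_armed = true, .completed = false };
    toy_complete(&x);
    return toy_stale_timeout(&x);
}

/* Patched path in this model: stop the timeout, then complete. */
bool finish_with_stop(void)
{
    struct toy_xfer x = { .timeout_armed = true, .completed = false };
    toy_callout_stop(&x);
    toy_complete(&x);
    return toy_stale_timeout(&x);
}
```

In this model `finish_without_stop()` leaves a stale timeout behind and `finish_with_stop()` does not. Whether the real driver can hit that window is something only the source and testing can confirm.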
David Pitt (9872) 363 posts |
Normal, as if, service has been resumed. WiFi has gone mad! A hopefully fully abort fixed ROM is here

*FX0
RISC OS 5.31 (28 Oct 2024)
*help XHCIDriver
==> Help on keyword XHCIDriver
Module is: XHCIDriver 0.31 (11 May 2024) Disconnect_Abort_Fix.1 djp
*

I have initialled the info string to avoid version confusion. Note what happens to the switched device numbers after repeated USB switching

*USBDevices
No. Bus Dev Class Description
  1   1   1  9/ 0 Synopsys DWC OTG root hub
  2   2   0  9/ 0 VIA XHCI root hub
  3   2   1  9/ 0 VIA Labs USB2.0 Hub
  6   2   4  0/ 0 SanDisk Ultra
  7   2   5  0/ 0 Integral Portable SSD 3.0
  8   2   2  9/ 0 Terminus Technology USB 2.0 Hub
  9   2   3  0/ 0 YICHIP JLab GO Keys
*
*usbdevices
No. Bus Dev Class Description
  1   1   1  9/ 0 Synopsys DWC OTG root hub
  2   2   0  9/ 0 VIA XHCI root hub
  3   2   1  9/ 0 VIA Labs USB2.0 Hub
  6   2   4  0/ 0 SanDisk Ultra
  7   2   5  0/ 0 Integral Portable SSD 3.0
 38   2   2  9/ 0 Terminus Technology USB 2.0 Hub
 39   2   3  0/ 0 YICHIP JLab GO Keys
 40   2   6  0/ 0 PixArt USB Optical Mouse
*

Note the use of two mice to enhance testing. |
David Pitt (9872) 363 posts |
Unfortunately an abort has now occurred on the RPi4 with that module, at offset &2e64 in XHCIDriver. The Pi was initially accessible via VNC but stiffed on a |
mikko (3145) 123 posts |
I might be barking up the wrong tree but it’s worth observing that… On lines 3466 & 3603 of the RISCOS code there are calls made to xhci_abort_xfer(xfer, USBD_CANCELLED) only if BRANCH_NHUSB is defined. On line 3746 a call to xhci_abort_xfer(xfer, USBD_CANCELLED) isn’t made if BRANCH_NHUSB is defined and some other code is called instead. Recent commits have tried to sort out incorrect uses of #ifdef and #ifndef in this code. Looking at the original code, all three calls to xhci_abort_xfer(xfer, USBD_CANCELLED) are present. Should the “#ifndef BRANCH_NHUSB” on line 3745 therefore actually be “#ifdef BRANCH_NHUSB”? |
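The point about `#ifdef` versus `#ifndef` can be shown with a small self-contained sketch. It assumes that `BRANCH_NHUSB` is defined in the build in question (the thread does not say so explicitly), and `fake_abort_xfer` is a hypothetical stand-in for `xhci_abort_xfer(xfer, USBD_CANCELLED)` that only records whether it was called.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the driver call; it only records that it ran. */
static bool abort_called;

static void fake_abort_xfer(void)
{
    abort_called = true;
}

/* Assumption for this sketch: the build defines BRANCH_NHUSB. */
#define BRANCH_NHUSB

/* Guard as it stood around line 3745: the call is compiled OUT. */
bool close_with_ifndef(void)
{
    abort_called = false;
#ifndef BRANCH_NHUSB
    fake_abort_xfer();
#endif
    return abort_called;
}

/* Guard as proposed: the call is compiled IN, like the other two sites. */
bool close_with_ifdef(void)
{
    abort_called = false;
#ifdef BRANCH_NHUSB
    fake_abort_xfer();
#endif
    return abort_called;
}

/* Self-check: with BRANCH_NHUSB defined, only the #ifdef variant aborts. */
void check_guards(void)
{
    assert(!close_with_ifndef());
    assert(close_with_ifdef());
}
```

With the macro defined, the `#ifndef` variant never reaches the abort call while the `#ifdef` variant does, which is the inconsistency between the three call sites described above.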
David Pitt (9872) 363 posts |
A ZeroPain log was found after the Pi was restarted.

Time: Wed Oct 30 08:27:59 2024
Location: Offset 00002dc8 in module XHCIDriver
Current Wimp task: ShareFS
Last app to start: BASIC -quit "SCSI::SSD120.$.Progm.SoftTools.!DecHex.!RunImage"

R0  = 2001cc64 R1  = 00000003 R2  = 36146c00 R3  = 36146c74
R4  = 36146c00 R5  = 00000028 R6  = 2001cc64 R7  = 00000000
R8  = 00000003 R9  = 0000003c R10 = fa20021c R11 = fa207c90
R12 = fa207ca4 R13 = fa207c60 R14 = fc23302c R15 = fc231fe8
DFAR = 0000003c Mode SVC32 Flags nZCv If PSR = 60000193

fc231fa0 : e91b6820 : LDMDB R11,{R5,R11,R13,R14}
fc231fa4 : ea0005ae : B &FC233664
fc231fa8 : e1a0c00d : MOV R12,R13
fc231fac : e92d000f : STMDB R13!,{R0-R3}
fc231fb0 : e92ddbf0 : STMDB R13!,{R4-R9,R11,R12,R14,PC}
fc231fb4 : e24cb014 : SUB R11,R12,#&14 ; =20
fc231fb8 : e59b8014 : LDR R8,[R11,#20]
fc231fbc : e1a05001 : MOV R5,R1
fc231fc0 : e24dd00c : SUB R13,R13,#&0C ; =12
fc231fc4 : e3a01000 : MOV R1,#0
fc231fc8 : e3580000 : CMP R8,#0
fc231fcc : 9a000002 : BLS &FC231FDC
fc231fd0 : e2811001 : ADD R1,R1,#1
fc231fd4 : e1510008 : CMP R1,R8
fc231fd8 : 3afffffc : BCC &FC231FD0
fc231fdc : e2859014 : ADD R9,R5,#&14 ; =20
fc231fe0 * e8990240 * LDMIA R9,{R6,R9}
fc231fe4 : e51f1dd4 : LDR R1,&FC231218
fc231fe8 : e51a2218 : LDR R2,[R10,#-536]
fc231fec : e3a07000 : MOV R7,#0
fc231ff0 : e3580000 : CMP R8,#0
fc231ff4 : e0820001 : ADD R0,R2,R1
fc231ff8 : e2803004 : ADD R3,R0,#4
fc231ffc : e58d3008 : STR R3,[R13,#8] |
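The numbers in that log are at least self-consistent: the starred instruction is `LDMIA R9,{R6,R9}`, R9 was set just before by `ADD R9,R5,#&14`, and R5 holds &28, so the load address is &28 + &14 = &3C, which matches DFAR. Since R5 was copied from R1 near the function entry (`MOV R5,R1`), this looks like a near-null pointer being passed in, although that interpretation is a guess. A minimal sketch of the arithmetic, with hypothetical helper names:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Base address used by "ADD R9,R5,#imm" followed by "LDMIA R9,{...}". */
uint32_t ldmia_base(uint32_t r5, uint32_t imm)
{
    assert(imm <= 0xFFu); /* sketch only covers a plain 8-bit immediate */
    return r5 + imm;
}

/* Check the values copied from the log against each other. */
bool log_is_consistent(void)
{
    const uint32_t r5   = 0x00000028u; /* R5 from the register dump */
    const uint32_t imm  = 0x14u;       /* immediate in ADD R9,R5,#&14 */
    const uint32_t dfar = 0x0000003cu; /* DFAR from the log */
    return ldmia_base(r5, imm) == dfar;
}
```

This only confirms where the faulting address came from; it says nothing about why the caller supplied &28 in the first place.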
Sprow (202) 1158 posts |
For what it’s worth, I suspect that the proposed change isn’t a complete solution to all woes; it’s a much narrower fix for one specific abort observed. |
David Pitt (9872) 363 posts |
The abort fix merge request has now made it into the beta ROM, so KVMs should be safer on the RPi4 now.

*FX0
RISC OS 5.31 (07 Nov 2024)
*help XHCIDriver
==> Help on keyword XHCIDriver
Module is: XHCIDriver 0.32 (06 Nov 2024)
*

Thanks all. |