Pi hangs at startup?
Rick Murray (539) 13840 posts |
Firstly, I’m using a ROM I built myself from sources circa April (unpacking the files takes forever so I only do it every so often). This may have been fixed, though I didn’t see anything obvious in the CVS log.

Hardware is a 256 MiB Pi model B issue 2, CJE clock/power module, and an OLED attached to IIC. Power is good (dedicated 2A supply for Vonets wifi and the Pi).

The problem? Sometimes (rarely) at power up from cold, and sometimes when recovering from a brown-out (we get them frequently these days) the Pi will boot and it will hang after the “mod init done” message is on screen. There is no RISC OS welcome display (CPU type, memory, beep); so somewhere between module initialisation and that it is going awry. The friendly red button fixes things…

Has anybody else experienced this? Any thoughts? |
Jon Abbott (1421) 2651 posts |
I’m seeing exactly the same issue on the build dated 08/10/15. I can pretty reliably cause a crash at that point by playing with the TCPIP network settings prior to a reboot – it reports “Abort on data transfer at &FC0293F0” in this scenario, requiring the ROM to be replaced to get it working again. I’m also seeing it hang immediately after displaying the “It looks like you’re running RISCOS on a Pi” screen. |
Rob Heaton (274) 515 posts |
I occasionally have the same hang after the “mod init done” message on the ARMX6. It seems several other ARMX6 users also have this issue, as it’s been mentioned on the armini-support mailing list recently. So it doesn’t look as if it’s specific to a certain port. |
Rick Murray (539) 13840 posts |
Since I’m seeing “mod init done” but nothing afterwards, it is in here somewhere, perhaps? This is “NewReset”.
And, for completeness, this bit of code follows the “mod init done” message (in ModHand):
I may tweak my ROM to output more steps, so I can see exactly how far it gets. Of course, if I do that it’ll work fine, right? ;-) |
Jon Abbott (1421) 2651 posts |
EDIT: helps if you search the wiki after stripping off the X on an SWI! This SWI seems fairly innocuous. Is it worth making a debug build publicly available, so those of us seeing this issue can see more of what’s going on? |
Colin (478) 2433 posts |
I think ‘mod init done’ is the point at which callbacks are triggered. A number of modules create a callback when initialised, and the actual initialising happens after all of the modules are initialised. |
Jeffrey Lee (213) 6048 posts |
Yeah, it’s probably a callback which is hanging. Technically the callbacks should occur later on in the ROM init (there’s a “callbacks” message for it), except that OS_NewLine will trigger callbacks too. Not sure if there are any nasty side-effects of triggering the callbacks earlier than when they’re meant to happen. (There’s a special hack to prevent callbacks during the “init mod XXX” messages, because they were causing issues in that case) |
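To make the callback timing described above easier to picture, here is a toy model in Python (not kernel code, and not taken from the RISC OS sources). The class name, the `inhibited` flag and the trigger labels are all hypothetical; the module names are just placeholders borrowed from this thread. It only illustrates the idea that callbacks queued during module init run at the first point where they are allowed, which may be earlier than the intended “callbacks” step.

```python
class CallbackQueue:
    """Toy model of deferred callbacks; not RISC OS kernel code."""

    def __init__(self):
        self.pending = []       # callbacks registered by modules during init
        self.inhibited = False  # hypothetical "inhibit callbacks" flag
        self.log = []           # (trigger point, module name) for each callback run

    def add(self, name, fn):
        self.pending.append((name, fn))

    def trigger(self, where):
        """Called wherever the OS would normally let callbacks run."""
        if self.inhibited:
            return
        while self.pending:
            name, fn = self.pending.pop(0)
            self.log.append((where, name))
            fn()


def boot(queue, modules):
    queue.inhibited = True                      # models the hack during "init mod XXX"
    for name in modules:
        queue.add(name, lambda: None)           # each module defers its real work
        queue.trigger("init mod " + name)       # suppressed while inhibited
    queue.inhibited = False
    queue.trigger("mod init done newline")      # early trigger, as with a newline call
    queue.trigger("callbacks")                  # intended point; queue is already empty
    return queue.log
```

In this model every deferred callback runs at the “mod init done” newline rather than at the later “callbacks” step. Whether that early run is actually harmful in the real kernel is exactly the open question in this thread.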
Rick Murray (539) 13840 posts |
And one of my favourites – the “once in a blue moon” type of bug. Would it not be better to grab a bit in kernel workspace to mean “inhibit callbacks”; the init can set this, and the callback mechanism can check this bit. When appropriate, the bit can be cleared and callbacks allowed via the LeaveOS/EnterOS pair. Less gymnastics, more assurance that something that shouldn’t happen yet won’t until it is time… Still doesn’t explain why it isn’t regular. Interesting that it happens on other platforms. Is there a common link? Network, perhaps? For what it is worth, I added numerous extra debug lines to my startup. The Big Long Pause after “mod init done” is this:
This appears to:
I am guessing the speed hit is in the RMA claim, surely lots of mucking with memory pages to satisfy the RMA requirements – remember the RMA is always mapped in…

1 I note that Screen Memory is 0K – there is surely some place the screen is mapped in to memory; for if you read the screen base address (&F6600000 on the Pi) and start writing &FF to that location, white pixels appear… |
Colin (478) 2433 posts |
I found most of the time between ‘mod init done’ and the pi startup screen is the USB hub initialising and negotiation of ethernet line format in EtherUSB. An extra USB hub can add about 6 secs, if I remember correctly, but the enumeration of the first hub can be missed and picked up by a backup callback so can take a bit longer. |
Rick Murray (539) 13840 posts |
Mmm, very interesting idea. And with OS_NewLine via debug, it is possible the callback is appearing at this point instead of later when it should. Note to self (tomorrow?): unplug EtherUSB, reboot, see if the delay has gone away. |
Jon Abbott (1421) 2651 posts |
Between the USB initialisation and DHCP at least 15 secs are added to the boot time. Can’t these tasks be done in the background? DHCP negotiation should also not be attempted if there’s no cable plugged in. Does the Pi have cable sense? |
Chris Hall (132) 3554 posts |
One work around, of course, is to use a static IP address. |
Colin (478) 2433 posts |
Yes, I get about 6 secs USB, 6 secs EtherUSB and 3 secs DHCP.
USB, EtherUSB and DHCP are all dependent on each other, so it’s going to take 15 secs to get the network running even if you make all the changes necessary to do them in the background. The only way I can think to do it at the moment is to run them in a thread using the RTSupport module, but every init function in the thread would have to be changed to take RTSupport into account. I think in the end all that would be achieved is that you would reach the desktop with the machine unusable – rather like Windows, where it looks as though it has booted quickly but can be unusable until it has finished what it is doing. |
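As a rough illustration of the trade-off described above, the following Python sketch runs a chain of dependent stages on a worker thread while the foreground carries on. It is only a model: `start_network_chain` and the stage callables are hypothetical, and nothing here reflects how RTSupport or the real init functions work.

```python
import threading


def start_network_chain(stages):
    """Start dependent stages on a worker thread.

    stages is a list of (name, callable) pairs. Returns (thread, log, ready),
    where log records completed stage names and ready is set when all are done.
    """
    log = []
    ready = threading.Event()

    def worker():
        for name, stage in stages:  # strictly sequential: each needs the previous one
            stage()
            log.append(name)
        ready.set()

    thread = threading.Thread(target=worker, daemon=True)
    thread.start()
    return thread, log, ready
```

The chain still takes the sum of its stage times, so with the figures quoted above the network would still appear roughly 15 seconds after the chain starts. The only difference is that the caller gets control back immediately and can check `ready.is_set()` before relying on the network.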
Michael Emerton (483) 136 posts |
I find the hanging on boot very annoying as I do not always connect up to my network. My Pi spends half its time running around in my car, and having to turn off the network stack to remove that hang means I always have to reboot my Pi when I get back in… so running in the background would be very suitable for me. I think the major thing is that not everybody uses networking but, akin to other platforms, when they do plug it in, they would expect it to work?
This is a very important step… using this is more important than the above, as it would alleviate the hang in the first place (most DHCP servers are very responsive). In my view, RISC OS should sense when a cable is inserted and attempt DHCP. I have found my B Pi sometimes hangs on startup, but as I run it blind in my car when it happens, I can never see why, only that the card is read once and then stops. Another issue I have seen is where the Pi will boot RISC OS and then randomly not see the FileCore partition? |
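The “sense the cable, then attempt DHCP” behaviour asked for above could be modelled along these lines. This is a hedged sketch in Python, not RISC OS code: `DhcpOnLink`, `link_changed` and `request_lease` are hypothetical names, and it makes no claim about how link state is actually reported by any driver.

```python
class DhcpOnLink:
    """Toy model: request a lease only while the cable is plugged in."""

    def __init__(self, request_lease):
        # request_lease is a hypothetical callable returning an address or None
        self.request_lease = request_lease
        self.address = None

    def link_changed(self, link_up):
        """Call this whenever the link state changes; returns the current address."""
        if not link_up:
            self.address = None                  # cable pulled: drop the lease, never block
        elif self.address is None:
            self.address = self.request_lease()  # cable present and no lease yet
        return self.address
```

With no cable at boot, `link_changed(False)` returns at once and no DHCP request is made; plugging the cable in later triggers a single request.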
Steve Pampling (1551) 8170 posts |
Rephrase to include beagleboards and FAT formatted boot devices and you cover the situation. |
Jon Abbott (1421) 2651 posts |
The machine would be usable, you just wouldn’t have network access until USB is up and DHCP has done its thing; the pointer might not move for a second or so, although I suspect that by the time the boot has completed to the point you can actually click on anything, the USB stack would have initialised and the mouse be usable. Whilst we’re on the subject of DHCP:
|
Andrew Conroy (370) 740 posts |
EtherUSB does support this, yes, but in order to know how to use it you either have to try to ‘reverse engineer’ the Ethernet code, or sign an NDA from ROOL before you’re allowed to read the full specs for the ethernet SWIs. Of course, this also requires you to know that the ‘secret’ documentation exists in the first place so you know to ask for it! |
Andrew Conroy (370) 740 posts |
I have a little utility I’ve written here which does exactly this. It’s only ‘rough and ready’ and I’m sure the proper programmers on here would throw it away in disgust, but it works for me. Because I had to sign the NDA for the full Ethernet specs, I’m not allowed to distribute the code without each line being approved by ROOL first, anyway. |
Jeffrey Lee (213) 6048 posts |
That’s… madness. Add the Ethernet API to the list of things to rewrite when we update to an IPv6 network stack! |
Chris Hall (132) 3554 posts |
“unpacking the files takes forever” Now that we have PMP memory, you should be able to get it down from 10min nearer to 1min (using UnTarBZ) but that would require a model 2 Pi (as you need 1Gbyte of memory to get a 500Mbyte RAM disc). |
Sprow (202) 1158 posts |
“or sign an NDA from ROOL before you’re allowed to read the full specs for the ethernet SWIs.” Having signed the aforementioned NDA myself, there’s really nothing magic in the DCI4 spec. I believe the real problem is that it was co-authored by ANT, and as yet the success of getting permission to publish the ANT bits and bobs has been low (just because someone doesn’t respond to a request doesn’t mean they won’t sue you).
And that’s just fiction. James Peacock was able to publish EtherUSB just fine (originally on his own site), and I certainly read the spec when writing EtherB and EtherY. The fact that the sources can be published and not the spec highlights that it’s not the existence of the spec that’s the problem, but one of the terms in its reproduction is above ROOL’s lawyer’s pay grade. |
Rick Murray (539) 13840 posts |
Right – so what we need are some people who have not read the NDA spec to work out what the code is doing for the SWI calls, and to write it up. |
Andrew Conroy (370) 740 posts |
“Because I had to sign the NDA for the full Ethernet specs, I’m not allowed to distribute the code without each line being approved by ROOL first, anyway.” Well I specifically asked Steve how the NDA affected distribution of software which used the NDA’d SWIs and was told I should send a copy to ROOL for clearance first. If ROOL would like to withdraw that requirement, then that’s fine by me. |
Jon Abbott (1421) 2651 posts |
I had a couple of hangs today at the “Contacting DHCP Server over USB interface” message – a total lockup requiring a power cycle. I’ve also seen it sit there, time out and leave a non-working network several times. From the looks of it, it might be trying to talk to the NIC before it’s fully initialised, as I can repeat it without fail at power-on. A subsequent reboot gets the network working. Delaying the DHCP request to later in the boot process may resolve these issues, although I’ve not tested it. There’s also a random annoyance, which I’ve previously reported, where DHCP doesn’t set the DNS server from the DHCP offer. A second reboot usually fixes this, so from power on I usually require three reboots in total before the network layer is working as expected. |
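The untested suggestion above, delaying the DHCP request until the NIC is ready, might look something like this in outline. It is a Python sketch under stated assumptions: `nic_ready`, `request_lease` and `wait` are hypothetical callables supplied by the caller, and the poll limit is arbitrary. The point is only that the wait is bounded, so a NIC that never comes up costs a failed network rather than a locked machine.

```python
def dhcp_when_ready(nic_ready, request_lease, max_polls=10, wait=lambda: None):
    """Poll nic_ready() up to max_polls times, then make one DHCP request.

    Returns whatever request_lease() returns, or None if the NIC never
    reported ready. All three callables are hypothetical stand-ins.
    """
    for _ in range(max_polls):
        if nic_ready():
            return request_lease()
        wait()       # e.g. sleep or yield between polls
    return None      # give up cleanly so the boot can continue without a network
```

A caller would pass a real delay as `wait` and could retry later (for example when the link comes up) if `None` is returned.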
Rick Murray (539) 13840 posts |
Probably not helpful – but I gave up on DHCP and set the Pi to static IP. Thanks to random (and depressingly frequent) brownouts (due to too many milking machines and such running off one ancient overworked transformer), the hardware gets a kick in the backside from time to time. |