ARMX6 boot/crash problem
Alan Adams (2486) 1149 posts |
Intermittently my ARMX6 fails to boot. It stops at the end of mod_init – photo here:
About half the time it happens when starting from cold. The other half of the time, !Hermes freezes partway through a fetch, alt-break does nothing, control-break forces a reboot, and the boot failure often follows. This would seem to point to the SSD, but I tried replacing that with a SATA hard disc, and got the same thing.
The machine often recovers after several attempts. Occasionally it needs to be powered off for some time, and sometimes it won’t boot until I’ve taken the lid off, wriggled a few things, and put the lid back. On one or two occasions, after being left at the screen shown above, it eventually continued after half an hour or so. At the moment I’ve tried all of those except fiddling with the hardware, without success.
I have Harinezumi installed. When this happens, Harinezumi hasn’t started, i.e. there’s no error log stored, and no screen output. I’ve been discussing this with Andrew, but no solution has presented itself yet. Has anyone seen anything like this, on any type of machine? Does anyone know what is supposed to happen at this stage of the boot process?
Update: still not booting. I’ve now monitored the 5V and 12V supplies during startup. Both are slightly high and stable. I’ve tried with each of the USB hubs powered down in turn. No change. I’ve tried with the keyboard/mouse connector in each of the hubs in turn. No change. I’ve tried with the SD card in and out. No change. I’ve been backing up to the SD card, so at the moment it’s plugged into Win32_Disk_Imager being saved – just in case. It’s going to take an hour. |
Bryan (8467) 468 posts |
A variety of issues like that would lead me to check the power supply. Alternatively, if wriggling a few things helps, then it may be time to buy a Raspberry Pi (not the 400). Nothing wriggles. |
Alan Adams (2486) 1149 posts |
I’ve already got 9 rPis doing various things. As you’ll see from the update, I can’t see a power supply problem. I tested 5 volts as 5.18 and 12 volts as 12.1. Retesting now, however, 12 volts is still 12.1 but 5 volts is 4.8. It might be time to try my bench power supplies – 2 amp max, but that might be enough for this. However, as I don’t have any SATA power plugs, I’d have to power it into the built-in PSU, and that may not be a good idea. UPDATE: Now found out what that was – on one multimeter it said 4.8 on the 200V range, and 5.1 on the 20V range. A different multimeter says 5.1 on both ranges, so I’ll believe that one. |
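The readings quoted above can be sanity-checked against a tolerance band. The sketch below assumes a ±5% limit on both rails, which is typical for ATX-style supplies; the actual specification of the ARMX6's PSU is not given in this thread, and the function name is made up for illustration.

```python
def within_tolerance(measured, nominal, tolerance=0.05):
    """Return True if a measured voltage is within +/- tolerance of nominal.

    The 5% default is an assumption (typical ATX-style limit), not a
    figure taken from the ARMX6 documentation.
    """
    return abs(measured - nominal) <= nominal * tolerance


# Readings quoted in the thread: (rail nominal, measured value)
readings = [(5.0, 5.18), (12.0, 12.1), (5.0, 4.8), (5.0, 5.1)]

for nominal, measured in readings:
    verdict = "ok" if within_tolerance(measured, nominal) else "out of range"
    print(f"{nominal:>4} V rail measured {measured} V: {verdict}")
```

Under that assumption every reading passes, including the 4.8 V that turned out to be a multimeter range artefact, so the figures alone neither confirm nor rule out a supply problem such as a brief dip under load.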
Steve Drain (222) 1620 posts |
Identical, on a mini.m (same processor) from time to time over quite a long period. I put it down to temperature problems, but that might just be a comforting story. ;-) |
Alan Adams (2486) 1149 posts |
Progress of a sort. I swapped in the SATA hard disc I had used before, and the system immediately booted. It’s looking as though the crash while Hermes was fetching mail may have corrupted the SSD, or the SSD failed causing the crash. I’m going to try using the DVD connection on the SSD to see whether DiskKnight can do anything – or even whether it can be seen. |
Alan Adams (2486) 1149 posts |
So I then left it running for an hour. When I came back, Hermes was once again stuck partway through a fetch. Control-break, attempt to reboot. Black screen. The Hermes connection would seem to be because it’s the most disc-intensive thing I do on this machine. It’s starting to look as though SSDs aren’t as reliable as physical discs (based on a sample of 1). More importantly, do I replace the SSD with another or with a disc? I’ll need to give that some thought. I can get a 2TB disc for less money than the 256GB SSD. With that much space, I can do full backups between the ARMX6 and the laptop. I was looking at NAS for that. |
Alan Adams (2486) 1149 posts |
It occurs on 5.24 and 5.27. |
Alan Adams (2486) 1149 posts |
I had thought that the wear-levelling in SSDs was handled by the drive’s firmware. Recently, however, I seem to remember seeing a comment that this needs operating system support, which RISC OS doesn’t have. Can anyone confirm or plausibly deny this please? This SSD has been relatively lightly used for 3 to 4 years. If it’s worn out in that time, which seems to be the case, it suggests they are not the most appropriate choice for a RISC OS desktop machine. |
Dave Higton (1515) 3526 posts |
TTBOMK, no-one will guarantee a spinning rust drive for more than 3 years either. We just have to accept that there is no mass storage mechanism that lasts for longer, even though the wear-out or failure mechanisms are entirely different. Regardless of the possibility of making a warranty claim, drives rapidly become less reliable after their expected lifetime. I try to keep to a policy of replacing each of my drives after about 3 years, be they SSD or rotating. And, of course: during the life of the drive, make sufficiently frequent backups. |
Stuart Swales (1481) 351 posts |
As Dave says, don’t hang on to one drive too long! |
Alan Adams (2486) 1149 posts |
What’s worrying is that I started getting occasional boot failures like this when the drive was only one year old. I’m going back to the tried and tested technology – if spinning drives can stand up to use in Linux or Windows servers where they get hammered to death, they should last a good long time on RISC OS, which has indeed been my experience of them. At least replacing discs is a good deal easier on RISC OS. |
Alan Adams (2486) 1149 posts |
So I now have two 500GB discs on order. I’ll fit both and back one up to the other. That way I should always have a spare boot device by just swapping cables. I don’t use the DVD so I will use the second cable from that. And it still costs less than replacing the SSD. |
Raik (463) 2061 posts |
As Dave said, nobody will give a guarantee, but the only problem I have had with an SSD on the MX6 was the SATA cable. The first one I used was a standard one, just plugged in with no locking clip. And yes, I have some backup drives ;-) |
Rick Murray (539) 13840 posts |
You’re probably thinking of the discussion regarding TRIM. If the drive knows which blocks are not in use by the filesystem, it can perform an enhanced version of wear levelling by shuffling around blocks used with blocks unused. If the filesystem doesn’t mark which blocks are unused, then the device must assume they’re in use once they’ve been written to, even if the drive is actually ‘empty’… it doesn’t know that.
Faulty drive or slight incompatibility that causes spurious problems? I say this because my media is an 8GiB SanDisk SL08G µSD card from 2016. It was used in my Pi (via adaptor) and now is used in my Pi2. It is periodically cloned so I have images of it on spinning rust. Still going strong (uh-oh, have I just jinxed it?). ANY media that starts crapping up after just a year is surely faulty. I used to get more than a year out of flash devices in my PVR (recording live video from TV); and the SD card in my dashcam is still going despite receiving an absolute clobbering (it spits out MJPEG in three-minute chunks, each of which is over half a gigabyte in size; rotational recording is only able to keep about the most recent 24 (or so) minutes). I’ve driven a little over 10,000km in a car that does ~48kph flat out. Just try to imagine how much data has been thrown at that poor SD card!
In my experience, what kills spinning rust is not how much the mechanism gets clobbered, but how often the device is power cycled (and hence warms up, cools down, and of course the head landing and spin-up). |
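The dashcam figures in the post above can be turned into a rough lower bound on the data written. This is only a sketch using the numbers quoted there, and it assumes the camera records only while the car is moving, that every chunk is exactly 0.5 GB (the post says "over"), and that the whole distance was covered at top speed, which minimises the hours. The function name is made up for illustration.

```python
def dashcam_write_estimate(distance_km, top_speed_kph,
                           chunk_minutes, chunk_gb, retained_minutes):
    """Lower-bound estimate of dashcam writes from the figures in the post.

    Returns (hours recorded, total GB written, passes over the retained loop).
    Assumes recording only while driving, at top speed the whole way.
    """
    hours = distance_km / top_speed_kph           # minimum driving time
    chunks = hours * 60 / chunk_minutes           # number of video files written
    total_gb = chunks * chunk_gb                  # minimum data written
    loop_passes = hours * 60 / retained_minutes   # times the loop was overwritten
    return hours, total_gb, loop_passes


hours, total_gb, passes = dashcam_write_estimate(
    distance_km=10_000, top_speed_kph=48,
    chunk_minutes=3, chunk_gb=0.5, retained_minutes=24)

print(f"at least {hours:.0f} hours recorded")
print(f"at least {total_gb / 1000:.1f} TB written")
print(f"roughly {passes:.0f} overwrites of the retained loop")
```

On those assumptions the card has taken at least about 208 hours of video, a little over 2 TB of writes, and around 520 passes over the same recording area. Real figures will be higher, since nobody drives flat out all the time.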
Alan Adams (2486) 1149 posts |
The ones in the machine have metal clips. All the spares I have accumulated don’t have them. Changing the cable for the SSD is a pain as it’s under the CPU board. I don’t recall the exact detail of the one-year problem. I don’t think it showed the mod_init screen, but that might be because of a different rom build rather than a different fault. I do remember it occurred during the lunch break while running a 2-day competition, and it took an hour of fiddling before it decided to boot up. As a result of this, I’m using an rPi for the main server as it’s more reliable. The client network is all rPi too – up to 9 of them. |