Trouble with a Raspberry Pi 4
Matthew Phillips (473) 719 posts |
I’m trying to fix, remotely (by telephone), faults with the Raspberry Pi 4 which is my parents’ main RISC OS computer. There seem to be multiple problems that have arisen at the same time. This could be coincidence, or perhaps there is a common cause I have not spotted yet.
The machine is a RISCOSbits PiHard, with an SSD inside the case, using a USB-SATA adapter. The machine runs SafeStore and is scheduled to do a full backup once a month. They back up onto another SSD attached via another USB-SATA adapter.
It was when this monthly backup was underway that the problems first arose. I think this was the first backup since they replaced the power supply; the old one had a dodgy plug and they had been getting the under-voltage indicator on the monitor. It is possible that the new power supply (the official RPi one) does not deliver enough power for both USB-SATA adapters and drives to be powered, though we never had trouble with the old supply in this respect. I was away at the time, so Hilary dealt with it, and I do not have details of what actually first went wrong.
Once I became involved, the machine was booting OK, but soon after getting to the desktop, SafeStore would pop its window up as the monthly backup was not finished, and then the machine would hang. Alt-break would get out of SafeStore on the second attempt, but the SafeStore desktop application soon reappeared as the module was still active. By using F12 we managed to disable SafeStore by renaming its !Boot and !Run files.
I then thought it would be good to check the disc. We tried *verify which was fine, but *checkmap showed “Map inconsistent with directory tree”. They had the checking-only DiscKnight handy. The only fault that reports is “1 objects in map not referenced by directories”. It sounds from that report that somehow a file was being written and is allocated in the map but has no directory entry, so we cannot read it.
I don’t know how FileCore works in detail, but I am guessing that this fault would not stop the disc from being used fairly normally.
I thought it would be a good idea for them to email me the SafeStore logs to investigate whether anything was reported there. I got them to zip them up, but when Hermes started sending the email, there was an error about not being able to validate the SSL certificate (for the SMTP server, I imagine). I asked them to check Internet.files.CertData. The contents gave a date in 2021 which is perhaps a bit old. Funny that this should kick in at the same time as the other problems though.
As we at Sine Nomine Software distribute the CertData file in a set of modules and boot resources, I suggested they use NetSurf to fetch it from http://sinenomine.co.uk/software/resources.zip (using http to avoid the certificate errors, though NetSurf may well use a different file, I suppose). We then had the third weird thing. NetSurf would not download the file and just sat there saying “Loading”. This also happened if we went to http://sinenomine.co.uk/
What could have caused the disc error, the certificate issue for Hermes and also stopped NetSurf from working properly? It all seems very coincidental.
I did get them to ping 8.8.8.8 to check the network and that was fine. TCP/IP certainly seems to be working. I tried FTPc also, hoping they could sign into my web provider’s FTP server to transfer the log files to me that way, but although it connected it would not authenticate. It’s possible the FTP server is locked down to our home IP address but I could not find the setting, if it is! I’ve not got them to try other websites yet. That would be a useful step, I suppose. If we cannot download anything or access email, it might be rather hard to fix anything!
Suggestions for what to try next very welcome! They are a 2.5 hour journey away so telephone seems to be the best option at the moment.
They have a tablet and a Chromebook in the house so email is accessible like that, but transferring to/from RISC OS is not easy. |
Steve Pampling (1551) 8154 posts |
It might be that their system isn’t recognising the certificate on your FTP server (you do use certificates, I assume) |
Chris Hughes (2123) 336 posts |
I have a number of suggestions regarding the issues with your parents’ Pi.
1) Get the full version of DiscKnight and do a repair; this might sort the initial issue out.
2) SafeStore might have a damaged tree log, so you might need to use one of two options within SafeStore’s jobs window: menu over the window and you will hopefully see “rebuild tree” and “refresh”. These will do a full rebuild of the backup. But fix the drive FIRST.
2a) Adjust SafeStore’s settings to allow it to wait, say, at least 40 secs after bootup before it starts its backup, to ensure the startup has fully completed.
3) Speak to Andy at RISCOSbits, as he will be able to support your parents as well.
4) The broken file might well be in the Internet stack, which would cause Internet issues. Which Internet stack are they using? Maybe a reinstall of the stack will fix that issue.
Hope the above is some help. |
Paul Sprangers (346) 523 posts |
My experience with SafeStore is that this doesn’t work. What does work is manually deleting the file Tree in directory |
Chris Hughes (2123) 336 posts |
Paul, which version of SafeStore are you using? It does work! I have used it a couple of times in the past. I am using version v2.12.01 of SafeStore currently. |
Paul Sprangers (346) 523 posts |
That’s the version that I’m using. It may have worked sometimes, but in the majority of cases the error keeps popping up. Deleting the Tree file is the ultimate workaround – at least for me. |
David J. Ruck (33) 1629 posts |
It’s probably something that was being written when the machine hung. DiscKnight can fix that, and after rebooting the file will be in $.Lost+Found. Examine it in an editor to see if it is anything recognisable, but it’s probably junk. That on its own should not cause the backup to hang, as it could do if it encountered a broken directory. |
Dave Higton (1515) 3497 posts |
It sounds very much like power supply issues. I don’t know the schematic of the RPi4, but the RPi3’s power supply to the USB devices goes through a single current limiter, which provides enough power for one external SSD but probably not two. It used to be possible to buy powered USB hubs, but they seem to have disappeared since developments in USB C power arrangements made them unnecessary. I wish I could offer a solution, not merely suggest a diagnosis of the problem. |
Rick Murray (539) 13806 posts |
Yup, it’s worth remembering that a Pi isn’t like a regular computer with a beefy PSU.
You can still get them from the tat bazaar. https://www.amazon.co.uk/dp/B08ZKSK6MB If you want to hang power hungry things off a Pi, it’s probably best to power them separately from the Pi itself. |
Matthew Phillips (473) 719 posts |
Thank you for the suggestions about the power supply issue. It’s annoying, because the old power supply seems to have been capable of supplying enough juice for two SSDs. The problem was that it wasn’t secure in the socket and when it wiggled a little out they got the under-voltage indicator. The new power supply plugs in securely, but seems not to provide so much power.
Steve was closest on the cause of the network issues. I rang again this evening and we worked out that the clock was set to 11 June 2224, so 200 years out. I’m not sure what could cause that. The clock had been set to pick up the time automatically from an NTP server. We had to change it to “set manually”. Using the nudge buttons to adjust by 200 years was tedious. The old Alarm allowed you to type.
Once the date was correct, NetSurf and Hermes were happy again, and I now have a zip file of SafeStore logs to inspect. We have updated the CertData in Internet.files, though it doesn’t look like the 2021 file was causing any issues.
After correcting the clock manually we did change back to automatic, but clicking the Try button resulted in “Not connected”. The option to “Pick a server automatically” is ticked. I have not tried manually defining which server to use. I am not sure how the automatic picking works. So for the moment we have left the clock set manually. I am unsure whether my parents have the RTC option available with the PiHard. We did try turning the machine fully off and on again to check the clock stayed on 2024, and it was OK.
Now that they have a working network connection again we could try buying DiscKnight in order to repair the disc. Not tonight though: I’ve had enough of IT support for this week. |
Dave Higton (1515) 3497 posts |
You can change the output connector. They are readily available and inexpensive. The bits to take care of are: get a connector of the correct dimensions; and make very very sure that you connect it up with the correct polarity. This isn’t difficult, and the more times you check the result before you plug it in, the better you will feel! It’s also worth remembering that you should never touch low power electrical contacts, such as the ends of batteries and the barrels of barrel connectors. The sweat from your fingers will eventually make the connection less reliable. (Not a problem with mains connectors; there’s enough voltage available to penetrate through the layer of dirt.) |
Rick Murray (539) 13806 posts |
Ouch! For the future: you can use *Set with the Sys variables, like:
|
David Pilling (8394) 96 posts |
I take pleasure from using my (official) Pi 5 power supply on the Pi 4 – a lot greater capacity. With older Pi’s I have used powered USB hubs. Looking forward to the Pi 6 power supply. |
Steve Pampling (1551) 8154 posts |
Bit of a rush job comment, or I would have explained that SSL (or more accurately these days, TLS) doesn’t work properly if the client and server time differences exceed certain values. It’s there to stop replay attacks. |
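To illustrate one way a wrong clock breaks TLS: a client normally compares its own idea of the current time with the validity period (not-before and not-after dates) carried in the server’s certificate. The following is only a minimal Python sketch of that comparison. The function name and the certificate dates are invented for the example, and it is not a description of how Hermes, NetSurf or any RISC OS TLS library is actually implemented.

```python
from datetime import datetime, timezone


def certificate_time_status(not_before, not_after, client_now):
    """Classify a certificate's validity as seen from the client's clock."""
    if client_now < not_before:
        return "not yet valid"
    if client_now > not_after:
        return "expired"
    return "valid"


# Hypothetical validity window for a server certificate (made-up dates).
not_before = datetime(2024, 5, 1, tzinfo=timezone.utc)
not_after = datetime(2024, 8, 1, tzinfo=timezone.utc)

# The same certificate checked with a correct clock and with one 200 years out.
correct_clock = datetime(2024, 6, 11, tzinfo=timezone.utc)
wrong_clock = datetime(2224, 6, 11, tzinfo=timezone.utc)

print(certificate_time_status(not_before, not_after, correct_clock))
print(certificate_time_status(not_before, not_after, wrong_clock))
```

With the clock set to 2224, every certificate issued in the 2020s looks long expired to the client, whichever CertData file is installed.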
Matthew Phillips (473) 719 posts |
What does RISC OS actually do when configured to “Pick a server automatically” for the time? What does it try? |
Alan Adams (2486) 1147 posts |
Good question. What it seems to do here is say “busy”. I have an ARMX6 and an rPi on the same network. Both set to get time from the network. The ARMX6 succeeds always, the rPi frequently doesn’t. All other network stuff works on both. |
Chris Gransden (337) 1202 posts |
You should see the value pool.ntp.org greyed out. *nettime_status confirms the ‘Last Server’ used. https://www.ntppool.org/en/use.html shows what goes on behind the scenes. |
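For anyone wanting to check from another machine on the same network whether pool.ntp.org answers at all, a simple SNTP exchange can be sketched in a few lines of Python. This is a rough sketch under stated assumptions, not NetTime’s actual code: the function names are invented here, and which protocol version or fallback logic NetTime uses is not something this thread establishes.

```python
import socket
import struct

# Seconds between the NTP epoch (1900-01-01) and the Unix epoch (1970-01-01).
NTP_UNIX_OFFSET = 2208988800


def build_sntp_request():
    """Build a 48-byte client request: LI = 0, version = 3, mode = 3 (client)."""
    return b"\x1b" + 47 * b"\x00"


def parse_transmit_time(packet):
    """Return the server's transmit timestamp as Unix seconds (float)."""
    if len(packet) < 48:
        raise ValueError("short NTP packet")
    # The transmit timestamp occupies bytes 40-47: 32-bit seconds, 32-bit fraction.
    seconds, fraction = struct.unpack("!II", packet[40:48])
    return seconds - NTP_UNIX_OFFSET + fraction / 2**32


def query_time(server="pool.ntp.org", timeout=5.0):
    """Send one request over UDP port 123 and return the server time (Unix seconds)."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        sock.sendto(build_sntp_request(), (server, 123))
        reply, _ = sock.recvfrom(512)
    return parse_transmit_time(reply)
```

Calling `query_time()` needs a working network connection and raises a timeout error if no reply arrives, which is roughly the situation a “Not connected” or “busy” report suggests.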
Matthew Phillips (473) 719 posts |
So you’re saying that is what it’s using? I did wonder, but I also wondered whether something like DHCP might be able to tell the machine which NTP server to use, e.g. one provided through the ISP, rather like happens for DNS servers. |
James Pankhurst (8374) 126 posts |
DHCP can provide that, but finding a home (read ISP provided) router that lets you configure it is another matter |
Clive Semmens (2335) 3276 posts |
Really? I assumed they all did – we’ve had two from Virgin and now one from EE, and they all have. Thanks for the heads-up in case we need to change again! |
James Pankhurst (8374) 126 posts |
Guess it depends on the ISP and how much they customise or lock down the router, or not. |
Steve Pampling (1551) 8154 posts |
Yup. DHCP option 042, which the client machines largely ignore. Windows likes to use the local domain controllers when the machine is a domain member, or a selection of Microsoft-preferred servers if not. All can be beaten into a sort of submission with NAT on a firewall (fooling them into talking to the locally provided NTP servers). |
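For reference, DHCP options travel as a sequence of code/length/value fields, and option 42 carries a list of IPv4 addresses of NTP servers. The Python sketch below pulls those addresses out of a raw options field. The function name and the sample bytes are invented for illustration, and nothing here is taken from the RISC OS DHCP module or from NetTime.

```python
import ipaddress


def ntp_servers_from_dhcp_options(options):
    """Return the NTP server addresses from option 42 in a raw DHCP options field."""
    servers = []
    i = 0
    while i < len(options):
        code = options[i]
        if code == 255:  # End option: stop parsing
            break
        if code == 0:  # Pad option: a single byte with no length field
            i += 1
            continue
        if i + 1 >= len(options):
            break  # truncated field, nothing more to read
        length = options[i + 1]
        value = options[i + 2 : i + 2 + length]
        if code == 42:  # NTP servers: a list of 4-byte IPv4 addresses
            usable = len(value) - len(value) % 4
            for j in range(0, usable, 4):
                servers.append(str(ipaddress.IPv4Address(value[j : j + 4])))
        i += 2 + length
    return servers


# Made-up example: option 6 (DNS) with one address, option 42 with two, then End.
sample = bytes([6, 4, 192, 168, 1, 1,
                42, 8, 192, 168, 1, 1, 192, 168, 1, 2,
                255])
print(ntp_servers_from_dhcp_options(sample))
```

A router that does not send option 42 simply leaves the list empty, and the client then has to fall back on whatever server it has configured.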
Steve Fryatt (216) 2103 posts |
I’m not that familiar with NetTime’s inner workings, but when I glanced at the source the other day, there was mention of a collection of system variables being read in sequence, some set by DHCP and some (presumably) by configure. I didn’t spend long enough to work out whether it used DHCP with a fallback (of pool.ntp.org?) unless the user explicitly set some other server. It’s probably documented somewhere… |
Frederick Bambrough (1372) 837 posts |
This thread prompted me to investigate why ‘Pick a server automatically’ was permanently ticked and greyed out to be un-selectable. Seems that option is only available if the network settings are done manually rather than with DHCP. In which case I now find that the time is not fetched on boot. The ‘Try’ button works so presumably the time will be fetched after the delay between checks. Is this a bug or have I missed something? |