Pi 3 shutting down-ish
Jeffrey Lee (213) 6048 posts |
RISC OS doesn’t broadcast gamma – but (on the Pi) the gamma hardware is still enabled, so BCMVideo programs a linear gamma ramp during the mode change.
Yeah, the GPU does do odd things. BCMVideo will crop the displayed area, or hide the pointer completely, depending on how much is on-screen. So it’s some kind of bug in that logic which is causing the phantom pointers to appear. |
Jon Abbott (1421) 2651 posts |
Seems to be fixed in the current nightly build. Not sure why I was still seeing them with your last test build.
I’ve tried with different mice/keyboards and no software loaded, which made no difference. Haven’t had a chance to test different monitors yet though. I can however eliminate the issue by lowering the GPU speed in CONFIG/TXT:
It starts becoming more of an issue once core_freq goes above 200 (I believe the default is 400 on a Pi3) |
Jon Abbott (1421) 2651 posts |
Further testing suggests that lowering the minimum frequencies also fixes it, with no other CPU speed settings in CONFIG/TXT:
These do result in a rather slow boot time though, I suspect because the CPU isn’t at full speed until RISCOS has done its init sequence. Makes you wonder how much quicker the boot sequence would be if it had a two-step loader, so it’s not loading/expanding a multi-MB OS image file until the CPU is at full speed. Interestingly, these settings also lead to a blank screen for a fairly high percentage of boots. RISCOS is loading to the desktop, but the screen never initialises. Not sure if it’s another issue or the one we’re investigating. |
David Feugey (2125) 2709 posts |
An issue? I can remember that it’s an issue after 85° for the Pi3. |
Jon Abbott (1421) 2651 posts |
Processor temperature doesn’t appear to be a factor in the blanking issue, but processor speed definitely does. With the min frequencies lowered, it didn’t blank once yesterday – which has me totally stumped as most of the time the processor was reading 1200MHz via CPUClock. I suspect however that the CPU is fluctuating rather quickly between its high and low speeds either outside of control by RISCOS (the blob / CPU throttling) or by WFE / WFI instructions in the OS. |
Jeffrey Lee (213) 6048 posts |
I’ve noticed an issue though, a rogue pointer is left on screen every time you send a GraphicsV 5 with Y set to the screen height – you can end up with dozens of them.
OK, I finally put in the effort to debug this. The problem was that the code was sending updates to the GPU faster than the GPU was processing them (it looks like dispmanx likes to wait for the end of the frame to apply some types of change). Eventually the GPU would start returning errors, but BCMVideo wasn’t checking for them, resulting in problems like BCMVideo thinking the pointer had been hidden (dispmanx element removed/destroyed) when in reality it was still visible – resulting in a second pointer appearing when the pointer moves back on screen (new dispmanx element added).
This ROM has the code changed so that it’ll detect the errors and avoid getting out of sync, and seems to work for my test code: http://www.phlamethrower.co.uk/misc2/bcm2835dev.zip
However, in theory this bug will have only been introduced in July last year (after the shutting down-ish bug was first spotted), when I added GraphicsV overlay support. The hardware pointer code used to use EDispmanUpdateSubmitSync (which would block until the GPU applied the changes), but that was causing some overlay calls to block until the end of the frame, so I changed everything to use the non-blocking EDispmanUpdateSubmit instead. |
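The out-of-sync failure described in the post above can be illustrated with a small simulation. This is only a sketch under loud assumptions: `FakeGpu`, `PointerDriver`, `queue_depth` and `run` are hypothetical names invented for illustration, the queue limit of 2 is arbitrary, and none of this is the real dispmanx or BCMVideo code. It models a GPU that applies queued changes at the end of the frame and rejects submissions once its queue is full, then compares a driver that ignores the error with one that checks it.

```python
class FakeGpu:
    """Hypothetical stand-in for the GPU update queue (not the dispmanx API)."""

    def __init__(self, queue_depth):
        self.queue_depth = queue_depth
        self.pending = []
        self.pointer_visible = False

    def submit(self, show):
        """Non-blocking submit: returns False if the queue is already full."""
        if len(self.pending) >= self.queue_depth:
            return False  # update dropped, caller is expected to notice
        self.pending.append(show)
        return True

    def end_of_frame(self):
        """Apply every queued change in order, then empty the queue."""
        for show in self.pending:
            self.pointer_visible = show
        self.pending.clear()


class PointerDriver:
    """Hypothetical driver that tracks what it believes the GPU is showing."""

    def __init__(self, gpu, check_errors):
        self.gpu = gpu
        self.check_errors = check_errors
        self.believed_visible = False

    def set_visible(self, show):
        ok = self.gpu.submit(show)
        # Only record the new state if the GPU accepted it, unless error
        # checking is switched off (which models the buggy behaviour).
        if ok or not self.check_errors:
            self.believed_visible = show
        return ok


def run(check_errors):
    """Send three updates in one frame; the third overflows the queue."""
    gpu = FakeGpu(queue_depth=2)
    driver = PointerDriver(gpu, check_errors)
    driver.set_visible(True)   # accepted
    driver.set_visible(True)   # accepted (pointer moved)
    driver.set_visible(False)  # rejected: queue is full
    gpu.end_of_frame()
    return driver.believed_visible, gpu.pointer_visible


print(run(check_errors=False))  # (False, True): driver and GPU disagree
print(run(check_errors=True))   # (True, True): driver stays in sync
```

With error checking off, the driver believes the pointer is hidden while the GPU is still showing it, which is the state that would lead to a second pointer being added later. With checking on, the driver knows the hide did not take effect and could retry it on the next frame.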
Jon Abbott (1421) 2651 posts |
I’ll test it later; if it doesn’t blank within 30 seconds, it’s had an effect. |
Jon Abbott (1421) 2651 posts |
Your test build blanked within a few minutes of use, but I can confirm it fixes the ghost mouse pointers. |
Andrew McCarthy (3688) 605 posts |
The test build doesn’t blank with firmware dated 18-Oct-2018 which I took from the NOOBS lite build that deploys with RISC OS 5.26. Tested with mode changes enabled and disabled. It blanks with firmware dated 23-Jan-2019. |
Jeffrey Lee (213) 6048 posts |
Probably because the test build (and recent ROOL nightly builds) uses the mailbox interface for setting gamma, which is a recent addition to the firmware (some time around 9th Jan) |
Jon Abbott (1421) 2651 posts |
As you can reproduce the issue, see if adding the following to CONFIG.TXT fixes it:
|
Andrew McCarthy (3688) 605 posts |
It didn’t fix it. Rather than the screen just going blank, on start-up the desktop is briefly displayed and then the picture changes, leaving me with a desktop with a strange effect applied to it. |
Jon Abbott (1421) 2651 posts |
So it still blanks randomly at the desktop? Or did you never manage to get to the desktop?
GPU startup is flaky as hell, just keep power cycling and it should eventually work. If not, try increasing arm_freq_min / gpu_freq_min – the breakpoint at which it starts randomly blanking at the desktop is 200 for me. I’ve been using my pi-top for a week now, which is over 100 hours, and it’s not blanked once with those settings. Startup is an issue, but I prefer it to blank during power up rather than randomly at the desktop! As an added side effect, my battery life has gone through the roof. I left it on for 12 hours last Saturday and it reckoned it still had 4 hours remaining! |
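For anyone comparing settings, a small script can report which minimum-frequency entries in a config file sit above a chosen limit. This is a rough sketch, not a tool anyone in the thread used: `parse_config`, `min_freqs_above` and `SAMPLE` are made-up names, the 200 limit is simply the figure reported above for one machine, and the values in `SAMPLE` are invented for illustration and are not the settings posted in this thread.

```python
def parse_config(text):
    """Return {key: value} for simple key=value lines, skipping comments."""
    settings = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue  # blank lines, comments and section filters like [pi3]
        key, value = line.split("=", 1)
        settings[key.strip()] = value.strip()
    return settings


def min_freqs_above(text, limit_mhz=200,
                    keys=("arm_freq_min", "gpu_freq_min")):
    """Return the listed keys whose integer value exceeds limit_mhz."""
    settings = parse_config(text)
    flagged = {}
    for key in keys:
        value = settings.get(key)
        if value is not None and value.isdigit() and int(value) > limit_mhz:
            flagged[key] = int(value)
    return flagged


# Hypothetical example values, chosen only to exercise the check.
SAMPLE = """\
# illustrative settings only
arm_freq_min=600
gpu_freq_min=150
"""

print(min_freqs_above(SAMPLE))  # {'arm_freq_min': 600}
```

Whether 200 is the right limit on any other board is an open question; the thread only reports it as the point where blanking started on one Pi 3.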
David Pitt (3386) 1248 posts |
Ever since the first mailbox gamma test ROM nearly three weeks ago my RPi3B+ has been stable, until today and the shutdown-ish fault bit again! The circumstance of the fault was exactly the same as the first time I saw it. The Pi was in use via VNC, a StrongED TaskWindow was commanded from the Task Manager icon and the Pi froze. A look at its HDMI showed only blackness.
*fx0
RISC OS 5.27 (31 Jan 2019)
bootcode.bin 09Jan19
fixup.dat 22Jan19
start.elf 22Jan19 |
Jon Abbott (1421) 2651 posts |
Switching gamma to a mailbox did not affect the issue, your increased stability is probably as a result of the reduction in gamma changes due to the flashing palette bug fix.
I think I’ve already raised that StrongED triggers the issue and is what triggers it for me in day to day use. It’s nothing to do with StrongED though, it’s simply down to the pointer changing as you move around the StrongED frame…which brings us back to the hardware pointer being the trigger for the issue. The recent change that checks if the GPU is overloaded doesn’t affect the issue. Lowering the low GPU speed does however affect it, so it’s possibly either a timing issue, blob issue or hardware fault in the GPU. VCHIQ still hasn’t been ruled out, it is possible to get the machine into a locked state due to its reliance on RTSupport, but that’s a separate issue that I reported previously and Jeffrey is aware of. We don’t believe it’s related, but as the pointer changes are handled by VCHIQ it can’t be totally ruled out as the cause. I know that switching the pointer away from VCHIQ to a mailbox does resolve the issue, but if we do that, we then don’t know what the cause was…it may be an underlying bug in VCHIQ that needs resolving. The Pi devs haven’t responded to my last update yet and I’ve run out of things to test, but if anyone has any suggestions… I think the only thing left is to pick through the VCHIQ source code. |
John Sandgrounder (1650) 574 posts |
I have a Raspberry Pi 3 which has been giving me grief ever since I got it. It would run for an indeterminate time after boot and then the screen would go black. Sometimes it would last for an hour or so, but usually it would crash very shortly after boot. I have been reading this topic and the similar one on the Raspberry Pi forum. One of the things said on that forum was that some Pi heatsinks can do more harm than good (as they are stuck on with a heat insulating tape!) So, I took the heatsinks off my Pi 3. I also have the parameters on !CPUClock set to run the Pi 3 with configured fast speed and configured slow speed both at 1200 MHz. It has been idling now for most of the afternoon at a temperature of 51°C. It is still OK now whilst I am using !RDPclient to post this. Temperature reported by SYS “Portable_ReadSensor” is now 53°C. I will post again if and when it crashes once more. I am using the modified 5.24 build produced by Colin (11th May) to fix keyboard repeat issues. |
Clive Semmens (2335) 3276 posts |
No heatsink on my Pi3 (nor the remains of any tape!) and never been a problem, running almost continuously for over a year now – the only times it’s been down has been when I’ve crashed it with duff code of my own. |
Tristan M. (2946) 1039 posts |
That 3M tape is an insulator? I’ve used those heatsinks on so many things. Oh no :( I guess it’s time to get some of that adhesive heatsink goo. Not to be confused with the other stuff that just conducts but doesn’t adhere. My heatsink-less Pi Zero has been running at a steady 60°C for a couple of weeks. I can smell it. Hot plastic and components. My Pi3 has the taped heatsink. It is so easy to overheat. Now I think of it, it is prone to the screen blanking. |
John Sandgrounder (1650) 574 posts |
I do not know which of the available Pi heatsinks have the insulating tape. I have just removed them anyway. I have also punched some better ventilation into the Pi case and separated the Pi case from the SSD case by a few millimetres. |
Chris Evans (457) 1614 posts |
How effective each one is I do not know but I’m sure all manufacturers of these heatsinks will call them ‘conducting tape’ not ‘insulating tape’. |
John Sandgrounder (1650) 574 posts |
My source of information on the Raspberry Pi forum was not saying what they were called, but what the effect was from some, but by no means all, of the products. I would expect the majority to do the job properly. But mine are not all the same make. |
Jon Abbott (1421) 2651 posts |
You need to be running the very latest RISCOS 5.27 build and a fairly recent firmware to get any of the bug fixes that have been implemented as part of the testing done in this thread. If setting the low CPU/GPU setting to match their high speeds doesn’t fix the blanking, try my suggested low settings above.
I would not advise using a Pi3 without a heat sink if you’re going to be actually using it. Any data intensive task will push the temperature past the throttle point. Examples of things that will generate lots of heat are compression/decompression, USB data transfer, and rather oddly network packet retries! Whilst FTPing files earlier mine hit 85C with a proper heatsink. It will work fine without a heatsink, but will throttle a lot more. Heatsinks generally come with Heatsink tape or 3M Thermally Conductive tape and if they don’t, it’s cheap to buy. |
Jeffrey Lee (213) 6048 posts |
I’ve started work on converting VCHIQ to use synclib to see if that resolves the deadlock issue – it looks pretty straightforward, so I should hopefully have something to share in a day or two.
Well, we all know that the current network stack is far from |
Clive Semmens (2335) 3276 posts |
Not having the top on my Pi’s case seemed to be all that was necessary, but then it’s rarely above 25C in this room. It’s 21 in here at the moment, and the Pi, working hard, is running at 58.2C, idling at 48.0. |
Jon Abbott (1421) 2651 posts |
and rather oddly network packet retries!
ping and DNS resolution put my temp through the roof when the network stack’s DNS is playing up. It adds 20-30C! I’m assuming there are some tight loops that don’t WFI/WFE (Portable_Idle is probably more appropriate) or yield.
I’ll have to figure out how to reproduce it, my previous post implies it was when GraphicsV 2 is called during GraphicsV 1 if the VSync was soft generated via RTSupport with a priority below 128. I recently implemented a botch to stop Hero Quest locking the machine, so I’ll see if that’s still necessary with the updated VCHIQ. Moving where the soft VSync was generated resolved it, so it may or may not be related. |