Pi 3 shutting down-ish
Jon Abbott (1421) 2651 posts
I’ve just dumped the HVS registers requested by the Raspberry Pi devs, and whilst doing that I again noticed that the issue only occurs at the desktop. I’ve previously commented that the issue didn’t occur when running games under ADFFS, which I put down to the lower resolution, given that the desktop also doesn’t blank at low resolutions. With the desktop at 1920×1080, if you run the following program and sit in the Wimp Command window, the screen refuses to blank until you return to the desktop, where it will blank within seconds on my Pi.
What’s going on in the background at the desktop that alters Pi registers, but doesn’t occur when single tasking? The mouse pointer can be ruled out, as it’s active at a Wimp Command prompt and doesn’t trigger the issue.
Rick Murray (539) 13840 posts
Don’t know about registers, but what comes to mind is the clock speed changing as the system goes in and out of idle. Could this be a factor? You could, perhaps, try “…”. Just a thought…
Jon Abbott (1421) 2651 posts
It wouldn’t surprise me if this issue is somehow linked to a clock; it’s well known that changing the core frequency breaks all manner of things, including I2C, SPI, SD and UART, as their clocks are all derived from the core clock by fixed ratios.
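To make the point about derived clocks concrete: the BCM2835 mini UART takes its baud rate straight from the core clock, baud = core_clock / (8 × (divisor + 1)), so a divisor chosen at one core frequency produces a different baud rate if the core clock changes underneath it. The C sketch below only illustrates that formula; the function names and the 250/400 MHz figures are illustrative assumptions, and it says nothing about how the HVS or the rest of the display path derive their clocks.

```c
#include <stdint.h>

/* Mini UART baud rate for a given core clock and divisor register value,
 * using the BCM2835 formula baud = core_hz / (8 * (reg + 1)). */
uint32_t mini_uart_baud(uint32_t core_hz, uint32_t divisor_reg)
{
    return core_hz / (8u * (divisor_reg + 1u));
}

/* Divisor register value for a wanted baud rate at a given core clock.
 * Integer division truncates, so the resulting baud rate is approximate. */
uint32_t mini_uart_divisor(uint32_t core_hz, uint32_t baud)
{
    return core_hz / (8u * baud) - 1u;
}
```

A divisor of 270 gives about 115313 baud at 250 MHz, but the same register value at 400 MHz gives about 184501 baud, which is why a peripheral configured for one core speed misbehaves after a frequency change.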
Jon Abbott (1421) 2651 posts
The Raspberry Pi devs have now implemented a mailbox interface for gamma and provided example code.
Jon Abbott (1421) 2651 posts
I was going to try coding this up to see if the screen blanking still occurs, but having read the BCMSupport documentation I’m not sure I understand how to send data to mailbox 8012. Do I need to allocate a 32-byte buffer via BCMSupport_AllocPropertyBuffer, fill it as per the block at line 24 in the example code, and then call BCMSupport_SendPropertyBuffer? What memory address do I put into the block for the gamma channel data? The example code looks like it’s passing a logical address, but I would have thought it should be physical?
Jeffrey Lee (213) 6048 posts
Correct. Note that the buffer allocated by BCMSupport_AllocPropertyBuffer won’t be accessible in user mode (this is an omission in the documentation; the current implementation is just a wrapper around PCI_RAMAlloc / PCI_RAMFree).
It’ll have to be a physical address. I’m not sure how the mailbox_videocore_alloc function that the example uses is implemented, but I’d guess that PCI_RAMAlloc will be a suitable substitute under RISC OS. I still don’t have a reliable repro for the bug, but I can try updating BCMVideo to use the new call tonight and upload a test ROM.
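A 32-byte request matches the usual mailbox property-buffer layout: a size word, a request code, one tag (tag id, value size, request/response word, two value words) and a zero end tag. The C sketch below fills such a buffer, assuming the gamma tag's value is two words, a display number followed by the address of the gamma table; the example code's layout isn't reproduced in the thread, so treat that layout, and the function and macro names, as hypothetical. On RISC OS the buffer itself would come from BCMSupport_AllocPropertyBuffer, and gamma_phys must be the physical address of the table (e.g. memory from PCI_RAMAlloc), per the reply above.

```c
#include <stddef.h>
#include <stdint.h>

#define PROP_REQUEST  0x00000000u  /* buffer-level "process request" code */
#define PROP_END_TAG  0x00000000u  /* terminates the tag list */
#define TAG_SET_GAMMA 0x00008012u  /* tag number as quoted in the thread */

/* Fill an 8-word (32-byte) property buffer for the gamma tag.
 * ASSUMPTION: value = { display number, physical address of gamma table }.
 * Returns the total buffer size in bytes. */
size_t build_gamma_request(uint32_t buf[8], uint32_t display, uint32_t gamma_phys)
{
    buf[0] = 8u * sizeof(uint32_t);  /* total buffer size in bytes */
    buf[1] = PROP_REQUEST;
    buf[2] = TAG_SET_GAMMA;
    buf[3] = 2u * sizeof(uint32_t);  /* value buffer size: two words */
    buf[4] = 0u;                     /* request/response word: request */
    buf[5] = display;
    buf[6] = gamma_phys;             /* physical, not logical, address */
    buf[7] = PROP_END_TAG;
    return buf[0];
}
```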
Jon Abbott (1421) 2651 posts
I can repro it reliably, so I was going to do the testing. Just let me know once it’s in the build. The last firmware build I tested on was 18-07-18, so I’ll update to today’s firmware and confirm it’s still occurring.
EDIT: The issue is still there with the latest firmware (09-01-19), so it’s time to see whether using the mailbox to set gamma makes any difference.
Jon Abbott (1421) 2651 posts
I’ve coded it up using the mailbox and it makes no difference; the screen still goes blank within seconds on my Pi3. I’ll respond on GitHub to that effect.
Jeffrey Lee (213) 6048 posts
If you want a second opinion, here is a ROM (plus the relevant updated source) which has been changed to use the mailbox interface for gamma. The disable_gamma option should still work. I’ve also optimised MergeUpdate so that it only includes the necessary tags in the message (since it’s now used for gamma as well); possibly that will make things more stable.
Jon Abbott (1421) 2651 posts
No difference, I’m afraid; it still blanks. I tried the example app under Linux and couldn’t get the issue to occur, so it’s likely something RISC OS is doing. I know I’ve already asked this question, but what exactly is the OS doing with the VideoCore (VC) when at the desktop? A recap of what I’ve seen so far, with the hardware fixed at 1360×768 @ 50Hz:
I’ve not yet tried the ADFFS GraphicsV driver at the desktop, but I suspect it will stop the issue, as it only supports legacy resolutions. Would it be worth logging all BCMSupport mailbox traffic to a file, so we can see if there’s a correlation?
Jeffrey Lee (213) 6048 posts
Off the top of my head:
That could be worth a shot, yes.
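Logging mailbox traffic needs something that can decode a property buffer into its tags. Below is a minimal C sketch, assuming the standard Raspberry Pi property-buffer layout (size word, request/response code, then per tag: id, value size in bytes, request/response word, value padded to whole words, and finally a zero end tag). The function name is hypothetical, and how you intercept BCMSupport_SendPropertyBuffer or BCMSupport_SendTempPropertyBuffer to get at each buffer, and which clock supplies the timestamp, are left to the caller.

```c
#include <stdint.h>
#include <stdio.h>

/* Walk a mailbox property buffer and write one line per tag to `log`
 * (nothing is written if `log` is NULL). `timestamp` is whatever clock the
 * caller chooses. Returns the number of tags found, or -1 if a tag
 * overruns the buffer. */
int log_property_tags(FILE *log, unsigned long timestamp, const uint32_t *buf)
{
    uint32_t total_words = buf[0] / 4u;
    uint32_t i = 2u;                    /* first tag follows size + code */
    int count = 0;

    while (i < total_words && buf[i] != 0u) {
        if (i + 3u > total_words)
            return -1;                  /* tag header truncated */
        uint32_t tag = buf[i];
        uint32_t value_bytes = buf[i + 1u];
        /* value buffers are padded to a whole number of 32-bit words */
        uint32_t value_words = value_bytes / 4u + (value_bytes % 4u != 0u);
        if (value_words > total_words - i - 3u)
            return -1;                  /* value overruns the buffer */
        if (log != NULL)
            fprintf(log, "%lu tag=0x%08lx len=%lu rr=0x%08lx\n", timestamp,
                    (unsigned long)tag, (unsigned long)value_bytes,
                    (unsigned long)buf[i + 2u]);
        i += 3u + value_words;
        count++;
    }
    return count;
}
```

Logging each buffer both before sending and after the reply arrives would show the request/response word change, which makes it easier to line up GPU replies with the moment the screen blanks.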
Jon Abbott (1421) 2651 posts
Is there a quick way I can rule out VCHIQ and mailbox messages? I’m guessing bad things will happen if I block the SWIs for both at the desktop to see what happens; I seem to recall the machine stiffs if certain VCHIQ calls get blocked. I might get away with blocking the BCMSupport SWIs if they’re only used for palette and CPU speed changes; I’ll knock up a quick test tomorrow. Can VCHIQ audio be stopped by killing SoundDMA? Can VCHIQ pointer updates be stopped by killing a module?
Jeffrey Lee (213) 6048 posts
VCHIQ can easily be ruled out by *Unplug VCHIQ. (That command will stiff the machine due to a bug in VCHIQ, but if you reboot afterwards it should be unplugged.) The OS will fall back to using a software mouse pointer, so it should still be usable.
Mailbox messages are a bit trickier, since BCMVideo will refuse to start if BCMSupport isn’t running. However, you should be able to rule out CPU speed changes by RMKilling the Portable module, and you should be able to suspend BCMVideo’s MergeUpdate routine by writing a nonzero value to the tagbuffer_busy variable (located 4 bytes into BCMVideo’s workspace). Apart from the calls it makes on startup, that will stop everything BCMVideo uses the mailbox for, except mode changes and reading EDID. Clearing the value to zero should resume normal operations (screen blanking, scrolling, palette updates, gamma updates). Note that you might need to try writing to tagbuffer_busy a couple of times for the value to ‘stick’: the tag buffer is used for asynchronous mailbox messages, so if BCMVideo was in the middle of using it, it might clear it back to zero once it gets the reply from the GPU.
If you’ve got some code handy for monitoring SWI calls, then once you’ve killed things off you can double-check that nothing’s calling BCMSupport_SendTempPropertyBuffer or BCMSupport_SendPropertyBuffer.
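The "write it a couple of times until it sticks" advice can be sketched as a write, wait, re-check loop. This C sketch is hypothetical: it assumes you already have a pointer to BCMVideo's workspace (how to obtain it, and with what privileges, isn't covered here), and it takes a caller-supplied `settle` function standing in for a short delay during which an in-flight GPU reply could arrive and make BCMVideo clear the flag.

```c
#include <stddef.h>
#include <stdint.h>

/* tagbuffer_busy lives 4 bytes (one word) into BCMVideo's workspace,
 * according to the thread. */
#define TAGBUFFER_BUSY_WORD 1u

/* Write `value` to tagbuffer_busy, let `settle` run (may be NULL), then
 * check the value is still there. Retries up to `max_tries` times.
 * Returns the attempt on which the value stuck, or 0 if it never did. */
int set_tagbuffer_busy(volatile uint32_t *workspace, uint32_t value,
                       int max_tries, void (*settle)(void))
{
    for (int attempt = 1; attempt <= max_tries; attempt++) {
        workspace[TAGBUFFER_BUSY_WORD] = value;
        if (settle != NULL)
            settle();                  /* give a pending reply time to land */
        if (workspace[TAGBUFFER_BUSY_WORD] == value)
            return attempt;
    }
    return 0;
}
```

Passing 0 as `value` uses the same loop to resume normal operation; without a delay in `settle`, the immediate re-read would almost always succeed and prove little.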
Jon Abbott (1421) 2651 posts
Killing Portable substantially reduced the blanking; I only managed to get it to blank once in roughly an hour. With Portable active, it usually blanks within seconds. With BCMVideo blocked, I couldn’t get it to blank in roughly an hour and a half, but will continue testing later. A corrupt mailbox message, perhaps? You might be able to repro it by flooding the mailbox with CPU speed messages in the background.
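A flood test needs a CPU speed request to send repeatedly. The C sketch below builds one using the firmware's documented "set clock rate" property tag (0x00038002, value: clock id, rate in Hz, skip-setting-turbo flag; the ARM clock is id 3). Whether Portable's speed changes use exactly this tag isn't stated in the thread, and the helper names and the 600/1200 MHz figures are illustrative assumptions; a flood loop would alternate rates and pass each buffer to BCMSupport_SendPropertyBuffer.

```c
#include <stddef.h>
#include <stdint.h>

#define TAG_SET_CLOCK_RATE 0x00038002u
#define CLOCK_ID_ARM       0x00000003u

/* Fill a 9-word (36-byte) property buffer asking the firmware to set the
 * ARM clock to `rate_hz`. Returns the buffer size in bytes. */
size_t build_arm_clock_request(uint32_t buf[9], uint32_t rate_hz)
{
    buf[0] = 9u * sizeof(uint32_t);  /* total buffer size in bytes */
    buf[1] = 0u;                     /* process request */
    buf[2] = TAG_SET_CLOCK_RATE;
    buf[3] = 3u * sizeof(uint32_t);  /* value: clock id, rate, skip turbo */
    buf[4] = 0u;                     /* request/response word: request */
    buf[5] = CLOCK_ID_ARM;
    buf[6] = rate_hz;
    buf[7] = 0u;                     /* skip_setting_turbo = 0 */
    buf[8] = 0u;                     /* end tag */
    return buf[0];
}

/* Rate for iteration `i` of a flood loop: alternate slow and fast so that
 * every request actually changes the clock. */
uint32_t flood_rate(unsigned i, uint32_t slow_hz, uint32_t fast_hz)
{
    return (i & 1u) ? fast_hz : slow_hz;
}
```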
Jeffrey Lee (213) 6048 posts
Another thing to try would be using Portable_Speed2 to set the slow and fast CPU speeds to the same value, e.g. 8,0,0 to force low speed or 8,1,1 to force high speed. That should stop the CPU speed mailbox messages without stopping Portable_Idle.
Jon Abbott (1421) 2651 posts
Not sure if it’s relevant, but every time it’s blanked whilst testing today, I’ve been rebuilding the games packages for PackMan, which maxes out the CPU whilst single tasking in a command window. To double-check this, I’ve just kicked it off again and, sure enough, the screen blanked whilst building the packages. Reboot, F12, then rebuild, and it doesn’t blank.
I’ve also tried RMKilling BCMSupport at the desktop and kicking off the packaging, and it blanked, so we can rule out mailbox messages as the root cause, although they do appear to be a contributing factor.
The test I’m currently running leaves Portable, BCMVideo and BCMSupport active but unplugs VCHIQ. It’s not blanked yet; I’m 30 minutes in and have been running the packaging continuously, which up until now has triggered the issue every time. What’s using VCHIQ, and how can I rule each user out individually? Given that reducing mailbox traffic reduces the chance of it blanking, I can’t help but think RTSupport priorities might be a factor here.
Jon Abbott (1421) 2651 posts
With VCHIQ unplugged, I’ve not managed to get it to blank after an hour of rebuilding packages. With VCHIQ active, it still blanks after RMKilling either SoundDMA or BCMSupport. I’m currently testing with the pointer turned off at the desktop. Is there a way to force it to use a software pointer, but leave VCHIQ active?
EDIT: Forgot to mention that turning the pointer on at an F12 prompt and building the packages causes the screen to blank.
EDIT2: It’s looking like the pointer is the prime suspect, as I can’t get it to blank with the pointer turned off at the desktop, but can get it to blank at the command line if I turn it on.
Jeffrey Lee (213) 6048 posts
I’ve uploaded an updated ROM which has the hardware pointer disabled. Apart from that, the only change from the previous ROM is a fix to make VCHIQ unkillable (so *Unplug will no longer crash).
Jon Abbott (1421) 2651 posts
I’ve been using it for roughly two hours and it’s not blanked yet; I will continue testing later just to be sure. If the hardware pointer is the issue, the next obvious questions that spring to mind are:
EDIT: I’ve been using it most of the day and it’s not blanked once, so I’m pretty sure this is the issue. Do you (or whoever initially raised the issue on GitHub) want to pick this up with the Raspberry Pi devs, or should I continue the conversation with them?
Jeffrey Lee (213) 6048 posts
Another new ROM. This one has the hardware pointer enabled, but with the logic fixed so that it won’t send redundant position updates to the GPU (previously it would send position updates every frame, even if the pointer was sat still). So if the pointer is sat still, nothing is writing to the pointer palette entries, and the image isn’t animating (e.g. when just sat idle at the desktop), there shouldn’t be any updates sent to the GPU. This should help to determine whether it’s the presence of the pointer that’s causing the problem, or the way we’re sending updates.
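The fix described above amounts to remembering the last position sent and skipping the update when nothing has changed. A minimal C sketch of that idea follows; the names are hypothetical, BCMVideo's actual code isn't shown in the thread, and the real driver also has to account for pointer palette writes and animated images, which this ignores.

```c
#include <stdbool.h>
#include <stdint.h>

/* Per-frame pointer state: remembers the last position sent to the GPU. */
typedef struct {
    int32_t last_x, last_y;
    bool have_last;   /* false until the first update has been sent */
    unsigned sent;    /* number of updates sent, for diagnostics */
} pointer_state;

/* Called once per frame with the current pointer position. Returns true
 * (and records the position) only if an update needs to be sent. */
bool pointer_frame(pointer_state *s, int32_t x, int32_t y)
{
    if (s->have_last && x == s->last_x && y == s->last_y)
        return false;             /* pointer sat still: nothing to send */
    s->last_x = x;
    s->last_y = y;
    s->have_last = true;
    s->sent++;
    return true;
}
```

With this in place a stationary pointer generates one update and then silence, which is what lets the test separate "pointer present" from "pointer being updated every frame".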
Jon Abbott (1421) 2651 posts
A useful update in itself. It blanks within a few minutes of being sat at the desktop whilst doing heavy CPU/file activity in the background, so it’s looking like it’s the presence of the pointer that’s triggering the issue. I’ve reproduced it about a dozen times just to be sure. However, I’ve not yet got it to blank when the CPU is sitting idle; I’ll leave it on for a few hours to see what happens. What does that tell us, given the Pi’s flaky power regulation?
David Pitt (3386) 1248 posts
FWIW… The shutdown-ish issue has always been erratic on my RPi3B+ and the RPi3B before it. Shutdowns could happen shortly after start-up; indeed, when the RPi3B+ first arrived here, it had hardly been through the letterbox for ten minutes and it had fallen over twice. On other occasions the fault may not appear for days. Jeffrey’s first test ROM, 14Jan19, was stable for three days without … As I said, FWIW…
Jon Abbott (1421) 2651 posts
It’s been sat idle for over three hours and not blanked, so CPU/IO load or related elements appear to be a factor. I need to rule out lots of contributing elements (CPU load, IO load, mailbox messages, VCHIQ sound) to figure out which combination, together with the hardware pointer, triggers the issue. Before I do all that, though, I’m going to try powering the Pi3 from a different power source and repeat the tests, as I’m beginning to suspect this is an obscure power-related issue.
So it’s likely more than just CPU load and the pointer; try adding IO to the mix. See if you can repeat my scenario: drop to an F12 prompt and run something that loads the CPU and IO, with and without the pointer enabled (via *POINTER), and see if it only blanks when it’s enabled.
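Any workload that keeps both the CPU and the filing system busy will do for this test (the one used in the thread was rebuilding the PackMan packages). If a smaller, repeatable load is wanted, here is a portable C sketch, assuming a C compiler and standard library on the Pi; it is not from the thread. Each round generates a block with xorshift32 (CPU), writes it to a temporary file, and reads it back (IO). Because of filing-system caching, it may not reach the SD card on every round.

```c
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Mixed CPU + IO load: generate, write, read back and compare `rounds`
 * blocks of `block_size` bytes via a temporary file.
 * Returns the number of bytes verified, or -1 on an IO error/mismatch. */
long cpu_io_load(unsigned rounds, size_t block_size)
{
    unsigned char *out = malloc(block_size);
    unsigned char *in = malloc(block_size);
    FILE *f = tmpfile();
    long verified = 0;
    uint32_t x = 2463534242u;              /* xorshift32 seed */

    if (out == NULL || in == NULL || f == NULL) {
        verified = -1;
        goto done;
    }
    for (unsigned r = 0; r < rounds; r++) {
        for (size_t i = 0; i < block_size; i++) {   /* CPU work */
            x ^= x << 13;
            x ^= x >> 17;
            x ^= x << 5;
            out[i] = (unsigned char)x;
        }
        rewind(f);                                  /* IO: write block */
        if (fwrite(out, 1, block_size, f) != block_size || fflush(f) != 0) {
            verified = -1;
            goto done;
        }
        rewind(f);                                  /* IO: read it back */
        if (fread(in, 1, block_size, f) != block_size ||
            memcmp(in, out, block_size) != 0) {
            verified = -1;
            goto done;
        }
        verified += (long)block_size;
    }
done:
    if (f != NULL)
        fclose(f);
    free(in);
    free(out);
    return verified;
}
```

Running it in a loop at the F12 prompt, once with *POINTER enabled and once without, gives the with/without comparison described above.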
Jon Abbott (1421) 2651 posts
I’ve ruled out mailbox messages and VCHIQ sound (assuming RMKilling BCMSound stops it), which leaves CPU load and IO load (i.e. SD card reads) to rule out.
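Ruling factors in or out one at a time can miss interactions, so one option is a full factorial plan: run every on/off combination of the four candidate factors listed earlier (CPU load, IO load, mailbox messages, VCHIQ sound), each with the hardware pointer enabled, plus the same runs with it disabled as a control. This C sketch just enumerates and labels those runs; the helper names are hypothetical.

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Candidate contributing factors, in bit order. */
static const char *const factors[] = {
    "CPU load", "IO load", "mailbox messages", "VCHIQ sound"
};
#define NUM_FACTORS (sizeof factors / sizeof factors[0])

/* Describe test run `mask` (bit i set = factor i enabled) into `out`.
 * Returns the number of factors enabled in that run. */
int describe_run(unsigned mask, char *out, size_t out_size)
{
    int enabled = 0;
    out[0] = '\0';
    for (size_t i = 0; i < NUM_FACTORS; i++) {
        if (mask & (1u << i)) {
            if (enabled > 0)
                strncat(out, " + ", out_size - strlen(out) - 1);
            strncat(out, factors[i], out_size - strlen(out) - 1);
            enabled++;
        }
    }
    if (enabled == 0)
        strncat(out, "pointer only", out_size - strlen(out) - 1);
    return enabled;
}
```

Looping `mask` from 0 to (1u << NUM_FACTORS) - 1 gives 16 runs per pointer setting; combinations already ruled out can simply be skipped.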
David Pitt (3386) 1248 posts
I am beginning to think that gamma via a mailbox does fix the shutdown-ish thing, at least as seen here. More testers, perhaps?