SmartReflex fails over time?
Grahame Parish (436) 481 posts |
I’ve got a BB xM running RO5.21 from about two months back and the appropriate HD contents. It’s on all the time and rarely rebooted. Overnight it runs a backup to the NAS using 7backup, with Sunfish for NFS access. I’ve noticed that, after a variable amount of time since the last reboot, it stops speeding up for intensive processes like the backup. When all is well the backup takes 30-35 minutes; once the problem starts it can take 1-7 hours! I check using CPUClock and see that the speed is stuck at 300MHz, and if I raise the lower speed limit the CPU temperature goes up and the process speeds up. A reboot gets everything running correctly again, speeding up and slowing down as necessary. This has actually been a problem for quite some time (maybe a year or so?), so it isn’t related to the ROM version I’m currently running, as it has happened with previous ROMs too; I just keep forgetting to report it. |
Jeffrey Lee (213) 6048 posts |
I’ve seen this once or twice myself, although I’m not sure whether it’s related to how long the machine has been running or whether it’s just an occasional random failure to change CPU speed. I guess it shouldn’t be too hard for me to come up with an automated test for the issue. |
Grahame Parish (436) 481 posts |
Let me know if you need anything testing/debug logging/etc. This seems to be occurring more frequently at the moment (temperature-related?). |
Grahame Parish (436) 481 posts |
Some more observations on this problem… I run my BB xM 24/7 with the odd reboot – usually when it gets stuck at slow speed. What I’ve seen the last few times is that it actually gets stuck at full speed, and this may be down to Netsurf. There have been times when quitting Netsurf drops the speed (and temperature) back down again. When that doesn’t work I use CPUClock to set the low speed to 300MHz (the value it’s already set to), and from that point on it doesn’t go back to high speed (1GHz) no matter what activity occurs. I’ve tried re-applying the high speed setting with CPUClock, but it doesn’t resolve the issue – only a reboot does at this point. So it’s possible that SmartReflex is OK but Netsurf is somehow confusing it, and CPUClock might be the wrong way to undo that confusion. |
Chris Johnson (125) 825 posts |
CPUClock doesn’t do any actual switching of the CPU speed. It merely uses the PortableSpeed2 SWIs to (a) read the current settings of fast and slow, (b) set the values that will be used for fast and slow, and (c) read the actual CPU clock speed every second. The actual switching of the speed is done by the OS/Wimp. It could well be that something is bu****ing that up. I will monitor things on my PandaRO and see if there is any similar behaviour. |
Chris Johnson (125) 825 posts |
I have just checked both the ARMini and PandaRO, and both machines seem to be switching between fast and slow as expected. PandaRO has been up for several days since a reboot, the BB probably a couple of days since power on. Netsurf has been used on both of them (recent versions of Netsurf – #2053 on this BB) and the OS version is a recent 5.21 (18th Aug on this BB). I will keep an eye on things. |
Jeffrey Lee (213) 6048 posts |
My BB-xM has been running for just over two weeks now, doing nothing but sitting idle and logging the CPU state every 30 seconds. So far everything seems fine – >95% of the CPU time is spent at low speed and the remaining time at high speed. It also regularly starts a new task to confirm that single-tasking apps are started with the CPU running at high speed, and so far that looks fine as well. However, the current code doesn’t actually do any benchmarks to confirm that the speed settings are affecting the performance of the CPU. I’ll hook that up now. |
Jeffrey Lee (213) 6048 posts |
…yep, the speed setting is definitely having an effect. |
Grahame Parish (436) 481 posts |
In normal use – before the system locks at one speed – CPUClock works exactly as expected, so I don’t think it is directly involved in the problem; it’s just where I can confirm it is happening. Once the speed has locked, CPUClock can no longer change it: although the dialogue shows the new low or high speed setting, the current speed doesn’t change when activity starts and stops. The computer is generally running Messenger, NetFetch and Netsurf – all current versions, with Netsurf usually no more than a couple of days behind the latest development version. Today I’ve updated to last night’s ROM and HD4 images, but I’ve already had one instance of the CPU speed getting locked at the lower setting, despite opening the BBC News website in Netsurf, which is usually guaranteed to send the speed to max. |
Jeffrey Lee (213) 6048 posts |
Finally been able to reproduce this. After 18 days the Wimp went “Yep, I’m gonna stop switching speed now” and left the CPU running at high speed. Manually switching speed using Portable_Speed still works, so it looks like it’s just some kind of problem with the logic the Wimp is using. Now to see if I can work out what the problem is without having to reboot… |
Jeffrey Lee (213) 6048 posts |
It looks like somehow the kernel’s ticker event chain is becoming corrupted. One of the OS_CallEvery routines had a ‘time remaining’ that was much higher than it should be (several months until the next tick, for something that should happen every 2cs), so both it and everything that was located after it in the schedule weren’t being processed. |
David Feugey (2125) 2709 posts |
I wonder whether this bug is linked to the fact that my webserver goes crazy after a few weeks (7 to about 45 days, no more). |
Grahame Parish (436) 481 posts |
That’s great to know – it’s not just my setup. I’m getting the problem after running for periods of between 2 days and a couple of weeks. It would be great to get this fixed, as usually the first thing I notice is that the idle CPU temperature is in the high 70s °C rather than the low 50s °C. I have CPUClock open on the desktop all the time to show the current speed as a warning. |
Chris Johnson (125) 825 posts |
Maybe I should think of making a version of CPUClock that simply shows the current speed in an iconbar icon. |
David Feugey (2125) 2709 posts |
With the current time and temperature? :) |
Jeffrey Lee (213) 6048 posts |
Yeah, I think this bug has the potential to break quite a lot of things. Now that I know where to look I can hopefully catch it a lot sooner than having to wait for the speed switching to fail. |
Grahame Parish (436) 481 posts |
That would be useful, especially as my wife has a tendency to ‘tidy up’ the desktop and close what she doesn’t want to see! ;) |
David Feugey (2125) 2709 posts |
Better idea: modify the Switcher icon to add a little indicator of the current speed (using a different user-defined sprite for each state). |