IIC probing / CMOS sanity failure
Rick Murray (539) 13840 posts |
I’m going to add this to the bug tracker, but I’m pasting the details and code here because the bug tracker has a ‘different’ version of Textile that usually mangles program snippets. Plus I can be more descriptive.
While developing some code for CJE, it would have been useful to probe the IIC buses to try to locate a device that is not present on the standard IIC bus. Unfortunately, this has some… unpleasant… side effects. Do NOT do this on a live system with important unsaved data.
First up, open a TaskWindow. You’ll need it later. Here’s the code:
If you run this on a Pi with CJE’s smart RTC/power control module, the response will be:
Reply is CJE1 on bus 0
Reply is ÞÞÞÞ on bus 2
Reply is üüüü on bus 4
Reply is úúúú on bus 6
Note that when you return to the desktop, everything is in the RISC OS 3.1 style VDU font. Run the above program again. And again. And again. You will see that it is repeatable, that IIC works (sort of) and the device on bus #0 returns “CJE1” every time. Yet, for some reason, RISC OS has thrown in the towel, decided there is no CMOS RAM, and has reverted to defaults for everything. For example:
*St. BootNet
BootNet Off
*Co. BootNet On
*St. BootNet
BootNet Off
*Unplug BASIC
*Unplug TaskManager
*Unplug
No modules are unplugged
*
(the *Unplug TaskManager command will have RMKilled it, but your machine needs to be rebooted anyway…).
As a result of this, it is clear that probing IIC devices is not going to work. Questions:
To reboot ‘friendlier’ without pressing a reset button or power cycling, go to the TaskWindow you opened earlier:
*RMReInit BASIC
*Shutdown
*BASIC
SYS "OS_Reset"
You will need to reconfigure your machine – I found mine had reset the monitor type (it was Generic, 1280×1024, but had reverted to Auto, 1920×1080!), the mouse speed, and the font cache settings. It may be resetting everything; it’s just that I don’t change the other settings from their defaults.
I have found, via some testing, that attempting to talk to bus 5 is what trashes the CMOS RAM (at least, on a Pi). Trying to talk to the other buses does not cause this oddity to happen. Chris has reported that similar behaviour is exhibited on a Pandaboard, so this is not Pi-specific. |
Chris Evans (457) 1614 posts |
This problem is new. I haven’t had time to try and narrow down when the problem ROM builds started but it appears that it was a few months ago. I believe an alternative solution would be if there was a way of detecting what hardware you are running on. I’ve seen tests for certain CPUs etc but not a neat 0=RiscPC, 1=Iyonix, 2=Pi A, 3 Pi B, 4 Pi B rev 2 …. 6=Panda ES… or something like that. I’ve probably not explained that well but I hope it makes sense. |
Jeffrey Lee (213) 6048 posts |
I’ve checked in a fix, so starting from tomorrow there’s now some parameter validation performed to check for cases like this.
Lack of parameter validation.
Implementation oversight.
Memory corruption, and/or bogus transfers on the IIC bus
Actually, since support for multiple IIC buses was implemented by overloading existing OS_IICOp parameters, bad stuff will happen in one form or another on all versions of RISC OS 5 if you try accessing a bus which doesn’t exist. Old versions which only supported one bus will think you’re trying to perform several million transactions. Chances are it will fail with an IIC error pretty soon after it runs past the end of the valid input, but that might be after it’s corrupted some memory or performed a bogus IIC transfer and corrupted your CMOS.
Chris wrote: “I believe an alternative solution would be if there was a way of detecting what hardware you are running on. I’ve seen tests for certain CPUs etc but not a neat 0=RiscPC, 1=Iyonix, 2=Pi A, 3 Pi B, 4 Pi B rev 2 …. 6=Panda ES… or something like that.”
There are a few potential solutions to the problems you’re facing, depending on exactly what those problems are. Also I’m not entirely sure what solution ROOL would prefer (but past experience suggests a “0=RiscPC, 1=Iyonix”, etc. solution isn’t one of them).
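To make the “several million transactions” remark concrete, here is a minimal sketch of how a bus number can be overloaded onto a transfer-count register. The layout used (bus number in the top 8 bits of R1, transfer count in the low 24 bits) is an assumption for illustration and is not spelled out in this thread; all function and macro names are hypothetical.

```c
#include <stdint.h>

/* ASSUMED layout of the OS_IICOp count register (R1):
 *   bits 0-23  = number of transfers
 *   bits 24-31 = IIC bus number
 * Names below are illustrative, not part of any RISC OS header. */
#define IIC_COUNT_MASK 0x00FFFFFFu
#define IIC_BUS_SHIFT  24

/* Pack a bus number and a transfer count into one register value. */
static uint32_t iicop_pack_r1(uint32_t bus, uint32_t transfers)
{
    return (bus << IIC_BUS_SHIFT) | (transfers & IIC_COUNT_MASK);
}

/* How a multi-bus kernel would split the register again. */
static uint32_t iicop_bus(uint32_t r1)   { return r1 >> IIC_BUS_SHIFT; }
static uint32_t iicop_count(uint32_t r1) { return r1 & IIC_COUNT_MASK; }

/* How a single-bus kernel would read it: the whole register is the count. */
static uint32_t iicop_count_legacy(uint32_t r1) { return r1; }
```

Under that assumed layout, asking for 2 transfers on bus 5 produces &05000002, which a single-bus kernel would read as 83,886,082 transfers, and a multi-bus kernel without validation would take as a request for a bus that may not exist.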
This bug also highlights the problem that we don’t really have an official way for third-party code to determine how many IIC buses there are in a system. Arguably third-party software shouldn’t need to know (probing for devices on an IIC bus can be dangerous – there’s no way of knowing that you’ve found the right device, you only know whether your interrogation request has succeeded or not. You could confuse the EDID EPROM for the CMOS RAM, or you might unwittingly be sending bogus commands to a device which doesn’t use the same message protocol as your intended target), but if the choice is between code using the ‘for internal use only’ HAL_IICBuses call or using an official SWI then I think an official SWI would be better. That then raises the question of where we put the SWI – overriding OS_IICOp further isn’t a sane thing to do due to backwards compatibility issues, so as far as I can see that leaves us with the following choices:
At the moment I’m leaning towards #1, since it already reports assorted low-level information such as 82C71x features, IOEB/IOMD presence flags, etc. |
Rick Murray (539) 13840 posts |
Thank you for the fast fix and reply. ;-)
Ah. Well… poop.
OS_Hardware returns that. CPU type, board type, and board revision.
In my specific case, luckily not. But yet, I’m aware that there were some changes in the Pi’s IIC configuration.
The logical approach would be for a call to return a value indicating how many IIC buses were present and a bitmap indicating which ones (to cater for non-sequential ordering).
Ah, yes. That’s what I meant by the OS_Hardware call.
…because the information should be available from the OS. :-P
Actually, the device will exist at a specific address (which rules out the ~125 other possible devices). The first thing that I do is write ‘0’ to the device to inform it to begin reading at internal address +0. While this is not guaranteed, many of the IIC devices that I have used (from Teletext chips to ADCs) just “assume” the initial write specifies an offset, so in the rare case that this address is used by something else, it ought to be the same. Not a guarantee, fair enough, but it’s the best that can be done in lieu of any other way to do this. Following this, four bytes are read. If they come back as “CJE1” then the device has been positively identified. If they don’t, then I’ve probably just killed the GPU and pretty soon blood will start pouring out of the HDMI socket…
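The identification sequence described above (write a 0 to set the internal offset, read four bytes, compare against “CJE1”) can be sketched as follows. This is not Rick’s code: the `iic_port` callback structure, the function names and the `fake_*` stand-ins are all hypothetical, used so the logic can be shown without a real bus.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical transport: each callback returns 0 on success. */
typedef struct {
    int (*write)(void *ctx, uint8_t addr, const uint8_t *data, size_t len);
    int (*read)(void *ctx, uint8_t addr, uint8_t *data, size_t len);
    void *ctx;
} iic_port;

/* Returns 1 only if the device at addr answers with the "CJE1" signature. */
static int identify_cje(const iic_port *port, uint8_t addr)
{
    const uint8_t offset = 0;   /* ask the device to read from internal address +0 */
    uint8_t id[4];

    if (port->write(port->ctx, addr, &offset, 1) != 0) return 0;
    if (port->read(port->ctx, addr, id, sizeof id) != 0) return 0;
    return memcmp(id, "CJE1", sizeof id) == 0;
}

/* Stand-in device for demonstration: accepts any write... */
static int fake_write(void *ctx, uint8_t addr, const uint8_t *data, size_t len)
{
    (void)ctx; (void)addr; (void)data; (void)len;
    return 0;
}

/* ...and replies with whatever bytes ctx points at. */
static int fake_read(void *ctx, uint8_t addr, uint8_t *data, size_t len)
{
    (void)addr;
    memcpy(data, ctx, len);
    return 0;
}
```

A failed write or read is treated the same as a wrong signature, so the caller only ever proceeds on a positive match.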
As long as the initial identification succeeds, then things will be okay as it is a very strong positive identification.
Don’t get me started on this. I reckon ReadSysInfo ought to be telling a lot more about the environment (stuff the OS ought to already know) in a simpler form than requiring OS_Hardware calls (if/when GPIO is implemented); and OS_PlatformFeatures should maybe indicate stuff like “this machine has NEON” and the like. Sure, all this information is already available, either by calling the HAL or poking CP15; but given that the OS already needs to know more than a little about what it is actually running upon… sharing would be nice. ;-) Thanks again for the fix. |
Jeffrey Lee (213) 6048 posts |
VFPSupport_Features :-) (although it does require some manual decoding of the MVFR registers to check for the exact features you’re after – the downside of there being about 10 different theoretical VFP/NEON combinations) |
Sprow (202) 1158 posts |
OS_ReadSysInfo seems to fit the best as that’s where motherboard controller info lives. I always associate OS_PlatformFeatures with processor features, which IIC isn’t; OS_Hardware seems to be for tickling HAL functions/devices rather than getting numbers; and there aren’t many spare kernel SWIs left so probably best to keep them for something big. |
Chris Evans (457) 1614 posts |
Thanks Jeffrey and Rick quick work:-) |
Rick Murray (539) 13840 posts |
Thank Jeffrey. All I did was reliably break it. (^_^) |
Jeffrey Lee (213) 6048 posts |
And I was the one who added support for multiple buses in the first place. It’s the circle of life! |
Jeffrey Lee (213) 6048 posts |
I’ve now added OS_ReadSysInfo 14 as a way of getting the number of IIC buses – expect to see it in tomorrow’s ROMs. |
Rick Murray (539) 13840 posts |
Thanks. Are buses guaranteed to be contiguous, or is there a call I’ve missed to say which buses are valid?
OMG, I could kill myself. I’ve just spent half an hour trying to track down why a module I’m working on is interacting strangely with my OLED clock/status ticker. I knocked out all of the test module startup code and it was still doing it. Then I tried dropping DADebug calls everywhere and saw all sorts of random SWIs being called. Then it hit me. I originally recycled the CMHG file from OLED and I never changed the SWI chunk! Argh! Stupid! So I’ve set it to &59580 for now. [don’t worry, I’ll apply for a proper registration when the software does stuff]
This does show up an interesting quirk of RISC OS in that the OLED ticker kept on going and the new module was having its lookup routine splattered with random gibberish (well, OLED strings but certainly not the data it would have expected). This implies that RISC OS can not only load two modules with the same SWI chunk, but it’ll call them both too. That’s either awesome or distressing, I’m really on the fence about it… |
Jeffrey Lee (213) 6048 posts |
Yes, buses are guaranteed to be contiguous. |
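With a bus count available and buses guaranteed contiguous, a probe loop can stay strictly within 0 to count-1. The sketch below assumes `bus_count` holds the value reported by OS_ReadSysInfo 14; the call itself is not shown, and `find_device_bus`, `iic_probe_fn` and `stub_probe` are hypothetical names for illustration.

```c
#include <stdint.h>

/* Hypothetical per-bus probe: returns non-zero if the device was identified. */
typedef int (*iic_probe_fn)(uint32_t bus, void *ctx);

/* Probe only buses 0..bus_count-1 (contiguous, per the reply above).
 * Returns the first bus where probe() succeeds, or -1 if none does. */
static int find_device_bus(uint32_t bus_count, iic_probe_fn probe, void *ctx)
{
    for (uint32_t bus = 0; bus < bus_count; bus++) {
        if (probe(bus, ctx)) return (int)bus;
    }
    return -1;
}

/* Stand-in probe for demonstration: ctx points at the bus the device is on. */
static int stub_probe(uint32_t bus, void *ctx)
{
    return bus == *(const uint32_t *)ctx;
}
```

A count of zero simply means no bus is ever touched, which is the safe outcome on a system that reports no IIC buses.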