Puzzler, IOMD won’t boot post Kernel 4.79.2.224
Sprow (202) 1158 posts |
While testing on a random Risc PC (StrongARM, 64MB + 2MB) I found it doesn’t boot using the ROM image I wanted to test (which I built myself), yet the same ROM image works on RPCEmu. Puzzled by this, I downloaded the overnight ROM from ROOL and that didn’t boot either. I enabled serial debugging in the HAL and tried again – it fails with a data abort at 200006F4 (which gets allocated to FileSwitch in a good boot situation).
After a bit of binary chopping I found 4.79.2.224 is fine and 4.79.2.225 fails. Specifically, if I change the line that shoves all the memory into the application slot from LDR R0, =(1024*1024):SHR:12 back to LDR R0, =(AplWorkMaxSize-32*1024):SHR:12 it’s all happy again.
The trouble is that the change looks innocuous, so I’m now puzzled. Does it get allocated from VRAM by mistake, and that doesn’t work? I dunno. |
William Harden (2174) 244 posts |
What’s the value of AplWorkMaxSize? Is that definitely expressed in bytes (and not in K)? My only guess so far is that AplWorkMaxSize is set to, say, 1024 (rather than 1024 * 1024), in which case the line would need to read =((AplWorkMaxSize-32) * 1024):SHR:12 |
Jeffrey Lee (213) 6048 posts |
The latest autobuilt ROM boots fine for me (StrongARM, 32+2MB), so it doesn’t look like I’ll be much help with this one.
VRAM should be located at the start of the free pool (see InitDynamicAreas). Pages are claimed from the end of the pool, and the RMA is initialised (with a size of 4K) immediately after InitDynamicAreas is finished (i.e. before all the RAM gets shoved into application space), so in both the new and old kernels that should mean it’s a DRAM page (most likely a high-numbered page) which gets used for the first page of RMA. I don’t think anything will attempt to reclaim that page, either (I can’t think of anything in a standard IOMD build which will request specific pages for DAs).
Silly question time: Is that address the aborting address, or the address of the PC? For me that address looks like it’s been taken up by a block patched in there by the podule manager (or the podule loader? I saw a string relating to my unipod, that’s for certain).
Yes, AplWorkMaxSize is definitely in bytes (512MB) |
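As a quick sanity check on the numbers discussed so far, the two kernel expressions can be evaluated directly. This is only an arithmetic sketch: it assumes 4K pages (implied by the :SHR:12) and takes AplWorkMaxSize as 512MB, as stated in the thread. The Python names are made up for illustration and are not kernel symbols.

```python
PAGE_SHIFT = 12  # :SHR:12 turns a byte count into a count of 4K pages

# Value reported in the thread; the name only mirrors the kernel symbol
APL_WORK_MAX_SIZE = 512 * 1024 * 1024


def pages(byte_count):
    """Number of whole 4K pages in byte_count."""
    return byte_count >> PAGE_SHIFT


# Expression in the failing kernel, as quoted in the first post
new_request = pages(1024 * 1024)

# Expression the first post reverts to
old_request = pages(APL_WORK_MAX_SIZE - 32 * 1024)

# The "is it in K?" hypothesis: the same expression if the constant
# held 512MB expressed in K instead of in bytes
if_in_k = pages(512 * 1024 - 32 * 1024)

print(new_request, old_request, if_in_k)
```

With the constant in bytes the old expression asks for far more pages than a Risc PC can hold, which fits the description of shoving all the memory into the application slot. Had it been in K, the request would have been only a few hundred K.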
Sprow (202) 1158 posts |
It was the kernel’s default “DataAbort:Abort on data transfer at &200006F4 (error &80000001)”, untranslated because MessageTrans hadn’t started, and of course I couldn’t type anything.
Podules are an interesting angle. I’ll have to try some different configurations when my desk next has spare space (it’s now set up for something else!) – it only had RiscTV and EtherB in use, and I’m pretty sure neither of those uses much RMA or any application slot.
Was it just for correctness’ sake that it used 512MB-32k rather than a lazy 512MB, i.e. should it be 1MB-32k? |
Jeffrey Lee (213) 6048 posts |
IIRC it just bungs that many pages into the memory map, without any consideration for overwriting what comes next. So if you ask it to add too much (and you have that much RAM in the system) it’ll overwrite the start of the RMA. Asking it for less than the maximum shouldn’t cause any issues (apart from this one, it seems!) |
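The behaviour described in this reply can be modelled roughly as below. This is a sketch, not the kernel’s code. It assumes application space starts at &8000, that the RMA sits at &20000000 (suggested by the abort address in the first post, but not confirmed in the thread), and that the number of pages mapped is capped only by the free RAM available. All function and constant names are hypothetical.

```python
PAGE_SIZE = 4096
APP_SPACE_BASE = 0x8000     # start of application space
RMA_BASE = 0x20000000       # ASSUMED location of the RMA on this build
MB = 1024 * 1024


def app_space_end(pages_requested, free_pages):
    """Address just past the last page mapped into application space.

    Models the 'no bounds check' behaviour: the only limit applied
    is the number of free pages that actually exist.
    """
    mapped = min(pages_requested, free_pages)
    return APP_SPACE_BASE + mapped * PAGE_SIZE


def overlaps_rma(pages_requested, free_pages):
    """True if the mapped pages would run past the assumed RMA base."""
    return app_space_end(pages_requested, free_pages) > RMA_BASE


plenty = (1024 * MB) // PAGE_SIZE    # hypothetical machine with 1GB free
risc_pc = (64 * MB) // PAGE_SIZE     # the 64MB machine from the first post

exact_fit = overlaps_rma((512 * MB - 32 * 1024) // PAGE_SIZE, plenty)
lazy_512 = overlaps_rma((512 * MB) // PAGE_SIZE, plenty)
lazy_on_risc_pc = overlaps_rma((512 * MB) // PAGE_SIZE, risc_pc)

print(exact_fit, lazy_512, lazy_on_risc_pc)
```

Under these assumptions the 32K subtraction makes the largest possible application slot end exactly where the RMA begins, while a smaller request such as 1MB stays well clear of it.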
Jeffrey Lee (213) 6048 posts |
Hmm… bad (or buggy) podule loader (or whatever) assuming that there’s oodles of RAM in application space and blindly writing >1MB of data into it? It’d certainly be a better outcome than it turning out to be some horrible memory management bug that vanishes as soon as you start tweaking things to try and debug it. |
Sprow (202) 1158 posts |
“Hmm… bad (or buggy) podule loader (or whatever) assuming that there’s oodles of RAM in application space and blindly writing >1MB of data into it?”
“it fails with a data abort at 200006F4”
Yep – that was the problem. When I removed the RiscTV podule it was happy again. It looks like when I 32 bitted the loader I’d noted that the 26 bit loader corrupted R11 (by ANDing it to mask out the CMOS bits), but when I fixed it to AND the correct number of bits I’d changed it to a shift. So although the old loader was buggy, it at least didn’t keep shifting the address another 10 bits every time it got called! For loaders, the PRM says R10 can be corrupted, so I was able to just desolder the OTP ROM and overprogram 2 bits to change R11→R10. Now all booting fine, thanks. |
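The difference between the two loader bugs can be sketched as follows. The post does not show the loader code or say which way the shift went, so the shift direction, the starting value, and every name here are assumptions for illustration. The only details taken from the post are the 10 bit width and the fact that R11 is not restored between calls.

```python
MASK32 = 0xFFFFFFFF
LOW_BITS = 10  # the '10 bits' mentioned in the post


def loader_with_mask(r11):
    """Clear the low bits of R11 in place: repeating it changes nothing."""
    return r11 & ~((1 << LOW_BITS) - 1) & MASK32


def loader_with_shift(r11):
    """Shift R11 in place (direction ASSUMED): every call moves it again."""
    return r11 >> LOW_BITS


def call_repeatedly(loader, r11, calls):
    """Call the loader several times without restoring R11 in between."""
    for _ in range(calls):
        r11 = loader(r11)
    return r11


start = 0x88000123  # hypothetical combined address with non-zero low bits

masked = call_repeatedly(loader_with_mask, start, 3)
shifted = call_repeatedly(loader_with_shift, start, 3)

print(hex(masked), hex(shifted))
```

Masking is idempotent, so a loader that corrupts R11 that way still leaves a stable (if wrong) value for later calls. A shift compounds on every call, so the address drifts further each time, which fits the loader ending up writing somewhere it should not. Using a register the PRM allows loaders to corrupt, as in the R11→R10 fix, avoids the problem entirely.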