Allwinner H3
Tristan M. (2946) 1039 posts |
I finished the more comprehensive Pre-MMU interrupt killing code. It’s just the first part of the Interrupts.s init code with a couple of very minor alterations so it gets the information it needs. It makes no difference to the execution getting stuck. |
Tristan M. (2946) 1039 posts |
Although I shouldn’t be implementing anything else, I couldn’t help myself. I implemented the watchdog. Took me a little while to work out where it went though. My boot.s didn’t have an entry in the HAL for it. e: Pushed what I have. The following is mostly a note to myself but with external pressure added! TODO:
|
André Timmermans (100) 655 posts |
I am not familiar with all this low level stuff, but am I correct in thinking that the timer is set up to have a counter with a period of 1/100 of a second and that when the counter reaches 0 it generates an interrupt that is used to increment the OS monotonic timer and reset the timer’s counter? If that’s the case and the timer’s counter is counting down, then either the interrupt is not generated or not correctly handled. |
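As an illustration of the mechanism described in the post above, here is a minimal toy model, not the HAL code. The class name `DownCounter`, its fields, and the 1000 Hz input clock are hypothetical, chosen only to keep the numbers small. It shows the symptom André describes: if the counter reloads but the interrupt never reaches the handler, the count keeps moving while the monotonic time stays at zero.

```python
class DownCounter:
    """Toy model of a reloading down-counter that drives a centisecond clock."""

    def __init__(self, clock_hz, tick_hz=100):
        # Reload value = input clock cycles per tick (hypothetical figures).
        self.reload = clock_hz // tick_hz
        self.count = self.reload
        self.irq_enabled = True   # stands in for the interrupt controller enable
        self.irq_pending = False  # stands in for the timer's own status bit
        self.monotonic_cs = 0     # stands in for the OS monotonic time

    def step(self, cycles):
        """Advance the counter by `cycles` input clock cycles."""
        for _ in range(cycles):
            self.count -= 1
            if self.count == 0:
                self.count = self.reload   # hardware auto-reload
                self.irq_pending = True    # timer raises its interrupt
                if self.irq_enabled:
                    self.service_irq()

    def service_irq(self):
        """What the handler must do: clear the interrupt and count the tick."""
        self.irq_pending = False
        self.monotonic_cs += 1


working = DownCounter(clock_hz=1000)
working.step(35)
print(working.monotonic_cs, working.count)

broken = DownCounter(clock_hz=1000)
broken.irq_enabled = False
broken.step(35)
print(broken.monotonic_cs, broken.count, broken.irq_pending)
```

In the second case the counter value still changes on every read, which matches a timer that is visibly counting down while TIME never advances.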
Tristan M. (2946) 1039 posts |
André, you said the right things. I realised I made a couple of assumptions: I have no idea if either of these are correct. Both would stop the timer from functioning. The watchdog I implemented today which I believe to be at least somewhat functional clears the interrupt with the controller manually as per documentation. I never enabled the interrupt on the GIC though. Only the watchdog. It could well be an NMI though because it’s not something that should be ignored. |
Michael Grunditz (467) 531 posts |
The kernel also calls clear interrupt in the timer HAL. Try to stop the timer from BASIC (set period to 0), then clear the interrupt. Start the timer again and check the interrupt. |
Jeffrey Lee (213) 6048 posts |
The OS is completely ignorant of the SCU. If something’s printing out its address, it’ll either be your code, or some remnants of u-boot!
Easy to test by having the code print out the value of the MPIDR. You can also try putting in a dummy SEV at the start of the HAL, just to make sure that (if there are other sleeping cores) they’ll get woken up immediately instead of at some random point in the future.
Yes, they’re zero-indexed.
Apart from CP15, the HAL/OS shouldn’t talk to any copros. Or at least, not until FPEmulator/VFPSupport initialise (and if it’s a FP instruction that’s causing the abort, it should be pretty obvious).
Too bad that the base OS doesn’t use the watchdog :-) For debugging the timer code, you’ll want to look at this code which sets up the timer, this code which enables the interrupt, and this code which services the interrupt (and increments the TIME / OS_ReadMonotonicTime value). As Michael says, it’s also advisable to double-check that the interrupt code you’re using isn’t remapping the interrupt numbers in some unexpected way. |
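For the MPIDR check suggested in the post above, a small host-side helper can decode whatever value the HAL prints over the UART. This is only a sketch: the function name `decode_mpidr` is hypothetical, and the field layout is the generic ARMv7-A one (affinity fields in the low 24 bits, U in bit 30, bit 31 set when the multiprocessing-extensions format is in use). On a single-cluster part the lowest affinity field would normally be the core number, but that is an assumption to verify against the CPU's own documentation.

```python
def decode_mpidr(value):
    """Split a 32-bit ARMv7-A MPIDR value into its generic fields."""
    return {
        "aff0": value & 0xFF,            # usually the core number in a cluster
        "aff1": (value >> 8) & 0xFF,     # usually the cluster number
        "aff2": (value >> 16) & 0xFF,
        "uniprocessor": bool(value & (1 << 30)),
        "mp_format": bool(value & (1 << 31)),
    }


# Example values only, not captured from real hardware.
for raw in (0x80000000, 0x80000002):
    print(hex(raw), decode_mpidr(raw))
```

If the same piece of boot code ever reports a non-zero `aff0`, that would support the theory that a secondary core is running it.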
Tristan M. (2946) 1039 posts |
Oh :(
CP15. Although I haven’t been able to prove it entirely. It does seem to be the source of an undefined instruction abort sometimes which causes a loop in my code, or a lockup in the early RO code.
It was in some of the code I used as a base. I left it because it’s not hurting anything. Those are good ideas regarding core catching. I mean I may well be wrong, but a piece of code being executed twice on one occasion that has no means of doing so is extremely suspicious. I found a bug in the timer code and fixed it. Didn’t help. Haven’t pushed most recent changes to GitHub yet. I’ll have another look at Interrupts.s |
Tristan M. (2946) 1039 posts |
I went through s.Interrupts. It looks fine. However, I did pick up on a simple yet severe bug in hdr.StaticWS. I also found some other minor bugs. Unfortunately I’m back to the init jamming solid after “HAL_CleanerSpace” is printed to UART. |
Tristan M. (2946) 1039 posts |
I’ve been sprinkling some debug text in the kernel. It’s actually getting stuck somewhere in ROM decompression. This I have no idea how to deal with. |
Jeffrey Lee (213) 6048 posts |
The OS should only access standard CP15 registers, so the only thing I can think of that would cause an undefined instruction would be if the PSR has become corrupt and the CPU has dropped into user mode. If you can get the abort handler to print out the SPSR then that might help narrow down the problem (another possibility is that the CPU’s dropped into Thumb mode, but then I’d expect the crash to occur somewhere else rather than only on CP15 instructions)
I did have a quick look, but obviously not close enough to spot that problem.
Yeah, it’s highly unlikely that HAL_CleanerSpace is causing any problems. The memory it points to will only be used on StrongARM CPUs. After the call to HAL_CleanerSpace the OS will map in a bunch of other memory locations, so chances are it’s some memory corruption that’s causing one of those to fail. |
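If the abort handler is made to print the SPSR as suggested above, the two cases mentioned (user mode and Thumb state) can be read straight off the low bits. The helper below is a sketch for decoding the printed value on the host; `decode_psr` is a hypothetical name, and the bit positions are the standard ARMv7-A ones (mode in bits 4:0, T in bit 5, F in bit 6, I in bit 7).

```python
# Standard ARMv7-A processor mode encodings (PSR bits 4:0).
MODES = {
    0x10: "USR", 0x11: "FIQ", 0x12: "IRQ", 0x13: "SVC", 0x16: "MON",
    0x17: "ABT", 0x1A: "HYP", 0x1B: "UND", 0x1F: "SYS",
}


def decode_psr(value):
    """Decode the mode and state bits of a CPSR/SPSR value."""
    mode = value & 0x1F
    return {
        "mode": MODES.get(mode, "invalid (0x%02X)" % mode),
        "thumb": bool(value & 0x20),       # T bit: CPU was in Thumb state
        "fiq_masked": bool(value & 0x40),  # F bit
        "irq_masked": bool(value & 0x80),  # I bit
    }


# Example values only: SVC mode with interrupts masked, then user mode.
print(decode_psr(0x600000D3))
print(decode_psr(0x00000010))
```

A decoded mode of USR, or `thumb` being true, at the point of the undefined instruction would point at PSR corruption rather than at the CP15 access itself.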
Jeffrey Lee (213) 6048 posts |
The build system doesn’t compress ROMs by default (it’s an extra step that must be done manually at the end), so unless you’re explicitly compressing your ROM that suggests that something is corrupting the OS image header that’s located in the kernel. Also I would have expected you to see a “Decompressing ROM” message prior to the hang. |
Tristan M. (2946) 1039 posts |
Thanks for the tip. I fiddled around and went further with debug messages. I got as far as the L2PTHack section before I ended up breaking HAL.s so badly it instantly freaks out with a data abort on boot. After I remove some debug messages I’ll try again. If I recall last time I tried disabling L2PTHack it broke booting, but I’ll give it another shot. |
Michael Grunditz (467) 531 posts |
From my experience RO porting is best done without much hacking. The basic OS didn’t pose many problems. The real issues come from drivers. |
Tristan M. (2946) 1039 posts |
Michael. I created a build environment for a stripped version of the H3 HAL, copied my work over and stripped most of it out. This next part is for whoever but really more directed at Jeffrey. My port is loaded via U-boot so it is already a RAM copy. Even if I were to load it via an earlier boot stage it would still be in RAM to be executed. Because of the nature of U-boot, it has to go through and turn off some things, including the MMU before it can do much. It is loaded into the same space Linux would be loaded to, which avoids bootloader code and variables, and the pre-allocated video memory up near the 1GB area. These SoCs come with many different amounts of RAM installed too, so the mapping will always be different. Once the MMU is turned on RO will always be up in the 0xFFFwhatever range anyway. So does it really need to be copied? It doesn’t seem to be contrary to specification either, or is it? |
Jeffrey Lee (213) 6048 posts |
No. The reasons for relocating the ROM image are one or more of the following:
4 is mentioned here. 5 is mentioned here. The others may just be unofficial rules I’ve just invented myself. |
Tristan M. (2946) 1039 posts |
I crudely disabled copying for now with a branch over it. Mostly just to rule out the possibility that I’m plopping the copied image on to something it shouldn’t be for now. Still can’t get past that InitDynamicAreas thing. I’m grabbing a new source tarball to see if there are any behavioural changes. It wouldn’t be the first time that’s happened. Today I found U-boot documentation which says the Allwinner H5 boots in aarch32 mode, but uses a small shim to get into aarch64 mode, which is what U-boot runs as. This is both good and bad news. If there’s an aarch32 version that’s a really good thing because it really expands the range of platforms, including things like the PineBook.
Tristan M. (2946) 1039 posts |
I got the timer output from the below code.
|
Jeffrey Lee (213) 6048 posts |
e447a3f2 looks like an instruction. Not sure why yet. Maybe try getting Timer_Init to print out the TIMER_Log value to make sure nothing’s corrupted it after it was initialised? There is the obvious rookie mistake that (in Timer_Init) you’re not pushing LR onto the stack before using the debug macros (which assume that LR is corruptible). So after printing out “Timer 0 value now …” it’ll loop forever, popping things off of the stack (until it runs off the end of memory and crashes, at least). Looking at boot.s, your handling of preservation of LR seems to be ass-backwards in general. The convention is to preserve it at the start of each function (the Entry macro will do this for you), rather than saving & restoring it around each individual function call. That way you can use LR as an extra temporary register within the function body (you just need to remember that any function call will clobber it). |
Tristan M. (2946) 1039 posts |
Jeffrey, as always you are right. TBQH I wasn’t sure about how exactly the Debug macros actually functioned. Of course that’s why the Debug macros weren’t preserving lr. Because it’s done at the beginning of the function. Well, you know I’m still learning :) Here’s the weird thing. Same build. I booted it again and got sane timer values both times. 7350 and 50d4. Looks about right. It’s these weird abhorrations that have been causing me hell. Thanks for picking up on one I introduced when switching from calling a debug function to the macros, and the way I push and pop lr. Also I know it complains about UAL syntax, but PUSH and POP is an ancient, ingrained thing. I can’t even remember where from. e: I meant aberration. But you know, that can stay. I feel it is more correct in spirit. Proofreading. What’s that? |
Jeffrey Lee (213) 6048 posts |
If you want fewer UAL warnings, you can use the “Push” and “Pull” macros (which are essentially the same as the PUSH and POP instructions, except you provide the register list in quotes like Entry instead of curly braces). |
Tristan M. (2946) 1039 posts |
I’ve been using PUSH and POP because I “get” them. There seem to be some other syntactic differences between them and Push and Pull that I’m not sure of as yet. Generally I try to treat warnings as errors, but I’ve let it slide for now because it’s not breaking anything. e: So: Is that correct? |
Jeffrey Lee (213) 6048 posts |
I think the only complexity between PUSH/POP vs. Push/Pull is if you want to make them conditional – due to the age of the Push/Pull macros they require you to specify the condition code after the register list, e.g. POPNE {a1-a3} -> Pull "a1-a3",NE. (Newer versions of objasm allow you to specify the condition code, or any other suffix, directly on the macro name, providing the macro has been set up to support it.) For your example, EXIT would be Pull "a1-a3,pc". There’s no need to split it into two instructions. |
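The mapping described in the post above is mechanical enough to script when tidying a source file. The following is a rough sketch of such a rewrite, assuming one instruction per line with no label or trailing comment; `to_macro` is a hypothetical helper, not part of any RISC OS tool, and it only knows the two forms quoted in the thread (register list in quotes, condition code after the list).

```python
import re

# Matches e.g. "    POPNE   {a1-a3}" : indent, mnemonic, optional condition, list.
_PUSH_POP = re.compile(
    r"^(\s*)(PUSH|POP)([A-Z]{2})?\s*\{([^}]*)\}\s*$", re.IGNORECASE
)
_MACRO = {"PUSH": "Push", "POP": "Pull"}


def to_macro(line):
    """Rewrite a PUSH/POP instruction as a Push/Pull macro call.

    Lines that do not look like a bare PUSH/POP are returned unchanged.
    """
    match = _PUSH_POP.match(line)
    if match is None:
        return line
    indent, op, cond, regs = match.groups()
    reg_list = ",".join(r.strip() for r in regs.split(","))
    out = '%s%s "%s"' % (indent, _MACRO[op.upper()], reg_list)
    if cond:
        out += "," + cond.upper()  # condition goes after the register list
    return out


print(to_macro("        POPNE   {a1-a3}"))
print(to_macro("        POP     {a1-a3, pc}"))
```

Anything the pattern does not recognise is passed through untouched, so the output should still be reviewed by hand before assembling.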
Tristan M. (2946) 1039 posts |
I see. Thanks! After your advice I spent a few minutes last night fixing things. It booted. Unfortunately there was a debug printout in HAL_TimerIRQClear, which is actually being called now! Repeatedly of course. In BASIC, PRINT TIME isn’t working right, but at least it isn’t showing 0.
>PRINT TIME
Internal error: abort on instruction fetch at &FDA754E4 |
Tristan M. (2946) 1039 posts |
In a bid to deal with the stability issues I changed power supplies and swapped in a capacitor that I use on the IO pins of the various *Pi’s to see if it would aid stability. It also got the USB tester thing plugged in to the power. No sign of anything. I contacted someone who has done work with the Orange Pi SBCs. It seems the only time he encountered stability issues was with more than one core active. No, I haven’t finished the code for checking that yet. I’ve been busy. Had to make a new header for the CPU control registers first anyway. Suspecting that the above timer retrieval issue was part of the weird boot stability issues I persevered with a power cycle / loading spree until it completed booting again. Tried PRINT TIME again and it works fine. Still the same build too. I’m not going to try another build until the CPU core state readouts and the new system reset code are complete. What’s been bugging me is if the system does manage to complete init and gets to the SVC prompt it’s stable. I’ve left it running for about a day and it still worked. If it doesn’t get that far there always seems to be some kind of corruption. The rogue CPU core theory is a longshot but I can’t think of much else. From what I’ve read, the other cores should be asleep. I believe the ROM and U-boot make sure of this. |
Tristan M. (2946) 1039 posts |
I just implemented a hardware “fix” for boot issues that worked with the H5. Doesn’t work for the H3 :( It involved swapping out the USB <-> logic level serial adaptor with an RS232 <-> logic level adaptor, and a USB <-> RS232 adaptor for the Pi. In the case of the H5 I believe there is some current leakage from Rx on the USB adaptor which causes an unclean powerup. Didn’t affect the H3 unfortunately. Just confirmed that cores 1-3 aren’t in WFI or WFE mode. Not sure how to detect if they are still powered off though. The timer is still happily ticking away. Left the previous build running for over a day again without any issue. I’m thinking the 64 bit counter might be used for the system counter. |