Allwinner H3

198 posts, 14 voices


Apr 2, 2018 2:04am Tristan M. (2946) 1039 posts	I finished the more comprehensive Pre-MMU interrupt killing code. It’s just the first part of the Interrupts.s init code with a couple of very minor alterations so it gets the information it needs. It makes no difference to the execution getting stuck.

Apr 2, 2018 6:45am Tristan M. (2946) 1039 posts	Although I shouldn’t be implementing anything else, I couldn’t help myself. I implemented the watchdog. Took me a little while to work out where it went though. My boot.s didn’t have an entry in the HAL for it. On the first boot where I actually remembered to include everything that was needed, it booted into the SVC prompt! Apparently I’d managed to include the NVRAM module somehow. A little perplexed about that. I’ll have to check the files. Anyway it was interesting to see the boot process stall for a fraction of a second then continue. I tried out PRINT TIME in BASIC. It still says 0. Why??? I can see the boot code initialising timer 0 and setting the period. I can see from the debug code that the timer is counting down. So frustrating! e: Pushed what I have. The following is mostly a note to myself but with external pressure added! TODO: Fix where watchdog data structure specifies reset. I don’t want it to have that right now because configured behaviour is an interrupt trigger. Remove NVRAM module. How useless! Deal with new ‘UK:Messages’ not found messages. Probably to do with kernel Options.hdr config currently. Find why RO isn’t incrementing its timer! Change behaviour of Pre-MMU undefined instruction abort to retry. Intermittent Abort seems to be from copro not being quite ready yet.

Apr 2, 2018 9:25am André Timmermans (100) 655 posts	I am not familiar with all this low level stuff, but am I correct in thinking that the timer is set up to have a counter with a period of 1/100 of a second and that when the counter reaches 0 it generates an interrupt that is used to increment the OS monotonic timer and reset the timer’s counter? If that’s the case and the timer’s counter is counting down, then either the interrupt is not generated or not correctly handled.
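Editor's sketch of the reasoning in the post above: a toy Python model of a periodic down-counter feeding a centisecond tick. Every name here (`DownCounterTimer`, `Kernel`, `run`) is hypothetical and nothing is taken from the RISC OS or H3 HAL sources; it only illustrates why a counter can be seen counting down while TIME stays at 0 if the interrupt is never enabled or serviced.

```python
# Toy model only: illustrative names, not RISC OS or Allwinner code.

class DownCounterTimer:
    def __init__(self, reload_value):
        self.reload_value = reload_value
        self.count = reload_value
        self.irq_enabled = False   # stands in for the enable bit at the interrupt controller
        self.irq_pending = False   # latched until the handler clears it

    def step(self, cycles=1):
        """Advance the counter; on reaching 0, reload and latch an interrupt."""
        for _ in range(cycles):
            self.count -= 1
            if self.count == 0:
                self.count = self.reload_value
                self.irq_pending = True


class Kernel:
    def __init__(self, timer):
        self.timer = timer
        self.monotonic_time = 0    # what PRINT TIME would report in this model

    def poll_interrupts(self):
        """Service the timer interrupt only if it is both enabled and pending."""
        if self.timer.irq_enabled and self.timer.irq_pending:
            self.timer.irq_pending = False   # the "IRQ clear" step
            self.monotonic_time += 1


def run(irq_enabled, periods=5, reload_value=1000):
    """Run the model for a number of full timer periods and return (time, count)."""
    timer = DownCounterTimer(reload_value)
    timer.irq_enabled = irq_enabled
    kernel = Kernel(timer)
    for _ in range(periods):
        timer.step(reload_value)
        kernel.poll_interrupts()
    return kernel.monotonic_time, timer.count
```

In this model `run(True)` advances the monotonic time once per period, while `run(False)` leaves it at 0 even though the counter keeps reloading, which matches the symptom described.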

Apr 2, 2018 12:14pm Tristan M. (2946) 1039 posts	André, you said the right things. I realised I made a couple of assumptions: The timer IRQ gets enabled by the kernel after (I assume) it requests the IRQ of the timer. The kernel clears the interrupt in the GIC after requesting the timer IRQ is cleared. I have no idea if either of these are correct. Both would stop the timer from functioning. The watchdog I implemented today which I believe to be at least somewhat functional clears the interrupt with the controller manually as per documentation. I never enabled the interrupt on the GIC though. Only the watchdog. It could well be an NMI though because it’s not something that should be ignored.

Apr 2, 2018 3:48pm Michael Grunditz (467) 531 posts	The kernel also calls clear interrupt in the timer HAL. Try to stop the timer from BASIC (set the period to 0), then clear the interrupt. Start the timer again and check the interrupt.

Apr 3, 2018 12:49pm Jeffrey Lee (213) 6048 posts	That reminds me. I think it’s during OS_InitARM that the physical address of the SCU is printed to UART. I’m guessing that the kernel is determining this address somehow, somewhere in its debug code and printing it. The OS is completely ignorant of the SCU. If something’s printing out its address, it’ll either be your code, or some remnants of u-boot! What if all the cores are awake and U-boot just has all but one spinning. Or perhaps early code, maybe even InitARM wakes them up. Then what? It’d be chaos! Easy to test by having the code print out the value of the MPIDR. You can also try putting in a dummy SEV at the start of the HAL, just to make sure that (if there are other sleeping cores) they’ll get woken up immediately instead of at some random point in the future. I just want to check something. Things like the timers and UARTs are zero indexed with RO, right? Yes, they’re zero-indexed. Intermittent Abort seems to be from copro not being quite ready yet. Apart from CP15, the HAL/OS shouldn’t talk to any copros. Or at least, not until FPEmulator/VFPSupport initialise (and if it’s a FP instruction that’s causing the abort, it should be pretty obvious). I implemented the watchdog. Too bad that the base OS doesn’t use the watchdog :-) For debugging the timer code, you’ll want to look at this code which sets up the timer, this code which enables the interrupt, and this code which services the interrupt (and increments the TIME / OS_ReadMonotonicTime value). As Michael says, it’s also advisable to double-check that the interrupt code you’re using isn’t remapping the interrupt numbers in some unexpected way.
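Editor's sketch for the MPIDR suggestion above: if the HAL prints the raw MPIDR as a hex word, the fields can be picked apart as below. The bit positions follow the ARMv7-A MPIDR layout (Aff0 in bits 7:0, Aff1 in bits 15:8, U in bit 30, format flag in bit 31). The function name and the sample value are made up for illustration and are not output from the H3.

```python
def decode_mpidr(mpidr):
    """Split a 32-bit ARMv7-A MPIDR value into its commonly used fields."""
    return {
        "mp_format": bool((mpidr >> 31) & 1),     # 1 = multiprocessing extensions format
        "uniprocessor": bool((mpidr >> 30) & 1),  # U bit
        "cluster": (mpidr >> 8) & 0xFF,           # Aff1
        "core": mpidr & 0xFF,                     # Aff0
    }


# Hypothetical value, not captured from real hardware.
fields = decode_mpidr(0x80000003)
print(fields)
```

If the same piece of early boot code ever prints a non-zero core number, that would support the theory that more than one core is running it.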

Apr 3, 2018 9:01pm Tristan M. (2946) 1039 posts	Too bad that the base OS doesn’t use the watchdog :-) Oh :( Apart from CP15, the HAL/OS shouldn’t talk to any copros. CP15. Although I haven’t been able to prove it entirely. It does seem to be the source of an undefined instruction abort sometimes which causes a loop in my code, or a lockup in the early RO code. The OS is completely ignorant of the SCU. If something’s printing out its address, it’ll either be your code, or some remnants of u-boot! It was in some of the code I used as a base. I left it because it’s not hurting anything. Those are good ideas regarding core catching. I mean I may well be wrong, but a piece of code being executed twice on one occasion that has no means of doing so is extremely suspicious. I found a bug in the timer code and fixed it. Didn’t help. Haven’t pushed most recent changes to GitHub yet. I’ll have another look at Interrupts.s

Apr 4, 2018 3:10am Tristan M. (2946) 1039 posts	I went through s.Interrupts. It looks fine. However, I did pick up on a simple yet severe bug in hdr.StaticWS. Jeffrey, I’m not sure if you’ve looked at that file but I use quite a few zero length entries to add compatibility between borrowed Interrupts.s versions. I slipped up and the result was that the interrupt code would have been reading / writing to who knows where. I also found some other minor bugs. Unfortunately I’m back to the init jamming solid after “HAL_CleanerSpace” is printed to UART. I’ve tried that as a NullEntry, as a real stub returning -1, and even pointed it to the SRAM. I’m pretty sure it’s just something else failing concurrently but it can’t hurt to try ruling it out.

Apr 4, 2018 9:43am Tristan M. (2946) 1039 posts	I’ve been sprinkling some debug text in the kernel. It’s actually getting stuck somewhere in ROM decompression. This I have no idea how to deal with.

Apr 4, 2018 9:44am Jeffrey Lee (213) 6048 posts	CP15. Although I haven’t been able to prove it entirely. It does seem to be the source of an undefined instruction abort sometimes which causes a loop in my code, or a lockup in the early RO code. The OS should only access standard CP15 registers, so the only thing I can think of that would cause an undefined instruction would be if the PSR has become corrupt and the CPU has dropped into user mode. If you can get the abort handler to print out the SPSR then that might help narrow down the problem (another possibility is that the CPU’s dropped into Thumb mode, but then I’d expect the crash to occur somewhere else rather than only on CP15 instructions) However I did pick up on a simple yet severe bug in hdr.StaticWS. Jeffrey, I’m not sure if you’ve looked at that file … I did have a quick look, but obviously not close enough to spot that problem. I’m pretty sure it’s just something else failing concurrently but it can’t hurt to try ruling it out. Yeah, it’s highly unlikely that HAL_CleanerSpace is causing any problems. The memory it points to will only be used on StrongARM CPUs. After the call to HAL_CleanerSpace the OS will map in a bunch of other memory locations, so chances are it’s some memory corruption that’s causing one of those to fail.
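Editor's sketch for the SPSR suggestion above: once the abort handler prints the SPSR, the mode and Thumb bits can be checked as below. The mode encodings and bit positions are the standard ARMv7-A ones; the function name and the sample values are illustrative only.

```python
# Standard ARMv7-A processor mode encodings (PSR bits 4:0).
MODES = {
    0x10: "USR", 0x11: "FIQ", 0x12: "IRQ", 0x13: "SVC", 0x16: "MON",
    0x17: "ABT", 0x1A: "HYP", 0x1B: "UND", 0x1F: "SYS",
}


def decode_psr(psr):
    """Report the mode, Thumb state and interrupt masks held in a PSR value."""
    return {
        "mode": MODES.get(psr & 0x1F, "invalid"),
        "thumb": bool(psr & 0x20),       # T bit
        "fiq_masked": bool(psr & 0x40),  # F bit
        "irq_masked": bool(psr & 0x80),  # I bit
    }


# Hypothetical values: a healthy early-boot SVC state, and a corrupted user-mode one.
print(decode_psr(0x600000D3))
print(decode_psr(0x00000010))
```

A mode of "USR" or a set Thumb bit in the printed SPSR would point at the corrupted-PSR explanation rather than at CP15 itself.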

Apr 4, 2018 9:51am Jeffrey Lee (213) 6048 posts	I’ve been sprinkling some debug text in the kernel. It’s actually getting stuck somewhere in ROM decompression. The build system doesn’t compress ROMs by default (it’s an extra step that must be done manually at the end), so unless you’re explicitly compressing your ROM that suggests that something is corrupting the OS image header that’s located in the kernel. Also I would have expected you to see a “Decompressing ROM” message prior to the hang.

Apr 4, 2018 10:59am Tristan M. (2946) 1039 posts	Thanks for the tip. I fiddled around and went further with debug messages. I got as far as the L2PTHack section before I ended up breaking HAL.s so badly it instantly freaks out with a data abort on boot. After I remove some debug messages I’ll try again. If I recall last time I tried disabling L2PTHack it broke booting, but I’ll give it another shot.

Apr 4, 2018 11:10am Michael Grunditz (467) 531 posts	From my experience RO porting is best done without much hacking. The basic OS didn’t pose many problems. The real issues come from drivers, so keep HAL.s.Top as simple as possible. I skipped relocation. Do absolutely no HAL init calls from s.boot. By this method you can ensure that the OS works. When you have a stable bootup to BASIC, and you will, you can start to deal with Timers and Interrupts.

Apr 5, 2018 2:45am Tristan M. (2946) 1039 posts	Michael. I created a build environment for a stripped version of the H3 HAL, copied my work over and stripped most of it out. After fighting with it for a while I was reminded how hard it was to get anywhere with this platform in early boot. On the positive side, I did come up with some more coprocessor configuration, and found a minor bug in what I already had, which I copied back to my “Full” HAL. So far booting has been completely consistent. Unfortunately it still stops with the last output at HAL_CleanerSpace. This morning I deleted the kernel and replaced it with a fresh copy from my downloaded tarball in case I had broken something. Besides losing all the debugging messages I put in, it didn’t help. This next part is for whoever but really more directed at Jeffrey. I realise it’s in the RO spec to make a RAM copy of the ROM image because it’s possible it is in ROM. This still bothers me. My code does it, but it seems completely pointless and inviting problems. My port is loaded via U-boot so it is already a RAM copy. Even if I were to load it via an earlier boot stage it would still be in RAM to be executed. Because of the nature of U-boot, it has to go through and turn off some things, including the MMU before it can do much. It is loaded into the same space Linux would be loaded to, which avoids bootloader code and variables, and the pre-allocated video memory up near the 1GB area. These SoCs come with many different amounts of RAM installed too, so the mapping will always be different. Once the MMU is turned on RO will always be up in the 0xFFFwhatever range anyway. So does it really need to be copied? It doesn’t seem to be contrary to specification either, or is it?

Apr 5, 2018 8:54am Jeffrey Lee (213) 6048 posts	So does it really need to be copied? No. The reasons for relocating the ROM image are one or more of the following:
1. The ROM is in a physical ROM chip, but the ROM chip is slow enough to be a major performance bottleneck (e.g. Kinetic RiscPC, where the RAM on the Kinetic card is miles faster than the ROM chip on the motherboard)
2. To avoid fragmenting the physical memory map – although on machines with >=512MB of RAM this isn’t going to be much of an issue, compared to e.g. the original 128MB BeagleBoards
3. To try and ensure consistent behaviour (by making sure the ROM is always at the same location). Mainly this is talking about when the softload tool is used, since that will load the ROM at a different location to where the current ROM is.
4. Efficiency of memory mapping – if the HAL+OS blob is located on a 1MB boundary then the kernel can map it using 1MB section mappings in the page tables, which will give a tiny performance boost compared to lesser alignments (and if the HAL+OS isn’t even 4KB aligned then the OS will fail to boot)
5. To support compressed ROMs – if the ROM is a physical ROM, or it’s been loaded to the high end of RAM, then the HAL will have to relocate the ROM image so that there’s enough space for the ROM to be expanded.
4 is mentioned here. 5 is mentioned here. The others may just be unofficial rules I’ve just invented myself.
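Editor's sketch of the alignment rule in the memory-mapping point above. The function name and return strings are hypothetical and this is not how the kernel is written; it simply restates the 1MB and 4KB thresholds from the post as a check that could be run against a load address.

```python
SECTION_SIZE = 0x100000  # 1MB section
PAGE_SIZE = 0x1000       # 4KB page


def mapping_for(base):
    """Classify a HAL+OS load address by the alignment thresholds in the post."""
    if base % PAGE_SIZE != 0:
        return "not 4KB aligned: OS will fail to boot"
    if base % SECTION_SIZE == 0:
        return "1MB aligned: section mappings possible"
    return "4KB aligned: smaller mappings only"


# Example addresses chosen for illustration.
for address in (0x42000000, 0x42001000, 0x42000800):
    print(hex(address), "->", mapping_for(address))
```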

Apr 6, 2018 12:28am Tristan M. (2946) 1039 posts	I crudely disabled copying for now with a branch over it. Mostly just to rule out the possibility that I’m plopping the copied image on to something it shouldn’t be for now. Still can’t get past that InitDynamicAreas thing. I’m grabbing a new source tarball to see if there are any behavioural changes. It wouldn’t be the first time that’s happened. Today I found U-boot documentation which says the Allwinner H5 boots in aarch32 mode, but uses a small shim to get into aarch64 mode, which is what U-boot runs as. This is both good and bad news. If there’s an aarch32 version that’s a really good thing, because it really expands the range of platforms, including things like the PineBook.

Apr 7, 2018 3:36am Tristan M. (2946) 1039 posts	I got the timer output from the below code. Do I need to put a memory barrier in somewhere, or is my SoC haunted? Timer 0 current value could not possibly be the reported value. What have I missed? It’s never shown a value like that before. https://github.com/experimentech/RISC_OS_AWH3/blob/master/AWH3Dev/Castle/RiscOS/Sources/HAL/AWH3/s/Timers%2Cfff `=> go 0x42000000` `Starting application at 0x42000000 … CallOSM OS_InitARM complete 01c80000 About to add RAM RAM added IO_BaseAddr= f8d00000 Workspace pointer sb= fa0001e8 UARTs mapped in SCU_BaseAddr= f9980000 TIMER_Log= f9920c00 GIC, SCU and TIMER added to StaticWS Going into Interrupt_Init Out of Interrupt_Init About to init timer Timer 0 current value= e447a3f2 Timer 0 value now = 000050d5`

Apr 7, 2018 9:14am Jeffrey Lee (213) 6048 posts	e447a3f2 looks like an instruction. Not sure why yet. Maybe try getting Timer_Init to print out the TIMER_Log value to make sure nothing’s corrupted it after it was initialised? There is the obvious rookie mistake that (in Timer_Init) you’re not pushing LR onto the stack before using the debug macros (which assume that LR is corruptible). So after printing out “Timer 0 value now …” it’ll loop forever, popping things off of the stack (until it runs off the end of memory and crashes, at least). Looking at boot.s, your handling of preservation of LR seems to be ass-backwards in general. The convention is to preserve it at the start of each function (the Entry macro will do this for you), rather than saving & restoring it around each individual function call. That way you can use LR as an extra temporary register within the function body (you just need to remember that any function call will clobber it).

Apr 7, 2018 9:22am Tristan M. (2946) 1039 posts	Jeffrey, as always you are right. TBQH I wasn’t sure about how exactly the Debug macros actually functioned. Of course that’s why the Debug macros weren’t preserving lr. Because it’s meant to be done at the beginning of the function. Well, you know I’m still learning :) Here’s the weird thing. Same build. I booted it again and got sane timer values both times. 7350 and 50d4. Looks about right. It’s these weird abhorrations that have been causing me hell. Thanks for picking up on one I introduced when switching from calling a debug function to the macros, and the way I push and pop lr. Also I know it complains about UAL syntax, but PUSH and POP is an ancient, ingrained thing. I can’t even remember where from. e: I meant aberration. But you know, that can stay. I feel it is more correct in spirit. Proofreading. What’s that?

Apr 7, 2018 9:47am Jeffrey Lee (213) 6048 posts	If you want less UAL warnings, you can use the “Push” and “Pull” macros (which are essentially the same as the PUSH and POP instructions, except you provide the register list in quotes like Entry instead of curly braces).

Apr 7, 2018 12:47pm Tristan M. (2946) 1039 posts	I’ve been using PUSH and POP because I “get” them. There seem to be some other syntactic differences between them and Push and Pull that I’m not sure of as yet. Generally I try to treat warnings as errors, but I’ve let it slide for now because it’s not breaking anything. e: So: `Entry "a1-a3"` … `EXIT` is equivalent to `Push "a1-a3, lr"` … `Pull "a1-a3, lr"` followed by `mov pc, lr`. Is that correct?

Apr 7, 2018 2:25pm Jeffrey Lee (213) 6048 posts	There seem to be some other syntactic differences between them and Push and Pull that I’m not sure of as yet. I think the only complexity between PUSH/POP vs. Push/Pull is if you want to make them conditional – due to the age of the Push/Pull macros they require you to specify the condition code after the register list, e.g.: `POPNE {a1-a3}` -> `Pull "a1-a3",NE` (Newer versions of objasm allow you to specify the condition code, or any other suffix, directly on the macro name, providing the macro has been set up to support it) For your example, EXIT would be `Pull "a1-a3,pc"`. There’s no need to split it into two instructions.

Apr 8, 2018 1:09am Tristan M. (2946) 1039 posts	I see. Thanks! After your advice I spent a few minutes last night fixing things. It booted. Unfortunately there was a debug printout in HAL_TimerIRQClear, which is actually being called now! Repeatedly of course. Today I removed the text and did another build. After a failed first boot where it choked on a very early coprocessor instruction, the second one booted straight into the SVC CLI. In BASIC, PRINT TIME isn’t working right, but at least it isn’t showing 0. >PRINT TIME Internal error: abort on instruction fetch at &FDA754E4

Apr 9, 2018 2:23am Tristan M. (2946) 1039 posts	In a bid to deal with the stability issues I changed power supplies and swapped in a capacitor that I use on the IO pins of the various *Pi’s to see if it would aid stability. It also got the USB tester thing plugged in to the power. No sign of anything. I contacted someone who has done work with the Orange Pi SBCs. It seems the only time he encountered stability issues was with more than one core active. No, I haven’t finished the code for checking that yet. I’ve been busy. Had to make a new header for the CPU control registers first anyway. Suspecting that the above timer retrieval issue was part of the weird boot stability issues I persevered with a power cycle / loading spree until it completed booting again. Tried PRINT TIME again and it works fine. Still the same build too. I’m not going to try another build until the CPU core state readouts and the new system reset code are complete. What’s been bugging me is that if the system does manage to complete init and gets to the SVC prompt, it’s stable. I’ve left it running for about a day and it still worked. If it doesn’t get that far there always seems to be some kind of corruption. The rogue CPU core theory is a longshot but I can’t think of much else. From what I’ve read, the other cores should be asleep. I believe the ROM and U-boot make sure of this.

Apr 10, 2018 2:09am Tristan M. (2946) 1039 posts	I just implemented a hardware “fix” for boot issues that worked with the H5. Doesn’t work for the H3 :( It involved swapping out the USB <-> logic level serial adaptor with an RS232 <-> logic level adaptor, and a USB <-> RS232 adaptor for the Pi. In the case of the H5 I believe there is some current leakage from Rx on the USB adaptor which causes an unclean powerup. Didn’t affect the H3 unfortunately. Just confirmed that cores 1-3 aren’t in WFI or WFE mode. Not sure how to detect if they are still powered off though. The timer is still happily ticking away. Left the previous build running for over a day again without any issue. I’m thinking the 64 bit counter might be used for the system counter.
