kernel error with BASIC program
Terry Swanborough (455) 53 posts |
I have been reading bytes from the serial port using the SYS “OS_Hardware”
The strange thing is I have now REMed out the serial port commands see
I have been downloading the latest ROMs as they become available and the
I have looked at the program and can’t see any errors so I don’t think I
I have had to use the # list command to get the program to list OK
10 ON ERROR PROCset_baud(115200):PRINT ERL:PRINT REPORT$:END
21 DIM status% 1
30 Port%=0
40 HAL_UARTLineStatus=71
50 HAL_UARTReceiveByte=69
60 HAL_UARTRate=73
70
80 PROCset_baud(300)
90
100 char%=FNPROCReceiveSerial
110 PRINT char%
120 GOTO 100
150 DEFPROCset_baud(rate%)
160 REM set baud rate
170 SYS "OS_Hardware",Port%,rate%*16,,,,,,,0,HAL_UARTRate
180 REM -------------
190 ENDPROC
240 DEFFNPROCReceiveSerial
250 LOCAL linestatus
260 REM SYS "OS_Hardware",Port%,status%,,,,,,,0,HAL_UARTLineStatus TO linestatus
270 IF (linestatus AND 1) =1 THEN
280 REM SYS "OS_Hardware",Port%,status%,,,,,,,0,HAL_UARTReceiveByte TO c%
290 ELSE
300 c%=-1
310 ENDIF
320 =c%
*where |
Jeffrey Lee (213) 6048 posts |
If you place the code in <pre> tags then it should work. Were you running the program from the command line, a task window, or just double-clicking a BASIC file? I’ll start it running tonight so I can look into the bug. |
Terry Swanborough (455) 53 posts |
I am running the code just by double-clicking the BASIC file from the desktop. |
Jeffrey Lee (213) 6048 posts |
How long does it usually take to crash? My board has been running that program for a week now and is still going strong. Maybe I should be using one of the downloadable ROM images instead of one I’ve built myself. |
Terry Swanborough (455) 53 posts |
I ran the code again last week at work and it ran all week, but it failed during the weekend with the same kernel error. Strangely, it always seems to crash over the weekend. I can’t see anything else hardware-wise causing the problem, as there is a lot less going on at work during the weekend; also, the desktop always recovers after the crash. I am interested in using RISC OS for a commercial project at work, replacing our current range of radio nurse call systems with a more powerful version, so I am interested in tracking this problem down, as our products run 24/7 once they leave us. I can’t believe that the clock has anything to do with this error, but I always set it to the correct time before I run the program. What I will do next is change the clock so that the weekend occurs during the week :-) You can see I am now grasping at straws. I think the ROM I am using was downloaded about a month ago. Is there a way of identifying the ROM? |
Terry Swanborough (455) 53 posts |
I also have an original beagleboard non Xm and I can’t remember ever seeing |
Jeffrey Lee (213) 6048 posts |
For recent ROMs (any built since the 2nd of August), *FX 0 will show the ROM build date. If you can let me know when your ROM was built then I can probably work out where in the kernel it crashed just by using the register dump that you’ve given. Hopefully that will be enough to allow us to work out what the problem is. |
Terry Swanborough (455) 53 posts |
*FX 0 returns (29th Sep 2011) |
Jeffrey Lee (213) 6048 posts |
OK, I’ll have a look tonight and let you know how I get on. |
Jeffrey Lee (213) 6048 posts |
I’m surprised I didn’t spot this earlier, but it looks like it’s crashing due to a supervisor stack overflow. From looking at your register dump I can see that it’s crashing inside OS_ReadVarVal, trying to look up FileSwitch$CurrentFilingSystem (pointed to by R0). But that doesn’t really help us work out what’s causing the stack overflow. The next time it crashes, if you could save a copy of the supervisor stack using “*save svcstack fa200000 + 8000” and then upload it somewhere or email it to me, then that should allow us to work out what the problem is. A copy of the output of *modules would be useful too. |
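As a rough illustration of why both the stack dump and the *modules output are useful: a saved stack is just a run of 32-bit words, and any word whose value falls inside a module's address range is a candidate return address. The Python sketch below assumes little-endian words and uses made-up module address ranges and stack contents; only the base address fa200000 comes from the *save command above. It is not the tooling actually used in this thread.

```python
import struct


def candidate_return_addresses(dump, base, modules):
    """Yield (stack_address, value, module_name) for words that point into a module.

    `modules` maps a name to a (start, end) address range. The ranges used
    below are hypothetical; real ones would be taken from *modules output.
    """
    usable = len(dump) - len(dump) % 4  # ignore any trailing partial word
    for offset in range(0, usable, 4):
        (value,) = struct.unpack_from("<I", dump, offset)
        for name, (start, end) in modules.items():
            if start <= value < end:
                yield base + offset, value, name


# Hypothetical example data: four stack words, two of which point into modules.
example_modules = {
    "Internet": (0x20100000, 0x20140000),
    "DHCP": (0x20140000, 0x20148000),
}
example_dump = struct.pack("<4I", 0x00000000, 0x20100ABC, 0x12345678, 0x20140010)

hits = list(candidate_return_addresses(example_dump, 0xFA200000, example_modules))
for address, value, name in hits:
    print(f"{address:08X}: {value:08X} -> {name}")
```

A long alternating run of hits in the same two modules would be one hint that they are calling each other repeatedly.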
Steve Revill (20) 1361 posts |
Don’t forget the DebugTools module and its *Where command… (assuming that works on ARMv7) |
Terry Swanborough (455) 53 posts |
Next time I get it to crash I will collect as much |
Terry Swanborough (455) 53 posts |
Just a quick update, it did crash again |
Terry Swanborough (455) 53 posts |
Another quick update, I have tried a new Xm type board *showregs |
Jeffrey Lee (213) 6048 posts |
I completely forgot about this :-( I’ll look into it tonight for you. |
Jeffrey Lee (213) 6048 posts |
I’m still working on unwinding the stack, but it looks like the stack overflow is caused by the Internet & DHCP modules getting into a state where they keep sending service calls to one another. Specifically, it looks like the modules are trying to cope with an IP address change, which suggests that the problem could be triggered by the machine’s DHCP lease expiring. Do you know how long your DHCP server leases IP addresses for? (*DHCPInfo should tell you when the lease was obtained, and when it’s due to expire.) Or is there anything which could be screwing with your network over the weekend? When I was doing the testing I was running with a static IP setup, but I’ve switched to using DHCP now to see if that triggers the bug. My router seems to use a 24 hour lease time, so sometime tomorrow I should know whether the DHCP lease expiring is enough to trigger the bug. |
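To make the failure mode concrete, here is a small Python simulation of two handlers that each respond to an event by notifying the other. It is only a sketch: the per-call stack cost is an invented number, the 0x8000 stack size is taken from the *save command earlier in the thread, and the re-entrancy guard is a generic technique shown for contrast, not a description of how the Internet or DHCP modules work or will be fixed.

```python
SVC_STACK_BYTES = 0x8000   # size of the region saved with *save earlier in the thread
FRAME_BYTES = 128          # hypothetical stack cost of one nested service call


class SimulatedStackOverflow(Exception):
    """Raised when the simulated supervisor stack is exhausted."""


def handle_service_call(module, depth, in_progress=None):
    """Simulate `module` handling a service call by notifying the other module.

    Returns the nesting depth at which the chain stops. With
    in_progress=None there is no guard, so the chain only ends by overflow.
    """
    if (depth + 1) * FRAME_BYTES > SVC_STACK_BYTES:
        raise SimulatedStackOverflow(f"stack exhausted at depth {depth}")
    if in_progress is not None:
        if module in in_progress:
            return depth  # already handling this event: break the loop
        in_progress = in_progress | {module}
    other = "DHCP" if module == "Internet" else "Internet"
    return handle_service_call(other, depth + 1, in_progress)


try:
    handle_service_call("Internet", 0)
    unguarded_result = "returned"
except SimulatedStackOverflow:
    unguarded_result = "overflow"

guarded_depth = handle_service_call("Internet", 0, in_progress=frozenset())
print(unguarded_result, guarded_depth)
```

Without the guard, each call nests one level deeper until the fixed-size stack runs out; with it, the second visit to a module that is already handling the event returns straight away.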
Terry Swanborough (455) 53 posts |
I think you are on the right track, over the weekend our Internet connection is |
Steffen Huber (91) 1953 posts |
I am no DHCP expert – what does the DHCP protocol demand if a client wants to renew its lease and the original address provider (DHCP server) is no longer available? I could imagine that this scenario was possibly not tested properly with the RISC OS Internet and DHCP modules. |
Mark Scholes (148) 2 posts |
According to the spec: http://tools.ietf.org/html/rfc2131#section-4.4.5 “If the lease expires before the client receives a DHCPACK, the client
It first tries to renew with the current DHCP server, then tries to rebind with any DHCP server, then should go back to whatever it does with no DHCP server
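The renew, rebind, expire sequence can be sketched as a simple timeline. RFC 2131 gives default timer values of T1 = 0.5 × lease (start renewing) and T2 = 0.875 × lease (start rebinding); a server may supply different values. The function name and the assumption that no renewal ever succeeds are mine, for illustration only.

```python
def dhcp_state(elapsed_s, lease_s, t1_s=None, t2_s=None):
    """Return the RFC 2131 client state `elapsed_s` seconds into a lease.

    Assumes the client never receives a DHCPACK after the initial one,
    e.g. because the server is unreachable for the whole period.
    """
    t1 = 0.5 * lease_s if t1_s is None else t1_s      # RFC 2131 default T1
    t2 = 0.875 * lease_s if t2_s is None else t2_s    # RFC 2131 default T2
    if elapsed_s < t1:
        return "BOUND"       # address in use, nothing to do yet
    if elapsed_s < t2:
        return "RENEWING"    # unicast requests to the original server
    if elapsed_s < lease_s:
        return "REBINDING"   # broadcast requests to any server
    return "INIT"            # lease expired: stop using the address, start over


# With a 24 hour lease and default timers:
lease = 24 * 60 * 60
for hours in (0, 12, 21, 24):
    print(hours, dhcp_state(hours * 3600, lease))
```

So with a 24 hour lease and an unreachable server, a client following the defaults would begin renewing at 12 hours, rebinding at 21 hours, and give up the address at 24 hours.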
Jeffrey Lee (213) 6048 posts |
I’m still not sure how it gets stuck in the loop, but I’ve now got a fairly good idea of why it can’t get out of it:
So, there are a few points of failure here:
To fix the crash, I’ll probably go with fixing point 2 in the list above. But instead of crashing, you’ll now just end up with no DHCP. So we also need to work out why the DHCP module thinks the interface needs to be released. Plus there’s the question of why sending the DHCP release packet always fails. I’ll have another go at recreating the crash here, but if that fails I think I might have to send you a debug version of the DHCP module so we can see what the root cause of the problem is. |
Terry Swanborough (455) 53 posts |
Just for confirmation I ran the beagle board over the weekend with the internet TCP/IP protocol switched off and everything was still running OK on Monday morning. |