zero pain (finding the function in C)

54 posts, 12 voices


Aug 4, 2015 10:48am David Pilling (401) 41 posts	Hi, Does anyone know a quick way of going from the addresses zero pain gives to the name of the function in C. I know about the linker -MAP flag but that appears to only give me the file name (which often contains many functions). I’d really like to reproduce the Windows situation, where I am told the exact line of code. It would be possible to make simple tools to produce some of the above information, but it’s all about zero pain …and so far zero gain. David Pilling

Aug 4, 2015 11:55am Adrian Lees (1349) 122 posts	The linker option ‘-S filename’ will write out the symbols, including all function names, but not in numerical order. ‘-Symdefs filename’ writes out symbol definitions which contain essentially the same information. Take your pick.

Then I search the text file, working backwards numerically from the address/offset of the failure point (say it’s 0x9d78, I try 0x9d78, 0x9d70, 0x9d60…) until I find the symbol which most immediately precedes the failure point. At that point I can easily pair up the disassembly with the generated C/handwritten C and thus find the failure point in the source. All rather laborious, but this is what I currently do for release builds.

If you have a build with function names then you can load the binary file into a text editor in ASCII mode, ‘GoTo’ the address/offset and read backwards until you find a name, then switch to Disassembly mode. As for line numbers, you’d need a full debug build.

Since I have so often had to perform this laborious procedure manually it doesn’t take me long, but since the ZeroPain ‘fun’, I’m currently putting together a library for reading symbol tables and automating some of this for release builds. It’ll probably be open-sourced on my website pretty soon…
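Adrian’s manual search (work backwards from the failure address until you hit the nearest preceding symbol) is simple enough to automate. The sketch below assumes the symbols have already been read from the ‘-S’ or ‘-Symdefs’ listing into an array; the `symbol` type and `nearest_preceding` function are illustrative names, not part of any RISC OS library. Because it does a single linear scan for the highest address not above the fault, the listing does not need to be sorted first.

```c
#include <stddef.h>
#include <stdint.h>

/* One symbol from the linker listing (illustrative type). */
typedef struct {
    uint32_t addr;      /* symbol value */
    const char *name;   /* symbol name  */
} symbol;

/* Return the symbol with the highest address that is <= fault,
   or NULL if every symbol lies above the fault address.
   The table may be in any order. */
const symbol *nearest_preceding(const symbol *syms, size_t n, uint32_t fault)
{
    const symbol *best = NULL;
    for (size_t i = 0; i < n; i++) {
        if (syms[i].addr <= fault &&
            (best == NULL || syms[i].addr > best->addr))
            best = &syms[i];
    }
    return best;
}
```

With Adrian’s example failure at 0x9d78 and hypothetical symbols at 0x9d00 and 0x9e00, it returns the 0x9d00 entry; the offset into that function is then simply 0x9d78 - 0x9d00.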

Aug 4, 2015 12:04pm Rick Murray (539) 13840 posts	> If you have a build with function names then you can load the binary file into a text editor in ASCII mode, ‘GoTo’ the address/offset and read backwards until you find a name, then switch to Disassembly mode.

That’s what I do, more or less. I load the program into Zap (automatically in disassembly mode) and I Page Up a few times to see if there is a function name. If not, I switch to Text mode and just start reading backwards from the current cursor position to find a function (and, given the space between the function name and the offending code, roughly how far into the function the problem is).

Then, back to disassembly to see what the code is trying to do (and sadly, while the old enhanced disassembler used to provide annotations for CLib calls, the standard disassembler doesn’t). Armed with that, it is time to look at the function. If nothing stands out in the code, it will be fun and games with DDT and single-stepping… ;-)
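Reading backwards for a name works because, when a program is built with function names embedded, the Norcroft compiler places each name just before the function it belongs to. My understanding of the layout (worth checking against the APCS documentation) is: the NUL-terminated name, padded to a word boundary, then a marker word whose top byte is 0xFF and whose low 24 bits give the padded name length, then the function’s first instruction. The sketch below automates Rick’s backwards scan over an image loaded into memory; `find_embedded_name` is a hypothetical helper, and a data word that happens to start with 0xFF can produce a false match.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Read a little-endian 32-bit word (RISC OS ARM code is little-endian). */
static uint32_t read_word(const unsigned char *p)
{
    return (uint32_t)p[0] | (uint32_t)p[1] << 8 |
           (uint32_t)p[2] << 16 | (uint32_t)p[3] << 24;
}

/* Scan backwards from fault_off (a byte offset into image) for a name
   marker word: top byte 0xFF, low 24 bits = padded name length.
   On success return the NUL-terminated name and store the offset of the
   word after the marker (assumed function entry) in *entry_off.
   Gives up after max_back bytes and returns NULL. */
const char *find_embedded_name(const unsigned char *image, size_t size,
                               size_t fault_off, size_t max_back,
                               size_t *entry_off)
{
    if (fault_off >= size)
        return NULL;
    size_t off = fault_off & ~(size_t)3;          /* word-align down */
    size_t limit = off > max_back ? off - max_back : 0;

    while (off >= limit + 4) {
        off -= 4;
        uint32_t w = read_word(image + off);
        uint32_t len = w & 0x00FFFFFFu;
        if ((w & 0xFF000000u) == 0xFF000000u &&
            len != 0 && len % 4 == 0 && len <= off) {
            const char *name = (const char *)(image + off - len);
            /* The padded name area must contain its terminating NUL. */
            if (memchr(name, '\0', len) != NULL) {
                *entry_off = off + 4;
                return name;
            }
        }
    }
    return NULL;
}
```

The distance from `*entry_off` to the fault offset gives the same “how far into the function” figure Rick reads off by eye.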

Aug 4, 2015 1:44pm David Pilling (401) 41 posts	Thanks. Very helpful. In the past I’ve often used the dump of the function calling sequence when a program crashes – stack backtrace – whatever. However, that has never been particularly reproducible and does not seem to be working in my current environment (don’t let me mislead you: I’m not on the zero page version of RISC OS yet).

Aug 5, 2015 4:36pm David Pilling (401) 41 posts	Wow, after another hour or two I found my first zero page access bug on RISC OS. By then the “post mortem” dump (calling sequence, backtrace) had started to work, and I was more inclined to believe it than what the symbol map implied, although they did agree.

<advert> If you want to know how I do that, my desktop application library is now available in source code form: http://www.davidpilling.com/wiki/index.php/XL see c.deb </advert>

Now that I can save symbol maps, my idea is to make available a version of Ovation Pro for which users can report ZeroPain address values. Sense (always in short supply) suggests writing a ‘sort’ command line tool for the symbol maps before wasting another minute searching the raw version.
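A ‘sort’ for the symbol maps, as David suggests, needs little more than `sscanf` and `qsort`. This sketch assumes a simple line format of a hex address followed by a name (for example `00009d00 foo`), which may not match the real linker listing exactly, so the `sscanf` pattern is the part to adjust; note that `%lx` accepts an optional `0x` prefix but not RISC OS’s `&`. `sym_entry`, `parse_sym_line` and `sort_syms` are illustrative names.

```c
#include <stdio.h>
#include <stdlib.h>

/* One parsed line of a symbol listing (assumed "address name" format). */
typedef struct {
    unsigned long addr;
    char name[64];
} sym_entry;

/* Parse "00009d00 foo". Returns 1 on success, 0 otherwise. */
int parse_sym_line(const char *line, sym_entry *out)
{
    return sscanf(line, "%lx %63s", &out->addr, out->name) == 2;
}

/* qsort comparator on address; avoids the overflow of a - b. */
static int by_addr(const void *pa, const void *pb)
{
    unsigned long a = ((const sym_entry *)pa)->addr;
    unsigned long b = ((const sym_entry *)pb)->addr;
    return (a > b) - (a < b);
}

/* Sort entries into ascending address order. */
void sort_syms(sym_entry *syms, size_t n)
{
    qsort(syms, n, sizeof *syms, by_addr);
}
```

A command line wrapper would read lines with `fgets`, keep the ones `parse_sym_line` accepts, sort them, and print them back out.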

Aug 5, 2015 9:55pm Matthew Phillips (473) 721 posts	I’m sure that most of the programmers on this thread are well-versed in reading assembly language and tracing it back to the C source, but just for the record, here are a few techniques we have found useful over the last few weeks when looking into ZeroPain bugs.

Assuming you have identified the function that the problem is in, any BL instructions within the function are good markers, if they are within the application address space. Look up the address and (provided you have function names compiled into the !RunImage) you’ll be able to identify which function is being called at that point, and so identify the location in the source of your problematic function.

Remember that in APCS the first four parameters of a function call are passed in R0 to R3, and an integer result is returned in R0. That can help you work out what is going on near the start of a long function.

We had a couple of cases where the ZeroPain error reported was in the Shared C Library module. It wasn’t a fault in the CLib, but a fault in the way we had called it, for example calling strchr with a NULL pointer rather than a pointer to a real string. To track these back we used Zap to grab the Shared C Library module from memory and worked out where the function probably started. Then we grabbed our own application using Zap (menu on iconbar → Create → Read memory), which will differ from the !RunImage file because the running task will have the CLib stubs initialised and pointing to the right locations in the module. Then it was a matter of working back from the C Library module address and searching for where that was called from. This then allowed us to identify which function was the problem and eventually find the faulty call.

If there’s a better way to do that, it would be nice if someone could explain!
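Matthew’s tip of using BL instructions as markers depends on working out where each branch goes. In the 32-bit ARM encoding, B and BL carry a signed 24-bit word offset relative to the branch’s own address plus 8 (the pipelined PC). The helpers below are a sketch of that decoding with illustrative names. As a check, they reproduce two branches from the ZeroPain report David posts later in this thread: `0a000007` at fc026b64 (BEQ &FC026B88) and `1afffff7` at fc026b6c (BNE &FC026B50).

```c
#include <stdbool.h>
#include <stdint.h>

/* True for an ARM B or BL with any condition except NV
   (0xF in the condition field means BLX on later architectures). */
bool is_branch(uint32_t instr)
{
    return (instr & 0x0E000000u) == 0x0A000000u && (instr >> 28) != 0xFu;
}

/* True for BL (branch with link, i.e. a function call). */
bool is_bl(uint32_t instr)
{
    return is_branch(instr) && (instr & 0x01000000u) != 0;
}

/* Destination of the B/BL located at addr. */
uint32_t branch_target(uint32_t addr, uint32_t instr)
{
    uint32_t imm = instr & 0x00FFFFFFu;
    /* Sign-extend the 24-bit field without relying on shifts of
       negative values. */
    int32_t off = (imm & 0x00800000u) ? (int32_t)imm - 0x01000000
                                      : (int32_t)imm;
    return addr + 8u + (uint32_t)(off * 4);   /* wraps mod 2^32 */
}
```

Feeding each BL target into a symbol lookup then names the function being called at that point.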

Aug 7, 2015 8:57pm David Pilling (401) 41 posts	I’ve found that a quick technique is to look up the error address in the linker -MAP output, which gives me a file. Then generate an assembler version of that file using the compiler -s option. Finally, match the code around the error in the disassembly against the compiler-generated assembler.

Anyway, can you help me with my homework… Someone has sent me the following – am I right to assume these errors are not due to SparkFS because they’re in the ROM?

```
Time: Thu Aug 6 16:44:51 2015
Location: Unknown
Current Wimp task: Unknown
Last app to start: BASIC -quit "SCSI::RISC OS.$.!SparkFS.Resources.Resfind" SparkFS

R0 = 00000001 R1 = 80000113 R2 = 60000000 R3 = fa20783c
R4 = fa20783c R5 = fa207c3c R6 = fa207cb0 R7 = fa207cb8
R8 = 00000150 R9 = 00000000 R10 = 00000013 R11 = 00000025
R12 = 80000113 R13 = fa2077fc R14 = 80000113 R15 = fc026b58
DFAR = 00000001
Mode SVC32 Flags Nzcv if PSR = 80000113

fc026b10 : e3a0c000 : MOV R12,#0
fc026b14 : e35a002d : CMP R10,#&2D ; ="-"
fc026b18 : 03e0c000 : MVNEQ R12,#0
fc026b1c : 135a002b : CMPNE R10,#&2B ; ="+"
fc026b20 : 02811001 : ADDEQ R1,R1,#1
fc026b24 : e3a00000 : MOV R0,#0
fc026b28 : ef020021 : SWI XOS_ReadUnsigned
fc026b2c : e35c0000 : CMP R12,#0
fc026b30 : 42622000 : RSBMI R2,R2,#0
fc026b34 : e49df004 : LDR PC,[R13],#4
fc026b38 : e10f1000 : MRS R1,CPSR
fc026b3c : e3811003 : ORR R1,R1,#3
fc026b40 : e3c110c0 : BIC R1,R1,#&C0 ; ="À"
fc026b44 : e121f001 : MSR CPSR_c,R1
fc026b48 : e202220e : AND R2,R2,#&E0000000
fc026b4c : e52d0004 : STR R0,[R13,#-4]!
fc026b50 * e4d01001 * LDRB R1,[R0],#1
fc026b54 : e351000d : CMP R1,#&0D ; =13
fc026b58 : 1351000a : CMPNE R1,#&0A ; =10
fc026b5c : 13510000 : CMPNE R1,#0
fc026b60 : 049d0004 : LDREQ R0,[R13],#4
fc026b64 : 0a000007 : BEQ &FC026B88
fc026b68 : e331003c : TEQ R1,#&3C ; ="<"
fc026b6c : 1afffff7 : BNE &FC026B50
fc026b70 : e49d0004 : LDR R0,[R13],#4
fc026b74 : e59fc044 : LDR R12,&FC026BC0
fc026b78 : e5dc1300 : LDRB R1,[R12,#768]
fc026b7c : e201107f : AND R1,R1,#&7F ; =127
fc026b80 : e5cc1300 : STRB R1,[R12,#768]
fc026b84 : e1822981 : ORR R2,R2,R1,LSL #19
fc026b88 : e2222202 : EOR R2,R2,#&20000000
fc026b8c : e4d01001 : LDRB R1,[R0],#1
```

```
Time: Thu Aug 6 16:44:51 2015
Location: Unknown
Current Wimp task: Unknown
Last app to start: "<SparkFS$Dir>.!RunImage"

R0 = 00000001 R1 = 80000113 R2 = 60000000 R3 = fa2078ec
R4 = fa2078ec R5 = fa207cec R6 = fa207d60 R7 = fa207d68
R8 = 00000150 R9 = 00000000 R10 = 00000013 R11 = 00000025
R12 = 80000113 R13 = fa2078ac R14 = 80000113 R15 = fc026b58
DFAR = 00000001
Mode SVC32 Flags Nzcv if PSR = 80000113

fc026b10 : e3a0c000 : MOV R12,#0
fc026b14 : e35a002d : CMP R10,#&2D ; ="-"
fc026b18 : 03e0c000 : MVNEQ R12,#0
fc026b1c : 135a002b : CMPNE R10,#&2B ; ="+"
fc026b20 : 02811001 : ADDEQ R1,R1,#1
fc026b24 : e3a00000 : MOV R0,#0
fc026b28 : ef020021 : SWI XOS_ReadUnsigned
fc026b2c : e35c0000 : CMP R12,#0
fc026b30 : 42622000 : RSBMI R2,R2,#0
fc026b34 : e49df004 : LDR PC,[R13],#4
fc026b38 : e10f1000 : MRS R1,CPSR
fc026b3c : e3811003 : ORR R1,R1,#3
fc026b40 : e3c110c0 : BIC R1,R1,#&C0 ; ="À"
fc026b44 : e121f001 : MSR CPSR_c,R1
fc026b48 : e202220e : AND R2,R2,#&E0000000
fc026b4c : e52d0004 : STR R0,[R13,#-4]!
fc026b50 * e4d01001 * LDRB R1,[R0],#1
fc026b54 : e351000d : CMP R1,#&0D ; =13
fc026b58 : 1351000a : CMPNE R1,#&0A ; =10
fc026b5c : 13510000 : CMPNE R1,#0
fc026b60 : 049d0004 : LDREQ R0,[R13],#4
fc026b64 : 0a000007 : BEQ &FC026B88
fc026b68 : e331003c : TEQ R1,#&3C ; ="<"
fc026b6c : 1afffff7 : BNE &FC026B50
fc026b70 : e49d0004 : LDR R0,[R13],#4
fc026b74 : e59fc044 : LDR R12,&FC026BC0
fc026b78 : e5dc1300 : LDRB R1,[R12,#768]
fc026b7c : e201107f : AND R1,R1,#&7F ; =127
fc026b80 : e5cc1300 : STRB R1,[R12,#768]
fc026b84 : e1822981 : ORR R2,R2,R1,LSL #19
fc026b88 : e2222202 : EOR R2,R2,#&20000000
fc026b8c : e4d01001 : LDRB R1,[R0],#1
```

Aug 7, 2015 9:22pm Rick Murray (539) 13840 posts	> am I right to assume these errors are not due to SparkFS because they’re in the ROM?

Not necessarily. RISC OS doesn’t do much in the way of sanity checking, so the first step is to find out what (in ROM) is being called, and then check to see if you are calling it with a null pointer. It’s a shame ZeroPain doesn’t look for a valid backtrace – might make things easier…

Aug 7, 2015 9:57pm Rick Murray (539) 13840 posts	Okay. I have no idea what machine you are using, but since the code at that address in my ROM doesn’t look anything like that, I had to take another approach. There are 140 instances of “ReadUnsigned” in my ROM. One of them (luckily closer to the top than the bottom) is followed by CMP R12,#0 then RSBMI something. Switching to text view with this in mind, I can see two messages about Return code just above. It appears to be kernel code, but early kernel code, as it isn’t part of UtilityModule.

Looking through the entire kernel source, I can see “RCExc” in two places. The second (in s.Arthur2) is what I’m looking for. Now, going down down down, I can see your offending code:

```
        Push    "r0"
10      LDRB    r1, [r0], #1
        CMP     r1, #13
        CMPNE   r1, #10
        [etc]
```

You (or something on your behalf) have called GSINIT (to expand a string?), but R0 (pointer to string) is one (1?) for some reason, leading to what is effectively trying to load a byte into R1 from address &1. Um. ;-)

Hope this helps.
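Rendered in C, the loop Rick found appears to pre-scan the string for a ‘<’ (presumably because ‘<’ introduces a variable reference such as `<SparkFS$Dir>` in GSTrans syntax) before reaching a terminator of CR, LF or NUL. This is a sketch written from the disassembly in the report, not the kernel source, and `string_has_angle` is a made-up name. It shows why the report looks the way it does: with R0 = 1 the very first `LDRB` reads from address 1, which is exactly the access ZeroPain logged (DFAR = 00000001).

```c
#include <stdbool.h>

/* C rendering of the scan loop at fc026b50..fc026b6c:
   LDRB r1,[r0],#1 / CMP #13 / CMPNE #10 / CMPNE #0 / TEQ #'<'.
   Returns true if '<' appears before CR, LF or NUL. */
bool string_has_angle(const char *s)
{
    for (;;) {
        char c = *s++;              /* the faulting LDRB when s == 1 */
        if (c == '\r' || c == '\n' || c == '\0')
            return false;           /* BEQ taken: terminator found */
        if (c == '<')
            return true;            /* fall through past BNE */
    }
}
```

Nothing in the loop checks the pointer itself, which is Rick’s earlier point: the ROM code trusts whatever the caller passes in R0.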

Aug 7, 2015 10:10pm David Pilling (401) 41 posts	Thanks Rick, I’ll investigate further. The person who sent me this SparkFS problem also sent an Ovation Pro one, which was caused by an applet passing a null value for a view pointer.

Aug 9, 2015 2:58pm David Pilling (401) 41 posts	The GSInit problem would seem to be caused by both ResFind (a public domain BASIC program for finding international resources) and the SparkFS filer !RunImage. Neither of these calls GSInit directly. I’m going to take the bold step of installing the zero page ROM on a computer and do more work from there…

Aug 9, 2015 3:15pm Rick Murray (539) 13840 posts	Hi, If it helps, I have written this… http://www.heyrick.co.uk/software/resfinder/

Aug 9, 2015 3:50pm Frederick Bambrough (1372) 837 posts	> I’m going to take the bold step of installing the zero page ROM on a computer and do more work from there…

Be sure to tie string around the bottom of your trouser legs first!

Aug 9, 2015 5:00pm Raik (463) 2061 posts	The latest version of ResFind is 4.0, available from gag.de. The version in SparkFS I use (1.43) has 2.01b inside. You can contact the author (HzN) via gag.de.

Aug 9, 2015 6:02pm Rick Murray (539) 13840 posts	> Be sure to tie string around the bottom of your trouser legs first!

Ewww! 8-o

Aug 10, 2015 4:06pm David Pilling (401) 41 posts	Thanks for the pointers to ResFind. It’s not a simple situation. I’ve got SparkFS running on the zeropage RISC OS with no problems. It also runs on Iyonix with Prot1K installed without errors. There must be something unusual about the set up of the person who reported this. Zeropain works nicely. Pleasant to use RISC OS on the Pi after Windows 10…

Aug 10, 2015 4:11pm David Pilling (401) 41 posts	Ah just got an email – user says “HOWEVER, I think I have solved the problem ! Using a bare !Boot structure, it doesn’t give an error. So I started to add the ‘extras’ I have in MY standard boot and the only time it threw up an error, was when the VProtect module was loaded. Keeping this out of !Boot let other ‘problem’ apps run without a glitch.” Virus protection software on RISC OS 8-) (in the sense virus protection software on Windows is often a reason things don’t work)

Aug 11, 2015 5:22pm Martin Avison (27) 1494 posts	I have been trying to work my way back to the root cause of a ZeroPain report at Offset 000130c0 in module SharedCLibrary, which is obviously caused by a program I maintain.

I followed Matthew Phillips’ hints in his post above, and found two jump tables in the running load module which I assume are the Clib stubs. What is the best way to locate these in a load module? The tables I found were sets of LDR PC,addr and the addr contains addresses FC1xxxxx. There were 48 entries in the first set, and 185 in a second set immediately after. Is there a list anywhere of the function for each of these jumps?

I extracted all the addresses and targets, sorted them into target sequence, found the address corresponding to target offset 130c0, and looked for that in the running load module. However, there were 75 BL or B to that address! I suppose Clib routines are low level and therefore can be well used. But how can I make progress with this ZeroPain?

I am aware that I can kill the ZP module and force an abort, but the program does not give a traceback, and anyway it would be aborted before it reached this Clib ZP problem. How can a traceback be added?

I have made considerable progress with ZP problems in Basic and assembler modules, and in the C program itself, but as a newcomer to using C, I am now floundering. All clues welcome!
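Martin’s count of 75 BL or B instructions to the stub address can be produced mechanically by decoding every word of the running image and keeping the branches whose target matches. This sketch assumes the task’s memory has been saved (for example with Zap’s Read memory) and loaded as an array of little-endian 32-bit words, with `base` the address of the first word (&8000 for an application); `find_callers` is a hypothetical helper.

```c
#include <stddef.h>
#include <stdint.h>

/* Decode an ARM B/BL at addr (condition NV excluded).
   Returns 1 and stores the destination in *dest if it is a branch. */
static int decode_branch(uint32_t addr, uint32_t instr, uint32_t *dest)
{
    if ((instr & 0x0E000000u) != 0x0A000000u || (instr >> 28) == 0xFu)
        return 0;
    uint32_t imm = instr & 0x00FFFFFFu;
    int32_t off = (imm & 0x00800000u) ? (int32_t)imm - 0x01000000
                                      : (int32_t)imm;
    *dest = addr + 8u + (uint32_t)(off * 4);
    return 1;
}

/* Record the address of every B/BL in code[0..nwords) that branches to
   target. Stores up to max_hits addresses in hits and returns the total
   number found (which may exceed max_hits). */
size_t find_callers(const uint32_t *code, size_t nwords, uint32_t base,
                    uint32_t target, uint32_t *hits, size_t max_hits)
{
    size_t count = 0;
    for (size_t i = 0; i < nwords; i++) {
        uint32_t addr = base + 4u * (uint32_t)i;
        uint32_t dest;
        if (decode_branch(addr, code[i], &dest) && dest == target) {
            if (count < max_hits)
                hits[count] = addr;
            count++;
        }
    }
    return count;
}
```

The resulting call-site addresses can then be matched to functions with a symbol lookup, turning “75 callers” into a list of source functions to examine.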

Aug 11, 2015 6:04pm Rick Murray (539) 13840 posts	> The tables I found were sets of LDR PC,addr and the addr contains addresses FC1xxxxx. There were 48 entries in the first set, and 185 in a second set immediately after. Is there a list anywhere of the function for each of these jumps?

That’s right. The smaller set are the kernel (low level) functions, and the larger set are the CLib functions.

For the kernel functions, you will want k_entries followed by k_entries2 here: https://www.riscosopen.org/viewer/view/castle/RiscOS/Sources/Lib/RISC_OSLib/kernel/s/

For the CLib functions, you will want cl_entries followed by cl_entries2 here: https://www.riscosopen.org/viewer/view/castle/RiscOS/Sources/Lib/RISC_OSLib/clib/s/ – it’s an insanely long list, as you have noticed.

> However, there were 75 BL or B to that address!

Something that is called a lot, then… It’s a shame you gave an offset rather than “entry #123 in the bigger table”.

> I suppose Clib routines are low level and therefore can be well used.

Yes. But note that CLib doesn’t do much (if any) validation of input, so if something passes a NULL pointer where it shouldn’t, CLib will happily try to deal with it and blow up in the process. Generally, if a CLib function is failing, you need to look at what arguments are being passed in.

> but the program does not give a traceback,

Why not? Take a look to see if something at startup is calling `signal()`, and if it is, comment out those lines. The default behaviour should be to output a backtrace if the program crashes.

> and anyway it would be aborted before it reached this Clib ZP problem.

?

> How can a traceback be added?

What library are you using? Some have code to output a nice backtrace; otherwise the code to do it is a bit unpleasant.

If you’re only looking at a quick hack, then this will sort-of-work (but it will trash the stack, so it isn’t a recommended method):

```c
/* at the top of the code */
extern void _backtrace(void);

/* when you want a backtrace */
_backtrace();
```

It doesn’t appear that the nicer postmortem is directly available from CLib. But this shouldn’t matter: with signal handling removed, CLib ought to give a backtrace automatically. For example:

```c
#include <stdio.h>

int main(void)
{
    double mynum;          /* deliberately left uninitialised */
    mynum = mynum / 0;     /* deliberate fault: raises a floating point exception */
    return 0;
}
```

will compile. And run. And do this:

```
Test
Floating point exception: invalid operation
Postmortem requested
Arg2: 0x0000b3d8 46040 -> [0x0000b3e8 00000000 0x3e694c3c 0x00000008]
Arg1: 0x00000001 1
80b0 in function main
Arg2: 0x0000808c 32908 -> [0xe1a0c00d 0xe92dd813 0xe24cb004 0xe15d000a]
Arg1: 0x0000af04 44804 -> [0x74736574 00000000 0x0000af0c 0x001a1719]
fc16ef58 in shared library function
8538 in anonymous function
```

&80B0 is the instruction after the faulty one. This backtrace should be automatic unless it has been disabled.

Note, by the way, that you will need to compile the program with function names embedded if you want the backtrace to be readable. If you can find the line in the Makefile that calls cc, check that it is not passing the `-ff` option (the `-f` option with an `f` following, possibly combined with other letters, like “-ffah” or suchlike).

Aug 11, 2015 8:15pm David Pilling (401) 41 posts	What Rick is hinting at is that applications very often do have signal handlers – you don’t want someone hitting Escape to terminate your Wimp task. There’s also some messing about that has to go on to handle terminating printing. If it is a simple command line C program then you don’t have a problem – the backtrace is very useful.
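One way to keep Escape handling without losing the automatic postmortem Rick describes is to install a handler for SIGINT only and leave SIGSEGV, SIGFPE and the rest at their defaults. This is a sketch assuming, as I understand it, that the Acorn C library delivers Escape as SIGINT; `install_escape_handler` and `escape_pressed` are illustrative names. The handler only sets a flag, which the main loop polls.

```c
#include <signal.h>

/* Set by the handler; polled (and cleared) by the main loop. */
static volatile sig_atomic_t escape_pressed = 0;

static void on_sigint(int sig)
{
    (void)sig;
    escape_pressed = 1;
    /* Re-arm: some C libraries reset the handler to SIG_DFL on delivery. */
    signal(SIGINT, on_sigint);
}

/* Catch SIGINT (assumed to be Escape) only. SIGSEGV, SIGFPE, SIGILL etc.
   are deliberately left at SIG_DFL so the library's default crash
   handling, including any postmortem, still runs. */
void install_escape_handler(void)
{
    signal(SIGINT, on_sigint);
}
```

The design choice is simply to be narrow: a program that needs special handling for, say, printing can add handlers for the specific signals involved rather than catching everything.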

Aug 11, 2015 10:27pm Martin Avison (27) 1494 posts	Thanks Rick for the clues about function names: I now think that +130c0 is at address FC14c210, and is in Clib strcmp, which starts at Clib+1309c and is entry 46 in the Clib table. I can find 73 ‘strcmp(’ in the source, so it seems very close to the 75 B/BL to 7D258.

Is there no easy way to find the jump tables in the program memory?

I can find no ‘signal(’ in the source, so I need to re-check that there is no traceback. It uses RISC_OSLib, so maybe that does something? The program does have function names, so that is not an issue.

With ZP killed the program will abort before the Clib ZP error, because there are other unresolved ZP bugs before then!

I will investigate the backtrace asap, as it seems it will be needed before I can track down which strcmp call needs fixing. Which means that any previous ZP problems will need fixing first. :-((

Thanks for the help.
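For Martin’s question about finding the jump tables, one approach is to scan a saved copy of the task’s memory for `LDR PC,[PC,#imm]` instructions and keep those whose literal word points into the SharedCLibrary module. The sketch assumes the memory has been loaded as little-endian 32-bit words starting at address `base`, and that the module’s address range (`lo` up to `hi`) is known, for example from *Modules; `stub_jump` and `find_stub_jumps` are hypothetical names. It follows each literal rather than assuming any particular table layout.

```c
#include <stddef.h>
#include <stdint.h>

typedef struct {
    uint32_t addr;     /* address of the LDR PC instruction       */
    uint32_t target;   /* address it jumps to (inside the module) */
} stub_jump;

/* Collect every LDR PC,[PC,#+/-imm12] in code[0..nwords) whose literal
   lies inside the buffer and holds a value in [lo, hi). Stores up to
   max_out entries and returns the total number found. */
size_t find_stub_jumps(const uint32_t *code, size_t nwords, uint32_t base,
                       uint32_t lo, uint32_t hi,
                       stub_jump *out, size_t max_out)
{
    size_t count = 0;
    for (size_t i = 0; i < nwords; i++) {
        uint32_t instr = code[i];
        /* LDR (immediate, pre-indexed, no writeback) with Rn = Rd = PC;
           bit 23 (U) selects add or subtract. */
        if ((instr & 0x0F7FF000u) != 0x051FF000u)
            continue;
        uint32_t addr = base + 4u * (uint32_t)i;
        uint32_t imm = instr & 0xFFFu;
        uint32_t lit = (instr & 0x00800000u) ? addr + 8u + imm
                                             : addr + 8u - imm;
        if (lit < base || lit - base >= 4u * nwords || (lit - base) % 4u)
            continue;                    /* literal not in this buffer */
        uint32_t target = code[(lit - base) / 4u];
        if (target < lo || target >= hi)
            continue;                    /* does not point into CLib */
        if (count < max_out) {
            out[count].addr = addr;
            out[count].target = target;
        }
        count++;
    }
    return count;
}
```

Sorting the results by `addr` gives the entries in table order, which can then be matched against the k_entries and cl_entries lists Rick points to.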

Aug 11, 2015 10:56pm Rick Murray (539) 13840 posts	> Is there no easy way to find the jump tables in the program memory?

You can check, but it isn’t easy. Both Zap and StrongEd can extract a task’s WimpSlot. From there, you’ll be looking at a wall of gibberish… Everything you’ll need will be in the task’s slot, but doing it that way is akin to making a cake by first sowing wheat.

> I can find 73 ‘strcmp(’ in the source

Okay then. Now the next stage – what is causing the ZeroPain? You won’t be executing all of them at the same time. Is it loading the program? Saving some data? Searching? Think about which particular activity is making this issue apparent. That is the code path you’ll want to investigate.

The strcmp function has various permutations depending on whether or not the start addresses are word aligned. In its most basic, word-aligned version, it looks like this:

```
do {
    w1 = *(int *)a, a += 4;
    w2 = *(int *)b, b += 4;
    res = w1 - w2;
    [etc]
```

So I’m sure you can understand how reading a word from `*(int *)<address>` could throw ZeroPain into a tizzy if one of the strings is actually a NULL pointer.

> Which means that any previous ZP problems will need fixing first. :-((

Well, that would need to be done, one way or another, yes. ;-)

Do you have any debug code in your program? I link my programs into the DADebug module, so I can spit out pertinently placed debug information, and then type `*DADPrint` in a TaskWindow to read it. It can be useful to see what is going on inside the program without going the whole hog and using DDT. You know, one of the best debugging aids is some judiciously placed printing statements. DADebug makes this possible in a multitasking world and buffered (so you can watch the code crash and burn, then look to see how it got in that state).

> as it seems it will be needed before I can track down which strcmp call needs fixing.

Just to repeat myself – before you do that, think about what the program is doing at the point of the ZeroPain report, the point at which this strcmp is failing. Some calls ought to stand out as likely candidates.

By the way – the ZeroPain report – what is recorded in R14? Remember, R14 is the return address, so there is a chance that it might point back to the calling code, in application space.
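To find which of the 73 calls is receiving a NULL pointer, one option is to route the suspect calls through a checking wrapper that reports its call site instead of letting strcmp fault inside CLib. `checked_strcmp` and `CHECKED_STRCMP` are hypothetical names, applied by editing the call sites (redefining `strcmp` itself as a macro is, strictly, undefined behaviour in standard C). The report goes to stderr here; a Wimp task would send it to whatever debug channel it already uses. Treating NULL as sorting before any string is an arbitrary choice that lets execution carry on.

```c
#include <stdio.h>
#include <string.h>

/* Report the call site of any strcmp handed a NULL pointer,
   then return a defined result instead of faulting. */
static int checked_strcmp(const char *a, const char *b,
                          const char *file, int line)
{
    if (a == NULL || b == NULL) {
        fprintf(stderr, "%s:%d: strcmp(%s, %s)\n", file, line,
                a ? "str" : "NULL", b ? "str" : "NULL");
        if (a == b)
            return 0;                 /* both NULL */
        return a == NULL ? -1 : 1;    /* NULL sorts first */
    }
    return strcmp(a, b);
}

/* Use in place of strcmp at the suspect call sites. */
#define CHECKED_STRCMP(a, b) checked_strcmp((a), (b), __FILE__, __LINE__)
```

Because `__FILE__` and `__LINE__` expand at each call site, the first report names the exact source line, which is the information the ZeroPain log (with R14 = 1010101) could not provide.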

Aug 12, 2015 7:09pm Steve Fryatt (216) 2105 posts	> Do you have any debug code in your program? I link my programs into the DADebug module, so I can spit out pertinently placed debug information, and then type *DADPrint in a TaskWindow to read it. It can be useful to see what is going on inside the program without going the whole hog and using DDT. You know, one of the best debugging aids is some judiciously placed printing statements. DADebug makes this possible in a multitasking world and buffered (so you can watch the code crash and burn, then look to see how it got in that state).

I’d imagine that Martin would be using Reporter… (but yes, the same point applies).
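The “buffered” part of Rick’s suggestion can be sketched in plain C as a fixed-size ring of recent messages. This is a hypothetical helper, not the DADebug or Reporter API: it keeps the last 64 messages inside the task, whereas DADebug, as Rick describes it, holds them in a module so they can still be read with *DADPrint after the task has died.

```c
#include <stdarg.h>
#include <stdio.h>

#define DBG_LINES 64   /* messages kept */
#define DBG_WIDTH 96   /* bytes per message, including the NUL */

static char dbg_buf[DBG_LINES][DBG_WIDTH];
static unsigned dbg_next = 0;   /* total messages logged so far */

/* printf-style logging into the ring; the oldest message is overwritten
   once DBG_LINES have been logged. Long messages are truncated. */
void dbg_log(const char *fmt, ...)
{
    va_list ap;
    va_start(ap, fmt);
    vsnprintf(dbg_buf[dbg_next % DBG_LINES], DBG_WIDTH, fmt, ap);
    va_end(ap);
    dbg_next++;
}

/* Return the i-th most recent message (0 = newest), or NULL. */
const char *dbg_recent(unsigned i)
{
    if (i >= dbg_next || i >= DBG_LINES)
        return NULL;
    return dbg_buf[(dbg_next - 1 - i) % DBG_LINES];
}
```

Logging is cheap enough to leave in around the suspect strcmp calls; the recent history can then be dumped on exit or inspected from a debugger.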

Aug 12, 2015 10:29pm Martin Avison (27) 1494 posts	> I’d imagine that Martin would be using Reporter

Oh yes, of course – but then I would :-)) Lets me watch it as it happens, in real time. No extra commands to run!

I now have a program which uses a C Wimp task to interrogate and find the Clib jump tables, and extract them all with the Clib offsets and function names etc. For example…

```
Address  Offset JumpNo BLaddr Function Name
FC139174                      Module Start
FC139C20 AAC    121    7D1F4  _clib_initialise
FC139C58 AE4    280    7D470  _clib_finalisemodule
etc
```

If this may be of any use to anyone, send me a PM. It does need ArmSort, and uses Reporter (but does not have to). Crude, but it seems to work.

> Remember, R14 is the return address, so there is a chance that it might point back to the calling code

Yup. Already looked at them. For these particular Clib issues it is 1010101 … which is probably not a useful address :-((

I did try running the program with ZP killed … it locked the machine up pretty quickly! It was on my development machine, so not clean. I will try on my (clean) RPi when I get a chance.

Aug 13, 2015 8:33am Rick Murray (539) 13840 posts	> I did try running the program with ZP killed … it locked the machine up pretty quickly!

Check you aren’t calling a file operation with a duff (NULL?) handle. I have found that bad handles passed to functions like fread() will stiff the machine – absolutely horrendous if you ask me, but there you go… :-/ https://www.riscosopen.org/forum/forums/4/topics/3218

And here is an account of my recent ZeroPain. Luckily the offending code was inlined into the program by the compiler, so it wasn’t in CLib, though I wasn’t expecting this… ;-) https://www.riscosopen.org/forum/forums/1/topics/3418?page=2#posts-44665
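A cheap guard against the duff handles Rick mentions is to check them before they reach the library. This sketch (`safe_fread` is a made-up name) only catches NULL pointers; it cannot detect a handle that has already been closed, which is another way to end up with a bad one.

```c
#include <stdio.h>

/* fread that refuses a NULL stream or buffer instead of passing it on.
   Returns the number of items read, or 0 if the arguments are bad. */
size_t safe_fread(void *buf, size_t size, size_t n, FILE *fp)
{
    if (fp == NULL || buf == NULL) {
        fprintf(stderr, "safe_fread: NULL %s\n",
                fp == NULL ? "FILE *" : "buffer");
        return 0;
    }
    return fread(buf, size, n, fp);
}
```

A return of 0 looks the same as end-of-file to the caller, so code that relies on it should still check `fopen`’s result at the point of opening, where the real mistake usually is.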