zero pain (finding the function in C)
David Pilling (401) 41 posts |
Hi, Does anyone know a quick way of going from the addresses zero pain gives to the name of the function in C. I know about the linker -MAP flag but that appears to only give me the file name (which often contains many functions). I’d really like to reproduce the Windows situation, where I am told the exact line of code. It would be possible to make simple tools to produce some of the above information, but it’s all about zero pain …and so far zero gain. David Pilling |
Adrian Lees (1349) 122 posts |
The linker option ‘-S filename’ will write out the symbols, including all function names, but not in numerical order. ‘-Symdefs filename’ writes out symbol definitions which contain essentially the same information. Take your pick. Then I search the text file, working backwards numerically from the address/offset of the failure point (say it’s 0x9d78, I try 0x9d78, 0x9d70, 0x9d60…) until I find the symbol which most immediately precedes the failure point. At that point I can easily pair up the disassembly with the generated C/handwritten C and thus find the failure point in the source. All rather laborious, but this is what I currently do for release builds. If you have a build with function names then you can load the binary file into a text editor in ASCII mode, ‘GoTo’ the address/offset and read backwards until you find a name, then switch to Disassembly mode. As for line numbers, you’d need a full debug build. Since I have so often had to perform this laborious procedure manually it doesn’t take me long, but since the ZeroPain ‘fun’, I’m currently putting together a library for reading symbol tables and automating some of this for release builds. It’ll probably be open-sourced on my website pretty soon… |
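The backwards search described above amounts to finding the symbol with the largest address that does not exceed the failure address. Here is a minimal Python sketch, assuming the symbol file has first been reduced to lines of the form `name address` with hexadecimal addresses. The real layout of the ‘-S’ or ‘-Symdefs’ output may well differ, so the parsing would need adjusting; `parse_symbols`, `nearest_symbol` and the symbol names are hypothetical.

```python
def parse_symbols(text):
    """Parse 'name hex_address' lines into (address, name) pairs.

    The two-column layout is an assumption made for illustration only.
    """
    symbols = []
    for line in text.splitlines():
        parts = line.split()
        if len(parts) != 2:
            continue  # skip blank or unrecognised lines
        name, addr = parts
        try:
            symbols.append((int(addr, 16), name))
        except ValueError:
            continue  # second column was not a hex number
    return symbols


def nearest_symbol(symbols, fault_addr):
    """Return (name, offset into it) for the symbol most immediately
    preceding fault_addr, or None if every symbol lies above it."""
    candidates = [(addr, name) for addr, name in symbols if addr <= fault_addr]
    if not candidates:
        return None
    addr, name = max(candidates)
    return name, fault_addr - addr


# Made-up symbols, deliberately not in numerical order.
example = """
render_page 9d40
main 8000
load_file 9a10
save_file a120
"""
print(nearest_symbol(parse_symbols(example), 0x9d78))
```

With the made-up table above, the failure point 0x9d78 resolves to `render_page` plus 56 (0x38) bytes, which is where to start reading the disassembly.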
Rick Murray (539) 13840 posts |
That’s what I do, more or less. I load the program into Zap (automatically in disassembly mode) and I Page Up a few times to see if there is a function name. If not, I switch to Text mode and just start reading backwards from the current cursor position to find a function (and, given the space between the function name and the offending code, roughly how far into the function the problem is). Armed with that it is time to look at the function. If nothing stands out in the code, it will be fun and games with DDT and single-stepping… ;-) |
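Reading backwards for a function name can be automated if the binary was built with names embedded. The Python sketch below assumes the usual APCS convention: the name is NUL-padded to a word boundary and followed by a marker word 0xFF0000nn, where nn is the padded name length, with the function entry immediately after. That convention is an assumption about your build, and `find_function_name` and the sample image are hypothetical.

```python
import struct


def find_function_name(image, offset):
    """Scan backwards from a byte offset for an embedded function name.

    Returns (name, entry_offset), or None if no marker word is found.
    """
    pos = min(offset, len(image) - 4) & ~3   # stay on word boundaries
    while pos >= 4:
        (word,) = struct.unpack_from("<I", image, pos)
        length = word & 0xFF
        if ((word & 0xFFFFFF00) == 0xFF000000 and length
                and length % 4 == 0 and length <= pos):
            name = image[pos - length:pos].rstrip(b"\0")
            # Accept only plausible identifiers (printable ASCII, no spaces).
            if name and all(0x20 < c < 0x7F for c in name):
                return name.decode("ascii"), pos + 4
        pos -= 4
    return None


# Tiny made-up image: padded name, marker word, then four stand-in
# instruction words for the function body.
name = b"draw_line\0\0\0"                     # 12 bytes after padding
marker = struct.pack("<I", 0xFF000000 | len(name))
code = struct.pack("<4I", 0xE1A0C00D, 0xE92DD800, 0xE24CB004, 0xE3A00000)
image = name + marker + code

print(find_function_name(image, len(image) - 4))
```

Here the scan starts at offset 28 and finds `draw_line` with its entry at offset 16, so the offending instruction is 12 bytes into the function.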
David Pilling (401) 41 posts |
Thanks. Very helpful. In the past I’ve often used the dump of function calling sequence when a program crashes – stack back trace – whatever. However that has never been particularly reproducible and does not seem to be working in my current environment (don’t let me mislead you: I’m not on the zeropage version of RISC OS yet). |
David Pilling (401) 41 posts |
Wow, after another hour or two I found my first zero page access bug on RISC OS. By then the “post mortem” dump, calling sequence, back trace had started to work, I was more inclined to believe it than what the symbol map implied, although they did agree. <advert> Now I can save symbol maps, my idea is to make available a version of Ovation Pro that users can report zero pain address values for. Sense (always in short supply) suggests writing a ‘sort’ command line for the symbol maps before wasting another minute searching the raw version. |
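A ‘sort’ for the symbol map could be as small as the following Python sketch. It assumes each useful line ends with a hexadecimal address, which is a guess at the layout rather than a description of the linker’s actual output; `sort_by_address` is a hypothetical helper.

```python
def sort_by_address(lines):
    """Return the lines that end in a hex address, lowest address first.

    Lines whose last field is not a hex number (headers, comments) are
    dropped. A header ending in a word made only of hex letters would
    slip through, so check the result by eye.
    """
    keyed = []
    for line in lines:
        fields = line.split()
        if not fields:
            continue
        try:
            keyed.append((int(fields[-1], 16), line.rstrip("\n")))
        except ValueError:
            pass  # no address in the last field
    return [line for _, line in sorted(keyed)]


# Made-up map contents for illustration.
raw = ["render_page 9d40\n", "main 8000\n", "Symbols:\n", "load_file 9a10\n"]
for line in sort_by_address(raw):
    print(line)
```

Once the map is in address order, finding the function for a reported address is a matter of reading down to the last entry that does not exceed it.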
Matthew Phillips (473) 721 posts |
I’m sure that most of the programmers on this thread are well versed in reading assembly language and tracing it back to the C source, but just for the record, here are a few techniques we have found useful over the last few weeks when looking into ZeroPain bugs. Assuming you have identified the function that the problem is in, any BL instructions within the function are good markers, provided they branch to addresses within the application address space. Look up the address and (providing you have function names compiled into the runimage) you’ll be able to identify which function is being called at that point and identify the location in the source of your problematic function. Remember that in APCS the first four parameters of a function call will be passed in R0 to R3, and an integer result will be returned in R0. That can help you near the start of a long function working out what is going on. We had a couple of cases where the ZeroPain error reported was in the Shared C Library module. It wasn’t a fault in the CLib, but a fault in the way we had called it, for example calling strchr with a NULL pointer rather than a pointer to a real string. To track these back we used Zap to grab the Shared C Library module from memory and worked out where the function probably started. Then we grabbed our own application using Zap (menu on iconbar → Create → Read memory) which will differ from the !RunImage file because the running task will have the CLib stubs initialised and pointing to the right locations in the module. Then it was a matter of working back from the C Library module address and searching for where that was called from. This then allowed us to identify which function was the problem and eventually find the faulty call. If there’s a better way to do that, it would be nice if someone can explain! |
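If you are working from a raw word dump rather than a disassembler, the destination of a BL can be computed by hand. The Python sketch below decodes the classic ARM B/BL encoding; `bl_target` is a hypothetical helper name, and it deliberately ignores BLX and Thumb code.

```python
def bl_target(instr_addr, instr_word):
    """Return the destination of an ARM B/BL instruction, or None if
    the word is not a branch.

    Bits 27-25 are 101 for B and BL. The low 24 bits hold a signed
    word offset, relative to the instruction address plus 8.
    """
    if (instr_word >> 25) & 0b111 != 0b101:
        return None
    offset = instr_word & 0x00FFFFFF
    if offset & 0x00800000:           # sign-extend the 24-bit field
        offset -= 0x01000000
    return (instr_addr + 8 + (offset << 2)) & 0xFFFFFFFF


# A BL at &8010 whose offset field is &1A branches to &8080.
print(hex(bl_target(0x8010, 0xEB00001A)))
```

The result can then be looked up in the symbol list, or in the running image, to see which function is being called at that marker.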
David Pilling (401) 41 posts |
I’ve found that a quick technique is to look up the error address in the linker -MAP output, which gives me a file. Then generate an assembler version of the file using the compiler -s option. Finally, match the code around the error in the disassembly against the compiler-generated assembler. Anyway, can you help me with my homework… Someone has sent me the following – am I right to assume these errors are not due to SparkFS because they’re in the ROM?
Time: Thu Aug 6 16:44:51 2015
R0 = 00000001 R1 = 80000113 R2 = 60000000 R3 = fa20783c
fc026b10 : e3a0c000 : MOV R12,#0
Time: Thu Aug 6 16:44:51 2015
R0 = 00000001 R1 = 80000113 R2 = 60000000 R3 = fa2078ec
fc026b10 : e3a0c000 : MOV R12,#0 |
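When users send in reports like this, a few lines of script can pull out the register values and the quoted instruction for comparison. This Python sketch assumes only the layout visible in the excerpt above (`Rn = value` pairs and `address : word : disassembly` lines); a full ZeroPain log may contain more than this, and `parse_report` is a hypothetical helper.

```python
import re

# Patterns based solely on the layout of the excerpt quoted in the post.
REGISTER = re.compile(r"\bR(\d{1,2})\s*=\s*([0-9a-fA-F]{8})\b")
CODE_LINE = re.compile(r"\b([0-9a-fA-F]{8})\s*:\s*([0-9a-fA-F]{8})\s*:\s*(.+)")


def parse_report(text):
    """Return ({register number: value}, [(address, word, disassembly)])."""
    registers = {int(n): int(v, 16) for n, v in REGISTER.findall(text)}
    code = [(int(a, 16), int(w, 16), d.strip())
            for a, w, d in CODE_LINE.findall(text)]
    return registers, code


report = """Time: Thu Aug 6 16:44:51 2015
R0 = 00000001 R1 = 80000113 R2 = 60000000 R3 = fa20783c
fc026b10 : e3a0c000 : MOV R12,#0
"""
registers, code = parse_report(report)
print({f"R{n}": f"{v:08x}" for n, v in registers.items()})
print([(f"{a:08x}", f"{w:08x}", d) for a, w, d in code])
```

Having the values as numbers makes it easy to spot suspicious ones, such as a pointer register holding 1.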
Rick Murray (539) 13840 posts |
Not necessarily. RISC OS doesn’t do much in the way of sanity checking, so the first step is to find out what (in ROM) is being called, and then check to see if you are calling it with a null pointer. It’s a shame ZeroPain doesn’t look for a valid backtrace – might make things easier… |
Rick Murray (539) 13840 posts |
Okay. I have no idea what machine you are using, but since the code at that address in my ROM doesn’t look anything like that, I had to take another approach. There are 140 instances of “ReadUnsigned” in my ROM. One of them (luckily closer to the top than the bottom) is followed by CMP R12,#0 then RSBMI something. With this in mind, switching to text view, I can see two messages about Return code just above. It appears to be kernel code, but early kernel code as it isn’t part of UtilityModule.
You (or something on your behalf) has called GSINIT (to expand a string?), but R0 (pointer to string) is one (1?) for some reason, leading to what is effectively trying to load a byte into R1 from address &1. Um. ;-) Hope this helps. |
David Pilling (401) 41 posts |
Thanks Rick, I’ll investigate further. The person who sent me this SparkFS problem, also sent an Ovation Pro one which was caused by an applet passing a null value for a view pointer. |
David Pilling (401) 41 posts |
The GSInit problem would seem to be caused by both ResFind – which is a public domain Basic program for finding international resources and the SparkFS filer !RunImage. Neither of these call GSInit direct. I’m going to take the bold step of installing the zero page ROM on a computer and do more work from there… |
Rick Murray (539) 13840 posts |
Hi, If it helps, I have written this… http://www.heyrick.co.uk/software/resfinder/ |
Frederick Bambrough (1372) 837 posts |
Be sure to tie string around the bottom of your trouser legs first! |
Raik (463) 2061 posts |
The latest version of ResFind is 4.0, available via gag.de. The version in SparkFS I use (1.43) has 2.01b inside. |
Rick Murray (539) 13840 posts |
Ewww! 8-o |
David Pilling (401) 41 posts |
Thanks for the pointers to ResFind. It’s not a simple situation. I’ve got SparkFS running on the zeropage RISC OS with no problems. It also runs on Iyonix with Prot1K installed without errors. There must be something unusual about the set up of the person who reported this. Zeropain works nicely. Pleasant to use RISC OS on the Pi after Windows 10… |
David Pilling (401) 41 posts |
Ah just got an email – user says “HOWEVER, I think I have solved the problem ! Virus protection software on RISC OS 8-) |
Martin Avison (27) 1494 posts |
I have been trying to work my way back to the root cause of a ZeroPain report at Offset 000130c0 in module SharedCLibrary, which is obviously caused by a program I maintain. I followed Matthew Phillips’s hints in his post above, and found two jump tables in the running load module which I assume are the Clib stubs. What is the best way to locate these in a load module? The tables I found were sets of LDR PC,addr and the addr contains addresses FC1xxxxx. There were 48 entries in the first set, and 185 in a second set immediately after. Is there a list anywhere of the function for each of these jumps? I extracted all the addresses and targets, sorted them into target sequence, and found the address corresponding to target offset 130c0, and looked for that in the running load module. However, there were 75 BL or B to that address! I suppose Clib routines are low level and therefore can be well used. But how can I make progress with this ZeroPain? I am aware that I can kill the ZP module and force an abort, but the program does not give a traceback, and anyway it would be aborted before it reached this Clib ZP problem. How can a traceback be added? I have made considerable progress with ZP problems in Basic and assembler modules, and in the C program itself, but as a newcomer to using C, I am now floundering. All clues welcome! |
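Extracting the addresses and targets from such a table can be scripted. The Python sketch below decodes consecutive `LDR PC,[PC,#imm]` entries from a memory grab and reads the literal each one loads. It assumes the table really is built from that instruction form with the literals inside the grabbed region; `read_stub_table`, the sample layout and the first target value are made up for illustration.

```python
import struct


def read_stub_table(memory, base_addr, table_addr, count):
    """Decode up to `count` consecutive 'LDR PC,[PC,#imm]' entries.

    memory     -- bytes grabbed from the running task
    base_addr  -- address that memory[0] corresponds to
    table_addr -- address of the first table entry
    Returns a list of (entry_index, entry_address, target_address).
    """
    entries = []
    for index in range(count):
        addr = table_addr + 4 * index
        (word,) = struct.unpack_from("<I", memory, addr - base_addr)
        # Ignore the condition field and the U (add/subtract) bit.
        if word & 0x0F7FF000 != 0x051FF000:
            break  # not LDR PC,[PC,...]: assume the table has ended
        imm = word & 0xFFF
        if not word & 0x00800000:     # U bit clear: offset is subtracted
            imm = -imm
        literal = addr + 8 + imm      # PC reads as instruction + 8
        (target,) = struct.unpack_from("<I", memory, literal - base_addr)
        entries.append((index, addr, target))
    return entries


# Two made-up entries at &8000 and &8004, literals at &8008 and &800C.
memory = struct.pack("<4I", 0xE59FF000, 0xE59FF000, 0xFC14C000, 0xFC14C1EC)
for index, addr, target in read_stub_table(memory, 0x8000, 0x8000, 4):
    print(f"entry {index} at &{addr:08X} -> &{target:08X}")
```

The entry address is what B and BL instructions in the application branch to, while the target is the address inside the module, so the pair links a ZeroPain address back to callers in your own code.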
Rick Murray (539) 13840 posts |
That’s right. The smaller set are the kernel (low level) functions, and the larger set are the CLib functions. For the kernel functions, you will want k_entries followed by k_entries2 here: https://www.riscosopen.org/viewer/view/castle/RiscOS/Sources/Lib/RISC_OSLib/kernel/s/ For the CLib functions, you will want cl_entries followed by cl_entries2 here: https://www.riscosopen.org/viewer/view/castle/RiscOS/Sources/Lib/RISC_OSLib/clib/s/ – it’s an insanely long list, as you have noticed.
Something that is called a lot then… It’s a shame you gave an offset rather than “entry #123 in the bigger table”.
Yes. But note that CLib doesn’t do much (if any) validation of input, so if something passes a NULL pointer where it shouldn’t, CLib will happily try to deal with it and blow up in the process.
Why not? Take a look to see if something at startup is calling
?
What library are you using? Some have code to output a nice backtrace, otherwise the code to do it is a bit unpleasant. If you’re only looking at a quick hack, then this will sort-of-work (but will trash the stack so it isn’t a recommended method):
It doesn’t appear that the nicer postmortem is directly available from CLib. But this shouldn’t matter, with signal handling removed, CLib ought to give a backtrace automatically. For example:
will compile. And run. And do this:
&80B0 is the instruction after the faulty one. This, the backtrace, should be automatic unless it has been disabled. Note – by the way – that you will need to compile the program with function names embedded if you want the backtrace to be readable. If you can find the line in the MakeFile that calls cc, check that it is not passing the |
David Pilling (401) 41 posts |
What Rick is hinting at is that applications very often do have signal handlers – you don’t want someone hitting Escape terminating your wimp task. There’s also some messing about that has to go on to handle terminating printing. If it is a simple command line C program then you don’t have a problem – the backtrace is very useful. |
Martin Avison (27) 1494 posts |
Thanks Rick for the clues about function names: I now think that +130c0 is at address FC14c210, and is in Clib strcmp, which starts at Clib+1309c and is entry 46 in the Clib table. I can find 73 ‘strcmp(’ in the source, so it seems very close to the 75 B/BL to 7D258. Is there no easy way to find the jump tables in the program memory? I can find no signal( in the source, so I need to re-check that there is no traceback. It uses RISC_OSLib, so maybe that does something? The program does have function names, so that is not an issue. With ZP killed the program will abort before the Clib ZP error, because there are other unresolved ZP bugs before then! I will investigate the backtrace asap, as it seems it will be needed before I can track down which strcmp call needs fixing. Which means that any previous ZP problems will need fixing first. :-(( Thanks for the help. |
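The arithmetic behind those figures is worth writing down, since the same three numbers give the module base, the address of strcmp and how far into it the fault lies. This Python sketch uses only the values quoted in the post above; the variable names are illustrative.

```python
fault_addr = 0xFC14C210     # address the report corresponds to
fault_offset = 0x130C0      # reported offset within SharedCLibrary
strcmp_start = 0x1309C      # offset of strcmp within the module

module_base = fault_addr - fault_offset
strcmp_addr = module_base + strcmp_start
into_strcmp = fault_offset - strcmp_start

print(f"module base  &{module_base:08X}")
print(f"strcmp entry &{strcmp_addr:08X}")
print(f"fault is {into_strcmp} bytes ({into_strcmp // 4} instructions) into strcmp")
```

If those figures are right, the module base is &FC139150, strcmp is entered at &FC14C1EC, and the faulting instruction is 36 bytes (nine instructions) into strcmp.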
Rick Murray (539) 13840 posts |
You can check, but it isn’t easy. Both Zap and StrongEd can extract a task’s WimpSlot. From there, you’ll be looking at a wall of gibberish… Everything you’ll need will be in the task’s slot, but doing it that way is akin to making a cake by first sowing wheat.
Okay then. Now the next stage – what is causing the ZeroPain? You won’t be executing all of them at the same time. Is it loading the program? Saving some data? Searching? Think about which particular activity is making this issue apparent. That is the code path you’ll want to investigate. The strcmp function has various permutations depending on whether or not the start addresses are word aligned. In its most basic, word-aligned version, it looks like this:
So I’m sure you can understand how reading a word from
Well, that would need to be done, one way or another, yes. ;-) Do you have any debug code in your program? I link my programs into the DADebug module, so I can spit out pertinently placed debug information, and then type
Just to repeat myself – before you do that, think about what the program is doing at the point of the ZeroPain report. The point at which this strcmp is failing. Some ought to stand out as being likely candidates. By the way – the ZeroPain report – what is recorded in R14? Remember, R14 is the return address, so there is a chance that it might point back to the calling code, in application space. |
Steve Fryatt (216) 2105 posts |
I’d imagine that Martin would be using Reporter… (but yes, the same point applies). |
Martin Avison (27) 1494 posts |
Oh yes of course – but then I would :-)) Lets me watch it as it happens, in real time. No extra commands to run! I now have a program which uses a C Wimp task to interrogate and find the Clib jump tables, and extract them all with the Clib offsets and function names etc. For example… If this may be any use to anyone, send me a PM. It does need ArmSort, and uses Reporter (but does not have to). Crude, but it seems to work.
Yup. Already looked at them. For these particular Clib issues it is 1010101 … which is probably not a useful address :-(( I did try running the program with ZP killed … it locked the machine up pretty quickly! It was on my development machine, so not clean. I will try on my (clean) RPi when I get chance. |
Rick Murray (539) 13840 posts |
Check you aren’t calling a file operation with a duff (NULL?) handle. I have found that bad handles to functions like fread() will stiff the machine – absolutely horrendous if you ask me, but there you go… :-/ And here is a recount of my recent ZeroPain. Luckily the offending code was inlined into the program by the compiler so it wasn’t in CLib, though I wasn’t expecting this… ;-) |