A new disassembler for the Debugger module

55 posts, 13 voices

Dec 10, 2010 2:11pm Jeffrey Lee (213) 6048 posts	Chances are that I’m going to have a couple of days spare over Christmas where I don’t have access to my full kit, so I’m thinking of using that time to start work on a new disassembler for the Debugger module. The main aim of the rewrite would obviously be to add support for all ARMv7 instructions (including Thumb2 and ThumbEE), but there are also a few other things I had in mind:
- The new disassembler would be written in C, using decgen to do all the grunt work. This’ll make it much easier to write/maintain/test.
- Add support for producing disassembly from the perspective of different architectures/CPUs. To be honest I’m not really sure how useful people would find that feature, but since the instruction decoder will be machine generated it’s a feature that’ll be very easy to implement.
- Update the list of disassembly warnings to detect all the stuff that the ARMv7 ARM warns against doing, and anything else that I can think of (e.g. it would be nice if it could detect some of the code sequences mentioned in the ARMv7 compatibility primer).
- Allow disassembly to be produced in UAL or non-UAL form.
- Although the existing SWI interface will continue to work, I’m thinking that a new set of SWIs will be needed. Apart from allowing you to disassemble Thumb2 & ThumbEE, these new SWIs could also provide control over the other features of the disassembler (register naming conventions, UAL/non-UAL, architecture version, etc.). Since the current SWI interface only disassembles one instruction at a time, the module currently uses some workspace variables to remember what the last instruction(s) were, to help with detecting bad instruction sequences. But the new SWI interface will provide a proper mechanism for the module to store the disassembly state, so everything will work properly if two things try disassembling code at once.
Are there any other features that people would like to see?
I’m a bit worried that the new disassembler will be too big to fit into some of our existing ROM images, but if that happens then I’m sure there are at least a couple of tricks that can be used to get it down to size.
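
The point about storing disassembly state per client rather than in shared module workspace can be illustrated with a minimal C sketch. Everything here is hypothetical (`dis_state`, `dis_state_init` and `dis_step` are invented names, not part of any Debugger SWI interface), and the "repeat" check merely stands in for a real bad-sequence test.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-client state: remembers the previously seen word. */
typedef struct {
    bool     have_prev;
    uint32_t prev;
} dis_state;

/* Reset a state before starting a new disassembly run. */
static void dis_state_init(dis_state *s)
{
    s->have_prev = false;
    s->prev = 0;
}

/* Record 'insn' and report whether it repeats the previous word seen
 * through this same state. A real sequence check would be more involved;
 * this only shows that the history lives in caller-owned storage. */
static bool dis_step(dis_state *s, uint32_t insn)
{
    bool repeat = s->have_prev && s->prev == insn;
    s->have_prev = true;
    s->prev = insn;
    return repeat;
}
```

Because each caller passes its own `dis_state`, two clients that interleave their calls do not disturb each other's history, which is the property shared workspace variables cannot give.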

Dec 10, 2010 5:03pm W P Blatchley (147) 247 posts	This sounds great. I haven’t got any particular feature requests for the disassembler itself, but have you considered (I expect you have) incorporating some of the ideas from Darren Salt’s Debugger Plus? He had some excellent ideas in there. I’m not sure how many are already in the current Debugger module. And are you going to be touching the *memoryX commands, or just sticking strictly to the disassembler? I’d like to see a new command or an extension to existing ones to allow a read-modify-write sequence from the command line. If you could supply a mask along with the value to write, that would be really useful for quick-hacking things from the command line, especially combined with the “P” option. Just a couple of thoughts!
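
The masked read-modify-write being asked for boils down to one expression. A minimal C sketch follows; the function names are illustrative only and do not correspond to any existing *memory command or Debugger code.

```c
#include <stdint.h>

/* Return 'old' with only the bits selected by 'mask' replaced from 'value'. */
static uint32_t merge_masked(uint32_t old, uint32_t value, uint32_t mask)
{
    return (old & ~mask) | (value & mask);
}

/* Read-modify-write one word in place. 'volatile' because the target may
 * be a hardware register rather than ordinary RAM. */
static void rmw_word(volatile uint32_t *addr, uint32_t value, uint32_t mask)
{
    *addr = merge_masked(*addr, value, mask);
}
```

For example, writing 0x00001234 with mask 0x0000FF00 to a word holding 0xFFFF0000 changes only bits 8 to 15 and leaves 0xFFFF1200.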

Dec 10, 2010 5:30pm Jeffrey Lee (213) 6048 posts	> have you considered (I expect you have) incorporating some of the ideas from Darren Salt’s Debugger Plus?
I haven’t even heard of it before! I’ll have to take a look some time. At the moment I’m focusing solely on the disassembler, but I might take a look at a few other things in the future. In fact there is one feature that I’d find useful – at the moment there’s a bug/mistake in the page table settings that the OMAP port uses, which means that pages which should be fully read-only can still be written to from privileged modes. And for the past year I’ve been unwittingly exploiting this bug, by using the debugger module to set breakpoints in the ROM image (I thought it was just the case that the debugger was smart enough to unprotect the pages when setting breakpoints). So once the bug in the page table settings gets fixed, it would be nice if the debugger did gain the ability to unprotect read-only pages when setting breakpoints, as setting breakpoints inside the ROM image is something I often end up doing.

Dec 10, 2010 6:00pm W P Blatchley (147) 247 posts	> I haven’t even heard of it before! I’ll have to take a look some time.
The easiest way to find it and its documentation is in a distribution of !Zap. Look inside Code.Extensions.ExtAsm in the main application directory.

Dec 11, 2010 8:13am Sprow (202) 1158 posts	> I’m a bit worried that the new disassembler will be too big to fit into some of our existing ROM images, but if that happens then I’m sure there are at least a couple of tricks that can be used to get it down to size.
I suspect 90+% of desktop users have never used the Debugger module, at least not at the command line. How about having a slim version in the ROM which supports just the Debugger_Disassemble SWI (so Zap and StrongEd disassembly mode work) and the various MEMORY star commands? Because it’s in ROM you’d also know which instruction sets to support and code sequence warnings to enable – no need for Thumb 1 on a Cortex that can’t run that code anyway. Then, a developer (‘pro’) version supplied on disc which supports all the instruction sets and extra options. The disc version can afford to be much larger.

Dec 12, 2010 8:21pm W P Blatchley (147) 247 posts	Found a proper link for DebuggerPlus here.

Dec 12, 2010 9:53pm Jeffrey Lee (213) 6048 posts	A few of the things in Debugger Plus look good (e.g. using DCD for undefined instructions, quoting SWI names, HS & LO instead of CS & CC, etc.), so I’ll definitely try to incorporate those. But I don’t think I’ll add support for anything that involves detecting instruction sequences (e.g. ADRL, LDRL – it just doesn’t make sense to support them if there’s no way of suppressing the output of the first instruction of the pair), and I don’t think I’ll add support for ADRW/LDRW/etc. (I don’t see how they’re any better than just showing R12 + offset). I suspect it won’t be possible to reuse the Debugger Plus SWI interface, so to avoid compatibility issues I’ll make sure that the new Debugger SWIs don’t clash with the Debugger Plus SWIs (especially since we only need 19 more checkins before we hit Debugger 2.00 and we start getting version number conflicts).
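
The HS/LO versus CS/CC choice is purely a naming option over the same condition field (bits 31:28 of an ARM instruction word). A minimal sketch in C, where `cond_name` and its `hs_lo` flag are hypothetical names rather than anything taken from the Debugger or Debugger Plus sources:

```c
#include <stdbool.h>
#include <stdint.h>

/* Return the condition-code name for an ARM instruction word.
 * Bits 31:28 hold the condition; 'hs_lo' selects HS/LO over CS/CC.
 * Note: on later architectures 0b1111 marks the unconditional
 * instruction space rather than NV; that is ignored here. */
static const char *cond_name(uint32_t insn, bool hs_lo)
{
    static const char *const names[16] = {
        "EQ", "NE", "CS", "CC", "MI", "PL", "VS", "VC",
        "HI", "LS", "GE", "LT", "GT", "LE", "AL", "NV"
    };
    uint32_t cond = insn >> 28;
    if (hs_lo && cond == 2) return "HS";
    if (hs_lo && cond == 3) return "LO";
    return names[cond];
}
```

A real disassembler would normally print nothing for AL; the table keeps it only so that every field value has an entry.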

Dec 12, 2010 11:28pm Steve Revill (20) 1361 posts	I think technically, you should use DCI for undefined instructions. Also, did you spot Debugger$Options in the standard Debugger module and the facility it provides to disassemble with APCS register names rather than ARM and/or replace R13/SP, R14/LR, etc?

Dec 13, 2010 1:11am Jeffrey Lee (213) 6048 posts	> I think technically, you should use DCI for undefined instructions.
Yes, you’re probably right!
> Also, did you spot Debugger$Options in the standard Debugger module and the facility it provides to disassemble with APCS register names rather than ARM and/or replace R13/SP, R14/LR, etc?
Yes – I’ll be keeping the current options and just expanding Debugger$Options to cope with the new ones.

Dec 13, 2010 8:23am Terje Slettebø (285) 275 posts	> Chances are that I’m going to have a couple of days spare over Christmas where I don’t have access to my full kit, so I’m thinking of using that time to start work on a new disassembler for the Debugger module.
That’s cool. As I understand, Ben Avison wrote the improved DecAof, so he might be able to help with any issues regarding the disassembler. Also, you might find it useful for testing to run the extASM test files (which cover all the ARMv7 instructions and their variations, except Thumb) with the new Debugger. I’ve tested all these files with DecAof, which produces the equivalent disassembly, so it should be a good test. I think it’s good that this will be implemented in C (or some other high-level language), as the more components of RISC OS we get implemented in high-level languages (or at least platform-independent ones), the easier it will make maintenance and porting.

Dec 13, 2010 1:54pm Jeffrey Lee (213) 6048 posts	Thanks for reminding me about DecAof – it would probably be good if I checked the output against that. To start with I’ll probably just disassemble a couple of ROM images using the old & new debugger, and use that to make sure all the main features are correct (disassembly, formatting, address generation for PC-relative instructions, etc.) Then, assuming there aren’t any big differences in formatting, I can try setting up a testbed which will disassemble all instructions and compare them against DecAof. This should help track down any remaining formatting problems or issues with instructions being incorrectly classified. I’ve got a fair amount of faith that decgen will classify instructions correctly (the encoding files that it will use have already been machine-checked for ambiguities), but it’s always possible that there’s a bug that I haven’t found. Or alternatively this test might reveal a bug in DecAof – which is quite possible, since I think DecAof is still using a hand-crafted decoder. Or at least I can be fairly certain that it’s not using decgen, since until I made the new release on Saturday there were a couple of simple bugs that would have prevented most decoders from compiling ;)
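
As an illustration of the "address generation for PC-relative instructions" check, here is a minimal C sketch for the simplest case, an ARM-state B/BL. The function name is made up for this example; the arithmetic follows the architectural definition (24-bit immediate, sign-extended, times four, relative to the instruction address plus 8).

```c
#include <stdint.h>

/* Target address of an ARM-state B/BL located at 'addr'.
 * The 24-bit immediate is sign-extended, multiplied by 4, and added to
 * the PC value seen by the instruction, which is addr + 8. */
static uint32_t arm_branch_target(uint32_t addr, uint32_t insn)
{
    uint32_t imm24 = insn & 0x00FFFFFFu;
    /* Sign-extend from bit 23 using unsigned (wrapping) arithmetic only. */
    uint32_t offset = (imm24 ^ 0x00800000u) - 0x00800000u;
    return addr + 8u + (offset << 2);
}
```

With this, the word 0xEAFFFFFE at address 0x8000 decodes as a branch to 0x8000, i.e. the familiar branch-to-self.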

Dec 13, 2010 2:57pm Steve Revill (20) 1361 posts	On the subject of Ben’s work on DecAOF, you should look at this announcement and the associated wiki page.

Mar 7, 2011 2:07pm Jeffrey Lee (213) 6048 posts	Just in case anyone’s wondering what happened to this – although I didn’t get a chance to start work on it over Christmas, I did start work on it this past weekend. The pace of development will most likely be governed by how often I have to keep halting my work on the Portable module while waiting for the Touch Book to charge/discharge. But if the progress I made this weekend is any indication then I’d expect to be able to produce the first working ARM-only disassembler sometime this week. Most likely it’ll be sometime next month when the code is in a releasable state (all instructions/instruction sets supported, no major disassembly errors).

Mar 7, 2011 4:57pm W P Blatchley (147) 247 posts	> Just in case anyone’s wondering what happened to this
I was wondering – but just figured you’d been too busy.
> The pace of development will most likely be governed by how often I have to keep halting my work on the Portable module while waiting for the Touch Book to charge/discharge. But if the progress I made this weekend is any indication then I’d expect to be able to produce the first working ARM-only disassembler sometime this week.
So you’re saying you’ll be developing a disassembler, in the ‘gaps’ between working on the Portable module, all in your spare time, and probably have something working within a week… Holy cow, Jeffrey! It’s like Jack Bauer makes regular mortals feel when they watch 24! Ahem! Anyway, looking forward to it…and good luck. Going dark now. (Sorry for the OT chatter to those who don’t like it!)

Mar 9, 2011 1:08pm Steve Revill (20) 1361 posts	> (Sorry for the OT chatter to those who don’t like it!)
Keep it up – personally, I like our forums to have a bit of personality…

Mar 10, 2011 2:09pm Jeffrey Lee (213) 6048 posts	I managed to produce the first working disassembler on Tuesday, and spent last night focusing on creating a testbed app to allow me to compare the output against both the Debugger module and DecAOF. So the next stage is to add the coprocessor instructions and tweak the output formatting so I can compare the disassembly of real programs. Size-wise the disassembler is currently at about 95K, which is larger than DecAOF (about 70K, IIRC). Once I’ve added the Thumb and coprocessor instructions it wouldn’t surprise me if the whole thing is around 250-300K in size. There are a couple of bits I can tweak to reduce the size, but nothing that will produce any major savings, unless I was to completely change the way everything works. It would effectively involve replacing the machine-generated decoder with a hand-crafted one, which would obviously eliminate all the benefits of using a machine-generated decoder in the first place. Maybe I’ll be able to make some improvements to decgen to result in smaller code being produced, and maybe a newer version of Norcroft will help (i.e. one that supports the UBFX and SBFX instructions), but for now I think we’ll be stuck with a rather bulky disassembler.
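
The reason UBFX/SBFX support could matter for code size is that a decoder extracts a lot of bitfields, and without those instructions each extraction is typically a shift plus a mask (or two shifts). A minimal C sketch of the two operations, written as plain helper functions; these are illustrations, not code from decgen or the disassembler:

```c
#include <assert.h>
#include <stdint.h>

/* Unsigned bitfield extract: 'width' bits of 'x' starting at bit 'lsb'. */
static uint32_t ubfx(uint32_t x, unsigned lsb, unsigned width)
{
    assert(width >= 1 && lsb + width <= 32);
    uint32_t mask = (width == 32) ? 0xFFFFFFFFu : ((1u << width) - 1u);
    return (x >> lsb) & mask;
}

/* Signed bitfield extract: as above, sign-extended from the field's top
 * bit. The final cast assumes the usual two's complement conversion. */
static int32_t sbfx(uint32_t x, unsigned lsb, unsigned width)
{
    uint32_t field = ubfx(x, lsb, width);
    uint32_t sign = 1u << (width - 1);
    return (int32_t)((field ^ sign) - sign);
}
```

For instance, `ubfx(0xE1A01002, 12, 4)` pulls out the Rd field of `MOV R1,R2`, giving 1. A compiler targeting ARMv6T2 or later can turn such a helper into a single instruction.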

Mar 13, 2011 10:37pm Jeffrey Lee (213) 6048 posts	Looks like I’ll need to do some more work on decgen to make this disassembler a reality. Although it coped fine with the ARM, FPA & VFP instruction sets, when I just tried adding the NEON instructions it all fell apart. Tree generation time went from a couple of minutes to over an hour, and the Wimpslot went from a couple of MB up to 31MB. And then it crashed while trying to shrink the tree. Apart from the bug that caused the crash, and a piece of code that’s in desperate need of optimisation (it’s the cause of most of the hour-long delay), I think the big problem is how decgen handles itself when it’s generating a subtree which has to detect the difference between just two different encodings (i.e. instructions). The current tree representation only allows each node to perform one test, so even if there are only two possible outcomes of the subtree (encoding A or encoding B) it often has to generate several nodes in order to correctly classify each input value. Before I added the NEON instructions I think it was generating a tree with around 17,000 nodes (which would then get shrunk to a few hundred) – but now it’s generated a tree containing 205,000 nodes!
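
To see why a one-test-per-node tree can need several nodes even when only two encodings are in play, consider this minimal C sketch. It assumes each inner node tests a single bit, which may not be how decgen's nodes actually work; the node layout and names are invented for the example.

```c
#include <stdint.h>

/* One node of a hypothetical decode tree. An inner node tests a single
 * bit of the instruction word; a leaf names the encoding that matched. */
typedef struct node {
    int bit;                      /* bit to test, or -1 for a leaf */
    int encoding;                 /* leaf only: encoding identifier */
    const struct node *child[2];  /* inner only: subtree for bit = 0 / 1 */
} node;

/* Walk the tree from 'n' until a leaf is reached. */
static int classify(const node *n, uint32_t insn)
{
    while (n->bit >= 0)
        n = n->child[(insn >> n->bit) & 1u];
    return n->encoding;
}

enum { ENC_A = 1, ENC_B = 2 };

/* Encoding A is "bits 1 and 0 both set"; B is everything else. There are
 * only two outcomes, yet two test nodes are needed to separate them. */
static const node leaf_a = { -1, ENC_A, { 0, 0 } };
static const node leaf_b = { -1, ENC_B, { 0, 0 } };
static const node test0  = {  0, 0, { &leaf_b, &leaf_a } };
static const node root   = {  1, 0, { &leaf_b, &test0 } };
```

Here `classify(&root, 3)` yields `ENC_A` and any word with bit 0 or bit 1 clear yields `ENC_B`. Sharing `leaf_b` between both test nodes is a small-scale version of shrinking the tree after generation.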

Mar 14, 2011 11:55pm Steve Revill (20) 1361 posts	Ouch. Is the tree generation a run-time step of the disassembler, or a step in the decgen process which produces the decoder engine of the disassembler? I think you’re saying it’s the latter so the performance issue isn’t as much of an issue but it still sounds like some major architectural problems there. :( In many ways, Ben has been having a similar kind of fun adding lots of stuff that does the opposite in objasm… And I just spent three or four days adding a clever bit of algorithm to objasm’s opcode identifying function only to find it saves just 2.5% of the build time for an OMAP ROM’s worth of assembler modules. Bummer. On the plus side, we’re hoping it won’t get any slower with support for all the weird and wonderful new instructions added.

Mar 15, 2011 10:32am Jeffrey Lee (213) 6048 posts	> Ouch. Is the tree generation a run-time step of the disassembler, or a step in the decgen process which produces the decoder engine of the disassembler?
The latter.
> I think you’re saying it’s the latter so the performance issue isn’t as much of an issue but it still sounds like some major architectural problems there. :(
Luckily it wasn’t too hard to solve most of the issues. The crash was a simple null pointer dereference, the hour-long wait was due to a simple algorithm which was implemented as O(N²) instead of O(N log N), and the memory usage was solved by performing some of the tree optimisation steps while generating the tree instead of just at the end. So now it takes about 15 minutes and 10.5MB of RAM. 15 minutes is still quite a long build time, especially since the decision tree needs rebuilding as soon as you make any changes to the input files – even if those changes won’t affect the decision-making process. I’m hoping I’ll be able to get it back down to around the two/three minute mark by tweaking the code that decides what type of test to perform next (i.e. which node it should insert into the tree next). Basically at the moment it only scores each candidate node once it knows (roughly) which encodings each child of that node will have to match. But by working out the upper bounds of the node scoring function, it should be possible for me to make it score the node at several points during the evaluation process, discarding the candidate as soon as the upper bound of its potential score drops below the score of the current best candidate.
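
The upper-bound idea described above is a form of branch-and-bound pruning. The sketch below shows the general shape in C under heavy simplifying assumptions: a candidate's score is taken to be a plain sum of `PARTS` sub-scores, each at most `PART_MAX`. None of this reflects decgen's real scoring function; all names and constants are invented.

```c
#include <stddef.h>

#define PARTS    4   /* sub-scores per candidate (hypothetical) */
#define PART_MAX 10  /* each sub-score lies in 0..PART_MAX */

/* Pick the best-scoring candidate. A candidate's score is the sum of its
 * sub-scores; evaluation stops early once even perfect remaining
 * sub-scores could not beat the best total seen so far.
 * '*evaluated' counts how many sub-scores were actually examined. */
static size_t pick_best(const int scores[][PARTS], size_t n, size_t *evaluated)
{
    size_t best = 0;
    int best_score = -1;
    *evaluated = 0;
    for (size_t i = 0; i < n; i++) {
        int partial = 0;
        size_t p;
        for (p = 0; p < PARTS; p++) {
            int upper = partial + (int)(PARTS - p) * PART_MAX;
            if (upper <= best_score)
                break;              /* cannot win: discard candidate */
            partial += scores[i][p];
            (*evaluated)++;
        }
        if (p == PARTS && partial > best_score) {
            best_score = partial;
            best = i;
        }
    }
    return best;
}
```

The saving depends on finding a strong candidate early and on how tight the bound is; with a loose bound almost nothing gets discarded and the extra bookkeeping is pure overhead.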

Sep 25, 2011 7:29pm Jeffrey Lee (213) 6048 posts	I haven’t spent much time on this since March, but since BASIC, objasm & decaof now support most/all of the ARMv7 instructions I figured it would make sense if I at least released a copy of what I’ve got so far. Over here is an archive containing a copy of the disassembler binary ‘dis2’ (it’s a simple CLI utility at the moment) and source. Note that to compile the source you’ll also need the latest version of decgen. The code can be compiled with either GCC or Norcroft (just use ‘make GCC=TRUE’ for GCC). Norcroft is significantly faster and produces smaller code than GCC, so it’s probably best to stick to that if you’ve got it.
> I’m hoping I’ll be able to get it back down to around the two/three minute mark by tweaking the code that decides what type of test to perform next…
Unfortunately I wasn’t able to get this code working properly. As things are currently, it’ll take an Iyonix just under 10 minutes to process the encodings/action files and produce the 1.1MB source file containing the disassembler core.
Some further notes:
- Although the disassembler core has quite a few options available for tweaking the output format, the current command-line tool only provides three formatting options: ‘debugger’, ‘decaof’, and ‘decaofual’. These options attempt to match the output of the debugger module, decaof, and decaof (in UAL mode) respectively.
- I’ve also included the source for my testbed program (just do ‘make testbed’ to build it). This program allows you to specify a file to disassemble, or a range of opcodes to disassemble. It will disassemble the input using all three methods that dis2 supports, and then compare it against the output of the debugger module and decaof. For this tool to work you’ll need a copy of the binaof utility (from mixed.RiscOS.Library.Build), grep, and GNU diff (named ‘gnudiff’). The testbed tool places its temporary files & output into a folder of your choice. However, the folder must be created beforehand.
- Since dis2 isn’t able to match exactly the output of the other disassemblers, the testbed program contains code to manipulate the output slightly so that the diff is cleaner. At the moment this just involves stripping anything that looks like a comment, and stripping the header/footer from decaof’s output. Since I’m hoping to one day compare the disassembly of the full instruction set, I’ll probably go through and either tweak the output of dis2 to allow it to match the other tools (e.g. MCR/MRC register naming) or write my own diff routine for use in the testbed app (this is probably a convenient solution for dealing with all the number formatting differences, e.g. where one disassembler outputs hex but the other decimal).
Disassembler features/known issues:
- Should support full ARMv7 instruction set, including VFP/NEON, FPA, XScale DSP, and (CMN|CMP|TEQ|TST)P
- No Thumb support yet
- NEON instructions are always output in UAL form.
- Corrupt output produced for non-UAL version of VCVTB/VCVTT (VFPv3 half-precision to single-precision). I haven’t found a reference for what the non-UAL form should look like.
- NEON floating point immediate constants probably aren’t displayed in a way that will allow the instruction to be recompiled.
- Not all warnings/feedback features are implemented. E.g. it knows which CPU architectures most instructions require, but won’t display that info to the user yet.
Let me know if you’ve got any comments/bugs/feedback!
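
The comment-stripping step used to make the diff cleaner could look something like the following C sketch. It assumes comments are introduced by `;` and that `;` never appears inside an operand; both are assumptions for the example, and the function is not taken from the testbed source.

```c
#include <string.h>

/* Cut a disassembly line at the first ';' and trim trailing blanks, so
 * that lines differing only in their comment compare equal.
 * Modifies 'line' in place. */
static void strip_comment(char *line)
{
    char *semi = strchr(line, ';');
    if (semi)
        *semi = '\0';
    size_t len = strlen(line);
    while (len > 0 && (line[len - 1] == ' ' || line[len - 1] == '\t'))
        line[--len] = '\0';
}
```

Number formatting differences (hex in one tool, decimal in the other) cannot be handled this way, which is presumably why a custom diff routine is being considered for those.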

Oct 18, 2012 12:58pm Jeffrey Lee (213) 6048 posts	I’m still slowly moving forwards with this. A few months ago I made a concerted effort to get things to a state where I can do a full comparison against the Debugger module & decaof to weed out any bugs and formatting discrepancies. This went quite well, with plenty of bugs being found in my disassembler. When I next look at this, if there’s nothing major that pops up in the disassembly comparison, it’s probably high time I sorted out Thumb support and started to try and plug the code into the Debugger module. If I can remember, I’ll probably upload a new test build sometime in the next few days (assuming people care about such things!). I suspect I’ve been sitting on some decgen bugfixes/improvements too.

Oct 20, 2012 8:36pm Jeffrey Lee (213) 6048 posts	> If I can remember, I’ll probably upload a new test build sometime in the next few days (assuming people care about such things!). I suspect I’ve been sitting on some decgen bugfixes/improvements too.
Done and done.

Nov 28, 2013 2:07am Jeffrey Lee (213) 6048 posts	I’m not sure if I made it in time for the nightly build (the 1 hour offset on the CVS viewer is throwing me off), but I’ve now checked in the first version of the Debugger module that (partially) features a decgen-based, machine-generated disassembler. Disassembly of VFP/NEON instructions uses the new engine, while all other instructions continue to use the old engine. The VFP/NEON instructions are annotated with which VFP/NEON version they require, and are using UAL syntax, so will mostly match the mnemonics used with the BASIC assembler. Note that although the Debugger module is only using the new disassembler for VFP/NEON instructions, I have checked in the full sources for my ARM+VFP+NEON+FPA+XScale DSP disassembler, so we could easily switch over to the full disassembler engine if desired (although it does cost a fair bit more ROM space – and I haven’t yet convinced Sprow that it’s space well spent). A standalone version of the full disassembler engine can also be built for testing, and the testbed app to compare the output against the Debugger module and decaof is in there too.

Nov 28, 2013 6:07pm Steve Drain (222) 1620 posts	I don’t know if it is relevant to the disassembler, but I noticed one error in the data you posted for me in the VFPtutorial thread: VNEG requires a suffix: VNEG.F32 or VNEG.F64. I would eventually have mentioned it in the other thread.

Jun 1, 2016 10:05pm Jeffrey Lee (213) 6048 posts	As I sit here wondering where on earth the correct place is to poke the Debugger sources so that it won’t consider “CPSID i, #mode” to be an undefined instruction, I figured it’s a reasonable excuse to bring this topic back to the fore again. Pros and cons of the new disassembler:
Pros:
- Encoding definitions are machine-checked to make sure that there are no gaps or overlaps. Each of the 2^32 word values is guaranteed to match against exactly one encoding.
- No head scratching required to work out where to insert new instructions without inadvertently breaking something else.
Cons:
- Larger code size (Debugger module is 70k (including the machine-generated VFP/NEON disassembler), decaof is 73k, my standalone “dis2” disassembler is 83k)
- Can be a bit fiddly to add new instructions – you need to work out which encoding the instruction currently matches against and then manually “subtract” the desired encoding from it. But worst-case you should only need to go through the encoding tables in the ARM ARM and play spot the difference, unlike the Debugger, where there’s been no real attempt to make the high-level decoding match what’s presented in recent ARM docs.
- Probably a few rough edges – e.g. I still haven’t properly sorted out Thumb decoding. I also need to add the (AArch32) ARMv8 instructions at some point, although they’re not supported by the Debugger yet either.
Originally this post was going to say “Significantly larger code size”, but it seems that the size optimisation pass I did a while ago must have had a bigger effect than I remembered. Although as a disclaimer, the Debugger and dis2 builds I’m comparing are optimised for the Pi, so it’s possible e.g. IOMD may see a more significant size difference (and I haven’t calculated how much of the Debugger is actually the disassembler and how much is the other facilities).
But to get to the point: What, if anything, do people feel is necessary for me to do in order for the new disassembler to be considered an acceptable replacement for the current one?
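
The "no gaps or overlaps" property can be checked mechanically if each encoding is modelled as a mask/value pair. The C sketch below uses that simplified model, which is an assumption and not decgen's actual input format or checking algorithm: two encodings overlap when they agree on every bit both constrain, and a set of non-overlapping encodings has no gaps exactly when their match counts sum to 2^32.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* An encoding matches a word when (word & mask) == value. */
typedef struct { uint32_t mask, value; } encoding;

/* Two encodings share at least one word iff they agree on every bit
 * that both of them constrain. */
static bool overlaps(encoding a, encoding b)
{
    return ((a.value ^ b.value) & a.mask & b.mask) == 0;
}

/* Number of 32-bit words an encoding matches: 2^(unconstrained bits). */
static uint64_t match_count(encoding e)
{
    unsigned fixed = 0;
    for (uint32_t m = e.mask; m != 0; m &= m - 1)
        fixed++;
    return (uint64_t)1 << (32 - fixed);
}

/* True if every word matches exactly one encoding: no pair overlaps and
 * the match counts add up to 2^32. */
static bool is_partition(const encoding *e, size_t n)
{
    uint64_t total = 0;
    for (size_t i = 0; i < n; i++) {
        for (size_t j = i + 1; j < n; j++)
            if (overlaps(e[i], e[j]))
                return false;
        total += match_count(e[i]);
    }
    return total == ((uint64_t)1 << 32);
}
```

In this model, "subtracting" a new instruction from an existing encoding means replacing the old mask/value pair with several narrower ones that together cover everything except the new instruction's words, so the set remains a partition.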