A new disassembler for the Debugger module
Jeffrey Lee (213) 6048 posts |
Chances are that I’m going to have a couple of days spare over Christmas where I don’t have access to my full kit, so I’m thinking of using that time to start work on a new disassembler for the Debugger module. The main aim of the rewrite would obviously be to add support for all ARMv7 instructions (including Thumb2 and ThumbEE), but there are also a few other things I had in mind:
Are there any other features that people would like to see? I’m a bit worried that the new disassembler will be too big to fit into some of our existing ROM images, but if that happens then I’m sure there are at least a couple of tricks that can be used to get it down to size. |
W P Blatchley (147) 247 posts |
This sounds great. I haven’t got any particular feature requests for the disassembler itself, but have you considered (I expect you have) incorporating some of the ideas from Darren Salt’s Debugger Plus? He had some excellent ideas in there. I’m not sure how many are already in the current Debugger module. And are you going to be touching the *memoryX commands, or just sticking strictly to the disassembler? I’d like to see a new command or an extension to existing ones to allow a read-modify-write sequence from the command line. If you could supply a mask along with the value to write, that would be really useful for quick-hacking things from the command line, especially combined with the “P” option. Just a couple of thoughts! |
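As a minimal sketch of the masked read-modify-write suggested above: the helper names `masked_update` and `masked_write` are hypothetical and are not existing Debugger commands or SWIs. Only the bits set in the mask are replaced; all other bits of the word are kept.

```c
#include <stdint.h>

/* Hypothetical helper: merge `value` into `old`, changing only the bits
 * that are set in `mask`. */
static uint32_t masked_update(uint32_t old, uint32_t value, uint32_t mask)
{
    return (old & ~mask) | (value & mask);
}

/* Hypothetical helper: read-modify-write of one word in memory. The
 * pointer is volatile so the read and the write both really happen,
 * as they would need to for a hardware register. */
static void masked_write(volatile uint32_t *addr, uint32_t value, uint32_t mask)
{
    *addr = masked_update(*addr, value, mask);
}
```

A command built on this would take an address, a value and a mask, and could share the physical-address handling of the existing "P" option.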
Jeffrey Lee (213) 6048 posts |
I haven’t even heard of it before! I’ll have to take a look some time. At the moment I’m focusing solely on the disassembler, but I might take a look at a few other things in the future. In fact there is one feature that I’d find useful – at the moment there’s a bug/mistake in the page table settings that the OMAP port uses, which means that pages which should be fully read-only can still be written to from privileged modes. And for the past year I’ve been unwittingly exploiting this bug, by using the debugger module to set breakpoints in the ROM image (I thought it was just the case that the debugger was smart enough to unprotect the pages when setting breakpoints). So once the bug in the page table settings gets fixed, it would be nice if the debugger did gain the ability to unprotect read-only pages when setting breakpoints, as setting breakpoints inside the ROM image is something I often end up doing. |
W P Blatchley (147) 247 posts |
The easiest way to find it and its documentation is in a distribution of !Zap. Look inside Code.Extensions.ExtAsm in the main application directory. |
Sprow (202) 1158 posts |
I suspect 90+% of desktop users have never used the Debugger module, at least not at the command line. How about having a slim version in the ROM which supports just the Debugger_Disassemble SWI (so Zap and StrongEd disassembly mode work) and the various MEMORY star commands. Because it’s in ROM you’d also know which instruction sets to support and code sequence warnings to enable – no need for Thumb 1 on a Cortex that can’t run that code anyway. Then, a developer (‘pro’) version supplied on disc which supports all the instruction sets and extra options. The disc version can afford to be much larger. |
W P Blatchley (147) 247 posts |
Found a proper link for DebuggerPlus here. |
Jeffrey Lee (213) 6048 posts |
A few of the things in Debugger Plus look good (e.g. using DCD for undefined instructions, quoting SWI names, HS & LO instead of CS & CC, etc.), so I’ll definitely try to incorporate those. But I don’t think I’ll add support for anything that involves detecting instruction sequences (e.g. ADRL, LDRL – it just doesn’t make sense to support them if there’s no way of suppressing the output of the first instruction of the pair), and I don’t think I’ll add support for ADRW/LDRW/etc. (I don’t see how they’re any better than just showing R12 + offset). I suspect it won’t be possible to reuse the Debugger Plus SWI interface, so to avoid compatibility issues I’ll make sure that the new Debugger SWIs don’t clash with the Debugger Plus SWIs (especially since we only need 19 more checkins before we hit Debugger 2.00 and we start getting version number conflicts). |
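To illustrate the HS/LO option mentioned above, here is a small sketch of condition-code naming. The function name `cond_name` and its `prefer_hs_lo` flag are assumptions made for this example, not the Debugger's or Debugger Plus's real interface. The condition field is bits 31..28 of an ARM instruction, and HS/LO are alternative names for CS/CC.

```c
#include <stdint.h>

/* Hypothetical helper: return the condition mnemonic for an ARM
 * instruction word. A real disassembler would normally print nothing
 * for AL, and on ARMv5 and later condition 15 selects the
 * unconditional instruction space rather than "NV". */
static const char *cond_name(uint32_t instr, int prefer_hs_lo)
{
    static const char *const names[16] = {
        "EQ", "NE", "CS", "CC", "MI", "PL", "VS", "VC",
        "HI", "LS", "GE", "LT", "GT", "LE", "AL", "NV"
    };
    uint32_t cond = instr >> 28;

    if (prefer_hs_lo) {
        if (cond == 2) return "HS"; /* unsigned higher or same */
        if (cond == 3) return "LO"; /* unsigned lower */
    }
    return names[cond];
}
```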
Steve Revill (20) 1361 posts |
I think technically, you should use DCI for undefined instructions. Also, did you spot Disassemble$Options in the standard Debugger module and the facility it provides to disassemble with APCS register names rather than ARM and/or replace R13/SP, R14/LR, etc? |
Jeffrey Lee (213) 6048 posts |
Yes, you’re probably right!
Yes – I’ll be keeping the current options and just expand Debugger$Options to cope with the new ones. |
Terje Slettebø (285) 275 posts |
That’s cool. As I understand, Ben Avison wrote the improved DecAof, so he might be able to help with any issues regarding the disassembler. Also, you might find it useful for testing to run the extASM test files (which cover all the ARMv7 instructions and their variations, except Thumb) with the new Debugger. I’ve tested all these files with DecAof, which produces the equivalent disassembly, so it should be a good test. I think it’s good that this will be implemented in C (or some other high-level language), as the more components of RISC OS we get implemented in high-level languages (or at least platform-independent ones), the easier it will make maintenance and porting. |
Jeffrey Lee (213) 6048 posts |
Thanks for reminding me about DecAof – it would probably be good if I checked the output against that. To start with I’ll probably just disassemble a couple of ROM images using the old & new debugger, and use that to make sure all the main features are correct (disassembly, formatting, address generation for PC-relative instructions, etc.) Then, assuming there aren’t any big differences in formatting, I can try setting up a testbed which will disassemble all instructions and compare them against DecAof. This should help track down any remaining formatting problems or issues with instructions being incorrectly classified. I’ve got a fair amount of faith that decgen will classify instructions correctly (the encoding files that it will use have already been machine-checked for ambiguities), but it’s always possible that there’s a bug that I haven’t found. Or alternatively this test might reveal a bug in DecAof – which is quite possible, since I think DecAof is still using a hand-crafted decoder. Or at least I can be fairly certain that it’s not using decgen, since until I made the new release on Saturday there were a couple of simple bugs that would have prevented most decoders from compiling ;) |
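A testbed like the one described would need some way of comparing two lines of disassembly without being tripped up by trivial formatting. The sketch below is one possible approach and not the actual testbed code: the function `same_disassembly` is hypothetical, and it assumes that differences in whitespace and letter case are the only ones that should be ignored.

```c
#include <ctype.h>

/* Hypothetical helper: returns 1 if the two strings are identical once
 * all whitespace is removed and letter case is ignored, otherwise 0. */
static int same_disassembly(const char *a, const char *b)
{
    for (;;) {
        while (isspace((unsigned char)*a)) a++;
        while (isspace((unsigned char)*b)) b++;
        if (*a == '\0' || *b == '\0')
            return *a == *b;            /* equal only if both ended */
        if (toupper((unsigned char)*a) != toupper((unsigned char)*b))
            return 0;
        a++;
        b++;
    }
}
```

Anything this reports as different is then either a real decoding disagreement or a formatting difference that needs its own rule.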
Steve Revill (20) 1361 posts |
On the subject of Ben’s work on DecAOF, you should look at this announcement and the associated wiki page. |
Jeffrey Lee (213) 6048 posts |
Just in case anyone’s wondering what happened to this – although I didn’t get a chance to start work on it over Christmas, I did start work on it this past weekend. The pace of development will most likely be governed by how often I have to keep halting my work on the Portable module while waiting for the Touch Book to charge/discharge. But if the progress I made this weekend is any indication then I’d expect to be able to produce the first working ARM-only disassembler sometime this week. Most likely it’ll be sometime next month when the code is in a releasable state (all instructions/instruction sets supported, no major disassembly errors). |
W P Blatchley (147) 247 posts |
I was wondering – but just figured you’d been too busy.
So you’re saying you’ll be developing a disassembler, in the ‘gaps’ between working on the Portable module, all in your spare time, and probably have something working within a week… Holy cow, Jeffrey! It’s like Jack Bauer makes regular mortals feel when they watch 24! Ahem! Anyway, looking forward to it…and good luck. Going dark now. (Sorry for the OT chatter to those who don’t like it!) |
Steve Revill (20) 1361 posts |
Keep it up – personally, I like our forums to have a bit of personality… |
Jeffrey Lee (213) 6048 posts |
I managed to produce the first working disassembler on Tuesday, and spent last night focusing on creating a testbed app to allow me to compare the output against both the Debugger module and DecAOF. So the next stage is to add the coprocessor instructions and tweak the output formatting so I can compare the disassembly of real programs. Size-wise the disassembler is currently at about 95K, which is larger than DecAOF (about 70K, IIRC). Once I’ve added the Thumb and coprocessor instructions it wouldn’t surprise me if the whole thing is around 250-300K in size. There are a couple of bits I can tweak to reduce the size, but nothing that will produce any major savings, unless I were to completely change the way everything works. It would effectively involve replacing the machine-generated decoder with a hand-crafted one, which would obviously eliminate all the benefits of using a machine-generated decoder in the first place. Maybe I’ll be able to make some improvements to decgen to result in smaller code being produced, and maybe a newer version of Norcroft will help (i.e. one that supports the UBFX and SBFX instructions), but for now I think we’ll be stuck with a rather bulky disassembler. |
Jeffrey Lee (213) 6048 posts |
Looks like I’ll need to do some more work on decgen to make this disassembler a reality. Although it coped fine with the ARM, FPA & VFP instruction sets, when I just tried adding the NEON instructions it all fell apart. Tree generation time went from a couple of minutes to over an hour, and the Wimpslot went from a couple of MB up to 31MB. And then it crashed while trying to shrink the tree. Apart from the bug that caused the crash, and a piece of code that’s in desperate need of optimisation (it’s the cause of most of the hour-long delay), I think the big problem is how decgen handles itself when it’s generating a subtree which has to detect the difference between just two different encodings (i.e. instructions). The current tree representation only allows each node to perform one test, so even if there are only two possible outcomes of the subtree (encoding A or encoding B) it often has to generate several nodes in order to correctly classify each input value. Before I added the NEON instructions I think it was generating a tree with around 17,000 nodes (which would then get shrunk to a few hundred) – but now it’s generated a tree containing 205,000 nodes! |
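To show why one test per node can inflate the tree, here is a toy sketch. It is not decgen's real data structure: the `Node` layout and the assumption that every test looks at a single bit are made up for illustration. Encoding A is "bits 7 and 4 both set" and encoding B is everything else, so even with only two possible outcomes the tree needs two test nodes.

```c
#include <stdint.h>
#include <stddef.h>

/* Hypothetical decision tree node performing exactly one test. */
typedef struct Node {
    int bit;                      /* bit tested here; -1 marks a leaf   */
    int encoding;                 /* leaf only: id of matched encoding  */
    const struct Node *child[2];  /* [0] if bit clear, [1] if bit set   */
} Node;

enum { ENC_A = 0, ENC_B = 1 };

static const Node leaf_a    = { -1, ENC_A, { NULL, NULL } };
static const Node leaf_b    = { -1, ENC_B, { NULL, NULL } };
/* Second test: only reached when bit 7 is already known to be set. */
static const Node test_bit4 = { 4, 0, { &leaf_b, &leaf_a } };
/* First test: bit 7 clear means encoding B straight away. */
static const Node root      = { 7, 0, { &leaf_b, &test_bit4 } };

/* Walk from the root to a leaf, one test per node. */
static int classify(const Node *n, uint32_t word)
{
    while (n->bit >= 0)
        n = n->child[(word >> n->bit) & 1u];
    return n->encoding;
}
```

A node that could test a whole mask/value pair in one go would classify the same two encodings with a single node.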
Steve Revill (20) 1361 posts |
Ouch. Is the tree generation a run-time step of the disassembler, or a step in the decgen process which produces the decoder engine of the disassembler? I think you’re saying it’s the latter so the performance issue isn’t as much of an issue but it still sounds like some major architectural problems there. :( In many ways, Ben has been having a similar kind of fun adding lots of stuff that does the opposite in objasm… And I just spent three or four days adding a clever bit of algorithm to objasm’s opcode identifying function only to find it saves just 2.5% of the build time for an OMAP ROM’s worth of assembler modules. Bummer. On the plus side, we’re hoping it won’t get any slower with support for all the weird and wonderful new instructions added. |
Jeffrey Lee (213) 6048 posts |
The latter.
Luckily it wasn’t too hard to solve most of the issues. The crash was a simple null pointer dereference, the hour-long wait was due to a simple algorithm which was implemented as O(N^2) instead of O(N log N), and the memory usage was solved by performing some of the tree optimisation steps while generating the tree instead of just at the end. So now it takes about 15 minutes and 10.5MB of RAM. 15 minutes is still quite a long build time, especially since the decision tree needs rebuilding as soon as you make any changes to the input files – even if those changes won’t affect the decision-making process. I’m hoping I’ll be able to get it back down to around the two/three minute mark by tweaking the code that decides what type of test to perform next (i.e. which node it should insert into the tree next). Basically at the moment it only scores each candidate node once it knows (roughly) which encodings each child of that node will have to match. But by working out the upper bounds of the node scoring function, it should be possible for me to make it score the node at several points during the evaluation process, discarding the candidate as soon as the upper bound of its potential score drops below the score of the current best candidate. |
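The upper-bound idea can be sketched generically as below. This is not decgen's scoring function: the `Candidate` type, the fixed number of parts and the `MAX_PART` limit are all assumptions chosen to keep the example small. Each candidate's score is built up in stages, and evaluation stops as soon as the best score it could still reach cannot beat the current leader.

```c
#include <stddef.h>

#define PARTS    4   /* assumed number of scoring stages per candidate */
#define MAX_PART 10  /* assumed maximum contribution of one stage      */

typedef struct { int part[PARTS]; } Candidate;

/* Returns the index of the highest-scoring candidate (the first one on
 * ties). *parts_evaluated receives the number of stages actually
 * computed, to show how much work the pruning saved. */
static size_t best_candidate(const Candidate *c, size_t n,
                             size_t *parts_evaluated)
{
    size_t i, best = 0, work = 0;
    int best_score = -1;

    for (i = 0; i < n; i++) {
        int p, score = 0, pruned = 0;
        for (p = 0; p < PARTS; p++) {
            int bound;
            score += c[i].part[p];
            work++;
            /* Best case: every remaining stage scores MAX_PART. */
            bound = score + (PARTS - 1 - p) * MAX_PART;
            if (bound <= best_score) {
                pruned = 1;     /* cannot beat the leader, so give up */
                break;
            }
        }
        if (!pruned && score > best_score) {
            best_score = score;
            best = i;
        }
    }
    if (parts_evaluated != NULL)
        *parts_evaluated = work;
    return best;
}
```

How much time this saves depends on how tight the bound is and on how early a strong candidate turns up.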
Jeffrey Lee (213) 6048 posts |
I haven’t spent much time on this since March, but since BASIC, objasm & decaof now support most/all of the ARMv7 instructions I figured it would make sense if I at least released a copy of what I’ve got so far. Over here is an archive containing a copy of the disassembler binary ‘dis2’ (it’s a simple CLI utility at the moment) and source. Note that to compile the source you’ll also need the latest version of decgen. The code can be compiled with either GCC or Norcroft (just use ‘make GCC=TRUE’ for GCC). Norcroft is significantly faster and produces smaller code than GCC, so it’s probably best to stick to that if you’ve got it.
Unfortunately I wasn’t able to get this code working properly. As things are currently, it’ll take an Iyonix just under 10 minutes to process the encodings/action files and produce the 1.1MB source file containing the disassembler core. Some further notes:
Let me know if you’ve got any comments/bugs/feedback! |
Jeffrey Lee (213) 6048 posts |
I’m still slowly moving forwards with this. A few months ago I made a concerted effort to get things to a state where I can do a full comparison against the Debugger module & decaof to weed out any bugs and formatting discrepancies. This went quite well, with plenty of bugs being found in my disassembler. When I next look at this, if there’s nothing major that pops up in the disassembly comparison, it’s probably high time I sorted out Thumb support and started to try and plug the code into the Debugger module. If I can remember, I’ll probably upload a new test build sometime in the next few days (assuming people care about such things!). I suspect I’ve been sitting on some decgen bugfixes/improvements too. |
Jeffrey Lee (213) 6048 posts |
I’m not sure if I made it in time for the nightly build (the 1 hour offset on the CVS viewer is throwing me off), but I’ve now checked in the first version of the Debugger module that (partially) features a decgen-based, machine-generated disassembler. Disassembly of VFP/NEON instructions uses the new engine, while all other instructions continue to use the old engine. The VFP/NEON instructions are annotated with which VFP/NEON version they require, and are using UAL syntax, so will mostly match the mnemonics used with the BASIC assembler. Note that although the Debugger module is only using the new disassembler for VFP/NEON instructions, I have checked in the full sources for my ARM+VFP+NEON+FPA+XScale DSP disassembler, so we could easily switch over to the full disassembler engine if desired (although it does cost a fair bit more ROM space – and I haven’t yet convinced Sprow that it’s space well spent). A standalone version of the full disassembler engine can also be built for testing, and the testbed app to compare the output against the Debugger module and decaof is in there too. |
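The split between the two engines could be pictured roughly as follows. This is a deliberately simplified sketch and not the check the Debugger module actually performs: `is_vfp_or_neon` is a hypothetical name, and the test only covers the main ARM-state encoding spaces (coprocessor 10/11 instructions, Advanced SIMD data processing, and Advanced SIMD element/structure load/store).

```c
#include <stdint.h>

/* Hypothetical check: does this ARM-state word fall in the VFP/NEON
 * encoding space? Returns 1 to use the new engine, 0 for the old one. */
static int is_vfp_or_neon(uint32_t w)
{
    if ((w >> 28) == 0xFu) {
        /* Unconditional space: Advanced SIMD data processing. */
        if ((w & 0xFE000000u) == 0xF2000000u) return 1;
        /* Advanced SIMD element or structure load/store. */
        if ((w & 0xFF100000u) == 0xF4000000u) return 1;
        return 0;
    }
    /* Conditional coprocessor space (bits 27:26 = 11) excluding SWI
     * (bits 27:24 = 1111), with coprocessor number 10 or 11. */
    return (w & 0x0C000000u) == 0x0C000000u
        && (w & 0x0F000000u) != 0x0F000000u
        && (w & 0x00000E00u) == 0x00000A00u;
}
```

Switching to the full engine would amount to dropping this check and sending every word to the new decoder.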
Steve Drain (222) 1620 posts |
I don’t know if it is relevant to the disassembler, but I noticed one error in the data you posted for me in the VFPtutorial thread: VNEG requires a suffix: VNEG.F32 or VNEG.F64. I would eventually have mentioned it in the other thread. |
Jeffrey Lee (213) 6048 posts |
As I sit here wondering where on earth the correct place is to poke the Debugger sources so that it won’t consider “CPSID i, #mode” to be an undefined instruction, I figured it’s a reasonable excuse to bring this topic back to the fore again. Pros and cons of the new disassembler: Pros:
Cons:
Originally this post was going to say “Significantly larger code size”, but it seems that the size optimisation pass I did a while ago must have had a bigger effect than I remembered. Although as a disclaimer, the Debugger and dis2 builds I’m comparing are optimised for the Pi, so it’s possible e.g. IOMD may see a more significant size difference (and I haven’t calculated how much of the Debugger is actually the disassembler and how much is the other facilities) But to get to the point: What, if anything, do people feel is necessary for me to do in order for the new disassembler to be considered an acceptable replacement for the current one? |