A new disassembler for the Debugger module
Rick Murray (539) 13840 posts |
How much work would you like to make for yourself? ;-) Okay, there was an enhanced disassembler in the 26 bit days. I think it came with Zap? Two useful things that it did:
Possibly more but I’m in bed reading this ‘cos I can’t sleep, so the grey matter is running in power saving mode while my peripheral devices are attempting to go into standby mode… |
Jeffrey Lee (213) 6048 posts |
Those two should be easy enough to support – there’s already code for that in the Debugger module, because the exception dump annotation code makes use of it :-) |
Sprow (202) 1158 posts |
I’m sure your search skills can’t really have had trouble locating the CPS decoding but I can’t get CPSID i,#mode to go wrong. It’s one of the examples I tested 2 years ago, and I just tried it in objasm and that still works for me here. Admittedly it’s not trying all 256 possibilities, so maybe something else is wrong. How are you generating the instruction? If it’s relating to the CPS macros does that use objasm for CPS or building the bitfield and DCI? Is the mode change bit not set?
According to objsize the breakdown is thus:
so it depends if you count “incremental extra weight on top of existing” or “percentage heavier than the original” or something else. I’ve tried looking at the action scripts but being a simple creature I find them a bit of a mystery, or rather too much hassle to be worried about looking into since they seem to work! |
Jeffrey Lee (213) 6048 posts |
That’s because CPU mode 12 isn’t a valid mode. If you use a valid CPU mode (e.g. 19 for SVC32) then it gets caught out by a block of code elsewhere because bits 4 and 7 are both set. This makes me wonder how many other unconditional instructions are getting caught out by the same trap. So the question then becomes whether I bodge in a fix just for CPS, whether I refactor the entire handling of unconditional instructions so that they’re decoded according to the ARM ARM, or whether I replace the entire disassembler with something which is a bit more trustworthy.
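To make the bit 4 / bit 7 clash concrete, here is a minimal sketch that builds the word for CPSID i,#mode from the CPS encoding in the ARM ARM and reports whether both bits end up set. The helper names are invented for this illustration and are not taken from the Debugger sources.

```python
def encode_cpsid_i(mode):
    """Build the ARM word for CPSID i,#mode (hypothetical helper)."""
    if not 0 <= mode <= 31:
        raise ValueError("mode is a 5-bit field")
    word = 0xF1000000      # cond = 1111, bits 27-20 = 0001 0000
    word |= 0b11 << 18     # imod = 11: disable interrupts (the 'ID' in CPSID)
    word |= 1 << 17        # M = 1: also change mode
    word |= 1 << 7         # I = 1: the 'i' flag
    word |= mode           # mode number in bits 4-0
    return word


def bits_4_and_7_set(word):
    """True when bits 4 and 7 are both set in the instruction word."""
    return (word & 0x90) == 0x90


for mode in (3, 12, 19):
    w = encode_cpsid_i(mode)
    print(f"mode {mode:2d}: &{w:08X}  bits 4 and 7 both set: {bits_4_and_7_set(w)}")
```

Any mode number of 16 or above sets bit 4, and the 'i' flag is bit 7, so mode 19 has both bits set while modes 3 and 12 do not.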
They’re fragments of C code, with a fair amount of macro use to take care of some of the common tasks (e.g. flagging instructions as being unpredictable if they meet certain requirements). The real magic is the encoding files which describe the bit patterns for each instruction, along with any extra constraints that must be satisfied in order for it to be a successful match. The documentation explains the format of both sets of files. |
Jeffrey Lee (213) 6048 posts |
Pros/Cons … Significantly larger code size

After tweaking a few bits to allow the new disassembler to be used for all ARM instructions, I get the following for a softload build of the module:

Object file      code-size  data-size  debug-size
o.Debugger           19016          0           0
o.DebuggerMsgs        6560          0           0
o.dis2_arm           52040      11332           0
o.exc                 8756        228           0
o.support             1548         64           0
o.util                9588        728           0
objsize: file o._dirs doesn't exist?
Total (of all files) 97508      12352           0

That’s a total of 109860. A softload build of the current disassembler gets a total of 76828, so there’s a difference of about 33k. |
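For anyone wanting to check the arithmetic, a few lines of Python reproduce the totals. The figures are copied from the objsize listing and from the 76828 total quoted for the current disassembler; nothing else is assumed.

```python
# (code-size, data-size) per object file, as reported by objsize
sizes = {
    "o.Debugger":     (19016, 0),
    "o.DebuggerMsgs": (6560, 0),
    "o.dis2_arm":     (52040, 11332),
    "o.exc":          (8756, 228),
    "o.support":      (1548, 64),
    "o.util":         (9588, 728),
}

code_total = sum(code for code, _ in sizes.values())
data_total = sum(data for _, data in sizes.values())
new_total = code_total + data_total
old_total = 76828                      # softload build of the current disassembler
difference = new_total - old_total

print(code_total, data_total, new_total, difference)
```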
Rick Murray (539) 13840 posts |
My last ROM build…

Starting phase join ...
romlinker: version 0.05 (07 Sep 2014)
romlinker: Standard ROM image construction
romlinker: Image has 667968 bytes spare (652.31K)
romlinker: crc16=a4cb

So I think I can spare 33K. Now, we know the size difference. How about the usability difference?

Here’s a question. If I used a real variable in a BASIC program and assembled it with ABC, the resultant code is this:

000089E4 : #.†í : ED9C1123 : LDFS F1,[R12,#140]
000089E8 : q..î : EE191171 : MRC CP1,0,R1,C9,C1,3

Is this correct, or is the current disassembler failing to correctly decode the MRC to CP1? Because, with the help of the ARM7500FE data sheet, I parse the instruction as:

xxxx 1110 000 1   1001 0001 0001 0111 0001
cond CPRT abc L/S eFn  Rd   0001 fgh1 iFm

abc L/S = 000 1 = FIX (Rd = Fm)

The only anomaly is the ‘e’ bit is supposed to be zero, but this might apply specifically to the ARM7500FE. So I think MRC CP1,0,R1,C9,C1,3 is supposed to be FIXZ R1, F1
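The two readings of &EE191171 can be compared with a short sketch. The field positions follow the layout given above; the function names are made up for this example, and the rounding-mode suffix table (gh = 00, 01, 10, 11 giving none, P, M, Z) is my reading of the FPA documentation rather than something taken from the Debugger sources.

```python
def as_mrc(word):
    """Render a coprocessor register transfer word in generic MRC form."""
    cp = (word >> 8) & 0xF
    opc1 = (word >> 21) & 0x7
    rt = (word >> 12) & 0xF
    crn = (word >> 16) & 0xF
    crm = word & 0xF
    opc2 = (word >> 5) & 0x7
    return f"MRC CP{cp},{opc1},R{rt},C{crn},C{crm},{opc2}"


def as_fpa_fix(word):
    """Render the same word as an FPA FIX, or None if it does not fit.

    The 'e' bit (19) and Fn (18-16) are deliberately not examined.
    """
    is_reg_transfer = ((word >> 24) & 0xF) == 0b1110 and (word >> 4) & 1
    cp = (word >> 8) & 0xF
    abc = (word >> 21) & 0x7
    load = (word >> 20) & 1
    if not (is_reg_transfer and cp == 1 and abc == 0 and load == 1):
        return None
    rounding = ["", "P", "M", "Z"][(word >> 5) & 0x3]  # assumed gh mapping
    rd = (word >> 12) & 0xF
    fm = word & 0x7
    return f"FIX{rounding} R{rd},F{fm}"


word = 0xEE191171
print(as_mrc(word))
print(as_fpa_fix(word))
```

Because the sketch ignores the e and Fn bits, it gives the same FIX reading for &EE101171 as for &EE191171.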
|
Steve Pampling (1551) 8170 posts |
You know if this was the last century and I was running it on the A500 I might [1] wonder at that.

[1] Then again, probably not. |
Sprow (202) 1158 posts |
It’s one of the examples I tested 2 years ago, and I just tried it in objasm and that still works for me here. For the purposes of testing the disassembler it seemed as good as any (I probably initially used 42 but it’s only a 5 bit field). 3 works (SVC26). &13 doesn’t, so yeah something has choked on bit 4 as you highlight.
You’ve clearly got the knives out for this code, for reasons I’ve not understood, because I know we’ve been here before when the VFP decoder was added. My feeling is that spending 100k of a ROM budget on something (extremely useful but) used by only a small number of people is 100k poorly spent. If the existing ARM decoder is (say) 99% right, fixing the 1% would seem more sensible than replacing it with something 2x bigger (assuming ~15k of Debugger.o is the disassembler – I’ve not looked) and potentially a different set of bugs. If we ignore the condition code there are only 256M possible instructions and another 256M with the NV condition code, so comparing 512M lines of text would give a more objective assessment. It can’t be that bad, or rather I don’t remember seeing any mistakes. |
Jeffrey Lee (213) 6048 posts |
I spent a lot of time working on it, that’s why. It’s good to see that the VFP/NEON disassembler is in there, but it’s disappointing that the ARM disassembler isn’t in there as well, because it does solve some very real maintenance issues with the code.
In that case you might want to try using the testbed that I wrote to compare the output of my disassembler against the Debugger and decaof. |
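The idea behind that kind of comparison can be sketched in a few lines. This is not the testbed itself: `compare`, `toy_a` and `toy_b` are hypothetical stand-ins invented for this example, and a real run would call the actual disassemblers instead.

```python
def compare(words, dis_a, dis_b):
    """Return (word, text_a, text_b) for every word the two disagree on."""
    mismatches = []
    for w in words:
        a, b = dis_a(w), dis_b(w)
        if a != b:
            mismatches.append((w, a, b))
    return mismatches


# Toy stand-ins: both just dump the word, but toy_b treats the
# NV condition code space (top nibble 1111) differently.
def toy_a(word):
    return f"DCD &{word:08X}"


def toy_b(word):
    if word >> 28 == 0xF:
        return "NV-space instruction"
    return f"DCD &{word:08X}"


sample = [0xE1A00000, 0xEE191171, 0xF10E0093]
for word, a, b in compare(sample, toy_a, toy_b):
    print(f"&{word:08X}: '{a}' vs '{b}'")
```

Only the words where the two outputs differ are reported, which keeps the result readable even when the input range is large.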
Rick Murray (539) 13840 posts |
Okay, I’ve pulled this apart some more.

xE191171  xxxx 1110 0001 1001 0001 0001 0111 0001
xE101171  xxxx 1110 0001 0000 0001 0001 0111 0001
          cond 1110 abcL eFn  Rd   0001 fgh1 iFm

It’s a weird encoding, setting ‘e’ to 1 and setting the otherwise unused Fn to 1. I’m guessing there is some function that builds FP instructions and it is including some unnecessary stuff. The next test is to assemble it and run it.
When run, the result should be ‘}’ (ASCII 125) written to the screen. Pick whether to use the FIX instruction, the &EE10… (FIXZ) or the &EE19… words – the result is always the same. I tried also using the value &EE1F1171 (setting all bits in the eFn) and it made no difference. As far as I can tell, the FPE doesn’t care about these four bits. [confirmed] So the question is – if the FPE doesn’t, should the disassembler? Because |
Rick Murray (539) 13840 posts |
Um…
Surely new-debugger (107¼K) – old-debugger (75K) is 32¼K of ROM budget (compared to now)? Or would you plan to include both debuggers? |
Chris Evans (457) 1614 posts |
Surely the ROM build size doesn’t matter for anything apart from the IOMD build! |
Fred Graute (114) 645 posts |
Like others I don’t really mind the larger size of the debugger module. More important to me are completeness and consistency.

Currently Debugger doesn’t decode all valid instructions (it outputs ‘Undefined instruction’). I assume though that the new debugger will handle those instructions so probably not a real issue.

What is an issue is consistency of instruction format. The V6V7tests, that Sprow linked to, shows some instructions in UAL format and presumably ObjAsm accepts that, eg

The BASIC assembler however only accepts the pre-UAL format

It would be helpful if the same instruction format (eg pre-UAL for all ARM instructions and UAL for VFP/NEON) was used by all applications/modules for a number of reasons:
|
Rick Murray (539) 13840 posts |
Option bit – UAL or pre-UAL. I’d rather see SWI than SVC. To me, SVC is a processor mode… |
Clive Semmens (2335) 3276 posts |
To ARM documentation it is, too. [Edit: memory like a sieve. SWI did indeed change to SVC with UAL – and I documented the change… I would have objected, but I silently deferred to senior engineers, what with being a tech author. I recall it grating at the time though.] |
Sprow (202) 1158 posts |
This is how ARM gets to use previously unused bits of instruction space. In the era of the 7500FE the FPA instruction FIXZ is defined and you’re told that some unused fields should be zero. The silicon implementing that instruction set might choose to be sloppy and ignore those bits, and it’d still be compliant with the architecture definition (which says they should be zero). In a later silicon implementation, those bits now translate to a different instruction (MRC in this case). That’s fine, because it’d be executed before being sent to the undefined instruction vector (where FPEmulator would step in). There are quite a few instructions now that recycle Rd=15 where it doesn’t make sense to target the PC, plus ARM clawed back the ‘NV’ condition code too for various instructions that didn’t need to be conditional (effectively doubling the number of ‘AL’ instructions possible). |
Rick Murray (539) 13840 posts |
Isn’t that pretty much how ARM implements the FP? A bunch of co-processor instructions that get picked up by real silicon or something emulating it? CP0 and CP1 are the FPA.
That’s fair enough. The real culprit here is ABC setting bits it shouldn’t, and this possibly ought to be looked into some time as “future proofing”.
I never did understand the purpose of the NV condition code; any single cycle non-destructive operation would suffice as a NOP (such as |
Rick Murray (539) 13840 posts |
Brief overview for anybody interested, it’s actually rather clever. The FP instructions, if decoded back to ARM form (instead of FPE form) will appear to be one of the following:
If there is FPA hardware, it will execute the instruction. There’s a document somewhere that explains the process, but it is complicated and the intricacies of the mechanism aren’t relevant to us. One could, in theory, emulate VFP/NEON on a RiscPC using the same method – but that’d be a lot of work for practically zero gain. But it does highlight that our development tools really ought to be offering means to “leverage” [1] the built in floating point hardware (as is practically omnipresent in modern ARM devices these days – hell, even the Cortex-M has options for FP!) in preference to an emulation of an FP system.

[1] Sorry, I’ll go sit in the corner. |
Jeffrey Lee (213) 6048 posts |
I don’t think there was a purpose. They figured that 14 (conditional) condition codes was good enough, so didn’t add any extra logic to make the chip do something with NV (although the fact that it’s a NOP means that it is reasonably future-proof). http://www.righto.com/2016/01/conditional-instructions-in-arm1.html
VFP yes, NEON no. NEON makes heavy use of the NV condition code space, so you’d only be able to emulate it on architectures where NV codes abort (e.g. ARMv5+. For ARMv4 I believe aborting was optional) Of course you could also go the other route and build VFP silicon and attach that to the coprocessor bus, if the chip you’re targeting actually has one (modern ARMs don’t). |
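As a small illustration of the condition code field being discussed, this sketch pulls out the top four bits of an instruction word and names them using the standard ARM condition mnemonics. The function name is invented for this example.

```python
# Standard ARM condition mnemonics, indexed by bits 31-28 of the instruction
COND_NAMES = ["EQ", "NE", "CS", "CC", "MI", "PL", "VS", "VC",
              "HI", "LS", "GE", "LT", "GT", "LE", "AL", "NV"]


def cond_name(word):
    """Name the condition field of a 32-bit ARM instruction word."""
    return COND_NAMES[(word >> 28) & 0xF]


for word in (0xEE191171, 0xF10E0093):
    print(f"&{word:08X}: {cond_name(word)}")
```

The first word is the FIX/MRC example from earlier in the thread, which sits in the AL space; the second is a CPS word, which lives in the NV space.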
David Feugey (2125) 2709 posts |
Yes and no. We will be able to provide binaries optimized for VFP computers, and still compatible with old hardware. |
Chris Hall (132) 3554 posts |
I naïvely thought that if you used FPE instructions, RISC OS was clever enough to use FPE hardware in place of pure emulation wherever possible – i.e. that the floating point emulator would use hundreds of ARM code instructions on hardware with no arithmetic capability, on the 7500 would use the instruction directly, but on modern hardware would use a few NEON or whatever instructions that would be the equivalent of the FPE instruction for the older hardware. Am I correct? |
David Feugey (2125) 2709 posts |
Some FPEmulator versions did use FPA. But I don’t think it’s the case today with FPEmulator on VFP processors. |
Jeffrey Lee (213) 6048 posts |
David is correct. The 7500FE contains a cut-down version of the FPA hardware compared to the original ARM2 version (IIRC the trig instructions were removed?). So most instructions will execute natively and it’s only the removed instructions which need to be emulated. There’s no version of FPEmulator which uses VFP or NEON instructions. It would be possible to create one, but the effort vs. reward ratio is unknown, and ultimately it will only encourage people to continue using code which has sub-optimal performance on modern machines. It would be much better to focus on producing a version of VFPSupport which can provide full VFP emulation for older machines, and to add VFP support to Norcroft. |
David Feugey (2125) 2709 posts |
Basically, it would be much better to have both :) |
Rick Murray (539) 13840 posts |
Hmm, you have identified the need to add VFP support to the compiler; however I would like to take issue with the above statement. Those who are hand coding FP instructions for speed probably already use VFP on modern hardware. |