RISC OS Open: Forum: A new disassembler for the Debugger module

Jun 1, 2016 10:30pm

Rick Murray (539) 13840 posts

What, if anything, do people feel is necessary for me to do in order for the new disassembler to be considered an acceptable replacement for the current one?

How much work would you like to make for yourself? ;-)

Okay, there was an enhanced disassembler in the 26 bit days. I think it came with Zap? Two useful things that it did:

It understood C style function prefixes so was able to annotate a BL to a function with a comment giving the function name; or note that it was a function branch (by spotting APCS entry?) if no name was available.
Better yet, it understood C jump tables so it was capable of annotating branches to the jump table with comments like “; sprintf” instead of branches to essentially random addresses.

Possibly more but I’m in bed reading this ‘cos I can’t sleep, so the grey matter is running in power saving mode while my peripheral devices are attempting to go into standby mode…

Jun 1, 2016 10:51pm

Jeffrey Lee (213) 6048 posts

Those two should be easy enough to support – there’s already code for that in the Debugger module, because the exception dump annotation code makes use of it :-)

Jun 2, 2016 12:55pm

Sprow (202) 1158 posts

As I sit here wondering where on earth the correct place is to poke the Debugger sources so that it won’t consider “CPSID i, #mode” to be an undefined instruction

I’m sure your search skills can’t really have had trouble locating the CPS decoding but I can’t get CPSID i,#mode to go wrong.

It’s one of the examples I tested 2 years ago, and I just tried it in objasm and that still works for me here. Admittedly it’s not trying all 256 possibilities, so maybe something else is wrong. How are you generating the instruction? If it’s related to the CPS macros, do they use objasm’s CPS or build the bitfield and emit it with DCI? Is the mode change bit not set?

Pros/Cons … Significantly larger code size

According to objsize the breakdown is thus:

dis2_vfp 19572 + 6484
exc      8832  + 228
support  1552  + 64
util     5840  + 256
Debugger 27300 + 0 (data embedded)
       = 63096 + 7032

so it depends if you count “incremental extra weight on top of existing” or “percentage heavier than the original” or something else. I’ve tried looking at the action scripts but being a simple creature I find them a bit of a mystery, or rather too much hassle to be worried about looking into since they seem to work!

Jun 2, 2016 1:23pm

Jeffrey Lee (213) 6048 posts

It’s one of the examples I tested 2 years ago, and I just tried it in objasm and that still works for me here.

That’s because CPU mode 12 isn’t a valid mode.

If you use a valid CPU mode (e.g. 19 for SVC32) then it gets caught out by a block of code elsewhere because bits 4 and 7 are both set. This makes me wonder how many other unconditional instructions are getting caught out by the same trap. So the question then becomes one of whether I bodge in a fix just for CPS, or whether I refactor the entire handling of unconditional instructions so that they’re decoded according to the ARM ARM, or whether I replace the entire disassembler with something which is a bit more trustworthy.
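
To make the bit 4 / bit 7 collision concrete, here is a minimal Python sketch that builds the CPSID i,#mode word by hand. The helper names are invented for illustration, and the field layout is the ARM-state CPS encoding as I read it in the ARM ARM; none of this is taken from the Debugger sources.

```python
def encode_cpsid_i(mode):
    """Build the ARM encoding of CPSID i,#mode (hypothetical helper)."""
    if not 0 <= mode <= 31:
        raise ValueError("mode is a 5-bit field")
    word = 0xF1000000      # cond = 1111 plus the fixed CPS opcode bits
    word |= 0b11 << 18     # imod = 11: interrupt disable (the "ID" part)
    word |= 1 << 17        # M = 1: a mode change is requested
    word |= 1 << 7         # I = 1: the "i" in "CPSID i"
    word |= mode           # mode occupies bits 4..0
    return word


def bits_4_and_7_set(word):
    """True when the word would fall into the bit 4 + bit 7 trap."""
    return (word & 0x90) == 0x90


for mode in (3, 12, 19):
    word = encode_cpsid_i(mode)
    print(f"mode {mode:2d}: &{word:08X}  trapped: {bits_4_and_7_set(word)}")
```

Of the three modes mentioned in this thread, only 19 (&13) has bit 4 of the mode field set, so only that word has bits 4 and 7 set together.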

I’ve tried looking at the action scripts but being a simple creature I find them a bit of a mystery, or rather too much hassle to be worried about looking into since they seem to work!

They’re fragments of C code, with a fair amount of macro use to take care of some of the common tasks (e.g. flagging instructions as being unpredictable if they meet certain requirements).

The real magic is the encoding files which describe the bit patterns for each instruction, along with any extra constraints that must be satisfied in order for it to be a successful match.

The documentation explains the format of both sets of files.

Jun 2, 2016 7:41pm

Jeffrey Lee (213) 6048 posts

Pros/Cons … Significantly larger code size

According to objsize the breakdown is thus:

After tweaking a few bits to allow the new disassembler to be used for all ARM instructions, I get the following for a softload build of the module:

Object file                      code-size   data-size   debug-size
o.Debugger                           19016           0           0
o.DebuggerMsgs                        6560           0           0
o.dis2_arm                           52040       11332           0
o.exc                                 8756         228           0
o.support                             1548          64           0
o.util                                9588         728           0
objsize: file o._dirs doesn't exist?

Total (of all files):          97508       12352           0

That’s a total of 109860. A softload build of the current disassembler gets a total of 76828, so there’s a difference of about 33k.
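
The arithmetic can be checked with a few lines of Python. The figures are copied from the objsize listing and from the quoted total for the current build; the variable names are only for illustration.

```python
# Code and data sizes per object file, copied from the objsize listing.
sizes = {
    "o.Debugger":     (19016, 0),
    "o.DebuggerMsgs": (6560, 0),
    "o.dis2_arm":     (52040, 11332),
    "o.exc":          (8756, 228),
    "o.support":      (1548, 64),
    "o.util":         (9588, 728),
}
OLD_TOTAL = 76828  # softload build of the current disassembler, as quoted

code_total = sum(code for code, _ in sizes.values())
data_total = sum(data for _, data in sizes.values())
new_total = code_total + data_total
difference = new_total - OLD_TOTAL

print(code_total, data_total, new_total)
print(f"difference: {difference} bytes ({difference / 1024:.2f}K)")
```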

Jun 2, 2016 8:24pm

Rick Murray (539) 13840 posts

so there’s a difference of about 33k.

My last ROM build…

Starting phase join ...
romlinker: version 0.05 (07 Sep 2014)
romlinker: Standard ROM image construction
romlinker: Image has 667968 bytes spare (652.31K)
romlinker: crc16=a4cb

So I think I can spare 33K.

Now, we know the size difference. How about the usability difference?

Here’s a question. If I used a real variable in a BASIC program and assembled it with ABC, the resultant code is this:

000089E4 : #.†í : ED9C1123 : LDFS    F1,[R12,#140]
000089E8 : q..î : EE191171 : MRC     CP1,0,R1,C9,C1,3

Is this correct, or is the current disassembler failing to correctly decode the MRC to CP1?

Because, with the help of the ARM7500FE data sheet, I parse the instruction as:

xxxx 1110 000 1   1001 0001 0001 0111 0001

cond CPRT abc L/S eFn  Rd   0001 fgh1 iFm

abc L/S = 000 1 = FIX (Rd = Fm)

The only anomaly is the ‘e’ bit is supposed to be zero, but this might apply specifically to the ARM7500FE.

So I think

MRC CP1,0,R1,C9,C1,3

is supposed to be

FIXZ R1, F1
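
For reference, the generic coprocessor reading of that word can be reproduced with a few shifts and masks. This is a sketch using the standard MRC/MCR field positions; the function names are made up, and the output format only imitates the Debugger's.

```python
def decode_mrc(word):
    """Split an MRC/MCR word into its generic coprocessor fields."""
    is_transfer = ((word >> 24) & 0xF) == 0b1110 and ((word >> 4) & 1) == 1
    if not is_transfer:
        raise ValueError("not a coprocessor register transfer")
    return {
        "op":   "MRC" if (word >> 20) & 1 else "MCR",
        "opc1": (word >> 21) & 0x7,
        "CRn":  (word >> 16) & 0xF,   # the 'e' bit and Fn in the FPA view
        "Rd":   (word >> 12) & 0xF,
        "cp":   (word >> 8) & 0xF,
        "opc2": (word >> 5) & 0x7,
        "CRm":  word & 0xF,           # the 'i' bit and Fm in the FPA view
    }


def format_mrc(word):
    """Render the fields in the operand order used in the listing above."""
    f = decode_mrc(word)
    return (f"{f['op']} CP{f['cp']},{f['opc1']},R{f['Rd']},"
            f"C{f['CRn']},C{f['CRm']},{f['opc2']}")


print(format_mrc(0xEE191171))
print(format_mrc(0xEE101171))
```

The two words differ only in the CRn field (bits 19..16), which is the 'e' bit plus Fn in the FPA layout.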

Jun 2, 2016 8:26pm

Steve Pampling (1551) 8170 posts

That’s a total of 109860. A softload build of the current disassembler gets a total of 76828, so there’s a difference of about 33k

You know if this was the last century and I was running it on the A500 I might¹ wonder at that.
On the RPC I’d be unconcerned and on any of this century’s machines I wouldn’t bat an eye.

¹ Then again, probably not.

Jun 2, 2016 8:27pm

Sprow (202) 1158 posts

It’s one of the examples I tested 2 years ago, and I just tried it in objasm and that still works for me here.

That’s because CPU mode 12 isn’t a valid mode.

For the purposes of testing the disassembler it seemed as good as any (I probably initially used 42 but it’s only a 5 bit field). 3 works (SVC26). &13 doesn’t, so yeah something has choked on bit 4 as you highlight.

So the question then becomes one of whether I … That’s a total of 109860.

You’ve clearly got the knives out for some reason on this code, for reasons I’ve not understood, because I know we’ve been here before when the VFP decoder was added. My feeling is spending 100k of a ROM budget on something (extremely useful but) used by only a small number of people, is 100k poorly spent.

If the existing ARM decoder is (say) 99% right, fixing the 1% would seem more sensible than replacing it with something 2x bigger (assuming ~15k of Debugger.o is the disassembler – I’ve not looked) and potentially a different set of bugs. If we ignore the condition code there’s only 256M possible instructions and another 256M with the NV condition code, so comparing 512M lines of text would give a more objective assessment. It can’t be that bad, or rather I don’t remember seeing any mistakes.

Jun 2, 2016 8:49pm

Jeffrey Lee (213) 6048 posts

You’ve clearly got the knives out for some reason on this code, for reasons I’ve not understood

I spent a lot of time working on it, that’s why.

It’s good to see that the VFP/NEON disassembler is in there, but it’s disappointing that the ARM disassembler isn’t in there as well, because it does solve some very real maintenance issues with the code.

If we ignore the condition code there’s only 256M possible instructions and another 256M with the NV condition code, so comparing 512M lines of text would give a more objective assessment.

In that case you might want to try using the testbed that I wrote to compare the output of my disassembler against the Debugger and decaof.
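
As a rough idea of what such a comparison involves (this is not the testbed itself), here is a Python sketch that walks a range of words and reports where two disassemblers disagree. Both disassembler functions are stand-ins invented for the demonstration. With the 4-bit condition field fixed, 28 bits remain, so each condition code covers 2^28 encodings.

```python
from itertools import islice


def compare_disassemblers(dis_a, dis_b, words):
    """Yield (word, text_a, text_b) wherever the two outputs differ."""
    for word in words:
        text_a, text_b = dis_a(word), dis_b(word)
        if text_a != text_b:
            yield word, text_a, text_b


def all_encodings(cond):
    """Every 32-bit word with the given 4-bit condition field."""
    return range(cond << 28, (cond + 1) << 28)


# Stand-in disassemblers; both are hypothetical and know one instruction.
def dis_old(word):
    return "NOP" if word == 0xE1A00000 else f"DCD &{word:08X}"


def dis_new(word):
    return "MOV R0,R0" if word == 0xE1A00000 else f"DCD &{word:08X}"


print(len(all_encodings(0xE)))  # encodings per condition code

# Only a small window is compared here; a full run would take far longer.
window = range(0xE1A00000, 0xE1A00010)
for word, a, b in islice(compare_disassemblers(dis_old, dis_new, window), 5):
    print(f"&{word:08X}: {a!r} vs {b!r}")
```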

Jun 2, 2016 8:55pm

Rick Murray (539) 13840 posts

Okay, I’ve pulled this apart some more.

xE191171 xxxx 1110 0001 1001 0001 0001 0111 0001
xE101171 xxxx 1110 0001 0000 0001 0001 0111 0001

         cond 1110 abcL eFn  Rd   0001 fgh1 iFm

It’s a weird encoding, setting ‘e’ to 1 and setting the otherwise unused Fn to 1. I’m guessing there is some function that builds FP instructions and it is including some unnecessary stuff.

The next test is to assemble it and run it.

DIM code% 255

FOR l% = 0 TO 2 STEP 2
  P% = code%
  [ OPT l%
    MVFS  F1, #5
    MVFS  F2, #5
    MUFS  F0, F1, F2  ; F0 = 5 * 5
    MUFS  F1, F0, F2  ; F1 = F0 (=25) * 5
    ;FIXZ  R1, F1      ; use F1, R1 so opcode same
    ;DCD   &EE101171   ; is FIXZ R1, F1
    DCD   &EE191171    ; is MRC CP1,0,R1,C9,C1,3 (!)

    MOV   R0, R1

    CMP   R0, #32     ; sanity check, filter Ctrl
    MOVLT R0, #46

    SWI   "XOS_WriteC"

    MOV   PC, R14

  ]
NEXT

SYS "XOS_File", 10, "$.ObjectX", &FF8,, code%, P%

END

When run, the result should be ‘}’ (ASCII 125) written to the screen. Pick whether to use the FIX instruction, the &EE10… (FIXZ) or the &EE19… words – the result is always the same.

I tried also using the value &EE1F1171 (setting all bits in the eFn) and it made no difference. As far as I can tell, the FPE doesn’t care about these four bits. [confirmed]

So the question is – if the FPE doesn’t, should the disassembler? Because FIXZ R1, F1 makes a heck of a lot more sense than the gibberish that is MRC CP1,0,R1,C9,C1,3! Perhaps it should be disassembled as a FIX with an “invalid bits set” warning?
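
A minimal Python sketch of that idea, assuming the bit layout shown earlier in this post. The function name and the warning text are invented, the rounding suffixes (none, P, M, Z for bits 6..5 = 00, 01, 10, 11) are my reading of the FPA documentation, and condition codes and constant operands are left out to keep it short.

```python
ROUNDING = {0b00: "", 0b01: "P", 0b10: "M", 0b11: "Z"}  # assumed suffixes


def disassemble_fix(word):
    """Return text for an FPA FIX, or None if the word is not handled.

    Bits 19..16 (e and Fn) are left out of the match on purpose, so words
    with stray bits there still decode, with a warning appended.
    """
    # Match bits 27..24 = 1110, abc L = 000 1, CP# = 0001 and bit 4 = 1.
    if (word & 0x0FF00F10) != 0x0E100110:
        return None
    if (word >> 3) & 1:
        return None  # 'i' bit set: constant operand, not handled here
    rd = (word >> 12) & 0xF
    fm = word & 0x7
    rounding = ROUNDING[(word >> 5) & 0x3]
    stray = (word >> 16) & 0xF
    text = f"FIX{rounding} R{rd}, F{fm}"
    if stray:
        text += f" ; invalid bits set (eFn = {stray:04b})"
    return text


for word in (0xEE101171, 0xEE191171, 0xEE1F1171):
    print(f"&{word:08X}  {disassemble_fix(word)}")
```

All three test words from this post come out as FIXZ R1, F1, with the warning on the two that set bits in the eFn field.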

Jun 2, 2016 9:01pm

Rick Murray (539) 13840 posts

Um…

That’s a total of 109860. A softload build of the current disassembler gets a total of 76828, so there’s a difference of about 33k.

My feeling is spending 100k of a ROM budget on something (extremely useful but) used by only a small number of people, is 100k poorly spent.

Surely new-debugger (107¼K) – old-debugger (75K) is 32¼K of ROM budget (compared to now)? Or would you plan to include both debuggers?

Jun 3, 2016 10:47am

Chris Evans (457) 1614 posts

Surely the ROM build size doesn’t matter for anything apart from the IOMD build!
What size is the physical ROM used for IOMD? Presumably 8MB or 4MB and it is compressed? (A compressed Pi ROM is about 2500K).
If it ever became a problem could it automatically be omitted from the IOMD build?
Even as a comparatively non-programmer I’d have thought it a worthwhile inclusion.

Jun 3, 2016 1:04pm

Fred Graute (114) 645 posts

Like others I don’t really mind the larger size of the debugger module. More important to me are completeness and consistency. Currently Debugger doesn’t decode all valid instructions (it outputs ‘Undefined instruction’). I assume though that the new debugger will handle those instructions so probably not a real issue.

What is an issue is consistency of instruction format. The V6V7tests, that Sprow linked to, shows some instructions in UAL format and presumably ObjAsm accepts that, eg LDRHTcc r1,[r2],r3.

The BASIC assembler however only accepts the pre-UAL format LDRccHT r1,[r2],r3. Currently Debugger outputs the instruction in this format also.

It would be helpful if the same instruction format (eg pre-UAL for all ARM instructions and UAL for VFP/NEON) was used by all applications/modules for a number of reasons:

Avoid pesky syntax errors when copying code between different applications.
Make it easier to provide syntax colouring for assembler source and disassembly. Having to support two notations for the same instruction is something I would prefer to avoid.
StrongED (and Zap) allow instructions in disassembly mode to be modified. This is done by constructing a small BASIC program to assemble the altered instruction. This works best if Debugger’s output is acceptable for the BASIC assembler.
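
To illustrate how mechanical the difference is for the load/store case mentioned above, here is a small Python sketch that moves a trailing UAL condition code in front of the size suffix. It is hypothetical, covers only LDR/STR with the common suffixes, and is not how Debugger, StrongED or Zap handle it.

```python
import re

CONDS = "EQ|NE|CS|HS|CC|LO|MI|PL|VS|VC|HI|LS|GE|LT|GT|LE|AL"
# Only load/store mnemonics with a size or translation suffix are covered.
UAL_LOAD_STORE = re.compile(
    rf"^(LDR|STR)(SB|SH|BT|HT|B|H|T|D)({CONDS})$", re.IGNORECASE
)


def ual_to_pre_ual(mnemonic):
    """Rewrite e.g. LDRHTEQ (UAL) as LDREQHT (pre-UAL)."""
    match = UAL_LOAD_STORE.match(mnemonic)
    if not match:
        return mnemonic  # no suffix, no condition, or not covered
    base, suffix, cond = match.groups()
    return base + cond + suffix


for name in ("LDRHTEQ", "STRBNE", "LDRLT", "LDRHT"):
    print(name, "->", ual_to_pre_ual(name))
```

Mnemonics that are spelled the same way in both syntaxes, such as LDRLT or an unconditional LDRHT, pass through unchanged.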

Jun 3, 2016 1:28pm

Rick Murray (539) 13840 posts

Option bit – UAL or pre-UAL.

I’d rather see SWI than SVC. To me, SVC is a processor mode…

Jun 3, 2016 2:26pm

Clive Semmens (2335) 3276 posts

To ARM documentation it is, too. [Edit: memory like a sieve. SWI did indeed change to SVC with UAL – and I documented the change… I would have objected, but I silently deferred to senior engineers, what with being a tech author. I recall it grating at the time though.]

Jun 10, 2016 1:09pm

Sprow (202) 1158 posts

So the question is – if the FPE doesn’t, should the disassembler? Because FIXZ R1, F1 makes a heck of a lot more sense than the gibberish that is MRC CP1,0,R1,C9,C1,3! Perhaps it should be disassembled as a FIX with an “invalid bits set” warning?

This is how ARM gets to use previously unused bits of instruction space.

In the era of the 7500FE the FPA instruction FIXZ is defined and you’re told that some unused fields should be zero. The silicon implementing that instruction set might choose to be sloppy and ignore those bits, and it’d still be compliant with the architecture definition (which says they should be zero).

In a later silicon implementation, those bits now translate to a different instruction (MRC in this case). That’s fine, because it’d be executed before being sent to the undefined instruction vector (where FPEmulator would step in).

There are quite a few instructions now that recycle Rd=15 where it doesn’t make sense to target the PC, plus ARM clawed back the ‘NV’ condition code too for various instructions that didn’t need to be conditional (effectively doubling the number of ‘AL’ instructions possible).
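
In decoder terms, the NV reuse means the top four bits select between two separate decode tables. A minimal sketch follows; the function names are invented, and the "unconditional" reading applies only to architectures that reuse the NV space.

```python
COND_NAMES = ["EQ", "NE", "CS", "CC", "MI", "PL", "VS", "VC",
              "HI", "LS", "GE", "LT", "GT", "LE", "AL", "NV"]


def condition_of(word):
    """Name of the condition field held in bits 31..28."""
    return COND_NAMES[(word >> 28) & 0xF]


def decode_space(word):
    """Which decode table the word belongs to once NV has been reused."""
    return "unconditional" if condition_of(word) == "NV" else "conditional"


for word in (0xEE191171, 0xF10E0093):
    print(f"&{word:08X}: {condition_of(word)}, {decode_space(word)}")
```

The second word is the CPSID i,#19 case from earlier in the thread, which lives entirely in the former NV space.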

Jun 10, 2016 5:39pm

Rick Murray (539) 13840 posts

In a later silicon implementation, those bits now translate to a different instruction (MRC in this case).

Isn’t that pretty much how ARM implements the FP? A bunch of co-processor instructions that get picked up by real silicon or something emulating it? CP1 and CP2 are the FPA.

MRC CP1,0,R1,C9,C1,3 is exactly that.
MRC CP1,0,R1,C0,C1,3, on the other hand, is FIXZ R1,F1, but it is still MRC, only now it has a different meaning too.

The silicon implementing that instruction set might choose to be sloppy and ignore those bits,

That’s fair enough. The real culprit here is ABC setting bits it shouldn’t, and this possibly ought to be looked into some time as “future proofing”.
That said, if it is a valid instruction to FPE (sloppy or not), the Debugger ought to show it as such, to be revised should such a modification to the instruction set occur.

plus ARM clawed back the ‘NV’ condition code too for various instructions that didn’t need to be conditional

I never did understand the purpose of the NV condition code; any single cycle non-destructive operation would suffice as a NOP (such as MOV R0, R0 that we use these days).

Jun 10, 2016 6:11pm

Rick Murray (539) 13840 posts

A bunch of co-processor instructions that get picked up by real silicon or something emulating it? CP1 and CP2 are the FPA.

Brief overview for anybody interested, it’s actually rather clever.

The FP instructions, if decoded back to ARM form (instead of FPE form) will appear to be one of the following:

CDP – request a co-processor to perform an instruction (independent of the ARM)
MRC – transfer a co-processor register to an ARM register
MCR – transfer an ARM register to a co-processor register
LDC – load data from memory into a co-processor register
STC – store data from a co-processor register to memory
There are ‘2’ suffixed versions of the above to extend available instructions

If there is FPA hardware, it will execute the instruction. There’s a document somewhere that explains the process, but it is complicated and the intricacies of the mechanism aren’t relevant to us.
If there is no FPA hardware, the ARM will raise an Undefined Instruction exception. This is where FPE comes in. It catches the exception, notices if it is something it should be dealing with, and if so, emulates the entire FP process before restoring the machine to a suitable state to continue processing.
It’s really clever, but suffers from the need to take an exception plus anywhere from a dozen to a couple of hundred ARM instructions for every single FP instruction executed.
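
The first step of that process, working out which generic coprocessor form a word takes and which coprocessor it addresses, can be sketched in Python as below. This uses the classic ARM field positions; the function name is made up, and the '2' variants and later additions to this encoding space are ignored.

```python
def classify_coprocessor(word):
    """Return (mnemonic, coprocessor number), or None for anything else."""
    cp = (word >> 8) & 0xF
    load_bit = (word >> 20) & 1
    if ((word >> 25) & 0x7) == 0b110:        # bits 27..25 = 110
        return ("LDC" if load_bit else "STC", cp)
    if ((word >> 24) & 0xF) == 0b1110:       # bits 27..24 = 1110
        if ((word >> 4) & 1) == 0:
            return ("CDP", cp)
        return ("MRC" if load_bit else "MCR", cp)
    return None


# The two words from the ABC listing earlier in the thread, plus MOV R0,R0.
for word in (0xED9C1123, 0xEE191171, 0xE1A00000):
    print(f"&{word:08X}: {classify_coprocessor(word)}")
```

An emulator along the lines of FPE would only go on to decode the FP operation when the coprocessor number is one it claims, and would pass anything else on.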

One could, in theory, emulate VFP/NEON on a RiscPC using the same method – but that’d be a lot of work for practically zero gain.

But it does highlight that our development tools really ought to be offering means to “leverage”¹ the built in floating point hardware (as is practically omnipresent in modern ARM devices these days – hell, even the Cortex-M has options for FP!) in preference to an emulation of an FP system.

¹ Sorry, I’ll go sit in the corner.

Jun 10, 2016 6:36pm

Jeffrey Lee (213) 6048 posts

I never did understand the purpose of the NV condition code; any single cycle non-destructive operation would suffice as a NOP (such as MOV R0, R0 that we use these days).

I don’t think there was a purpose. They figured that 14 (conditional) condition codes was good enough, so didn’t add any extra logic to make the chip do something with NV (although the fact that it’s a NOP means that it is reasonably future-proof).

http://www.righto.com/2016/01/conditional-instructions-in-arm1.html

One could, in theory, emulate VFP/NEON on a RiscPC using the same method – but that’d be a lot of work for practically zero gain.

VFP yes, NEON no. NEON makes heavy use of the NV condition code space, so you’d only be able to emulate it on architectures where NV codes abort (e.g. ARMv5+; for ARMv4 I believe aborting was optional).

Of course you could also go the other route and build VFP silicon and attach that to the coprocessor bus, if the chip you’re targeting actually has one (modern ARMs don’t).

Jun 11, 2016 7:17am

David Feugey (2125) 2709 posts

One could, in theory, emulate VFP/NEON on a RiscPC using the same method – but that’d be a lot of work for practically zero gain.

Yes and no. We will be able to provide binaries optimized for VFP computers, and still compatible with old hardware.

Jun 13, 2016 9:04am

Chris Hall (132) 3554 posts

I naïvely thought that if you used FPE instructions, RISC OS was clever enough to use FP hardware in place of pure emulation wherever possible – i.e. that the floating point emulator would use hundreds of ARM instructions on hardware with no floating point capability, the 7500FE would execute the instruction directly, and modern hardware would use a few NEON (or whatever) instructions equivalent to the FPE instruction. Am I correct?

Jun 13, 2016 9:20am

David Feugey (2125) 2709 posts

Some FPEmulator versions did use FPA. But I don’t think it’s the case today with FPEmulator on VFP processors.

Jun 13, 2016 10:02am

Jeffrey Lee (213) 6048 posts

David is correct.

The 7500FE contains a cut-down version of the FPA hardware compared to the original ARM2 version (IIRC the trig instructions were removed?). So most instructions will execute natively and it’s only the removed instructions which need to be emulated.

There’s no version of FPEmulator which uses VFP or NEON instructions. It would be possible to create one, but the effort vs. reward ratio is unknown, and ultimately it will only encourage people to continue using code which has sub-optimal performance on modern machines. It would be much better to focus on producing a version of VFPSupport which can provide full VFP emulation for older machines, and to add VFP support to Norcroft.

Jun 13, 2016 11:22am

David Feugey (2125) 2709 posts

Basically, it would be much better to have both :)

Jun 13, 2016 12:10pm

Rick Murray (539) 13840 posts

it will only encourage people to continue using code which has sub-optimal performance on modern machines

Hmm, you have identified the need to add VFP support to the compiler; however I would like to take issue with the above statement. Those who are hand coding FP instructions for speed probably already use VFP on modern hardware.
It’s the compiler that’ll be the biggest user of old FPE, and short of writing our own veneers – what and how it implements “float” is mostly out of our hands…
