VFP/SIMD Questions
Alan Peters (515) 51 posts |
In testing the VFP/SIMD assembler I’ve run into a few things where I could do with a bit of assistance. I’m referencing the ARM DDI0406B document for instruction encoding. The good news is that all instructions other than the ones below are working. Validation is still to be tested heavily with messy values such as @Align. I will then convert the library to assembler and add it to the BASIC source.
1) VCVT (between floating-point and fixed-point, VFP) – p894/895. This has the immediate value encoded rather unusually as [imm4,i]. DecAOF is giving some odd results such as F64.S16 d0,d0,#-15.
2) VCVT, VCVTR (between floating-point and integer, VFP) – p891. For the first 4 variations, where the floating-point data-type is second, the R option is encoded as op=0 when specified, and op=1 otherwise. DecAOF appears to be showing it the other way around.
3) VLDx p915+ and VTBL p1111. DecAOF is showing single registers without the {} brackets, and also register ranges such as D0-D3. Useful, but not in the ARM manual (as listed with the instruction), so I’m unsure if they should be supported in the BASIC assembler.
4) VEXT p911. The ARM manual states that for sizes other than 8 the instruction is a pseudo-instruction, which is fine, but the explanation of the treatment of the immediate isn’t very clear (to me at least). Is it trying to say that for sizes other than 8, x must be a multiple of 2, 4, or 8, or that x needs multiplying by 2, 4, or 8 for encoding?
5) Use of functions / expressions. When there is more than one register type available, a method is needed for specifying the data-type when using a Function or Expression to return a register number. This is because BASIC is currently lacking double, quad, and long data-types, and in any event functions are not typed. As much as I would like to add all those data-types and typed functions, that’s for another day!
A couple of possibilities: a) VMOV.Q.Q FNreg1,FNreg2 I think I probably prefer option (b) as it’s close to the ARM syntax and readable. After the dot is any BASIC expression. |
Jeffrey Lee (213) 6048 posts |
What PDF reader are you using? In my experience the errata markup (i.e. all the corrections they’ve made to the initial version) only shows up when using Adobe’s reader. And without being able to see the markup properly there’s a good chance you’ll get the encodings wrong.
It’s saying that it must be multiplied by 2, 4, or 8 for encoding. So if the assembler sees “VEXT.16 D0,D1,#4” then it acts as if it saw “VEXT.8 D0,D1,#8”
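The scaling described above can be sketched in Python. This is only an illustration under stated assumptions: the function name `vext_byte_imm` is hypothetical, and the only range check applied is that the scaled value fits a 4-bit field. The tighter limits that depend on whether D or Q registers are used are left out.

```python
def vext_byte_imm(size_bits, imm):
    """Scale a VEXT.<size> immediate to the byte index used by VEXT.8.

    Hypothetical helper: the immediate is an element index, so it is
    multiplied by the element size in bytes (1, 2, 4 or 8).
    """
    if size_bits not in (8, 16, 32, 64):
        raise ValueError(f"unsupported element size: {size_bits}")
    scaled = imm * (size_bits // 8)  # element index -> byte index
    if not 0 <= scaled <= 15:        # assumed limit: value must fit in 4 bits
        raise ValueError(f"immediate {imm} out of range for VEXT.{size_bits}")
    return scaled


print(vext_byte_imm(16, 4))  # VEXT.16 ...,#4 is treated like VEXT.8 ...,#8
```

Read the other way round, this is the same point as halving the immediate when the elements are twice as large: the byte offset into the register pair stays the same.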
Option B looks good to me. Regarding DecAOF disassembly being wrong – if you want I can send you a copy of my WIP new disassembler for the Debugger module. Admittedly I haven’t tested the VFP/NEON disassembly much, but it’ll at least give you a second opinion on things, and it should support all ARM instructions (no Thumb support yet). |
Alan Peters (515) 51 posts |
Sorry should have provided a bit more info – Reader X on Windows 7 x64, so yes I can see the alterations in the document.
Thanks! I think I’ll probably test the behaviour of the instructions as the next step. My encoding matches ARM when I verified it manually afterwards. An update to DDT would be very useful in due course. My test assembler script with every encoding option present might be of use to test DDT if you haven’t written the same thing yourself :-) |
Jeffrey Lee (213) 6048 posts |
Well I’m mainly interested in disassembly, so I’ve got a testbed set up which will create a binary containing a sequential set of opcodes. It then runs it through the different disassemblers, and diffs the output to look for errors/inconsistencies. At some point I’m hoping to let it run through the full 2^32 set of combinations, but before I can do that I need to finish tweaking the formatting so that the diff won’t throw up millions of false positives. |
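As a rough illustration of the formatting problem, here is a Python sketch that normalises the output of two disassemblers before comparing it line by line. The normalisation rules (case folding, whitespace collapsing, dropping anything after a `;`) and the function names are assumptions made for the example, not a description of the actual testbed.

```python
import re


def normalise(line):
    """Reduce a disassembly line to a canonical form (assumed rules)."""
    line = line.split(";", 1)[0]            # drop a trailing comment, if any
    line = re.sub(r"\s*,\s*", ",", line)    # no spaces around commas
    line = re.sub(r"\s+", " ", line.strip())
    return line.upper()


def mismatches(a_lines, b_lines):
    """Return (index, a, b) for every pair of lines that really differ."""
    return [
        (i, a, b)
        for i, (a, b) in enumerate(zip(a_lines, b_lines))
        if normalise(a) != normalise(b)
    ]


# Formatting-only differences are ignored; a different operand is reported.
print(mismatches(["VMOV   d0, d1", "VCVT.F64.S16 D0,D0,#8"],
                 ["vmov D0,D1", "VCVT.F64.S16 D0,D0,#-15"]))
```

Only the second pair is reported, since the first differs purely in spacing and case.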
Martin Bazley (331) 379 posts |
Wouldn’t such an exercise, by definition, produce a file much larger than 2GB? |
Steve Revill (20) 1361 posts |
Note: Ben is working on VFP and NEON support in objasm (and has been for some time) which also means that in doing so he’s spotted the odd minor bug in the related disassembly that he’d already added to DecAOF. Hopefully, we’ll be able to do a new Tools release in the next few months with all this stuff in it. |
Steve Revill (20) 1361 posts |
Depends upon whether the file represents a subset of the full range of possibilities, then a diff is performed, output recorded and the process continues with the next chunk… repeat for a day or so(!) and the end result should be a nice, small file of only the diffs – which may or may not represent actual errors in one or other of the disassemblers. Jeffrey, I’d advise you to contact Ben – maybe we can slip you a pre-release of the latest work-in-progress DecAOF. |
Ben Avison (25) 445 posts |
I think decaof is correct here. The constants are encoded as values subtracted from 16, 32 or 64. Some bit patterns corresponding to invalid values are undefined instructions, and some encode different instructions, but others are merely unpredictable, in which case I think it’s reasonable to disassemble as a VCVT with a negative value for frac_bits.
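A small Python sketch of the "subtracted from the size" encoding for the VFP fixed-point VCVT may help. It assumes, as I read the ARM ARM pseudocode, that imm4 forms the upper four bits and i the lowest bit of a 5-bit value, and that the size is 16 or 32 for this VFP form. The function names are hypothetical.

```python
def encode_frac_bits(size, frac_bits):
    """Return (imm4, i) for a given number of fraction bits.

    Assumption: the 5-bit value imm4:i holds size - frac_bits.
    """
    if size not in (16, 32):
        raise ValueError("size must be 16 or 32 for the VFP form")
    imm5 = size - frac_bits
    if frac_bits < 0 or not 0 <= imm5 <= 31:
        raise ValueError(f"frac_bits {frac_bits} cannot be encoded for size {size}")
    return imm5 >> 1, imm5 & 1


def decode_frac_bits(size, imm4, i):
    """Recover frac_bits from the fields; may be negative for odd bit patterns."""
    return size - ((imm4 << 1) | i)


print(encode_frac_bits(16, 8))      # fields for 8 fraction bits with a 16-bit type
print(decode_frac_bits(16, 15, 1))  # all field bits set with a 16-bit type
```

The second call gives -15, which is how a disassembly such as F64.S16 d0,d0,#-15 can arise from a bit pattern that does not correspond to a valid fraction-bit count.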
That one’s a valid fault. In pre-UAL, the flag character was a ‘Z’ not an ‘R’ and set bit 7, not cleared it – obviously I missed the subtle distinction there. That’ll be fixed in the next version of decaof, thanks.
ARM ARM section A7.2.5: the brackets are optional in lists consisting of a single extension register.
Think about what the instruction is actually doing – it’s just shuffling data. imm is effectively the scalar index of the first element to extract, so if your elements are twice as large, imm should be halved to have the same effect.
I suggested something similar (a) to ARM when I first realised this was a problem for BASIC back in 2009 – but having the suffix behave like the data type suffixes, so only one would be needed in many cases, and none at all in others. But they clearly haven’t considered it important enough to formalise it, so I guess we can do whatever we feel like. (b) isn’t a bad solution either, you could even extend it to add data type qualifiers to arguments rather than to opcodes – a bit like “extended notation” which is supported by armasm and the in-progress version of objasm. |
Alan Peters (515) 51 posts |
Yes there was a bug in my version that is now fixed.
Thought as much, thanks for clearing that one up.
I’ll certainly take a look at that. Every instruction encoding from my tables is now agreeing with the output from DecAOF which is encouraging. I’m currently testing validation as there are many subtle variations in the syntax. Hopefully I’ll have time to get stuck into the integration with the BASIC source this week. Many thanks for all of the assistance. |
Jeffrey Lee (213) 6048 posts |
“Wouldn’t such an exercise, by definition, produce a file much larger than 2GB?”
“Depends upon whether the file represents a subset of the full range of possibilities, then a diff is performed, output recorded and the process continues with the next chunk… repeat for a day or so(!) and the end result should be a nice, small file of only the diffs – which may or may not represent actual errors in one or other of the disassemblers.”
Yeah, I’d be processing the data in chunks rather than all in one go. Plus it should allow me to complete the process a bit quicker – with a mix of real machines and RPCEmu instances I could have about 13 machines all sat churning through the data in parallel. Plus there’s the obvious optimisation of not bothering to check all the condition codes, so in total there’d only be 2^29 different instructions (2^28 conditional instructions and 2^28 unconditional).
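The arithmetic and the chunking can be sketched in Python. This is a minimal illustration, not the real testbed: the choice of 0xE (AL) as the one representative condition code, the chunk size, and the function names are all assumptions.

```python
WORDS_PER_COND = 1 << 28  # the 28 bits below the condition field


def total_opcodes(conds=(0xE, 0xF)):
    """Opcodes to test when only the given condition fields are covered."""
    return len(conds) * WORDS_PER_COND


def chunks(chunk_words, conds=(0xE, 0xF)):
    """Yield (first_opcode, count) pairs covering the reduced opcode space."""
    for cond in conds:
        base = cond << 28
        for start in range(0, WORDS_PER_COND, chunk_words):
            count = min(chunk_words, WORDS_PER_COND - start)
            yield base | start, count


print(total_opcodes() == 1 << 29)           # 2^28 conditional + 2^28 unconditional
print((1 << 32) * 4 // 2**30)               # GiB needed for all 2^32 words in one file
print(sum(1 for _ in chunks(1 << 20)))      # number of 4 MiB chunks to hand out
```

Each (first_opcode, count) pair is independent of the others, so chunks could be handed to separate machines and only the resulting diffs kept.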
Yeah, I’ll get in touch when I find the time to work on the disassembler again. |
Alan Peters (515) 51 posts |
A slight amendment to the proposed syntax, as Q. after a space in BASIC expands to QUIT, which I’d forgotten. So for now I’ve implemented Q#FNreg1 as this sits rather nicely within existing syntax – even if it doesn’t look quite as nice.
I’m starting to make some progress now that I’m used to ObjAsm and the BASIC source. The VFP/SIMD assembler source is in a new file so it won’t break any of the existing code. Indeed the only modification so far is to branch out of the existing Assembler when a ‘V’ is detected – branching back if nothing matches up, or to the final assembly part with the encoded word. Various new error messages are the only structural thing I still need to look at, as I haven’t glued into the error handling code properly (and it looks a bit messy). The pattern tables which describe how to parse and encode every instruction are currently a bit large at 30K, but I’m sure I can shrink them with a bit of further optimising. At least the size of the code is low and it’s all structured and documented – a first for the BASIC source! :-)
The only thing that puzzles me is that I can’t seem to easily debug the compiled BASIC module using DDT. Did I forget something (again!) or is there a better or different way of doing this? Currently I have a wide array of screen-based debugging going on, which is OK; it’s just a bit slower than tracing through the code. |
Alan Peters (515) 51 posts |
The VDUP instruction is a bit of a headache as firstly VDU is a token, and “VDUP.” will expand to VDUPRINT due to both keywords representing tokens. For now I’m using VDPL instead for the BASIC assembler as this avoids both tokens – not sure if anyone has any better ideas? The good news is the assembler is up and running. I’m running through a lot of tests and hope to post an Alpha version to the TBA blog in the near future. |
Alan Peters (515) 51 posts |
We are now at the stage of releasing a first test version. Is it OK to publish the BASIC source tree with amendments, compiled version for Cortex A8, plus a few extra set-up scripts and help files, in a TBAFS archive through the TBA website? This is so we can get some other people to carry out testing before it’s submitted to CVS and becomes part of the build. I’m just checking there aren’t licensing restrictions or other difficulties that we are not aware of… |
Alan Peters (515) 51 posts |
Perhaps I’ve answered my own question after much reading of this website :-) |
Alan Peters (515) 51 posts |
A question on VFPSupport that’s probably for Jeffrey as the author. Hopefully it’s a sensible question and I’m not missing something obvious. I can understand how to use VFPSupport (I think) within an application. It seems to work well and handles the case where a task is no longer active after exiting or crashing. However, for TAG we are working within a module and a SWI interface. Am I right in thinking that the approach is to:
1) Create a TAG VFP context, not in app space, on module init.
2) Have the TAG SWI handler switch to the TAG VFP context on entry and restore the previous context on exit.
3) Apply the same approach as for the SWI handler to any other entry point that needs to use VFP.
This possibly includes IRQ handlers – I assume VFPSupport is safe to use for this? Hope this makes some sense! |
Jeffrey Lee (213) 6048 posts |
Yes, that sounds about right.
Yes, using it from IRQ handlers is fine. However you’d probably want to use a separate context from your main one, otherwise if your IRQ handler is entered while one of your SWIs is running you’d end up corrupting the SWI’s registers. From a performance standpoint the best thing to do would be to use VFPSupport_FastAPI to allow you to call the context switch functions directly. If you aren’t interested in saving the contents of the registers between calls to your code then you can also create/destroy contexts on SWI/IRQ entry/exit instead of using one or two shared global contexts. However I wouldn’t advise storing the contexts on the SVC/IRQ stack at the moment, since they won’t be cleaned up properly when aborts/errors occur. |