VFP/SIMD Questions

16 posts, 5 voices

Jun 10, 2011 1:56pm Alan Peters (515) 51 posts
In testing the VFP/SIMD assembler I’ve run into a few things where I could do with a bit of assistance. I’m referencing the ARM DDI0406B document for instruction encoding.

The good news is that all instructions other than the ones below are working. Validation still needs heavy testing with messy values such as @Align. I will then convert the library to assembler and add it to the BASIC source.

1) VCVT (between floating point and fixed-point, VFP) – p894/895
This has the immediate value encoded rather unusually as [imm4,i]. DecAOF is giving some odd results such as F64.S16 d0,d0,#-15.

2) VCVT, VCVTR (between floating-point and integer, VFP) – p891
For the first 4 variations where the floating point data-type is second, the R option is encoded as op=0 when specified, and op=1 otherwise. DecAOF appears to be showing it the other way around.

3) VLDx p915+ and VTBL p1111
DecAOF is showing single registers without the {} brackets, and also register ranges such as D0-D3. Useful, but not in the ARM manual (as listed with the instruction), so I’m unsure if they should be supported in the BASIC assembler.

4) VEXT p911
The ARM manual states that for sizes other than 8 the instruction is a pseudo instruction, which is fine, but the explanation of the treatment of the immediate isn’t very clear (to me at least). Is it trying to say that for sizes other than 8, x must be a multiple of 2,4, or 8, or that x needs multiplying by 2,4, or 8, for encoding?

5) Use of functions / expressions
When there is more than one register type available, a method is needed for specifying the data-type when using a Function or Expression to return a register number. This is because BASIC is currently lacking double, quad, and long data-types, and in any event functions are not typed. As much as I would like to add all those data-types and typed functions, that’s for another day!

A couple of possibilities:
a) VMOV.Q.Q FNreg1,FNreg2
b) VMOV Q.FNreg1, Q.FNreg2

I think I probably prefer option (b) as it’s close to the ARM syntax and readable. After the dot is any BASIC expression.

Jun 10, 2011 3:54pm Jeffrey Lee (213) 6048 posts
> I’m referencing the ARM DDI0406B document for instruction encoding.

What PDF reader are you using? In my experience the errata markup (i.e. all the corrections they’ve made to the initial version) only shows up when using Adobe’s reader. And without being able to see the markup properly there’s a good chance you’ll get the encodings wrong.

> Is it trying to say that for sizes other than 8, x must be a multiple of 2,4, or 8, or that x needs multiplying by 2,4, or 8, for encoding?

It’s saying that it must be multiplied by 2, 4, or 8 for encoding. So if the assembler sees “VEXT.16 D0,D1,#4” then it acts as if it saw “VEXT.8 D0,D1,#8”.

> 5) Use of functions / expressions

Option B looks good to me.

Regarding DecAOF disassembly being wrong – if you want I can send you a copy of my WIP new disassembler for the Debugger module. Admittedly I haven’t tested the VFP/NEON disassembly much, but it’ll at least give you a second opinion on things, and it should support all ARM instructions (no Thumb support yet).
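
The VEXT scaling described in this post can be sketched in a few lines of Python. This is only an illustration, not code from the BASIC assembler: the function name and the `quad` parameter are made up for the example, and the range check (byte index below 8 for D registers, below 16 for Q registers) is my reading of the VEXT.8 encoding, so treat it as an assumption.

```python
def vext_byte_immediate(size_bits: int, imm: int, quad: bool = False) -> int:
    """Convert a VEXT.<size> element index into the byte index used by VEXT.8.

    Hypothetical helper: 'quad' selects Q-register operands (16 bytes)
    rather than D-register operands (8 bytes).
    """
    if size_bits not in (8, 16, 32, 64):
        raise ValueError("size must be 8, 16, 32 or 64")
    # Elements that are N bytes wide need the index multiplied by N.
    byte_imm = imm * (size_bits // 8)
    # Assumed limit: the byte index must fall inside one source vector.
    limit = 16 if quad else 8
    if not 0 <= byte_imm < limit:
        raise ValueError("immediate out of range for this register size")
    return byte_imm


if __name__ == "__main__":
    # VEXT.16 ...,#4 is treated as VEXT.8 ...,#8 (Q registers assumed here).
    print(vext_byte_immediate(16, 4, quad=True))
```

With this range check the #4 to #8 example only fits quadword operands; with D registers the sketch rejects it.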

Jun 10, 2011 4:19pm Alan Peters (515) 51 posts
> What PDF reader are you using?

Sorry, I should have provided a bit more info – Reader X on Windows 7 x64, so yes, I can see the alterations in the document.

> It’s saying that it must be multiplied by 2, 4, or 8 for encoding

Thanks! I think I’ll probably test the behaviour of the instructions as the next step. My encoding matched the ARM manual when I verified it manually afterwards.

An update to DDT would be very useful in due course. My test assembler script with every encoding option present might be of use to test DDT if you haven’t written the same thing yourself :-)

Jun 10, 2011 4:58pm Jeffrey Lee (213) 6048 posts
> My test assembler script with every encoding option present might be of use to test DDT if you haven’t written the same thing yourself :-)

Well I’m mainly interested in disassembly, so I’ve got a testbed set up which will create a binary containing a sequential set of opcodes. It then runs it through the different disassemblers, and diffs the output to look for errors/inconsistencies. At some point I’m hoping to let it run through the full 2^32 set of combinations, but before I can do that I need to finish tweaking the formatting so that the diff won’t throw up millions of false positives.

Jun 10, 2011 7:15pm Martin Bazley (331) 379 posts
> At some point I’m hoping to let it run through the full 2^32 set of combinations, but before I can do that I need to finish tweaking the formatting so that the diff won’t throw up millions of false positives.

Wouldn’t such an exercise, by definition, produce a file much larger than 2GB?

Jun 11, 2011 9:18pm Steve Revill (20) 1361 posts
Note: Ben is working on VFP and NEON support in objasm (and has been for some time), which also means that in doing so he’s spotted the odd minor bug in the related disassembly that he’d already added to DecAOF. Hopefully, we’ll be able to do a new Tools release in the next few months with all this stuff in it.

Jun 11, 2011 9:21pm Steve Revill (20) 1361 posts
> Wouldn’t such an exercise, by definition, produce a file much larger than 2GB?

Depends upon whether the file represents a subset of the full range of possibilities, then a diff is performed, output recorded and the process continues with the next chunk… repeat for a day or so(!) and the end result should be a nice, small file of only the diffs – which may or may not represent actual errors in one or other of the disassemblers.

Jeffrey, I’d advise you to contact Ben – maybe we can slip you a pre-release of the latest work-in-progress DecAOF.

Jun 12, 2011 10:55am Ben Avison (25) 445 posts
> 1) VCVT (between floating point and fixed-point, VFP) – p894/895

I think DecAOF is correct here. The constants are encoded as values subtracted from 16, 32 or 64. Some bit patterns corresponding to invalid values are undefined instructions, and some encode different instructions, but others are merely unpredictable, in which case I think it’s reasonable to disassemble as a VCVT with a negative value for frac_bits.

> 2) VCVT, VCVTR (between floating-point and integer, VFP) – p891

That one’s a valid fault. In pre-UAL, the flag character was a ‘Z’ not an ‘R’ and set bit 7, not cleared it – obviously I missed the subtle distinction there. That’ll be fixed in the next version of DecAOF, thanks.

> 3) VLDx p915+ and VTBL p1111

ARM ARM section A7.2.5: the brackets are optional in lists consisting of a single extension register.

> 4) VEXT p911

Think about what the instruction is actually doing – it’s just shuffling data. imm is effectively the scalar index of the first element to extract, so if your elements are twice as large, imm should be halved to have the same effect.

> 5) Use of functions / expressions

I suggested something similar to (a) to ARM when I first realised this was a problem for BASIC back in 2009 – but having the suffix behave like the data type suffixes, so only one would be needed in many cases, and none at all in others. But they clearly haven’t considered it important enough to formalise it, so I guess we can do whatever we feel like. (b) isn’t a bad solution either; you could even extend it to add data type qualifiers to arguments rather than to opcodes – a bit like “extended notation”, which is supported by armasm and the in-progress version of objasm.
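
To make point 1 concrete, here is a rough Python sketch of the "subtracted from the size" encoding for the VFP fixed-point VCVT. It assumes the immediate is the 5-bit value imm4:i with i as the least significant bit and that the size is 16 or 32 for the VFP form; the function names are invented for the illustration and this is not DecAOF's or the BASIC assembler's code.

```python
from typing import Tuple


def encode_frac_bits(size: int, frac_bits: int) -> Tuple[int, int]:
    """Return (imm4, i) for a given fixed-point size and fraction-bit count."""
    if size not in (16, 32):
        raise ValueError("VFP fixed-point size is assumed to be 16 or 32")
    value = size - frac_bits          # the field holds size minus frac_bits
    if frac_bits < 0 or not 0 <= value <= 31:
        raise ValueError("frac_bits cannot be encoded in five bits")
    return value >> 1, value & 1      # imm4 is the top four bits, i the bottom bit


def decode_frac_bits(size: int, imm4: int, i: int) -> int:
    """Recover frac_bits as a disassembler would, even for odd bit patterns."""
    return size - ((imm4 << 1) | i)


if __name__ == "__main__":
    print(encode_frac_bits(16, 16))          # (0, 0)
    print(decode_frac_bits(16, 0b1111, 1))   # -15
```

Decoding the all-ones field with a 16-bit size gives 16 - 31 = -15, which is the kind of negative frac_bits value that shows up in the F64.S16 d0,d0,#-15 disassembly.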

Jun 12, 2011 4:52pm Alan Peters (515) 51 posts
> 1) I think decaof is correct here

Yes, there was a bug in my version that is now fixed.

> 4) imm is effectively the scalar index of the first element to extract

Thought as much, thanks for clearing that one up.

> 5) a bit like “extended notation”

I’ll certainly take a look at that.

Every instruction encoding from my tables is now agreeing with the output from DecAOF, which is encouraging. I’m currently testing validation as there are many subtle variations in the syntax. Hopefully I’ll have time to get stuck into the integration with the BASIC source this week.

Many thanks for all of the assistance.

Jun 13, 2011 12:58pm Jeffrey Lee (213) 6048 posts
> Wouldn’t such an exercise, by definition, produce a file much larger than 2GB?

> Depends upon whether the file represents a subset of the full range of possibilities, then a diff is performed, output recorded and the process continues with the next chunk… repeat for a day or so(!) and the end result should be a nice, small file of only the diffs – which may or may not represent actual errors in one or other of the disassemblers.

Yeah, I’d be processing the data in chunks rather than all in one go. Plus it should allow me to complete the process a bit quicker – with a mix of real machines and RPCEmu instances I could have about 13 machines all sat churning through the data in parallel. Plus there’s the obvious optimisation of not bothering to check all the condition codes, so in total there’d only be 2^29 different instructions (2^28 conditional instructions and 2^28 unconditional).

> Jeffrey, I’d advise you to contact Ben – maybe we can slip you a pre-release of the latest work-in-progress DecAOF.

Yeah, I’ll get in touch when I find the time to work on the disassembler again.
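
The chunked comparison and the 2^29 figure can be sketched roughly as follows. This is not Jeffrey's testbed: the two disassemblers are stand-in callables, and using condition field 0xE for the "conditional" half and 0xF for the unconditional half is an assumption made for the illustration.

```python
from typing import Callable, Iterator, List, Tuple

COND_AL = 0xE            # one representative condition code (assumed choice)
COND_UNCOND = 0xF        # the unconditional instruction space
LOW_BITS = 1 << 28       # opcodes that share one condition field

# Two condition fields, 2^28 opcodes each, gives 2^29 words to check.
TOTAL_OPCODES = 2 * LOW_BITS


def opcode_chunk(cond: int, start: int, count: int) -> Iterator[int]:
    """Yield up to 'count' sequential opcodes with a fixed condition field."""
    for low in range(start, min(start + count, LOW_BITS)):
        yield (cond << 28) | low


def diff_chunk(opcodes: Iterator[int],
               disasm_a: Callable[[int], str],
               disasm_b: Callable[[int], str]) -> List[Tuple[int, str, str]]:
    """Keep only the opcodes on which the two disassemblers disagree."""
    differences = []
    for op in opcodes:
        text_a, text_b = disasm_a(op), disasm_b(op)
        if text_a != text_b:
            differences.append((op, text_a, text_b))
    return differences


if __name__ == "__main__":
    # Toy disassemblers that disagree on exactly one opcode.
    a = lambda op: "%08X" % op
    b = lambda op: "BAD" if op == 0xE0000002 else "%08X" % op
    print(diff_chunk(opcode_chunk(COND_AL, 0, 4), a, b))
```

Because only the differences from each chunk are kept, the stored result stays small even though the input space is large; each machine could be handed its own range of `start` values.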

Jun 29, 2011 2:25pm Alan Peters (515) 51 posts
> b) VMOV Q.FNreg1, Q.FNreg2

A slight amendment to the proposed syntax, as Q. after a space in BASIC expands to QUIT, which I’d forgotten. So for now I’ve implemented Q#FNreg1 as this sits rather nicely within existing syntax – even if it doesn’t look quite as nice.

I’m starting to make some progress now that I’m used to ObjAsm and the BASIC source. The VFP/SIMD assembler source is in a new file so it won’t break any of the existing code. Indeed the only modification so far is to branch out of the existing Assembler when a ‘V’ is detected – branching back if nothing matches up, or to the final assembly part with the encoded word. Various new error messages are the only structural thing I still need to look at, as I haven’t glued into the error handling code properly (and it looks a bit messy).

The pattern tables which describe how to parse and encode every instruction are currently a bit large at 30K, but I’m sure I can shrink them with a bit of further optimising. At least the size of the code is low and it’s all structured and documented – a first for the BASIC source! :-)

The only thing that puzzles me is that I can’t seem to easily debug the compiled BASIC module using DDT. Did I forget something (again!) or is there a better or different way of doing this? Currently I have a wide array of screen based debugging going on, which is OK; it’s just a bit slower than tracing through the code.
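
As a rough picture of how an operand such as Q#FNreg1 might be split into a register type and a number, here is a small Python sketch. It is not the BASIC assembler's parser: the function name, the `evaluate` callback standing in for BASIC's expression evaluator, and the register-count limits (32 S, 32 D, 16 Q registers) are all assumptions for the example.

```python
from typing import Callable, Tuple

# Assumed register counts for a VFPv3/NEON register bank.
REGISTER_LIMITS = {"S": 32, "D": 32, "Q": 16}


def parse_typed_register(operand: str,
                         evaluate: Callable[[str], int]) -> Tuple[str, int]:
    """Split '<type>#<expression>' and evaluate the expression to a register number."""
    prefix, separator, expression = operand.strip().partition("#")
    prefix = prefix.upper()
    if separator != "#" or prefix not in REGISTER_LIMITS or not expression:
        raise ValueError("expected S#, D# or Q# followed by an expression")
    number = evaluate(expression)
    if not 0 <= number < REGISTER_LIMITS[prefix]:
        raise ValueError("register number out of range")
    return prefix, number


if __name__ == "__main__":
    # A dictionary lookup plays the part of calling FNreg1 in BASIC.
    fake_functions = {"FNreg1": 3, "FNreg2": 7}
    print(parse_typed_register("Q#FNreg1", fake_functions.__getitem__))
```

The point of the prefix is that the expression alone only yields a number, so the register type has to come from the text in front of the #.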

Jul 3, 2011 6:24pm Alan Peters (515) 51 posts
The VDUP instruction is a bit of a headache as firstly VDU is a token, and “VDUP.” will expand to VDUPRINT due to both keywords representing tokens. For now I’m using VDPL instead for the BASIC assembler as this avoids both tokens – not sure if anyone has any better ideas?

The good news is the assembler is up and running. I’m running through a lot of tests and hope to post an Alpha version to the TBA blog in the near future.

Jul 5, 2011 9:20pm Alan Peters (515) 51 posts
We are now at the stage of releasing a first test version. Is it OK to publish the BASIC source tree with amendments, compiled version for Cortex A8, plus a few extra set-up scripts and help files, in a TBAFS archive through the TBA website? This is so we can get some other people to carry out testing before it’s submitted to CVS and becomes part of the build.

I’m just checking there aren’t licensing restrictions or other difficulties that we are not aware of…

Jul 6, 2011 12:02am Alan Peters (515) 51 posts
> Whether you’re developing commercial or non-commercial software, you are free to publish the shared RISC OS sources. Publishing your source code usually means you make code available on a Web site, either for its own sake or included in an archive along with your own software. The Web site does not have to be yours – for example, you may choose to use one of the many public source repository sites.

Perhaps I’ve answered my own question after much reading of this website :-)

Jul 6, 2011 11:21pm Alan Peters (515) 51 posts
A question on VFPSupport that’s probably for Jeffrey as the author. Hopefully it’s a sensible question and I’m not missing something obvious.

I can understand how to use VFPSupport (I think) within an application. It seems to work well and handles the case where a task is no longer active after exiting or crashing. However, for TAG we are working within a module and a SWI interface. Am I right in thinking that the approach is to:

1) Create a TAG VFP Context, not in app space, on Module Init
2) The TAG SWI handler switches to the TAG VFP Context on entry and restores the previous context on exit
3) The same approach as for the SWI handler would be applied to any other entry point that needs to use VFP. This possibly includes IRQ handlers – I assume VFPSupport is safe to use for this?

Hope this makes some sense!

Jul 7, 2011 1:16am Jeffrey Lee (213) 6048 posts
Yes, that sounds about right.

> This includes possibly IRQ handlers – I assume VFP Support is safe to use for this?

Yes, using it from IRQ handlers is fine. However, you’d probably want to use a separate context from your main one, otherwise if your IRQ handler is entered while one of your SWIs is running you’d end up corrupting the SWI’s registers.

From a performance standpoint the best thing to do would be to use VFPSupport_FastAPI to allow you to call the context switch functions directly.

If you aren’t interested in saving the contents of the registers between calls to your code then you can also create/destroy contexts on SWI/IRQ entry/exit instead of using one or two shared global contexts. However, I wouldn’t advise storing the contexts on the SVC/IRQ stack at the moment, since they won’t be cleaned up properly when aborts/errors occur.
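
The reason for giving the IRQ handler its own context can be seen in a toy Python model. It does not use or imitate the real VFPSupport SWIs: `ToyVFP`, `Context`, and `change_context` are hypothetical names, and the model only assumes that switching context saves the live registers into the outgoing context and loads them from the incoming one.

```python
from typing import List, Optional


class Context:
    """Hypothetical saved register block for one VFP user."""
    def __init__(self) -> None:
        self.saved: List[float] = [0.0] * 4


class ToyVFP:
    """Toy register bank with a single active context at a time."""
    def __init__(self) -> None:
        self.regs: List[float] = [0.0] * 4
        self.active: Optional[Context] = None

    def change_context(self, ctx: Optional[Context]) -> Optional[Context]:
        """Activate ctx and return the previously active context."""
        previous = self.active
        if previous is ctx:
            return previous            # already active: nothing saved or loaded
        if previous is not None:
            previous.saved = list(self.regs)
        if ctx is not None:
            self.regs = list(ctx.saved)
        self.active = ctx
        return previous


def run_swi_with_irq(shared: bool) -> float:
    """Return the SWI's register 0 after an IRQ interrupts it mid-calculation."""
    vfp = ToyVFP()
    swi_ctx = Context()
    irq_ctx = swi_ctx if shared else Context()

    outer = vfp.change_context(swi_ctx)    # SWI entry
    vfp.regs[0] = 42.0                     # SWI is part-way through its work

    previous = vfp.change_context(irq_ctx) # IRQ entry
    vfp.regs[0] = -1.0                     # IRQ handler uses the same register
    vfp.change_context(previous)           # IRQ exit

    result = vfp.regs[0]
    vfp.change_context(outer)              # SWI exit
    return result


if __name__ == "__main__":
    print(run_swi_with_irq(shared=True), run_swi_with_irq(shared=False))
```

In the shared case the switch on IRQ entry is a no-op, so the handler overwrites the SWI's live value; with a separate context the SWI's registers are saved on IRQ entry and reloaded on IRQ exit.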
