VFP/NEON assembler improvements

10 posts, 4 voices

Mar 1, 2012 2:34am Jeffrey Lee (213) 6048 posts	Today I’ve been working on fixing a couple of bugs and implementing a couple of new features in the VFP/NEON assembler. However, a couple of bits could do with some feedback:

- Kuemmel spotted that DCFD wasn’t working properly. This is because it stores the words in big-endian order (as used by FPA) instead of little-endian order (as used by VFP). What are people’s thoughts on the best way of solving this? Solutions I can think of would be to add a new OPT flag to indicate the endianness, or we could just add a new directive (e.g. DCVD). For reference, objasm decides on the endianness of DCFD by looking at the FPU type that was set using the --apcs parameter.

- With the present version of the assembler, there’s no way to directly specify 64bit int or 32/64bit float immediate values (e.g. as used by VMOV). However, you can specify the 8bit encoded value directly via the “#I64.<n>”, “#F32.<n>”, etc. syntax. If one of these special constants is used then it effectively ignores the data type that was specified in the instruction suffix. This is a bit nasty and unintuitive, so in my local copy I’ve removed that functionality and replaced it with the following:

  For .I64 data types, the constant can be:

  - A numeric expression, which will be converted to a 32bit integer, and then zero-extended to 64 bits. However there’s only a handful of values available if you specify the constant this way, so I’m open to other ideas (e.g. sign-extend, or duplicate the 32bit value into both words).
  - Or an expression that evaluates to a string containing a 64bit unsigned integer. This form allows the full range of 64bit constants to be used. Currently the string is parsed using the 64bit version of OS_ReadUnsigned, so the number can be in any base from 2-36.

  For .F32 and .F64 data types, the constant must be a numeric expression. This expression will be converted to a single precision float, have the lower bits of the mantissa cleared, and then sent through the main constant encoding logic to try and find a suitable way of representing it within the instruction. Note that the range of constants available for .F32 and .F64 are identical, so there’s no harm in the assembler only dealing with the constant in single precision mode.

  The rounding code means that unless you specify a number that has an out-of-range exponent, you’ll get a constant that at least roughly represents it. However the rounding is quite excessive (e.g. PI gets rounded to 3.125), and I can see it causing lots of hidden bugs in people’s code. So I think I’ll get rid of it and instead require that the constant is one which can be expressed exactly by the instruction. Any votes for/against? If I remove the rounding code I could probably make it so that it spits out a friendly error message showing what the rounded value would be, so the programmer can choose to use the rounded value if they wish.

Are there any other aspects of the assembler that people would like to see improved? Or any bugs they’ve spotted?
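To illustrate the "must be exactly representable" rule for .F32/.F64 immediates, here is a minimal Python sketch. It is not the assembler's actual code: it assumes the standard ARM VFP 8-bit immediate layout (one sign bit, a 3-bit exponent and a 4-bit mantissa), and the function names are made up for this example. It returns an encoding only when the value is exact, and can report the truncated value that a friendly error message might suggest.

```python
import math
import struct

def vfp_encode_imm8(value):
    """Return the 8-bit VFP immediate for value, or None if it is not exact."""
    if value == 0 or math.isinf(value) or math.isnan(value):
        return None
    mant, exp = math.frexp(abs(value))   # abs(value) == mant * 2**exp, 0.5 <= mant < 1
    scaled = mant * 32                   # 1.ffff significand scaled to the range 16..31
    exp -= 1                             # exponent that goes with a 1.ffff significand
    if scaled != int(scaled) or not -3 <= exp <= 4:
        return None                      # needs more than 4 mantissa bits, or out of range
    sign = 0x80 if value < 0 else 0
    return sign | (((exp - 1) & 7) << 4) | (int(scaled) - 16)

def vfp_expand_imm8(imm8):
    """Expand an 8-bit immediate to the single precision value it denotes."""
    a = (imm8 >> 7) & 1
    b = (imm8 >> 6) & 1
    bits = (a << 31) | ((b ^ 1) << 30) | ((0x1F if b else 0) << 25) | ((imm8 & 0x3F) << 19)
    return struct.unpack("<f", struct.pack("<I", bits))[0]

def truncate_to_imm(value):
    """Clear all but the top 4 mantissa bits (exponent range is not checked here)."""
    mant, exp = math.frexp(abs(value))
    return math.copysign(math.ldexp(math.floor(mant * 32) / 32, exp), value)
```

With this sketch, `vfp_encode_imm8(math.pi)` returns `None`, while `truncate_to_imm(math.pi)` gives 3.125, matching the PI example in the post.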

Mar 1, 2012 5:00pm Kuemmel (439) 384 posts	…regarding the DCFD issue: I’m open to anything if automatic detection isn’t possible or is too much hassle. If there is going to be a new directive, I think it would be best to have two new directives (for single and double precision) that are good for VFP/NEON, so maybe DCVD and DCVS, so that they can be easily remembered.

My other question is how to convert from FPA to VFP/NEON for double precision. At the moment I have some code that uses an ATN function, as this is not implemented in VFP/NEON, and I want to pass the result back to VFP/NEON. How would that be achieved? Is there some supporting instruction for that (I didn’t check…)?

Mar 1, 2012 5:20pm WPB (1391) 352 posts
> DCFD wasn’t working properly. This is because it stores the words in big-endian order (as used by FPA) instead of little-endian order (as used by VFP). What are people’s thoughts on the best way of solving this? Solutions I can think of would be to add a new OPT flag to indicate the endianness, or we could just add a new directive (e.g. DCVD).

Think I’d vote for the OPT flag. Adding a new directive (a pseudo instruction, I presume you mean?) might clash with a real ARM instruction in the future, and is less obvious to people who are already familiar with the instruction set from other assemblers. I guess another option would be to add a suffix to the instruction like DCFD_le or DCFD_be?

> So I think I’ll get rid of it and instead require that the constant is one which can be expressed exactly by the instruction. Any votes for/against?

I think it’d definitely be safer to remove this, but your idea of an error message that gives the closest equivalent is a good one. I’d go for that.

> Are there any other aspects of the assembler that people would like to see improved? Or any bugs they’ve spotted?

Haven’t even managed to find time to try it yet! Wish I could… Anyway, keep up the good work ;)

Mar 1, 2012 5:39pm Jeffrey Lee (213) 6048 posts
> If there will be a new directive I think it would be best to have 2 new directives (for single and double precision) that are good for VFP/Neon, so may be like DCVD and DCVS…so that they can be easily remembered.

The single precision format is the same, so there’s no particular need for a DCVS directive.

> Question is for me also how to convert from FPA to VFP/Neon for double precision? At the moment I have some code that uses a ATN-function as this is not implemented in VFP/Neon and I want to pass the result back to VFP/Neon. How would that be achieved? Is there some supporting instruction for that (I didn’t check…)?

To convert between the two formats all you need to do is swap the upper and lower words. I think there are a couple of ways you could do it, but the easiest is probably to use VMOV to transfer the value to ARM registers, but with the registers in opposite order, e.g. “VMOV R1,R0,D0”. Then use the appropriate FPA instruction (I don’t have a reference handy!) to load from R0,R1.

> Adding a new directive (a pseudo instruction, I presume you mean?)

No, a directive. DCD, DCFS, DCFD, EQUD, EQUS, etc. aren’t assembled to instructions; all they do is allow you to insert data values.

Mar 1, 2012 6:03pm WPB (1391) 352 posts
> No, a directive. DCD, DCFS, DCFD, EQUD, EQUS, etc. aren’t assembled to instructions, all they do is allow you to insert data values.

Of course. Sorry, wasn’t thinking. Nevertheless, you wouldn’t want one to clash with an instruction, I guess.

Mar 1, 2012 10:55pm Kuemmel (439) 384 posts	…hm, I think the necessary FPA instruction doesn’t exist. According to Link you can only do it with one register, so either FLTS F0,R0 or FLTD F0,R0, and back with FIX R0,F0. According to the examples on Link there is a precision difference between using FLTS and FLTD and going back with FIX, but I wonder how that’s so. I didn’t read everything, but it seems there’s nothing like a FIX R0,R1,F0 or something.

So I think I’ve got to use STFD to store it in memory, then load it back with VLDR D0,value and swap S0 and S1, or use VLDR S0,value+4 / VLDR S1,value to get it into D0.

Mar 1, 2012 11:58pm Jeffrey Lee (213) 6048 posts	You’re right – my approach wouldn’t work. FLTS and FLTD both convert from an int to a float, and FIX converts from float to int. So the only way of getting a float into/out of FPA registers without integer conversion is with the load/store instructions.
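
Since the conversion has to go via memory, the word swap can be modelled on the 8 stored bytes. The following is only a Python sketch of the two memory layouts, not ARM code: it assumes little-endian words, with FPA storing the most significant word of a double first (as described earlier in the thread), and the helper names are invented for this example.

```python
import struct

def vfp_double_bytes(x):
    """Memory image of a double as VFP expects it (plain little-endian)."""
    return struct.pack("<d", x)

def swap_words(eight_bytes):
    """Exchange the two 32-bit words of an 8-byte double image."""
    if len(eight_bytes) != 8:
        raise ValueError("expected exactly 8 bytes")
    return eight_bytes[4:] + eight_bytes[:4]

def fpa_double_bytes(x):
    """Memory image of a double in FPA word order (most significant word first)."""
    return swap_words(vfp_double_bytes(x))

def fpa_bytes_to_float(data):
    """Read back a double that was stored in FPA word order."""
    return struct.unpack("<d", swap_words(data))[0]
```

For 1.0 (bit pattern &3FF00000 00000000) the VFP image is 00 00 00 00 00 00 F0 3F and the FPA image is 00 00 F0 3F 00 00 00 00, which is the same swap that loading S0 from value+4 and S1 from value performs.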

Mar 5, 2012 1:57pm Jeffrey Lee (213) 6048 posts	Any more thoughts on DCFD? I’m hesitant to dedicate an OPT flag to it, since it’s only one directive/instruction out of the several hundred that the assembler supports. Plus there’s the danger of ROL’s BASIC or someone else’s extended assembler reusing the same OPT flag for an entirely different purpose. But maybe this isn’t such a serious issue if we’re going to move the assembler out into a separate module. E.g. instead of OPT just taking a number, the new version might be of the form “OPT <x>,<y>”, where <x> is the standard BASIC flags for error reporting, assembler listing, etc., and <y> is a set of flags (or maybe a string?) for controlling the extra features of the assembler backend.

For the moment I’ve implemented a solution similar to WPB’s suggestion of DCFD_le and DCFD_be: you can use DCFD.fpa for the FPA version and DCFD.vfp for the VFP version. This doesn’t rely on the user having to know which one is big-endian and which is little-endian, and it matches the ‘.suffix’ syntax style that ARM seem to have become fond of. A plain old DCFD will use the FPA format, but we can easily change that to be configurable by OPT once we’ve worked out how to move the assembler out into a separate module. Any complaints if we go with this method for the moment?

Other improvements I’ve made:

- The friendly error message when trying to use an invalid .F32/.F64 immediate constant.
- Improved a few other VFP/NEON errors (before it was just using generic BASIC error messages).
- Fixed BASICTrans so it won’t corrupt the output buffer if you ask it to look up an unknown error message (!)
- Added the VFPv4/Advanced SIMD v2 instructions (all two of them).
- Added ‘DCFH’ for half-precision floats. Since BASIC doesn’t allow floats to be NaNs or infinity, I’ve made the code assume that the alternative half-precision format is in use. This allows DCFH to store larger numbers, but those large numbers will only be converted to floats/doubles properly if the assembler code sets the AHP bit in the FPSCR (if you don’t have the AHP bit set, they’ll be converted to a NaN or infinity). Maybe this should also be something that can be controlled via OPT, and/or via .ahp and .ieee suffixes.

If everyone’s (relatively) happy with the above then I should be able to get everything checked in tonight.
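
To make the DCFH trade-off concrete, here is a small Python sketch of how the same 16-bit pattern decodes with and without the AHP bit. It is an illustration only, not the assembler's code or a model of the hardware beyond the basic format: it assumes the usual half-precision layout (1 sign, 5 exponent, 10 fraction bits), where the alternative format treats the top exponent value as an ordinary exponent instead of infinity/NaN. The function name is hypothetical.

```python
import math

def decode_half(bits, ahp=False):
    """Decode a 16-bit half-precision pattern; ahp=True selects the alternative format."""
    sign = -1.0 if bits & 0x8000 else 1.0
    exponent = (bits >> 10) & 0x1F
    fraction = bits & 0x3FF
    if exponent == 0:
        # Zero or denormal: no implicit leading 1
        return sign * fraction * 2.0 ** -24
    if exponent == 31 and not ahp:
        # IEEE format reserves the top exponent for infinity and NaN
        return sign * math.inf if fraction == 0 else math.nan
    # Normal number (in the alternative format this includes exponent 31)
    return sign * (1 + fraction / 1024) * 2.0 ** (exponent - 15)
```

Under these assumptions the largest IEEE value is `decode_half(0x7BFF)` = 65504, while the alternative format reaches `decode_half(0x7FFF, ahp=True)` = 131008. The same pattern &7FFF read without AHP is a NaN, which is the failure mode described in the post.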

Mar 5, 2012 3:32pm Steve Drain (222) 1620 posts
> E.g. instead of OPT just taking a number, the new version might be of the form “OPT <x>,<y>”, where <x> is the standard BASIC flags for error reporting, assembler listing, etc., and <y> is a set of flags (or maybe a string?) for controlling the extra features of the assembler backend.

This is the method used by the Extended BASIC assembler, using the directive: EXT flags%,options%.

Mar 7, 2012 11:53pm Jeffrey Lee (213) 6048 posts	The assembler changes are now checked in. For reference, the two assembler bugs that I fixed were:

- Shift right immediate instructions (VSHR, VSRA, VSRI, VQSHRN, etc.) and VCVT (floating point and fixed point, SIMD) had the shift amounts incorrectly assembled.
- VLDM/VSTM style register lists wouldn’t work if you used commas to list the registers and the lowest register in the list wasn’t D0/S0.
