How to move Zero into a VFP register?
Graeme (8815) 106 posts |
I am trying to make a compiler. Something that will compile code into FPE or VFP compatible code, where the VFP can have 16 or 32 registers. Something that can make applications, utilities or modules without the programmer having to do much differently for each. It took me about eight hours to write the routines that turn VFP assembler into actual code, but some of them are far from optimised yet. Every constant value is loaded from memory, including zero. Does VSUB.F64 D0,D0,D0 always result in zero in D0? What about rounding? What if D0 contained infinity or NaN, would the result be zero? Other solutions I can see are VEOR or VMOV.I, which would work but are NEON, which not all CPUs have. VMOV with an immediate value only works with VFPv3 or later. |
Rick Murray (539) 13840 posts |
A quick look online suggests: veor.i64 d16,d16,d16 Note the “i64”, using an integer takes care of rounding issues and EORing a value with itself will equal zero. Maybe a similar approach with VSUB might work? You could also VLDR a zero value, but it’s a memory access so is slower. Alternatively, set an ARM register to zero, then transfer that to the VFP?
Even if you had it, it wouldn’t work. It seems that VMOV immediate can only set a limited range of numbers, and zero isn’t one of them! |
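To illustrate why zero is missing: as far as I understand the ARM Architecture Reference Manual, the floating-point VMOV immediate can only encode values of the form ±n × 2^-r with 16 ≤ n ≤ 31 and 0 ≤ r ≤ 7. The Python sketch below simply enumerates that set on a host machine; the function name is made up for illustration and nothing here touches real VFP hardware.

```python
def vfp_immediates():
    """Enumerate the values assumed encodable by VMOV.F64 Dd, #imm.

    Assumption: the encodable set is +/- n * 2**-r with
    16 <= n <= 31 and 0 <= r <= 7, as described in the ARM ARM.
    """
    values = set()
    for n in range(16, 32):
        for r in range(8):
            v = n * 2.0 ** -r
            values.add(v)
            values.add(-v)
    return values


IMMEDIATES = vfp_immediates()

if __name__ == "__main__":
    print("number of values:", len(IMMEDIATES))
    print("zero encodable? ", 0.0 in IMMEDIATES)
    print("0.5 encodable?  ", 0.5 in IMMEDIATES)
    print("smallest magnitude:", min(abs(v) for v in IMMEDIATES))
    print("largest magnitude: ", max(IMMEDIATES))
```

Because n never drops below 16, the smallest magnitude in the set is 0.125, so zero cannot be expressed this way.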
Jeffrey Lee (213) 6048 posts |
NaNs will definitely cause problems, not sure offhand about infinity. If you want something that works on VFPv1/2, I think the only options are VLDR, or using VMOV to move the value over from ARM registers. Out of the two, I suspect that VLDR will generally have the best performance. |
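A quick host-side check of the x - x question, assuming VSUB.F64 follows ordinary IEEE 754 double arithmetic in round-to-nearest mode. This is only a model in Python, not a test on VFP hardware, and sub_self is a made-up name.

```python
import math


def sub_self(x):
    """Model VSUB.F64 Dd, Dn, Dn where the register holds x.

    Assumption: plain IEEE 754 double subtraction, round-to-nearest.
    """
    return x - x


if __name__ == "__main__":
    for x in (1.5, -2.0, math.inf, -math.inf, math.nan):
        print(x, "->", sub_self(x))
```

Finite inputs give zero, but both infinity and NaN give NaN, so the trick is only safe when the register is known to hold a finite value. Under IEEE 754 the result for finite inputs would be -0 rather than +0 if the rounding mode were round towards minus infinity, which this sketch does not model.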
Graeme (8815) 106 posts |
I am currently using VLDR, so I may stick with that at the moment. VEOR is an ASIMD/NEON instruction so that will not work on some Raspberry Pis. Some older ones have VFPv2 and no NEON. I just thought there must have been a trick I was missing. My output code can also contain VRINTM at the moment. It will not work on Pi 1 models or the Pi Zero, just like VEOR. The VRINTM is apparently an int() instruction but I haven’t tested that it works like I expect yet. I am left wondering if I should target a higher version of VFP and get older versions to fall back to FPE. It may be easier to get it working first and support for older CPUs could come later. |
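On what to expect from VRINTM: the M stands for rounding towards minus infinity, so it should behave like a floor rather than a truncation, and the two differ for negative numbers. A small Python model of the difference follows, with made-up function names; it only handles ordinary finite values and ignores NaN, infinity and the sign of zero.

```python
import math


def vrintm_model(x):
    """Model of VRINTM.F64: round a finite double towards minus infinity.

    The result stays a floating-point value, as it does in the register.
    """
    return float(math.floor(x))


def truncate_model(x):
    """Round towards zero instead, for comparison."""
    return float(math.trunc(x))


if __name__ == "__main__":
    for x in (2.7, -1.5, -0.25):
        print(x, "floor:", vrintm_model(x), "truncate:", truncate_model(x))
```

If the int() in the source language is meant to truncate towards zero, VRINTM alone would give different answers for negative inputs.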
Stuart Swales (8827) 1357 posts |
Lack of VMOV.F64 D0, #0.0 seems a bit of an omission. Use something like VMOV.F64 D0, #0.5 then VSUB.F64 D0,D0,D0 ? Or just stick with VLDR. |
Rick Murray (539) 13840 posts |
Given the limited space in which to store such things, there’s probably a reason regarding other numbers that are more important. The surprise is that it’s taken until VFPv3 to have an immediate move, and that there isn’t a read only register that is zero. Or maybe there is now?
I’d agree with that. Concentrate on performance for the newer/faster cores and worry about the older ones later on. A quick Google search suggests the OMAP3 has VFPv3, the Iyonix doesn’t appear to have FP. So it’s, what, the Pi1 and Pi0? |
Graeme (8815) 106 posts |
As much as I was wanting to use only early ARM, FPE and VFP instructions it looks like this cannot be done so easily. Does anyone have any links to good VFP/ASIMD/NEON resources? These are very hard to find. I have encoding documents but little on which instructions are supported by which versions. |
Jeffrey Lee (213) 6048 posts |
The canonical references are the ARM Architecture Reference Manuals. The ARMv7 one will be most useful, as it covers ARMv4-ARMv7, ASIMDv2 & VFPv4: https://developer.arm.com/documentation/ddi0406/cd/?lang=en The ARMv8/v9 one will be needed for any new instructions that were introduced in ARMv8, e.g. VRINTA & VRINTM: https://developer.arm.com/documentation/ddi0487/ha/?lang=en. Note that it only documents ARMv8 and above, so it won’t list instruction availability for older architectures. |
Steve Drain (222) 1620 posts |
You might find my StrongHelp VFP manual useful. This is probably due for an update, but I am past hunting out changes. If anyone points me to what is needed I will incorporate it.
It is not directly relevant, but my Float module deals with both FPE and VFP values. There is a slightly later version with an allocated name “SmartFP”, but the module has really been overtaken by other developments. |
Steve Drain (222) 1620 posts |
I have made just a small update to the VFP manual to include some changes to the VFPSupport documentation, including VFPSupport_ElementaryFunctions. |
Graeme (8815) 106 posts |
I already have the float module you wrote and the comments in there were very telling! VFP and the FPE store doubles in exactly the same format but with the words swapped around. A simple two-instruction VFP sequence solves the problem. It was there that I first saw that the VCMP/VCMPE instructions do not set the flags for further instructions, only those in a separate register, but you can copy those flags over to the ARM processor’s flags to use in the old-fashioned way. My VFP StrongHelp file is now updated, thank you. The version I had did not have VRINTM instructions listed but they are all there now. |
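The word swap between the two formats can be shown on a host machine. This Python sketch assumes a little-endian VFP double stored low word first and an FPE double stored high word first (the word holding sign and exponent); the function names are invented for illustration.

```python
import struct


def vfp_words(x):
    """Return the two 32-bit words of x in little-endian VFP memory order.

    The first word is the low half of the 64-bit pattern, the second the
    high half (sign, exponent and top of the mantissa).
    """
    return struct.unpack("<II", struct.pack("<d", x))


def fpe_words(x):
    """Return the same two words in the assumed FPE order (high word first)."""
    low, high = vfp_words(x)
    return high, low


if __name__ == "__main__":
    for x in (1.0, -2.5):
        print(x, [hex(w) for w in vfp_words(x)], [hex(w) for w in fpe_words(x)])
```

Converting in either direction is the same operation, so swapping twice returns the original pair of words.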
Martin Avison (27) 1494 posts |
@Steve
That link points to v0.42, but your website link for 0.42 still points to v0.41. |
Steve Drain (222) 1620 posts |
Oops! Corrected. Old age. ;-) |
Graeme (8815) 106 posts |
My assembler can now “compile” non-standard commands like the following:
ADDEQ D0,D1,D4 ; Add double floats, VFP or FPE depending on settings
It is a nice start. This is the basis for a higher-level language, which will be converted to this form first to make things easier. |
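A rough idea of how one generic add could be turned into either instruction set is sketched below in Python. It is purely hypothetical and not taken from the assembler described above: the function name, the target strings and the mapping of Dn straight onto FPE register Fn are all assumptions, and it produces assembler text rather than encoded instructions.

```python
def emit_add(cond, rd, rn, rm, target):
    """Return assembler text for a double-precision add (hypothetical helper).

    cond   : condition code text such as "EQ", or "" for always
    rd, rn, rm : register numbers of destination and the two operands
    target : "VFP" or "FPE" (assumed setting names)
    """
    if target == "VFP":
        return f"VADD{cond}.F64 D{rd},D{rn},D{rm}"
    if target == "FPE":
        # The FPE only has registers F0 to F7, so higher numbers cannot map.
        if max(rd, rn, rm) > 7:
            raise ValueError("FPE has only registers F0-F7")
        return f"ADF{cond}D F{rd},F{rn},F{rm}"
    raise ValueError(f"unknown target: {target}")


if __name__ == "__main__":
    print(emit_add("EQ", 0, 1, 4, "VFP"))
    print(emit_add("EQ", 0, 1, 4, "FPE"))
```

The register limit check hints at one of the awkward parts of supporting both targets: code written for 16 or 32 VFP registers may need spilling to memory before it fits the FPE.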