RISC OS Open: Forum: How to move Zero into a VFP register?

Apr 14, 2022 8:20pm

Graeme (8815) 106 posts

I am trying to make a compiler. Something that will compile code into FPE or VFP compatible code, where the VFP can be 16 or 32 registers. Something that can make applications, utilities or modules without the programmer having to do much different for each. It took me about eight hours to write the routines that turn VFP assembler into actual code, but some of them are far from optimised yet.

Every constant value is loaded from memory. Including zero.

Does VSUB.F64 D0,D0,D0 always result in zero in D0? What about rounding? What if D0 contained infinity or NAN, would the result be zero?

Other solutions I can see are VEOR or VMOV.I, which would work but are NEON instructions, and not all CPUs have NEON. VMOV with an immediate value only works with VFPv3 or later.

Apr 14, 2022 8:33pm

Rick Murray (539) 13840 posts

> What about rounding?

A quick look online suggests:

veor.i64 d16,d16,d16

Note the “i64”: using an integer operation avoids any rounding issues, and EORing a value with itself always gives zero. Maybe a similar approach with VSUB might work?

You could also VLDR a zero value, but it’s a memory access so is slower.

Alternatively, set an ARM register to zero, then transfer that to the VFP?

> VMOV with an immediate value only works with VFPv3 or later.

Even if you had it, it wouldn’t work. It seems that VMOV immediate can only set a limited range of numbers, and zero isn’t one of them!
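To illustrate why zero is out of reach, here is a minimal Python sketch. It assumes the VFPv3 immediate form as I understand it from the ARM ARM: an 8-bit field giving ±n × 2^-r, with n from 16 to 31 and r from 0 to 7. The function names are made up for this example.

```python
# Sketch: enumerate every value a VFPv3 floating-point VMOV immediate can hold.
# Assumption: the 8-bit form encodes +/- n / 2**r with n = 16..31 and r = 0..7.

def vmov_immediates():
    """Return the set of all values of the form +/- n / 2**r."""
    values = set()
    for sign in (1.0, -1.0):
        for n in range(16, 32):
            for r in range(8):
                values.add(sign * n / (2 ** r))
    return values


def is_vmov_immediate(x):
    """True if x is one of the encodable immediates."""
    return x in vmov_immediates()


if __name__ == "__main__":
    for candidate in (0.0, 0.125, 0.5, 1.0, 31.0, 32.0):
        print(candidate, is_vmov_immediate(candidate))
```

Under that assumption the smallest magnitude is 0.125 and the largest is 31.0, so 0.0 can never be produced, while 0.5 and 1.0 can.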

Apr 14, 2022 9:07pm

Jeffrey Lee (213) 6048 posts

> Does VSUB.F64 D0,D0,D0 always result in zero in D0? What about rounding? What if D0 contained infinity or NAN, would the result be zero?

NaNs will definitely cause problems, not sure offhand about infinity.

If you want something that works on VFPv1/2, I think the only options are VLDR, or using VMOV to move the value over from ARM registers. Out of the two, I suspect that VLDR will generally have the best performance.
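For what it is worth, the IEEE 754 behaviour of subtracting a value from itself can be checked in Python. This is only a model: it assumes Python floats are IEEE doubles in the default round-to-nearest mode, and it says nothing about how a particular VFP is configured. `sub_self` is a name invented for the sketch.

```python
import math


def sub_self(x):
    """Model VSUB.F64 Dd,Dd,Dd: subtract a double from itself."""
    return x - x


if __name__ == "__main__":
    for start in (0.5, -3.0, 1e308, math.inf, -math.inf, math.nan):
        print(start, "->", sub_self(start))
```

Finite inputs give +0.0 here, but both infinities and NaN give NaN, so the trick is only safe when the register is known to hold a finite number.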

Apr 15, 2022 11:43am

Graeme (8815) 106 posts

I am currently using VLDR, so I may stick with that at the moment.

VEOR is an ASIMD/NEON instruction so that will not work on some Raspberry Pis. Some older ones have VFPv2 and no NEON.

I just thought there must have been a trick I was missing.

My output code can also contain VRINTM at the moment. It will not work on Pi 1 models or the Pi Zero, just like VEOR. The VRINTM is apparently an int() instruction but I haven’t tested that it works like I expect yet.

I am left wondering if I should target a higher version of VFP and get older versions to fall back to FPE. It may be easier to get it working first and support for older CPUs could come later.
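On the int() point: VRINTM is documented as rounding to an integral value toward minus infinity, which is a floor rather than a truncation. A small Python sketch of the difference, for finite inputs only (`rint_m` and `rint_z` are illustrative names, not real instructions or library calls):

```python
import math


def rint_m(x):
    """Round a finite value toward minus infinity, keeping a float result."""
    return float(math.floor(x))


def rint_z(x):
    """Round a finite value toward zero (truncation), for comparison."""
    return float(math.trunc(x))


if __name__ == "__main__":
    for value in (2.5, -2.5, 7.0, -0.25):
        print(value, rint_m(value), rint_z(value))
```

The two only differ for negative non-integers, so that is the case worth testing on real hardware.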

Apr 15, 2022 11:48am

Stuart Swales (8827) 1357 posts

Lack of VMOV.F64 D0, #0.0 seems a bit of an omission. Use something like VMOV.F64 D0, #0.5 then VSUB.F64 D0,D0,D0 ? Or just stick with VLDR.

Apr 15, 2022 12:12pm

Rick Murray (539) 13840 posts

Given the limited space in which to store such things, there’s probably a reason regarding other numbers that are more important.

The surprise is that it’s taken until VFPv3 to have an immediate move, and that there isn’t a read-only register that is zero. Or maybe there is now?

> I am left wondering if I should target a higher version of VFP and get older versions to fall back to FPE.

I’d agree with that. Concentrate on performance for the newer/faster cores and worry about the older ones later on.

A quick Google search suggests the OMAP3 has VFPv3, the Iyonix doesn’t appear to have FP. So it’s, what, the Pi1 and Pi0?

Apr 16, 2022 10:10pm

Graeme (8815) 106 posts

As much as I was wanting to use only early ARM, FPE and VFP instructions it looks like this cannot be done so easily.

Does anyone have any links to good VFP/ASIMD/NEON resources? These are very hard to find. I have encoding documents but little on which instructions are supported by which versions.

Apr 16, 2022 11:01pm

Jeffrey Lee (213) 6048 posts

The canonical references are the ARM Architecture Reference Manuals.

The ARMv7 one will be most useful, as it covers ARMv4-ARMv7, ASIMDv2 & VFPv4: https://developer.arm.com/documentation/ddi0406/cd/?lang=en

The ARMv8/v9 one will be needed for any new instructions that were introduced in ARMv8, e.g. VRINTA & VRINTM: https://developer.arm.com/documentation/ddi0487/ha/?lang=en. Note that it only documents ARMv8 and above, so it won’t list instruction availability for older architectures.

Apr 17, 2022 10:03am

Steve Drain (222) 1620 posts

> Does anyone have any links to good VFP/ASIMD/NEON resources?

You might find my StrongHelp VFP manual useful.

This is probably due for an update, but I am past hunting out changes. If anyone points me to what is needed I will incorporate it.

> Something that will compile code into FPE or VFP compatible code …

It is not directly relevant, but my Float module deals with both FPE and VFP values.

There is a slightly later version with an allocated name “SmartFP”, but the module has really been overtaken by other developments.

Apr 22, 2022 11:45am

Steve Drain (222) 1620 posts

> This is probably due for an update

I have made just a small one to the VFP manual to include some changes to the VFPSupport documentation, including VFPSupport_ElementaryFunctions.

Apr 23, 2022 8:44pm

Graeme (8815) 106 posts

I already have the float module you wrote and the comments in there were very telling! VFP and the FPE store doubles in exactly the same format but with the words swapped around. A simple two-instruction VFP sequence solves the problem. It was there that I first saw that the VCMP/VCMPE instructions do not set the flags for further instructions, only those in a separate register, but you can copy those flags over to the ARM processor’s flags to use in the old-fashioned way.

My VFP StrongHelp file is now updated, thank you. The version I had did not have VRINTM instructions listed but they are all there now.
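The word swap mentioned above can be sketched in Python on the raw bytes. This assumes a little-endian system, with VFP holding the low word first and FPE the high word first. The helper names are hypothetical.

```python
import struct


def vfp_to_fpe_double(x):
    """Return the 8 bytes of x in FPE order: the two 32-bit words swapped."""
    vfp_bytes = struct.pack("<d", x)       # VFP layout: low word, then high word
    return vfp_bytes[4:] + vfp_bytes[:4]   # FPE layout: high word, then low word


def fpe_to_vfp_double(raw):
    """Swap the words back and decode as an ordinary little-endian double."""
    return struct.unpack("<d", raw[4:] + raw[:4])[0]


if __name__ == "__main__":
    print(vfp_to_fpe_double(1.0).hex())
    print(fpe_to_vfp_double(vfp_to_fpe_double(3.25)))
```

Since the conversion is just a swap, applying it twice gets you back to where you started, which makes it easy to test.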

Apr 23, 2022 10:07pm

Martin Avison (27) 1494 posts

@Steve

> I have made just a small one to the VFP manual…

That link points to v0.42, but your website link for 0.42 still points to V041.

Apr 24, 2022 10:21am

Steve Drain (222) 1620 posts

> your website link for 0.42 still points to V041

Oops! Corrected. Old age. ;-)

Apr 27, 2022 10:30pm

Graeme (8815) 106 posts

My assembler can now “compile” the following non-standard commands:

ADDEQ D0,D1,D4 ; Add double floats, VFP or FPE depending on settings
MOV S0,#3.1415926 ; Move Pi into S0
STMEB R13!,{S0-S4,S9-S12} ; Store single floats (non-float acceptable stack) (not-allowed gap)
STM R0,[R10,#81924] ; out of range value
ADD R0,R0,#12345678 ; value is too large
LDR D0,[R13,#8200] ; out of bounds, double float loading for VFP or FPE
CONV D0,R0 ; Convert R0 into D0
EQUX 48569 ; Make 48569 into the number of bits this CPU has (32/64-bit)
LDR R0,[R11,R10,LSL#X] ; Shift by X bits, where compiling for number of bits this CPU has

It’s a nice start.

This is a basis for a higher-level language. The higher-level language will be converted to this first to make things easier.
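For the “value is too large” case in the ADD example, this is roughly the check an assembler has to make. It is a sketch in Python, assuming the classic 32-bit ARM data-processing immediate (an 8-bit value rotated right by an even amount) and not any Thumb encoding. `is_arm_immediate` is a name chosen for illustration, not taken from the assembler above.

```python
def is_arm_immediate(value):
    """True if value fits an 8-bit constant rotated right by an even amount."""
    value &= 0xFFFFFFFF
    for rot in range(0, 32, 2):
        # Rotating left by rot undoes a rotate-right by rot.
        rotated = ((value << rot) | (value >> (32 - rot))) & 0xFFFFFFFF
        if rotated < 0x100:
            return True
    return False


if __name__ == "__main__":
    for candidate in (0xFF, 0x3FC, 0xFF000000, 0x101, 12345678):
        print(hex(candidate), is_arm_immediate(candidate))
```

12345678 is &BC614E, which needs far more than eight significant bits, so it fails the check and has to be loaded from memory or built up over several instructions.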
