RISC OS Open: Forum: FP support

Aug 13, 2015 9:22am

Steve Drain (222) 1620 posts

Reason being that all modern VFP implementations seem to lack support for hardware trapping of exceptions (In terms of RISC OS, the Pi 1 is the only machine that will trap exceptions).

Here are a few statements, just for confirmation that I have understood and not missed something that would be an advantage:

Our lowest common denominator is the Pi 1, which is ARMv6
ARMv6 implements VFP2 only (VFP1 is obsolete)
VMOV immediate constant is VFP3 – I miss that
In VFP2 exception trapping could be enabled, but for consistency with VFP3+ it should not.

Aug 13, 2015 11:51am

Jeffrey Lee (213) 6048 posts

Yes, that’s correct.

Also remember that most VFP2 implementations (such as on the Pi 1) only have 16 D registers, while VFP3+ usually has 32 (true for all current RISC OS machines, but I think it is technically possible to have VFP3 with only 16 D registers)

Aug 13, 2015 2:15pm

Ben Avison (25) 445 posts

Sorry to contradict, but my understanding is:

VFPv2: always has 16 D registers
VFPv3 without NEON, or VFPv4 without NEON: may have either 16 or 32 D registers
VFPv3 with NEON, or VFPv4 with NEON: always has 32 D registers
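
Those three rules can be written down as a small lookup. This is only an illustrative sketch in Python: the function name `possible_d_registers` and its arguments are made up for this example and are not part of any RISC OS API, and it covers only the combinations listed above.

```python
def possible_d_registers(vfp_version, has_neon):
    """Return the set of D register counts allowed by the rules above."""
    if vfp_version not in (2, 3, 4):
        raise ValueError("only VFPv2, VFPv3 and VFPv4 are covered here")
    if vfp_version == 2:
        if has_neon:
            raise ValueError("the rules above do not cover VFPv2 with NEON")
        return {16}          # VFPv2: always 16
    if has_neon:
        return {32}          # VFPv3/VFPv4 with NEON: always 32
    return {16, 32}          # VFPv3/VFPv4 without NEON: either


print(sorted(possible_d_registers(2, False)))
print(sorted(possible_d_registers(3, False)))
print(sorted(possible_d_registers(4, True)))
```

For code that has to run on the Pi 1 as well, the safe assumption is the smallest set, i.e. D0 to D15 only.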

Aug 13, 2015 2:38pm

Jeffrey Lee (213) 6048 posts

Yes, you’re right.

Aug 13, 2015 3:59pm

Steve Drain (222) 1620 posts

The important thing to me is that the Pi 1 only has 16. ;-)

I have just finished a new version of the StrongHelp VFP manual with the information about the VFP special registers. So I am more familiar with the specifications than ever.

Aug 13, 2015 4:22pm

Steve Drain (222) 1620 posts

Here is a question for the gurus.

I have realised that Float could remove the need for each routine to check the value in R0 to choose between VFP and FPA, by having two SWI handlers, one for each. Then the Initialisation code would ensure the correct one in the header for the machine it was running on.

That is too ‘tricksy’, isn’t it?

;-)

Aug 13, 2015 7:28pm

Rick Murray (539) 13840 posts

Then the Initialisation code would ensure the correct one in the header for the machine it was running on.

Ehhhh…. Now yer just showin’ orf! ;-)

Aug 13, 2015 8:55pm

Rick Murray (539) 13840 posts

I have just finished a new version of the StrongHelp VFP manual

Check your contents page. Here, it crashes StrongHelp and looking at the source, it is horribly corrupted.

Given the age of the typical RISC OS user (<cough> myself included) and modern screen resolutions, it might be helpful to make the register contents diagrams a tad larger. ;-)

Aug 14, 2015 9:09am

Steve Drain (222) 1620 posts

So it does. ;-(

The copy from which the archive was made is fine, so something was corrupted during the upload process. I have repeated that and checked it and it looks ok now.

it might be helpful to make the register contents diagrams a tad larger. ;-)

I have thought on that, but I like to keep StrongHelp pages as compact as possible. I can see what the layout is on 1920×1200 and easily on lower resolution, but I realise that it is small. The actual information is in the links below, so the diagram is just an aid and is not essential. There are a couple of ideas I have that might ease your pain.

Aug 14, 2015 10:10am

Rick Murray (539) 13840 posts

There are a couple of ideas I have that might ease your pain.

MODE 2 ? ;-)

Aug 14, 2015 11:04am

Steve Drain (222) 1620 posts

No. Have a look at version 0.31 ;-)

Aug 14, 2015 11:38am

Fred Graute (114) 645 posts

No. Have a look at version 0.31 ;-)

Oh, nice touch. I like it. :-)

Aug 15, 2015 2:41pm

Steve Drain (222) 1620 posts

I am embarking on a revision of Float to look for gains in speed and accuracy and there is a trade-off between the two. This raises the question of what is the acceptable accuracy.

Float only deals with double-precision floats. For the arithmetic operations this is fine, because VFP is IEEE compliant.

For the transcendental operations there has to be a deal of calculation and it is difficult to maintain accuracy.

The FPEmulator deals with numbers internally in extended precision, 80 bits, and then returns results in double. This cannot be matched by Float, but even the FPE can lose some accuracy at times.

If I understand it, the IEEE standard requires 15-17 significant digits, but the 15 is the relevant one when considering operations, and 17 for strings.

So far, I think Float can do 15 across the range and better in most cases. Is that adequate?
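
For what it is worth, the 15 and 17 figures can be seen directly on any machine with IEEE doubles. The sketch below is plain Python, not Float code, and `roundtrip` is just a helper name invented for this example. As I understand it, 15 is the number of decimal digits that always survive going from text to a double and back to text, while 17 is what it takes to write any double out and read the identical value back.

```python
import sys


def roundtrip(x, digits):
    """Format x with the given number of significant digits and parse it back."""
    return float("%.*g" % (digits, x))


x = 0.1 + 0.2  # not exactly 0.3 once stored as a double

print(sys.float_info.dig)      # decimal digits guaranteed for text -> double -> text
print(roundtrip(x, 15) == x)   # 15 digits drop the last bit of this value
print(roundtrip(x, 17) == x)   # 17 digits recover the same double
```

So a routine that is good to 15 digits can still differ from the correctly rounded double in its last bit or two, which is the gap between "15 across the range" and full double accuracy.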

Aug 16, 2015 6:32pm

Steve Drain (222) 1620 posts

the Initialisation code would ensure the correct one [VFP or FPA SWIs] in the header for the machine it was running on

No go. Someone should have warned me. ;-)

As far as I can tell, the header data is taken from the module before initialisation, so any subsequent changes are ignored. It is possible to modify the module with external code using Service_ModulePreInit, but for the tiny gain it is hardly worth the effort.

Aug 16, 2015 6:50pm

Jeffrey Lee (213) 6048 posts

One change you could make though, is to have your SWI handler do something like this:

TEQ r0,#0
ORRNE r11,r11,#64
ADD pc,pc,r11,LSL #2
MOV r0,r0 ; Dummy to align start of jump table
; 64 branches here for FPA case
; 64 branches here for VFP case

The SWI jump table will be bigger (out-of-range SWIs will have to be explicitly handled), but the fact that you’ll only have one place checking for VFP instead of 20-odd should result in smaller code size for each routine, and it should make the CPU branch predictors happier. For SWIs which are VFP/FPA agnostic you can just have both the branches go to the same place.
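
To make the index arithmetic explicit, here is a rough Python model of that handler. It is a sketch only: `table_index`, `CHUNK_SIZE` and the placeholder entry names are invented for illustration, and it assumes (reading the listing) that a non-zero r0 selects the second, VFP half of the table.

```python
CHUNK_SIZE = 64  # r11 holds the SWI offset within the module's chunk, 0 to 63


def table_index(swi_offset, r0):
    """Mirror TEQ r0,#0 / ORRNE r11,r11,#64: pick a slot in a 128-entry table."""
    if not 0 <= swi_offset < CHUNK_SIZE:
        raise ValueError("SWI offset outside the module's chunk")
    if r0 != 0:
        return swi_offset | CHUNK_SIZE  # second half: VFP entries
    return swi_offset                   # first half: FPA entries


# Hypothetical table: every unused slot still has to branch somewhere sensible,
# which is the "explicitly handled" cost mentioned above.
table = ["fpa_bad_swi"] * CHUNK_SIZE + ["vfp_bad_swi"] * CHUNK_SIZE
table[0], table[CHUNK_SIZE] = "fpa_Start", "vfp_Start"

print(table[table_index(0, 0)])
print(table[table_index(0, 1)])
print(table[table_index(5, 1)])
```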

Aug 17, 2015 10:50am

Steve Drain (222) 1620 posts

Thanks, but part of the reason for trying this is to remove the need to check r0. This could make the use by other programs simpler.

I have come up with:

CMP   r11,#(fpa_table-vfp_table)/4
.swi_offset
ADDlo r11,r11,#(fpa_table-vfp_table)/4
ADDlo pc,pc,r11,LSL #2
B     bad_swi
.vfp_table
B   vfp_Start
...
...
.fpa_table
B   fpa_Start
...
...

Then, if the VFP_Support module is detected, the initialisation code overwrites the instruction at .swi_offset with 0, which is the neutral ANDeq r0,r0,r0.
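
The effect of the patch can be modelled in a few lines of Python. This is a sketch of the idea rather than the module code: `make_dispatcher` and the entry names other than `vfp_Start` and `fpa_Start` are hypothetical, and the choice made once at initialisation stands in for overwriting the instruction.

```python
def make_dispatcher(vfp_table, fpa_table, vfp_detected):
    """Model the patched handler: the table offset is fixed once at initialisation."""
    if len(vfp_table) != len(fpa_table):
        raise ValueError("both tables need one entry per SWI")
    count = len(vfp_table)                 # (fpa_table - vfp_table) / 4
    combined = vfp_table + fpa_table       # the two tables sit back to back
    offset = 0 if vfp_detected else count  # patched-out ADD leaves r11 alone

    def dispatch(swi_offset):
        if not 0 <= swi_offset < count:    # CMP fails: fall through to B bad_swi
            return "bad_swi"
        return combined[swi_offset + offset]

    return dispatch


vfp = ["vfp_Start", "vfp_Add"]   # vfp_Add / fpa_Add are made-up entries
fpa = ["fpa_Start", "fpa_Add"]

print(make_dispatcher(vfp, fpa, True)(1))
print(make_dispatcher(vfp, fpa, False)(1))
print(make_dispatcher(vfp, fpa, True)(2))
```

One thing the model hides: on real hardware the patched word is code, so I would expect a cache synchronisation call such as OS_SynchroniseCodeAreas to be needed after writing it.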

For SWIs which are VFP/FPA agnostic you can just have both the branches go to the same place.

Already done. ;-)

Aug 19, 2015 9:54pm

Theo Markettos (89) 919 posts

GCC with VFP support, release candidate of 4.7.4-rel2 is now available.
Add:



http://ci.riscos.info/job/gcc-4.7-native/32/artifact/gcc4/release-area/riscpkg/autobuilder_website/pkg/autobuilt

to your PackMan sources, or download the raw zipfiles to install manually.

(note this is directly from the build system and should not be regarded as a long-term URL – though you can get the most recent nightly build by replacing ‘32’ with ‘lastSuccessfulBuild’)

Would be interested in feedback – if people are happy we can make this an official release.

Aug 19, 2015 10:00pm

Theo Markettos (89) 919 posts

For the record, this is what the release candidate of GCC emits for Rick’s program on the first page, using -mfpu=vfp on a Raspberry Pi:

	.file	"vfp.c"
	.section	.rodata
	.align	2
.LC0:
	.ascii	"%f\000"
	.text
	.align	2
	.global	main
	.ascii	"main\000"
	.align	2
	.word	4278190088
	.type	main, %function
main:
	@ args = 0, pretend = 0, frame = 48, outgoing = 0
	@ frame_needed = 1, uses_anonymous_args = 0
	mov	ip, sp
	stmfd	sp!, {r9, fp, ip, lr, pc}
	sub	fp, ip, #4
	cmp	sp, sl
	bllt	__rt_stkovf_split_small
	sub	sp, sp, #48
	mov	r9, sp
	add	r2, r9, #8
	add	r3, r9, #8
	mov	r0, #66
	mov	r1, r2
	mov	r2, r3
	bl	_kernel_swi
	ldr	r3, [r9, #8]
	fmsr	s14, r3	@ int
	fsitos	s15, s14
	fsts	s15, [r9, #0]
	flds	s15, [r9, #0]
	fadds	s15, s15, s15
	fsts	s15, [r9, #4]
	flds	s15, [r9, #4]
	fcvtds	d7, s15
	ldr	r0, .L3
	fmrrd	r1, r2, d7
	bl	printf
	mov	r3, #0
	mov	r0, r3
	ldmea	fp, {r9, fp, sp, pc}
.L4:
	.align	2
.L3:
	.word	.LC0
	.size	main, .-main
	.ident	"GCC: (GCCSDK GCC 4.7.4 Release 2) 4.7.4"
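
Reading the listing: the word loaded from the register block after `_kernel_swi` (SWI 66) is moved into s14, converted to single precision, added to itself, widened to double and passed to printf with "%f". Below is a rough Python model of just that arithmetic, assuming this reading is right. `to_single` and `model_main` are names invented here; the C source itself is on the first page of the thread and is not reproduced.

```python
import struct


def to_single(value):
    """Round a Python float (a double) to single precision, as an S register holds it."""
    return struct.unpack("<f", struct.pack("<f", value))[0]


def model_main(swi_result):
    """Follow the VFP part of the listing for a given integer returned by the SWI."""
    single = to_single(float(swi_result))  # fmsr + fsitos: int -> single
    doubled = to_single(single + single)   # fadds s15, s15, s15
    return "%f" % doubled                  # fcvtds widens exactly, then printf("%f")


print(model_main(100))
print(model_main(16777217))  # above 2^24, so the int no longer fits a single exactly
```

The second call shows why the single-precision intermediate matters: integers above 2^24 are rounded by the fsitos step before the addition happens.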

Aug 20, 2015 5:28am

David Feugey (2125) 2709 posts

GCC with VFP support, release candidate of 4.7.4-rel2 is now available.

Very good news.

Aug 22, 2015 5:04pm

Chris Gransden (337) 1207 posts

I’ve uploaded a few programs that benefit from being built with VFP support. Most get a useful speed boost. Requires SharedUnixLib 1.13. See above.
Should work on RPi 1 & 2, Beagleboard & Xm, Pandaboard & ES, ARMX6 and IGEPv5 & EVM. (Includes vfp2, 3 and 4).

dcraw (Convert raw digital camera files)
flac (Produce flac file from wav)
ghostscript (PostScript and PDF interpreter)
imagemagick (manipulate various image formats, has basic RISC OS Sprite support)
lame (Produce mp3 file from wav)
jpeg-progs9a (Manipulate jpeg files)
mplayer (Play movies and audio, audio decoding is faster. Video play back only marginally quicker)
pdftest (View PDF files, renders up to twice as fast)
pdftools (Manipulate PDF files)
povray (Raytrace images)
sox (Manipulate various audio formats)

Aug 23, 2015 4:41am

rob andrews (112) 200 posts

Hi Chris, there seems to be a problem with pdftest: it says bad archive.
Do you have any other software that I can test on OMAP5432EVM? If you do, you can get me at my email.

Aug 23, 2015 10:03am

Chris Gransden (337) 1207 posts

I’ve re-uploaded pdftest and added ghostscript and pdfutils to the list.

Aug 23, 2015 10:53am

rob andrews (112) 200 posts

Where do we get 1.13 of SharedUnixLib? It only lists 1.12-1.
Found it. That will teach me to read all of the posts.

Aug 23, 2015 11:39am

Chris Gransden (337) 1207 posts

See this post.

Aug 24, 2015 4:59am

David Feugey (2125) 2709 posts

I’ve uploaded a few programs that benefit from being built with VFP support. Most get a useful speed boost. Requires SharedUnixLib 1.13. See above.

Thanks a lot. At last, some fresh, new, faster and more modern software :)

