FP support
Pages: 1 2 3 4 5 6 7 8 9 10 11 12
Steve Drain (222) 1620 posts |
Here are a few statements, just for confirmation that I have understood and not missed something that would be an advantage:
|
Jeffrey Lee (213) 6048 posts |
Yes, that’s correct. Also remember that most VFP2 implementations (such as on the Pi 1) only have 16 D registers, while VFP3+ usually has 32 (true for all current RISC OS machines, but I think it is technically possible to have VFP3 with only 16 D registers) |
Ben Avison (25) 445 posts |
Sorry to contradict, but my understanding is:
|
Jeffrey Lee (213) 6048 posts |
Yes, you’re right. |
Steve Drain (222) 1620 posts |
The important thing to me is that the Pi 1 only has 16. ;-) I have just finished a new version of the StrongHelp VFP manual with the information about the VFP special registers, so I am more familiar with the specifications than ever. |
Steve Drain (222) 1620 posts |
Here is a question for the gurus. I have realised that Float could avoid checking R0 in each routine to choose between VFP and FPA by having two SWI handlers, one for each. The initialisation code would then put the correct one in the module header for the machine it was running on. That is too ‘tricksy’, isn’t it? ;-) |
Rick Murray (539) 13840 posts |
Ehhhh…. Now yer just showin’ orf! ;-) |
Rick Murray (539) 13840 posts |
Check your contents page. Here, it crashes StrongHelp and looking at the source, it is horribly corrupted. Given the age of the typical RISC OS user (<cough> myself included) and modern screen resolutions, it might be helpful to make the register contents diagrams a tad larger. ;-) |
Steve Drain (222) 1620 posts |
So it does. ;-( The copy from which the archive was made is fine, so something was corrupted during the upload process. I have repeated the upload and checked it, and it looks OK now.
I have thought about that, but I like to keep StrongHelp pages as compact as possible. I can make out the layout at 1920×1200, and easily at lower resolutions, but I realise that it is small. The actual information is in the links below, so the diagram is just an aid and is not essential. I have a couple of ideas that might ease your pain. |
Rick Murray (539) 13840 posts |
MODE 2 ? ;-) |
Steve Drain (222) 1620 posts |
No. Have a look at version 0.31 ;-) |
Fred Graute (114) 645 posts |
Oh, nice touch. I like it. :-) |
Steve Drain (222) 1620 posts |
I am embarking on a revision of Float to look for gains in speed and accuracy, and there is a trade-off between the two. This raises the question of what accuracy is acceptable. Float only deals with double-precision floats. For the arithmetic operations this is fine, because VFP is IEEE compliant. For the transcendental operations there has to be a good deal of calculation, and it is difficult to maintain accuracy. The FPEmulator works internally in extended precision (80 bits) and then returns results in double. This cannot be matched by Float, but even the FPE can lose some accuracy at times. If I understand it, IEEE double precision gives 15-17 significant decimal digits: 15 is the relevant figure when considering operations, and 17 for strings. So far, I think Float can manage 15 across the range, and better in most cases. Is that adequate? |
Steve Drain (222) 1620 posts |
No go. Someone should have warned me. ;-) As far as I can tell, the header data is taken from the module before initialisation, so any subsequent changes are ignored. It is possible to modify the module with external code using Service_ModulePreInit, but for the tiny gain it is hardly worth the effort. |
Jeffrey Lee (213) 6048 posts |
One change you could make, though, is to have your SWI handler do something like this:

    TEQ     r0, #0
    ORRNE   r11, r11, #64
    ADD     pc, pc, r11, LSL #2
    MOV     r0, r0              ; Dummy to align start of jump table
    ; 64 branches here for FPA case
    ; 64 branches here for VFP case

The SWI jump table will be bigger (out-of-range SWIs will have to be explicitly handled), but the fact that you’ll only have one place checking for VFP instead of 20-odd should result in smaller code size for each routine, and it should make the CPU branch predictors happier. For SWIs which are VFP/FPA agnostic you can just have both the branches go to the same place. |
Steve Drain (222) 1620 posts |
Thanks, but part of the reason for trying this is to remove the need to check r0, which could make Float simpler for other programs to use. I have come up with:
Then the initialisation code overwrites the instruction at
Already done. ;-) |
Theo Markettos (89) 919 posts |
A release candidate of GCC 4.7.4-rel2 with VFP support is now available.
to your PackMan sources, or download the raw zipfiles to install manually. (Note that this comes directly from the build system and should not be regarded as a long-term URL, though you can get the most recent nightly build by replacing ‘32’ with ‘lastSuccessfulBuild’.) I would be interested in feedback; if people are happy, we can make this an official release. |
Theo Markettos (89) 919 posts |
For the record, this is what the release candidate of GCC emits for Rick’s program on the first page, using -mfpu=vfp on a Raspberry Pi:

            .file   "vfp.c"
            .section        .rodata
            .align  2
    .LC0:
            .ascii  "%f\000"
            .text
            .align  2
            .global main
            .ascii  "main\000"
            .align  2
            .word   4278190088
            .type   main, %function
    main:
            @ args = 0, pretend = 0, frame = 48, outgoing = 0
            @ frame_needed = 1, uses_anonymous_args = 0
            mov     ip, sp
            stmfd   sp!, {r9, fp, ip, lr, pc}
            sub     fp, ip, #4
            cmp     sp, sl
            bllt    __rt_stkovf_split_small
            sub     sp, sp, #48
            mov     r9, sp
            add     r2, r9, #8
            add     r3, r9, #8
            mov     r0, #66
            mov     r1, r2
            mov     r2, r3
            bl      _kernel_swi
            ldr     r3, [r9, #8]
            fmsr    s14, r3 @ int
            fsitos  s15, s14
            fsts    s15, [r9, #0]
            flds    s15, [r9, #0]
            fadds   s15, s15, s15
            fsts    s15, [r9, #4]
            flds    s15, [r9, #4]
            fcvtds  d7, s15
            ldr     r0, .L3
            fmrrd   r1, r2, d7
            bl      printf
            mov     r3, #0
            mov     r0, r3
            ldmea   fp, {r9, fp, sp, pc}
    .L4:
            .align  2
    .L3:
            .word   .LC0
            .size   main, .-main
            .ident  "GCC: (GCCSDK GCC 4.7.4 Release 2) 4.7.4"
|
David Feugey (2125) 2709 posts |
Very good news. |
Chris Gransden (337) 1207 posts |
I’ve uploaded a few programs that benefit from being built with VFP support. Most get a useful speed boost. They require SharedUnixLib 1.13 (see above).
dcraw (convert raw digital camera files) |
rob andrews (112) 200 posts |
Hi Chris, there seems to be a problem with pdftest: it says ‘bad archive’. |
Chris Gransden (337) 1207 posts |
I’ve re-uploaded pdftest and added ghostscript and pdfutils to the list. |
rob andrews (112) 200 posts |
Where do we get version 1.13 of SharedUnixLib? It only lists 1.12-1. |
Chris Gransden (337) 1207 posts |
See this post. |
David Feugey (2125) 2709 posts |
Thanks a lot. At least, some fresh, new, faster and more modern software :) |