Floating Point in C
Pages: 1 2
Raik (463) 2061 posts |
I have been playing around with various mathematical things in C and I have found this floating point benchmark… |
Stuart Swales (8827) 1357 posts |
Have a look at Norcroft using the /softfp calling convention together with a VFP accelerator library: https://www.riscosopen.org/forum/forums/2/topics/3457?page=12#posts-146247 |
Stuart Swales (8827) 1357 posts |
See https://www.riscosopen.org/forum/forums/2/topics/3457?page=9#posts-126493 for an example of a real-world speedup. |
Raik (463) 2061 posts |
Thanks for your answer and the links. |
Stuart Swales (8827) 1357 posts |
Clearly not Raik!
There should be a lot of warnings when linking! See https://www.riscosopen.org/forum/forums/2/topics/3457?page=10#posts-126539 |
Raik (463) 2061 posts |
OK, not clearly described … or I’m too stupid (I’m not a programmer), as I wrote. The linker gives 2 infos, 13 warnings and one error message. |
Stuart Swales (8827) 1357 posts |
Will see if I can get it to work and post results this afternoon. Have a good holiday! |
Raik (463) 2061 posts |
Thanks a lot. |
Stuart Swales (8827) 1357 posts |
Have a gander at https://www.croftnuisk.co.uk/coltsoft-downloads/other/linpack.zip There’s a modified linpackc-vfp.c with the appropriate bits added and it contains the apcs_softpcs library in export… Looks waaaaay faster using VFP for this, even going through the sub-optimal /softfp interface – peaks at 193172 KFLOPS for VFP/softfp vs. 8049 KFLOPS for FPE on my Pi 4 (24x better). Timings are shown in text files fpe-norcroft and vfp-norcroft; these also show how to compile with /softfp and link correctly with the library – I’m sure link can be persuaded to use Unix-like filenames, but I tend to stick to RISC OS ones on RISC OS – and what warnings you ought to expect. Update: If you assemble with the option to enforce VFP usage (with no FPE fallback, thereby reducing overhead on each call), that’s 225 MFLOPS (28x FPE). |
Rick Murray (539) 13840 posts |
<sveinung>It’s gone way past time when it’s been reasonable for the DDE to support VFP natively. It ought to say something that code compiled with ABC, which generates somewhat idiosyncratic code, is more than able to hand Norcroft its arse on a plate due to the lethargy of FPE.</sveinung> |
Stuart Swales (8827) 1357 posts |
And yet, many folk hereabouts still use VFP-incapable systems. I’m minded to support a new fireworkz-vfp package (so it can also use ARMv7 optimisations) in addition to keeping the old fireworkz package for FPE/FPA users. Probably not noticeable for most users, but those using a beta version report rather useful speed-ups for their matrix ops. Lots of SharedCLibrary f.p. support code is in good old assembler. Would require new calling convention – and how many FP regs does it use? – and probably an additional SCL-VFP. 12th of Never; Money, Money, Money… |
Rick Murray (539) 13840 posts |
Yup, especially the emulators. That being said, there is a discussion to be had regarding the level of ongoing support for legacy systems if it penalises contemporary ones. Maths intensive programs, however, will suffer with FPE, so having VFP available will make a big difference. Fireworkz is a good example of this.
Are they two separate builds, or does it use #ifdef to choose which bit of code to use?
Yup. I know. I have looked. ;)
In addition to supporting, as well, FPE for all of the current code.
Never say never. I think we’ll see this before a <whisper>64 bit OS</whisper>. |
Stuart Swales (8827) 1357 posts |
Oh, just use qemu-rpi2, they say. Does it work? Not for me. Otherwise we’d all be using it instead of good old RPCEmu, wouldn’t we?
In some circumstances… As mentioned, most users wouldn’t notice any difference – there’s a fair bit of overhead surrounding calls to calculations.
Two different Makefiles, built from lots of little fragments, one of which adds the magic sauce to the CC flags. Then one ifdef’ed include in a common header, one ifdef’ed call in main(). |
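A minimal sketch of the "one ifdef’ed include, one ifdef’ed call" arrangement described above. Everything named here is an assumption for illustration: `BUILD_WITH_VFP` stands for whatever preprocessor define the VFP Makefile fragment adds to the CC flags, and `vfp_accel.h` and `vfp_accel_init()` are hypothetical stand-ins for the real library header and set-up call.

```c
/* Common header fragment.
 * BUILD_WITH_VFP is a hypothetical define set only by the VFP build. */
#ifdef BUILD_WITH_VFP
#include "vfp_accel.h"            /* hypothetical library header */
#endif

/* Intended to be called once near the top of main().
 * Returns a short name for the floating point build in use. */
const char *fp_startup(void)
{
#ifdef BUILD_WITH_VFP
    vfp_accel_init();             /* hypothetical one-off library set-up */
    return "vfp";
#else
    return "fpe";                 /* default build: nothing extra to do */
#endif
}
```

Built without the define, the extra include and call vanish entirely, so the FPE/FPA package is unaffected by the VFP one.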
Dave Higton (1515) 3526 posts |
Is VFP compliant with IEEE 754 in respect of accuracy? If it is, is it beyond the wit of man to make the DDE C compiler generate code that uses the fastest FP method available on whichever platform the compiled object code runs on? |
Stuart Swales (8827) 1357 posts |
Easy answer – yes. Does it have extended precision like FPA – no.
By virtue of having multiple code paths and producing very unoptimised code? Maybe, so long as you’re prepared for the FPA path to be slower than at present. Currently Norcroft can produce a sequence of consecutive FPA opcodes that the FPEmulator doesn’t have to take an exception hit per opcode for, which might be harder for a mixed-output compiler to do. Also remember that FPA and VFP store the two words of a 64-bit double in different orders :-) My library’s default is to detect VFP availability and use it if possible, falling back to using FPA opcodes to do the same job if not. My feeling was that this wasn’t too bad; ISTR RiscOSM being maybe 10% slower using the library in this mode on older systems (and of course faster on newer ones), so it was worth it for them to have two binaries, selected at run-time. |
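To illustrate the word-order point, here is a minimal sketch of converting a 64-bit double image between the two conventions. It assumes the only difference is which 32-bit word comes first, and that `double` is 64 bits wide. The function names are made up for this example. The work is done on the raw bits rather than on a `double` so that a half-converted value never passes through a floating point register.

```c
#include <stdint.h>
#include <string.h>

/* Copy a double's in-memory image into a 64-bit integer.
 * Assumes sizeof(double) == 8. */
uint64_t double_to_bits(double value)
{
    uint64_t bits;
    memcpy(&bits, &value, sizeof bits);   /* memcpy avoids aliasing trouble */
    return bits;
}

/* Copy a 64-bit image back into a double. */
double bits_to_double(uint64_t bits)
{
    double value;
    memcpy(&value, &bits, sizeof value);
    return value;
}

/* Exchange the high and low 32-bit words of a 64-bit image.
 * Applying it twice gives back the original image. */
uint64_t swap_words(uint64_t bits)
{
    return (bits << 32) | (bits >> 32);
}
```

A library that supports both code paths would apply `swap_words` only at the boundary where data in one convention is handed to code expecting the other.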
Cameron Cawley (3514) 157 posts |
Would this be an issue for systems with hardware FPA like the A7000+ or just with FPEmulator? My experience with GCC is that it produces much faster code on the RiscPC with soft float than it does with hard float, so being able to use a soft float ABI within SharedCLibrary should provide important gains on almost all machines regardless of age. If this does get integrated into the stubs, that should make runtime selection of different code paths easier to implement. |
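Run-time selection of code paths can be sketched with a function pointer that is chosen on first use. This is only an illustration under stated assumptions: `vfp_available()` is a hypothetical probe (a real one would ask the OS, and here it simply returns 0), and `dot_fpa` and `dot_vfp` are plain C stand-ins for the same routine compiled two different ways.

```c
#include <stddef.h>

typedef double (*dot_fn)(const double *a, const double *b, size_t n);

/* Stand-in for the routine as built for FPA/FPE. */
static double dot_fpa(const double *a, const double *b, size_t n)
{
    double sum = 0.0;
    size_t i;
    for (i = 0; i < n; i++)
        sum += a[i] * b[i];
    return sum;
}

/* Stand-in for the same routine as built for VFP. */
static double dot_vfp(const double *a, const double *b, size_t n)
{
    double sum = 0.0;
    size_t i;
    for (i = 0; i < n; i++)
        sum += a[i] * b[i];
    return sum;
}

/* Hypothetical probe; always reports "no VFP" in this sketch. */
static int vfp_available(void)
{
    return 0;
}

static dot_fn dot_impl = NULL;

/* Public entry point: picks an implementation once, then reuses it. */
double dot(const double *a, const double *b, size_t n)
{
    if (dot_impl == NULL)
        dot_impl = vfp_available() ? dot_vfp : dot_fpa;
    return dot_impl(a, b, n);
}
```

The check happens once rather than on every call, so the cost per call after that is one indirect branch.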
Rick Murray (539) 13840 posts |
If the compiler could output VFP natively, then wouldn’t that remove a lot of the overheads?
While it isn’t insurmountable, there are questions regarding the two having the data words in different order, not to mention a lot of noise and mess constantly branching for this or that. However, given the OS’ history, this sort of compromise may be the best option, provided the compiler has an option to only emit FPE or VFP as well. This will permit the programmer to choose the most appropriate setup for their program – for example something that requires capable modern hardware might want to stick with only VFP, something that barely uses FP could just stick with FPE to easily run on anything… |
David J. Ruck (33) 1635 posts |
If you do go down the multi-path route in a program, where different code will be used depending on the machine it is run on, you are going to have to lock down your FP double format at any interfaces, such as file storage or network transmission, otherwise the program on a different machine won’t read the data correctly. A common solution (also used where it might not just be endian differences) is to store/transmit FP as text strings. |
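A minimal sketch of the text-string approach, assuming a C library that provides `snprintf` and IEEE 754 64-bit doubles. The helper names are made up for this example. Seventeen significant digits are enough for such a double to survive the trip to text and back unchanged.

```c
#include <stdio.h>
#include <stdlib.h>

/* Write a double as text into buf.
 * Returns the number of characters written, or -1 if buf is too small. */
int double_to_text(double value, char *buf, size_t size)
{
    int n = snprintf(buf, size, "%.17g", value);
    if (n < 0 || (size_t)n >= size)
        return -1;
    return n;
}

/* Parse text produced by double_to_text().
 * Returns 0 on success, -1 if no number could be read. */
int text_to_double(const char *text, double *out)
{
    char *end;
    double value = strtod(text, &end);
    if (end == text)
        return -1;
    *out = value;
    return 0;
}
```

Both `snprintf` and `strtod` follow the current locale's decimal point, so writer and reader should agree on the locale (the default "C" locale is the simplest choice).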
Stuart Swales (8827) 1357 posts |
Just sayin’ – this is something that you can use NOW with trivial source code and Makefile changes, using your favourite (Norcroft) toolchain. To my mind, resource would be best spent creating a soft-float SharedCLibrary – good performance gain for most users (including emulators) for minimal effort. A SCL with soft-float interface can always then be tweaked for each platform to maximise its internal use of VFP, like we do already with ARM options. If the very few people with real FPA hardware moan, they can always recompile packages, right ;-) |
Jean-Michel BRUCK (3009) 359 posts |
Yes! The VFP version is built in about 5s and the one without VFP 20s. (ARMX6) |
Rick Murray (539) 13840 posts |
Oops… The requested URL was not found on this server. (also, needs a blank line after the quoted part) |
Jean-Michel BRUCK (3009) 359 posts |
Sorry, spelling mistake!!! I’ll correct it… Phew! It’s done… |
Rick Murray (539) 13840 posts |
There’s no purple alien planet any more. Perhaps the Vogons got there? https://web.archive.org/web/20240302093158/https://www.purplealienplanet.com/node/20 (though the formatting is bad) |
Jean-Michel BRUCK (3009) 359 posts |
Merci Rick, |
Raik (463) 2061 posts |
Thanks a lot. I have taken a look at it. It looks like my mistake was a small one. |