FP support
Stuart Swales (8827) 1357 posts |
It could be an even better compromise if the fallback emulation wasn’t to FPEmulator here but a generic softfp library! This was just really a proof-of-concept a few days ago that went somewhat better than I expected. Anyone trying to use apcs_softpcs: note Thanks for the hints about where that specific link is hiding! |
Colin Ferris (399) 1814 posts |
What about having a relocatable module, a bit like SharedCLib, i.e. a SharedFPE? Then there could be different versions for the various hardware options. |
Stuart Swales (8827) 1357 posts |
The best module to implement this would of course be the SharedCLibrary ;-) |
Steve Drain (222) 1620 posts |
A bit like my Float module of 2016? This was never properly finished, but has a registered release name of SmartFP. You could base something on it or I could pick it up again. If VFP context switching is possible it uses VFP but otherwise FPE, allowing for the float word order. It also provides VFP based transcendental operations such as SIN and EXP, so no falling back to FPE for these. |
Stuart Swales (8827) 1357 posts |
What I’ve tried to provide in the apcs_softpcs ‘package’ is something that can help boost C application performance for really very minimal changes: authors just have to add one #include, one initialisation call, compile with -apcs /softfp and link with the helper library to provide functionality expected by the compiler in that mode. Then you get an application that performs much better on hardware with VFP without much degradation with FPA (given the cost of emulation). It’s not as fast as it would be compiled purely for VFP, but I don’t think that’s the right way to go just yet except for specialist applications (use gcc). It can be made a wee bit faster by removing the run-time selection of VFP/FPA use; possibly the best way to do that would be to implement the required compiler support functions currently provided by apcs_softpcs in the SharedCLibrary as Ben thought way back (https://www.riscosopen.org/forum/forums/2/topics/3457#posts-45080) so that the SCL provided by your system would provide best performance on that system.
Some fool had to do it :-) It helped re-energise my remaining grey cell. I’d forgotten that you’d implemented transcendentals in the Float module otherwise I might have gone looking there; the first RISC OS source I looked at for the state of VFPSupport didn’t have an implementation of SWI VFPSupport_ElementaryFunctions so thought that was a future feature! Then I downloaded the current source :-) |
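To illustrate the run-time selection of VFP/FPA described above, here is a minimal sketch of dispatching through a function pointer chosen once at initialisation. Every name in it (`fp_select_init`, `fp_add`, `add_fast`, `add_fallback`) is hypothetical and not taken from apcs_softpcs, and both stand-in routines use plain C arithmetic so that the example runs anywhere. The caller passes in whether VFP is available; how that would be detected is not shown.

```c
/* All names here are hypothetical; none is taken from apcs_softpcs. */
typedef double (*fp_binop)(double, double);

/* Stand-in for a routine that would use VFP instructions. */
static double add_fast(double a, double b) { return a + b; }

/* Stand-in for a routine that would use FPA instructions (FPEmulator). */
static double add_fallback(double a, double b) { return a + b; }

/* Default to the fallback so calls made before initialisation still work. */
static fp_binop add_impl = add_fallback;

/* One-off initialisation: pick the implementation for this machine. */
void fp_select_init(int have_vfp)
{
    add_impl = have_vfp ? add_fast : add_fallback;
}

/* Reports which path is currently selected (non-zero means the fast one). */
int fp_select_is_fast(void)
{
    return add_impl == add_fast;
}

/* Every call pays for one indirect branch through the pointer. */
double fp_add(double a, double b)
{
    return add_impl(a, b);
}
```

The indirect call in `fp_add` is the sort of per-operation cost that would disappear if the support functions lived in a library already matched to the system, as suggested for the SharedCLibrary.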
GavinWraith (26) 1563 posts |
Support libraries for numerical functions should contain an entry for Horner’s method – inputs: a pointer to an array of coefficients |
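For reference, a minimal sketch of Horner's method in C. The post above only names the coefficient array as an input; the coefficient count, the evaluation point `x`, the function name `horner_eval` and the lowest-order-first layout are all assumptions made for this example.

```c
#include <stddef.h>

/*
 * Evaluate coeffs[0] + coeffs[1]*x + ... + coeffs[count-1]*x^(count-1)
 * using Horner's method: start from the highest-order coefficient and
 * repeatedly multiply by x and add the next coefficient down.
 * Returns 0.0 for an empty coefficient array.
 */
double horner_eval(const double *coeffs, size_t count, double x)
{
    double result = 0.0;
    size_t i = count;

    while (i > 0) {
        --i;
        result = result * x + coeffs[i];
    }
    return result;
}
```

This needs one multiply and one add per coefficient and never forms powers of x explicitly.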
Stuart Swales (8827) 1357 posts |
apcs_softpcs is just designed to provide the minimum required to adapt an existing C application to using VFP if possible. Currently basic f.p. operators (+,-,*,/,==,!=,<,<=,>,>=) are provided in VFP, along with select C library functions (e.g. isgreater()). It does also use VFPSupport for implementing the standard transcendental functions that would normally be provided by the C library such as cos(). Hopefully, if this idea catches on, I will bother to provide VFP functions for more library functions. If there’s a call for a numerical function library, then I’d be happy to contribute to that as a separate project. It could usefully use the same run-time switching to adapt to being executed on VFP and FPA systems. Fireworkz and PipeDream do have the SERIESSUM spreadsheet function, and use Horner’s method to evaluate various spreadsheet functions, so I know what you are on about. [Edit: I thought that SERIESSUM evaluation used Horner’s method too, but I was wrong! Hadn’t done it that way as the coefficient array can be arranged horizontally or vertically, so had nested loops. That was a pretty easy fix – which uncovered a compiler bug (nothing to do with /softfp, I hasten to add).] |
Steve Drain (222) 1620 posts |
VFPSupport_ElementaryFunctions has appeared in the six years since Float. I seem to remember Jeffrey saying that the elementary functions would be better there than in a separate module, and it looks as though that is just what he has implemented. I think that is the right place, too, so nothing more for Float. Now I will have to look at whether BASICVFP uses this. It did not when I last looked a good while ago, but I bet it does now. ;-) |
Stuart Swales (8827) 1357 posts |
You may be pleasantly surprised – I was! No great fanfare as I recall. |
Martin Avison (27) 1494 posts |
I suspect it was part of this |
Stuart Swales (8827) 1357 posts |
Thanks Martin. My memory is awful. I did say that I was down to one brain cell… Anyhow, here’s an update. Some additional notes, source file names contracted so you can see better what they are in a standard Filer view, and a pre-built library for those who just want to use it: http://croftnuisk.co.uk/coltsoft-downloads/other/apcs_softpcs_20210926.zip |
Matthew Phillips (473) 721 posts |
Thanks Stuart, I’ll download the latest and take a look. I spent a bit of time hacking some code about this morning to try to get one of our applications to compile. I found I had to comment out the bits of your header file to do with time.h as the compiler was complaining about a duplicate definition. Not sure whether this was a fault in my code, but as I’m not using time.h the quickest fix was to remove it. I’ll have another look at doing it properly and report the exact error if I’m still having problems. My main aim was to compile the application and see what the speed improvement might be. Turned out to be useful, but not so dramatic that I could tell without timing it! A task that took 51 seconds on a Pi3 sped up to 44 seconds, using a recent ROM image. I think that makes a 16% speed improvement, though I may be getting my percentages in a mess. |
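On the percentages: there are two common ways to express the same pair of timings, and they give different figures. The sketch below computes both; the function names are made up for this example.

```c
/* Percentage by which the elapsed time went down. */
double time_reduction_percent(double before, double after)
{
    return (before - after) / before * 100.0;
}

/* Percentage by which the work rate (jobs per second) went up. */
double speedup_percent(double before, double after)
{
    return (before / after - 1.0) * 100.0;
}
```

For 51 seconds going to 44 seconds, the first works out at a little under 14% and the second at a little under 16%, so the 16% figure is right when read as an increase in speed.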
Matthew Phillips (473) 721 posts |
By the way, the linker threw up so many warnings about “code/data or FP calling standard conflict” that I did not notice it had actually produced a binary for a good few minutes! |
Stuart Swales (8827) 1357 posts |
Thanks for feedback, Matthew. It’d possibly be useful for the apcs_softpcs.h not to #define things relating to various headers like time.h if that header hadn’t been included! It’s only difftime() there that needs redirecting to a function that returns the double in ARM registers rather than F0.
Is that more recent than June? That’s when VFPSupport started to support the elementary functions. |
Matthew Phillips (473) 721 posts |
No discernible difference in speed between VFPSupport 0.13 and VFPSupport 0.16. I’m not sure whether there should be, or whether that only affects BASIC. |
Matthew Phillips (473) 721 posts |
By “recent” I meant 5.29 downloaded today. Should have been more specific. My timings (all Raspberry Pi 3) were:
RISC OS 5.25 (11-May-18), VFPSupport 0.13: 43 seconds
Original version of application: 51 seconds.
I had no intuition as to what the speed improvement might be: it’s a complex application relying on lots of other things, including DrawFile_Render. I’m not even sure how much it relies on basic floating point operations and how much is trigonometry. |
Stuart Swales (8827) 1357 posts |
apcs_softpcs uses VFPSupport to create/destroy VFP contexts, and with the advent of 0.16, sin, cos, exp and friends. All the other VFP arithmetic is done by apcs_softpcs, so VFPSupport version should not matter for that. |
Stuart Swales (8827) 1357 posts |
Sadly unavoidable (unless some guru is kind enough to point out the magic settings) as areas need to have the VFP attribute to assemble VFP instructions! I suppose I could polish it by hand-mangling out the VFP attribute from each area. :-) |
Stuart Swales (8827) 1357 posts |
An interesting observation on applications like Matthew’s: if your application isn’t that f.p. intensive, you won’t get much gain by using hardware f.p. Suppose your workload is:
total(normal) = overhead + fp(normal) = 100s
and fp(normal) is, say, just 20% of that total. That leaves the unavoidable overhead at 80s, whatever speed the f.p. part can be run at. Moving to apcs_softpcs on a system with hardware f.p. could reduce the time consumed by the f.p. part by a factor of five to ten; taking the factor of ten, fp(sfp) = fp(normal) x 0.1. So (sfp == run-time switchable f.p. between FPA and VFP):
total(sfp) = overhead + fp(sfp) = 80s + 0.1 x 20s = 82s
So why use apcs_softpcs? You could just recompile using gcc to VFP, but then you would have to ship two binaries if you have customers who can’t move over to new ARM hardware with VFP, or who use emulation. If you get yet another factor of ten improvement over apcs_softpcs (which can only execute one f.p. instruction at a time) by using fully optimised gcc/VFP (which can pipeline them) you have fp(vfp) = fp(sfp) x 0.1. So:
total(vfp) = overhead + fp(vfp) = 80s + 0.1 x (0.1 x 20s) = 80.2s
a far less significant hike in overall performance than the previous step. Which doesn’t look to me like it’s worth maintaining two builds for, for these types of application, says the Devil’s Advocate ;-) |
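The arithmetic above can be checked with a few lines of C. This is only a sketch of the model as stated in the post (fixed overhead plus a scaled f.p. part); the function name `total_time` is made up for the example.

```c
/*
 * Total run time when only the floating point part is sped up.
 * overhead : seconds spent outside f.p. work (unchanged by the switch)
 * fp_time  : seconds spent in f.p. work in the original build
 * fp_scale : fraction of fp_time remaining afterwards (0.1 = ten times faster)
 */
double total_time(double overhead, double fp_time, double fp_scale)
{
    return overhead + fp_time * fp_scale;
}
```

With an overhead of 80 and an f.p. time of 20, a scale of 0.1 gives roughly 82 seconds and a scale of 0.1 x 0.1 gives roughly 80.2 seconds, matching the figures in the post. However small the scale becomes, the total cannot drop below the 80 seconds of overhead.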
Matthew Phillips (473) 721 posts |
Yes, interesting calculations. When the application is RiscOSM which is very processor-intensive, it is tempting to build an apcs_softpcs version to give users on modern machines a speed boost. Unfortunately it also gives users of machines without VFP a speed disadvantage. Is it fair to speed up the experience for those who are already on faster machines at the expense of those who are on slower ones? So we may need to issue a “normal” APCS build for older machines anyway. But the ability to do this without having to rebuild the entire application and libraries using gcc instead of Norcroft is very welcome! |
Stuart Swales (8827) 1357 posts |
Ah, that’s interesting, Matthew – it is proving to be that much slower for the older systems? I hadn’t noticed myself, and had found that some code paths were being made faster by the compiler avoiding issuing SFM/LFM at procedure entry/exit when the code path being used through the procedure didn’t always use f.p. But I think the point about continuing to use a toolchain that we ‘know and love’ is very valid and lowers the barrier to adopting measures like these. |
Matthew Phillips (473) 721 posts |
A 1:40,000 map of London took 2 minutes instead of 1:50 on our Iyonix using the VFP version. So a 9% increase in the time taken. On the Pi3 we got a 13% improvement. |
Rick Murray (539) 13840 posts |
Is it fair to hold back and restrict the experience of users of faster machines because of those who wish to remain using ancient slower hardware? The question is valid both ways around. The answer? That’s harder. ;-) However, I would suggest that the ones you offer the best experience to (old or new) should be your majority. If most of your users have modern fast machines, then implement this as the speed benefit is obvious and worthwhile. Aim to please the majority. |
Stuart Swales (8827) 1357 posts |
Indeed! |
Matthew Phillips (473) 721 posts |
Stuart, I was puzzled that in the apcs_softpcs header file you have #if defined(APCS_SOFTPCS) near the top. How is this different from #ifdef APCS_SOFTPCS? I wanted to put similar conditions in my application so as to be able to compile soft float and FPE versions, and I was wondering whether I am missing something. |
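In standard C the two forms test exactly the same thing. The practical difference is that `defined(...)` is an expression, so it can be combined with `&&`, `||` and `!` in a single `#if`, which `#ifdef` cannot do. The sketch below shows both; `FORCE_FPE`, the helper macros and `build_is_softfp` are hypothetical names invented for this example, and APCS_SOFTPCS is defined in the file only to keep the example self-contained (in a real build it would presumably come from the compiler command line or the makefile).

```c
/* Defined here only so the sketch stands alone. */
#define APCS_SOFTPCS 1

/* Form 1: #ifdef tests a single macro name. */
#ifdef APCS_SOFTPCS
#define VIA_IFDEF 1
#else
#define VIA_IFDEF 0
#endif

/* Form 2: #if defined(...) performs the identical test. */
#if defined(APCS_SOFTPCS)
#define VIA_IF_DEFINED 1
#else
#define VIA_IF_DEFINED 0
#endif

/* Only the second form can be combined with other conditions.
 * FORCE_FPE is a hypothetical override switch, not part of apcs_softpcs. */
#if defined(APCS_SOFTPCS) && !defined(FORCE_FPE)
#define BUILD_IS_SOFTFP 1
#else
#define BUILD_IS_SOFTFP 0
#endif

/* Returns 1 when this translation unit was built as the soft float variant. */
int build_is_softfp(void)
{
    return BUILD_IS_SOFTFP;
}
```

For a single switch between soft float and FPE builds, either spelling does the job.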