Going round in circles, quickly
Posted by Steve Revill Sat, 10 Jul 2021 11:15:00 GMT
There are a few areas where the potential for speed-ups is huge, yet to date RISC OS has been unable to harness them. One such area is where the more complicated floating point mathematical operations are used, like sine and cosine, rather than simple add/subtract/multiply/divide – but that has all changed in our nightly RISC OS 5 ROMs from July 2021 onwards.
Origins
The Arm instruction set design includes a chunk of spare space to communicate with coprocessors. Coprocessors are add-on areas of the silicon chip which handle whatever specialism those spare instructions are defined to do. One such add-on is a maths accelerator.
The original maths accelerator was unimaginatively called the Floating Point Accelerator, or FPA for short. Even though the last time an FPA was available on chip was the 7500FE used in the A7000+ back in 1997, RISC OS still faithfully supports those instructions through emulation with the FPEmulator module. By modern standards, this can be quite slow, particularly for the more complex mathematical operations.
In steps the Vector Floating Point unit, or VFP for short. Introduced in 1998, the VFP replaced FPA entirely, bringing with it a larger bank of working registers, but VFP wasn’t instruction compatible with FPA. For those platforms sporting Vector Floating Point units, which is now the vast majority, the potential for high speed computations using the VFP unit is great.
Support in a roundabout way
RISC OS now supports a set of 11 hardware-accelerated functions:
- sine and arcsine
- cosine and arccosine
- tangent, arctangent and atan2 (a variant of arctangent)
- logarithm base e and base 10
- exponent
- power
A Bezier curve uses several of these complex mathematical functions
The VFPSupport module in RISC OS 5 is responsible for looking after the VFP coprocessor, so was the natural home for these new functions. By having them available centrally it’s possible to use any of them from any language that might offer them, or even directly from an application. For example, BASIC provides the keywords SIN and COS which map directly to sine and cosine:
10 start%=TIME
20 FOR rad = 0 TO 2*PI STEP 0.000001
30 x = COS(rad) : y = SIN(rad)
40 NEXT
50 PRINT TIME-start%;" centiseconds"
This works out over 6 million coordinates of a unit circle, so it should be possible to see the difference in speed when run on integer BASIC V and floating point BASIC VI. Try it!
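The FOR loop itself takes time on every pass, so part of the figure printed above is loop overhead rather than SIN and COS. As a rough sketch, assuming the empty loop costs about the same in both runs, the variant below times the bare loop first and subtracts it. The variable names overhead% and total% are ours and purely illustrative.
10 REM Time the empty loop first so that its cost can be subtracted
20 start%=TIME
30 FOR rad = 0 TO 2*PI STEP 0.000001
40 NEXT
50 overhead%=TIME-start%
60 REM Now time the same loop with the SIN and COS calls inside
70 start%=TIME
80 FOR rad = 0 TO 2*PI STEP 0.000001
90 x = COS(rad) : y = SIN(rad)
100 NEXT
110 total%=TIME-start%
120 PRINT "Loop overhead: ";overhead%;" centiseconds"
130 PRINT "SIN and COS only: ";total%-overhead%;" centiseconds"
The second figure is the one to compare between ROM versions, since the loop overhead should not change with the new functions. Timings will still vary a little from run to run.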
And to round off, a word about making it happen
We’d like to make a special mention of RISC OS FR who were a key sponsor and tester for this development work. There’s a big resource of BASIC-related material on their site, and of course there’s still time to bag a copy of the BASIC-only DDE if you’d like to explore programming in BASIC at no cost.
Wow! That is impressive.
I did a few quick tests with the program above, on my Titanium…
Test#1 RO5.29 (22 Apr 2021), BASIC v1.81 (26 Sep 2020), VFPSupport v0.13 (20 Feb 2018)
Test#2 RO5.29 (11 Jul 2021), BASIC v1.84 (23 Jun 2021), VFPSupport v0.16 (28 Jun 2021)
The results are in cs, and do vary slightly from run to run.
Obviously whether this speed increase is noticeable depends on how much the accelerated functions are used in a particular program.
The posting references this being applicable to other high level languages beyond BASIC, but it isn’t clear to me if that is now working, or requires further changes to be implemented (e.g. sharedclib or something)?
PS either way, excellent progress!
Interesting. One thing I take away from this is how good the floating point routines in BASIC V are.
I thought it was possible to plot a Bezier curve parametrically without using any trig functions.
Good question Andrew. Technically, with the VFP opcodes and the additional functions now provided by VFPSupport, you can completely support IEEE-754 calls. But direct support in DDE would be needed at some point in the future, as well as an accelerated FPEmulator module (for old software).
Very nice! But prove me wrong: I think, like Chris said, an implementation of cubic or quadratic Bezier lines is ‘normally’ done just using multiply/add and none of the listed functions…
I used the given BASIC program to test speeds on ARMX6 and Pi 400, but my results do not show the improvement that Martin reported in his Test#2 (above). So I started thread ‘BASIC VFP speed’ in forum ‘Community Support’. It would be good to understand the caveats that seem to apply when expecting the speed-up described in your news article.