FP support
Rick Murray (539) 13840 posts |
This is too weird. I have taken your suggested code and tweaked it as follows – I have pushed the entry R0 to R4; and you don’t need to preserve R0-R3 (APCS defines them as “can be corrupted” – normally a function returns something in R0). Here it is:
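(The listing did not survive the trip to this page; the sketch below is reconstructed from the description that follows, so the exact instructions are illustrative only, not the verbatim code.)

   MOV     R4, R0          ; stash the entry R0 in R4 (R0-R3 are corruptible)
   MOV     R3, R4          ; copy R4 into R3...
   SWI     "Float_Start"   ; ...call Float_Start
   MOV     R1, R3          ; ...then copy R3 into R1
   SWI     "Float_MUL"     ; and do the multiply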
I have used R4 for a very specific reason. The reason being that R4 is not R3. Confused? Look at the code. The code doesn’t make sense, does it? I copy R4 into R3 and then call Float_Start, and then I copy R3 into R1 for Float_MUL. Easy. This nonsense code works with Float 0.65. What you see works. Uh… Okay. Over to you. My head hurts. |
Steve Drain (222) 1620 posts |
Perhaps that is what makes it odd to me. ;-)
I test from BASIC and use 10000000. I have to run the Float_Dummy SWI to discount the BASIC overheads, but I see those impressive speed advantages, too.
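(Roughly, the shape of such a test – a sketch of the method, not the actual harness:)

   t% = TIME
   FOR i% = 1 TO 10000000
     SYS "Float_Dummy"
   NEXT
   base% = TIME - t%
   REM time the SWI under test the same way, then subtract base%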
Maybe I picked up on this, without analysing it properly:
Yes. I spotted that a while after I posted, but I thought I would give you the pleasure of pointing it out. ;-) So, there is a use for R4 after all, although I would employ R3 for that purpose, for style only. If correcting that is not effective then I am not sure what to suggest. I would expect Float_Start to be called when the program runs and Float_Stop when it closes down. Then any Float SWI can be used in between, with the use of VFP or FPA code being automatically chosen. I had not thought it would be used the way you have. |
Rick Murray (539) 13840 posts |
Got it! (wheeeee!) Your function for Float_Start preserves R3; however you call OS_Module 18 to check that VFPSupport is present. That call corrupts R4 and R5. I’ve made the change (STMFD R13!,{R3-R5,R14} and the corresponding LDMFD) and rebuilt the module, and now Float 0.65 works. Phew!
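(Illustratively, the shape of the fix – the label and surrounding code here are hypothetical, not the module’s actual source:)

   .float_start
   STMFD   R13!, {R3-R5, R14}   ; now saves R4 and R5 too - OS_Module 18 corrupts them
   MOV     R0, #18              ; OS_Module reason 18: look up a module by name
   ADR     R1, vfp_name         ; -> "VFPSupport"
   SWI     "XOS_Module"         ; V set on return means VFPSupport is absent
   LDMFD   R13!, {R3-R5, PC}    ; the corresponding restore
   .vfp_name
   EQUS    "VFPSupport" : EQUB 0 : ALIGN
|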
Steve Drain (222) 1620 posts |
My first thought is that Float_Start is corrupting R4, but I am not near the machine to check. I will report back. I am glad that otherwise the code now works. ;-) Edit: We crossed, but good news. I will make the change. |
Rick Murray (539) 13840 posts |
Pfffft…. I spent an hour and a half chasing ghosts. I was so busy getting the “fake BL by messing with R14 and PC” to work nicely that I forgot to pass the parameters to the Float_MUL code, and then wondered why it crashed. Duh. Observations:
Okay. Here’s the final result, on a Pi model B revision 2, clocking at default speed (what is that, 700MHz or something?).
The OS_IntOn SWI is the SWI with the lowest overheads I could think of that doesn’t muck up registers. This was mainly to see what influence a SWI call would make. Kappa Multiply is calling Float_MUL. KappaDirect Multiply is calling the same by branching directly into the jump table. The direct call into the Float module has the danger that it sets up a CLib “fixed pointers problem”, but unlike CLib, Float will refuse to quit if there are active clients – and for that Steve is to be commended.

It has been shown that, for those using assembler (and BASIC), there is a viable option to call FP operations regardless of whether FPA, FPEmulator, or VFP is installed. Float’s 32cs is slower than raw VFP’s 7cs, but it is far better than FPA’s 388cs. I suppose Float on a non-VFP system might be kind of slow, but there are fewer and fewer such machines and really we ought to look at methods to make use of the VFP we have instead… hang on, I have a feeling I’ve said this before.

Actually, I noticed that Float can be optimised further – it looks like most of the instructions check the context pointer given and branch if VFP. Why? As the FPEmulator is the slow path on most 32-bit machines, having the VFP code as the fall-through and branching to the FPE code could bring that number down a little. ;-)

Okay. I’ve spent way too long on this stuff, but I wasn’t willing to let it go until the test program worked.
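(For reference, the usual shape of that direct call – faking the BL by hand – is sketched below; the register choices and label are illustrative, not the actual test code.)

   MOV     R14, PC                 ; PC reads as .+8, so R14 -> .resume
   LDR     PC, [R12, R0, LSL #2]   ; branch through entry R0 of the jump table at R12
   .resume                         ; the routine's MOV PC,R14 comes back here
|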
Rick Murray (539) 13840 posts |
PS: Steve, get Float registered. :-) |
Steve Drain (222) 1620 posts |
@Rick Thanks for all your help. I think it is true to say that developers crave feedback, and reports of errors as much as anything. Silence is the killer of ambition. ;-)
I use an alternative method that reduces the ‘messing about’. This is taken straight from a bit of BASIC assembler:
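(The listing has not survived here; the sketch below shows one arrangement consistent with the description – the labels are hypothetical, not the actual code.)

   ADR     R14, table              ; r14 is both the return address and the table base
   ADD     PC, R14, R0, LSL #2     ; dispatch: enter entry R0 of the table (R0 >= 1)
   .table
   B       carry_on                ; entry 0: a routine's MOV PC,R14 lands here
   B       float_add               ; entry 1
   B       float_mul               ; entry 2
   .carry_on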
In this case r14 is also the branch table base; substitute another register in other situations. A more extensive version of this is used in Basalt.
I have been caught with that error, but not consistently. I think it may be partly a consequence of not preserving R0 over the call. I have changed that and I will see what happens in the future.
There is a Float_Dummy SWI. It is not documented, but I did mention it before.
;-)
Those figures are commensurate with what I found and justified the pursuit of this method.
I took this point when you mentioned it before and I am in the process of reversing the branch order. It makes a tiny, but detectable, change.
OK, but it cannot be registered as Float, which is the name of Steve Fryatt’s application. Any suggestions? After all this, you have really not addressed the primary purpose of Float, which is to provide the transcendental operations that VFP does not have. Perhaps you could run your test on COS, for instance. |
Dave Higton (1515) 3525 posts |
FloatingPoint |
Rick Murray (539) 13840 posts |
SmartFP ? |
Rick Murray (539) 13840 posts |
Tell me about it. I have in the past come across some very obvious problems and thought: how has nobody else seen this? People – pay attention. Whinge (nicely!) or there is a chance we may not ever see the problem. I don’t know about you, but sometimes in my programs I add features that I don’t need myself but that are in some way relevant, to make the program more useful. Consider the idea of mail merge. If I was writing a word processor, I would add something like that to it, despite having never had a use for it in my life…
;-) I was just looking for ways to handle non-integer numbers better than emulating a piece of hardware nearly a third of a century old….. Sure – I would be happy to do that. How? Isn’t COS the one to draw circles on the screen? I read your brief overview of the extra operations. Didn’t understand a word of it…. ;-) |
Rick Murray (539) 13840 posts |
Update – looked COS up on wiki. https://en.wikipedia.org/wiki/Trigonometric_functions Riiiiight. Is there a version written in English? And I don’t mean https://simple.wikipedia.org/wiki/Trigonometric_function ;-) |
GavinWraith (26) 1563 posts |
I think I am right in saying that BBC BASIC, at least back in the days of the BBC B, implemented the transcendental functions that VFP does not have by using rational function approximants, as continued fractions. That is to say, for each function f you store two tables of numbers a[i], b[i] whose size, n, determines the accuracy and range of the approximation and then define
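(The definition itself was lost in conversion; the usual shape of such a continued-fraction approximant built from the two tables is something like:)

   f(x) ~ a[0] + b[0]·x / (a[1] + b[1]·x / (a[2] + … + b[n-1]·x / a[n]))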
No doubt appropriate values of these tables for a given size of fp-number can be found in the holy books. This gives a straightforward and uniform method of implementation, at least for an appropriate range of argument. |
Steve Drain (222) 1620 posts |
I just mean that you substitute Float_COS for Float_MUL in your routine, remembering that COS has only one parameter. You are relieved of all responsibility for the mathematics. ;-) |
Steve Drain (222) 1620 posts |
That could well be the case – I have no knowledge – but continued fractions involve repeated division, and even on a system with hard float, division is more costly than multiplication.

Float uses Chebyshev polynomials, which are derived from Taylor series expansions. The latter are interesting mathematically, but very inefficient computationally. The aim with Chebyshev polynomials is to have as few terms as possible for the required precision. The number of terms is also governed by the range over which the precision is required, so there is an element of range reduction before actual computation starts. The calculation of the polynomial coefficients for the required precision is not simply done. The Holy Book for this is ‘Computer Approximations’ by John Hart et al., first published in 1968, half of which is computer-typeset tables. The other half is ‘Notes’ on how to use them – but what notes! Continued fractions are mentioned, but only to derive polynomial approximations.

Here is an example for cos(x) to 8 decimal places for -1/2 < x < +1/2:

             313x^4 + 6900x^2 + 15120
   cos(x) ~ --------------------------
              13x^4 + 660x^2 + 15120

For the computation we first calculate X = x^2, then do:

   P = 15120 + X(6900 + X(313))
   Q = 15120 + X(660 + X(13))

and lastly cos(x) = P/Q. That is 4 additions, 5 multiplications and 1 division. Here endeth the lesson. ;-)
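(In BASIC the recipe transcribes directly – a sketch for illustration:)

   DEF FNratcos(x)
   LOCAL X, P, Q
   REM valid for -1/2 < x < +1/2, to 8 decimal places
   X = x * x
   P = 15120 + X * (6900 + X * 313)
   Q = 15120 + X * (660 + X * 13)
   = P / Q
|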
GavinWraith (26) 1563 posts |
And only integer coefficients!
You can always get away with only one division, and use Horner’s method to minimize the number of multiplications for the numerator and denominator polynomials. Rational functions may not make sense for approximating asymptotic behaviour. Theoretically you can approximate any continuous function by polynomials over a bounded closed interval, but using rational functions probably has computational advantages. The code to evaluate a rational function can be completely generic, taking as parameters the argument x to the function, the size n of the two arrays and pointers to them. That might explain why VFP does not bother to include individual transcendental functions – but then the same might apply to other fp systems, I suppose.
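Concretely, with Horner’s method the single-division form is:

   f(x) ~ P(x)/Q(x), where P(x) = a[0] + x·(a[1] + x·(a[2] + … + x·a[n]))

and likewise Q(x) from the b[i] – n multiplications and n additions per polynomial, plus the one division.
|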
Steve Drain (222) 1620 posts |
It took some effort to twist my head around what I have done already, but I am a sucker for more punishment, so can you point me to any books/papers on this? And the Holy Book, of course. |
GavinWraith (26) 1563 posts |
Not sure what to recommend. I only mean that the library would contain code compiled from something like this for evaluating rational functions, with the user able to cook up her own. I am no numerical analyst. It is a shame Joe Taylor is not on hand as this kind of thing is up his street, and he would know which Holy books to consult.
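(The listing itself is lost; a generic evaluator of the kind described, sketched in BASIC with illustrative names:)

   DEF FNrational(x, a(), b(), n%)
   LOCAL i%, p, q
   REM Horner's method over both tables; one division at the end
   p = a(n%) : q = b(n%)
   FOR i% = n% - 1 TO 0 STEP -1
     p = p * x + a(i%)
     q = q * x + b(i%)
   NEXT
   = p / q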
|
Steve Drain (222) 1620 posts |
@Gavin On closer inspection, I think we are talking about the same thing, but coming from different directions and with only slightly overlapping terminology. What you have above is equivalent to what I have in Float, except that mine is in assembler. I do not think there is likely to be an advantage to be gained looking further. |
Rick Murray (539) 13840 posts |
Okay then. FPEmulator’s COSD versus Float’s Float_COS (using VFP). This is a rather obvious race, but okay, let’s do it, we might be surprised… Oh, really? Surprising would be America being dumb enough to vote Donald Trump for president¹. Surprising would not be Float kicking FPEm’s ass, that’s just so obvious.
So, Steve, a minute and a half versus seven seconds (worst case with SWI decode). Does that answer your question? ☺

¹ Before anybody says naah, it’ll never happen, remember they re-elected Dubyah, leading to a perfect front page of the Daily Mirror. |
Steve Drain (222) 1620 posts |
Many thanks. It is consistent with what I found, but your tests are more rigorous and do not have BASIC getting in the way. I have asked for the module to be registered as SmartFP. |
jim lesurf (2082) 1438 posts |
Not sure it is relevant to the above, but FWIW whenever I need to compute maths functions I tend to start from either:

1) Numerical Recipes in [whatever language], or
2) Abramowitz and Stegun, Pocketbook of Mathematical Functions.

(1) is, I assume, well known by programmers and has useful code examples. (2) may be less well known unless the programmer is also a mathematician. Dunno, I just latched on to it as a ‘go to’ for solving all the problems I’m not mathematician enough to do for myself. :-)

There is also a Handbook of Mathematical Functions. Bigger, and useful if you want tables of values to check against your own results.

Jim |
Jeffrey Lee (213) 6048 posts |
Steve: If you’re going to release Float/SmartFP ‘officially’ then I should probably point out a few concerns that came to mind when I had a look at the code the other night:
If you can fix those then I think the code will be in a pretty good state to use as the basis for a VFP version of BASIC64. |
Steve Drain (222) 1620 posts |
@Jim Thanks. I have come across the Abramowitz books. They are fascinating, but when it comes to approximations at double precision they come nowhere close to providing a solution. |
Steve Drain (222) 1620 posts |
@ Jeffrey
Something I was aware of, but I did not see anything simple to do with it, and my immediate attention then was on the calculation. I will go back and look with your suggestions in mind.
Without looking, I think I have done this somewhere in the module already.
It so happens that for LOG10 it is much easier to find coefficients that cover the range with the least computation at the necessary precision. I did read quite a lot about LOG2 algorithms, but in practical terms it did not seem the best solution. It may still be worth another look, but I am happy the speed is reasonably good.
The same rule I quoted to Gavin. Your suggestion sounds like an excellent idea.
Oh, yes! I rapped Rick’s knuckles for not reading the header REMs where this is specified. ;-)
So that is how it works! I am at this moment doing the FPSCR page for an update to the StrongHelp VFP manual and I was puzzling over how you use it. The Float SWIs do attempt to forestall exceptions by checking the input. They provide more useful error messages than the FPA ones, I think.
I would prefer to trap NaN and Infinity at the input stage, because I do not foresee a situation where the SWIs should be using them.
Thanks. I have been mentally preparing a hack of the BASIC64 module to do that. ;-) |
David Feugey (2125) 2709 posts |
Very good news for Basic
Very good too :) |