FP support

280 posts, 29 voices

Pages: 1 2 3 4 5 6 7 8 9 10 11 12

Aug 3, 2015 7:18pm David Feugey (2125) 2709 posts	No, I was actually pointing out that there is more than one way of contributing to a project and that if someone’s expertise lies in the wordsmith arena then they can work on documentation updates and still contribute. Really? What a surprise. It’s so new to me. OK, replace, “So users have no word to say until they become GCCSDK developers?” by “So users have no word to say until they become GCCSDK contributors?” if you want. Strange sensation of deja vu in this thread. I would prefer to stick on the main subject, please.

Aug 3, 2015 7:30pm Steve Pampling (1551) 8170 posts	OK, replace, “So users have no word to say until they become GCCSDK developers?” by “So users have no word to say until they become GCCSDK contributors?” if you want. Not what I want to do at all, it doesn’t convey the point. It’s about doing rather than complaining that someone else isn’t doing. With which I shall go and play with the Filer again.

Aug 3, 2015 8:59pm Rick Murray (539) 13840 posts	Alas, as far as I can understand what people are debating, I’ll attempt to summarise it: Whinge whinge it’s complicated. More or less. Maybe. (^.^) having RO machines that actually cause existing programs I’ve compiled from ‘C’ using the ROOL compiler to access and use real floating point hardware. This won’t happen. If you used floats and such, the compiler will have used FPE instructions in dealing with them. At the very least, if a VFP capable compiler arrives, you will need to recompile to make use of it. Well… I suppose somebody clever could rewrite a version of the FPEmulator that picks up on the FPA instructions and calls the corresponding VFP instruction where it exists… Things won’t be as fast as using VFP natively, but ought to be faster than ARM code. that “just does this” without the user or programmer having to do anything special for it to occur when FP hardware is present. Ah, now herein lies a potential gotcha. Did you notice that my VFP code was wrapped in calls to create a VFP context, and then destroy it afterwards, yet the FPE code didn’t do this? If I remove the line that calls VFPSupport_CreateContext and then call the VFP code, I see this: `Internal error: undefined instruction at &0000826C Postmortem requested [...lots of gibberish...]` Add it back in again, the code works. Therefore – either RISC OS is going to have to assign itself a “default” VFP context so that applications can use the VFP hardware like they used the FPA (and creating/destroying subsequent contexts if a specific application and/or state is required) or every application is going to need to be aware of this and manage VFP contexts manually – which can include around Wimp_Poll and the like. It might be a good idea to consider the OS itself maintaining a default VFP context, so that VFP instructions can be used more freely than at the moment. For a while in the far past, this worked. For a while, there existed FPA chips that could be inserted into the early computers, Acorn never seemed to consider FP to be something worth promoting. Not present as standard on most machines, rare, expensive, and forgotten in the RiscPC era. Heck, even CJE don’t have any in stock! :-) The thing that hurts, though, is that we are still emulating an FP unit like we did in 1987 when most modern hardware has one, if not two, entirely functional FP units built in. You could use VFP today if you roll your own code away from the C implementation. But from C? Not yet…

Aug 3, 2015 9:16pm Rick Murray (539) 13840 posts	The Rick Guide to hacking VFP into a C program: Here’s how I made my example. I wrote it in C. Yup. It started as a C program. And I checked it worked. Then I compiled with the “-S” flag set. This tells the compiler to spit out a textual “source” of the assembler instructions that the compiler translated the C source into. It places this in the ‘o’ directory, so don’t panic if the build then fails. Just open the o.whatever file and look at it. So… I looked at the code to see what the C compiler was doing, and I created my own bit of assembler code, and copied the FPA code into a function. In the C code, I call the function. I pass a pointer to the “double” register to the function to make it easier for the assembler code to pick it up – it’s there in R0 (a1). For instance: `// the C code looked like this meh = (123.456 + argc) + 654.321;` and the -S output was like this: `[...preamble and stuff...] FLTD f0,v1 LDFD f1,[pc,#L00010c-.-8] MUFD f2,f0,f1 LDFD f3,[pc,#L000114-.-8] MUFD f0,f2,f3 STFD f0,[sp] [...stuff...] L00010c DCFD 123.456 L000114 DCFD 654.321 [...etc...]` Okay, now the complication with the above is that I multiply with “argc” (from the main() definition). This should usually be ‘1’, so I am telling the code to multiply 123.456 by 1. This is important, because the compiler is smart and if I did not multiply by an unknown value, the compiler would recognise that 123.456 and 654.321 are both constants, and it would simply work out the result and use that, skipping the calculation entirely. So, the FLTD and the first MUFD are unnecessary. The importants part here are the two LDFDs, the MUFD, and the STFD. I’m not entirely certain what that `-.-8` stuff is about, but it is trying to work out some sort of offset using an address and PC. I think this is roughly equivalent to an ADR instruction, or “calculate a pointer to this”. As I am really wanting to do a less complicated calculation, I could load the first FP value into F0, load the second into F1, then calculate the result into F2, saving F2. This means my assembler routine would look like this: `AREA \|C$$code\|, CODE, READONLY, A32bit EXPORT my_fpa_code my_fpa_code ; void my_fpa_code(double ); ; So a pointer to the double given will be in R0. LDFD F0, first ; load first value into F0 LDFD F1, second ; load second value into F1 MUFD F2, F0, F1 ; multiply them, result in F2 STFD F2, [R0] ; store F2 to where R0 points to MOV PC, R14 ; exit function, return to caller first DCFD 123,456 second DCFD 654.321 END` Notice that the LDFDs are less complicated. I tell objasm what I want to have loaded into F0 and F1 and let it worry about how best to do it. The end result is the same… And the C code? At the top I do this: `extern void my_fpa_code(double );` and in place of the calculation, I do this: `my_fpa_code( (double )&fpval );` Or, to put it all together, this: `#include <stdio.h> extern void my_fpa_code(double ); int main(void) { double fpval; my_fpa_code( (double ) &fpval ); printf("The result of the FPA calculation is %f\n", fpval); return 0; }` Note – the AREA specified in the assembler code should be “\|C$$code\|” to place the code alongside the C code. This requires you to also specify the area as being CODE and READONLY. So… I got the compiler to build a proper version this time (no more -S) and before running it I looked it over in Zap. Once the start of the main() code was found, I just stepped through it to make sure it was doing what it used to do, only now with a branch to the FPA code, and back again afterwards. If you build the above, you will see this in action. It was good, it worked as expected. So I copy-pasted the FPA code and translated the second copy to use VFP instructions. As you can see from the original source I posted, it is somewhat more long-winded. The FPA code is a nice tidy four* instructions. Load this, load that, multiply them, save the result. It is exactly the same for the VFP code, except that it needs to be wrapped in calls to create and destroy the context, and the value – when written to memory – needs to be swapped around to match the older FPA. This swapping will probably need to be done forever as not doing it will make things incompatible with the FPE, with CLib, with… It’s a monumental pain in the ass, but there you go… I mention this just in case you are feeling adventurous with trying out some VFP code. Write your function in C as you would normally, then take a look at what the compiler is doing with the FPA parts. Then have a go at moving them to an assembler file and calling those functions instead of performing the calculation in C. Once that is working, then you have the fun part – have a crack at replacing the FPA instructions with VFP ones. To help you along the way – save the C code as “c.fptest” and the assembler code as “s.fpcode”. Here is a MakeFile to build the code: # Project: FPTest # Toolflags CCflags = -c -depend !Depend -I,C -apcs 3/32bit -Wdcp -throwback -fan Linkflags = -aif -o $@ ObjAsmflags = -APCS 3/32bit -desktop -Depend !Depend -ThrowBack -Stamp -IC:, # Source files used (in 'o' directory) c_files = o.fptest s_files = o.fpcode # How to build @.FPTest: $(c_files) $(s_files) C:o.stubs link $(linkflags) $(c_files) $(s_files) C:o.stubs # A macro for building the C code .c.o:; cc $(ccflags) $< -o $@ # And the same for the assembler code .s.o:; objasm $(objasmflags) -from $< -to $@ # Dynamic dependencies: And, remember, we’re here to help. Stuck? Just ask. [disclaimer: I can help with the nuts and bolts but I suck at maths]

Aug 4, 2015 9:44am Steve Drain (222) 1620 posts	every application is going to need to be aware of this and manage VFP contexts manually – which can include around Wimp_Poll and the like. Jeffrey has implemented Wimp context switching on 26 Nov 2010: I just checked in the code to add VFP context switching to the Wimp ;) To keep things simple the context switching isn’t tied down to Wimp_Initialise version numbers or Wimp_Poll flags or anything like that; instead it just performs context switching for all tasks. Each task (created by Wimp_StartTask) starts with the null context active, so it’s down to each program to create/destroy contexts as needed.

Aug 4, 2015 11:02am Dave Higton (1515) 3525 posts	it’s down to each program to create/destroy contexts as needed. It makes me uneasy… it’s not as automatic as I’d like it to be. Is there a better way? Back in the old days, we had FP instructions – one and one only set. Apps were written to use them. If the hardware that the app was running on had hardware FP, the app ran quickly; if not, it ran slower. The key desirable feature was that it was all automatic. There was no “if hardware FP, do this, else do that”. Can we get back to that degree of being automatic?

Aug 4, 2015 12:03pm jim lesurf (2082) 1438 posts	Back in the old days, we had FP instructions – one and one only set. Apps were written to use them. If the hardware that the app was running on had hardware FP, the app ran quickly; if not, it ran slower. The key desirable feature was that it was all automatic. There was no “if hardware FP, do this, else do that”. Persackly. The point here is that from my POV what is needed is that the FPE/Clib/whatever simply trap and deal with this so the persons writing and using the program just get ‘best performance for their box’. No need for hacking about or special cases. If you have accessible FP hardware, it gets used. If not, its emulated. We had that once. We need it again. I’d like this for the ROOL compiler even though people seem to focus on GCC. TBH it would seem odd if it was done for GCC but not ROOL’s ‘own’ compiler. But that’s a different detail to the above more fundamental point. Jim

Aug 4, 2015 12:48pm Rick Murray (539) 13840 posts	Is there a better way? It would be nice if the OS owned a “default context” so VFP instructions could just be used without faulting or needing contexts created and destroyed. I’d like for for the ROOL compiler Hear hear. It makes no sense at all on a Pi (Beagle, iMX6, …) for cc to build an executable using FPA instructions. NO current system supports them natively, and the older ones that did were rare. Back in the old days, we had FP instructions – one and one only set. […] We had that once. We need it again. Please be aware that there are several co-existing issues here. First of all, the FPA is faked. Not only is it faked, it is ancient. I’m surprised ARM have not deprecated the use of its instruction space in order to put something else there… The proper hardware FP is VFP. The XScale (Iyonix) doesn’t have it. Every RISC OS machine afterwards does. We can discount NEON. It is a non-IEEE compliant mini-FP designed to get “good enough” results extremely quickly and sometimes in parallel. This isn’t aimed at precision, it is aimed at media decoding, so it is a somewhat specialist FP unit that can be largely ignored (a person wanting to use it can always drop to assembler…). It is preferable to drop support for FPA and instead consider VFP to be the new floating point system. This cannot be done for the primary reason that the FPA and the VFP store their data back to front in comparison to each other. Therefore modifying CLib to deal with VFP-style data (a simple enough modification) would immediately make it incompatible with all FPA code, including anything built using the current compiler. The only logical approach is to retain the current FPA behaviour in the compiler, and add an option to use VFP instead. This part of the compiler will know about the backwards word ordering and will swap the words after STFD (or whatever that is in UAL). Suboptimal? Yes, a bit. But it is far far better than sticking with FPA and certainly a lot nicer than arbitrarily wiping out support for the FPA (which would affect anything using floating point compiled today and earlier). If you have any question as to why, consider what is supposed to happen if you were to use the `%f` specifier in a printf statement… CLib currently expects FPA ordered data. It may need to remain that way. Is it ‘safe’ to switch FPEmulator and CLib in sync, like in tomorrow’s build, to store and access the data in VFP order? No. Not at all. Because all of the constants embedded in existing software will be in FPA order. How would a modified FPEmulator know if the data to be loaded is FPA format or VFP format? You can’t say “load old save new” as you could never reload anything saved. It is therefore pretty much fixed in stone that FP values wold need to be written to memory in FPA format. We also run into the problem of how to deal with code that is going to run on an earlier machine. For these, I would say the best solution would be to have an FPA build of the software for them. It ought to be viable, instead, to have a VFPEmulator to fake the VFP instructions, but the question there is who is going to write such a thing? Is it even viable to take the time to do it when simply building an ‘old machine’ version is easier all around? So back to this: Back in the old days, we had FP instructions – one and one only set. Today, we have an old set and the new set. For the majority, the older set was never ‘real’, the newer set is. For various reasons, the two – even though they are IEEE compliant and understand the same basic number styles – are not directly compatible due to an annoying quirk. We need to make use of the new FP hardware. We can’t arbitrarily kill off the old FP mechanism. What has been written above are my ideas. Yours are welcome too. But to be honest, to get the best use of new FP hardware while keeping FPA code going is likely to involve compromises.

Aug 4, 2015 1:42pm Jeffrey Lee (213) 6048 posts	it’s down to each program to create/destroy contexts as needed. It makes me uneasy… it’s not as automatic as I’d like it to be. Is there a better way? Back in the old days, we had FP instructions – one and one only set. Apps were written to use them. If the hardware that the app was running on had hardware FP, the app ran quickly; if not, it ran slower. The key desirable feature was that it was all automatic. There was no “if hardware FP, do this, else do that”. Can we get back to that degree of being automatic? There are several things here: There has never been “one and only one” FPA instruction set. It was always a programs responsibility to check that FPA was available (otherwise if you tried running your code on a version of the OS which predated FPEmulator being in ROM it would crash). Plus Acorn’s handling of the global FPA context hasn’t exactly been great – witness the fact that you actually have to tell the Wimp that you want it to save your FP registers – FPA handling may look automatic but it’s far from it. FPA only had a couple of versions, VFP/NEON has many. It’s important to have at least some kind of feature check there to make sure the program is compatible with the hardware it finds itself running on (and that it isn’t trying to do something silly like use software NEON implementation for a high-performance media codec). Otherwise you’ll just be giving people a bad user experience (e.g. crash due to undefined instruction). GCC will handle VFP contexts and feature checks for you. You don’t need to make any modifications to your code, all you need to do is tell the compiler to target a VFP or NEON FPU and the runtime library/stubs will do the rest. When/if Norcroft gains VFP/NEON support for C code I’d expect it to be the same. There’s no default global/per-app context because (a) I want to get people out of the mindset of “it’ll just work” (see point 2), (b) VFP contexts can be quite large (32 doublewords = 256 bytes), so for performance it would be nice to avoid pointless context switches, and © if people assume that there’s always a context available they may get into situations where they fail to correctly set up the state for their code – e.g. a SWI which assumes that VFP is available will probably fail if it gets called from code which uses short vector mode (i.e. actually uses VFP for vectors). Also note that although it would be possible to emulate VFP on all hardware, NEON can only be emulated as far back as ARMv5, since the instruction encodings make heavy use of the ‘NV’ condition code (which will be ignored on ARMv4 and below).

Aug 4, 2015 2:21pm Steve Drain (222) 1620 posts	Given a fair wind, I will have a new version [of the VFP StongHelp manual] up shortly. Version 021 is now available. ;-) Edit: updated version There are bound to be errors, especially among the NEON instructions, so please let me know.

Aug 4, 2015 3:27pm jim lesurf (2082) 1438 posts	Today, we have an old set and the new set. For the majority, the older set was never ‘real’, the newer set is. Given that the older set wasn’t “real”, but worked, that seems a strange basis for some of what you say! :-) The whole point of the FPE was to trap and deal with situations where the hardware didn’t match the compiled instructions. If there things an endian-change in the data values than I guess it would make life easier for all if that was also trapped. It seems a mess to have to recompile seperate versions. The compiled code defines what the program is meant to do. Provided there is one set of rules for the meaning, the method of how to get that enacted is something I’d assumed an FPE/CLib could handle if those were suitable machine specific. Byte-shuffles might be a slow-down, but probably not as bad as having to do everything via bucketloads of int operations. And if most platforms are now VFP that could be the default. That might mean the compilers have to flag what they now do so the FPE/Clib can tell them from old code, but again I can’t see why every programmer should have to generate multiple compiled version or think that makes more sense than having an FPE/Clib deal with it. However what do I know? Just that this did work fine for a while, so it seems weird that things have been ‘improved’ so much that what was done then is now impossible. Not what I’d have envisaged as ‘progress’ I guess… 8-] Jim

Aug 4, 2015 4:41pm GavinWraith (26) 1563 posts	Interpreted languages are often slow for number crunching because of interpretive overhead. Interpreters can only give you off-the-peg operations: standard arithmetic operations, square root, exponential, logarithm, trigonometric functions and so on. What they really need are more fundamental operations, such as inner-product of vectors, determinants, Horner’s method for evaluating polynomials, continued fraction evaluation, etc. Joe Taylor and I produced the MATROM, a sideways ROM for the BBC B, to extend BBC BASIC with matrix arithmetic, some time before Acorn did. Ours did matrix inversion, but Acorn’s did not. So what mathematical operations would you like to see in an interpreted language? Or does one throw up one’s hands and say use assembly language instead ? To avoid interpretive overhead one wants to get as much as possible, particularly looping, done with low level code. I always felt that BBC BASIC was never developed as far as it could have been in this direction.

Aug 4, 2015 5:19pm Rick Murray (539) 13840 posts	Jim – there is not a big problem in dealing with VFP now/soon (depends how much the hardware can do and how much needs to be picked up in software – typically the more esoteric functions are handled in software). The problem, as you might have guessed from my lengthy message(s) is being able to do this in such a way as to not affect code compiled with the existing FPA instructions. Unfortunately the word endianness (words, not bytes) is different, which means that one cannot easily make assumptions as to what an FP value is when it is seen in memory. If you refer to the original program I posted when doing the comparison between VFP and FPE, I got a really weird value for the FPA code because objasm spotted the VFP instructions and stored the FP data in the appropriate format. The FPA code loaded them and saw something completely different because the two words of data that comprise the FP value were back to front. Not invalid data, just something else. The compiled code defines what the program is meant to do. Indeed, and it will typically aim for the lowest common denominator in order to have the widest compatibility. It wasn’t until the 32 bit change that the compiler started to use the MRS instructions. I’ve just looked through a program of mine and I see MUL a number of times, but no UMULL or any of those. Why? Well, I have a vague recollection that a 20 year old processor might be upset by it, and since RISC OS 5 could potentially run on a 23 year old processor that doesn’t support it at all, the base default option is not use it. It’s the same with FP. The base default option is that FPEmulator is supplied built into every computer. The FPA chip hasn’t existed in any Acorn/RISC OS hardware since the A5000, but the emulated version is omnipresent, so that is the default regardless of whether or not it makes sense given the hardware capabilities. The sticking point again – RiscPC and Iyonix machines, which are capable of running RISC OS 5, do not have VFP. Unless somebody is willing to step up and provide some sort of VFP emulation, these machines likely won’t have VFP in any form and thus would require an FPEmulator-friendly version. but again I can’t see why every programmer should have to generate multiple compiled version It isn’t ideal, but to my knowledge there is no VFP emulator for older machines nor one planned… However what do I know? Just that this did work fine for a while, so it seems weird that things have been ‘improved’ so much that what was done then is now impossible. Not what I’d have envisaged as ‘progress’ I guess… 8-] Sarcasm aside, the problem (as I see it) is not in supporting VFP or even whether or not RISC OS should mandate basic VFP (v2?) going forward. The problem is how to do this and co-exist with earlier code that uses a different FP system.

Aug 4, 2015 6:03pm Steve Drain (222) 1620 posts	The problem is how to do this and co-exist with earlier code that uses a different FP system. Not a problem without any solution – I carefully described one a few days ago. You might not want to do it that way, but you should not make it seem more difficult than necessary. However, a slot-in replacement for the FPE is going to be difficult and is way beyond my pay grade. ;-)

Aug 4, 2015 6:15pm Steve Drain (222) 1620 posts	The FPA chip hasn’t existed in any Acorn/RISC OS hardware since the A5000 A7000+ with ARM7500FE. ;-)

Aug 4, 2015 6:49pm Steve Drain (222) 1620 posts	@Gavin I find myself in total agreement, but I think this should be in a new topic

Aug 4, 2015 7:24pm Rick Murray (539) 13840 posts	I carefully described one a few days ago. Yup – you did. I was going to plug it into my code to test it, but Float crashes on my Pi. The second instruction of the module init code is SWI &54C81 which is not recognised. Oh, I see. It is Reporter. Okay, I’ve rebuilt it without that stuff (yay for including the source). Hmmm… `Performing FPE MUL 4,096,000 times: 260[long wrong number ;-)] in 290cs Performing VFP MUL & SWI 4,096,000 times: 80779.853376 in 52cs Performing VFP MUL 4,096,000 times: 80779.853376 in 7cs Performing Kappa Multiply 4,096,000 times: 260[long wrong number ;-)] in 242cs` The SWI with the VFP is OS_IntOn, which seemed to be the simplest SWI I could think of. I had used XOS_GenerateError but that took ages to do nothing. It’s to see what the impact of an overlying SWI call would be. The answer? “It varies”. I’m not worried about your module returning the wrong data. You say you use FPA format data, so you’re reading it the same as FPEmulator, and indeed get the same result. Calling your module via SWIs takes only marginally less time than the emulated FPA functions. I know you’re calling VFP because if I edit the program to force FPA mode (pass a null context to Float_MUL), it takes 552cs; about twice as long as calling the FPEmulator. Is your module safe to call in USR mode? I wonder if I could look up the address of your module, work out where the SWI handler is, then set up the registers to permit a direct jump into there rather than the SWI mechanism. Perhaps this might permit some speed to be gained? You might not want to do it that way, but you should not make it seem more difficult than necessary. It would be nice if the compiler could emit code that would use VFP if available, or FPE otherwise. I think your idea of hanging onto the context is the way to go – a context means VFP, no context means fall back to FPE. I think in the compiler this could be checked with a load and a compare, with a branch to the FPA code (let VFP be the fall through case, as FPE is slow so an extra branch won’t change much). Sort of like this, to have a (slightly larger) executable that will work best on anything: `LDR v1,\|fpecontext\| CMP v1,#0 BEQ C000df0 ; what follows here is VFP code ADR r0,addr1 VLDR s0,[r0,#4] VLDR s1,[r0] ADR r0,addr2 VLDR s2,[r0,#4] VLDR s3,[r0] VMUL.F64 d2,d0,d1 VSTR s4,[sp,#4] VSTR s5,[sp] B C00dfc C00df0 ; what follows here is FPA code LDFD f0,[addr1] LDFD f1,[addr2] MUFD f2,f0,f1 STFD f2,[sp] C00dfc ; code continues here LDMIA sp,{a2,a3} BL printf [...]` I really like your idea of storing backwards using the single versions of the FP registers. I wish I’d thought of that! ;-)

Aug 4, 2015 7:47pm Steve Drain (222) 1620 posts	Oh, I see. It is Reporter. Okay, I’ve rebuilt it without that stuff Sorry about that. The easier thing is to set the debug flag to 0 at the start of the program. (yay for including the source). My pleasure. ;-) I’m not worried about your module returning the wrong data. You did assemble in BASIC VI, as noted in the REMs, didn’t you? Calling your module via SWIs takes only marginally less time than the emulated FPA functions. That is much as I would expect. Is your module safe to call in USR mode? I wonder if I could look up the address of your module, work out where the SWI handler is, then set up the registers to permit a direct jump into there rather than the SWI mechanism. Perhaps this might permit some speed to be gained? No need for that. SWI “Float_Start” returns the start of the SWI table – it is all documented. And I CALL the code from BASIC, so it is USR safe. There is even a library to put the routine addresses into variables. ;-)

Aug 4, 2015 8:15pm Rick Murray (539) 13840 posts	You did assemble in BASIC VI, as noted in the REMs, didn’t you? Err… No. I didn’t read them. ;-) However the incorrectness is the same incorrectness as the FPA code, and due to the fact that objasm stored the floats in VFP word order. So while the answer is technically not what was expected, it is exactly correct given the input. No need for that. SWI “Float_Start” returns the start of the SWI table – it is all documented. Where? I’m looking at “Float_Start (&0C0040)” in StrongHelp. It returns R0 = context, R1 = prev context. There’s a nit about how it uses VFPSupport to set itself up, plus some extra notes below. Nothing about the SWI table. Every single one of the examples uses the line `SYS"Float_Start" TO c%,p%` (yes, I checked them all, I’m that sad!). Furthermore, there is nothing obvious in the module init code that would suggest a pointer like that being passed. Ummm… I took it from the first page of this thread, “swFloat060” – that is the latest isn’t it? Maybe you added this and I’ve got an early one?

Aug 4, 2015 8:26pm Steve Drain (222) 1620 posts	Oh b****r. Mea culpa. Its 18 months since I uploaded that and I have been using 0.65 here. Look tomorrow and that version will be up. It will be worth it. ;-) Edit: www.kappasite.pwp.blueyonder.co.uk/Modules/swFloat065.zip Note that it is unregistered and was always intended as something for discussion rather than use.

Aug 5, 2015 11:06am Theo Markettos (89) 919 posts	I hate to quell the storm in this teacup but… Currently the only way to get a VFP/NEON capable version of GCC is to build it yourself. It’s also worth pointing out that any programs compiled to use VFP/NEON (using GCC) will need SharedUnixLibrary 1.13 – which hasn’t seen a public release yet. If you build GCC yourself you’ll get a copy of it, but to avoid potential differences once the official 1.13 comes along… I can’t speak for Lee or John (there isn’t really a ‘GCC team’ except a bunch of people on a mailing list, which anyone is free to join) but if there is call for an ‘official’ release of SUL 1.13 then we can do something about that. Things only happen when people have time, so if nobody gets round to it it doesn’t happen – but asking for it is a good way to make it happen. Additionally we don’t have any automated testing at present, so any help with manual testing is very much appreciated. Test is the biggest blocker to releases we have currently: due to the huge amount of infrastructure that has been done over the years, building is now easy but ensuring the quality of the vast amount of stuff we build is hard. I don’t think the GCC team will be happy with you distributing your own version. Regarding GCC, if you want to make a test build, fine. In fact, if you make a test build and give feedback on what does/doesn’t work, that’s very useful. We can very easily turn the handle and make an ‘official’ release so there’s not much to be gained by having confusing third party forks, though if we are tardy in doing that then I don’t think anyone will be mortally offended. SUL is a slightly more complex story because it’s a system-wide thing – you can only run one version, like SCL. That means forking is more tricky to handle. We don’t quite have nightly builds (of everything), but it’s very close – just requires time (you can see the build server here – there are still some blockers that need sorting) . Again, asking for things is a good form of encouragement. If there’s anyone interested, we can also give you access to infrastructure to do the handle-turning yourself. It remembers me a sad story about Win32 version of GCCSDK. They were pressure around this, with some good arguments around licenses. But also bad arguments. Some people suggest that there’ll be some official Win32 release, and so it was not fair to make a fork/port. We just stopped it. 15 years (yes 15!) latter, where is GCCSDK for Win32? We tried, it didn’t work out. The problem is that GCCSDK is more than just the compiler – you need make, bash, fileutils, autoconf, etc etc to build any pre-existing programs of size. This is possible with cygwin, but there are all kinds of niggly differences between cygwin and Linux. The larger target programs become the more sensitive they become to little differences. Worse, cygwin, the upstream compiler and the packages are moving targets, so your fix today might not work tomorrow. This makes maintaining a Windows branch an equivalent amount of work to the Linux branch, which we already have enough trouble finding manpower for. In fact, given that cygwin is a hack on top of Windows and doesn’t always work, probably more so. The simple answer today is run a VM. An Ubuntu VM is easy to set up on Windows (VirtualBox is free), and GCCSDK will just work in there. It’s not worth the hassle of trying to deal with Windows+cygwin at present, given the small number of prospective users. Again, if someone is interested in changing this then do go ahead – I don’t know what happened with politics 15 years ago but any input is encouraged. TL;DR: If you want something, please at least ask. If you can help, so much the better – we’re happy to show you how. If you can’t, we don’t always (often?) have developer time to do something about it (as with ROOL bounties), but simple things are quick, and if you pester enough someone might find some time.

Aug 5, 2015 12:45pm jim lesurf (2082) 1438 posts	Sarcasm aside, the problem (as I see it) is not in supporting VFP or even whether or not RISC OS should mandate basic VFP (v2?) going forward. It wasn’t really sarcasm TBH. It was a mix of accepting I don’t grasp all the complications and regret that what in many ways is “improvements” in the hardware we can use has also come with a lot of complications. But from my POV the ideal aim would be to have a complier that generates a ‘standard’ set of instructions for the floating point ops it wants. Then have the FPE/Clib/whatever on given machine deal with that by making interpretations that make sense on that machine. I would guess converting FP code to run on a different FP hardware would be faster than converting it to run on int hardware. It makes sense that – if possible – this approach should be arranged so the newer faster machines would need the least ‘interpretation’. So code should work on an Iyonix or other ‘old’ system, but the user couldn’t expect that to run as fast anyway. For word order, presumably the compiler can put a new flag into the executable. Then the FPE/Clib/whatever can spot when this is absent and go “Aha, old word order mode needed”. So again, the result might be slower, but would work. And given the source code, could be recompiled to run faster because the word order was native for the new hardware. Indeed, the ‘new’ instructions from a new compiler that can flag its new could match the main new hardware, I’d guess. I can’t help feeling this would turn out easier for program creators and users because the system deals with the hardware differences. Presumably at the expense of more effort having to go into the compilers and the new model FP adopted. However I admit I have no idea how feasible all this would be, or if it simply requires so much more work on FPE/Clib/whatever that it simply doesn’t make sense because there are other things which need more urgent attention. Hence my “what do I know?” after thinking this way seems preferable to me. I realise I’m in no position to judge how hard the above would be. Just that it would seem desirable for users who are compiling, distributing, and using many small programs. Jim

Aug 5, 2015 2:41pm David Feugey (2125) 2709 posts	but asking for it is a good way to make it happen. Thanks. So I ask :) For the tests, there are a lot of Colin’s tools that are waiting. And to be honest, I have no problems with them. I agree anyway that a specific disclaimer could be provided with SUL 1.13. We tried, it didn’t work out. Our worked. But I have not access any more the the developer team responsible of this project. and if you pester enough someone might find some time As usual, you can count on me here :) Just one stupid and completely off topic question (I’m sure someone already tries to explain me…) At was possible before to switch from x86 gcc to arm gcc (with a very simple bash command – like a link or an environment variable: I don’t remember). It was very useful: launch configure with x86 options, adapt a bit the makefile, switch to arm gcc and run make to get the tool. It was a trick, but very useful, even for complex projects (eQ did compile all SDL and X stuff this way). Autoconf rarely support RISCOS+ARM, but the makefile generated for Linux+x86 did work most of the time for a RISC OS build. Is it still possible to make this kind of switch, and how? I have no Linux VM here, but it could convince me to make some small ports of old SDL things. I just don’t have enough brain power left (and time too) to make autobuilder scripts.

Aug 5, 2015 3:15pm Steffen Huber (91) 1953 posts	David, with GCCSDK, this “trick” is no longer necessary, because basically the ARM GCC outputting the RISC OS binaries is already compatible with all this bash autoconfig configure make stuff. Or did I misunderstand you?

Aug 5, 2015 4:38pm Rick Murray (539) 13840 posts	and regret that what in many ways is “improvements” in the hardware we can use has also come with a lot of complications. If we could take the Linux approach and just mandate that “this has changed” then things would be a lot simpler. The hardware has improved. RISC OS itself is improving. But we’re stuck with having to keep a link to the past. Try `RMKill FPEmulator` and see how many things suddenly don’t work. It’s all of that stopping us easily switching to the VFP. the ideal aim would be to have a complier that generates a ‘standard’ set of instructions for the floating point ops it wants. Then have the FPE/Clib/whatever on given machine deal with that by making interpretations that make sense on that machine. You do realise – the compiler inserts FP instructions directly into the code when required? These are not library calls, they are instructions for the co-processor to execute. Furthermore, as existing code is liberally scattered with FP instructions, there is little we can do to retrospectively change that. FPEmulator takes care of them, they’re nothing to do with CLib. I suppose the compiler could* change FP instructions to call an FP library that will make a decision based upon the system in use, I’m just wondering if it is necessarily viable. The two machines that do not support the VFP are RiscPC era (RiscPC, A7000, etc) and the Iyonix. Both are old, one is very old. Frankly, I still regard the ability to select FPA only or VFP only as the sensible way forward (though for now the VFP will need to load the data backwards – thanks to Steve I see it can be done fairly easily in only two instructions). Does this mean that developers will need to offer two different versions? Yes and no. A developer with the time and need can build an old version (FPA) and a new version (VFP). A developer that thinks that is a load of bother, or who only uses FP for doing stuff like showing a percentage on the screen, can just compile with FPA instructions and it will work on all machines exactly as it does at the moment. If we mandate that VFP use be selected by the `-cpu` command line option (an ARM6 won’t, a Cortex-A8 will…) then the default for a generic MakeFile (which probably does NOT specify -cpu) will be… that nothing whatsoever has changed. In essence, the support for VFP will be there for those of us who want to make use if it. The default state is to not. I would guess converting FP code to run on a different FP hardware …is only marginally different from converting 6502 code to run on the ARM. There are similarities, but there are also differences. For instance, it looks like the FPA is able to place a value into an FP register from an ARM register (FLT). It looks like the VFP needs to transfer the data (FMSR) and then “convert it” as a separate step (FSITOS etc). Those are pre-UAL names, I’m reading from my old ARM ARM. ;-) It is a small thing, certainly, but that’s the run of porting stuff. Consider the two FP systems (three if you want to count NEON) to be like two(/three) different processors. Similar in operation but different in implementation and behaviour. Oh, and for fun, we are thus far dicussing using VFP as a replacement for FPA, we haven’t even touched upon VFP specific functionality – the “vector” part of the name, SIMD, strides, and the like. “Aha, old word order mode needed”. Or just assume the data is in the old order and load/save it in two instructions instead of one. That way, CLib (etc) doesn’t even need to be touched. VFP-enabled code would need to call two new functions (to enable/disable the VFP context) but this can be placed into “Stubs”. It is probably better there, where it can fail kindly if VFP does not exist, instead of bombing out with an invalid instruction message. CLib will need to be updated (it uses FPA instructions), but it isn’t necessarily required for VFP support by the compiler. Additionally, given storing data in FPA format, one could mix FPA and VFP code… Just that it would seem desirable for users who are compiling, distributing, and using many small programs. If the compiler remains able to output FPA instructions, and this is the ‘default setting’, then at the very least everything will appear to work “as before”. This might be the most sensible option. CC emits VFP code when the -cpu option is suitable. In doing so, it will call two functions in stubs – one to create a VFP context, one to destroy it. There will be a third function, to allow the client to read the current/previous context. Doubles are loaded as two single reads, and written likewise (for FPA word order). Uh, I think that’s probably about it for being able to use the hardware parts of the VFP. Me? If FP was a core part of my software, I’d make a VFP version available for those able to run it. You just can’t ignore FPE running 40x slower than the native hardware………

Pages: 1 2 3 4 5 6 7 8 9 10 11 12

Reply

To post replies, please first log in.

Forums → Wish lists →

FP support

Reply

Search forums

Social

ROOL Store

Donate! Why?

RISC OS IPR

Description

Voices

Options