BASICVFP

47 posts, 12 voices


Feb 18, 2017 3:23pm Jeffrey Lee (213) 6048 posts

Back by popular demand, it’s another discussion about making BASIC64 use the VFP instruction set. Today I spent a while looking through the BASIC reference manual and have put together the doc below, which tries to cover all the relevant areas. The TL;DR version is that it’s probably going to have to be a new module (“BASICVFP”), otherwise it’s going to end up with compatibility or performance issues.

BASICVFP considerations

“BASICVFP” aims to produce a release of BASIC which uses the VFP instruction set for floating point math, in place of the obsolete FPA instruction set used by BASIC64.

FPA supports single, double, and extended double precision IEEE 754 floats. VFP only supports single and double precision. Luckily BASIC64 only uses double precision floats, so on the surface it would appear that a VFP version of BASIC64 would be trivial to produce, without requiring any loss of precision. However there are some other differences between FPA and VFP which make things a bit more complicated.

FPA vs. VFP issues

Word order

FPA stored double-precision floats using a big-endian word order, i.e. when using STFD to store a double-precision float in memory, the most significant 32 bits of the 64-bit value were stored in the lowest word of the pair. VFP, on the other hand, uses a little-endian word order (or, more correctly, the endianness of the entire 64-bit value matches the configured data endian mode of the CPU). This has the potential to cause compatibility issues with any interface where the memory representation of BASIC floats is exposed to the wider world.

No support for trig/pow instructions

The FPA instruction set has a wide range of trigonometric and power functions available, and BASIC64 made use of them to implement its trig and power operations. The only power function VFP implements is square root.
Therefore additional effort will be required to implement the full range of trig/power operations in BASICVFP. Since double-precision floats offer higher precision than BASIC’s 5-byte floats, we cannot simply reuse the 5-byte float code. However, as a stopgap solution we could conceivably perform the operations using the FPA instruction set (which would in turn rely on FPEmulator to implement the operation in software, at 80-bit precision).

Exception handling

FPA had full support for hardware trapping of floating point exceptions (division by zero, NaN, etc.). BASIC64 relied on this to generate most of its floating point errors. In VFP, on the other hand, hardware trapping of exceptions is optional. The hardware still detects when an error has occurred and sets the relevant bit(s) in the FPSCR, but in most VFP implementations software must manually poll the register to detect any errors. This will likely make the VFP code sequences longer than the FPA sequences, and some care may be needed to avoid pipeline stalls caused by reading the FPSCR.

VFP context creation

FPEmulator provides the system with a default FPA context which programs can make use of. VFPSupport, on the other hand, requires each program to create its own VFP context. Therefore some extra logic will be needed on startup/shutdown to create/destroy a VFP context for BASICVFP. Care will also be needed on any entry points (e.g. the error environment handler) to make sure that BASIC restores its VFP context before calling any FP code.

VFP advantages

Some VFP implementations can perform operations on vectors of numbers, which could be used to speed up array/matrix operations.
There’s also the potential to use the NEON instruction set for integer operations (again, most likely for array/matrix operations).

BASIC64 specification review

The following are key points from the BBC BASIC reference manual (ISBN 1 85250 103 0) with regard to BASIC64.

"Exchanging data between BBC BASIC and other languages, like C, is now easier" (p6)
Although the Norcroft C compiler is currently restricted to FPA, GCC has supported VFP under RISC OS for some time now. So it’s worth considering which word order (VFP or FPA) will make data exchange easier in the future.

“The ‘|’ indirection operator” (p161)
Depending on whether BASIC V or BASIC VI is in use, this will access either a 5-byte float or an (FPA) double-precision float. We need to decide which word order to use with VFP.

“When an error occurs … the values of all the variables and so on will still be intact” (p164)
I.e. when handling VFP errors we need to make sure we check for any error before we write back potentially erroneous values to program variables.

"Format of the CALL parameter block" (p224)
CALL and USR allow 8-byte floats to be passed to assembler routines. This type will only ever be used in BASIC64, but some consideration is necessary as to whether FPA or VFP word order should be used in BASICVFP. If we are to aim for BASICVFP to be a drop-in replacement for BASIC64, then it is only natural to use FPA word order. However, since FPA is obsolete, it would be more convenient if there was a version of BASIC available which used VFP word ordering and allowed assembler code to make use of the VFP context which BASIC had created. This would then allow BASICVFP programs to easily be augmented with VFP or NEON assembler routines.

"VARIND" (p229), "STOREA" (p229), "EXPR" (p231)
The BASIC64 versions of these routines use the F0 register to pass and return floating-point values. For BASICVFP this raises two possibilities:

1. Maintain use of the F0 register.
   For optimal performance BASIC could have two versions of the routines: an internal routine which uses the VFP D0 register, and the external form (as exposed to assembler routines) which uses the FPA F0 register. However this may be difficult to achieve, as some routines are specified as not using the stack.
2. Only use VFP D0. This will result in optimal performance but will break compatibility with BASIC64 assembler.

"INPUT#" (p294), "PRINT#" (p347)
An exact specification of the BASIC64 float format is given here, describing the fact that the words are stored in big-endian order. Additionally, both versions of BASIC are capable of reading both 5-byte and 8-byte floats. Therefore, for continued interoperability of data files, the only sensible choice is to have BASICVFP use FPA word ordering for floats when reading and writing files.

"OSCLI" (p334)
This makes mention of how some of the interpreter state (e.g. the CALL environment information pointer) is exposed to the executed command. Notably, there does not appear to be any way for the executed command to determine whether BASIC or BASIC64 is in use. Potentially a program could probe the end of the CALL environment block to look for the 5-byte float routines. However this does not help us if we require executed commands to be able to differentiate calls made from BASIC64 from calls made from BASICVFP.

“Numeric types” (p411)
The diagram on p411 represents the storage format of 8-byte floats in BASIC64 (i.e. FPA double precision floats).

BASIC64 code review

FPSR

On entry to the interpreter, BASIC64 initialises the FPSR to &70000, i.e. the Invalid Operation, Division by Zero and Overflow exception traps are enabled. At no other point during execution is the FPSR read or written, creating an implicit contract between BASIC64 and the user program: correct operation is only guaranteed if the program does not program the FPSR with conflicting settings.
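For illustration only, a minimal Python sketch (not BASIC64 source) of what the &70000 value means, assuming the usual FPA FPSR layout in which the trap-enable bits occupy bits 16-20 in the same order as the cumulative exception flags in bits 0-4. The constant and function names are made up for this example.

```python
from typing import List

# Exception names in FPA bit order (assumed layout): cumulative flags in
# FPSR bits 0-4, matching trap-enable bits in bits 16-20.
FPA_EXCEPTIONS = ("Invalid Operation", "Division by Zero", "Overflow",
                  "Underflow", "Inexact")
TRAP_ENABLE_SHIFT = 16

# The value BASIC64 writes to the FPSR on entry to the interpreter.
BASIC64_FPSR = 0x70000


def enabled_traps(fpsr: int) -> List[str]:
    """Names of the exceptions whose hardware traps are enabled."""
    return [name for bit, name in enumerate(FPA_EXCEPTIONS)
            if fpsr & (1 << (TRAP_ENABLE_SHIFT + bit))]


def disables_basic64_traps(fpsr: int) -> bool:
    """True if a program has turned off any trap BASIC64 relies on to
    raise its floating point errors (one way to break the contract)."""
    required = BASIC64_FPSR & (0x1F << TRAP_ENABLE_SHIFT)
    return (fpsr & required) != required


print(enabled_traps(BASIC64_FPSR))
# ['Invalid Operation', 'Division by Zero', 'Overflow']
```

Only disabled traps are checked here; enabling extra traps (e.g. Inexact) could also upset BASIC64, but that is left out to keep the sketch short.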
Also of note is that the FPSR is not reset when an error occurs: any code which manipulates the FPSR (e.g. a SWI) and then generates an error without restoring the correct value may break BASIC.

FPOINT

The FPOINT assembly constant is used to select between BASIC (FPOINT=0) and BASIC64 (FPOINT=1), allowing all relevant code to easily be located. An obvious extension of this would be to use FPOINT=2 for BASICVFP.

Workspace layout

Although the program and its variables can be relocated (by manipulating the PAGE, TOP, LOMEM and HIMEM variables), BASIC also relies on a block of non-relocatable workspace which is anchored to the start of application space (&8000). This static workspace would make a suitable place for storing the VFP context.

BASICVFP proposal

Two of the primary advantages of BASIC64 over standard BASIC are the increased accuracy of floating point calculations, and speed (when FPA hardware is available). BASICVFP can’t improve on precision (FPA supported IEEE extended double precision floats but VFP only supports single and double precision), but due to the prevalence of VFP hardware it can provide significant speed improvements over BASIC64. Therefore the main focus of this proposal is on gaining as much speed as possible, without being too concerned about compatibility with programs that made advanced use of BASIC64 features. I.e. “generic” BASIC programs which don’t make any assumptions about the floating point format (and thus run fine under BASIC or BASIC64) will continue to run under BASICVFP, but programs which assume FPA word ordering or FPA register usage will fail. In detail:

- INPUT# and PRINT# will use FPA word ordering for double-precision floats, to maintain data file compatibility between different BASIC versions.
- All other interfaces will use VFP word ordering and VFP registers (e.g. the “|” operator will use VFP word ordering, and VARIND will return any floating-point value in D0).
- On startup BASIC will create a “full” VFP context, i.e.
  supporting both VFP and NEON, with all data registers available. This will allow assembler code to easily reuse the context for its own VFP/NEON calculations (although it is still the responsibility of the program to ensure the appropriate instructions/registers are available before it attempts to use them).
- The VFP context will be stored in the non-relocatable workspace located at &8000. However, since the context size is variable, the amount of memory to reserve for the context will have to be determined at runtime.
- The FPSCR will be initialised to zero. It will be reset to zero by the error handler (ensuring correct recovery from code which uses VFP short vectors), but other than that it’s expected that code will not return to BASIC with it set to a value that conflicts with BASIC’s VFP usage.
- FP exceptions will be detected by polling the cumulative exception bits in the FPSCR. For consistency with BASIC64, only the following exceptions will generate errors: Invalid Operation, Division by Zero, Overflow.
- The initial version of BASICVFP is expected to rely on FPA/FPEmulator for implementing the trig/power operations that are not supported by VFP. Future versions may provide more optimal routines, e.g. based on the routines used by Steve Drain’s “Float” module.
- The initial version of BASICVFP is not expected to use VFP short vectors or NEON vectors to accelerate array/matrix operations. However these are viable future improvements.
- The initial version of BASICVFP is not expected to tackle the issue of allowing commands executed via OSCLI to detect that BASICVFP is in use.
- A BASICVFP build of the module will be selected by setting FPOINT to 2.
- Since BASICVFP will be incompatible with some BASIC/BASIC64 programs, it will have to respond to a separate command (i.e. ‘*BASICVFP’), and will most likely be a separate module.
- A future goal might be to produce a version of BASICVFP which provides “full” BASIC64 compatibility, at the expense of some speed.
- DCFD behaviour will be left alone, for now at least (see discussion below).

Other considerations

Trig/power operations

Since VFP doesn’t provide trig/power operations, it may be desirable to expose BASICVFP’s trig/power functions to assembler code in a similar manner to how the 5-byte float operations are exposed to BASIC V assembler routines. The addition of these extra routines could also serve as a way for commands executed via OSCLI to detect that BASICVFP is in use. However, there is no standard defined for how this list of routines should be extended, so some care may be needed to avoid compatibility issues with code which detects BASIC/BASIC64 by checking for the presence of the 5-byte float ops.

ROM space

If BASICVFP is to be included in ROM, it is worth revisiting the idea of splitting the assembler out into a separate module, to allow it to be shared between all three implementations: https://www.riscosopen.org/forum/forums/3/topics/903
Alternatively, some ROM space could be saved by dropping BASIC64 from ROM for machines where VFP is available.

DCFD

The DCFD assembler directive has three forms:
- DCFD <number> – produces an FPA-format double precision float
- DCFD.fpa <number> – produces an FPA-format double precision float
- DCFD.vfp <number> – produces a VFP-format double precision float

Arguments for whether the plain “DCFD” form should be changed to produce VFP-format double precision floats go both ways.

Arguments for leaving DCFD the same:
- It would result in consistent behaviour across all BASIC versions.
- Since any float-using assembler code is likely to require a rewrite to work with BASICVFP (e.g. to switch from using FPA instructions to VFP), it’s not a big deal to require the author to also update any DCFD directives.

Arguments for changing:
- Apart from INPUT# and PRINT#, all other code which interacts with double-precision floats expects them to be in VFP format. Therefore it makes sense for DCFD to match the native float type.
- Making features related to obsolete instruction sets easier to use than features relating to current instruction sets is counter-productive. Therefore “DCFD” (which is shorter and therefore easier to type than “DCFD.vfp”) should default to VFP format in BASICVFP.
- Programmers might forget to add the suffix (DCFS, DCFE and DCFH require no suffixes).

A potential solution might be to introduce a new OPT bit or some other configuration variable which controls the default behaviour. However this may run into issues if the program is run on versions of BASIC which do not recognise the option.

Anyone have any feedback on the above?
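To make the word-order difference concrete, this Python sketch shows the eight bytes that the FPA and VFP forms of DCFD described above would be expected to produce for the same value, assuming a little-endian CPU configuration. The function names are hypothetical; converting between the two layouts is just a swap of the two 32-bit words.

```python
import struct


def dcfd_vfp(value: float) -> bytes:
    """VFP-format double on a little-endian CPU: the whole 64-bit value
    is little-endian, so the least significant word comes first."""
    return struct.pack("<d", value)


def dcfd_fpa(value: float) -> bytes:
    """FPA-format double: most significant word first, but each 32-bit
    word is itself still stored little-endian."""
    low, high = struct.unpack("<II", struct.pack("<d", value))
    return struct.pack("<II", high, low)


def swap_words(data: bytes) -> bytes:
    """Convert an 8-byte double between FPA and VFP layouts (self-inverse)."""
    if len(data) != 8:
        raise ValueError("expected exactly 8 bytes")
    return data[4:] + data[:4]


print(dcfd_fpa(1.0).hex(), dcfd_vfp(1.0).hex())
# 0000f03f00000000 000000000000f03f
```

The same swap is what INPUT#/PRINT# would need if BASICVFP keeps FPA ordering for files while using VFP ordering in memory.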

Feb 18, 2017 5:10pm David Feugey (2125) 2709 posts

> Therefore the main focus on this proposal is on gaining as much speed as possible, without being too concerned about compatibility with programs that made advanced use of BASIC64 features.

Time for Basic VII. Yeah!

> The initial version of BASICVFP is expected to rely on FPA/FPEmulator for implementing the trig/power operations that are not supported by VFP. Future versions may provide more optimal routines, e.g. based on the routines used by Steve Drain’s “Float” module.

Or based on the FPEmulator module, with ‘narrowed’ precision.

> Alternatively some ROM space could be saved by dropping BASIC64 from ROM for machines where VFP is available.

IMHO, it’s not the best idea.

> Anyone have any feedback on the above?

I hope that someone at ROOL will replicate these changes in ABC. ABCVFP mode?

Feb 18, 2017 5:27pm Clive Semmens (2335) 3282 posts

Surely ROM space really isn’t an issue, is it? It’s getting hard to get an SD card less than 8GB.

Feb 18, 2017 6:51pm Jon Abbott (1421) 2661 posts

Can you make use of the ARM VFP Support Code? Doesn’t that add IEEE 754 compatibility such as handling big-endian, missing functions, missing IEEE formats etc?

Feb 18, 2017 7:25pm Chris Mahoney (1684) 2177 posts

> Surely ROM space really isn’t an issue, is it? It’s getting hard to get an SD card less than 8GB.

I believe that the Titanium machines have an 8 MB flash chip, so there’s a limit there. There are probably other limits that I’m not aware of too :)

Feb 18, 2017 8:17pm Rick Murray (539) 14047 posts

My two centimes:

Yes, I agree that internal representation should be VFP except for PRINT# style data (for compatibility). But I would go one further and say DCFx should be VFP by default. Remember, you aren’t replacing a BASIC, you’re creating a new additional one, so the primary objective should be VFP all the way (except for the bits where it cheats, but shhhhh!). After all, most stuff expecting BASIC64 behaviour probably calls BASIC64 explicitly, so it isn’t as if loads of stuff will suddenly crash: BASIC64 will still be there.

Don’t drop BASIC64, there will be stuff still using it. And is it really so painful to have three versions of BASIC available? How big is the BASIC module? Now remind me, how big are the BootFX startup images? 😜

Yes, this may start off as a softload during development, but it will totally be a candidate for adding to the ROM. Even with the TI’s smallish Flash ROM size, the current OS ROM is around 5MB uncompressed, half that with compression. So…. Yeah… There’ll be space. 😀

Feb 18, 2017 9:57pm Chris Evans (457) 1614 posts

> I believe that the Titanium machines have an 8 MB flash chip, so there’s a limit there

I think the current Pi ROM (and all the others?) is 5MB (though IIRC only about 4.3MB is actually used), but the situation is significantly better as the ROM can be compressed: IIRC the Pi ROM compresses to a self-extracting 2.7MB. So the ROM could triple in size before being a problem. In fact I wish more were included in the ROM, e.g. most of what is in $.!Boot.Library (Ping etc.), a password-protected !HForm, !Reporter…

Feb 19, 2017 1:31pm Steve Drain (222) 1620 posts

> Back by popular demand, it’s another discussion about making BASIC64 use the VFP instruction set.
> Anyone have any feedback on the above?

Impressive. ;-)

> it’s probably going to have to be a new module (“BASICVFP”), otherwise it’s going to end up with compatibility or performance issues.

My own starting point is the opposite. A new module will cause compatibility problems with existing programs that use double-precision floats, i.e. BASIC64. They will either not benefit from a new BASICVFP or will require modification. I see that might be done in a !Run file, but nevertheless it will be necessary. As far as performance goes, a runtime choice to use FPA or VFP instructions for a keyword can be a single load and test; see below. Compared to the overhead involved in using floats via interpreted keywords from BASIC, that is insignificant. All this does ignore the devil in the detail for the moment.

> FPA supports single, double, and extended double precision IEEE 754 floats. VFP only supports single and double precision. Luckily BASIC64 only uses double precision floats, so on the surface it would appear that a VFP version of BASIC64 would be trivial to produce, without requiring any loss of precision.

That is unfortunately a beguiling but false assumption. See below.

> [Word order] has the potential to cause compatibility issues with any interfaces where the memory representation of BASIC floats are exposed to the wider world.

I would retain all the current methods for representing and storing double-precision floats. This should mean that the wider world sees no change.

> The FPA instruction set has a wide range of trigonometric and power functions available, and BASIC64 made use of them for implementing its trig and power operations. […] Therefore additional effort will be required to implement the full range of trig/power operations in BASICVFP.

I have the full set coded using VFP instructions.
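As a rough illustration of what coding trig functions without FPA instructions involves, here is a minimal Python sketch of SIN built only from operations VFP does provide (add, subtract, multiply, divide and conversion to integer), with Python floats standing in for VFP double registers. It is not Steve’s implementation, FPEmulator’s algorithm or the Float module’s code: it uses plain Taylor polynomials and a naive range reduction, which loses accuracy as the argument grows.

```python
import math

HALF_PI = math.pi / 2

# Taylor coefficients, precomputed once: sin uses odd powers up to r**17,
# cos uses even powers up to r**18 (enough for |r| <= pi/4).
SIN_COEFFS = [(-1) ** n / math.factorial(2 * n + 1) for n in range(9)]
COS_COEFFS = [(-1) ** n / math.factorial(2 * n) for n in range(10)]


def _poly(coeffs, r2):
    """Evaluate sum(c[n] * r2**n) with Horner's rule (multiply-adds only)."""
    acc = 0.0
    for c in reversed(coeffs):
        acc = acc * r2 + c
    return acc


def sketch_sin(x: float) -> float:
    # Range reduction: x = k*(pi/2) + r with |r| <= pi/4. This naive form
    # has no extra-precision pi/2, so error grows with |x|.
    k = round(x / HALF_PI)
    r = x - k * HALF_PI
    r2 = r * r
    quadrant = k % 4  # sin(r + k*pi/2) cycles through sin, cos, -sin, -cos
    if quadrant == 0:
        return r * _poly(SIN_COEFFS, r2)
    if quadrant == 1:
        return _poly(COS_COEFFS, r2)
    if quadrant == 2:
        return -r * _poly(SIN_COEFFS, r2)
    return -_poly(COS_COEFFS, r2)
```

For small arguments this agrees with the library sine to within a few units in the last place, but each rounding step is done at double precision only, which is exactly the gap compared with FPEmulator’s 80-bit internal working.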
However, this is where precision cannot be at the same level as the FPE, which calculates at 80-bit extended precision internally. In addition, the calculations in the FPE with an actual FPA could use extended-precision arithmetic instructions to achieve double-precision output. There is no way to maintain true IEEE double precision using the number of VFP calculations required for those trig and power functions. Does this matter for BASIC? We are still talking about the difference between 16, 17 or 18 significant digits. Anyone who uses BASIC64 for an extended sequence of calculations has already given up on such precision, because intermediate values are only double precision.

> [Exception handling] The hardware can still detect when an error has occurred and set the relevant bit(s) in the FPSCR, but in most VFP implementations software must manually poll the register to detect any errors. This will likely result in the VFP code sequences being longer than the FPA sequences, and some care may be needed to try and avoid pipeline stalls caused by reading the FPSCR.

My own approach to this is to validate inputs to instructions and avoid generating exceptions. I cannot be sure that this can be done for all possible exceptions.

> [VFP context creation] VFPSupport, on the other hand, requires that each program makes its own VFP context. Therefore some extra logic will be needed on startup/shutdown to create/destroy a VFP context for BASICVFP. Care will also be needed on any entry points (e.g. error environment handler) to make sure that BASIC restores its VFP context before calling any FP code.

I would assume that is intended to mean on a per-program basis. If I understand correctly, the Wimp takes care of context changes for applications, and I presume a single-tasking program will also be OK.

> BASIC is also reliant on a block of non-relocatable workspace which is anchored to the start of application space (&8000). This static workspace would make a suitable place for storing the VFP context.
Strictly speaking it is anchored to ARGP, which could be assembled with a value other than &8700, the value it has always had. However, you are talking of storing the whole context, not just the context pointers. There remains room for a couple of words in the argument space for the latter, but you would have to think harder to find a substantial chunk of memory. The 4k immediate constant range is pertinent here.

> CALL and USR allow 8-byte floats to be passed to assembler routines

Only CALL can pass values. USR just has access to the resident integers. Variable values are passed as a pointer to the two words of the float. If no change is made to the word order then a programmer would need to take account of this. This is simply done using two single-word loads in reverse order rather than a double-word load, and the same for stores. I suggest that is not a significant penalty.

> “VARIND” (p229), “STOREA” (p229), “EXPR” (p231)

These are internal routines that would have to take account of VFP anyway, so once changed they would be available just as now. The first two would check the VFP status, then take account of the word order and load/store into D0 with S1 and S0, as above. This is very much simpler than the 5-byte float routines. I have not thought about EXPR.

> “OSCLI” (p334) This makes mention of how some of the interpreter state (e.g. CALL environment information pointer) is exposed to the executed command.

Interestingly, the manual is mistaken: this information is overwritten and not simply available when the command code is run. Martin Avison came up with a method to make it available, which I have used for a very long time, but it is not documented as part of BASIC. When you do have this information, you can first determine that the command is from BASIC by checking for &BA51C005 in the word before the environment pointer (EP). Then you can determine whether it is V or VI by looking at EP+&54. This will be 9, the number of extra float routines, for V, but 0 for VI.
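A minimal Python sketch of the check Steve describes, run against a fake memory image rather than real BASIC workspace. It assumes the environment pointer has already been obtained (which, as noted above, needs Martin Avison’s method) and that words are little-endian; the names are hypothetical. On its own it cannot tell BASIC64 from a BASICVFP that keeps the same block layout.

```python
import struct

BASIC_MAGIC = 0xBA51C005   # expected in the word just before the environment pointer
FLOAT_COUNT_OFFSET = 0x54  # EP+&54: count of extra (5-byte) float routines


def identify_basic(memory, ep: int) -> str:
    """Classify the interpreter from a little-endian memory image, given
    the environment pointer `ep` as an offset into that image."""
    (magic,) = struct.unpack_from("<I", memory, ep - 4)
    if magic != BASIC_MAGIC:
        return "not BASIC"
    (count,) = struct.unpack_from("<I", memory, ep + FLOAT_COUNT_OFFSET)
    if count == 9:
        return "BASIC V"
    if count == 0:
        return "BASIC VI"
    return "unknown"


# Usage with a fake 256-byte image in which EP sits at offset 4.
image = bytearray(0x100)
struct.pack_into("<I", image, 0, BASIC_MAGIC)
struct.pack_into("<I", image, 4 + FLOAT_COUNT_OFFSET, 9)
print(identify_basic(image, 4))  # BASIC V
```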
If VFP instructions are substituted for FPA at runtime, by checking whether there is a non-zero VFP context pointer, for instance, then the available routines will work. If programmers wish to determine this themselves, they can check for the context pointer, which will be at an offset from ARGP passed to the command.

> The FPOINT assembly constant is used to select between BASIC (FPOINT=0) and BASIC64 (FPOINT=1), allowing all relevant code to easily be located. An obvious extension of this would be to use FPOINT=2 for BASICVFP.

From my own point of view this would not be necessary.

> Therefore the main focus on this proposal is on gaining as much speed as possible, without being too concerned about compatibility with programs that made advanced use of BASIC64 features

I disagree. Any speed gained by compromising compatibility would be insignificant against the overhead of an interpreted language.

> A future goal might be to produce a version of BASICVFP which provides “full” BASIC64 compatibility, at the expense of some speed

That should be possible as the first option.

> Since VFP doesn’t provide trig/power operations, it may be desirable to expose BASICVFP’s trig/power functions to assembler code in a similar manner to how the 5-byte float operations are exposed to BASIC V assembler routines.

They are not so exposed, only the arithmetic and square root routines. It would be good if they were, although unofficially they can be found and used. ;-) There is good reason to do so for VFP, because otherwise only FPA instructions are available. A slight alternative is to expose the routines at a level that determines FPA/VFP at runtime.

> Arguments for leaving DCFD the same: It would result in consistent behaviour across all BASIC versions

My preferred option. ;-)

Feb 19, 2017 1:48pm Jeffrey Lee (213) 6048 posts

Here’s a test build for people to try and break. It should implement everything as described in the original post (VFP word order for memory, software VFP error handling, use of FPA for trig/pow operations, FPA ordering for DCFD, etc.)

I’ve only given it some light testing myself, so there might be a couple of major bugs in there. I’ll be putting together some kind of test suite which can be used to verify the (floating point) arithmetic functions of the various BASICs, but if other people can start throwing non-trivial programs at it and note down any issues (large or small), that would be a big help.

I ran some simple performance tests on ARM11, Cortex-A8 and Cortex-A9. Apart from the trig/power functions (which will be a bit slower than BASIC64), performance on ARM11 and Cortex-A9 was significantly higher than 5-byte floats in standard BASIC. Cortex-A8 was a bit of a mixed bag; some ops were faster while others were slower. Possibly this could be improved further, but since it’s many times faster than BASIC64 I’m not going to worry about it too much.
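The software error handling described in the original proposal (poll the FPSCR’s cumulative exception bits, raise errors only for Invalid Operation, Division by Zero and Overflow, and check before writing results back to variables) can be modelled roughly as below. The bit positions are the architectural FPSCR ones; the function and class names are hypothetical, and this is a sketch of the idea rather than the test build’s code. In a real interpreter the sticky flags would also need clearing, which the proposal does by resetting the FPSCR to zero in the error handler.

```python
# Cumulative exception flags in the VFP FPSCR (ARM architecture bit positions).
FPSCR_IOC = 1 << 0  # Invalid Operation
FPSCR_DZC = 1 << 1  # Division by Zero
FPSCR_OFC = 1 << 2  # Overflow
FPSCR_UFC = 1 << 3  # Underflow
FPSCR_IXC = 1 << 4  # Inexact
FPSCR_IDC = 1 << 7  # Input Denormal

# Only these generate errors, matching the traps BASIC64 enables (&70000).
ERROR_FLAGS = ((FPSCR_IOC, "Invalid Operation"),
               (FPSCR_DZC, "Division by Zero"),
               (FPSCR_OFC, "Overflow"))


class BasicFPError(Exception):
    """Stand-in for the BASIC error that would be raised."""


def poll_fpscr(fpscr: int) -> None:
    """Raise for the first error-generating flag that is set; underflow,
    inexact and input-denormal flags are ignored."""
    for mask, name in ERROR_FLAGS:
        if fpscr & mask:
            raise BasicFPError(name)


def assign(variables: dict, name: str, value: float, fpscr: int) -> None:
    """Check the FPSCR before writing back, so that after an error the
    program's variables are still intact (manual p164)."""
    poll_fpscr(fpscr)
    variables[name] = value


variables = {"x": 1.0}
try:
    assign(variables, "x", float("inf"), FPSCR_DZC)
except BasicFPError as err:
    print("error:", err, "x is still", variables["x"])
# error: Division by Zero x is still 1.0
```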

Feb 19, 2017 4:08pm Jeffrey Lee (213) 6048 posts

> > it’s probably going to have to be a new module (“BASICVFP”), otherwise it’s going to end up with compatibility or performance issues.
> My own starting point is the opposite. A new module will cause problems of compatibility with existing programs that use double-precision floats, i.e. BASIC64. They will either not benefit from a new BASICVFP or will require modification.

How do you feel about BASIC64, !ABC, or BASIC crunchers? They are all products that were created after the initial ARM BASIC, and programs had to opt in to using them in order to take advantage of any improvements that were on offer (even if the improvement was just that the program ran faster, as is the case with BASICVFP vs. BASIC64). They are also known to have compatibility issues with some programs.

> As far as performance goes, a runtime choice to use FPA or VFP instructions for a keyword can be a single load and test

Except for softload builds, I can’t see there being any need for BASIC to do runtime tests for whether FPA or VFP should be used. And even for softloads it isn’t strictly necessary (it could just target the lowest common denominator, i.e. FPA).

> > “OSCLI” (p334) This makes mention of how some of the interpreter state (e.g. CALL environment information pointer) is exposed to the executed command.
> Interestingly, the manual is mistaken and this information is overwritten and not simply available when the command code is run.

Yes, I did find that bit of the documentation a bit suspicious.

> If a programmer wishes to determine themselves, they can check for the context pointer, which will be at an offset from ARGP passed to the command.

Too reliant on the internal structure of the BASIC workspace.

> > Therefore the main focus on this proposal is on gaining as much speed as possible, without being too concerned about compatibility with programs that made advanced use of BASIC64 features
> I disagree.
> Any speed gained by compromising compatibility would be insignificant against the overhead of an interpreted language.

What constitutes a compromise in compatibility? The most obvious incompatibilities are the word order and the use of VFP registers by VARIND/STOREA/etc. But there are also lots of other differences between a VFP and an FPA version of BASIC (precision, internal workspace, etc.).

If we tried to make BASIC64 use VFP instructions while retaining full compatibility with FPA BASIC64, then BASIC wouldn’t be able to make any assumptions about its VFP context being maintained by any SWI or assembler calls. Really it would have to juggle two contexts: the context used internally by the interpreter and whatever context the program may have created for itself. At minimum this would affect SYS, CALL, USR, and calls from assembler back into the interpreter.

Feb 19, 2017 4:41pm Sprow (202) 1168 posts

> “INPUT#” & “PRINT#”

I’d assumed we’d make a new variable type, but that probably has a bigger compatibility headache than just converting to FPA and using &88.

> ROM space: If BASICVFP is to be included in ROM, it is worth revisiting the idea to split the assembler out into a separate module, to allow it to be shared between all three implementations.

My contribution to the linked thread was more about allowing 3rd party extensions to use the assembler machinery to extend the instruction set (for example with 6502 and 68000 opcodes); saving ROM space was a nice bonus, though.

However, since 2012 I’ve added to my wish list the idea of abstracting all the FP operations in BASIC, which is much more relevant here and to clawing back some ROM space. This would end up with a single module whose mode was determined at run time by the star command you issue (or perhaps an alias in the case of BASIC64, to choose whether you prefer FPA or VFP mode). There are 105 switches on ‘FPOINT’ in the module at the moment, and in many cases both variants could be enabled and selected at run time, because they don’t involve FPA instructions, i.e. both code paths are integer. Then pull the actual floating point functions out into a new source file (add, subtract, truncate to int, promote from int, tan, sin, cos etc.), and on starting the module build a function table pointing to the appropriate copy for that instance.

It looks like there’s about an 8k difference between BASIC105 and BASIC64, so we might expect a combined INT+FPA+VFP version to be 69+8+8=85k, which is already smaller than BASIC105+BASIC64=131k, even allowing for some overhead.

Even putting ROM space to one side, I think this would lower the skill threshold required to work on BASIC, because at the moment there’s no update on ROOL’s project to build the Clone-Jeffrey-o-Matic-2000. Assembler can be maintainable, if it’s partitioned sensibly and internally uses ATPCS.
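A rough Python model of the function-table idea, assuming one table is built per BASIC instance at start-up from the star command used. The command-to-backend mapping and the table keys are hypothetical, the 5-byte float backend is left out, and Python floats stand in for both FPA and VFP arithmetic, so only the store format differs between backends here.

```python
import operator
import struct
from typing import Callable, Dict


def _store_fpa(x: float) -> bytes:
    # FPA layout: most significant word first.
    low, high = struct.unpack("<II", struct.pack("<d", x))
    return struct.pack("<II", high, low)


def _store_vfp(x: float) -> bytes:
    # VFP layout on a little-endian CPU: a plain little-endian double.
    return struct.pack("<d", x)


# Hypothetical mapping from the star command that started BASIC to the
# float backend used by that instance.
_BACKENDS = {"BASIC64": _store_fpa, "BASICVFP": _store_vfp}


def build_fp_table(command: str) -> Dict[str, Callable]:
    """Build the per-instance table of FP entry points at start-up."""
    try:
        store = _BACKENDS[command]
    except KeyError:
        raise ValueError(f"no double-precision backend for *{command}") from None
    # In a real module each backend would have its own add/sub/etc. written
    # with FPA or VFP instructions; here they share Python's float operators.
    return {
        "add": operator.add,
        "sub": operator.sub,
        "mul": operator.mul,
        "div": operator.truediv,
        "store": store,
    }


table = build_fp_table("BASICVFP")
print(table["add"](1.5, 2.0), table["store"](1.0).hex())
# 3.5 000000000000f03f
```

Callers only ever go through the table, so the choice between FPA and VFP is made once at start-up rather than tested on every keyword.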
> DCFD

I think it’s worth splashing an OPT bit on this, and it’s a more valid use than when bit 4 was frittered away on the “new” instructions opt-in.

Feb 19, 2017 6:04pm Steve Drain (222) 1620 posts

> Except for softload builds, I can’t see there being any need for BASIC to do runtime tests for whether FPA or VFP should be used. And even for softloads it isn’t strictly necessary (it could just target the lowest common denominator, i.e. FPA)

It is softload builds that most concern me. There are very many machines out there with VFP that will not be updated to the latest OS, either through choice or inertia. They would benefit from a softload version of BASIC with VFP support. Going for the lowest common denominator with them seems a shame.

You might also consider the case of a programmer who encourages a user to install a recent softload version of BASIC in order to take advantage of other improvements. It would be a pity to restrict the user to FPA when VFP is available. There is also some value in maintaining compatibility with VRPC when there are improvements.

I can see a time when BASIC VI (or VII, David) with VFP floats will be the main version and BASIC V will be the special one. In a dynamic development situation softloads may be the norm, as in the early days. ;-)

> Too reliant on the internal structure of the BASIC workspace.

That is surprising. When writing assembler with BASIC, relying on the internal structure is often necessary. A lot is documented and legal, so all that needs to happen is to document the VFP context pointer locations as well.

> What constitutes a compromise in compatibility?

I see user/programming compatibility as paramount. Both BASIC64 and BASICVFP will be double-precision versions that will be used and programmed identically, except that programs already written and used with BASIC64 will not automatically use the faster version.

> If we tried to make BASIC64 use VFP instructions while retaining full compatibility with FPA BASIC64, then BASIC wouldn’t be able to make any assumptions about its VFP context being maintained by any SWI or assembler calls.
Really it would have to juggle two contexts – the context used internally by the interpreter and whatever context the program may have created for itself. At minimum this would affect SYS, CALL, USR, and calls from assembler back into the interpreter. There is the devil in the detail that I am not sufficiently familiar with. What in SWIs or assembler calls would change the context, except VFPControl SWIs? Why would a programmer create a context if it is known that a context is created by BASIC whenever VFP is available? How does C handle this if a VFPSupport SWI is called in the source? As an aside, I have been doing a little experimenting, but using the DDE with VFP instructions throws up the warnings “Instruction not supported on target CPU” and “UAL syntax in pre-UAL ARM code”, but assembly goes ahead. It is an ARMX6 machine. Should I be concerned? You can tell I am a beginner. ;-)

Feb 19, 2017 6:20pm Steve Drain (222) 1620 posts	However, since 2012 I’ve added to my wish list the idea to abstract all the FP operations in BASIC … I like what you say. I have taken to heart the suggestion “to go for it” and I have been happily rearranging the BASIC source, for myself at the moment. One thing that I see even more clearly than before is that there is very little such ‘modularisation’. Basalt code is highly modularised by comparison, but does some quite similar things to BASIC at times. It also makes some use of function tables for entry into the BASIC module. My Float module is also best used through a function table rather than SWIs.

Feb 20, 2017 12:53am Jeffrey Lee (213) 6048 posts	Some restructuring of the code would certainly be nice. Although it was pretty straightforward to add VFP support, I did find that I was repeating the same code sequences over and over again, which does suggest that a lot of it could be replaced with a handful of functions or macros. Assembler can be maintainable, if it’s partitioned sensibly and internally uses ATPCS. Indeed – although I doubt converting BASIC to ATPCS would be worth the effort. Just some clearer separation of functions and some comments describing register usage and it would be much better. Too reliant on the internal structure of the BASIC workspace. That is surprising. When writing assembler with BASIC, relying on the internal structure is often necessary. A lot is documented and legal, so all that needs to happen is to document the VFP context pointer locations as well. Stuff that’s been officially documented is fine (e.g. all the routines exposed by the CALL parameter block). Generally whenever I talk about something being “internal” I’m talking about stuff which hasn’t been officially documented/cleared for use by programs. Documenting the location of an internal variable would prevent us from changing the location of that variable in the future. Or it could result in programs getting false-positives where they find something that looks like the variable they’re after but is actually something completely different. A better way of doing things would be to come up with a way of extending the CALL parameter block – e.g. after the block (after the 5-byte float routines, if present) you’d have a word with a special value (-1?) followed by a series of (length, type, data) tuples, terminated by a zero. Acorn almost got it right first time (“let’s put a nine here so that they know there’s nine extra routines!”) but completely failed to (a) define a way of identifying additional blocks or (b) identify what the blocks contained.
Which has made me realise… When you do have this information you can first determine that the command is from BASIC by checking the word before the environment pointer (EP), &BA51C005. Then you can determine whether it is V or VI by looking at EP+&54. This will be 9, the number of extra float routines for V, but 0 for VI. …the above statement is wrong. https://www.riscosopen.org/viewer/view/castle/RiscOS/Sources/Programmer/BASIC/s/Stmt2?rev=1.15#l1937 There is the devil in the detail that I am not sufficiently familiar with. What in SWIs or assembler calls would change the context, except VFPControl SWIs? Why would a programmer create a context if it is known that a context is created by BASIC whenever VFP is available? How does C handle this if a VFPSupport SWI is called in the source? It’s quite simple. A programmer writes a program which runs under BASIC64 and uses VFP (maybe they want the accuracy of 64bit floats, but also have some VFP assembler to get around the fact that BASIC64 is so slow?). We upgrade BASIC64 to make it use VFP instead of FPA. Someone tries running the VFP-using program on the new version of BASIC64, and hilarity ensues as the two sides fight over use of the VFP registers. C handles this by making VFP an opt-in thing. When a programmer enables the VFP compiler switch, he knows his code has to obey certain rules (e.g. don’t switch away from the VFP context the runtime gives you). If the programmer doesn’t like those rules (e.g. he has some legacy code which manages VFP manually), he can turn off the VFP option. But with interpreted code the programmer might not get to make this decision – he’s entirely at the mercy of whatever interpreter version the user has installed. So we have to be careful about what features we introduce and whether any unacceptable breakage could happen as a result of it. “Instruction not supported on target CPU”: you need to use the -cpu and/or -fpu options to specify the target machine.
Apart from affecting the warnings that objasm generates, these settings can also affect some of the pseudo-instructions (e.g. whether LDR R0,=&FFFF generates a reference to a literal pool or uses the MOVW instruction). See the ToolOptions file for the settings that the RISC OS build system uses. “UAL syntax in pre-UAL ARM code”: you either need to use the pre-UAL VFP syntax (FADD, FMUL, etc.) or add an “ARM” directive to tell objasm you’re using UAL syntax. You can also use “CODE32” to switch back to pre-UAL syntax (otherwise objasm will start complaining that you’re using “SWI” instead of “SVC”). https://www.riscosopen.org/viewer/view/castle/RiscOS/Sources/HWSupport/Sound/Sound0HAL/s/Sound0NEON?rev=1.4#l21
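The proposed extension to the CALL parameter block (a marker word, then (length, type, data) tuples ending in a zero) could be walked roughly as in the C sketch below. Everything concrete in it is an assumption for illustration only: the marker value (-1, as floated above), the choice that the length word gives the total tuple size in bytes including the two header words, and the function name `find_ext_tuple` are hypothetical, not anything BASIC currently provides.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical marker word placed after the existing CALL block
   (after the 5-byte float routines, if present). */
#define CALL_EXT_MARKER 0xFFFFFFFFu

/* Walks a hypothetical extension area laid out as:
     marker, { length, type, data... }*, 0
   where length is the total tuple size in bytes, header words included.
   Returns a pointer to the data words of the first tuple whose type
   matches, storing the number of data words in *data_words, or NULL. */
const uint32_t *find_ext_tuple(const uint32_t *ext, uint32_t type,
                               size_t *data_words)
{
    assert(ext != NULL && data_words != NULL);
    if (ext[0] != CALL_EXT_MARKER)
        return NULL;                 /* no extension area present */

    const uint32_t *p = ext + 1;
    while (p[0] != 0) {              /* a zero length ends the list */
        uint32_t len = p[0];
        if (len < 8 || (len & 3) != 0)
            return NULL;             /* malformed: stop rather than guess */
        if (p[1] == type) {
            *data_words = (len - 8) / 4;
            return p + 2;
        }
        p += len / 4;                /* skip tuples we don't recognise */
    }
    return NULL;
}
```

Having the length cover the whole tuple lets a reader skip types it doesn’t understand, which is what the bare “nine” count never allowed. It doesn’t remove the false-positive worry by itself: an older BASIC has no marker, so a program should only look for one once it knows which BASIC it is talking to.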

Feb 20, 2017 10:59am Steve Drain (222) 1620 posts	it was pretty straightforward to add VFP support For you, perhaps. Lesser mortals struggle. ;-) I did find that I was repeating the same code sequences over and over again, which does suggest that a lot of it could be replaced with a handful of functions or macros. One thing I find is a branch from one routine to the tail of another which implements the code fragment required. This is efficient, no doubt, but not a lot of fun to follow. Documenting the location of an internal variable would prevent us from changing the location of that variable in the future. So much of the argument space is already officially fixed that I would consider the benefit of fixing and officially documenting something new to be worth it if it helped. It even crossed my mind that there could be a new keyword to return the value, which would remove the need for a fixed address. Acorn almost got it right first time (“let’s put a nine here so that they know there’s nine extra routines!”) … I agree. Something along the lines you suggest is a reasonable extension. [number of extra routines] …the above statement is wrong. From the code you linked to that is certainly so. However, I checked back to BASIC VI v1.20 (15 Sep 1999) and the value is indeed 0. Someone has changed that since then. I am sure it should be 0 if there are no routines. A programmer writes a program which runs under BASIC64 and uses VFP … Yes, I see that now, and I have done it to write an unreleased application with my Float module. The reasonable way to interact with VFP is from BASIC64 and its double-precision floats. I accept defeat. Let there be BASICVFP. add an “ARM” directive to tell objasm you’re using UAL syntax. Thank you for your advice. It is knowing where to look.

Feb 20, 2017 11:26am Jeffrey Lee (213) 6048 posts	So much of the argument space is already officially fixed I would consider the benefit of fixing and officially documenting something new to be worth it if it helped. If you have any pointers to where these documents can be found then that would be a big help. There’s an ‘insert new bits here’ comment in the workspace definition, but no real indication as to why. From the code you linked to that is certainly so. However, I checked back to BASIC VI v1.20 (15 Sep 1999) and the value is indeed 0. Someone has changed that since then. I am sure it should be 0 if there are no routines. A ROL version of BASIC, I guess? ROOL BASIC 1.20 is from Nov 2000 (during the middle of 32bit OS development), and I can’t immediately see any version of BASIC in CVS which will have a zero in place of the nine (even back to the RISC OS 3.6-era BASIC)

Feb 20, 2017 1:16pm Steve Drain (222) 1620 posts	If you have any pointers to where these documents can be found then that would be a big help Nothing more than what is documented under CALL: `STRACC PAGE TOP HIMEM LOMEM MEMLIMIT FSA [TALLY] TRACEF ESCWORD WIDTHLOC LOCALARRLIST INSTALLLIST LIBRARYLIST OVERPTR` Those tie down some of the space available for the other arguments, especially the tables. My BASIC StrongHelp manual reports only what those are, and I think there is a warning in there that they are not official, even if they have remained the same for 30 years. ;-) There’s an ‘insert new bits here’ comment in the workspace definition, but no real indication as to why. That may be a more recent addition. The arguments close to ARGP, above `MEMLIMIT`, have changed several times and cannot be considered ‘fixed’. I expect the comment reflects some need to keep track of where new arguments can be included. I can’t immediately see any version of BASIC in CVS which will have a zero in place of the nine (even back to the RISC OS 3.6-era BASIC) I have gone all the way back to BASIC VI v1.05 (12 Mar 1992), which I think is close to the first, and it has a 0.

Feb 20, 2017 2:23pm Steve Drain (222) 1620 posts	I have gone all the way back to BASIC VI v1.05 (12 Mar 1992), which I think is close to the first, and it has a 0. I have blundered: I have been reading the 0 at the end of the branch list. Please accept my sincerest apologies. I must have interpreted that the wrong way for a very long time indeed. Mea culpa.

Feb 21, 2017 9:18pm Jeffrey Lee (213) 6048 posts	If you have any pointers to where these documents can be found then that would be a big help Nothing more than what is documented under CALL: STRACC PAGE TOP HIMEM LOMEM MEMLIMIT FSA [TALLY] TRACEF ESCWORD WIDTHLOC LOCALARRLIST INSTALLLIST LIBRARYLIST OVERPTR Oh, that’s not too bad. By “fixed” I thought you meant that the actual addresses had been published and subsequently hard-coded into apps (e.g. the start of application space is fixed at &8000). But for all of those BASIC tells you the address of the variable, and so we should be able to move them around freely within the workspace without anything breaking (apart from TRACEF LOCALARRLIST INSTALLLIST LIBRARYLIST OVERPTR, which must all be adjacent). Also another objasm tip for you: to get DCFD to use VFP word order you need to use a /vfp APCS variant, e.g. “objasm -apcs /vfp”. Annoyingly objasm will complain if an FPE variant has already been specified (!Builder will set up an alias which does this), and the VFP area attribute doesn’t seem to affect DCFD so you can’t use that either.
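The word-order difference that DCFD is sensitive to can be shown with a short C sketch. It assumes IEEE 754 doubles, 32-bit words stored in the CPU’s own byte order, and a host whose doubles share the byte layout of `uint64_t` (true for VFP/EABI and x86, but not for the old FPA ABI itself); the function names are made up for illustration. FPA order puts the most significant word at the lower address, while VFP on a little-endian CPU puts the least significant word there.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

static_assert(sizeof(double) == sizeof(uint64_t), "expects 64-bit doubles");

/* Raw IEEE 754 bit pattern of a double (assumes the host layout above). */
static uint64_t double_bits(double d)
{
    uint64_t bits;
    memcpy(&bits, &d, sizeof bits);
    return bits;
}

/* FPA order (STFD, FPE-variant DCFD): most significant word first. */
void double_to_fpa_words(double d, uint32_t out[2])
{
    uint64_t bits = double_bits(d);
    out[0] = (uint32_t)(bits >> 32);
    out[1] = (uint32_t)bits;
}

/* VFP order on a little-endian CPU: least significant word first. */
void double_to_vfp_words(double d, uint32_t out[2])
{
    uint64_t bits = double_bits(d);
    out[0] = (uint32_t)bits;
    out[1] = (uint32_t)(bits >> 32);
}

/* Reads a double back from a pair of words in FPA order. */
double fpa_words_to_double(const uint32_t in[2])
{
    uint64_t bits = ((uint64_t)in[0] << 32) | in[1];
    double d;
    memcpy(&d, &bits, sizeof d);
    return d;
}
```

For 1.0 (bit pattern &3FF0000000000000) the FPA pair is {&3FF00000, 0} and the VFP pair is {0, &3FF00000}, which is exactly the swap any interface exposing raw BASIC floats would see.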

Feb 22, 2017 10:07am Steve Drain (222) 1620 posts	Well, Basalt assumes that they are all at the same offsets, as they have ever been. ;-) However, as I said earlier, they are actually tied to `ARGP`, so I am pretty sure you can move them en bloc by changing the value of `VARS` at the start of assembly. If you are looking for space for the context, maybe a 256-byte page could be freed at &8000 by making `VARS` &8800. I feel sure there have been other programs that have also made some assumptions, so it would be a pity to change offsets without a very good reason. One objective might be to make `ARGP` more flexible. It is preserved and passed in `R8` to the great majority of routines, so it should be possible to make that be true for all. Then `ARGP` might be passed as a parameter to *Basic so that a BASIC program could be run in any part of memory.

Feb 24, 2017 10:28pm Jeffrey Lee (213) 6048 posts	Question for BASIC aficionados: If a whole-array arithmetic operation fails due to a floating point exception, what state would you prefer the destination array to be left in?
1. Mix of new and old values (current behaviour for BASIC/BASIC64, sub-optimal performance with BASICVFP)
2. Filled with zeroes
3. New values, apart from any NaNs/infinities, which will have been replaced with zeroes
4. New values, including the NaNs/infinities that were generated by the exceptional ops
The reason I’m asking is that, for VFP vector operations, you’ll only get the best performance if the exception checks can be left until the very end of the operation. Reading back from the FPSCR will cause the pipeline to stall until the current arithmetic operations have completed, which is a bit of a nuisance if you have to do it before writing back each set of values. In terms of implementation, 4 would be easiest, but could cause problems because NaN and infinity aren’t really in BASIC’s vocabulary. Options 2 and 3 would be about the same amount of effort. So I guess it’s a toss-up between whether you want some useful data in the destination (as per BASIC/BASIC64) or none at all (zero it all). Also an interesting thing I’ve observed is that whole-array operations for integer arrays have fewer error checks performed on them than scalar integer ops. E.g. multiplication doesn’t generate a “number too big” error, you just get given the low 32 bits of the result as per a standard MUL instruction. So there’s definitely some scope for writing “garbage” to the destination for exceptional elements.
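To illustrate why deferring the check matters, here is a portable C sketch of options 2 to 4 for a whole-array divide. It is not VFP code: a sticky flag ORed per element stands in for the FPSCR cumulative exception bits, and only non-finite results (NaN or infinity) count as exceptional, which is narrower than what BASIC actually checks. The function and enum names are hypothetical.

```c
#include <assert.h>
#include <math.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical failure policies, numbered as in the options above. */
enum fail_policy { ZERO_ALL = 2, ZERO_BAD = 3, KEEP_ALL = 4 };

/* dst[i] = a[i] / b[i]. Returns true if any element was exceptional.
   The loop never branches on the flag; the decision is made once at the
   end, much as FPSCR would be read once after the vector ops complete. */
bool array_divide(double *dst, const double *a, const double *b, size_t n,
                  enum fail_policy policy)
{
    assert(dst != NULL && a != NULL && b != NULL);
    bool sticky = false;
    for (size_t i = 0; i < n; i++) {
        double r = a[i] / b[i];      /* a[i], b[i] read before dst[i] written */
        sticky |= !isfinite(r);      /* cheap stand-in for cumulative flags */
        dst[i] = r;
    }
    if (!sticky)
        return false;

    /* Second pass only on failure: apply the chosen policy. */
    for (size_t i = 0; i < n; i++) {
        if (policy == ZERO_ALL || (policy == ZERO_BAD && !isfinite(dst[i])))
            dst[i] = 0.0;
    }
    return true;                     /* caller would raise the BASIC error */
}
```

Options 2 and 3 cost a second pass that only runs when something went wrong, and option 4 costs nothing extra. Option 1 is left out because it is what per-element checking naturally produces.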

Feb 24, 2017 11:59pm Rick Murray (539) 14047 posts	or none at all (zero it all). This. If an exception occurred, the data cannot be trusted. There may be many arguments for “well if that bit isn’t zero”, but the only sensible approach is to simply cease trusting any of the results of a failed calculation…

Feb 25, 2017 9:23pm David Feugey (2125) 2709 posts	5. Old. It seems normal that if it fails, content is not changed at all. Hum, needs a copy of the whole array. Or a Basic flag to choose the best behaviour between 2, 3, 4 and 5.

Feb 26, 2017 9:55am Steve Drain (222) 1620 posts	Hum, needs a copy of the whole array. Consideration needs to be given to the fact that an array can appear on both sides of most operations, and the way BASIC deals with this is probably for both speed and memory efficiency, so no intermediate copy. If VFP is available, it will be on a machine where memory is not likely to be a problem, so as a special case there might be an intermediate copy to restore original values when the destination is the original. Otherwise 4, and let BASIC have a way to deal with NaNs and infinities in that version.
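A sketch of option 5 (old contents kept on failure) in portable C, following the intermediate-copy idea: the result is built in a scratch buffer and only committed to the destination if every element is finite, which also stays correct when the destination array appears on the right-hand side. The function name is hypothetical, and the finiteness test is only a stand-in for BASIC’s real exception checks.

```c
#include <assert.h>
#include <math.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* dst = a * b element-wise, all or nothing (hypothetical helper).
   Returns 0 on success, 1 if any result was NaN/infinity (dst untouched),
   or -1 if the scratch buffer could not be allocated (dst untouched). */
int array_multiply_all_or_nothing(double *dst, const double *a,
                                  const double *b, size_t n)
{
    assert(dst != NULL && a != NULL && b != NULL);
    if (n == 0)
        return 0;

    double *tmp = malloc(n * sizeof *tmp);
    if (tmp == NULL)
        return -1;

    bool bad = false;
    for (size_t i = 0; i < n; i++) {
        tmp[i] = a[i] * b[i];        /* dst may alias a or b: not written yet */
        bad |= !isfinite(tmp[i]);
    }
    if (!bad)
        memcpy(dst, tmp, n * sizeof *tmp);   /* commit only on success */
    free(tmp);
    return bad ? 1 : 0;
}
```

The price is an allocation and a full copy on every call, even when nothing fails, which is the case for letting the programmer opt out with a flag.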

Feb 26, 2017 7:01pm David Feugey (2125) 2709 posts	so as a special case Yep, that’s why I suggest a flag to skip this step… if you know what you are doing. Basic could also simply throw an error, as the programmer can simulate cases 2 and 5 themselves (manual zeroing or manual copy). 4 is also a very good solution. It could also be used for special things (for example, not-yet-assigned variables or variables removed from memory (+ garbage collector?)). It’s really time for a new Basic :)