BASIC compiler
Steve Drain (222) 1620 posts |
“the assembler in RiscBASIC is a compile-time assembler” May I query that? ABC compiles the BASIC assembler code, but the machine code is assembled at runtime. Was it a lapsus linguae?
My old manual says nothing to suggest it does not and it does mention |
||||||||||||||||||||||||||||
Steve Drain (222) 1620 posts |
Mm! Those were not the big time-consumers. Apart from the FPE being written in quite extensive integer code, my program also had to convert from 5-byte to 8-byte and back.
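To give a feel for the conversion work mentioned above, here is a minimal Python sketch, not the code from the program being discussed. It assumes the commonly described BBC BASIC 5-byte layout: an excess-128 exponent byte plus a 32-bit mantissa whose top bit holds the sign in place of an implied leading 1, with all-zero meaning 0. The function names are made up for illustration, and a Python float (an IEEE 754 double) stands in for the 8-byte value.

```python
import math

BIAS = 0x80  # assumption: exponent byte is stored excess-128


def basic5_to_float(exponent, mantissa):
    """Decode an (exponent byte, 32-bit mantissa) pair into a Python float."""
    if exponent == 0 and mantissa == 0:
        return 0.0
    sign = -1.0 if mantissa & 0x80000000 else 1.0
    # Put back the implied leading 1 that the sign bit replaced.
    fraction = (mantissa | 0x80000000) / 2**32  # 0.5 <= fraction < 1
    return sign * math.ldexp(fraction, exponent - BIAS)


def float_to_basic5(value):
    """Encode a Python float as (exponent byte, 32-bit mantissa), truncating."""
    if value == 0.0:
        return 0, 0
    fraction, exp2 = math.frexp(abs(value))  # 0.5 <= fraction < 1
    exponent = exp2 + BIAS
    if not 1 <= exponent <= 0xFF:
        raise OverflowError("value outside the assumed 5-byte range")
    mantissa = int(fraction * 2**32) & 0x7FFFFFFF  # drop the implied top bit
    if value < 0:
        mantissa |= 0x80000000  # sign lives where the implied bit was
    return exponent, mantissa
```

Even in this simplified form, each direction needs a zero test, a sign shuffle and an exponent rebias, which is the kind of per-value overhead being described.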
I have had the rudiments of Basalt with VFP floats for some time, but I have not put the effort into finishing it, because I feel there is actually little demand for it. |
||||||||||||||||||||||||||||
Rick Murray (539) 13840 posts |
What is assembled at runtime? I was under the impression that ABCLib behaved in a manner not unlike CLib.
Sorry, Latin wasn’t a part of my curriculum. I can guess what it means, but only in a literal sense.
I’ll have to try again. It threw an error on the LDF instruction, so I wondered if it was supported.
Of course, it’s too old to know of VFP. ;-) |
||||||||||||||||||||||||||||
Steve Pampling (1551) 8170 posts |
Slip of the tongue apparently. Note: Like you, didn’t do Latin but there’s so much in English that’s borrowed/derived that you tend to get the idea anyway. 1 Who round here would deliberately mis-translate for fun? |
||||||||||||||||||||||||||||
Clive Semmens (2335) 3276 posts |
Whistles innocently. Cor. This place dunt half echo… |
||||||||||||||||||||||||||||
Steve Drain (222) 1620 posts |
Let me clarify what I think I know. ABC and RISCBasic deal with the BASIC assembler in quite different ways. ABC only compiles the assembler code, which is executed at runtime to create the machine code, much as the source program would. RISCBasic uses the BASIC assembler code to create the machine code at compile time. I only know this from what Chris has said.
My point was that it only knows about FPACC instructions and types, not BASIC V 5-byte floats, which I assume RISCBasic does. The consequence is that ABC is slooow when dealing with float values, even if single precision is specified. As for lapsus linguae, I was too clever by half and the irony I was aiming at has probably missed everyone. That is a peril of posting. ;-) |
||||||||||||||||||||||||||||
David Feugey (2125) 2709 posts |
I tried a big integer benchmark this morning: ABC is now slow. Not its fault, since on RISC OS 4, BBC Basic took more than 120 s for this test. |
||||||||||||||||||||||||||||
Rick Murray (539) 13840 posts |
I cannot confirm or deny, as there’s no public source to look at, however the disassembly that I gave earlier appears to be a properly assembled version of the code provided, only with extra stuff added. I’m not sure what this is (bounds checking? safety net?) but it is otherwise the code as provided. Okay, it isn’t as clean as the RISCBasic version, but it does match up with the instructions given as would be expected.
Which is…? [not doubting, interested in what would lead to this conclusion]
Which? I find it hard to imagine how ABC dealing with purely integer maths could be twice as slow as BASIC given there’s no interpretation happening. Are you sure no FP has crept in there? I did a braindead test:
BASIC takes 167 centiseconds. Just for the hell of it, I changed every variable to a float (removed ‘%’ from variable) and tried to see what happens. That’s because BASIC has psycho-optimised FP routines, while ABC uses an emulation of a billion year old FP system. |
||||||||||||||||||||||||||||
Steve Drain (222) 1620 posts |
My use of ABC was 25 years ago, but I have the manual in front of me, which says:
ABC may have changed in the way it handles the in-line assembler since then, of course. ;-) In an earlier post Chris Hall said:
As I have no experience of RISCBasic, I take him at his word.
That was my first reaction. I always found integer routines to be the one big scoring point for ABC.
Which precision for the floats? I would be interested in two further results from David. First, with CRUNCH 15 and second, no CRUNCH but Basic$Crunch set, which produces the same program but must have a small delay on loading. |
||||||||||||||||||||||||||||
Rick Murray (539) 13840 posts |
ABC suffers hard with 711 centiseconds. [with floats] Default. I just removed the ‘%’ from the variables. I’ve twiddled the comments to select FP type. Default – 711 centiseconds. Single – 711 centiseconds. There isn’t much in it, but extended is, surprisingly, marginally faster. Maybe it is easier to unpack or something? |
||||||||||||||||||||||||||||
Steve Drain (222) 1620 posts |
I think the FPE does all internal calculations in extended precision, so converting to lower precision would be an overhead. I had not thought of that before ;-) |
||||||||||||||||||||||||||||
Steve Drain (222) 1620 posts |
As the subject of speed has arisen, I wonder if I could float some ideas about improvements to BASIC V itself.
The first is one I have banged on about for a good while: remove the dependence on Basic$Crunch and make BASIC V always do CRUNCH 15, as BASIC VI always has. David’s figures above illustrate how useful that can be.
The second is to make the size of the workspace for the “synergistic cache” the same as for BASIC VI, i.e. 4k rather than 2k. This would likely increase the ratio of hits for instant lookup of variables and routines much more than twice. It probably only needs the changing of some switches in the source to implement.
The third is a touch more complex: implement “bring to front” on the many linked lists that BASIC uses to look up variable names and routine names. That means that when a name has been found, that list entry is replaced at the front, making it quicker to find in future. I think this should only require a handful of extra instructions to implement.
A fourth idea is much more work and has implications that I have not fully explored: extend runtime crunching to crunch the names of variables. I realise that runtime crunching of routine names is probably a no-no, because of libraries, but something might still be done there.
Does any of this make sense? |
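The “bring to front” idea can be sketched in a few lines of Python. This is only an illustration of the general move-to-front technique on a singly linked list; the class and method names are invented here and say nothing about how BASIC’s own lists are laid out.

```python
class Node:
    def __init__(self, name, value, next_node=None):
        self.name = name
        self.value = value
        self.next = next_node


class NameList:
    """Singly linked list of names with move-to-front on a successful lookup."""

    def __init__(self):
        self.head = None
        self.comparisons = 0  # counts name comparisons, to show the effect

    def add(self, name, value):
        # New entries go on the front of the list.
        self.head = Node(name, value, self.head)

    def lookup(self, name):
        prev, node = None, self.head
        while node is not None:
            self.comparisons += 1
            if node.name == name:
                if prev is not None:
                    # Unlink the found entry and relink it at the front.
                    prev.next = node.next
                    node.next = self.head
                    self.head = node
                return node.value
            prev, node = node, node.next
        return None

    def names(self):
        out, node = [], self.head
        while node is not None:
            out.append(node.name)
            node = node.next
        return out
```

With three entries, the first lookup of the oldest name costs three comparisons and the next lookup of the same name costs one. The relinking itself is three pointer writes, which fits the “handful of extra instructions” estimate.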
||||||||||||||||||||||||||||
Paul Sprangers (346) 524 posts |
The other big scoring point, as I will keep telling, is memory block manipulation. |
||||||||||||||||||||||||||||
Steve Drain (222) 1620 posts |
I am genuinely intrigued. How does a string array become unmanageably huge?
Faced with that my inclination would be to look at the algorithm you are using. BASIC is hopeless at moving memory, although you can possibly use Wimp_TransferBlock. If you are employing a flex-style heap then the SlidingHeaps module might be useful. Myself, in these times of generous memory, I stick to OS_Heap. Basalt does have |
||||||||||||||||||||||||||||
David Feugey (2125) 2709 posts |
Sorry, it’s floats.
t=TIME
accum = 0
count = 0
WHILE count < 30 : REM 1545
  leftedge = -420
  rightedge = 300
  topedge = 300
  bottomedge = -300
  xstep = 7
  ystep = 15
  maxiter = 200
  y0 = topedge
  WHILE y0 > bottomedge
    x0 = leftedge
    WHILE x0 < rightedge
      y = 0
      x = 0
      thechar = 32
      xx = 0
      yy = 0
      i = 0
      WHILE i < maxiter AND xx + yy <= 800
        xx = INT((x * x) / 200)
        yy = INT((y * y) / 200)
        IF xx + yy > 800 THEN
          thechar = 48 + i
          IF i > 9 THEN
            thechar = 64
          ENDIF
        ELSE
          temp = xx - yy + x0
          IF (x < 0 AND y > 0) OR (x > 0 AND y < 0) THEN
            y = INT(-1 * (-1 * x * y) / 100) + y0
          ELSE
            y = INT(x * y / 100) + y0
          ENDIF
          x = temp
        ENDIF
        i = i + 1
      ENDWHILE
      x0 = x0 + xstep
      accum = accum + thechar
    ENDWHILE
    y0 = y0 - ystep
  ENDWHILE
  IF count MOD 300 = 0 THEN
    PRINT accum,
  ENDIF
  count = count + 1
ENDWHILE
Anyway, I hope that ROOL will be able to fix the FP issue (for example with VFP support).
Good idea
Or even more? Very good idea anyway.
Another good idea.
Good idea too. Could I suggest a table (old/new name) for runtime crunching of EVAL calls? |
||||||||||||||||||||||||||||
Paul Sprangers (346) 524 posts |
At the risk of drifting off completely: The first section of this block contains the four bytes addresses that point to the actual information which is stored in the second section. Every time that I change something in the database, e.g. remove a record, a considerable part of these addresses, if not all, have to be overwritten, while a lot of blocks have to be transferred in order to fill the gaps again (which indeed is done by Wimp_TransferBlock). No doubt, the algorithm suffers from ignorance and stupidity, but my point is (and now we can return to the original subject), that such memory block manipulations are done 40 times faster in the compiled version than in the interpreted version, which is a considerable credit of ABC, I think, no matter how much or how little my routine could be enhanced. |
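For readers trying to picture the layout described above, here is a rough Python sketch of a block with a table of 4-byte pointers followed by packed records. It is not Paul’s code: the class name is invented, the two sections are kept as two byte arrays for clarity, and the pointers are stored as offsets into the data section rather than absolute addresses.

```python
import struct


class RecordBlock:
    """Section 1: 4-byte little-endian offsets. Section 2: packed record bytes."""

    def __init__(self):
        self.index = bytearray()  # one 4-byte offset per record
        self.data = bytearray()   # records stored end to end, no gaps

    def __len__(self):
        return len(self.index) // 4

    def _offset(self, i):
        return struct.unpack_from("<I", self.index, 4 * i)[0]

    def _end(self, i):
        # A record ends where the next one starts, or at the end of the data.
        return self._offset(i + 1) if i + 1 < len(self) else len(self.data)

    def append(self, record):
        self.index += struct.pack("<I", len(self.data))
        self.data += record

    def get(self, i):
        return bytes(self.data[self._offset(i):self._end(i)])

    def remove(self, i):
        start, end = self._offset(i), self._end(i)
        size = end - start
        del self.data[start:end]           # slide later records down over the gap
        del self.index[4 * i:4 * i + 4]    # drop this record's pointer
        for j in range(i, len(self)):      # every later pointer must be rewritten
            struct.pack_into("<I", self.index, 4 * j, self._offset(j) - size)
```

Removing one record moves every later byte and rewrites every later pointer, so the work is all bulk memory copying and simple integer arithmetic in a loop, which is where a compiled version would be expected to gain the most.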
||||||||||||||||||||||||||||
Steve Drain (222) 1620 posts |
I think 4k is the limit without a more significant re-write. The cache is immediately above ARGP, which is passed to nearly all BASIC subroutines in R8. Hence data can be fetched with the single instruction:
|
||||||||||||||||||||||||||||
David Feugey (2125) 2709 posts |
I made a new test with two versions: one with floats only, the other with integers only.
allfloat test
BBC Basic for Windows 6.02a (Core i7 2,2-3,2)
RPCEmu 0.8.12 RISC OS 4.02 Basic V (same computer)
RPCEmu 0.8.12 RISC OS 5.22 Basic V (same computer)
Raspberry Pi 1 800 MHz
allinteger test
BBC Basic for Windows 6.02a (Core i7)
RPCEmu 0.8.12 RISC OS 4.02 Basic V (same computer)
RPCEmu 0.8.12 RISC OS 5.22 Basic V (same computer)
Raspberry Pi 1 800 MHz |
||||||||||||||||||||||||||||
David Feugey (2125) 2709 posts |
rem!fast is not so easy to use/tweak. So, interpreter versus interpreter, the emulator solution is 3.6 to 3.9 times slower than the native one. Very good. I’m almost sure that with some optimisations it could be only 3 times slower, and even less with the creation of a virtual computer with less specific virtual peripherals to manage (that will need a specific version of RISC OS 5 of course). Anyway, that definitely validates the idea that delivering software under RPCEmu rather than Windows is not ‘way slower and stupid’, as some told me :) Note: my RPCEmu ROS4 configuration is lighter, and so faster, than the ROS5 one. So you can see the difference between the old Basic and the latest one. Impressive. |
||||||||||||||||||||||||||||
Steve Drain (222) 1620 posts |
That far back? ;-)
It was to solve that problem that Basalt came into existence. It is not impossible to resize an array in BASIC, but it is awkward and needs memory managed above HIMEM. Your solution to the problem is quite close to SlidingHeaps, which was written for PowerBase, I think.
I am not surprised by that and it was just what you needed. My plea is that there may well be a solution to a speed problem in BASIC that does not require a compiler. |
||||||||||||||||||||||||||||
Steve Drain (222) 1620 posts |
@David Thanks for the figures. They now seem to be around about what I expected:
I was interested in the difference between 4.02 and 5.22. Is it the BASIC modules or the way the OS works with RPCEmu? That could be resolved by copying the 5.22 BASIC to 4.02. |
||||||||||||||||||||||||||||
Rick Murray (539) 13840 posts |
WHILE i < maxiter AND xx + yy <= 800
I’m wondering how much better the program could be if it was written properly. How many of those floats could be integers? |
||||||||||||||||||||||||||||
David Feugey (2125) 2709 posts |
It was a generic ANSI Basic code. I just forgot that variables can be integers :) |
||||||||||||||||||||||||||||
David Feugey (2125) 2709 posts |
IMHO, it’s the Basic module, as the ABC version works as it should. |
||||||||||||||||||||||||||||
Rick Murray (539) 13840 posts |
I’ve decided to give this a whirl. The program is substantially similar to yours, though the code snippet posted seemed incomplete (nothing further done with ‘t’ set on the first line, comma after PRINT accum…). Also it takes AGES, is this correct? Anyway:
The second version is identical, only with all ‘%’ removed. Standard Pi model B, 700MHz.
The two times for the C version – the first is the standard optimised build as normally output. It is possible the compiler is discarding parts of the calculations considered “unnecessary” as the Norcroft compiler is pretty smart. Now, ordered by time:
|