RISC OS Open: Forum: BASIC compiler

Jun 4, 2016 9:33am

Steve Drain (222) 1620 posts

the assembler in RiscBASIC is a compile-time assembler

ABC does its work at compile time.

May I query that? ABC compiles the BASIC assembler code, but the machine code is assembled at runtime. Was it a lapsus linguae?

I don’t think ABC likes FP instructions.

My old manual says nothing to suggest it does not and it does mention EQUFS EQUFD EQUFE EQUF, the latter implying the compiler directive TYPE. Remember ABC is tied to the FPACC/FPE for its float support.

Jun 4, 2016 9:40am

Steve Drain (222) 1620 posts

That’s not a surprise. BASIC’s built in FP stuff works directly. FPE needs to take the undefined instruction trap, decode the instruction, do something with it…you get the idea. ;-)

Mm! Those were not the big time-consumers. Apart from the FPE being written in quite extensive integer code, my program also had to convert from 5-byte to 8-byte and back.
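To give a feel for the 5-byte to 8-byte conversion mentioned here, this is a minimal Python sketch, not the code from the program being discussed. It assumes the ARM BBC BASIC 5-byte layout is a little-endian 32-bit mantissa followed by one excess-128 exponent byte, with the mantissa's top bit reused as the sign and an implied leading 1, and that five zero bytes mean zero. The function names are made up for illustration.

```python
import math


def basic5_to_float(raw: bytes) -> float:
    """Unpack a 5-byte BASIC float (ASSUMED layout: 4 mantissa bytes,
    little-endian, then 1 exponent byte in excess-128 form)."""
    if len(raw) != 5:
        raise ValueError("expected exactly 5 bytes")
    if raw == bytes(5):
        return 0.0  # assumption: all-zero bytes represent zero
    mantissa = int.from_bytes(raw[:4], "little")
    exponent = raw[4]
    sign = -1.0 if mantissa & 0x80000000 else 1.0
    mantissa |= 0x80000000  # restore the implied leading 1
    # The mantissa is a fraction in [0.5, 1), hence the extra -32.
    return sign * mantissa * 2.0 ** (exponent - 128 - 32)


def float_to_basic5(value: float) -> bytes:
    """Pack a Python float (an 8-byte IEEE double) into the assumed 5-byte form."""
    if value == 0:
        return bytes(5)
    fraction, exp = math.frexp(abs(value))  # fraction is in [0.5, 1)
    mantissa = round(fraction * 2 ** 32)
    if mantissa == 2 ** 32:  # rounding carried out of the top bit
        mantissa >>= 1
        exp += 1
    exponent = exp + 128
    if not 1 <= exponent <= 255:
        raise OverflowError("value out of range for the 5-byte format")
    mantissa &= 0x7FFFFFFF  # drop the implied 1 ...
    if value < 0:
        mantissa |= 0x80000000  # ... and reuse that bit as the sign
    return mantissa.to_bytes(4, "little") + bytes([exponent])
```

Even in this simplified form, every conversion involves masking, shifting and re-biasing the exponent, which hints at why doing it on every value passed to and from the FPE adds up.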

That’s why I really wish the C compiler had an option to use VFP instead of emitting FPE instructions for working with floats.

I have had the rudiments of Basalt with VFP floats for some time, but I have not put the effort into finishing it, because I feel there is actually little demand for it.

Jun 4, 2016 1:22pm

Rick Murray (539) 13840 posts

May I query that? ABC compiles the BASIC assembler code, but the machine code is assembled at runtime.

What is assembled at runtime? I was under the impression that ABCLib behaved in a manner not unlike CLib.

Was it a lapsus linguae?

Sorry, Latin wasn’t a part of my curriculum. I can guess what it means, but only in a literal sense.

My old manual says nothing to suggest it does not

I’ll have to try again. It threw an error on the LDF instruction, so I wondered if it was supported.

Remember ABC is tied to the FPACC/FPE for its float support.

Of course, it’s too old to know of VFP. ;-)

Jun 4, 2016 2:07pm

Steve Pampling (1551) 8170 posts

Sorry, Latin wasn’t a part of my curriculum. I can guess what it means, but only in a literal sense.

Slip of the tongue apparently.
If he had gone for memory then the “lingam” could have been (deliberately¹) mis-translated as a mixed language from the Hindu lingam (phallus IIRC) rather than the Latin (memory).
I think the chosen was less prone but less fun.

Note: Like you, I didn’t do Latin, but there’s so much in English that’s borrowed/derived that you tend to get the idea anyway.

¹ Who round here would deliberately mis-translate for fun?

Jun 4, 2016 4:50pm

Clive Semmens (2335) 3276 posts

Who round here would deliberately mis-translate for fun?

Whistles innocently.

Cor. This place dunt half echo…

Jun 5, 2016 1:49pm

Steve Drain (222) 1620 posts

[ABC] What is assembled at runtime?

Let me clarify what I think I know. ABC and RISCBasic deal with the BASIC assembler in quite different ways. ABC only compiles the assembler code, which is executed at runtime to create the machine code, much as the source program would. RISCBasic uses the BASIC assembler code to create the machine code at compile time. I only know this from what Chris has said.

[ABC] Of course, it’s too old to know of VFP. ;-)

My point was that it only knows about FPACC instructions and types, not BASIC V 5-byte floats, which I assume RISCBasic does. The consequence is that ABC is slooow when dealing with float values, even if single precision is specified.

As for lapsus linguae, I was too clever by half and the irony I was aiming at has probably missed everyone. That is a peril of posting. ;-)

Jun 5, 2016 2:22pm

David Feugey (2125) 2709 posts

I tried a big integer benchmark this morning:
BBC Basic V : 50,65 (crunched CRUNCH 31)
BBC Basic V : 79,93 (classic)
ABC : 91,91

ABC is now slow. Not its fault, since on RISC OS 4, BBC Basic took more than 120 s for this test.
Too many optimisations in the interpreter :)

Jun 5, 2016 3:54pm

Rick Murray (539) 13840 posts

ABC only compiles the assembler code, which is executed at runtime to create the machine code, much as the source program would.

I cannot confirm or deny, as there’s no public source to look at, however the disassembly that I gave earlier appears to be a properly assembled version of the code provided, only with extra stuff added. I’m not sure what this is (bounds checking? safety net?) but it is otherwise the code as provided. Okay, it isn’t as clean as the RISCBasic version, but it does match up with the instructions given as would be expected.

I only know this from what Chris has said.

Which is…? [not doubting, interested in what would lead to this conclusion]

I tried a big integer benchmark this morning:

Which? I find it hard to imagine how ABC dealing with purely integer maths could be twice as slow as BASIC given there’s no interpretation happening. Are you sure no FP has crept in there?

I did a braindead test:

REM Await new centisecond
t% = TIME
REPEAT : UNTIL (t% <> TIME)

REM Assign two numbers from loop and add them; 1,000,000 times
t% = TIME
FOR l% = 1 TO 1000000
  a% = l%
  b% = l% << 1
  c% = a% + b%
NEXT
PRINT (TIME - t%)
END

BASIC takes 167 centiseconds.
ABC takes 51 centiseconds.
Each test was run three times, with the same result each time, on a regular Pi model B (the 256MiB model).

Just for the hell of it, I changed every variable to a float (removed ‘%’ from variable) and tried to see what happens.
BASIC takes 202 centiseconds.
ABC suffers hard with 711 centiseconds.

That’s because BASIC has psycho-optimised FP routines, while ABC uses an emulation of a billion year old FP system.

Jun 5, 2016 4:46pm

Steve Drain (222) 1620 posts

I cannot confirm or deny, as there’s no public source to look at

My use of ABC was 25 years ago, but I have the manual in front of me, which says:

… you will actually be compiling a program which assembles a routine at run-time.

ABC may have changed in the way it handles the in-line assembler since then, of course. ;-)

In an earlier post Chris Hall said:

It is a compile-time assembler in RiscBASIC

As I have no experience of RISCBasic, I take him at his word.

I find it hard to imagine how ABC dealing with purely integer maths could be twice as slow as BASIC given there’s no interpretation happening.

That was my first reaction. I always found integer routines to be the one big scoring point for ABC.

ABC suffers hard with 711 centiseconds. [with floats]

Which precision for the floats?

I would be interested in two further results from David. First, with CRUNCH 15 and second, no CRUNCH but Basic$Crunch set, which produces the same program but must have a small delay on loading.

Jun 5, 2016 5:47pm

Rick Murray (539) 13840 posts

ABC suffers hard with 711 centiseconds. [with floats]

Which precision for the floats?

Default. I just removed the ‘%’ from the variables.

I’ve twiddled the comments to select FP type.

Default – 711 centiseconds.

Single – 711 centiseconds.
Double – 714 centiseconds.
Extended – 691 centiseconds.

There isn’t much in it, but extended is, surprisingly, marginally faster. Maybe it is easier to unpack or something?

Jun 5, 2016 6:12pm

Steve Drain (222) 1620 posts

extended is, surprisingly, marginally faster.

I think the FPE does all internal calculations in extended precision, so converting to lower precision would be an overhead. I had not thought of that before ;-)

Jun 5, 2016 6:38pm

Steve Drain (222) 1620 posts

As the subject of speed has arisen, I wonder if I could float some ideas about improvements to BASIC V itself.

The first is one I have banged on about for a good while: remove the dependence on Basic$Crunch and make BASIC V always do CRUNCH 15, as BASIC VI always has. David’s figures above illustrate how useful that can be.

The second is to make the size of the workspace for the “synergistic cache” the same as for BASIC VI, ie 4k rather than 2k. This would likely increase the ratio of hits for instant lookup of variables and routines much more than twice. It probably only needs the changing of some switches in the source to implement.

The third is a touch more complex: implement “bring to front” on the many linked lists that BASIC uses to lookup variable names and routine names. That means that when a name has been found, that list entry is replaced at the front, making it quicker to find in future. I think this should only require a handful of extra instructions to implement.

A fourth idea is much more work and has implications that I have not fully explored: extend runtime crunching to crunch the names of variables. I realise that runtime crunching of routine names is probably a no-no, because of libraries, but something might still be done there.

Does any of this make sense?
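The "bring to front" idea (the third suggestion) can be sketched in a few lines. This is an illustrative Python model of a move-to-front linked list, not BASIC's actual variable lookup code; the class and method names are invented for the example, and `lookup` returns the number of nodes visited so the effect is visible.

```python
class Node:
    """One entry in a singly linked list of names."""
    __slots__ = ("name", "value", "next")

    def __init__(self, name, value, next_node=None):
        self.name = name
        self.value = value
        self.next = next_node


class MoveToFrontList:
    """Linked list that relinks a found entry to the head of the list."""

    def __init__(self):
        self.head = None

    def insert(self, name, value):
        # New names go to the front, as in a simple chained list.
        self.head = Node(name, value, self.head)

    def lookup(self, name):
        """Return (value, nodes_visited); move the found node to the front."""
        prev, node, visited = None, self.head, 0
        while node is not None:
            visited += 1
            if node.name == name:
                if prev is not None:
                    # Unlink the node and splice it in at the head:
                    # three pointer writes, whatever the list length.
                    prev.next = node.next
                    node.next = self.head
                    self.head = node
                return node.value, visited
            prev, node = node, node.next
        raise KeyError(name)
```

The first lookup of a name at the far end of a list walks every node, while a repeat lookup of the same name finds it at the first node. The cost is that names used less often get pushed further back.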

Jun 6, 2016 8:14am

Paul Sprangers (346) 524 posts

I always found integer routines to be the one big scoring point for ABC.

The other big scoring point, as I will keep telling, is memory block manipulation.
One of my programs uses a large memory block, rather than unmanageably huge string arrays. Changing something involves extensive use of the ? and ! operators, as well as a lot of memory chunk swapping. The compiled version does this exactly 40 times faster than the interpreted version!
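For readers less familiar with the indirection operators, here is a rough Python model of what `?` and `!` do on a memory block: `?` reads or writes a single byte and `!` a 32-bit little-endian word. The helper names are invented for this sketch, and a `bytearray` stands in for the DIMmed block.

```python
import struct


def peek_byte(block: bytearray, offset: int) -> int:
    """Model of value = block?offset (one unsigned byte)."""
    return block[offset]


def poke_byte(block: bytearray, offset: int, value: int) -> None:
    """Model of block?offset = value (only the low 8 bits are stored)."""
    block[offset] = value & 0xFF


def peek_word(block: bytearray, offset: int) -> int:
    """Model of value = block!offset (signed 32-bit, little-endian)."""
    return struct.unpack_from("<i", block, offset)[0]


def poke_word(block: bytearray, offset: int, value: int) -> None:
    """Model of block!offset = value (the low 32 bits are stored)."""
    struct.pack_into("<I", block, offset, value & 0xFFFFFFFF)
```

In a compiled program each of these can reduce to a single load or store instruction, whereas the interpreter has to parse and evaluate the expression every time, which is consistent with the large speed-up reported above.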

Jun 6, 2016 9:09am

Steve Drain (222) 1620 posts

One of my programs uses a large memory block, rather than unmanageably huge string arrays.

I am genuinely intrigued. How does a string array become unmanageably huge?

a lot of memory chunk swapping.

Faced with that my inclination would be to look at the algorithm you are using. BASIC is hopeless at moving memory, although you can possibly use Wimp_TransferBlock. If you are employing a flex-style heap then the SlidingHeaps module might be useful. Myself, in these times of generous memory, I stick to OS_Heap. Basalt does have BLOCK source%,size% TO destination% using the fast byte-aligned routine in the PRM.

Jun 6, 2016 9:31am

David Feugey (2125) 2709 posts

Sorry it’s floats.

t=TIME
accum = 0
count = 0
WHILE count < 30 : REM 1545
  leftedge   = -420
  rightedge  =  300
  topedge    =  300
  bottomedge = -300
  xstep      =  7
  ystep      =  15

  maxiter    =  200

  y0 = topedge
  WHILE y0 > bottomedge
    x0 = leftedge
    WHILE x0 < rightedge
      y = 0
      x = 0
      thechar = 32
      xx = 0
      yy = 0
      i = 0
      WHILE i < maxiter AND xx + yy <= 800
        xx = INT((x * x) / 200)
        yy = INT((y * y) / 200)
        IF xx + yy > 800 THEN
          thechar = 48 + i
          IF i > 9 THEN
            thechar = 64
          ENDIF
        ELSE
          temp = xx - yy + x0
          IF (x < 0 AND y > 0) OR (x > 0 AND y < 0) THEN
            y = INT(-1 * (-1 * x * y) / 100) + y0
          ELSE
            y = INT(x * y / 100) + y0
          ENDIF
          x = temp
        ENDIF
        i = i + 1
      ENDWHILE
      x0 = x0 + xstep
      accum = accum + thechar
    ENDWHILE
    y0 = y0 - ystep
  ENDWHILE

  IF count MOD 300 = 0 THEN
    PRINT accum,
  ENDIF
  count = count + 1
ENDWHILE

Anyway, I hope that ROOL will be able to fix the FP issue (for example with VFP support).

The first is one I have banged on about for a good while: remove the dependence on Basic$Crunch and make BASIC V always do CRUNCH 15, as BASIC VI always has. David’s figures above illustrate how useful that can be.

Good idea

The second is to make the size of the workspace for the “synergistic cache” the same as for BASIC VI, ie 4k rather than 2k. This would likely increase the ratio of hits for instant lookup of variables and routines much more than twice. It probably only needs the changing of some switches in the source to implement.

Or even more? Very good idea anyway.

The third is a touch more complex: implement “bring to front” on the many linked lists that BASIC uses to lookup variable names and routine names. That means that when a name has been found, that list entry is replaced at the front, making it quicker to find in future. I think this should only require a handful of extra instructions to implement.

Another good idea.

A fourth idea is much more work and has implications that I have not fully explored: extend runtime crunching to crunch the names of variables. I realise that runtime crunching of routine names is probably a no-no, because of libraries, but something might still be done there.

Good idea too. Could I suggest a table (old/new name) for runtime crunching of EVAL calls?

Jun 6, 2016 10:02am

Paul Sprangers (346) 524 posts

I am genuinely intrigued. How does a string array become unmanageably huge?

At the risk of drifting off completely:
It’s a database program. In the past, each field contained a four-byte number that pointed into a large string array, in which the actual data were stored. This approach made it possible to maintain rather large databases, with thousands of records, without the need for an extremely expensive 20 MB harddisc. The only problem then was that a string array could only be DIMmed once (in BASIC V and its predecessors, at least). If a database exceeded this dim, the user was urged to restart the program, after which the string array dim was increased by 250 automatically. However, this solution was completely insufficient when I started to import other databases (CSV files) with tens of thousands, or even hundreds of thousands, of records. I then switched to a memory block.

The first section of this block contains the four-byte addresses that point to the actual information, which is stored in the second section. Every time I change something in the database, e.g. remove a record, a considerable part of these addresses, if not all, has to be overwritten, while a lot of blocks have to be transferred in order to fill the gaps again (which indeed is done by Wimp_TransferBlock).

No doubt, the algorithm suffers from ignorance and stupidity, but my point is (and now we can return to the original subject), that such memory block manipulations are done 40 times faster in the compiled version than in the interpreted version, which is a considerable credit of ABC, I think, no matter how much or how little my routine could be enhanced.
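The layout described above (a table of addresses followed by the packed data) can be modelled roughly as follows. This is only a sketch in Python: the class name, the 4-byte length prefix on each record, and the use of a Python list for the address table are all assumptions made for illustration, not details of the actual database program.

```python
import struct


class RecordBlock:
    """Toy model: a table of offsets plus one packed data area."""

    def __init__(self):
        self.offsets = []        # stands in for the table of 4-byte addresses
        self.data = bytearray()  # stands in for the second section

    def append(self, payload: bytes) -> int:
        """Store a record and return its index in the offset table."""
        self.offsets.append(len(self.data))
        # ASSUMPTION: each record carries a 4-byte length prefix.
        self.data += struct.pack("<I", len(payload)) + payload
        return len(self.offsets) - 1

    def get(self, index: int) -> bytes:
        start = self.offsets[index]
        (length,) = struct.unpack_from("<I", self.data, start)
        return bytes(self.data[start + 4:start + 4 + length])

    def remove(self, index: int) -> int:
        """Delete a record, close the gap, and return how many
        offsets had to be rewritten as a result."""
        start = self.offsets.pop(index)
        (length,) = struct.unpack_from("<I", self.data, start)
        size = 4 + length
        del self.data[start:start + size]  # the block move that fills the gap
        rewritten = 0
        for i, offset in enumerate(self.offsets):
            if offset > start:
                self.offsets[i] = offset - size
                rewritten += 1
        return rewritten
```

Removing the first record rewrites every remaining offset, which matches the observation that "a considerable part of these addresses, if not all" must be overwritten. That loop of word reads and writes is the kind of work where a compiler gains most over the interpreter.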

Jun 6, 2016 11:35am

Steve Drain (222) 1620 posts

Or even more? [memory allocated to the cache]

I think 4k is the limit without a more significant re-write. The cache is immediately above ARGP, which is passed to nearly all BASIC subroutines in R8. Hence data can be fetched with the single instruction: LDR r0,[r8,r1], as an immediate offset. The maximum immediate offset is 4k.
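The 4k limit comes from the 12-bit offset field in the ARM single data transfer instruction. As a small illustration, this Python sketch encodes `LDR Rd,[Rn,#offset]` for the always-executed, pre-indexed, no-writeback case and rejects offsets that do not fit. The function name is invented here, and the sketch says nothing about how BASIC itself lays out the cache.

```python
def encode_ldr_immediate(rd: int, rn: int, offset: int) -> int:
    """Encode LDR rd,[rn,#offset] as a 32-bit ARM instruction word
    (condition AL, word load, pre-indexed, no writeback)."""
    if not (0 <= rd <= 15 and 0 <= rn <= 15):
        raise ValueError("register numbers must be 0..15")
    if abs(offset) > 0xFFF:
        # Only 12 bits are available, so the reach is 4095 bytes either way.
        raise ValueError("immediate offset must be within -4095..4095")
    up = 1 if offset >= 0 else 0  # U bit: add or subtract the offset
    return (
        (0xE << 28)      # condition: always
        | (0b01 << 26)   # single data transfer
        | (1 << 24)      # P: pre-indexed
        | (up << 23)     # U: offset direction
        | (1 << 20)      # L: load
        | (rn << 16)
        | (rd << 12)
        | abs(offset)
    )
```

For example, `hex(encode_ldr_immediate(0, 8, 4))` gives `0xe5980004`, while an offset of 4096 raises `ValueError`. Anything further from R8 than that needs an extra instruction to form the address, which is presumably the "more significant re-write" referred to above.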

Could I suggest a table (old/new name) for runtime crunching of EVAL calls?

EVAL is one of the things that might be broken, as it would with any crunching app. As a table would have to be built to do the crunching, it could remain in memory. My mind buzzes with possibilities that would never have been implemented originally because of the value of memory.

Jun 6, 2016 11:41am

David Feugey (2125) 2709 posts

I ran a new test with two versions: one with floats only, the other with integers only.

allfloat test

BBC Basic for Windows 6.02a (Core i7 2,2-3,2)
interpreted : 20,6s
exe all opt : 13,3s
exe all opt + rem!fast : 10,1s

RPCEmu 0.8.12 RISC OS 4.02 Basic V (same computer)
interpreted : 103,3s
interpreted crunched 15 : 77,1s
interpreted crunched 31 : 77,1s
abc : 92,4s
abc quick : 90,6s

RPCEmu 0.8.12 RISC OS 5.22 Basic V (same computer)
interpreted : 73,3s
interpreted crunched 15 : 52,2s
interpreted crunched 31 : 52,2s
abc : 96,1s
abc quick : 92,8s

Raspberry Pi 1 800 MHz
interpreted : 48,8s
interpreted crunched 15 : 31,8s
interpreted crunched 31 : 31,8s
abc : 70,6s
abc quick : 68,8s

allinteger test

BBC Basic for Windows 6.02a (Core i7)
interpreted : 14,81s
exe all opt : 13,02s
exe all opt + rem!fast : 9,79s

RPCEmu 0.8.12 RISC OS 4.02 Basic V (same computer)
interpreted : 96,8s
interpreted crunched 15 : 70,4s
interpreted crunched 31 : 70s
abc : 25,2s
abc quick : 24,2s

RPCEmu 0.8.12 RISC OS 5.22 Basic V (same computer)
interpreted : 66,7s
interpreted crunched 15 : 46,4s
interpreted crunched 31 : 46,4s
abc : 26s
abc quick : 24,5s

Raspberry Pi 1 800 MHz
interpreted : 47s
interpreted crunched 15 : 30,3s
interpreted crunched 31 : 30,3s
abc : 19,3s
abc quick : 17,9s

Jun 6, 2016 11:55am

David Feugey (2125) 2709 posts

rem!fast is not so easy to use/tweak. So, interpreter versus interpreter, the emulator solution is 3,6 to 3,9 times slower than the native one. Very good.

I’m almost sure that with some optimisations it could be only 3 times slower, and even less with the creation of a virtual computer with fewer specific virtual peripherals to manage (that would need a specific version of RISC OS 5, of course). Anyway, that definitely validates the idea that delivering software under RPCEmu rather than Windows is not ‘way slower and stupid’, as some told me :)

Note: my RPCEmu ROS4 configuration is lighter, and so faster, than the ROS5 one. So you can see the difference between the old Basic and the latest one. Impressive.

Jun 6, 2016 11:56am

Steve Drain (222) 1620 posts

an extremely expensive 20 MB harddisc.

That far back? ;-)

The only problem then was that a string array could only be DIMmed once

It was to solve that problem that Basalt came into existence. It is not impossible to resize an array in BASIC, but it is awkward and needs memory managed above HIMEM.

Your solution to the problem is quite close to SlidingHeaps, which was written for PowerBase, I think.

such memory block manipulations are done 40 times faster in the compiled version than in the interpreted version, which is a considerable credit of ABC,

I am not surprised by that and it was just what you needed. My plea is that there may well be a solution to a speed problem in BASIC that does not require a compiler.

Jun 6, 2016 12:12pm

Steve Drain (222) 1620 posts

@David

Thanks for the figures. They now seem to be around about what I expected:

ABC speeds up integers but slows down floats.
Having CRUNCH 15 as standard makes a significant difference, but removing blank lines with CRUNCH 31 does not.
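To put numbers on that summary, here is a small Python calculation using David's Raspberry Pi 1 figures from his earlier post (decimal commas written as points). It only does arithmetic on the posted timings; the dictionary and function names are made up for the example.

```python
# Times in seconds, copied from the Raspberry Pi 1 results above.
PI_TIMES = {
    "float": {"interpreted": 48.8, "crunched 15": 31.8, "abc": 70.6},
    "integer": {"interpreted": 47.0, "crunched 15": 30.3, "abc": 19.3},
}


def speedups(times: dict) -> dict:
    """Speed of each variant relative to the plain interpreted run.
    Values above 1 are faster than interpreted, below 1 slower."""
    base = times["interpreted"]
    return {name: round(base / seconds, 2) for name, seconds in times.items()}
```

On those figures CRUNCH 15 is worth roughly 1.5 times in both tests, ABC is about 2.4 times faster than the uncrunched interpreter on integers, and it runs at about 0.7 of the interpreter's speed on floats.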

I was interested in the difference between 4.02 and 5.22. Is it the BASIC modules or the way the OS works with RPCEmu? That could be resolved by copying the 5.22 BASIC to 4.02.

Jun 6, 2016 12:50pm

Rick Murray (539) 13840 posts

WHILE i < maxiter AND xx + yy <= 800

xx = INT ((x * x) / 200)

I’m wondering how much better the program could be if it was written properly. How many of those floats could be integers?
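One thing to watch when swapping `INT(a / b)` for integer division is rounding of negative results. The Python sketch below models the difference, assuming (as I understand BBC BASIC to behave) that `INT` rounds towards minus infinity while `DIV` truncates towards zero. The function names are invented for the illustration.

```python
import math


def basic_int(x: float) -> int:
    """Model of INT(x): ASSUMED to round towards minus infinity."""
    return math.floor(x)


def basic_div(a: int, b: int) -> int:
    """Model of a DIV b: ASSUMED to truncate towards zero."""
    quotient = abs(a) // abs(b)
    return quotient if (a < 0) == (b < 0) else -quotient
```

For the squared terms in the benchmark the operands are never negative, so both forms agree (`basic_int(90000 / 200)` and `basic_div(90000, 200)` are both 450). For the `x * y / 100` term with operands of opposite sign they can differ by one: `basic_int(-150 / 100)` is -2 but `basic_div(-150, 100)` is -1.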

Jun 6, 2016 1:12pm

David Feugey (2125) 2709 posts

It was generic ANSI Basic code. I just forgot that variables can be integers :)

Jun 6, 2016 1:13pm

David Feugey (2125) 2709 posts

I was interested in the difference between 4.02 and 5.22. Is it the BASIC modules or the way the OS works with RPCEmu? That could be resolved by copying the 5.22 BASIC to 4.02.

IMHO, it’s the Basic module, as the ABC version works as it should.

Jun 6, 2016 8:36pm

Rick Murray (539) 13840 posts

I’ve decided to give this a whirl. The program is substantially similar to yours, though the code snippet posted seemed incomplete (nothing further done with ‘t’ set on the first line, comma after PRINT accum…). Also it takes AGES, is this correct?

Anyway:

t%=TIME
accum% = 0
count% = 0
WHILE count% < 30 : REM 1545
  leftedge%   = -420
  rightedge%  =  300
  topedge%    =  300
  bottomedge% = -300
  xstep%      =  7
  ystep%      =  15

  maxiter%    =  200

  y0% = topedge%
  WHILE y0% > bottomedge%
    x0% = leftedge%
    WHILE x0% < rightedge%
      y% = 0
      x% = 0
      thechar% = 32
      xx% = 0
      yy% = 0
      i% = 0
      WHILE ( (i% < maxiter%) AND ((xx% + yy%) <= 800) )
        xx% = INT((x% * x%) / 200)
        yy% = INT((y% * y%) / 200)
        IF (xx% + yy%) > 800 THEN
          thechar% = 48 + i%
          IF i% > 9 THEN
            thechar% = 64
          ENDIF
        ELSE
          temp% = ((xx% - yy%) + x0%)
          IF ( ( (x% < 0) AND (y% > 0) ) OR ( (x% > 0) AND (y% < 0) ) ) THEN
            y% = INT(-1 * (-1 * x% * y%) / 100) + y0%
          ELSE
            y% = INT(x% * y% / 100) + y0%
          ENDIF
          x% = temp%
        ENDIF
        i% = i% + 1
      ENDWHILE
      x0% = x0% + xstep%
      accum% = accum% + thechar%
    ENDWHILE
    y0% = y0% - ystep%
  ENDWHILE

  IF (count% MOD 300) = 0 THEN
    PRINT accum%
  ENDIF
  count% = count% + 1
  PRINT STR$(count%)+"/30"
ENDWHILE

PRINT "end, after "+STR$(TIME-t%)+"cs"
END

The second version is identical, only with all ‘%’ removed.

Standard Pi model B, 700MHz.

Integer from BASIC	7229cs
Float from BASIC	7998cs
Integer from ABC	2814cs
Float from ABC	10767cs
Integer (ported to C)	133cs / 206cs
Float (ported to C)	4753cs / 7902cs

The two times for the C version – the first is the standard optimised build as normally output. It is possible the compiler is discarding parts of the calculations considered “unnecessary” as the Norcroft compiler is pretty smart.
The second time is for a “debug” build. When the compiler creates a debug build, it disables all optimisations. This is quite evident in the float version.

Now, ordered by time:

Integer (C, normal)	133cs
Integer (C, debug)	206cs
Integer (ABC)	2814cs
Float (C, normal)	4753cs
Integer (BASIC)	7229cs
Float (C, debug)	7902cs
Float (BASIC)	7998cs
Float (ABC)	10767cs
