BBC BASIC - 64 bit integer support, long string support
Stuart Swales (8827) 1357 posts |
You’d be reasonably entitled to throw an ‘Accuracy lost’ error in the former case, depending on the magnitude of A%% and the implementation. |
nemo (145) 2545 posts |
I disagree in the strongest terms.
In fact I’ve extended the principle – nB allows
In fact the monadic bit-rotate even operates on bytes:
|
Stuart Swales (8827) 1357 posts |
That’s why I wrote ‘depending on the magnitude of A%%’. And as for ‘depending on … the implementation’, I’d rather have an implementation that worked for 64-bit ints that fit into a double-precision mantissa without loss right now than one that catered accurately for all possibilities in the dim and distant future, which for RISC OS can be read as never. |
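To make "fit into a double-precision mantissa without loss" concrete: a double has a 53-bit significand, so an integer survives exactly if converting it to a double and back returns the same value. A minimal Python sketch of that test follows (Python rather than BASIC, since the arithmetic is the point). Raising `ValueError` where the test fails is only a stand-in for the ‘Accuracy lost’ error mentioned above, not how any BASIC actually implements it.

```python
def fits_in_double(n: int) -> bool:
    """True if integer n survives a round trip through an IEEE 754 double."""
    return int(float(n)) == n


def to_double_checked(n: int) -> float:
    """Convert n to a double, refusing if any precision would be lost.

    The ValueError stands in for a BASIC 'Accuracy lost' error; this name and
    behaviour are illustrative, not taken from any BASIC implementation.
    """
    d = float(n)
    if int(d) != n:
        raise ValueError("Accuracy lost")
    return d


# A double has a 53-bit significand, so every integer with |n| <= 2**53 fits,
# as do larger values whose low-order bits happen to be zero.
print(fits_in_double(2**53))       # True
print(fits_in_double(2**53 + 1))   # False: needs 54 significant bits
print(fits_in_double(-(2**63)))    # True: a power of two
```

So "depending on the magnitude" really means "depending on how many significant bits the value has", which is why a power of two as large as 2**63 still converts cleanly.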
nemo (145) 2545 posts |
Apropos of nothing, I’ve implemented 64b file handling in nB: This uses AH’s proposed 64b Args calls, falling back to the standard calls on error.
Naturally if the result is too big for nB’s 40b floats you get “Number too big”. But if a file is longer than &7FFFFFFF then PTR/EXT return a float (which is precise) rather than an integer (which would have the wrong sign).
This isn’t useful as such, just a way of thinking about polymorphism and the difference between storage precision and evaluation precision. Naturally this means that i.e.
If PTR/EXT return a 64b integer in a 64b build (which I think some here would expect) then what happens for the perfectly reasonable line
Alternatively, if you suggest that it silently truncate to a negative value, then I don’t have any good answers BTW. I think the evaluator has to be able to round-trip 64b ints regardless of storage precision. |
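A Python sketch of the two behaviours contrasted here for file positions above &7FFFFFFF: reinterpreting the value as a signed 32-bit integer gives a negative ("wrong sign") result, while returning a float keeps the magnitude. `ptr_result` is a hypothetical helper modelling the idea, not nB's code. Python floats are 64-bit doubles, so the sketch does not model nB's 40-bit float limit.

```python
def as_signed32(value: int) -> int:
    """Reinterpret the low 32 bits of value as a two's-complement integer."""
    value &= 0xFFFFFFFF
    return value - (1 << 32) if value & 0x80000000 else value


def ptr_result(position: int):
    """Hypothetical polymorphic PTR/EXT result (illustrative only).

    Values that fit a signed 32-bit integer stay integers; larger ones come
    back as floats so that the sign is preserved.
    """
    if position <= 0x7FFFFFFF:
        return position
    return float(position)


length = 0x80000000            # a file just over 2GB
print(as_signed32(length))     # -2147483648: the 'wrong sign' integer
print(ptr_result(length))      # 2147483648.0: the float keeps the value
```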
nemo (145) 2545 posts |
Demo: and consequently: I suspect |
Rick Murray (539) 13839 posts |
Let’s not overlook the RAMdisc large enough to hold a VeryBig file. |
tymaja (278) 174 posts |
My hand has already been forced to implement 72-bit floats (with space for 80 bits / a 2-byte exponent once I get the ‘9-byte float with an unused extra exponent byte’ working), because doing fully separate paths for int and FP would require so many changes that it would amount to a full rewrite* of much of the code! I am going to go for a ‘float80 / signed-int64’ upgrade initially, as that is the simplest first step: Other reasons I like float80/int64 are:
|
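One property that makes the float80/int64 pairing convenient: an x87-style 80-bit extended float has a 64-bit significand, so every signed 64-bit integer converts to it exactly, which is not true of a 53-bit double. The Python sketch below checks exact representability for a given significand width, ignoring exponent range. The 32-bit figure for BBC BASIC's 5-byte float is an assumption included only for comparison.

```python
def exact_in_significand(n: int, bits: int) -> bool:
    """True if integer n is exactly representable with a `bits`-bit significand.

    Only precision is checked; exponent range is assumed to be large enough.
    """
    m = abs(n)
    if m == 0:
        return True
    # Trailing zero bits can be absorbed by the exponent, so drop them.
    while m & 1 == 0:
        m >>= 1
    return m.bit_length() <= bits


FORMATS = {
    "5-byte BBC float (32-bit significand assumed)": 32,
    "IEEE 754 double": 53,
    "x87 80-bit extended": 64,
}

worst_case = 2**63 - 1  # largest signed 64-bit integer: 63 significant bits
for name, bits in FORMATS.items():
    # Prints False, False, True in that order.
    print(f"{name}: {exact_in_significand(worst_case, bits)}")
```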
tymaja (278) 174 posts |
(I haven’t figured out how to do quote replies yet!) However, replying to nemo:
Regarding the truncation of integers – there is no easy answer – it’s backward compatibility versus new features. However, what I think I am going to do is to create a SWI to allow BASIC to ‘configure itself’ … without even asking for an allocation (yet – but I will if it actually works!). I would ask for an allocation before releasing any code, to prevent any possibility of issues (given I will use an arbitrary SWI base … sort of … during development). I will likely have a SWI such as BASICA64_Op, which can configure a few settings, and which can work as an X SWI.
Then BASIC can, by default, start up in ‘classic 32-bit’ mode, and programs that know about 64-bit extensions can call a SWI that ‘activates’ the 64-bit integer mode, so such programs can fall back to 32-bit mode if the SWI returns an error or doesn’t exist! This would even work in a future 64-bit RISC OS. We would need to document the SYS call in BASIC’s help somehow, so it is easy to find without going online to search.
As the BASIC SYS command is not (immediately) a SWI, the ‘configuration’ SWIs could just be caught in BASIC’s SYS handler code, and executed there without needing to invoke a SWI in the first place! |
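The probe-then-fall-back pattern described here can be sketched in Python, with a fake SYS dispatcher standing in for RISC OS. `BASICA64_Op`, its reason code and the `SWIError` type are all hypothetical; the sketch shows only the control flow a program would use, not how BASIC would store the setting.

```python
class SWIError(Exception):
    """Stands in for the error an X SWI returns (e.g. 'SWI name not known')."""


def make_sys(known_swis):
    """Build a fake SYS dispatcher from a dict of SWI name -> handler."""
    def sys_call(name, *args):
        if name not in known_swis:
            raise SWIError("SWI name not known")
        return known_swis[name](*args)
    return sys_call


def choose_integer_width(sys_call) -> int:
    """Ask for 64-bit integers; fall back to classic 32-bit mode on any error.

    'BASICA64_Op' and reason code 1 ('enable int64') are hypothetical.
    """
    try:
        sys_call("BASICA64_Op", 1)
        return 64
    except SWIError:
        return 32


old_basic = make_sys({})
new_basic = make_sys({"BASICA64_Op": lambda reason: 0})
print(choose_integer_width(old_basic))   # 32
print(choose_integer_width(new_basic))   # 64
```

Because the probe goes through the X (error-returning) form, an interpreter that has never heard of the SWI costs the program nothing more than taking the 32-bit branch.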
Rick Murray (539) 13839 posts |
Probably a dumb question, but why not a new keyword? Can’t the non-existence of that (on older BASICs) be dealt with by a local error handler? |
Steve Pampling (1551) 8170 posts |
There’s a short “Formatting Help” down below, and also the not-as-bold-as-it-should-be¹ Textile reference link just below that.
¹ To whom it may concern – Told you so :P |
tymaja (278) 174 posts |
“…but why not a new keyword? Can’t the non-existence of that (on older BASICs) be dealt with by a local error handler?”
True – although as was noted, error handlers can mess up variables etc. (although a ‘planned’ error testing a new keyword could probably avoid that). However – I know adding new keywords is frowned upon, because it really messes up older BASIC (the keywords are tokenised, so a new token would be added, and I have no idea what older BASIC does when it finds a weird token!). I like the SWI idea, because it might be ‘SYS "XBASICA64_CheckFeature", feature_number TO result%’, and result% can be -1 for lack of a feature. It would probably need to use numbered SWIs to avoid ‘SWI name not known’, of course!
Another possibility is adding functionality to existing keywords, which could work well, but could get messy if there are many extensions that can be enabled! |
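A toy Python model of why a new keyword is a compatibility hazard: a line crunched with a newer token table contains a byte that an older table has no entry for, and nemo's reply in this thread says the older interpreter responds with “Syntax error”. PRINT (&F1) and REM (&F4) use their BBC BASIC token values; the `WIDEN` keyword and its token value are made up for the sketch, and real programs of course contain more than bare keywords.

```python
class BasicSyntaxError(Exception):
    """Stands in for BASIC's 'Syntax error'."""


OLD_TABLE = {"PRINT": 0xF1, "REM": 0xF4}
# 'WIDEN' and its token value are hypothetical; the value is arbitrary.
NEW_TABLE = {**OLD_TABLE, "WIDEN": 0xB0}


def tokenise(keywords, table):
    """Crunch a list of keywords into token bytes using `table`."""
    return bytes(table[k] for k in keywords)


def detokenise(tokens, table):
    """Expand token bytes back into keywords; an unknown byte is an error."""
    reverse = {value: word for word, value in table.items()}
    words = []
    for byte in tokens:
        if byte not in reverse:
            raise BasicSyntaxError("Syntax error")
        words.append(reverse[byte])
    return words


line = tokenise(["PRINT", "WIDEN"], NEW_TABLE)
print(detokenise(line, NEW_TABLE))       # ['PRINT', 'WIDEN']
try:
    detokenise(line, OLD_TABLE)
except BasicSyntaxError as err:
    print("older interpreter:", err)      # older interpreter: Syntax error
```

The same problem hits every other tool that carries its own copy of the token table (editors, listers, compressors), not just the interpreter.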
John WILLIAMS (8368) 493 posts |
Before you edited the post immediately above, had you just put a space after the “bq.” it would have worked. Unfortunately you edited it as I logged in. |
nemo (145) 2545 posts |
As I’ve been saying. In fact there’s separation between the evaluation of an expression and the consumption of the result – there’s only the one evaluator, regardless of purpose. The one exception is the flawed array reference handling, which tries to emulate an expression evaluator using whole arrays, but in fact only supports a very small number of idioms, one of which it gets wrong. That’s why I added the
Parson’s egg. The fast parts are fast. The slow parts are slow. eg
Actual flow-control (loops, switching etc.) is easy to understand – those that require a context create a stack structure that the end-loop then unwinds to, baulking at some specific constructions that can’t be popped (you can’t ENDWHILE out of a PROC, for example). The interpreter itself doesn’t really have flow control; it’s a state machine – this is in direct contrast to interpreted languages like PostScript that work more like a virtual machine. This is what makes this interpreter so contextual – there are two different tokens for
No. Basic has no module state, and certainly has no state-per-program. All program state is kept in the Application Slot, and there is no way to link the SWI to the Application Slot. You would need command-line parameters or, as I have been using for some time, some kind of pragma. There may be a role for
It’s pointless because the editors you’re likely to use won’t support them. Zap does its own tokenisation, for example. Plus third-party compressors and debugging utilities will be stumped. An interpreter that doesn’t know the token throws “Syntax error”. This is why the hundreds of new functions in nemoBasic are constructed with existing keywords in new unambiguous combinations. eg mutable FOR loops using |
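The context stack described in this post, where loops push a frame that the closing statement unwinds to and you can't ENDWHILE out of a PROC, can be modelled in a few lines of Python. This is a simplified illustration that assumes PROC frames share one stack with loop frames; the class and the error text are illustrative, not the interpreter's own.

```python
class BasicError(Exception):
    """Stands in for a BASIC runtime error."""


class ControlStack:
    """Toy model of a context stack shared by loops and procedure calls."""

    def __init__(self):
        self.frames = []  # (kind, position) pairs, innermost last

    def push(self, kind: str, position: int) -> None:
        """Entering WHILE/REPEAT/FOR/PROC pushes a context frame."""
        self.frames.append((kind, position))

    def end_loop(self, kind: str) -> int:
        """Unwind to the innermost `kind` frame and return its position.

        Frames above it are discarded. The search stops at a PROC frame, so a
        loop opened by the caller cannot be closed from inside the procedure.
        """
        for i in range(len(self.frames) - 1, -1, -1):
            frame_kind, position = self.frames[i]
            if frame_kind == kind:
                del self.frames[i + 1:]
                return position
            if frame_kind == "PROC":
                break
        raise BasicError(f"Not in a {kind} loop")


stack = ControlStack()
stack.push("WHILE", 10)
stack.push("REPEAT", 20)
print(stack.end_loop("WHILE"))   # 10: the inner REPEAT context is discarded
stack.push("PROC", 30)
try:
    stack.end_loop("WHILE")       # the WHILE belongs to the caller
except BasicError as err:
    print(err)                    # Not in a WHILE loop
```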
nemo (145) 2545 posts |
Zero-length file on RAM: is reported as longer than it actually is. Viva la vector. |
nemo (145) 2545 posts |
BTW tymaja you should look at RTR’s BBC Basic for Windows, which has established idioms for all of this already – 64 bit ints, float precision and so on. That shouldn’t necessarily constrain anyone, but it’s grist to the mill. JGH is also an authority I rarely disagree with. I do miss SteveD. :-( |
Rick Murray (539) 13839 posts |
That explains the really weird syntax of
It does? Or did you mean within assembler sections?
Perfect. DEF WIDTH = 64
Cheater! ;)
+=1 |
nemo (145) 2545 posts |
Re “BBC Basic is fast”: Much of it is. Some of it definitely isn’t. Flow control in particular can be punishingly slow, which nB addresses. The performance of multiline Whereas Obvious optimisation is obvious.
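nemo doesn't spell out nB's mechanism here, though he does say that nB "only does it the once". One common way to get that effect is to cache each block's jump target after the first search. The Python sketch below assumes that approach, with a deliberately simplified program representation; the class name and structure are hypothetical.

```python
class JumpCache:
    """Remember where each multi-line IF's ENDIF is, so the search runs once.

    Lines are simplified to one statement each ("IF", "ENDIF" or anything
    else); a real interpreter would key the cache on a program address and
    must discard it whenever the program is edited.
    """

    def __init__(self, lines):
        self.lines = lines
        self.targets = {}   # IF line index -> matching ENDIF line index
        self.scans = 0      # number of times the program text was searched

    def endif_for(self, if_line: int) -> int:
        if if_line not in self.targets:
            self.scans += 1
            self.targets[if_line] = self._scan(if_line)
        return self.targets[if_line]

    def _scan(self, if_line: int) -> int:
        depth = 0
        for i in range(if_line + 1, len(self.lines)):
            if self.lines[i] == "IF":
                depth += 1
            elif self.lines[i] == "ENDIF":
                if depth == 0:
                    return i
                depth -= 1
        raise LookupError("Missing ENDIF")


program = ["IF", "IF", "PRINT", "ENDIF", "ENDIF"]
cache = JumpCache(program)
for _ in range(1000):          # e.g. the IF sits inside a loop
    target = cache.endif_for(0)
print(target, cache.scans)     # 4 1
```

The trade-off is memory for the cache plus the obligation to invalidate it; in exchange, a loop body's IF pays for the search once rather than on every iteration.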
Yes. A thinko – REM is not semicolon. Assembly comments are a huge vulnerability to the poor overstrained interpreter – there are many bugs associated with them, not least because it’s VERY HARD to work out whether you’re actually IN assembly – see above for the difference checking every byte makes. nB actually makes it possible to be much more precise in these kinds of checks – because it only does it the once.
Absolutely. OF == Foreground, ON == on top of (background). Part of the fun of this project has been staring at every combination of keyword and punctuation and thinking “What could that do?”
<rolls sleeves up>API’s API, mate! I keep threatening to implement an FS via the vectors instead of FileSwitch, old-skool BBC Micro style. It’d still work (and could do anything at all, including Windows path syntax if one wished – auto translation |
nemo (145) 2545 posts |
Aside: One would be entitled to use DOS/Windows, Unix or even CP/M filename syntax in RISC OS as the syntax has always been explicitly documented in the API (and no, I don’t mean FSC,59). But we all know this and have built that into our code, right? <Skeletor running away meme> |
Rick Murray (539) 13839 posts |
Eh? PRM2 under FileSwitch defines the usual syntax ( You could implement a different filesystem outside of FileSwitch, but it may run afoul of assumptions based upon how FileSwitch works.
Is this a ROLtd thing?
Nine seconds becomes a blink. That’s a heck of an optimisation.
It is? Is I just typed
But SETPAN reminds me that I wanted to make pancakes for dinner. I think I ought to get on to that right about now… |
Stuart Swales (8827) 1357 posts |
Sit on the vectors, do whatever you like. |
Rick Murray (539) 13839 posts |
…and don’t act surprised when something performs an action that is completely valid within the defined filesystem naming conventions which then crashes because “oh look, an entire filespec and not a single dot” or “no $” or… |
tymaja (278) 174 posts |
Agreed – this is actually why I spoke of letting BASIC ‘configure itself’ – BASIC has the ‘ARGP’ workspace, which is unique to each program running, and the ‘configuration’ would just be stored in there somewhere. This is also why I spoke of implementing a SWI without an allocation … and calling it from within BASIC – essentially, what I could do is extend the ‘SYS’ keyword and capture the ‘SWI’ if it is a ‘special’ BASIC-only one. Essentially, the running BASIC program could configure its own environment (with some rules, to allow the stack and cache to be flushed, maybe even ‘no variables except the built-in A%-Z% ints are defined by the program prior to changing certain aspects of state’). Extending functionality with keywords is a good idea – I could just do ‘SYS LOCAL "BASIC_Configure",a%,b% TO result%, result2%’.
Also, regarding the below from the previous page:
The ‘bytes free’ needs 33 bits to store… :)
Edit: actually, re ARGP, the space there was getting rather tight, so … I have unlinked the string accumulators at one end and the VCACHE at the other end, so there is plenty of room to spare (and the ‘VCACHE starts at ARGP +&0000’ thing has been fixed – this required a fair few changes, as that assumption is hard-coded into the source code). |
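The ‘bytes free’ remark is simple arithmetic: a 32-bit unsigned value tops out at 2^32 - 1, and a signed one at 2^31 - 1 (&7FFFFFFF). So 4GB or more free needs a 33rd bit even as an unsigned number, and 2GB or more does if the figure is held in a signed integer. A small Python helper to check the width; the example sizes are illustrative.

```python
def bits_needed(n: int, signed: bool = True) -> int:
    """Smallest integer width (in bits) that can hold n.

    Signed widths assume two's complement, which needs one extra bit for the sign.
    """
    if signed:
        return (n if n >= 0 else ~n).bit_length() + 1
    if n < 0:
        raise ValueError("a negative value has no unsigned representation")
    return max(1, n.bit_length())


four_gib = 4 * 1024**3                        # 2**32 bytes free
print(bits_needed(four_gib, signed=False))    # 33
print(bits_needed(2**31))                     # 33: one past &7FFFFFFF, signed
```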
nemo (145) 2545 posts |
Rick remarked
Well yes, but that’s not API. Whereas b3-5 (so far) of the machine type returned from OS_Byte,0 definitely is API:
Had you never wondered why there were three different numbers for 6X09 CPUs? Now you know why! (BTW “CP/M” etc are the style of filename syntax, not the actual FS involved).
Some context: BBC Basic’s speed is mostly down to its tokenisation. So when it hits &CC at the start of a line (mELSE) it just searches for &CD at the start of a line (ENDIF). Note the requirement that both of those be at the start of a line (bar spaces – even “:” defeats it), as skipping from line to line is very fast thanks to the length byte.
However, if there are any nested IF/ENDIFs then it needs to ignore those ENDIFs until it finds its own ENDIF. So it spots a nested multiline IF by looking for the THEN at the end of a line. This is a vulnerability which leads to the “ellipsis bug”: if you have a REM which ends in the ellipsis character within an IF/ENDIF, plain old BBC Basic thinks it’s an IF…THEN <facepalm meme>. This is fixed in nB, but it’s an easy fix because the existence of the REM is easy to confirm – just rewind.
However, assembly comments ruin that strategy. What if you rewind and get to a ; or \ character? Is that a comment? If the line ends
Whether \ and ; are comments depends on whether you are in the assembler dialect… and that starts at a
Which would mean that IF has to check EVERY BYTE OF EVERY LINE in order to reliably find ENDIF, instead of just looking at the first and last bytes of a line as it does at present. This I describe as “very hard” in order to underline the orders of magnitude more time involved.
Compare the IF with the ENDWHILE in my timed examples (or do your own) – WHILE/ENDWHILE is slow because ENDWHILE can be anywhere in a line. And even then, plain old BBC Basic gets it wrong because ENDWHILE is a two-byte token but Sophie only checked one, meaning it mistakes ACS for a nested WHILE and hence will fall off the end of the program. <different facepalm meme>
[I do note that thanks to the above optimisation, nB could now be that pernickety, as it only happens once]
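The line-at-a-time ENDIF search, and the ellipsis bug, can be reproduced with a small Python model that only inspects the first byte (after spaces) and the last byte of each line. The token values are as I understand BBC BASIC V's table and should be treated as assumptions, as should &8C for the ellipsis in the RISC OS character set. Lines are modelled as bare token bytes without the line-number and length header, and string literals and ‘:’-separated statements are ignored.

```python
# Token values as I understand BBC BASIC V's table (treat as assumptions).
IF, THEN, ENDIF, REM = 0xE7, 0x8C, 0xCD, 0xF4
ELLIPSIS = 0x8C  # '…' in the RISC OS Latin-1 character set: same byte as THEN


def find_endif(lines, start, rem_aware=True):
    """Return the index of the ENDIF matching the IF...THEN on lines[start].

    Apart from the optional REM rewind, only the first byte (after spaces)
    and the last byte of each line are inspected. Returns None if the
    search falls off the end of the program.
    """
    depth = 0
    for i in range(start + 1, len(lines)):
        body = lines[i].lstrip(b" ")
        if body[:1] == bytes([ENDIF]):
            if depth == 0:
                return i
            depth -= 1
        elif body[-1:] == bytes([THEN]):
            # Looks like a nested multi-line IF... unless rewinding finds a
            # REM, in which case that final byte is just comment text.
            if rem_aware and body.rfind(bytes([REM]), 0, len(body) - 1) != -1:
                continue
            depth += 1
    return None


program = [
    bytes([IF]) + b" x% " + bytes([THEN]),                 # IF x% THEN
    bytes([REM]) + b" more to come" + bytes([ELLIPSIS]),   # REM more to come…
    b"  " + bytes([ENDIF]),                                 #   ENDIF
]
print(find_endif(program, 0, rem_aware=False))  # None: the ellipsis bug
print(find_endif(program, 0))                   # 2
```

The rewind is cheap here because a REM swallows the rest of its line, so any REM token before the final byte means that byte is comment text. An assembler ‘;’ or ‘\’ gives no such certainty without knowing whether the line is inside an assembler block, which is the problem described above.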
This is analogous to what RTR has done in BBfW for
You’re getting further and further from compatibility like that, and it isn’t necessary. nB supports LongStrings without moving STRACC, which is still used for short strings as code expects. Also note that the first 1KB of memory has always been usable as a buffer which, coupled with the scratch space (which is a defined part of the RISC OS API regardless of whether some people wish it weren’t) means Basic programs always have 17KB of temp space just sitting there at &4000. |
GavinWraith (26) 1563 posts |
I am fascinated by the above insights into the operational semantics of the BBC Basic interpreter, which is so different from that of most other interpreters nowadays. The latter usually proceed in interleaved steps: first convert a stream of characters (the text of the program) into a stream of tokens [lexical analysis], simultaneously building a literal pool; second, parse the token stream using a grammar to build bytecode for a virtual machine; last, execute the virtual machine instructions. The first two steps can be paraphrased as “compilation” and the third as “runtime”. Lua, Python, Mawk, Caml, and doubtless many others, conform to this pattern. But BBC Basic behaves quite differently, which gives it some advantages and some disadvantages. |
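To make the contrast concrete, here is a toy version of the conventional three-stage pipeline described above (a lexer with a literal pool, a parser that emits bytecode, and a stack VM), for integer arithmetic only. It is a generic illustration, not how Lua, Python or any of the other named implementations actually structure their code; treating ‘/’ as integer division is a choice made for the sketch.

```python
import operator
import re

# Stage 1: lexical analysis. Characters -> tokens, numbers go in a literal pool.
TOKEN_RE = re.compile(r"\s*(?:(\d+)|(\S))")


def lex(source):
    tokens, pool = [], []
    for number, symbol in TOKEN_RE.findall(source):
        if number:
            pool.append(int(number))
            tokens.append(("LIT", len(pool) - 1))  # refer to the pool slot
        else:
            tokens.append(("OP", symbol))
    return tokens, pool


# Stage 2: parsing. Recursive descent over the tokens, emitting bytecode.
def compile_expr(tokens):
    code = []
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else ("EOF", None)

    def advance():
        nonlocal pos
        pos += 1

    def primary():
        kind, value = peek()
        if kind == "LIT":
            advance()
            code.append(("PUSH", value))
        elif (kind, value) == ("OP", "("):
            advance()
            expr()
            if peek() != ("OP", ")"):
                raise SyntaxError("missing )")
            advance()
        else:
            raise SyntaxError(f"unexpected {value!r}")

    def binary(operand, operators):
        operand()
        while peek()[0] == "OP" and peek()[1] in operators:
            op = peek()[1]
            advance()
            operand()
            code.append(("BINOP", op))

    def term():
        binary(primary, "*/")

    def expr():
        binary(term, "+-")

    expr()
    if pos != len(tokens):
        raise SyntaxError("unexpected trailing input")
    return code


# Stage 3: runtime. A stack machine executes the bytecode.
OPS = {"+": operator.add, "-": operator.sub,
       "*": operator.mul, "/": operator.floordiv}


def run(code, pool):
    stack = []
    for instr, arg in code:
        if instr == "PUSH":
            stack.append(pool[arg])
        else:
            right = stack.pop()
            left = stack.pop()
            stack.append(OPS[arg](left, right))
    return stack.pop()


tokens, pool = lex("2 + 3 * (4 - 1)")
bytecode = compile_expr(tokens)
print(run(bytecode, pool))   # 11
```

All the searching that the BBC Basic discussion above worries about (where is the matching ENDIF, is this byte a comment) happens once in stage 2 here; the VM in stage 3 never looks at source text again.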
David J. Ruck (33) 1635 posts |
All BBC BASIC’s advantages are aimed at giving great performance on an ARM2 with only a few K of memory. Its disadvantages, as we can see from above, are the difficulty of extending some of its built-in limits, such as the length of strings, integers and floats, plus not taking advantage of modern interpreter techniques as Gavin describes. Whilst I can see it would be useful to support long strings in a way that allows them to be used in many existing programs without modification, that isn’t the case for 64-bit integers, which need to be implemented as an additional type. In that case, how important is maintaining rigid backwards compatibility in the interchangeability of ints and floats? I’ll be sticking with Python 3, as its handling of long strings and arbitrarily sized integers is far better. If only it ran on RISC OS with the same performance as it does on the same hardware under Linux, or, more to the point, as fast as BBC BASIC does for some things. |
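For comparison: Python ints never overflow, whereas a fixed-width 64-bit type has to pick a behaviour at the boundary. One possible choice, two's-complement wrapping, can be emulated as below. Whether a BASIC int64 type should wrap, raise an error or promote to float is exactly the design question debated in this thread; the sketch does not settle it.

```python
def wrap_int64(n: int) -> int:
    """Reduce an arbitrary Python int to a signed 64-bit two's-complement value."""
    n &= (1 << 64) - 1
    return n - (1 << 64) if n & (1 << 63) else n


INT64_MAX = 2**63 - 1
print(INT64_MAX + 1)              # 9223372036854775808: Python just keeps going
print(wrap_int64(INT64_MAX + 1))  # -9223372036854775808: a wrapping int64 flips sign
```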