BBC BASIC - 64 bit integer support, long string support
Rick Murray (539) 13806 posts |
It’s what is described as a filename/filespec in book 2 of the PRMs. It’s what appears on screen, is used in the command line, and really you wouldn’t get that far trying to hand the OS DOS style paths… ;)
For RISC OS? Where is this documented as applying at all to RISC OS? That you could hook into vectors and do it on RISC OS as well is due to the extreme flexibility of the system, but you shouldn’t be surprised if stuff blows up. ;)
Why? If you are at an IF and you want to find the ENDIF, given that IF is a BASIC construct, not an assembler one, surely you only need to scan forwards to look for [ and not anything prior?
Yeah, I ran into that one a long time ago. A facepalm moment for me when I was like “oh, so that’s why it didn’t work”.
It’s a bit wild west though, isn’t it? Or does FileSwitch not ralph into it these days? |
Steve Fryatt (216) 2103 posts |
It’s been a while since I looked at this stuff, but it doesn’t even need to be an ellipsis in the assembler, does it? Given that the tokeniser has no idea that the assembler even exists, ISTR that a commenting style like
within assembly code could also leave you with a THEN token at the end of a line. |
nemo (145) 2529 posts |
Gavin said
Indeed, and in stark contrast to PostScript for example, which operates in a tight execution loop:
Other “interpreted” languages are actually [semi-]compiled – eg Python; sometimes to a P-code or VM – eg OPL, Java. But Basic is fully goldfish-memory interpreted at all times.
The reason the BBC Basic interpreter works this way is nowt to do with the ARM – it’s because it has always worked this way. Much of it dates back to the Atom, don’t forget: It had to be small, not fast.
The SCT does its best to mitigate the constant overhead, but a small elective cache is overwhelmed by large programs – it just wasn’t designed for the memory sizes that exist now, and once memory becomes plentiful the argument for a cache over a store changes dramatically. ie No, don’t just make the SCT bigger. That really doesn’t help much in isolation. “If you want an algorithm to go fast, choose the fast algorithm”.
So the interpreter is most like a Choose Your Own Adventure™ book – except you don’t get thrown in the river by the mafia or eaten by a dinosaur and have to turn back to the page you kept your finger in. [am I the only person who used to hack books?]
Rick remarked
My point is although you can intercept vectors and do what you want, how is anything expected to know what you’ve chosen? There’d be no API if that’s all you did. However, if you also modified the machine type you’d be strictly compliant. Not that it’d help but you take the API point.
<points at the very clear diagram> OK I’ll try to be more clear:
1. IF…THEN/mELSE/ENDIF almost always has to skip forwards in the program – either because the expression wasn’t truthy so it has to find the matching mELSE (if present, or ENDIF otherwise); or because it has finished executing the truthy bit and hit mELSE, so has to skip to ENDIF.
2. mELSE and ENDIF have to be at the start of the line. This is for performance reasons:
3. When Basic needs to search for ENDIF (or mELSE) it immediately skips to the end of the line (by looking for CR). It then advances one whole line at a time checking for two things:
3a. Is the last byte on the previous line a THEN? If so, increment the nesting count.
3b. Is the first (non-space) byte on this line ENDIF? If so, decrement the nesting count and if negative, the matching ENDIF has been found and execution continues.
So this is as fast as BBC Basic gets – it can skip from line to line trivially easily because each line contains a length byte. This is why your lines can only be 251 bytes long – there’s four bytes of gumf.
3a is where the bug is. It is not sufficient to check only the last byte, as the token for THEN happens to be the same as the character code for the ellipsis chr in Latin1. So if you have a REM ending in an ellipsis, the IF code thinks it’s a nested IF/THEN/ELSE/ENDIF and increments its nesting count. Which means it never finds the actual ENDIF, and falls off the end of the program.
Hopefully that’s made the bug clearer. How have I fixed it in nemoBasic?
3a. If the last byte on the previous line is a THEN token then look for a preceding REM token in the line. If REM, ignore it. Else increment the nesting count.
This works for the vast majority of cases. But as I explained previously, there are also assembly comments, and while you can recognise a REM token unambiguously you cannot tell 1) if it applies, or 2) whether you need to consider
Static analysis of individual lines is impossible in BBC Basic, which is why the Basic tokeniser gets stuff wrong in assembly which the assembler has to work around – because the tokeniser cannot know it is in assembly. So the only way for IF to be sure that the byte on the end of the line is actually THEN would be to check every single line it currently skips looking for a [ or ] and toggling the “I’m inside assembly so comments are more complicated” flag. Uggh.
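For anyone who wants to poke at the scan described in steps 3a and 3b, here is a rough Python model of it. It is only a sketch: the function name, the list-of-line-bodies representation and the token byte values are assumptions made for illustration (the values are believed to match BBC BASIC V, but check them before relying on them), and the REM check mirrors the workaround as described in the post rather than any real source code. Like that workaround, it knows nothing about assembler comments.

```python
# Token values are assumptions for illustration only.
TOK_THEN = 0x8C   # same byte as the ellipsis character described in the thread
TOK_ENDIF = 0xCD
TOK_REM = 0xF4


def find_endif(lines, start, rem_aware):
    """Return the index of the ENDIF matching the IF ... THEN on lines[start].

    lines is a list of bytes objects, one tokenised line body each.
    Returns None when the scan falls off the end of the program.
    """
    nesting = 0
    for i in range(start + 1, len(lines)):
        prev = lines[i - 1]
        # Step 3a: a trailing THEN byte on the previous line opens a nested
        # block. The IF line itself is skipped, as it is the block we are in.
        if i - 1 != start and prev.endswith(bytes([TOK_THEN])):
            # Workaround: ignore the byte if a REM token precedes it.
            if not (rem_aware and TOK_REM in prev[:-1]):
                nesting += 1
        # Step 3b: a leading ENDIF (after spaces) closes a block.
        body = lines[i].lstrip(b" ")
        if body[:1] == bytes([TOK_ENDIF]):
            nesting -= 1
            if nesting < 0:
                return i
    return None


program = [
    b"\xE7 A%=1 \x8C",      # IF A%=1 THEN   (0xE7 for IF is also assumed)
    b"\xF4 wait for it\x8C",  # REM wait for it, ending in an ellipsis byte
    b"  B%=2",
    b"\xCD",                # ENDIF
]

print(find_endif(program, 0, rem_aware=False))
print(find_endif(program, 0, rem_aware=True))
```

With `rem_aware=False` the comment line is counted as a nested block and the scan runs off the end; with `rem_aware=True` the ENDIF on the last line is found.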
The scratch space is available for use as long as you don’t call a SWI that uses it (or, because it predates it, if you’re running in a TaskWindow). “How do you know whether” is of course a question for the documentation, so it’s a shame we don’t have any for RISC OS. |
nemo (145) 2529 posts |
Steve confirmed
Yes and no: Yes via the interpreter’s tokeniser, for As it happens, it appears I’m much more likely to end a comment in an ellipsis than a SHOUTY THEN, because unlike some bugs that I identified by code inspection, this one actually caught me in the wild. I have a build switch that makes |
Rick Murray (539) 13806 posts |
That’s a painfully obscure way of being “compliant”, akin to a sledgehammer and a skull for dealing with a headache. Yes, it would work, but the side effects make it absolutely not a recommended course of action.
I wasn’t having issues with all of the hoops to jump through to find the matching clause, it was the idea that it would need to scan every line that made my brain hiccup. This is BASIC, it only needs to look forwards…
Thankfully I’ve never been bitten by that. But, then, I’m allergic to stuffing loads of lines together with the colon… <- look, an ellipsis!
<nods slowly> Sadly the PRM is lacking in certain places. But, then, it is described in the available (December 1992) PRM1 as “System workspace”. |
tymaja (278) 172 posts |
How so? (I do have the STRACC,OUTPUT,ERRORS in the same place, and 256 bytes, but I can move them anywhere, and change their length, dynamically as well). I am guessing there is software that attempts to access them in the ‘hardwired’ &8000-&8700 locations? If so … I bet there is software that accesses A%-Z% directly also, and software that assumes ARGP is always &8700, maybe even software that accesses VCACHE directly? My aim is to ‘extend’ BASIC’s functionality, while trying not to break stuff unnecessarily; making the string accumulators, ARGP, and VCACHE ‘mobile’ allows future expansion without unnecessarily breaking stuff; I can set the accumulators at &8000 up, 256/256/512 bytes, set ARGP to &8700, and put VCache at (ARGP+0), so if I find software that needs that, then there is no problem, and software that needs extra features will be able to ‘ask’ for them – I can change STRACC + related length while running, and can even move ARGP when running (technically); I have four memory areas : (string accumulators), (ARGP workspace), (VCACHE), (everything else / LOMEM/PAGE to HIMEM); these can be set up in the same places as original BASIC, or can be moved! I am breaking compatibility in a few ways – 10 byte floats don’t fit into 5 bytes being one example. However, I do intend to ensure that the ‘official’ entry points remain compatible (including those that can be accessed from machine code called from BASIC), which will require some extra code for stuff like float access, when needed.
I am keeping compatibility from the viewpoint of a running BASIC program, but some stuff has to be broken – one example is that you can enter binary %10101010101 longer than 32 bits (in unmodified BBC BASIC), and it doesn’t complain (the BININ function shows why) – however a problem arises with 64 bit ints : BININ needs to be usable above 32 bits, but there may be software out there where someone accidentally put a 33 bit binary number in the code, but didn’t realise as it still worked fine – such software could fail in a BASIC where BININ supports 1-64 bits; there could even be software out there that uses % % to deliberately invoke a specialised error handler that they wrote 😬 I will maintain compatibility with existing BASIC programs as much as possible (the file format remains the same, no new tokens, line numbers remain with the same limits, the machine code interface will remain compatible, and many of the ‘quirks’ of the interpreter will remain unchanged) |
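To make the BININ worry concrete, here is a small Python sketch of a shift-in binary parser with a fixed accumulator width. It is not the real BININ routine: the function name is hypothetical, and the assumption that over-long input silently loses its top bits is a guess at what "doesn’t complain" means in practice.

```python
def parse_binary(digits, width):
    """Shift binary digits into an accumulator that keeps only `width` bits.

    Assumption for illustration: bits shifted out of the top are silently
    discarded, and the result is then read as a signed value of that width.
    """
    mask = (1 << width) - 1
    value = 0
    for ch in digits:
        if ch not in "01":
            raise ValueError("not a binary digit: " + repr(ch))
        value = ((value << 1) | int(ch)) & mask
    if value >> (width - 1):      # top bit set, so treat as negative
        value -= 1 << width
    return value


thirty_three_bits = "1" + "0" * 32

print(parse_binary(thirty_three_bits, 32))
print(parse_binary(thirty_three_bits, 64))
```

Under these assumptions the same 33 bit literal that quietly evaluated to one value in a 32 bit accumulator becomes a different, much larger value once the accumulator is 64 bits wide, which is the compatibility hazard being described.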
nemo (145) 2529 posts |
PRM5-15:
Now you might quibble about the word “module” there, but the fact that the Filer is explicitly cited means user mode so the module bit is irrelevant. |
nemo (145) 2529 posts |
tymaja said
The fact is that ARGP is part of the API of CALL, USR and OSCLI. So one has to maintain that. There is code that accesses A%-Z% by address, and &8300 is commonly used as a program buffer. So although you can do whatever you want in
Actually a problem arises with plain old 32b constants – as I’ve asked before, what is the value of &FFFFFFFF? Is it -1 or 4 billion? |
David J. Ruck (33) 1629 posts |
In 64 bit it’s 4 billion, &FFFFFFFFFFFFFFFF is -1 |
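The two readings of the same constant are plain two’s complement arithmetic, sketched here in Python. The helper name `as_signed` is made up for the example and is not part of any BASIC.

```python
def as_signed(value, bits):
    """Interpret the low `bits` bits of value as a two's complement integer."""
    value &= (1 << bits) - 1
    if value >> (bits - 1):       # sign bit set
        return value - (1 << bits)
    return value


print(as_signed(0xFFFFFFFF, 32))
print(as_signed(0xFFFFFFFF, 64))
print(as_signed(0xFFFFFFFFFFFFFFFF, 64))
```

Read as a 32 bit value, &FFFFFFFF is -1; read as a 64 bit value the same digits are 4294967295, and it takes sixteen F digits to get back to -1.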
tymaja (278) 172 posts |
I am maintaining ARGP, just not necessarily at &8700; (CALL, USR and OSCLI just set R8 or R2 to the value of ARGP anyway, so the only software I will break there will be software that is passed ARGP in R8 (or R2), but then decides to just use &8700 regardless!) I will break software that addresses A% to Z% directly (unless they cross-reference to ARGP). Software that uses FREELIST (or OUTPUT?) as temporary workspace may fail as well, although such software can likely be made to work again by: - setting BASIC to start with string accumulators, FREELIST, ARGP, VCACHE, in the expected places |
tymaja (278) 172 posts |
Agreed! I’m going with ‘signed int 64’ for BASIC – we always managed with 32 bit signed ints, and operators such as & can be used in places where we want direct ‘bit’ access to the integer |
nemo (145) 2529 posts |
So this perfectly normal bit of Basic would now throw an error?
You’ll note that RTR’s BBfW and BBfSDL have an option to control this, and the default is 32b for obvious reasons: And BBfW doesn’t have to do |
David J. Ruck (33) 1629 posts |
! is a 32 bit indirection operator so should still work. You would need a new indirection operator for 64 bit values; you can’t use !! of course. |
tymaja (278) 172 posts |
£ ….. it could get messy! What happens with !! (will try that later today)? Some symbols that are ‘rarely used’ are #@[]£; the pound symbol would annoy anyone without a UK keyboard. The assembler brackets could be used in some way … we have so little symbol ‘real estate’ left, devoting a new symbol only for ‘64 bit’ could be a challenge. Looking at AArch64 as an example, it is kind of a hybrid 32/64 bit CPU, in that 32/64 bit load/store/data processing can be selected pretty much on a per-instruction basis. To even approach that level of flexibility in BASIC presents some challenges – but we really do need to allow 32 and 64 bit access ‘at will’ from BASIC code… !64. <—- square brackets here, but forum software changes them to something else |
Simon Willcocks (1499) 509 posts |
!!X would presumably read the word pointed to by the word pointed to by X. |
Rick Murray (539) 13806 posts |
>DIM a% 4
>DIM b% 4
>DIM c% 4
>$a% = "RICK"
>b%!0 = a%
>!c% = b%
>PRINT ~!a%, ~!!b%, ~!!!c%
  4B434952  4B434952  4B434952
>
;)
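The chained indirection in that session can be modelled in a few lines of Python. This is only a toy: the flat `memory` bytearray, the helper names and the three addresses standing in for the DIM blocks are all invented for the example, and words are assumed to be 32 bit little-endian as on ARM.

```python
import struct

memory = bytearray(64)            # toy address space


def peek_word(addr):
    """Model of the ! operator: read a 32 bit little-endian word."""
    return struct.unpack_from("<I", memory, addr)[0]


def poke_word(addr, value):
    """Model of assigning through !: write a 32 bit little-endian word."""
    struct.pack_into("<I", memory, addr, value & 0xFFFFFFFF)


a, b, c = 16, 24, 32              # hypothetical addresses for the DIM blocks

memory[a:a + 5] = b"RICK\r"       # $a% = "RICK" stores the text plus a CR
poke_word(b, a)                   # b%!0 = a%
poke_word(c, b)                   # !c% = b%

first = peek_word(a)                          # !a%
second = peek_word(peek_word(b))              # !!b%
third = peek_word(peek_word(peek_word(c)))    # !!!c%

print(f"{first:X} {second:X} {third:X}")
```

All three reads land on the same word, the bytes of "RICK" read little-endian, which matches the 4B434952 printed by BASIC.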
Yes, footnotes 1. You can wrap things you don’t want messed up in
Use While we’re at it, using $$ is the same as the $ indirection operator, only it assumed the string will be null terminated. Like: SYS "OS_GetEnv" TO cmdline% cmdline$ = $$cmdline% That would be a great addition, we can finally say goodbye to all that XOS_GenerateError nonsense. 1 Like this. |
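The difference between the two string indirections mentioned here comes down to which terminator byte ends the read. A minimal Python sketch, assuming `$` stops at a carriage return and `$$` behaves as described in the post (stopping at a zero byte); the helper name and the sample buffer are made up for illustration.

```python
def read_string(memory, addr, terminator):
    """Read bytes from addr up to, but not including, the terminator byte."""
    end = memory.index(terminator, addr)
    return memory[addr:end].decode("latin-1")


# A buffer holding a CR in the middle and a NUL at the end.
memory = bytearray(b"*BASIC\r-quit\x00")

cr_terminated = read_string(memory, 0, b"\r")     # $-style read
nul_terminated = read_string(memory, 0, b"\x00")  # $$-style read

print(repr(cr_terminated))
print(repr(nul_terminated))
```

The CR-terminated read stops at the first carriage return, while the NUL-terminated read carries on through it to the zero byte, which is the behaviour wanted for strings handed back by SWIs such as OS_GetEnv.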
nemo (145) 2529 posts |
druck suggested
You’ve totally missed the point. I think everyone else has too or is glossing over it because it’s inconvenient. Rick asked
Already syntactic –
Terrible decision isn’t it. Incompatible with so much stuff. Certainly incompatible with lots of nemoBasic syntax – string slicing, bit slicing, assembler dialects. And also you just can’t use an orphaned bracket as an operator. It’s a crime against humanity. As I said earlier, I don’t have good answers. When I implemented 64b ints years ago I hit all these problems (and solved some of them), but gave up when I realised that with the 40b floats nemoBasic uses the lack of round-trip made it impractical to continue. CASE will kill you. This is the syntax I chose which is 100% backwards compatible, unambiguous and completely logical. But note that I also have a parametric I have this functionality switched off because it’s too much bother. eg Although you can I’d urge you to think more deeply about the implications of implicit casting in BBC Basic. |
nemo (145) 2529 posts |
@Rick
Actually, more recently I’ve added the BGET and BPUT serialisation functionality, which could trivially support 64b int. D’oh. I’m an idiot. I’ll retrofit that. Hence:
I’d like the “25” to be “64” but it’s just an enum: |
Rick Murray (539) 13806 posts |
Rick asked No, Rick answered. Tymaja asked. ;)
Steady on now!
I’m still waiting to see when somebody might realise that &FFFFFFFFFFFFFFFF should equal &FFFFFFFF, because while they’re both extremely not the same, in their respective types they’d both be -1/TRUE. |
David J. Ruck (33) 1629 posts |
That means you need to make the point better!
I’ll agree with you a hanging bracket should not be used under any circumstance in any known universe. Brackets come in PAIRS. I don’t like £ either, as people are too used to the normal meaning of £1234. There must be another suitable character which is not already used for something else. |
nemo (145) 2529 posts |
Actual Rick actually said
At least you get the point, though you’ve not looked closely at my implementation. In fact:
In a language with both 32b and 64b integers and automatic type coercion, the interpreter cannot know what kind of casting is appropriate in general. Hence my repeated example:
Rick gets it – the above has always put -1 into m. The existence of 64b ints cannot suddenly turn that into “Number too big”. And yet if That is the point. Automatic type coercion is incompatible with variable-sized ints. So “&FFFFFFFF” has to remain a 32b int. 40 years of BBC Basic requires it. RTR dodges the bullet by:
My solution is better. IMHO. |
nemo (145) 2529 posts |
@druck Perhaps it would help to think of “ So the solution is for the expression evaluator to support both sizes of value, and every keyword, operator and function must then do the appropriate form of casting where necessary. And where explicit casting is required (and it will be): |
David J. Ruck (33) 1629 posts |
The usual way of doing it is how I stated: the type of the left hand side of the equation determines how the right hand side is interpreted.
|
Dave Higton (1515) 3497 posts |
It strikes me that has the arguments the wrong way round. expr should be the first. Compare with LEFT$, RIGHT$, MID$.
|
Rick Murray (539) 13806 posts |
Well, nobody has, unless it’s somewhere here that I failed to spot? https://sites.google.com/view/nemo20000/ Nice work bringing OSASCI to life once again. I wonder why it was missed? Certainly it would have been easier than using OS_WriteX (whatever form chosen) followed by OS_NewLine, to have the OS be smart enough to figure this out for itself. Interesting comments regarding OSWrch. Hmm, didn’t Jeffrey make a big diagram to explain Wrch that was complexly hellish to follow?
I fully agree, but we’re late to the party and a version of BBC BASIC uses an annoying single bit of bracket so… (though, as nemo points out, the other BASIC has caveats, because surprise! shoehorning an extra integer type into BASIC has “side effects”)
Hmm… Hash is used with files, £ is a currency, @ is a variable already, & has meaning, +-*/^ are maths, %$ are already used… and brackets should be frowned upon. |