BBC BASIC - 64 bit integer support, long string support

145 posts, 21 voices

Jul 31, 2024 1:49pm Rick Murray (539) 13839 posts	Well yes but that’s not API. It’s what is described as a filename/filespec in book 2 of the PRMs. It’s what appears on screen, is used in the command line, and really you wouldn’t get that far trying to hand the OS DOS style paths… ;)

Whereas b3-5 (so far) of the machine type returned from OS_Byte,0 definitely is API:

For RISC OS? Where is this documented as applying at all to RISC OS? It simply specifies “all RISC OS machines return 6” with no mention at all about any additional interpretation of the bits, and if other systems with other values did other things, that pretty much applies to those other systems. That you could hook into vectors and do it on RISC OS as well is due to the extreme flexibility of the system, but you shouldn’t be surprised if stuff blows up. ;)

and that starts at a [ statement possibly hundreds of lines earlier.

Why? If you are at an IF and you want to find the ENDIF, given that IF is a BASIC construct, not an assembler one, surely you only need to scan forwards to look for [ and not anything prior?

ENDWHILE is a two-byte token but Sophie only checked one, meaning it mistakes ACS

Yeah, I ran into that one a long time ago. A facepalm moment for me when I was like “oh, so that’s why it didn’t work”. Might even have been you that pointed out the bug.

coupled with the scratch space (which is a defined part of the RISC OS API

It’s a bit wild west though, isn’t it? Or does FileSwitch not ralph into it these days?

Jul 31, 2024 5:06pm Steve Fryatt (216) 2105 posts	If the line ends ;“<&8C> is that the comment ‘;”… It’s been a while since I looked at this stuff, but it doesn’t even need to be an ellipsis in the assembler, does it? Given that the tokeniser has no idea that the assembler even exists, ISTR that a commenting style like `; IF I DO THIS, THEN ; BAD THINGS MIGHT HAPPEN` within assembly code could also leave you with a THEN token at the end of a line.

Jul 31, 2024 5:32pm nemo (145) 2545 posts	Gavin said

…interpreter, which is so different from that of most other interpreters nowadays

Indeed, and in stark contrast to PostScript for example, which operates in a tight execution loop:

- Parse input to create an object to push onto the execution stack
- Execute the contents of the execution stack
- repeat

Other “interpreted” languages are actually [semi-]compiled – eg Python; sometimes to a P-code or VM – eg OPL, Java. But Basic is fully goldfish-memory interpreted at all times.

The reason the BBC Basic interpreter works this way is nowt to do with the ARM – it’s because it has always worked this way. Much of it dates back to the Atom, don’t forget: It had to be small, not fast.

The SCT does its best to mitigate the constant overhead, but a small elective cache is overwhelmed by large programs – it just wasn’t designed for the memory sizes that exist now, and once memory becomes plentiful the argument for a cache over a store changes dramatically. ie No, don’t just make the SCT bigger. That really doesn’t help much in isolation. “If you want an algorithm to go fast, choose the fast algorithm”.

So the interpreter is most like a Choose Your Own Adventure™ book – except you don’t get thrown in the river by the mafia or eaten by a dinosaur and have to turn back to the page you kept your finger in. [am I the only person who used to hack books?]

Rick remarked

It’s what is described as a filename/filespec

My point is although you can intercept vectors and do what you want, how is anything expected to know what you’ve chosen? There’d be no API if that’s all you did. However, if you also modified the machine type you’d be strictly compliant. Not that it’d help but you take the API point.

surely you only need to scan forwards to look for [ and not anything prior?
<points at the very clear diagram> OK I’ll try to be more clear:

1. IF…THEN/mELSE/ENDIF almost always has to skip forwards in the program – either because the expression wasn’t truthy so it has to find the matching mELSE (if present, or ENDIF otherwise); or because it has finished executing the truthy bit and hit mELSE, so has to skip to ENDIF.
2. mELSE and ENDIF have to be at the start of the line. This is for performance reasons:
3. When Basic needs to search for ENDIF (or mELSE) it immediately skips to the end of the line (by looking for CR). It then advances one whole line at a time checking for two things:
3a. Is the last byte on the previous line a THEN? If so, increment the nesting count.
3b. Is the first (non-space) byte on this line ENDIF? If so, decrement the nesting count and if negative, the matching ENDIF has been found and execution continues.

So this is as fast as BBC Basic gets – it can skip from line to line trivially easily because each line contains a length byte. This is why your lines can only be 251 bytes long – there’s four bytes of gumf.

3a is where the bug is. It is not sufficient to check only the last byte, as the token for THEN happens to be the same as the character code for the ellipsis chr in Latin1. So if you have a REM ending in an ellipsis, the IF code thinks it’s a nested IF/THEN/ELSE/ENDIF and increments its nesting count. Which means it never finds the actual ENDIF, and falls off the end of the program.

Hopefully that’s made the bug clearer. How have I fixed it in nemoBasic?

3a. If the last byte on the previous line is a THEN token then look for a preceding REM token in the line. If REM, ignore it. Else increment the nesting count.

This works for the vast majority of cases. But as I explained previously, there are also assembly comments, and while you can recognise a REM token unambiguously you cannot tell 1) if it applies, or 2) whether you need to consider `\` and `;` too by only looking at that one line.
eg off the top of my head:

`=0:REM that's a byte:]:IF valid THEN`
`=0:REM that's an FN:]:IF comment THEN`

Static analysis of individual lines is impossible in BBC Basic, which is why the Basic tokeniser gets stuff wrong in assembly which the assembler has to work around – because the tokeniser cannot know it is in assembly.

So the only way for IF to be sure that the byte on the end of the line is actually THEN would be to check every single line it currently skips looking for a [ or ] and toggling the “I’m inside assembly so comments are more complicated” flag. Uggh.

does FileSwitch

The scratch space is available for use as long as you don’t call a SWI that uses it (or, because it predates it, if you’re running in a TaskWindow). “How do you know whether” is of course a question for the documentation, so it’s a shame we don’t have any for RISC OS.
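
[Editor's illustration] The line-by-line ENDIF scan in steps 3, 3a and 3b above, and the REM check described as the nemoBasic fix, can be modelled in a few lines of Python. This is only a sketch under stated assumptions, not the real interpreter code: a program is modelled as a list of tokenised line bodies, the function name `find_endif` and the `rem_aware` flag are made up for the example, and the line containing the opening IF is simply excluded from the nesting count. THEN = &8C follows from the thread; the REM and ENDIF token values are assumed.

```python
# Token values. THEN = &8C is implied by the thread (it collides with the
# Latin1 ellipsis); the REM and ENDIF values are assumptions for this sketch.
THEN, REM, ENDIF = 0x8C, 0xF4, 0xCD


def find_endif(lines, start, rem_aware=False):
    """Return the index of the line holding the matching ENDIF, or None.

    lines: tokenised line bodies as bytes (no line header, no CR).
    start: index of the line containing the multi-line IF ... THEN.
    rem_aware: apply the "ignore THEN if a REM token precedes it" check.
    """
    depth = 0
    for i in range(start + 1, len(lines)):
        prev = lines[i - 1]
        # 3a: does the previous line end in a THEN byte? (The opening IF
        # line itself is not counted in this simplified model.)
        if i - 1 > start and prev.endswith(bytes([THEN])):
            if not (rem_aware and REM in prev[:-1]):
                depth += 1
        # 3b: is the first non-space byte on this line ENDIF?
        if lines[i].lstrip(b" ")[:1] == bytes([ENDIF]):
            depth -= 1
            if depth < 0:
                return i
    return None  # "falls off the end of the program"


program = [
    b"IF x " + bytes([THEN]),               # 0: multi-line IF
    bytes([REM]) + b" more to do\x8c",      # 1: REM ending in an ellipsis
    b"PRINT 1",                             # 2
    bytes([ENDIF]),                         # 3: the ENDIF we want
]

print(find_endif(program, 0))                  # original behaviour
print(find_endif(program, 0, rem_aware=True))  # with the REM check
```

With the single-byte check the comment line is counted as a nested IF, so the scan runs off the end; with the REM check the ENDIF on line 3 is found. As the post says, this still does not cover `;` and `\` assembler comments, because one line alone cannot tell you whether you are inside `[` ... `]`.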

Jul 31, 2024 5:49pm nemo (145) 2545 posts	Steve confirmed ; IF I DO THIS, THEN Yes and no: Yes via the interpreter’s tokeniser, for `;` and `\`. Not via Zap, for `; THEN` and `\ THEN`. But for all cases if ellipsis (if you have a fancy keyboard that can enter it and a fancy memory that recalls how). As it happens, it appears I’m much more likely to end a comment in an ellipsis than a SHOUTY THEN, because unlike some bugs that I identified by code inspection, this one actually caught me in the wild. I have a build switch that makes `;` a to-the-next-colon comment everywhere to mitigate this problem, but have not switched it on. Now I store the result of the ENDIF scan I can afford to check every byte, as WHILEFALSE has to. <bullet byte>

Jul 31, 2024 7:19pm Rick Murray (539) 13839 posts	However, if you also modified the machine type you’d be strictly compliant. That’s a painfully obscure way of being “compliant”, akin to a sledgehammer and a skull for dealing with a headache. Yes, it would work, but the side effects make it absolutely not a recommended course of action. OK I’ll try to be more clear I wasn’t having issues with all of the hoops to jump through to find the matching clause, it was the idea that it would need to scan every line that made my brain hiccup. This is BASIC, it only needs to look forwards… =0:REM that’s a byte:]:IF valid THEN Thankfully I’ve never been bitten by that. But, then, I’m allergic to stuffing loads of lines together with the colon… <- look, an ellipsis! is of course a question for the documentation, so it’s a shame we don’t have any for RISC OS. <nods slowly> Sadly the PRM is lacking in certain places. But, then, it is described in the available (December 1992) PRM1 as “System workspace”. I’m sure, somewhere, there were some notes about the use of scratch space, but I can’t find that in my PDF PRMs. Did it get retconned out, or was it in something earlier like the Arthur PRM? 🤷🏻‍♀️

Jul 31, 2024 8:38pm tymaja (278) 174 posts	“You’re getting further and further from compatibility like that” How so? (I do have the STRACC,OUTPUT,ERRORS in the same place, and 256 bytes, but I can move them anywhere, and change their length, dynamically as well). I am guessing there is software that attempts to access them in the ‘hardwired’ &8000-&8700 locations? If so … I bet there is software that directly accesses A%-Z% directly also, and software that assumes ARGP is always &8700, maybe even software that accesses VCACHE directly? My aim is to ‘extend’ BASIC’s functionality, while trying not to break stuff unnecessarily; making the string accumulators, ARGP, and VCACHE ‘mobile’ allows future expansion without unnecessarily breaking stuff; I can set the accumulators at &8000 up, 256/256/512 bytes, set ARGP to &8700, and put VCache at (ARGP+0), so if I find software that needs that, then there is no problem, and software that needs extra features will be able to ‘ask’ for them – I can change STRACC + related length while running, and can even move ARGP when running (technically); I have four memory areas : (string accumulators), (ARGP workspace), (VCACHE), (everything else / LOMEM/PAGE to HIMEM); these can be set up in the same places as original BASIC, or can be moved! I am breaking compatibility in a few ways – 10 byte floats don’t fit into 5 bytes being one example. However, I do intend to ensure that the ‘official’ entry points remain compatible (including those that can be accessed from machine code called from BASIC), which will require some extra code for stuff like float access, when needed. 
I am keeping compatibility from the viewpoint of a running BASIC program, but some stuff has to be broken – one example is that you can enter binary %10101010101 longer than 32 bits (in unmodified BBC BASIC), and it doesn’t complain (the BININ functions shows why) – however a problem arises with 64 bit ints : BININ needs to be able to be used above 32 bits, but there may be software out there where someone accidentally put a 33 bit binary number in the code, but didn’t realise as it still worked fine – such software could fail in a BASIC where BININ supports 1-64 bits; there could even be software out there that uses % % to deliberately invoke a specialised error handler that they wrote 😬 I will maintain compatibility with existing BASIC programs as much as possible (the file format remains the same, no new tokens, line numbers remain with the same limits, the machine code interface will remain compatible, and many of the ‘quirks’ of the interpreter will remain unchanged)

Jul 31, 2024 8:39pm nemo (145) 2545 posts	I’m sure, somewhere, there were some notes about the use of scratch space PRM5-15: `The public¹ area may be used by any module that is not • used in an IRQ routine • used if you call something else that might also use it An example client would be FileCore using the scratch space to hold structures while working out how to allocate some free space. Another example would be the Filer using the scratch space to hold structures for OS_HeapSort. [¹ is Scratch space &4000-&7FFF]` Now you might quibble about the word “module” there, but the fact that the Filer is explicitly cited means user mode so the module bit is irrelevant.

Jul 31, 2024 8:47pm nemo (145) 2545 posts	tymaja said I bet there is software that directly accesses A%-Z% directly also, and software that assumes ARGP is always &8700, maybe even software that accesses VCACHE directly? The fact is that ARGP is part of the API of CALL, USR and OSCLI. So one has to maintain that. There is code that accesses A%-Z% by address, and &8300 is commonly used as a program buffer. So although you can do whatever you want in `*tymajaBasic`, I’m just flagging up where it deviates from BBC Basic and hence will be automatically incompatible with some existing code, potentially in an explosive way. a problem arises with 64 bit ints Actually a problem arises with plain old 32b constants – as I’ve asked before, what is the value of &FFFFFFFF? Is it -1 or 4 billion?

Jul 31, 2024 9:03pm David J. Ruck (33) 1635 posts	In 64 bit its 4 billion, &FFFFFFFFFFFFFFFF is -1

Jul 31, 2024 9:17pm tymaja (278) 174 posts	“The fact is that ARGP is part of the API of CALL, USR and OSCLI. So one has to maintain that. There is code that accesses A%-Z% by address, and &8300 is commonly used as a program buffer. So although you can do whatever you want in tymajaBasic, I’m just flagging up where it deviates from BBC Basic and hence will be automatically incompatible with some existing code, potentially in an explosive way.”

I am maintaining ARGP, just not necessarily at &8700 (CALL, USR and OSCLI just set R8 or R2 to the value of ARGP anyway, so the only software I will break there will be software that is passed ARGP in R8 (or R2), but then decides to just use &8700 regardless)! I will break software that addresses A% to Z% directly (unless they cross-reference to ARGP).

Software that uses FREELIST (or OUTPUT?) as temporary workspace may fail as well, although such software can likely be made to work again by:

- setting BASIC to start with string accumulators, FREELIST, ARGP, VCACHE, in the expected places
- programs can use SYS LOCAL to extend functionality if they want to
- when extending functionality, it would be easy to leave &8300 unused anyway
- I am not too worried about software accessing A%-Z% directly. Compatibility for BASIC programs doing this could be added in a similar way to BASIC V and BBC Micro MOS calls;

Jul 31, 2024 9:20pm tymaja (278) 174 posts	“In 64 bit its 4 billion, &FFFFFFFFFFFFFFFF is -1” Agreed! I’m going with ‘signed int 64’ for BASIC – we always managed with 32 bit signed ints, and using operators such as & can be used in places where we want direct ‘bit’ access to the integer

Aug 1, 2024 12:18am nemo (145) 2545 posts	`In 64 bit its 4 billion` So this perfectly normal bit of Basic would now throw an error? `!m=&FFFFFFFF` You’ll note that RTR’s BBfW and BBfSDL has an option to control this, and the default is 32b for obvious reasons: And BBfW doesn’t have to do `SYS...TO`!

Aug 1, 2024 6:58am David J. Ruck (33) 1635 posts	! is a 32 bit indirection operator so should still work, you would need a new redirection operator for 64 bit values, you cant use !! of course.

Aug 1, 2024 6:20pm tymaja (278) 174 posts	“! is a 32 bit indirection operator so should still work, you would need a new redirection operator for 64 bit values. you cant use !! of course.” £ ….. it could get messy! What happens with !! (will try that later today)? some symbols that are ‘rarely used’ are #@[]£; the pound symbol would annoy anyone without a UK keyboard. The assembler brackets could be used in some way … we have so little symbol ‘real estate’ left, devoting a new symbol only for ‘64 bit’ could be a challenge. Looking at Aarch64 as an example, it is kind of a hybrid 32/64 bit CPU, in that 32/64 bit load/store/data processing can be selected pretty much on a per-instruction basis. To even approach that level of flexibility in BASIC presents some challenges – but we really do need to allow 32 and 64 bit access ‘at will’ from BASIC code… !⁶⁴. <—- square brackets here, but forum software changes them to something else !¹²⁸ !{64} !{128} …

Aug 1, 2024 7:43pm Simon Willcocks (1499) 513 posts	!!X would presumably read the word pointed to by the word pointed to by X.

Aug 1, 2024 8:04pm Rick Murray (539) 13839 posts	!!X would presumably read the word pointed to by the word pointed to by X.

>DIM a% 4
>DIM b% 4
>DIM c% 4
>$a% = "RICK"
>b%!0 = a%
>!c% = b%
>PRINT ~!a%, ~!!b%, ~!!!c%
4B434952 4B434952 4B434952
>

;)

square brackets here, but forum software changes them to something else

Yes, footnotes ¹. You can wrap things you don’t want messed up in `<notextile> ... </notextile>` tags, so you can have all the square brackets you like [1] [2] [3].

some symbols that are ‘rarely used’ are #@[]£

Use `]`. Why? Because it’s what BBfW uses. I’m sure Russell has already had all of these discussions and he chose `]`, so we should do something different only if there’s a very pressing reason to do so…

While we’re at it, using $$ is the same as the $ indirection operator, only it assumes the string will be null terminated. Like:

SYS "OS_GetEnv" TO cmdline%
cmdline$ = $$cmdline%

That would be a great addition, we can finally say goodbye to all that XOS_GenerateError nonsense.

¹ Like this.
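
[Editor's illustration] For readers without a RISC OS machine to hand, the chained `!` session above can be modelled in Python with a byte array standing in for memory. This is a rough sketch only: the helper names (`peek`, `poke`, `put_string`, `get_cstring`) and the toy addresses are invented for the example, words are read little-endian as on ARM, `$` is modelled as writing a CR-terminated string, and `$$` as reading up to a NUL, as described in the post.

```python
import struct

mem = bytearray(64)  # toy address space; an "address" is just an offset


def peek(addr):
    """Model unary !addr: read a 32-bit little-endian word."""
    return struct.unpack_from("<I", mem, addr)[0]


def poke(addr, value):
    """Model !addr = value: write a 32-bit little-endian word."""
    struct.pack_into("<I", mem, addr, value & 0xFFFFFFFF)


def put_string(addr, text):
    """Model $addr = text: store the string followed by CR."""
    data = text.encode("latin-1") + b"\r"
    mem[addr:addr + len(data)] = data


def get_cstring(addr):
    """Model $$addr: read a NUL-terminated string."""
    end = mem.find(b"\0", addr)
    return mem[addr:end].decode("latin-1")


a, b, c = 0, 8, 16          # stand-ins for the three DIMmed blocks
put_string(a, "RICK")       # $a% = "RICK"
poke(b, a)                  # b%!0 = a%
poke(c, b)                  # !c% = b%

# PRINT ~!a%, ~!!b%, ~!!!c%
print(format(peek(a), "X"),
      format(peek(peek(b)), "X"),
      format(peek(peek(peek(c))), "X"))

mem[24:30] = b"hello\0"     # a NUL-terminated string, as a SWI might return
print(get_cstring(24))
```

Each extra `!` is just one more word-sized dereference, which is why all three expressions land on the same four bytes of "RICK".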

Aug 2, 2024 11:46am nemo (145) 2545 posts	druck suggested

! is a 32 bit indirection operator so should still work

You’ve totally missed the point. I think everyone else has too or is glossing over it because it’s inconvenient.

Rick asked

What happens with !!

Already syntactic – `a!!b` means “add the 32b integer at b to a and return the 32b integer at that address”.

Use ]

Terrible decision isn’t it. Incompatible with so much stuff. Certainly incompatible with lots of nemoBasic syntax – string slicing, bit slicing, assembler dialects. And also you just can’t use an orphaned bracket as an operator. It’s a crime against humanity.

As I said earlier, I don’t have good answers. When I implemented 64b ints years ago I hit all these problems (and solved some of them), but gave up when I realised that with the 40b floats nemoBasic uses the lack of round-trip made it impractical to continue. CASE will kill you.

This is the syntax I chose which is 100% backwards compatible, unambiguous and completely logical. But note that I also have a parametric `INT` which allows the precise control over casting which is absolutely required when you have the deadly combination of both 32b and 64b ints with automatic type coercion, as BBC Basic does – this is the point many of you continue to miss. I have this functionality switched off because it’s too much bother.

eg Although you can `SYSnum,a%%TOb%%` by spreading across two registers, it’s madness to try it because `SYSnum,c%+2` might do the same thing. I’d urge you to think more deeply about the implications of implicit casting in BBC Basic.

Aug 2, 2024 12:07pm nemo (145) 2545 posts	@Rick

!!

Actually, more recently I’ve added the BGET and BPUT serialisation functionality, which could trivially support 64b int. D’oh. I’m an idiot. I’ll retrofit that. Hence:

`BPUT(mem%,25,a%%,b%%,c%%)`
`a%%=BGET(mem%,25)`

I’d like the “25” to be “64” but it’s just an enum:

Aug 2, 2024 12:14pm Rick Murray (539) 13839 posts	Rick asked What happens with !! No, Rick answered. Tymaja asked. ;) It’s a crime against humanity. Steady on now! this is the point many of you continue to miss. I’m still waiting to see when somebody might realise that &FFFFFFFFFFFFFFFF should equal &FFFFFFFF, because while they’re both extremely not the same, in their respective types they’d both be -1/TRUE. <grabs a bag of popcorn>

Aug 2, 2024 12:40pm David J. Ruck (33) 1635 posts	You’ve totally missed the point. I think everyone else has too or is glossing over it because it’s inconvenient. That means you need to make the point better! It’s a crime against humanity. I’ll agree with you a hanging bracket should not be used under any circumstance in any known universe. Brackets come in PAIRS. I don’t like £ either, as people are too used to the normal meaning of £1234. There must be another suitable character which is not already used for something else.

Aug 2, 2024 1:30pm nemo (145) 2545 posts	Actual Rick actually said

I’m still waiting to see when somebody might realise that &FFFFFFFFFFFFFFFF should equal &FFFFFFFF

At least you get the point, though you’ve not looked closely at my implementation. In fact:

you need to make the point better

In a language with both 32b and 64b integers and automatic type coercion, the interpreter cannot know what kind of casting is appropriate in general. Hence my repeated example: `!m=&FFFFFFFF`

Rick gets it – the above has always put -1 into m. The existence of 64b ints cannot suddenly turn that into “Number too big”. And yet if `a=(1<<30)*2` then `!m=a` absolutely must generate “Number too big” as it always has. Resolve bit 31!

That is the point. Automatic type coercion is incompatible with variable-sized ints. So “&FFFFFFFF” has to remain a 32b int. 40 years of BBC Basic requires it.

RTR dodges the bullet by:

- BBfW isn’t the built-in RISC OS BBC Basic
- It defaults to the existing 32b behaviour (but 64b then doesn’t work)
- Switching on 64b allows big ints, but the “Number too big” problem is introduced

My solution is better. IMHO.
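
[Editor's illustration] The bit 31 problem can be shown with a small Python model. It is a sketch of the argument only, not of nemoBasic or BBfW: the function names are invented, a hex constant is evaluated as a signed integer of a chosen width, and a word store (`!m=value`) rejects anything outside the signed 32-bit range with “Number too big”.

```python
def to_signed(value, bits):
    """Reinterpret the low `bits` bits of value as two's complement."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value >> (bits - 1) else value


def hex_const(text, const_bits):
    """Evaluate a hex constant (digits only, no '&') at a given width."""
    return to_signed(int(text, 16), const_bits)


def store_word(value):
    """Model `!m = value`: only a signed 32-bit value may be stored."""
    if not -(1 << 31) <= value < (1 << 31):
        raise OverflowError("Number too big")
    return value & 0xFFFFFFFF  # the bit pattern written to memory


# Constant treated as a 32b int: the store works, as it always has.
print(store_word(hex_const("FFFFFFFF", 32)))

# Same constant treated as a 64b int: the value is now about 4 billion,
# so the very same statement fails.
try:
    store_word(hex_const("FFFFFFFF", 64))
except OverflowError as err:
    print(err)

# A computed 2^31 must still be rejected in both worlds.
try:
    store_word((1 << 30) * 2)
except OverflowError as err:
    print(err)
```

The store cannot tell a constant that "meant" -1 from a computed value that really is too big unless the value carries its width with it, which is the argument for giving “&FFFFFFFF” a type.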

Aug 2, 2024 1:41pm nemo (145) 2545 posts	@druck Perhaps it would help to think of “`&FFFFFFFF`” as having a type – because it does. If you have two types of integer storage but only one type of integer value, bad things happen. So the solution is for the expression evaluator to support both sizes of value, and every keyword, operator and function must then do the appropriate form of casting where necessary. And where explicit casting is required (and it will be):

Aug 2, 2024 7:27pm David J. Ruck (33) 1635 posts	The usual way of doing it is how I stated: the type of the left hand side of the equation determines how the right hand side is interpreted.

`var32 = &FFFFFFFF: REM will be -1`
`var64 = &FFFFFFFF: REM will be 4294967295`

Aug 2, 2024 8:26pm Dave Higton (1515) 3525 posts	It strikes me that `INT(mode, expr)` has the arguments the wrong way round. expr should be the first. Compare with LEFT$, RIGHT$, MID$.

Aug 2, 2024 8:53pm Rick Murray (539) 13839 posts	though you’ve not looked closely at my implementation

Well, nobody has, unless it’s somewhere here that I failed to spot? https://sites.google.com/view/nemo20000/

Nice work bringing OSASCI to life once again. I wonder why it was missed? Certainly it would have been easier than using OS_WriteX (whatever form chosen) followed by OS_NewLine, to have the OS be smart enough to figure this out for itself. …or is this something PrettyPrint can do?

Interesting comments regarding OSWrch. Hmm, didn’t Jeffrey make a big diagram to explain Wrch that was complexly hellish to follow?

Brackets come in PAIRS.

I fully agree, but we’re late to the party and a version of BBC BASIC uses an annoying single bit of bracket so… (though, as nemo points out, the other BASIC has caveats, because surprise! shoehorning an extra integer type into BASIC has “side effects”)

There must be another suitable character which is not already used for something else.

Hmm… Hash is used with files, £ is a currency, @ is a variable already, & has meaning, +-*/^ are maths, %$ are already used… and brackets should be frowned upon. Backslash, that’s an option for assembler comments isn’t it? Hmm, we’re rapidly running out of keyboard-accessible punctuation.