BBC BASIC - 64 bit integer support, long string support

145 posts, 21 voices


Aug 2, 2024 9:14pm tymaja (278) 174 posts	An idea I am toying with is to make the ! operator variable-size without adding any more ‘special symbols’, by adding a new syntax which hopefully wouldn’t clash with other syntaxes. At present, my idea would be as follows:
- A%!B% = data% (or data%% etc): stores the value of this variable to the 32-bit word at address (A% + B%)
- A%![X]B% = data%%: stores the value of data%% to the 64-bit doubleword at address (A% + B%)
- A%![H]B% = data%%: stores the value of data%% to the 16-bit halfword at address (A% + B%)
- A%![W]B% = data%%: stores the value of data%% to the 32-bit word at address (A% + B%)
- A%![B]B% = data%%: stores the value of data%% to the 8-bit byte at address (A% + B%)
- A%![ 64 ]B% = data%%: stores data%% to the 64-bit doubleword at address (A% + B%) (can use 32, 16, 8 etc)
This could even be extended to:
- data% = A%![SB]B%: loads the byte from address (A% + B%), sign-extends it, and stores it into data%
- data%% = A%![SW]B%: loads the word from the address, sign-extends it, and stores it into data%%
I’ve started adding this functionality already, to test it and find glitches etc. I like this idea because:
- the interpreter only needs to check for a [ symbol after a !, once it finds a !. It can then run a small subroutine that parses the [..] statement to determine the size/type of load/store involved
- also, when deciding to enter the assembler, the interpreter only needs to ensure that a [ doesn’t have a ! before it, before switching to assembler
- when deciding to leave the assembler, on finding a ] while in asm mode, the interpreter can first do the usual checks that cover LDR R0,[R1,R2]; only if the ] isn’t part of an opcode does it jump to a subroutine to rule out the ![] construct. This means the code that decides, while in assembler mode, between ] (close assembler) and ] (part of a ![] construct) only runs when the assembler finds a ] that isn’t part of an opcode, so the assembler isn’t slowed down.
Thoughts on this? I think it is better than some ideas I had (where the interpreter ‘guesses’ the 32/64 length based on variable type(s)), because:
- it removes ambiguity for loading/storing different bit sizes
- it also keeps ! preserved as a 32-bit load/store operator, keeping compatibility with unmodified BBC BASIC
- it is invalid syntax in BBC BASIC, so will just crash if used on older versions of BASIC (which is good, kind of like how CIRCLEFILL would crash on a beeb!)
- it also means no overhead when using !, except when you want to access extended features, so won’t slow down older programs
- it can speed up BASIC programs, if sign extension is needed anywhere, or if two 32-bit !s can be replaced with a single ![ 64 ] / [X]
The idea could even be extended to floating point load/store, so that | works as expected (as it always has), but |[floating point format specifier] loads/stores the FP number at the address, in the specified format. I’ve started experimenting with the above to try to find issues. Any thoughts on this, such as why it might be a bad idea? (I think it shouldn’t clash with nemoBASIC syntax?) (I’m using ARM32/64 letter codes for size, so ![] B, H, W, X, Q for 8, 16, 32, 64, 128 bits; 128 is reserved for future expansion!)
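
The sized load/store semantics proposed in this post can be modelled outside BASIC. The following is a minimal Python sketch, not interpreter code: the names `WIDTHS`, `store` and `load` are hypothetical, memory is a plain `bytearray`, and little-endian byte order is assumed. Values wider than the target are assumed to be truncated on store; the post does not say whether they should raise an error instead.

```python
# Hypothetical model of the proposed ![size] operator; not real interpreter code.
WIDTHS = {"B": 8, "H": 16, "W": 32, "X": 64}   # letter codes from the post

def store(mem, addr, size, value):
    """Model of A%![size]B% = value: write the low `size` bits, little-endian."""
    bits = WIDTHS[size]
    masked = value & ((1 << bits) - 1)           # truncate (an assumption)
    mem[addr:addr + bits // 8] = masked.to_bytes(bits // 8, "little")

def load(mem, addr, size, signed=False):
    """Model of value = A%![size]B%; signed=True models the [SB]/[SW] forms."""
    nbytes = WIDTHS[size] // 8
    return int.from_bytes(mem[addr:addr + nbytes], "little", signed=signed)

mem = bytearray(16)
store(mem, 0, "X", -2)                  # one 64-bit store instead of two 32-bit ones
store(mem, 8, "B", 0x80)
print(load(mem, 0, "W"))                # low word, read unsigned
print(load(mem, 8, "B", signed=True))   # byte, sign-extended
```

The `signed` flag is where a sign-extending load would save work in BASIC: without it, the program has to test the top bit and subtract by hand.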

Aug 3, 2024 12:54am tymaja (278) 174 posts	Going further than this: if something like the above doesn’t have any dealbreaking faults I haven’t considered yet, then it could even be possible to do load/store at any bit offset, like this:
byte% = ![B.4]address%%
Here byte% is the byte at the 64-bit memory address address%%, starting 4 bits ‘forward’ (so ![B.0]address%% is just a longwinded way of saying ![B]address%%, or even ?address%%). Going further, it could even become:
thirteen_bits% = ![S13.7] address%%
Here thirteen_bits% is the (sign-extended) thirteen bits starting at address%% plus 7 bits ‘forward’. I will start with ![X], as the rest could be added later. However, this could be used to read from bitfields, and if optimised assembler is used for this function, it would end up a LOT faster than doing it all manually in BASIC. It could even benefit from the bitfield instruction in ARM64 if we did go 64-bit at some time in the future :)
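
As a rough illustration of the bit-offset idea, here is a minimal Python sketch. The function name `load_bits` is hypothetical, and the sketch assumes little-endian bytes with bit 0 as the least significant bit of the byte at the address. The post does not pin down bit numbering, so treat that as an assumption.

```python
def load_bits(mem, addr, offset, width, signed=False):
    """Model of ![<width>.<offset>]addr: read `width` bits starting `offset` bits past addr."""
    nbytes = (offset + width + 7) // 8            # bytes needed to cover the field
    raw = int.from_bytes(mem[addr:addr + nbytes], "little")
    field = (raw >> offset) & ((1 << width) - 1)
    if signed and field & (1 << (width - 1)):     # top bit of the field is set
        field -= 1 << width                       # sign-extend
    return field

mem = bytearray([0xF0, 0x0A, 0xFF, 0x00])
byte_at_4 = load_bits(mem, 0, 4, 8)               # like ![B.4]address%%
thirteen = load_bits(mem, 0, 7, 13, signed=True)  # like ![S13.7]address%%
```

Done manually in BASIC, the same extraction needs a load, a shift, a mask and a conditional subtract, which is the overhead the post hopes to move into assembler.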

Aug 3, 2024 10:42am nemo (145) 2545 posts	Dave suggested: “the arguments the wrong way round”
In fact MID$ etc have the params the wrong way round, and nB allows the more efficient order as well as retaining compatibility with the old ordering. There are many other keywords in nB that have become parametric, and where there’s a ‘mode’ or other specifier, it comes first. The reason is partly semantic but mainly practical – let’s compare the old and new order for LEFT$, first old:
• Evaluate the string expression (usually a simple string variable) which must be byte-copied to “the” STRACC buffer
• Check that there was a comma
• Need the length, but there’s the string in the way, so byte-copy it onto the stack
• Evaluate the length expression and fix it as an integer
• Byte-copy the string from the stack back to STRACC
• Update its length via the integer
Versus new:
• Evaluate length and fix to integer
• Push a single register onto the stack
• Evaluate the string, which byte-copies it to the STRACC
• Pull the single length register
• Update the length
Do you see? In the INT case it’s done for symmetry and semantic reasons, but aside from that it only involves pushing a single register instead of the float. Plus there are many function-keywords that can now be used as a statement and take a list of variables to affect, which obviously must be at the end.
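
The saving can be made concrete by counting byte copies in each ordering. This is a toy Python model of the steps described in this post, not the interpreter: the function names are hypothetical, and it only counts string bytes moved, ignoring expression evaluation and register pushes.

```python
def left_old_order(s, n):
    """LEFT$ with the string evaluated first (the original parameter order)."""
    copied = len(s)      # evaluate string: byte-copy into STRACC
    copied += len(s)     # string is in the way: byte-copy onto the stack
    # evaluate the length expression, fix to integer (no string bytes moved)
    copied += len(s)     # byte-copy the string from the stack back to STRACC
    return s[:n], copied

def left_new_order(n, s):
    """LEFT$ with the length evaluated first (the order described for nB)."""
    # evaluate length, push a single register (no string bytes moved)
    copied = len(s)      # evaluate string: byte-copy into STRACC
    # pull the length register and update the length
    return s[:n], copied

old_result, old_cost = left_old_order("HELLO WORLD", 5)
new_result, new_cost = left_new_order(5, "HELLO WORLD")
```

Under this model the old order moves three times as many string bytes as the new one for the same result.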

Aug 3, 2024 10:48am nemo (145) 2545 posts	druck continued: “The usual way of doing it, is how I stated, the type of the left hand side of the equation determines how the right hand side is interpreted.”
Nonsense. The expression evaluator has absolutely no idea what the purpose of the expression is. The idea that a function ending in “=&FFFFFFFF” has to know what purpose the returned value will ultimately be used for before the expression can be evaluated is absurd: `!m=FNfoo ... DEFFNfoo:=FNbar DEFFNbar:=FNbaz DEFFNbaz:=&FFFFFFFF` I think we’re done with this line of argument now. That’s simply not how interpreters work in general, and certainly not how BBC Basic works.

Aug 3, 2024 11:08am nemo (145) 2545 posts	Rick asked though you’ve not looked closely at my implementation Well, nobody has, unless it’s somewhere here that I failed to spot? Hidden in plain sight: OSASCI I also have a module called OSWrites that provides OS_Write16, OS_Write32, OS_WriteAlpha (for different-sized codepoints) and OS_WriteCtrl (for the how-many-times-have-you-had-to-write-it ctrl-terminated string print). It’s on my get-around-to-releasing-this-you-lazy-bugger list. I mention the codepoint size because OS_ASCII has to use the new codepoint size API in order to work out whether the “character in R0” <vomit emoji> is a CR – it’s harder than you could imagine. Wrch…hellish What could be simpler?! There’s an entire article that can be written on this, but this is not that thread another suitable character It doesn’t have to be a single character. I’ve now extended the parametric forms of BGET/BPUT to do this unambiguously.

Aug 3, 2024 11:21am nemo (145) 2545 posts	tymaja wrote: “A%![X]B%”
Reasonable, and extensible to the monadic form too. I’m delighted you’re thinking along these lines. I’m already using […] for the string-slice and bit-slice syntax with which this is completely compatible. In particular it solves a slight vulnerability with the dyadic `|` that I introduced. Have you wondered why you can `a?b` and `a!b` but not `a|b`? The reason is an ancient bit of BBC Micro VDU optimisation: `VDU23,128,255|128`. Unlike the generalisation of `a|b` which is ambiguous in a VDU statement, `a|[]b` wouldn’t be.
“I’m using ARM32/64 letter codes for size”
No particular reason to do that (viz SYS vs SWI). `L` might be a synonym for `X`. Lowercase too, obvs.
“won’t slow down older programs”
*noticeably. All additional checks add some overhead, and my experience says this can be detected. Also note I have alternate entrypoints for the expression evaluator etc for R10-in-hand for mitigating this kind of look-ahead.
“![B.4]address%%”
No. I already have the postfix bit-slice syntax for a far more sophisticated way of selecting bitfields, which isn’t restricted to indirection: `address%%[32:39]`

Aug 3, 2024 11:47am nemo (145) 2545 posts	OK I’ve implemented that and it’s not bad. This syntax naturally makes `?`, `!` and `|` synonyms: Thing is though, might one want to specify the width numerically? `vary = ![width]mem` This would be incompatible with the above symbols, though 1|2|4|8 is just as clear. I think I’m going to go with an expression instead of a letter. It’s more “BASIC-syntax” (and less assembler-syntax). [Edit] Yeah:

Aug 3, 2024 12:13pm David J. Ruck (33) 1635 posts	All this indirection operator stuff is interesting, but what is its real use? The answer will be mainly interacting with SWIs, and specifically WIMP SWIs, in which case it’s just adding features to what is a very poor interface in desperate need of replacement. BASIC needs a way of specifying an in-memory structure of different data types, so it is far easier to pass and return the necessary blocks to the wimp, without using a bunch of indirection operators with magic numbers, which are so easy to get wrong.

Aug 3, 2024 12:52pm nemo (145) 2545 posts	indirection operators with magic numbers, which are so easy to get wrong Unless you expect Basic to have the offsets for the structures used by (in this case) Wimp SWIs built-in, then some library would have to define the offsets… so you’re just shifting your “easy to get wrong” trust from one bit of Basic to another bit of Basic. I developed the object-oriented version of BBC Basic in 2003 (“OOBasic”!) which led my comrade-in-ARMs Steve Drain to add my struct syntax to Basalt – which you could be using right now! I later developed that into a full class system with the addition of methods, constructors and destructors. That’s been disabled for years due to the lack of garbage collection in the interpreter, and more recently I’ve redefined the syntax to allow tuples to be supported: Thanks to my LongString work, nemoBasic now has a full garbage collector. So I ought to go back and rework the OO features, and I’ll use the DIM syntax variant RTR adopted to avoid confusion with tuples. [I haven’t yet finished the return-tuple-from-function bit which is very messy] BTW. whereas `obj.x0%` is a simple l-value, `obj{x0%}` is a tuple that happens to contain only a single value. The old syntax of `newdef={...}` is too confusingly close to the struct assignment of `obj{}={...}`. Note that OOBasic was a hack – there’s no “object” type so uses introspection to decide the validity of the object being accessed, which is why `object={...}:object=0` is a leak, not a delete. `:-/`. A new l-value type for objects (eg obj{} in array-ref style) would be safer, but needs more work. Hmmph.

Aug 3, 2024 2:19pm nemo (145) 2545 posts	@druck This is the old-ish syntax, but it works. Is this the kind of thing you meant? I have this switched off because whilst `NEW class,object[OF buffer][,moreobjects...]` allocates memory (obviously), setting `w=0` or `window=0` simply leaks the memory, which is probably not what you want. I’ll be attending the Zoom meeting this evening (03 Aug 2024) if anyone wants to ask anything or see anything demonstrated.

Aug 3, 2024 5:38pm Rick Murray (539) 13839 posts	“Hidden in plain sight:”
I noticed that FP, but since my maths sucks I ignored it as I didn’t feel I’d understand any explanation (other than BASIC bodges (some? all?) integers to floats for calcing).
“OK I’ve implemented that”
You know where this is going to end? A complete BASIC program will be… 10. …and it will use the ARM’s DSP co-pro to run some powerful AI code to determine what program you were trying to run, and provide it. I am guessing maybe around nemoBASIC version 4?

Aug 3, 2024 5:53pm nemo (145) 2545 posts	“my maths sucks”
The crucial point is “what does b31 mean?”. In the 32b integer constant `&FFFFFFFF` b31 is self-evidently the sign bit, hence `&summat` covers -2147483648 to 2147483647, and that’s what `a%` or `!m` accept. Anything outside that range is “Number too big”. But when you have a 64b integer, b31 means +2147483648. In my syntax “`&&`” introduces a 64b integer constant, so `&&FFFFFFFF` is 4294967295, not -1, and hence would be “Number too big” for `!m`. However, `&&ffffffffffffffff` is -1, and hence can be put into `!m`. And therefore, as you anticipated, `&FFFFFFFF == &&ffffffffffffffff`.
BTW. That reminds me that nB implements `==` as a commutative comparator, which means you can write `IF a==b==c`, whereas `IF a=b=c` continues to be non-commutative and hence is a predicated ENDFN (surprisingly, I’ve always thought). [note that in the commutative case, `c` still has to be a boolean, it’s not a mathematical “a=b=c”, no doubt to the chagrin of a former colleague who wrote that and when I rejected it in a peer-review exclaimed “Why can’t the compiler know what I MEANT?”] <toys with mathematical `===` comparator>
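
The b31 rule described in this post can be checked with a little arithmetic. The Python sketch below is a model only: the helper names `hex32`, `hex64` and `store32` are hypothetical, and the “Number too big” behaviour is taken from the description in the post rather than from interpreter source.

```python
def hex32(digits):
    """Model of a `&` constant: up to 32 bits, with b31 as the sign bit."""
    value = int(digits, 16)
    if value >= 1 << 32:
        raise OverflowError("Number too big")
    return value - (1 << 32) if value & (1 << 31) else value

def hex64(digits):
    """Model of a `&&` constant: up to 64 bits, with b63 as the sign bit."""
    value = int(digits, 16)
    if value >= 1 << 64:
        raise OverflowError("Number too big")
    return value - (1 << 64) if value & (1 << 63) else value

def store32(value):
    """Model of assigning to a% or !m: only the signed 32-bit range fits."""
    if not -(1 << 31) <= value <= (1 << 31) - 1:
        raise OverflowError("Number too big")
    return value

a = hex32("FFFFFFFF")            # b31 is the sign bit here
b = hex64("FFFFFFFF")            # b31 is just +2147483648 here
c = hex64("ffffffffffffffff")    # b63 is the sign bit here
```

In this model `store32(c)` succeeds while `store32(b)` raises, which matches the distinction drawn between `&&ffffffffffffffff` and `&&FFFFFFFF`.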

Aug 3, 2024 8:59pm tymaja (278) 174 posts	Going with lengths sounds better, so address%![ 16 ]=value% will be what I will use (values 8,16,32,64 to start with) nemo said: “No. I already have the postfix bit-slice syntax for a far more sophisticated way of selecting bitfields, which isn’t restricted to indirection: address%%[32:39]” How would you want to use your bitfield syntax for indirection; would this be OK?: address%%![11:41]=value% to set bits 11-41 (where address%% points to bit zero) to the value in address%%?

Aug 4, 2024 4:15pm nemo (145) 2545 posts	“`address%![ 16 ]=value%`”
Not quite right – that’s using `!` as a dyadic so requires another value after it. So it would be the monadic `![16]address%=` or the dyadic `address%![16]offset%=`. The `[...]` is a modifier for `?!|` so their normal positioning still applies, even in the presence of the modifier.
“How”
It works like this:
This is because, as an l-value, “`A%`” is the same as “`!q%`”: an integer destination. So the bit-slicing syntax operates on both identically (in fact it doesn’t know which it’s operating on! It’s just “an integer”, “here”).
And hence using both simultaneously is possible: `![16]q%[12:15]=1 : REM set top nibble of short to "1"`
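
The combined form `![16]q%[12:15]=1` amounts to a read-modify-write on a 16-bit value. A minimal Python sketch of that reading follows. `set_bits` and `poke_slice16` are hypothetical names, the slice bounds are taken as inclusive bit numbers with bit 0 least significant (consistent with the “top nibble” comment), and little-endian storage is assumed.

```python
def set_bits(word, lo, hi, value):
    """Return `word` with bits lo..hi (inclusive) replaced by `value`."""
    width = hi - lo + 1
    mask = ((1 << width) - 1) << lo
    return (word & ~mask) | ((value << lo) & mask)

def poke_slice16(mem, addr, lo, hi, value):
    """Model of ![16]addr[lo:hi] = value on a little-endian 16-bit short."""
    short = int.from_bytes(mem[addr:addr + 2], "little")
    short = set_bits(short, lo, hi, value)
    mem[addr:addr + 2] = short.to_bytes(2, "little")

mem = bytearray([0x34, 0xF2, 0xAA, 0xAA])   # the short at offset 0 is &F234
poke_slice16(mem, 0, 12, 15, 1)             # top nibble of the short becomes 1
```

Because `set_bits` works on a plain integer, the same helper would serve for a variable or a memory location, which mirrors the point that the bit-slice does not know which kind of destination it is operating on.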

Aug 4, 2024 4:37pm nemo (145) 2545 posts	And here is the combination working: The orange bits are indirection, the green bit is bit-slicing (which can apply to LHS or RHS, ie to destinations and to values). Now yes, I can do the same thing as `![2]m=` via bit-slicing `!m[0:15]=` – you pays your money and takes your choice. It depends how you view the field sizes. If you had an array of shorts you’d be better off using BPUT though: `BPUT(m,14,x,y,z)` to write three consecutive shorts while incrementing m by 6.

Aug 16, 2024 11:40pm tymaja (278) 174 posts	Work continues on ARM32 BASIC with 64-bit ints! Strings are limited to 65535 bytes (including terminator), using an 8/2 byte SIB to replace the 4/1 byte SIB. FP needs some Carry stuff fixed, but is also 8:2 encoded now (well, 8:1:1 – I will extend the exponent into that spare byte later).
I am porting BBC BASIC to ARM64 assembler (for fun!), but in reality I am doing it to learn how the code works, data pathways, etc. It is very nuanced / brilliant coding, so rewriting BASIC instruction by instruction is the best way to learn ARM32 … and how BASIC works.
The more I understand the code, the more I realise there is only one integer path through the code. There are different ways to specify type in different parts of the code. One type of TYPE uses &80000000 for FP. One type of TYPE sets type as follows:
- type 0 = string
- type 4 = signed int 32
- type 5 = 40 bit float
- type 8 = 64-bit FP
I am currently using (in ARM64 BBC BASIC, which is actually a test-bed for upgrading ARM32 BBC BASIC):
- type 0 = string
- type 8 = signed int 64
- type 10 = 80 bit float
And I am going to add TYPE 4 (4-byte signed int32) to the code later (I have put notes in the code wherever I use an ‘8’ for sint64). Once I find all the pitfalls in ARM64, I can then update ARM32.
So, given the above, I am wondering: any thoughts on ‘combining’ 32 and 64 bit ints? By using % for 32-bit, and %% for 64-bit … for the SAME integer variable? For example:
A%=131072
PRINT A%
131072
PRINT A%%
131072
A%=-1
PRINT A%
-1
PRINT A%%
(4 billion)
A%%=-1
PRINT A%
-1
PRINT A%%
-1
variable%=12
PRINT variable%
12
PRINT variable%%
12
Setting a ‘%’ variable will zero the upper 32 bits of the %% second half, but I will repurpose a keyword (to be decided which), maybe SGN and EXT, for A% = SGNEXT to sign extend a 32 bit signed int to 64 bits. I need to think about multiply outputs regarding the above.
However, I am wondering if it could be possible to use integer variables as above (% and %% referring to 32 and 64 bit versions of the SAME variable). If done well, it should be possible to run unmodified BBC BASIC V software, but also to just use %% whenever needing 64-bit integers (of course the internal data paths will need to be 64-bit, with careful consideration of when any 64-32 bit conversions are done). I need to think that through a lot more, but if it remains compatible with older software, and compatible with BBC BASIC SDL2 64-bit ints (although caution with same variable names!), that could be good.
New software could configure itself using virtual SYS calls as well. This could be implemented with almost zero slowdown; the SYS code in BASIC could do TEQ R10,#"B" and BEQ SYS_BASIC, where ‘SYS_BASIC’ reads the SWI name and returns to the original handler as soon as a mismatch is found. SWIs starting with B would take a few more cycles; all other SWIs would have only an extra TEQ and an ignored BEQ. This would allow for something like SYS "BASIC_SetIntegerMode", mode%, where there could be a setting to use ‘combined’ 32/64 bit ints, or to ensure integers are 32 or 64 bit only (so var%=12 : PRINT var%% … No such variable) etc
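
To make the proposed ‘combined’ behaviour concrete, here is a small Python model of one variable seen through both suffixes. It is a sketch of the semantics in the example session in this post only: the class `CombinedInt` and its method names are hypothetical, and it implements the stated rule that writing through `%` zeroes the upper 32 bits.

```python
MASK32 = (1 << 32) - 1
MASK64 = (1 << 64) - 1

def to_signed(value, bits):
    """Interpret the low `bits` bits of value as two's complement."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value >> (bits - 1) else value

class CombinedInt:
    """One 64-bit cell; % sees the low word, %% sees the whole doubleword."""
    def __init__(self):
        self.cell = 0                  # raw 64-bit pattern

    def set32(self, value):            # models A% = value
        self.cell = value & MASK32     # upper 32 bits are zeroed

    def set64(self, value):            # models A%% = value
        self.cell = value & MASK64

    def get32(self):                   # models PRINT A%
        return to_signed(self.cell, 32)

    def get64(self):                   # models PRINT A%%
        return to_signed(self.cell, 64)

a = CombinedInt()
a.set32(-1)
after_set32 = (a.get32(), a.get64())   # the "(4 billion)" case
a.set64(-1)
after_set64 = (a.get32(), a.get64())
```

The model covers only what a program would see through `%` and `%%`; it says nothing about where the two halves live in memory.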

Oct 5, 2024 11:57am nemo (145) 2545 posts	tymaja theorised: “any thoughts on ‘combining’ 32 and 64 bit ints? By using % for 32-bit, and %% for 64-bit … for the SAME integer variable?”
Sorry I missed this. You absolutely cannot do that. It is defined that integer arrays are contiguous, so `a%(5)` immediately follows `a%(4)`, and there’s much machine code that requires that (as Basic can’t sort arrays, authors have had to). It is also required that `@%` to `Z%` are contiguous 32b values, as code accesses these directly. Supporting `a%%` as a distinct 64b int variable separate from the 32b `a%` works just fine, and the interpreter can tell the difference both between those storage locations AND the values that can be put in them, automatically casting them as required. `a%%=a%` always works, but `a%=a%%` can lead to “Number too big”.

Oct 7, 2024 12:23am tymaja (278) 174 posts	“It is defined that integer arrays are contiguous, so a%(5) immediately follows a%(4), and there’s much machine code that requires that (as Basic can’t sort arrays, authors have had to). It is also required that @% to Z% are contiguous 32b values, as code accesses these directly.”
I hadn’t (yet) thought about arrays. I am still learning about how ARM BBC BASIC works; the ARM64 port-in-progress uses 64-bit numbers as the ‘integer’. That isn’t backwards-compatible at all, but it doesn’t matter as I haven’t released any of this work. It is a learning experience. Soon I will add % and %% to the 64-bit port, and what I learn from that will go towards upgrading ARM BBC BASIC to have 64 bit ints and long strings.
Splitting integers so they are all ‘combined 32 and 64 bit’ depending on whether you access them with % or %% would be a challenge coding wise. And I hadn’t considered arrays yet! However, the reason I am considering this idea (combined 32/64 bit integers) is for compatibility. @%-Z% would be a series of 32-bit integers, found in the usual place (and 32 bits wide, immediately adjacent). When accessing the 64-bit forms with %%, the 32-bit version would form the ‘low word’ of a 64 bit number. The ‘high word’ of the 64 bit number just gets loaded from the ‘high @-Z% table’ or stored back there when reading or writing with %%. Non-array dynamic variables are a challenge if custom assembler code reads them using the BASIC API. Thinking about it, arrays could well work with a combo 32/64 format too, using the same idea as with @%-Z%. I will continue to consider different approaches.
One thing that does strike me, though, is that assembly language that directly accesses BASIC variables would run on an emulator anyway, in the future, if RISC OS becomes processor agnostic. That software would just use BBC BASIC V or VI (through an emulation layer) on future versions of RISC OS that aren’t running on Aarch64. They would need to use BBC BASIC V if they use stuff like CALL &FFEE though, since BBC BASIC VI breaks that particular API.

Oct 10, 2024 8:12pm nemo (145) 2545 posts	“They would need to use BBC BASIC V if they use stuff like CALL &FFEE though, since BBC BASIC VI breaks that particular API.”
In nemoBasic the simulated MOS entrypoints are disabled if `CALLaddr;` is used (the semicolon switches it off) – just as `CALLaddr,var...` has never emulated MOS (because you couldn’t do that on the Beeb). Plus the aligned entrypoints (eg &FFF4) are disabled if `END>&FF00` for obvious reasons. The unaligned ones keep working.
Not sure why you want to share bits between A% and A%% – why not make them different vars? A%-Z% are the statics, A%% is just a normal variable. That’s what I have here.

Oct 11, 2024 6:34am tymaja (278) 174 posts	Re: the reason for considering splitting the integers would be for backwards compatibility, but also allowing future expansion. If I expand the ‘static integers’ in place, then any software reading them directly will get confused. On the flipside, there are some arguments for having 64-bit ‘static integer’ variables too; one being P%, O%, and the assembler. Splitting the static integers (even if it is done as a special case, with a new format added for 64-bit dynamic integers) would allow older BASIC programs to work, when using 32-bit integers as pointers (A%-Z%) if the OS maps their virtual RAM into the first 4GB. Newer programs could run without such restrictions (which, in a few years, could easily see them assembling some code to an address above 4GB)