BBC BASIC - 64 bit integer support, long string support
tymaja (278) 172 posts |
An idea I am toying with is to make the ! operator variable size… without adding any more ‘special symbols’, rather adding a new syntax which hopefully wouldn’t clash with other syntaxes. At present, my idea would be as follows:

A%!B% = data% (or data%% etc): stores the value of this variable to the 32-bit word at the address (A% + B%)
A%![X]B% = data%%: stores the value of data%% to the 64-bit doubleword at address (A% + B%)
A%![ 64 ]B% = data%%: stores data%% to the 64-bit doubleword at address (A% + B%)

This could even be extended to …
data% = A%![SB]B%: loads the byte from address A% + B%, sign extends it, and stores it into data%

I’ve started adding this functionality already, to test it and find glitches etc.

I like this idea, and I think it is better than some ideas I had (where the interpreter ‘guesses’ the 32/64 length based on variable type(s)), because:
- it removes ambiguity for loading/storing different bit sizes.
- it also keeps ! preserved as a 32-bit load/store operator, keeping compatibility with unmodified BBC BASIC.
- it is invalid syntax in BBC BASIC, so will just crash if used on older versions of BASIC (which is good, kind of like how CIRCLEFILL would crash on a beeb!)
- it also means no overhead when using !, except when you want to access extended features, so won’t slow down older programs.
- it can speed up BASIC programs, if sign extension is needed anywhere, or if two 32-bit !s can be replaced with a single ![ 64 ] / ![X].

The idea could even be extended to floating point load/store, so that | works as expected (as it always has), but |[floating point format specifier] loads / stores the FP number at the address, and in the specified format.

I’ve started experimenting with the above to try to find issues. Any thoughts on this … why it might be a bad idea etc? (I think it shouldn’t clash with nemoBASIC syntax?)

(I’m using ARM32/64 letter codes for size, so ![] B, H, W, X, Q for 8, 16, 32, 64, 128 bits (128 reserved for future expansion!)) |
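For readers who want to pin down the semantics being proposed, here is a minimal sketch in Python (not BBC BASIC) of sized loads and stores with optional sign extension. The names `SIZES`, `poke` and `peek` are hypothetical, the memory is just a `bytearray`, and little-endian layout is assumed. It models the idea only, not any interpreter's implementation.

```python
# Hypothetical model of the proposed sized '!' operator; this is not BBC BASIC.
SIZES = {"B": 1, "H": 2, "W": 4, "X": 8}  # letter codes from the post, in bytes


def poke(mem, addr, value, size="W"):
    """Store the low `size` bytes of value at addr (little-endian assumed)."""
    n = SIZES[size]
    mask = (1 << (8 * n)) - 1
    mem[addr:addr + n] = (value & mask).to_bytes(n, "little")


def peek(mem, addr, size="W", signed=False):
    """Load `size` bytes from addr, sign-extending the result if requested."""
    n = SIZES[size]
    return int.from_bytes(mem[addr:addr + n], "little", signed=signed)


mem = bytearray(16)
poke(mem, 0, 0x1122334455667788, "X")         # roughly A%![X]B% = data%%
poke(mem, 8, -1, "B")                         # stores the single byte &FF
low_word = peek(mem, 0, "W")                  # a 32-bit read sees only the low word
unsigned_byte = peek(mem, 8, "B")             # roughly ?address
signed_byte = peek(mem, 8, "B", signed=True)  # roughly data% = A%![SB]B%
```

In this model a 64-bit store followed by a 32-bit load returns only the low word, which is the compatibility property the post relies on for the unmodified ! operator.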
tymaja (278) 172 posts |
Going further than this … if something like the above doesn’t have any dealbreaking faults I haven’t considered yet … then it could even be possible to do load/store at any bit offset, like this:

byte% = ![B.4]address%%
byte% = the byte at (64-bit memory address address%%), starting 4 bits ‘forward’ (so ![B.0]address%% is just a longwinded way of saying ![B]address%%, or even ?address%%)

Going further, it could even become …

thirteen_bits% = ![S13.7]address%%
thirteen_bits% = (sign extended) the thirteen bits, starting at position address%% + 7 bits ‘forward’

I will start with ![X] as the rest could be added later. However … this could be used to read from bitfields, and if optimised assembler is used for this function, it would end up a LOT faster than doing it all manually in BASIC. It could even benefit from the bitfield instructions in ARM64 if we did go 64-bit at some time in the future :) |
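To make the bit-offset idea concrete, the following Python sketch extracts a field of arbitrary width at a bit offset from a byte address, with optional sign extension. `load_bits` is a hypothetical helper. It assumes little-endian memory and that ‘forward’ means towards more significant bits, which is one reading of the post rather than a confirmed design.

```python
def load_bits(mem, addr, width, bit_offset=0, signed=False):
    """Read `width` bits starting `bit_offset` bits above bit 0 of mem[addr].

    Assumes little-endian byte order, with bit 0 as the least significant bit.
    """
    nbytes = (bit_offset + width + 7) // 8           # bytes that cover the field
    raw = int.from_bytes(mem[addr:addr + nbytes], "little")
    field = (raw >> bit_offset) & ((1 << width) - 1)
    if signed and field & (1 << (width - 1)):        # top bit set: sign extend
        field -= 1 << width
    return field


mem = bytearray([0xAB, 0xCD, 0xEF, 0x01])
plain_byte = load_bits(mem, 0, 8)                    # like ![B.0], i.e. ?address
shifted_byte = load_bits(mem, 0, 8, 4)               # like ![B.4]
thirteen = load_bits(mem, 0, 13, 7, signed=True)     # like ![S13.7]
```

The shift-and-mask here is the work that a single signed or unsigned bitfield-extract instruction could do in an assembler implementation.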
nemo (145) 2529 posts |
Dave suggested
In fact MID$ etc have the params the wrong way round, and nB allows the more efficient order as well as retaining compatibility with the old ordering. There are many other keywords in nB that have become parametric, and where there’s a ‘mode’ or other specifier, it comes first. The reason is partly semantic but mainly practical – let’s compare the old and new order for LEFT$, first old: • Evaluate the string expression (usually a simple string variable) which must be byte-copied to “the” STRACC buffer Versus new: • Evaluate length and fix to integer Do you see? In the INT case it’s done for symmetry and semantic reasons, but aside from that it only involves pushing a single register instead of the float. Plus there are many function-keywords that can now be used as a statement and take a list of variables to affect, which obviously must be at the end. |
nemo (145) 2529 posts |
druck continued
Nonsense. The expression evaluator has absolutely no idea what the purpose of the expression is. The idea that a function ending in “=&FFFFFFFF” has to know what purpose the returned value will ultimately be used for before the expression can be evaluated is absurd:
I think we’re done with this line of argument now. That’s simply not how interpreters work in general, and certainly not how BBC Basic works. |
nemo (145) 2529 posts |
Rick asked though you’ve not looked closely at my implementation Hidden in plain sight:
I also have a module called OSWrites that provides OS_Write16, OS_Write32, OS_WriteAlpha (for different-sized codepoints) and OS_WriteCtrl (for the how-many-times-have-you-had-to-write-it ctrl-terminated string print). It’s on my get-around-to-releasing-this-you-lazy-bugger list. I mention the codepoint size because OS_ASCII has to use the new codepoint size API in order to work out whether the “character in R0” <vomit emoji> is a CR – it’s harder than you could imagine.
What could be simpler?!
It doesn’t have to be a single character. I’ve now extended the parametric forms of BGET/BPUT to do this unambiguously. |
nemo (145) 2529 posts |
tymaja wrote
Reasonable, and extensible to the monadic form too. I’m delighted you’re thinking along these lines. I’m already using […] for the string-slice and bit-slice syntax with which this is completely compatible. In particular it solves a slight vulnerability with the dyadic
No particular reason to do that (viz SYS vs SWI).
*noticeably. All additional checks add some overhead, and my experience says this can be detected. Also note I have alternate entrypoints for the expression evaluator etc for R10-in-hand for mitigating this kind of look-ahead.
No. I already have the postfix bit-slice syntax for a far more sophisticated way of selecting bitfields, which isn’t restricted to indirection: |
nemo (145) 2529 posts |
OK I’ve implemented that and it’s not bad. This syntax naturally makes Thing is though, might one want to specify the width numerically?
This would be incompatible with the above symbols, though 1|2|4|8 is just as clear. I think I’m going to go with an expression instead of a letter. It’s more “BASIC-syntax” (and less assembler-syntax). [Edit] Yeah: |
David J. Ruck (33) 1629 posts |
All this indirection operator stuff is interesting, but what is its real use? The answer will be mainly interacting with SWIs, and specifically Wimp SWIs, in which case it’s just adding features to what is a very poor interface in desperate need of replacement. BASIC needs a way of specifying an in-memory structure of different data types, so it is far easier to pass and return the necessary blocks to the Wimp, without using a bunch of indirection operators with magic numbers, which are so easy to get wrong. |
nemo (145) 2529 posts |
Unless you expect Basic to have the offsets for the structures used by (in this case) Wimp SWIs built-in, then some library would have to define the offsets… so you’re just shifting your “easy to get wrong” trust from one bit of Basic to another bit of Basic. I developed the object-oriented version of BBC Basic in 2003 (“OOBasic”!) which led my comrade-in-ARMs Steve Drain to add my struct syntax to Basalt – which you could be using right now! I later developed that into a full class system with the addition of methods, constructors and destructors. That’s been disabled for years due to the lack of garbage collection in the interpreter, and more recently I’ve redefined the syntax to allow tuples to be supported: Thanks to my LongString work, nemoBasic now has a full garbage collector. So I ought to go back and rework the OO features, and I’ll use the DIM syntax variant RTR adopted to avoid confusion with tuples. [I haven’t yet finished the return-tuple-from-function bit which is very messy] BTW. whereas |
nemo (145) 2529 posts |
@druck This is the old-ish syntax, but it works. Is this the kind of thing you meant? I have this switched off because whilst I’ll be attending the Zoom meeting this evening (03 Aug 2024) if anyone wants to ask anything or see anything demonstrated. |
Rick Murray (539) 13806 posts |
I noticed that FP, but since my maths sucks I ignored it as I didn’t feel I’d understand any explanation (other than BASIC bodges (some? all?) integers to floats for calcing).
You know where this is going to end? A complete BASIC program will be… 10. …and it will use the ARM’s DSP co-pro to run some powerful AI code to determine what program you were trying to run, and provide it. I am guessing maybe around nemoBASIC version 4? |
nemo (145) 2529 posts |
The crucial point is “what does b31 mean?”. In the 32b integer constant But when you have a 64b integer, b31 means +2147483648. In my syntax “ And therefore, as you anticipated, BTW. That reminds me that nB implements <toys with mathematical |
tymaja (278) 172 posts |
Going with lengths sounds better, so address%![ 16 ]=value% will be what I will use (values 8,16,32,64 to start with) nemo said:
How would you want to use your bitfield syntax for indirection; would this be OK?: address%%![11:41]=value% to set bits 11-41 (where address%% points to bit zero) to the value in value%? |
nemo (145) 2529 posts |
Not quite right – that’s using
It works like this: This is because, as an l-value, “ And hence using both simultaneously is possible:
|
nemo (145) 2529 posts |
And here is the combination working: The orange bits are indirection, the green bit is bit-slicing (which can apply to LHS or RHS, ie to destinations and to values). Now yes, I can do the same thing as If you had an array of shorts you’d be better off using BPUT though: |
tymaja (278) 172 posts |
Work continues on ARM32 BASIC with 64-bit ints! Strings are limited to 65535 bytes (including terminator), using an 8/2 byte SIB to replace the 4/1 byte SIB. FP needs some Carry stuff fixed, but is also 8:2 encoded now (well, 8:1:1 – I will extend the exponent into that spare byte later).

I am porting BBC BASIC to ARM64 assembler (for fun!), but in reality, I am doing it to learn how the code works, data pathways, etc – it is very nuanced / brilliant coding, so rewriting BASIC instruction by instruction is the best way to learn ARM32 … and how BASIC works.

The more I understand the code, the more I realise there is only one integer path through the code. There are different ways to specify type in different parts of the code. One type of TYPE uses &80000000 for FP. One type of TYPE sets type as follows:

I am currently using (in ARM64 BBC BASIC, which is actually a test-bed for upgrading ARM32 BBC BASIC):
- type 0 = string
- type 8 = signed int 64
- type 10 = 80 bit float

And I am going to add TYPE 4 (4 byte signed int32) to the code later (I have put notes in the code wherever I use an ‘8’ for sint64). Once I find all the pitfalls in ARM64, I can then update ARM32.

So, given the above, I am wondering … any thoughts on ‘combining’ 32 and 64 bit ints? By using % for 32-bit, and %% for 64-bit … for the SAME integer variable? For example:

A%=-1
A%%=-1
variable%=12

Setting a ‘%’ variable will zero the upper 32 bits of the ‘%%’ second half … I need to think about multiply outputs regarding the above. However, I am wondering if it could be possible to use integer variables as above (% and %% referring to 32 and 64 bit versions of the SAME variable).

New software could configure itself using virtual SYS calls as well. This could be implemented with almost zero slowdown: the SYS code in BASIC could do TEQ R10,#"B" and BEQ SYS_BASIC, where ‘SYS_BASIC’ reads the SWI name, and returns to the original handler as soon as a mismatch is found.
SWIs starting with B would have a few more cycles, all other SWIs would have an extra TEQ, and an ignored BEQ, only. |
nemo (145) 2529 posts |
tymaja theorised
Sorry I missed this. You absolutely cannot do that. It is defined that integer arrays are contiguous, so Supporting |
tymaja (278) 172 posts |
I hadn’t (yet) thought about arrays. I am still learning about how ARM BBC BASIC works; the ARM64 port-in-progress uses 64-bit numbers as the ‘integer’. That isn’t backwards-compatible at all, but it doesn’t matter as I haven’t released any of this work. It is a learning experience. Soon I will add % and %% to the 64-bit port, and what I learn from that will go towards upgrading ARM BBC BASIC to have 64 bit ints and long strings.

Splitting integers so they are all ‘combined 32 and 64 bit’ depending on whether you access them with % or %% would be a challenge coding wise. And I hadn’t considered arrays yet! However, the reason I am considering this idea (combined 32/64 bit integers) is for compatibility. @%-Z% would be a series of 32-bit integers, found in the usual place (and 32 bits wide, immediately adjacent). When accessing the 64-bit forms with %%, the 32-bit version would form the ‘low word’ of a 64 bit number. The ‘high word’ of the 64 bit number just gets loaded from the ‘high @-Z% table’ or stored back there when reading or writing with %%.

Non-array dynamic variables are a challenge if custom assembler code reads them using the BASIC API. Thinking about it, arrays could well work with a combo 32/64 format too, using the same idea as with @%-Z%. I will continue to consider different approaches.

One thing that does strike me, though, is that assembly language that directly accesses BASIC variables would run on an emulator anyway, in the future, if RISC OS becomes processor agnostic. That software would just use BBC BASIC V or VI (through an emulation layer) on future versions of RISC OS that aren’t running on AArch64. They would need to use BBC BASIC V if they use stuff like CALL &FFEE though, since BBC BASIC VI breaks that particular API. |
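The split-table layout described above can be modelled in a few lines of Python. This is only a sketch under stated assumptions: `SplitStatics` and its method names are hypothetical, the tables are plain lists rather than interpreter workspace, and a 32-bit write is assumed to zero the high word, following the earlier description in this thread.

```python
class SplitStatics:
    """Hypothetical model: @%..Z% low words in one table, high words in another."""

    def __init__(self):
        self.low = [0] * 27    # the usual 32-bit statics, adjacent as before
        self.high = [0] * 27   # separate 'high word' table, only seen via %%

    @staticmethod
    def _index(name):
        return ord(name) - ord("@")   # '@' -> 0, 'A' -> 1, ... 'Z' -> 26

    def set32(self, name, value):
        i = self._index(name)
        self.low[i] = value & 0xFFFFFFFF
        self.high[i] = 0               # assumption: % writes zero the high word

    def get32(self, name):
        v = self.low[self._index(name)]
        return v - (1 << 32) if v & 0x80000000 else v   # signed 32-bit view

    def set64(self, name, value):
        i = self._index(name)
        v = value & 0xFFFFFFFFFFFFFFFF
        self.low[i] = v & 0xFFFFFFFF
        self.high[i] = v >> 32

    def get64(self, name):
        i = self._index(name)
        v = (self.high[i] << 32) | self.low[i]
        return v - (1 << 64) if v & (1 << 63) else v    # signed 64-bit view


s = SplitStatics()
s.set64("P", 0x100008000)   # e.g. an assembly address above 4GB
s.set32("A", -1)            # a 32-bit write through A%
```

One consequence the model makes visible: after `s.set32("A", -1)`, the 64-bit view of the same variable reads as 4294967295 rather than -1, because the high word was zeroed rather than sign-extended.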
nemo (145) 2529 posts |
In nemoBasic the simulated MOS entrypoints are disabled if Not sure why you want to share bits between A% and A%% – why not make them different vars? A%-Z% are the statics, A%% is just a normal variable. That’s what I have here. |
tymaja (278) 172 posts |
Re: the reason for considering splitting the integers would be for backwards compatibility, but also allowing future expansion. If I expand the ‘static integers’ in place, then any software reading them directly will get confused. On the flipside, there are some arguments for having 64-bit ‘static integer’ variables too; one being P%, O%, and the assembler. Splitting the static integers (even if it is done as a special case, with a new format added for 64-bit dynamic integers) would allow older BASIC programs to work, when using 32-bit integers as pointers (A%-Z%) if the OS maps their virtual RAM into the first 4GB. Newer programs could run without such restrictions (which, in a few years, could easily see them assembling some code to an address above 4GB) |