What would AArch64 BASIC look like?
GavinWraith (26) 1563 posts |
Yes. I think ASCII clean is the phrase here. Unicode to the side, there are also problems with storing the body of a string as an array of consecutive characters. Fine for printing out, not so good for concatenation or comparison. Just think of two multimegabyte strings that differ in the last character. The mindset that strings are serial beasts is not always helpful. Hashing has a lot going for it. And so do trees, of course. |
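To make the hashing point concrete, here is a minimal sketch in BBC BASIC. It assumes a hash is stored alongside each string when the string is created, so that two unequal strings can usually be told apart without scanning their bodies. FNhash and FNsame are hypothetical helpers, and the hash function is an arbitrary choice made for illustration only.

```bbcbasic
REM Sketch: reject unequal strings cheaply by comparing stored hashes first.
REM FNhash and FNsame are hypothetical names; the hash itself is arbitrary.
a$ = STRING$(200, "x") + "1"
b$ = STRING$(200, "x") + "2"
ha% = FNhash(a$) : REM computed once, when the string is created
hb% = FNhash(b$)
IF FNsame(a$, ha%, b$, hb%) THEN PRINT "equal" ELSE PRINT "different"
END

DEF FNhash(s$)
LOCAL i%, h%
IF s$ = "" THEN = 0
FOR i% = 1 TO LEN(s$)
  REM Mask to 24 bits before multiplying so the result fits a 32-bit integer
  h% = (h% AND &FFFFFF) * 31 + ASC(MID$(s$, i%, 1))
NEXT
= h%

DEF FNsame(a$, ha%, b$, hb%)
REM Different hashes prove the strings differ without reading their bodies
IF ha% <> hb% THEN = FALSE
REM Equal hashes prove nothing, so fall back to a full comparison
IF a$ = b$ THEN = TRUE
= FALSE
```

The two test strings differ only in their last character, which is the worst case for a serial comparison, yet the hash check settles it immediately. The cost is moved to string creation, where the hash has to be computed and kept up to date.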
Richard Russell (1920) 95 posts |
Comparison isn’t such a big issue if your CPU has an efficient string-compare instruction, such as the x86’s
I recently carried out a comparison of the efficiency of string allocation in different versions of BBC BASIC, using as a test case extending two strings alternately (which defeats strategies such as extending into the heap ‘in situ’ to avoid reallocation). This was the test program:
And these were the results:
I was pleased with the performance of my current BASICs in supporting the largest maximum string length whilst achieving the smallest total memory usage in this test. |
Steve Drain (222) 1620 posts |
Strands (long strings) in Basalt have 32-bit length.
Strands can contain arbitrary data.
I have tested that strands can hold multi-megabyte binary files and be searched with INSTR, or copied like this:
SAVE (LOAD$(srcfile$)),dstfile$
Multiple ;-) Strands are, however, more awkward to use because of the constraints Basalt works under. The underlying machine code exists, though. |
Steve Drain (222) 1620 posts |
I am sure that Richard’s strings do [hold the length], but strands hold the length as a prefix to the data for that very reason. Edited to make clear what I meant. ;-) |
Richard Russell (1920) 95 posts |
My strings work the same way as in every other version of BBC BASIC (AFAIK) in using a ‘string descriptor’ containing the string’s address and its length. The length isn’t stored with the string. |
Steve Drain (222) 1620 posts |
RISC OS has Territory_Collate.
Somewhere I have a copy of an exchange, which SW sent me, between herself and Minerva about that inefficiency of the BASIC V string allocation. I think it panned out that it was possible, but extremely unlikely in a real life situation. Strands can be very efficient for a limited number of extensions and then very inefficient. Memory is cheap. ;-) |
Richard Russell (1920) 95 posts |
How do Basalt’s strands compare with my and Brandy’s long(ish) strings in the extension test?
giving:
|
Richard Russell (1920) 95 posts |
Memory is cheap, but large amounts of contiguous memory (and AFAIK all versions of BBC BASIC require the heap to be contiguous) can still be in short supply. I know nothing about RISC OS, but features of modern Windows and Linux (particularly ASLR) mean that requesting more than about 500 Mbytes of contiguous address space can still fail, even if you have plenty of physical memory. |
Richard Russell (1920) 95 posts |
In the context of this thread the “language” is surely BBC BASIC (not Atom BASIC or System BASIC) in which case its “native” strings are the regular string variables with a $ suffix. In any case I don’t consider strings handled by indirection to be ‘true’ BASIC strings, because there is necessarily one character (e.g. CR) which acts as a terminator and therefore cannot appear ‘within’ a string. Although there are a few odd BASIC dialects which use NUL-terminated strings (I think PureBASIC is one), string variables capable of containing arbitrary binary data are a key feature of the BASIC language in my opinion.
It has never made any sense to me that ARM BASIC 5 didn’t increase the maximum string length. Brandy has always supported lengths up to 65535, I believe. |
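The terminator point can be seen with a few lines of BBC BASIC. This is only a sketch: buf% and a$ are names made up for the illustration, and it assumes the usual behaviour that an indirected string is terminated by a CR byte.

```bbcbasic
REM Sketch: a CR byte inside a native string versus an indirected string
DIM buf% 15
a$ = "AB" + CHR$(13) + "CD"
PRINT LEN(a$)      : REM 5: the native string keeps every byte
$buf% = a$         : REM copies the five bytes, then appends a CR terminator
PRINT LEN($buf%)   : REM 2: reading back stops at the first CR
```

The bytes after the embedded CR are still in memory at buf%, but the $ indirection operator has no way of knowing they belong to the string.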
Richard Russell (1920) 95 posts |
Who are these “most people”?! Even if it’s true of ‘most RISC OS users’ (and I doubt it) nobody using Brandy or my BASICs would be likely to use that kind of string apart from in exceptional circumstances. To bring this back on topic, moving to 64-bits means that it’s best to avoid direct memory access of any sort (allocating memory with DIM, indirection etc.) because it introduces the nasty complication of 64-bit pointers. If you stick with BASIC’s native data types, the 64-bit-ness is hidden from you behind the scenes and you can write (compatible) code just as you always have. I spend a fair amount of my time porting other people’s BBC BASIC programs to run on 64-bit platforms like iOS. One of the first things I do is search the program to see if they have used indirection. My heart sinks if they have, because I know it’s going to make conversion a pain. |
Steve Drain (222) 1620 posts |
Strands use memory management from a heap above HIMEM, so the result of your test as it stands would be arbitrarily small. There is nothing sophisticated about OS heaps, so apart from the granularity of 32 bytes the memory required would be very large. This is not a real life test, though. I do not offer strands as a proper solution, but more of an indication that long strings could be written into BASIC V some day. As for contiguous memory, the heap is in the application slot, which can grow and shrink as required and its address space is contiguous. Application slots are limited to 512M, but failing that we have dynamic areas of any size with contiguous addresses. |
Richard Russell (1920) 95 posts |
Ironically, in my BASICs the opposite is true: strings are the native memory management mechanism and if I want to allocate a block of memory for any purpose I leverage the string mechanism. These routines are workhorses when dynamic memory allocation is required:
|
Rick Murray (539) 13850 posts |
There you go again applying what might be your use case to the whole world. I use indirected strings only when necessary (data blocks like Wimp stuff, sprite names, etc). Other times, I use BASIC’s native strings. I doubt I’m alone, as I’ll explain below. As for system calls rather than BASIC routines… when you know that most of BASIC’s routines make those system calls, there’s a lot to be said for readability and coding speed using OPENIN rather than SYS “OS_File”. Likewise GCOL and COLOUR rather than spitting a lot of seemingly random bytes to the VDU driver, or a pile of ColourTrans SWIs.
I trust that you understand that the limitations in native strings also apply to indirected strings.
DIM x% 4095
FOR l% = 0 TO 4094
  x%?l% = 64 + RND(26)
NEXT
x%?4095 = 13 : REM Terminate it
PRINT CHR$(34)+$x%+CHR$(34)
PRINT LEN($x%)
REM To see the memory...
REM A = GET
REM OSCLI("Memory "+STR$~x%+" +1000")
It will abort with a “String too long” error. You’ll need to put the terminator at
I think an almost obsessive desire that BBC BASIC (on ARM) be as backwardly compatible with BBC BASIC (on 6502) as possible. |
Kevin (224) 322 posts |
One thing to be aware of with Brandy BASIC: when getting information from icons, it will read all the info from the icon, including the icon’s flags etc. That caused me a lot of head scratching with my QrCode application. |
Steve Drain (222) 1620 posts |
The first thing I implemented was dynamic memory:
DIM HIMEM a% 4095 REM claim
DIM HIMEM a% 512 REM resize
DIM HIMEM a% -1 REM release
and similarly for arrays. That would be very easy to introduce to BASIC V. |
Richard Russell (1920) 95 posts |
It’s hard to think of a compatibility implication that would have resulted from increasing the maximum string length, especially when you consider that almost every other aspect of the string management system was significantly changed from 6502 to ARM BASIC (not least reusing memory freed by reallocating a string). And I need hardly mention the significant (but admittedly unavoidable) loss of backwards compatibility through many other changes, including introducing new keywords (which may have been used as variable names in 6502 programs). Don’t get me wrong, I’m not criticising the changes in BASIC 5, they were essential to modernise the language, but increasing the maximum string length would have had less impact on compatibility than most.
So do my interpreters! CALL or USR to addresses in the range &FFE0 to &FFFF will work fine in BBC BASIC for Windows or BBC BASIC for SDL 2.0!
|
Michael McConnell (8708) 11 posts |
One example I can think of is the file format for PRINT#, which, if memory serves, has the length followed by the string (backwards!). Also, Matrix Brandy implements the BBC CALLs (where supported); indeed, CALL and USR to other locations are not supported as there is no assembler.
@Kevin
If you feel this is a bug in Brandy, please can you post more details over on the Stardot thread or a GitHub ticket and I’ll see what I can do about it on the Matrix Brandy fork.
@Steve Drain
I rather like this. In Matrix Brandy I’d implement that simply as a wrapper for malloc(), realloc() and free(), and I would hazard a guess the same could be true for BBCSDL. (Update: |
Richard Russell (1920) 95 posts |
It’s highly unlikely that I would add it to BBCSDL, because I’m trying to discourage the use of direct memory access (e.g. using indirection), not encourage it! That’s particularly relevant to 64-bit BBC BASIC, which is what this thread is about. In programs that I write today, I always prefer to use ‘managed’ memory objects, such as strings, arrays and structures, because typically they avoid the use of 64-bit pointers and are better protected against ‘fatal’ crashes through accessing inappropriate addresses. I know it’s not in the same league of complication, but I would far rather see structures added to Matrix Brandy! I look back with horror to the days when passing a structure to an OS API function meant reserving some memory with DIM and populating it with indirection! |
Richard Russell (1920) 95 posts |
That doesn’t introduce an incompatibility, so long as short (<= 255 byte) strings are still written and read in the ‘old’ format. Brandy has 65535-byte strings so this is something it must already tackle. |
David Feugey (2125) 2709 posts |
Thanks! |
Steffen Huber (91) 1953 posts |
“Most people” in the sense of “me, myself and I” I guess? You truly live in your own universe. I have not even understood exactly what you mean by “indirected strings”, as – as Rick has shown by example – they solve nothing unless you start to pretend that “oh, every block of memory can be a string, and I have replicated all of BASIC’s string operations with my own code, and nobody needs automatic memory management like BASIC does for strings anyway”. To bring this back on topic: how does BBC BASIC for Windows/for SDL attempt to solve the encoding problem? I.e. how are the strings stored, are there codepage conversion routines available etc.? |
Steve Drain (222) 1620 posts |
I might add:
DIM LOMEM a% 160 REM claim
DIM LOMEM a% 128 REM resize
DIM LOMEM a% -1 REM release
which uses the string allocation for blocks up to 255 bytes. That would be the same as Richard’s

DEFFNDim_Lomem(s%):REM claim a block using SAT
LOCAL a%,l%
l%=&8400+s% ANDNOT 3:REM string allocation list
IF !l% THEN a%=!l%:!l%=!a% ELSE DIM a% s%:REM use existing block or claim new
?a%=s%:REM put size byte at start
=a%+1:REM return useable address

DEFPROCDim_Lomem(RETURN a%):REM release a block to SAT
LOCAL l%:a%-=1:REM point to block
l%=&8400+?a% ANDNOT 3:REM string allocation list
!a%=!l%:!l%=a%:REM add block to list
a%=FALSE:REM unset address
ENDPROC

Resizing is more complex and there are no bounds checks here. ;-) |
Richard Russell (1920) 95 posts |
BBC BASIC for Windows and BBC BASIC for SDL 2.0 support two encodings: ANSI and UTF-8. Because the standard BASIC string functions assume one byte per character, I supply a library (utf8lib.bbc) containing UTF-8 equivalents for use when needed: FN_uleft(), FN_uright(), FN_umid(), FN_ulen() and FN_uinstr(). The library also includes FN_utf8_to_ansi() and FN_ansi_to_utf8(). I should add that I only support the Basic Multilingual Plane (Unicode code points &00000 to &0FFFF) because there are a few 16-bit bottlenecks in the VDU drivers. Sorry. |
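Why the standard string functions fall short for UTF-8 can be shown with a short sketch. It counts characters by ignoring UTF-8 continuation bytes (those of the form 10xxxxxx). FNutf8_count is a hypothetical name used only here; this is not claimed to be how FN_ulen() in utf8lib.bbc is implemented.

```bbcbasic
REM Sketch only: count UTF-8 characters by skipping continuation bytes.
REM FNutf8_count is a hypothetical name, not part of utf8lib.bbc.
s$ = "caf" + CHR$(&C3) + CHR$(&A9) : REM "cafe" with e-acute, as UTF-8
PRINT LEN(s$)                      : REM 5: LEN counts bytes
PRINT FNutf8_count(s$)             : REM 4: one count per character
END

DEF FNutf8_count(s$)
LOCAL i%, n%
REM Guard the empty string, because a FOR loop always runs at least once
IF s$ = "" THEN = 0
FOR i% = 1 TO LEN(s$)
  REM Continuation bytes have the bit pattern 10xxxxxx
  IF (ASC(MID$(s$, i%, 1)) AND &C0) <> &80 THEN n% += 1
NEXT
= n%
```

The same byte-versus-character mismatch affects LEFT$, RIGHT$, MID$ and INSTR, which is why each needs a UTF-8 aware counterpart.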
Michael McConnell (8708) 11 posts |
Yep, it outputs longer strings using a different format, which would make ARM and 6502 BBC BASIC choke (and it doesn’t reverse the string!). Short strings (<=255 characters) are output in an Acorn-compatible fashion. |
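For reference, the short-string layout being discussed can be sketched by writing the bytes by hand with BPUT#. As far as I know, the Acorn format is a type byte of zero, then a length byte, then the characters in reverse order; the type byte is my recollection rather than something stated in this thread. PROCput_string and the file name TestStr are made up for the example.

```bbcbasic
REM Sketch: write a short string in the layout Acorn's PRINT# uses
REM (type byte 0, length byte, then the characters in reverse order).
REM PROCput_string and "TestStr" are hypothetical names.
ch% = OPENOUT("TestStr")
PROCput_string(ch%, "HELLO")
CLOSE#ch%
REM Read it back with the interpreter's own INPUT#
ch% = OPENIN("TestStr")
INPUT#ch%, t$
CLOSE#ch%
PRINT t$
END

DEF PROCput_string(ch%, s$)
LOCAL i%
BPUT#ch%, 0        : REM type byte marking a string
BPUT#ch%, LEN(s$)  : REM one-byte length, hence the 255 limit
IF LEN(s$) = 0 THEN ENDPROC
FOR i% = LEN(s$) TO 1 STEP -1
  BPUT#ch%, ASC(MID$(s$, i%, 1))
NEXT
ENDPROC
```

On an interpreter that uses the Acorn file format, INPUT# should read the string back unchanged. An interpreter with a different PRINT# format would not be expected to accept this file. The single length byte is what forces a different layout once strings grow beyond 255 characters.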
Bryan Hogan (339) 593 posts |
Here is a link to Richard Russell’s excellent talk at the recent ABUG meeting about BBC BASIC in the 21st Century – https://stardot.org.uk/forums/viewtopic.php?f=60&t=21428&p=306770#p306770 |