RISC OS Open: Forum: OS_GBPB9 in practice

Jan 17, 2023 8:57pm

Ronald (387) 195 posts

I have kept a copy of heebygeeby and Charles’ text on r4 in a handy place. Not going into installing Python at the moment. Luckily I found a copy of a later port of Perl by Chris Gransdon and saved me getting involved in that biggy as well. A few components are missing but the old port ones worked OK.

With gbpb9 in multi entry mode, I am using an output buffer to accumulate the incoming buffers.
There doesn’t seem to be any way to convert a buffer of variable length strings into an array by duplicating the start address, so I am stepping through and converting the separator to either space or nl as required at the same time.

Settling on buffer sizes is the thing, I guess I should add an error in case of filling the output buffer.
The gbpb9 is already capable of reporting buffer space problems?

Edit: To avoid the output buffer, I could possibly create an array of pointers and assign them at each ‘\0’ as I am stepping through gbpb buffer.

Jan 17, 2023 10:03pm

nemo (145) 2556 posts

Ronald mused

There doesn’t seem to be any way to convert a buffer of variable length strings into an array

How an “array” is arranged in memory is heavily dependent on your language, which GBPB does not concern itself with. The multiple “objects” returned from GBPB are simply concatenated in memory, with word alignment for GBPB,10-12, and so you do need to do a strlen of the leafname to be able to step to the next object in the buffer (if there is one).
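The "array of pointers" idea from Ronald's edit, combined with the strlen stepping described here, can be sketched in C as follows. This is a minimal sketch, assuming the buffer holds plain null-terminated leafnames packed back to back (the GBPB 9 layout) and that the entry count is the value returned in R3. The function name `index_leafnames` and its parameters are hypothetical, not part of any RISC OS library.

```c
#include <stddef.h>
#include <string.h>

/*
 * Index the leafnames in a buffer filled by OS_GBPB 9.
 * Assumes `count` null-terminated names packed back to back, where
 * `count` is the number of entries the call reported in R3.
 * Stores a pointer to each name in `names` (nothing is copied) and
 * returns how many pointers were stored, at most `max_names`.
 */
size_t index_leafnames(char *buf, size_t count, char **names, size_t max_names)
{
    size_t n = 0;
    char *p = buf;

    while (n < count && n < max_names) {
        names[n++] = p;
        p += strlen(p) + 1;   /* step over the name and its terminator */
    }
    return n;
}
```

The pointers refer into the GBPB buffer itself, so they only stay valid until the buffer is reused for the next call.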

This is why the original GBPB,8 (that dates back to the BBC Micro and is somewhat problematic) returns Pascal strings – they’re easier to step over!

Buffer overruns are reported as “Buffer overflow”, error number &1E4 (V set and R0 pointing to an error block IF you called XOS_GBPB, or the error handed to the Error Handler if you only called OS_GBPB).

Jan 17, 2023 10:06pm

nemo (145) 2556 posts

Side comment for old-timers: How much more efficient would SWI name lookup have been if Pascal strings had been used instead of null-termination. Always annoyed me.

Jan 17, 2023 10:08pm

Ronald (387) 195 posts

I should be able to add a number to swi 12 to get xos.
Mmmm hex 20000

Jan 17, 2023 10:13pm

nemo (145) 2556 posts

[deleted my irrelevant fight with Textile]

Jan 17, 2023 10:14pm

Stuart Swales (8827) 1357 posts

I should be able to add a number to swi 12 to get xos.

As you are using _kernel_swi, you are automatically getting X-prefixed SWI. You have to try much harder to get it to call the non-X SWI.
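For a raw SWI number the arithmetic Ronald quotes does hold: the X form is the same number with bit 17 (hex 20000) set, so SWI 12 becomes hex 2000C. A small sketch of that, with hypothetical macro and function names; with _kernel_swi it is not needed, for the reason given above.

```c
/* SWI numbers as discussed in the thread; the names are illustrative. */
#define OS_GBPB    0x0Cu      /* SWI 12 */
#define SWI_X_BIT  0x20000u   /* bit 17: errors are returned to the caller */

/* Return the error-returning (X) form of a SWI number. */
unsigned int swi_x_form(unsigned int swi)
{
    return swi | SWI_X_BIT;
}
```

Using OR rather than addition means a number that already has the X bit set passes through unchanged.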

in case of filling the output buffer

Just make the GBPB buffer a kilobyte. If you have a filing system with leafnames over a kilobyte you’re not going to be able to do much with them anyhoo.

If you mean a buffer into which you are accumulating all the output leafnames that match, just keep realloc’ing one to fit rather than imposing an artificial limit. Sounds like you’re walking over the returned data and replacing R3-1 NULLCH terminators with spaces, so just keep track of its allocated length, add strlen(spaced_transformed_leafnames), realloc to that length, memcpy(new_buffer_address + old_buffer_length-1, spaced_transformed_leafnames, strlen(spaced_transformed_leafnames)+1) /* overwrites the last terminator, and adds another */
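The realloc approach can be sketched as below. This is a variation on the recipe above rather than a transcription of it: it appends one leafname at a time instead of one transformed batch, and tracks the string length rather than the allocated length. The `name_list` type and `name_list_append` function are hypothetical names introduced for illustration.

```c
#include <stdlib.h>
#include <string.h>

/* Growing, always null-terminated string of space-separated leafnames. */
typedef struct {
    char  *text;   /* NULL until the first append */
    size_t len;    /* length excluding the terminator */
} name_list;

/*
 * Append one leafname, preceded by a space unless the list is empty.
 * Returns 0 on success, or -1 if realloc fails (the list is unchanged).
 */
int name_list_append(name_list *list, const char *leaf)
{
    size_t leaf_len = strlen(leaf);
    size_t sep = (list->len > 0) ? 1 : 0;
    size_t new_len = list->len + sep + leaf_len;
    char *grown = realloc(list->text, new_len + 1);

    if (grown == NULL)
        return -1;
    if (sep)
        grown[list->len] = ' ';   /* overwrites the old terminator */
    memcpy(grown + list->len + sep, leaf, leaf_len + 1);   /* copies the new one */
    list->text = grown;
    list->len = new_len;
    return 0;
}
```

Start with `name_list list = { NULL, 0 };` and call `free(list.text)` when finished. Growing by exactly the needed amount keeps the sketch short; doubling the capacity would cut the number of realloc calls for large directories.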

Jan 17, 2023 10:19pm

Ronald (387) 195 posts

That’s handy to know, Stuart. The PRMs indicated that I should have got a no such directory type of error,
but in practice the program runs away.
So I checked the return code for non zero myself.
I possibly would get non zero for a buffer error as well, haven’t tried it yet.

Edit: maybe there would be a difference if compiling with Norcroft, I don’t know.
I understood the libraries were similar/same for both.

Jan 17, 2023 10:38pm

nemo (145) 2556 posts

Mr Optimistic wrote

filing system with leafnames over a kilobyte

Ho ho!

It is true that FileSwitch tends to view filename buffers as 1024 bytes, but it’s also true that *GOS (for example) only has 1023 bytes for the whole command line, and though DDEUtils has a configurable command line length, that’s only of help to things that use the DDEUtils API… which ain’t much outside the build tools.

The much-mentioned GBPB,8 incompatibility code (oops) assumes that no leafname could be more than 255 bytes (and will silently go horribly wrong if any fool tried it).

But that’s all moot. Though the Filer uses 272 byte filename buffers internally, no Wimp message could cope with more than 235 bytes. And in particular, the Data Transfer Protocol messages that allow you to interact with files in the Desktop are limited to 211 bytes.

And since if you can’t do it in Basic then it doesn’t really exist any filename > 255 bytes is a crime against humanity.

Jan 17, 2023 10:46pm

Ronald (387) 195 posts

any filename > 255 bytes is a crime against humanity

I agree with the sentiment, we have gone from the sublime 10 characters to the ridiculous ‘anything you like, write a story there’
They deserve to be truncated really.

Jan 17, 2023 10:52pm

Stuart Swales (8827) 1357 posts

I understood the libraries were similar/same for both.

I thought that you were linking with the SCL anyway? [Edit: I see you are in fact linking w/ unixlib, but it will be implemented the same way (fingers crossed)]

It does help to examine the result from _kernel_swi!

things that use the DDEUtils API

Apps transparently get its benefit for free when using the C library (either SharedCLibrary or SharedUnixLibrary) for their args.

BASIC takes a bit more effort with XOS_GetEnv/XOS_ReadArgs to read command lines > 255 bytes, and a bit more again if you bother to use the DDE SWIs, but ain’t that what BASIC libraries are for?

Jan 17, 2023 11:05pm

Ronald (387) 195 posts

I thought that you were linking with the SCL anyway

yes I can use libscl or default unixlib in gcc, for the autobuilder it would be unixlib unless it is a standalone utility.
I haven’t got a recent Norcroft compiler. Mmmm could run the old one in RiscPCemu, but it would be unrealistic to ask for support anyway.
It is my no1 thing to buy, even if not to use much, would support ROOL

Edit: compiling the 928byte program with gcc and then the Acorn box set


default shared ELF             size =    16,775
-static ELF                    size = 1,144,321
-static ELF converted to AIF   size =   159,732
-mlibscl AIF                   size =     3,740
default Acorn C v5.05 26bit    size =     5,680

Adding a c file to an existing suite of programs is not going to give such dramatic size increase.

Jan 17, 2023 11:38pm

Ronald (387) 195 posts

Nemo, what are you referring to in the heebygeeby readme when you say

R1 Directory (ctrl terminated)

Normally a string would be ‘\0’ terminated; it could be ctrl-A I suppose.
I recall BASIC using both NL and for some unknown reason CR.

Jan 18, 2023 12:20am

nemo (145) 2556 posts

Ronald noticed

what are you referring to

C programmers will be so used to strings ending in null that string termination is easily forgotten, like fish forget about water. However, RO has a long tradition of BBC Basic, which predates it (and to an appreciable extent actually made it, especially in the distant past), and BBC Basic always¹ uses CR.

But there are times when you are reading lines from a text file, which unlike the BBC Micro, usually has LF line endings. And in lots of cases there are places where control codes are used to do interesting things with the VDU, but should not be used as part of a filename, say.

So, it is often the case that RO APIs actually take ctrl-terminated strings – i.e. the string ends at the first byte <32. But in other APIs (especially those written in C by fish) only null is taken as the terminator.
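For C programmers the ctrl-terminated rule amounts to a strlen that stops at the first byte below 32 rather than only at null. A minimal sketch, assuming a caller-supplied bound; `ctrl_strlen` is a hypothetical helper, not an existing library call.

```c
#include <stddef.h>

/*
 * Length of a ctrl-terminated string: count bytes up to, but not
 * including, the first byte below 32. `max` bounds the scan so that a
 * missing terminator cannot run off the end of the buffer.
 */
size_t ctrl_strlen(const char *s, size_t max)
{
    size_t n = 0;

    while (n < max && (unsigned char)s[n] >= 32)
        n++;
    return n;
}
```

The cast to `unsigned char` matters: without it, top-bit-set characters would compare as negative on compilers where `char` is signed and would wrongly end the string.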

Therefore good documentation specifies exactly which termination type the API employs. This is especially important where APIs are passed down Vectors or implemented by plug-ins (such as Filing Systems, Filters, etc) and the code that does the work may not be written by the people who wrote the API and the ‘host’ (such as FileSwitch).

Getting these details wrong can produce difficult bugs, and we’ve got plenty of simple bugs already thanks.

¹ Though it actually supports CR, LF and null termination of strings as returned results from SYS. But that’s the only place you get that convenience feature in standard BBC Basic. Points at nemoBasic suggestively.

Jan 18, 2023 8:32am

Steve Pampling (1551) 8172 posts

Points at nemoBasic suggestively.

Translation: Gestures in the direction of container labelled “Mine, hands off, No Entry, Death by Dragons” :)

Jan 18, 2023 10:48am

nemo (145) 2556 posts

>HELP BGET
This function gives the next byte from the specified file: BGET#<channel>.
BGET(addrvar,type): reads data from and increments addr where type is:
 -n Fixed-length string
  0 null-terminated string
  1 ctrl-terminated string
  2 Pascal string
  3 Reverse Pascal string
  4 int32 
  5 Basic Real
  6 IEEE Half
  7 IEEE Single
  8 IEEE Double
  9 IEEE Extended
 10 bfloat16
 11 ALIGN
 12 uint8
 13 sint8
 14 uint16
 15 sint16
 16 uint16BE
 17 sint16BE
 18 int32BE
>HELP STRING$()
STRING$(<number>,<string>): gives string replicated the number of times.
STRING$(addr,arg[,max]): reads a string from memory where
 arg<0  fixed length string of -arg bytes, else
 b0-8   terminator chr value
 b9     terminate at any ctrl
 b10    terminate at null,LF,CR
 b11    include ASCII only
>

No prizes for guessing what BPUT can do:

Jan 18, 2023 1:31pm

Rick Murray (539) 13851 posts

But in other APIs (especially those written in C by fish) only null is taken as the terminator.

I get the impression that RISC OS wanted to use C style strings (such as OS_Write0), but had to make concessions for BASIC being different, leading to messes where parts of the UI expect null terminated strings, others accept Null/CR/LF, and the Wimp accepts any control character – meaning one can’t put tabs in error messages, for example.

usually has LF line endings

In something I wrote recently, my line-by-line reader converts CR to LF, and if there are two LFs in a row it’ll discard the second and try reading in the next line of input.
Handles Beeb files (CR), RISC OS files (LF), and DOS/Windows files (CRLF).

only null is taken as the terminator.

Which is correct. If you’re marking string endings in that way (as opposed to counted strings like BASIC), there should be one and only one terminator.

Note – string endings are different to line endings in a file, where one means “this is the end of the data” and the other means varying interpretations of “start on a new line”.

Jan 18, 2023 2:07pm

nemo (145) 2556 posts

Rick ruminated

I get the impression

The general rule is simple: lenient on input, strict on output. There’s precious few APIs where null-termination makes MORE sense than ctrl-termination (but Write0 is an obvious does-what-it-says-on-the-tin case – but how useful would a WriteCC that stopped at ctrl codes have been?!).

tabs in error messages

Exotic. You’re circling a can of worms there though, as you’re probably eyeing PrettyPrint and that’s always been a bug magnet. One place where tabs in strings would be spectacularly valuable (with an appropriate bit to enable them) is Wimp icons, and especially menu items – the Wimp could retire the dreadful hotkey formatting hack that goes so badly wrong.

Custom icon type 90 does a similar thing, but with \T instead of &09:

In something I wrote recently

My usual approach is to handle LF,not-CR, CR,not-LF, CR,LF and (thanks to *Spool) LF,CR, plus U+2028 and U+2029 as breaks. Suppressing multiple consecutive LFs reminds me of Draw text areas (IIRC).
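The four byte-level cases listed here can be sketched in C as below. This is only an illustration of the pairing rule, not nemo's actual code: it treats CR,LF and LF,CR as a single break and leaves out U+2028 and U+2029. Both function names are hypothetical.

```c
#include <stddef.h>

/*
 * Length in bytes of the line break starting at s[i], or 0 if none.
 * A CR,LF or LF,CR pair is one break of two bytes; a lone CR or LF is
 * one byte. U+2028 and U+2029 are not handled in this sketch.
 */
size_t line_break_len(const char *s, size_t i, size_t len)
{
    char c;

    if (i >= len || (s[i] != '\r' && s[i] != '\n'))
        return 0;
    c = s[i];
    if (i + 1 < len && (s[i + 1] == '\r' || s[i + 1] == '\n') && s[i + 1] != c)
        return 2;
    return 1;
}

/* Count the line breaks in a buffer of `len` bytes. */
size_t count_line_breaks(const char *s, size_t len)
{
    size_t i = 0, breaks = 0;

    while (i < len) {
        size_t n = line_break_len(s, i, len);
        if (n > 0) {
            breaks++;
            i += n;
        } else {
            i++;
        }
    }
    return breaks;
}
```

Because two identical bytes in a row count as two breaks, blank lines survive. Mixed runs such as LF,CR,LF remain ambiguous; this sketch pairs greedily from the left.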

there should be one and only one terminator

No. Not without good reason. The ‘American Customer Service Mantra’ is a valuable guide here – “Find a way to say yes”. This is an Operating System, not a compliance tester. Its purpose is to do the right thing, not to force you to do the right thing.

“Computer says no” is a joke, not a plan.

Jan 18, 2023 6:25pm

Steve Pampling (1551) 8172 posts

Custom icon type 90 does a similar thing, but with \T instead of &09:

\t, \n, \r should be familiar to all people of a technical inclination, never mind just programmers. Way easier to remember than the &09 style and I’d say far more intuitive

Jan 18, 2023 6:41pm

Rick Murray (539) 13851 posts

but how useful would a WriteCC that stopped at ctrl codes have been?!

Hmm, if you’re thinking BASIC style strings (at low level, that is), then you have a string and you have the length.
So WriteN has you covered. ;)

as you’re probably eyeing PrettyPrint

No, just thinking about nicer formatted error boxes, like almost every other UI out there, rather than just splatting the text centre aligned into it because “that’s how it’s been since Arthur”.

And, yes, menus.

retire the dreadful hotkey formatting hack that goes so badly wrong.

Especially since it was badly hacked to… what was it, recognise some colours as hotkeys? So if your menu text ends in, say, “blue” or something…
I forget the details, but it is something I came across with Manga and the list of available. I think I might have replaced spaces with non-breaking ones just to work around the Wimp doing weird and unexpected things. But, fuzzy memory.

and (thanks to *Spool) LF,CR

Gah! I knew there’d be one…

No. Not without good reason.

The problem with being lenient is that everybody has their own ideas. Look above at how many permutations there are for denoting the end of a line. In reality there should be only two. A CRLF pair as a painfully literal translation of what a printer actually did, based upon what a typewriter actually did…
…and One Byte To Rule Them (cough, or something), because in a computer there’s no carriage or roller so a single byte can do the function of the previous two.

I bet out there is some oddball setup that takes EOT as a newline.

“Find a way to say yes”

Diametric proposition: Do it like everybody else so this stuff works…hoodaya think you are, Apple?

That being said, text in general is a godawful mess – about the only thing one can have certainty in is if the text in question passes several valid UTF-8 sequences and no invalid ones. Otherwise… we’re into a world of mismatched character sets, double byte or other “wide” characters, and one can’t even make assumptions about “it’s a sequence of characters between 32ish and 126ish” because it’s probably straight ASCII but could be EBCDIC or some other fruitcake arrangement.
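The "several valid sequences and no invalid ones" test can be sketched as a single scan. This is a minimal sketch of that heuristic, not code from anyone in the thread: the hypothetical `utf8_multibyte_count` returns -1 on the first invalid sequence, otherwise the number of well-formed multi-byte sequences, and the caller decides how many count as "several".

```c
#include <stddef.h>

/*
 * Scan a buffer as UTF-8. Returns -1 if any invalid sequence is found,
 * otherwise the number of well-formed multi-byte sequences seen.
 * Rejects overlong forms, surrogates (U+D800..U+DFFF) and values
 * above U+10FFFF.
 */
long utf8_multibyte_count(const unsigned char *s, size_t len)
{
    size_t i = 0;
    long count = 0;

    while (i < len) {
        unsigned char b = s[i];
        size_t extra, k;
        unsigned long cp, min;

        if (b < 0x80) { i++; continue; }   /* plain ASCII */
        if (b >= 0xC2 && b <= 0xDF)      { extra = 1; cp = b & 0x1F; min = 0x80; }
        else if (b >= 0xE0 && b <= 0xEF) { extra = 2; cp = b & 0x0F; min = 0x800; }
        else if (b >= 0xF0 && b <= 0xF4) { extra = 3; cp = b & 0x07; min = 0x10000; }
        else return -1;                    /* stray continuation or bad lead byte */

        if (len - i <= extra)
            return -1;                     /* sequence truncated by end of buffer */
        for (k = 1; k <= extra; k++) {
            if ((s[i + k] & 0xC0) != 0x80)
                return -1;                 /* not a continuation byte */
            cp = (cp << 6) | (s[i + k] & 0x3F);
        }
        if (cp < min || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
            return -1;
        count++;
        i += extra + 1;
    }
    return count;
}
```

A result of 0 means pure ASCII, which says nothing either way. A Latin1 filename with a lone accented byte, such as &E9 followed by an ASCII letter, fails the scan and returns -1.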

There is (very tenuous) relevance to the subject of this thread. And that is that there’s no indication about what a filename is. One can generally assume Latin1 as that’s what most RISC OS machines are set to, but if the alphabet changes and there were filenames using high ASCII for accents and such, it turns into gibberish. Ditto going to and from UTF-8.

Jan 18, 2023 8:05pm

Steve Pampling (1551) 8172 posts

Nemo:

“Find a way to say yes”

“I can do it if you have the right budget, do you have a cost code I can order against? Don’t worry, just email it and once I’ve got that I can start” Requestor disappears and rarely re-appears.

Rick:

hoodaya think you are, Apple?

Medical Equipment supplier: we always do it like xyz.

Me: That won’t work in this setup, do abc

Medical Equipment supplier: we always do it like xyz.

Me: OK, do it xyz way. When it doesn’t work, come back and talk to me nicely.

Jan 18, 2023 8:40pm

nemo (145) 2556 posts

Steve surmised

\t, \n, \r should be familiar to all people of a technical inclination

\i swaps italic, \b bold, \u underlining, and \0-\6 (IIRC) change colours, it’s a very flexible icon type. And extremely amenable to Messages translation.

Rick remembered

colours as hotkeys

Yup. “Deep Red” is disappointing. Not only is this completely unavoidable – there’s no way I know to stop it from happening – it is also hard-wired to the idea that character code 160 is always hard space. Which it isn’t (and in some Alphabets there simply isn’t one). So it does stupid stuff beyond your control and messes up the font. And all for the want of a menu item flag that says “this item’s hotkey is separated by a tab chr”.

I am reserving control code 31 in font strings for an encoding-independent hard space, inspired by PrettyPrint.

there’s no indication about what a filename is

On ‘installed’ media that’s not true really. But on portable drives you’re right.

Part of the competence of the UTF8Alphabet module is to explicitly define what has otherwise been an implied encoding – the ‘Fallback Alphabet’. Which, yes, is often Latin1, but may not be for a particular user. The concept of Fallback Alphabet in addition to system Alphabet ensures that 8bit text continues to make sense when UTF-8 is selected… without hard-wiring anything. That said, if your 8bit Alphabet had been Cyrillic, you’re going to get a lot of false positives in UTF-8. I’m drying my eyes about that.

Jan 19, 2023 1:53pm

Jon Abbott (1421) 2651 posts

STRING$(addr,arg[,max]): reads a string from memory where...

Nemo, what version of BASIC are you using to get that? That’s actually rather useful.

Jan 19, 2023 4:39pm

nemo (145) 2556 posts

My version. See here

Jan 24, 2023 2:13pm

Alan Adams (2486) 1149 posts

I agree with the sentiment, we have gone from the sublime 10 characters to the ridiculous ‘anything you like, write a story there’

When I was working in primary schools we used to get a lot of very long filenames in Word documents. Turned out that if you clicked save, by default Word used the first paragraph as the filename. Since the kids didn’t start their work with a title as the first line …

Jan 24, 2023 5:46pm

Steve Pampling (1551) 8172 posts

Turned out that if you clicked save, by default Word used the first paragraph as the filename

MS, being bears of little brain failures¹ think it’s good to not break at a space, but good to break at a hyphen, comma, apostrophe, @ and many other things.
Thus "I don’t want to know how bad this is " will generate a title of “I don.docx”

I swear, even Paddington Bear would do better, and you’d probably get a free marmalade sandwich as a bonus.

¹ Clearly while attempting the “of little brain” part they fall short of the mark.

OS_GBPB9 in practice

Reply

Search forums

Social

ROOL Store

Donate! Why?

RISC OS IPR

Description

Voices

Options

Jan 17, 2023 8:57pm Ronald (387) 195 posts	I have kept a copy of heebygeeby and Charles’ text on r4 in a handy place. Not going into installing Python at the moment. Luckily I found a copy of a later port of Perl by Chris Gransdon and saved me getting involved in that biggy as well. A few components are missing but the old port ones worked OK. With gbpb9 in multi entry mode, I am using an output buffer to accumulate the incoming buffers. There doesn’t seem to be any way to convert a buffer of variable length strings into an array by duplicating the start address, so I am stepping through and converting the separator to either space or nl as required at the same time. Settling on buffer sizes is the thing, I guess I should add an error in case of filling the output buffer. The gbpb9 is already capable of reporting buffer space problems? Edit: To avoid the output buffer, I could possibly create an array of pointers and assign them at each ‘\0’ as I am stepping through gbpb buffer.

Jan 17, 2023 10:03pm nemo (145) 2556 posts	Ronald mused There doesn’t seem to be any way to convert a buffer of variable length strings into an array How an “array” is arranged in memory is heavily dependent on your language, which GBPB does not concern itself with. The multiple “objects” returned from GBPB are simply concatenated in memory, with word alignment for GBPB,10-12, and so you do need to do a strlen of the leafname to be able to step to the next object in the buffer (if there is one). This is why the original GBPB,8 (that dates back to the BBC Micro and is somewhat problematic) returns Pascal strings – they’re easier to step over! Buffer overruns are reported as “Buffer overflow”, error number &1E4 (V set and R0 pointing to an error block IF you called XOS_GBPB, or the error handed to the Error Handler if you only called OS_GBPB).

Jan 17, 2023 10:06pm nemo (145) 2556 posts	Side comment for old-timers: How much more efficient would SWI name lookup have been if Pascal strings had been used instead of null-termination. Always annoyed me.

Jan 17, 2023 10:08pm Ronald (387) 195 posts	I should be able to add a number to swi 12 to get xos. Mmmm hex 20000

Jan 17, 2023 10:13pm nemo (145) 2556 posts	[deleted my irrelevant fight with Textile]

Jan 17, 2023 10:14pm Stuart Swales (8827) 1357 posts	I should be able to add a number to swi 12 to get xos. As you are using _kernel_swi, you are automatically getting X-prefixed SWI. You have to try much harder to get it to call the non-X SWI. in case of filling the output buffer Just make the GBPB buffer a kilobyte. If you have a filing system with leafnames over a kilobyte you’re not going to be able to do much with them anyhoo. If you mean a buffer into which you are accumulating all the output leafnames that match, just keep realloc’ing one to fit rather than imposing an articifical limit. Sounds like you’re walking over the returned data and replacing R3-1 NULLCH terminators with spaces, so just keep track of its allocated length, add strlen(spaced_transformed_leafnames), realloc to that length, memcpy(new_buffer_address + old_buffer_length-1, spaced_transformed_leafnames, strlen(spaced_transformed_leafnames)+1) /* overwrites the last terminator, and adds another */

Jan 17, 2023 10:19pm Ronald (387) 195 posts	Thats handy to know, Stuart. The prm’s indicated that I should have got a no such directory type of error, but in practice the program runs away. So I checked the return code for non zero myself. I possibly would get non zero for a buffer error as well, havent tried it yet. Edit maybe there would be a difference if compiling with Norcroft I don’t know. I understood the libraries were similar/same for both.

Jan 17, 2023 10:38pm nemo (145) 2556 posts	Mr Optimistic wrote filing system with leafnames over a kilobyte Ho ho! It is true that FileSwitch tends to view filename buffers as 1024 bytes, but it’s also true that `GOS` (for example) only has 1023 bytes for the whole command line, and though DDEUtils has a configurable command line length, that’s only of help to things that use the DDEUtils API… which ain’t much outside the build tools. The much-mentioned GBPB,8 incompatibility code (oops) assumes that no leafname could be more than 255 bytes (and will silently go horribly wrong if any fool tried it). But that’s all moot. Though the Filer uses 272 byte filename buffers internally, no Wimp message could cope with more than 235 bytes. And in particular, the Data Transfer Protocol messages that allow you to interact with files in the Desktop are limited to 211 bytes. And since if you can’t do it in Basic then it doesn’t really exist* any filename > 255 bytes is a crime against humanity.

Jan 17, 2023 10:46pm Ronald (387) 195 posts	any filename > 255 bytes is a crime against humanity I agree with the sentiment, we have gone from the sublime 10 characters to the ridiculous ‘anything you like, write a story there’ They deserve to be truncated really.

Jan 17, 2023 10:52pm Stuart Swales (8827) 1357 posts	I understood the libraries were similar/same for both. I thought that you were linking with the SCL anyway? [Edit: I see you are in fact linking w/ unixlib, but it will be implemented the same way (fingers crossed)] It does help to examine the result from _kernel_swi! things that use the DDEUtils API Apps transparently get its benefit for free when using the C library (either SharedCLibrary or SharedUnixLibrary) for their args. BASIC takes a bit more effort with XOS_GetEnv/XOS_ReadArgs to read command lines > 255 bytes, and a bit more again if you bother to use the DDE SWIs, but ain’t that what BASIC libraries are for?

Jan 17, 2023 11:05pm Ronald (387) 195 posts	I thought that you were linking with the SCL anyway yes I can use libscl or default unixlib in gcc, for the autobuilder it would be unixlib unless it is a standalone utility. I havn’t got a recent norcroft compiler. Mmmm could run the old one in RiscPCemu, but it would be unrealistic to ask for support anyway. It is my no1 thing to buy, even if not to use much, would support ROOL Edit: compiling the 928byte program with gcc and then the Acorn box set `default shared ELF size = 16775 -static ELF size = 1,144,321 -static ELF converted to AIF = 159,732 -mlibscl AIF = 3740 default Acorn C v5.05 26bit = 5680` Adding a c file to an existing suite of programs is not going to give such dramatic size increase.

Jan 17, 2023 11:38pm Ronald (387) 195 posts	Nemo, what are you referring to in the heebygeeby readme when you say R1 Directory (ctrl terminated) Normally a string would be ‘\0’ terminated could be ctrl-A I suppose. I recall BASIC using both NL and for some unknown reason CR.

Jan 18, 2023 12:20am nemo (145) 2556 posts	Ronald noticed what are you referring to C programmers will be so used to strings ending in null that string termination is easily forgotten, like fish forget about water. However, RO has a long tradition of BBC Basic, which predates it (and to an appreciable extent actually made it, especially in the distant past), and BBC Basic always¹ uses CR. But there are times when you are reading lines from a text file, which unlike the BBC Micro, usually has LF line endings. And in lots of cases there are places where control codes are used to do interesting things with the VDU, but should not be used as part of a filename, say. So, it is often the case that RO APIs actually take ctrl-terminated strings – i.e. the string ends at the first byte <32. But in other APIs (especially those written in C by fish) only null is taken as the terminator. Therefore good documentation specifies exactly which termination type the API employs. This is especially important where APIs are passed down Vectors or implemented by plug-ins (such as Filing Systems, Filters, etc) and the code that does the work may not be written by the people who wrote the API and the ‘host’ (such as FileSwitch). Getting these details wrong can produce difficult bugs, and we’ve got plenty of simple bugs already thanks. ¹ Though it actually supports CR, LF and null termination of strings as returned results from SYS. But that’s the only place you get that convenience feature in standard BBC Basic. Points at nemoBasic suggestively.

Jan 18, 2023 8:32am Steve Pampling (1551) 8172 posts	Points at nemoBasic suggestively. Translation: Gestures in the direction of container labelled “Mine, hands off, No Entry, Death by Dragons” :)

Jan 18, 2023 10:48am nemo (145) 2556 posts	>HELP BGET This function gives the next byte from the specified file: BGET#<channel>. BGET(addrvar,type): reads data from and increments addr where type is: -n Fixed-length string 0 null-terminated string 1 ctrl-terminated string 2 Pascal string 3 Reverse Pascal string 4 int32 5 Basic Real 6 IEEE Half 7 IEEE Single 8 IEEE Double 9 IEEE Extended 10 bfloat16 11 ALIGN 12 uint8 13 sint8 14 uint16 15 sint16 16 uint16BE 17 sint16BE 18 int32BE >HELP STRING$() STRING$(<number>,<string>): gives string replicated the number of times. STRING$(addr,arg[,max]): reads a string from memory where arg<0 fixed length string of -arg bytes, else b0-8 terminator chr value b9 terminate at any ctrl b10 terminate at null,LF,CR b11 include ASCII only > No prizes for guessing what BPUT can do:

Jan 18, 2023 1:31pm Rick Murray (539) 13851 posts	But in other APIs (especially those written in C by fish) only null is taken as the terminator. I get the impression that RISC OS wanted to use C style strings (such as OS_Write0), but had to make concessions for BASIC being different, leading to messes where parts of the UI expect null terminated strings, others accept Null/CR/LF, and the Wimp accepts any control character – meaning one can’t put tabs in error messages, for example. usually has LF line endings In something I wrote recently, my line-by-line reader converts CR to LF, and if there are two LFs in a row it’ll discard the second and try reading in the next line of input. Handles Beeb files (CR), RISC OS files (LF), and DOS/Windows files (CRLF). only null is taken as the terminator. Which is correct. If you’re marking string endings in that way (as opposed to counted strings like BASIC), there should be one and only one terminator. Note – string endings are different to line endings in a file, where one means “this is the end of the data” and the other means varying interpretations of “start on a new line”.

Jan 18, 2023 2:07pm nemo (145) 2556 posts	Rick ruminated I get the impression The general rule is simple: lenient on input, strict on output. There’s precious few APIs where null-termination makes MORE sense than ctrl-termination (but Write0 is an obvious does-what-it-says-on-the-tin case – but how useful would a WriteCC that stopped at ctrl codes have been?!). tabs in error messages Exotic. You’re circling a can of worms there though, as you’re probably eyeing PrettyPrint and that’s always been a bug magnet. One place where tabs in strings would be spectacularly valuable (with an appropriate bit to enable them) is Wimp icons, and especially menu items – the Wimp could retire the dreadful hotkey formatting hack that goes so badly wrong. Custom icon type 90 does a similar thing, but with `\T` instead of &09: In something I wrote recently My usual approach is to handle LF,not-CR, CR,not-LF, CR,LF and (thanks to *Spool) LF,CR, plus U+2028 and U+2029 as breaks. Suppressing multiple consecutive LFs reminds me of Draw text areas (IIRC). there should be one and only one terminator No. Not without good reason. The ‘American Customer Service Mantra’ is a valuable guide here – “Find a way to say yes”. This is an Operating System, not a compliance tester. Its purpose is to do the right thing, not to force you to do the right thing. “Computer says no” is a joke, not a plan.

Jan 18, 2023 6:25pm Steve Pampling (1551) 8172 posts	Custom icon type 90 does a similar thing, but with \T instead of &09: \t, \n, \r should be familiar to all people of a technical inclination, never mind just programmers. Way easier to remember than the &09 style and I’d say far more intuitive

Jan 18, 2023 6:41pm Rick Murray (539) 13851 posts	but how useful would a WriteCC that stopped at ctrl codes have been?!

Hmm, if you’re thinking BASIC style strings (at low level, that is), then you have a string and you have the length. So WriteN has you covered. ;)

as you’re probably eyeing PrettyPrint

No, just thinking about nicer formatted error boxes, like almost every other UI out there, rather than just splatting the text centre aligned into it because “that’s how it’s been since Arthur”. And, yes, menus. retire the dreadful hotkey formatting hack that goes so badly wrong. Especially since it was badly hacked to… what was it, recognise some colours as hotkeys? So if your menu text ends in, say, “blue” or something… I forget the details, but it is something I came across with Manga and the list of available. I think I might have replaced spaces with non-breaking ones just to work around the Wimp doing weird and unexpected things. But, fuzzy memory.

and (thanks to Spool) LF,CR

Gah! I knew there’d be one… No. Not without good reason. The problem with being lenient is that everybody has their own ideas. Look above at how many permutations there are for denoting the end of a line. In reality there should be only two. A CRLF pair as a painfully literal translation of what a printer actually did, based upon what a typewriter actually did… …and One Byte To Rule Them (cough, or something), because in a computer there’s no carriage or roller so a single byte can do the function of the previous two. I bet out there is some oddball setup that takes EOT as a newline.

“Find a way to say yes”

Diametric proposition: Do it like everybody else so this stuff works…hoodaya think you are, Apple?*

That being said, text in general is a godawful mess – about the only thing one can have certainty in is if the text in question contains several valid UTF-8 sequences and no invalid ones. Otherwise… we’re into a world of mismatched character sets, double byte or other “wide” characters, and one can’t even make assumptions about “it’s a sequence of characters between 32ish and 126ish” because it’s probably straight ASCII but could be EBCDIC or some other fruitcake arrangement.

There is (very tenuous) relevance to the subject of this thread. And that is that there’s no indication about what a filename is. One can generally assume Latin1 as that’s what most RISC OS machines are set to, but if the alphabet changes and there were filenames using high ASCII for accents and such, it turns into gibberish. Ditto going to and from UTF-8.
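
The “several valid UTF-8 sequences and no invalid ones” test can be sketched in a few lines. This is only an illustration in Python, not anything RISC OS provides; the function name `looks_like_utf8` and the threshold `min_multibyte` are made up for the example.

```python
def looks_like_utf8(data: bytes, min_multibyte: int = 2) -> bool:
    """Heuristic: True only if data is valid UTF-8 AND contains at least
    min_multibyte multi-byte sequences. Pure ASCII proves nothing either way."""
    try:
        text = data.decode("utf-8")  # strict mode: raises on any invalid sequence
    except UnicodeDecodeError:
        return False
    # Each non-ASCII code point was encoded as exactly one multi-byte sequence.
    multibyte = sum(1 for ch in text if ord(ch) > 0x7F)
    return multibyte >= min_multibyte


print(looks_like_utf8("café naïve".encode("utf-8")))  # valid, two multi-byte sequences
print(looks_like_utf8(b"caf\xe9 na\xefve"))           # Latin1 bytes, invalid as UTF-8
print(looks_like_utf8(b"plain ASCII"))                # valid, but no evidence
```

The first call reports True and the other two False. The threshold is a judgement call: the more multi-byte sequences that decode cleanly, the less likely the text is an 8-bit alphabet that merely happens to be valid.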

Jan 18, 2023 8:05pm Steve Pampling (1551) 8172 posts	Nemo: “Find a way to say yes”

“I can do it if you have the right budget, do you have a cost code I can order against? Don’t worry, just email it and once I’ve got that I can start”
Requestor disappears and rarely re-appears.

Rick: hoodaya think you are, Apple?

Medical Equipment supplier: we always do it like xyz.
Me: That won’t work in this setup, do abc
Medical Equipment supplier: we always do it like xyz.
Me: OK, do it xyz way. When it doesn’t work, come back and talk to me nicely.

Jan 18, 2023 8:40pm nemo (145) 2556 posts	Steve surmised

\t, \n, \r should be familiar to all people of a technical inclination

\i swaps italic, \b bold, \u underlining, and \0-\6 (IIRC) change colours, it’s a very flexible icon type. And extremely amenable to Messages translation.

Rick remembered

colours as hotkeys

Yup. “Deep Red” is disappointing. Not only is this completely unavoidable – there’s no way I know to stop it from happening – it is also hard-wired to the idea that character code 160 is always hard space. Which it isn’t (and in some Alphabets there simply isn’t one). So it does stupid stuff beyond your control and messes up the font. And all for the want of a menu item flag that says “this item’s hotkey is separated by a tab chr”. I am reserving control code 31 in font strings for an encoding-independent hard space, inspired by PrettyPrint.

there’s no indication about what a filename is

On ‘installed’ media that’s not true really. But on portable drives you’re right. Part of the competence of the UTF8Alphabet module is to explicitly define what has otherwise been an implied encoding – the ‘Fallback Alphabet’. Which, yes, is often Latin1, but may not be for a particular user. The concept of Fallback Alphabet in addition to system Alphabet ensures that 8bit text continues to make sense when UTF-8 is selected… without hard-wiring anything. That said, if your 8bit Alphabet had been Cyrillic, you’re going to get a lot of false positives in UTF-8. I’m drying my eyes about that.
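
The fallback idea and the Cyrillic caveat can be illustrated outside RISC OS. A rough Python sketch follows; it is not a description of how the UTF8Alphabet module works internally. `decode_with_fallback` is a hypothetical helper, and ISO 8859-5 is assumed here as a stand-in for an 8-bit Cyrillic alphabet.

```python
def decode_with_fallback(data: bytes, fallback: str = "latin-1") -> str:
    """Decode as UTF-8 when the bytes are valid UTF-8; otherwise treat them
    as text in the given 8-bit fallback encoding (an assumed parameter)."""
    try:
        return data.decode("utf-8")
    except UnicodeDecodeError:
        return data.decode(fallback)


# Latin1 accented text is rarely valid UTF-8, so the fallback is used.
latin1_name = "café".encode("latin-1")   # bytes 63 61 66 E9
print(decode_with_fallback(latin1_name))

# Cyrillic false positive: the capitals "ТО" in ISO 8859-5 are bytes C2 BE,
# which are also a valid UTF-8 sequence (U+00BE), so the fallback never runs.
cyrillic_name = "ТО".encode("iso8859_5")
print(cyrillic_name.hex())
print(decode_with_fallback(cyrillic_name, "iso8859_5"))
```

The Latin1 name comes back intact, but the Cyrillic one decodes as a single “¾” because nothing in the bytes themselves says which reading was intended.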

Jan 19, 2023 1:53pm Jon Abbott (1421) 2651 posts	`STRING$(addr,arg[,max]): reads a string from memory where...` Nemo, what version of BASIC are you using to get that? That’s actually rather useful.

Jan 19, 2023 4:39pm nemo (145) 2556 posts	My version. See here

Jan 24, 2023 2:13pm Alan Adams (2486) 1149 posts	I agree with the sentiment, we have gone from the sublime 10 characters to the ridiculous ‘anything you like, write a story there’

When I was working in primary schools we used to get a lot of very long filenames in Word documents. Turned out that if you clicked save, by default Word used the first paragraph as the filename. Since the kids didn’t start their work with a title as the first line …

Jan 24, 2023 5:46pm Steve Pampling (1551) 8172 posts	Turned out that if you clicked save, by default Word used the first paragraph as the filename

MS, being bears of little brain failures¹, think it’s good to not break at a space, but good to break at a hyphen, comma, apostrophe, @ and many other things. Thus “I don’t want to know how bad this is” will generate a title of “I don.docx”. I swear, even Paddington Bear would do better, and you’d probably get a free marmalade sandwich as a bonus.

¹ Clearly while attempting the “of little brain” part they fall short of the mark.