OS_GBPB9 in practice
Ronald (387) 195 posts |
I have kept a copy of heebygeeby and Charles’ text on r4 in a handy place. Not going into installing Python at the moment. Luckily I found a copy of a later port of Perl by Chris Gransdon, which saved me getting involved in that biggy as well. A few components are missing but the ones from the old port worked OK. With gbpb9 in multi-entry mode, I am using an output buffer to accumulate the incoming buffers. Settling on buffer sizes is the thing; I guess I should add an error in case the output buffer fills up. Edit: To avoid the output buffer, I could possibly create an array of pointers and assign them at each ‘\0’ as I am stepping through the gbpb buffer. |
nemo (145) 2556 posts |
Ronald mused
How an “array” is arranged in memory is heavily dependent on your language, which GBPB does not concern itself with. The multiple “objects” returned from GBPB are simply concatenated in memory, with word alignment for GBPB,10-12, and so you do need to do a strlen of the leafname to be able to step to the next object in the buffer (if there is one). This is why the original GBPB,8 (that dates back to the BBC Micro and is somewhat problematic) returns Pascal strings – they’re easier to step over! Buffer overruns are reported as “Buffer overflow”, error number &1E4 (V set and R0 pointing to an error block IF you called XOS_GBPB, or the error handed to the Error Handler if you only called OS_GBPB). |
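A minimal C sketch of that stepping, for the GBPB,9 case where the buffer holds nothing but null-terminated leafnames back to back. It is plain portable C rather than a real SWI call; the function name `index_names` and its parameters are made up for illustration, and `count` stands in for the number of objects the call reported in R3. It also shows the "array of pointers" idea from earlier in the thread.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Record a pointer to each NUL-terminated name in buf.
 * buf_len is the size of the buffer in bytes, count is the number of
 * names the call reported, and max is the capacity of the names array.
 * Returns the number of pointers stored. (Hypothetical helper.) */
size_t index_names(const char *buf, size_t buf_len, size_t count,
                   const char **names, size_t max)
{
    size_t stored = 0;
    const char *p = buf;
    const char *end = buf + buf_len;

    assert(buf != NULL && names != NULL);

    while (stored < count && stored < max && p < end) {
        /* Bounded search for the terminator, so a damaged buffer
         * cannot send us past the end. */
        const char *nul = memchr(p, '\0', (size_t)(end - p));
        if (nul == NULL)
            break;              /* unterminated tail: stop here */
        names[stored++] = p;
        p = nul + 1;            /* step over the terminator to the next name */
    }
    return stored;
}
```

For GBPB,10-12 the same loop would also have to skip the fixed-size fields in front of each name and round up to the next word boundary afterwards; that is not shown here.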
nemo (145) 2556 posts |
Side comment for old-timers: How much more efficient would SWI name lookup have been if Pascal strings had been used instead of null-termination? Always annoyed me. |
Ronald (387) 195 posts |
I should be able to add a number to swi 12 to get xos. |
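For reference, the "number" is the X bit, bit 17 of the SWI number, so OS_GBPB (&0C) becomes XOS_GBPB (&2000C). A small sketch in portable C; the macro and function names below are chosen for illustration and are not taken from any header.

```c
#include <assert.h>

#define OS_GBPB   0x0Cu      /* SWI number 12 */
#define SWI_X_BIT 0x20000u   /* bit 17 selects the error-returning ("X") form */

/* Return the X form of a SWI number. (Illustrative helper.) */
unsigned int swi_x_form(unsigned int swi_number)
{
    assert(swi_number < 0x1000000u);   /* SWI numbers are 24 bits wide */
    return swi_number | SWI_X_BIT;
}
```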
nemo (145) 2556 posts |
[deleted my irrelevant fight with Textile] |
Stuart Swales (8827) 1357 posts |
As you are using _kernel_swi, you are automatically getting X-prefixed SWI. You have to try much harder to get it to call the non-X SWI.
Just make the GBPB buffer a kilobyte. If you have a filing system with leafnames over a kilobyte you’re not going to be able to do much with them anyhoo. If you mean a buffer into which you are accumulating all the output leafnames that match, just keep realloc’ing one to fit rather than imposing an artificial limit. Sounds like you’re walking over the returned data and replacing R3-1 NULLCH terminators with spaces, so just keep track of its allocated length, add strlen(spaced_transformed_leafnames), realloc to that length, memcpy(new_buffer_address + old_buffer_length-1, spaced_transformed_leafnames, strlen(spaced_transformed_leafnames)+1) /* overwrites the last terminator, and adds another */ |
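A rough sketch of the realloc approach in portable C. It is a variation on the description above rather than a transcription: it appends one leafname at a time and tracks the string length instead of the allocated length. The `name_list` type and `name_list_append` function are invented for this example.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Growing, NUL-terminated list of space-separated leafnames. */
typedef struct {
    char  *text;   /* NULL until the first append */
    size_t len;    /* characters stored, excluding the terminator */
} name_list;

/* Append one leafname, preceded by a space if the list is not empty.
 * Returns 0 on success, -1 if realloc fails (the list is left unchanged). */
int name_list_append(name_list *list, const char *leaf)
{
    size_t add, sep;
    char *grown;

    assert(list != NULL && leaf != NULL);

    add = strlen(leaf);
    sep = (list->len > 0) ? 1 : 0;

    /* Grow to fit exactly: old text + separator + new name + terminator. */
    grown = realloc(list->text, list->len + sep + add + 1);
    if (grown == NULL)
        return -1;

    if (sep)
        grown[list->len] = ' ';                      /* overwrites the old terminator */
    memcpy(grown + list->len + sep, leaf, add + 1);  /* copies the new terminator too */

    list->text = grown;
    list->len += sep + add;
    return 0;
}
```

Start with `name_list list = { NULL, 0 };` and `free(list.text)` when done. Growing by exactly one name per call is simple but calls realloc often; doubling the capacity instead would be the usual refinement if the directory is large.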
Ronald (387) 195 posts |
That’s handy to know, Stuart. The PRMs indicated that I should have got a ‘no such directory’ type of error. Edit: maybe there would be a difference if compiling with Norcroft, I don’t know. |
nemo (145) 2556 posts |
Mr Optimistic wrote
Ho ho! It is true that FileSwitch tends to view filename buffers as 1024 bytes, but it’s also true that the much-mentioned GBPB,8 incompatibility code (oops) assumes that no leafname could be more than 255 bytes (and will silently go horribly wrong if any fool tried it). But that’s all moot. Though the Filer uses 272 byte filename buffers internally, no Wimp message could cope with more than 235 bytes. And in particular, the Data Transfer Protocol messages that allow you to interact with files in the Desktop are limited to 211 bytes. And since, if you can’t do it in Basic, it doesn’t really exist, any filename > 255 bytes is a crime against humanity. |
Ronald (387) 195 posts |
“any filename > 255 bytes is a crime against humanity” I agree with the sentiment: we have gone from the sublime 10 characters to the ridiculous ‘anything you like, write a story there’. |
Stuart Swales (8827) 1357 posts |
I thought that you were linking with the SCL anyway? [Edit: I see you are in fact linking w/ unixlib, but it will be implemented the same way (fingers crossed)] It does help to examine the result from _kernel_swi!
Apps transparently get its benefit for free when using the C library (either SharedCLibrary or SharedUnixLibrary) for their args. BASIC takes a bit more effort with XOS_GetEnv/XOS_ReadArgs to read command lines > 255 bytes, and a bit more again if you bother to use the DDE SWIs, but ain’t that what BASIC libraries are for? |
Ronald (387) 195 posts |
“I thought that you were linking with the SCL anyway” Yes, I can use libscl or the default unixlib in gcc; for the autobuilder it would be unixlib unless it is a standalone utility. Edit: compiling the 928-byte program with gcc and then the Acorn box set
Adding a C file to an existing suite of programs is not going to give such a dramatic size increase. |
Ronald (387) 195 posts |
Nemo, what are you referring to in the heebygeeby readme when you say “R1 Directory (ctrl terminated)”? Normally a string would be ‘\0’ terminated; it could be ctrl-A, I suppose. I recall BASIC using both NL and, for some unknown reason, CR. |
nemo (145) 2556 posts |
Ronald noticed
C programmers will be so used to strings ending in null that string termination is easily forgotten, like fish forget about water. However, RO has a long tradition of BBC Basic, which predates it (and to an appreciable extent actually made it, especially in the distant past), and BBC Basic always[1] uses CR. But there are times when you are reading lines from a text file, which unlike the BBC Micro, usually has LF line endings. And in lots of cases there are places where control codes are used to do interesting things with the VDU, but should not be used as part of a filename, say. So, it is often the case that RO APIs actually take ctrl-terminated strings – i.e. the string ends at the first byte <32. But in other APIs (especially those written in C by fish) only null is taken as the terminator. Therefore good documentation specifies exactly which termination type the API employs. This is especially important where APIs are passed down Vectors or implemented by plug-ins (such as Filing Systems, Filters, etc) and the code that does the work may not be written by the people who wrote the API and the ‘host’ (such as FileSwitch). Getting these details wrong can produce difficult bugs, and we’ve got plenty of simple bugs already, thanks. [1] Though it actually supports CR, LF and null termination of strings as returned results from SYS. But that’s the only place you get that convenience feature in standard BBC Basic. Points at nemoBasic suggestively. |
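For the C-minded, a ctrl-terminated string can be measured with something like the sketch below: the string ends at the first byte below 32, which covers null, CR and LF alike. The name `ctrl_strlen` and the `max` bound are assumptions made for this example, not an existing library function.

```c
#include <assert.h>
#include <stddef.h>

/* Length of a ctrl-terminated string: stops at the first byte below 32,
 * or at max if no terminator turns up within the buffer. */
size_t ctrl_strlen(const char *s, size_t max)
{
    size_t n = 0;

    assert(s != NULL);

    /* Compare as unsigned so top-bit-set characters are not mistaken
     * for control codes where plain char is signed. */
    while (n < max && (unsigned char)s[n] >= 32)
        n++;
    return n;
}
```

Note that a tab (&09) also ends the string under this rule, which is the limitation on tabs in Wimp text that comes up later in the thread.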
Steve Pampling (1551) 8172 posts |
Translation: Gestures in the direction of container labelled “Mine, hands off, No Entry, Death by Dragons” :) |
nemo (145) 2556 posts |
No prizes for guessing what BPUT can do: |
Rick Murray (539) 13851 posts |
I get the impression that RISC OS wanted to use C style strings (such as OS_Write0), but had to make concessions for BASIC being different, leading to messes where parts of the UI expect null terminated strings, others accept Null/CR/LF, and the Wimp accepts any control character – meaning one can’t put tabs in error messages, for example.
In something I wrote recently, my line-by-line reader converts CR to LF, and if there are two LFs in a row it’ll discard the second and try reading in the next line of input.
Which is correct. If you’re marking string endings in that way (as opposed to counted strings like BASIC), there should be one and only one terminator. Note – string endings are different to line endings in a file, where one means “this is the end of the data” and the other means varying interpretations of “start on a new line”. |
nemo (145) 2556 posts |
Rick ruminated
The general rule is simple: lenient on input, strict on output. There’s precious few APIs where null-termination makes MORE sense than ctrl-termination (but Write0 is an obvious does-what-it-says-on-the-tin case – but how useful would a WriteCC that stopped at ctrl codes have been?!).
Exotic. You’re circling a can of worms there though, as you’re probably eyeing PrettyPrint and that’s always been a bug magnet. One place where tabs in strings would be spectacularly valuable (with an appropriate bit to enable them) is Wimp icons, and especially menu items – the Wimp could retire the dreadful hotkey formatting hack that goes so badly wrong. Custom icon type 90 does a similar thing, but with
My usual approach is to handle LF,not-CR, CR,not-LF, CR,LF and (thanks to *Spool) LF,CR, plus U+2028 and U+2029 as breaks. Suppressing multiple consecutive LFs reminds me of Draw text areas (IIRC).
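A sketch of the four byte-level cases in portable C, treating LF, CR, CR LF and LF CR each as one break. The function name `next_line` is invented here, and the U+2028 and U+2029 separators are deliberately left out, since handling them depends on knowing the text is UTF-8.

```c
#include <assert.h>
#include <stddef.h>

/* Find the end of the line starting at text[pos].
 * Stores the line's length (excluding the break) in *line_len and returns
 * the offset at which the next line starts. CR, LF, CR LF and LF CR each
 * count as a single break. */
size_t next_line(const char *text, size_t len, size_t pos, size_t *line_len)
{
    size_t i = pos;

    assert(text != NULL && line_len != NULL && pos <= len);

    while (i < len && text[i] != '\r' && text[i] != '\n')
        i++;
    *line_len = i - pos;

    if (i < len) {
        char first = text[i++];
        /* Swallow the partner of a two-byte break, but only if it differs:
         * LF LF or CR CR are two breaks with an empty line between them. */
        if (i < len && (text[i] == '\r' || text[i] == '\n') && text[i] != first)
            i++;
    }
    return i;
}
```

Call it in a loop while `pos < len`. A run such as CR LF CR is inherently ambiguous; this sketch reads it as CR LF followed by a lone CR.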
No. Not without good reason. The ‘American Customer Service Mantra’ is a valuable guide here – “Find a way to say yes”. This is an Operating System, not a compliance tester. Its purpose is to do the right thing, not to force you to do the right thing. “Computer says no” is a joke, not a plan. |
Steve Pampling (1551) 8172 posts |
\t, \n, \r should be familiar to all people of a technical inclination, never mind just programmers. Way easier to remember than the &09 style and I’d say far more intuitive |
Rick Murray (539) 13851 posts |
Hmm, if you’re thinking BASIC style strings (at low level, that is), then you have a string and you have the length.
No, just thinking about nicer formatted error boxes, like almost every other UI out there, rather than just splatting the text centre aligned into it because “that’s how it’s been since Arthur”. And, yes, menus.
Especially since it was badly hacked to… what was it, recognise some colours as hotkeys? So if your menu text ends in, say, “blue” or something…
Gah! I knew there’d be one…
The problem with being lenient is that everybody has their own ideas. Look above at how many permutations there are for denoting the end of a line. In reality there should be only two. A CRLF pair as a painfully literal translation of what a printer actually did, based upon what a typewriter actually did… I bet out there is some oddball setup that takes EOT as a newline.
Diametric proposition: Do it like everybody else so this stuff works…hoodaya think you are, Apple? That being said, text in general is a godawful mess – about the only thing one can have certainty in is if the text in question passes several valid UTF-8 sequences and no invalid ones. Otherwise… we’re into a world of mismatched character sets, double byte or other “wide” characters, and one can’t even make assumptions about “it’s a sequence of characters between 32ish and 126ish” because it’s probably straight ASCII but could be EBCDIC or some other fruitcake arrangement. There is (very tenuous) relevance to the subject of this thread. And that is that there’s no indication about what a filename is. One can generally assume Latin1 as that’s what most RISC OS machines are set to, but if the alphabet changes and there were filenames using high ASCII for accents and such, it turns into gibberish. Ditto going to and from UTF-8. |
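The "several valid sequences and no invalid ones" test can be sketched in portable C as below. The function name `utf8_check` is made up for this example, and how many multi-byte sequences count as "several" is left to the caller.

```c
#include <assert.h>
#include <stddef.h>

/* Returns 1 if the bytes form well-formed UTF-8, 0 otherwise.
 * If multibyte is non-NULL and the text is well formed, it receives the
 * number of multi-byte sequences seen. */
int utf8_check(const unsigned char *s, size_t len, size_t *multibyte)
{
    size_t i = 0, seen = 0;

    assert(s != NULL);

    while (i < len) {
        unsigned char c = s[i];
        size_t extra, k;
        unsigned long cp;

        if (c < 0x80) { i++; continue; }                       /* plain ASCII */
        else if ((c & 0xE0) == 0xC0) { extra = 1; cp = c & 0x1F; }
        else if ((c & 0xF0) == 0xE0) { extra = 2; cp = c & 0x0F; }
        else if ((c & 0xF8) == 0xF0) { extra = 3; cp = c & 0x07; }
        else return 0;               /* stray continuation or invalid lead byte */

        if (len - i <= extra)
            return 0;                /* sequence runs off the end */
        for (k = 1; k <= extra; k++) {
            if ((s[i + k] & 0xC0) != 0x80)
                return 0;            /* not a continuation byte */
            cp = (cp << 6) | (unsigned long)(s[i + k] & 0x3F);
        }
        /* Reject overlong forms, surrogates and values beyond U+10FFFF. */
        if ((extra == 1 && cp < 0x80) ||
            (extra == 2 && cp < 0x800) ||
            (extra == 3 && cp < 0x10000) ||
            (cp >= 0xD800 && cp <= 0xDFFF) ||
            cp > 0x10FFFF)
            return 0;

        seen++;
        i += extra + 1;
    }
    if (multibyte != NULL)
        *multibyte = seen;
    return 1;
}
```

A result of 1 with a multi-byte count of zero only says the text is ASCII, which tells you nothing about the encoding of any top-bit-set names elsewhere on the disc.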
Steve Pampling (1551) 8172 posts |
Nemo:
“I can do it if you have the right budget, do you have a cost code I can order against? Don’t worry, just email it and once I’ve got that I can start” Requestor disappears and rarely re-appears. Rick:
Medical Equipment supplier: we always do it like xyz.
Me: That won’t work in this setup, do abc.
Medical Equipment supplier: we always do it like xyz.
Me: OK, do it xyz way. When it doesn’t work, come back and talk to me nicely. |
nemo (145) 2556 posts |
Steve surmised
\i swaps italic, \b bold, \u underlining, and \0-\6 (IIRC) change colours, it’s a very flexible icon type. And extremely amenable to Messages translation. Rick remembered
Yup. “Deep Red” is disappointing. Not only is this completely unavoidable – there’s no way I know to stop it from happening – it is also hard-wired to the idea that character code 160 is always hard space. Which it isn’t (and in some Alphabets there simply isn’t one). So it does stupid stuff beyond your control and messes up the font. And all for the want of a menu item flag that says “this item’s hotkey is separated by a tab chr”. I am reserving control code 31 in font strings for an encoding-independent hard space, inspired by PrettyPrint.
On ‘installed’ media that’s not true really. But on portable drives you’re right. Part of the competence of the UTF8Alphabet module is to explicitly define what has otherwise been an implied encoding – the ‘Fallback Alphabet’. Which, yes, is often Latin1, but may not be for a particular user. The concept of Fallback Alphabet in addition to system Alphabet ensures that 8bit text continues to make sense when UTF-8 is selected… without hard-wiring anything. That said, if your 8bit Alphabet had been Cyrillic, you’re going to get a lot of false positives in UTF-8. I’m drying my eyes about that. |
Jon Abbott (1421) 2651 posts |
Nemo, what version of BASIC are you using to get that? That’s actually rather useful. |
nemo (145) 2556 posts |
My version. See here |
Alan Adams (2486) 1149 posts |
When I was working in primary schools we used to get a lot of very long filenames in Word documents. Turned out that if you clicked save, by default Word used the first paragraph as the filename. Since the kids didn’t start their work with a title as the first line … |
Steve Pampling (1551) 8172 posts |
MS, being bears-of-little-brain failures[1], think it’s good to not break at a space, but good to break at a hyphen, comma, apostrophe, @ and many other things. I swear, even Paddington Bear would do better, and you’d probably get a free marmalade sandwich as a bonus. [1] Clearly while attempting the “of little brain” part they fall short of the mark. |