Horizontal Scroll Bar for Boot/Run-at-startup
nemo (145) 2556 posts |
Text selection in writeables is an essential. |
Rick Murray (539) 13850 posts |
Nemo said:
Text selection, cut/copy and paste all around is something important. I use it a hell of a lot in day to day use (such as the above quote…). Jess said:
Which is why, when I needed to download a DVD iso for burning myself, I needed to format an SD card as NTFS, which involved some messing around in the deeper recesses to tell Windows not to optimise for quick removal [which is rubbish anyway – ever successfully ejected media only to have Windows throw a “Delayed write failed” error, and chkdsk report a pile of errors? …but don’t get me started on how poor I think Microsoft’s handling of FAT is; I knew we were in trouble when ScanDisk on W98 would cause vfat to roll over and play dead – some repair tool, huh…]
I am led to believe that NTFS is quite resource-heavy. Not a big problem with a couple of hundred megabytes sitting there doing little, but maybe supporting ext would bring us closer ties with the Linux community? |
Jess Hampshire (158) 865 posts |
Would UDF be another sensible option? It is an ISO standard. It would also give us access to those Bluray and DVDROMs that aren’t formatted like big CDROMs. It is supported by Windows. |
Ben Avison (25) 445 posts |
This thread has gone rather a long way off topic. While we’re listing pie in the sky aspirations, let’s not forget AArch64 support, eh?
I thought we’d already discussed this one to death? While allowing legacy applications to exist in a bubble with an alphabet that’s at odds with the rest of the system may seem a desirable thing at first glance, it’s a world of pain to implement once you think about it. This is because suddenly you need an alphabet translation layer in every API to which any application might interface that involves strings or characters. At the moment, there is no programmatic way to know which pointers are pointers to strings, or which fields in which structures are strings (nor are things like the alignment rules for fields following any strings within structures known). A proposed translation layer wouldn’t know whether those strings are in read-write or read-only storage, so it would need to be able to reallocate memory to hold a translated copy, causing unsolvable problems if the recipient thinks it knows which heap manager can be used to reallocate or free the pointers. The translation layer would need to be able to intercept any OS API (SWIs, service calls, UpCalls etc.) not only in OS modules, but in all third-party modules, not all of which may be well documented, and would also need to be able to intercept and translate all OS and third-party inter-application Wimp messages as well.

There are problems of uniqueness too: if you load a file called “涼宮ハルヒ” into such a legacy Latin-1 application, so its name has been translated into “?????”, and it then tries to save it again, how is the filesystem supposed to know whether you intended to overwrite the original file, or a different one called “Добро” (which would have been translated to the same five question marks)? This isn’t a filing-system-specific problem – it applies anywhere string matching is required. Don’t underestimate the extent of this task – there are an awful lot of APIs that process strings! Taken to extremes, you could even argue that the layer should translate the contents of any files it reads or writes, or any TCP/IP protocols it talks, so that the application can live in blissful ignorance of the rest of the world having moved to Unicode.

There is a much simpler solution, which is to utilise what Acorn had been mandating since at least the Style Guide way back in 1993 – that all textual messages for an application be stored in its Messages, Templates and Res files – precisely so that they can be switched out easily without requiring code changes. Add a set of shadow territories with UTF8 as their alphabet, and a Unicode desktop becomes Just Another Territory. It’s really not that difficult – if you try a territory like Japan, which is already UTF-8, you’ll find that most applications already work perfectly in UTF8.

I’m also not understanding the hate directed at the multi-call Key_Pressed event. Despite its name, Key_Pressed with bit 8 of R1+24 clear has always meant “character byte inserted”, not “key pressed” – a value of 90 (ASCII “Z”), for example, has always meant a different key was pressed on a German keyboard than on a UK one. The action that a Unicode-aware application needs to take in response to that event is identical in UTF8 to any other alphabet – which is to insert the given byte into the current document. Changing that API does nothing except mandate additional code to translate from UCS4 to UTF8 in every application that uses it, and does nothing to fix the main code-related issue with UTF8 in legacy applications – which is that when processing the Delete or Backspace keys, the application needs to cut a whole multi-byte character rather than a single byte.
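To make that last point concrete, here is a minimal sketch in C of the Backspace handling a legacy application would need – stepping back over UTF-8 continuation bytes (which all have the bit pattern 10xxxxxx) until the lead byte of the character is reached. The function name is mine, purely illustrative, not an existing RISC OS API:

```c
#include <stddef.h>

/* Return the new caret index after deleting one whole UTF-8
   character that ends at byte index 'caret' (exclusive).
   Continuation bytes have the bit pattern 10xxxxxx (0x80-0xBF),
   so we step back over them until we reach the lead byte. */
size_t utf8_backspace(const char *text, size_t caret)
{
    if (caret == 0) return 0;
    caret--;                                   /* step onto the last byte */
    while (caret > 0 && (text[caret] & 0xC0) == 0x80)
        caret--;                               /* skip continuation bytes */
    return caret;                              /* index of the lead byte  */
}
```

For “café” (the bytes c, a, f, 0xC3, 0xA9) with the caret at the end, this returns 3, so both bytes of the “é” are removed together.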
It’s not like this is a catastrophic failure anyway; it just means the user would – although this depends upon the fonts you’re using – see the Unicode “replacement character” � until they’ve pressed Delete/Backspace enough times. This would be more obvious if more fonts had had that glyph added, or if the Font Manager had had the intended font substitution code finished (the fact that that was planned was the reason why I didn’t add it to the Wimp’s list of characters to substitute with the WimpSymbol font – in retrospect, I wish I had). Any application that does all its input via writeable icons or text areas would work perfectly without any modification – and that’s the majority of applications.

Contrast this with trying to intercept every string API, which has a huge potential for bugs because of the number of code changes required, not to mention unresolvable problems where multiple UTF8 strings get translated to the same Latin1 string. And we’d also be saddled with all this code to maintain on an indefinite basis, whereas adding additional sets of resources for applications is a mechanical, one-shot process. |
WPB (1391) 352 posts |
We have. In Rick’s defence, I think he was “missing in action” at the time. The discussions are here and here. Perhaps the next development version of RISC OS (5.21) could be the first to have a UTF8 alphabet as default so we can start to sort out any problems and move properly into the 21st century? |
Rick Murray (539) 13850 posts |
Perhaps I wasn’t clear enough… there would be a distinction between the world as seen by the Desktop and everything else. In essence, apart from aware helper modules, only the Desktop would be UTF8. It must be painfully obvious that you cannot sensibly save files with UTF8 names (except on DOSFS with LFNs, as that works in UTF16) – and what does anybody expect to happen if you pass squiggles to OS_Write0? It isn’t a perfect solution, but it is, I believe, a step in the right direction. Plus we have to accept that the low-level parts of the OS probably won’t be UTF… |
Ben Avison (25) 445 posts |
That’s an artificial (and vague) boundary. You might get an error from any SWI you call – how are you supposed to know which alphabet to display it in? If you got the error from a “Desktop” module, how do you know it didn’t pass it through from a non-Desktop module? If you have a font menu, how are you supposed to know what alphabet the font names are in? Supposing the application is a sprite editor, how stupid is it that the sprite names would display differently in that application to how they do in Paint?

Here’s an example of a tricky API to translate, even though it’s clearly within the “Desktop” realm. Suppose one of these legacy apps claims the clipboard for the string “café”, and it’s then pasted into a UTF8 application. This would require you to intercept both the RAM transfer and DataSave/DataLoad APIs, which means yes, you need to edit file contents. You can’t even re-use the RAM buffer, because the UTF-8 representation of the string is longer and you can’t guarantee it will fit. To make things (significantly) more complicated, the clipboard protocol lets the applications negotiate the file format in which they exchange data, so in order to preserve formatting information they may choose to wrap the string up in HTML, or a DrawFile, or PostScript, or whatever file format they prefer, including any proprietary file formats they may mutually understand. Your translation layer would need to be able to handle all of those formats.
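To make the buffer problem concrete, here is a sketch in C of Latin-1 to UTF-8 conversion (illustrative code, not an existing RISC OS API): every top-bit-set byte expands to two bytes, so the output can be up to twice as long as the input, and the original RAM transfer buffer cannot be reused in place.

```c
#include <stdlib.h>
#include <string.h>

/* Convert a Latin-1 string to UTF-8. Every byte >= 0x80 expands
   to two bytes (0xC2 or 0xC3 lead byte plus a continuation byte),
   so the output can be up to twice as long as the input. */
char *latin1_to_utf8(const char *in)
{
    size_t len = strlen(in);
    char *out = malloc(len * 2 + 1);   /* worst case: all top-bit-set */
    if (out == NULL) return NULL;

    char *p = out;
    for (size_t i = 0; i < len; i++) {
        unsigned char c = (unsigned char)in[i];
        if (c < 0x80) {
            *p++ = (char)c;                     /* ASCII passes through */
        } else {
            *p++ = (char)(0xC0 | (c >> 6));     /* 0xC2 or 0xC3         */
            *p++ = (char)(0x80 | (c & 0x3F));   /* continuation byte    */
        }
    }
    *p = '\0';
    return out;
}
```

“café” is four bytes in Latin-1 but five in UTF-8 – hence the need to reallocate. |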
Rick Murray (539) 13850 posts |
It’s a start, is it not? What we need is a fully UTF-capable OS – which, like so many “wouldn’t it be nice” suggestions, would completely break everything that came before it.
Non-Wimp (and possibly non-Territory) APIs would probably stay with the configured language. You say my boundary is “vague” – well, I see a lot of stuff out there that isn’t UTF8. For now, and maybe for a while, some compromises will need to be made.
How do you mean “Desktop module”? Like Filer? I do not have an answer for this right now, no doubt the problem (and potential solutions) would become clearer upon examination of every little nook and cranny.
I could ask that now – and would point to how some stuff just “goes wrong” if you select a different character set to the one the software was written for. You know, load up a program with French language resources, set the charset to the Slavic one, and see if anything on-screen still makes sense.
There will be a mechanism for tagging something as being “UTF8”. Things that aren’t tagged will be assumed non-UTF8, and you can either use the data raw, convert it yourself via the SWI provided to handle these things [which might be necessary if the content is embedded in binary data – you’d need to extract the strings to be converted], or let the Wimp intercept and convert for you [but note that the auto-convert would back off the moment it spots control characters]. The only automatic, you-have-no-choice conversion is if a UTF8 application passes a UTF8 parameter out to a non-UTF8 application; the Wimp would convert it. Having said that, one could pass a parameter tagged as not being UTF8, though this in itself could raise potential problems with interaction with other such applications. You know, it would be a heck of a lot easier if we could just say “the Wimp is going UTF8 as of July 2013, please modify your applications to that effect”. I will jot that down on a Post-It and stick it to the side of this flying pig that’s running around the house belching out diesel fumes…
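Purely as a sketch of how such tagging might look – none of these flag bits, types or names exist in the real Wimp API; they are hypothetical and only illustrate the proposal:

```c
#include <stdbool.h>

/* Hypothetical convention: one bit of the message action code
   marks the payload's strings as UTF-8.  Entirely illustrative -
   no such flag exists in the real Wimp message protocol. */
#define MSG_FLAG_UTF8  (1u << 15)

typedef struct {
    unsigned action;      /* message action code, possibly flagged */
    char     data[228];   /* message body - strings live in here   */
} wimp_message;

/* The Wimp converts only in the one automatic case described
   above: a UTF-8-tagged message heading for a non-UTF-8 task. */
bool wimp_should_convert(const wimp_message *m, bool recipient_is_utf8)
{
    bool payload_is_utf8 = (m->action & MSG_FLAG_UTF8) != 0;
    return payload_is_utf8 && !recipient_is_utf8;
}
```

The point of the design is that conversion happens in exactly one place – the Wimp, on delivery – rather than in every application.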
Why would it? Why would it not suffice to include character set indications within the negotiated data? And if such negotiation fails, either support old-style charsets, or refuse to accept the data, as is your preference. I’ve just had a look through PRM4 at the DrawFile spec, and there is nothing that provides an indication as to the “language” of the contents of the file. I could create a nice-looking diagram with annotations and stuff in the UTF8 character space [assuming RO5’s !Draw is capable of this], but not only will it come out looking messed up on something other than UTF8, but additionally nothing would warn me that this garbling has happened. Not just UTF8, I could write down the lyrics to Put’s “Don’t Ever Cry” (Eurovision 1993, in English/Croatian, www.youtube.com/watch?v=oKanpKMeLfQ) and it would become wrong when viewed on any machine using 8859/1.
RISC OS is how old now? And the question of retaining the English language (via the “UK” setting) while being in a different timezone has never cropped up before? Maybe this is why the character set thing hasn’t seemed that important before – because it was accepted that you just switched to a different character set as required (after all, RISC OS can now do UTF8, only as a global switch). Maybe for now you are happy with the current situation… I don’t know. I’m trying to think of a way forward that won’t mean messing up the past. It isn’t a perfect solution; I fully understand that. |
Chris Hall (132) 3559 posts |
I think the ASCII character set (8 bit, using top-bit-set characters) is perfectly adequate for RISC OS, and find Windows’ use of bizarre more-than-8-bit characters in a few APIs a really stupid and annoying feature that has to be worked around with text conversions etc. before or after each such call. Let’s just leave well alone. I have used Windows extensively and never come across anything other than ASCII characters in normal usage. |
Theo Markettos (89) 919 posts |
Διαφωνώ. That was ‘I disagree’, in case you were wondering. Beyond Unicode in applications (email clients etc) I’d say my primary use would be Unicode filenames. Just imagine trying to name your music collection in a different alphabet to the one the songs are written in. I’ve used various forms of ‘Greeklish’ and they aren’t pretty (worse, they aren’t unique – everyone has their own way of transliterating). Just because you happen to live in a country that uses the Latin alphabet and speaks English doesn’t mean everyone else does. How would you feel if everything in RISC OS was written in Chinese and latin letters weren’t available? |
Eric Rucker (325) 232 posts |
For that matter, I come across symbols that require Unicode in daily usage, in 100% English language environments. |
Rob Kendrick (86) 50 posts |
It is a cliché to be so naïve about English not requiring accents or wider Unicode coverage. What about all those people with rôles to play in hôtels and cafés? Especially the ones that charge €10 for a caffè latte. |
Rick Murray (539) 13850 posts |
You do. I don’t.
Are you a VisualBasic coder? I find it insane that older versions of VB (5, 6…) were Unicode internally yet used the ANSI API. Outside of that, many, many Windows API calls were duplicated: one with an A (ANSI, 8 bit) suffix and the same with a W (“wide”, UTF16) suffix. Perhaps your library/headers were hiding this from you?
Then please write for me, on RISC OS, the name of the capital city of Japan in Hepburn romanisation correctly. |
Chris Hall (132) 3559 posts |
How many non-British speaking users does RISC OS have? (And I include Australia, New Zealand, Canada and, perhaps, the USA here.) I am surprised that non-Latin characters are allowed in filenames; I thought they were restricted to A-Z, 0-9 and a few (specified) symbols. Applications are free to allow all sorts of symbols – the ones that get saved as ‘?’ if you save the file as text. There’s no need for the user to know how it stores it internally. No need to change the Operating System. Accented characters are already available in the 8 bit ASCII set. Just not worth a lot of effort for a small number of people who insist on using ridiculous PC nonsense such as ‘Mumbay’ when they mean ‘Bombay’. I thought Tōkyō was spelled Tokio anyway. Unless of course you are actually aiming at Japanese computer users, when you will need a rather larger keyboard, BASIC keywords in a completely different form, etc etc!

That was ‘I disagree’, in case you were wondering.

I’m afraid I did Greek at school, but I only got as far as the verb ‘to loose’ before I decided I would be satisfied with Latin, French and German. One dead language was enough. Couldn’t do Spanish as that was for the thickos only (the top stream did Latin as it was required for University entrance – both of them – the next stream did German or Spanish, and the bottom stream Woodwork and extra French). |
Frank de Bruijn (160) 228 posts |
“…ridiculous PC nonsense such as ‘Mumbay’ when they mean ‘Bombay’.” You can’t possibly be THIS clueless… |
Jess Hampshire (158) 865 posts |
I am a native British English speaker, but I find the lack of support for Cyrillic on RISC OS limiting. (More annoying than not being able to use files larger than 2GB, to be honest.) |
nemo (145) 2556 posts |
Speaking from a long way outside the RISC OS world, the only suggestion that makes any sense from my POV is what Ben already implemented, a very long time ago – UTF8 as the Territory (by which I mean Encoding).
Well, that’s trivial when using UTF8 (with the right fonts… which is a different story). A more unpleasant question would have been to suggest Arabic or Japanese. [Ben will at this point claim that RISC OS already supports Japanese, which will allow me to demonstrate staggering levels of font-geekery to dismiss such a rash claim ;-)] |
nemo (145) 2556 posts |
Rob quipped:
Which would have been more impressive if you’d used any characters from outside RISC OS’s default Latin1 repertoire. ;-) Łįķē ţħīş ;-p (I wrote my own Windows keyboard driver. It has ten dead keys. I’m such a show off) |
Rick Murray (539) 13850 posts |
You’ll need to count me out then, I guess. I don’t recognise “British” as a language. A nationality, yes. A language, no. Frankly, while your attitude seems to be a rather selfish “it works for me”, I’m not thinking of how many “British” users RISC OS has, but of how many users (irrespective of nationality) RISC OS could have.
Would you consider “è” to be a Latin character? It is not A-Z or 0-9 or a symbol, but it is a defined part of the Latin1 (and others) character set.
…thus by definition said application would be broken in that respect. Really, a bunch of question marks are of no use to anybody.
SOME accented characters. Not all, nor variations at the same time. Nor Cyrillic, nor Greek, nor Korean, Chinese, Japanese, Arabic, or anything else that looks like a squiggle and is routinely used without major problems on other platforms… You know, the 26 little characters and their lowercase variants that make up the English language, plus twiddles for the Frenchies, Germans, Spanish, etc., represent a remarkably small portion of the bewildering number of languages on our planet.
WTF? “Bombay”, as far as I can tell, is a British mangling of a Portuguese mangling of “Mumbai”, which is something to do with a mother goddess. It’s their city – they could call it “turd” if they wanted; who are we (the colonials, I might add) to say otherwise? Though I spectacularly fail to see how “politically correct nonsense” has any relevance whatsoever to whether or not our favourite operating system supports more languages than just the one you happen to speak.
<facepalm> Please allow me to inform you that Tōkyō is a two syllable word. Tō, which looks like 東, is an ideographic representation of the sun (日) rising behind a tree (木), and means “east”; kyō, which looks like 京, is an ideographic representation of the ancient lanterns that used to guard the walls of the capital city, and thus means “capital city”. So Tō·kyō means “eastern capital” (as opposed to the ancient capital, Kyōto – and here the Kyō is also written as 京, because that is what it was). Anybody who says it like “toe-kee-oh” is wrong.

Now, please let me introduce you to another facet of the Japanese language: short vowels versus long vowels. Those bars over the letters (in Hepburn romanisation) aren’t there ’cos they look cute (they’d be HelloKitty bows in that case). They exist because a vowel with a bar over it is said for about twice the length of a vowel without – like Kyoooto, or sayooonara! Some examples: “sato” is a village, “satō” is sugar; “kibo” is scale (as in size), “kibō” is hopes and wishes; “busu” is an ugly woman, while “busū” is the number of copies of something. Finally, “kyoka” is permission, while “kyōka” is a subject (as in school).

My comprehension of Japanese is probably funny to those more experienced in the language; however, it makes the point. A word with and without the macron is said differently, and carries a very different meaning. And this is only a Latin character with a less usual accent – never mind actually writing in a non-Latin script under RISC OS.
That’s pretty clueless, really. Take a look at the photos in the final third of a blog entry of mine at http://www.heyrick.co.uk/blog/index.php?diary=20110310 and you will see that I have put katakana labels on my keys – and that it all fits onto a regular-looking keyboard. There are a few keys missing to control the IME, but I can do that manually. It works with the Microsoft IME. I have put katakana on my keyboard as I am teaching myself katakana first. Real Japanese people have hiragana on their keyboards (the difference being that Japanese words and conjugation are written in hiragana, while “foreign” words are usually written in katakana). Here’s a picture of an Apple MacBook keyboard for the Japanese market: http://en.wikipedia.org/wiki/File:MacBookProJISKeyboard-1.jpg

Somewhere, kicking around, I have a BBC BASIC user guide written in Spanish. The BASIC keywords were… in English. Apart from pandering to the Americans with “COLOR” instead of “COLOUR”, BASIC is BASIC.
My emphasis. I mean, I know Greece is busy flushing itself down the bog right now, but they still speak Greek in, um, Greece. Or are you mixing up Greek as Theo would understand with “ancient Greek” (which is, I think, like Middle Ages English vs contemporary English)?
Must be some school that does a language for “thickos”. At my school, the “thickos” had remedial English and got such fun exercises as writing simple sentences in different colours for the different sorts of words. But, again, so nice to know that because you’re happy with English, we all should be. I’ll leave you with this to contemplate – and note that I make paperwork for the Japanese songs that I like, containing the Japanese, the romanised version, and a translation that I can read as I listen to the song. I do this using OvationPro… under Windows. Because it works. いつまでも信じていたい [ 反町隆史 / 井上慎二郎 ] |
Steve Pampling (1551) 8172 posts |
In a short Aldershot contribution: Tripe.1

As a total aside: since there was no city of Mumbai/Bombay before the British developed a port at that location (which enveloped the largest nearby settlement, Thane), the first name for the city was the one the British gave it. Since the British aren’t there any more, the locals are free to rename it and don’t have anyone arguing.

1 “Tripe” being a slang term for rubbish. |
Rick Murray (539) 13850 posts |
;-) Well, tracking down the oft-referenced-in-docs Japanese IME might be a start, eh? Dunno where the Korean one in CVS came from, mind you. |
WPB (1391) 352 posts |
I lived and worked in Japan for a number of years, and I’ve never seen a Japanese person typing in kana. They pretty much all type in romaji. I’m sure Rick knows this, as he’s clearly far from ignorant about things Japanese, but I thought I’d point it out.
Displaying Japanese works reasonably well – I’ve certainly never encountered too many limitations. So it’s probably not unreasonable to claim the RO Font Manager supports Japanese (supporting Unicode to the extent that it does, which isn’t too bad). However, inputting Japanese is obviously all but impossible at the moment.

I had hoped that pointing to two existing threads on this topic above would have been enough to steer this discussion back onto one of them, instead of polluting this thread even further; now we’ve got three places to look to follow the discussion. Oh well – as it’s all pointless because “ASCII should suffice”, I suppose it really doesn’t matter anyway. ;) |
Chris Hall (132) 3559 posts |
It is a matter of priorities, I suppose. Considering that Excel 97 cannot even save the symbol for one eighth (⅛) in a ‘.csv’ file, I would much rather see support for commonly used fractions before all sorts of weird and obscure accents. There isn’t even an HTML symbol for this. The 8.3 filename format for FAT neatly overcomes this, as accents are not required on capital letters! I had a list of the ALT-nnn keystrokes needed for things like degree symbols (ALT-248 °) and lots of others – frustratingly, a different list for Word etc. under Windows and for RISC OS! |
John K. (1549) 27 posts |
Excel 97 is fifteen years old and has been superseded five times by later versions of Excel/Office. Are you really complaining about it? |
Chris Hall (132) 3559 posts |
Are you really complaining about it? Not really – I have learnt to work around its more peculiar features, such as not recognising dates before 1900. But I do use Office 97 Professional frequently and see no reason to give Microsoft any more money. [Incidentally, I don’t know whether later versions save a ‘1/8’ symbol in a ‘.csv’ file in a more useful form or not.]

I was making a more general point: that 1/8 would be more useful than some of the more exotic individual accented characters in the context of the default character set (which is limited by design to a choice of 256), rather than trying to constrain how an application might allow accented characters to be input and/or stored (for example, by a ‘dead’ accent key followed by the character to be accented). You could, for example, treat the accents as characters that do not advance the cursor: they would appear in text like this – ‘r^ole’ – but be shown, by applications aware of the protocol, as ‘rôle’. That would make more efficient use of a limited resource. I think the Font Manager can handle zero-width characters (for kerning) that have an actual width (for plotting)? I presume that the Territory Manager automatically selects a different set of ‘top bit set’ characters depending on the location.
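For what it’s worth, a quick sketch in C of how an aware application might render that proposed protocol – the mapping table and function names are mine and purely illustrative, not any existing RISC OS API:

```c
/* Sketch of the proposed convention: an accent character that does
   not advance the caret, combining with the letter that follows.
   Here '^' + vowel folds to the Latin-1 circumflex forms. */
static int circumflex(int c)
{
    switch (c) {
        case 'a': return 0xE2;   /* a-circumflex */
        case 'e': return 0xEA;   /* e-circumflex */
        case 'i': return 0xEE;   /* i-circumflex */
        case 'o': return 0xF4;   /* o-circumflex */
        case 'u': return 0xFB;   /* u-circumflex */
        default:  return 0;      /* no combined form exists */
    }
}

/* Fold accent+letter pairs for display: "r^ole" becomes "rôle".
   'out' must be at least as large as 'in'. */
void render_accents(const char *in, char *out)
{
    while (*in) {
        int c = (*in == '^') ? circumflex((unsigned char)in[1]) : 0;
        if (c) {
            *out++ = (char)c;   /* emit the combined Latin-1 byte */
            in += 2;            /* consume both accent and letter */
        } else {
            *out++ = *in++;     /* ordinary character, copy as-is */
        }
    }
    *out = '\0';
}
```

The catch is that anything unaware of the protocol – including plain string comparison – still sees the raw five characters of ‘r^ole’, which echoes the string-matching problem Ben described earlier. |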