Horizontal Scroll Bar for Boot/Run-at-startup
WPB (1391) 352 posts |
This really is missing the point somewhat. However, moving towards using UTF-8 as a default and global alphabet for the whole OS and its applications will bring you all the weird and obscure fractions you’d like and commonly used glyphs from all over the world as well. It’s the way to go. We should all get behind the change and make it happen. |
Chris Hall (132) 3554 posts |
We should all get behind the change and make it happen. So long as it is backwards compatible, I agree. For example, use a few of the most obscure and little-used top-bit-set characters to specify a different range of top-bit-set characters for applications that are aware – say, &F8 to &FF to select one of eight possible different sets of 120 characters. That should easily be enough. Unaware applications would simply display the default top-bit-set character instead of the special one. And &F8 to &FF could simply display themselves as a space. |
WPB (1391) 352 posts |
For the avoidance of doubt, I am absolutely NOT advocating this. The last thing we want to do is introduce yet another character encoding scheme into the world. The Unicode Consortium have worked so hard to come up with one encoding to rule them all (I know that’s a simplification; UTF-8 isn’t all things to all men, but it has a great many benefits), and a huge amount of research went into the spec of UTF-8 – we would have to be mad at this stage to go with some RISC OS-only character encoding over that. Chris, if you have any time, you should look into UTF-8 and what it was designed to achieve. The change would not be fully backwards compatible, but as Ben has outlined in detail, it doesn’t need to be terribly painful, either. Not a lot will break in the majority of apps. What we need are tools to help with the transition. I will start working on a few once I’ve got something else I’m working on out of the way. |
Jess Hampshire (158) 865 posts |
That sort of reminds me of the clichéd Englishman abroad, who talks LOUDLY and slowly so Johnny Foreigner can understand.
Hmm, the number of times I have wanted to do this is on a par with the number of pole dancers I have dated. (i.e. not many)
It was the same in my school (though “thicko” was obviously relative, since everyone had passed the 11+): we learnt Russian, which I hated at the time, but now, years later (having been somewhere where even a tiny bit of Russian was useful), I rather regret not trying harder, and am trying to learn it again. Ironically, the last few weeks I’ve been trying to learn a little Spanish, because I have a holiday booked to Tenerife. Unicode – shouldn’t there be a separate filetype for it? Then at least it would be possible to copy and paste between aware applications (allowing an on-screen keyboard to generate it). |
WPB (1391) 352 posts |
What do you mean? A separate filetype for Unicode text files or something? Then would we have separate filetypes for UTF-8 textfiles / UTF-16 textfiles / UTF-32 textfiles / etc.? I’m not sure this would help in any way. There should just be a way for textual data on the clipboard to have an associated encoding, and possibly even to be able to paste in a different encoding (that’s always handy to be able to do). |
Jess Hampshire (158) 865 posts |
That was the sort of thing I was thinking. So when you open a file, it opens with something that understands it. As far as I can see from a non-programmer POV, that would improve the situation, because any Unicode-aware program would have a means of loading and saving Unicode, and I don’t see it needing any big change to the underlying OS. |
Andrew Flegg (1574) 28 posts |
The only way I can think of doing this would be to have a BOM in front of any UTF-8 string. But that doesn’t give you backwards compatible transliteration to things which can’t handle UTF-8, and would mean two separate strings couldn’t be concatenated to form another valid UTF-8 string. Any attempt to do half of a multi-byte character encoding, or otherwise switching between different encodings in different applications will result in horrible bugs, broken behaviour and data loss. |
nemo (145) 2529 posts |
OMG stop, people, please stop. If you find yourself talking about Unicode without knowing what a BOM is, stop talking. ;-)
You do NOT have a “Unicode filetype”. It is possible to tell the difference between one of the 8-bit encodings I’ve mentioned and UTF8 with about 98% certainty even without a BOM, but a BOM is definitive. Consequently one can prefix a BOM to a text file or CSV file, and one can put the appropriate Content-Type in HTTP and XML files.
Fractions are an embarrassment. Unicode jumped the shark around 5.1, but it started in the right direction despite difficult starting conditions: Unicode is a universal character set, and hence seeks to represent every character. I stress that because it is important. Fractions aren’t characters, they’re combinations of characters… except… those difficult starting conditions. Being a universal character set means it needs a 1:1 mapping to all existing character sets. That meant inheriting the duplications and unfortunate choices of those existing character sets, and that means anachronisms like small fractions having their own Unicode code points.
However, that isn’t what Unicode is for. ½ is a hang-over from typewriters. 1/2 is what we should have in the text. Now, as a typographer, I’m not happy with “1/2” – ½ is much more appealing, but that’s presentation, not semantic meaning. Unicode is not supposed to represent presentation, and it is this founding principle that has now been gleefully set on fire and thrown out of the window (see U+1F534 and U+FE0F and weep).
Thankfully, other technologies fix the presentation aspect. Any good OpenType font can automatically replace 1/2 with ½ and, more to the point, will also happily represent 527/756 similarly, which no one could suggest be represented by an individual glyph. The fact that RISC OS doesn’t support OTs is another story.
There’s nothing magical about UTF8 in that respect; if you try concatenating two strings in any two different encodings you’re going to get a silly result. We’ve had multiple Alphabets forever. The fact that “little Acorners” never ventured outside Latin1 doesn’t alter the fact that this has always been the case.
More interestingly, ‘old’ (ie non-‘language’) Acorn fonts could have their own Encoding file… which didn’t have to conform to any defined Alphabet or even be a subset of any known character set (not even the AGL). My !IntChars program parsed these and automatically mapped keypresses in the configured Alphabet to appropriate character codes in the font’s Encoding (where possible), and also performed case-swapping in the font’s Encoding. As far as I know it’s the only program that did so. Such mapping is essential when mixing encodings; UTF-8 is absolutely no different in that regard.
* By which I mean one can pretty easily and unambiguously detect whether you have UTF8 or Latin1. UTF-16 is much more dangerous in this regard – see “Bush hid the facts” |
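A minimal sketch (in C, with an illustrative function name) of the detection nemo footnotes above: scan the data against the UTF-8 byte patterns, and if every top-bit-set byte forms a well-formed sequence, the text is almost certainly UTF-8 rather than Latin-1, since Latin-1 text that uses accented characters very rarely happens to form valid sequences by accident.

#include <stddef.h>

/* Return 1 if buf[0..len) parses as well-formed UTF-8, else 0.
   Latin-1 text containing top-bit-set characters almost never passes,
   so this works as a (not quite 100%) UTF-8 vs Latin-1 discriminator.
   A stricter check would also reject overlong forms and surrogates. */
int looks_like_utf8(const unsigned char *buf, size_t len)
{
    size_t i = 0;
    while (i < len) {
        unsigned char c = buf[i];
        size_t extra;

        if      (c < 0x80)           extra = 0;   /* plain ASCII                 */
        else if ((c & 0xE0) == 0xC0) extra = 1;   /* lead byte %110xxxxx         */
        else if ((c & 0xF0) == 0xE0) extra = 2;   /* lead byte %1110xxxx         */
        else if ((c & 0xF8) == 0xF0) extra = 3;   /* lead byte %11110xxx         */
        else                         return 0;    /* stray continuation or bad lead byte */

        if (i + 1 + extra > len) return 0;        /* sequence runs off the end   */

        for (size_t j = 1; j <= extra; j++)
            if ((buf[i + j] & 0xC0) != 0x80)      /* continuations are %10xxxxxx */
                return 0;

        i += 1 + extra;
    }
    return 1;
}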
nemo (145) 2529 posts |
I quipped:
prompting Rick and WPB to fall into my Ben trap:
I shall say two words and allow Rick or WPB to regain their Japanophile reputation by explaining:
Stares significantly at the FontManager. |
Chris Hall (132) 3554 posts |
Hmm, the number of times I have wanted to do this is on a par with the number of pole dancers I have dated. (i.e. not many) I have just finished publishing a book. The master is an HTML file generated from the ‘csv’ output of a spreadsheet, which is then turned into a PDF for printing. It is a reference work containing many imperial dimensions such as 4⅛″, which gave me enormous difficulty. First, Excel translated them to ‘4?” ’ when it saved the ‘.csv’ file, which I had to identify and convert in my processing, using an image for the fraction (as my reference book on HTML didn’t mention the code). Anyway, many thanks for drawing my attention to UTF-8, which I had never heard of – this has allowed me to find the HTML codes for the odd eighths fractions. |
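(For reference, the eighth fractions do have their own Unicode code points – ⅛, ⅜, ⅝ and ⅞ are U+215B to U+215E – so in HTML they can be written as the numeric character references &#8539;, &#8540;, &#8541; and &#8542;, with no image needed.)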
Eric Rucker (325) 232 posts |
And now you can see why UTF-8 support is a good idea. |
Rick Murray (539) 13806 posts |
I think it depends upon the people you work with. The very few people I’ve known (triple that if you include keyboards used in films) were more kana than romaji. While both methods work, and the various IMEs will accept either method, I suspect there are a large number of kana users – especially the younger ones. Think about it: why should the bar for working with a computer be set at the very artificial level of needing to know how to write your language in somebody else’s characters? It would be like “it’s okay to code for RISC OS, just so long as you first learn how to do it on a Cyrillic keyboard”! A young person, who still depends upon Furigana to be able to read stuff, is not going to appreciate having to battle Latin characters when they could just get on with Hiragana. |
Rick Murray (539) 13806 posts |
…or just use an internationally agreed and widely supported way of specifying characters that gets away from this problem entirely? |
Rick Murray (539) 13806 posts |
It is pretty easy for UTF-8. If the text is just plain English with no “extended” characters (anything over 127), then it is just plain text. If the file contains extended characters in UTF-8, each will begin with a specific lead byte – one of %110xxxxx, %1110xxxx or %11110xxx – which specifies the number of bytes used to represent the character (2 to 4; the original design allowed up to 6, but anything over 4 is no longer valid), followed by the corresponding number of continuation bytes of the form %10xxxxxx, each carrying six bits of the character.
Anything that does not fit this pattern is not a UTF-8 file. I believe UTF-16 and so on have other identifying attributes, including an optional marker to indicate endianness. |
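As a sketch of how those patterns decode in practice (plain C, illustrative function name; a full decoder would also reject overlong forms and UTF-16 surrogate values):

#include <stddef.h>

/* Decode one UTF-8 sequence starting at s (assumes avail >= 1), following
   the patterns above: a lead byte of %0xxxxxxx, %110xxxxx, %1110xxxx or
   %11110xxx, followed by continuation bytes of %10xxxxxx each carrying six
   payload bits.  Returns the code point and stores the sequence length in
   *used, or returns -1 (with *used = 1) for a malformed sequence. */
long utf8_decode_one(const unsigned char *s, size_t avail, size_t *used)
{
    unsigned char c = s[0];
    long cp;
    size_t extra;

    if      (c < 0x80)           { *used = 1; return c; }          /* ASCII           */
    else if ((c & 0xE0) == 0xC0) { cp = c & 0x1F; extra = 1; }     /* %110xxxxx       */
    else if ((c & 0xF0) == 0xE0) { cp = c & 0x0F; extra = 2; }     /* %1110xxxx       */
    else if ((c & 0xF8) == 0xF0) { cp = c & 0x07; extra = 3; }     /* %11110xxx       */
    else                         { *used = 1; return -1; }         /* not a lead byte */

    if (avail < 1 + extra)       { *used = 1; return -1; }         /* truncated       */

    for (size_t i = 1; i <= extra; i++) {
        if ((s[i] & 0xC0) != 0x80) { *used = 1; return -1; }       /* bad continuation */
        cp = (cp << 6) | (s[i] & 0x3F);                            /* append six bits  */
    }
    *used = 1 + extra;
    return cp;
}

So the pound sign £ (U+00A3), for example, arrives as the two bytes &C2 &A3 – %11000010 %10100011 – rather than the single Latin-1 byte &A3.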
Rick Murray (539) 13806 posts |
Why? If you are creating a newer protocol that accepts UTF-8 encoded text, how hard is it to reserve a flags bit to say “actually, this isn’t UTF-8”? I don’t mean adding it to Wimp_CreateIcon and everywhere, but more in sending/receiving messages and places where it might be desired to have non-UTF-8 strings. Doesn’t BOM mean Byte Orientation Marker or something? It applies more to UTF-16 than our UTF-8, right?
Rubbish! If you have a marker byte (of some description) at the start of your strings, you just increment the pointer to the second string to point beyond its marker (preserving the first one so the result is valid) and just use strcat() as normal. |
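A sketch of that idea in C, assuming – purely for illustration – a convention where UTF-8 strings may carry the three-byte BOM &EF &BB &BF at the front (the function names are made up):

#include <string.h>

/* Return a pointer just past a leading UTF-8 BOM (EF BB BF), if present. */
static const char *skip_bom(const char *s)
{
    if ((unsigned char)s[0] == 0xEF &&
        (unsigned char)s[1] == 0xBB &&
        (unsigned char)s[2] == 0xBF)
        return s + 3;
    return s;
}

/* Concatenate b onto a (a's buffer must be big enough), keeping a's
   marker and skipping b's, so the result is still one validly-marked
   UTF-8 string rather than having a stray U+FEFF in the middle. */
void concat_marked(char *a, const char *b)
{
    strcat(a, skip_bom(b));
}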
Rick Murray (539) 13806 posts |
;-) Can HTML manage vertical yet? Yeah… I tend to gloss over this point, but thankfully most animé credits are horizontal (so much so that Chihayafuru’s “backwards sideways scrolling vertical credits” were quite jarring), so it’ll do to use horizontal for now… Works for ja.wikipedia.org! Punctuation? You’re asking a programmer? Have you seen my nested brackets in written stuff? Just be glad I don’t wrap paragraphs in curly braces! (^_^) However, what you perhaps mean is that a Japanese comma looks like 、 and a full stop looks like 。 and there are kinda cool quotes like 『this』 (and a single-line version), etc. I’m not sure why this is a FontManager issue, though – shouldn’t the input method convert “.” to “。” and so forth? |
Rick Murray (539) 13806 posts |
A fellow troper?
How about 1/2? That was:
<flippant> If we want the real typewriter feel, don’t forget “l” for one and “O” for zero. Backspace-slash is optional. ;-) BTW, sorry for the mass of posts. Site appeared to be down while I was on break at work, so I’m catching up. |
Steve Pampling (1551) 8155 posts |
I would say no. Leave translation of a language to something like Google Translate, and leave FontManager to make the right marks on the screen when presented with the right byte combination. Otherwise you run into context issues when swapping our favourite Latin set, used by the non-English inhabitants of the UK, to/from something like Greek. |
Rick Murray (539) 13806 posts |
Huh? I’m not sure what Google Translate is doing here, but isn’t this what I am saying? The input method (not FontManager) should be the one responsible for noticing that you have pressed a full stop, and when in Japanese mode, should offer a Japanese-style one, perhaps¹? The task of FontManager, as you say, should be to only display what it has been directed to display.
¹ The question is moot anyway – I just looked at my kana keys and the punctuation is Japanese-specific on the Japanese layout (duh): 、 on <, 。 on >, ・ on ?, and 「 」 on { and }. So forget what I said about converting “.” to “。”; just switch keyboard layouts and do it that way.
Bootnote: Ironically, having looked again, shift-comma is the JP-comma and shift-period is the JP-period; handled by the input method (the keyboard driver), so it is sort of what I was suggesting anyway. ;-) |
WPB (1391) 352 posts |
A “Ben trap”, eh? I don’t think I’ve ever fallen into one of them before… I wouldn’t say vertical text is a prerequisite for saying the FM supports Japanese. Sorry, but apart from in newspapers, very little printed Japanese is written vertically. In the various offices where I worked, pretty much all printed material was written horizontally. The reason is simple – software support on most OSes is geared towards horizontal display of text. If you told a Japanese person they wouldn’t be able to use vertical text in a word processor, they’d likely say, “Yeah, so?” I don’t know what the problem is with punctuation. Unicode contains plenty of normal JP punctuation characters. There’s nothing special about them. RO can display them fine. When kids hand-write Japanese on the little bits of squared paper they use (yes, usually vertically, but that’s because it’s hand-written), bits of punctuation occupy a space on their own. That’s the same as what FM does – treating punctuation as separate characters. Perhaps I’ve missed the point?
I think nemo’s switched the logic here. What Andrew Flegg was suggesting is that you can’t concatenate two UTF-8 strings if they have BOMs on the front. That’s true. He wasn’t talking about concatenating strings in different encodings. Of course that won’t work. Equally, I don’t think Rick was suggesting putting a BOM on the front of strings in memory. Probably what he meant was, add some metadata to the transfer of strings to indicate the UTF-8 encoding, leaving the string itself untouched.
I agree it seems a little odd, but that’s the practice in Japan. I’ve taught Japanese kids, and they all learn to type in romaji. I think probably the Japanese see it as a good way to get more familiar with the alphabet. Learning to type at all takes a considerable amount of time. Learning to type in two totally different ways takes twice as long. IMEs allow you to learn just one way, but input anything you like. Perhaps that’s the rationale.
You could if you wanted to. But where would it end? That would imply having a filetype for text files in every different encoding supported by the system. That’s really why it’s not a good idea.
Yes, this is exactly what happens on pretty much every IME I’ve ever used. Or you can type it in directly, as Rick points out. This is nothing to do with linguistic translation. It’s to do with input.
A Welsh IME could legitimately change “l” followed by “l” into the Welsh “ll” (I think it has its own codepoint – U+1EFA maybe? No doubt nemo will jump in here and tell me this is orthogonal to something – perhaps an equals sign ;). |
Jess Hampshire (158) 865 posts |
I finally managed to type some Russian on RISC OS. I used Paul Sprangers’ Keymap, StrongED and the following HTML file:
<meta http-equiv="content-type" content="text/html; charset=UTF-8">
<pre>
Typing here
Setting the keyboard to Russian Unicode and typing between the tags produced gobbledegook, but clicking on Run displayed what I had typed in NetSurf. Am I right in thinking that there is no way of entering text and viewing it as you write? From what has been said, if RISC OS were to use UTF-8, everything would work fine until you tried typing a non-ASCII character (like a pound sign) into a non-Unicode program. Keymap shows that the keyboard can be switched, so isn’t it possible to switch between UTF-8 and the current system, depending on which program has focus? |
WPB (1391) 352 posts |
You might have had better luck if you set the system alphabet to UTF-8. Then, if you managed to tell StrongED to use anti-aliased fonts, and selected a font containing the Russian glyphs, you might have seen them. It depends how StrongED is coded. If it doesn’t mess about with the characters it gets given by the Wimp, and doesn’t specify an encoding on opening its fonts for display, the Font Manager would have used the system alphabet as the encoding and should have painted the string correctly. As you can see, that’s quite a lot of "if"s. Really, in all but the most trivial of cases, it’s not just as simple as “making RISC OS use UTF-8”. First of all what you type needs to be got into an application. That happens in the system alphabet usually. (I think Keymap forces UTF-8, regardless of the system alphabet). Then the application needs to know it’s expecting UTF-8, so it can let the caret move properly within the string, and calculate string lengths sensibly. If everything’s done in icons, the Wimp handles much of this for you (again, if the system alphabet is correctly set). Otherwise it’s up to application authors. It’s not hard, but it doesn’t just happen magically.
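To make the caret-movement and string-length point concrete, here is a minimal C sketch (illustrative names) of two operations a UTF-8-aware application has to do differently, both relying on the fact that continuation bytes always match %10xxxxxx:

#include <stddef.h>

/* Character count differs from byte count in UTF-8: count only the
   bytes that are NOT continuation bytes. */
size_t utf8_char_count(const char *s)
{
    size_t n = 0;
    for (; *s; s++)
        if (((unsigned char)*s & 0xC0) != 0x80)
            n++;
    return n;
}

/* Move a caret (a byte offset into s) one character to the left by
   stepping back over any continuation bytes. */
size_t utf8_caret_left(const char *s, size_t pos)
{
    if (pos == 0) return 0;
    do { pos--; } while (pos > 0 && ((unsigned char)s[pos] & 0xC0) == 0x80);
    return pos;
}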
Yes, it’s possible. But it doesn’t solve many of the problems. IMHO, Ben is right – the only sensible way is to create mirror territories and switch the system alphabet to UTF-8. |
Chris Hall (132) 3554 posts |
There’s more work to do for UTF-8 than there appears at first sight. At present typing … More tricky is … |
WPB (1391) 352 posts |
UTF-8 support in BASIC is a whole other topic! And one left well alone for now probably! Note that under RISC OS (5), only the Font Manager has Unicode support. The CLI/system font has none.
No, in UTF-8 encoding, you can’t use any single byte code above &7F to mean anything other than what it was intended to mean in UTF-8. Mixing two encodings at the same time is not going to work. &A3 does mean something – it means it’s a byte from the middle or end of a sequence. If you start trying to interpret codes differently depending on where they occur in a sequence (in this case, you’re saying if it comes at the beginning of a sequence you know it’s not UTF-8), you break many of the great things about the UTF-8 encoding – like being able to recover from errors in the stream, or being able to move quickly about in the stream.
There is no ambiguity, because you must specify the encoding you’re talking about. If you’re talking about UTF-8, you know exactly what these codes mean. |
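The self-synchronising property described above can be shown in a couple of lines of C (illustrative name): from any byte offset – including one landing in the middle of a character, or just after a corrupted byte – you can find the start of the next character simply by skipping continuation bytes, losing at most the one damaged character.

#include <stddef.h>

/* Advance pos to the start of the next UTF-8 character (or to len).
   At most three continuation bytes are ever skipped. */
size_t utf8_resync(const unsigned char *buf, size_t len, size_t pos)
{
    while (pos < len && (buf[pos] & 0xC0) == 0x80)
        pos++;
    return pos;
}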
Chris Hall (132) 3554 posts |
and so the pound sign should appear correctly under both systems In a sense you are right. So long as the user can type the pound sign and it appears correctly, it doesn’t matter what goes on ‘under the bonnet’. I would extend this to typing ALT-163 and to displaying correctly existing strings containing “£”. You just run into difficulty when concatenating strings of different encodings, when one string will have to be forced into UTF-8 as part of the concatenation process. Mixed encodings in a single string would not be permitted, and the encoding – whether ASCII, UTF-8, invalid [mixed] or neutral [no top-bit-set characters] – can be determined by examining the string contents. It is just fortunate [if Word could get it wrong then so, probably, could competent programmers] that &A3 (in everything except Word) corresponds to U+00A3 and
UTF-8 support in BASIC is a whole other topic! An essential one, if wider support for UTF-8 within RISC OS is being seriously considered. |