Keyboard Handling
Pages: 1 2
David Feugey (2125) 2687 posts |
First step: switch the French keyboard module to Latin1. Old and annoying bug. |
Rick Murray (539) 13406 posts |
Seriously? First consideration: I can’t tell you Germany’s dialling code. I can tell you it’s “de”. Same for Holland and “nl”. Second consideration: You reference Canada being unable to be told apart from America. Well, we’re halfway there if we have “ca” versus “us-f**kyeah” or “maga”. But the question should be asked – is a keyboard layout a property of a country or of the language within the country that it pertains to? I may be wrong, but I think Canada uses two different layouts – American, and Canadian bilingual (looks like this).
Probably F1 is Rule Britannia also because it’s a known layout guaranteed to be present in every version of RISC OS. Otherwise, yes, I agree. But does it need to be F3-F11? How many keyboard layouts does one man need? [especially given we have no IME for fancy wibbles]
Sounds like something the ROM builder could deal with. The source version is the wordy labels, and something can strip ’em out before making it a part of the ROM. 40K is a lot better than ~700K!
I’m not. Create a new API that works, make some sort of fudge that presents the old API to older applications. If it looks like a duck and quacks like a duck, they’ll think it’s a small yellow waterfowl. As long as we’re tied to the many weird, peculiar, and just BAD decisions of the Territory/Country/Keyboard system, we will always have this broken baggage messing things up. It needs to be made a strictly legacy interface, for it is horrifically broken. I’ve pontificated for many screenfuls on my fear and loathing of it all so I won’t bother rehashing the same old rants once more; suffice to say Country != Keyboard != Language != Timezone (and the whole shebang is hardwired). |
Chris Mahoney (1684) 2100 posts |
Bingo. Ignoring timezone, my work PC is in New Zealand, has a Maori keyboard, and is used exclusively in English. My home Mac is also in New Zealand, has a physical US keyboard but is configured for Japanese, and displays the OS in English but allows me to type in both languages. As for timezones, where I live it’s +12:00. If I lived on the Chatham Islands then it’d be +12:45 (but I’d still be in NZ!) |
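The two machines described above make the point concrete: country, keyboard layout, interface language, and timezone vary independently. A minimal sketch of a settings record that keeps them apart follows. The class, its field names, and the layout labels are hypothetical and purely illustrative; none of this is a RISC OS API.

```python
from dataclasses import dataclass, replace
from datetime import timedelta, timezone


@dataclass(frozen=True)
class UserLocale:
    """Hypothetical record: every setting is chosen independently."""
    country: str              # where the machine is (ISO 3166 code)
    keyboard_layouts: tuple   # layouts the user can switch between (labels only)
    ui_language: str          # language the OS is displayed in
    tz: timezone              # offset from UTC


nz_time = timezone(timedelta(hours=12))

# Work PC: in New Zealand, Maori keyboard, used exclusively in English.
work_pc = UserLocale("NZ", ("mi-NZ",), "en", nz_time)

# Home Mac: also in New Zealand, English UI, typing in English and Japanese.
home_mac = UserLocale("NZ", ("en-US", "ja"), "en", nz_time)

# The same work PC moved to the Chatham Islands: only the offset changes.
chatham_pc = replace(work_pc, tz=timezone(timedelta(hours=12, minutes=45)))
```

Nothing in the record derives one field from another, which is exactly what a single Country number cannot express.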
Clive Semmens (2335) 3130 posts |
With judicious hacking, I was able to type in every European language that uses the Roman alphabet*, plus Greek, Russian, and Hindi, on RiscPCs – without having to remember silly codes for all the accented characters. But oh, what a hack it was – not something you could release into the wild. Except that I did – it was the subject of an article in Acorn User, with the software on the cover disc. If the same keyboard handling (or one on the same principles) could deliver UTF-8 output, how happy I would be. * and others, probably – but not Vietnamese, with its extreme multiply-accented characters. |
John Williams (567) 768 posts |
I expect the Scoll indicator light on the top right (temoin) is for poor spellers with sore feet! |
nemo (145) 2437 posts |
Rick misthought
And what are the symbols on that keyboard? Are they ABC, or are they ऄभख? How do you type “Arabic” and in what language when your keyboard has Greek keytops? This is why we use numbers or function keys.
On this PC I’m running three different Latin ones, a Hindi one and a Japanese. I’ve had more but I forget the layouts.
It is extraordinary that there’s no New Zealand country code. Now for something beefier in a separate post. |
nemo (145) 2437 posts |
Country codes are used for a number of things, but pragmatically it’s a language selector. I think the IANA Language/Script/Region system is the most sensible, but for backwards compatibility (and also for reasons of sanity) we should define what Country number means in terms of IANA LSR. This is an attempt at that, but also includes those languages that might be implied by the “country”, but which can’t be by the Country number. I know this is a lot of data, but some consensus will have to be reached before this is set in stone.
Country name is in quotes if it is not a country. There are many “countries” that don’t actually define a country. And I have no idea what “Lapp” was supposed to mean. Suggestions on a postcard, or indeed here.

Num     Name              Language           Script         IANA
-----------------------------------------------------------------------
0       "Default"
1       UK                English            Latin          en-GB
2       "Master"          <varies>           <varies>       <varies>
3       "Compact"         <varies>           <varies>       <varies>
4       Italy             Italian            Latin          it-IT
5       Spain             Spanish            Latin          es-ES
6       France            French             Latin          fr-FR
                          Breton             Latin          br-FR
7       Germany           German             Latin          de-DE
                          Sorbian (Lower)    Latin          dsb-DE
                          Sorbian (Upper)    Latin          usb-DE
8       Portugal          Portuguese         Latin          pt-PT
10      Greece            Greek              Latin          el-GR
11      Sweden            Swedish            Latin          sv-SE
                          Sami (Northern)    Latin          se-SE
                          Sami (Southern)    Latin          sma-SE
                          Sami (Lule)        Latin          smj-SE
12      Finland           Finnish            Latin          fi-FI
                          Sami (Northern)    Latin          se-FI
                          Sami (Inari)       Latin          smn-FI
                          Swedish (Fin)      Latin          sv-FI
13      reserved
14      Denmark           Danish             Latin          da-DK
15      Norway            Norwegian          Latin          no-NO
                          Bokmal             Latin          nb-NO
                          Nynorsk            Latin          nn-NO
16      Iceland           Icelandic          Latin          is-IS
17      Canada            French (CA)        Latin          fr-CA
18      Canada            English (CA)       Latin          en-CA
19      Canada            <varies>           Latin          <varies>-CA
                          Inuktitut(Inuk)    Inuktitut      iu-Cans-CA
                          Inuktitut(Lat)     Latin          iu-Latn-CA
                          Mohawk             Latin          moh-CA
20      Turkey            Turkish            Latin          tr-TR
21      "Arabic"          Arabic             Arabic         ar-<varies>
22      Ireland           English (IE)       Latin          en-IE
                          Irish              Latin          ga-IE
23      Hong Kong         Cantonese (HK)     Chinese(Simp)  zh-HK
                          Cantonese          Chinese(Simp)  yue-Hans
                          Cantonese          Chinese(Trad)  yue-Hant
24      Russia            Russia             Cyrillic       ru-RU
                          Bashkir            Cyrillic       ba-RU
                          Yakut              Cyrillic       sah-RU
                          Tatar              Cyrillic       tt-RU
25      Russia2           Russia             Cyrillic       ru-RU
26      Israel            Hebrew             Hebrew         he-IL
27      Mexico            Spanish (MX)       Latin          es-MX
28      "LatinAm"         Spanish (419)      Latin          es-419 (many!)
29      Australia         English (AU)       Latin          en-AU
30      Austria           German (AT)        Latin          de-AT
31      Belgium           French (BE)        Latin          fr-BE
                          Dutch (BE)         Latin          nl-BE
32      Japan             Japanese           Kana+Kanji     jp-JP
33      "MiddleEast"      Arabic             Arabic         ar-<varies>
34      Netherlands       Dutch              Latin          nl-NL
                          Frisian            Latin          fy-NL
35      Switzerland       German (CH)        Latin          de-CH
                          French (CH)        Latin          fr-CH
                          Italian (CH)       Latin          it-CH
                          Romansh            Latin          rm-CH
36      Wales             English (CY)       Latin          en-GB ?
                          Welsh              Latin          cy-GB
37      "Maori"           Maori              Latin          mi-NZ
38-47   reserved
48      United States     English (US)       Latin          en-US
                          Spanish (US)       Latin          es-US
49      Wales2            Welsh              Latin          cy-GB
50      China             Mandarin           Chinese(Simp)  zh-Hans-CN
51      Brazil            Portuguese (BR)    Latin          pt-BR
52      South Africa      English (SA)       Latin          en-ZA
53      South Korea       Korean             Hangul         ko-KR
54      Taiwan            Mandarin (TW)      Chinese(Trad)  zh-Hant-TW
55-69   reserved
70      "DvorakUK"        English            Latin          en-GB
71      "DvorakUS"        English (US)       Latin          en-US
72-79   reserved
80      "ISO1"            <varies>           Latin          <varies>
81      "ISO2"            <varies>           Latin          <varies>
82      "ISO3"            <varies>           Latin          <varies>
83      "ISO4"            <varies>           Latin          <varies>
84      "ISO5"            <varies>           Latin          <varies>
85      "ISO6"            <varies>           Latin          <varies>
86      "ISO7"            Greek              Greek          gr-<varies> (GR)
87      "ISO8"            Hebrew             Hebrew         he-<varies> (IL)
88      "ISO9"            <varies>           Latin          <varies>
89-94   reserved
95-125  reserved (but overlaps with alphabet numbers)
126     "Special"         <varies>           <varies>       <varies>
127     "Read"            - OS_Byte 70
128     Faroe             Faroese            Latin          fo-FO
129     Albania           Albanian           Latin          sq-AL
130     South Africa      Afrikaans          Latin          af-ZA
                          English (SA)       Latin          en-ZA
                          Sesotho sa Leboa   Latin          nso-ZA
                          Setswana           Latin          tn-ZA
                          isiXhosa           Latin          xh-ZA
                          isiZulu            Latin          zu-ZA
131     "Bengal"          Bengali            Bengali        bn-<varies> (IN/BD)
132     Bulgaria          Bulgarian          Cyrillic       bg-BG
133     "ByeloRussian"    ByeloRussian       Cyrillic       ru-<varies>
134     "Czech"           Czech              Latin          cs-<varies> (CZ)
135     "Devang"          Hindi              Devanagari     hi-<varies> (IN)
136     "Farsi"           Persian            Arabic         ar-<varies>
137     "Gujarati"        Gujarati           Gujarati       gu-<varies> (IN)
138     Estonia           Estonian           Latin          et-EE
139     "Gaelic"          Gaelic (Scots)     Latin          gd-GB ?
                          Gaelic (Manx)      Latin          gv-IM ?
                          Gaelic (Irish)     Latin          ga-IE ?
140     "Ancient Greek"   Ancient Greek      Greek          grc-GR
141     Greenland         Kalaallisut        Latin          kl-GL
142     Hungary           Hungarian          Latin          hu-HU
143     "Lapp"            ?
144     Latvia            Latvian            Latin          lv-LV
145     Lithuania         Lithuanian         Latin          lt-LT
146     Macedonia (FYR)   Macedonian         Cyrillic       mk-MK
147     Malta             Maltese            Latin          mt-MT
148     Poland            Polish             Latin          po-PO
149     "Punjab"          Punjabi            Guru           pa-<varies> (IN)
150     Romania           Romanian           Latin          ro-RO
151     "SerboCroat"      Serbo-Croatian     <varies>       sh-<varies> (many!)
152     "Slovak"          Slovakian          Latin          sk-SK
153     "Slovene"         Slovenian          Latin          sl-SI
154     "Tamil"           Tamil              Tamil          ta-<varies> (IN)
155     Ukraine           Ukrainian          Cyrillic       uk-UK
156     "Swiss1"          French (CH)        Latin          fr-CH
157     "Swiss2"          German (CH)        Latin          de-CH
158     "Swiss3"          Italian (CH)       Latin          it-CH
159     "Swiss4"          Romansh            Latin          rm-CH |
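For anyone wanting to experiment with the table, a minimal sketch of it as a lookup structure follows. Only a handful of rows are copied in, exactly as posted above (open questions included). The names `COUNTRY_TO_LSR`, `candidates`, and `countries_for` are hypothetical; nothing here is an existing RISC OS interface.

```python
# A few rows from the table above, keyed by legacy Country number.
# Each value is (Acorn name, candidate IANA tags). More than one tag means
# the Country number alone does not say which language is meant.
COUNTRY_TO_LSR = {
    1: ("UK", ["en-GB"]),
    6: ("France", ["fr-FR", "br-FR"]),
    15: ("Norway", ["no-NO", "nb-NO", "nn-NO"]),
    36: ("Wales", ["en-GB", "cy-GB"]),
    49: ("Wales2", ["cy-GB"]),
    139: ("Gaelic", ["gd-GB", "gv-IM", "ga-IE"]),
}


def candidates(country_num):
    """Return the tags a Country number might mean (empty list if unknown)."""
    _name, tags = COUNTRY_TO_LSR.get(country_num, ("", []))
    return list(tags)


def countries_for(tag):
    """Return every Country number that lists the tag, in numeric order."""
    return sorted(num for num, (_name, tags) in COUNTRY_TO_LSR.items()
                  if tag in tags)
```

Looking up in both directions shows why the mapping is not one-to-one: `candidates(139)` offers three tags, while `countries_for("cy-GB")` offers two Country numbers.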
Clive Semmens (2335) 3130 posts |
In the days when I did a lot of foreign language work, I only ever used one Latin keyboard layout, with accent keys rather than accented characters. But I did have separate keyboard layouts for Cyrillic, Greek and Hindi. |
Rick Murray (539) 13406 posts |
I notice there’s only one entry for Japan – Kana+Kanji. There is also a way of writing called Wāpuro rōmaji which enters romanised Japanese using a Western keyboard. There is an incredible amount of duplication in that table. Why Wales and Wales2 (what’s the difference?) when really it belongs as a category of United Kingdom. Well, Brexit might eventually fix that, but for the moment it’s a part of UK and probably ought to be treated as such. My guess for “Lapp” is somebody got halfway to making a territory for Lapland and didn’t realise it’s sometimes called “Lappi” but often called “Sápmi” (as “Lappi” is a subset of Sápmi). Shouldn’t country and language be separate and not implied? I can nominate myself here. My system should be set as “France” (rather than hacking the UK territory), but it cannot be as there’s the assumption that France = French. While it is not an unreasonable assumption, it’s an inflexible system that can’t handle “user is located [here] and speaks [this]”. Android, iOS, XP… no problems with that concept. |
Steve Pampling (1551) 7932 posts |
Am I allowed to say that the only columns containing what I consider to be sensible and meaningful data are column 1 and column 5? |
Tristan M. (2946) 1036 posts |
“Honk Kong” literally the first thing I saw on the list. Sorry. |
nemo (145) 2437 posts |
Rick confirmed
That is the central thrust of my point, yes. The purpose of the above table is to be certain what the existing Country codes mean for both country and language. As can be seen, the answer is often It may be that someone is using Country number n for purposes incompatible with the IANA LSR I’ve suggested, and I’d rather know that now than later! Steve asked
You, Steve, are allowed anything. Column 2 is just the official Acorn name for the ‘Country’, reprinted here because I don’t expect people to know that Country 151 is ‘SerboCroat’ for example. Column 3 is the language implied by the ‘Country’. Again, someone might disagree with me – 156 is definitely French(CH), but I’ve no idea which ‘Gaelic’ 139 is supposed to be. I’m guessing Scots, but maybe it’s Irish… but then maybe Country 22 is Irish. Column 4 is the script, which is sort of specified by the Country-to-Alphabet functionality, but not really in the case of Cyrillic (since the Acorn Cyrillic alphabets, both of them, contain Cyrillic and Latin). This is important because it informs font repertoire requirements in the Unicode world. I’m happy you described column 5 as ‘meaningful’ – that was the intention. Lamentably the many I have an alphabet module here that supports (and restores, for RO5) the ‘Master’ and ‘Compact’ choices, but allows the ISO country to be separately defined. It was when I was considering extending that strategy that I started compiling this list to understand the scale of the problem. This would allow someone using Country 139 ‘Gaelic’ to specify whether they mean GB, IM or IE and hence imply the correct IANA LSR, just as my existing code allows someone using ‘Master’ to specify ‘GR’ (because, yes, the original BFont encoding supports Greek). No, I don’t think anyone is actually using the ‘Master’ alphabet for this reason… but that is not the point. It’s a test case. The exact same situation applies to Bengali for example, which may well be bn-IN, but could just as easily be bn-BD (or GB, if one wished). Rick noted
Indeed, and 90% of Japanese computer users use this method. However, that affects neither the Language nor the Script… that’s a keyboard layout (or IME) issue. This is another of the problems with the euro-centric concept of “selecting keyboard by country”. It always was staggeringly parochial.
Because, as the IANA LSRs I’ve suggested make plain, one is for English in Wales and the other for Welsh. Welsh has vowels us mere mortals lack, so it benefits from the Welsh alphabet instead of Latin1 – try
Is probably frighteningly close to the reality of how we got most of these. :-O
Absolutely. And “bleedin immigrants” aside, you could be a 100% pureblood Breton and be offended by having to select “French”. But that’s enough politics. ;-)
Thank you! What would be good would be certainty of what language ‘Hong Kong’ is supposed to imply. I’ve no desire to provoke an international incident, but there must, historically, have been an intention to mean something… I suspect it was probably “English… but somewhere foreign!”. :-/ |
nemo (145) 2437 posts |
Further to my earlier “MessageTrans would automatically…” arm waving. Having monitored what MT is actually asked to do, there are only a few patterns that would need to be grokked for compatibility:
• …<countryname>.<leafname>
The majority however are of the “path vars make it someone else’s problem” variety:
• <a path variable>:<leafname>
However, there’s also a lot of this one, which is as close to not internationalising as it is possible to get.
• <Obey$Dir>.Messages
The same patterns apply to Templates of course, which can be intercepted through Wimp_Extend. Toolbox Resource files are more awkward though. I’ve yet to look into that.
The intention is to allow the user to specify her language preferences as a string – eg “en-GB,fr-CA,zh-HK”. This would cause MT et al to check for the following resources, in this order:
• en-GB (merged on top of en if available)
It’s clear that “UK” has to be retained for compatibility with a large number of existing applications. Therefore Ukrainian will have to always be “uk-UK” and never simplify. Should the postfix pattern be “Messages-en-GB” rather than “Messagesen-GB”? I suspect so.
Ideally I’d want to check a centralised ‘Language Pack’ (eg !Territory) for pre-localised files for applications, much as happens with
Oh and yes, Chinese is complicated. What the Chinese government regard as ‘dialects’ is what the rest of us would call ‘mutually unintelligible separate languages written in the same script’, but don’t tell them I said so.
So when searching for an IANA LSR:
…<countryname>.<leafname>        => …<LSR>.<leafname>
…Messages<countrynum>            => …Messages-<LSR>
…<countrynum padded>.<leafname>  => …<LSR>.<leafname>
<a path variable>:<countryname>  => <pathvar>:<LSR>
<a path variable>:<leafname>     => <pathvar>:<leafname>-<LSR>
<Obey$Dir>.Messages              => <Obey$Dir>.Messages-<LSR>
where <countryname> and <countrynum> are of the current country, which is the backwards-compatible bit.
And when trying an old country number:
…<countryname>.<leafname>        => …<tryname>.<leafname>
…Messages<countrynum>            => …Messages<trynum>
…<countrynum padded>.<leafname>  => …<trynum padded>.<leafname>
<a path variable>:<countryname>  => <pathvar>:<tryname>
<a path variable>:<leafname>     => <pathvar>:<leafname><trynum>
<Obey$Dir>.Messages              => <Obey$Dir>.Messages<trynum>
These two sets aren’t identical. It would be tempting to make |
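The preference string and the LSR rewrite rules above can be sketched roughly as follows. This is an illustration under stated assumptions, not how MessageTrans behaves. The full search order is cut off in the post, so the fallback used here (each tag, then progressively shorter forms of it) is a guess based on the one surviving entry, “en-GB (merged on top of en if available)”. The function names are hypothetical, and only three of the six rewrite rules are covered; the path-variable forms are left out.

```python
def search_order(preferences):
    """Expand 'en-GB,fr-CA' into an ordered list of tags to try.

    Assumption: each tag is followed by its shorter forms (en-GB, then en).
    Per the post, a bare 'uk' is never produced, because it would collide
    with the legacy 'UK' name.
    """
    order = []
    for tag in (t.strip() for t in preferences.split(",")):
        parts = tag.split("-") if tag else []
        while parts:
            candidate = "-".join(parts)
            if candidate.lower() != "uk" and candidate not in order:
                order.append(candidate)
            parts.pop()
    return order


def lsr_path(path, country_name, country_num, lsr):
    """Rewrite a legacy resource path to its LSR form (three rules only)."""
    head, sep, leaf = path.rpartition(".")
    dir_head, dir_sep, dir_leaf = head.rpartition(".")
    # …<countryname>.<leafname>  =>  …<LSR>.<leafname>
    if sep and dir_leaf == country_name:
        return f"{dir_head}{dir_sep}{lsr}.{leaf}"
    # …Messages<countrynum>  =>  …Messages-<LSR>
    if leaf == f"Messages{country_num}":
        return f"{head}{sep}Messages-{lsr}"
    # <Obey$Dir>.Messages  =>  <Obey$Dir>.Messages-<LSR>
    if leaf == "Messages":
        return f"{path}-{lsr}"
    return path  # pattern not recognised: leave it alone
```

For example, `search_order("en-GB,fr-CA,zh-HK")` gives `en-GB, en, fr-CA, fr, zh-HK, zh` under that assumption, and `lsr_path("<Obey$Dir>.Messages", "UK", 1, "en-GB")` gives `<Obey$Dir>.Messages-en-GB`. The `Resources` directory used in testing is an invented name.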
Rick Murray (539) 13406 posts |
Isn’t Honk (!) the enclave of Cantonese speakers? I don’t know, I am simply saying that because movies from Hong Kong don’t sound the same as the likes of the various over-the-top wuxia productions that make it to the west (Jet Li, Donnie Yen, Zhang Ziyi…).
The stupid numbers are really annoying.
<cough> uk-UA </cough>
What’s wrong with a subdirectory? Messages.en-GB ?
I wouldn’t worry too much about that. I mean, it’s not as if there’s any built-in support for any squiggly ideographs…
And there lies the problem. Not that you’re still mixing up country and language, but mostly because country numbers are broken. Language numbers are broken too. Both should be retained in the minimal implementation required for compatibility. And then… *Country CA *Language fr-CA, en-CA *Country Canada (CA) *Language 1. Française Canadienne (fr-CA) 2. Canadian English (en-CA) Don’t quote me on the spelling (or gender) of the fr-CA, I didn’t bother looking it up. This might simplify it for you too. All the UK, France, 1, 6, etc etc can be part of how the legacy stuff operates (I think most will likely do it according to whatever ResFind or the like detects). |
Steve Pampling (1551) 7932 posts |
I’m afraid you and my wife differ on that and you’re outvoted.
Maybe you’re looking from the wrong end? How many duplicates of the IANA code are there? Try re-sorting on column 5 and it becomes obvious that IANA may have had a clue what they were doing. Note also that the “LatinAm” is another naffness, and is about as useful as expecting everyone in Europe to speak English (or, since we’re allegedly leaving, ask them to speak German – cos they’re more sensible). Picking something we know is a total dog’s dinner as your index field is only ever going to lead to pain. Let’s build things on a sensible basis, look to see what breaks and how to drop in a crutch for those items. |
Steve Pampling (1551) 7932 posts |
Upon sorting one of the first obvious items was the partial duplication of some entries: Welsh Latin cy-GB Deleting those tidies things a little. |
nemo (145) 2437 posts |
Steve first
Many. I’m not trying to achieve a one-to-one mapping, I’m trying to be clear about what the Country numbers actually mean, because it doesn’t seem to be clearly defined. So where I’ve put multiple LSRs against one Country number, it’s because there may be people who are using that country number to mean any of the choices. ‘Gaelic’ for example. I realise that this is an academic question because we actually know the first names of everyone who still uses RISC OS
Indeed. But “Chinese” is actually worse. When you see the choice between “Chinese (Simplified)” and “Chinese (Traditional)”, this is a choice of script – it’s like being given a choice between 𝒊𝒕𝒂𝒍𝒊𝒄 and 𝖋𝖗𝖆𝖐𝖙𝖚𝖗 (you’ll need a proper browser and OS to see what I did there) – not really a choice of language. Someone from Shanghai would not write the same thing in either script that someone from Hong Kong would, colloquially. Think of it like this: We now get to specify sprite types by colour depth, bits per component and colourant order… but first we had to be sure what the old MODE numbers actually mean in the new terms. This is the same exercise.
I’m not sure why you’re saying these are duplications. My point is that Country 15 is probably going to mean Norwegian in Latin script… but it could have been used to localise into Nynorsk. Again, an academic point. If someone selects ‘nn-NO’ and runs an old program, would they prefer Norwegian or English? I suppose it would be better to use the rest of their ordered preference rather than assume that there are any Nynorsk localised applications that used 15 as the Country number. Is that what you mean? Rick said
Indeed, hence my zh-yue clarification, and also of Simplified rather than Traditional orthography. However, I’m not convinced (pardon my scepticism) that whoever at Acorn coined that Country didn’t actually mean “English as she is spoken in our glorious occidental colony”. Do you see what I mean?
…Yes well I obviously put that in as a test to see if you’re paying attention. Which you are…
I’m thinking of the backwards-compatible behaviour. A program that tries to select its own translation will detect whether it has to fall back to English by checking for the existence of the appropriate file. Thanks to that now ancient API revision to OS_File,5, many programs get this wrong by checking R0=1 on exit. So if it is expecting ‘Messages’ to be a file, we can’t turn it into a Directory. And the idea is for this to work without modifying existing programs. New programs would use “Messages” and allow MT to select (and compose) the appropriate one (I take the point though that apps could pass the directory name to the new MT… but that wouldn’t be backwards compatible with older MTs). If MessageTrans et al are smarter, then we can in effect ignore which localisation the app has chosen as MT will override it… but only under the most common scenario of the app asking MT to load the file. It is also permissible for the App to select and load the file itself, and present it to MT via its message block, and that can’t really be overridden. I haven’t checked how many things actually do this though, I expect it to be a small number.
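The OS_File 5 pitfall can be illustrated with a small model. It assumes the object type returned in R0 is 0 for not found, 1 for a file, 2 for a directory, and 3 for an image file; the post itself only mentions the R0=1 test, so treat those values as an assumption. The sketch only models the return value in Python, it does not call RISC OS, and the function names are hypothetical.

```python
# Assumed object types returned in R0 by OS_File 5.
NOT_FOUND, FILE, DIRECTORY, IMAGE_FILE = 0, 1, 2, 3


def strict_check(object_type):
    """What many older programs do: treat only R0 = 1 as 'present'."""
    return object_type == FILE


def exists_check(object_type):
    """Treat anything other than 'not found' as present."""
    return object_type != NOT_FOUND


def chosen_messages(object_type, check):
    """Model an app deciding whether its localised file is usable."""
    return "localised" if check(object_type) else "fall back to English"
```

With the strict test, turning ‘Messages’ into a directory makes `chosen_messages(DIRECTORY, strict_check)` report a fallback to English, which is why the file cannot simply become a directory without breaking existing programs.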
My point is only that if the user selects “es-PE”, the Country number has to be set to something so that awkward applications don’t just stubbornly present English. Country 126 is guaranteed to not have a localisation, whereas 5 might (more likely than 28 anyway).
It would be better if ‘old’ applications continued to work without having to rename bits of them. |
Steve Pampling (1551) 7932 posts |
Half duplications – as in only some of the columns have an entry, which since you’re referencing the numeric code (absent) and the name (absent) makes them an effective null in the old/existing system does it not?
Whereas I was dealing with a non-maskable interrupt instead (feed the cats – several times) |