Compose mode to input UTF-8
Pages: 1 2
adr (12133) 23 posts |
First post, so hi to everyone! I’m used to inputting a variety of Unicode characters using the compose system (from old DEC terminals) that exists in the X server and in the Plan 9 OS. For those not familiar with this input system: you can use different types of mnemonics, for example: I found AltKeys: I later found Rick Murray’s MoreKeys: So, any advice or pointers to resources I can use to achieve this (after learning how to write modules in the first place)? Thanks in advance. |
Paul Sprangers (346) 525 posts |
You might also give !KeyMap a try. Be aware though that Unicode on RISC OS is severely underdeveloped. |
adr (12133) 23 posts |
Oh, I know it already. It’s a 1 key → 1 char mapping application. I could modify the keymap module but… no source! |
Rick Murray (539) 13850 posts |
MoreKeys supports UTF-8. You just need to set the system alphabet to “UTF8” in order for it to work (it’s pointless otherwise as the result would be mojibake). The module is extremely simple, you can probably figure it out just by looking at a disassembly in Zap. As for writing a module, here’s how to do it in assembler: https://heyrick.eu/blog/index.php?diary=20120528 And if you’re doing anything more complicated, it’s better to use C so you can concentrate on writing the code and not dealing with tedious rubbish like ensuring the stack is unwound correctly (as RISC OS has exactly zero tolerance for having the SVC stack messed up). Here’s writing modules in C: https://heyrick.eu/blog/index.php?diary=20150323 |
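A minimal sketch of the mojibake mentioned above, written as plain Python for any machine rather than as RISC OS code. It encodes one character as UTF-8 and then reads the same bytes back one glyph per byte. Latin-1 stands in for an 8-bit system alphabet here, which is an assumption for illustration; the RISC OS alphabets differ in detail.

```python
# Encode 'é' (U+00E9) as UTF-8: two bytes, 0xC3 0xA9.
utf8_bytes = "é".encode("utf-8")

# A system expecting an 8-bit alphabet shows one glyph per byte.
# Latin-1 is used as a stand-in for that alphabet (an assumption).
mojibake = utf8_bytes.decode("latin-1")

# Decoded as UTF-8, the same bytes give back the intended character.
intended = utf8_bytes.decode("utf-8")

print(utf8_bytes.hex(), mojibake, intended)
```

The two bytes come out as two unrelated accented glyphs under the 8-bit reading, which is why inserting UTF-8 only makes sense when the receiving side also decodes UTF-8.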
adr (12133) 23 posts |
MoreKeys supports UTF-8. You just need to set the system alphabet to “UTF8” in order for it to work (it’s pointless otherwise as the result would be mojibake). I want to avoid the use of the alphabet setting. Right now it would make using other applications a nightmare, especially in the menus. Would it not be better to adapt the entire system to use only UTF-8, and keep the alphabet subsystem as a legacy facility, so old applications can be patched or wrapped to tell the system “I’m too old, please translate my text I/O as if we were using an alphabet”? What I want is to push a string of bytes representing a UTF-8 encoded Unicode character. I will try to modify the applications I need to. Most text data flowing around out there is UTF-8. It is incredible that RISC OS hasn’t resolved this issue already. Thanks for the links, I’ve been there already!
Don’t you have a copy of the sources? It would be easier for a novice to identify the SWIs and friends. |
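For reference, a sketch of what "a string of bytes representing a UTF-8 encoded character" means in practice. The function name `utf8_encode` is made up for this illustration, and the code is ordinary Python rather than module code; the bit layout itself is the standard UTF-8 scheme, which a C or assembler module would have to reproduce before pushing the bytes anywhere.

```python
def utf8_encode(code_point: int) -> bytes:
    """Encode one Unicode code point as UTF-8 by hand (illustrative helper)."""
    if not 0 <= code_point <= 0x10FFFF or 0xD800 <= code_point <= 0xDFFF:
        raise ValueError("not a Unicode scalar value")
    if code_point < 0x80:        # 1 byte:  0xxxxxxx
        return bytes([code_point])
    if code_point < 0x800:       # 2 bytes: 110xxxxx 10xxxxxx
        return bytes([0xC0 | (code_point >> 6),
                      0x80 | (code_point & 0x3F)])
    if code_point < 0x10000:     # 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
        return bytes([0xE0 | (code_point >> 12),
                      0x80 | ((code_point >> 6) & 0x3F),
                      0x80 | (code_point & 0x3F)])
    # 4 bytes: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
    return bytes([0xF0 | (code_point >> 18),
                  0x80 | ((code_point >> 12) & 0x3F),
                  0x80 | ((code_point >> 6) & 0x3F),
                  0x80 | (code_point & 0x3F)])


# Greek small alpha, U+03B1, becomes the two bytes CE B1.
print(utf8_encode(0x3B1).hex())
```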
Rick Murray (539) 13850 posts |
I fully agree. Because there is no sort of automatic fallback to an eight bit character set, it’s currently an all or nothing. One way or another, stuff will end up looking a mess. Granted, certain purists say that the font manager shouldn’t support fallback if one explicitly specifies UTF-8, however this means either a lot of work in getting the Wimp to do it, or nothing at all gets done. Guess which option has been chosen thus far. ;)
Yeah, I’ve already mentioned a few times that my little Creative Zen music player (from circa 2005?) can handle foreign characters without issue and…
There’s a document that describes all of the SWIs. |
Steve Pampling (1551) 8172 posts |
Would it help if you ran Rick’s module through Armalyser? OK, it’s probably missing a few, slightly zany?[1], comments alongside the assembler[2], but it’s probably short enough to follow anyway.
[1] It’s Rick, would it be otherwise?
[2] Assuming here that, since it’s a very small item and probably predates Rick regularly using C, it’s a pure ARM construct. |
Rick Murray (539) 13850 posts |
It was written in ObjAsm. The main program (Wimp app) is written in C, but the intercept module is a tiny bit of assembler as it didn’t need to be any more complicated than that. I can’t comment on the comments. ;) |
adr (12133) 23 posts |
The module itself is something like 280 bytes (from memory). It’s not at all complicated. Oh, I opened the module with Zap and the assembly mode is even showing me the names of the SWIs. |
Rick Murray (539) 13850 posts |
Yup. Zap is good like that. To help you along, I’ve also dropped a copy of the source code used to make the module here: https://heyrick.eu/blog/files/morekeys_module_source_0-03.txt Yeah… a few quirky comments, as Steve mentioned. ;) What you’re missing:
The other two (pushpull, debug) I don’t think are actually used. So try commenting them out. nemo, if he’s lurking, would tell you that this is a horrible way to intercept keypresses and that one should write a keyboard handler. That there’s practically zero documentation on how to write keyboard handlers and we’re really operating above the level of keyboards and into the realm of associating actions with keypresses in numerous potential languages is why I didn’t do it that way. If intercepting a specific hotkey combination requires one to “write a keyboard handler”, then RISC OS’ keyboard handling is even more broken than I usually give it credit for. |
Clive Semmens (2335) 3276 posts |
There used to be !IKHG – International Keyboard Handler Generator – but sadly it’s never been updated since 26-bit days, and the keyboard handlers it generated don’t work since then either 8~( – I used to have proper keyboard handlers I’d written using it, so I could touch type Russian, Greek and Hindi on my RiscPC, with standard keyboard layouts for those languages… could switch keyboard layout at a click on the appropriate icon on the icon bar… |
adr (12133) 23 posts |
To help you along, I’ve also dropped a copy of the source code used to make the module here: https://heyrick.eu/blog/files/morekeys_module_source_0-03.txt Thanks Rick for sharing it!
I remember looking at the sources and seeing some documentation about generating new keyboard layouts, in case you are interested in doing something like that. The compose key input system + UTF-8 lets you input any Unicode character supported by the system without changing the layout. I can edit a simple text file in my favorite (this day of the week) editor using the keyboard layout I’m familiar with, and insert mathematical symbols, Greek, kana, chess symbols… you name it. All this without having to change the keyboard layout or the font, and without the need for a word processor. And for characters not used so often, there is always !Chars. Of course the convenience of this depends on the mnemonics you use and the number of characters the font supports. For example, you can prefix a letter of the Latin alphabet with a * to input Greek letters: *a → α, *b → β, so you just have to remember “* is for Greek”. If you need to input ancient Greek with its orthography, things get more complicated; in that case it is better to use a keyboard layout for ancient Greek. If some Plan 9 user is around here, I’m pretty sure he/she will understand that once you get used to this way of typing, you can’t live |
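The "* is for Greek" mnemonic can be sketched as a small table lookup. This is an illustration in plain Python, not how X, Plan 9 or MoreKeys implement it. Only the `*a` and `*b` entries come from the post; the `COMPOSE` token, the `SEQUENCES` table name and the `compose_filter` function are made up for the example, and key tokens are assumed to be single characters.

```python
COMPOSE = "<compose>"  # hypothetical token standing for the compose key

# Only "*a" and "*b" come from the post; a real table would be far larger.
SEQUENCES = {"*a": "α", "*b": "β"}


def compose_filter(keys):
    """Turn a stream of key tokens into output text.

    After COMPOSE, keys are collected until they match a table entry.
    A sequence that cannot match any entry is passed through unchanged.
    """
    out = []
    pending = None  # None = normal typing, str = sequence collected so far
    for key in keys:
        if key == COMPOSE:
            pending = ""          # start (or restart) a sequence
            continue
        if pending is None:
            out.append(key)       # ordinary key, no sequence in progress
            continue
        pending += key
        if pending in SEQUENCES:
            out.append(SEQUENCES[pending])
            pending = None
        elif not any(seq.startswith(pending) for seq in SEQUENCES):
            out.append(pending)   # dead end: emit the raw keys
            pending = None
    return "".join(out)


print(compose_filter(["x", COMPOSE, "*", "a", COMPOSE, "*", "b", "y"]))
```

Because the table maps key sequences to characters and never touches the layout, the same approach extends to any symbol the font can display by adding rows.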
adr (12133) 23 posts |
Would it help if you ran Ricks module through Armalyser? Thanks Steve for pointing me to this application. I’m amazed by its output. |
Clive Semmens (2335) 3276 posts |
But I want to change the keyboard layout – I want to be able to change it quickly and easily to the proper layout for the language I’m currently typing in, and change it again and again, every time I switch languages. Otherwise I’m hunting for the right key all the time, which is incredibly slow compared with proper typing. Obviously if I’ve got a font that has all the characters for all the languages I use then I don’t need to change font – apart from when I want bold or italic or small caps or cursive or whatever. That I didn’t have back in RiscPC days – I had fonts I could switch between Latin and Cyrillic (for Russian), or between Latin and Greek, or between Latin and Devanagari (for Hindi), but in those days fonts couldn’t have enough glyphs for all four languages in the same font*. But now…
|
adr (12133) 23 posts |
But I want to change the keyboard layout Oh, I know that you want to do that, I just was pointing out that What I mean is that the compose key input system doesn’t replace the need for different layouts. In Xorg I use the command setxkbmap. You can specify the layouts you want to use, the compose key, the key to change the layouts, &c, although I prefer configuring key bindings in the window manager to change the layout to the one I want. So don’t worry, I will be asking for advice to do something similar soon… Didn’t you take a look at the code for creating new layouts? IIRC it was well documented. |
Clive Semmens (2335) 3276 posts |
I don’t know the keyboard layouts for any Latin alphabet language apart from English; I’m very happy typing French or German (the only other Latin alphabet languages I really know much at all) on an English keyboard layout and using floating accents on Alt-Number. But for Greek, Russian or Hindi trying to type on an English language keyboard is hopeless – especially Hindi, where phonetic just doesn’t work at all, with no 1:1 correspondence between English and Hindi phonemes at all. |
Rick Murray (539) 13850 posts |
For western (Latin) languages, the “international” keyboard layout is your friend. There’s US International, and a British version. It’s not so different to the Alt-: method used by RISC OS… …and given that RISC OS did this in 1992 and Windows in 1995, it p<bleep>es me off enormously that in 2023 Android is utterly hopeless at managing this with a British layout Bluetooth keyboard. I’m in the process of trying to figure out enough of Keyboard Helper to do what RISC OS does… Of course, this isn’t much help for non-Latin languages which are a trauma all their own, but I’ve not run into difficulty (Android aside) ripping in French and Spanish using a British layout. |
Clive Semmens (2335) 3276 posts |
Here’s a proper Hindi keyboard, laid out exactly the way old Hindi manual typewriters were, so I could touch-type on it: https://clive.semmens.org.uk/024_ROUGOL/PO.html – it’s not remotely phonetically similar to QWERTY. Using !IKHG I was able to write a keyboard driver that generated the top-bit-set character codes for where I’d put those characters in the old 224 character fonts – it would be better of course to generate UTF8 codes for them, but in those days you couldn’t have more than 224 characters in a font. (Well – 223 really, since #128 is delete.) |
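A sketch of what "generate UTF8 codes for them" could look like as a conversion step, in plain Python. The slot numbers 0xC0 and 0xC1 in `LEGACY_TO_UNICODE` are invented for the example, since the real arrangement of those fonts is not given in the thread; the Unicode values for the Devanagari letters KA and KHA are the standard ones.

```python
# Hypothetical slot assignments: the real layout of the old fonts is not
# described in the thread, so these two entries are placeholders.
LEGACY_TO_UNICODE = {
    0xC0: 0x0915,  # DEVANAGARI LETTER KA
    0xC1: 0x0916,  # DEVANAGARI LETTER KHA
}


def legacy_to_utf8(data: bytes) -> bytes:
    """Convert text in the (hypothetical) 8-bit encoding to UTF-8."""
    chars = []
    for byte in data:
        if byte < 0x80:
            chars.append(chr(byte))  # ASCII passes through unchanged
        else:
            # Raises KeyError for a top-bit-set code with no table entry.
            chars.append(chr(LEGACY_TO_UNICODE[byte]))
    return "".join(chars).encode("utf-8")


print(legacy_to_utf8(b"a\xc0\xc1").hex())
```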
adr (12133) 23 posts |
For western (Latin) languages, the “international” keyboard layout is your friend Not mine. You have to look when typing a char like the caret symbol that corresponds now to a dead key if the |
Colin Ferris (399) 1818 posts |
Are your keyboard 26bit modules downloadable? |
Clive Semmens (2335) 3276 posts |
Mine, you mean? Not currently. I’m not sure whether I’ve still got copies. If I can find them I can let you have them if you want them. I’m not at the Pi at the moment – I’ll probably be able to take a look later this afternoon. |
David J. Ruck (33) 1636 posts |
I’m glad your v0.08 is using a jump table in the SWI handler rather than the compare and branch chain of v0.03. |
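The difference between the two dispatch styles, sketched in Python rather than ARM assembler. The handler names and their order are hypothetical and are not taken from the MoreKeys source; the point is only that a table indexes straight to the handler while a chain tests each offset in turn.

```python
# Hypothetical handlers; the real module's SWIs are not listed in the thread.
def swi_version(args):
    return "version"


def swi_enable(args):
    return "enable"


def swi_disable(args):
    return "disable"


def dispatch_chain(offset, args):
    """Compare-and-branch style: one test per SWI, in order."""
    if offset == 0:
        return swi_version(args)
    elif offset == 1:
        return swi_enable(args)
    elif offset == 2:
        return swi_disable(args)
    raise ValueError("unknown SWI offset")


# Jump-table style: the offset indexes directly into a list of handlers.
HANDLERS = [swi_version, swi_enable, swi_disable]


def dispatch_table(offset, args):
    """Range-check once, then jump straight to the handler."""
    if not 0 <= offset < len(HANDLERS):
        raise ValueError("unknown SWI offset")
    return HANDLERS[offset](args)


print(dispatch_chain(2, None), dispatch_table(2, None))
```

With a table, adding a SWI means adding one entry, and the cost of dispatch does not grow with the number of SWIs.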
Rick Murray (539) 13850 posts |
Yeah, I’ve just looked it up on Google. Has the US international layout always been like that, or has Microsoft broken it in recent years? Using the UK Extended, it’s a matter of holding AltGr and something like ‘^’ followed by a key like ‘a’ to arrive at ‘â’. Similar to how RISC OS does it, except certain common keys are easier (AltGr and e → é). |
Rick Murray (539) 13850 posts |
It could be better, yes… ;) |
Clive Semmens (2335) 3276 posts |
I’ve found my little 26-bit era apps that change the keyboards for Russian or Hindi, but not Greek or our special Journal of Physiology keyboard. Sadly I’ve not found a copy of !IKHG that I made the modules inside them with. They certainly don’t work on the Pi, and I’ve nothing else to check them on. If anyone really wants to have a look at them, speak up and I’ll put them in a quiet corner of my website and give you a link. Other than looking at what’s going on inside them I’m not sure what use they’d be – they don’t generate UTF-8, they generate 8-bit codes that correspond to non-standard character encodings that allowed me to cram the Latin alphabet, with a comprehensive set of non-spacing diacritics for the Latin alphabet, plus either Cyrillic or Devanagari and all their diacritics, into 223 character fonts… (Edited: found the (very simple) app to return keyboard to default.) |