Chars
Chris (121) 472 posts |
Anyone doing work on this? I’d be interested in looking at it, if not. |
Andrew Conroy (370) 725 posts |
I’d suggest that Martin Wurthner’s !XChars might be a good place to start! |
Matthew Phillips (473) 719 posts |
I think XChars has been supplied with some variant of ROL’s version of the OS. It’s certainly very nice. There are three things I would like to see, however:

1) Being able to see the full range of Unicode characters present in a font, so that you can easily see which glyphs are present for your chosen font. XChars allows you to choose different encodings, but when you switch to UTF8 it only shows 128 characters rather than the full range.

2) Being able to see the full range of available Unicode characters, in a font-independent way. NetSurf uses the RUFL library, which has very nice features for displaying Unicode text. Basically you throw stuff at RUFL and it will display the characters by using a combination of fonts if necessary: if you have Hebrew characters in one font and Chinese in another, and have set your preferences in NetSurf to use Trinity, it will still display the characters. In an improved Chars or XChars this would give a combined view of all the characters on offer. It would probably need to indicate the font each one came from when you hover over with the mouse.

2a) There might then need to be a sensible way of selecting a subset of Unicode characters using character properties, as the list in Unicode order is not intuitively arranged. For example, showing all the “e”-with-diacritics together would be good.

3) Most importantly for the functionality of Chars or XChars, we need a way for these utilities to pass a Unicode character through to an application when the user clicks, as though it were entered on the keyboard. At present RISC OS operates a fudge whereby applications only get Unicode characters from the keyboard drivers if the machine’s alphabet is set to UTF-8, and then they are received byte by byte as several separate events. The application has to build them into a single character.

This avoided having to do anything backwards-incompatible to deal with the special Wimp codes beyond 256, like the F-keys, cursor keys etc. But it is hampering development of Unicode-aware software: no-one sets their alphabet to UTF-8 because it makes all their other applications go funny. So to my mind, what we need is a standard extension to the Wimp Key Pressed event. It needs to be compatible with DeepKeys (hello nemo!). Here’s one suggestion:

+24 Character code
+28 DeepKeys stuff
+32 Unicode character number

We’ve recently seen a Unicode-aware !Dict released, which has a special integration with !KeyMap to allow the user to type a wider range of characters. I don’t know how Paul Sprangers has done it, but I suspect it is via a set of private messages. An approved standard extension to the Key Pressed event would be a step forward in this area. |
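The byte-by-byte delivery described above means a Unicode-aware application has to reassemble UTF-8 sequences from successive Key Pressed events itself. A minimal sketch of that reassembly, in Python for illustration (the function names and structure are invented, not any real RISC OS API):

```python
# Hypothetical sketch: assembling one Unicode code point from the
# byte-at-a-time Key Pressed events delivered when the machine's
# alphabet is set to UTF-8.

def utf8_accumulator():
    """Feed successive key-event byte values; returns a code point
    once a complete UTF-8 sequence has arrived, else None."""
    pending = []          # bytes collected so far
    expected = 0          # continuation bytes still needed

    def feed(byte):
        nonlocal pending, expected
        if byte < 0x80:                   # plain ASCII: complete in one event
            pending, expected = [], 0
            return byte
        if byte & 0xC0 == 0xC0:           # lead byte: work out sequence length
            pending = [byte]
            if   byte & 0xE0 == 0xC0: expected = 1
            elif byte & 0xF0 == 0xE0: expected = 2
            else:                     expected = 3
            return None
        pending.append(byte)              # continuation byte (10xxxxxx)
        expected -= 1
        if expected:
            return None
        cp = ord(bytes(pending).decode("utf-8"))  # sequence complete
        pending = []
        return cp

    return feed
```

For example, feeding the two bytes 0xC3 0xA9 yields nothing after the first event and U+00E9 (“é”) after the second, which is exactly the state an application must carry between Wimp polls.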
Chris (121) 472 posts |
Yes, the Unicode issue is quite tricky at the moment. My thoughts for Chars were for development in two stages:

1. Initially produce a smartened-up version with a few new features (different ASCII-style encodings, nicer visual display, icon bar icon, Choices window, etc.). This wouldn’t take very long, would make the app nicer to use, and I’d be happy to do it and upload for comments. I’ve made a start on this.

2. Look into extending it to cope with the UTF-8 alphabet. This would serve primarily as a way of showing all the glyphs present in a font, but would also provide a way of entering characters outside the normal range.

At present, I only have the sketchiest understanding of what would be involved in doing this. Chars is written in BASIC, which may be a poor choice (it wouldn’t be able to make use of libraries like RUFL, for example). As you point out too, the OS has only rudimentary support for UTF-8 in the desktop – Chars currently just issues Wimp_ProcessKey to send the character code to other applications. So it might prove too difficult to do without starting from scratch. I’m interested in working on this, but would be happy to take advice/collaborate with anyone who knows the issues better. |
nemo (145) 2529 posts |
;-) Well here’s an interesting thing… there’s a reason for the “deep” in !DeepKeys – the keyboard buffer becomes word-deep, instead of byte-deep. (I acknowledge “WideKeys” may have been clearer!). Couple that with the fact that Wimp_ProcessKey has always been word-sized and one can see that sending Unicodes is not hard… except it’s rather more complicated than that. If it wasn’t for the Wimp we could simply switch to Unicode keypresses throughout. But the Wimp has stolen Extended Latin-A and part of Extended Latin-B for its own purposes, and defines Basic Latin rather differently. So you can’t use Unicode directly without confusing existing apps… but you could probably set b31 and faff appropriately. Existing apps would just ignore the keypress.
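The “set b31” idea above can be sketched very simply: flag the top bit of the word-sized keypress so that pre-Unicode applications see an unrecognised code and ignore it, while aware applications strip the flag to recover the code point. This is only an illustration of the suggestion, not any agreed format:

```python
# Illustrative sketch of the "set b31" proposal: pack a Unicode code
# point into a word-sized keypress value that existing applications
# will not recognise and so ignore. The flag bit follows the post;
# everything else here is invented for illustration.

UNICODE_FLAG = 1 << 31

def pack_unicode_key(code_point):
    """Flag a Unicode code point as a 'new-style' keypress word."""
    assert 0 <= code_point <= 0x10FFFF
    return UNICODE_FLAG | code_point

def unpack_key(key_word):
    """Return ('unicode', cp) for flagged words, ('classic', code) otherwise."""
    if key_word & UNICODE_FLAG:
        return ('unicode', key_word & 0x7FFFFFFF)
    return ('classic', key_word)
```

An old application polling for keypresses would simply see a large unknown code and pass it on (or drop it); only Unicode-aware code would look behind the flag.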
That’s what my !IntChars program does. Clicking menu on a letter shows all the alternates. This is done via the Encoding, which is a PostScript concept. You can choose an Encoding as well as a Font (and my !Encodings resource contains a number of useful ones). The problem is, what does a “keypress” mean? If you do Wimp_ProcessKey,65 do you expect to get an “A”? Well not if you’ve chosen \fHomerton.Medium\eEBCDIC as your font, for example (apologies if I’ve misremembered the slash syntax… is it capitals? I forget). IntChars sends a message ahead of the KeyPress with the font and encoding, so an application can switch font or at least transcode between encodings… but whether a glyph is available depends on the font chosen, and there isn’t actually any way of specifying the glyph you want from the font manager with most fonts. That might sound like a strange claim, but consider the Symbol and Dingbats lookalike fonts (Sidney and Selwyn) – what do you get when you ask for glyph 65 from them? For those “non-language fonts” (as the original format fonts were rechristened) the input is a string of glyph indices, not “characters”. What glyph you get is not defined by the API, though you can manually look for the optional Encoding file in the font and Do It Yourself (as IntChars does). Language fonts can have an Encoding applied, but it ain’t necessarily a Unicode one, and again you have the difficulty of transcoding. If you do choose UTF8, then you need a Unicode-aware application to type into… good luck with that. This isn’t a desktop issue as such, it’s the same Unicode Surprise that split EPOC into two different versions (the Unicode one becoming Symbian). You can’t deliver Unicodes through the keyboard or Wimp without applications being required to handle them. Extending the KeyPress poll event doesn’t help – what do you put in the ‘old’ keypress for ‘old’ programs to use? &FFFD, the Unicode replacement character? 
Though !DeepKeys makes the keyboard buffer wide enough to support Unicodes, and assuming you write a keyboard driver to generate them (not hard), you still have the issue of making all your programs do something useful with the result. This isn’t easy. I have a module with a load of new alphabets such as 98 MacRoman. My keyboard driver used an extension to query the alphabet support module to find which code represented the hard space (for Alt-Space) because of this – the MacRoman hard space isn’t 160. But applications don’t know that. That’s why I took the approach I did with !IntChars – a separate message which, if the app did not acknowledge it, was followed by a keypress message with suitable transcoding. As for whether it is written in Basic or not – that’s really not the problem! |
W P Blatchley (147) 247 posts |
This is something I’m really interested in. I’ve always thought the RISC OS 5 way of passing UTF-8 key presses to apps was a bit clunky. Nevertheless, passing UTF-8 rather than UCS-4/UTF-32 is arguably more useful, since the Font Manager speaks the former but not the latter. In the most naive of apps, it’s possible that if UTF-8 key presses were passed in, correct text /might/ be displayed on screen even if the app itself was not Unicode-aware as such. Although that said, it would have had to open a font with \EUTF8, so perhaps that point is moot. Even so, I’d say it’s at least worth discussing whether it would be more sensible to extend KeyPressed events with UCS-4 as Matthew suggested above, or whether it would be better to use UTF-8 for this. That would mean adding two extra words to the event, as we’d need up to 6 bytes.
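The trade-off raised above is between a fixed-width payload (UCS-4, always one word) and a variable-width one (UTF-8). Under the current definition of UTF-8 (RFC 3629) a code point needs at most 4 bytes, though the original scheme reserved patterns for up to 6, which is presumably where the figure in the post comes from. A quick sketch of the length rule:

```python
# Comparing the two candidate payloads: UCS-4 always occupies one
# word, while UTF-8 varies with the code point. Lengths per the
# current UTF-8 definition (RFC 3629), which caps sequences at 4 bytes.

def utf8_length(cp):
    """Number of UTF-8 bytes needed for code point cp."""
    if cp < 0x80:      return 1   # ASCII
    if cp < 0x800:     return 2   # e.g. Latin accents, Hebrew
    if cp < 0x10000:   return 3   # rest of the Basic Multilingual Plane
    return 4                      # supplementary planes

# Cross-check against Python's own encoder
for cp in (0x41, 0xE9, 0x20AC, 0x1F600):
    assert utf8_length(cp) == len(chr(cp).encode("utf-8"))
```

So a UCS-4 field needs one extra word in the event block, whereas a worst-case UTF-8 field needs two (plus a length or terminator convention).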
Even so, we’d need more information in the Key Pressed events to fully support Unicode, wouldn’t we? Because, according to the docs I have on !DeepKeys, although “Deep” key ranges are word-wide, only bits 16-31 are used for the actual key code. That’s not even enough to support the Unicode BMP, even if we assumed the allocation of key codes for !DeepKeys wasn’t proprietary, which it is, I believe. No, I think either !DeepKeys itself should be updated to be able to talk Unicode properly, or UTF-8 or UCS-4 should be added to KeyPressed events. The former might be desirable, since !DeepKeys is something that should really be standard in the OS, IMHO.
Doesn’t this just exemplify the problem? This just shows that as well as updating KeyPressed events, we also need to add to or modify Wimp_ProcessKey. Maybe add a Wimp_ProcessKeyWide? If you do Wimp_ProcessKey,65, I’d argue that you’d always want an “A”. If you wanted something else that’s represented by code point 65 in another encoding, the caller of Wimp_ProcessKey should have transcoded to Unicode first, passed a known and unambiguous Unicode “character” via an updated Wimp_ProcessKey, and receiving apps should interpret that appropriately. In other words, they should transcode from Unicode to whatever encoding they need to display/store internally. Fortunately, we already have good transcoding support in RISC OS with Iconv. As it’s a module, even BASIC users can make use of it. And that avoids having to pass encoding information around with everything via Wimp Messages or whatever. The situation is complicated by the need to pass “control” characters around via Wimp_ProcessKey and obviously with the KeyPressed event. By which I mean characters that have no representation in Unicode. Like a &1C1 “Menu” key press, etc. So, in the same way as Matthew suggested above for KeyPressed events, we need to send both traditional Wimp keycodes around, AND an extended field with Unicode-encoded information (be that UTF-8/UCS-4 or whatever).
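The transcoding model argued for above (sender normalises to Unicode once; each receiver converts from Unicode to its own internal encoding, as Iconv does on RISC OS) can be illustrated with Python’s built-in codecs as a stand-in. The function name is invented; the substitution behaviour plays the role of U+FFFD in the discussion:

```python
# Rough stand-in for the receiver-side transcode step: map Unicode
# text to an application's internal encoding (Latin-1 here),
# substituting where the target has no equivalent character.

def to_app_encoding(text, encoding="latin-1"):
    """Convert Unicode text to the app's internal byte encoding.

    'replace' substitutes '?' where no mapping exists, standing in
    for the U+FFFD fallback discussed in the thread.
    """
    return text.encode(encoding, errors="replace")

assert to_app_encoding("café") == b"caf\xe9"
assert to_app_encoding("€") == b"?"      # no Latin-1 mapping
```

The point is that encoding information never needs to travel with the keypress: Unicode is the interchange form, and each end transcodes locally.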
I’m not sure I really understand the issue here. Can’t we just pass a reserved keycode in the ‘old’ keypress that ‘old’ programs would ignore? Same for Wimp_ProcessKey. If there’s a current representation (among existing keycodes, I mean) of a Unicode character, we pass that. If not, the reserved keycode is sent (and ignored), and the extended field contains the Unicode. Old programs wouldn’t pass on the key press, I suppose, but would that be such a problem? In your example, U+FFFD would just get ignored – does that matter? |
nemo (145) 2529 posts |
No, that’s the internal key number. It’s used to tell the difference between Ctrl-M, Return and Enter, if that’s important to your app (Note that Wimp_ProcessKey doesn’t have that functionality, so one must always default to the most likely keypress if the internal key number is strange). The “deep” part of the keypress is delivered where you’d expect for a KeyPress event. This is why there’s an inconsistency for 0080-00FF and incompatibility for 0100-01FF. You could deliver a Unicode through there (and it’s neat that only a single Wimp_ProcessKey or KeyPress event is necessary) but that wasn’t a serious suggestion. In any case the intention (of word-sized keypresses) was for a large range of virtual keys, such as “Scan” (for USB scanners) or the various meta keys which I was hoping multimedia keyboards would sprout. Remember multimedia keyboards? Peripherals are so boring these days.
Not when setting text in a Dingbats font.
Ha, no, sorry. That’s not what Unicode is for. For example, I can direct you to a font that consists entirely of ampersands (or indeed Euros). There isn’t a one-to-one mapping of Unicode to glyph or vice-versa. Unicode encodes the meaning but not the representation. When setting text, representation is usually all-important. Unicode plays an important part in modern font formats, but sadly such fonts often make heavy use of the Private Use Area, which rather defeats the purpose. I’m quite serious when I say that PostScript Encodings (based on the AGL) are actually more useful in some ways than Unicode. For example, I have an Acorn pi font containing two sets of zodiac symbols. Because its encoding names these as /Taurus and /taurus etc, pressing Ctrl-S in !IntChars swaps between the two, and clicking Menu on either shows the other as a variant. Equally the various AGL-specified postfixes are supported, so one could have an ampersand font encoded as /ampersand /ampersand.alt1 /ampersand.alt2… With Unicode you end up with FC00 FC01 FC02 FC03 etc. Good luck doing anything useful with that.
One can transcode the intended keypress to the current alphabet (which may be UTF8 and hence require multiple KeyPress events, bleuuurgh) and if there is no satisfactory mapping, use FFFD. However, one has to be aware that with non-language fonts, nominating suitable Unicodes is very far from trivial, and probably isn’t very useful anyway – if any glyphs you choose from a font have no Unicode representation then the fact that the rest do is very little consolation when some disappear! I mention “setting text”, as opposed to displaying Unicoded content, because of the representation issue. It’s very important on a content production platform. |
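The delivery step nemo describes (transcode the intended keypress to the current alphabet, fall back to U+FFFD when there is no mapping, and note that a UTF-8 alphabet needs multiple KeyPress events per character) can be sketched as follows. The function and its return convention are illustrative only:

```python
# Sketch of delivering an intended code point under the current
# machine alphabet. Under Latin-1 it is a single event (or the
# replacement character if unmappable); under the UTF-8 alphabet it
# becomes one event per byte - the "bleuuurgh" case in the post.

REPLACEMENT = 0xFFFD  # Unicode replacement character

def keypress_events(cp, alphabet="latin1"):
    """Return the list of event codes needed to deliver code point cp."""
    if alphabet == "utf8":
        return list(chr(cp).encode("utf-8"))    # one event per byte
    try:
        return list(chr(cp).encode("latin-1"))  # single event
    except UnicodeEncodeError:
        return [REPLACEMENT]                    # no satisfactory mapping
```

So U+00E9 is one event under Latin-1, U+20AC is three events under UTF-8, and U+2649 (Taurus) under Latin-1 collapses to the replacement character, which is exactly the “very little consolation” problem described above for symbol fonts.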
nemo (145) 2529 posts |
I’m arguing against retrofitting the KeyPress event with a Unicode field because:
I prefer a separate message containing glyph id, font/encoding and Unicode because:
|
W P Blatchley (147) 247 posts |
Yes, of course you’re right. As I use computers for foreign languages daily, I tend to forget the importance of the **** dingbats! But really, I think you’ve hit the nail on the head with your final remark:
I heartily agree with that statement. I would love to see a new message introduced for this purpose. Would it in fact make sense to be able to send a string, not just a single character, for insertion at the caret? I can see where that might be useful in a very simplistic IME. I’m not sure a full-blown IME would talk to apps via Wimp messages, though. I think the interaction between the app and the IME would be too intense to make that sensible. Probably the IME would need to be a module task so the two could talk to each other without excessive latency, and without app writers having to get tied up in an overly complex messaging protocol. But that’s for another thread on another day… |
W P Blatchley (147) 247 posts |
I just remembered that Ben Avison said years ago on a Drobe article comments thread that:
So, ROOL, if you’re there, is there any chance of that surfacing? And nemo, is your !IntChars available anywhere? |
Matthew Phillips (473) 719 posts |
(Sorry, don’t know how to do a bq inside a bq. Any hints?) I wrote:
nemo replied:
I’m not familiar with !IntChars, and a quick search didn’t turn it up. Is it available anywhere? W.P. Blatchley writes:
I was suggesting UCS-4 because it fits neatly into a word. A naive application isn’t going to know about the contents of the buffer at offset +32 so the situation doesn’t arise. I believe (but could be wrong) that NetSurf does not use UTF-8 internally but one of the other Unicode forms. Key pressed events are not very frequent, and the overhead of conversion would not be great.
Yes, something along these lines would be required if this extension was to be adopted as standard. It could also deal with passing on the DeepKeys modifier info in +28. Turning to nemo’s next reply:
I think this shows the different perspectives we are coming from. You are talking primarily about publishing systems for printed or visual output. I am talking about meaning. If you are storing information in a field in a database, you don’t get the opportunity to change font or encoding part way through, or even to pick a font at all generally. If you are entering a search term in a form on the web, or searching a dictionary, you don’t get to pick a font. What would be the point of saying on the web that you’re typing in Colonna, or Jacinth, or OpusIII.Chords when the computer at the other end could easily know nothing about the font? That’s why Unicode is used, to carry the meaning and there are plenty of computer applications which are only interested in meaning. On the web, style is applied separately, and often with a range of possible fonts specified to allow for what’s not available.
With Unicode Taurus is U+2649. You can do a lot with that. Selwyn and Sidney are covered too, as Unicode has encoding for Selwyn’s dingbats, though obviously Unicode doesn’t cover all the symbols out there.
I have to plead ignorance and ask why?
There is if we have Wimp_ProcessKeyWide as suggested by W.P. Blatchley.
That’s OK.
Wrong. NetSurf is UTF8 aware, but unless you switch the whole machine’s alphabet to UTF8 you cannot enter exotic characters into forms for submission to a web site. Essentially I’m trying to suggest a way of breaking the impasse where no-one will realistically want to use UTF8 keyboard input because all the unaware applications will do daft things with it. Whenever I think about writing a UTF-8 capable application, I am put off by the thought that no-one could enter data into it.
Having said all that, your suggestion does make a lot of sense, providing Unicode is included wherever possible. It avoids backwards-compatibility issues. We just need a standard way of doing it, so that Chars, XChars, KeyMap, IntChars and so on can generate the messages (and if anyone devises a proper IME then that can do them too). And this is why after devising and agreeing the protocol, doing work on !Chars to implement it would be a very good start, as everyone would have access to one application on the input side of the fence. I’d imagine that !Chars would first send a User_MessageRecorded of the new type, with full information, and if no application picked up on it, would then use Wimp_ProcessKey as before. But would there be a problem with other applications getting hold of the new message if the application with input focus did not understand it, and possibly preventing the older app with input focus getting a key press event? I would not be interested in glyph ID or font for the types of applications I have in mind, but I can see it would be very nice to be able to use XChars, say, select a symbol font, click, and have that character added to my word-processor document without having to faff around changing fonts in that. If well-designed, this could be very powerful and would really suit the RISC OS way of applications helping each other. Back to W.P.B.:
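The “full information” message proposed above (Unicode code point, plus glyph code and font/encoding for nemo’s scenarios, sent as a User_MessageRecorded before any Wimp_ProcessKey fallback) might look something like the following. Everything here is hypothetical: the offsets beyond the standard Wimp message header, the message number, and the field layout are all invented for illustration, not allocations:

```python
# Purely hypothetical layout for the "character ready" message
# discussed above, packed in Wimp message block style: a 20-byte
# header (size, sender, my_ref, your_ref, message number) followed by
# the Unicode code point, the glyph code, and a font/encoding string.

import struct

MSG_CHAR_READY = 0x5A5A0   # made-up message number, not an allocation

def build_char_message(unicode_cp, glyph_code, font_spec):
    font_bytes = font_spec.encode("latin-1")[:235] + b"\0"
    body = struct.pack("<II", unicode_cp, glyph_code) + font_bytes
    size = (20 + len(body) + 3) & ~3           # whole block, word-aligned
    header = struct.pack("<IIIII", size, 0, 0, 0, MSG_CHAR_READY)
    return header + body.ljust(size - 20, b"\0")

# e.g. Taurus from a dingbats-style font, with an illustrative font spec
msg = build_char_message(0x2649, 0x41, "Selwyn\\EUnicode")
```

A receiver that only cares about meaning reads the Unicode word and stops; one that cares about representation reads the glyph code and font spec too. If nothing acknowledges the recorded message, the sender falls back to Wimp_ProcessKey as described.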
It would be interesting to think this over. For many applications it would be quite possible to find the position of the caret without any co-operation from the application and thus be able to show the character being built/chosen in the right place on screen. And then use whatever standard we adopt to transmit the final character to the application. For some applications there may be a need to tell the IME whereabouts on screen the character would be appearing, but it’s not going to change quickly, and is well within the capabilities of Wimp messages, I’d have thought. But you could have a module task for the IME and have a SWI which the application would call to indicate position on screen and the size of the current font, plus things like hints for suitable background colour. Can we have a clear proposal/specification for your glyph/font/encoding/Unicode message, nemo? |
Ben Avison (25) 445 posts |
I’ve had a dig, but I can’t find the last version I remember, where there was a separate menu to select the encoding rather than the font. I did find an earlier version that at least lets you enter all the characters in the current alphabet – but it’s been squashed, so it’s not trivial to just apply the changes back to the published sources. If anyone fancies a go at merging the changes, I’ll happily send them a copy. The whole program is only a few screenfuls of code, so it’s not an impossible task, but I don’t have time to do it myself. For those thinking about IMEs, you might like to know that Acorn and Pace did commission a number of IME ports (Japan, Korea and Taiwan) which were used on various NC and STB designs. There’s a well-defined interface, including Wimp messages, extensions to the Territory module and so on. But because the core of the IME code is a port of commercial software, we have been unable to release them. |
Matthew Phillips (473) 719 posts |
Do you mean you can’t release the specification of the messages and extensions, or that you can’t release the code? |
Matthew Phillips (473) 719 posts |
I guess the new message would only be sent to the window where the caret is, and if not acknowledged then an ordinary key press would be generated for all applications (or not if there were no appropriate key to generate). |
Chris (121) 472 posts |
I’m working on Chars right now, and it would be good to take a look at that. I’ve sent an email with my contact details. |
Jess Hampshire (158) 865 posts |
Would it be possible to add enter delete and cursor keys so that it can be used when the keyboard has failed? |
nemo (145) 2529 posts |
An on-screen keyboard is another thing again. I’ll add !IntChars to the list of things I will release. Along with !Encodings (which goes in Resources). |
Trevor Johnson (329) 1645 posts |
|
W P Blatchley (147) 247 posts |
Quite. I was really thinking about “meaning” above. Obviously Unicode /could/ be used to cater for all situations. Symbol fonts like dingbats probably have valid Unicode code points for a number of their glyphs, and, as nemo said before, the Private Use Area could be used for the multiple-ampersand example. But the PostScript encoding method seems much preferable, especially for the latter. I think it’s clear that we need to cater for both (for want of better terms) a meaning-centric encoding, for which Unicode is the obvious choice, and a representation-centric encoding, for which it sounds like the PostScript encodings nemo has described are a good choice. I confess to knowing nothing about them (I will read up), but I like the idea of a /glyph.alt1, /glyph.alt2 syntax – that’s neat. This seems like a good way of handling symbol fonts in particular. The latter, representation-centric encoding, seems to me to fit in better with RISC OS’s existing way of handling fonts. But we desperately need Unicode representation too. Maybe the existing, albeit clunky way of sending Unicode as separated bytes of UTF-8 in the normal KeyPressed events is sufficient and we shouldn’t mess with it. That’s probably fine for genuine keyboard entry. After all, even most East Asians type in the latin alphabet and use IMEs to handle conversions, so the need to send the full range of Unicode characters via KeyPressed events is arguably small. Then we need a separate concept of “I want to insert characters at the caret in program X”. Formerly this has been done by faking keypresses, but I think it would be great to make a formally specified API (Wimp Message, as outlined above) specifically for this purpose. And, as I said above, being able to send strings of characters, as opposed to just a single character at a time, could be very handy. |
W P Blatchley (147) 247 posts |
A RISC OS IME is something I’ve given a lot of thought to over the years. I really think that a peek-and-poke approach, where the IME does stuff without the app’s knowledge is doomed to failure. You could get it to work to some extent as you describe, Matthew, but it wouldn’t look very nice. Say you were typing in the middle of a block of existing text. Would the IME’s “conversion-in-progress” just be plotted over the top of the text that follows the caret? Ideally not. The editor would react to each change in the IME’s “conversion string”, shuffling and reflowing its text as necessary. I also think you really want the IME to have control over rendering of its “conversion string” within the app’s work area. That requires the app’s cooperation. In other words, the IME would use the Nested Wimp to take charge of a small, and frequently changing area within the app’s work area. This would make the appearance of IME conversions consistent across all apps, and alleviate some of the work of supporting an IME by the client. Essentially “all” the client would have to do is pass on relevant (probably all) new input to the IME. The IME would tell the client how big an area of its work area it now needed to render the conversion-in-progress, and the client would react by moving any content around to grow or shrink the “hole” that the IME was using. Finally, the client would respond to an “end-of-conversion” by inserting the converted string into its text and both client and IME would at that point remove the “hole” entirely. Of course, there are many other considerations. IMEs often need context in order to operate well. In other words, when you start a new conversion in the middle of some existing text, the IME will want to know what currently comes before and after the insertion point in order to make decisions about how to process the current conversion. In a very simple, contrived, example, say you had an “IME” that was capital-letter aware. You might type “there”. 
If the preceding text was ". ", the IME would want to output “There”, and not, “there”. The “context” in this case is the preceding ". ". It’s not part of the current conversion, but it affects the current conversion’s outcome. And there’s the issue of returning metadata from the IME. Some IMEs might want, for example, to not only return the converted string, but other additional info also. For anyone who knows a bit about Japanese (anyone?), an IME might be expected to return the kana used in the conversion to be used as furigana, displayed above the converted text in a capable editor. There are surely many other possible situations where metadata might need to be passed back. As to whether everything I expect from an IME can sensibly be achieved purely with Wimp Messages, I’m not convinced, but maybe it can. At the moment, I’m of the opinion that a largely SWI-based interface, propped up with appropriate Wimp Messages, might be easier for application authors to integrate.
I would love to see this interface. Can it be made available? Were there many Wimp apps written to take advantage of it? |
Matthew Phillips (473) 719 posts |
I’m not sure what you mean by peak (peek?) and poke: I didn’t envisage the IME needing access to the application’s memory, but perhaps that’s not what you meant.
The IME arrangements I have seen staff at work using for typing Chinese typically do not seem to show the character in place in the editor until it has been composed. The Wikipedia article on input methods has an animated GIF illustrating the sort of thing I have seen. With this style of input, the IME would use its own window so less co-operation would be required. Indeed, you wouldn’t really want the IME to be insisting on a child window nested within a parent window because in a dialogue box situation, typing into an ordinary Wimp icon, there might not be enough room within the dialogue box to fit a child window, especially as different IMEs would have different space requirements.
At present we have keyboard drivers that allow you to type Alt-: then “u” to get “ü”. The application only gets told about the finished character. At the basic level an IME could work like this to take the hassle off the hands of the application developer. I think there should be several (or at least two) levels of co-operation. If you demand the highest standards with no fall-back, then there will be very little take-up from application writers. Most authors, even if they make the effort to support more than just Latin-1, will not want to go to too much effort rewriting all their input handling code. So for the basic level of support I reckon that with the IME in operation, when you start typing keys that would go to “compose” a character, the IME would intercept them and the application would not see them at all. The IME would send a Wimp message to the window of the task owning the caret saying “I’m composing a character”. The application could do one of three things:
In the case of (1) the IME would just discover the caret position itself, from Wimp_GetCaretPosition, and position its window accordingly, but it would be unable to match any styling as to font or size. In the case of (2) it would do likewise but attempt to match the style. Obviously if the character being composed were not available in that font it would be using a different font anyway, but at least the size would be sensible. You’ll have to say what would happen in the case of (3)! I imagine in this situation the IME will give a running commentary on the partial input in order to do all the clever things you’re thinking of, but the application probably still does not need to receive key press events. Once the IME has finished composing the character it would send a Wimp message to the task giving the Unicode representation of the character, the font name, encoding and code-point (to satisfy nemo’s scenarios) and anything else that might be relevant from the above discussions. This last message could also be generated by Chars, XChars, IntChars etc., allowing users to select from a simple utility application rather than use a full-blown IME if they’re only doing small-scale things. If this Wimp message at the end of the process is not acknowledged, then the IME or character application would see whether the character could be represented in the current alphabet, and if so, would use Wimp_ProcessKey to send it to the relevant application. We would thus have four levels of compliance:
Ideally there would also need to be support from the Wimp, to allow applications to create icons using UTF-8 even if the current OS alphabet was only Latin-1. The Wimp would also need to respond to the IME with at least the 2nd level of compliance. The beauty of this scheme, of course, is that all we need agreement on initially is what the “character composed and ready” message at the end of the process needs to be like. Then !Chars can be updated and folk can start supporting the protocol at the basic level. Once someone starts writing an IME, the 3rd level of compliance can be tested and developed. And once someone has done the hard thinking, the 4th level of compliance can be specified and designed and then, possibly, implemented by the few dedicated developers who are willing to do the work! There would need to be some standards as to how an IME should behave: for example pressing Escape to break out of character composition, standards for timing of keypresses to terminate the composition, whether cursor keys can be used to select from an auto-completion list, which keypresses must be passed on to the application, etc. It would be good if Ben can tell us if this scheme would be at all compatible with the work Acorn did, or if he can offer any better suggestions for the design of such a protocol. |
W P Blatchley (147) 247 posts |
Sorry, serves me right for not re-reading my post! Yes, I meant peek-and-poke (now edited in original post), by which I meant peeking at the character input of the app and poking something different to it once the IME had processed it.
There are more integrated methods of typing pinyin for Chinese, and the more recent MS Japanese IMEs (by which I mean, the last 10 years or so!) allow for composition-in-situ. The method illustrated on the Wikipedia page you referenced is okay for entering the odd word or phrase, but for serious typing, it’s very laborious. But I agree with you about not over-complicating things for app writers, and to that end I was also considering a “level of compliance” structure, whereby app writers could choose how integrated they wanted the IME to be.
Wimp icon code would ideally be updated to include IME-awareness at a more basic level, so the code could work with the IME, rather than the IME being bolted on to it. The sticky issue for me here is that if you had a Wimp icon within a menu, and the IME also needed to open a menu for the user to make a selection from, you’d be in trouble! I’ve also been thinking about a command line-only IME (no GUI), whereby composition selections aren’t displayed in a menu, but can just be stepped through using the arrow keys or suchlike. This is one solution to the menu-in-a-menu problem described above, but it’s not very nice. I’d also like to write a drop-in replacement for the Toolbox’s textbox gadget that had IME-awareness built in. That would make it very easy for toolbox apps to make great use of the IME.
As I said above, I think this is a good idea. I was already thinking of basically the same thing, but hadn’t considered positioning the composition window at the point of the caret in the case of (1) – that’s a nice idea. I’m also coming round to the idea of the IME intercepting key presses so the client app never sees them. I suppose it would make sense for the IME to genuinely take input focus away from the client so it had caret control in this case as well.
In the case of (3), perhaps a messaging protocol could work:

i) An “I’m composing” message is sent from the IME to the client. No more input reaches the client directly. The client responds to the message with information about its caret position, font style and possibly some contextual information about what comes before and after the conversion point. The extent of the required context would probably be indicated by the IME’s current mode.

ii) For each change (in size) of the displayed conversion, the IME sends a message to the client telling it how much space it needs. The client responds with a list of rectangles that fulfil that requirement, and shuffles its content to make that space available. It has to be a list of rectangles, as the composition could run over the end of a line, or around an irregularly shaped embedded object. Most editors should be able to calculate these rectangles fairly easily, as the result is just the same as inserting a dummy string of spaces into their text.

iii) When the conversion is complete, the IME sends a message like the one we’ve been talking about above for !Chars to “insert this text at your caret”. Child windows are closed and focus is given back to the client.

There might also need to be messages from the client to the IME to say, for example, that the conversion needs to be forcibly ended, or that the surrounding text is reflowing and the IME needs to reformat also.
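The three steps above could be sketched as message shapes. This is purely illustrative Python: every name and field below is a guess at what such a protocol might carry, not a defined RISC OS Wimp message format.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

# Hypothetical message shapes for the three-step IME protocol sketched above.
# Nothing here is a real RISC OS API; the names are invented for illustration.

Rect = Tuple[int, int, int, int]  # x0, y0, x1, y1 in window work-area units


@dataclass
class ImeComposing:
    """Step (i): IME -> client, 'I'm composing'; direct input stops."""
    ime_handle: int


@dataclass
class ClientContext:
    """Client's reply: caret, style, and surrounding-text context."""
    caret_pos: Tuple[int, int]
    font_name: str
    text_before: str   # context before the conversion point
    text_after: str    # context after the conversion point


@dataclass
class ImeSpaceRequest:
    """Step (ii): IME asks the client for display space."""
    width: int
    height: int


@dataclass
class ClientSpaceReply:
    """Client shuffles its content and replies with rectangles."""
    rects: List[Rect] = field(default_factory=list)


@dataclass
class ImeInsertText:
    """Step (iii): final 'insert this text at your caret' message."""
    text: str
```

A list of rectangles (rather than one) is what lets the composition wrap over a line end or flow around an embedded object, as described above.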
Absolutely. This would need careful specification. But I think you’ve touched on the most important issues already. One other thing that springs to mind is that IMEs often make use of the function keys to offer keyboard shortcuts to common functions. So apps might have to accept that during composition, their own hotkeys might not do anything. Whether the IME should pass on hotkeys that aren’t useful to itself in its current mode, or whether all hotkeys should be blocked during composition is up for debate, really. |
Ben Avison (25) 445 posts |
Your wish is my command… I’ve done some digging, and I believe two of the components are OK for public release: InternationalIME which acts as a central dispatcher for IMEs, and KoreaIME which is described by one of the authors as a “basic Korean IME that just handled syllable composition (pretty trivial), as part of the proof of concept”. He also says:
Included in the sources is a specification.
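For what it’s worth, the “pretty trivial” syllable composition the KoreaIME is described as handling really is algorithmic in Unicode: every precomposed Hangul syllable is derived arithmetically from its jamo. A minimal sketch in Python (illustrative only, not the KoreaIME source):

```python
# Hangul syllable composition per the Unicode standard: a precomposed
# syllable is SBase + (leadIndex * 21 + vowelIndex) * 28 + tailIndex.

LEAD_BASE = 0x1100       # first leading consonant (choseong)
VOWEL_BASE = 0x1161      # first vowel (jungseong)
TAIL_BASE = 0x11A7       # one before the first trailing consonant (jongseong)
SYLLABLE_BASE = 0xAC00   # first precomposed syllable, '가'


def compose_syllable(lead, vowel, tail=None):
    """Compose jamo characters into one precomposed Hangul syllable."""
    l = ord(lead) - LEAD_BASE
    v = ord(vowel) - VOWEL_BASE
    t = (ord(tail) - TAIL_BASE) if tail else 0
    return chr(SYLLABLE_BASE + (l * 21 + v) * 28 + t)
```

So composing ᄒ + ᅡ + ᆫ yields the single code point 한 (U+D55C), which is exactly the kind of per-syllable work an IME does as the user types.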
The main application which supported it was Fresco, since at the time most RISC OS NC and STB user interfaces were web-based. There were rumours that EasiWriter/TechWriter supported it too, but Martin Wuerthner says it’s not in the sources he has, so that may be a myth. |
W P Blatchley (147) 247 posts |
Ben, thanks a lot for looking out that IME stuff. I’ve actually seen that spec before, many years ago. It’s a bit thin for a full Wimp IME, as the author acknowledges, but it could form a good starting point. That said, I’m a little sceptical about making the IME a shared “device” with a mutex, since I think it would be nicer to use if each app or even each document had its own IME context. I’ve never seen the sources to InternationalIME before, nor those of the KoreaIME, so thanks also for making them available. I need to go away and mull over a few details now! |
Ben Avison (25) 445 posts |
I haven’t really been keeping up with this discussion properly due to other pressures on my time, but I’d like to make a few points. I’ve also had an email from Chris about !Chars, which I’m going to respond to here to give the issues a wider airing.

Chris makes the point that it would be useful to let !Chars group Unicode characters by category – this is eminently sensible, and such character browsers exist on other platforms. He’s identified a file he can use to do this grouping. I think my suggestion would be that this file should be included in the !Unicode directory in $.!Boot.Resources – any objections? (Note this directory is built from the RiscOS/Sources/SystemRes/Unicode component, not RiscOS/Sources/Lib/Unicode.) In fact, I note that this file is now somewhat dated – a much newer issue is available here and the corresponding documentation is here. These should probably be used in preference, assuming no incompatible changes have been made to the file format. It also occurs to me that it would be sensible for !Chars to use the Unicode character descriptions for interactive help messages when the pointer is hovering over each character.

Recent discussions about the use of !Chars to insert unusual characters – especially in peculiar fonts which provide glyphs with no sensible Unicode mapping – have thrown into focus that !Chars has previously been used for two quite distinct purposes: 1) entering a character by its meaning, as though it had been typed on the keyboard, and 2) picking out a particular glyph from a particular font, regardless of which code point it sits at.
This distinction becomes more pronounced with Unicode in the picture – perhaps to the extent that they ought to be provided by different applications. !Chars does what it does by piggybacking on Key_Pressed events. But Key_Pressed events primarily originate from the keyboard handler, and for the keyboard handler, the values have the first of those two meanings (a semantic, not a glyph). The keyboard handler, being a low level OS component, shouldn’t have to know about which font is in use in the application – and in fact, due to buffering, it can’t know for sure which font or encoding will be in force when a given character comes to be extracted from the keyboard buffer (for example, you might have navigated to a different part of a document or opened a dialogue box in the meantime). The only sensible long-term approach is for Key_Pressed events to move to using UTF-8 universally. There are a number of side-effects of such a change – so many that I’ve written it up in a separate topic. I don’t think that !Chars can be allowed to go inserting key events in arbitrary alphabets – it should stick to the system alphabet, or we risk completely inconsistent behaviour between characters inserted via the keyboard and those inserted from !Chars. Yes, in the Unicode world, it makes perfect sense for !Chars to group characters by type. A separate application could exist which lets you browse all the glyphs in a specific font and select one, but it should use an entirely separate mechanism to convey this information to an application – one which includes the font name, weight and style, and a glyph number (not character code point), maybe also character size. I might even go as far as to say that !Chars perhaps shouldn’t give you a choice of font, to emphasise the difference. Or at most, it might let you see a list of which fonts define a glyph for each specific character, rather than trying to represent entire character sets for a specific font. 
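As an aside, the byte-at-a-time delivery described earlier (one Key_Pressed event per UTF-8 byte) means every application needs a small incremental decoder to rebuild complete characters. The idea can be sketched with Python’s standard incremental UTF-8 decoder; this is illustrative only, not RISC OS code, and `on_key_pressed` is an invented name:

```python
import codecs

# Incremental UTF-8 decoding: feed one byte per (simulated) Key_Pressed
# event; a character is only returned once its final byte arrives.
decoder = codecs.getincrementaldecoder('utf-8')()


def on_key_pressed(byte_value):
    """Feed one event's byte; return the completed character(s), else None."""
    text = decoder.decode(bytes([byte_value]))
    return text or None
```

Feeding the three bytes 0xE2, 0x82, 0xAC one event at a time yields nothing for the first two calls and '€' on the third, which is exactly the buffering burden the current fudge places on each application.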
So, in future, if you wanted to insert one of the standard dingbats into a document using !Chars, you wouldn’t select Selwyn and choose a character from that character set. You’d go to the “Crosses”, “Stars”, “Ornamental punctuation” or whichever Unicode character category you wanted, and choose the Unicode code point for that dingbat, without necessarily caring about which fonts might or might not have a glyph for that code point. Then, if you’re using an application like NetSurf which uses the RUFL library, you may already find that font substitution is performed, even if your current font doesn’t define that glyph. It was always planned that the Unicode Font Manager would eventually do such substitutions itself; if and when that feature is completed, this would become an automatic feature of all applications. |
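The category-based grouping discussed in this thread can be illustrated with Python’s bundled Unicode character database (built from the same UnicodeData.txt file mentioned above). This is a sketch of the browsing idea only; an updated !Chars would parse the data file directly rather than use Python:

```python
import unicodedata
from collections import defaultdict

# Group the assigned code points in a range by Unicode general category
# (e.g. 'Lu' = uppercase letter, 'Nd' = decimal digit, 'Sc' = currency
# symbol) - the kind of grouping a category-aware !Chars could present.


def group_by_category(start, end):
    """Return {category: [(code point, character name), ...]}."""
    groups = defaultdict(list)
    for cp in range(start, end + 1):
        ch = chr(cp)
        name = unicodedata.name(ch, None)
        if name is not None:          # skip unassigned/unnamed code points
            groups[unicodedata.category(ch)].append((cp, name))
    return groups
```

The character names doubling as interactive help text, as suggested above, falls out of the same data: each entry already pairs a code point with its official description.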