Unicode, alphabets, etc.

9 posts, 5 voices

Feb 2, 2017 2:14pm Jeffrey Lee (213) 6048 posts

I’m looking at implementing support for non-ASCII characters in VNCServ. Dealing with different alphabets and character encodings in RISC OS isn’t really something I’m familiar with, so I figured that asking a few questions here might be a good way of getting the answers I need. Possibly some of these questions are answered in the PRMs, but I haven’t had a chance to check them yet. In any case, this post might serve as a useful reference for anyone else who is taking their first look at character encoding support.

- With UTF-8 alphabet, how is the keyboard buffer handled?
  - I’m assuming you’d want to insert a UTF-8 byte sequence for each character, with the usual extra logic that any byte outside 1-127 must be prefixed with a null byte, but I haven’t had a chance to test that yet
- What’s the preferred way of converting from Unicode to the system alphabet? (Internally VNC uses X11 keysyms, which are essentially a superset of Unicode, so I expect I’ll be doing an X11 → Unicode → alphabet conversion)
  - AFAIK there isn’t anything built in to the OS to handle this
  - The first third-party solution that came to mind was Iconv
    - I haven’t looked closely enough to know whether it supports all the alphabets RISC OS does (bearing in mind that the RISC OS alphabets aren’t an exact match for the underlying ISO standards due to e.g. the extra symbols the Wimp uses)
    - Also I’m not sure if there’s a convenient way to convert RISC OS alphabet numbers to encoding names for use with Iconv (having your own hardcoded list seems like a bad idea)
- Related to the above, what uses the !Unicode resources that are part of the standard disc image? I know Iconv uses them, but that’s a third-party thing, so I’m still left wondering what in the OS uses them. A search through the ROM didn’t reveal anything obvious, but I haven’t checked the disc image yet.
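
For the X11 → Unicode step mentioned in the post above, a minimal sketch in C might look like the following. It covers only the keysyms that map to Unicode by simple arithmetic: the Latin-1 range, where the keysym value equals the code point, and the directly encoded Unicode keysyms at 0x01000000 plus the code point. The function name keysym_to_unicode is hypothetical and not taken from VNCServ. Legacy keysyms outside these ranges (for example the Latin-2 block) would need a lookup table, which is not shown.

```c
#include <stdint.h>

/* Hypothetical helper: map an X11 keysym to a Unicode code point.
   Returns -1 for keysyms that need a lookup table or have no
   character meaning (legacy non-Latin-1 keysyms, function keys). */
int32_t keysym_to_unicode(uint32_t keysym)
{
    /* Latin-1 keysyms share their values with ISO 8859-1 / Unicode. */
    if ((keysym >= 0x0020 && keysym <= 0x007E) ||
        (keysym >= 0x00A0 && keysym <= 0x00FF))
        return (int32_t)keysym;

    /* Directly encoded Unicode keysyms: 0x01000000 + code point. */
    if (keysym >= 0x01000100 && keysym <= 0x0110FFFF)
        return (int32_t)(keysym - 0x01000000);

    return -1;
}
```

With this sketch, keysym 0x010020AC yields U+20AC, while a function-key keysym such as 0xFF0D (Return) yields -1 and would be handled separately from character input.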

Feb 2, 2017 8:42pm Sprow (202) 1158 posts

> Related to the above, what uses the !Unicode resources that are part of the standard disc image?

I think the only thing in (current) ROMs using !Unicode is the more recent versions of Chars. A quick grep of the sources shows !Browse and the Korean IME do too, neither of which get used anywhere. The log messages for !Unicode have several mentions of set top box model numbers so most likely there are closed components for STBs of years gone by.

> What’s the preferred way of converting from Unicode to the system alphabet?

UnicodeLib has some promising looking conversion functions, but I think the last time I tried to do what you’re doing it turned out some vital step was missing. If you can assume you only need to support RISC OS 5 then Service_International 8 will give you a 256 entry lookup table for a given alphabet. If you want to support other OS versions you’ll need to carry round your own copy, or at least a Latin1 fallback.
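
To illustrate how a 256 entry table plus a Latin1 fallback could be used for the Unicode → alphabet direction, here is a rough sketch in C. It rests on assumptions not stated in the thread: that the table is indexed by alphabet byte, that each entry is a 32-bit Unicode code point, and that 0xFFFFFFFF marks an undefined entry. The function name unicode_to_alphabet is hypothetical, and obtaining the table from Service_International 8 is not shown.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed marker for "no character at this position" in the table. */
#define UCS_UNDEFINED 0xFFFFFFFFu

/* Hypothetical helper: find the alphabet byte for a code point.
   table: 256 entries with table[byte] = code point (assumed layout),
          or NULL if no table could be obtained for the alphabet.
   Returns 0..255, or -1 if the character cannot be represented. */
int unicode_to_alphabet(const uint32_t *table, uint32_t ucs)
{
    if (ucs == UCS_UNDEFINED)
        return -1;

    if (table != NULL) {
        /* The table maps byte -> code point, so search it in reverse. */
        for (int i = 0; i < 256; i++) {
            if (table[i] == ucs)
                return i;
        }
        return -1;
    }

    /* Fallback: only the printable ISO 8859-1 ranges, where the byte
       value equals the code point. Control codes and 0x80-0x9F are
       deliberately left out. */
    if ((ucs >= 0x20 && ucs <= 0x7E) || (ucs >= 0xA0 && ucs <= 0xFF))
        return (int)ucs;

    return -1;
}
```

A linear search over 256 entries is cheap enough for keyboard input. A program converting large amounts of text would more likely build a reverse map once per alphabet.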

Feb 2, 2017 11:05pm Jeffrey Lee (213) 6048 posts

Thanks for the tips. A bit of experimentation with Iconv suggests that:

- Only the x-acorn encodings (x-acorn-latin1 and x-acorn-fuzzy) appear to match Service_International 8’s idea of the Latin1 alphabet
- If you use Service_International 3 to get an alphabet name then you get the “simple” name (e.g. Latin2) rather than the name of the corresponding standard
  - Although iconv recognises a few of the “simple” names it doesn’t recognise all of them (and I haven’t checked to make sure the simple names are synonyms for the expected standard)
  - But the main issue is that, as mentioned in the first point, only the x-acorn encodings match up with what Service_International 8 says

And all of the above is totally fair, considering that Iconv is meant to work with standards and the RISC OS alphabets aren’t a 1:1 match for any of the standards.

So I’ll probably rely on Service_International 8 plus a hardcoded Latin1 fallback.

However Service_International 8 isn’t perfect either, since it doesn’t include the Wimp symbols. I can kind of understand why (it’s the Wimp that defines them, and the RISC OS 3 PRMs warn that their presence shouldn’t be relied upon), but surely the fact that apps are using them for their menus (and are causing trouble with UTF-8 alphabet) means that we should have some official way of converting them. Have we actually decided on which Unicode code points should be used for the Wimp symbols?

Feb 3, 2017 9:32am Sprow (202) 1158 posts

> If you use Service_International 3 to get an alphabet name then you get the “simple” name (e.g. Latin2) rather than the name of the corresponding standard

I guess it’s trying to map from number to the name you would need with a *Alphabet command, with the possibility of extra modules being added that respond with even more alphabets. In the strictest of senses since Latin1 is a superset of ISO8859-1 it would be a lie to return ISO8859-1 as the answer. When I sent in Appendix G of the User Guide with all the updated character sets I also dug up the standard numbers for the respective alphabets. I doubt we’ll ever add any new alphabets now UTF-8 exists.

> Have we actually decided on which Unicode code points should be used for the Wimp symbols?

Yes – there’s a table in the Style Guide on page 98 in the section “Unicode support”. I think the table just comes from the Wimp sources (wimpsymbols_UTF8 and wimpsymbols_UCS4 in Wimp04.s).

Feb 3, 2017 1:42pm Jeffrey Lee (213) 6048 posts

> With UTF-8 alphabet, how is the keyboard buffer handled?
> I’m assuming you’d want to insert a UTF-8 byte sequence for each character, with the usual extra logic that any byte outside 1-127 must be prefixed with a null byte, but I haven’t had a chance to test that yet

Some poking around in the InternationalKeyboard sources suggests that the answer is “yes”.
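
The scheme described in this post can be sketched in C as follows: encode the code point as UTF-8, then prefix every byte outside 1-127 with a null byte. This is only an illustration of that rule, not code from VNCServ or InternationalKeyboard. The function names utf8_encode and key_bytes_for_codepoint are hypothetical, and actually inserting the bytes into the keyboard buffer is left out.

```c
#include <stddef.h>
#include <stdint.h>

/* Encode one code point as UTF-8 into out[0..3].
   Returns the number of bytes written (1..4), or 0 if cp is not a
   valid Unicode scalar value (surrogate or above U+10FFFF). */
size_t utf8_encode(uint32_t cp, uint8_t out[4])
{
    if (cp < 0x80) {
        out[0] = (uint8_t)cp;
        return 1;
    }
    if (cp < 0x800) {
        out[0] = (uint8_t)(0xC0 | (cp >> 6));
        out[1] = (uint8_t)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp < 0x10000) {
        if (cp >= 0xD800 && cp <= 0xDFFF)
            return 0;                       /* surrogates are not characters */
        out[0] = (uint8_t)(0xE0 | (cp >> 12));
        out[1] = (uint8_t)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (uint8_t)(0x80 | (cp & 0x3F));
        return 3;
    }
    if (cp <= 0x10FFFF) {
        out[0] = (uint8_t)(0xF0 | (cp >> 18));
        out[1] = (uint8_t)(0x80 | ((cp >> 12) & 0x3F));
        out[2] = (uint8_t)(0x80 | ((cp >> 6) & 0x3F));
        out[3] = (uint8_t)(0x80 | (cp & 0x3F));
        return 4;
    }
    return 0;
}

/* Build the byte sequence for the keyboard buffer: each UTF-8 byte
   outside 1-127 is preceded by a null byte. out needs room for 8
   bytes (4 UTF-8 bytes, each possibly prefixed).
   Returns the number of bytes written, or 0 for an invalid cp. */
size_t key_bytes_for_codepoint(uint32_t cp, uint8_t out[8])
{
    uint8_t utf8[4];
    size_t n = utf8_encode(cp, utf8);
    size_t len = 0;

    for (size_t i = 0; i < n; i++) {
        if (utf8[i] == 0 || utf8[i] >= 0x80)
            out[len++] = 0;                 /* null prefix */
        out[len++] = utf8[i];
    }
    return len;
}
```

Under this sketch, U+00E9 (é) becomes the four bytes 00 C3 00 A9, while a plain ASCII letter such as A stays as the single byte 41.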

Feb 3, 2017 6:59pm Chris Mahoney (1684) 2165 posts

(Ignore this – it was completely wrong)

Feb 3, 2017 8:18pm Rick Murray (539) 13840 posts

> Filer would no longer show Japanese filenames […]

> Ignore this

Okay, I’ll ignore it – but Japanese filenames in the Filer, huh? I’m guessing you are in Alphabet = UTF8?

Feb 3, 2017 9:06pm Chris Mahoney (1684) 2165 posts

Sneaky timing there! :) Yep, UTF8 alphabet with IPAex font.

Feb 4, 2017 9:26am Michael Drake (88) 336 posts

John-Mark’s Drobe article might help: http://www.drobe.co.uk/article.php?id=1319&hlt=unicode
