UTF8 problem.
Sergey Lentsov (8268) 63 posts |
Hello, I'm working on a chat program that needs to send messages to the server in UTF-8 so that messages in any language display correctly. I have several questions: 1. What is the best way to handle UTF-8? 2. Some national characters seem to be missing from the GUI fonts (at least Cyrillic). They are missing both in the UTF-8 alphabet and in the single-byte Cyrillic alphabet. When I type Russian letters into the text input control I can't see them, but they are actually typed and my app receives them. |
Michael Drake (88) 336 posts |
There are a couple of things from the NetSurf project, which may be useful: |
Jeffrey Lee (213) 6048 posts |
I think almost everyone is using a national alphabet. There's still a lot of work that needs doing before the OS fully supports UTF-8 (list here). If you need to convert text between UTF-8 and the system alphabet manually, you can call Service_International 8 to get a lookup table for mapping between the two. |
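The lookup-table approach works because each byte of a single-byte alphabet maps to exactly one Unicode code point, so conversion is one array index per byte. In practice the table would come from Service_International 8; its exact layout is not shown here. To keep the sketch portable and testable, the hypothetical `build_iso8859_5_table()` fills a table by hand for ISO 8859-5, on the assumption that this is the Cyrillic encoding in use. Check the real system table rather than relying on this one.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the system-supplied table: fill a 256-entry
   byte -> Unicode map for ISO 8859-5 (an assumed alphabet, for illustration). */
static void build_iso8859_5_table(uint32_t table[256])
{
    for (uint32_t b = 0; b < 256; b++) {
        if (b <= 0xA0)
            table[b] = b;            /* ASCII, C1 controls, NBSP: identity */
        else
            table[b] = b + 0x360;    /* 0xA1..0xFF -> U+0401..U+045F */
    }
    /* The three non-Cyrillic characters in the upper half. */
    table[0xAD] = 0x00AD;            /* SOFT HYPHEN */
    table[0xF0] = 0x2116;            /* NUMERO SIGN */
    table[0xFD] = 0x00A7;            /* SECTION SIGN */
}

/* Convert len bytes in the single-byte alphabet to Unicode code points
   using the lookup table. out must have room for len entries. */
static void alphabet_to_ucs4(const uint32_t table[256],
                             const unsigned char *in, size_t len,
                             uint32_t *out)
{
    for (size_t i = 0; i < len; i++)
        out[i] = table[in[i]];
}
```

The resulting code points still have to be encoded as UTF-8 before they can be sent to a server.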
Rick Murray (539) 13840 posts |
There's no automatic fallback: it's either UTF-8 or LatinX. Because of this, I can't imagine many people are using UTF-8 in preference to a national character set. OS_Byte 71 will let you read the current alphabet, so you can see whether the machine is using UTF-8 or whether you need to convert yourself. https://www.riscosopen.org/wiki/documentation/show/OS_Byte%2071
Missing from the list: have FontManager treat illegal sequences as Latin1. This is probably the (second?) biggest barrier to switching, as many applications "assume" an eight-bit encoding, and anything beyond plain English (French, German, sexed quotes, etc.) simply goes wrong under UTF-8. |
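The suggested fallback can be sketched as a decoder that accepts well-formed UTF-8 and, whenever a byte does not start a valid sequence, treats that single byte as a Latin1 character (code point equal to the byte value). This only illustrates the policy; it is not how FontManager works, and the function name `decode_utf8_or_latin1` is hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Decode one character from s (len >= 1 bytes available) and store the
   number of bytes consumed in *used. Anything that is not well-formed
   UTF-8 is treated as a single Latin1 byte. */
static uint32_t decode_utf8_or_latin1(const unsigned char *s, size_t len,
                                      size_t *used)
{
    uint32_t cp, min;   /* min rejects overlong encodings */
    size_t need;        /* number of continuation bytes expected */

    *used = 1;                       /* default: consume one byte */
    if (s[0] < 0x80)
        return s[0];                 /* ASCII */
    if ((s[0] & 0xE0) == 0xC0)      { cp = s[0] & 0x1F; need = 1; min = 0x80; }
    else if ((s[0] & 0xF0) == 0xE0) { cp = s[0] & 0x0F; need = 2; min = 0x800; }
    else if ((s[0] & 0xF8) == 0xF0) { cp = s[0] & 0x07; need = 3; min = 0x10000; }
    else
        return s[0];                 /* stray continuation or 0xF8..0xFF: Latin1 */

    if (need >= len)
        return s[0];                 /* truncated sequence: Latin1 */
    for (size_t i = 1; i <= need; i++) {
        if ((s[i] & 0xC0) != 0x80)
            return s[0];             /* missing continuation byte: Latin1 */
        cp = (cp << 6) | (s[i] & 0x3F);
    }
    if (cp < min || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
        return s[0];                 /* overlong, out of range or surrogate */
    *used = need + 1;
    return cp;
}
```

A caller loops over the string, advancing by `*used` bytes each time, so text from "eight-bit" applications (such as a lone 0xE9 for é) still comes out as the intended Latin1 character.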
Andrew Rawnsley (492) 1445 posts |
The iconv module probably also needs a mention. I haven't used it personally, but I believe it was a great help when we added UTF-8 display to Messenger Pro. The main problem was that we had to do a lot of conversions back and forth, as we support both non-OS5 (i.e. no UTF-8 support) and OS5 display paths, and have to accommodate everything on the fly. It may still not be 100% perfect in all cases, but it is mostly sorted now. |
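A minimal sketch of iconv-based conversion using the standard POSIX `iconv_open()`/`iconv()`/`iconv_close()` calls. This assumes your C library exposes that interface; as I understand it, UnixLib on RISC OS provides it on top of the Iconv module, but the encoding names it accepts may differ. ISO-8859-1 (Latin1) is used only because it is universally available; for Cyrillic you would pass the name of your alphabet's encoding instead. The helper name `latin1_to_utf8` is hypothetical.

```c
#include <iconv.h>
#include <string.h>

/* Convert a NUL-terminated Latin1 string to UTF-8 in out (out_size bytes).
   Returns the number of bytes written (excluding the terminator), or -1. */
static long latin1_to_utf8(const char *in, char *out, size_t out_size)
{
    if (out_size == 0)
        return -1;

    iconv_t cd = iconv_open("UTF-8", "ISO-8859-1");   /* (to, from) */
    if (cd == (iconv_t)-1)
        return -1;                     /* conversion not supported */

    char *inbuf = (char *)in;          /* POSIX iconv() takes char **, not const */
    size_t inleft = strlen(in);
    char *outbuf = out;
    size_t outleft = out_size - 1;     /* reserve a byte for the terminator */

    size_t rc = iconv(cd, &inbuf, &inleft, &outbuf, &outleft);
    iconv_close(cd);
    if (rc == (size_t)-1)
        return -1;                     /* invalid input or output buffer too small */

    *outbuf = '\0';
    return (long)(outbuf - out);
}
```

A production version would keep the `iconv_t` descriptor open for the lifetime of the connection rather than opening it for every message.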
Matthew Phillips (473) 721 posts |
You shouldn't change the alphabet from your app, as other applications will not expect that. Your problem can be split into three parts:
a) how users type the desired characters
b) converting the typed text to UTF-8 to send to the server
c) displaying the UTF-8 text you receive

Assuming you've solved (a), you can use the iconv module or a call to Service_International (service call &43) to convert to Unicode. I would use the iconv module, as it's a bit easier if the end product needs to be UTF-8: the conversion table from Service_International requires the extra step of converting from UCS-4 code points to UTF-8.

For the display (c), assuming you are receiving the text in UTF-8, you can use RUfl from the NetSurf project to render it. It can be included easily if you are developing in C. You can see RUfl in action in NetSurf, and I used it in the application Nominatim.

Step (a) is the tricky part, but until we get better keyboard and character set support we have to assume that users already have the alphabet set to allow them to type most of what they need.

We have also used RUfl in RiscOSM, where the map data is all encoded in UTF-8. The software can create UTF-8 encoded text in Draw files to render maps containing non-western characters. What we have not implemented in RiscOSM's maps, however, is the font substitution magic provided by RUfl. With RUfl you can choose a font (for example, Trinity) and render text; if Trinity does not have all the required glyphs, RUfl will find them in other fonts. That's quite easy to use when simply rendering to the screen, as Nominatim does. If you want to build a Draw file and need to store the font changes in memory, RUfl does provide callbacks to allow this, but we have not got our heads round them yet! In practice this means that with the default map style sheets using the ROM fonts, you are limited to maps using the extended Roman alphabet. We also try to avoid UTF-8 encoding in the maps where possible, because some other popular software, most notably ArtWorks, cannot handle it. |
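The "extra step" from UCS-4 code points (what the Service_International table gives you) to UTF-8 is small. Here is a minimal encoder following the RFC 3629 bit layout. The name `ucs4_to_utf8` is illustrative, and rejecting surrogates and values above U+10FFFF is a design choice of this sketch.

```c
#include <stddef.h>
#include <stdint.h>

/* Encode one Unicode code point as UTF-8 into out (room for 4 bytes).
   Returns the number of bytes written, or 0 if cp is not a valid
   Unicode scalar value (a surrogate or above U+10FFFF). */
static size_t ucs4_to_utf8(uint32_t cp, unsigned char out[4])
{
    if (cp < 0x80) {                                  /* 0xxxxxxx */
        out[0] = (unsigned char)cp;
        return 1;
    }
    if (cp < 0x800) {                                 /* 110xxxxx 10xxxxxx */
        out[0] = (unsigned char)(0xC0 | (cp >> 6));
        out[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp >= 0xD800 && cp <= 0xDFFF)
        return 0;                                     /* UTF-16 surrogates */
    if (cp < 0x10000) {                               /* 1110xxxx + 2 continuation */
        out[0] = (unsigned char)(0xE0 | (cp >> 12));
        out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (unsigned char)(0x80 | (cp & 0x3F));
        return 3;
    }
    if (cp <= 0x10FFFF) {                             /* 11110xxx + 3 continuation */
        out[0] = (unsigned char)(0xF0 | (cp >> 18));
        out[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
        out[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[3] = (unsigned char)(0x80 | (cp & 0x3F));
        return 4;
    }
    return 0;                                         /* beyond Unicode range */
}
```

For example, Cyrillic Ж (U+0416) encodes to the two bytes 0xD0 0x96, which is what the server should receive.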
Matthew Phillips (473) 721 posts |
By the way, RUfl does not automatically handle right-to-left scripts such as Arabic and Hebrew. |
Sergey Lentsov (8268) 63 posts |
Thanks to everyone for the answers. I implemented conversion from the system alphabet to UTF-8 using iconv and a function from NetSurf's netsurf/frontends/riscos/ucstables.c, patched for my case. My problem is solved now. |