ununited codes...
Peter Scheele (2290) 178 posts |
The program I’m writing (BBC BASIC, Wimp) fetches geographical names from the internet. The data (HTML, JavaScript, etc.) uses character encodings, and my program shows that data in a window. As an example: … and: AbÅ« Rudays. My program shows it as: … instead of: … What should I do to display the data properly? |
David Feugey (2125) 2709 posts |
Hmm. You need a home-made UTF converter. |
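To give a rough idea of what a home-made converter involves: its first step is turning UTF-8 byte sequences into Unicode code points. The sketch below is in Python for illustration only, since the program in this thread is BBC BASIC. `decode_utf8` is a hypothetical name, and unlike a production decoder it does not reject overlong forms or surrogates.

```python
def decode_utf8(data: bytes) -> list[int]:
    """Decode UTF-8 bytes into Unicode code points (sketch).

    Malformed or truncated sequences become U+FFFD (replacement character).
    Overlong forms and surrogates are NOT rejected.
    """
    code_points = []
    i = 0
    n = len(data)
    while i < n:
        b = data[i]
        if b < 0x80:               # 0xxxxxxx: plain ASCII
            code_points.append(b)
            i += 1
            continue
        if 0xC0 <= b < 0xE0:       # 110xxxxx: start of a 2-byte sequence
            length, cp = 2, b & 0x1F
        elif 0xE0 <= b < 0xF0:     # 1110xxxx: start of a 3-byte sequence
            length, cp = 3, b & 0x0F
        elif 0xF0 <= b < 0xF8:     # 11110xxx: start of a 4-byte sequence
            length, cp = 4, b & 0x07
        else:                      # stray continuation byte or invalid lead byte
            code_points.append(0xFFFD)
            i += 1
            continue
        # Every following byte must look like 10xxxxxx and adds 6 bits.
        for k in range(1, length):
            if i + k >= n or data[i + k] & 0xC0 != 0x80:
                code_points.append(0xFFFD)
                i += k             # resume at the offending byte
                break
            cp = (cp << 6) | (data[i + k] & 0x3F)
        else:
            code_points.append(cp)
            i += length
    return code_points
```

`''.join(map(chr, decode_utf8(raw)))` gives the decoded text; mapping those code points onto what a RISC OS font can actually display is a separate step.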
Colin (478) 2433 posts |
If you are drawing your own window you could use Font_Paint with a UTF8 encoding. It would need a font with the necessary glyphs though. |
Steffen Huber (91) 1949 posts |
Welcome to the wonderful world of encodings. In the RISC OS world, I would recommend having a look at two NetSurf subprojects: iconv and rufl. |
Rick Murray (539) 13806 posts |
For accented characters that are part of the standard character set, a small lookup table should suffice. For other characters, things are now complicated (see previous posts). |
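A lookup table works neatly for U+00A0 to U+00FF, the accented letters and symbols that the RISC OS Latin-1 alphabet shares with ISO 8859-1. Their UTF-8 forms are the two-byte sequences starting with &C2 or &C3. A Python sketch with hypothetical names (`LATIN1_TABLE`, `utf8_to_latin1`) that replaces every other non-ASCII character with "?":

```python
# Hypothetical lookup table: the UTF-8 form of each character U+00A0..U+00FF
# mapped to the single Latin-1 byte with the same value.
LATIN1_TABLE = {
    chr(cp).encode("utf-8"): bytes([cp]) for cp in range(0xA0, 0x100)
}


def utf8_to_latin1(data: bytes, fallback: bytes = b"?") -> bytes:
    """Convert UTF-8 bytes to Latin-1 bytes using LATIN1_TABLE.

    ASCII passes through unchanged; any non-ASCII sequence missing from
    the table (e.g. 'ā', U+0101) is replaced by `fallback`.
    """
    out = bytearray()
    i = 0
    while i < len(data):
        b = data[i]
        if b < 0x80:
            out.append(b)
            i += 1
            continue
        pair = data[i:i + 2]
        if pair in LATIN1_TABLE:
            out += LATIN1_TABLE[pair]
            i += 2
            continue
        # Not in the table: emit the fallback, then skip the lead byte and
        # any continuation bytes (10xxxxxx) that belong to it.
        out += fallback
        i += 1
        while i < len(data) and data[i] & 0xC0 == 0x80:
            i += 1
    return bytes(out)
```

This turns "tête" into proper Latin-1, but "Ţābā" comes out as "??b?". That is why names like the Egyptian ones need a UTF-8 font encoding rather than a table.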
Peter Scheele (2290) 178 posts |
Your clues made me do this:
Vietnam: Nam Ä‘á»⇑nh
Egypt: Å¢ÄŴbÄŴ
France:
So it is more than accented characters, and in most cases the transcription is all right. It shows that the UTF code is transcribed into RISC OS printable characters. Which part of NetSurf does this transcription, iconv or rufl? As I can’t find rufl on my system, it must be iconv. That leaves the questions: |
Steffen Huber (91) 1949 posts |
Use rufl if you want to paint UTF-8 text with the Font Manager. Use iconv to handle the encoding conversion “in memory”. The iconv download contains stubs/headers for C. As usual, that’s all the doc you get as a C developer :-) |
Frank de Bruijn (160) 228 posts |
More like the other way around. StrongED shows the file as it is and NetSurf interprets it to create the correct characters. You may be able to use the iconv module to convert the file your program retrieves. Try *help iconv (with the module loaded, of course) for some info on how to do that. |
Peter Scheele (2290) 178 posts |
I found some additional information about the parameters on gnu.org. There seem to be hundreds of encodings. What kind of encoding is Å¢ÄŴbÄŴ (Taba)? (I never felt so blank about a subject.) |
Peter Scheele (2290) 178 posts |
@David Feugey: “A tool as MSR could help.” What is an MSR tool? |
Frank de Bruijn (160) 228 posts |
It looks like UTF-8 to me. As for RISC OS, if you do iconv -l in a taskwindow, the first item in the output is X-ACORN-LATIN1, so that’s probably worth a try. If that doesn’t look good enough, try ISO-8859-1 or ISO-8859-15. |
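The effect in this thread is what happens when each UTF-8 byte is shown as a separate Latin-1 character. Taba is "Ţābā": Ţ is the byte pair &C5 &A2 ("Å¢") and ā is &C4 &81 ("Ä" followed by byte &81, which, judging by the output posted here, RISC OS displays as Ŵ). Python's ISO-8859-1 codec shows &81 as an invisible control character instead, but the byte sequence is identical. The function names below are hypothetical:

```python
def show_as_latin1(text: str) -> str:
    """Return what `text` looks like if its UTF-8 bytes are read as Latin-1.

    Each non-ASCII character becomes two or more Latin-1 characters.
    """
    return text.encode("utf-8").decode("latin-1")


def repair(garbled: str) -> str:
    """Undo show_as_latin1: recover the raw bytes, then decode them as UTF-8."""
    return garbled.encode("latin-1").decode("utf-8")
```

For example, `show_as_latin1("Abū Rudays")` gives "AbÅ« Rudays", the string from the first post, and `repair` turns it back.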
Colin (478) 2433 posts |
UTF-8.
If you want to see it correctly in Edit, you can’t. If you want to display it in your own program, where you are drawing a window, then you can print to the screen with Font_Paint and add \EUTF8 to the font name when opening it. If you are using C, the library mentioned earlier is better than Font_Paint, as it handles unknown characters better. |
Peter Scheele (2290) 178 posts |
I’m afraid, Colin, it’s not a matter of drawing the text in a window. I have a window with ten buttons in it, and each button has a [geographical] name. The names are put in with Wimp_GetIconState and Wimp_SetIconState. |
Chris Hall (132) 3554 posts |
Perhaps you could Font_Paint the text to a sprite in a user sprite area and then plot that sprite as the button? |
Colin (478) 2433 posts |
Ok. You can define the button to use any outline font and encoding; see Wimp_CreateIcon. I don’t know if the template editor will allow you to define it there (it’s a while since I’ve used a template editor), so you may have to modify the template in your program. |
Colin (478) 2433 posts |
I should have added that if you are using the toolbox then ActionButton_SetFont will do what you want. Just add \EUTF8 after the font name. |
Fred Graute (114) 645 posts |
Don’t think so, at least not with a UTF8 encoding. The following should work, though:
1. Set the icon to use an outline font in the template editor.
2. At runtime, open a font with UTF8 encoding using Font_FindFont.
3. Use Wimp_SetIconState to set the icon’s font handle to the handle returned by Font_FindFont. |
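At the SWI level, the last step rewrites bits 24-31 (font handle) and bit 6 (outline font) of the icon flags. The Python model below assumes the flag layout and update rule documented in the PRM for Wimp_SetIconState: R1 points to a block of window handle, icon handle, EOR word and clear word, and the new flags are (old BIC clear) EOR eor. The function names are hypothetical. In BBC BASIC you would fill the same four words with `!` indirection and pass the block in R1.

```python
import struct

ICON_FONT_FLAG = 1 << 6        # bit 6: icon text uses an outline font
FONT_HANDLE_MASK = 0xFF << 24  # bits 24-31: font handle when bit 6 is set


def set_icon_font_block(window: int, icon: int, font_handle: int) -> bytes:
    """Pack the 16-byte Wimp_SetIconState block (sketch):
    +0 window handle, +4 icon handle, +8 EOR word, +12 clear word."""
    if not 1 <= font_handle <= 255:
        raise ValueError("font handle must be in the range 1-255")
    # Clear the old font handle and bit 6, then EOR in the new values,
    # so the result does not depend on the icon's previous flags.
    clear = FONT_HANDLE_MASK | ICON_FONT_FLAG
    eor = (font_handle << 24) | ICON_FONT_FLAG
    return struct.pack("<4I", window & 0xFFFFFFFF, icon & 0xFFFFFFFF, eor, clear)


def apply_set_icon_state(old_flags: int, eor: int, clear: int) -> int:
    """Model of the Wimp's update rule: new = (old BIC clear) EOR eor."""
    return ((old_flags & ~clear) ^ eor) & 0xFFFFFFFF
```

Clearing before EORing is the usual idiom: an EOR word alone would toggle, not set, any bits that were already 1.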
David Feugey (2125) 2709 posts |
MSR was a fast (ASM) universal text conversion tool made by Augustin Vidovic. |
Rick Murray (539) 13806 posts |
Not sure why the Japanese penguin on Augustin’s site is carrying the old imperial flag, but no matter. This. Wins. Completely. (^_^) |
Peter Scheele (2290) 178 posts |
@Rick: haha, copycat. |
Peter Scheele (2290) 178 posts |
@Fred: “Set the icon to use an outline font in the template editor. At runtime open a font with UTF8 encoding using Font_FindFont. Set the icon’s font handle using Wimp_SetIconState to the handle returned by Font_FindFont.” I did as you described. The text is in the font chosen, but no transcription, alas. I’ll keep on searching… |
Colin (478) 2433 posts |
It works in the Toolbox, so it should work using Wimp functions. However, there is a bug/feature which makes it unusable: certain bytes are changed to the WimpSymbol font, so the € (&80) in Al ‗ArÄ«sh displays as a tick instead of being treated as part of a Unicode character. So unless someone knows how to disable WimpSymbol font mapping on a per-icon basis, you are going to have to draw your own buttons and use Font_Paint. |
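Whether a given name is affected can be checked before it goes into an icon. The sketch below assumes, for illustration only, that the remapped bytes are those in &80-&9F, the range the RISC OS Latin-1 alphabet fills with extra symbols such as €. Colin only says "certain bytes", so the real set may differ. `SUSPECT_BYTES` and `remapped_bytes` are hypothetical names.

```python
# ASSUMPTION: every byte in &80-&9F may be remapped to WimpSymbol.
# The thread does not confirm the exact set of affected bytes.
SUSPECT_BYTES = range(0x80, 0xA0)


def remapped_bytes(name: str) -> list[int]:
    """Return the UTF-8 bytes of `name` that fall in SUSPECT_BYTES."""
    return [b for b in name.encode("utf-8") if b in SUSPECT_BYTES]
```

For example, "Ţābā" yields [0x81, 0x81] (the second byte of each ā), while "tête" yields nothing because ê is &C3 &AA.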
Peter Scheele (2290) 178 posts |
@Colin As far as I can see, this happens: when RISC OS meets a UTF character like ê (C3 AA), it is able to convert it to ê because ê is in the RISC OS character set. But why can NetSurf display them properly? And how can I? (To show that it really matters to me: 95% of my program was ready and worked well. As the next 3%, I added a way of searching for any country or place in the world, and that works fine as well. The last 2% is to show the data as it should be. I’m very motivated! :-)) BTW: it’s also the hyper-correct way the BBC presents the data. The first button: Å¢ÄŴbÄŴ, Egypt. The second: Taba International Airport, Egypt. |
Colin (478) 2433 posts |
Using a Homerton.medium\EUTF8 font, Ţābā is printed out correctly. You can’t be doing it properly. |
Peter Scheele (2290) 178 posts |
Please tell me how you do it. What I did is this: text$ is displayed in the correct font, but no conversion took place. |