Tamarc 1.01 released
Steffen Huber (91) 1949 posts
The font manager is competent to paint some of the necessary glyphs. So a subset of UTF-8 support is certainly possible with the current font manager, including the necessary backwards compatibility for non-UTF-8-aware software (i.e. nearly all of the currently available RISC OS software). So, for example: UTF-8 support for all left-to-right languages that need no character composition, with its various shortcomings handled in code around the current FontManager. Even that would be a massive step forward compared to what we have now.

What would we get if we replaced the current font manager with something like FreeType2 and kept the existing API? Direct support for other font formats, but apart from that?

It is a complicated subject, and I am just trying to get a better overall understanding. I work in an industry which is still fighting with Unicode, because its big printing lines (and the processes that prepare the printing streams before those) usually only understand a seemingly daily-changing subset of it, and still they produce a lot of (mostly correct) black on paper. So it seems sensible to work in small steps towards the great goal, which might ultimately never be reached. If I understand Nemo correctly, no OS in this world has reached the great goal. And it might never be reached, because Unicode itself is a moving target.
nemo (145) 2529 posts
Anything outside Little England (and by extension, little Europe). I mentioned in passing to Rick the ‘vertical Japanese’ problem, but it’s indicative: Japanese can be set horizontally or vertically. When set vertically the kanji, hiragana and katakana (and any Latin embedded in there too) are all still the usual way around, but some punctuation changes. So if you write ‘me (myself)’ – 私(自分)– vertically, the kanji stay the same way around, but the parentheses rotate. They do NOT have a different Unicode codepoint, they are the same character, but they are represented by a different glyph. Arabic letters have four different shapes depending where in the word they are placed. Even emoji come in multiple versions depending on the codepoints that follow them.
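To make the "same codepoint, different glyph" point concrete, here is a minimal Python sketch (standard library only, purely illustrative). Unicode's legacy "presentation forms" block happens to expose the four positional shapes of an Arabic letter as separate codepoints, even though modern text stores only the one logical letter and leaves shape selection to the font engine:

```python
import unicodedata

# One logical codepoint: the Arabic letter beh.
beh = "\u0628"
print(hex(ord(beh)), unicodedata.name(beh))
# 0x628 ARABIC LETTER BEH

# The four positional shapes a renderer must choose between. They
# survive as legacy "presentation form" codepoints, but modern text is
# stored as U+0628 and the shaping engine picks the glyph.
for cp in range(0xFE8F, 0xFE93):
    print(hex(cp), unicodedata.name(chr(cp)))
# 0xfe8f ARABIC LETTER BEH ISOLATED FORM
# 0xfe90 ARABIC LETTER BEH FINAL FORM
# 0xfe91 ARABIC LETTER BEH INITIAL FORM
# 0xfe92 ARABIC LETTER BEH MEDIAL FORM
```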
It’s a pretty good basis, and the only one that doesn’t cost lots of money.
Yes, and I have done so, but that’s not the right thing to do for RISC OS.
I’m not sure what you think the Acorn fonts have that makes them worth converting, when there are vastly more capable fonts out there for free. Type1 would not be the right format anyway. OpenType is the only sensible choice.
The existing API must certainly continue to work, but will have to be extended. The FontManager must be split into two parts – the API, font management, caching and painting; and various format-supporting back-ends: Acorn for backwards compatibility, OpenType for competence.
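A sketch of that split, in Python purely for illustration (every name here is hypothetical; the real thing would be a RISC OS module API, not Python):

```python
from abc import ABC, abstractmethod

class FontBackend(ABC):
    """One back-end per font format, behind a common interface."""

    @abstractmethod
    def glyphs_for(self, codepoints: list[int]) -> list[int]:
        """Map codepoints to glyph indices (format-specific shaping lives here)."""

    @abstractmethod
    def outline(self, glyph: int):
        """Return the glyph outline for the shared cache and painter."""

class AcornBackend(FontBackend): ...     # backwards compatibility
class OpenTypeBackend(FontBackend): ...  # GSUB/GPOS-capable

# The front end owns the public API, font management, caching and
# painting, and dispatches to whichever back-end understands the format.
```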
Indeed, and line breaking is an application function, not a font function – that’s a decision about what text to paint (though part of that decision making is based on the font metrics), whereas the font decides what that text should look like (ie which glyphs should be used).
Uggh, no. If so, only by accident. If you paint ‘…’. And we haven’t even left ‘little Europe’ yet.
If by ‘subset’ you mean the equivalent of ‘it can do lowercase, who needs uppercase’ then fine, it can do a subset. But really, no it can’t.
The crucial requirement is OpenType. OpenType fonts support the GSUB and GPOS tables that implement the various script features that Unicode rendering depends upon. It’s not sufficient but it is necessary. I’d recommend looking up the GSUB table in the OpenType documentation, but to give an idea of what it can do, here is Microsoft’s list of Features that Windows supports.
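As a rough illustration of what lives in that table, the third-party fontTools Python library (an assumption here, not something from the thread; the font path is a placeholder) can list the GSUB features a given OpenType font carries:

```python
# Sketch using the third-party fontTools library (pip install fonttools).
# "SomeFont.otf" is a placeholder, not a specific font.
from fontTools.ttLib import TTFont

font = TTFont("SomeFont.otf")
if "GSUB" in font:
    tags = {rec.FeatureTag
            for rec in font["GSUB"].table.FeatureList.FeatureRecord}
    # Typical tags: 'liga' (ligatures), 'init'/'medi'/'fina' (Arabic
    # positional forms), 'vert'/'vrt2' (vertical-writing alternates,
    # which is how the rotating Japanese punctuation above is done).
    print(sorted(tags))
```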
As I’m probably responsible for the code that achieves that, I can only apologise for when it goes wrong. (Unless you’re running an Adobe or Artifex Rip, in which case, ha!). However, by far the biggest problem with (for example) PDFs is wonky non-compliant document creation, and highly broken fonts, not so much problems in the software. The PDF committee has fought with the PDF text/font spec for years, and I was invited to pass my judgement as an expert witness, but the PDF spec now says something along the lines of “you must do this, but if you don’t do that you must do this, but if you don’t do that then do this, and if you don’t do that you should certainly do this. Some applications do something else.” – sorry, blame Acrobat (on the whole).
No I certainly haven’t said that. Some scripts are hugely more complex than others (we have it so easy in Europe, even with weird hyphenation rules). In Japanese you can’t tell where a word ends without a dictionary. In Nasta’līq you start writing a word above the baseline and angle down until the end of the word, while making sure that the ends of the words all line up and that none of the words clash with the words before and after them, which sit slightly above and below. Then there’s Devanagari…
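A standard-library Python sketch of the Devanagari problem: a conjunct like क्ष is stored as three codepoints, and no amount of normalisation will ever merge them; collapsing the sequence to a single conjunct glyph is entirely the font's (GSUB's) job:

```python
import unicodedata

# The conjunct क्ष ("kṣa") is stored as three codepoints:
ksha = "\u0915\u094D\u0937"  # KA + VIRAMA + SSA
for ch in ksha:
    print(hex(ord(ch)), unicodedata.name(ch))
# 0x915 DEVANAGARI LETTER KA
# 0x94d DEVANAGARI SIGN VIRAMA
# 0x937 DEVANAGARI LETTER SSA

# There is no precomposed codepoint to normalise to; fusing the three
# into one conjunct glyph is the font engine's responsibility.
print(len(unicodedata.normalize("NFC", ksha)))  # still 3
```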
Unicode marches on, inventing new stuff and sticking it in like a world-consuming game of Katamari Damacy (with little civil wars going on all over it, especially over emoji!). However, it publishes versions, so any particular version is a stationary target. Android, iOS etc. are right at the bleeding edge of support. But don’t underestimate the amount of complexity. And don’t for a moment think that what RISC OS has can do any of it.
Steffen Huber (91) 1949 posts
I had the impression that you were based in PostScript and PDF territory. This industry is firmly based in IBM/AFP/MODCA territory. You know, where AFP raster fonts are still used and people using AFP outline fonts consider themselves bleeding edge. With various interesting conversions, using equally interesting conversion software, going on whenever PostScript or PDF or various bitmap formats are used as part of input creation. Some of them are not even capable of converting coloured input into coloured output. Thank god it is mostly the job of my colleagues to sort it out (e.g. proving that what our software delivers as AFP/MODCA or PDF or PostScript output is entirely correct according to the spec, and that someone else on the long way to the printer messes it up); I only hear the stories when drinking coffee with them :-)
nemo (145) 2529 posts
Glad it’s not my fault then!
Steve Pampling (1551) 8155 posts
I believe they settled on labelling that as Brittany a number of centuries ago. An integral part of France (as opposed to Paris, if you talk to true French people :))
Sounds similar to, though possibly more extreme than, the character shapes used in English a number of centuries ago.
nemo (145) 2529 posts
TL;DR: Font engine needs to support OpenType. Text shaping needs something like HarfBuzz.

But daydreams aside, just producing a UTF-8 compatible text editor requires the cursor to do something sensible when you press right, delete or place the cursor with the mouse. Since characters aren’t codepoints (take the …), even that is not trivial. Splitting text prettily at the end of a line is an order of magnitude harder. Both of those functions need to agree unambiguously about terminology, behaviour and whether a codepoint is visible, a base character, a modifier or a control code. All of this stuff needs to be centralised and shared.

The VDU (OS_WriteC in other words) needs to be able to display any Unicode that gets thrown at it… but in the manner of a terminal. It will NOT be doing glyph composition, shaping or reordering. How could one expect arbitrary accent composition in an 8×8 pixel matrix glyph?! It is just about feasible to implement rotating Japanese punctuation when the VDU is configured for vertical text, but that would be special casing.
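A minimal Python sketch (standard library only; a real editor would want a full UAX #29 grapheme-cluster implementation) of why even backspace is not trivial:

```python
import unicodedata

# One on-screen character can be several codepoints:
text = "caf" + "e\u0301"   # 'e' + COMBINING ACUTE ACCENT, displays "café"
print(len(text))           # 5 codepoints, though the user sees 4 characters

# Naive "backspace = drop the last codepoint" strips only the accent:
print(text[:-1])           # "cafe"

# Slightly less naive: drop trailing combining marks with their base.
# (Still far short of real grapheme clusters, which also cover ZWJ
# emoji sequences, Hangul jamo, regional indicators and so on.)
i = len(text) - 1
while i > 0 and unicodedata.combining(text[i]):
    i -= 1
print(text[:i])            # "caf"
```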
Clive Semmens (2335) 3276 posts
And just for a change, that’s one I actually know about! :-) I hacked a couple of 224-glyph fonts and Impression to make quite a decent stab at it, way back in the 90s – quite as good as Indian hot metal printers – and typed from the keyboard using the conventional Hindi keyboard layout (not the IBM one in common use in India more recently, that hobbles the users completely). Not, of course, with any of the glyphs in the “right places” in the fonts, and nothing pre-composed in the fonts. You could type normally, and something roughly recognizable would appear in Impression; then you had to save it from Impression as DDF into a little app that put in all the kerning to place the diacritics correctly, and returned it to Impression looking nice. What a horrible hack. But it worked.
Rick Murray (539) 13806 posts
Yeah, isn’t it something like “girl with arms in the air” followed by a modifier for blonde, brunette, ginger, grey and another modifier for Caucasian, Black, Asian…?
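That is roughly the mechanism: skin tone is a trailing modifier codepoint, while hair colour uses a zero-width-joiner sequence. A standard-library Python sketch:

```python
import unicodedata

# Skin tone: a modifier codepoint immediately follows the base emoji.
waving = "\U0001F44B"        # WAVING HAND SIGN
tone   = "\U0001F3FD"        # EMOJI MODIFIER FITZPATRICK TYPE-4
print(unicodedata.name(waving), "+", unicodedata.name(tone))
print(len(waving + tone))    # 2 codepoints, one glyph on screen

# Hair colour: a ZERO WIDTH JOINER sequence. WOMAN + ZWJ + RED HAIR
# renders as a single "woman: red hair" glyph in fonts that support it,
# and as two separate glyphs in fonts that don't.
red_haired_woman = "\U0001F469\u200D\U0001F9B0"
print(len(red_haired_woman))  # 3 codepoints, ideally one glyph
```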
Hmm. FontManager can be instructed to plot any character you can directly specify, so long as it exists in the font (which means no fancy combining) and so long as you’re happy with a flat left-to-right print. Right to left might work as well; I can’t say I’ve ever tried. So it’s basically “FontManager of old with a few more characters to play with”. I can finally write Tōkyō correctly (and not the horrid ô compromise). I can even, with the right font, write 東京. And, uh…
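In Python terms (standard library only), the ō-versus-ô point is about precomposed codepoints: a renderer that cannot compose needs its input normalised so that precomposed forms are used wherever they exist:

```python
import unicodedata

# The same text two ways: precomposed vs. base letter + combining mark.
composed   = "T\u014Dky\u014D"            # "Tōkyō", ō as one codepoint
decomposed = unicodedata.normalize("NFD", composed)
print(len(composed), len(decomposed))     # 5 vs 7 codepoints

# A renderer that plots one glyph per codepoint ("no fancy combining")
# copes with the first form only; NFC maximises precomposed codepoints.
print(unicodedata.normalize("NFC", decomposed) == composed)  # True
```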
Rick Murray (539) 13806 posts
If your browser can hack it: https://www.chenhuijing.com/zh-type/ Note, however, that the English-language text is rotated and read sideways. I don’t know if this is how it’s done in China, or if it’s a limitation of the rendering; however, this is not how it’s done in Japan: http://heyrick.co.uk/random/otome-fami_p113.jpeg