Tamarc 1.01 released
Steffen Huber (91) 1949 posts
The font manager is competent to paint some of the necessary glyphs. So a subset of UTF-8 support is certainly possible with the current font manager, including the necessary backwards compatibility for non-UTF-8-aware software (i.e. nearly all of the currently available RISC OS software). So, for example: UTF-8 support for all left-to-right languages that need no character composition, with its various shortcomings handled in code around the current FontManager. Even that would be a massive step forward compared to what we have now.

What would we get if we replaced the current font manager with something like FreeType2 and kept the existing API? Direct support for other font formats, but apart from that?

It is a complicated subject, and I am just trying to get a better overall understanding. I work in an industry which is still fighting with Unicode, because its big printing lines (and the processes that prepare the printing streams before those) usually only understand a seemingly daily-changing subset of it, and still they produce a lot of (mostly correct) black on paper. So it seems sensible to work in small steps towards the great goal, which might ultimately never be reached. If I understand Nemo correctly, no OS in this world has reached the great goal. And it might never be reached, because Unicode itself is a moving target.
nemo (145) 2529 posts
Anything outside Little England (and by extension, little Europe). I mentioned in passing to Rick the ‘vertical Japanese’ problem, but it’s indicative: Japanese can be set horizontally or vertically. When set vertically the kanji, hiragana and katakana (and any Latin embedded in there too) are all still the usual way around, but some punctuation changes. So if you write ‘me (myself)’ – 私(自分)– vertically, the kanji stay the same way around, but the parentheses rotate. They do NOT have a different Unicode codepoint, they are the same character, but they are represented by a different glyph. Arabic letters have four different shapes depending where in the word they are placed. Even emoji come in multiple versions depending on the codepoints that follow them.
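To make the "same codepoint, different glyph" point concrete, here is a minimal Python sketch (standard library only, purely illustrative). Unicode's legacy "presentation forms" block happens to expose the four positional shapes of an Arabic letter as separate codepoints, even though modern text stores only the one logical letter and leaves shape selection to the font engine:

```python
import unicodedata

# One logical codepoint: the Arabic letter beh.
beh = "\u0628"
print(hex(ord(beh)), unicodedata.name(beh))
# 0x628 ARABIC LETTER BEH

# The four positional shapes a renderer must choose between. They
# survive as legacy "presentation form" codepoints, but modern text is
# stored as U+0628 and the shaping engine picks the glyph.
for cp in range(0xFE8F, 0xFE93):
    print(hex(cp), unicodedata.name(chr(cp)))
# 0xfe8f ARABIC LETTER BEH ISOLATED FORM
# 0xfe90 ARABIC LETTER BEH FINAL FORM
# 0xfe91 ARABIC LETTER BEH INITIAL FORM
# 0xfe92 ARABIC LETTER BEH MEDIAL FORM
```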
It’s a pretty good basis, and the only one that doesn’t cost lots of money.
Yes, and I have done so, but that’s not the right thing to do for RISC OS.
I’m not sure what you think the Acorn fonts have that makes them worth converting, when there are vastly more capable fonts out there for free. Type1 would not be the right format anyway. OpenType is the only sensible choice.
The existing API must certainly continue to work, but will have to be extended. The FontManager must be split into two parts – the API, font management, caching and painting; and various format-supporting back-ends: Acorn for backwards compatibility, OpenType for competence.
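A sketch of that split, in Python purely for illustration (every name here is hypothetical; the real thing would be a RISC OS module API, not Python):

```python
from abc import ABC, abstractmethod

class FontBackend(ABC):
    """One back-end per font format, behind a common interface."""

    @abstractmethod
    def glyphs_for(self, codepoints: list[int]) -> list[int]:
        """Map codepoints to glyph indices (format-specific shaping lives here)."""

    @abstractmethod
    def outline(self, glyph: int):
        """Return the glyph outline for the shared cache and painter."""

class AcornBackend(FontBackend): ...     # backwards compatibility
class OpenTypeBackend(FontBackend): ...  # GSUB/GPOS-capable

# The front end owns the public API, font management, caching and
# painting, and dispatches to whichever back-end understands the format.
```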
Indeed, and line breaking is an application function, not a font function – that’s a decision about what text to paint (though part of that decision making is based on the font metrics), whereas the font decides what that text should look like (ie which glyphs should be used).
Uggh, no. If so, only by accident. If you paint ‘…’. And we haven’t even left ‘little Europe’ yet.
If by ‘subset’ you mean the equivalent of ‘it can do lowercase, who needs uppercase’ then fine, it can do a subset. But really, no it can’t.
The crucial requirement is OpenType. OpenType fonts support the GSUB and GPOS tables that implement the various script features that Unicode rendering depends upon. It’s not sufficient but it is necessary. I’d recommend looking up the GSUB table in the OpenType documentation, but to give an idea of what it can do, here is Microsoft’s list of Features that Windows supports.
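As a rough illustration of what lives in that table, the third-party fontTools Python library (an assumption here, not something from the thread; the font path is a placeholder) can list the GSUB features a given OpenType font carries:

```python
# Sketch using the third-party fontTools library (pip install fonttools).
# "SomeFont.otf" is a placeholder, not a specific font.
from fontTools.ttLib import TTFont

font = TTFont("SomeFont.otf")
if "GSUB" in font:
    tags = {rec.FeatureTag
            for rec in font["GSUB"].table.FeatureList.FeatureRecord}
    # Typical tags: 'liga' (ligatures), 'init'/'medi'/'fina' (Arabic
    # positional forms), 'vert'/'vrt2' (vertical-writing alternates,
    # which is how the rotating Japanese punctuation above is done).
    print(sorted(tags))
```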
As I’m probably responsible for the code that achieves that, I can only apologise for when it goes wrong. (Unless you’re running an Adobe or Artifex Rip, in which case, ha!). However, by far the biggest problem with (for example) PDFs is wonky non-compliant document creation, and highly broken fonts, not so much problems in the software. The PDF committee has fought with the PDF text/font spec for years, and I was invited to pass my judgement as an expert witness, but the PDF spec now says something along the lines of “you must do this, but if you don’t do that you must do this, but if you don’t do that then do this, and if you don’t do that you should certainly do this. Some applications do something else.” – sorry, blame Acrobat (on the whole).
No I certainly haven’t said that. Some scripts are hugely more complex than others (we have it so easy in Europe, even with weird hyphenation rules). In Japanese you can’t tell where a word ends without a dictionary. In Nasta’līq you start writing a word above the baseline and angle down until the end of the word, while making sure that the ends of the words all line up and that none of the words clash with the words before and after them, which sit slightly above and below. Then there’s Devanagari…
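A standard-library Python sketch of the Devanagari problem: a conjunct like क्ष is stored as three codepoints, and no amount of normalisation will ever merge them; collapsing the sequence to a single conjunct glyph is entirely the font's (GSUB's) job:

```python
import unicodedata

# The conjunct क्ष ("kṣa") is stored as three codepoints:
ksha = "\u0915\u094D\u0937"  # KA + VIRAMA + SSA
for ch in ksha:
    print(hex(ord(ch)), unicodedata.name(ch))
# 0x915 DEVANAGARI LETTER KA
# 0x94d DEVANAGARI SIGN VIRAMA
# 0x937 DEVANAGARI LETTER SSA

# There is no precomposed codepoint to normalise to; fusing the three
# into one conjunct glyph is the font engine's responsibility.
print(len(unicodedata.normalize("NFC", ksha)))  # still 3
```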
Unicode marches on, inventing new stuff and sticking it in like a world-consuming game of Katamari Damacy (with little civil wars going on all over it, especially over emoji!). However, it publishes versions, so any particular version is a stationary target. Android, iOS etc. are right at the bleeding edge of support. But don’t underestimate the amount of complexity. And don’t for a moment think that what RISC OS has can do any of it.
Steffen Huber (91) 1949 posts
I had the impression that you were based in PostScript and PDF territory. This industry is firmly based in IBM/AFP/MODCA territory. You know, where AFP raster fonts are still used and people using AFP outline fonts consider themselves bleeding edge. With various interesting conversions, using equally interesting conversion software, going on whenever PostScript or PDF or various bitmap formats are used as part of input creation. Some of them are not even capable of converting coloured input into coloured output. Thank god it is mostly the job of my colleagues to sort it out (e.g. proving that what our software delivers as AFP/MODCA or PDF or PostScript output is entirely correct according to the spec, and that someone else on the long way to the printer messes it up); I only hear the stories when drinking coffee with them :-)
nemo (145) 2529 posts
Glad it’s not my fault then!
Steve Pampling (1551) 8155 posts
I believe they settled on labelling that as Brittany a number of centuries ago. An integral part of France (as opposed to Paris, if you talk to true French people :))
Sounds similar to, though possibly more extreme than, the character shapes used in English a number of centuries ago.
nemo (145) 2529 posts
TL;DR: Font engine needs to support OpenType. Text shaping needs something like HarfBuzz.

But daydreams aside, just producing a UTF-8 compatible text editor requires the cursor to do something sensible when you press right, delete or place the cursor with the mouse. Since characters aren’t codepoints (take the …), even that is not trivial. Splitting text prettily at the end of a line is an order of magnitude harder. Both of those functions need to agree unambiguously about terminology, behaviour and whether a codepoint is visible, a base character, a modifier or a control code. All of this stuff needs to be centralised and shared.

The VDU (OS_WriteC in other words) needs to be able to display any Unicode that gets thrown at it… but in the manner of a terminal. It will NOT be doing glyph composition, shaping or reordering. How could one expect arbitrary accent composition in an 8×8 pixel matrix glyph?! It is just about feasible to implement rotating Japanese punctuation when the VDU is configured for vertical text, but that would be special casing.
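A minimal Python sketch (standard library only; a real editor would want a full UAX #29 grapheme-cluster implementation) of why even backspace is not trivial:

```python
import unicodedata

# One on-screen character can be several codepoints:
text = "caf" + "e\u0301"   # 'e' + COMBINING ACUTE ACCENT, displays "café"
print(len(text))           # 5 codepoints, though the user sees 4 characters

# Naive "backspace = drop the last codepoint" strips only the accent:
print(text[:-1])           # "cafe"

# Slightly less naive: drop trailing combining marks with their base.
# (Still far short of real grapheme clusters, which also cover ZWJ
# emoji sequences, Hangul jamo, regional indicators and so on.)
i = len(text) - 1
while i > 0 and unicodedata.combining(text[i]):
    i -= 1
print(text[:i])            # "caf"
```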
Clive Semmens (2335) 3276 posts
And just for a change, that’s one I actually know about! :-) I hacked a couple of 224-glyph fonts and Impression to make quite a decent stab at it, way back in the 90s – quite as good as Indian hot metal printers – and typed from the keyboard using the conventional Hindi keyboard layout (not the IBM one in common use in India more recently, that hobbles the users completely). Not, of course, with any of the glyphs in the “right places” in the fonts, and nothing pre-composed in the fonts. You could type normally, and something roughly recognizable would appear in Impression; then you had to save it from Impression as DDF into a little app that put in all the kerning to place the diacritics correctly, and returned it to Impression looking nice. What a horrible hack. But it worked.
Rick Murray (539) 13806 posts
Yeah, isn’t it something like “girl with arms in the air” followed by a modifier for blonde, brunette, ginger, grey and another modifier for Caucasian, Black, Asian…?
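That is roughly the mechanism: skin tone is a trailing modifier codepoint, while hair colour uses a zero-width-joiner sequence. A standard-library Python sketch:

```python
import unicodedata

# Skin tone: a modifier codepoint immediately follows the base emoji.
waving = "\U0001F44B"        # WAVING HAND SIGN
tone   = "\U0001F3FD"        # EMOJI MODIFIER FITZPATRICK TYPE-4
print(unicodedata.name(waving), "+", unicodedata.name(tone))
print(len(waving + tone))    # 2 codepoints, one glyph on screen

# Hair colour: a ZERO WIDTH JOINER sequence. WOMAN + ZWJ + RED HAIR
# renders as a single "woman: red hair" glyph in fonts that support it,
# and as two separate glyphs in fonts that don't.
red_haired_woman = "\U0001F469\u200D\U0001F9B0"
print(len(red_haired_woman))  # 3 codepoints, ideally one glyph
```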
Hmm. FontManager can be instructed to plot any character you can directly specify, so long as it exists in the font (which means no fancy combining) and so long as you’re happy with a flat left-to-right print. Right to left might work as well; I can’t say I’ve ever tried. So it’s basically “FontManager of old with a few more characters to play with”. I can finally write Tōkyō correctly (and not the horrid ô compromise). I can even, with the right font, write 東京. And, uh…
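In Python terms (standard library only), the ō-versus-ô point is about precomposed codepoints: a renderer that cannot compose needs its input normalised so that precomposed forms are used wherever they exist:

```python
import unicodedata

# The same text two ways: precomposed vs. base letter + combining mark.
composed   = "T\u014Dky\u014D"            # "Tōkyō", ō as one codepoint
decomposed = unicodedata.normalize("NFD", composed)
print(len(composed), len(decomposed))     # 5 vs 7 codepoints

# A renderer that plots one glyph per codepoint ("no fancy combining")
# copes with the first form only; NFC maximises precomposed codepoints.
print(unicodedata.normalize("NFC", decomposed) == composed)  # True
```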
Rick Murray (539) 13806 posts
If your browser can hack it: https://www.chenhuijing.com/zh-type/ Note, however, that the English-language text is rotated and read sideways. I don’t know if this is how it’s done in China, or if it’s a limitation of the rendering; however, this is not how it’s done in Japan: http://heyrick.co.uk/random/otome-fami_p113.jpeg