Horizontal Scroll Bar for Boot/Run-at-startup
Rick Murray (539) 13850 posts |
From playing with !Edit on RO5, I think the main problems are redraws and cursor positioning. I believe these are related, for I suspect Edit is correctly rubbing out to repaint, only it is all happening in the wrong places. The complexity of UTF-8 is you won’t know the length of a string without counting it up character by character.
I wouldn’t. By typing Alt-163, you are specifically requesting character 163. That’s what you should get, even if it is junk, or needs a further character input to have something appear on-screen.
It isn’t too hard. The only complexity is we will need to have duplicates of the string handling functions (MID$ etc) that are Unicode aware, and programmers will need to think more C-like in that a “string” is a sequence of bytes, and they may not look alike. |
Jess Hampshire (158) 865 posts |
I just set alphabet to UTF8 and desktop font to cyberbit, and I can see Cyrillic on screen. очень хорошо I wonder if that will work. |
Jess Hampshire (158) 865 posts |
That seems to work. Random characters from other languages: αβγ כעח All entered on my Iyonix. |
nemo (145) 2554 posts |
Rick verbosed:
BOMs identify all encodings of Unicode, and also indicate endianness (which doesn’t apply to UTF-8, granted). My personal favourite is UTF-9. ;-)
So, no. FontManager does not support Japanese. Unless you mean like a typewriter supports Japanese. ;-p WPB suggested:
Rubbish. I’m staggered you could write such a thing. Newspapers, magazines, books, manuals, packaging, signs… good golly. I work in the print industry and my customers include Sharp, OKI, Kyocera, Canon (and they’re the ones I can mention) and I can assure you that vertical text is of the utmost importance to those, and all other Japanese manufacturers. Very little?! Why do you think the TrueType vhea, vmtx and VORG tables were created? What’s the vert feature of the GSUB table for? Even Microsoft Word supports vertical Japanese text, never mind Quark Express, Adobe Indesign, Illustrator and even Photoshop. PDF supports it. XPS supports it. Hell, even CSS3 supports it. Rick continued:
Yes.
That’s in markup. You don’t want to be hand editing every fraction when the font can do it for you with a GSUB feature. That RISC OS doesn’t support advanced OpenType functionality is unfortunate. That it doesn’t even support TrueType is embarrassing. WPB rightly said:
And probably a non-topic. If BASIC is limited to 255 byte strings then worrying how many characters that amounts to is rather moot. In any case, the BASIC programmer’s travails are no different to those in any other language – assembler and C see bytes just like BASIC does. You need a UTF-8 library to help you manipulate UTF-8 text regardless of language. Number of characters <= LEN(A$) Rick wore out his keyboard with:
Character code 163, yes, and what that looks like depends on your character set. If your character set is Unicode, it’s a pound sign, so you should get a pound sign. The fact that two bytes had to be delivered to do so is incidental – the same used to happen on a BBC Master, so nothing has changed.
Well, not need, no. Not if you have a Unicode handling library somewhere (which you’d need anyway). In any case, a BASIC program is at liberty to do a Font_FindFont,“\FTrinity.Medium\ELatin1” (or whatever the syntax is) and hence manipulate strings in an 8bit encoding too. On the whole, it’s best if the programmer knows what they are doing. I’m beginning to feel, in this area at least, that’s a forlorn hope. Which may go some way to explaining the schism in EPOC when the Unicode version was forked, which went on to become Symbian – Symbian applications weren’t compatible with the EPOC in Psions mainly because of the text encoding issue. Daft but true. |
WPB (1391) 352 posts |
Ooh, touché! Yes, that’s true. I was confining myself (perhaps artificially) to the realm of word-processing/office work and not really considering the print industry. ;-o Nevertheless, I think you’re worrying excessively about vertical printing at this stage. There are far more pressing issues. Wouldn’t it be better to focus on how to properly support UTF-8 in the OS and apps? After all, with horizontally printed text we can manage in CJK just fine. Other than packaging beanpoles, I’m sure the text will fit on horizontally. (Now I’m just trolling ;) You can try telling Sharp, OKI, Kyocera and Canon that if you like.) If anything, and somewhat orthogonally (!), our focus should probably be on right-to-left text if we’re talking about directions. As far as I know, you can’t write Arabic left-to-right, can you? I know Font Manager has some support for right-to-left, but how does the Wimp cope? I’ve never really explored this area, so I have no idea. Would be interesting to hear some comments… |
Theo Markettos (89) 919 posts |
You can’t write Arabic left-to-right (except for numbers). I’d probably suggest starting with Hebrew, because it has characters whereas Arabic has script, and you need a rendering engine to take arabic letters (ا ل س ل ا م ع ل ي ك م) and render them as a scripted phrase (السلام عليكم). (I have no idea what RISC OS will make of that). Other fun things are cursor behaviour (scrolling, delete, etc), and drag-selection. I’m very cautious about making generalisations about language support, because any time you say ‘X isn’t important’, this is essentially cultural imperialism – you marginalise any group who do care about X. Telling people they aren’t allowed to write in their native form is just slightly arrogant. My serious respect in this area goes to the bible translation folks, who are at the cutting edge of working in all kinds of marginalised languages many of which are poorly understood outside the small ethnic group, and have never been written down. And they do use software (with Unicode) to do this, as well as thinking about the linguistic support and typesetting. On the UTF8 filetype area, one thing to avoid is what happens on some editors (like gedit on GNOME) that you load in an arbitrary file (for example, an executable) and the editor can’t cope because it can’t decide what encoding it is. Making guesses about BOMs and such is fine – except it’ll probably go wrong more times than you think. And finally, a bonus use for Unicode… putting music in Facebook posts: |
WPB (1391) 352 posts |
Thanks for the quick lesson on Arabic, Theo. That’s interesting.
I hope my suggesting that vertical font rendering support isn’t essential to claim that the FM supports JP hasn’t come across as me saying it’s not important at all. Far from it – I would love to be able to work on vertically written Japanese on RISC OS. (If nothing else, in order to print beanpole packaging labels ;) But more than that, I’d love to be able work on written Japanese at all on RISC OS. I’m just trying to say (not very well), that we shouldn’t get bogged down in the fact that FM can’t do vertical text – other issues are far more important at this stage. |
Theo Markettos (89) 919 posts |
Yes, FTAOD I’m not suggesting that everything must be supported today – it’s not practical, and some necessary prioritisation has to happen. Just pointing out that any time you exclude X will upset the group of people who care about it. So we should care about such things, even if we have no opportunity to implement them any time soon. |
nemo (145) 2554 posts |
Theo has probably made more clear with a simple example than I did with heavy handed hinting that my concern is not vertical Japanese text per se, but the underlying limitations that this example eloquently and simply demonstrates, without getting into the further complexities of the BIDI algorithm, Unicode Data parsing and the frankly boggling subject of arbitrary script support. Farsi (Arabic script) is hard work because there are four forms of every unadorned character, extensive ligatures and alternate forms, complicated diacritic repositioning and unique justification rules. But if you think Farsi is complicated you should see Nastaʿlīq (a separate slanting baseline for every word). Plus Mongolian and Manchu are ONLY written vertically. Meanwhile even relatively ‘familiar’ fonts such as Zapfino require a level of font system sophistication which RISC OS cannot offer. |
nemo (145) 2554 posts |
You were certainly confining yourself to the office, as Japanese and Chinese even when written horizontally can often be written right-to-left, as well as the Western-influenced left-to-right… sometimes on the very same object. This is mostly true of signs, especially on vehicles as it happens. |
Rick Murray (539) 13850 posts |
Nine !? <looks> Oh, I see. http://en.wikipedia.org/wiki/UTF-9_and_UTF-18 But you want to be careful with stuff like that. After all, RFC1149" has been successfully implemented (I love the bit “Audit trails are automatically generated”).
I didn’t know the MOS supported vertical printing, but I can’t see it as being that difficult to do when you know the dimensions of the characters and they are regular (ie the VDU font). If you want to impress me, tell me it was capable of scrolling sideways when you went off the left edge… As for FontManager – why do you need specific vertical metrics to format the font vertically? Sounds like a dumb question, but doesn’t the font data have a concept of line height? Couldn’t you just do that (with leading adjustment if this looks too spaced out) to format?
Mmm, I never actually thought about that. Just looked in Otome-Fami 「オトメファミ」 magazine, and not only are they sideways (!) but they look like they take less vertical space (and yes, it throws the alignment off). It’s useful to note that also that there appears to be a typographical convention
Indeed! Okay… You could write a book in Japanese using only Western-style line formatting. Here’s a scan of a cute-things-to-put-in-a-bento book [ISBN 978-4-87303-608-3] However when you head more towards pop culture, mixing the two layouts on the same page is not uncommon. Here’s Otome-Fami (issue July 2012) with a Lego R2D2, cute figurines (one looks like “Kara no Kyōkai”’s Shiki on a good day!): Fact is, written Japanese just goes both ways. Anybody who says otherwise can go stare at the above scans awhile…
TrueType was first released with MacOS System 7 in ‘91. I was using RISC OS’s outline fonts with the execrable “AcornDTP” back in ‘89. Oregano supports TrueType fonts, which opened up a world of additional characters long before RISC OS had a Unicode FontManager in general distribution, however is it capable of any fancy rendering (such as top-down)?
Whoa… wait… when did that become a criteria? (^_^) Theo added:
I see two “glyph goes here” markers, a 3/4, then a bunch more “glyph goes here” markers. But, then, thankfully my exposure to Facebook is minimal; though I have seen some pretty nice ASCII art using Unicode characters (so by definition isn’t “ASCII”, but you know what I mean when I say that…). WPB wrote:
I’m still trying to work out how the written part relates to the phrase part. I can see some similarities in parts, but…
Indeed, and it might be useful to keep these things in mind if/when FontManager gets revision work, so that hooks can be left for future modifications or modifications can be made in a way that doesn’t preclude such things in the future.
This is, of course, in addition to all the other stuff that needs to be done! Nemo offered:
So how does that work? It looks like there are multiple versions of each letter (and over a dozen, in the case of ‘e’). How do you choose? |
WPB (1391) 352 posts |
Would you expect the Font Manager to handle writing “AMBULANCE” backwards for printing onto the bonnet of one? Isn’t the Murashita Shoji van you linked to a similar case? (I realise it’s not quite the same – the former is for viewing in a mirror, the latter is for viewing “as it drives past” – a delightful example highlighting how the minds of ideogram readers work!) Though I was flippant about the prevalence of vertical text above (termed tategaki to the Japanese) and rightfully called up for it, it really is a fringe case to write yokogaki (horizontal style) from right-to-left in Japanese. (Though I have seen it on vans, too.) That said, I suppose if the font engine had to do right-to-left for other scripts, it would be foolish to prohibit it artificially for a script that isn’t normally written that way. EDIT: I hope I haven’t just fallen into another “Ben trap”. I feel like I might have done… |
Theo Markettos (89) 919 posts |
An explanation. It’s roughly the same problem as printing handwriting. Note that these are separate Unicode code points, otherwise you wouldn’t be able to make a table like that with initial/medial/terminal forms. But the system also has to be able to take individual characters (eg from typing) and compose them into the appropriate form (eg switch from terminal to medial when you add a letter to the end of a word). Some Arabic encodings, eg ISO-8859-6 , don’t provide the different forms, so a shaping processor is required, while others, eg CP864 , do. This also means that there’s not a 1:1 mapping between glyphs in encodings.
Like this? ƎƆN∀˥∩qW∀ (that’s upside down too) |
WPB (1391) 352 posts |
Love it! ;) |
Theo Markettos (89) 919 posts |
BTW, if you want to make your rendering head hurt, try working out how to render these |
WPB (1391) 352 posts |
Ouch! Presumably there are Unicode code points for each arabic letter in its isolated/final/medial/initial form. Other than not adding any left/right padding to characters, does the rendering engine have to do anything special to make medial forms join up nicely. Is there a sort of baseline concept? |
nemo (145) 2554 posts |
It is a vastly complicated subject, but full support for OpenType and the correct implementation of the Unicode-specified BIDI algorithm (plus the appropriate high-quality fonts) goes a very long way to solving all of these real world problems. [I don’t mean different regional writing systems are the problem, I mean the Anglocentric nature of the software industry and the resulting lack of support for such things – “you’re using our software, use our writing system!”] Naturally you need application support to access the more complicated behaviour, but Arabic glyph selection, Japanese vertical punctuation etc tend to come out in the wash (ie, require no direct application control, however, this is a bit of a bête noir, especially in relation to PDF). Rik said:
Of course it was, as is RISC OS. Press F12 and type this: BASIC VDU23,16,10;0;0;0;12 *.
You answered your own question there!
Indeed. And Arabic needs complex glyph shaping. And Nasta‘līq needs ‘rising’ baselines (actually they are descending baselines – it’s written right-to-left – but the final kick in the kidneys is that the descending baselines of each word all end at the same height… it’s where the word starts that varies).
That’s not a Facebook issue, it’s a result of the repertoire of your available fonts, and points to Theo probably having a Mac or a higher quality font library. ;-)
FontManager is the API and the cache manager. It should not be dictating the font format. Ideally there’d be a FontManager extended slightly with BIDI and the various stylistic variants OTF offer, and a number of format back-ends that handle different flavours of font. I’d also like to fix the production of antialiased Acorn glyphs to make them less fuzzy. It’s a misdesign in the hinting (specifically when antialiasing).
Mostly you don’t need to. The font features extensive (very extensive) ligature and contextual substitutions, so mostly it just does the nicest thing possible with what you’ve typed. However, it also has many ‘stylistic variants’ so you can further choose what suits best. The interface for selecting these OpenType features vary from program to program, and it tends to be the more sophisticated typographic systems that give access to every last feature, but that’s an unnecessary GUI issue – it works the same whether you want WPB asked:
It can already do that. That’s a simple affine transformation done via the font matrix. I think it was the RO3 FontManager that gained the ability to put a matrix in the fontname (font ‘specifier’ I mean, of course). I remember talking to Dizzy about it. That was a long time ago, he’s moved on somewhat(!)
In fact, if one implements Unicode’s BIDI algorithm such usages can be specified unambiguously. Theo wrote:
Unless one used a webfont with a suitable Format 12 cmap and some judicious use of the Private Use area. But now I’m being pedantic. That’s a limitation of HTML. PDLs such as PDF and XPS would have no difficulty, and XPS in particular can actually use GIDs instead of Unicodes. (I’ll try to say no more about XPS, I was a consultant in its creation, so unable to be as free with opinion as I typically am on other subjects).
Fortunately, no font system is expected to be able to do dynamic calligraphic deformation on demand, so that’s a trick question. However, an OpenType can easily do a GSUB on the ‘plain text’ version and replace it with the composed one. WPB opened a can marked ‘worms’ with:
I earlier hinted at justification of Arabic – unlike roman scripts where we change the spaces between words (and sometimes between the letters themselves) and split awkward words with a hyphen (though with some slightly involved rules and exception dictionaries) Arabic is far more complicated. As each words has to flow calligraphically, glyphs have to change shape when justifying – including changing some glyph combinations to precomposed ligatures to help the word fit. Plus the long strokes between glyphs, the kashida (used for emphasis and legibility), needs to change length and shape to help. Unfortunately the kashida isn’t a separate glyph, it’s the joining stroke of some other glyph. OpenType’s multiple glyph repetoires help with this, as well as the provision of multiple ligatures. However, high-quality Arabic typography is difficult with anything but the simplest of typefaces (imagine if the only font that worked well for roman languages was |
Rick Murray (539) 13850 posts |
(-: ¡looɔ ʎʇʇǝɹd s,ʇɐɥʇ ʍou ’ɐoɥM
I wonder if it is always that, or sometimes a case of “OMG, couldn’t make heads nor tails of your writing system……so use ours!”.
<smile> I like that! (small things amuse small minds!)
Not really, for certain specific characters would need to be marked in some way for rotation, as we’ve already noticed that the punctuation is rotated, but the other billion kanji aren’t. Thus, in rotation you have two choices – use the character width as its height; or fall back to using the same line height as for everything else. The only complication I can see is that the character width might be the kanji-cell width, however I don’t doubt FontManager could calc something for itself from the font scaffolding. These would, obviously, need to be provided as exceptions. However, as your Arabic example highlights, there may be a fair few exceptions to consider if support for exceptions would be provided (like an Arabic character being different at the start, end, and in the middle of a sentence, plus yet another form for “on its own”). If support for exceptions is considered, all of this should be considered (even if no support for Arabic is planned) so that the specification can hopefully be extensible without throwing away and rewriting the API.
? FontManager is a big chunk of code that reads own-format font definitions and splats them onto a raster device as instructed by the user (or the user’s software). I think that qualifies it for abilities to dictate stuff…
[ideas / wishlist time] Yes – I think, since it looks like TrueType/OpenType use similar principles with fonts being drawn from lines and sort-of-bezier-curves, that the font manager needs to be three distinct sections:
It should be envisaged to provide hooks to allow additional or third-party modules to link in to provide new font formats and/or new compositing. In this way, support for <obscure format> or that language with the moving baselines could be conceivably added without needing to extend/hack/rework/recompile the entire font system. I, for one, wouldn’t appreciate a big stack of code for a language I’ve never heard of, never mind ever use. Certainly, we would need to look at FontManager’s direction in the case of modernising the thing and ask should it support dozens of font formats and languages internally or should it do so via plugins? Said plugins could conceivably live within !Fonts right alongside the font data, something like !Fonts.languages.japanese and !Fonts.languages.arabic ? Not to mention FontManager being “informed” that file type &xxx is a font file that this module can handle; so upon module load, FontManager rescans to take into account potential new fonts. Obviously, these will be provided as modules to be loaded on demand by FontManager (I say modules so they can remain active without cluttering up FontManager with issues of dynamically loading and caching code – that’s what a module is, right? for an example of why, look how many different non-Latin characters have popped up in this very topic!). |
Rick Murray (539) 13850 posts |
Seems….disturbingly appropriate. |