Encodings
Paul Sprangers (346) 524 posts |
Not that I understand everything, or even half of it, but this all sounds very promising! |
Rick Murray (539) 13840 posts |
Care to share? I’m all for enhanced documentation.
True, it isn’t as if just anybody can edit the manuals, if even to reflect stuff in the ROOL wiki, or note oddities that arise in these here forums.
And that’s the point people’s heads blow up. ;-) Just out of interest, do you have easy routines to return “how many printable characters this is” from a string of UTF-8, and “which byte in this string corresponds to the eighteenth character place”? That sort of thing. |
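For readers wondering what the two routines Rick asks about might look like, here is a minimal sketch in Python. It assumes the input is already valid UTF-8 and it counts code points, not "printable characters" in the full sense (combining marks and the like would need the table-driven work discussed later in the thread). The function names are made up for illustration and are not from any RISC OS module.

```python
def is_continuation(byte: int) -> bool:
    """True for UTF-8 continuation bytes, which have the bit pattern 10xxxxxx."""
    return (byte & 0xC0) == 0x80


def count_code_points(data: bytes) -> int:
    """Count code points by counting every byte that is not a continuation byte."""
    return sum(1 for b in data if not is_continuation(b))


def byte_offset_of(data: bytes, index: int) -> int:
    """Return the byte offset at which the index-th code point (0-based) starts."""
    seen = 0
    for offset, b in enumerate(data):
        if not is_continuation(b):
            if seen == index:
                return offset
            seen += 1
    raise IndexError("string has fewer code points than requested")


sample = "£5 naïve café".encode("utf-8")
print(len(sample), "bytes,", count_code_points(sample), "code points")
print("code point 3 starts at byte", byte_offset_of(sample, 3))
```

The "eighteenth character place" in the question would be `byte_offset_of(data, 17)` with this 0-based convention.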
Steve Pampling (1551) 8170 posts |
Indeed. Looks like changes in RO5.25 could be interesting (I’m assuming the sensible, built-in option and a patch utility for RO6 if Nemo doesn’t go postal trying to deal with that). I’m just waiting for Nemo to spot something not right in the Printers setup[1] and rebuild that, but producing hardcopy is a bit dated so that can slide off the side for now.
[1] Reserved comment. |
nemo (145) 2546 posts |
Yes. It is important that people do not roll their own, as:
What I am planning to do once UTF-8 can be utilised inside and outside the desktop, is to build in the higher-level Unicode table-driven functionality – case folding, decomposition, recomposition, string splitting etc. (Note that I specifically exclude glyph shaping. I’m not yet completely insane) |
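To make the terms concrete, the sketch below shows what case folding, decomposition and recomposition do to a couple of strings. It uses Python's standard `unicodedata` tables purely as an illustration of the operations; it says nothing about how the planned RISC OS functionality would be implemented or what its API would be.

```python
import unicodedata

composed = "\u00e9"  # "é" as a single code point

# Decomposition (NFD): base letter followed by a combining acute accent.
decomposed = unicodedata.normalize("NFD", composed)

# Recomposition (NFC): back to the single precomposed code point.
recomposed = unicodedata.normalize("NFC", decomposed)

# Case folding: a caseless form for comparisons; note the length changes.
folded = "Straße".casefold()

print([hex(ord(c)) for c in decomposed])
print(recomposed == composed)
print(folded)
```

The point of the example is that none of these operations is a simple byte-for-byte mapping, which is why they need tables.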
nemo (145) 2546 posts |
The UTF-8 support will be a soft-load until there is a consensus that it should be adopted into RO5. In particular, though KB patiently added some UTF-8 support to WindowManager, FontManager etc, they will have to be updated to match this algorithm if it proves itself. Things won’t work as expected if different modules disagree over how many characters there are in a string, for example. RO6 is causing me difficulties with VectorExtend, not the UTF-8 stuff, fortunately. |
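A small illustration of how two pieces of code can disagree about the length of the same string: the sketch below decodes one byte sequence with two different policies for malformed input. The policies are Python's own `replace` and `ignore` error handlers, chosen only as stand-ins; they are not the algorithms used by WindowManager, FontManager or the soft-loaded module.

```python
raw = b"caf\xc3\xa9 \xff ok"  # valid "café", then a stray 0xFF byte

# Policy A: every malformed byte becomes U+FFFD (one visible character).
as_replaced = raw.decode("utf-8", errors="replace")

# Policy B: malformed bytes are silently dropped.
as_ignored = raw.decode("utf-8", errors="ignore")

print(len(raw), len(as_replaced), len(as_ignored))
```

Any caret position or column width computed under one policy is off by one under the other from the bad byte onwards.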
nemo (145) 2546 posts |
Steve said, menacingly
Apart from the updated PDriverPS I produced last century (masked deep sprites!) the most I’ve done with the print system is to add HostFS printer shares to the !Printers connection window. The major impediment is that MakePSFont is very many decades out of date. It’s not just that it doesn’t make Unicode fonts, it doesn’t even make hinted fonts. Sigh. I’m not proposing making it produce Unicode-compatible Type3 fonts, so updating it is basically a total rewrite. I know that looks like my job because everyone else just took several steps back while mouthing “fonts? what are they?” but I’m not committing to do that at the moment. I’ve too much on. And if I start even thinking about the font system I’ll rip it all to pieces and put OpenType support in there, like it should have had twenty years ago. (FontManager needs to be split into an API/caching front end, and separate format back-ends). This will require a non-nominal bounty (and probably threats of physical violence). |
nemo (145) 2546 posts |
PS Although I mentioned the existing UTF-8 support in WindowManager etc, it isn’t actually UTF-8 compliant IIRC, so that’s a further incentive to revisit. |
Steve Pampling (1551) 8170 posts |
I’m sure Rick and David are waiting eagerly to search out the glitches, if any.
I have cats. Dangling a lace or a feather grabs their attention. :)
You know that “Reserved comment” footnote I dropped in? I do appreciate the effort you’re putting in on this and I’ve no doubt so does everyone else.
Define non-nominal, because the standard bounty payments ain’t going to create any rich folks. Although there is the RISC OS Developments setup. I’m not sure how much people want a new print system. Printing is a dying activity. |
Steffen Huber (91) 1953 posts |
Maybe physical printing, but “document creation” is still a frequent activity. So you need a capable printing system to produce the document format of choice: PostScript, PDF, or, for the high-volume guys, probably AFP. And TIFF for archiving, of course. |
Rick Murray (539) 13840 posts |
There is still a need for physical printing, but not in the way that we are used to. More and more printers are supporting IPP, which looks to be some sort of binary protocol wrapped up in what is essentially HTTP with a different name. It looks like many such printers support being thrown a raw raster image, a JPEG, or a PDF (among others). This is probably where we ought to consider stuff to be happening, and it would certainly support a greater range of modern printers than the old driver-for-every-printer model of the 1980s. |
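As a rough idea of what "binary wrapped in HTTP" means, the sketch below builds the body of an IPP Get-Printer-Attributes request. The tag and operation values are the ones defined in the IPP encoding specification; the helper names and the printer URI are hypothetical, and nothing here actually talks to a printer.

```python
import struct


def ipp_attribute(tag: int, name: str, value: str) -> bytes:
    """Encode one attribute: value tag, name length, name, value length, value."""
    n = name.encode("ascii")
    v = value.encode("utf-8")
    return struct.pack(">BH", tag, len(n)) + n + struct.pack(">H", len(v)) + v


def get_printer_attributes_request(printer_uri: str, request_id: int = 1) -> bytes:
    """Build a Get-Printer-Attributes request (IPP version 1.1, operation 0x000B)."""
    header = struct.pack(">BBHI", 1, 1, 0x000B, request_id)
    body = bytes([0x01])  # operation-attributes group tag
    body += ipp_attribute(0x47, "attributes-charset", "utf-8")
    body += ipp_attribute(0x48, "attributes-natural-language", "en")
    body += ipp_attribute(0x45, "printer-uri", printer_uri)
    body += bytes([0x03])  # end-of-attributes tag
    return header + body


# Hypothetical printer address, for illustration only.
request = get_printer_attributes_request("ipp://printer.example/ipp/print")
print(request[:8].hex(), len(request), "bytes")
```

The resulting bytes would be sent as the body of an HTTP POST with `Content-Type: application/ipp`; a print job works the same way, with the document data following the attributes.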
nemo (145) 2546 posts |
Steffen wisely pointed out
Indeed. As part of the team that created the print system for Windows 7 and later, I can exclusively (ha) reveal that it works by creating an XPS – a document broadly like a PDF (for the purposes of this explanation). Print is a side-effect of being able to create a unified document. MacOS uses PDF for the same purpose. I’m not suggesting RO should change its print model to a document-based one – Windows will happily create such a document for you as you call the old GDI interfaces (and don’t get me started on the problems that caused, especially with Office), but we ABSOLUTELY SHOULD keep print working, and as generic as possible. A PDF printer driver would be very sensible. I was working towards creating one when Taborca appeared, so I stopped then.
And this is where alarm bells start ringing. The inestimable Mr David Pilling made a sprite printer driver (which was slightly odd, as per, but worked) and other bitmap formats are a pretty trivial extension. However, I was reminded only today that the internals of the graphics code are only 16-bit… which is useless once you get into the printer drivers. I used to have the excellent Calligraph A3+ laser printer, by Richard Pillar. It would happily image oversize A3 at 1200dpi, but it could only do that because it rendered in strips. Even at 600dpi, an A3 sprite is too big to draw into in one go. This is not good enough. |
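To put rough numbers on the page sizes involved, here is a back-of-envelope calculation. It assumes a plain ISO A3 page (297 × 420 mm), 32 bits per pixel and an arbitrary 16 MiB strip buffer, none of which comes from the post. It only covers memory; the 16-bit limit inside the graphics code is a separate coordinate-range problem that this sketch does not model.

```python
import math

A3_MM = (297, 420)  # ISO A3, portrait, in millimetres


def pixels(mm: float, dpi: int) -> int:
    """Convert a length in millimetres to a whole number of pixels."""
    return round(mm / 25.4 * dpi)


def page_bytes(dpi: int, bytes_per_pixel: int = 4):
    """Return (width, height, total bytes) for a full A3 bitmap."""
    width, height = (pixels(mm, dpi) for mm in A3_MM)
    return width, height, width * height * bytes_per_pixel


def strips_needed(dpi: int, strip_budget_bytes: int, bytes_per_pixel: int = 4) -> int:
    """How many horizontal strips are needed if each must fit in the budget."""
    width, height, _ = page_bytes(dpi, bytes_per_pixel)
    rows_per_strip = max(1, strip_budget_bytes // (width * bytes_per_pixel))
    return math.ceil(height / rows_per_strip)


for dpi in (600, 1200):
    w, h, total = page_bytes(dpi)
    print(dpi, "dpi:", w, "x", h, "pixels,", round(total / 2**20), "MiB,",
          strips_needed(dpi, 16 * 2**20), "strips of up to 16 MiB")
```

Strip rendering trades one huge bitmap for many passes over the page description, which is how a printer like the one described can image a page it could never hold in one piece.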
nemo (145) 2546 posts |
Rick waved
IPP is a grab-bag of formats. There’s a lossless bitmap format a bit like sprites, there’s JPEG for photos, then there’s PDF and XPS. It’s the first – PWG – that would be easiest to create on RO. It uses PackBits compression so it’s not necessarily tiny. |
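For anyone who hasn't met PackBits, the sketch below is the classic byte-oriented scheme (as used in TIFF). As far as I know the variant in PWG Raster is organised around pixels and repeated lines rather than raw bytes, so treat this as an illustration of the idea and not as a PWG encoder.

```python
def packbits_encode(data: bytes) -> bytes:
    """Classic PackBits: runs become (257 - n, byte), literals become (n - 1, bytes...)."""
    out = bytearray()
    i, n = 0, len(data)
    while i < n:
        # Measure the run of identical bytes starting at i (at most 128 long).
        run = 1
        while i + run < n and run < 128 and data[i + run] == data[i]:
            run += 1
        if run >= 2:
            out.append(257 - run)  # header meaning "repeat the next byte run times"
            out.append(data[i])
            i += run
        else:
            # Gather literal bytes until a run of two or more starts, or 128 are collected.
            start = i
            i += 1
            while i < n and i - start < 128 and not (i + 1 < n and data[i] == data[i + 1]):
                i += 1
            out.append(i - start - 1)  # header meaning "copy the next count bytes"
            out += data[start:i]
    return bytes(out)


def packbits_decode(packed: bytes) -> bytes:
    """Inverse of packbits_encode; a header of 128 is a no-op."""
    out = bytearray()
    i = 0
    while i < len(packed):
        header = packed[i]
        i += 1
        if header < 128:
            count = header + 1
            out += packed[i:i + count]
            i += count
        elif header > 128:
            out += bytes([packed[i]]) * (257 - header)
            i += 1
    return bytes(out)


row = b"\x00" * 10 + b"\x01\x02\x03" + b"\xff" * 5
packed = packbits_encode(row)
print(len(row), "->", len(packed), "bytes")
```

Flat areas shrink well, but data with no repeats grows by one header byte per 128 bytes, which fits the remark that the result is not necessarily tiny.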
nemo (145) 2546 posts |
… |
Chris Hall (132) 3554 posts |
When two major operating systems (RISC OS and Windows) cannot agree the character code for the adopted ASCII character of the pound symbol (i.e. 156 or 163, although text e-mails still seem to work OK) I think RISC OS should at least stick by its guns and ignore Unicode (so far as the OS is concerned). Filename ‘top bit set’ characters are deprecated anyway [PRM-2-10 ‘as a general rule you should not use top-bit-set characters in filenames’], as well as special characters like ‘.’ and ‘,’ which have special uses (directory separator and filetype metadata extension). |
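For reference, the two values quoted for the pound sign can be traced to specific encodings. The sketch below simply asks Python's codec tables; the choice of encodings to list is mine, not the poster's.

```python
pound = "£"

# Encodings picked for illustration: Latin-1, Windows-1252, two DOS code pages, UTF-8.
encodings = ("latin-1", "cp1252", "cp437", "cp850", "utf-8")
codes = {name: list(pound.encode(name)) for name in encodings}

for name, value in codes.items():
    print(f"{name:8} {value}")
```

In these tables 163 belongs to Latin-1 and Windows-1252, 156 to the DOS code pages, and UTF-8 uses the two bytes 194, 163.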
Rick Murray (539) 13840 posts |
Hello Chris, the ’90s are calling and they want their messy disorganised character handling back. |
nemo (145) 2546 posts |
Chris said
Mistake
Mistake
Mistake
Mistake
Mistake
Mistake
Other than that, a powerful point, well made, that will be treated with all the weight and consideration it deserves. Thank you. |
Alan Robertson (52) 420 posts |
Brilliant reply. Loved the humour. |
Chris Hall (132) 3554 posts |
“Filename ‘top bit set’ characters are deprecated anyway”
You seem to have got carried away – it may have been a mistake for RISC OS to deprecate top-bit-set characters in filenames but it is a fact. Also use of special characters (two of which I gave as examples) is not just deprecated in (FileCore) file systems, it is forbidden. Another fact, I’m afraid. Glad to be able to correct your ignorance. |
Steffen Huber (91) 1953 posts |
Never heard of that “deprecation”. Yes, it was always filesystem dependent on what you get, but I can’t remember official deprecation. |
David Pitt (3386) 1248 posts |
More a “general rule”? PRM Volume 2, 2 FileSwitch, Technical Details, Filename. Page 2-12 DDE28 PDF version. Page 2-10 paper version. “As a general rule, you should not use top-bit-set characters in filenames, although some filing systems (such as FileCore-based ones) support them.” |
Chris Mahoney (1684) 2165 posts |
I just tried to save a file from Edit with a comma in the name and it worked fine. It doesn’t seem to be forbidden (although a dot didn’t work, as expected). As far as I can tell at a quick glance, a comma is special only when dealing with NFS and similar filing systems, and is just an ordinary character in FileCore. |
Chris Hall (132) 3554 posts |
Don’t try saving with a comma on VRPC – ,ttt is reserved as file type ttt. |
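The clash is easy to see if you sketch how a host-side name with a `,ttt` suffix might be split back into a RISC OS name and filetype. The rule below (a trailing comma followed by exactly three hex digits) is an assumption made for illustration; the thread does not spell out VRPC's exact behaviour, and `split_host_filename` is a made-up name.

```python
import re

# Assumed convention: "name,ttt" where ttt is a three-digit hex filetype.
_SUFFIX = re.compile(r"^(?P<name>.+),(?P<type>[0-9A-Fa-f]{3})$")


def split_host_filename(host_name: str):
    """Return (risc_os_name, filetype), with filetype None if there is no suffix."""
    match = _SUFFIX.match(host_name)
    if match is None:
        return host_name, None
    return match.group("name"), int(match.group("type"), 16)


for name in ("ReadMe,fff", "notes", "list,of,abc"):
    print(name, "->", split_host_filename(name))
```

The last case is the problem: a user who genuinely wanted a file called `list,of,abc` gets a file called `list,of` with type &ABC under this rule.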
Steffen Huber (91) 1953 posts |
VRPC HostFS is broken in so many respects that it is frightening. Handling of characters that cannot be properly represented on the host’s FS is only one thing, filetype handling including easily allowing same-name-different-filetypes to happen is another. Getting confused if the user uses a comma inside a filename is just the icing on the cake. |
Rick Murray (539) 13840 posts |
That isn’t even remotely a deprecation. That is a note that while FileCore systems are fine with high-bit set characters, don’t be surprised if others fail miserably. Remember, at the time RISC OS supported BBC era ADFS, and versions of FAT that couldn’t even handle lower case never mind funny characters (plus at the time the DOS code page was nothing remotely like 8859/1). For what it’s worth, in UTF-8 mode, SDFS and Edit were perfectly happy with a filename written in (Japanese) kana. |
Chris Mahoney (1684) 2165 posts |
Indeed. I’ve also tested this with one of my own apps. Filer was happy, Edit was happy, my !Run file was happy, the Toolbox SaveAs object was happy… |