Build and Configure Samba Server (from scratch) to share resource with RISC OS 4,6 and 5
Pages: 1 2
Dave Higton (1515) 3526 posts |
Colin: thanks for piping up! If you’ve abandoned (be it temporarily or permanently) work on it, would you release the sources so that someone else can have a go? |
Colin (478) 2433 posts |
Hopefully Japanese filenames should render as a series of `xxxx with SmbFS. Where I think I stopped was at the mangling of filenames for problem characters and the fuzzy search Lanmanfs does to match filenames. Wouldn’t someone like to do a Unicode filer? It would save a lot of bother. |
Dave Higton (1515) 3526 posts |
A Unicode filer sounds to naive old me like an interesting idea… it would appear at first thought to be more limited in scope. But that just shows how little I know. Going from RISC OS to UTF-16 can always be done losslessly except for the few characters that have special meaning to the OS at one or the other end. It does require that the current RISC OS alphabet is taken into consideration. UTF-16 to RISC OS is harder as there may be some characters that cannot be rendered in the current alphabet. Translating to ’xxxx is lossless, but not helpful if the character is within the range of the current alphabet. How much does it matter, though? Rick and his Japanese files may be a case in point. |
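To make the asymmetry above concrete, here is a minimal sketch. It assumes Python's ISO 8859-1 codec as a stand-in for the current RISC OS alphabet (the two differ in places), and it uses Python's `backslashreplace` handler purely as a placeholder for the hex-escape scheme mentioned in the post. The helper names `to_alphabet` and `to_utf16` are hypothetical.

```python
def to_alphabet(name: str, alphabet: str = "latin-1") -> bytes:
    """Encode a Unicode name into an 8-bit alphabet, escaping what will not fit."""
    # "backslashreplace" is only a stand-in for the hex-escape scheme
    # mentioned in the thread; the real escape syntax is not reproduced here.
    return name.encode(alphabet, errors="backslashreplace")


def to_utf16(raw: bytes, alphabet: str = "latin-1") -> bytes:
    """Decode 8-bit alphabet bytes and re-encode them as UTF-16 (little-endian)."""
    return raw.decode(alphabet).encode("utf-16-le")


latin_name = "caf\u00e9.txt"      # every character fits in the 8-bit alphabet
japanese_name = "\u96e8.m4a"      # the first character does not fit

# Alphabet -> UTF-16 -> alphabet: the bytes survive unchanged.
raw = latin_name.encode("latin-1")
round_trip = to_utf16(raw).decode("utf-16-le").encode("latin-1")

# UTF-16 -> alphabet: the unrepresentable character has to be escaped.
escaped = to_alphabet(japanese_name)

print(round_trip == raw)
print(escaped)
```

Only the characters that fall outside the alphabet are escaped here, so names that already fit stay readable.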
Rick Murray (539) 13840 posts |
Sometimes. Depends what USB key is plugged in. Usually I plug the USB key directly into the computer. It’s quicker and generally less hassle.
Honestly, I’d only expect gibberish, which is what I think I got when I tried (locally). Mojibake is the word for this. ;)
For Anglocentric RISC OS, probably not that much. Some day we’re going to need to embrace squiggles and move away from the security blanket that has “a byte is a character” embroidered upon it.
It would need to be backed up by a Unicode filing system. Trying to mix and match on the same media would be chaos. Microsoft got around this by making the LFN extensions Unicode, so it could be introduced to Windows. Not that it has been without issue. Older software and pretty much anything in DOS can only see the 8.3 names, so non-Latin stuff would turn up looking like “????????.mp3”. Looking on my phone: “雨碎江南 二胡版.m4a”, “النسخة العربية.m4a” 1, and “כאן.mp3”. Makes a change from symphonic gothic metal or 80s stuff. ;) 1 😂 Just threw that at Google Translate. It translates as “Arabic version”. 🤦🏻♀️😂😂😂 |
Dave Higton (1515) 3526 posts |
And there wouldn’t be a *Alphabet Japan precisely because the alphabet has to have more than 256 characters. So there is currently no way to display Japanese text on RISC OS, in any app or any part of the OS? |
Rick Murray (539) 13840 posts |
There was a Japanese IME, but the sources don’t appear to be a part of the open parts of the OS. It probably used an appropriate font and UTF-8 mode. NetSurf does Japanese (etc), it handles the font rendering itself with rufl. My hack of Ovation has an option where it can do it. Basically it changes the font encoding to UTF-8, so the gibberish appears correctly. ;) It can be done, but it’s harder than it ought to be. |
Rick Murray (539) 13840 posts |
Oh, and you can do Japanese and English in eight bits if you cheat massively. 16×2 LCD character set ;) |
Paul Sprangers (346) 524 posts |
Indeed, NetSurf does a better job than Iris even, which refuses to display the Japanese characters. On the other hand, Iris gets the Arabic text right, whereas NetSurf displays it in reverse. |
Andrew Rawnsley (492) 1445 posts |
Paul, if you install a Japanese font in Iris.Resources.UnixFont (edited to remove plings because Textile), you’ll get Japanese. Pretty sure I have mentioned this on the forum before. There is a selection of such fonts for free online, or you can use UniFont which has a load of (not beautiful) glyphs in many languages. Only problem is, these fonts tend to be huge, and have a negative impact on page load performance (from experience). Whilst I’d like to see this sped up, that’s the reason I don’t include the fonts by default. All you need is the opentype/ttf font in the same folder as the other fonts, and it should work. I have visited Japanese software developer sites using this methodology, and the pages rendered as intended. |
Rick Murray (539) 13840 posts |
Ditto Hebrew. Same reason, I’d imagine.
Based upon WebKit, it’ll have vastly more real world experience (and developers). |
Paul Sprangers (346) 524 posts |
Ah, I wasn’t aware that Iris uses its own font directory. That explains it. I permanently have some Unicode fonts installed, among which ArialUni which covers an impressive number of non-Latin characters. But in order to see Japanese characters rendered correctly, I’ll have to copy it to Iris’ resources. For the sake of speed I won’t do it, the more so since I can’t read Japanese at all. But thanks for explaining. |
Chris Mahoney (1684) 2165 posts |
With *Alphabet UTF8 and Japanese fonts installed, you can get Japanese text in some apps. You can even have Japanese filenames in Filer on a SCSIFS disc :) |
Steve Fryatt (216) 2105 posts |
I wouldn’t assume that Iris is using RISC OS format fonts: if it did, it probably wouldn’t have a separate font folder. It can handle fonts downloaded from the web, so I’d imagine that it’s using a more conventional format. |
nemo (145) 2546 posts |
Long sigh
Sorry, what?
Sorry, what?
Wrong, wrong and erm, wrong. There’s a lot of guesswork and “I tried it and it didn’t explode” in this thread. Let me bring some clarity.
Filer
There’s no such thing as “a Unicode Filer”. The Filer cares nothing for your “file names”. It does not grok them, manipulate them, map them or do anything other than throw them up at the Wimp, down at the filing system interfaces, and sideways at HeapSort to put them in “name” order, whatever that means (it definitely doesn’t mean what you think it does). Filenames are just byte strings, the end. Filer doesn’t care.
Wimp
The RO5 Wimp is happy enough with Unicode strings in its icons. It doesn’t do anything mad with individual bytes when the Alphabet is UTF-8 (though it may still have to do WimpSymbol remapping because FontManager is bad). It throws the filenames at FontManager. The Wimp has to be more careful when editing text in writeables, and it is. So renaming a file in Filer won’t cause UTF-8 sequences to be damaged. It doesn’t do anything clever, can’t cope with any script other than Latin, really, but given that most of the work is done by FontManager, it does ok.
HeapSort
When Filer sorts things into “name” order, it uses the Territory module. This makes no sense at all for Unicode. Quite why you’d want to be sorting Chinese text by a Latin Territory’s sort order is difficult to justify. Why would you sort Turkish names by Swedish alphabet ordering just cos it’s cold outside? So, not ideal, but not horrifically broken. Just not a sensible order in your window.
FontManager
This is where RISC OS falls on its face. FM cannot cope with multiple scripts. It can’t even cope with Hebrew (sorry, guys who tried one word, it can’t). It certainly can’t display a filename containing an Arabic directory containing a Hebrew leafname, all on a Latin-named drive. It just can’t. And that’s without digits in there.
Since the Wimp relies on FontManager for everything it does with writeables and display of text, the Wimp can’t display such things either. It’s got nothing to do with what fonts you have.
FileSwitch
Oh dear. This is where things go badly squiffy. FileSwitch has no idea (last time I checked, so 5.28) that Unicode is a thing. It believes all filenames are 8bit. As such it asks Territory for the uppercase and lowercase tables, which is nonsensical when the Alphabet is UTF-8. As such Territory normally returns -1 causing FileSwitch to limit case-insensitivity to ASCII. (And in the presence of the UTF8Alphabet module, those calls return the tables of the Fallback Alphabet, assuming that whatever was stupid enough to ask for 8bit tables when the Alphabet is UTF-8 is going to be treating text as 8bit anyway so it may as well work correctly when it actually is – which thanks to what some people sneeringly call “legacy” but which I call “stuff you did last week” it often will be, flash sticks aside).
So how does Ascii-only case insensitivity work with cased Cyrillic names? About as badly as you’d think. FileSwitch suddenly becomes case sensitive. It is possible to persuade RISC OS to operate in case-sensitive mode with your familiar Queen’s English, but you would not like it – so that’s no good. In fact it’s worse than that, because most scripts don’t have casing, but that’s the least of your worries.
Because it turns out that while there’s two ways to write the last letter of the English alphabet (‘z’ and ‘Z’ since you ask), there’s at least FOUR ways of writing the last three letters of the Swedish alphabet in Unicode. It gets considerably zanier with Indic languages (there was one case where a word could be encoded 43 different ways, all visually identical). So FileSwitch has no chance of matching anything you’ve typed to anything you’ve seen, in general. To do that would require proper Unicode support at the OS level (a UnicodeSupport module perhaps?!) which can do the necessary Unicode Normalisation and comparison. Plus a FontManager that can display “ABC123” correctly when it’s in Hebrew.
Or to put it another way
No, you don’t need a “Unicode Filer”. |
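The point about one letter having several encodings can be shown with Python's standard `unicodedata` module. This is only a sketch of what "Unicode Normalisation and comparison" involves, not a description of any RISC OS module; the helper names `canonical` and `same_name` are hypothetical, and the matcher is a simplification of full Unicode caseless matching.

```python
import unicodedata

# Three visually identical ways to encode the capital letter Å.
variants = [
    "\u00c5",    # LATIN CAPITAL LETTER A WITH RING ABOVE (precomposed)
    "A\u030a",   # LATIN CAPITAL LETTER A + COMBINING RING ABOVE
    "\u212b",    # ANGSTROM SIGN
]


def canonical(name: str) -> str:
    """Return the NFC (canonical composed) form of a name."""
    return unicodedata.normalize("NFC", name)


def same_name(a: str, b: str) -> bool:
    """Simplified case-insensitive match: normalise first, then case-fold."""
    return canonical(a).casefold() == canonical(b).casefold()


# As raw strings the variants are all different...
print(len(set(variants)))
# ...but they collapse to a single string once normalised.
print(len({canonical(v) for v in variants}))
# A decomposed upper-case name matches a precomposed lower-case one.
print(same_name("\u00e4.txt", "A\u0308.TXT"))
```

A byte-wise comparison sees three unrelated names here, which is why matching what was typed against what is on disc needs normalisation before any case handling.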
Steve Pampling (1551) 8170 posts |
I always thought it meant in an order I’d describe as a sort of concatenated sequence of character values order 1, I don’t think computers really give much note to what set of squiggles is currently associated with the values. 1 Probably totally wrong, but it’s a clumsy description, so how would you know :) |
Rick Murray (539) 13840 posts |
Hmmm, in order… Alana Not quite. ;) And that’s just names one might encounter in the UK… throw in any non Latin script and it’ll all fall apart. (consider also https://en.m.wikipedia.org/wiki/Goj%C5%ABon) |
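For anyone who wants to see "character values order" go wrong, here is a small sketch. The names are made up for illustration (they are not the list from the post), and Python's default string comparison stands in for a plain code-point sort.

```python
# Hypothetical names of the sort one might meet in the UK.
names = ["Zoe", "alana", "Alan", "\u00c9mile", "Eve"]

# Plain code-point order: all capitals first, then lower case, then accents.
by_code_point = sorted(names)

# Case-folding fixes the upper/lower split but still leaves É after Z.
by_casefold = sorted(names, key=str.casefold)

print(by_code_point)
print(by_casefold)
```

Neither result is what a person would call alphabetical, and that is before any non-Latin script is involved.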
Dave Higton (1515) 3526 posts |
Since RISC OS can be set to *alphabet utf8, and code points can be freely translated between UTF-16 and UTF-8 (well, nearly all; all that matter? Red flag to nemo!), why not just do those things, and wait to see what breaks? |
nemo (145) 2546 posts |
UTF-16 and UTF-8 are defined to be equivalent. Perhaps you’re thinking of the High and Low Surrogates that are required in UTF-16 but illegal in UTF-8. They are an artefact of the encoding system, not character codepoints in Unicode. As for seeing what breaks, although I admire your buccaneering can-do attitude, I’m less keen on this approach for filing systems. But you may have less of an attachment to your data. As an indicator of the problems, here’s something I came across when checking RO5 on RPCEmu:
This is how RO5 on RPCEmu under Windows works with an 8-bit Alphabet. What this means is that whereas FileSwitch views What does this mean when the filename is a UTF-8 sequence? It means that FileSwitch, regardless of Alphabet, thinks that ą and Ć are the same character, so is liable to delete completely the wrong file. Meanwhile HostFS thinks that ä and Ä are different characters, so will fail to find a file if you’ve typed the wrong case. Meanwhile those dragons on FileV? Yeah, god knows. |
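On the UTF-16 versus UTF-8 point at the top of this post, a short sketch using only Python's built-in codecs shows both halves: the two encodings carry the same code points, and a surrogate on its own is not a character that UTF-8 will accept. The helper name `is_valid_utf8` is hypothetical.

```python
def is_valid_utf8(raw: bytes) -> bool:
    """Report whether a byte string decodes as strict UTF-8."""
    try:
        raw.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


cat = "\U0001F63A"                    # one code point outside the BMP
as_utf8 = cat.encode("utf-8")         # four bytes
as_utf16 = cat.encode("utf-16-be")    # a high and a low surrogate, two bytes each

# Converting one encoding to the other loses nothing.
converted = as_utf16.decode("utf-16-be").encode("utf-8")

# The high surrogate D83D written out as if it were a three-byte UTF-8 sequence.
surrogate_bytes = b"\xed\xa0\xbd"

print(as_utf8.hex(), as_utf16.hex())
print(converted == as_utf8)
print(is_valid_utf8(as_utf8), is_valid_utf8(surrogate_bytes))
```

So the surrogates only ever exist inside the UTF-16 byte stream; they never need a UTF-8 representation of their own.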
nemo (145) 2546 posts |
Furthermore, despite FileSwitch not knowing what Unicode is, it does at least defer to Territory in its belief in the totality of 8bit Alphabets… but as Territory does not adapt to the current Alphabet but instead always returns tables that apply to the Alphabet it would have selected if it were in charge, those tables are unlikely to leave UTF-8 uncorrupted unless you happen to have a very foreign Territory. If one changes that interface to be somewhat less dumb (which is in effect what In the presence of the UTF8Alphabet module, Territory returns tables that are consistent with the Fallback Alphabet on the basis that code that is using an 8bit text interface is probably dealing with 8bit text, so one can’t even hope that Territory adapting to Alphabet would help… I think the only approach that makes sense for RISC OS is mixed-mode:
There’s a good argument for delegating all filename matching to a Unicode-aware module to centralise and standardise the above heuristic, rather than continuing to insist that random authors (RPCEmu) do random things (HostFS) and kill files (FileSwitch/FileCore) as a result. |
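The two failure modes described in these posts can be reproduced in a few lines. This is a sketch under loud assumptions: the 8-bit lowercase table below is invented for illustration (it treats bytes 0x85 and 0x86 as a case pair and is not copied from any real Territory), and `fold_ascii`, `fold_8bit` and `fold_unicode` are hypothetical names, not RISC OS interfaces.

```python
def fold_ascii(raw: bytes) -> bytes:
    """Lower-case A-Z only, leaving every other byte alone."""
    return bytes(b + 32 if 0x41 <= b <= 0x5A else b for b in raw)


# Assumed 8-bit lowercase table: identity, plus A-Z, plus one top-bit-set pair.
_table = bytearray(range(256))
for _b in range(0x41, 0x5B):
    _table[_b] = _b + 32
_table[0x85] = 0x86   # assumption: 0x85/0x86 are upper/lower forms of one letter
LOWER_8BIT = bytes(_table)


def fold_8bit(raw: bytes) -> bytes:
    """Apply the 8-bit table to each byte, as code unaware of UTF-8 would."""
    return raw.translate(LOWER_8BIT)


def fold_unicode(name: str) -> str:
    """Case-fold whole characters rather than bytes."""
    return name.casefold()


a_ogonek = "\u0105"   # ą, UTF-8 bytes C4 85
c_acute = "\u0106"    # Ć, UTF-8 bytes C4 86
a_uml_lower, a_uml_upper = "\u00e4", "\u00c4"   # ä and Ä

# Byte-wise 8-bit folding wrongly merges two different letters.
false_match = fold_8bit(a_ogonek.encode("utf-8")) == fold_8bit(c_acute.encode("utf-8"))

# ASCII-only folding fails to match two cases of the same letter.
missed_match = fold_ascii(a_uml_lower.encode("utf-8")) != fold_ascii(a_uml_upper.encode("utf-8"))

print(false_match, missed_match)
print(fold_unicode(a_uml_lower) == fold_unicode(a_uml_upper))
print(fold_unicode(a_ogonek) == fold_unicode(c_acute))
```

Doing the folding in one Unicode-aware place, as suggested above, is what lets both comparisons come out right at once.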
nemo (145) 2546 posts |
Rick tried
That is not the ordering that Territory_Collate would return for the UK Territory. But as you’re aware, Japanese dictionary ordering puts borrowed Chinese characters (kanji) in a very different order to that in Chinese; In the UK Territory Å would be adjacent to A but in Swedish it would follow Z; And Turkey’s use of I ⇒ ı and İ ⇒ i is different again. Let’s not even mention Estonian (äöõü come after w). However, if we are talking about the UK, then Welsh ordering can be so surprising to you invading Englanders that you change all the place names. But sort ordering, however perplexing, does not lead to data loss. |
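The Å example can be sketched with two hand-written sort keys. Neither is what Territory_Collate does; both are simplified stand-ins, the names are made up, and `swedish_key` and `accent_blind_key` are hypothetical helpers.

```python
import unicodedata

# Simplified Swedish alphabet: å, ä and ö are separate letters after z.
SWEDISH_ALPHABET = "abcdefghijklmnopqrstuvwxyz\u00e5\u00e4\u00f6"


def swedish_key(name: str) -> list[int]:
    """Rank each letter by its position in the Swedish alphabet."""
    folded = unicodedata.normalize("NFC", name).casefold()
    return [
        SWEDISH_ALPHABET.index(ch) if ch in SWEDISH_ALPHABET
        else len(SWEDISH_ALPHABET) + ord(ch)   # unknown characters sort last
        for ch in folded
    ]


def accent_blind_key(name: str) -> tuple[str, str]:
    """Rough English-style key: ignore accents, use the original as a tiebreak."""
    decomposed = unicodedata.normalize("NFD", name)
    base = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return (base.casefold(), name)


names = ["\u00c5sa", "Anna", "Zara", "\u00d6rjan", "Olof"]

swedish_order = sorted(names, key=swedish_key)
english_order = sorted(names, key=accent_blind_key)

print(swedish_order)
print(english_order)
```

Same five names, two defensible orders, which is the point: the right answer depends on whose alphabet is being applied, not on the characters themselves.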
Rick Murray (539) 13840 posts |
I didn’t say it was. I was running with the idea of “a sequence of character values order” and showing how that can go wrong just with names one might encounter in the UK. And, as you point out, different countries have different conventions regarding how to do it, so there’s no one size fits all, even for basic western Latin.
I know it’s different, but not sure of the specifics. I’m guessing it’s going by the (primary) Japanese kun reading, rather than the Chinese on reading.
Ah, the Celtic languages. That with written Irish and English people need a paracetamol and a pillow. ;)
I’ve said this plenty of times. Because in this way it would be “safe” to switch the system to UTF-8, and things would magically still work for Latin1 applications, which can (hopefully) transition.
Is that not a bug? I can understand case issues on a host that has case sensitivity, but to bung something that is case sensitive in between two things that are not seems quite bizarre, bordering on obtuse. |
nemo (145) 2546 posts |
Well, if this were an operating system with reliable and complete documentation and specifications with verified reference implementations against which one could compare, you’d have some hope of answering that definitively. As it is you’ll have to make do with an opinion: It’s mad. |
Rick Murray (539) 13840 posts |
Since I know you can do this: 😺👍 |