Build and Configure a Samba Server (from scratch) to share resources with RISC OS 4, 6 and 5

48 posts, 16 voices


Mar 19, 2023 2:31pm Dave Higton (1515) 3526 posts	Colin: thanks for piping up! If you’ve abandoned (be it temporarily or permanently) work on it, would you release the sources so that someone else can have a go?

Mar 19, 2023 2:39pm Colin (478) 2433 posts	Hopefully Japanese filenames should render as a series of `xxxx with SmbFS. Where I think I stopped was the mangling of filenames containing problem characters, and the fuzzy search LanManFS does to match filenames. Wouldn’t someone like to do a Unicode filer? It would save a lot of bother.

Mar 19, 2023 3:01pm Dave Higton (1515) 3526 posts	A Unicode filer sounds to naive old me like an interesting idea… it would appear at first thought to be more limited in scope. But that just shows how little I know. Going from RISC OS to UTF-16 can always be done losslessly, except for the few characters that have special meaning to the OS at one end or the other. It does require that the current RISC OS alphabet is taken into consideration. UTF-16 to RISC OS is harder, as there may be some characters that cannot be rendered in the current alphabet. Translating to `xxxx is lossless, but not helpful if the character is within the range of the current alphabet. How much does it matter, though? Rick and his Japanese files may be a case in point.
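Dave's point about which direction is lossless can be sketched in Python. This is a minimal illustration, not SmbFS or LanManFS code: the function names are hypothetical, the 8-bit side is assumed to be ISO 8859-1 (Python's "latin-1" codec, which only approximates RISC OS Latin1 in the 0x80 to 0x9F range), and the escape (a backtick plus four hex digits per UTF-16 code unit) is an assumed format loosely modelled on the `xxxx notation above. Characters that are special to the OS at either end are ignored.

```python
# Sketch only: a hypothetical `xxxx escape, not the scheme SmbFS/LanManFS use.
# Assumes the 8-bit alphabet is ISO 8859-1, where byte N is code point U+00NN.
ESCAPE = 0x60  # the backtick itself, which must also be escaped to stay lossless


def unicode_to_8bit(name: str) -> bytes:
    """Copy representable characters as-is; escape every other UTF-16 code unit."""
    raw = name.encode("utf-16-le")
    out = bytearray()
    for i in range(0, len(raw), 2):
        unit = raw[i] | (raw[i + 1] << 8)  # little-endian 16-bit code unit
        if unit < 0x100 and unit != ESCAPE:
            out.append(unit)               # fits in the 8-bit alphabet
        else:
            out += b"`%04x" % unit         # e.g. U+96E8 becomes b"`96e8"
    return bytes(out)


def eightbit_to_unicode(name: bytes) -> str:
    """Reverse unicode_to_8bit: expand escapes back into UTF-16 code units."""
    units = bytearray()
    i = 0
    while i < len(name):
        if name[i] == ESCAPE:
            unit = int(name[i + 1:i + 5], 16)
            i += 5
        else:
            unit = name[i]
            i += 1
        units += unit.to_bytes(2, "little")
    return units.decode("utf-16-le")


if __name__ == "__main__":
    for name in ["café.txt", "雨碎江南二胡版.m4a", "cat \U0001F63A`s"]:
        encoded = unicode_to_8bit(name)
        assert eightbit_to_unicode(encoded) == name
        print(name, "->", encoded)
```

Characters already in the 8-bit alphabet (such as "é") are copied unchanged, which addresses the "not helpful" case; characters beyond U+FFFF come out as two escapes, one per surrogate.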

Mar 19, 2023 4:46pm Rick Murray (539) 13840 posts	Rick, do you store files with Japanese filenames on an SMB server? Sometimes. Depends what USB key is plugged in. Usually I plug the USB key directly into the computer. It’s quicker and generally less hassle. How would you expect them to be rendered on RISC OS at the moment? Honestly, I’d only expect gibberish, which is what I think I got when I tried (locally). Mojibake is the word for this. ;) How much does it matter, though? For Anglocentric RISC OS, probably not that much. But it’s a bit of an embarrassment when everything else, including my ancient Creative Zen (from 2007) handles this sort of thing just fine, but…… Some day we’re going to need to embrace squiggles and move away from the security blanket that has “a byte is a character” embroidered upon it. A Unicode filer sounds to naive old me like an interesting idea… It would need to be backed up by a Unicode filing system. Trying to mix and match on the same media would be chaos. Microsoft got around this by making the LFN extensions Unicode, so it could be introduced to Windows. Not that it hasn’t been without issue. Older software and pretty much anything in DOS can only see the 8.3 names, so non-Latin stuff would turn up looking like “????????.mp3”. Looking on my phone: “雨碎江南二胡版.m4a”, “النسخة العربية.m4a” ¹, and “כאן.mp3”. Makes a change from symphonic gothic metal or 80s stuff. ;) I wonder if the last two are back to front? ¹ 😂 Just threw that at Google Translate. It translates as “Arabic version”. 🤦🏻‍♀️😂😂😂
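The mojibake Rick mentions is easy to reproduce: take the UTF-8 bytes of a filename and interpret them through an 8-bit alphabet. A minimal Python sketch, using ISO 8859-1 as a stand-in for the RISC OS Latin1 alphabet (an assumption; the two differ in 0x80 to 0x9F):

```python
original = "雨碎江南二胡版.m4a"

utf8_bytes = original.encode("utf-8")    # what a Unicode-aware device stores
as_8bit = utf8_bytes.decode("latin-1")   # what an 8-bit alphabet "sees"

print(len(original), "characters")       # 11
print(len(utf8_bytes), "bytes")          # 25: each of these CJK characters is 3 bytes
print(len(as_8bit), "8-bit characters")  # 25 characters of gibberish

# No information is lost: reading the same bytes as UTF-8 again recovers the name.
assert as_8bit.encode("latin-1").decode("utf-8") == original
```

The final round trip is why the gibberish can be made to display correctly just by reinterpreting the same bytes as UTF-8.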

Mar 19, 2023 5:02pm Dave Higton (1515) 3526 posts	And there wouldn’t be a *Alphabet Japan precisely because the alphabet has to have more than 256 characters. So there is currently no way to display Japanese text on RISC OS, in any app or any part of the OS?

Mar 19, 2023 5:16pm Rick Murray (539) 13840 posts	There was a Japanese IME, but the sources don’t appear to be among the open parts of the OS. It probably used an appropriate font and UTF-8 mode. NetSurf does Japanese (etc.); it handles the font rendering itself with RUfl. My hack of Ovation has an option where it can do it. Basically it changes the font encoding to UTF-8, so the gibberish appears correctly. ;) It can be done, but it’s harder than it ought to be.

Mar 19, 2023 5:23pm Rick Murray (539) 13840 posts	Oh, and you can do Japanese and English in eight bits if you cheat massively. 16×2 LCD character set ;)

Mar 20, 2023 11:09am Paul Sprangers (346) 524 posts	NetSurf does Japanese (etc), it handles the font rendering itself with rufl. Indeed, NetSurf does a better job than even Iris, which refuses to display the Japanese characters. On the other hand, Iris gets the Arabic text right, whereas NetSurf displays it in reverse.

Mar 20, 2023 12:05pm Andrew Rawnsley (492) 1445 posts	Paul, if you install a Japanese font in Iris.Resources.UnixFont (edited to remove plings because Textile), you’ll get Japanese. Pretty sure I have mentioned this on the forum before. There is a selection of such fonts available free online, or you can use UniFont, which has a load of (not beautiful) glyphs in many languages. The only problem is that these fonts tend to be huge, and have a negative impact on page load performance (from experience). Whilst I’d like to see this sped up, that’s the reason I don’t include the fonts by default. All you need is the OpenType/TTF font in the same folder as the other fonts, and it should work. I have visited Japanese software developer sites using this method, and the pages rendered as intended.

Mar 20, 2023 1:55pm Rick Murray (539) 13840 posts	whereas NetSurf displays it in reverse. Ditto Hebrew. Same reason, I’d imagine. Iris gets the Arabic text right Based upon WebKit, it’ll have vastly more real world experience (and developers).
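Hebrew and Arabic are stored in logical (reading) order, so the first code point of “כאן.mp3” is the letter that should appear rightmost on screen. A renderer that draws code points strictly left to right, without the Unicode Bidirectional Algorithm (UAX #9), therefore shows the letters reversed. The Python sketch below uses the standard `unicodedata.bidirectional` lookup; `naive_ltr_visual` is a hypothetical helper that only reverses runs of strong right-to-left characters (keeping spaces between them) for a left-to-right paragraph. It is a crude approximation, not UAX #9, and ignores digits, mirrored brackets and Arabic shaping.

```python
import unicodedata

RTL_CLASSES = {"R", "AL"}  # strong right-to-left: Hebrew letters (R), Arabic letters (AL)


def is_rtl(ch: str) -> bool:
    return unicodedata.bidirectional(ch) in RTL_CLASSES


def naive_ltr_visual(text: str) -> str:
    """Crude visual reordering for a left-to-right paragraph (NOT full UAX #9).

    Each run of strong RTL characters, including spaces that sit between them,
    is reversed in place; everything else stays in logical order.
    """
    out = []
    i = 0
    while i < len(text):
        if not is_rtl(text[i]):
            out.append(text[i])
            i += 1
            continue
        j = last_rtl = i
        while j < len(text) and (is_rtl(text[j]) or unicodedata.bidirectional(text[j]) == "WS"):
            if is_rtl(text[j]):
                last_rtl = j
            j += 1
        out.append(text[i:last_rtl + 1][::-1])  # reverse the RTL run
        i = last_rtl + 1
    return "".join(out)


for name in ["כאן.mp3", "النسخة العربية.m4a"]:
    first = name[0]
    print(hex(ord(first)), unicodedata.bidirectional(first))
    print(naive_ltr_visual(name))
```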

Mar 20, 2023 3:42pm Paul Sprangers (346) 524 posts	Paul, if you install a Japanese font in Iris.Resources.UnixFont (edited to remove plings because Textile), you’ll get Japanese. Ah, I wasn’t aware that Iris uses its own font directory. That explains it. I permanently have some Unicode fonts installed, among which ArialUni, which covers an impressive number of non-Latin characters. But in order to see Japanese characters rendered correctly, I’ll have to copy it to Iris’ resources. For the sake of speed I won’t do it, the more so since I can’t read Japanese at all. But thanks for explaining.

Mar 20, 2023 10:04pm Chris Mahoney (1684) 2165 posts	So there is currently no way to display Japanese text on RISC OS, in any app or any part of the OS? With *Alphabet UTF8 and Japanese fonts installed, you can get Japanese text in some apps. You can even have Japanese filenames in Filer on a SCSIFS disc :)

Mar 20, 2023 11:52pm Steve Fryatt (216) 2105 posts	I permanently have some unicode fonts installed, among which ArialUni which covers an impressive number of non latin characters. But in order to see Japanese characters rendered correctly, I’ll have to copy it to Iris’ resources. I wouldn’t assume that Iris is using RISC OS format fonts: if it did, it probably wouldn’t have a separate font folder. It can handle fonts downloaded from the web, so I’d imagine that it’s using a more conventional format.

Mar 21, 2023 3:49pm nemo (145) 2546 posts	Long sigh So there is currently no way to display Japanese text on RISC OS, in any app Sorry, what? or any part of the OS? Sorry, what? there wouldn’t be a *Alphabet Japan precisely because the alphabet has to have more than 256 characters Wrong, wrong and erm, wrong. There’s a lot of guesswork and “I tried it and it didn’t explode” in this thread. Let me bring some clarity. Filer: There’s no such thing as “a Unicode Filer”. The Filer cares nothing for your “file names”. It does not grok them, manipulate them, map them or do anything other than throw them up at the Wimp, down at the filing system interfaces, and sideways at HeapSort to put them in “name” order, whatever that means (it definitely doesn’t mean what you think it does). Filenames are just byte strings, the end. Filer doesn’t care. Wimp: The RO5 Wimp is happy enough with Unicode strings in its icons. It doesn’t do anything mad with individual bytes when the Alphabet is UTF-8 (though it may still have to do WimpSymbol remapping because FontManager is bad). It throws the filenames at FontManager. The Wimp has to be more careful when editing text in writeables, and it is. So renaming a file in Filer won’t cause UTF-8 sequences to be damaged. It doesn’t do anything clever, can’t cope with any script other than Latin, really, but given that most of the work is done by FontManager, it does ok. HeapSort: When Filer sorts things into “name” order, it uses the Territory module. This makes no sense at all for Unicode. Quite why you’d want to be sorting Chinese text by a Latin Territory’s sort order is difficult to justify. Why would you sort Turkish names by Swedish alphabet ordering just cos it’s cold outside? So, not ideal, but not horrifically broken. Just not a sensible order in your window. FontManager: This is where RISC OS falls on its face. FM cannot cope with multiple scripts. It can’t even cope with Hebrew (sorry, guys who tried one word, it can’t). 
It certainly can’t display a filename containing an Arabic directory containing a Hebrew leafname, all on a Latin-named drive. It just can’t. And that’s without digits in there. Since the Wimp relies on FontManager for everything it does with writeables and display of text, the Wimp can’t display such things either. It’s got nothing to do with what fonts you have. FileSwitch: Oh dear. This is where things go badly squiffy. FileSwitch has no idea (last time I checked, so 5.28) that Unicode is a thing. It believes all filenames are 8bit. As such it asks Territory for the uppercase and lowercase tables, which is nonsensical when the Alphabet is UTF-8. In that case Territory normally returns -1, causing FileSwitch to limit case-insensitivity to ASCII. (And in the presence of the UTF8Alphabet module, those calls return the tables of the Fallback Alphabet, assuming that whatever was stupid enough to ask for 8bit tables when the Alphabet is UTF-8 is going to be treating text as 8bit anyway so it may as well work correctly when it actually is – which thanks to what some people sneeringly call “legacy” but which I call “stuff you did last week” it often will be, flash sticks aside). So how does ASCII-only case insensitivity work with cased Cyrillic names? About as badly as you’d think. FileSwitch suddenly becomes case sensitive. It is possible to persuade RISC OS to operate in case-sensitive mode with your familiar Queen’s English, but you would not like it – so that’s no good. In fact it’s worse than that, because most scripts don’t have casing, but that’s the least of your worries. Because it turns out that while there are two ways to write the last letter of the English alphabet (‘z’ and ‘Z’ since you ask), there are at least FOUR ways of writing the last three letters of the Swedish alphabet in Unicode. It gets considerably zanier with Indic languages (there was one case where a word could be encoded 43 different ways, all visually identical). 
So FileSwitch has no chance of matching anything you’ve typed to anything you’ve seen, in general. To do that would require proper Unicode support at the OS level (a UnicodeSupport module perhaps?!) which can do the necessary Unicode Normalisation and comparison. Plus a FontManager that can display “ABC123” correctly when it’s in Hebrew. Or to put it another way No, you don’t need a “Unicode Filer”.
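nemo's point about visually identical names with different encodings can be shown with Python's standard `unicodedata` module. Below, three different code point sequences all render as “Å”. A byte-for-byte comparison with ASCII-only case folding (roughly the situation described for FileSwitch above) sees three different names, while Unicode canonical caseless matching, NFD(casefold(NFD(x))) as defined in the Unicode Standard, sees one. The helper names are hypothetical; this sketches the kind of comparison a Unicode-aware matching module would need, not an existing RISC OS API.

```python
import unicodedata

# Three code point sequences that all display as "Å".
FORMS = [
    "\u00c5",    # LATIN CAPITAL LETTER A WITH RING ABOVE (precomposed)
    "A\u030a",   # A followed by COMBINING RING ABOVE (decomposed)
    "\u212b",    # ANGSTROM SIGN (canonically equivalent to U+00C5)
]


def bytewise_key(name: str) -> bytes:
    """UTF-8 bytes with ASCII-only case folding (bytes.lower touches only A-Z)."""
    return name.encode("utf-8").lower()


def caseless_key(name: str) -> str:
    """Canonical caseless matching key: NFD(casefold(NFD(name)))."""
    return unicodedata.normalize("NFD", unicodedata.normalize("NFD", name).casefold())


def names_match(a: str, b: str) -> bool:
    return caseless_key(a) == caseless_key(b)


if __name__ == "__main__":
    for form in FORMS:
        print(ascii(form), form.encode("utf-8"), ascii(caseless_key(form)))
    print(len({bytewise_key(f + "sa") for f in FORMS}))  # 3 distinct names
    print(len({caseless_key(f + "sa") for f in FORMS}))  # 1 name
    print(names_match("\u212bSA", "a\u030asa"))          # True
```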

Mar 21, 2023 4:41pm Steve Pampling (1551) 8170 posts	put them in “name” order, whatever that means (it definitely doesn’t mean what you think it does). I always thought it meant an order I’d describe as a sort of concatenated-sequence-of-character-values order¹; I don’t think computers really take much notice of what set of squiggles is currently associated with the values. ¹ Probably totally wrong, but it’s a clumsy description, so how would you know :)

Mar 21, 2023 5:10pm Rick Murray (539) 13840 posts	Hmmm, in order… Alana, Jessica, Myfanwy, Sara, Áine, Éabha, Órlaith. Not quite. ;) And that’s just names one might encounter in the UK… throw in any non-Latin script and it’ll all fall apart. (Consider also https://en.m.wikipedia.org/wiki/Goj%C5%ABon)
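Rick's list is exactly what plain code point (or byte) order produces: in Latin-1 and in UTF-8 alike, the accented capitals sort after every unaccented ASCII letter. A Python sketch comparing that with a crude accent-insensitive key. The key (decompose, drop combining marks, casefold) is a hypothetical, illustrative stand-in, not Territory_Collate and not the Unicode Collation Algorithm.

```python
import unicodedata

NAMES = ["Sara", "Órlaith", "Alana", "Éabha", "Jessica", "Áine", "Myfanwy"]


def crude_key(name: str) -> str:
    """Ignore accents and case: NFD, drop combining marks (category Mn), casefold."""
    decomposed = unicodedata.normalize("NFD", name)
    base = "".join(c for c in decomposed if unicodedata.category(c) != "Mn")
    return base.casefold()


print(sorted(NAMES))
# ['Alana', 'Jessica', 'Myfanwy', 'Sara', 'Áine', 'Éabha', 'Órlaith']
print(sorted(NAMES, key=lambda n: n.encode("utf-8")))  # byte order: same as above
print(sorted(NAMES, key=crude_key))
# ['Áine', 'Alana', 'Éabha', 'Jessica', 'Myfanwy', 'Órlaith', 'Sara']
```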

Mar 21, 2023 8:29pm Dave Higton (1515) 3526 posts	Since RISC OS can be set to *alphabet utf8, and code points can be freely translated between UTF-16 and UTF-8 (well, nearly all; all that matter? Red flag to nemo!), why not just do those things, and wait to see what breaks?
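Dave's round trip can be checked quickly in Python. For valid Unicode text, UTF-16 and UTF-8 encode the same code points, so converting between them is lossless. The one wrinkle is that UTF-16 represents characters beyond U+FFFF as a surrogate pair, and a lone surrogate code point cannot be encoded as UTF-8 at all (Python's strict UTF-8 codec rejects it).

```python
text = "Åsa 雨 \U0001F63A"  # U+1F63A (a cat emoji) lies outside the Basic Multilingual Plane

# Round trip UTF-16 -> str -> UTF-8 -> str: lossless for valid Unicode text.
utf16 = text.encode("utf-16-be")
utf8 = utf16.decode("utf-16-be").encode("utf-8")
assert utf8.decode("utf-8") == text

# U+1F63A: a surrogate pair (two 16-bit code units) in UTF-16, one 4-byte sequence in UTF-8.
print("\U0001F63A".encode("utf-16-be").hex())  # d83dde3a
print("\U0001F63A".encode("utf-8").hex())      # f09f98ba

# A lone high surrogate is not a character, so a strict UTF-8 encoder rejects it.
try:
    "\ud83d".encode("utf-8")
except UnicodeEncodeError as err:
    print("cannot encode:", err.reason)        # surrogates not allowed
```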

Mar 22, 2023 12:17pm nemo (145) 2546 posts	UTF-16 and UTF-8 are defined to be equivalent. Perhaps you’re thinking of the High and Low Surrogates that are required in UTF-16 but illegal in UTF-8. They are an artefact of the encoding system, not character codepoints in Unicode. As for seeing what breaks, although I admire your buccaneering can-do attitude, I’m less keen on this approach for filing systems. But you may have less of an attachment to your data. As an indicator of the problems, here’s something I came across when checking RO5 on RPCEmu: OS_File (Task uses filename string from Wimp) → FileV <hErE bE DrAgOnS> (There can be any number of things sitting on the vector, looking for particular filenames. Who knows) → FileSwitch (Uses Territory for case-folding, which does NOT adapt to Alphabet) → HostFS (RPCEmu) (Only case-folds ASCII, maps to Unicode for the host OS) → Windows™ (Does Unicode case-folding). This is how RO5 on RPCEmu under Windows works with an 8-bit Alphabet. What this means is that whereas FileSwitch views `Thïs` and `tHÏS` as the same file, HostFS does not, but Windows does. What does this mean when the filename is a UTF-8 sequence? It means that FileSwitch, regardless of Alphabet, thinks that ą and Ć are the same character, so is liable to delete completely the wrong file. Meanwhile HostFS thinks that ä and Ä are different characters, so will fail to find a file if you’ve typed the wrong case. Meanwhile those dragons on FileV? Yeah, god knows.
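The disagreement in nemo's RPCEmu chain can be modelled with three toy comparison functions in Python: one folding case with an 8-bit Latin-1 table (standing in for FileSwitch with an 8-bit Alphabet), one folding ASCII only (standing in for HostFS), and one using full Unicode case folding (standing in for Windows). These are hypothetical models, not the actual FileSwitch, HostFS or Windows code, and ISO 8859-1 is an assumed stand-in for the RISC OS Latin1 alphabet.

```python
def latin1_fold(name: bytes) -> bytes:
    """Model of an 8-bit case table: treat every byte as ISO 8859-1 and lowercase it."""
    return name.decode("latin-1").lower().encode("latin-1")


def ascii_fold(name: bytes) -> bytes:
    """Model of ASCII-only folding: bytes.lower() only touches A-Z."""
    return name.lower()


def unicode_fold(name: str) -> str:
    """Model of full Unicode case folding on decoded text."""
    return name.casefold()


a, b = "Thïs", "tHÏS"
a8, b8 = a.encode("latin-1"), b.encode("latin-1")  # names in an 8-bit Alphabet

print(latin1_fold(a8) == latin1_fold(b8))   # True:  "same file"
print(ascii_fold(a8) == ascii_fold(b8))     # False: "different files"
print(unicode_fold(a) == unicode_fold(b))   # True:  "same file"

# Applying the 8-bit table to UTF-8 bytes corrupts them: lowercasing the lead
# byte 0xC4 (Ä in Latin-1) turns the 2-byte sequence for "Ą" into 0xE4 0x84,
# which is no longer valid UTF-8.
folded = latin1_fold("Ą".encode("utf-8"))
print(folded.hex())                         # e484
try:
    folded.decode("utf-8")
except UnicodeDecodeError:
    print("not valid UTF-8 any more")
```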

Mar 22, 2023 12:29pm nemo (145) 2546 posts	Furthermore, despite FileSwitch not knowing what Unicode is, it does at least defer to Territory in its belief in the totality of 8bit Alphabets… but as Territory does not adapt to the current Alphabet but instead always returns tables that apply to the Alphabet it would have selected if it were in charge, those tables are unlikely to leave UTF-8 uncorrupted unless you happen to have a very foreign Territory. If one changes that interface to be somewhat less dumb (which is in effect what `SENSITIVE ON` in the earlier example does) then one can persuade FileSwitch (and FileCore as it happens) to not mangle UTF-8… but as we’ve seen, other code does its own weird thing, so copying a file from this filing system to that filing system could happily overwrite completely the wrong file. Thanks. In the presence of the UTF8Alphabet module, Territory returns tables that are consistent with the Fallback Alphabet on the basis that code that is using an 8bit text interface is probably dealing with 8bit text, so one can’t even hope that Territory adapting to Alphabet would help… I think the only approach that makes sense for RISC OS is mixed-mode: (1) code must read and adapt to the current Alphabet, especially “UTF8” [sic]; (2) when Alphabet=111, text that looks like valid UTF-8 must be treated as being Unicode; (3) when Alphabet=111, text that is not valid UTF-8 must be treated as being in the Fallback Alphabet; (4) optionally (and this would have to be a separate configuration option), when in an 8bit Alphabet, filenames that appear to be a valid UTF-8 sequence ought to be treated as Unicode for case-folding purposes. There’s a good argument for delegating all filename matching to a Unicode-aware module to centralise and standardise the above heuristic, rather than continuing to insist that random authors (RPCEmu) do random things (HostFS) and kill files (FileSwitch/FileCore) as a result.
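The mixed-mode rules nemo lists can be sketched as a single decoding function in Python. The function names, the `sniff_in_8bit` flag and the use of ISO 8859-1 for the 8-bit alphabet are all assumptions for illustration; nothing here is an existing RISC OS interface. Note the ambiguity inherent in the heuristic: an 8-bit name whose bytes happen to form valid UTF-8 (such as “Ã©”) will be read as Unicode.

```python
def is_valid_utf8(raw: bytes) -> bool:
    try:
        raw.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


def decode_filename(raw: bytes, utf8_alphabet: bool = True,
                    sniff_in_8bit: bool = False,
                    eight_bit_codec: str = "latin-1") -> str:
    """Hypothetical mixed-mode decoding of a filename byte string.

    eight_bit_codec stands for the Fallback Alphabet when the Alphabet is UTF-8,
    or for the current 8-bit Alphabet otherwise (ISO 8859-1 is an assumption).
    - UTF-8 Alphabet, bytes valid UTF-8: treat as Unicode.
    - UTF-8 Alphabet, bytes not valid UTF-8: decode with the 8-bit codec.
    - 8-bit Alphabet: decode with the 8-bit codec, unless the optional
      sniff_in_8bit setting says valid UTF-8 should still be treated as Unicode.
    """
    if (utf8_alphabet or sniff_in_8bit) and is_valid_utf8(raw):
        return raw.decode("utf-8")
    return raw.decode(eight_bit_codec)


def match_key(raw: bytes, **options) -> str:
    """Case-insensitive comparison key built on the decoded name."""
    return decode_filename(raw, **options).casefold()


utf8_name = "Åsa".encode("utf-8")      # b"\xc3\x85sa", e.g. from a flash stick
latin1_name = "åSA".encode("latin-1")  # b"\xe5SA", e.g. from an older application

print(match_key(utf8_name) == match_key(latin1_name))  # True
print(decode_filename("Ã©".encode("latin-1")))        # é  (the ambiguity)
```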

Mar 22, 2023 12:41pm nemo (145) 2546 posts	Rick tried: Sara, Áine. That is not the ordering that Territory_Collate would return for the UK Territory. But as you’re aware, Japanese dictionary ordering puts borrowed Chinese characters (kanji) in a very different order to that in Chinese; in the UK Territory Å would be adjacent to A but in Swedish it would follow Z; and Turkey’s use of I ⇒ ı and İ ⇒ i is different again. Let’s not even mention Estonian (äöõü come after w). However, if we are talking about the UK, then Welsh ordering can be so surprising to you invading Englanders that you change all the place names. But sort ordering, however perplexing, does not lead to data loss.
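Two of nemo's examples in Python: a sort key built from a per-language letter order, and Turkish dotted/dotless i casing. The alphabets below are simplified assumptions covering only the letters used (real code would use Territory_Collate or the Unicode Collation Algorithm with locale data), and `alphabet_key` and `turkish_lower` are hypothetical helpers; `turkish_lower` handles only I and İ.

```python
def alphabet_key(alphabet: str):
    """Build a sort key from an explicit letter order (unknown characters sort last)."""
    position = {letter: i for i, letter in enumerate(alphabet)}
    return lambda name: [position.get(c, len(alphabet) + ord(c)) for c in name.lower()]


UK_LIKE = "aåäbcdefghijklmnoöpqrstuvwxyz"  # assumed: Å and Ä kept beside A
SWEDISH = "abcdefghijklmnopqrstuvwxyzåäö"  # Å, Ä, Ö follow Z


def turkish_lower(text: str) -> str:
    """Turkish casing: I -> ı (dotless) and İ -> i (dotted), then ordinary lower()."""
    return text.replace("I", "ı").replace("İ", "i").lower()


names = ["Zeta", "Åsa", "Anna", "Bo"]
print(sorted(names, key=alphabet_key(UK_LIKE)))   # ['Anna', 'Åsa', 'Bo', 'Zeta']
print(sorted(names, key=alphabet_key(SWEDISH)))   # ['Anna', 'Bo', 'Zeta', 'Åsa']

# Locale-independent lowercasing gets Turkish wrong in both directions.
print("ISPARTA".lower(), turkish_lower("ISPARTA"))          # isparta ısparta
print(len("İSTANBUL".lower()), turkish_lower("İSTANBUL"))   # 9 istanbul
```

The `9` arises because Python's default `lower()` maps İ to "i" plus a combining dot above, giving two code points where Turkish expects one.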

Mar 23, 2023 5:51pm Rick Murray (539) 13840 posts	That is not the ordering that Territory_Collate would return for the UK Territory. I didn’t say it was. I was running with the idea of “a sequence of character values order” and showing how that can go wrong just with names one might encounter in the UK. And, as you point out, different countries have different conventions regarding how to do it, so there’s no one size fits all, even for basic western Latin. Japanese dictionary ordering puts borrowed Chinese characters (kanji) in a very different order I know it’s different, but I’m not sure of the specifics. I’m guessing it’s going by the (primary) Japanese kun reading, rather than the Chinese on reading. be so surprising to you invading Englanders Ah, the Celtic languages. That, along with written Irish, is where English people need a paracetamol and a pillow. ;) text that is not valid UTF-8 must be treated as being in the Fallback Alphabet I’ve said this plenty of times, because this way it would be “safe” to switch the system to UTF-8, and things would magically still work for Latin1 applications, which can (hopefully) transition. As long as “stuff will break” is the way it goes, there’s not really any impetus for changing anything. I mean, I think we’ve been having this argument for about a decade now and I ain’t getting any younger… What this means is that whereas FileSwitch views Thïs and tHÏS as the same file, HostFS does not, but Windows does. Is that not a bug? I can understand case issues on a host that has case sensitivity, but to bung something that is case sensitive in between two things that are not seems quite bizarre, bordering on obtuse.

Mar 24, 2023 1:04pm nemo (145) 2546 posts	Is that not a bug? Well, if this were an operating system with reliable and complete documentation and specifications with verified reference implementations against which one could compare, you’d have some hope of answering that definitively. As it is you’ll have to make do with an opinion: It’s mad.

Mar 24, 2023 1:56pm Rick Murray (539) 13840 posts	you’ll have to make do with an opinion: It’s mad. Since I know you can do this: 😺👍