Filename Translation
Colin Ferris (399) 1814 posts |
Byzantium – Constantinople – Istanbul Surprised they didn’t change ‘London’ when the Romans pulled out. |
Dave Higton (1515) 3526 posts |
Agreed. I find it embarrassing to read. |
GavinWraith (26) 1563 posts |
There you have it. Parchment and ink were expensive items in the middle ages. All kinds of shorthands were invented. A ‘q’ with a horizontal stroke through its vertical stave stood for ‘que’, a very common suffix in Latin. There was a similar shorthand for ‘pro’, ‘per’ and ‘prae’.
It has been done, many times. Paul (Francis) Jennings had some humorous pieces written in Anglish (or Roots English) in the Observer. See https://hyperleap.com/topic/Linguistic_purism_in_English. All languages at all times have been in constant flux. Only because we give them names, and because they do not change that much in a single lifetime, do we foster the illusion that there are such things. All words are borrowed words, whether from across generations, borders or class divisions, with the exception of newly invented words. Shakespeare was particularly productive. |
Rick Murray (539) 13840 posts |
What I find amusing about all of this is that there’s a lot of whinging about the horrible influence of the Romance languages in English, and how it’s better to be “pure” and use words of Germanic origin. Yeah, well, depending on where you choose to draw the line, Germanic is as foreign as Norman French. English is a mush of everybody else’s words carefully mispronounced. If you want to be pure, resurrect Brittonic, or there’s the door… |
Steve Pampling (1551) 8170 posts |
So, the major difference between UK English and USA English is the difference in the mispronunciation? |
Steve Drain (222) 1620 posts |
There are a lot of nous Étiennes on this forum. ;-) |
Frederick Bambrough (1372) 837 posts |
Isn’t it just a written representation of spoken abbreviations rather than of itself? |
Steve Pampling (1551) 8170 posts |
Isn’t writing just a hardcopy of the spoken word? |
Frederick Bambrough (1372) 837 posts |
<Sigh!> |
GavinWraith (26) 1563 posts |
Reading silently is a relatively modern development, I believe. Modern readers can read the sense without the intervention of the auditory part of the brain. Libraries would otherwise be noisy places; as monastic libraries were, apparently. In ancient times reading silently was remarkable. Going further back, song, dance and gesture were merely performance, usually for the gods, and not separated. |
Clive Semmens (2335) 3276 posts |
Get outside of Latin (or Greek, or Cyrillic, or a handful of others like Georgian and Armenian) and the simplistic “accents are just replacements for missing letters” (or conversely, lots of letters in English are just replacements for missing accents) really doesn’t cut it. Devanagari (for Hindi & Nepali) and similar scripts (for other Indian languages, Myanmari and Thai) have little marks that are a bit like accents (or diacritics in general, considering that cedillas and ogoneks aren’t accents) but aren’t in any sense replacements for letters. Never mind Chinese, Japanese or Korean… and I don’t have the faintest idea where one character finishes and the next begins in Arabic. |
Rick Murray (539) 13840 posts |
The two little dodahs that look like quotes change the sounds. For example the character for “to” (say: toe) becomes “do” (say: dough) with the quotes symbol added. |
GavinWraith (26) 1563 posts |
The last speaker of Ubykh is to be heard online somewhere, but the wikipedia article says nothing about written Ubykh, if there ever was any. It would be interesting to see how ASCII could accommodate 84 consonants. Listening to Ubykh leaves one amazed at what the human voice can get up to. |
Chris Mahoney (1684) 2165 posts |
For those with the fonts installed: |
Clive Semmens (2335) 3276 posts |
Crikey. I thought Hindi’s 50 consonants was bad enough. Luckily in my mid-thirties my mouth was still young enough to master their pronunciation – sadly my ears weren’t still young enough to hear/learn the difference between some pairs. |
GavinWraith (26) 1563 posts |
While we wander the backstreets of Aldershot, I remember reading that though the aspirated and palatal dentals of Hindi are found in most of the languages of India, apparently they were not there in early Sanskrit or Proto-IndoEuropean. That suggests that they are a substrate feature. The incredible multiplicity of consonants in Ubykh is presumably a testament to the difficulty of moving around in the mountains of the Caucasus. |
Clive Semmens (2335) 3276 posts |
Aspirated/unaspirated, voiced/unvoiced, dental/palatal. I’ve read the theory they weren’t in early Sanskrit too, but I’ve also read that the idea they weren’t is probably false, and it was just that the script didn’t capture the pronunciations as well as the modern script does. But of course pronunciation shifts over time anyway, and so do the boundaries between the sounds of consonants (and vowels, for that matter – Hindi has ten well-defined vowels that correspond exactly with the ten in the script; my wife has endless trouble with English’s uncountable ill-defined vowels that don’t correspond well at all with the five letters and various digraphs…) |
Andrew McCarthy (3688) 605 posts |
Sigh. Are we moving into the realm of language theory and how it’s spoken versus filename translation? Yes, → Aldershot |
GavinWraith (26) 1563 posts |
I suppose the state-space describing the human utterance (the disposition of the tongue, the force of the breath, … ) is a continuum, and each language quantizes it differently. Even listening to old BBC radio recordings brings home how much that has shifted in English within our lifetime. Without a time machine, little hope for knowing how our ancestors sounded. My late friend, JNI, in Bangladesh, has a niece who is a TV presenter, a classical dancer and a Sanskrit speaker – a beautiful and formidable lady. When introduced to one of Modi’s nationalist MPs she had the ironic satisfaction (being officially Muslim) of correcting his attempts at Sanskrit. |
Clive Semmens (2335) 3276 posts |
I did wonder about that, but on further consideration I think actually the latter ought to take the former into account. Granted, how it’s spoken may not matter much (although I’m not even sure about that), but language theory surely must inform any attempt to sort out filename translation. |
Rick Murray (539) 13840 posts |
If nothing else, this diversion must certainly ram home the idea that “stick to plain ASCII” is hopelessly antiquated. We could come up with all sorts of clever ideas, or plan a way to make things support some form of Unicode and do it correctly the one time… |
GavinWraith (26) 1563 posts |
Apologies for the diversions. For use in Textile and webpages I find the named HTML entities &alpha; (α), &beta; (β), &gamma; (γ), ... easy to use. They work with NetSurf. Of course they are rather limited. To use these in filenames the ampersand has to be escaped. Not often do you see User Root Directory used in anger in RISC OS. Backwards compatibility must make it impossible to pension symbols off, even from sinecures like this. |
Theo Markettos (89) 919 posts |
Sigh, I get really fed up with thread drift on this forum. For context, my OP cited was not about internationalisation. Displayed filenames should be UTF-8, end of problem. 1980s 8 bit character sets are dead, buried, at the crossroads with a stake of garlic through their heart. Software that doesn’t know UTF-8 shouldn’t cause issues, provided it doesn’t decide to re-translate the filename into a different encoding (or allow filenames to be edited in an unsafe manner). The filename translation problem is something else. RISC OS has its own file naming conventions which are at odds with the rest of the world, and yet it has to interact with the rest of the world – in filesystems like HostFS and in cross-platform or ported software. Here’s some examples:
The problem of untangling this mess is particularly painful for command line software, which is expected to deal with some of these filenames and work out what the user actually meant. But it’s also a problem for software like HostFS, which has to interchange files using Windows/Linux/etc file naming so that they can be accessed via either RISC OS or the native system. It’s a problem that will only grow as more sharing happens between the RISC OS and non-RISC OS sides of things. My suggestion was at least to accept this as a problem and to shake out as much ambiguity as possible. There isn’t a 1:1 translation and I don’t think there will ever be, but stopping every piece of software from making its own ad-hoc decisions in this area would be helpful. |
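To make the translation problem above concrete, here is a minimal Python sketch of one commonly seen convention. It is an illustration under stated assumptions, not anything specified in this thread: it assumes that '.' and '/' swap roles between host and RISC OS leaf names, and that a RISC OS filetype is carried on the host as a three-hex-digit ",xxx" suffix. The function names are hypothetical.

```python
import re

# Assumption: '.' is the RISC OS directory separator, so a host '.' in a leaf
# name is shown as '/' on RISC OS, and vice versa.
_SWAP = str.maketrans("./", "/.")

# Assumption: the filetype travels on the host as a ",xxx" hex suffix.
_TYPE_SUFFIX = re.compile(r",([0-9a-fA-F]{3})$")


def host_to_riscos(leaf):
    """Translate one host leaf name; return (riscos_leaf, filetype or None)."""
    filetype = None
    match = _TYPE_SUFFIX.search(leaf)
    if match:
        filetype = int(match.group(1), 16)
        leaf = leaf[:match.start()]
    return leaf.translate(_SWAP), filetype


def riscos_to_host(leaf, filetype=None):
    """Translate one RISC OS leaf name back, re-attaching any filetype."""
    name = leaf.translate(_SWAP)
    if filetype is not None:
        name += ",%03x" % filetype
    return name
```

Even this simple mapping shows where ambiguity creeps in. Under these assumptions a host file called main.c becomes main/c, whereas a RISC OS user might expect to find it as main inside a directory called c; a purely character-level swap cannot decide between the two, which is why a single agreed rule would be more useful than each program guessing.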
Rick Murray (539) 13840 posts |
I think expecting HTML entities to work on any filing system is crossing the line. Best just don’t.
The reason for the diversion into international issues is that these days other filing systems can handle names that just cannot be represented under RISC OS. I’ll need to look at DOS to see what sort of 8.3 name the Japanese file gets. Probably something really useful like a bunch of question marks. But, yes, I agree, there should be a spec for what sort of translation takes place, and for what to do if multiple files on the host would be translated into the same name on RISC OS. |
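The "multiple host files, one RISC OS name" case can be detected mechanically once a translation rule is fixed. The Python sketch below is only an illustration: to_riscos_key is a hypothetical stand-in for whatever rule a spec would define. It assumes a '.'/'/' swap, a target that can only hold Latin-1 (so anything else degrades to '?'), and case-insensitive name matching. None of these rules comes from this thread.

```python
from collections import defaultdict


def to_riscos_key(host_leaf):
    """Hypothetical translation rule, used only to demonstrate collisions."""
    # Assumption: '.' and '/' swap roles between host and RISC OS names.
    swapped = host_leaf.translate(str.maketrans("./", "/."))
    # Assumption: the target is limited to Latin-1; other characters become '?'.
    latin1 = swapped.encode("latin-1", errors="replace").decode("latin-1")
    # Assumption: names are compared case-insensitively.
    return latin1.lower()


def find_collisions(host_leaves):
    """Group host names by translated key; keep only keys hit more than once."""
    groups = defaultdict(list)
    for leaf in host_leaves:
        groups[to_riscos_key(leaf)].append(leaf)
    return {key: names for key, names in groups.items() if len(names) > 1}
```

With this rule, Readme and README in the same host directory land on the same key, and so do any two names of equal length written entirely in characters outside Latin-1, which is the "bunch of question marks" outcome. A spec would still have to say what happens next (rename, hide, or refuse); the sketch only finds the clashes.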
Stuart Swales (8827) 1357 posts |
Just because it’s something that you don’t use… |