RISC OS Open: Forum: Filename Translation

Jun 6, 2021 8:51am

Colin Ferris (399) 1814 posts

Byzantium – Constantinople – Istanbul

Surprised they didn’t change ‘London’ when the Romans pulled out.

Jun 6, 2021 8:58am

Dave Higton (1515) 3526 posts

How painfully, embarrassingly, Little Englander.

Agreed. I find it embarrassing to read.

Jun 6, 2021 9:08am

GavinWraith (26) 1563 posts

I’m now suspecting that the other accents are all a means of using a letter or two less.

There you have it. Parchment and ink were expensive items in the middle ages. All kinds of shorthands were invented. A ‘q’ with a horizontal stroke through its vertical stave stood for ‘que’, a very common suffix in Latin. There was a similar shorthand for ‘pro’, ‘per’ and ‘prae’.

It would be interesting to strip the borrowed words from English (and French) and see what is left.

It has been done, many times. Paul (Francis) Jennings had some humorous pieces written in Anglish (or Roots English) in the Observer. See https://hyperleap.com/topic/Linguistic_purism_in_English.

All languages at all times have been in constant flux. Only because we give them names, and because they do not change that much in a single lifetime, do we foster the illusion that there are such things. All words are borrowed words, whether from across generations, borders or class divisions, with the exception of newly invented words. Shakespeare was particularly productive.

Jun 6, 2021 9:35am

Rick Murray (539) 13840 posts

Paul (Francis) Jennings had some humorous pieces written in Anglish (or Roots English ) in the Observer.

What I find amusing about all of this is that there’s a lot of whinging about the horrible influence of the Romance languages in English, and how it’s better to be “pure” and use words of Germanic origin.

Yeah, well, depending on where you choose to draw the line, Germanic is as foreign as Norman French.

English is a mush of everybody else’s words carefully mispronounced.

If you want to be pure, resurrect Brittonic, or there’s the door…

Jun 6, 2021 10:03am

Steve Pampling (1551) 8170 posts

English is a mush of everybody else’s words carefully mispronounced.

So, the major difference between UK English and USA English is the difference in the mispronunciation?

Jun 6, 2021 10:21am

Steve Drain (222) 1620 posts

I didn’t know about that aspect of French … just put the “s” back in.

There are a lot of nous Étiennes on this forum. ;-)

Jun 6, 2021 10:52am

Frederick Bambrough (1372) 837 posts

I’m now suspecting that the other accents are all a means of using a letter or two less. Like “aren’t” and “I’m” just now

Isn’t it just a written representation of spoken abbreviations rather than of itself?

Jun 6, 2021 11:23am

Steve Pampling (1551) 8170 posts

Isn’t it just a written representation of spoken abbreviations rather than of itself?

Isn’t writing just a hardcopy of the spoken word?

Jun 6, 2021 11:52am

Frederick Bambrough (1372) 837 posts

<Sigh!>

Jun 6, 2021 1:13pm

GavinWraith (26) 1563 posts

Reading silently is a relatively modern development, I believe. Modern readers can read the sense without the intervention of the auditory part of the brain. Libraries would otherwise be noisy places; as monastic libraries were, apparently. In ancient times reading silently was remarkable. Going further back, song, dance and gesture were merely performance, usually for the gods, and not separated.

Jun 6, 2021 6:29pm

Clive Semmens (2335) 3276 posts

Get outside of Latin (or Greek, or Cyrillic, or a handful of others like Georgian and Armenian) and the simplistic “accents are just replacements for missing letters” (or conversely, lots of letters in English are just replacements for missing accents) really doesn’t cut it.

Deva Nagari (for Hindi & Nepali) and similar scripts (for other Indian languages, Myanmari and Thai) have little marks that are a bit like accents (or diacritics in general, considering that cedillas and ogoneks aren’t accents) but aren’t in any sense replacements for letters.

Never mind Chinese, Japanese or Korean…and I don’t have the faintest idea where one character finishes and the next begins in Arabic.

Jun 6, 2021 7:01pm

Rick Murray (539) 13840 posts

Japanese

The two little dodahs that look like quotes change the sounds. For example the character for “to” (say: toe) becomes “do” (say: dough) with the quotes symbol added.
The little circle like a degree symbol changes the sound from, say, “ho” (say: hoe) to “po” (say: poe).
Wiki a table of katakana or hiragana for examples.
Kanji (the Chinese characters) work differently.

Jun 6, 2021 9:07pm

GavinWraith (26) 1563 posts

The last speaker of Ubykh is to be heard online somewhere, but the Wikipedia article says nothing about written Ubykh, if there ever was any. It would be interesting to see how ASCII could accommodate 84 consonants. Listening to Ubykh leaves one amazed at what the human voice can get up to.

Jun 6, 2021 9:58pm

Chris Mahoney (1684) 2165 posts

The two little dodahs that look like quotes change the sounds. For example the character for “to” (say: toe) becomes “do” (say: dough) with the quotes symbol added.
The little circle like a degree symbol changes the sound from, say, “ho” (say: hoe) to “po” (say: poe).

For those with the fonts installed:
と、ど、ほ、ぽ
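
For anyone wondering how this looks to software: Unicode can represent these marks as separate combining characters, which matters for filenames because the same visible name can be stored as two different code point sequences. A minimal sketch using Python's standard `unicodedata` module (the helper name `decompose` is just for illustration):

```python
import unicodedata

def decompose(ch: str) -> list:
    """Return the Unicode names of the code points in the NFD form of ch."""
    return [unicodedata.name(c) for c in unicodedata.normalize("NFD", ch)]

# Print each kana with the names of its decomposed code points.
for kana in "とどほぽ":
    print(kana, decompose(kana))

# The precomposed and decomposed spellings differ as strings
# until both are normalised to the same form.
precomposed = "ど"
decomposed = "と\u3099"  # base letter + combining voiced sound mark
print(precomposed == decomposed)
print(unicodedata.normalize("NFC", decomposed) == precomposed)
```

Which normal form a given filing system stores (if it normalises at all) varies, so a translation layer probably has to pick one and stick to it.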

Jun 7, 2021 7:15am

Clive Semmens (2335) 3276 posts

It would be interesting to see how ASCII could accommodate 84 consonants.

Crikey. I thought Hindi’s 50 consonants was bad enough. Luckily in my mid-thirties my mouth was still young enough to master their pronunciation – sadly my ears weren’t still young enough to hear/learn the difference between some pairs.

Jun 7, 2021 8:42am

GavinWraith (26) 1563 posts

While we wander the backstreets of Aldershot, I remember reading that though the aspirated and palatal dentals of Hindi are found in most of the languages of India, apparently they were not there in early Sanskrit or Proto-Indo-European. That suggests that they are a substrate feature. The incredible multiplicity of consonants in Ubykh is presumably a testament to the difficulty of moving around in the mountains of the Caucasus.

Jun 7, 2021 10:00am

Clive Semmens (2335) 3276 posts

though the aspirated and palatal dentals of Hindi are found in most of the languages of India, apparently they were not there in early Sanskrit or Proto-IndoEuropean

Aspirated/unaspirated, voiced/unvoiced, dental/palatal.

I’ve read the theory they weren’t in early Sanskrit too, but I’ve also read that the idea they weren’t is probably false, and it was just that the script didn’t capture the pronunciations as well as the modern script does. But of course pronunciation shifts over time anyway, and so do the boundaries between the sounds of consonants (and vowels, for that matter – Hindi has ten well-defined vowels that correspond exactly with the ten in the script; my wife has endless trouble with English’s uncountable ill-defined vowels that don’t correspond well at all with the five letters and various digraphs…)

Jun 7, 2021 10:19am

Andrew McCarthy (3688) 605 posts

Sigh. Are we moving into the realm of language theory and how it’s spoken versus filename translation? Yes, → Aldershot

Jun 7, 2021 10:38am

GavinWraith (26) 1563 posts

I suppose the state-space describing the human utterance (the disposition of the tongue, the force of the breath, … ) is a continuum, and each language quantizes it differently. Even listening to old BBC radio recordings brings home how much that has shifted in English within our lifetime. Without a time machine, little hope for knowing how our ancestors sounded. My late friend, JNI, in Bangladesh, has a niece who is a TV presenter, a classical dancer and a Sanskrit speaker – a beautiful and formidable lady. When introduced to one of Modi’s nationalist MPs she had the ironic satisfaction (being officially Muslim) of correcting his attempts at Sanskrit.

Jun 7, 2021 11:43am

Clive Semmens (2335) 3276 posts

Are we moving into the realm of language theory and how it’s spoken versus filename translation?

I did wonder about that, but on further consideration I think actually the latter ought to take the former into account. Granted, how it’s spoken may not matter much (although I’m not even sure about that), but language theory surely must inform any attempt to sort out filename translation.

Jun 7, 2021 12:08pm

Rick Murray (539) 13840 posts

surely must inform any attempt to sort out filename translation.

If nothing else, this diversion must certainly ram home the idea that “stick to plain ASCII” is hopelessly antiquated. We could come up with all sorts of clever ideas, or plan a way to make things support some form of Unicode and do it correctly the one time…

Jun 7, 2021 1:05pm

GavinWraith (26) 1563 posts

Apologies for the diversions. For use in Textile and webpages I find the named HTML entities

&alpha; (α), &beta; (β), &gamma; (γ), . . . .

easy to use. They work with NetSurf. Of course they are rather limited. To use these in filenames the ampersand has to be escaped. Not often do you see User Root Directory used in anger in RISC OS. Backwards compatibility must make it impossible to pension symbols off, even from sinecures like this.

Jun 7, 2021 1:36pm

Theo Markettos (89) 919 posts

Sigh, I get really fed up with thread drift on this forum.

For context, my OP, as cited, was not about internationalisation. Displayed filenames should be UTF-8, end of problem. 1980s 8-bit character sets are dead, buried, at the crossroads with a stake of garlic through their heart. Software that doesn’t know UTF-8 shouldn’t cause issues as long as it doesn’t re-translate the filename into a different encoding (or allow filenames to be edited in an unsafe manner).
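
To illustrate the re-translation point, here is a rough Python sketch. It assumes ISO 8859-1 (`latin-1`) as a stand-in for an old 8-bit character set, which is not exactly what RISC OS uses, and the function names are made up for the example. Software that passes the bytes through untouched does no harm; software that reinterprets them and writes the result back as UTF-8 changes the name on disc.

```python
def misread_as_latin1(name: str) -> str:
    """Simulate software that treats UTF-8 filename bytes as Latin-1 text."""
    return name.encode("utf-8").decode("latin-1")

def bytes_survive_passthrough(name: str) -> bool:
    """True if handing the misread text back unchanged restores the bytes."""
    garbled = misread_as_latin1(name)
    return garbled.encode("latin-1") == name.encode("utf-8")

def bytes_survive_reencoding(name: str) -> bool:
    """True if re-encoding the misread text as UTF-8 keeps the same bytes."""
    garbled = misread_as_latin1(name)
    return garbled.encode("utf-8") == name.encode("utf-8")

print(misread_as_latin1("café"))          # what the user would see displayed
print(bytes_survive_passthrough("café"))  # untouched bytes are still fine
print(bytes_survive_reencoding("café"))   # re-translation alters the filename
```

Plain ASCII names survive both paths, which is probably why this kind of bug goes unnoticed for so long.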

The filename translation problem is something else. RISC OS has its own file naming conventions which are at odds with the rest of the world, yet it has to interact with the rest of the world – in filesystems like HostFS and in cross-platform or ported software. Here are some examples:

file
file.txt
file.txt
file,fff
file.txt,fff
file.txt,ffb
file/txt
file/txt,ffb
.file
dir.file
dir/file.txt
dir/file.txt,faf
/dirA/dirB/file.txt
$.file.dir
:4.$.file.txt
server:dir/file
sysvar:dir.file

The problem of untangling this mess is one that’s particularly painful for command line software which expects to deal with some of these filenames and work out what the user actually meant. But it’s also a problem for software like HostFS which has to interchange files using the Windows/Linux/etc file naming which can be accessed via RISC OS or the native system.

It’s a problem that will only grow as more sharing happens between the RISC OS and non-RISC OS side of things.

My suggestion was at least to accept this as a problem and to shake out as much ambiguity as possible. There isn’t a 1:1 translation and I don’t think there will ever be, but stopping everything from making its own ad-hoc decisions in this area would be helpful.
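
As a concrete illustration of the kind of rule being discussed, here is a minimal Python sketch of one commonly seen convention: swap `.` and `/`, and carry the RISC OS filetype as a `,xxx` hex suffix on the host name. This is an assumption for the sake of example, not a spec and not what any particular filing system is confirmed to do; the function names are invented, and it makes no attempt to handle `$`, `:4`, `server:` or system-variable prefixes.

```python
import re
from typing import Optional, Tuple

# '.' and '/' trade places between the two naming schemes.
_SWAP = str.maketrans("./", "/.")
# A trailing ",xxx" of three hex digits is taken to be a filetype.
_TYPE_SUFFIX = re.compile(r",([0-9a-fA-F]{3})$")

def host_to_riscos(host_path: str) -> Tuple[str, Optional[int]]:
    """Translate a host-style relative path to (RISC OS path, filetype)."""
    filetype = None
    match = _TYPE_SUFFIX.search(host_path)
    if match:
        filetype = int(match.group(1), 16)
        host_path = host_path[:match.start()]
    return host_path.translate(_SWAP), filetype

def riscos_to_host(ro_path: str, filetype: Optional[int] = None) -> str:
    """Translate a RISC OS relative path (and filetype) to a host-style path."""
    host = ro_path.translate(_SWAP)
    if filetype is not None:
        host += ",{:03x}".format(filetype)
    return host

print(host_to_riscos("dir/file.txt,faf"))
print(riscos_to_host("dir.file/txt", 0xFFF))
```

Note what it cannot do: given a bare `dir.file` on a command line, nothing in the string says whether the user meant a RISC OS path (file inside dir) or a host leafname with an extension. The sketch just assumes whichever direction it was called in, which is exactly the ad-hoc guessing that a shared spec would replace.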

Jun 7, 2021 2:08pm

Rick Murray (539) 13840 posts

To use these in filenames the ampersand has to be escaped.

I think expecting HTML entities to work on any filing system is crossing the line. Best just don’t.

stop everything making their own ad-hoc decisions in this area would be helpful.

The reason for the diversion into international issues is that these days other filing systems can handle names that just cannot be represented under RISC OS. I’ll need to look at DOS to see what sort of 8.3 name the Japanese file gets. Probably something really useful like a bunch of question marks.

But, yes, I agree, there should be a spec for what sort of translation takes place and what to do if there are multiple files (on the host) that would be translated into one file on RISC OS.
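
A quick Python sketch of the collision case, to show why it needs a defined answer. The translation rule here (`to_riscos_leaf`) is deliberately crude and entirely hypothetical: dots become slashes and anything outside ASCII becomes `?`, in the spirit of the question marks mentioned above.

```python
from collections import defaultdict

def to_riscos_leaf(host_leaf: str) -> str:
    """Hypothetical lossy translation: dots to slashes, non-ASCII to '?'."""
    swapped = host_leaf.replace(".", "/")
    return "".join(c if ord(c) < 128 else "?" for c in swapped)

def find_collisions(host_leaves):
    """Group host leafnames by translated name; keep groups with clashes."""
    groups = defaultdict(list)
    for leaf in host_leaves:
        groups[to_riscos_leaf(leaf)].append(leaf)
    return {ro: hosts for ro, hosts in groups.items() if len(hosts) > 1}

listing = ["とど.txt", "ほぽ.txt", "notes.txt"]
for ro_name, hosts in find_collisions(listing).items():
    print(ro_name, "<-", hosts)
```

Any lossy rule will produce groups like this sooner or later; the spec question is what the RISC OS side should show for the second and later members of each group.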

Jun 7, 2021 2:12pm

Stuart Swales (8827) 1357 posts

Not often do you see User Root Directory used in anger in RISC OS

Just because it’s something that you don’t use…

