Territory_Collate
Steve Drain (222) 1620 posts |
This is more an observation than a suggested alteration, but it might stir some comment. ;-)

I have used Territory_Collate a few times in BASIC to check the equality of strings, for its ability to ignore case or accents. The PRM says: “You should always use this call to compare strings”, and I see that the C function strcoll calls it. So far so good.

This last week, in a BASIC/Basalt context 1, I have been trying to use it to compare strings for order, and could not see why it would not do what I was expecting. Of course, I was expecting the order to be based on character numbers, as it is in BASIC comparisons, so “A”<“a” and “B”<“a” etc; I found “A”<“a” but “B”>“a”. ;-(

I therefore wrote my own collate routine, but followed up by looking at the source. I found that things were rather more elaborate than my simple requirement, and indeed, the second character order above is specified in the UK territory:

; OSmith 29-Apr-92 Collation sequence is as follows:
; 01234567890123456789012345678901
; [SP][NBSP]!"#$%&'()*+,-[SHY]./0123456789:;<=>?
; @AaÂâÄäÁáÀàÃãÅåÆæBbCcÇçDdÐðEeÊêËëÉéÈèFfžŸGgHhIiÎîÏïÍíÌìJjKkLlMm
; NnÑñOoÔôÖöÓóÒòÕõš›ØøPpQqRrSsßTtÞþUuÛûÜüÚúÙùVvWw‚XxYy…†ÿÝýZz[\]^_
; `{|}~[DEL]
; €ƒ„‡ˆ‰Š‹ŒŽ‘’“”•–—˜™œ
; ¡¢£¤¥¦§¨©ª«¬®¯°±¹²³´µ¶·¸º»¼½¾¿
; ×
; ÷

I suppose my question is, how is such an order decided? Is there agreement across operating systems?

1 Writing a |
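To illustrate why “B” can sort after “a”, here is a minimal Python sketch of comparison against an explicit collation sequence. It is not the Territory module's code: `SEQUENCE` is a hypothetical, shortened table holding only the unaccented letters in the interleaved order shown above, and `collate_key` and `collate` are names invented for this example. Placing characters outside the table after it is also an assumption of the sketch (the real table puts space, punctuation and digits first).

```python
# Hypothetical, shortened collation sequence: each capital is immediately
# followed by its lower-case form, as for the unaccented letters above.
SEQUENCE = "AaBbCcDdEeFfGgHhIiJjKkLlMmNnOoPpQqRrSsTtUuVvWwXxYyZz"
RANK = {ch: i for i, ch in enumerate(SEQUENCE)}


def collate_key(s):
    """Map each character to its position in SEQUENCE.

    Characters not in SEQUENCE are ranked after it, by code point
    (an assumption made only to keep the sketch total).
    """
    return [RANK.get(ch, len(SEQUENCE) + ord(ch)) for ch in s]


def collate(a, b):
    """Return -1, 0 or 1, in the style of a strcoll result."""
    ka, kb = collate_key(a), collate_key(b)
    return (ka > kb) - (ka < kb)


if __name__ == "__main__":
    print(collate("A", "a"))  # "A" precedes "a" in the sequence
    print(collate("B", "a"))  # "B" follows "a" in the sequence
    print("B" < "a")          # plain code-point comparison, as in BASIC
```

With a table like this, “A” and “a” are neighbours and “B” comes after both, whereas a character-number comparison puts every capital before every lower-case letter.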
Jeffrey Lee (213) 6048 posts |
There are standards for how to perform collation (e.g. ISO 14651), including a file format that can describe the collation rules so that a general-purpose algorithm can be used to perform the collation. But I don’t think RISC OS follows any of the standards (I suspect most/all of them didn’t even exist when the code was first written). E.g. the accent ordering in the collation sequence you posted above doesn’t seem to match the European ordering rules |
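As a rough illustration of how the standard algorithms differ from a single flat sequence: ISO 14651 and the Unicode Collation Algorithm compare strings on several levels, so that base letters decide first, accents break ties, and case breaks any remaining ties. The Python sketch below imitates that idea only. It does not use the standards' weight tables, `multilevel_key` is a name made up for this example, and the choice to put lower case before upper case is an assumption.

```python
import unicodedata


def multilevel_key(s):
    """Build a simplified three-level sort key (illustration only)."""
    # Split accented letters into a base letter plus combining marks.
    decomposed = unicodedata.normalize("NFD", s)
    base = [c for c in decomposed if not unicodedata.combining(c)]
    # Level 1: base letters with case folded away.
    primary = [c.lower() for c in base]
    # Level 2: the combining accents, consulted only if level 1 ties.
    secondary = [c for c in decomposed if unicodedata.combining(c)]
    # Level 3: case, consulted last (False sorts first, so lower case first).
    tertiary = [c.isupper() for c in base]
    return (primary, secondary, tertiary)


if __name__ == "__main__":
    print(sorted(["b", "a", "B", "A"], key=multilevel_key))
    print(sorted(["f", "é", "e"], key=multilevel_key))
```

Under this scheme an accented letter stays next to its base letter, much as in the RISC OS table, but the relative order of case and accent variants comes from the level rules rather than from one hand-written sequence.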
Steve Drain (222) 1620 posts |
Thanks for the references Jeffrey. It answers my question, but probably opens up others.
The date of most of the Territory source appears to be 1992, so that is likely. ;-) |
Steve Pampling (1551) 8170 posts |
The earliest edition of ISO 14651 appears to date from about 2001; its precursor is the original Unicode standards, which appeared in the 1993 – 1999 period. |