Important software compatibility notice
Steve Drain (222) 1620 posts |
If that is VARINT, it is not the same subroutine I have seen in earlier versions of BASIC V, nor is it like the other routines that load words from unaligned addresses, such as SIBs and Floats. Has someone been rewriting it? Even so, I do not immediately see where the unaligned load is, but then I may be showing how backward I have become. I still recognise that the BASIC program has attempted to access zero page, so ZeroPain has to do something. I am getting worried by this change. I can appreciate the technical necessity for removing the zero-page limitations, but I fear for the software that will be lost along the way. I am inclined to be an ostrich, bury my head in the sand and hope it goes away. ;-( |
Jeffrey Lee (213) 6048 posts |
Correct. For simplicity I didn’t bother to add support for unaligned loads (except LDRB). Hmm, good point. I should probably fix that. Of course, for 99% of the addresses ZeroPain emulates, alignment doesn’t matter anyway, since all it does is return zero! |
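Jeffrey's point about alignment can be modelled roughly as follows. A faulting load from the emulated low region reads as zero, LDRB is always handled, and the unsupported case is an unaligned word load. This is a hypothetical Python model of that policy, not ZeroPain's code. The names `emulate_load` and `UnsupportedAccess` are invented, and `ZERO_PAGE_LIMIT` (16K) is an assumed region size.

```python
# Hypothetical model of a ZeroPain-style load emulation policy (not the real module).
ZERO_PAGE_LIMIT = 0x4000  # assumed size of the emulated low region


class UnsupportedAccess(Exception):
    """Raised for accesses this simplified model does not emulate."""


def emulate_load(addr: int, size: int) -> int:
    """Return the value a faulting load from low memory would see.

    size is 1 for LDRB or 4 for LDR. Every emulated location reads as zero,
    which is why alignment is irrelevant for almost all addresses.
    """
    if not 0 <= addr < ZERO_PAGE_LIMIT:
        raise UnsupportedAccess(f"address {addr:#x} is outside the emulated region")
    if size == 1:
        return 0  # LDRB: a single byte can never be misaligned
    if size == 4 and addr % 4 == 0:
        return 0  # aligned LDR
    # The gap Jeffrey mentions: unaligned word loads are not emulated.
    raise UnsupportedAccess(f"unaligned {size}-byte load at {addr:#x}")
```

Adding unaligned support to such a model would only mean accepting the unaligned case. Since every byte reads as zero, the assembled word is zero as well.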
Martin Avison (27) 1494 posts |
I think VARINT uses a macro LDW (though I cannot yet find where it is defined) to LoaD a Word, which uses different code depending on which machine it is built for. The Pi version does not use unaligned loads; the Iyonix version does. The standalone version will run on any machine, AFAIK. |
Steve Drain (222) 1620 posts |
Now I do; I was getting my EQ and NE crossed. Why does the code do that? All the routines I have looked at in older versions of BASIC get unaligned words without doing unaligned loads. Of course, I may have missed some, but for Basalt I looked at a range of options for this common requirement and only one, from 1988 and not in BASIC, actually used that method. I am getting too old. ;-) |
Steve Drain (222) 1620 posts |
The penalty for never using unaligned loads is so trivial that I cannot see why the distinction is even made. The Castle version of BASIC I have from a long while back does not make it. |
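The portable alternative Steve describes reads the four bytes individually and combines them, so no unaligned word load is ever issued. This Python sketch models only the arithmetic for a little-endian word. It is not the LDW definition from `hdr.Macros`, and `load_word_unaligned` is an invented name. In ARM code this comes to roughly four LDRBs plus three ORRs with shifted operands.

```python
def load_word_unaligned(mem: bytes, addr: int) -> int:
    """Assemble a little-endian 32-bit word at any address from four byte loads."""
    b0 = mem[addr]      # each of these corresponds to one LDRB
    b1 = mem[addr + 1]
    b2 = mem[addr + 2]
    b3 = mem[addr + 3]
    return b0 | (b1 << 8) | (b2 << 16) | (b3 << 24)
```

The result is the same whatever the alignment of `addr`, which is why the byte route works on every machine.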
Martin Avison (27) 1494 posts |
Found LDW – in castle.RiscOS.Sources.Programmer.HdrSrc.hdr.Macros |
Adrian Lees (1349) 122 posts |
I shan’t arrogantly assert that Aemulor does not unintentionally read from zero page locations somewhere in its codebase, but it does deliberately read and modify a couple of kernel workspace locations, because there is no other way to achieve what it does and keep the system stable. It is perfectly happy with raised processor vectors and with the first 1K protected from writes (RaiseProcVecs and Prot1K, which I wrote as development aids over a decade ago). Those catch all realistic unintentional accesses (i.e. bugs), since NULL pointer dereferences will lie within this region and structures are rarely as large as that.

I must also now assert that Aemulor will honour any such attempted accesses from 26-bit software, and so is apt to be blamed (by diagnostic/fault-reporting tools or by users) as a result. Without exercising every code path of any piece of software, which I consider to be the task of the developer and not the user, it is impossible to be confident that it does not contain such faults. Any code that catches, logs or fixes up such faults can only aspire to restore function; inevitably there will be cases for which it cannot work. That is useful on the machines of those developers still writing software, but far less so for the user who depends upon software that does not immediately get modified and re-released.

Wearing my ‘RISC OS user’ hat, there are far greater deficiencies in the platform that I would prefer to see addressed, and even wearing my developer hat, there are far greater causes of potential system instability. Still, I shall be interested to see what fraction of the unintentional accesses exposed actually constitute a stability threat, either to the culprit or to the rest of the system. |
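The reasoning behind protecting only the first 1K can be modelled as follows. A NULL pointer plus a small field offset lands below &400, so a write through it is caught. Only a structure larger than 1K could escape. The field offsets below are an invented example structure, and the function names are hypothetical; this is not Prot1K's code.

```python
PROTECTED_LIMIT = 0x400  # first 1K of the address space, as protected by Prot1K

# Invented example structure: field name -> byte offset, as a C compiler might lay it out.
EXAMPLE_FIELD_OFFSETS = {"next": 0x00, "flags": 0x04, "name": 0x08, "buffer": 0x40}


def field_address(base: int, field: str) -> int:
    """Address touched when code accesses base->field."""
    return base + EXAMPLE_FIELD_OFFSETS[field]


def caught_by_prot1k(fault_addr: int) -> bool:
    """True if a write at fault_addr lands in the protected first 1K."""
    return 0 <= fault_addr < PROTECTED_LIMIT
```

With a NULL base, every field of this structure falls inside the protected region, while a valid pointer does not.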
David Pitt (102) 743 posts |
Rightly or wrongly, I take the response to mean that Aemulor should, in general terms, be good with the zero page protection, though issues could be expected. In my original post I was trying not to apportion blame; I don’t know whether the issue is to do with Aemulor or ZeroPain.

I cannot get AemulorPro 2.34 even to load; no 26-bit software is involved. To confirm my original report I have installed the high vectors ROM OS 5.23 (05-Jul-15) on the Iyonix, only to find that AemulorPro 2.34 will not load there either. The first attempt aborts and a second attempt gives an “overlapping areas” error, and there is pain. It’s all good on OS 5.22. I can further confirm that Aemulor will not start on the Raspberry Pi even if !Boot has not been run.

From the Iyonix:

12 Jul 07:07:20 000 80000002: Error from (unknown): Internal error: abort on data transfer at &02113E70
12 Jul 07:07:54 000 000001C6: Error from (unknown): Overlapping areas

Time: Sun Jul 12 07:07:20 2015
Location: Offset 0001f9dc in module Aemulor
Current Wimp task: Unknown
Last app to start: BASIC -quit "ADFS::Iyonix4.$.Boot.Resources.ToBeRun.ptr_col"

R0 = 00000390 R1 = 00000000 R2 = faff33fc R3 = 202c9a90
R4 = fc019794 R5 = 202c9a28 R6 = 00000028 R7 = 202c9b10
R8 = 00000000 R9 = 01c00000 R10 = e1a0f00e R11 = 2027423c
R12 = fa207c58 R13 = fa207c54 R14 = 20283364 R15 = 20293658
DFAR = 00000394
Mode SVC32 Flags nZCv if PSR = 60000013

20293610 : e0804004 : ADD R4,R0,R4
20293614 : ea000007 : B &20293638
20293618 : 202c9ae8 : EORCS R9,R12,R8,ROR #21
2029361c : 202c9b10 : EORCS R9,R12,R0,LSL R11
20293620 : 6c697455 : STCVSL CP4,C7,[R9],#-340
20293624 : 4d797469 : LDCMIL CP4,C7,[R9,#-420]!
20293628 : 6c75646f : LDCVSL CP4,C6,[R5],#-444
2029362c : 00000065 : ANDEQ R0,R0,R5,RRX
20293630 : e59d0004 : LDR R0,[R13,#4]
20293634 : e5b04198 : LDR R4,[R0,#408]!
20293638 : e5950064 : LDR R0,[R5,#100]
2029363c : e35000aa : CMP R0,#&AA ; ="™"
20293640 : b3a00e26 : MOVLT R0,#&0260 ; =608
20293644 : a3a00e39 : MOVGE R0,#&0390 ; =912
20293648 : e5850040 : STR R0,[R5,#64]
2029364c : e3a09507 : MOV R9,#&01C00000
20293650 * e5901004 * LDR R1,[R0,#4]
20293654 : e3710001 : CMN R1,#1
20293658 : 1a000004 : BNE &20293670
2029365c : e5951068 : LDR R1,[R5,#104]
20293660 : e2812902 : ADD R2,R1,#&8000
20293664 : e5901014 : LDR R1,[R0,#20]
20293668 : e1520001 : CMP R2,R1
2029366c : 05a09014 : STREQ R9,[R0,#20]!
20293670 : e1a02006 : MOV R2,R6
20293674 : e1a01004 : MOV R1,R4
20293678 : e1a00007 : MOV R0,R7
2029367c : eb000c51 : BL &202967C8
20293680 : e3500000 : CMP R0,#0
20293684 : 12848a02 : ADDNE R8,R4,#&2000
20293688 : 12444a02 : SUBNE R4,R4,#&2000
2029368c : 1a000001 : BNE &20293698
--------------------------------------------------------------------------------

Time: Sun Jul 12 07:07:20 2015
Location: Offset 00004fa0 in module Aemulor
Current Wimp task: Unknown
Last app to start: BASIC -quit "ADFS::Iyonix4.$.Boot.Resources.ToBeRun.ptr_col"

R0 = 0000000c R1 = 00000002 R2 = 001e8480 R3 = 00000000
R4 = 00000000 R5 = 202c9a28 R6 = 202c9aac R7 = 20292544
R8 = 202c9a94 R9 = fb407d34 R10 = e1a0f00e R11 = 2027423c
R12 = 0bebc200 R13 = fa207c78 R14 = 20293acc R15 = 20278c1c
DFAR = 0000000c
Mode SVC32 Flags nZCv if PSR = 60000013

20278bd4 : 20278d1c : EORCS R8,R7,R12,LSL R13
20278bd8 : 20278d50 : EORCS R8,R7,R0,ASR R13
20278bdc : 20278d84 : EORCS R8,R7,R4,LSL #27
20278be0 : 20278db8 : Undefined instruction
20278be4 : 20278dec : EORCS R8,R7,R12,ROR #27
20278be8 : 20278e20 : EORCS R8,R7,R0,LSR #28
20278bec : 20278e54 : EORCS R8,R7,R4,ASR R14
20278bf0 : 20278e8c : EORCS R8,R7,R12,LSL #29
20278bf4 : 20278ec4 : EORCS R8,R7,R4,ASR #29
20278bf8 : 20278efc : Undefined instruction
20278bfc : 20278f34 : EORCS R8,R7,R4,LSR PC ; *** Shift by R15
20278c00 : 20278f70 : EORCS R8,R7,R0,ROR PC ; *** Shift by R15
20278c04 : 2027a2a0 : EORCS R10,R7,R0,LSR #5
20278c08 : 20278fac : EORCS R8,R7,R12,LSR #31
20278c0c : 000e59ff : Undefined instruction
20278c10 : e3a0000c : MOV R0,#&0C ; =12
20278c14 * e4902008 * LDR R2,[R0],#8
20278c18 : e51f3014 : LDR R3,&20278C0C
20278c1c : e1a0c00e : MOV R12,R14
20278c20 : e10f1000 : MRS R1,CPSR
20278c24 : e1330622 : TEQ R3,R2,LSR #12
20278c28 : 01a02a02 : MOVEQ R2,R2,LSL #20
20278c2c : 128f0b02 : ADRNE R0,&20279434
20278c30 : 12800f85 : ADDNE R0,R0,#&0214 ; =532
20278c34 : 00800a22 : ADDEQ R0,R0,R2,LSR #20
20278c38 : e58f0a2c : STR R0,&2027966C
20278c3c : e28f1b05 : ADR R1,&2027A044
20278c40 : e2811f97 : ADD R1,R1,#&025C ; =604
20278c44 : e3a00003 : MOV R0,#3
20278c48 : 158f19f8 : STRNE R1,&20279648
20278c4c : 128f1b02 : ADRNE R1,&20279454
20278c50 : 12811e1f : ADDNE R1,R1,#&01F0 ; =496
-------------------------------------------------------------------------------- |
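Both dumps show a fixed low address being dereferenced. In the first, R0 = &390 and the faulting LDR R1,[R0,#4] gives DFAR = &394. In the second, the code loads the word at hardware vector &0C (DFAR = &0C) and tests its top 20 bits against &E59FF, which is the encoding of LDR PC,[PC,#imm]. On a match it appears to add the 12-bit offset to locate the handler pointer. That reading is an inference from the disassembly, not from Aemulor's source. The Python below models only that decoding step, and the function name is hypothetical.

```python
LDR_PC_PC_IMM = 0xE59FF  # top 20 bits of "LDR PC,[PC,#+imm12]" (cond AL, P=1, U=1, L=1)


def vector_literal_address(vector_addr: int, insn: int):
    """If insn is LDR PC,[PC,#imm12], return the address of the word it loads.

    In ARM state the PC reads as the instruction's address plus 8, so the
    literal lives at vector_addr + 8 + imm12. Returns None for anything else.
    """
    if (insn >> 12) != LDR_PC_PC_IMM:
        return None
    return vector_addr + 8 + (insn & 0xFFF)
```

This matches the dump's arithmetic: `LDR R2,[R0],#8` leaves R0 = &0C + 8, and `ADDEQ R0,R0,R2,LSR #20` then adds the 12-bit offset.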
Adrian Lees (1349) 122 posts |
Please do not clutter the forums with Aemulor-related posts. It will not work until I have had time to study what has been changed in RISC OS and adapt Aemulor accordingly. Aemulor handles many changes to the OS by using pattern-matching techniques when it has to resort to low-level patching because there simply isn’t an OS API call to do what’s required. Current versions, however, cannot cope with the kernel workspace disappearing and moving to another address. In the specific case of Aemulor, the patching covers the memory remapping required to execute 26-bit code and the shrinking of the application space to 28MB. It is not lazy or erroneous coding; were this patching not performed, the system would crash as soon as any 26- or 32-bit application exceeded that. Other low-level system extension code that adds functionality to the OS could be subject to breakage, hopefully when first loaded rather than unpredictably during use, but users will be reliant upon the developer still being active. (Aside: in point of fact, it is not possible to load Aemulor without running 26-bit code, because it supplies some 26-bit modules for use by emulated applications.) |
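Adrian doesn't describe Aemulor's matcher, so this is only a generic illustration of pattern matching when patching code. Each word of the signature carries a mask, so fields that vary between OS builds (registers, offsets) can be ignored. The function name is hypothetical. The example pattern words are taken from the first abort dump above.

```python
from typing import Optional, Sequence, Tuple


def find_signature(code: Sequence[int], pattern: Sequence[Tuple[int, int]]) -> Optional[int]:
    """Return the index of the first word where every (value, mask) pair matches.

    A code word w matches its pair when (w & mask) == value, so the mask can
    blank out fields that are allowed to differ. Returns None if absent.
    """
    n = len(pattern)
    for start in range(len(code) - n + 1):
        if all((code[start + i] & mask) == value for i, (value, mask) in enumerate(pattern)):
            return start
    return None


# "LDR Rx,[R0,#4]" followed by "CMN Rx,#1", with the register fields masked out.
PATTERN = [(0xE5900004, 0xFFFF0FFF), (0xE3700001, 0xFFF0FFFF)]
```

Masking the Rd and Rn fields lets the same signature match a build that happens to use R3 instead of R1.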
Steve Pampling (1551) 8170 posts |
I did wonder whether this change would break Aemulor as the change hits at the heart of the patching being done by Aemulor to allow 26-bit apps/modules to operate. |
Jeffrey Lee (213) 6048 posts |
SystemDisc and NewsUK are two things affected by this change too. I’m unsure how or why NewsUK is affected, my guess was the FPEmulator but there are fixes for that… It looks like NewsUK is calling window_extract_gadget_info with a bad object template – the ‘templ’ parameter just points to a buffer full of zeros. |
Bryn Evans (2091) 31 posts |
I have to add another ‘not working’ app to the list! RiscOSM, the mapping program, fails to start, with errors, if I try to use it. This is on a Raspberry Pi 1 with ZeroPain installed in both cases. The authors of RiscOSM have been notified of the problem. |
Jon Abbott (1421) 2651 posts |
In light of this change, could OS_ReadSysInfo 6 be extended to return the address of VecPtrTab, or could some other legal means of requesting the current vector claimants be added? At the minute ADFFS gets the current vector owners directly from &7D8. That address is configurable in its !Boot file, so it could be changed to its new location, but I’d rather use a legal means to get the vector claimants. I believe OS_ReadSysInfo 6 item 23 returns VecPtrTab in RO6, but it doesn’t look like it’s been added to RO5 yet. |
Jon Abbott (1421) 2651 posts |
At each boot, I get 256KB of logs which repeat the three addresses below for just about every file the Filer runs. It looks like it’s GSREAD_XPandGetNextByte that’s triggering them.

R0 = 00000001 R1 = 80000113 R2 = 60000000 R3 = fa207690
fc01ebc8 : e3a0c000 : MOV R12,#0

Time: Fri Jan 2 00:00:28 1970
R0 = 00000001 R1 = 00000000 R2 = 40000000 R3 = fa207690
fc01ec04 : e52d0004 : STR R0,[R13,#-4]!

Time: Fri Jan 2 00:00:28 1970
R0 = 00000001 R1 = 00000000 R2 = 40000000 R3 = fa207690
fc01ec5c : 03822201 : ORREQ R2,R2,#&10000000 |
Malcolm Hussain-Gambles (1596) 811 posts |
@Jeffrey, thanks for that. Once I get a machine I can use, I’ll fix it (the Pandaboard is dead and the Pi seems to hard crash on DHCP at the moment). |
Jeffrey Lee (213) 6048 posts |
Sure, I can add VecPtrTab to the list – at the moment the DebugTools module is using a hardcoded address for that (and perhaps a couple of other things), so it would be good to get that fixed.
Interesting! At a guess I’d say that it’s something extra that you’ve added to the boot sequence, rather than a problem with the basic boot sequence itself (if you were running an old boot sequence then I’d expect maybe one or two null pointer dereferences, but not one per file). Maybe try using the “last app to start” to narrow down where it’s coming from? It must be something that was loaded after ZeroPain, at least. |
Adrian Lees (1349) 122 posts |
The change has a far greater impact upon the 26-bit apps/modules themselves than upon Aemulor itself, and is antithetical to what Aemulor is trying to do, being predicated on the idea that reads are bugs and that it is possible to change the code to fix them. This obviously is not true for the vast majority of what Aemulor is trying to run and ADFFS will surely encounter similar problems. What concerns me most, however, is that there may well be a sizeable body of 32-bit programs that inadvertently performed harmless reads and will not be updated, diminishing the platform as a whole, whilst providing – IMHO – questionable stability gains for the OS itself. Obviously the kernel should have its own private workspace and be protected against unintended changes, and equally applications that inadvertently read from /and use/ kernel workspace values need to be protected against changes within that workspace, but the fact that Aemulor is still being used 13 years after its creation is proof that that’s not a realistic scenario.
On the subject of software that, as in this case, intentionally reads from and uses kernel workspace data, and particularly software that assumes the data structures employed by the kernel, I really don’t think introducing yet more OS_ReadSysInfo 6 values is the way to go, because it doesn’t address the problem. If the software must be modified anyway to restore function, isn’t it time for an API extension or another approach, in an effort to obviate further maintenance of that very same code? |
Jon Abbott (1421) 2651 posts |
Relocating Page Zero shouldn’t affect ADFFS; long term, it actually helps, as I had to code a load interpreter to check for Page Zero reads. With Page Zero gone, ADFFS can translate all LDR/LDM instructions and trap Page Zero reads via aborts, which is an order of magnitude faster. ADFFS doesn’t actually allow any Page Zero reads. Instead it interprets them as legal OS values and, if it can’t do that, reports an “Unhandled read/write in Page Zero” and terminates the program. For the 50+ games ADFFS gets running on the Pi, I took the route of fixing the game code instead of returning an arbitrary value, primarily so I could see what “legal” reads games were doing in Page Zero. To date, the only “legal” reads I’ve seen are the hardware vectors; all the others have been coding errors, mostly in C code. At some point I’ll switch it to return 0 for any unhandled Page Zero reads, once I’m confident it’s handling everything required.
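The policy Jon describes can be sketched like this. Known reads are translated to legal values, and anything else either stops the program (the current behaviour) or reads as 0 (the planned behaviour). The `handlers` table and all names are hypothetical stand-ins for ADFFS's internal translations, which are not shown in the thread.

```python
from typing import Callable, Dict


class UnhandledPageZeroAccess(Exception):
    """Stands in for ADFFS reporting the access and terminating the program."""


def page_zero_read(addr: int, handlers: Dict[int, Callable[[], int]], strict: bool = True) -> int:
    """Translate a trapped Page Zero read into a legal value.

    handlers maps known addresses (e.g. hardware vectors) to functions that
    produce the value the program should see.
    """
    handler = handlers.get(addr)
    if handler is not None:
        return handler()
    if strict:  # current behaviour: refuse and stop
        raise UnhandledPageZeroAccess(f"Unhandled read/write in Page Zero at {addr:#x}")
    return 0    # planned behaviour: unhandled reads return 0
```

Keeping `strict` on during development is what exposes which reads need a handler, as in Jon's survey of the games.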
It needs adding anyway, as it’s used in various places – which is probably why it was added in RO6. I’d prefer a legal public means to read the current vector claimant, OS_Claim,vector,0,0 perhaps? |
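OS_Claim puts a new claimant at the head of the vector's chain, so it is called first. That makes the "current claimant" Jon asks about simply the first entry. The toy model below illustrates this. Its layout is invented and is not the kernel's VecPtrTab format. The removal of an identical earlier (code, R12) entry is modelled on OS_Claim's documented behaviour. The `current_claimant` query is a hypothetical stand-in for the proposed API.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Claimant:
    code: int  # routine address
    r12: int   # workspace value passed in R12


class VectorTable:
    """Toy model of per-vector claimant chains (not the kernel's real layout)."""

    def __init__(self) -> None:
        self.chains: Dict[int, List[Claimant]] = {}

    def claim(self, vector: int, code: int, r12: int) -> None:
        chain = self.chains.setdefault(vector, [])
        entry = Claimant(code, r12)
        if entry in chain:
            chain.remove(entry)  # an identical earlier claim is dropped first
        chain.insert(0, entry)   # the newest claimant is called first

    def current_claimant(self, vector: int) -> Optional[Claimant]:
        chain = self.chains.get(vector)
        return chain[0] if chain else None
```

A query like this would let ADFFS find the current owner without knowing where or how the kernel stores its chains.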
Jeffrey Lee (213) 6048 posts |
Try not to worry too much about unmaintained software being broken – both myself and ROOL are fully aware of the fact that there’s a lot of it still in use daily. We don’t have any concrete plans for it yet, but we’re almost certainly going to come up with some kind of long-term compatibility solution. Preferably a method of patching the bad code (pro: it’s fixed for good, con: needs developer time per app), or maybe an improved version of ZeroPain (pro: less developer time, can work with unpatchable software, con: potentially results in real problems being hidden, extra system overhead).
Yes, API extensions would certainly be better than directly peeking/poking kernel workspace. So if any developers have ideas for what they’d like to see, feel free to open a discussion about it (I’d guess the wish list or code review forum would be most appropriate). |
Jess Hampshire (158) 865 posts |
A few questions: Will ZeroPain allow most programs that access zero page to work as normal, while the memory relocation protects the system from any consequences of that access? Are programs that make such accesses by accident or through poor programming likely to be the ones that work, while those that deliberately access zero page for low-level stuff will fail? Would using a program that needs ZeroPain to work leave the system as stable as a fixed program without it, just less efficient? Would it not be sensible for the stable releases to default to maximum compatibility (ZeroPain without logging, permit rotated loads, etc.) while the dev versions do not? (With a configure panel to override the defaults.) |
Rick Murray (539) 13840 posts |
Yes, I suggested this here, pointing out that even if software is technically flawed, if a new “stable” release arrives that appears to break a bunch of things, it would likely be RISC OS itself that gets the blame. I should also point out the length of time it took to get over the unaligned loads change; even today I sometimes come across software that crashes unless I turn the exceptions off. There is less and less of it, but it’s still around here and there. |
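The "rotated load" in Jess's list refers to how older ARM cores treat an unaligned LDR. They load the aligned word containing the address and rotate it right by 8 bits per byte of misalignment. Broadly, on ARMv7 cores that rotation is gone: the load either raises an alignment exception or returns the bytes actually at the address. Software written for the old behaviour therefore breaks either way. The Python below models the old semantics, and `rotated_load` is an invented name.

```python
def rotated_load(mem: bytes, addr: int) -> int:
    """Model a pre-ARMv6-style unaligned LDR on a little-endian memory image."""
    base = addr & ~3                                  # aligned word containing addr
    word = int.from_bytes(mem[base:base + 4], "little")
    rot = 8 * (addr & 3)                              # rotate right by 0, 8, 16 or 24
    return ((word >> rot) | (word << (32 - rot))) & 0xFFFFFFFF
```

For example, with bytes 11 22 33 44 55 at address 0, a load from address 1 yields &11443322 under rotation, whereas a true unaligned load yields &55443322.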
Steve Pampling (1551) 8170 posts |
If, and it’s a large “if”, such a setup was incorporated in a release it could present a support burden to ROOL for an indefinite period. |
Jess Hampshire (158) 865 posts |
But if people cannot use software because the goalposts have moved since the software ceased to be developed, people won’t update. Perhaps the stable ROM should start with compatibility running, and the unstable one not, and the latest version of !Boot would then set compatibility early on, depending on configure settings that would default to off. (The dev version might not contain the modules, or might have a logging version.) That way a ROM update to an existing system would have minimum issues, while a fresh stable machine might require compatibility settings to be modified for old software (though hopefully old software could be given modified !Run files that turn on compatibility on the fly). Hopefully the configure panel would have a toggle for the current state of any compatibility feature that could be changed on the fly, as well as a start-up state option. |
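Jess's proposal boils down to a precedence rule. An on-the-fly toggle wins over everything else. Otherwise a new !Boot applies the configured setting, which defaults to off. Otherwise, for example after a ROM update on an old !Boot, the ROM's own default applies: on for stable builds, off for dev builds. A hypothetical sketch in which every name is invented:

```python
from typing import Optional


def effective_compat(build: str, new_boot: bool,
                     configured: Optional[bool] = None,
                     toggle: Optional[bool] = None) -> bool:
    """Decide whether compatibility (e.g. ZeroPain) is active under this proposal."""
    if toggle is not None:        # changed on the fly from the configure panel
        return toggle
    if new_boot:                  # new !Boot applies the configured start-up state
        return configured if configured is not None else False
    return build == "stable"      # old !Boot: fall back to the ROM's default
```

This captures the two cases Jess contrasts: a ROM update on an existing system keeps compatibility on, while a fresh machine starts with it off until configured.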
Chris Evans (457) 1614 posts |
Sorry I don’t understand. Can you expand on that? |
Rick Murray (539) 13840 posts |
Why? Think about it – if ZeroPain “just ran” and didn’t do any logging, the majority of broken stuff would carry on and nobody would be any the wiser as to the fundamental change that just happened. The problem comes when you decide to patch on an app by app basis – that could easily spiral out of control. Something like ZeroPain that patches the memory access and not the app, that’s probably the simplest route… |