Pragmatic suggestions for the zero page restriction incompatibilities
Peter Howkins (211) 236 posts |
It has been approximately 20 months since the first release of a beta/nightly build of RISC OS that protected access to the zero page of memory. This caused a significant proportion of previously working RISC OS software to fail. Over the following months some software has been patched to work again, but an extremely large proportion of the applications that need updating have still not been updated. A simple test is to take a basic Raspberry Pi image, use !Store and !PackMan to download software listed as free, and compare the number that work with zero page restriction vs a ROM image without that restriction. At the moment you could not honestly do a ‘stable’ release of RISC OS with this level of software incompatibility.
A further issue has been identified: the Zero Pain module generates huge logs for misbehaving applications that otherwise work well. Applications accessing zero page still need to be addressed, so providing access logs to developers will continue to be useful.
What can be done to resolve this? There are several technical solutions that may work and that offer some or all of the benefits of the zero page work.
1) Move the vectors out of page zero, but do not protect page zero against wayward applications accessing it.
2) Ship RISC OS with the Zero Pain module enabled as standard, with the extra option of disabling logging. Logging should probably be disabled by default on release builds, and enabled by developers who are interested.
3) A more complex and perhaps more long term solution: for each task, allocate one extra page of memory to serve as that task’s ‘page zero’. Once again, logging of errant accesses could be provided. Logging should probably be disabled by default on release builds, and enabled by developers who are interested.
PS: a SWP emulator using ARMv8 primitives would probably be a useful thing too :) |
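To make option 3 concrete, here is a minimal user-space sketch in C of the idea of a per-task stand-in for page zero with switchable logging. It is only a simulation under stated assumptions: a real implementation would live in the kernel's abort handling and page tables, and every name here (`task_t`, `zp_read_word`, `zp_write_word`, `zp_logging_enabled`) is hypothetical rather than anything in RISC OS or ZeroPain.

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define ZERO_PAGE_SIZE 4096u

/* Hypothetical per-task state: one private page standing in for page zero. */
typedef struct {
    const char *name;
    uint8_t shadow_zero[ZERO_PAGE_SIZE];
    unsigned long errant_accesses;   /* counted even when logging is off */
} task_t;

/* Off by default, as suggested for release builds; developers switch it on. */
int zp_logging_enabled = 0;

void task_init(task_t *t, const char *name)
{
    t->name = name;
    memset(t->shadow_zero, 0, sizeof t->shadow_zero);
    t->errant_accesses = 0;
}

/* True if a whole 32-bit word at addr lies inside the simulated page zero. */
static int in_zero_page(uint32_t addr)
{
    return addr <= ZERO_PAGE_SIZE - sizeof(uint32_t);
}

static void note_access(task_t *t, const char *kind, uint32_t addr)
{
    t->errant_accesses++;
    if (zp_logging_enabled)
        fprintf(stderr, "%s: zero page %s at &%08lX\n",
                t->name, kind, (unsigned long)addr);
}

/* Returns 1 if the access was redirected to the task's private page,
 * 0 if the address is outside page zero and is not our concern. */
int zp_read_word(task_t *t, uint32_t addr, uint32_t *value)
{
    if (!in_zero_page(addr))
        return 0;
    memcpy(value, &t->shadow_zero[addr], sizeof *value);
    note_access(t, "read", addr);
    return 1;
}

int zp_write_word(task_t *t, uint32_t addr, uint32_t value)
{
    if (!in_zero_page(addr))
        return 0;
    memcpy(&t->shadow_zero[addr], &value, sizeof value);
    note_access(t, "write", addr);
    return 1;
}
```

In this sketch a stray write by one task is visible only to that task, which is the isolation property the proposal is after; the access counter is kept regardless of the logging switch so that a developer could still ask how noisy a program is.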
Steve Pampling (1551) 8172 posts |
Interesting idea for promoting the idea of both developers and users ignoring the problems in software. Yes, some kind of VM/emulator that allows old (buggy) software to run on new kit would be nice [1], but stifling the new developments by reverting to something that allows the buggy stuff to continue being buggy is hardly the way to progress.
[1] Clone Jon Abbot to get an increase in progress on that front? Or do we need a Jon Jeffrey cross breed? |
Doug Webb (190) 1180 posts |
Don’t we have emulation already with ArcEm/Aemulor/ADFFS etc.? What would be good in the short term is to have, say, an enhanced ArcEm running on a second processor, AKA PCCard style, with the ability to direct file saves/print functions etc. back. !a310Emu allows easy file sharing like the old PC Emulator. I agree that reverting to something that allows buggy software to continue being buggy doesn’t help, but as you say Acorn didn’t sort this and neither did a lot of developers, and there were a whole lot more of them at the time, so some pragmatic approach is required with minimal effort, though a good VM might be just the thing in the longer term. Chicken and egg springs to mind, though, in that we need the changes currently being worked on to enact some better long term solutions. |
Steve Pampling (1551) 8172 posts |
ArcEm – only useful for dealing with software that really does come from the Jurassic.
Aemulor – unfortunately it suffers from problems arising from the ZP and similar changes.
ADFFS – games support, although it does have elements that Jon could take forward into a different product. Probably the best stepping-stone to what people are asking for. |
Doug Webb (190) 1180 posts |
Given that Aemulor fails to work even on low vector ROMs based on the latest sources, ArcEm is still a viable short term solution until Aemulor/ADFFS are fixed/enhanced. Or perhaps, as mentioned, a silent version of ZeroPain with a few additions and one release of code for 5.24, and then restrict RISC OS development code access to more experienced users until things have got to a better multicore position with support tools. |
Rick Murray (539) 13850 posts |
Sorry, I’m with Peter here. Logging is a developer thing. Not a user thing. As was pointed out, it’s been a fair old while since ZPP was “enabled by default”. While an errant memory access (null pointer reference) in new code may result in a zero page access, I think by this stage it is probably fairly safe to assume that any software with zero page accesses that hasn’t had some sort of TLC already is probably never going to have it. It can be faked to a fairly large degree with a log-less ZeroPain module. The software is still broken, but it “appears to work” like it always has. Unfortunately, there is still ZPP-unsafe software around. This is why my own custom builds of RISC OS are all low vector. I just want the machine and the software I use frequently to work, not start an argument or bomb out at the slightest mishap. Remember, we must tread lightly here. We lost in the 26→32 transition. We lost some more with the unaligned reads thing. Those were ARM architectural issues. This? This is self inflicted. Do we want to lose more? By all means have a ZeroPain with lots of logging for developers. Let them see what blows up and when and how. |
Martin Avison (27) 1494 posts |
From my limited experience (which does include maintaining released programs in assembler, BASIC and C) I have resolved quite a number of ZeroPain problems. However, I will never claim that those programs have NO ZeroPain problems. ZP problems are very difficult to spot from reading the code: they are usually only found when someone does something that creates a ZeroPain log and reports it to the developer. Indeed, even with a ZP log, some bugs can take considerable effort to resolve properly. But they are all bugs and should be fixed. Nevertheless, much un-maintained software is still useful, and time and money limit fixes. However, one has to assume that there may be ZP problems in any program – and this includes RISC OS itself (though possibly in little used parts). Indeed, I found and reported a ZP problem in ADFSFiler very recently! Therefore I suggest vectors should be high so RISC OS can move forwards, logging should not be optional, but code should not be aborted. There may be a case for being able to suppress logging for known ZP problems that will not be fixed. |
Rick Murray (539) 13850 posts |
The difficulty here is that, as yet, there has been no stable release for the Pi. |
Rick Murray (539) 13850 posts |
Well, shouldn’t be too hard to binary-hack the executable to get ZeroPain to dump the log into |
Andrew Conroy (370) 740 posts |
I’m with Rick here. There’s no point having an “It must work 100% properly or not at all” approach to RISC OS and its applications if that means that there’s no users left to develop the OS for! Users just want their software to carry on working, they don’t care why it no longer works, they just know that if it doesn’t work on any given system, then they don’t want to use that system. They stay with their ageing RISC PC/A5000/A3000/A310 until it dies, then they jump ship to PC-land. (For the avoidance of doubt, in case anyone still isn’t sure, I’m speaking personally here, and nothing to do with my employer, although my views are coloured by almost 20yrs of dealing with RISC OS users day in, day out.) |
Bryan Hogan (339) 593 posts |
Apologies for the long quote, but this is exactly what I wanted to say too:
It was described by someone at last month’s ROUGOL meeting as “suicidal”. This buggy software has worked without causing any issues for 10-20 years, so don’t deliberately break it now. I like the idea of giving every task its own (empty) zero page, so they can freely stamp all over it if they want without affecting anything else. Multiprocessor safe too :-) |
Bryan Hogan (339) 593 posts |
Oh yes, and note that the Broken Cog Award this year went to “Everyone who persists in using old hardware when there is such a wide choice of new at all price ranges”. There’s no chance of getting them to upgrade if what you are offering is something that won’t run lots of their old applications, and the one thing that can help with that (Aemulor) also doesn’t work! |
Clive Semmens (2335) 3276 posts |
For what it’s worth: I’m still mostly using an old RC14 with low vectors on my Pi, because it works. It’s restricting my screen resolution to 1920×1200, which is bloody annoying when I’ve got another SD card (or two…) that give me 3840×2160, which I sometimes use because !Draw works on it and I sometimes want the big screen, but then I have to save my work on the hard drive and switch back to the old RC14 before I can email the work to myself to get it onto the Mac… Yes, it would be lovely to have an SD card with all the things I want working and the high resolution, but in the meantime I’ve got a workable, if irritating, work-around. Given the limited number of apps I actually care about on the Pi, I probably could have everything I wanted if I fiddled about with different builds, but there are limits to how much fiddling I can handle. Thank you very much (and that’s not in the least ironical) to the developers for giving me what I’ve got. I’m sorry I’m really not a developer myself. |
Steve Pampling (1551) 8172 posts |
Given the underlying flakiness the hidden zero page use has no doubt caused, the users are probably happy now that fixed versions have appeared for some programs. These fixes apply to all versions of the OS; it’s just more obvious on the high vector versions.
I’d say that was actually “it hasn’t caused any easily traceable issues”. What the zero page changes and ZeroPain have done is lift the rock to make the squirming bugs visible. |
Colin (478) 2433 posts |
All ZeroPain has done is highlight one class of bugs; it treats all instances of that class as equal, and the sentence for having such a bug is abort. It does nothing for dereferencing of freed/uninitialised pointers or overrunning buffers, for example. Say ‘*usbdevinfo’ instead of printing
printed
so what, it’s not worth aborting the program for. It’s only idiots like us who are interested in bugs. Sensible people should just use the machine and moan about problems or use something else. Yes you want bugs fixed but don’t make everyone bug testers. Forcing people to abandon software they use, however buggy, for the sake of some utopian future where the os works perfectly and multitasks brilliantly on 39 cores but has no software is… – I have to agree with an earlier post Suicidal. |
Steffen Huber (91) 1953 posts |
I am not sure if someone has mentioned a valid reason against the high vector/ZPP/ZeroPain combination. It makes the OS more stable, and it highlights if you run software that is potentially broken. Even if the software writing to the zero page appears to work perfectly, it is nevertheless a good thing to be alerted that there might be bugs lurking inside. The only downside of ZeroPain mentioned seems to be “it writes a log”. Really? And that is a valid concern? Without ZeroPain and its logs, only a small number of zero page access bugs would have been fixed by now. It is a valuable tool. If you make logging an “opt-in”, nobody will ever turn it on. I am not convinced that this is a good idea. Maybe I am missing something, because I run real legacy software with the help of ArchiEmu, and everything else does not seem to suffer from the zero page protection. Is there software around that does not run if ZeroPain is active? |
David Feugey (2125) 2709 posts |
Perhaps it could be possible, while not supporting old software directly, to provide a way to patch it. A command “*Run -old” to patch old software ourselves (ZPP, AE OFF). And an alert “this software needs some potentially unstable settings to work. Do you still want to use it?”. And of course an option to cancel this message, in Configure. Good point: AE Off could be activated only when the task is running, and switched off when it gives control to another task. SWP could be replaced when loading with the same tool. *Run -PatchSWP. |
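As a rough illustration of what the first step of a load-time SWP patcher might look like, the C sketch below scans a code image for words that match the ARM-state SWP/SWPB encoding as described in the ARM Architecture Reference Manual (bits 27-23 = 00010, bits 21-20 = 00, bits 11-4 = 0000 1001). The function names are hypothetical, and this is only the detection half: a data word can match the same bit pattern by coincidence, so a real tool would need more than a blind scan before rewriting anything.

```c
#include <stddef.h>
#include <stdint.h>

/* Bits that identify SWP/SWPB; bit 22 (byte/word) and the register
 * fields are left out of the mask so both forms match. */
#define SWP_MASK  0x0FB00FF0u
#define SWP_VALUE 0x01000090u

/* Returns 1 if the word looks like a SWP or SWPB instruction. */
int is_swp(uint32_t instr)
{
    /* Condition 1111 is a separate unconditional space on later cores. */
    if ((instr >> 28) == 0xFu)
        return 0;
    return (instr & SWP_MASK) == SWP_VALUE;
}

/* Records the byte offset of each candidate (up to max_offsets) and
 * returns the total number of candidates seen in the image. */
size_t find_swp(const uint32_t *image, size_t words,
                size_t *offsets, size_t max_offsets)
{
    size_t found = 0;
    for (size_t i = 0; i < words; i++) {
        if (is_swp(image[i])) {
            if (found < max_offsets)
                offsets[found] = i * 4;
            found++;
        }
    }
    return found;
}
```

For example, `0xE1020091` (SWP R0, R1, [R2]) and `0xE1420091` (SWPB R0, R1, [R2]) are reported, while `0xE1A00000` (MOV R0, R0) is not.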
David Feugey (2125) 2709 posts |
Could be also done another way. “This software does not support SWP / AE / ZPP. Do you want to patch it?” |
Doug Webb (190) 1180 posts |
My further thoughts on this are:
I think in the early days it was good to have as many users as possible, and it enabled us ordinary users to help and contribute to the development by providing information to developers about their software even if we couldn’t fix it. By and large my experience was a positive one, and I thank the developers I sent information to for the fixed versions, even though some had not been very active recently; indeed one mentioned that they didn’t think anyone still used the particular application, but would see what they could do given the time that had elapsed since they last programmed for RISC OS. I agree though that this becomes one of diminishing returns as the number of active developers with supported apps who have issues to fix lessens, and then we are left with unsupported apps, or developers who have no wish to spend time fixing things, and the question of what to do with them. It could be that, given information, a patch could be applied, rather like some of the early RiscPC era ones, but who is going to do that?
Totally agree, and hence why some form of emulator, on-the-fly fix or patch process needs to be available so that less software is left out. The issue with an emulator is that it doesn’t integrate well, in that the software needs to be run in a different way, hence why Aemulor was good, though it caused its own issues. If a program could be run in its own sandbox-type environment so that it launches and interacts the same, all the better; if not, the on-the-fly patch/workaround seems best. Until something is available we have emulators that work, and though not ideal that lessens the risk of losing users.
I agree it can cause issues and we don’t want to give the impression that ROOL have done a ROL and no development is going on, but I think if a stable Pi and good 5.24 release is done then people will wait for the next step, which gives time to fix things before we have a multi-core/feature-rich release. OK, dreaming a bit too much there. But as I see it many people buy machines from RComp/CJE and are happy with them, with the feature set they have and with the software they run, so it at least gives some breathing time.
Like the idea and who knows perhaps some users would be willing to pay for the odd program to be fixed this way if it was crucial. Bit like M$ end of life support – We’ll fix it but you pay for it and could be part of a package from some developer/ROOL. I know it might not fly but we are talking things over on this thread so nothing is out of bounds as far as I’m concerned. |
George T. Greenfield (154) 749 posts |
Did you mean ZPP as such, or HV OS versions generally? If the latter, IME anything requiring Aemulor won’t run under high vectors, e.g. 26-bit apps like Rhapsody, Eureka, Sleuth, which I still use (on a Pi1 running RC12). Admittedly I haven’t tried these under ArchiEmu on my high-vector Pi2. I agree that HV OS versions seem more stable, and Otter (an important app for me) definitely seems to prefer the HV environment. After considerable experiment, my solution of choice is to standardise on 4GB SD cards* and keep a HV and non-HV card in regular use as required. The storage issue is dealt with (very capably) by a filecore-formatted 240GB SSD SCSI/USB drive (which I find much faster than any SD card). (*this allows me to backup to a discimage using CloneDisc very easily). |
Rick Murray (539) 13850 posts |
Steffen:
No, that’s a concern, but there is a bigger concern for which the absence of a log would be useful.
Very much so.
Ah, but I’m talking about an opt-in (or opt-out, if you prefer) with ZeroPain being an actual included part of the OS, not an add-on. Steve:
Okay, now step back. You’re still thinking with the mindset of a geek. Specifically, a geek that knows what ZeroPain is and what it does. Try: My software worked on an older version of RISC OS, this version makes it crash. Yes, we all know that it is the software at fault, that 99% of zero page accesses are bugs, and the remaining 1% are just being naughty. Thing is, whatever that bug was, it wasn’t a show stopper. It was probably not even visible, otherwise it would have been fixed ages ago. The software more or less functioned as it was expected. And the HV builds of RISC OS break this for long-sentences-of-gibberish reasons. In the original Ovation, there was a test that was something like if (array==NULL) || (array[element] == NULL). This test would pass if array was unset, as it would be NULL, but the conclusion of the line implies a read from the null pointer. That it would have read from page zero is a bug, but it would not have changed the behaviour of the program. If the array is NULL, the second read is unnecessary, but it happens anyway because that’s just how the program is built… George:
Is this a direct comparison between HV and LV, or could it be influenced by the continual development and enhancements being made to RISC OS? Consider, for example, the PMP memory and…well… pretty much everything Jeffrey has added. I found a rather unpleasant bug that caused CallEvery/CallAfter to fall over, it got fixed in a day. A lot of weird bugs suddenly vanished. For what it is worth, I build my own LV version of RISC OS and it is quite stable. My uptimes (running my server and WebJames) could be measured in days/weeks/months. Usually it is power cuts or my development implying/causing/requiring a reset that makes my uptime less impressive than it could be.
A low vector version requires somebody to build it; and unfortunately the only “update” tarballs are in CVS format, so often updating to a newer version of RISC OS requires downloading and extracting the entire source archive. That’s why I re-iterate mine every 2-3 months. Doing so more often is tedious. The other options… emulation? Running a RISC OS emulation on RISC OS just to run an application that the OS ought to be capable of running natively? Seriously? That’s not a solution. I’d reserve emulation for things the platform cannot run natively, like the 26 bit stuff.
Wait – was anybody suggesting it wouldn’t?
Well, probably everybody still slogging along with the RiscPC… That said, the 32 bit change was necessary if there was to be any hope of RISC OS running on a processor produced this century. :-)
That’s to do with Aemulor because of specifics of what’s there that Aemulor uses.
You know, we’re all overthinking it. The ideal solution already exists and does a pretty good job. Here’s what we need. We need ZeroPain to be baked directly into the RISC OS image. Not an add-on that may or may not vanish some day, but an official part of the ROM. We need its logging to be switchable on or off. I really don’t care what the default is, so long as those who do not want it can turn it off. This satisfies ordinary users (their program’s errant page zero accesses will be trapped, and this can be done quietly in the background). In other words, stuff continues to work. This satisfies developers (they can have loads of logging to peruse – and I admit that I have found and fixed a null pointer reference in my own program with the aid of ZeroPain – all the more amusing because DDT didn’t flag reading from &0 as anything unusual (shouldn’t it?!?)). And, I dare say, it satisfies developers more because if there is an error that they suspect is ZP related, they can instruct the user how to switch logging on to capture info from their system. And, yes, Steffen, I know that this solution might encourage some to simply ignore the bugs in their software. Well, why not? There are plenty who are using ZeroPain and looking at the logs to see what went wrong. If somebody wants to stick their head in the sand, why shouldn’t they? Do you not think that taking what may be a minor trivial bug (refer to the Ovation example above) and instantly terminating their program over it is being a bit antagonistic? I’d call this the Theresa May approach, and it isn’t going to work. |
David R. Lane (77) 766 posts |
@Steffen Huber
The logging problem is that with some software it doesn’t just “write a log”, but writes a very large log that soon fills up, and then you have to ‘empty the bin’ because you have seen it all before and, after reporting to those responsible for such software, nothing is done about it. R-Comp’s Messenger and other parts of their email software are a good example. Exceptions are those software writers like Martin Avison who respond and correct the errors. Confix initially had a massive number of zero page errors, but they soon got corrected with a new version. In some cases, only a few errors are thrown up and they don’t bother me too much. |
Colin (478) 2433 posts |
Agree with what you say, Rick, and sorry to be pedantic, but the Ovation code isn’t a bug |
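For anyone weighing up the Ovation test quoted earlier in the thread: in standard C the built-in `||` operator evaluates its left operand first and skips the right operand when the left is already true. The sketch below shows that rule only, assuming the test was plain C of roughly the shape quoted; the helper names and the read counter are hypothetical, and nothing here says how the original Ovation source was actually written or compiled.

```c
#include <stddef.h>

/* Counts how many times an element is actually read from the array. */
int element_reads = 0;

static const char *read_element(const char **array, size_t element)
{
    element_reads++;
    return array[element];
}

/* Hypothetical helper mirroring the shape of the quoted test:
 * true if the array is unset or the chosen entry is unset. */
int entry_missing(const char **array, size_t element)
{
    /* If array is NULL the right-hand side is never evaluated,
     * so no read through the null pointer takes place. */
    return (array == NULL) || (read_element(array, element) == NULL);
}
```

Calling `entry_missing(NULL, 3)` returns true and leaves `element_reads` at zero. Code that evaluates both halves unconditionally, for example with the bitwise `|` operator or with the two comparisons stored in separate variables first, would perform the read, and that is the case ZeroPain would log.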
George T. Greenfield (154) 749 posts |
In this case, very likely: my LV version of 5.23 dates from April ‘16, and the HV one says 6 July ’16 – so not exactly “bleeding edge” ;-) I don’t have the expertise to roll my own, unfortunately.
Well, “Forcing people to abandon software they use, however buggy, for the sake of some utopian future where the os works perfectly and multitasks brilliantly on 39 cores but has no software is… – I have to agree with an earlier post Suicidal.” did strike me as a plea to mark time, at the very least. |
Rick Murray (539) 13850 posts |
Hehe, there’s some stuff kicking around that hasn’t changed since the BBC MOS. I offer OS_Byte. ;-)
We’ll have to agree to disagree. If something causes an abort, it’s a bug. :-) |