Zero page protection
Rick Murray (539) 13806 posts |
Certainly – this is why I asked if it was possible to trap (and permit) accesses – so we can get an idea of scale. After all, it is not such a big deal if a few known-about OS routines use it. The area can be deprecated and the routines modified as appropriate. FWIW, it may be that the filing systems use this area a lot – refer to the table “Temporary buffers” at the end of PRM 2-598 (raw page 608).
I’m actually surprised that Justin didn’t. The idea of having “a piece of memory” that the OS can use as a generic dumping ground for data just sounds a little bit icky in this day and age. For what it is worth, I wrote the following short program and running it as soon as the machine started (no boot invoked) and then entering
Unfortunately, Desktop startup is complex, so at the moment I can tell it is used, but not by whom. Looking purely at the kernel:
Now a non-exhaustive look at other stuff. Some things that I expected might use it (Desktop, Econet…) didn’t. The ones below were what I dug up. It seems to be the older modules. It also seems a bit of a nasty free-for-all IMHBLIO.
|
Jeff Doggett (257) 234 posts |
(pedant mode on)
I realise that the code is a quick hack, but it should be ‘BLO’ not ‘BLT’ as we are not dealing with signed numbers. This seems to be a common error in the RISC OS world. |
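As a rough illustration of the BLO/BLT point, here is a minimal Python sketch that models the two comparisons on 32-bit values. The helper names and the 0x4000 limit are assumptions made for the example and are not taken from Rick's program; 0xFFFF0000 is the high vector address mentioned elsewhere in this thread.

```python
def as_signed32(value):
    """Reinterpret a 32-bit value as a two's-complement signed integer."""
    value &= 0xFFFFFFFF
    return value - 0x100000000 if value & 0x80000000 else value


def lower_unsigned(addr, limit):
    """Models CMP addr, limit followed by BLO (unsigned lower)."""
    return (addr & 0xFFFFFFFF) < (limit & 0xFFFFFFFF)


def less_signed(addr, limit):
    """Models CMP addr, limit followed by BLT (signed less than)."""
    return as_signed32(addr) < as_signed32(limit)


LIMIT = 0x4000  # hypothetical upper bound of the region being tested

for addr in (0x00000100, 0xFFFF0000):
    # Columns: address, BLO-style result, BLT-style result
    print(hex(addr), lower_unsigned(addr, LIMIT), less_signed(addr, LIMIT))
```

The two tests agree for addresses below 0x80000000. For 0xFFFF0000 the signed test treats the address as a negative number and wrongly reports it as being below the limit, which is why the unsigned condition is the right one for address checks.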
Jeffrey Lee (213) 6048 posts |
A couple of thoughts I’ve had about scratch space:
The first is that because you can only ‘claim’ it from the foreground, there’s no technical reason why any use of scratch space couldn’t be replaced with a temporary block of memory allocated from elsewhere (e.g. RMA).
The second thought is that some SoCs actually have a small amount of fast, on-chip memory available (SRAM, TCM, etc.). So maybe there is some benefit to having an OS API to allow access to that memory. (Looking at Rick’s list, it strikes me that a lot of the users of scratch space are performance-centric things – flood fill, heap sort, colour translation, etc.) So for any code which wants to use a fast area of scratch space, it could first try and claim the on-chip memory, and then if that fails it can fall back to allocating from the RMA (and potentially remember the pointer so that the same block can be re-used in the future).
Back on the topic of zero page, initial testing with the compatibility module is showing promising results. So far I haven’t seen anything constantly spam the log, so I think I can get by without implementing any kind of complex config file (which is good, because any app/module based filtering may require the abort handler to call SWIs, and that’s something I’d like to avoid, just in case an abort happens while the OS is in a bad state). There are a couple of bits of polish I need to add, but with any luck you should be seeing the module (+ high processor vector ROM builds) within the next few days. |
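The "try fast memory first, then fall back and remember the block" idea from the post above could look something like the following Python sketch. No such OS API exists in the thread; `ScratchSpace`, `claim_fast` and `alloc_fallback` are hypothetical names standing in for an on-chip memory claim and an RMA allocation.

```python
class ScratchSpace:
    """Claim fast memory if possible, otherwise reuse a cached fallback block."""

    def __init__(self, claim_fast, alloc_fallback):
        self._claim_fast = claim_fast          # hypothetical: returns a buffer, or None on failure
        self._alloc_fallback = alloc_fallback  # hypothetical: always returns a buffer
        self._cached = None                    # remembered fallback block

    def claim(self, size):
        block = self._claim_fast(size)
        if block is not None:
            return block
        # Fall back, reusing the remembered block when it is big enough.
        if self._cached is None or len(self._cached) < size:
            self._cached = self._alloc_fallback(size)
        return self._cached


# Simulate a machine with no on-chip memory: the fast claim always fails.
scratch = ScratchSpace(lambda size: None, bytearray)
first = scratch.claim(256)
second = scratch.claim(128)
print(first is second)
```

In this simulation the second claim is served from the block remembered by the first, so the fallback allocator is only called once. A real implementation would also need a release call and some rule about foreground-only use, which the sketch leaves out.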
Adrian Lees (1349) 122 posts |
Having looked through the changes to support raised processor vectors/relocated kernel workspace (ZeroPage internally, but an inappropriate name now), I can see no API for returning the logical address of kernel workspace for that code which does need it. For now, I am rounding-down an address returned by OS_ReadSysInfo 6, but of course that’s an undocumented API call for internal use only. Similarly, code that is being modified to be aware of high processor vectors really shouldn’t have to assume their ‘high’ address, and perhaps end up needing further change again later (and this applies equally to code ‘within’ the OS, such as the FPEmulator, and third party code). |
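For readers unfamiliar with the rounding-down trick mentioned above, a minimal Python sketch follows. It assumes a 4 KB page (the granularity is not stated in the post), and the sample address is made up for illustration rather than being a value actually returned by OS_ReadSysInfo 6.

```python
PAGE_SIZE = 4096  # assumed page size; must be a power of two


def round_down_to_page(address, page_size=PAGE_SIZE):
    """Clear the low bits so the address falls on a page boundary."""
    return address & ~(page_size - 1)


sample = 0xFFFF0B40  # hypothetical address of some kernel variable
print(hex(round_down_to_page(sample)))
```

This only recovers the workspace base if the variable in question happens to live in the first page of the workspace, which is exactly the kind of layout assumption being debated here.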
Jeffrey Lee (213) 6048 posts |
Why do you need to know the address of kernel workspace? One of the aims of moving it is to force all the software which peeks at hardcoded addresses to use more sensible methods instead, in order to allow us to change the layout of the workspace in future without having to worry about breaking things. Publicly exposing the base of the workspace would therefore seem to be a bit counter-productive.
It is nominally for internal use only, but it isn’t undocumented – there’s a full list of values it exposes on the wiki https://www.riscosopen.org/wiki/documentation/show/OS_ReadSysInfo%206
True, although I’m not sure we’d ever really want to move the processor vectors away from their default low/high locations. In fact, I can only think of two situations where we’d want to move them again:
However if you think a call to read the vector base would still be useful then I can easily add one. (For reference – FPEmulator doesn’t need to know the vector base, except on pre-OS 3.7 machines. On 3.7 and above OS_ClaimProcessorVector will suffice. In practice it’s only FIQ handlers and really low-level stuff like Aemulor/ADFFS which would need to know where they are) |
Adrian Lees (1349) 122 posts |
Disregarding Aemulor for a moment, the answer is that any system extension/low-level software (written in the future, or being modified now) will be more futureproof if it does not have to assume fixed addresses (which Aemulor, for example, had to do, even though it used pattern-matching as defence against OS changes). Quite simply, there isn’t always an OS API to do what’s required and I think it’s fair to say that anyone writing such advanced software knows better than to resort to ‘unclean’/underhand methods unless it’s absolutely necessary. As far as Aemulor itself is concerned, it can now locate the kernel data that it needs, via internal API calls (When Aemulor was written in 2002, there was no source to read, and no information available and I guess I never postulated the question of whether an internal OS_ReadSysInfo call could be used to locate data used by OS_DynamicArea code, for which I had only disassembly anyway). It is still, however, left with the dilemma of assuming either that the processor vectors adjoin the kernel workspace, or that they are now at one of two fixed addresses. It does seem a little myopic to fix code that is now rejected because a formerly-fixed, hardcoded address no longer holds, by introducing another hardcoded address; the – to my mind fairly arbitrary – 0xFFFF0000 alternative used in the XScale is not programmable. Aemulor, however, is also in the business of trying to keep all 26-bit code running (including that which I’ve never encountered), whether that code is accidentally reading from addresses that land within the kernel workspace, or deliberately and using the data. 
Whilst reads from historically-stable locations can be remapped to their actual locations (by pre-storing the addresses returned by OS_ReadSysInfo 6), it cannot be known a priori what addresses will be used by all 26-bit code because there is no defined set of 26-bit code that must run (nor, in practice, the ability to test all possible paths through that code), and it becomes necessary to do something sensible for all addresses. In the interests of trying to keep the kernel stable, writes are apt to be swallowed or faulted, at least by default (although it’s not clear that that will always be the best approach; faulting them could equally lead to a system crash). |
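The remapping policy described above (redirect reads of known legacy locations, return something sensible for everything else, swallow writes) can be sketched in Python as below. This is not how Aemulor is implemented; the class name, the default value of zero, and both addresses are assumptions chosen purely for illustration.

```python
class LegacyAddressMap:
    """Redirect reads of old fixed addresses; swallow writes by default."""

    def __init__(self, remap, read_word, default=0):
        self._remap = dict(remap)    # legacy address -> current address
        self._read_word = read_word  # callable that reads a word at a current address
        self._default = default      # returned for addresses with no known mapping
        self.swallowed = []          # record of writes that were ignored

    def read(self, legacy_addr):
        actual = self._remap.get(legacy_addr)
        if actual is None:
            return self._default
        return self._read_word(actual)

    def write(self, legacy_addr, value):
        # Swallow the write rather than touching kernel data.
        self.swallowed.append((legacy_addr, value))


# Hypothetical memory contents and mapping, standing in for pre-stored
# addresses of the kind OS_ReadSysInfo 6 would return.
memory = {0xFFFF010C: 1234}
legacy = LegacyAddressMap({0x0000010C: 0xFFFF010C}, memory.__getitem__)
print(legacy.read(0x0000010C), legacy.read(0x00000200))
legacy.write(0x0000010C, 99)
print(memory[0xFFFF010C], legacy.swallowed)
```

Whether unmapped reads should return a fixed value or fault, and whether writes should be swallowed or faulted, is the open policy question raised in the post; the sketch simply picks one option for each.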
Jeffrey Lee (213) 6048 posts |
So you’re not happy with assuming a fixed address for the base of kernel workspace, but you are happy with making assumptions about the layout of kernel workspace? In my mind they’re both as bad as each other. RISC OS 5 is being actively developed, and most of the development happens out in the open, led by developers who are willing to listen to feedback (even if it may take us a while to act on it). So if developers are interested in creating low-level extensions to the OS then they should probably open a discussion here, to see if the relevant internals can be exposed through a sensible API. If they need to support older versions of the OS then they may still have issues and need to assume the address/layout of various areas, but there’s not really any way around that unless we have a reflection interface which describes the entire OS. For Aemulor, you do have a compelling reason for us to expose the base address of kernel workspace. But it will very much be a case of “here is a pointer to something, we don’t make any guarantees as to what it contains”. Perhaps adding it to the list of areas exposed by OS_Memory 16 would make sense? (AIUI the reason behind that SWI is so the task manager can show the sizes of the areas – so it would make sense to include ZeroPage and any other kernel areas there so that their size can be taken into account as well) Thinking about it, the processor vectors could go in that list as well, although we may have to break the rule about the return values being rounded to a whole number of pages (so that we can reflect the fact that the processor vectors have 256 bytes of space allocated – in future if we dislocate ZeroPage from the vectors then we may allow for FIQ handlers to use almost the full 4K page) |
Adrian Lees (1349) 122 posts |
I am not happy with any non-kernel code assuming the layout of the kernel workspace, nor of the structures within that workspace (a number of which are exported by OS_ReadSysInfo, although they should really be considered as opaque as the workspace itself), but the reality is that it is sometimes necessary and it is unreasonable to demand of all prospective users of an updated/newly-written system extension that they upgrade to RISC OS 5.MN which has a required OS API change in order to use it, particularly when it is not backwards-compatible with a number of other bits of software upon which they depend and which have not been/cannot be updated. The point at which an OS makes a fundamental change and breaks system extension code seems a good time to provide an OS API to allow code to cope with that change, and predictable further changes in the future. OS_PlatformFeatures reports the repositioning of the processor vectors, albeit less than ideally, but nothing currently indicates that the kernel workspace has moved or where it has gone, yet of the code out there that has legitimate reasons for accessing what was formerly at [0,4KB), I am pretty sure that you’ll find that it is the kernel workspace that most of it wants (when did you last see released code that modified the processor vectors themselves or installed a FIQ handler?) |
Jon Abbott (1421) 2641 posts |
With the way ARM are developing the core and changes being made to modernise the OS, there’s no getting away from an increasing amount of software breaking. This is quite evident from the number of reports coming in of “current” software that suffers inadvertent Page Zero access. Admittedly this is mostly down to a lack of pointer validation in C compiled code, which the original programmer wouldn’t be aware of. Adrian and I have done our best to provide software compatibility around all OS and CPU changes in the past 25 years, but there’s going to come a day when even our software won’t work and machine emulation is the only way to keep most software running on a modern machine. This is why I proposed coding a Hypervisor with Wimp integration so applications can run in their own VMM, under the OS version and machine they were written for. I can’t see any other way of keeping the bulk of software running. Regarding ZeroPain, I think the Module will need extending past 2016 and moving into the OS, as it will break a large percentage of applications if dereferencing isn’t automatically handled. I don’t personally see the bulk of software being updated in the next six months, as it’s reliant on people testing nightly builds and the original developer finding and fixing the problems. There’s also the problem of people updating their ROM image. If they don’t update any software that’s loaded during boot, they’ll quite possibly end up with an unbootable machine – I certainly did when I tested it. In many cases people won’t be aware the software needs updating and may not even know where to get the updates. Regarding direct access to Kernel workspace, I don’t think any software should be doing this as it will almost certainly break at some point. I’ve certainly avoided it when coding ADFFS and requested OS extensions or found workarounds where possible.
Requesting a current Vector owner was the only one for which a viable solution wasn’t found, so it’s still using the RO3.7x code to read them directly from Page Zero until a legal solution is implemented. |
Steve Pampling (1551) 8154 posts |
Looking at the source, something as simple as changing 2015 to 2016 would work (from 2016-01-01 to 2016-12-31) |
Jon Abbott (1421) 2641 posts |
When I proposed implementing High Vectors and coding an Abort handler to provide backward compatibility in the Hypervisor thread I did highlight it was a “big can of worms”, based on what I’d seen under ADFFS. Personally, I’d move ZeroPain into the OS and remove the time limit, I’d also consider coding a Wimp front end to highlight compatibility issues – along the lines of how Windows does it. Possibly even providing a backend database so users can “check for a solution”, which both logs the fault if it’s new and provides a link to a solution if the software causing it has been fixed. |
Jeffrey Lee (213) 6048 posts |
FYI Adrian & Jon – OS_Memory 16 has been expanded to report more locations, including the relocated zero page workspace (12-15 are new), OS_PlatformFeatures 32 has been added to report the exact size & location of the processor vectors (OS_Memory 16 might not always report their location, since they’re usually counted as being part of the zero page workspace), and OS_ReadSysInfo 6 items 85 & 86 have been added to expose VecPtrTab & NVECTORS (presumably equivalent to ROL’s 23 & 24, but we have an allocation clash there). I’m still open to the idea of adding a proper API for examining/manipulating the vector table if a full list of requirements can be produced.
I’m not actually sure what ROOL’s opinion(s) are, but I believe both myself and Sprow are against the idea of having ZeroPain built directly into the ROM or the standard hard disc image. An optional extra you can download and install? Yes. But built directly into the OS? No. Bad code is bad code, and we shouldn’t be bogging down the core OS by endlessly adding compatibility modes for running broken old code, especially if those compatibility modes may hide bugs in the OS itself.
Yes, I think that if/when we do implement a permanent compatibility solution it’ll have to have some kind of Wimp frontend to ensure that non-techie users can use it. But that will probably be a few months off (there are other things I want to do for RISC OS 5.24, other than break everyone’s code ;-)) |
Chris Johnson (125) 825 posts |
Is there any indication or listing anywhere of problems with the OS or OS support such as toolbox modules? For example, I have started trying to check all my applications, and any of those using the toolbox/oslib give a zeropain entry for the Window module as soon as the icon is dragged from the Saveas dialogue. Since the drag save is mainly managed by the toolbox, there isn’t a lot of my own code involved at that point. It would be useful to know that the problem is definitely of my own making. |
Sprow (202) 1155 posts |
Don’t think so.
There are two null pointer dereferences I can see in the ‘Window’ sources, both from 1995, oldie but a goodie! Fixes follow shortly… |
Steve Pampling (1551) 8154 posts |
I wonder if any of this is fixed in RO 4.39 (you know, going over old ground) |
Dave Higton (1515) 3497 posts |
Since some apps have been updated to prevent unexpected access to zero page, is there a wiki page to list everything that has been updated? I don’t feel confident in loading a recent ROM because of the consequences – I don’t want RISC OS to crash, or for the whole machine to become unbootable. Or am I worrying too much? It seems to me that it would be good to have a wiki page that gathered together all the changes that I or anyone else should make before trying a new ROM. |
Rick Murray (539) 13806 posts |
Jon:
Indeed, for RISC OS never really sanitised accesses to NULL pointers. Old RISC OS (3.10?) would allow you read/write access and you could trash the first word as that was the reset vector. I wonder how many programs did that and never realised? Writing any more would be painful, though, as you run into the important vectors (aborts, SWIs, IRQ, blah blah). The proper response is what Unix types refer to as a “segmentation error”, or SIGSEGV in C. You sometimes do see this in RISC OS, it is the “type 5” error (info). Jeffrey:
That is entirely correct, but it might be an idea to look at the poohstorm that was created when FTDI released that driver update as part of the regular Windows update cycle. You know, the one that intentionally bricked counterfeit devices. We know now, through the luxury of hindsight, that this was a deliberate act of sabotage1 conducted by FTDI. At the time, it was Microsoft that got the brunt of user anguish. I installed a Windows update and stuff broke. So to RISC OS. Without ZeroPain to trap and warn at least in the first stable release you may run into problems with the non-technical users (those that don’t read all this stuff) that “the last stable release was great but I tried the new one and stuff broke”. Yes, it is lame-ass programming (mine included ;-) ) but that’s not who people will want to point the finger at. After all, it worked before so you broke it. I’m not saying ZeroPain should become a permanent feature; however a revised version (that logs and then nags the user via a front-end popup) should absolutely be present in the first stable release that has this change implemented. Warn the user that their software is faulty and that a fault (that is about a quarter century old!) has been corrected to enhance system stability. For now, this nag will appear each time the fault happens and the action will be permitted for now, however the next stable release of RISC OS will not have this feature so if you’re seeing this message, it means the software in question will probably cease working at that time… Don’t just drop a change like this without a rescue plan in place.
I guess it depends upon the severity of the problem. The BASIC quirk is an interesting one… ;-)
Isn’t that what the odd-numbered releases are for? Chris:
Isn’t that what this is for? https://www.riscosopen.org/tracker/tickets/filter?status=1 Dave:
Depends upon your setup. I took a self-built ROM (November vintage) which loads a fairly standard boot setup with a few of my own custom tweaks, then drops through to an Obey file that loads my server, WebJames, Zap, and initialises the DDE. The only thing of note is that I have abandoned DHCP in favour of a fixed IP address. This is for the benefit of the server software as RISC OS comes up before the WiFi adaptor, and both come up before the Livebox. Running it like this, I can have the server running straight from a power cut, as RISC OS has no mechanism to search for a connection and deal with one if found. Swapped in the first ZeroPain ROM, added ZeroPain itself to the boot sequence. Machine booted fine. Zap threw a bunch of pain – I note that Tank has linked to a fixed module so I’ll need to try that over the weekend. The only problem I encountered was that I could not save any configuration. This was ultimately because the ZeroPain ROM image was called “risczp.img” and the SDCMOS module was trying to write to “riscos.img”, in essence updating the configuration of my custom build. Oops! So I saved the SDCMOS module in Zap, binary edited it, then reloaded it. Configuration written to the correct image. All went well. The point of ZeroPain is to attempt to “permit” the faulty operation, logging this was happening. Zap did not crash with ZeroPain, but it would without it. It is safe to put ZeroPain into your boot sequence. It doesn’t do anything on systems that don’t have the page zero move; I can switch back to my self-built ROM and ZeroPain will do nothing in that case. Remember – the OS itself will boot – so if you have something that needs to be fixed and ZeroPain isn’t loaded, it may crash the machine. But all is not lost. Just reset and then keep hammering the ESC key as the reboot starts. The boot should abort with an Escape message, and you ought to be dumped into a plain crappy looking Wimp with nothing having been booted. 
It’s enough to be able to amend the stuff that is to be booted to either copy in ZeroPain or move out whatever it is that is crashing.
This would be a good idea – if even only for authors to be able to flag a “this app at this address – I know, I’m working on it” and for us so we don’t report something already known.
… Maybe we ought to think of a protocol where embedded somewhere in a program (and detectable by RISC OS; extra field in the AIF header?) is a standardised version number (akin to modules), the program title, the author’s name, a contact address (optional), phone number (optional), email address (optional), website, and maybe a special URL that will return update information (current version and a URL to where an update can be found, if one is available that way). I have a rough outline in my head (based upon the auto-upgrader spec I put together a long time ago and (ironically) implemented on some of my Windows software). If anybody would like this fleshed out a bit more, let me know.
I’m not saying the OS itself can do all this. An add-on program would do it. But if the data existed, a program such as ZeroPain could see if information was present, and if so, invoke whatever it is that does the update checking. Which could, in theory, build a database of all compatible software, and periodically check for updates for you – you know, like on other systems…
1 I would agree with FTDI if their driver included better fake detection and popped up a warning to say as much. However to intentionally alter fake hardware so that it would theoretically never be usable again2, that’s a step too far…is it even legal? Hmmm…
2 Of course, it wasn’t as smart as it thought it was – you could force an older driver to be installed and accept the “bricked” hardware by modifying the INF file so the older driver has a later date, and adding a line for the USB PID of the device so the driver will recognise the device with the “0000” part ID as its own. This, of course, being a signed driver… Great security, guys. |
Rick Murray (539) 13806 posts |
…the plan that I had is that if the Image debug type is zero (no debugging data), then the debug size would be a pointer to the information table (offset from start of file).1 This means that a debug image cannot have information, but then this shouldn’t be required in a debug build. It isn’t necessarily ideal to purloin the debug size; however the intention is to add something to AIFs without overly changing the header or claiming one of the reserved words.
It might be simpler to have two words following the header: “PIB1” (check word) followed by an offset. This can be ignored by anything thinking the header is supposed to be 32 words long, and can be picked up by something that “knows” to look for this extra data. However it would mean that the linker would need to be complicit and at least write two null words (that can later be changed by whatever generates the table itself). This would be the less destructive choice as these words would not be referenced by anything and would therefore be more or less invisible in use; however I do not know if ROOL would be willing to amend the linker to have this as an option…
The information table would be appended to the end of the program on disc and it is not compressed. The information is expected to be read by something examining the executable file and not in-situ in memory, so the table can be trashed by the application’s decompression code.
The table itself is a simple layout, not unlike modules. Each item is a word; the word is an offset from the start of the table (so file location is info_table_offset + data_offset) or zero if the information is not provided. Following the table are a series of null terminated strings (that may or may not be word aligned) that correspond to the information in the table. As the information is read via offsets, the strings don’t need to be in the same order, but they should be, as anything else would just be weird… So: <word> ‘PIB1’ (Program Information Block version 1), a check word.
Would be in memory like this: ‘PIB1’<word><word><word>[etc]<string>[null]<string>[null]<string>[null][etc] 1 I wonder if RISC OS Select’s “enhanced” AIF checking would fault this? |
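To make the proposed layout concrete, here is a minimal Python sketch that builds and reads such a table from a file image. Several things are assumptions rather than part of the proposal as posted: the words are taken to be 32-bit little-endian, offsets are measured from the ‘PIB1’ check word, the number of fields is passed in by the caller, and the field meanings are left unnamed because the field list is not given above. The function names are hypothetical.

```python
import struct


def build_pib(strings):
    """Build a table from a list of strings; None means 'not provided'."""
    header_size = 4 * (1 + len(strings))  # check word plus one word per field
    offsets = []
    blob = b""
    for text in strings:
        if text is None:
            offsets.append(0)
        else:
            offsets.append(header_size + len(blob))
            blob += text.encode("latin-1") + b"\0"
    return b"PIB1" + struct.pack("<%dI" % len(strings), *offsets) + blob


def parse_pib(data, table_offset, field_count):
    """Return the fields as a list of strings, with None for absent entries."""
    if data[table_offset:table_offset + 4] != b"PIB1":
        raise ValueError("missing PIB1 check word")
    offsets = struct.unpack_from("<%dI" % field_count, data, table_offset + 4)
    fields = []
    for offset in offsets:
        if offset == 0:
            fields.append(None)
            continue
        start = table_offset + offset      # file location = table offset + data offset
        end = data.index(b"\0", start)     # strings are null terminated
        fields.append(data[start:end].decode("latin-1"))
    return fields


# A fake 16-byte "program" followed by a three-field table.
image = b"\xAA" * 16 + build_pib(["1.00", None, "Example"])
print(parse_pib(image, 16, 3))
```

Because the strings are located purely by offset, the parser does not care whether they are word aligned or in table order, which matches the description above. How the reader finds `table_offset` (via the debug size word or two extra words after the header) is left outside the sketch.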
Jon Abbott (1421) 2641 posts |
Thanks, I’ve updated the next release of ADFFS accordingly. Regarding API requirements, my only requirement is to get the current entry address and R12 value of a vector, so I can roll back to it whilst performing filesystem access, should a game install a protection routine on the filesystem vectors. |
Chris Mahoney (1684) 2165 posts |
Is it still possible to build a Pi ROM without zero page relocated? I found “HiProcVecs” in castle.RiscOS.Sources.Kernel.hdr.Options, but setting it to false results in a ROM that doesn’t get past a completely black screen. Is there something else that I need to change too? |
Rick Murray (539) 13806 posts |
Did you also set |
Chris Mahoney (1684) 2165 posts |
Oops, missed that one! Rebuilding now… Edit: Success! Arigatou gozaimasu :) |