Ticket #405 (Open) Wed Mar 18 21:25:00 UTC 2015
CLib should refuse to die
Reported by: | Rick Murray (539) | Severity: | Normal |
Part: | RISC OS: Module | Release: | |
Milestone: | | Status: | Open |
Details by Rick Murray (539):
Copied with better formatting/markup at: https://www.riscosopen.org/forum/forums/8/topic…
It is possible to replace the ROM version of CLib with a softloaded one – because the ROM version is still in ROM so direct links, such as within !Edit etc, will still work.
It is not possible to replace a softloaded version of CLib with a newer softloaded version, because applications using CLib set up a list of branches into the C library module when the program initialises. If a newer/different version of CLib is loaded afterwards, it is likely that this may be placed elsewhere in the RMA, so the positions held within the previously set up jump tables (any applications loaded with the previous CLib) may end up pointing to fragments or rubbish.
Reproducing this requires having multiple copies of CLib available. For older versions of RISC OS:
Start system normally.
Load an older (but newer than ROM) version of CLib.
Load something that uses C. I loaded OvationPro, but anything that uses C will do.
Load the newest version of CLib that you have.
RISC OS will promptly freeze.
For RISC OS 5, this is a little bit harder to do because there appears to be only one slightly older CLib in !System.Modules, and loading a module on top of itself won’t change anything. So here is a method to simulate loading a newer CLib in a different part of the RMA (which is what may well happen if the newer one is larger).
Start system normally.
*RMLoad System:Modules.CLib
Load some software that uses CLib – such as SparkFS.
Go to the command line (not taskwindow) and type:
*RMKill SharedCLibrary
*RMLoad System:Modules.BasicEdit
*RMLoad System:Modules.CLib
Press Return and the machine will freeze just after drawing the desktop grey background.
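The failure can be modelled in a few lines of C. This is only a toy sketch: the RMA is an array of words, the "library" is a run of tagged words, and a client simply remembers the address it was given at initialisation. All names (toy_load_lib, toy_kill_lib, toy_entry_valid) and the tag values are made up for illustration and have nothing to do with the real module format.

```c
#include <stddef.h>
#include <stdint.h>

#define RMA_WORDS   16
#define LIB_ENTRIES 3
#define ENTRY_TAG   0xC11B0000u  /* marks a word as "library entry point n" */
#define FREED_WORD  0xDEADDEADu  /* whatever ends up in a released block */

static uint32_t rma[RMA_WORDS];  /* toy stand-in for the module area */

/* "Load" the library at word offset base; returns the base for convenience. */
size_t toy_load_lib(size_t base)
{
    for (size_t i = 0; i < LIB_ENTRIES; i++)
        rma[base + i] = ENTRY_TAG | (uint32_t)i;
    return base;
}

/* "Kill" the library: its block is released and the contents become rubbish. */
void toy_kill_lib(size_t base)
{
    for (size_t i = 0; i < LIB_ENTRIES; i++)
        rma[base + i] = FREED_WORD;
}

/* Does the address a client cached at init time still hold entry n? */
int toy_entry_valid(size_t cached_addr, unsigned n)
{
    return rma[cached_addr] == (ENTRY_TAG | n);
}
```

Loading at offset 2, caching the address of entry 1, then killing and reloading at offset 8 leaves the cached address pointing at FREED_WORD, which is the toy equivalent of the client branching into fragments or rubbish.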
It is clearly not safe to arbitrarily mess around with CLib when clients have jump tables pointing into it.
Therefore, softloaded versions of CLib should set a flag if any program calls LibInitXXXX (as applicable for the version of CLib – LibInit[Module]APCS_32 for RISC OS 5), and CLib should subsequently refuse to die. There is no easy mechanism to determine whether any clients are still using CLib, so once the flag is set it should just refuse to die in all cases.
This means that it would not be possible to quit all CLib clients and then load a newer module, but since forgetting something could result in a stiffed machine, this was never a particularly safe practice.
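A minimal sketch of the flag idea, in C. It assumes the usual module convention that a finalisation entry which returns an error stops the module from being killed. The os_error type, the function names and the error number are placeholders for illustration, not the real CLib or CMHG interfaces.

```c
#include <stddef.h>

/* Hypothetical stand-in for a RISC OS error block (number + message). */
typedef struct {
    int  errnum;
    char errmess[64];
} os_error;

/* Set the first time any client calls a LibInit SWI; never cleared. */
static int clients_ever_registered = 0;

/* Placeholder error; a real module would use an allocated error number. */
static const os_error err_in_use = {
    0, "SharedCLibrary has had clients and cannot be killed"
};

/* To be called from the (hypothetical) LibInit SWI handler. */
void note_lib_init(void)
{
    clients_ever_registered = 1;
}

/* Finalisation entry: a non-NULL return means "refuse to die". */
const os_error *module_finalise(void)
{
    return clients_ever_registered ? &err_in_use : NULL;
}
```

Because the flag is never cleared, quitting every client does not make the module killable again, which matches the trade-off described above.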
I’m leaving this as Normal severity despite the symptoms being more akin to Major because everybody and their kitten ought to know not to softload CLib twice. I am just concerned with the possibility of something providing a newer version of CLib (say, as a package or via the installer) and the application concerned then trying to RMLoad it. This should fail. It wouldn’t. The results? Unpleasant.
Changelog:
Modified by Jon Abbott (1421) Tue, April 07 2015 - 06:33:30 GMT
Could we not modify CLib so it tracks the jump tables and updates them when CLib is softloaded?
A dedicated DA for CLib perhaps with a heap of CLib subscribers?
Modified by Jeffrey Lee (213) Tue, April 07 2015 - 10:33:01 GMT
I think to solve this properly we’d need to make a more fundamental change to how CLib is implemented. Upgrading a module is not an atomic process – first the old module is killed and then the new module is loaded and initialised. In between the two steps any number of things could happen which might cause CLib to be invoked (interrupt handlers, filesystem stack, etc. could all be making use of CLib). We also need to consider the case where the call to reload CLib comes from within CLib itself – e.g. system("rmload clib"). Clearly it’s no good if the new CLib gets loaded and the jump tables get patched up only for the CPU to return from the SWI and execute some code which no longer exists.
Making module upgrades atomic would be nice, but is perhaps a bit too big of a problem for us to aim to solve right now. But if we can find a good way of implementing it for CLib then perhaps we can extend other modules (or modules in general) to work in a similar way.
So, here’s my 2p:
- Replace the existing LibInit SWIs with a new pair of SWIs (one for init, one for shutdown). The new init SWI is a requirement so that CLib is able to detect if the old SWI has ever been used (and thus whether the backend can be safely released or not)
- Change the CLib module so that it is formed of two parts: frontend (SWI interface, backend manager) and backend (CLib library code)
- On startup the library code will be relocated to some private memory which is managed by the frontend (could be another DA, or it could just be stored in the RMA). Once this is done it would be nice if the module could then shrink its RMA allocation to avoid wasted space (should be possible, if the module assumes that it’s stored in a standard OS_Heap block in the RMA)
- For each CLib backend, the frontend will maintain a list of active clients, and a “legacy client” flag which tracks whether the old LibInit SWIs have been used. This list must be stored somewhere globally accessible so that new instances of the frontend module can find it (e.g. specially named DA, or a system variable holding the address)
- On frontend module shutdown, it will walk the list of backends and free any that aren’t currently in use.
- On frontend module initialisation, it will walk the list of backends and check to see which clients can safely be upgraded to the new frontend. This will probably be limited only to situations where the CLib static data has the same format as the previous version. If an upgrade is possible, the jump table will be patched appropriately. However the old backend will be kept loaded, in order to cope with the case where CLib was active at the time (actively tracking or detecting which backends are currently active would be a big task, and may add an unacceptable level of extra overhead to CLib calls. E.g. programs running in taskwindows can be suspended at almost any time, so you’d effectively need to add protection to every single CLib call). Potentially CLib could keep track of ‘dead references’ – where a client used to use an old version of the backend but has now been upgraded to a new version. Then if that client dies it knows that the old backend is now definitely no longer in use and can safely be freed.
- Some extra magic will be required to deal with paged out Wimp tasks, tasks which have been relocated to the high end of application space via a call to system(), etc. But those shouldn’t be too hard to deal with (e.g. verify that the jump table you’re replacing is at the location you expect). At the least the code should be able to go into a fail-safe state of refusing to upgrade the client.
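The per-backend bookkeeping in the bullets above (active clients, the legacy flag, dead references) could be sketched roughly as follows. Every name here is hypothetical and clients are just opaque pointers. A real implementation would need the list to live somewhere a new frontend instance can find it, which this sketch ignores.

```c
#include <stddef.h>

#define MAX_REFS 8

/* Small fixed-size set of opaque client handles. */
typedef struct {
    const void *ref[MAX_REFS];
    int count;
} ref_set;

static int set_add(ref_set *s, const void *client)
{
    if (s->count == MAX_REFS) return 0;
    s->ref[s->count++] = client;
    return 1;
}

static int set_remove(ref_set *s, const void *client)
{
    for (int i = 0; i < s->count; i++) {
        if (s->ref[i] == client) {
            s->ref[i] = s->ref[--s->count];  /* fill the hole with the last entry */
            return 1;
        }
    }
    return 0;
}

/* One record per loaded copy of the library code. */
typedef struct {
    int     legacy_client_seen;  /* old LibInit SWI used: client is untracked */
    ref_set active;              /* clients registered via the new init SWI */
    ref_set dead;                /* upgraded away, but may still be running here */
} backend;

int  backend_register(backend *b, const void *client) { return set_add(&b->active, client); }
void backend_note_legacy(backend *b)                  { b->legacy_client_seen = 1; }

/* Move a client to a newer backend, leaving a dead reference on the old one. */
int backend_upgrade(backend *from, backend *to, const void *client)
{
    if (from == to) return 0;
    if (to->active.count == MAX_REFS || from->dead.count == MAX_REFS) return 0;
    if (!set_remove(&from->active, client)) return 0;
    set_add(&to->active, client);
    set_add(&from->dead, client);
    return 1;
}

/* Client shutdown SWI: drop every reference the client holds, live or dead. */
void backend_client_exit(backend *all[], int n, const void *client)
{
    for (int i = 0; i < n; i++) {
        set_remove(&all[i]->active, client);
        set_remove(&all[i]->dead, client);
    }
}

/* A backend may be freed only when nothing can possibly still be using it. */
int backend_can_free(const backend *b)
{
    return !b->legacy_client_seen && b->active.count == 0 && b->dead.count == 0;
}
```

The point of the dead set is that an upgraded client keeps the old backend alive until the client itself exits. A backend that has ever seen a legacy client is never freed, because such clients cannot be tracked.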
That’s basically it. However one extra thing to take into account is that when upgrading, the static data for the new and old CLibs must be an exact match – it’s not simply a case of the data being forward-compatible. This is because CLib may be being called re-entrantly within the client. e.g. for a slightly contrived example, consider a program that calls system("rmload clib") from within a qsort() callback function. Within the system() call the jump table of the client will be updated to the new backend, ensuring that any future CLib calls use the new code. Then system() will return, and the client will call a CLib function which does something to the internal state of the static data (let’s say printf). Then after that the qsort callback returns and bang! the old backend is using new-format static data. (Or for a non-contrived example, consider what happens if an upcall handler, error handler, etc. gets invoked shortly after the CLib upgrade).
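As a rough illustration of the two safety checks (exact static data match, and the jump table being where and what you expect), here is a sketch in C. It models jump table entries as plain 32-bit words rather than real branch instructions. The statics_format descriptor with its layout_id field is an assumption made up for this example, as are all the function and type names.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical description of a backend's static data layout. */
typedef struct {
    unsigned layout_id;  /* changed whenever any field changes */
    size_t   size;       /* total size of the static data block */
} statics_format;

/* Hypothetical description of one backend. */
typedef struct {
    const uint32_t *entries;  /* words this backend writes into a jump table */
    size_t          count;
    statics_format  statics;
} backend_desc;

/* Exact match only: "newer but compatible" is not good enough. */
static int statics_exact_match(const statics_format *a, const statics_format *b)
{
    return a->layout_id == b->layout_id && a->size == b->size;
}

/*
 * Patch a client's jump table from old_be to new_be.
 * Returns 1 on success; returns 0 and leaves the table untouched
 * (the fail-safe state) if anything looks wrong.
 */
int try_patch_jump_table(uint32_t *table,
                         const backend_desc *old_be,
                         const backend_desc *new_be)
{
    if (!statics_exact_match(&old_be->statics, &new_be->statics))
        return 0;
    if (new_be->count < old_be->count)
        return 0;  /* new backend lacks entries the client may call */

    /* Verify the table still holds exactly what the old backend put there. */
    for (size_t i = 0; i < old_be->count; i++)
        if (table[i] != old_be->entries[i])
            return 0;

    for (size_t i = 0; i < old_be->count; i++)
        table[i] = new_be->entries[i];
    return 1;
}
```

Note that this check alone does not make it safe to free the old backend: in the qsort() example the old qsort code is still on the call stack after the patch, which is why the old backend has to stay loaded.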