Thinking ahead: Supporting multicore CPUs
Jeffrey Lee (213) 6048 posts |
Off the top of my head, these are the kinds of things that would affect the microkernel (and, depending on how the threading library works, potentially any code compiled for the microkernel-RISC OS):
|
Eric Rucker (325) 232 posts |
Ahh, so that’s why OKL4 made a big deal out of ARM11 support (in the first version that wasn’t BSD licensed, of course). Anyway, that sounds like there’s serious bugs that mean that an IOMD implementation would require significant extra work, or would be restricted to a very small subset of IOMD machines. (Plus the emulators – which actually brings up yet another nearly unrelated point, how should emulation be handled, if the existing IOMD machines are dropped? QEMU emulating a BeagleBoard? Modify RPCEmu to emulate a Cortex-A8-powered RiscPC?) Doing some quick Googling, looks like ARMv4 (or at least StrongARM and ARM9, which is all that matters to us) is when the switch to base-restored occurred, so that’s another argument for dropping the ARMv3 IOMD machines. (Edit: Source (in a RISC OS context, even): http://www.iconbar.com/forums/viewthread.php?threadid=11977) The SWP vs. LDREX/STREX thing gets interesting, because that brings the Iyonix onto the chopping block, and I don’t think that would be nearly as well accepted. As far as page tables go, I’ve read that the format changes on Cortex-A7 and A15, too, for a very major change within ARMv7 (but the old format is also usable, I think – but supporting the new format would be a good idea, IMO, as it’s the required format to use LPAE (no, it’s not a concern for RISC OS NOW, but once there’s some serious CPU power and a good threading model, I’d guess you’ll see more straight Linux ports, or ports of Linux software with RISC OS front ends, and then you’ll really want the RAM)), and also the only format used by ARMv8 AArch64). |
Rick Murray (539) 13840 posts |
This, I feel, is fairly important. I wonder if it would be possible, in the future, to make “unsigned” modules run in SYS mode instead of SVC (which should be OS-only and “signed” modules).
I wonder if it would be possible to harden RISC OS’s various mechanisms. When writing a module, I forgot I was in SVC mode so I called a SWI without stashing R14. Bang. Well, actually the machine froze. Completely understandable from a technical viewpoint, but should a numpty error in a regular module bring down the OS? [this goes back to the SYS/SVC distinction above]
Just out of interest – are there any near- or medium-term plans for a virtual memory mechanism in RISC OS?
Ah, yes. I remember reading that and thinking “erk!”. I think, reading through the list, it is a question of what a microkernel is going to support that will dictate how it looks and feels. Perhaps the microkernel itself will be fairly generic, interwoven with the HAL so multicore systems can benefit while older hardware (such as the OMAPs etc) will present an interface as if only having the one core. If this is the case, then perhaps for the time being an ARM6/7 version could be available if somebody decides to port it. However, in the longer run, perhaps having moved away from 26 bit hardware, it is time to leave behind older less capable processors? Straw poll:
For the record, my RiscPC is an ARM710. I have an A7000 board someplace, and two ARM7500-based “Bushbox” thingies. |
Steve Pampling (1551) 8170 posts |
Mounted inside the RPC case? |
Jeffrey Lee (213) 6048 posts |
Rick said:
This, I feel, is fairly important. I wonder if it would be possible, in the future, to make “unsigned” modules run in SYS mode instead of SVC (which should be OS-only and “signed” modules).
Except you’re forgetting that SYS mode has exactly the same privileges as SVC mode? But having said that, USR mode modules might not be such a bad idea (ignoring the possibility of them being fiddly to add support for!)
Yes, it’s possible, and probably one of the things we’d want to try doing before going crazy with threading support.
None that I know of. |
Eric Rucker (325) 232 posts |
Something I’ll note… as far as RISC OS is concerned, AArch64 drops to EL0 replacing USR, EL1 replacing ALL OTHER MODES. (EL2 is a hypervisor layer, EL3 is a security monitor layer.) |
nemo (145) 2546 posts |
Rick asked:
All the hardening in the world won’t help when a module goes into an infinite loop. Yes, you can enable some watchdog timer on entry to the module and after a while decide to give up… but what’s the difference between code getting stuck and code doing something repeatedly? It’s not enough to monitor SWI calls – it could be looping over them: I managed to write some code that accidentally looped forever once DAs got negative numbers on RO4. Lots and lots of SWI calls, but stuck all the same. Browsers have the “A script on this page is taking a long time to complete…” dialog. Probably the best one could hope for is a similar UpCall that would lead to a “Do you want to kill this module?” question presented appropriately inside and outside the desktop, but the tidying up is problematic, the indeterminate system state part-way through a SWI is worrying (for user interaction) and the heuristic for recognising what is and is not a problem is not obvious. My feeling is that there’s an awful lot of other things that can be ‘hardened’ before auto-detecting infinite loops.
You mean Virtualise? That worked very nicely indeed. No 32bit version though. Not sure if Alexander Thoukydides is still around, but although there’s heavy use of 26bit features throughout the module they’re by convention, not necessity (ie although functions preserve flags the caller doesn’t often need them to), so it wouldn’t be hard to make a 32bit version. |
Tim Rowledge (1742) 170 posts |
There’s been some fascinating low level detail stuff in this thread but I can’t help wondering what we collectively would actually like to end up with for a future RISC OS. What is the value in the current system that ought to be kept? What new things are wanted? I suggest there are at least two viewpoints that need to be considered for deciding this; It would be possible (almost certainly) to make the user experience be the same (barring the improvements!) by making a GUI layer for some full ‘modern’ OS with suitable libraries etc to support all the current apis. Actually I guess a filing system would probably be needed to support application directories and so on, something I couldn’t ever understand not becoming a standard. I’m sure there would be some very tricky parts to this. Undoubtedly some applications would need some serious rework. Then the question is whether anyone would be bothered to do that work and I suppose, whether anyone would ever really care. A different degree of effort could produce a variant kernel taken (at least in design) from a modern OS that supports all those nice process/thread/memory-protection/etc capabilities and provides a more pure RISC OS system by not being any sort of unix-alike in terms of all the cruft around the kernel. (Though it has to be said that there are quite a lot of useful doohickeys supported by unix, as evidenced by the number of them ported to RISC OS over the years.) This new kernel could provide support for old-style apps via libraries and maybe even an emulator approach, whilst extending new capabilities to rewritten code. For developers I suggest that library frameworks (including modules) provide most of the routine support anyway. You make a swicall according to the rules and get the answer and nothing is different. 
I know there are some sets of APIs that appear to be very serial in nature – IIRC the copy/cut/paste set were given as an example some years ago even though the protocol seemed extremely like the X Windows equivalent to me – and there may be hard work to do to solve that. The other important thing for developers in RISC OS always seemed to me to be that it is just less painful than so many other systems. I’m not entirely sure what made that work but I’d like to protect it. I’d like to see a fully modern OS that is as nice to use as RISC OS, running on modern ARM machines, that attracts enough development effort that we get good web browsing, Skype-like systems, media handling, whatever it takes to be a serious desktop/mobile platform. We know it’s possible to get all that good stuff in principle because there are existence proofs of nearly getting it right. iOS may not be your favourite UI but the underlying system is excellent. Android is not my idea of a good UI but it’s clearly a ‘real’ OS under there. So it’s all pretty easy really. Just use known modern OS technology to write a new kernel, add a good modern filing system that supports meta-data well (let’s make it secure, reliable, self-versioning, all that nice stuff as well), write a new GUI system that has all the good stuff of the Wimp and takes advantage of modern GPUs, rewrite a bunch of apps and there we are! No problem :-) |
Jess Hampshire (158) 865 posts |
It would be a shame to move away from machines that can run RO5, just because they are slow. (Especially since they have only just got a viable level of support – network drivers) Does supporting armv3 and 4 involve much more work than just armv4? RO 5 works reasonably well on an A7000. (And don’t the emulators emulate fast versions of these machines anyway?). The other issue is that the higher end RPCs are more likely to be using Adjust or similar, where RO 5 currently wouldn’t be an upgrade. And I think it would be a bad move to scrap RPC compatibility altogether, the performance of an SA RPC isn’t bad, and replacing one doesn’t just require a Pi, it requires a case, SD card, mouse keyboard and screen, you also lose access to podules and floppy drives and IDE. I’m sure there are some tasks where a hard drive would be important, rather than an SD. |
Rick Murray (539) 13840 posts |
It isn’t because they are slow. It is more fundamental issues. It may never happen, but I believe that most modules should not run with the same level of privilege as the OS. Well, there may be a processor mode that can assist with this – SYS 1. It uses USR mode registers with SVC-like behaviour. Certainly, the OS should be able to recover better from an error in a module that doesn’t trash the OS’s own stacks and stuff. To give you an example, write a small module (doesn’t need to do anything) that calls a SWI (say, OS_Write0) in its startup. Forget to stack R14. The SWI call will take place in SVC mode, however since your module init is in SVC mode… bang. Or rather, the OS will crash/freeze/burn. Well, SYS was introduced, IIRC, with the StrongARM. It may be that further work to RISC OS may wish to take more advantage of the facilities offered by more modern processors. This, obviously, would come at the cost of older non-compatible hardware.
I presume you mean ARMv3 and ARMv4… [the ARM3 is the 26 bit CPU inside the A4/A5000, and there was no ARM4] How much work it takes depends upon two factors. Firstly, what the actual necessity is, and secondly whether supporting the older system would require a pile of botches to get similar behaviour, or alternatively “don’t do it”. Consider something like speeding up the multitasking using lazy task switching. This is like Windows where memory pages are mapped on demand instead of all at once. Won’t work on ARM6/ARM7 (incapable), or early StrongARM (buggy). Likewise if RO ever gets support for VM, similar story.
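To make the lazy task switching idea a bit more concrete, here is a toy model in C. It is only a sketch under stated assumptions: the names (`task_t`, `switch_in_eager`, `switch_in_lazy`, `touch`) are invented for illustration and are not RISC OS kernel APIs, and the "abort handler" is simulated by an ordinary function call. It simply counts how many page mappings each scheme performs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define PAGES_PER_TASK 16

/* Hypothetical per-task state: which pages are currently mapped in, plus a
   counter standing in for the cost of page-table updates. */
typedef struct {
    bool mapped[PAGES_PER_TASK];
    unsigned map_ops;
} task_t;

/* Map one page if it is not already mapped, and count the work done. */
static void map_page(task_t *t, size_t page)
{
    assert(page < PAGES_PER_TASK);
    if (!t->mapped[page]) {
        t->mapped[page] = true;
        t->map_ops++;
    }
}

static void unmap_all(task_t *t)
{
    for (size_t p = 0; p < PAGES_PER_TASK; p++)
        t->mapped[p] = false;
}

/* Eager switch: every page of the task is mapped at switch time. */
static void switch_in_eager(task_t *t)
{
    unmap_all(t);
    for (size_t p = 0; p < PAGES_PER_TASK; p++)
        map_page(t, p);
}

/* Lazy switch: nothing is mapped at switch time. */
static void switch_in_lazy(task_t *t)
{
    unmap_all(t);
}

/* Simulated memory access. On real hardware the first touch of an unmapped
   page would raise a data abort, the handler would map the page, and the
   faulting instruction would be restarted. Here that is one function call. */
static void touch(task_t *t, size_t page)
{
    map_page(t, page);
}
```

With a task that only touches two of its sixteen pages before the next switch, the eager scheme does sixteen mappings and the lazy one does two. The catch is the "restart the faulting instruction" step, which is the part that depends on what the CPU's abort model can do.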
…for a system with an 8MHz I/O bus, and 12MHz FSB. It’s the power and efficiency of the ARM and RISC OS’ tight coding making it shine. Try shifting around large files (like untarring a source bundle) and the limitations will become apparent. This isn’t to say the computer should be ignored because it is slow; that is resolved with patience. However, the question is should newer modifications to the OS be held back by very old hardware? As it is, the Ubuntu crowd have decided the Pi itself is “too old to support” (possibly to their detriment, mind you…).
You could put the Pi in the RPC case and also a USB-IDE harddisc, USB hub, power supply, etc. Question is, in the realms of kernel hacking, will a line be drawn between what is and is not supported… and if so, where? 1 How does SYS mode get around the risks of trashing USR R14? Or must the OS do this prior to switching? |
Rick Murray (539) 13840 posts |
Just to clarify, dropping support for older, slower and less capable hardware just because it is older, slower and less capable hardware is churlish – for something older, slower and less capable is older, slower and less capable – there’s a clue in the name. Thankfully we aren’t selling hardware so we don’t arbitrarily drop support for week-old technology (Google, are you listening? Before blaming devs for Android fragmentation look at what manufacturers/carriers are doing…). The cessation of support for older, slower and less capable hardware should be taken purely on technical reasons, balancing “can we support this?” vs “will it be too much impact to continue to support this?”. |
Jess Hampshire (158) 865 posts |
(v3 v4 Typo fixed) In your example, wouldn’t the sensible approach be that modules keep current behaviour on older processors and run with lower privileges on new processors? If the processor is too old to support a new feature in a sensible manner, then it just works on the newer ones. |
Ben Avison (25) 445 posts |
I’d like to suggest one path towards multi-core support which could be started on today – that’s to start tackling locking in the various modules that make up the OS. Assuming that sooner or later we’re going to want to avoid a “big kernel lock”, which is a sledgehammer approach to deal with concurrency by ensuring all kernel and module code only runs on one core at once, then individual modules are going to need individual locks on parts of their functionality. This is because when more than one core is in use, we have to protect against simultaneous access, irrespective of whether the OS gains the general ability to pre-empt application code or privileged mode code, which are separate headaches in their own right (and which have been discussed at great length already). The thing is, all re-entrant calls have, by definition, protection against concurrency, but it’s done in a single-core fashion, by disabling interrupts on the local CPU while modifying data structures. This technique doesn’t work on multi-core CPUs – instead, you need to use semaphores in memory so that all cores can see when the associated data structure is being modified. As a proof of concept, I wrote SDIODriver and SDFS using them, to prove that a hardware driver on RISC OS can be written purely using semaphores to handle concurrency. The advantages of this approach are:
One disadvantage is that it doesn’t do anything for concurrency protection for non-re-entrant calls. However, by definition, non-re-entrant calls probably need to be executed sequentially anyway, so are relatively easily handled by sticking a semaphore around them in the module’s SWI dispatcher. Doing a quick search of the ROM binary, it looks like there are only about 1000 places in the ROM which are affected. Put a small team on it, doing say 20 of them a week, the whole task would be finished in a year, in which time someone else might have got a microkernel running underneath the RISC OS kernel… |
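For anyone who wants to see the shape of the "semaphore in memory" idea from Ben's post, here is a minimal sketch using C11 atomics. It is not taken from SDIODriver or SDFS; `mem_lock_t`, `shared_queue_t` and the function names are hypothetical, and the busy-wait is only there to keep the example short. The point is that the lock state lives in shared memory, so every core sees it, whereas disabling interrupts only affects the local CPU.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical lock word held in ordinary shared memory. */
typedef struct {
    atomic_flag held;
} mem_lock_t;

/* Spin until we are the one who changed the flag from clear to set.
   Acquire ordering stops protected accesses moving above the lock. */
static void mem_lock_acquire(mem_lock_t *l)
{
    while (atomic_flag_test_and_set_explicit(&l->held, memory_order_acquire)) {
        /* busy-wait; real code would back off or sleep instead */
    }
}

/* Non-blocking variant: returns true only if the lock was free. */
static bool mem_lock_try_acquire(mem_lock_t *l)
{
    return !atomic_flag_test_and_set_explicit(&l->held, memory_order_acquire);
}

/* Release ordering publishes our updates before the lock appears free. */
static void mem_lock_release(mem_lock_t *l)
{
    atomic_flag_clear_explicit(&l->held, memory_order_release);
}

/* Hypothetical data structure guarded by the lock. */
#define QUEUE_SLOTS 8

typedef struct {
    mem_lock_t lock;
    int items[QUEUE_SLOTS];
    size_t count;
} shared_queue_t;

static void shared_queue_init(shared_queue_t *q)
{
    assert(q != NULL);
    atomic_flag_clear(&q->lock.held);
    q->count = 0;
}

/* Append a value; returns false if the queue is full. The lock is held
   only for the few instructions that modify the structure. */
static bool shared_queue_push(shared_queue_t *q, int value)
{
    bool ok = false;
    mem_lock_acquire(&q->lock);
    if (q->count < QUEUE_SLOTS) {
        q->items[q->count++] = value;
        ok = true;
    }
    mem_lock_release(&q->lock);
    return ok;
}
```

Wrapping a non-re-entrant SWI would look much the same: acquire in the module's SWI dispatcher, call the handler, release. How the compiler implements the atomic operation is target-dependent, which is where the SWP versus LDREX/STREX question from earlier in the thread comes in.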
Eric Rucker (325) 232 posts |
Hmm, I’ve been reading some of Apple and DayStar’s docs about their multiprocessing support (Apple rewrote all of that for 8.6, to the microkernel-underneath-Mac-OS approach), to find out how Apple did it before 8.6. Unfortunately, Apple and DayStar were extremely cagey (for good reason) about revealing details of how they implemented multiprocessing support – they didn’t want developers coding towards specific behavior, when they knew that that behavior would be changing in later versions. It seems that DayStar/Apple went with a rather similar approach to Hydra, although they also ran their threads on the main CPU, which is something that Hydra couldn’t do (due to Hydra threads being 32-bit, and the host OS being 26-bit). DayStar/Apple said ZERO details of how running on the main CPU worked other than it being PMT – I’d guess that it’s running in a CMT task, or some similar approach, given that the Mac OS “nanokernel” (which was really a 68k emulator back then, IIRC) wasn’t changed for this. Implementing a Hydra-esque approach could actually work OK, if it’s done carefully, so that threads can be run unmodified on a future microkernel OS. (Of course, that threading library should be, ideally, designed so that it is useful for solving other currently existing problems on RISC OS, too. And, as Jeffrey pointed out, it might not actually be any less work, although could Simtec be convinced to release the source for the actual Hydra code (which could, but may not, save a lot of work), if that approach is considered a good stop-gap? In any case, though, it would have to be modified for ARM MPCore instead of the Hydra API, and so that threads could be dispatched to the main CPU alongside everything running on it normally – adoption would be dependent on it running on single-core machines, too.) 
The DayStar/Apple approach, after all, translated well to the later microkernel approach, and then existing code written for that approach could be recompiled under Carbon, and ran on the fully PMT/SMP OS X. |
nemo (145) 2546 posts |
Ben said:
Having had the interesting experience of converting a large application* to a multithreaded model, I’m confident a number of those will require extensive refactoring, perhaps amounting to a redesign. I stumbled upon an unintended non-reentrancy in RO2 days when I implemented a virtual file system (not unlike RO3’s Images) that called back into FileCore from the FS implementation code. It turned out (back then) that when opening a file, FileCore would choose an unused handle and then call FSOpen. My code then attempted to open another file which led FileCore to choose an unused handle… which turned out to be the same handle. My workaround was to open ‘null:’ first, then the file I wanted to open, then close null: and return. Everybody happy. Unfortunately, semaphores and locks aren’t so amenable to lateral thinking. Locks need to be locked for as short a time as possible, and that will often need a rethink of how data is updated. Doing that can sometimes lead to a non-locked solution, but I wouldn’t aim for that in general. In other words, there’s plenty of work in doing Ben’s suggestion. And yes, it definitely must be done. I’m just quibbling about the magnitude of ‘small’ and ‘year’. :-/ *By ‘large application’ I mean over 10,000 files amounting to 750MB of source. Sadly I’m neither kidding nor exaggerating. |
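The handle collision nemo describes can be modelled in a few lines of C. This is a generic sketch, not FileCore's actual code: `handle_table_t`, `open_unsafe`, `open_safe` and the callback type are made up for the example. The unsafe version picks a free handle, calls out to the filing system, and only then marks the handle as used, so a nested open from inside the callback is given the same handle. The safe version reserves the handle before calling out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_HANDLES 4
#define NO_HANDLE   (-1)

/* Hypothetical table of file handles. */
typedef struct {
    bool in_use[MAX_HANDLES];
} handle_table_t;

/* Stand-in for the filing system's open entry point. It may call back
   into the open routine, as an image filing system would. */
typedef void (*fs_open_fn)(handle_table_t *t, int handle, void *ctx);

static int find_free(const handle_table_t *t)
{
    assert(t != NULL);
    for (int h = 0; h < MAX_HANDLES; h++)
        if (!t->in_use[h])
            return h;
    return NO_HANDLE;
}

/* Non-re-entrant: the handle is marked as used only after the callback
   returns, so a nested open sees it as still free. */
static int open_unsafe(handle_table_t *t, fs_open_fn fs_open, void *ctx)
{
    int h = find_free(t);
    if (h == NO_HANDLE)
        return NO_HANDLE;
    if (fs_open)
        fs_open(t, h, ctx);
    t->in_use[h] = true;
    return h;
}

/* Re-entrant: reserve the handle first, then call out. */
static int open_safe(handle_table_t *t, fs_open_fn fs_open, void *ctx)
{
    int h = find_free(t);
    if (h == NO_HANDLE)
        return NO_HANDLE;
    t->in_use[h] = true;
    if (fs_open)
        fs_open(t, h, ctx);
    return h;
}

/* Callbacks that open a second file from inside the first open and
   store the inner handle through ctx. */
static void nested_open_unsafe(handle_table_t *t, int handle, void *ctx)
{
    (void)handle;
    *(int *)ctx = open_unsafe(t, NULL, NULL);
}

static void nested_open_safe(handle_table_t *t, int handle, void *ctx)
{
    (void)handle;
    *(int *)ctx = open_safe(t, NULL, NULL);
}
```

On a single core, reordering two lines fixes it. With several cores the find-and-reserve step itself has to be atomic or locked, which is one small instance of why the data update pattern often needs rethinking rather than just wrapping.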
nemo (145) 2546 posts |
Ben also said:
Even those will often require a redesign. It may be far more practical to assume that such modules (by which I mean all modules not marked threadsafe) can only be used from the GUI thread. Though one can reasonably work towards making the OS threadsafe, the demands on module authors may be too high, and for little apparent benefit. There is a link to applications here. Wimp2’s trick for pre-empting applications within the Wimp’s cooperative protocols was to have a separate message handler, a kind of pre-filter. Supplying such a thing for existing applications is a reasonable compromise to getting them working acceptably in a PMT system – the PMTFilter (for that application) can ‘vouch’ for the app, and make it play nice with the Filer protocols (for example) even when busy (as it could ignore filetypes it knows the application ignores). It may be possible to vouch for modules similarly such that the module itself does not need to be modified to be declared thread-safe. Or maybe a wrapper makes it so. Incidentally, Vectors, UpCalls and Events are interesting in a multicore system. :-) |
Timothy Baldwin (184) 242 posts |
I think the first step could be to use a recursive mutex to protect legacy code, like the Linux big kernel lock. Split Wimp_Poll into:
Legacy interrupt handlers and callbacks must be run in the context of the current holder of the mutex, and the SWI dispatcher will acquire the mutex before calling a SWI handler that is not flagged as PMT safe. The role of Wimp_SwitchTask would gradually be eliminated by maintaining per-thread state. |
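A rough sketch of the dispatcher side of this proposal, in C11. Everything here is an assumption made for illustration: the `pmt_safe` flag, the SWI table, the handler signature and the caller-supplied thread id `self` are all hypothetical, and the lock uses try-enter (returning false when busy) where a real dispatcher would block. The lock is recursive so that a legacy SWI handler can call another legacy SWI without deadlocking against itself.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

#define NO_OWNER (-1)

/* Hypothetical recursive lock guarding all legacy (non-PMT-safe) code. */
typedef struct {
    atomic_int owner;  /* id of the holding thread, or NO_OWNER */
    int depth;         /* nesting count, only touched by the owner */
} legacy_lock_t;

static legacy_lock_t legacy_lock = { NO_OWNER, 0 };

static bool legacy_lock_try_enter(legacy_lock_t *l, int self)
{
    if (atomic_load_explicit(&l->owner, memory_order_relaxed) == self) {
        l->depth++;                    /* re-entry by the current holder */
        return true;
    }
    int expected = NO_OWNER;
    if (atomic_compare_exchange_strong_explicit(&l->owner, &expected, self,
            memory_order_acquire, memory_order_relaxed)) {
        l->depth = 1;                  /* first entry */
        return true;
    }
    return false;                      /* held by another thread */
}

static void legacy_lock_leave(legacy_lock_t *l, int self)
{
    assert(atomic_load_explicit(&l->owner, memory_order_relaxed) == self);
    if (--l->depth == 0)
        atomic_store_explicit(&l->owner, NO_OWNER, memory_order_release);
}

typedef int (*swi_handler_t)(int self, int arg);

typedef struct {
    swi_handler_t handler;
    bool pmt_safe;  /* hypothetical flag: handler does its own locking */
} swi_entry_t;

enum { SWI_DOUBLE, SWI_DOUBLE_PLUS_ONE, SWI_NEGATE, SWI_COUNT };

static bool swi_dispatch(int swi, int self, int arg, int *result);

/* Legacy leaf handler. */
static int swi_double(int self, int arg) { (void)self; return arg * 2; }

/* Legacy handler that itself calls a legacy SWI, so it needs recursion. */
static int swi_double_plus_one(int self, int arg)
{
    int r = 0;
    swi_dispatch(SWI_DOUBLE, self, arg, &r);
    return r + 1;
}

/* Handler flagged as PMT safe: dispatched without the legacy lock. */
static int swi_negate(int self, int arg) { (void)self; return -arg; }

static const swi_entry_t swi_table[SWI_COUNT] = {
    [SWI_DOUBLE]          = { swi_double,          false },
    [SWI_DOUBLE_PLUS_ONE] = { swi_double_plus_one, false },
    [SWI_NEGATE]          = { swi_negate,          true  },
};

/* Returns false if the legacy lock is held by another thread. */
static bool swi_dispatch(int swi, int self, int arg, int *result)
{
    const swi_entry_t *e = &swi_table[swi];
    if (e->pmt_safe) {
        *result = e->handler(self, arg);
        return true;
    }
    if (!legacy_lock_try_enter(&legacy_lock, self))
        return false;
    *result = e->handler(self, arg);
    legacy_lock_leave(&legacy_lock, self);
    return true;
}
```

While one thread holds the legacy lock, another thread's legacy SWI is refused (or, in a real system, would wait), but its PMT-safe SWIs still go straight through. That is the migration path: flip the flag on a module once it does its own locking.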
nemo (145) 2546 posts |
Strongly disagree. It sounds like you want to have the GUI running on multiple threads. That makes things very difficult indeed without actually gaining any functionality. I don’t understand why you want to split Wimp_Poll into bits. There will be locking at the SWI handler level (which is BEFORE the Wimp gets to do its SWI) necessarily (for this is the only way to support existing modules). Assuming the Wimp is marked as threadsafe then locking will be done by it, not by its tasks. Anything that claims to be threadsafe must absolutely do its own locking. I’m not sure though whether your bullet list is the steps the task or the Wimp would be doing, so I’m slightly confused about what problem you’re trying to solve. |
Bryan Hogan (339) 592 posts |
If you want multi-threaded WIMP and multi core support, rather than trying to bolt it on to RISC OS it might be better to help develop ROLF – http://www.rougol.jellybaby.net/meetings/2013/aug.html |
Chris Hall (132) 3554 posts |
Bryan said:
it might be better to help develop ROLF
Will Impression Publisher run under ROLF? |
Bryan Hogan (339) 592 posts |
That question was asked at the meeting! The answer was that it had not been tested. That’s one of the areas where Simon needs assistance. |
Steffen Huber (91) 1953 posts |
ROLF will not magically solve the “multi core multi threaded” problem if it wants to retain backward compatibility. More precisely, it will face the exact same problems that any solution tackling these things will face. |
andym (447) 473 posts |
Is there a downloadable, installable binary distro of ROLF that people can get hold of to try/test/play with? I don’t have the knowledge to build it, but I can manage an install or a (if possible) Live version. I’ve looked on the blog and Sourceforge site, but can’t see anything. |
nemo (145) 2546 posts |
I don’t think this is widely understood, nor the problems in question. ROLF is utterly irrelevant on a number of levels. |
Steve Pampling (1551) 8170 posts |
It strikes me as the answer to a question that hadn’t been asked: 1 “Mostly” being some arbitrary value between 1% and 99% |