Thinking ahead: Supporting multicore CPUs
nemo (145) 2546 posts |
Sorry to resurrect such an old thread (or sorry I didn’t stop by earlier). BA wrote:
Spot on. The WIMP should be a GUI (and a veneer for the underlying Task stuff for compatibility) – I think “Task” is more RO than “Process”. Threads are also absolutely required, and before multi-core considerations. “Tasks” need to be supported at the Kernel level, far below the WIMP… by which I actually mean Tasks/Threads of course.
W_OT (Wimp_OpenTemplate) was always botched but could have been safe from the start – it should have been part of each Task’s information, not shared between them. As such it’s a small bit of WIMP plumbing, not something that needs threading to fix – if the WIMP knows which Task is calling it (as it always must) then there’s no problem (see the sketch below). Wimp_CloseTemplate would be implicit during an (implicit) OS_Exit so the WIMP doesn’t leak. Re-entrancy of allocation is a different problem of course, but not the WIMP’s problem.
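A minimal sketch in C of that per-Task plumbing – illustrative only, not actual Wimp source; the structure and function names are invented:

```c
/* Sketch: Wimp_OpenTemplate state keyed by the calling task's handle
 * instead of one shared slot, so Wimp_CloseTemplate can also be
 * implied when that task exits. Not actual Wimp source. */
#include <stdio.h>
#include <stdlib.h>

typedef struct template_state {
    int   task_handle;               /* which task called Wimp_OpenTemplate */
    FILE *file;                      /* that task's open template file      */
    struct template_state *next;
} template_state;

static template_state *open_templates;   /* one node per task, not global */

/* On Wimp_OpenTemplate: record the state against the calling task. */
void wimp_open_template(int task_handle, const char *path)
{
    template_state *t = malloc(sizeof *t);
    if (!t) return;
    t->task_handle = task_handle;
    t->file        = fopen(path, "rb");
    t->next        = open_templates;
    open_templates = t;
}

/* On Wimp_CloseTemplate - and implicitly during OS_Exit - so a task
 * that forgets to close cannot leak, or trample another task's state. */
void wimp_close_template(int task_handle)
{
    template_state **p = &open_templates;
    while (*p) {
        if ((*p)->task_handle == task_handle) {
            template_state *dead = *p;
            *p = dead->next;
            if (dead->file) fclose(dead->file);
            free(dead);
            return;
        }
        p = &(*p)->next;
    }
}
```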
That’s not a TaskWindow problem – try doing the same under Null Polls in any Task and you’ll have the same problem. The Wimp is still a total surprise to the Kernel, which still thinks it’s Arthur.
And only a question of optimisation, for which as we well know there are only two rules:
Making it run fast is a tiny detail after the rather harder stage of making it run at all!
Instead of blindly allocating masses of address space to titchy DAs, assume for old programs an old limit (64MB, or maybe 256MB if we’re feeling super generous). New programs that want more set a flag when they create the DA. The DA doesn’t get more address space at first; instead the flag marks the DA as movable. If it can’t extend in place, it gets moved to a different logical address, and the DA Handler is called (with new reason codes) before and after the move. OS_ChangeDynamicArea returns the new base address in R2 for such DAs. The requirement is Big DA = Relocatable DA, but hey, what can you do. A rough sketch of that contract follows.
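Something like this, where every name (DA_FLAG_MOVABLE, the PreMove/PostMove reason codes, the helper functions) is invented for illustration rather than being an existing RISC OS constant:

```c
/* Sketch of the proposed "movable DA" contract. All names here are
 * invented - these are not existing RISC OS flags or reason codes. */

#define DA_FLAG_MOVABLE (1u << 8)      /* hypothetical new creation flag */

enum {                                  /* hypothetical new reason codes  */
    DAHandler_PreMove  = 6,             /* about to move: unhook pointers */
    DAHandler_PostMove = 7              /* moved: new base is now valid   */
};

typedef struct {
    unsigned flags;
    char    *base;                      /* current logical address        */
    unsigned size;
} dynamic_area;

extern int   can_extend_in_place(dynamic_area *da, unsigned new_size);
extern char *remap_at_new_address(dynamic_area *da, unsigned new_size);

/* OS_ChangeDynamicArea, reimagined: if the area cannot grow in place
 * and is flagged movable, relocate it and hand back the new base
 * ("returned in R2" in SWI terms). */
char *da_resize(dynamic_area *da, unsigned new_size,
                void (*handler)(int reason, dynamic_area *da))
{
    if (can_extend_in_place(da, new_size)) {
        da->size = new_size;
    } else if (da->flags & DA_FLAG_MOVABLE) {
        handler(DAHandler_PreMove, da);    /* owner fixes up pointers    */
        da->base = remap_at_new_address(da, new_size);
        da->size = new_size;
        handler(DAHandler_PostMove, da);   /* owner rehooks at new base  */
    }
    return da->base;                       /* new base address to caller */
}
```

JL wrote: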
The old SWI vs DLL question! I see no problem with SWI_Func1, SWI_Func2, SWI_GetFuncTable so you can have it either way (note the slight wrinkle of making the latter safe in an OS where modules can be replaced at run time – icky but not impossible; there’s a sketch at the end of this post).

As for the programmer model, I’m currently preparing to write a pthread compatibility layer on top of ThreadX (or considering whether I should, or just throw them both in the sack and look the other way), so by the end of that I’ll have a better idea of how I feel about the API… but we shouldn’t be using Wimp_StartTask or *TaskWindow to spawn new threads. In other words, one should be able to have a number of multi-threaded tasks running on RISC OS without involving the Wimp at all!

There is a much bigger question (in effect) than making the Kernel work – how do we expect the Wimp to work? ie what does a pre-empted Wimp look like? So many of the protocols assume known yield points (as Niall Douglas proved with Wimp2). Eg it is possible to pause the desktop save protocol… though only (for completeness) if you are able to fake a message bounce back to the originator. PostFilters FTW here. But the killer is the simple DataRun broadcast. I don’t see how that can work well with pre-empted legacy apps (as I’ve probably discussed before).

Bounteous apologies if anything I’ve written is already obsolete, I only seem to pass by every year or so. :-O
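The promised GetFuncTable sketch, in C – the SWI numbers, names and table layout are all invented for illustration:

```c
/* The "have it both ways" idea: each operation is reachable as a SWI,
 * and one extra SWI returns a table of direct entry points so callers
 * can skip SWI dispatch. All numbers and names are invented. */
#include <stddef.h>
#include "swis.h"                     /* _swix(), RISC OS C toolchain */

typedef struct {
    unsigned version;                 /* bump when the module changes */
    int (*func1)(int arg);
    int (*func2)(const char *s);
} func_table;

#define MyModule_Func1        0x5A500 /* hypothetical SWI chunk       */
#define MyModule_Func2        0x5A501
#define MyModule_GetFuncTable 0x5A502

int call_func1(int arg)
{
    static const func_table *tab;
    int result;

    /* Fetch the table once. A real implementation must also cope with
     * the module being replaced at run time - the "icky but not
     * impossible" wrinkle - e.g. via a version/generation word. */
    if (tab == NULL)
        _swix(MyModule_GetFuncTable, _OUT(0), &tab);

    if (tab != NULL)
        return tab->func1(arg);       /* DLL-style: direct call        */

    _swix(MyModule_Func1, _IN(0) | _OUT(0), arg, &result);  /* SWI way */
    return result;
}
```

|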
André Timmermans (100) 655 posts |
Well, I wouldn’t mind the main Wimp task/thread working as before – most of the existing applications are well designed enough to cope with it – as long as you can allow background work: |
nemo (145) 2546 posts |
Problems with desktop save and DataRun protocols in a pre-empted wimp: Co-operative:
Pre-empted:
Step 3 above doesn’t happen under Wimp2 because saves were still atomic. The only reason the desktop save protocol worked under Wimp2 was coincidence – tasks tend to respond to messages before getting pre-empted. So clearly the Wimp must hide messages between being delivered to a task and that task next yielding – otherwise it would bounce or get delivered to the next task in “parallel”. Care must therefore be taken with tasks that die while “hosting” a message.

Another problem with the data save protocol concerns the possibility of initiating another save while the first is still occurring – ie before the Loader has sent DataLoadAck. And as for DataSaved… well, that’s always been dodgy and hardly ever implemented correctly.

However, DataRun is the biggest problem: what happens when you double-click a spreadsheet file while the spreadsheet program is recalculating? Broadcast messages are delivered to each task in turn, so what happens when one is asleep or busy rather than yielded? The task cannot be re-entered with the message (Wimp2 dealt with this by having, in effect, a separate message-handling thread for the task).

- Does the DataRun pause at a task until it yields? Then double-clicking on a file might appear to have no effect at all, which usually results in the user double-clicking again. Are such messages amalgamated (as some already are)?
- Does it skip that task and go on to the next? Then sometimes double-clicking on a file will open another instance of the application, or the file will be opened by a totally different application. What if it’s one of those applications that refuses to run more than one instance at a time?
- Does it try all the yielded tasks first and then wait for the busy and sleeping ones before bouncing back to the Filer? Once again we have unpredictable destinations and interminable delays.

The trouble is that the Wimp doesn’t know which tasks load what kinds of file, so any DataRun must be passed to all the tasks, every time – the sketch below shows the dilemma in miniature.

I have investigated this not only through using Niall’s Wimp2, but also by simulating pre-emption through pausing the desktop save protocol to implement “slow” (browser-like) loading and saving of files between desktop programs. As long as the user doesn’t try anything clever, you’re OK… but as soon as the user tries multiple simultaneous saves (for example) most applications get confused.
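A toy model of that dilemma in C – not Wimp source; every structure here is invented for illustration:

```c
/* Toy model of the DataRun broadcast: the message visits every task in
 * turn, and the Wimp must pick one of several bad options for each
 * task that hasn't yielded. Invented structures, not Wimp internals. */
enum task_state { YIELDED, BUSY, SLEEPING };

struct task {
    enum task_state state;
    int claimed;                /* did the task acknowledge the message? */
};

extern void deliver_message(struct task *t);   /* queue the broadcast   */

/* Returns nonzero if some task claimed the file. */
int broadcast_datarun(struct task *tasks, int ntasks)
{
    int i;
    for (i = 0; i < ntasks; i++) {
        struct task *t = &tasks[i];
        if (t->state != YIELDED) {
            /* Option: wait here until t yields -> the double-click
             * appears to do nothing, and the user clicks again.
             * Option: skip t (shown) -> the file may open in a second
             * instance, or in a different application entirely.
             * Neither outcome is predictable. */
            continue;
        }
        deliver_message(t);
        if (t->claimed)
            return 1;           /* claimed: the broadcast stops here     */
    }
    return 0;                   /* bounced all the way back to the Filer */
}
```

|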
Eric Rucker (325) 232 posts |
So, here’s a question. Long-term, what would be the downside of a “Wimp3”-like approach (think along the lines of adding Hydra support to Wimp2, except not actually doing that)? I know it’s not the most elegant, having multitasking done at the WIMP level, but here are the advantages I can see:

Likely faster development – the existing stuff below wouldn’t need to be made safe, as all PMT programs would be treated as one giant CMT program that also happens to use other cores while it’s running (but not while other CMT programs are running). The mechanism that PMT programs use to communicate with the OS would need to be made safe, but all of the “making safe” could be done at the Wimp3 level, and if done right, a future build could do it truly right and just use the Wimp3 APIs.

I don’t think it’s a good idea to preempt existing applications (look at what happened with Wimp2, after all), or run them on any cores but the first, so this approach would only benefit new software, and CMT software could still bring down the system. Also, this approach does mean that the more CMT stuff you have running, the more atrocious performance is (as the other cores sit idle while CMT stuff runs). But, still, there would be a benefit now, and if the APIs are set up properly, that benefit could be carried into more “pure” versions of the approach later.

Basically, looking over everything, I’m advocating approach 5, using a hybrid of approaches 1 and 2 as the way to get there, with approach 1 as a potential ultimate goal (with the caveat that fully doing approach 1 will be far, far more effort to preserve legacy compatibility than stopping at completion of approach 5, but will improve stability). I don’t think a pure approach 1 is workable, due to the whole “it’s not RISC OS any more” problem if you can’t run any existing RISC OS software, and the amount of effort required (with only minimal support from the community, I can’t see it succeeding). Approach 5 is the way that gets benefits to the platform the fastest (ignoring a pure approach 2, which doesn’t solve any of RISC OS’s other problems, unlike approach 5), and more importantly, gets software written for it the fastest if done right. It does leave some things unfixed, but it allows for fixing them later, if done right.

I’d gladly throw some cash at a bounty to get a decent approach 5 implemented, FWIW. |
Malcolm Hussain-Gambles (1596) 811 posts |
My thoughts, from an external point of view – so I could be totally wrong… just my thoughts. Mixing CMT and PMT seems a bad idea to me. I would prefer to see a clean cut-off. |
Rick Murray (539) 13840 posts |
Dangerous move without some degree of guaranteed developer support. The first question I’d ask is: does RISC OS need PMT support? In other words, would the development (which would mean rewriting the Wimp and big swathes of the kernel, not to mention creating a new API that would by definition be incompatible with current RISC OS) be justified by the end result? |
Eric Rucker (325) 232 posts |
The problem that I see is that some of the apps in question are abandoned, and some aren’t commercial, either. And, adoption of a PMT-only fork with no existing software would, I fear, look like what happened with Vista, just worse. So, you’d need to virtualize (or emulate, but that sucks) for the old software, which means you’re looking at running on something like the X-Gene to get virtualization support. And if you’re running on THAT arch, might as well make it AArch64 while you’re at it… (Not that getting ready for AArch64 is a bad thing, mind you, but doing all of this at once is an absolutely massive effort. That level would actually be the ideal, but this community doesn’t have the resources of Apple or Microsoft, or even the resources of a BSD.) The mixed approach means that you get some of the benefits now, without having to reimplement everything (although you will have to reimplement some of the OS no matter what). |
Jeffrey Lee (213) 6048 posts |
The major downside I can think of is that by keeping the threading code in the Wimp, it won’t do anything to make things easier for complex OS components which could benefit greatly from threading (whether from the point of view of performance or ease of implementation). Network stack, USB stack, filesystem stack, etc. So I think a better approach would be something like:
I think this plan is closest to option 1 in its eventual goal, but with a reachable initial goal of only providing the minimum threading functionality necessary for useful threaded code to be written. As time goes by we can make incremental updates to improve performance (make more modules thread safe, tackle threading-incompatible APIs, etc.)

The only problem is that Wimp tasks tend to use a lot more SWIs than hardware drivers and OS-level components, so it would be a big problem if Wimp threads were only allowed to use the small number of thread-safe SWIs. To tackle that I think we’d need to have a “compatibility mode” flag for threads (sketched below). If this flag is set, and the thread calls a non-thread-safe SWI, then the thread will be suspended until the main thread reaches a state where the other thread can take control. For threads associated with Wimp tasks there’s also the extra requirement that the correct task must be active (therefore mimicking the behaviour of Wimp2, UnixLib threads, and the compatibility layer we’d have for older OSes). This compatibility flag may also place some restrictions on the behaviour of the thread – e.g. because the thread needs to be capable of running in the single-tasking RISC OS world, it won’t be allowed to use any per-thread dynamic areas (if we were to implement such things).

Although this may sound like it’s similar to option 5, it’s actually quite different; with option 5 I was envisaging some hacky system of detecting when a thread-unsafe activity is being performed (e.g. for Wimp tasks, trap any memory reads/writes which occur outside of the task’s Wimp slot), but with this approach it’ll be much cleaner as it requires applications to flag that they’re thread safe before they’re let loose on the other cores.
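A hypothetical sketch of that flag in C – none of these names exist in RISC OS today; they only illustrate the contract described above:

```c
/* Invented sketch of a "compatibility mode" thread flag: a compat
 * thread that calls a non-thread-safe SWI is parked until the main
 * thread (and, for Wimp threads, the right task) can run it. */

#define THREAD_COMPAT (1u << 0)          /* invented flag bit           */

struct thread {
    unsigned flags;
    /* ... saved registers, stack, owning Wimp task, etc. ... */
};

extern int  swi_is_thread_safe(int swi_no);
extern void park_until_main_thread_ready(struct thread *t);
extern void run_swi(int swi_no, struct thread *t);
extern void raise_error(struct thread *t, const char *msg);

/* Conceptually inside the SWI dispatcher: */
void dispatch_swi(int swi_no, struct thread *current)
{
    if (!swi_is_thread_safe(swi_no)) {
        if (current->flags & THREAD_COMPAT) {
            /* Suspend until the single-tasking world can take this SWI;
             * mirrors Wimp2/UnixLib-style behaviour. */
            park_until_main_thread_ready(current);
        } else {
            raise_error(current, "SWI not thread-safe");
            return;
        }
    }
    run_swi(swi_no, current);
}
```

|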
Eric Rucker (325) 232 posts |
Hmm, that could be an interesting way to do it. I suspect it would result in end users and application developers seeing the main gains later, but on the flip side, it could all be done behind the scenes without end users even NOTICING, if done correctly, and stuff under the hood could be improved gradually. That said, I wasn’t thinking of doing hacky detection of thread-unsafe activity – I was thinking of, if a program wanted PMT/SMP, it had to explicitly be marked as such, and then the PMT/SMP environment would have its own SWIs (and would handle thread safety on its own). Where things would get ugly with that approach would be, if a CMT program had control, the other cores would sit idle, whereas in your approach, they can keep running until they need to hit an unsafe SWI. Also, I forgot that Cortex-A15 has virtualization support as well, so that would be another acceptable option if an option as radical as “RISC OS X” is taken. (As far as AArch64 goes… that’s really a subject for another thread, and with the memory usage of RISC OS software, not the highest priority right now, IMO (and LPAE, which even the A15 supports, helps, too), but it is something to consider.) |
Jess Hampshire (158) 865 posts |
Wouldn’t that be logical to do first, so that apps get produced that will make use of the new system? (Also would such a new system be more compatible with doing a WINE type thing on Linux than the current system?) |
Malcolm Hussain-Gambles (1596) 811 posts |
Rick: I’d agree it’s a very dangerous move, but if people want PMT then I’d see that as the only option if it is to happen. PMT for me is a distraction; it would be “cool” – but until there is more interest I don’t see the point. So we can port more non-RISC OS applications? That’s digging a hole, isn’t it… Outlook or Evolution on RISC OS? Isn’t it easier to install Windows or Linux? |
Eric Rucker (325) 232 posts |
PMT is useful for more than just porting software. It’s useful for improving the stability and responsiveness of the system, too, because a program can’t “hold onto” the CPU against the system’s will. And, one of the main obstacles to that is the lack of thread safety. That said, it’s not strictly required for a multiprocessor CMT system, but a degree of thread safety is (and the Simtec Hydra was a “first come, first served” CMT system that had a very limited API to be thread safe). Come to think of it, with a fully thread safe system, you could actually assign CMT processes to whatever processor comes available next in line, and get HUGE benefits, but PMT would get you even more granular control of the system’s performance, and once you’ve got the thread safety… (FWIW, any less than a fully thread safe system that dispatches processes to other cores will require developers to specifically target additional cores, though.) I’ll note that Apple went for “PMT in a CMT environment” at first (Multiprocessing Services 1.x, as far as I can tell, which is Mac System 7.5.2 through Mac OS 8.5), and then “CMT in a PMT environment” in later releases of the classic Mac OS (8.6 through 9.2.2). Also, Apple allowed their threads to run on the main CPU, too. |
Jess Hampshire (158) 865 posts |
Jabber sounds very useful, that would fill a hole in what RISC OS can do. |
Eric Rucker (325) 232 posts |
Actually, I was just thinking… the microkernel approach (which is basically the Mac OS 8.6-9.2.2 approach) allows some other fun stuff. Like making a version of the microkernel that supports AArch64, running the AArch32 CMT process, and any AArch32 or AArch64 threads that have spun off. ARMv8-A supports running AArch32 (fully ARMv7-A compatible (read: fully compatible with the Cortex-A8/A9/A15), they say) code in the userland of an AArch64 OS, after all. Alternately, you could virtualize RISC OS, and thunk in and out of the VM. (This approach also works on Cortex-A15, but not on anything older than that.) Move more and more of the OS out of the VM, until the VM is just a legacy compatibility tool that happens to integrate really really well. |
nemo (145) 2546 posts |
I’m slightly worried that the focus here is “write another multicore OS!” rather than “allow RISC OS to make use of multiple cores”. Putting aside the replumbing of separating Task (ie process and/or thread) management from the Wimp, one must still acknowledge that the Wimp as we know it is single threaded. By that I mean most of the protocols are serial, and consequently all existing applications have been written expecting them to be serial. Attempt to pre-empt those protocols and applications will crash, leak memory, corrupt data and potentially destroy files. (I don’t just mean the mythical C compiler taking advantage of ‘undefined behaviour’.) It would be lovely to be able to interact with this drawing program while that spreadsheet is recalculating… but to do so is to risk all the above misbehaviour. So either you accept that the Wimp is single threaded, or you accept you’ll only be running new programs. I think the latter is pointless – use some other OS in that case.

What problem are we actually trying to solve? The oft-mentioned internet protocols have already been implemented using callbacks – abstract that and you’ve probably done half the job (ie the use of callbacks on a single-threaded machine is an implementation detail, it should not be the API – see the sketch at the end of this post). The Wimp though will require very elaborate heuristics to give (or appear to give) the benefits of multiple cores – much of the time it could run multi-threaded, but most Wimp protocols would have to force the Wimp (and all its Task threads) to synchronise to the serial, single-threaded behaviour that is mandated by the APIs.
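A minimal sketch in C of that abstraction, assuming invented net_read_start/net_read_finish calls – the point being that client code never sees whether callbacks or threads do the waiting:

```c
/* Sketch of hiding the completion mechanism behind a neutral API.
 * These names are invented: on today's single-threaded RISC OS,
 * net_read_finish() can pump callbacks until the operation completes;
 * on a threaded build it can genuinely block. Client code is identical
 * either way - the callbacks stay an implementation detail. */

typedef struct netop netop;                     /* opaque operation   */

extern netop *net_read_start(int sock, void *buf, int len);
extern int    net_read_finish(netop *op);       /* bytes read, or -1  */

int read_block(int sock, void *buf, int len)
{
    netop *op = net_read_start(sock, buf, len); /* kick the read off  */
    /* ... could do other work here ... */
    return net_read_finish(op);                 /* complete, however
                                                   the OS waits       */
}
```

|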
Eric Rucker (325) 232 posts |
And I don’t think anyone is saying to make existing software pre-emptive. Make new software pre-emptive (but with a cooperative stub), only able to call safe calls directly. Unsafe calls can be implemented either as a new safe call plus a call that simulates the old unsafe call, or by running the old unsafe call as the parent cooperative task, which is inherently safe – IIRC that’s the Mac OS 8.6 way (see the sketch below). Old software stays cooperative, but the whole “blob of old software” can be pre-empted. Because the new software can only call safe calls without thunking into the cooperative blob somehow, it won’t cause problems with the cooperative software.
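A rough sketch in C of that thunking mechanism, under the assumption of an invented queue API – this is an illustration of the idea, not the Mac OS 8.6 implementation:

```c
/* Sketch of "thunking into the cooperative blob": a pre-emptive thread
 * posts an unsafe SWI request to a queue which the co-operative main
 * task drains at its next poll. All names are invented; the queue is
 * deliberately simplistic (single producer assumed). */

typedef struct {
    int          swi_no;
    int          regs[10];        /* R0-R9 in, results out            */
    volatile int done;
    int          result;
} deferred_swi;

#define QLEN 16
static deferred_swi *queue[QLEN];
static volatile int  qhead, qtail;

extern void thread_yield(void);                 /* invented           */
extern int  do_swi(int swi_no, int *regs);      /* invented           */

/* Called from a pre-emptive thread: enqueue the request and wait. */
int call_unsafe_swi(deferred_swi *req)
{
    req->done = 0;
    queue[qtail % QLEN] = req;
    qtail++;
    while (!req->done)
        thread_yield();           /* give up this core's timeslice    */
    return req->result;
}

/* Called by the co-operative main task around each Wimp_Poll: the SWI
 * now runs in the CMT world, which is inherently "safe". */
void service_deferred_swis(void)
{
    while (qhead != qtail) {
        deferred_swi *req = queue[qhead % QLEN];
        qhead++;
        req->result = do_swi(req->swi_no, req->regs);
        req->done   = 1;
    }
}
```

|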
Jeffrey Lee (213) 6048 posts |
Well, feel free to suggest your own ideas on how to do things. As you say, attempting to allow programs which use existing APIs to run concurrently will only result in failure, so what alternative is there other than creating new, thread-safe APIs?
The problem is that we’re limited to using only 50%, 25%, or even less of the power of modern ARM CPUs. |
Eric Rucker (325) 232 posts |
There is always the Hydra approach if you want the fastest way to multicore support, but it doesn’t solve many of the other current problems with RISC OS. Of course, the Hydra approach had a very limited thread-safe API for its own threads, and anything that needed to directly touch the WIMP or any other thread-unsafe APIs needed to be in the main task running on the main CPU. |
Malcolm Hussain-Gambles (1596) 811 posts |
A “modern” PC desktop has around 4 cores; under normal utilisation only one core is realistically ever needed, though two cores can improve performance in some cases. So another question should be: “What do we want to use the other core(s) for?” and “How can they be utilised in reality?” I would hope that RISC OS isn’t going to attempt to be a server-grade OS or target serious number crunching. One random suggestion could be one core for the WIMP/OS and another core for the “userland” programs, and split the memory across the two? Just bouncing ideas around… |
Rick Murray (539) 13840 posts |
I’ve been thinking about this. I think that a SWI call “Wimp_WillBlock” should be added, to alert the Wimp when a task will knowingly block the system (a sketch follows below). A prime candidate here is the !Printers stuff. If the task does not signal its intent to block, the Wimp should be at liberty to force-preempt it if it has not polled within 2 seconds (and if returning control to it has the same effect, to poll it less and less frequently). Likewise, related to the above, the idea of suspending a task stalled with an errorbox on-screen: if an errorbox has been visible for more than some set time, suspend the task behind it. Just an idea…
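A sketch of how such a call might be used, with an invented SWI number – no such SWI exists in the Wimp today:

```c
/* Sketch of the hypothetical Wimp_WillBlock call - the SWI name and
 * number are invented for illustration. */
#include "swis.h"                   /* _swix(), RISC OS C toolchain   */

#define Wimp_WillBlock 0x400FF      /* invented SWI number            */

extern void do_long_print_job(void);

void print_document(void)
{
    /* Warn the Wimp we will knowingly block (e.g. driving a printer),
     * so it doesn't force-preempt or throttle us for not polling.    */
    _swix(Wimp_WillBlock, _IN(0), 1);   /* 1 = about to block         */

    do_long_print_job();

    _swix(Wimp_WillBlock, _IN(0), 0);   /* 0 = normal polling resumes */
}
```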
You can if the spreadsheet programmer thought of this in advance. I wrote a program (since ‘lost’, I ought to rewrite it) that scanned directories building a ‘map’ of JPEGs with sizes, image dimensions, etc.
Well, a single-threaded system on a modern ARM is akin to running a Xeon on pure 8086 code. It ought to work, given the eccentric behaviour of the IA-32 (x86) family, but it is hardly going to get the most out of the thing. The OMAP4 contains a dual-core Cortex-A9; and in the case of the 4470 there are also two Cortex-M3 cores in addition to the A9s. The OMAP5 has two A15s plus two M4s. Wouldn’t it be nice to be able to make use of this added processing power?

The question is: do we do it in a way that requires a special new specific API (and is going to be the quickest and easiest to implement) or do we look to making RISC OS multiprocessor capable (and risk breaking everything along the way)? My vote would be for the easier option, primarily because there’s a heap of stuff still to do (we can’t yet play an SD-quality XviD under RISC OS, for instance – no hardware acceleration for video decode) and I don’t imagine the small number of developers are going to be willing to take on a “let’s rewrite the entire OS”, especially if people stick with the one that they know because the new fancy multiprocessor one has no software!

Malcolm might be onto something, if the other cores are handled by RISC OS almost as if they were co-processors; in this way a mini-RISCOS (set of stubs to talk to the real OS; and yes, it may need to wait) could run on the other cores and multicore-aware programs could then load up code/data onto this ‘co-processor’ (see the sketch below). We’ve a long history of rather impressive results through something that sounds an obvious bottleneck – consider the Tube interface allowing a 3MHz 6502 to speed up a 2MHz 6502 system. Consider any RiscPC with a StrongARM fitted. Now? How about dual-wielded Cortex-Asomethings?

For what it is worth, an update I had to SMPlayer (with MPlayer back-end) elected for 2 threads for H.264 decode. It all went very wrong until I switched back to a single thread. I’m not sure what stuff is running on what core (can XP even report this sort of thing?); it might be interesting to look at multicore utilisation and how much – in the real world – the other cores are used, and why.
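A sketch, in C, of what the ‘co-processor’ hand-off might look like – the mailbox call and packet layout are pure invention:

```c
/* Sketch of the "core as co-processor" idea: a multicore-aware program
 * hands a self-contained work package to a mini-RISCOS stub on another
 * core, Tube-style. coproc_submit() and the packet layout are invented
 * for illustration. */

typedef struct {
    void (*entry)(void *data);   /* code for the other core to run    */
    void        *data;           /* its private working data          */
    volatile int finished;       /* set by the remote core when done  */
} work_packet;

extern int  coproc_submit(int core, work_packet *w);  /* invented     */
extern void recalculate(void *sheet);  /* must touch only its own data:
                                          no SWIs on the far core     */

void recalc_in_background(void *sheet)
{
    static work_packet w;
    w.entry    = recalculate;
    w.data     = sheet;
    w.finished = 0;
    coproc_submit(1, &w);        /* queue the job on core 1           */
    /* ...carry on in the desktop; check w.finished from Null polls...*/
}
```

|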
Jess Hampshire (158) 865 posts |
I still think the Wimp2 approach is best. A new API which calls a PMT system that runs as one CMT program as far as the system is concerned. In future systems that arrangement could be reversed. |
Eric Rucker (325) 232 posts |
Keep in mind, the “fancy new multiprocessor one” won’t be a “rewrite the entire OS” scale project (although it will be extensive), and it won’t break compatibility with any software if done right; it’ll just add capabilities for new software. I want everyone in this thread to look at Mac OS 8.6. Toss a copy of it in SheepShaver if need be, along with a copy of 8.5. Grab a selection of popular Mac OS programs that predate Mac OS 8.6. Run them. |
Malcolm Hussain-Gambles (1596) 811 posts |
I keep on seeing the word “thread”. Whilst I would love a multi-core, multi-threaded RISC OS, the RISC OS Open Team (Steve, Rick etc.) seem like nice people and I would prefer them not to throw themselves off a cliff ;-) If we can get effort to do this, eventually… what about getting updates for the filesystem, USB stack and the basics done? I think my point is, programming at this level isn’t difficult per se. You do need to be a skilled programmer, but that’s only 10% of the requirements. Could I stick at it for long enough? Probably not. |
Jeffrey Lee (213) 6048 posts |
Remember that the Cortex-M CPUs only support the Thumb instruction set, so there’s no chance any existing binaries will run on them. To get RISC OS to run on them, there’s a hell of a lot of assembler code in the ROM that would need rewriting, not to mention all the user apps – supporting Cortex-M may end up being more of a hassle than adding basic multicore support!
Remember that the title of this thread is “Thinking ahead”. We all know that there are a great many things which are likely to be a higher priority (and much easier to implement) than creating a multi-threaded/multi-core RISC OS will be. And several of the tasks that need doing may well end up as stepping stones towards making the OS multi-thread safe (memory protection, tighter process management, a common threading library the network/USB/FS stacks can use, etc.)
Yes, this is effectively the “RISC OS on microkernel” approach. |
Eric Rucker (325) 232 posts |
It sounds like Rick is actually talking about the Hydra approach, which is specially designed threads running on alternate processors, with existing RISC OS programs running on the main processor as is. |