Thinking ahead: Supporting multicore CPUs

636 posts, 79 voices

Pages: 1 2 3 4 5 6 7 8 9 10 11 12 ... 26

May 24, 2011 3:21pm nemo (145) 2546 posts	Sorry to resurrect such an old thread (or sorry I didn’t stop by earlier). BA wrote: I do feel that process management is too tightly tied to the GUI and too loosely tied to threading libraries and pre-empters like the TaskWindow module at the moment. Spot on. The WIMP should be a GUI (and a veneer for the underlying Task stuff for compatability) – I think “Task” is more RO than “Process”. Threads are also absolutely required, and before multi-core considerations. “Tasks” need to be supported at the Kernel level, far below the WIMP… by which I actually mean Tasks/Threads of course. Here’s another vote for using a per-thread memory allocation system as a way to patch up SWIs like Wimp_OpenTemplate W_OT was always botched but could have been safe from the start – it should have been part of each Task’s information, not shared between them. As such it’s a small bit of WIMP plumbing, not something that needs threading to fix – if the WIMP knows which Task is calling it (as it always must) then there’s no problem. Wimp_CloseTemplate would be implicit during an (implicit) OS_Exit so the WIMP doesn’t leak. Re-entrancy of allocation is a different problem of course, but not the WIMP’s problem. try running a couple of taskwindows concurrently which do a lot of GSTrans processing or directory enumeration for example That’s not a TaskWindow problem – try doing the same under Null Polls in any Task and you’ll have the same problem. The Wimp is still a total surprise to the Kernel, which still thinks it’s Arthur. Writing a good scheduler appears to be rather a black art And only a question of optimisation, for which as we well know there are only two rules: Don’t do it. (Experts only) Don’t do it yet. Making it run fast is a tiny detail after the rather harder stage of making it run at all! …because we set a maximum possible size of each dynamic area as soon as we set the base address of the next dynamic area along Instead of blindly allocating masses of address space to titchy DAs, assume for old programs an old limit (64MB, or maybe 256 if we’re super generous). New programs that want more set a flag when they create the DA. The DA doesn’t get more address space at first, instead the flag marks the DA as moveable. If it can’t extend in place, it gets moved to a different logical address, and the DA Handler called (with new reason codes) before and after the move. OS_ChangeDynamicArea returns new base address in R2 for such DAs. The requirement is Big DA = Relocateable DA, but hey, what can you do. JL wrote: We’d either have to use ARMv7-specific features directly (stopping software from running on older machines), or wrap the ARMv7 features in SWIs (a good idea, but could hurt performance for speed-critical software), or start storing the data in memory and use more page tables/RAM as a consequence. The old SWI vs DLL question! I see no problem with SWI_Func1, SWI_Func2, SWI_GetFuncTable so you can have it either way (note slight wrinkle of making the latter safe in an OS where modules can be replaced at run time – icky but not impossible). As for the programmer model, I’m currently preparing to write a pthread compatibility layer on top of threadx (or considering whether I should, or just throw them both in the sack and look the other way), so by the end of that I’ll have a better idea of how I feel about the API… but we shouldn’t be using Wimp_StartTask or *TaskWindow to be spawning new threads. In other words, one should be able to have a number of multi-threaded tasks running on RISC OS without involving the Wimp at all! There is a much bigger question (in effect) than making the Kernel work – how do we expect the Wimp to work. ie what does a pre-empted Wimp look like? So many of the protocols assume known yield points (as Neil Douglass proved with Wimp2). Eg it is possible to pause the desktop save protocol… though only (for completeness) if you are able to fake a message bounce back to the originator. PostFilters FTW here. But the killer is the simple DataRun broadcast. I don’t see how that can work well with pre-empted legacy apps (as I’ve probably discussed before). Bounteous apologies if anything I’ve written is already obsolete, I only seem to pass by every year or so. :-O

May 25, 2011 6:36am André Timmermans (100) 655 posts	In other words, one should be able to have a number of multi-threaded tasks running on RISC OS without involving the Wimp at all! There is a much bigger question (in effect) than making the Kernel work – how do we expect the Wimp to work. ie what does a pre-empted Wimp look like? So many of the protocols assume known yield points (as Neil Douglass proved with Wimp2). Eg it is possible to pause the desktop save protocol… though only (for completeness) if you are able to fake a message bounce back to the originator. PostFilters FTW here. But the killer is the simple DataRun broadcast. I don’t see how that can work well with pre-empted legacy apps (as I’ve probably discussed before). Well, I wouldn’t mind the main Wimp task/thread working as before, most of the existing applications are well designed enough to cope with it, as long as you can allow background work: 1) to be pre-empted, i.e. allow the system to slice heavy CPU work 2) to be put to sleep till an IO (disc op/network) is completed so that other work may take place in the meantime. It such a pain for example to have a callback to be unable to work correctly while performing a file reading operation on an FS which relies on callbacks to complete its operations.

May 27, 2011 3:49pm nemo (145) 2546 posts	Problems with desktop save and DataRun protocols in a pre-empted wimp: Co-operative: Saver sends DataSave then yields (Wimp_Poll), Loader receives DataSave and replies with DataSaveAck then yields, Saver saves file and sends DataLoad, and yields, Loader loads file and sends DataLoadAck, and yields, Saver gets DataLoadAck, all is well. Pre-empted: Saver sends DataSave then yields, Loader receives DataSave and replies with DataSaveAck then yields, Saver gets DataSaveAck and saves file, and sleeps… Loader gets DataSaveAck bounced back because no one has replied. :-/ Step 3 above doesn’t happen under Wimp2 because saves were still atomic. The only reason the desktop save protocol worked under Wimp2 was coincidence – tasks tend to respond to messages before getting pre-empted. So clearly the Wimp must hide messages between being delivered to a task and that task next yielding – otherwise it would bounce or get delivered to the next task in “parallel”. Care must therefore be taken with tasks that die while “hosting” a message. Another problem with the data save protocol concerns the possibility of initiating another save while the first is still occurring – ie before the Loader has sent DataLoadAck. And as for DataSaved… well that’s always been dodgy and hardly ever implemented correctly. However, DataRun is the biggest problem: What happens when you double-click a spreadsheet file while the spreadsheet program is recalculating? Broadcast messages are delivered to each task in turn, so what happens when one is at sleep or busy rather than yield? The task cannot be re-entered with the message (Wimp2 dealt with this by having, in effect, a separate message handling thread for the task). Does the DataRun pause at a task until it yields? The result of that is that double-clicking on a file might appear to have no effect at all. This usually results in the user double-clicking again. Are such messages amalgamated (as some already are)? Does it skip that task and go on to the next? The result of that is that sometimes double-clicking on a file will open another instance of the application, or be opened by a totally different application. What if it’s one of those applications that refuses to run more than one instance at a time? Does it try all the yielded tasks first and then wait for the busy and sleeping ones before bouncing back to the Filer? Once again we have unpredictable destinations and interminable delays. The trouble is that the Wimp doesn’t know which tasks load what kinds of file, so any DataRun must be passed to all the tasks, every time. I have investigated this not only through using Niall’s Wimp2, but also by simulating pre-emption through pausing the desktop save protocol to implement “slow” (browser-like) loading and saving of files between desktop programs. As long as the user doesn’t try anything clever, you’re OK… but as soon as the user tries multiple simultaneous saves (for example) most applications get confused.

Oct 27, 2012 1:20pm Eric Rucker (325) 232 posts	So, here’s a question. Long-term, what would be the downside of a “Wimp3”-like approach (think along the lines of adding Hydra support to Wimp2, except not actually doing that)? I know, it’s not the most elegant, having multitasking being done at the WIMP level, but here’s the advantages I can see: Likely faster development – the existing stuff below wouldn’t need to be made safe, as all PMT programs would be treated as one giant CMT program that also happens to use other cores while it’s running (but not while other CMT programs are running). The mechanism that PMT programs use to communicate with the OS would need to be made safe, but all of the “making safe” could be done at the Wimp3 level, and if done right, a future build could do it truly right and just use the Wimp3 APIs. Full backwards compatibility with no effort – as far as the other software’s concerned, the PMT programs are just another CMT program, meaning that the other software doesn’t have to yield to it Could be installed on top of ROL versions of RISC OS – means that developers targeting this system will have a larger target base (meaning there’ll be more adoption of the PMT/SMP support early on) I don’t think it’s a good idea to preempt existing applications (look at what happened with Wimp2, after all), or run them on any cores but the first, so this approach would only benefit new software, and CMT software could still bring down the system. Also, this approach does mean that, the more CMT stuff you have running, the more atrocious performance is (as the other cores sit idle while CMT stuff runs). But, still, there would be a benefit now, and if the APIs are set up properly, that benefit could be carried into more “pure” versions of the approach later. Basically, looking over everything, I’m advocating approach 5, using a hybrid of approach 1 and 2 as the way to get there, with approach 1 as a potential ultimate goal (with the caveat that fully doing approach 1 will be far, far more effort to preserve legacy compatibility than stopping at completion of approach 5, but will improve stability). I don’t think a pure approach 1 is workable, due to the whole “it’s not RISC OS any more” problem if you can’t run any existing RISC OS software, and the amount of effort required (with only minimal support from the community, I can’t see it succeeding). Approach 5 is the way that gets benefits to the platform the fastest (ignoring a pure approach 2, which doesn’t solve any of RISC OS’s other problems, unlike approach 5), and more importantly, gets software written for it the fastest if done right. It does leave some things unfixed, but it allows for fixing them later, if done right. I’d gladly throw some cash at a bounty to get a decent approach 5 implemented, FWIW.

Oct 27, 2012 2:42pm Malcolm Hussain-Gambles (1596) 811 posts	My thoughts, from an external point of view – so I could be totally wrong….just my thoughts. Mixing CMT and PMT seems a bad idea to me. I would prefer to see a clean cut off. Move to say RISC OS X (lol), and once it is actually in a working state, maybe provide something like amulator for legacy apps. Providing a mix of CMT and PMT in the OS seems a bad idea, and given the current resources avaliable it seems more sensible to focus on a pure PMT environment and getting it actually working. Otherwise don’t we run the risk of not having anything anytime? As far as commercial apps goes, if there is enough demand they will be rewritten for the “New OS”. If there isn’t what’s the point in going PMT in the first place. I would hope that a push for PMT would increase interest enough for RISC OS to be more widespread.

Oct 27, 2012 3:33pm Rick Murray (539) 13840 posts	I would prefer to see a clean cut off. Move to say RISC OS X (lol), Dangerous move without some degree of guaranteed developer support. The first question I’d ask is does RISC OS need PMT support? In other words, does the development (which would be rewriting the Wimp, big swathes of kernel; not to mention creating a new API that would be by definition be incompatible with current RISC OS) be justified by the end result?

Oct 27, 2012 3:33pm Eric Rucker (325) 232 posts	The problem that I see is that some of the apps in question are abandoned, and some aren’t commercial, either. And, adoption of a PMT-only fork with no existing software would, I fear, look like what happened with Vista, just worse. So, you’d need to virtualize (or emulate, but that sucks) for the old software, which means you’re looking at running on something like the X-Gene to get virtualization support. And if you’re running on THAT arch, might as well make it AArch64 while you’re at it… (Not that getting ready for AArch64 is a bad thing, mind you, but doing all of this at once is an absolutely massive effort. That level would actually be the ideal, but this community doesn’t have the resources of Apple or Microsoft, or even the resources of a BSD.) The mixed approach means that you get some of the benefits now, without having to reimplement everything (although you will have to reimplement some of the OS no matter what).

Oct 27, 2012 3:40pm Jeffrey Lee (213) 6048 posts	Long-term, what would be the downside of a “Wimp3”-like approach? The major downside I can think of is that by keeping the threading code in the Wimp, it won’t do anything to make things easier for complex OS components which could benefit greatly from threading (whether from the point of view of performance or ease of implementation). Network stack, USB stack, filesystem stack, etc. So I think a better approach would be something like: Get a microkernel running which can provide threading, interrupt & memory management facilities to all cores Get RISC OS running on that microkernel (using one thread/core), complete with multicore/thread safe SWI dispatcher. Modules will be flagged for whether they’re thread safe or not, any threaded SWI call to a non-threadsafe module will generate an error. For OS SWIs, the flags would be on a per-SWI basis, so we only have to worry about making the bare minimum thread safe. Expose threading/microkernel functionality via some SWIs, so the network stack, USB stack, filesystem stack, etc. can run in a fully threaded manner. Only requirement is that they’ll have to manually handle interaction with the main RISC OS thread – e.g. pushing events onto a queue which RISC OS can then pull from at a safe time. This event queue functionality should probably be built into the OS itself. Since the microkernel functionality is exposed via SWIs, “Wimp3” tasks can make use of them to spawn their own threads. On multi-core systems, these threads will most likely be allowed to continue to run even while the task isn’t active on the main thread. (Potentially) write a simpler threading layer which will allow the threaded Wimp tasks to run on other OS versions, but with similar restrictions to Wimp2, Unixlib threads, etc. (i.e. threads will only run while task is active) I think this plan is closest to option 1 in its eventual goal, but with a reachable initial goal of only providing the minimum threading functionality necessary for useful threaded code to be written. As time goes by we can make incremental updates to improve performance (make more modules thread safe, tackle threading-incompatible APIs, etc.) The only problem is that Wimp tasks tend to use a lot more SWIs than hardware drivers and OS-level components, so it would be a big problem if Wimp threads were only allowed to use the small number of thread safe SWIs. To tackle that I think we’d need to have a “compatbility mode” flag for threads. If this flag is set, and the thread calls a non-thread safe SWI, then the thread will be suspended until the main thread reaches a state where the other thread can take control. For threads associated with Wimp tasks there’s also the extra requirement that the correct task must be active (therefore mimicing the behaviour of Wimp2, Unixlib threads, and the compatability layer we’d have for older OS’s). This compatability flag may also place some restrictions on the behaviour of the thread – e.g. because the thread needs to be capable of running in the single-tasking RISC OS world, it won’t be allowed to use any per-thread dynamic areas (if we were to implement such things). Although this may sound like it’s similar to option 5, it’s actually quite different; with option 5 I was envisaging some hacky system of detecting when a thread-unsafe activity is being performed (e.g. for wimp tasks, trap any memory reads/writes which occur outside of the tasks wimp slot), but with this approach it’ll be much cleaner as it requires applications to flag that they’re thread safe before they’re let loose on the other cores.

Oct 27, 2012 4:20pm Eric Rucker (325) 232 posts	Hmm, that could be an interesting way to do it. I suspect it would result in end users and application developers seeing the main gains later, but on the flip side, it could all be done behind the scenes without end users even NOTICING, if done correctly, and stuff under the hood could be improved gradually. That said, I wasn’t thinking of doing hacky detection of thread-unsafe activity – I was thinking of, if a program wanted PMT/SMP, it had to explicitly be marked as such, and then the PMT/SMP environment would have its own SWIs (and would handle thread safety on its own). Where things would get ugly with that approach would be, if a CMT program had control, the other cores would sit idle, whereas in your approach, they can keep running until they need to hit an unsafe SWI. Also, I forgot that Cortex-A15 has virtualization support as well, so that would be another acceptable option if an option as radical as “RISC OS X” is taken. (As far as AArch64 goes… that’s really a subject for another thread, and with the memory usage of RISC OS software, not the highest priority right now, IMO (and LPAE, which even the A15 supports, helps, too), but it is something to consider.)

Oct 27, 2012 5:07pm Jess Hampshire (158) 865 posts	5. (Potentially) write a simpler threading layer which will allow the threaded Wimp tasks to run on other OS versions, but with similar restrictions to Wimp2, Unixlib threads, etc. (i.e. threads will only run while task is active) Wouldn’t that be logical to do first, so that apps get produced that will make use of the new system? (Also would such a new system be more compatible with doing a WINE type thing on Linux than the current system?)

Oct 27, 2012 7:18pm Malcolm Hussain-Gambles (1596) 811 posts	Rick: I’d agree it’s a very dangerous move, but if people want PMT then I’d see that as the only option if it is to happen. Personally I’d go for better abstraction libraries to be written first, similar to toolbox but for non-wimp stuff (telnet/http/jabber/ssl/xml/imap/pop/smtp libraries are what I’m attempting to write, albeit slowly and very hard going!). And importantly documentation and clear examples, not marked as legacy ;-) PMT for me is a distraction, it would be “cool” – but until there is more interest I don’t see the point. So we can port more non-riscos applications? That’s digging a hole isn’t it…. Outlook or evolution on riscos? Isn’t it easier to install windows or linux?

Oct 27, 2012 8:26pm Eric Rucker (325) 232 posts	PMT is useful for more than just porting software. It’s useful for improving the stability and responsiveness of the system, too, because a program can’t “hold onto” the CPU against the system’s will. And, one of the main obstacles to that is the lack of thread safety. That said, it’s not strictly required for a multiprocessor CMT system, but a degree of thread safety is (and the Simtec Hydra was a “first come, first served” CMT system that had a very limited API to be thread safe). Come to think of it, with a fully thread safe system, you could actually assign CMT processes to whatever processor comes available next in line, and get HUGE benefits, but PMT would get you even more granular control of the system’s performance, and once you’ve got the thread safety… (FWIW, any less than a fully thread safe system that dispatches processes to other cores will require developers to specifically target additional cores, though.) I’ll note that Apple went for “PMT in a CMT environment” at first (Multiprocessing Services 1.x, as far as I can tell, which is Mac System 7.5.2 through Mac OS 8.5), and then “CMT in a PMT environment” in later releases of the classic Mac OS (8.6 through 9.2.2). Also, Apple allowed their threads to run on the main CPU, too.

Oct 29, 2012 10:25am Jess Hampshire (158) 865 posts	(telnet/http/jabber/ssl/xml/imap/pop/smtp libraries are what I’m attempting to write, albeit slowly and very hard going!). Jabber sounds very useful, that would fill a hole in what RISC OS can do.

Oct 29, 2012 10:41am Eric Rucker (325) 232 posts	Actually, I was just thinking… the microkernel approach (which is basically the Mac OS 8.6-9.2.2 approach) allows some other fun stuff. Like making a version of the microkernel that supports AArch64, running the AArch32 CMT process, and any AArch32 or AArch64 threads that have spun off. ARMv8-A supports running AArch32 (fully ARMv7-A compatible (read: fully compatible with the Cortex-A8/A9/A15), they say) code in the userland of an AArch64 OS, after all. Alternately, you could virtualize RISC OS, and thunk in and out of the VM. (This approach also works on Cortex-A15, but not on anything older than that.) Move more and more of the OS out of the VM, until the VM is just a legacy compatibility tool that happens to integrate really really well.

Oct 29, 2012 1:15pm nemo (145) 2546 posts	I’m slightly worried that the focus is here is “write another multicore OS!” rather than “allow RISC OS to make use of multiple cores”. Putting aside the replumbing of separating Task (ie process and/or thread) management from the Wimp, one must still acknowledge that the Wimp as we know it is single threaded. By that I mean most of the protocols are serial, and consequently all existing applications have been written expecting them to be serial. Attempt to pre-empt those protocols and applications will crash, leak memory, corrupt data and potentially destroy files. (I don’t just mean the mythical C compiler taking advantage of ‘undefined behaviour’). It would be lovely to be able to interact with this drawing program while that spreadsheet is recalculating… but to do so is to risk all the above misbehaviour. So either you accept that the Wimp is single threaded, or you accept you’ll only be running new programs. I think the latter is pointless – use some other OS in that case. What problem are we actually trying to solve? The oft-mentioned internet protocols have already been implemented using callbacks – abstract that and you’ve probably done half the job (ie the use of callbacks on a single-threaded machine is an implementation detail, it should not be the API). The Wimp though will require very elaborate heuristics to give (or appear to give) the benefits of multiple cores – much of the time it could run multi-threaded, but most Wimp protocols would have to force the Wimp (and all its Task threads) to synchronise to the serial, single-threaded behaviour that is mandated by the APIs.

Oct 29, 2012 1:41pm Eric Rucker (325) 232 posts	And I don’t think anyone is saying to make existing software pre-emptive. Make new software preemptive (but with a cooperative stub), only able to call safe calls directly. (Unsafe calls can either be implemented as: a new safe call, and a call to simulate the old unsafe call; or, the old unsafe call, running the call as the parent cooperative task, which will inherently be safe, which is IIRC the Mac OS 8.6 way.) Old software stays cooperative, but the whole “blob of old software” can be pre-empted. Because the new software can only call safe calls without thunking into the cooperative blob somehow, it won’t cause problems with the cooperative software.

Oct 29, 2012 1:42pm Jeffrey Lee (213) 6048 posts	I’m slightly worried that the focus is here is “write another multicore OS!” rather than “allow RISC OS to make use of multiple cores”. Well, feel free to suggest your own ideas on how to do things. As you say, attempting to allow programs which use existing APIs to run concurrently will only result in failure, so what alternative is there other than creating new, thread-safe APIs? What problem are we actually trying to solve? The problem that we’re limited to only using 50%, 25%, or even less of the power of modern ARM CPUs.

Oct 29, 2012 1:53pm Eric Rucker (325) 232 posts	There is always the Hydra approach if you want the fastest way to multicore support, but it doesn’t solve many of the other current problems with RISC OS. ] Of course, the Hydra approach had a very limited thread-safe API for its own threads, and anything that needed to directly touch the WIMP or any other thread-unsafe APIs needed to be in the main task running on the main CPU.

Oct 29, 2012 11:19pm Malcolm Hussain-Gambles (1596) 811 posts	A “modern” PC desktop has around 4 cores, under normal utilisation only one core is realistically ever needed, two cores can improve performance in some cases. Further more, by RISC OS being CMT how much of that time on the single processor is actually utilised as blocking requests (e.g. disc/network and other I/O) are not CPU intensive necessarily but simply tie the CPU up. So another question should be: “What do we want to use the other core(s) for?” and “How can they be utilised in reality?” I would hope that RISC OS isn’t going to attempt to be a server grade OS or target serious number crunching. One random suggestion could be one core for the WIMP/OS and another core for the “userland” programs, and split the memory across the two? If the userland CPU finishes first then it just has to wait for the main CPU, if the Userland CPU is waiting for the main CPU then it just has to wait. That idea has plenty of problems and it’s likely not worth while. Possibly similar idea and have one CPU and chunk of memory for PMT and another for CMT? Waste of memory definately…but old apps could run “along side”???? Maybe it’s another bad idea Just bouncing ideas around….

Oct 30, 2012 12:03am Rick Murray (539) 13840 posts	It’s useful for improving the stability and responsiveness of the system, too, because a program can’t “hold onto” the CPU against the system’s will. I’ve been thinking about this. I think that a SWI call “Wimp_WillBlock” should be added, to alert the Wimp when a task will knowingly block the system. A prime candidate here is the !Printers stuff. If the task does not signal its intent to block, the Wimp should be at liberty to force-preempt it if it has not polled within 2 seconds (and if returning control to it has the same effect, to poll it less and less frequently). Likewise, related to the above, the idea of suspending a task stalled with an errorbox on-screen. If an errorbox is visible for more than seconds (I’m thinking 30, some might prefer less) with no user activity, then the Wimp should take and cache a bitmap snapshot of the windows belonging to the errored application, and then remove all of those windows and the errorbox from the screen, repainting the application’s icon in shades of bright red, and maybe red-tinting the backdrop and iconbar so it is really obvious that something happened. Then when the user clicks the red application icon, the windows are re-opened and the error re-displayed. As it is obviously unsafe to call the program to handle OpenWindow or RedrawWindow, this activity is managed entirely by the Wimp itself (and this is where the cached bitmap comes into play). Just an idea… It would be lovely to be able to interact with this drawing program while that spreadsheet is recalculating… You can if the spreadsheet programmer thought of this in advance. I wrote a program (since ‘lost’, I ought to rewrite it) that scanned directories building a ‘map’ of JPEGs with sizes, image dimensions, etc. Do you think the Desktop froze while the scan was taking place? Don’t be silly, there was a mini poll-handler in the scanning routine so activity would continue while my software was busy. It takes a little bit of additional thought, but it isn’t impossible, and it certainly doesn’t need a pre-emptive system to allow you to do x while y is busy, unless you’re using some rubbish software! (^_^) What problem are we actually trying to solve? Well, a single-threaded system on a modern ARM is akin to running a Xeon on pure 8086 code. It ought to work, given the eccentric behaviour of the IA-32 (x86) family, but it is hardly going to get the most out of the thing. The OMAP4 contains a dual-core Cortex-A9; and in the case of the 4470 there’s also two Cortex-M3 cores in addition to the A9s. The OMAP5 has two A15s plus two M4s. Wouldn’t it be nice to be able to make use of this added processing power? The question is: Do we do it in a way that requires a special new specific API (and is going to be the quickest and easiest to implement) or do we look to making RISC OS multiprocessor capable (and risk breaking everything along the way)? My vote would be for the easier option, primarily because there’s a heap of stuff still to do (we can’t yet play an SD quality XviD under RISC OS, for instance, no hardware accell on video decode) and I don’t imagine the small number of developers are going to be willing to take on a “let’s rewrite the entire OS”, especially if people stick with the one that they know because the new fancy multiprocessor one has no software! Malcolm might be onto something, if the other cores are handled by RISC OS almost as if they were co-processors; in this way a mini-RISCOS (set of stubs to talk to the real OS; and yes, it may need to wait) could run on the other cores and multicore-aware programs could then load up code/data onto this ‘co-processor’. We’ve a long history of having rather impressive results though something that sounds an obvious bottleneck – consider the Tube interface allowing a 3MHz 6502 to speed up a 2MHz 6502 system. Consider any RiscPC with a StrongARM fitted. Now? How about dual-wielded Cortex-Asomethings? For what it is worth, an update I had to SMPlayer (with MPlayer back-end) elected for 2 threads for H.264 decode. It all went very wrong until I switched back to a single thread. I’m not sure what stuff is running on what core (can XP even report this sort of thing?), it might be interesting to look at multicore utilisation and how much – in real world – the other cores are used, and why.

Oct 30, 2012 12:13am Jess Hampshire (158) 865 posts	I still think the Wimp2 approach is best. A new API which calls a PMT system that runs as one CMT program as far as the system is concerned. In future systems that arrangement could be reversed.

Oct 30, 2012 1:14am Eric Rucker (325) 232 posts	Keep in mind, the “fancy new multiprocessor one” won’t be a “rewrite the entire OS” scale project (although it will be extensive), and it won’t break compatibility with any software if done right, it’ll just add capabilities for new software. I want everyone in this thread to look at Mac OS 8.6. Toss a copy of it in SheepShaver if need be, along with a copy of 8.5. Grab a selection of popular Mac OS programs that predate Mac OS 8.6. Run them.

Oct 30, 2012 7:29am Malcolm Hussain-Gambles (1596) 811 posts	I keep on seeing the word “thread”. Threads are lightweight processes, that are generally used to maximise throughput. They are hellish to debug, for threading you need a whole host of functionality in the OS. Shared memory, IPC and very careful programming (serialisation of data access! etc.) Whilst I would love a multi-core, multi-threaded RISC OS, the RISC OS Open Team (Steve, Rick etc.) seem like nice people and I would prefer them not to throw themselves off a cliff ;-) If we can get effort to do this, eventually… what about getting updates for the filesystem, USB stack and the basics done? If you are going to attempt a threaded I/O subsystem, you had better know how it works… muhahahahaha ;-) I think my point is, progamming at this level isn’t difficult per say. You do require to be a skilled programmer, but that’s only 10% of the requirements. The main problem is time and persistance, a heck of a lot of it. I could probably write a new OS from the ground up better than anything around now. Could I stick at it for long enough? Probably No. Would I live long enough? Probably Not. Would it be outdated by the time I finished it? Probably

Oct 30, 2012 10:31am Jeffrey Lee (213) 6048 posts	The OMAP4 contains a dual-core Cortex-A9; and in the case of the 4470 there’s also two Cortex-M3 cores in addition to the A9s. The OMAP5 has two A15s plus two M4s. Remember that the Cortex-M CPUs only support the Thumb instruction set, so there’s no chance any existing binaries will run on them. To get RISC OS to run on them, there’s a hell of a lot of assembler code in the ROM that would need rewriting, not to mention all the user apps – supporting Cortex-M may end up being more of a hassle than adding basic multicore support! The question is: Do we do it in a way that requires a special new specific API (and is going to be the quickest and easiest to implement) or do we look to making RISC OS multiprocessor capable (and risk breaking everything along the way)? My vote would be for the easier option, primarily because there’s a heap of stuff still to do Remember that the tile of this thread is “Thinking ahead”. We all know that there’s a great many things which are likely to be a higher priority (and much easier to implement) than creating a multi-threaded/multi-core RISC OS will be. And several of the tasks that need doing may well end up as stepping stones towards making the OS multi-thread safe (memory protection, tighter process management, a common threading library the network/USB/FS stacks can use, etc.) Malcolm might be onto something, if the other cores are handled by RISC OS almost as if they were co-processors; in this way a mini-RISCOS (set of stubs to talk to the real OS; and yes, it may need to wait) could run on the other cores and multicore-aware programs could then load up code/data onto this ‘co-processor’. Yes, this is effectively the “RISC OS on microkernel” approach.

Oct 30, 2012 10:38am Eric Rucker (325) 232 posts	It sounds like Rick is actually talking about the Hydra approach, which is specially designed threads running on alternate processors, with existing RISC OS programs running on the main processor as is.