Thinking ahead: Supporting multicore CPUs
Jeffrey Lee (213) 6048 posts |
Except the microkernel approach is much less of an ugly hack :) From a program’s perspective they’d be pretty much the same (in the beginning, at least), but from an OS developer’s perspective the Hydra approach doesn’t aim to solve any of the fundamental problems that the OS is facing. And (initially, at least) I don’t think the microkernel approach is guaranteed to be significantly more complex than the Hydra approach. After all, with the Hydra approach you’ll still need some kind of microkernel to manage the extra cores – the only difference between the two is that with the Hydra approach the microkernel will be a child of RISC OS, and with the microkernel approach RISC OS will be a child of the microkernel. |
nemo (145) 2552 posts |
I wrote:
And got two silly answers, so it was obviously a silly question. :-) What I meant was “do we really want multiple applications responding to Poll returns simultaneously, or do we want a parallel form of ‘background’ processing?”. Eric suggested
You can’t make Wimp applications pre-emptive in general. Wimp2 sort of proved that (in that it was a fun exercise but not production quality and never could be). I’ve mentioned the Wimp protocols before but it’s really important… exactly how is the Clipboard Protocol going to work if the application holding the clipboard data is busy when you press ^V? How is the Filer supposed to handle a double-click if applications are printing, or recalculating, or connecting to a server? No. In short, the UI – the Wimp and applications using it – must be (or often must act as) a single thread. Malcolm worried:
Yes, it’s not only the prevailing model, it’s also a sensible generalisation – threads are applicable to single-core machines too. They are the programmer’s model – the programmer doesn’t want to have to write different code for multi-core processors. Ported code will be using threads of some kind, probably pthreads. I’ve had a lot of experience not only of writing and debugging threaded code, but also of converting a huge, monolithic and single-threaded application to be multithreaded to take advantage of multiple cores. Multithreading isn’t hard unless you make it so. Rick wrote:
I’m not sure what you meant by “!Printers stuff” as I’ve never been aware of !Printers tying up the machine. Perhaps you meant printing (which doesn’t involve !Printers). It has long been possible to print cooperatively, but it’s fiddly and the granularity is much too coarse – it’s the usual scalability problem: The printer drivers do the same thing with a 24MB sprite that they’d do with a 24KB one, and that’s no help to the application. Incidentally that problem also affected Wimp2 – all the pre-emption in the world didn’t help when you double-clicked on a 24MB sprite and had to wait for one very long OS_File,255 to complete. As an experiment I wrote a sprite editor which loaded and saved large files using GBPB under null polls (updating its window like a browser does). That was much more responsive, but requires cheating to get the desktop save protocol to work (which otherwise wouldn’t). Hence my concern that people might believe that pre-empting applications is trivial… it is, but you’ll break the Wimp. Jeffrey said
Making the OS thread-safe can be trivial – mutex the SWI interface from USR mode. That’s mostly it. There are a few calls that would need safe versions (just as there were a few that needed 32bit versions, and there will be many FS calls that will need 64bit versions), but probably not many. Even that can be avoided if one takes the Hydra approach, but that is a lot less useful, and wouldn’t in any way mitigate the ‘large file loading’ and ‘slow printing’ problems. They are ideal cases for multithreading, but that cannot be done independently of the Wimp because of the protocols – if you double-click a file in the Filer while an application is printing, the only sensible result is that the Filer should show an hourglass until the printing app Polls again. The Wimp’s protocols are co-operative, and cannot be pre-empted without breaking (or changing) them. Like it or not, this affects anything beyond the lowly ‘number crunching coprocessors’ aspiration. |
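As a concrete illustration of nemo’s “mutex the SWI interface from USR mode” idea, here is a minimal sketch in C. The thread_mutex type and its lock/unlock calls are hypothetical primitives from whatever threading layer would have to be added; _kernel_swi() is the existing RISC OS C veneer from kernel.h.

```c
#include <kernel.h>

/* Hypothetical mutex primitive from the (not yet existing) threading layer */
typedef struct thread_mutex thread_mutex_t;
extern void thread_mutex_lock(thread_mutex_t *m);
extern void thread_mutex_unlock(thread_mutex_t *m);

static thread_mutex_t *swi_lock;   /* one global lock for the whole OS */

/* Serialise all USR-mode SWI calls: the non-reentrant OS only ever
   sees one caller at a time, which is most of what "thread-safe"
   needs to mean in the first instance. */
_kernel_oserror *safe_swi(int number, _kernel_swi_regs *in,
                          _kernel_swi_regs *out)
{
    _kernel_oserror *err;

    thread_mutex_lock(swi_lock);
    err = _kernel_swi(number, in, out);
    thread_mutex_unlock(swi_lock);
    return err;
}
```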
Colin (478) 2433 posts |
I don’t have any experience of programming PMT systems, or their terminology for that matter, but from the wimp programmer’s perspective it seems to me that we have 2 problems:

1) Device blocking

So, at the moment, for: To this end the programmer either chops up his program so that it can continue on null events, or devises a simple threading system where the program is paused on a yield instruction and continued on a null event.

Just a thought – and taking an interest. |
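For anyone unfamiliar with the first pattern Colin describes, a minimal sketch in C: the long job is chopped into chunks and one chunk is done per null event, so the poll loop never blocks for long. The job structure and process_one_chunk() are hypothetical application code. (The yield-based variant is essentially what fibres, discussed later in the thread, provide.)

```c
/* Hypothetical application-specific worker: does a small, bounded
   amount of the long job and returns quickly. */
extern void process_one_chunk(int chunk);

typedef struct {
    int active;        /* is a long job in progress?    */
    int next_chunk;    /* where we got to last time     */
    int total_chunks;  /* how much work there is in all */
} job_state;

/* Called from the Wimp_Poll loop on each null event (reason 0). */
void on_null_event(job_state *job)
{
    if (!job->active)
        return;                          /* idle: nothing to resume */

    process_one_chunk(job->next_chunk++);

    if (job->next_chunk >= job->total_chunks)
        job->active = 0;                 /* done: can mask null events again */
}
```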
Andrew Daniel (376) 76 posts |
Nemo, I’m curious as to your thoughts on utilising the extra ARM cores in modern CPUs. Also, how fast would a 26-bit ARM emulator written in Thumb-2 run on one of those Cortex-M3 cores? |
Rick Murray (539) 13850 posts |
In a word, no. Sorry. The thing is, taskwindows work with singletasking linear applications where 99% of the time it doesn’t matter if or where you pause them. Now consider an application. It is busy-waiting on data from a slow GPS dongle, or somesuch. While the user is waiting, they click Menu over your iconbar icon. Or look in your window for a “Cancel” icon. Or… |
Martin Bazley (331) 379 posts |
Paul Fellows recently related to ROUGOL the story of how Arthur was originally developed, and it was very much like a more literal version of the ‘co-processor’ technique mentioned above – as in, the ARM evaluation system was plugged into the Tube, the BBC powered it, and over time responsibility for more and more system functions was migrated to ARM code, with the 6502 becoming little more than a bootstrap. Before anyone objects to the microkernel approach on the grounds that ‘it isn’t RISC OS’ to have the present kernel be the ‘child’ of some other entity, remember that exactly that has basically already happened with the addition of the HAL. |
Colin (478) 2433 posts |
If you are waiting for a slow device, the main poll loop of your program continues as it does now and you deal with inputs like you do now; you just get time slices for the PMT parts of your program in place of null events. E.g. mouse click → if not already reading device, start reading device. You do:

    mouse_click → if not already reading device
        multitask_start
            read_device
        multitask_stop

The multitasking would only happen when the system is idle, i.e. at a null event, so it doesn’t affect any other task – or at least it is no different to what we have now, is it? |
nemo (145) 2552 posts |
Well, modulo over-enthusiasm about Making The Wimp PMT!!! I’m broadly in agreement with all the other sensible contributors here – there are a number of choices which, as they get more attractive for the user, get much harder for the OS developer.
The thing is, cleverly written co-operative applications will appear much more responsive than poorly written pre-empted ones (compare RISC OS with any Windows for proof)… but a poorly written co-operative application can be far less responsive than a pre-empted one (i.e. the vast majority of non-trivial RISC OS programs – until fairly recently the clock stopped when dragging a window with panes for heaven’s sake!).
Ah, that’s a much easier question. I have absolutely no idea. |
Jeffrey Lee (213) 6048 posts |
AIUI the Cortex-M cores are designed purely for low-performance tasks, e.g. keeping your phone’s OS and comms hardware ticking over while the phone is in standby. So in short: a hell of a lot slower than the main CPU would be.
Which is a perfectly sensible way of developing the OS/hardware, and not really any different to how any games console manufacturer, mobile phone manufacturer, etc. would develop the hardware & OS for their latest devices, except instead of using the Tube they’d be using JTAG (and probably a few other assorted interfaces). Unfortunately this anecdote doesn’t help us much with making RISC OS multi-core friendly :) (Except perhaps as a reminder that JTAG is invaluable for many low-level debugging tasks) |
nemo (145) 2552 posts |
Actually that’s not really a problem at all – the user can accept that that program is busy – the message will queue up and the menu appear in a mo. No, far worse than that is the Filer problem – you’re interacting with a Filer window, and it’s all perfectly responsive. You’re not paying attention to whether that other program has finished recalculating or printing or whatever. You open the Filer menu, it works. You select a file, it works. You double-click a file… and nothing happens, so you double-click again… still nothing happens. You give up and drag it to a program. Some time later the application that was printing opens two copies of the file. Other scenarios: You press ^C in one program, switch to another and press ^V and nothing happens. You save a file into another program and nothing seems to happen, so you drag it to a Filer window instead. Seconds later your program crashes. There are many, many more. Even ^F12 could be broken by pre-emption. If you want to multithread the desktop (the UI – all the applications, basically) then the Wimp needs to be fully involved so that it can synchronise all the (wimp) threads at Wimp_Poll to allow certain Message protocols to work. Otherwise the desktop is completely broken. Quite how much synchronisation is necessary I’m not sure, but there are other difficulties including redrawing (which Wimp2 skirted somewhat) – if you pre-empt a task while it is redrawing you then allow other tasks to move their windows – this can then invalidate things the redrawing task knows, such as exactly which pixels to touch. That can only be fixed (in a multithreaded redrawing sense) by having tasks redraw to their own surfaces and not directly to the screen. That of course has a performance and responsiveness impact which is precisely what one is trying to avoid! Hence my previous point: The Wimp must be, or often appear to be, a single thread. |
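For reference, the loop nemo is talking about, sketched in C (the BASIC version uses the same SWIs): between Wimp_RedrawWindow and the final Wimp_GetRectangle, the task is working from clip rectangles and an origin the Wimp handed it, which is exactly the state that goes stale if another task is allowed to move windows part-way through. draw_rectangle() stands in for the application’s own drawing code.

```c
#include <kernel.h>

#define Wimp_RedrawWindow  0x400C8
#define Wimp_GetRectangle  0x400CA

extern void draw_rectangle(int *block);   /* application-specific drawing */

void redraw_window(int *block)            /* block[0] = window handle */
{
    _kernel_swi_regs r;
    int more;

    r.r[1] = (int)block;
    _kernel_swi(Wimp_RedrawWindow, &r, &r);
    more = r.r[0];

    while (more) {             /* pre-empting the task anywhere in here   */
        draw_rectangle(block); /* invalidates the clip/origin it was given */
        r.r[1] = (int)block;
        _kernel_swi(Wimp_GetRectangle, &r, &r);
        more = r.r[0];
    }
}
```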
Eric Rucker (325) 232 posts |
Actually, having tasks redraw to their own surfaces only lets you do interesting things regarding performance. First, the amount of redrawing that tasks have to do is significantly reduced. Second, if those surfaces are OpenGL surfaces, you can use the GPU to accelerate all UI operations. (This is what most modern operating systems do. Yes, RISC OS has a fast UI, but it can be made even faster that way. Yes, most modern operating systems have slow UIs, that’s because they’re using the GPU to display more shiny crap, and when you’re using integrated graphics, the performance sucks.) But, I believe PMT OSes that do use redrawing to display things properly just… tell the program to redraw again. Also, there is the whole Mac OS 8.6 thing again, where a program would keep all UI stuff in the WIMP, but spin things off to the threading system (which would be able to preempt the WIMP itself). |
Rick Murray (539) 13850 posts |
[quotes from all over the place – if you see yours, wave and say “hi!”]
Gee, thanks. :-)
Printing doesn’t involve !Printers? While they (printing and !Printers) are different things, they are tied up together, a sort of symbiotic relationship. Either way, printing does hairy things (try reporting an error without calling the AbortJob SWI and watch it all fall apart) which I would imagine would need serious reworking in a PMT environment.
Doesn’t that depend upon the Wimp? Surely a message is passed around to see if a task can handle the request. At the moment, I would imagine the message would be queued. However, if the Wimp is upgraded to understand blocking software, it probably shouldn’t bother trying to give a message it expects a reply from to a program that isn’t actively polling. Thus, the filer will get a NAK and it will start a new instance of the application…
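The documented protocol Rick is describing looks like this from the Filer’s side: Message_DataOpen is broadcast on the double-click, and only if it comes back unclaimed (Wimp_Poll reason 19, User_Message_Acknowledge) is a fresh instance launched. A sketch in C, with launch_app_for() as a hypothetical helper:

```c
#define Message_DataOpen      5   /* action code at word 4 of the block */
#define User_Msg_Acknowledge 19   /* our broadcast came back unclaimed  */

extern void launch_app_for(int *msg);  /* hypothetical: Alias$@RunType... */

void handle_poll(int reason, int *block)
{
    switch (reason) {
    case User_Msg_Acknowledge:
        if (block[4] == Message_DataOpen)
            launch_app_for(block);  /* no running task claimed the file */
        break;
    /* ... other reason codes ... */
    }
}
```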
Whoa! I’m not talking about porting RISC OS to the Cortex-M; I’m just thinking that it might be a possibility to task off some of the boring repetitive stuff to them – like the mechanism for the centisecond ticker and/or syncing the system clock? Stuff that happens routinely in the background, doesn’t require much oomph, and could be left to get on with it.
Ah, but how much work did you do vs how much the operating system assisted? We aren’t talking about porting an application, we’re talking about an operating system. Slightly different ballpark.
<cough!> I couldn’t. I’d like to be able to, but I at least have an idea of my capabilities (except that one where I marry a cute Japanese girl and live happily ever after with love bubbles and sparkles and crap), and writing a new OS “better than anything around now” isn’t one of them!
I don’t think many would object to that. I think we’d object to the “look and feel” not being RISC OS. In short, the API would change dramatically. How much of what arrives at the end will retain the spirit of RISC OS? You want a detailed description of a multitasking thread-capable OS? I can give it to you with bells on top. It’s called Minix, not only are sources available, but there’s a very very detailed book describing every aspect of how the system works. [ there are CHM and PDF versions floating around if you are so inclined; though I prefer paper… ] Now, if I was a little more clever and a little less lazy, it probably wouldn’t be overly hard to get the basics of that running on the ARM and build a sort of RISC OS layer on top. The thing is, I fear that the more the RISC OS layer would be created, the more I’d need to either bugger up existing APIs, or try to devise new concepts…until I reach the point where I realise I’ve long forgotten RISC OS and have just written YetAnotherDamnUnixClone. In short, the question isn’t “how simple is it to make pthreads” (why “p” prefix? this some sort of reverse polish notation thing?); but rather “how can we make RISC OS support multithreaded activity without completely buggering up what RISC OS is and how it works?” That said, the topic title talks about multiple core CPUs; which is not necessarily the same thing as multiple threads. ;-) |
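(For the record, the “p” is simply POSIX.) A minimal pthreads example of the model under discussion – standard POSIX C, not anything RISC OS offers today: the long job runs on a worker thread while the caller carries on.

```c
#include <pthread.h>
#include <stdio.h>

static void *long_job(void *arg)
{
    /* ... recalculate / print / load the 24MB sprite ... */
    printf("background job done\n");
    return NULL;
}

int main(void)
{
    pthread_t worker;

    pthread_create(&worker, NULL, long_job, NULL);
    /* foreground carries on (e.g. keeps servicing the UI) here */
    pthread_join(worker, NULL);   /* wait for completion */
    return 0;
}
```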
Eric Rucker (325) 232 posts |
The thing is, you can’t rely on the Cortex-M cores being there on any SoC, so relying on them is IMO unwise except for implementing VERY platform-specific stuff. And, like I said, look at Mac OS 8.6. APIs stayed the same (even the Multiprocessing Services ones, except the thread-safe APIs were extended, I believe), but GUI stuff still had to run as a task within the Mac OS process, and the APIs didn’t change at all there. And, if anything, Mac OS was in worse shape than RISC OS – at least RISC OS 2.0 broke backwards compatibility with single-tasking GUI stuff when it came out, whereas Mac OS was trying to graft full multitasking onto a system where, while a cooperative model was in place, it was only designed to be used to allow very limited applications to multitask alongside full applications. |
Malcolm Hussain-Gambles (1596) 811 posts |
Rick: Sorry, I think my point was missed! (Maybe that was my fault, I’ve just got over a bad cold.) I wasn’t saying I could actually do it, heck I’m just grasping the basics of RISC OS again! More if I had infinite time, after the 1,000,000th (or more) re-write, and didn’t find something else that distracted me over the billion years I’d probably need. The point was to reflect on: how much man-time is required to complete each suggested idea, and a possible likelihood of said idea based on willingness and availability. |
Eric Rucker (325) 232 posts |
There’s a third thing to consider, though: 3) Time saved in implementing future improvements to RISC OS once it’s done |
Rick Murray (539) 13850 posts |
What, you mean the three programs that actually used the Arthur GUI? (^_^) That said, I think “single-tasking GUI” makes about as much sense as “paid volunteer”. Slight aside: Back in those days, ARMBE (a single-tasking program) was the hot thing. Although I note that it is in Library of the RPi installation I have – so either it’s there for a joke or somebody actually still uses ARMBE! |
Malcolm Hussain-Gambles (1596) 811 posts |
Eric: That’s probably the most important point as well! |
Rick Murray (539) 13850 posts |
Malcolm: You too with the cold huh? I’m recovering from my second in as many months. Pffft! Is it valid to keep comparing against MacOS without knowing the ins and outs? Windows did the same sort of thing in the transition from Windows 3.1x to NT/95+ (in addition to some 16/32bit thunking that makes the brain ache). I am led to believe that Mac’s way of doing it is to run all of the co-operative programs as a single pre-empted thread. An interesting idea, certainly, though I’d hope this thread would have more priority/time the more tasks it is running. Anyway – in answer to your questions, I think a major determining factor is:

Of course, you do realise, I hope, that in any case, we’d need to build the OS in mostly pure assembler? There’s a reason RISC OS is blindingly nippy… so take any estimate you had in mind and double it. Twice. And once again for good measure. |
Rick Murray (539) 13850 posts |
Of course, there’s always fibers (sic, or “fibre” for us Brits). |
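By way of illustration, since RISC OS has no such API: fibres are cooperatively scheduled contexts that switch only when explicitly told to, which is much closer to the Wimp’s existing model than pre-emptive threads. A minimal POSIX sketch using <ucontext.h>:

```c
#include <stdio.h>
#include <ucontext.h>

static ucontext_t main_ctx, fibre_ctx;

static void fibre_body(void)
{
    printf("fibre: part 1\n");
    swapcontext(&fibre_ctx, &main_ctx);   /* yield back to main */
    printf("fibre: part 2\n");
}   /* falling off the end resumes uc_link (main_ctx) */

int main(void)
{
    static char stack[16384];

    getcontext(&fibre_ctx);
    fibre_ctx.uc_stack.ss_sp   = stack;
    fibre_ctx.uc_stack.ss_size = sizeof stack;
    fibre_ctx.uc_link          = &main_ctx;
    makecontext(&fibre_ctx, fibre_body, 0);

    swapcontext(&main_ctx, &fibre_ctx);   /* run until the fibre yields */
    printf("main: fibre yielded\n");
    swapcontext(&main_ctx, &fibre_ctx);   /* resume it to completion */
    return 0;
}
```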
Jeffrey Lee (213) 6048 posts |
There’s also a reason why it’s such a PITA to maintain. Plus the OS being written in “mostly pure assembler” isn’t true anymore. Taking a typical OMAP3 ROM and totalling up the sizes of all the modules, I can see that the breakdown is 1313KB assembler, 1608KB C, and 57KB BASIC. I could sit here and rant for ages about how writing most of the OS in assembler isn’t the right way to go, but I’d sincerely hope that you’re smart enough to realise that fact for yourself. |
Eric Rucker (325) 232 posts |
Rick: That would be correct, that cooperative programs are run as a single pre-emptable thread. However, it’s the only thread that has access to unsafe APIs, much like the WIMP in a theoretical “RISC OS 5.6”, if you will, would. And, my understanding is that any user program must start as a cooperative stub, then call Multiprocessing Services and spin off any threads it wants to run (and those threads can call back into the cooperative stub as needed to access APIs that aren’t thread-safe). Also, a scheduler could be designed around the knowledge that there are multiple individual user-facing tasks running in that cooperative thread. If you wanted, the scheduler could even replace the WIMP scheduler in theory. |
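A sketch of that shape in pthreads terms (the Multiprocessing Services details differ, and the one-slot queue is a deliberate simplification): worker threads never call the unsafe API themselves, they post a request that the single cooperative stub services whenever it gets control.

```c
#include <pthread.h>

extern void unsafe_api_call(int request);  /* only legal on the stub thread */

static pthread_mutex_t q_lock = PTHREAD_MUTEX_INITIALIZER;
static int pending = 0, request;

void worker_needs_api(int req)      /* called from any worker thread */
{
    pthread_mutex_lock(&q_lock);
    request = req;
    pending = 1;
    pthread_mutex_unlock(&q_lock);
}

void stub_poll(void)                /* called from the cooperative stub */
{
    pthread_mutex_lock(&q_lock);
    if (pending) {
        unsafe_api_call(request);   /* safe: we ARE the cooperative thread */
        pending = 0;
    }
    pthread_mutex_unlock(&q_lock);
}
```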
GavinWraith (26) 1563 posts |
Could some of you knowledgeable ladies and gents give a few pointers about where to find information about multicore systems to someone like myself who has had no experience of them? Excuse these, no doubt naive, questions:

1) how do the cores communicate?
2) are cores provided with private RAM that other cores can read from but to which they cannot write?
3) is one core the sole owner of IO?
4) can one core trigger an interrupt in another (maybe part of question 1)? |
Jeffrey Lee (213) 6048 posts |
I’m sure there are many good textbooks on the subject, and there must be a few decent resources on the ‘net, but I can’t think of anything offhand.
A mixture of shared memory, hardware FIFOs/mailboxes, and interrupts (often tied to the FIFOs/mailboxes). Depending on how shared memory is mapped, various types of safeguards may be needed to make sure it’s accessed in a safe manner by all the different cores. Of course not all multi core/multi processor systems are the same, but we’re lucky in that modern ones try to make sharing memory as safe and easy as possible.
Each core has an independent set of page table pointers, so the cores are free to do whatever they want with memory (all private, all shared, partially shared, etc.)
The interrupt controller allows individual IRQs to be routed to individual cores. It’s also possible for the same interrupt to be routed to multiple processors – e.g. the timer used for thread scheduling would be a good choice for this. For other IRQs the OS should probably take into account the processing cost of handling each IRQ and distribute them appropriately between cores.
Yes. |
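A sketch of the shared-memory side of Jeffrey’s answer, using C11 atomics (whose acquire/release orderings compile down to the ARM barrier instructions): one core publishes a message and sets a flag, the other sees the flag and consumes it. In practice the sender would normally also raise a software-generated interrupt on the receiving core rather than having it spin.

```c
#include <stdatomic.h>

typedef struct {
    int        payload;          /* the message itself      */
    atomic_int full;             /* 0 = empty, 1 = has data */
} mailbox;

void send(mailbox *mb, int value)            /* runs on core A */
{
    while (atomic_load_explicit(&mb->full, memory_order_acquire))
        ;                                    /* wait for previous message */
    mb->payload = value;
    /* release: payload is visible before the flag is */
    atomic_store_explicit(&mb->full, 1, memory_order_release);
}

int receive(mailbox *mb)                     /* runs on core B */
{
    int v;
    while (!atomic_load_explicit(&mb->full, memory_order_acquire))
        ;                                    /* wait for a message */
    v = mb->payload;
    atomic_store_explicit(&mb->full, 0, memory_order_release);
    return v;
}
```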
Rick Murray (539) 13850 posts |
Oh, I don’t deny that.
Funny, most of the core I’ve been poking around recently (amusement, more than anything else, I’m a sick sick person…) has been assembler. I know there’s a big wodge of C, not just the built-in apps but I’d guess the networking/sockets stuff as well.
(^_^) There must be, I guess, another reason why a mobile phone twice as powerful as a Pi takes eight times longer to reach a usable state… Or why my Linux-based PVR boots in three minutes (but the custom microkernel1 PVR on an identical SoC boots in 27 seconds). I’ll stop here as this could easily turn into a “why Linux sucks on small devices” rant of my own.

1 From the little I could figure out of the semi-scrambled firmware update file. I ought to hook the JTAG to the DM320 and just copy out the raw firmware for examination. |
Eric Rucker (325) 232 posts |
And, those devices aren’t booting slowly because of C, they’re booting slowly because their kernels are doing a LOT. (And, the problem with hand-optimized assembler is, next CPU generation, your optimization is now crap. Not as big of a problem on an architecture like ARM, where micro-ops are almost never used (and before ARMv7, never used), but still…) |