Thinking ahead: Supporting multicore CPUs

636 posts, 79 voices

Pages: 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 ... 26

Aug 30, 2013 6:30pm Jess Hampshire (158) 865 posts	“How do I install a version of Linux that looks and acts like RISC OS and mostly runs RO programs without alteration?”. A version of Linux that looks and acts like RISC OS and runs Linux programs would be good enough, if it runs some RISC OS stuff too then that would be icing on the cake.

Aug 30, 2013 6:46pm Malcolm Hussain-Gambles (1596) 811 posts	From my perspective the main difference between linux desktops (apart from massive bloat,sluggishness and instability of applications [YES EVOLUTION I MEAN YOU!]) is the file management. If you aren’t bothered about that then I’d suggest trying enlightenment. (and it’s the only lightweight desktop on linux that looks nice and is rapid, IMHO ) But the more I think about it, the more I think what is the point of multicore/pre-emptive at the moment. Integrated I/O Scheduling (Disc and networking) would be a great step. Unless we are going to support android apps of course ;-) Just my 1.2 pence

Aug 31, 2013 10:09am Simon Willcocks (1499) 513 posts	Thanks for the plug, Bryan! I haven’t had a chance to read through the thread yet, but I’ll try to answer a few of the questions about ROLF. Will Impression Publisher run under ROLF? Almost certainly not, yet. ROLF will not magically solve the “multi core multi threaded” problem if it wants to retain backward compatibility. More precisely, it will face the exact same problems that any solution tackling these things will face. It will allow you to write programs that make use of the multiple cores and multi-threading right now, you just have to write them using Linux libraries and keep the GUI interfacing to a single thread. It should be that redraws from multiple processes can progress simultaneously at the moment, the window positions are locked until the last rectangle is redrawn. Is there a downloadable, installable binary distro of ROLF that people can get hold of to try/test/play with? Not since 2007, and that has disappeared off the internet. I won’t have much time until next week to look at it, but I’ll try to come up with something asap. Maybe with a RO5 rom image for the modules.

Aug 31, 2013 11:32am Chris Evans (457) 1614 posts	It will allow you to write programs that make use of the multiple cores and multi-threading right now, you just have to write them using Linux libraries and keep the GUI interfacing to a single thread. It should be that redraws from multiple processes can progress simultaneously at the moment, the window positions are locked until the last rectangle is redrawn. Sorry if I’m being thick but does that mean that if you follow the above you could write something that would run on a standard RISC OS computer e.g. Iyonix/Risc PC and that it would also work under ROLF. And that under ROLF part (redraws only) would Multi Process simultaneously? And is “Multi Process simultaneously” meaning multi processes on the same core or Multicores

Aug 31, 2013 3:58pm Simon Willcocks (1499) 513 posts	does that mean that if you follow the above you could write something that would run on a standard RISC OS computer e.g. Iyonix/Risc PC and that it would also work under ROLF No, you would be writing a ROLF program, not a RISC OS one. That said, if you wrote a multi-threaded RISC OS program that used a Unix standard threading model, recompiling it for ROLF would probably give you multi-core use.

Aug 31, 2013 5:32pm Timothy Baldwin (184) 242 posts	On second thoughts my suggestion was overkill. However simply locking around SWI calls is not sufficient: Pre-emptive threads can not access the wimp (which isn’t a serious problem). Non-thread safe user mode code may run concurrently with a non-thread safe SWI.

Sep 2, 2013 10:09am Andrew Hodgkinson (6) 465 posts	FWIW, Mac OS X puts the onus on the programmer to ensure they only do GUI updates in the main thread. There is robust API support to help other threads very easily “ask” the main thread to perform actions for background activities which need to update the UI. This works well provided the programmer avoids heavy blocking operations on the main thread, as that’d prevent GUI updates from being processed. Instead, the UI thread only does lightweight stuff and worker threads should be used for any heavyweight processing. There is a huge collection of very low to very high level mechanisms for doing different kinds of thread processing with this model in mind. Coupled with a preemptive scheduling model between processes, the worst that might happen is that an application’s UI becomes unresponsive but the rest of the system does not. Note the two problems being solved here, which it is important to keep distinct: The problem of allowing the WIMP to be interrupted and other applications to run at any time, which has been tackled in various ways in the past (most notable would be Wimp2) The problem of providing an easy route to multithreaded processing within individual applications (which is tackled less often and usually with ad hoc libraries often tied to a specific hardware solution, such as Hydra) Personally I’d approach this problem from the requirements first, then the API, then the implementation and iterate if necessary. First work out what you want application programmers and the system to be able to do. Then work out how you’d like to present those facilities as an API. Finally, consider the implementation; it may be that certain requirements and/or APIs are not technically feasible, so you go around the loop again adjusting things. (As far as APIs and implementations go, since Grand Central Dispatch is open source and integrated into BSD these days, it would be an obvious place to start for inspiration when it comes to the low level side of multithreaded programming). Jumping from zero to 100% of all possible features isn’t necessary. Consider a roadmap. What things can be written in a self-contained way, that gradually build and work together to present the end goal functionality? That way, you have a series of testable, releasable changes which provide useful functional extensions to the OS at each step. Right now I think we’re in danger of being mired in the minutiae while missing the big picture.

Sep 2, 2013 11:51am Rick Murray (539) 13840 posts	Right now I think we’re in danger of being mired in the minutiae while missing thebig picture. I think something to consider, unfortunately, is “what does Linux do?”. While it would be great to think of the possibilities that could be available – this perhaps needs to be tempered with some consideration for porting stuff from afar – is it to our benefit if our future sexy multi processor implementation is as difficult to port multi threading code as it is now? That said, I like the simplicity and directness of the RISC OS API, I hope for this to continue…

Sep 2, 2013 12:26pm Colin (478) 2433 posts	“what does Linux do?”. Forget about linux its licence makes it a non starter.

Sep 2, 2013 1:57pm Rob Kendrick (86) 50 posts	Colin: Would you like to expand on that statement a bit, or admit you’re wrong?

Sep 2, 2013 2:09pm Colin (478) 2433 posts	If you can get code source used on linux with a licence compatible with castles then yes you can use linux code to go in the RISC OS rom but generally it’s a non starter. If you think I’m wrong then I’m wrong.

Sep 2, 2013 3:43pm nemo (145) 2546 posts	I’m most familiar with the POSIX threads model, so of course I would suggest that supporting that API directly would be highly advantageous. However, pthreads doesn’t make multithreading any easier to do correctly, so it may be better to adopt the kind of parallelisation model that Javascript and OpenCL use – and that’s quite different from the expectation that every thread/core sees a complete and somehow autonomous RISC OS. Add to that off-screen rendering (for legacy apps, probably by intercepting where the screen seems to be) and you may be some way towards delivering the grunt of multiple processors while maintaining the existing GUI and Wimp idioms.

Sep 2, 2013 3:45pm Simon Willcocks (1499) 513 posts	If you ran RISC OS code on top of the Linux kernel, it would just be running as processes (and, possibly, drivers) in Linux, and it’s fine to run closed-source code on Linux. The changes to RISC OS code to get it to run on Linux might possibly be against Castle’s licence, but I don’t see how the problem could be with the GPL.

Sep 2, 2013 5:19pm Rick Murray (539) 13840 posts	Forget about linux its licence makes it a non starter. ??? I think you completely grabbed the wrong end of the stick. I was emphatically not suggesting that Linux code be brought into RISC OS – I respect what the GPL has done, but I really hate the GPL itself (it is compatible only with itself and v3 is a joke – I wrote a long blog post on how the supposed freedoms are an illusion, so refer to that if you’re interested). What I was referring to was to consider “how Linux does it” so if we make a multithreaded/multicore arrangement, it might make it easier for people to port software written for Linux. Not port bits of Linux, not run RISC OS under Linux, but just simply porting stuff. Perhaps _nemo_’s suggestion is one to follow up? I don’t know enough about JavaScript (which I’ve mostly tried to avoid thanks to a history of slightly not-quite-the-same implementations) and I know zip about OpenCL. One could ask if it would be possible to support such a thing from within the Wimp too? I mean, the operating system totally lies to you already (no, we aren’t all running a &8000!), so while mucking around with where a task is in the memory map, it could also be assigned a core? Perhaps the main problem with RISC OS is the question of reentrancy…

Sep 2, 2013 10:33pm Andrew Hodgkinson (6) 465 posts	While having a Linux compatibility layer (pthreads etc.) on top of whatever solution gets implemented will be great for porting, it’s not a good idea in today’s world as your primary multithreading/multiprocessor model. It’s very low level and really, just not very good; extremely hard to use correctly, extremely easy to make mistakes. Beyond threads The problem here is being mired in the idea of threads. That’s like programming via front panel flip switches and punch cards when it’s 2013 and we’re meant to be talking about high level languages. You’re a programmer; you have work to do; you want to think in terms of those tasks, those units of work, not in terms of the underlying hardware cores you’re going to try and use to implement that. Let the operating system choose whether to put your work units on one thread, or many threads; on one CPU, on many CPUs; on homogenous or heterogenous CPUs; or even distributed across multiple machines. The programmer shouldn’t need to care, at least for the general use case. About concurrency and application design Grand Central Dispatch achieves this even at a low level with a ‘C’ API that makes it crazy easy in the simple examples to replace e.g. a “for” loop that does some heavy lifting into a “for” loop that does heavy lifting which the operating system assigns across available processing resources transparently. It’s not your job as a programmer to faff about worrying over spawning threads (and how many to spawn), what the thread API is, how you manage synchronisation and getting results back from them, what happens if a thread should get stuck/crash, what happens if your application is closed down during processing and so-on. It’s necessary to have this stuff available but really it must be possible to get parallel processing done without it. IMHO it would be insanity to design bottom-up a system that intentionally enforced all that kind of baggage on the programmer, rather than merely making the baggage available for those who wanted/needed it, but provided a far better, more robust, more managed approach for the lion’s share of programming. Simple example Concurrency isn’t something you should put much effort into. If you can iterate over an array linearly: for (size_t i = 0; i < array_length; i++) { do_complicated_things(array[i]); } …then it should be easy to make that operation run in parallel – remember, you’re looking at one of the lowest level APIs here: queue = dispatch_get_global_queue(DISPATCH_QUEUE_PRIORITY_DEFAULT, 0); dispatch_apply(array_length, queue, ^(size_t i) { do_complicated_things(array[i]); }); …the “`^`” syntax is a block extension to C99, providing something analogous to JavaScript closures or Ruby lambdas. The “`^`” bit marks the block – essentially, it’s the body of a function – then you get the function parameter (“`size_t i`”) and finally the function body inside the “`{}`” container. So our “`for`” loop’s body goes there, with the stuff hanging around the outside being boilerplate. No need for the “`size_t i`…`i++`” counter – we don’t know nor do we care how the operating system is going to count through the array elements in this simple case. Now of course this is a really simplistic example and for “real work” things start getting more complicated pretty quickly. But surely, you want the easiest possible API to start with, so that your brain can deal with the thorny unavoidable difficult problems, rather than getting overloaded just dealing with naff APIs and pointless housekeeping. Essential reading The following is a must-read document for anyone considering designing or implementing concurrency related APIs in the 21st century ;-) http://www.objc.io/issue-2/concurrency-apis-and-pitfalls.html You can find more useful examples at the URL below, but it has Objective C syntax all over it which some people might find confusing: http://jeffreysambells.com/2013/03/01/asynchronous-operations-in-ios-with-grand-central-dispatch Understanding Objective C method calls in 10 seconds “`[foo bar]`” calls method (AKA function) “`-bar`” on object “`foo`”. In C, it’d be a bit like “`foo->bar()`”. “`[foo bar:baz more:things]`” calls method “`-bar:more:`” on object “`foo`” passing parameters “`baz`” and “`things`” – that’s because Objective C has named parameters and their names all form part of the overall method name, hence “`-bar:more:`” complete with colons. C doesn’t have named parameters, so this’d boil down to something along the lines of “`foo->bar_more(baz, things)`”. [ Edit: Before anyone Who Knows complains :) the above is entirely illustrative in terms of understanding the syntax’s intent at a high level. In fact Objective C messaging is just that – messages, not function calls. In terms of what actually happens when the message is sent, it’s very like RISC OS SWI calls. You send a message (call a SWI) which the run-time dispatcher (RISC OS SWI dispatcher) sends to the target object (is sent to the intended module via SWI chunk). The lack of “null” checking in some of the example code is misleading; the run-time includes a very sensible condition that any message sent to “null” returns the result “null” immediately. It means you can write a whole chain of code that might perform complex allocations, any one of which fail, but only have to check for “null” right at the end as any intermediate “null” results will just fall through cleanly. ]

Sep 6, 2013 4:26pm nemo (145) 2546 posts	While having a Linux compatibility layer (pthreads etc.) on top of whatever solution gets implemented will be great for porting, it’s not a good idea in today’s world as your primary multithreading/multiprocessor model. Indeed, unless of course the prospective author is most familiar with that model. I’m not suggesting that we implement pthreads, ThreadX, GCD compatibility layers on top of the actual implementation in the vain hope of getting somebody, somewhere, to write something… but I think pthreads is a no-brainer for a number of reasons (as one API, but not the only one). The problem here is being mired in the idea of threads. Let us assume that we are discussing the subject from the point of view of implementation. Anyone who does not understand “threads”, an API and model like pthreads or has no experience of the practicalities of multicore work is unlikely to be able to contribute much to the implementation discussion. So “mired” is a little strong. The subject is necessarily complicated, which is why I think it should be wrapped up in a very simple model for the programmer. In reality, spawning hundreds of threads is absolutely not the way to achieve high performance, especially on OSes with high context switch granularity such as Windows. Hence any implementation is necessarily going to be a form of thread pool system whose dimensions reflect the number of cores available and their context switching capability, amongst other things. Which is what GCD uses, happily. So having an abstraction of “work to do in parallel” (and of course, serial) is very valuable, which is what the models I cited provide. One of the reasons I mention Web Workers is that not only do they also work by message passing (or “events”) they also have the important distinction between the main program (that enjoys the whole familiar environment) and the Worker (or “block” in GCD parlance) that does not. I think that distinction is crucial from a pragmatic viewpoint for RISC OS. Andrew’s example of automagically splitting a loop into multiple loops on multiple processors doesn’t generalise to the higher level actions (the words “processes” and “tasks” are both misleading here) that make up large applications, and in particular I think we may be more concerned about whether we can still move windows about while a spreadsheet recalculates than only about how quickly that spreadsheet recalculates. I don’t think the C# syntax is helpful to a RISC OS audience, and obscures the actual requirements and restrictions that would apply to our case. We might be better starting with Event or Vector Handlers and then expanding those more familiar concepts to the multithreaded world. I make no apology for using “thread” here rather than “core” – the programmer should not need to care how many cores there are, physical or virtual, nor what their hyperthreading capabilities are – the API should work on an ARM2.

Sep 6, 2013 4:55pm Rick Murray (539) 13840 posts	the API should work on an ARM2. I’m going to (mostly) duck out of the discussion as while I can visualise ways this can (should?) work; I don’t have experience in programming such things. However, the above quote is crucially important. A working API should isolate the programmer from hardware specifics. The programmer should say they want to set up a new thread and the OS deals with how this maps to cores and such.

Sep 7, 2013 12:44am Andrew Hodgkinson (6) 465 posts	The programmer should say they want to set up a new thread and the OS deals with how this maps to cores and such. IMHO the programmer should say they want “`this task`” being done asynchronously and that “`this`” should be called back with results / errors. As for threads – who knows. That’s up to the OS. Generally, programs must not be creating threads themselves, or they most definitely do end up needing to know the hardware specifics (if I’m processing a large numerical set in a parallel way, how many threads should I use? Ah, well, that depends on the number of CPU cores… Etc. etc. – that way lies madness). nemo: C#? Spit `:-P` – as an aside – Objective C is not the same beast (it’s a pure superset of C99 and is compiled to native code, not CIL); but the code samples were illustrative, not literal. Block syntax specifically, though, is a C language extension which could be extremely relevant and useful for RISC OS and is independent of C derivatives like Objective C. http://en.wikipedia.org/wiki/Blocks…

Sep 7, 2013 4:18pm nemo (145) 2546 posts	Generally, programs must not be creating threads themselves, or they most definitely do end up needing to know the hardware specifics There’s an extremely large body of work that disagrees with you there. C#? As you were. Can’t read. I blame a large quantity of opiates (long story). What I meant was that illustrating a language-specific way of making use of the underlying functionality does not necessarily make the underlying functionality clearer, and may in fact obscure it. For example: Is the Worker/Block in the task slot? Does that imply you expect all workers/threads to have completed/joined before the owner calls Wimp_Poll? If so, how is that policed? Are you suggesting a functionality that does NOT allow a Task’s Workers to continue while it is paged out (single core)? When Workers are active on a multicore, is the whole application space paged in on multiple cores, or only pages containing Workers? I understand the GCD model, but I’ve no idea how it is implemented on a BSD/Linux-based OS, nor therefore whether that translates meaningfully to RISC OS with its paged memory model. :-/

Sep 7, 2013 11:41pm Rick Murray (539) 13840 posts	IIMHO the programmer should say they want “this task” being done asynchronously and that “this” should be called back with results / errors. As for threads – who knows. That’s up to the OS. Generally, programs must not be creating threads themselves, or they most definitely do end up needing to know the hardware specifics (if I’m processing a large numerical set in a parallel way, how many threads should I use? Ah, well, that depends on the number of CPU cores… Etc. etc. – that way lies madness). Forgive my obvious stupidity here, but… A programmer should want a “task” completed in an asynchronous manner. How is this different from creating a thread to perform some sort of task asynchronously? What if the “task” does not actually return within a meaningful time frame? I’m thinking of splitting video playback into a thread for the video and another thread for the audio, arrangements like that, where chunks of code operate together but independently. Why does the programmer need to know the specifics of the hardware to know how many threads to allocate? While there can be enhancements for certain tasks (video playback again) split across cores, isn’t the primary decision of how to split up the workload defined more by the task itself than the number of cores? The task should be at liberty to say “I need threads to do this, this, and this” and the OS think “okay, task plus three threads”, and it can be the OSs responsibility to sort out the allocations across cores, or all on the same CPU. To put this in context, we don’t need to know how the Wimp multitasks and handles memory in order to write multitasking programs. Neither do we need to know about SINs and maps in order to read data off the disc. We just tell the OS what we want and let it provide. That is what the API is for. Why should this be any more complicated? Because the minute a program needs to know specifics about the hardware in order to handle threads effectively, you would be condemning any program using that functionality to require a whole pile of special-case code to handle all the various permutations of processors – plus a dev willing to keep it all up to date for new hardware as it arrives. This may make sense for some highly optimised applications, but these should be exceptions and not the rule.

Sep 9, 2013 2:56am Andrew Hodgkinson (6) 465 posts	To answer that, please read the “About concurrency and application design” document linked to in my post of September 2nd.

Sep 9, 2013 2:57am Andrew Hodgkinson (6) 465 posts	(…further reading, the “concurrency APIs and pitfalls” link in the same post goes into more details about why threads are usually a bad model for getting work done).

Sep 9, 2013 12:01pm Rick Murray (539) 13840 posts	I suspect we might be getting hung up on the technical description of what one means by “thread”. On the face of it, GCD seems like a good idea, but I wonder (given that code blocks trickle down to thread code) if there are latency problems? The example of twenty threads creating more threads unaware of the others sounds like a straw man to me – anybody who codes so their threads spawn loads of threads which….eventually bring the machine to it’s knees, they have no business calling themselves “programmer”. My phone’s text editor handling sucks, I’ll continue this evening on the iPad…

Sep 9, 2013 10:22pm Andrew Hodgkinson (6) 465 posts	I suspect we might be getting hung up on the technical description of what one means by “thread”. There is surely nothing upon which to get hung :-) – the definition is very clear. Again, please read the earlier references. They explain things much more clearly than I can. On the face of it, GCD seems like a good idea, but I wonder (given that code blocks trickle down to thread code) Blocks might not. You don’t know nor need to know. You simply pass a unit of work to the OS and let it decide how to best schedule it given the available hardware resources and the software load from other processes at any particular instant. if there are latency problems? That’s a separate issue and if low latency is required then real-time thread priorities have to be built in (and, indeed, are). Unlike the current WIMP where you can just block everything by not calling Wimp_Poll, in a multithreaded system with pre-emptive multitasking one of the problems that always exists is that it’s impossible to request 100% of the CPU and block everything else, by design. That in turn means two processes which both require such a lock running concurrently will both fail to get the real-time performance they require. That’s an insoluble problem at the root; the user simply tried to do more concurrently than their hardware and software was capable of. The example of twenty threads creating more threads unaware of the others sounds like a straw man to me – anybody who codes so their threads spawn loads of threads which….eventually bring the machine to it’s knees, they have no business calling themselves “programmer”. (Again, please see the above links. All I can do is repeat or rephrase what they say which kind of wastes everyone’s time). There’s no non-trivial way for a non-kernel piece of software to know what the “bringing machine to knees” threshold is. It is a combination of both hardware resource availability, hardware and software context switch efficiency, latencies and efficiencies in communicating data between computing resources (e.g. consider the barrier to sending and receiving data to a graphics card when trying to use the GPU via OpenCL for some number crunching), the overhead of different thread types (as the OS will usually offer more than one) and the instantaneous varying-by-the-microsecond load presented by software and external device interrupts. The only thing that stands a chance of getting close to figuring that out correctly and continuously reevaluating the situation – and it’s a extremely hard problem in and of itself – is the kernel. That’s why applications shouldn’t be trying to use threads, since that presents them with the problem of figuring out how many they should have and what the priorities should be. Threads are a crude low level resource. You should no more try to work in the domain of threads than you would try to low-level allocate specific pages of RAM – you let the kernel deal with the pages, instead just telling it what amount of memory you require and give hints about what it is to be used for (at least in modern unified memory architecture systems where the file system, VM subsystem and the allocator are one and the same). That’s not the best ever analogy but the point is you don’t say “I want 4 threads” because you (think you have) 4 CPU cores and run code manually on them. That’s crazy crude (you’re not the only thing running on the system, so what relevance has 4 CPU cores to 4 threads?). You say “I have this chunk of code that I want running in a parallel fashion”, with hints perhaps about your preferred level or maximum level of concurrency and an indication of the kind of job priority you require (usually “normal”, occasionally “real time”, but only very rarely “background” because of priority inversion risk). Worked example: foobar2000. FB2K is a nice audio player for Windows. It includes a good quality built in format converter. You can select a bunch of files in the playlist (potentially thousands of them – the most I’ve ever done in one go so far was over 19,000 files) and it’ll convert them from one format to another (in my most recent case, FLAC to low bitrate HE-AAC). It’s very good at preserving metadata while it does so. Now, this is clearly a CPU intensive task; CODECs are heavy. So the author took this approach: Count the number of CPU cores. Spawn one thread per core. Process the queue of files on this thread pool. Each time a thread finishes processing a file, give it another one. The application needs to count CPU cores, but a “CPU core” is a complex thing, especially with hyperthreading. The OS may report 8 hardware “threads” when in fact there are 4 fast ones and 4 “spare CPU cycles” threads. FB2K doesn’t get to see that, so it may make incorrect choices. The reason why it gets away with this is that the operating system is actually responsible for scheduling the run time of threads and there’s no mapping of “one thread to one CPU core” whatsoever, so the application’s fundamental assumptions are flawed; but if you’re stuck with a basic thread model, you have nothing else to go on. The application needs to manage the lifecycle of these threads and handle return data from them. It needs to watch out for deadlocks etc., manage memory and make sure the central list of files is handed out across the threads as they work in a cross-thread safe manner. That’s fragile and involves lots of boilerplate code. Wouldn’t it be nice if the application author didn’t have to waste time on all this housekeeping stuff? If only there was something in the system that was designed to take care of process-realated housekeeping… Some sort of centralised process management authority… ;-) If the work was being done purely in the CPU, the above might all be just about acceptable. But FB2K is writing files. On an 8 core machine you’ll have 8 software threads hammering the I/O subsystem. An SSD might be OK with that, but on a hard disc it means you’re writing small fragments of different files in multiple threads. The files may be very large (e.g. suppose I was converting audio files to uncompressed 24-bit 192KHz WAV) so though we might have a very clever OS disc subsystem that tries to coalesce and cache writes to minimise disc seeking and thrashing, it’s just going to be forced to write chunks of files sooner or later – there won’t be enough RAM to hold them all, most likely. The result is a great deal of disc seeking and massively fragmented file output. Thus in general the while thing runs slowly as it gets disc-bound, not CPU bound. But that does depend on the size of data being output, the number of things concurrently writing, the OS disc writing strategy, instantaneous available RAM, SSD vs HDD etc. etc… Contrast with this approach. The application asks the OS for a concurrent processing queue. The application adds converter tasks for every single file in its list to the queue. The application is notified as each task completes (or fails) and updates its UI. The OS does everything else. If the OS is dumb then the worst that happens is it does as badly as FB2K does when dealing with raw threads; but at least FB2K’s code would be far simpler to write and much easier to read. And in the best case, the OS has a far better idea of what can be done to process the concurrent queue and is far more likely to do so in a manner appropriate for the operating environment at any given time. Meanwhile the user is unlikely to sit and stare at a progress bar so they get on with other things – browsing the web, YouTube videos, probably playing music in the background, whatever. And since all of that involves units of processing managed by the kernel, we’ve a good chance that use of all the machine’s resources will be appropriately balanced from moment to moment.

Sep 10, 2013 6:06am Trevor Johnson (329) 1645 posts	Unlike the current WIMP where you can just block everything by not calling Wimp_Poll, in a multithreaded system with pre-emptive multitasking one of the problems that always exists is that it’s impossible to request 100% of the CPU and block everything else, by design. Isn’t it possible to design in¹ retention of a legacy operation mode of cooperative multitasking (either current single core, and/or an interim implementation utilising multi-cores in other ways, should one be developed)? Engaging such an operation mode would only be done with explicit user confirmation (Style Guide should recommend standard ways of asking/setting config options, etc.). And the option to revert to pre-emptive operation² should also be offered (again, per Style Guide recommendations). Then anything which really needs/benefits from 100% CPU time³, e.g. realtime applications, overnight file format conversion, or whatever, would be able to demand it. ¹ Even if only theoretically, i.e. potentially too much of an added complication to realistically expect any bounty to be sufficient to cover such work. ² Dependent on the nature of the single-tasking operation, the wait for a convenient point to switch over might possibly be such that the task is complete before it continues under reverted pre-emptive mode – in any case, perhaps an obvious (but not too processor intensive) visual cue could be employed, e.g. special hourglass only displayed when waiting to switch between modes. ³ Or as close as practicably possible.