Thinking ahead: Supporting multicore CPUs
Martin Avison (27) 1494 posts |
To add to what Chris said, it is also true that each additional processor consumes some processor time just to schedule tasks onto processors and keep track of them. There are also instances where a processor has to wait until another processor has finished something, which also adds to the ‘unused’ time. When IBM first introduced ‘Attached’ and then ‘Multi’ processor IBM/370 mainframe machines in 1976, the overheads were quite large, but they were slowly reduced over the years as the OS and hardware improved. I would expect any use of multiple cores in RISC OS to follow a similar pattern – probably small (but useful) gains to start with. |
David Feugey (2125) 2709 posts |
One solution is to use only one core, and to provide a ‘light threads’ library to use the power of the other cores from the main OS. So no need for an SMP kernel or a ‘giant lock’-free system, just a kind of multicore TaskWindow. IMHO, taskwindows could/should be the bridge to SMP and PMT. |
Rick Murray (539) 13840 posts |
The problem is…when one of your cores wants to read a big wodge of data from disc. Remember, if you move away from plain BASIC to more advanced code in C or assembler, you run the risk of crashing into all sorts of gotchas.

Example? Let’s say you want to periodically read some sensor and log the results. Well, a simple CallEvery will get the OS to prod you to read the sensor. But you can’t write the data anywhere. Other programs, and maybe bits of the OS, will be “threaded”, and if one of those bits is a filesystem operation, touching the filesystem yourself will result in a “FileCore in use” error.

If this is the fun of doing periodic events with disc activity on a single core, try to imagine the joy that multi-core activity would bring! ;-)

1 This issue was identified in the creation of my server. It listens on a port twice a second (CallEvery) and when something happens it takes the expected action; however, the action code must run as a CallBack or else things go boobies-in-the-air big time, as we crash into the very valid risk that every non-reentrant call will fail – and one that is absolutely guaranteed to fail is any filesystem access. So many FileCore in use errors that I almost thought I was in a time warp and running RISC OS 2! |
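A minimal sketch of that CallEvery-then-CallBack pattern, assuming a relocatable module built with CMHG: the veneer names (tick_entry, work_entry) and the omitted handler bodies and CMHG header are purely illustrative – check the OS_CallEvery and OS_AddCallBack documentation for the exact register usage.

```c
#include "kernel.h"
#include "swis.h"

extern void tick_entry(void);   /* CMHG veneer: entered on every CallEvery tick        */
extern void work_entry(void);   /* CMHG veneer: entered later, as a transient callback */

/* Ask the OS to call tick_entry twice a second.
   R0 = delay in centiseconds minus 1, R1 = code address, R2 = R12 value. */
_kernel_oserror *register_tick(void *pw)
{
    return _swix(OS_CallEvery, _INR(0,2), 49, tick_entry, pw);
}

/* Called from the tick handler.  Filing system SWIs are NOT safe here,
   so just queue a transient callback and return; work_entry will run
   later, when the OS is back in a state where non-reentrant SWIs (the
   actual logging to disc) can be used without a "FileCore in use" error. */
_kernel_oserror *defer_to_callback(void *pw)
{
    return _swix(OS_AddCallBack, _INR(0,1), work_entry, pw);
}
```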
Rick Murray (539) 13840 posts |
TaskWindows could “leverage” (!) the ability to offload code onto other cores, but I don’t think they should be the primary interface. Perhaps, as the very minimum, you might not even need much support on the slave cores: you could trap all SWIs and kick them over to the primary core. This implies that other cores won’t necessarily see much of a speed-up, but it’s a place to start. More autonomous support for each core can be added in time, little by little, and likely in tandem with work on the main OS to stop it freaking out…which task where? EEK! |
David Feugey (2125) 2709 posts |
Not directly. But a light threads API could use TaskWindows, as they are already PMT compliant (and could become SMP compliant too, without the need to rewrite all the OS). Of course, that’s just a first step.
Yep, it’s closer to what we had. Another argument is that we could keep the same tools for a heterogeneous multicore system – for example to get access to the two Cortex-M cores of the OMAP5, or to do some clustering over a network. SMP is not so helpful. It seems cool, but once you make multicore apps, you’ll run into problems when trying to extend your application to a heterogeneous multicore architecture. An SMP/PMT core limited to what a TaskWindow can currently do would be enough for most code. TaskWindows could even rely on it.

So the idea is to say: 1/ a multicore-compliant version of TaskWindow (good for making tests and launching some code), and 2/ a multicore engine separate from the shell. When a second core is available, a monitor runs on it, and the whole thing is referenced as a resource for the multicore engine. You can also attach other resources, for example a slave system available on the network.

- A bit like the sound management :) You attach resources, and can use them, not to play sound, but to play code.
- A bit like the FS too. You write interfaces for a generic compute resources provider. You could for example provide a tool to make a cluster with SSH-only computers (OK: not really efficient).

You could use this from a TaskWindow (to launch tasks on other cores with a specific exec command…), from the Basic ASM (with a ‘launch on resource x’ command) or from your other code (with a light threads library, or as a task, through the TaskWindow module). All the non-interactive parts of the code could use it: image renderers, multimedia, gaming, emulation, compression, etc. Specific DAs and SWIs could be used to exchange data and commands. Endless possibilities. |
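Purely to illustrate the shape of such a ‘compute resources provider’ (every name below is invented – nothing like this exists in RISC OS today), a C-level sketch might look like:

```c
#include <stddef.h>

/* Hypothetical API - invented names, illustrating the idea only. */
typedef int compute_resource;            /* a core, a PC card, an SSH host... */

int              compute_count(void);    /* how many providers have registered */
compute_resource compute_open(int index);
void             compute_close(compute_resource r);

/* Run a self-contained, non-interactive piece of work on the resource:
   either a blob of position-independent code plus its data, or a named
   executable (the "specific exec command" case). */
int compute_run_code(compute_resource r,
                     const void *code, size_t code_size,
                     const void *input, size_t input_size,
                     void *output, size_t output_size);
int compute_run_exec(compute_resource r, const char *command);
```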
David Feugey (2125) 2709 posts |
I used something similar a long time ago. With a small Basic library, I was able to launch tasks inside a TaskWindow, or on the PC Card. I used it to make accelerators for tools like Gzip. It was possible to launch Gzip on the ARM, with PMT support (TaskWindow), or on the second core, the 486. Exactly the same thing. Of course it was only tricks, not an optimised module, but it did work, and with impressive gains (even though I copied the data and launched the tasks with a simple mix of CLI and DOS commands). Some modern SWIs would be more efficient, but I did like this easy way to test my idea. |
Rick Murray (539) 13840 posts |
Off topic – David, do you still have the code to start up the x86? I’m wondering how that mechanism worked (and don’t fancy trying to make sense of !PC). |
David Feugey (2125) 2709 posts |
I used a CLI tool that was able to launch software on the PC Card. I can’t remember its name (it wasn’t mine). Very basic… just to validate the idea. For Gzip, I checked if the PC Card was present, and either (no) launched Gzip locally, or (yes) copied the data, launched Gzip on the PC Card and got the data back. The gain was significant with Gzip, despite the transfer of data to and from the PC partition. It would be much better to use the DDK: http://www.riscos.info/index.php/PC |
Rick Murray (539) 13840 posts |
Therein lies the problem. My use of TaskWindow (and I am a geek) is for compiling stuff, reading DADebug logs, and issuing random *commands, as it is 2015 and we ought to have moved beyond ShellCLI by now… I really don’t think we will see benefits from other cores until actual applications can run on them.

That said, RISC OS has a rather sanitised view of application handling (everything is “the only app and it starts at address &8000”) that might lend itself to running on other cores – nothing is supposed to make too many assumptions about the state of the system it is running on. I am not sure I’m hopeful, though. We still haven’t figured out the mess that is coherent UTF-8 support with non-UTF-8 apps, so I can’t imagine how one could sensibly invoke the Wimp_Message mechanism with multiple applications that can take arbitrary amounts of time to execute.

Maybe this is the time to consider my suggestion for pre-emption-lite (a pair of SWI calls that inform the Wimp that the following code does not poll but takes time, so feel free to pre-empt it). I’ve already discussed the idea. Anyway, on a single-core system, the Wimp could assign such code a time-slice and yank control away when the time is up. On a multi-core system, the Wimp could kick such an app over to another core to run in peace (and pre-empt between them if multiple such apps exist), while the primary core runs the primary apps as normal.

It would be nice to have the Wimp capable of dispersing apps across all ‘n’ cores, but for this to work we’d really need to take a long hard look at how protocols such as User_MessageRecorded work, not to mention the behaviour of other cores for tasks that would ordinarily block the system. It’s merely a pain in the ass when ChangeFSI takes hundreds of times longer than SwiftJPEG to render a JPEG; it is much more critical when the Printer Manager is in use (due to how the printing system hooks its tentacles everywhere). Soooo many questions.
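To make the pre-emption-lite idea concrete, usage might look something like this – the SWI names and numbers below are invented purely for illustration; nothing is allocated or implemented:

```c
#include "kernel.h"
#include "swis.h"

/* Invented SWI numbers - purely illustrative, not allocated anywhere. */
#define Wimp_StartSlice  0x5A580   /* "I'm about to run for a while without polling" */
#define Wimp_EndSlice    0x5A581   /* "done - back to normal co-operative polling"   */

void render_big_image(void)
{
    _swix(Wimp_StartSlice, 0);   /* the Wimp may now pre-empt this task,
                                    or park it on another core           */

    /* ...long, non-polling work, e.g. ChangeFSI-style rendering... */

    _swix(Wimp_EndSlice, 0);     /* resume normal Wimp_Poll behaviour */
}
```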
Ah, you are probably thinking of ARMEdit to pass data/commands to a DOS session. A bit higher level than I was hoping for. |
David Feugey (2125) 2709 posts |
I was not talking strictly of the TaskWindow itself, but of the module that lets you launch tasks from your code in PMT mode. Very useful for front-ends, calculation offloading, etc. From a developer’s point of view there are many ways to use it to separate non-interactive code from the Wimp code, then to launch it in PMT mode, or even on another core. That would be a start (and a good change for tools such as ChangeFSI).
Of course. But do we need lots of multicore apps to be happy? ArchiEmu and Mplayer would be almost enough for me :)
I can’t either. That’s why I said that multicore code should not be more complex and ‘interactive’ than the code that currently runs under a TaskWindow. Of course, no Wimp calls. I see multicore more as a way to offload tasks from an app that works on the main core – with the TaskWindow module, or with a light threads library, or even from BBC Basic ASM with a specific directive.
That’s what a task running in a TaskWindow is :) (does not poll, but takes time). So I agree. We need something to manage PMT tasks more globally. PMT tasks, by design (and because of the limits of RISC OS), will be CMT compliant… and close to light threads as seen on other systems (the task will get access to a limited set of APIs/SWIs).
Yes and no. Since these tasks will not be Wimp tasks, perhaps it’s better to do this at a lower level. That’s why I also suggested moving the TaskWindow module from the Wimp to the CLI (not an easy task).
That was a (working) demo. The worst case. Anyway, I think that both ways should be provided: code level and exe level. I note that some people have managed to use the second core for some tasks (GPIO?). RISC OS FR still has a private bounty reserved for this (around €500, depending on the availability of the code [ARMX6, Titanium, Pi2, etc.]). There is another one for Brandy (complete refresh; Windows, DOS, ROS, Linux; collect all patches; add a few things). This one will be financed by a private company. I should make an announcement about this. Anyway, anybody can contact me through the RISC OS FR mail. |
Colin (478) 2433 posts |
Taskwindows don’t do PMT; they do co-operative multitasking by sleeping when a SWI is called. You can test this very easily with a C program along these lines:
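(The particular SWI used here, OS_ReadMonotonicTime, is just an arbitrary harmless choice; any SWI would do for the demonstration.)

```c
#include "kernel.h"
#include "swis.h"

int main(void)
{
    int t;

    while (1)
    {
        /* Comment out this SWI call and the loop will hog the machine. */
        _swix(OS_ReadMonotonicTime, _OUT(0), &t);
    }

    return 0;
}
```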
If you run the program in a taskwindow it will multitask, but if you comment out the SWI it will lock the machine up – as it will if the SWI doesn’t return. |
Colin (478) 2433 posts |
How about… At the moment, all applications except the one that sends the Wimp message are stalled, i.e. they have called Wimp_Poll and it hasn’t returned. So the Wimp just has to iterate through the list of stalled applications, posting the messages. With a PMT multithreaded system the other applications may not be stalled, but when the Wimp iterates through the applications to post the message, it stalls until that application calls Wimp_Poll, and then posts the message. |
David Feugey (2125) 2709 posts |
True. When calling SWIs, all PMT is gone. |
Rick Murray (539) 13840 posts |
Wimp_MessageRecorded? DataSave? |
Colin (478) 2433 posts |
Must go down as your shortest post, Rick :-) I presume it was directed at my post. Where’s the problem with Wimp_MessageRecorded and DataSave? The Wimp stalling until an application has called Wimp_Poll makes any message work as it does now, doesn’t it? |
Rick Murray (539) 13840 posts |
Cough. I’m at work. Cough. ;-) Thing is, both of my examples refer to a task that is expecting a reply to its message, so the calling task will also need to stall until the recipient has been polled. |
Colin (478) 2433 posts |
Yes, but the calling task is stalled anyway – in a state where the Wimp can pass it messages – when it calls Wimp_Poll to get the reply. Wimp_Poll always stalls an application; a Wimp event for that application makes the application continue. |
David J. Ruck (33) 1635 posts |
Going back five and a half years to the original post, I have to say that realistically option 6 is the most likely, and option 2 is the only other one that is practical. Although we’d all love full symmetric multi-processing like grown-up OSes, even if the work was done there’s only a handful of applications still in active development which could be modified to take advantage of it; everything else would at best gain no advantage, or at worst break and never be fixed. Keep the co-op and use the additional cores for specific offload processing.

But if we are going to try to make real SMP use of multi-core ARM processors, I’d go for something between 3 and 4, and to be trendy call it RISC OS containers. As RISC OS is essentially a single-client OS with a co-operative window manager bolted on top, without a complete re-write to make it fully thread safe – i.e. so it can be used by multiple clients (tasks in different threads or on different processors) simultaneously – the easiest way is to run multiple copies of RISC OS.

Each application would be invoked in a separate thread running a copy of RISC OS. An underlying pre-emptive kernel would switch between the threads (and allocate them to multiple cores). A layer of glue logic would both virtualise the hardware, so each copy of RISC OS thinks it has exclusive access, and provide a legacy message-passing system so applications could communicate with each other despite running on different OS instances. This would allow applications to run in parallel when handling input events, performing screen redraws and null processing. At message-passing events, the current blocking semantics would be enforced, causing threads to stall until the message has been handled.

The evolution of this method would be to gradually move functionality from the separate RISC OS instances into the underlying fully thread-safe layer, eventually coming up with a new Wimp which would support a different pre-emption-compatible message-passing protocol. That would open up the way for a new class of application while still maintaining compatibility for legacy applications running in their own OS instances.

But pigs are more likely to fly. |
David Feugey (2125) 2709 posts |
Yep
That’s not SMP, but AMP. One session of RISC OS on each core (of course, only one will do the I/O). |
William Harden (2174) 244 posts |
Right – I’m not in a position to test this (mainly because my monitor is VGA and I’ve used a Pi1 with an X100 connected to it – I have a Pi3 ready to go but need HDMI availability to play with it properly). However, a read this evening would suggest that setting up the Pi for multicore is more straightforward than the Panda. The cores are all enabled on boot, and are basically sat waiting for the address of some code to run, which is supplied in a mailbox (the Panda’s, from what I read, needed turning on). It looks like the mailbox is at a physical address (0x4000008C + 0x10 * core_number). So is it possible to load some code into logical address space, get its physical address, then push that physical address into the mailbox address above? (Clearly we’re not talking ‘et voilà, useful multicore’ – just a question of whether it’s possible to demonstrate more than one core being in actual use at once, as a very first baby step.) |
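As a very rough sketch of that baby step (untested; it assumes you already have a logical mapping of the BCM2836 local-peripherals block at physical 0x40000000 – however that mapping is obtained – and that entry_phys is the physical address of some position-independent code for the core to execute):

```c
#include <stdint.h>

/* Core n's mailbox 3 write-set register, as an offset from the base of
   the local peripherals block (physical 0x40000000 on the Pi 2/3),
   i.e. physical 0x4000008C + 0x10 * core. */
#define MAILBOX3_SET(core)  (0x8C + 0x10 * (core))

static void start_core(volatile uint8_t *local_base, /* logical mapping of 0x40000000 */
                       unsigned core,                /* 1..3 */
                       uint32_t entry_phys)          /* physical address of code to run */
{
    volatile uint32_t *mbox =
        (volatile uint32_t *)(local_base + MAILBOX3_SET(core));

    *mbox = entry_phys;        /* the firmware's boot stub polls this mailbox */
    __asm volatile ("sev");    /* wake the core in case it is waiting in WFE  */
}
```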
Jeffrey Lee (213) 6048 posts |
In theory yes. There are at least a couple of threads which discuss this (1, 2, 3) but no success stories yet.
I thought the Panda was pretty straightforward – AIUI the boot ROM will put the second core into a similar sleep loop and all you need to do to wake it up is write to AUX_CORE_BOOT_1 (boot address), AUX_CORE_BOOT_0 (“jump to boot address on next event” flag), then send an event via SEV. Check the TRM! Edit: “the Panda’s from what I read needed turning on” – Ah yes, it’s possible there’s some power management setup which is needed. Ignore me :-) |
Jeffrey Lee (213) 6048 posts |
I’m starting to think that that’s an easily achievable goal, with useful short-term gains, and it can act as the foundation for a lot of future work.

Make the memory map for the slave cores contain just the ROM, the appslot for the executing task, and a small amount of kernel workspace. (Having the ROM available isn’t strictly necessary, but it will allow apps which use BASIC or CLib to function without being pushed onto the main core all the time.) ROM is readable from user mode, the appslot is read/write, and everything else is only accessible from privileged modes. Tasks are restricted from entering privileged modes.

If the task running on the slave core calls a SWI or triggers an abort (e.g. trying to access a dynamic area which isn’t mapped in, like the RMA) then suspend execution at that point and push the task into a queue of tasks which are waiting for the main core. Then, at a later point, the taskwindow module on the main core will switch to that task (as normal) and restart the aborting instruction. The task will run on the primary core for that timeslice, after which it will be put back into the pool of tasks which are available for the slave cores to run.

With tomorrow’s Pi ROM I think it would be possible to write it as a drop-in replacement for the current taskwindow module – no changes to the kernel or the rest of the OS required. Once the basic system is up and running we could start extending it with more capabilities, e.g.:
|
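A very rough sketch of the trap-and-queue flow described above, with entirely invented names (this is illustrative structure only, not existing code):

```c
/* Invented types and functions, illustrating the flow Jeffrey describes:
   a task running on a slave core hits a SWI or an abort, is suspended,
   and is queued for the main core's taskwindow module to resume. */
typedef struct task task_t;

extern task_t *queue_pop(void);                 /* tasks runnable on slave cores     */
extern void    queue_push_for_main(task_t *t);  /* tasks needing the main core       */
extern void    save_context(task_t *t);         /* saved at the faulting instruction */
extern void    restore_and_run(task_t *t);      /* map in appslot, resume in USR     */

/* Entered from the slave core's SWI vector / abort handlers. */
void slave_trap(task_t *current)
{
    save_context(current);          /* so the SWI / aborting instruction restarts */
    queue_push_for_main(current);   /* main core will run it for one timeslice    */
}

/* The slave core's idle/dispatch loop. */
void slave_dispatch(void)
{
    for (;;) {
        task_t *t = queue_pop();    /* a task the main core has finished with */
        if (t)
            restore_and_run(t);     /* runs until the next SWI or abort       */
        /* else: wait for work (e.g. WFE) */
    }
}
```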
David Feugey (2125) 2709 posts |
Good plan. Perhaps even better than a full SMP system. We have so many problems with locks and other things that it’s almost always better to say that core 0 should be the only one responsible for the GUI and other core functions. If we can use Basic, C or ASM code on other cores, that’s good. If we can have light threads, that’s good too. And if you have a solution to go back and forth to core 0 for system calls… hey, that’s perfect. The only problem is TaskWindows. It would be perfect if they could work without the Wimp, directly from a CLI-only system. |
David Feugey (2125) 2709 posts |
Note: that could be a bounty. |
John Williams (567) 768 posts |
[TaskWindows could “leverage” (!) the ability to offload code onto other cores] I hesitate to ask this, not really having much (any?) understanding of the matter, but does the above imply – to reach my wish of a RISC OS computer which could access a Linux browser, perhaps in a desktop window – that RISC OS could run on the primary core, and a (simplified?) version of Linux could run, say, Firefox on another, presenting it to the RISC OS desktop in that window? If that were possible, I suspect it would tick all my boxes! Is it just a totally misconceived idea that couldn’t possibly work? |