Thinking ahead: Supporting multicore CPUs
David Feugey (2125) 2709 posts |
It would need a RISC OS hypervisor (possible on the Pi 2, Pi 3 and other modern ARM computers). The idea here is to be able to launch ASM, C and BASIC code on other cores, then to provide a bridge for SWI calls. Two very important steps. I would probably use it for arbitrary-precision maths, as it’s CPU-intensive but needs little I/O. |
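(For illustration only – a minimal C sketch of what such a SWI bridge could look like: the secondary core posts a request into shared memory and core 0 services it. Every name and the layout here are invented.)

#include <stdint.h>

typedef struct {
    volatile uint32_t pending;   /* 0 = free, 1 = request waiting for core 0 */
    uint32_t swi_number;         /* which SWI core 0 should execute */
    uint32_t regs[10];           /* r0-r9 in, r0-r9 back out */
} swi_bridge_t;

/* Runs on a secondary core; blocks until core 0 has serviced the call. */
void remote_swi(swi_bridge_t *b, uint32_t swi, uint32_t *regs)
{
    for (int i = 0; i < 10; i++) b->regs[i] = regs[i];
    b->swi_number = swi;
    __asm volatile("dmb" ::: "memory");  /* publish the arguments first */
    b->pending = 1;                      /* core 0 polls (or gets an IRQ) */
    while (b->pending) {}                /* wait for core 0 to clear it */
    for (int i = 0; i < 10; i++) regs[i] = b->regs[i];
}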
Jeffrey Lee (213) 6048 posts |
I think that would require us to move some of the task management code out of the Wimp (e.g. put it into the kernel). It’s definitely something we should be looking at doing, but it’s not something that needs to be done for the initial proof-of-concept.
Yeah, once the kernel supports shareable memory it should be possible to do that kind of thing. |
David Feugey (2125) 2709 posts |
Absolutely true. I could add that, since there are other problems with Wimp-less systems (cursor management), perhaps we only need a full-screen shell app for the Wimp. A boot-to-F12 mode :) |
William Harden (2174) 244 posts |
Jeffrey: yes, that plan looks pretty much as good as we are going to get. It ticks the boxes of getting something deliverable off the ground quite quickly, whilst allowing progressive deliverables over a suitable timeframe. The proposals make perfect sense from my perspective. |
Jeffrey Lee (213) 6048 posts |
The big question is, is anyone going to try it? ;-) I’ve got a few too many things which are in the half-finished state to be able to start work on any big new features. Sure, it should be possible to get the proof-of-concept up and running fairly quickly, but from there it’s a bit of a slippery slope that could easily lead to a hundred more things appearing on my todo list. |
Rick Murray (539) 13850 posts |
I think the biggest benefit to us all is if the Wimp is capable of distributing tasks among the available cores; however, the real danger here is in trying to imagine how much is liable to break if multiple applications are running at the same time. I’ll give a few examples: |
William Harden (2174) 244 posts |
Jeffrey: I suspect it’s more a shortage of expertise than a shortage of volunteers!!! The plan makes logical sense to me in terms of what is required, and the concept is fascinating, but it needs expertise in handling the memory map and low-level/kernel-level OS stuff. I wouldn’t have a clue how to set the memory mapping up for the cores. (Also noting that ScrnSetup needs sorting at some point.) Incidentally, are the DDE tools Pi 3-friendly? I think mine are a release or two out of date… |
Anthony Vaughan Bartram (2454) 458 posts |
Once I’ve got my current game released I am going to have a go at waking up a core. But really, mapping out the code structure to provide a refactoring path – as previously discussed – is required first. I started looking at this last year and came down with flu before Christmas… I’ve nearly got this new computer game out for beta – my first foray into 3D vector graphics – but that is for another thread. |
Alan Robertson (52) 420 posts |
I am officially excited. |
Jeffrey Lee (213) 6048 posts |
On that subject then, here are a couple of things which have been collecting in my brain recently. The first is that there’s one key area that’s been behind a lot of recent work – memory management:
So rather than treat those as individual tasks to be implemented as-and-when needed, it would be nice to group at least some of them together and try to get them implemented all in one go (say, the first four points, as they all relate to page tables and cache maintenance). The second thing is that splitting up the kernel is likely to make any future work (GraphicsV improvements, memory management improvements, multi-core support, 64bit support, etc.) easier to deal with. |
Jeffrey Lee (213) 6048 posts |
Over the weekend I wrote a quick test app which would start another core with the MMU enabled (using the same page tables as the OS). Theoretically this makes it a lot easier to run code, but in reality there are still a few things that need sorting out before the test app can be used as a foundation for “proper” multi-core development:
At some point we’ll also need to work out how to deal with interrupts in general. This is quite an important one – if the ultimate aim is to allow device drivers to run on the extra cores then we’ll need interrupt management APIs which can cope with that. Issues to resolve include: |
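(Back to the test app itself – a bare-metal sketch of the core wake-up step, assuming the stock Pi 2/3 firmware stub, which parks the secondary cores waiting on their core-local mailbox 3. The register addresses come from the BCM2836 ARM-local peripherals document; the rest is invented.)

#include <stdint.h>

#define LOCAL_BASE       0x40000000u
/* Core n, mailbox 3, write-to-set register */
#define MBOX3_SET(core)  (*(volatile uint32_t *)(LOCAL_BASE + 0x8C + 0x10 * (core)))

void start_core(int core, void (*entry)(void))
{
    MBOX3_SET(core) = (uint32_t)entry;  /* the stub reads this and jumps to it */
    __asm volatile("sev");              /* kick the core out of WFE */
}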
Rick Murray (539) 13850 posts |
I’m wondering if there is anything that could be picked up from how the x86 co-processor in the RiscPC worked? IIRC it used mailboxes to pass messages to/from the ARM host. The interrupt one will be more tricky, but not insurmountable. It depends on the specifics of how they are grouped – but it seems that BCMxxxx technical data isn’t so easy to come by. :-( |
David Feugey (2125) 2709 posts |
At some point. Without interrupts, the other cores would still be fantastic for offloading user-mode code. |
Jeffrey Lee (213) 6048 posts |
BCM2836 ARM-local peripherals is the key doc that explains how the 2836 (and 2837) differ from the 2835. Take a look at the diagram at the start of section 3.2 – the original interrupt controller that generated the ARM11 IRQ & FIQ signals is still there, but instead of being directly connected to the CPU it’s connected to a secondary controller, which allows control over which core each of the two signals is routed to (as well as mixing in the core-local interrupts). |
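(To make that concrete – the routing register as a one-line poke. The address and bit layout are from that document; the wrapper function is just a sketch.)

#include <stdint.h>

/* GPU interrupts routing register: BCM2836 local peripherals base + 0x0C */
#define GPU_INT_ROUTING (*(volatile uint32_t *)0x4000000Cu)

void route_gpu_interrupts(int irq_core, int fiq_core)
{
    /* bits 1:0 pick the core for the legacy IRQ, bits 3:2 for the FIQ */
    GPU_INT_ROUTING = (uint32_t)((irq_core & 3) | ((fiq_core & 3) << 2));
}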
Jon Abbott (1421) 2651 posts |
I had to implement something along the lines of IRQ direction in ADFFS on StrongARM. Essentially, I trapped all means of claiming an IRQ (i.e. modifying the hardware vectors, or claiming through SWIs) and then tracked which IRQs an app wants. The IRQ vector then points to my redirection code, which first looks to see if an app wants to know about an IRQ and directs it there first, prior to handing off to the OS.

There are a few possible routes I can think of:

1. Pass the IRQ across all cores first, then clear it on core 0 (we’ll assume the OS kernel is on core 0 only) if it’s not handled.

FIQs probably want to remain on core 0, which might pose a problem due to the way FIQ can be claimed in RISC OS. Perhaps FIQs should be Module-based only, and have a flag to indicate that the Module might claim FIQ, so it can be steered to core 0?

Will there be a microkernel on one core and RISC OS then spread across all cores, so that, for example, one core using FileCore doesn’t stall SWIs on all cores? I’m assuming there’s a SWI handler on all cores, the HAL/kernel on one core, and virtualised devices so hardware can be shared across the cores?

Wouldn’t we want RISC OS to be more Module-based than it currently is? |
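(A rough C sketch of the redirection Jon describes – all names are hypothetical, and the real thing would sit on the hardware IRQ vector in assembler.)

/* App handlers return nonzero if they consumed the interrupt. */
typedef int (*irq_handler_t)(int irq, void *ctx);

#define MAX_IRQ 64
static irq_handler_t app_handler[MAX_IRQ];
static void *app_ctx[MAX_IRQ];

extern void os_irq_handler(int irq);   /* the original OS route */

/* The hardware vector is repointed here after trapping the claims. */
void irq_dispatch(int irq)
{
    irq_handler_t h = app_handler[irq];
    if (h && h(irq, app_ctx[irq]))
        return;                        /* the app wanted it and handled it */
    os_irq_handler(irq);               /* otherwise hand off to the OS */
}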
Jeffrey Lee (213) 6048 posts |
We’ll definitely be having an IRQ handler for each core – at the least, the additional cores will need something there to respond to the doorbell interrupts that the kernel would be using to communicate high-priority things like the cache maintenance operations. And if there’s a thread scheduler we’d probably want a timer interrupt per core (although sharing the 100Hz timer would be possible, if the primary core were to send a message via the message queue).
Yeah, if we want to allow modules to run on other cores then we’ll definitely need a flag in the header to indicate that it’s MP-safe. Device drivers – whether IRQ- or FIQ-based – are subject to the problem that code which wants to do something quick and atomic will typically disable IRQs/FIQs in order to stop the device interrupt from occurring in the middle. But for multicore that will obviously only work if the code is running on the core which has the IRQ/FIQ handler installed on it, so we’ll either need a way of guaranteeing that, or the code will need updating to use some other method (load/store exclusive, spinlocks, etc.)
At the moment what I’m aiming for is:
That should give a reasonable foundation for the “proper” multi-core development to begin – making sure that cache/TLB/page table operations are MP-safe, developing proper HAL APIs and adding support for the other multi-core devices, adding a thread scheduler and synchronisation primitives, making bits of the kernel MP-safe, etc. |
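(For illustration, the kind of primitive the “load/store exclusive, spinlocks” remark above points at – a minimal ARMv7 spinlock sketch, not a proposed API. It assumes GCC-style inline assembler.)

#include <stdint.h>

typedef volatile uint32_t spinlock_t;   /* 0 = free, 1 = held */

void spin_lock(spinlock_t *lock)
{
    uint32_t tmp, one = 1;
    __asm volatile(
        "1: ldrex   %0, [%2]      \n"   /* read the lock, open the exclusive monitor */
        "   cmp     %0, #0        \n"
        "   bne     1b            \n"   /* already held: spin */
        "   strex   %0, %1, [%2]  \n"   /* try to claim; %0 = 0 on success */
        "   cmp     %0, #0        \n"
        "   bne     1b            \n"   /* lost the race: retry */
        "   dmb                   \n"   /* keep the critical section after the lock */
        : "=&r" (tmp)
        : "r" (one), "r" (lock)
        : "cc", "memory");
}

void spin_unlock(spinlock_t *lock)
{
    __asm volatile("dmb" ::: "memory"); /* drain writes made under the lock */
    *lock = 0;
}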
Rick Murray (539) 13850 posts |
A much simpler solution to all of this is to have a flag in the module header saying that it is safe to run on multiple cores.
For the moment, would it not be more logical to have RISC OS (more or less as it stands) on the primary core, then minimal microkernels on the others? This should provide the most benefit for the least amount of work.
I suppose it depends on how much each core’s kernel is capable of doing for itself before stalling becomes an issue. For the beginning, I can envisage the core kernel kicking pretty much everything to the primary RISC OS; then little by little the cores will be able to do more stuff with less assistance. I believe the main issue with FileCore is that it couldn’t be safely re-entered. So moving it to a different core would present the same sort of situation (it won’t magically run faster as it’s doing its stuff at nearly the full system speed as it is) with the added complication of having to marshal conflicting demands from potentially three other cores. Ideally, FileCore needs to be rewritten to work in a way that isn’t something from the eighties; but if it were simple, it’d have been done by now.
Let’s assume we have a module, XYZZY, and it is starting up on core #3. No particular reason: it flagged that it can work with multiple cores, so RISC OS put it there.
This is probably a habit we ought to be trying to get away from. I can’t believe that Linux, for instance, buggers around with disabling interrupts all the time. There must be another method… which we ought to be using… ;-) |
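(A sketch of what testing the module-header flag might look like – the field layout is the standard module header as I remember it, and the MP-safe bit itself is invented for illustration.)

#include <stdint.h>

#define MODULE_FLAG_MP_SAFE (1u << 2)   /* hypothetical bit assignment */

typedef struct {                 /* classic RISC OS module header, word offsets */
    uint32_t start, init, final, service;
    uint32_t title, help, cmd_table;
    uint32_t swi_chunk, swi_handler, swi_table, swi_code;
    uint32_t msg_file, flags_offset;    /* +48: offset to the flags word */
} module_header_t;

int module_is_mp_safe(const module_header_t *m)
{
    if (!m->flags_offset)
        return 0;                       /* old module: keep it on core 0 */
    uint32_t flags = *(const uint32_t *)((const uint8_t *)m + m->flags_offset);
    return (flags & MODULE_FLAG_MP_SAFE) != 0;
}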
Jon Abbott (1421) 2651 posts |
Can’t RISC OS and Modules be in shared memory, and Applications in core-specific appspace? I suppose it depends on what the other cores are being used for and what Modules are allowed to do going forward. For example, should Modules be allowed to touch appspace directly? There are some fairly fundamental issues such as this that prohibit some current software working in a multi-core environment. I suppose if you start from the premise that apps/Modules need to be marked as multi-core then you’re starting with a clean slate and can redefine the rule set.

What are we envisaging running on each core? Apps or threads? Wimp-based apps certainly lend themselves to multi-core, as the communication protocol is already there to pass data around. Mind you, apps that have Module dependencies could be a potential issue.

Areas of the OS such as FileCore that aren’t reentrant could be bodged into a multi-core environment by queuing calls, until there’s the resource to modify/rewrite them. |
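(Jon’s queuing idea in miniature – a sketch only. filecore_call stands in for the real, non-reentrant entry point; the lock is the spinlock sketched a few posts up.)

#include <stdint.h>

typedef volatile uint32_t spinlock_t;
extern void spin_lock(spinlock_t *);           /* see the earlier sketch */
extern void spin_unlock(spinlock_t *);
extern int filecore_call(int op, void *args);  /* hypothetical single-core entry */

static spinlock_t filecore_lock;

/* Every core funnels through here; callers on other cores simply
   queue up on the lock until the current call completes. */
int filecore_call_serialised(int op, void *args)
{
    spin_lock(&filecore_lock);
    int r = filecore_call(op, args);
    spin_unlock(&filecore_lock);
    return r;
}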
Chris Hall (132) 3558 posts |
We’ll definitely be having an IRQ handler for each core – at the least, the additional cores will need something there to respond to the doorbell interrupts that the kernel would be using to communicate high-priority things like the cache maintenance operations.

If we can get to a point where RISC OS can use a program running on a second core, would it be possible to have a WiFi network stack working entirely on a second core, with everything else running on the main core? I’m not sure how the OS asks for access to stuff on the network.

Developing the idea a little further, is it possible for Linux to be running on one core, doing WiFi but nothing else, and for RISC OS to be running on the main core, just sending messages (or whatever) when it needs network stuff? The memory map could be split between the two – top half RISC OS, bottom half Linux, with a small overlap for comms. It’s a bit like having the PiFi connected to the network socket, but using just the one processor. Or is this too difficult? |
Jon Abbott (1421) 2651 posts |
I’m not sure that would be possible without virtualisation; they’d be contending for access to hardware. |
Jeffrey Lee (213) 6048 posts |
are both subject to the problem that code which wants to do something quick and atomic will typically disable IRQs/FIQs in order to stop the device interrupt from occurring in the middle. If you’re manipulating a single value which can be accessed by load/store exclusive instructions, then they’re the preferred way to go (although since Linux supports pretty much every architecture under the sun, I don’t know if it has a generic load/store exclusive API that’s guaranteed to be available everywhere). For larger values, complex data structures, etc., spinlocks are the next step up, which do indeed involve disabling interrupts for the local core. http://www.makelinux.net/ldd3/chp-5-sect-5
I think that to start with we should aim for a minimal set of functionality – enough to allow multicore to be used for useful things, but not so much that it will cause compatibility issues. So it will be some kind of thread- or task-based approach (not Wimp tasks, just a generic “run this code” task) that will allow programs to opt in to the system. Once we’ve got the basics sorted we can start thinking about how to handle more complex scenarios. I still think the multicore taskwindow idea is worth investigating, as it should be possible to implement it without introducing any compatibility issues. |
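(To make the load/store exclusive point above concrete: a minimal ARMv7 sketch of an atomic increment – no interrupt disabling on any core. The function name is invented.)

#include <stdint.h>

/* Atomically add n to *p; returns the new value. */
uint32_t atomic_add(volatile uint32_t *p, uint32_t n)
{
    uint32_t val, fail;
    do {
        __asm volatile(
            "ldrex  %0, [%2]      \n"  /* read *p, open the exclusive monitor */
            "add    %0, %0, %3    \n"
            "strex  %1, %0, [%2]  \n"  /* %1 = 0 only if nobody intervened */
            : "=&r" (val), "=&r" (fail)
            : "r" (p), "r" (n)
            : "cc", "memory");
    } while (fail);                    /* contention: retry */
    return val;
}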
Chris Hall (132) 3558 posts |
I’m not sure that would be possible without virtualisation; they’d be contending for access to hardware.

In my simple thought experiment, I was envisaging a cut-down Linux that could access no hardware apart from (i) an on-board WiFi adapter; (ii) a reserved portion of the installed memory; and (iii) (possibly) one of the on-board serial ports to provide a remote ‘teletype’ for debugging and info. It would respond to messages from the RISC OS running on another core, which would access all other hardware on the board (hence no contention, apart from a small region of overlapping memory used for data exchange and no other purpose). All WiFi transactions would be requested by RISC OS and handled by Linux (which would not, itself, be able to access the Internet). Functionally it would be like a PiFi (a Raspberry Pi running Linux, connected to a separate machine by a short Ethernet cable) but all on the one processor and circuit board. Is there any chance this would work, please? The purpose would be to make a RISC OS iPad-type machine feasible (where WiFi is indispensable and the machine probably does not have a wired Ethernet socket). |
Malcolm Hussain-Gambles (1596) 811 posts |
I appreciate this isn’t what you’re after, but it should be fairly easy to do another PiFi that boots into RISC OS using RPCEmu or the like? |
Jeffrey Lee (213) 6048 posts |
Yes, that would work. But as with most things software-related, it’s not a question of whether it would work, it’s a question of how much effort is required to make it work, and who is willing to undertake that effort :-) To make it work you’d probably need more Linux knowledge than RISC OS knowledge (custom kernel build, root FS, etc.). If you were to implement something like that, it might make sense to use a virtual ethernet device as the sole method of communication between the two OSes. Apart from the obvious use of accessing the wifi, you could also use the virtual ethernet to connect to an ssh/telnet server to control the system.
RPCEmu on ARM will be horribly slow due to the lack of a JIT. So if you’re offering to implement an ARM JIT, then I’d definitely be interested ;-) |
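(On the virtual ethernet suggestion above – a sketch of the sort of one-way frame ring that could live in the small overlap region, one ring per direction. The names, sizes and layout are all invented.)

#include <stdint.h>
#include <string.h>

#define RING_SLOTS 32
#define SLOT_SIZE  1536                  /* room for one ethernet frame */

typedef struct {
    volatile uint32_t head;              /* only the producer writes this */
    volatile uint32_t tail;              /* only the consumer writes this */
    struct { uint32_t len; uint8_t data[SLOT_SIZE]; } slot[RING_SLOTS];
} frame_ring_t;

/* Producer side; returns 0 if the ring is full. The consumer mirrors
   this: read slot[tail], then advance tail. */
int ring_send(frame_ring_t *r, const void *frame, uint32_t len)
{
    uint32_t h = r->head;
    if (len > SLOT_SIZE)
        return 0;                        /* oversized frame: reject */
    if (((h + 1) % RING_SLOTS) == r->tail)
        return 0;                        /* full: caller retries later */
    r->slot[h].len = len;
    memcpy(r->slot[h].data, frame, len);
    __asm volatile("dmb" ::: "memory");  /* frame visible before the index */
    r->head = (h + 1) % RING_SLOTS;
    return 1;
}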
Malcolm Hussain-Gambles (1596) 811 posts |
Sljit? Surely there are others too? |