Thinking ahead: Supporting multicore CPUs
David Feugey (2125) 2709 posts |
It would need a RISC OS hypervisor (possible on the Pi 2, Pi 3 and other modern ARM computers). The idea here is to be able to launch ASM, C and BASIC code on other cores, then to provide a bridge for SWI calls. Two very important steps. I would probably use it for arbitrary-precision maths, as it’s CPU-intensive but needs little I/O. |
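(For illustration only – a minimal C sketch of what such a SWI bridge could look like: the secondary core posts a request into shared memory and core 0 services it. Every name and the layout here are invented.)

#include <stdint.h>

typedef struct {
    volatile uint32_t pending;   /* 0 = free, 1 = request waiting for core 0 */
    uint32_t swi_number;         /* which SWI core 0 should execute */
    uint32_t regs[10];           /* r0-r9 in, r0-r9 back out */
} swi_bridge_t;

/* Runs on a secondary core; blocks until core 0 has serviced the call. */
void remote_swi(swi_bridge_t *b, uint32_t swi, uint32_t *regs)
{
    for (int i = 0; i < 10; i++) b->regs[i] = regs[i];
    b->swi_number = swi;
    __asm volatile("dmb" ::: "memory");  /* publish the arguments first */
    b->pending = 1;                      /* core 0 polls (or gets an IRQ) */
    while (b->pending) {}                /* wait for core 0 to clear it */
    for (int i = 0; i < 10; i++) regs[i] = b->regs[i];
}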
Jeffrey Lee (213) 6048 posts |
I think that would require us to move some of the task management code out of the Wimp (e.g. put it into the kernel). It’s definitely something we should be looking at doing, but it’s not something that needs to be done for the initial proof-of-concept.
Yeah, once the kernel supports shareable memory it should be possible to do that kind of thing. |
David Feugey (2125) 2709 posts |
Absolutely true. I could add that, since there are other problems with Wimp-less systems (cursor management), perhaps we only need a full-screen shell app for the Wimp. A boot-to-F12 mode :) |
William Harden (2174) 244 posts |
Jeffrey: yes, that plan looks pretty much as good as we are going to get. It ticks the boxes of getting something deliverable off the ground quite quickly, whilst allowing progressive deliverables over a suitable timeframe. The proposals make perfect sense from my perspective. |
Jeffrey Lee (213) 6048 posts |
The big question is, is anyone going to try it? ;-) I’ve got a few too many things which are in the half-finished state to be able to start work on any big new features. Sure, it should be possible to get the proof-of-concept up and running fairly quickly, but from there it’s a bit of a slippery slope that could easily lead to a hundred more things appearing on my todo list. |
Rick Murray (539) 13850 posts |
I think the biggest benefit to us all is if the Wimp is capable of distributing tasks among the available cores; however, the real danger here is in trying to imagine how much is liable to break if multiple applications are running at the same time. I’ll give a few examples: |
William Harden (2174) 244 posts |
Jeffrey: I suspect it’s more a shortage of expertise than a shortage of volunteers!!! The plan makes logical sense to me in terms of what is required, and the concept is fascinating, but it needs expertise in handling the memory map and low-level/kernel-level OS stuff. I wouldn’t have a clue how to set the memory mapping up for the cores. (Also noting that ScrnSetup needs sorting at some point.) Incidentally, are the DDE tools Pi 3-friendly? I think mine are a release or two out of date… |
Anthony Vaughan Bartram (2454) 458 posts |
Once I’ve got my current game released I am going to have a go at waking up a core. But really, mapping out the code structure to provide a refactoring path – as previously discussed – is required first. I started looking at this last year and came down with flu before Christmas… I’ve nearly got this new computer game out for beta – my first foray into 3D vector graphics – but that is for another thread. |
Alan Robertson (52) 420 posts |
I am officially excited. |
Jeffrey Lee (213) 6048 posts |
On that subject then, here are a couple of things which have been collecting in my brain recently. The first is that there’s one key area that’s been behind a lot of recent work – memory management:
So rather than treat those as individual tasks to be implemented as-and-when needed, it would be nice to group at least some of them together and try to get them implemented all in one go (say, the first four points, as they all relate to page tables and cache maintenance). The second thing is that splitting up the kernel is likely to make any future work (GraphicsV improvements, memory management improvements, multi-core support, 64bit support, etc.) easier to deal with. |
Jeffrey Lee (213) 6048 posts |
Over the weekend I wrote a quick test app which would start another core with the MMU enabled (using the same page tables as the OS). Theoretically this makes it a lot easier to run code, but in reality there are still a few things that need sorting out before the test app can be used as a foundation for “proper” multi-core development:
At some point we’ll also need to work out how to deal with interrupts in general. This is quite an important one – if the ultimate aim is to allow device drivers to run on the extra cores then we’ll need interrupt management APIs which can cope with that. Issues to resolve include: |
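(Back to the test app itself – a bare-metal sketch of the core wake-up step, assuming the stock Pi 2/3 firmware stub, which parks the secondary cores waiting on their core-local mailbox 3. The register addresses come from the BCM2836 ARM-local peripherals document; the rest is invented.)

#include <stdint.h>

#define LOCAL_BASE       0x40000000u
/* Core n, mailbox 3, write-to-set register */
#define MBOX3_SET(core)  (*(volatile uint32_t *)(LOCAL_BASE + 0x8C + 0x10 * (core)))

void start_core(int core, void (*entry)(void))
{
    MBOX3_SET(core) = (uint32_t)entry;  /* the stub reads this and jumps to it */
    __asm volatile("sev");              /* kick the core out of WFE */
}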
Rick Murray (539) 13850 posts |
I’m wondering if there is anything that could be picked up from how the x86 co-processor in the RiscPC worked? IIRC it used mailboxes to pass messages to/from the ARM host. The interrupt one will be more tricky, but not insurmountable. It depends on the specifics of how they are grouped – but it seems that BCMxxxx technical data isn’t so easy to come by. :-( |
David Feugey (2125) 2709 posts |
At some point. Without interrupts, the other cores would still be fantastic for offloading user-mode code. |
Jeffrey Lee (213) 6048 posts |
BCM2836 ARM-local peripherals is the key doc that explains how the 2836 (and 2837) differ from the 2835. Take a look at the diagram at the start of section 3.2 – the original interrupt controller that generated the ARM11 IRQ & FIQ signals is still there, but instead of being directly connected to the CPU it’s connected to a secondary controller, which allows control over which core each of the two signals is routed to (as well as mixing in the core-local interrupts). |
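(To make that concrete – the routing register as a one-line poke. The address and bit layout are from that document; the wrapper function is just a sketch.)

#include <stdint.h>

/* GPU interrupts routing register: BCM2836 local peripherals base + 0x0C */
#define GPU_INT_ROUTING (*(volatile uint32_t *)0x4000000Cu)

void route_gpu_interrupts(int irq_core, int fiq_core)
{
    /* bits 1:0 pick the core for the legacy IRQ, bits 3:2 for the FIQ */
    GPU_INT_ROUTING = (uint32_t)((irq_core & 3) | ((fiq_core & 3) << 2));
}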
Jon Abbott (1421) 2651 posts |
I had to implement something along the lines of IRQ direction in ADFFS on StrongARM. Essentially, I trapped all means of claiming an IRQ (i.e. modifying the hardware vectors, or claiming through SWIs) and then tracked which IRQs an app wants. The IRQ vector then points to my redirection code, which first looks to see if an app wants to know about an IRQ and directs it there first, prior to handing off to the OS.

There are a few possible routes I can think of:

1. Pass the IRQ across all cores first, then clear it on core 0 (we’ll assume the OS kernel is on core 0 only) if it’s not handled.

FIQs probably want to remain on core 0, which might pose a problem due to the way FIQ can be claimed in RISC OS. Perhaps FIQs should be Module-based only, and have a flag to indicate that the Module might claim FIQ, so it can be steered to core 0?

Will there be a microkernel on one core and RISC OS then spread across all cores, so that, for example, one core using FileCore doesn’t stall SWIs on all cores? I’m assuming there’s a SWI handler on all cores, the HAL/kernel on one core, and virtualised devices so hardware can be shared across the cores?

Wouldn’t we want RISC OS to be more Module-based than it currently is? |
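(A rough C sketch of the redirection Jon describes – all names are hypothetical, and the real thing would sit on the hardware IRQ vector in assembler.)

/* App handlers return nonzero if they consumed the interrupt. */
typedef int (*irq_handler_t)(int irq, void *ctx);

#define MAX_IRQ 64
static irq_handler_t app_handler[MAX_IRQ];
static void *app_ctx[MAX_IRQ];

extern void os_irq_handler(int irq);   /* the original OS route */

/* The hardware vector is repointed here after trapping the claims. */
void irq_dispatch(int irq)
{
    irq_handler_t h = app_handler[irq];
    if (h && h(irq, app_ctx[irq]))
        return;                        /* the app wanted it and handled it */
    os_irq_handler(irq);               /* otherwise hand off to the OS */
}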
Jeffrey Lee (213) 6048 posts |
We’ll definitely be having an IRQ handler for each core – at the least, the additional cores will need something there to respond to the doorbell interrupts that the kernel would be using to communicate high-priority things like the cache maintenance operations. And if there’s a thread scheduler we’d probably want a timer interrupt per core (although sharing the 100Hz timer would be possible, if the primary core were to send a message via the message queue).
Yeah, if we want to allow modules to run on other cores then we’ll definitely need a flag in the header to indicate that it’s MP-safe. Device drivers – whether IRQ- or FIQ-based – are subject to the problem that code which wants to do something quick and atomic will typically disable IRQs/FIQs in order to stop the device interrupt from occurring in the middle. But for multicore that will obviously only work if the code is running on the core which has the IRQ/FIQ handler installed on it, so we’ll either need a way of guaranteeing that, or the code will need updating to use some other method (load/store exclusive, spinlocks, etc.)
At the moment what I’m aiming for is:
That should give a reasonable foundation for the “proper” multi-core development to begin – making sure that cache/TLB/page table operations are MP-safe, developing proper HAL APIs and adding support for the other multi-core devices, adding a thread scheduler and synchronisation primitives, making bits of the kernel MP-safe, etc. |
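(For illustration, the kind of primitive the “load/store exclusive, spinlocks” remark above points at – a minimal ARMv7 spinlock sketch, not a proposed API. It assumes GCC-style inline assembler.)

#include <stdint.h>

typedef volatile uint32_t spinlock_t;   /* 0 = free, 1 = held */

void spin_lock(spinlock_t *lock)
{
    uint32_t tmp, one = 1;
    __asm volatile(
        "1: ldrex   %0, [%2]      \n"   /* read the lock, open the exclusive monitor */
        "   cmp     %0, #0        \n"
        "   bne     1b            \n"   /* already held: spin */
        "   strex   %0, %1, [%2]  \n"   /* try to claim; %0 = 0 on success */
        "   cmp     %0, #0        \n"
        "   bne     1b            \n"   /* lost the race: retry */
        "   dmb                   \n"   /* keep the critical section after the lock */
        : "=&r" (tmp)
        : "r" (one), "r" (lock)
        : "cc", "memory");
}

void spin_unlock(spinlock_t *lock)
{
    __asm volatile("dmb" ::: "memory"); /* drain writes made under the lock */
    *lock = 0;
}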
Rick Murray (539) 13850 posts |
A much simpler solution to all of this is to have a flag in the module header saying that it is safe to run on multiple cores.
For the moment, would it not be more logical to have RISC OS (more or less as it stands) on the primary core, then minimal microkernels on the others? This should provide the most benefit for the least amount of work.
I suppose it depends on how much each core’s kernel is capable of doing for itself before stalling becomes an issue. For the beginning, I can envisage the core kernel kicking pretty much everything to the primary RISC OS; then little by little the cores will be able to do more stuff with less assistance. I believe the main issue with FileCore is that it couldn’t be safely re-entered. So moving it to a different core would present the same sort of situation (it won’t magically run faster as it’s doing its stuff at nearly the full system speed as it is) with the added complication of having to marshal conflicting demands from potentially three other cores. Ideally, FileCore needs to be rewritten to work in a way that isn’t something from the eighties; but if it were simple, it’d have been done by now.
Let’s assume we have a module, XYZZY, and it is starting up on core #3. No particular reason: it flagged that it can work with multiple cores, so RISC OS put it there.
This is probably a habit we ought to be trying to get away from. I can’t believe that Linux, for instance, buggers around with disabling interrupts all the time. There must be another method… which we ought to be using… ;-) |
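(A sketch of what testing the module-header flag might look like – the field layout is the standard module header as I remember it, and the MP-safe bit itself is invented for illustration.)

#include <stdint.h>

#define MODULE_FLAG_MP_SAFE (1u << 2)   /* hypothetical bit assignment */

typedef struct {                 /* classic RISC OS module header, word offsets */
    uint32_t start, init, final, service;
    uint32_t title, help, cmd_table;
    uint32_t swi_chunk, swi_handler, swi_table, swi_code;
    uint32_t msg_file, flags_offset;    /* +48: offset to the flags word */
} module_header_t;

int module_is_mp_safe(const module_header_t *m)
{
    if (!m->flags_offset)
        return 0;                       /* old module: keep it on core 0 */
    uint32_t flags = *(const uint32_t *)((const uint8_t *)m + m->flags_offset);
    return (flags & MODULE_FLAG_MP_SAFE) != 0;
}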
Jon Abbott (1421) 2651 posts |
Can’t RISC OS and Modules be in shared memory, and Applications in core-specific appspace? I suppose it depends on what the other cores are being used for and what Modules are allowed to do going forward. For example, should Modules be allowed to touch appspace directly? There are some fairly fundamental issues such as this that prohibit some current software working in a multi-core environment. I suppose if you start from the premise that apps/Modules need to be marked as multi-core then you’re starting with a clean slate and can redefine the rule set.

What are we envisaging running on each core? Apps or threads? Wimp-based apps certainly lend themselves to multi-core, as the communication protocol is already there to pass data around. Mind you, apps that have Module dependencies could be a potential issue.

Areas of the OS such as FileCore that aren’t reentrant could be bodged into a multi-core environment by queuing calls, until there’s the resource to modify/rewrite them. |
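(Jon’s queuing idea in miniature – a sketch only. filecore_call stands in for the real, non-reentrant entry point; the lock is the spinlock sketched a few posts up.)

#include <stdint.h>

typedef volatile uint32_t spinlock_t;
extern void spin_lock(spinlock_t *);           /* see the earlier sketch */
extern void spin_unlock(spinlock_t *);
extern int filecore_call(int op, void *args);  /* hypothetical single-core entry */

static spinlock_t filecore_lock;

/* Every core funnels through here; callers on other cores simply
   queue up on the lock until the current call completes. */
int filecore_call_serialised(int op, void *args)
{
    spin_lock(&filecore_lock);
    int r = filecore_call(op, args);
    spin_unlock(&filecore_lock);
    return r;
}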
Chris Hall (132) 3558 posts |
We’ll definitely be having an IRQ handler for each core – at the least, the additional cores will need something there to respond to the doorbell interrupts that the kernel would be using to communicate high-priority things like the cache maintenance operations.

If we can get to a point where RISC OS can use a program running on a second core, would it be possible to have a WiFi network stack working entirely on a second core, with everything else running on the main core? I’m not sure how the OS asks for access to stuff on the network.

Developing the idea a little further, is it possible for Linux to be running on one core, doing WiFi but nothing else, and for RISC OS to be running on the main core, just sending messages (or whatever) when it needs network stuff? The memory map could be split between the two – top half RISC OS, bottom half Linux, with a small overlap for comms. It’s a bit like having the PiFi connected to the network socket, but using just the one processor. Or is this too difficult? |
Jon Abbott (1421) 2651 posts |
I’m not sure that would be possible without virtualisation; they’d be contending for access to hardware. |
Jeffrey Lee (213) 6048 posts |
are both subject to the problem that code which wants to do something quick and atomic will typically disable IRQs/FIQs in order to stop the device interrupt from occurring in the middle. If you’re manipulating a single value which can be accessed by load/store exclusive instructions, then they’re the preferred way to go (although since Linux supports pretty much every architecture under the sun, I don’t know if it has a generic load/store exclusive API that’s guaranteed to be available everywhere). For larger values, complex data structures, etc., spinlocks are the next step up, which do indeed involve disabling interrupts for the local core. http://www.makelinux.net/ldd3/chp-5-sect-5
I think that to start with we should aim for a minimal set of functionality – enough to allow multicore to be used for useful things, but not so much that it will cause compatibility issues. So it will be some kind of thread- or task-based approach (not Wimp tasks, just a generic “run this code” task) that will allow programs to opt in to the system. Once we’ve got the basics sorted we can start thinking about how to handle more complex scenarios. I still think the multicore taskwindow idea is worth investigating, as it should be possible to implement it without introducing any compatibility issues. |
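(To make the load/store exclusive point above concrete: a minimal ARMv7 sketch of an atomic increment – no interrupt disabling on any core. The function name is invented.)

#include <stdint.h>

/* Atomically add n to *p; returns the new value. */
uint32_t atomic_add(volatile uint32_t *p, uint32_t n)
{
    uint32_t val, fail;
    do {
        __asm volatile(
            "ldrex  %0, [%2]      \n"  /* read *p, open the exclusive monitor */
            "add    %0, %0, %3    \n"
            "strex  %1, %0, [%2]  \n"  /* %1 = 0 only if nobody intervened */
            : "=&r" (val), "=&r" (fail)
            : "r" (p), "r" (n)
            : "cc", "memory");
    } while (fail);                    /* contention: retry */
    return val;
}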
Chris Hall (132) 3558 posts |
I’m not sure that would be possible without virtualisation; they’d be contending for access to hardware.

In my simple thought experiment, I was envisaging a cut-down Linux that could access no hardware apart from (i) an on-board WiFi adapter; (ii) a reserved portion of the installed memory; and (iii) (possibly) one of the on-board serial ports to provide a remote ‘teletype’ for debugging and info. It would respond to messages from the RISC OS running on another core, which would access all other hardware on the board (hence no contention, apart from a small region of overlapping memory used for data exchange and no other purpose). All WiFi transactions would be requested by RISC OS and handled by Linux (which would not, itself, be able to access the Internet). Functionally it would be like a PiFi (a Raspberry Pi running Linux, connected to a separate machine by a short Ethernet cable) but all on the one processor and circuit board. Is there any chance this would work, please? The purpose would be to make a RISC OS iPad-type machine feasible (where WiFi is indispensable and the machine probably does not have a wired Ethernet socket). |
Malcolm Hussain-Gambles (1596) 811 posts |
I appreciate this isn’t what you’re after, but it should be fairly easy to do another PiFi that boots into RISC OS using RPCEmu or the like? |
Jeffrey Lee (213) 6048 posts |
Yes, that would work. But as with most things software-related, it’s not a question of whether it would work, it’s a question of how much effort is required to make it work, and who is willing to undertake that effort :-) To make it work you’d probably need more Linux knowledge than RISC OS knowledge (custom kernel build, root FS, etc.). If you were to implement something like that, it might make sense to use a virtual ethernet device as the sole method of communication between the two OSes. Apart from the obvious use of accessing the wifi, you could also use the virtual ethernet to connect to an ssh/telnet server to control the system.
RPCEmu on ARM will be horribly slow due to the lack of a JIT. So if you’re offering to implement an ARM JIT, then I’d definitely be interested ;-) |
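(On the virtual ethernet suggestion above – a sketch of the sort of one-way frame ring that could live in the small overlap region, one ring per direction. The names, sizes and layout are all invented.)

#include <stdint.h>
#include <string.h>

#define RING_SLOTS 32
#define SLOT_SIZE  1536                  /* room for one ethernet frame */

typedef struct {
    volatile uint32_t head;              /* only the producer writes this */
    volatile uint32_t tail;              /* only the consumer writes this */
    struct { uint32_t len; uint8_t data[SLOT_SIZE]; } slot[RING_SLOTS];
} frame_ring_t;

/* Producer side; returns 0 if the ring is full. The consumer mirrors
   this: read slot[tail], then advance tail. */
int ring_send(frame_ring_t *r, const void *frame, uint32_t len)
{
    uint32_t h = r->head;
    if (len > SLOT_SIZE)
        return 0;                        /* oversized frame: reject */
    if (((h + 1) % RING_SLOTS) == r->tail)
        return 0;                        /* full: caller retries later */
    r->slot[h].len = len;
    memcpy(r->slot[h].data, frame, len);
    __asm volatile("dmb" ::: "memory");  /* frame visible before the index */
    r->head = (h + 1) % RING_SLOTS;
    return 1;
}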
Malcolm Hussain-Gambles (1596) 811 posts |
Sljit? Surely there are others too? |