Bounty proposal - Multicore/node support

32 posts, 7 voices


Feb 2, 2015 9:11pm David Feugey (2125) 2709 posts	I propose a bounty for AMP support under RISC OS. AMP stands for asymmetric multiprocessing. It’s a very easy way to support many cores under an OS. With an ARM processor, a few lines of code (provided by ARM) allow each core of the processor to be configured as a fully independent processor, with its own memory. The idea is: 1/ to enable and configure AMP mode in the kernel boot sequence; 2/ to have a small monitor that can load and boot specific code on the other cores; 3/ to set up tools to use the other cores. For point 3, we have two steps: 1/ A specific library to create C and ASM code that will run on the other cores and be able to communicate with RISC OS via shared memory. Easy to set up, but just the first step. Light threads, for limited SMP-like behaviour, could be provided here. 2/ A specific version of RISC OS, without any I/O. Just a RamDisc. But complete, with Wimp_in_a_window (to get multitasking facilities on the other cores). A CLI server with file transfer and message passing will communicate with the main session of RISC OS (which needs a client too). Plus, of course, some configure plug-ins for all of this. With step 2, another feature becomes possible: cluster mode. Add optional encryption to the ‘CLI server with file transfer and message passing’ and make it use sockets instead of shared memory, and you’ll have a good way to use RISC OS nodes as slaves. So any AMP application will also be a ‘clustered’ application. This tool could also be used to get all the power of a virtualized system with no native support for RISC OS. You use a type 1 hypervisor (basically Linux + KVM + Pi patches; some people are working on this), and you connect together as many sessions as there are cores (1 master, many slaves). Of course, I have some money to put into this project. Any ideas?
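To make the "communicate with RISC OS via shared memory" step more concrete, here is a minimal sketch of a one-way mailbox laid out in a flat block of bytes. It is only a simulation in Python: the `Mailbox` class, the 64-byte slot size and the header layout are assumptions made for illustration, not anything taken from ARM's sample code or from RISC OS. A real implementation between two cores would also need memory barriers and an agreed fixed address for the region.

```python
import struct


class Mailbox:
    """Single-producer, single-consumer ring of fixed-size slots.

    The bytearray stands in for a shared RAM region. Only the consumer
    writes `head` and only the producer writes `tail`, so no lock is
    needed; one slot is always left empty to tell "full" from "empty".
    """

    INDEX = struct.Struct("<I")   # one 32-bit little-endian index
    HEAD_AT = 0                   # offset of the consumer's index
    TAIL_AT = 4                   # offset of the producer's index
    DATA_AT = 8                   # first slot starts after both indexes
    SLOT_SIZE = 64                # hypothetical slot size in bytes

    def __init__(self, region, slots):
        if len(region) < self.DATA_AT + slots * self.SLOT_SIZE:
            raise ValueError("region too small for the requested slots")
        self.region = region
        self.slots = slots
        self._set(self.HEAD_AT, 0)
        self._set(self.TAIL_AT, 0)

    def _get(self, offset):
        return self.INDEX.unpack_from(self.region, offset)[0]

    def _set(self, offset, value):
        self.INDEX.pack_into(self.region, offset, value)

    def post(self, payload):
        """Producer side: copy one message in, return False if full."""
        if len(payload) > self.SLOT_SIZE - self.INDEX.size:
            raise ValueError("payload does not fit in one slot")
        head, tail = self._get(self.HEAD_AT), self._get(self.TAIL_AT)
        if (tail + 1) % self.slots == head:
            return False
        start = self.DATA_AT + tail * self.SLOT_SIZE
        self.INDEX.pack_into(self.region, start, len(payload))
        body = start + self.INDEX.size
        self.region[body:body + len(payload)] = payload
        self._set(self.TAIL_AT, (tail + 1) % self.slots)
        return True

    def take(self):
        """Consumer side: return the oldest message, or None if empty."""
        head, tail = self._get(self.HEAD_AT), self._get(self.TAIL_AT)
        if head == tail:
            return None
        start = self.DATA_AT + head * self.SLOT_SIZE
        length = self.INDEX.unpack_from(self.region, start)[0]
        body = start + self.INDEX.size
        payload = bytes(self.region[body:body + length])
        self._set(self.HEAD_AT, (head + 1) % self.slots)
        return payload
```

Two such mailboxes, one in each direction, would be enough for the monitor to receive "load and run this" requests and to send results back to the main session.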

Feb 3, 2015 7:45am Rick Murray (539) 13840 posts	So basically a Tube-like mechanism for the other cores…?

Feb 3, 2015 9:44am David Feugey (2125) 2709 posts	Yes. Examples of uses. In basic mode (1 application for the whole core): massive calculation/simulations; OpenGL accelerator; crypto/compress engine; video decoding; accelerator for emulators (Archiemu, DosBox, etc.); generic light threads. With a RISC OS kernel: testing new versions of RISC OS; secured RISC OS environment (for debugging or security tests); cluster in a SoC. With a Linux kernel (on ARM, AMP is very close to hardware virtualization): access to all Linux applications. Of course, you could mix the uses. For example, one core for RISC PC emulation (at faster than ARM6 speed), one core for a RISC OS developer environment (completely crash proof, since it’s independent from the main session), one core for OpenGL, etc. Free ARM code for AMP/SMP implementation is here: http://www.arm.com/products/processors/cortex-a/arm-mpcore-sample-code.php ARM say that a single-core OS will work completely unmodified in AMP mode. The only problem is sharing of I/O (so only one session of RISC OS should access the peripherals). This solution was chosen by the Amiga OS community with great success (they opted for some light threads libraries, but use one AmigaOS kernel per core).

Feb 3, 2015 9:48am David Feugey (2125) 2709 posts	Basically, all the code needed to set up an AMP mode and use the second core from the first one (with PMT multitasking) is provided by ARM. Having a version of RISC OS adapted to the second core would be a plus (more features, more code ready to use). This project could apply to Raspberry Pi 2, Compute Module 2, Pandaboard (ES), BeagleBoard (XM + OpenPandora), i.MX6 boards, OMAP5 boards. The approach suggested here will make it possible to use the same tools for virtualized sessions of RISC OS (under Linux… work in progress) and clusters of RISC OS computers. And perhaps even for the Hydra system :) And of course, we will be able to say: RISC OS is multicore compatible with a clustering approach (and so ready out of the box for clusters).

Feb 3, 2015 11:22am Jess Hampshire (158) 865 posts	So a situation similar to the Risc PC running Windows on the second processor would be possible? So it would be possible to have Android apps running on another core, with RISC OS providing all the I/O? I suspect that Windows 10 would be too locked down to allow Windows in a window on the RO desktop, again.

Feb 3, 2015 11:51am David Feugey (2125) 2709 posts	“So a situation similar to the Risc PC running Windows on the second processor would be possible?” Yes, with tools to make the two talk to each other at high speed. “So it would be possible to have Android apps running on another core, with RISC OS providing all the I/O?” Why not. With a kernel almost the same as the one we would use in paravirtualized mode. The RISC OS adaptation will be simpler. Basically we must ask the guest HAL to talk to the host HAL of the main session of RISC OS.

Feb 4, 2015 4:11pm Jess Hampshire (158) 865 posts	I could see such an arrangement with an alien OS on the other processor as being a huge advantage to a regular user. Android apps could appear on the RO desktop like widgets (or whatever) and could allow the machine to be used as a sole system. (Although the irony would be that the desktop programs would use a fraction of the resources of a widget.) Linux terminal programs would work really easily, and presumably X programs could be made to behave almost as normal RO programs with a suitable X server. The drawback I see would be that RISC OS is limited by its poor I/O system. I think this would hobble many complex Linux programs. Apart from people writing their own systems for their own use, I can’t see where RISC OS on RISC OS would be of benefit. Programs would need to be rewritten to make use of it, but it would still be subject to the limitations of RISC OS. Would it not be better either to enhance RISC OS to provide alternative APIs without the limitations, so when programs are rewritten, they use these? Or to use a hypervisor on a different OS (removing the file I/O issues) and have a primary RO guest for the system that provides the display for the whole system?

Feb 4, 2015 5:54pm David Feugey (2125) 2709 posts	Using another OS is only one part of what is possible with an AMP system. That’s why I also proposed a ‘monitor’ that can launch tasks on other cores. Then it will be possible to provide a light threads library. With this lib, any RISC OS software will be able to use the power of all cores. The only limit is that we are talking about light threads here, so they cannot access the I/O or the system SWIs. That’s why getting RISC OS on the other cores is also a good idea. You will be able to access all RISC OS features from light threads, but still with limited I/O (since it’s not the ‘main’ RISC OS session). Good point: if I write a WIMP application that can do calculations and work in cluster mode, I just have to replicate it on each RISC OS core to use all the CPU power (then on other nodes to get even more power). No need to use threads. Summary: it’s light threads (close to SMP), or a full OS (close to clustering), or a mix of the two (by using an I/O-less version of RISC OS as the monitor for the other cores, with tools for FTP + CLI access [for cluster-like applications] and shared memory + message passing [for the light threads approach]).

Feb 4, 2015 6:00pm David Feugey (2125) 2709 posts	With this two-in-one solution, developers will have the choice: 1/ to use threads to get the power of all cores; 2/ to use a cluster approach to use the power of all cores, on one or many RISC OS computers. Of course, it’s also possible to use one core to run a specific OS (Android, Linux, etc.). Note that this is not possible in SMP mode, as patents cover the reservation of a core. Note also that ARM provides the code to set up an AMP system and the monitor to use the other cores. So the first part of this bounty could be easy to achieve for a kernel developer. The second part (making a version of RISC OS for the other cores + tools) will be harder. Ha, and there is no need for SMP to set up an AMP system.

Feb 4, 2015 6:20pm Jess Hampshire (158) 865 posts	The light threads would be for new software? Would this be a subset of what a full multicore system would provide? i.e. so any software would still work on a full future SMP system.

Feb 4, 2015 6:45pm David Feugey (2125) 2709 posts	Light threads would be for new software, yes, as threading does not exist on RISC OS :) Software that uses threads could work unchanged (but perhaps with a recompilation) under an SMP version of RISC OS. But to be honest, making an SMP version of RISC OS is not possible today, as it would break most existing applications. Anyway, the difference between the two models is not so big. With AMP, you must offload calculations to threads to use the other cores. On the other hand, you can use a cluster approach instead of threading. Then a multicore application and a multicomputer application will be exactly the same. On an SMP system, heavy applications will run on different cores automatically. But for a single application to use several cores, you must use threads too. And that is not suited to clusters, so you must adapt your application to both multicore and multinode modes. So, in a way, AMP is more future proof than SMP, and SMP is more convenient than AMP, as threading is not even necessary. The point is that AMP is possible today on RISC OS. SMP is not :) Hope this helps. PS: technically, all cores can access all resources in an SMP system. With AMP, you have a master and some slaves, so it’s more limited, but there is no need to share resources, and so no need to rewrite all of RISC OS. PS2: and it’s not even so limited. With sharing of resources, you run into ‘big lock’ problems and lose a lot of CPU time. Two bosses are sometimes less efficient than only one.

Feb 4, 2015 7:18pm David Feugey (2125) 2709 posts	An example of use of the cluster mode. You have an MP3 encoding application, but you don’t want it to slow down the Wimp. So you set up a connection to the second core. You send a slave application to it, and launch it from the ‘remote’ CLI. When the work is done, the application sends you back a Wimp message. For convenience, the application is a CLI one that works inside a taskwindow under the second RISC OS session. So you can launch other tasks on the second core (with PMT multitasking, provided by taskwindows). You don’t really need two applications here. You just need to separate the work in two: a front-end on the main core, and an encoder for any core (main or other), or – if you need to encode many files at once for a web service – encoders on many cores and many nodes. Once the work is done for one core, it’s (almost) OK for really big clusters. It’s a very interesting approach, especially if you don’t have access to threads (or don’t want to use them). You could for example use it from and with Basic code. Communication tools could be accessed via SWI or *commands (CLI + file transfer + message sending / receiving). I did that a long time ago with the PC card, used to compress files and music under my RPC600. It was really a cluster model. Some other developers used an integrated model, for example an FPEmulator that worked on the RISC PC card. That is closer to threading than clustering, and could also be done on an AMP system (for an arbitrary precision calc library, for example, or OpenGL acceleration). I think the three ways are good: classic applications in cluster mode (one core = one computer); applications that use one full core (OpenGL, calc, multimedia, ARM3 emulation, etc.); applications that use threading to access other cores (SMP-like, but not as complete). I like this, because it’s choice. We don’t need SMP, we need all. We don’t need PMT, we need single tasking + CMT + PMT :)
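The front-end/encoder split described above can be sketched as follows. This is a toy model in Python, not RISC OS code: `Endpoint`, `dispatch` and the `encode ...` command strings are hypothetical names chosen for illustration. The point it tries to show is that the front-end does not care whether an endpoint is another core (reached over shared memory) or another machine (reached over a socket).

```python
from dataclasses import dataclass, field


@dataclass
class Endpoint:
    """A place where a slave application can be launched.

    `kind` is "core" for a local core or "node" for a remote machine.
    It only labels the transport; the interface is the same for both.
    """
    name: str
    kind: str
    jobs: list = field(default_factory=list)

    def run(self, command):
        # Stand-in for "send the command to the remote CLI and wait
        # for the completion message".
        self.jobs.append(command)
        return f"{self.name}: done {command}"


def dispatch(commands, endpoints):
    """Hand out commands round-robin and collect the replies in order."""
    if not endpoints:
        raise ValueError("at least one endpoint is required")
    replies = []
    for index, command in enumerate(commands):
        endpoint = endpoints[index % len(endpoints)]
        replies.append(endpoint.run(command))
    return replies


# One local core and one remote node share the same front-end code.
endpoints = [Endpoint("core1", "core"), Endpoint("pi-b", "node")]
replies = dispatch(["encode a", "encode b", "encode c"], endpoints)
```

With a single endpoint in the list the same front-end still works, which matches the idea that one application serves one core or a whole cluster.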

Feb 4, 2015 7:24pm Rick Murray (539) 13840 posts	What would be nice for a start (and probably not too hard to implement) is a mini version of RISC OS that kicks “unknown SWIs” and I/O over to the proper version of RISC OS on core zero. In its initial state, practically no SWIs will be recognised (so all would be passed to real RISC OS) but in time this can be built upon. Yes – PMT, AMP, SMP, and all the other acronyms would be nice to have (and I reckon this will wake up multi-core work on RISC OS now that such things are within the reach of everybody), but it will take time to agree on a definite way forward never mind actually turning that into code. For the shorter term, a Tube-like approach should be considered as it will give some of us practical experience with pushing work off to other cores plus allow experimentation in a way that will be sort of supported by RISC OS (even if replaced later on) which is surely better than different people implementing different variations of the same concept. I wonder if such a thing could be implemented as a module on the host (with minimal changes to the core OS itself)? That may help keep it as an “optional thing”? Personally, although I may find this difficult as Broadcom has never been great with documentation (it isn’t TI like); I would love to push off IIC processing to another core so I can finally make my MP3 player and it doesn’t matter how long dopey IIC disables interrupts for, the sound (playing on the main core) will carry on. Of course, interrupts disabled for long durations is a(n annoying) quirk of RISC OS itself and anyway a four-core ~900MHz processor with a GiB RAM, networking, and four USB ports is maybe getting into overkill for a simple MP3 player with OLED display. ;-) [that ought to be Model A territory, no?]
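A rough sketch of that "kick unknown SWIs over to core zero" idea, written as a Python simulation. `MiniKernel`, `claim` and the forwarding callable are made-up names for illustration; only the SWI names themselves (OS_ReadMonotonicTime, OS_File) are real. The mini kernel starts with an empty table, so everything is forwarded, and handlers can be claimed locally over time.

```python
class MiniKernel:
    """Minimal SWI dispatcher for a secondary core.

    SWIs with a local handler run on this core. Anything else is
    passed to `forward_to_core0`, a callable standing in for whatever
    mechanism (mailbox, interrupt) reaches the real RISC OS.
    """

    def __init__(self, forward_to_core0):
        self.local_handlers = {}
        self.forward_to_core0 = forward_to_core0
        self.forwarded = []          # record of SWIs sent to core 0

    def claim(self, swi_name, handler):
        """Start handling `swi_name` locally from now on."""
        self.local_handlers[swi_name] = handler

    def swi(self, swi_name, *args):
        handler = self.local_handlers.get(swi_name)
        if handler is not None:
            return handler(*args)
        self.forwarded.append(swi_name)
        return self.forward_to_core0(swi_name, args)


def fake_core0(swi_name, args):
    # Pretend core 0 executed the call and returned a result.
    return ("core0", swi_name, args)


kernel = MiniKernel(fake_core0)
first = kernel.swi("OS_ReadMonotonicTime")      # forwarded: nothing is local yet
kernel.claim("OS_ReadMonotonicTime", lambda: 1234)
second = kernel.swi("OS_ReadMonotonicTime")     # now answered locally
third = kernel.swi("OS_File", 5, "RAM::0.$.x")  # still forwarded
```

Software running on the secondary core only ever calls `swi`, so it cannot tell which side answered.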

Feb 4, 2015 7:25pm Rick Murray (539) 13840 posts	“I did that a long time ago with the PC card, used to compress files and music under my RPC600. It was really a cluster model.” You have code to use the x86 co-processor to run tasks controlled by the RISC OS side? There are people who might be quite interested in that, given that Diva/!PC is not exactly the simplest thing to figure out…

Feb 4, 2015 7:26pm Jess Hampshire (158) 865 posts	“(smp) – as it’ll break most existing applications” Would it only break them if it tried to enforce multi processor on them? If they all ran on one core and could stall the system when they wanted to access anything, would they not run fine? Obviously you would only seriously benefit from the SMP with new apps, and running old apps would reduce the performance of those apps. (But better that, than losing access to programs).

Feb 4, 2015 7:49pm David Feugey (2125) 2709 posts	“You have code to use the x86 co-processor to run tasks controlled by the RISC OS side?” Yes, they were available from a third party developer (remember Armedit?). “I wonder if such a thing could be implemented as a module on the host (with minimal changes to the core OS itself)? That may help keep it as an ‘optional thing’?” For the main core, AMP is transparent. You launch the second core with its own memory space. So you just need not to write in its memory space. Of course the second core does nothing without its own kernel (which could be a version of RISC OS, or something simpler, such as the monitor provided by ARM). “That may help keep it as an ‘optional thing’?” Yes, especially if ROOL don’t want to make a bounty around this :) The problem is that ARM says that you must configure the other cores at boot. I’m not sure it’s really true, but it implies modifications in the RISC OS kernel (detect the other cores, configure them and put some code in place to make them wait for future instructions). This could be the first part of the bounty, as developers will be able to do a lot of fun things with these modifications (and a few examples). “IIC processing to another core” It should be possible. Every core has full access to peripherals in an AMP system. So you can access GPIO from core 1… if you don’t do the same from core 0 (else: crash). Anyway, it’s better to use only one core for most/all hardware access. Or not :) For (other) fun uses: a GraphicsV driver that would write the data of screen 2 to a specific part of the memory. The second core would compress it in realtime, then RISC OS would just have to send the result over the network. Almost no CPU time used on core 0. The same could be done for an encryption system for Filecore. Or a realtime Midi synth, linked to the Midi module (I would love this one).
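The "detect the other cores, configure them and make them wait for future instructions" step could look roughly like this. It is a Python simulation of the idea only: `ParkedCore`, `boot` and `release` are hypothetical names, and a real kernel would do this in ARM assembler with a per-core release address and a wait-for-event loop rather than Python objects.

```python
class ParkedCore:
    """A secondary core waiting for an entry point from core 0."""

    def __init__(self, core_id):
        self.core_id = core_id
        self.entry = None    # stand-in for a per-core release slot
        self.results = []

    def poll(self):
        """One pass of the wait loop.

        Returns False while nothing has been posted. Once an entry
        point appears, clears the slot, runs it and returns True.
        """
        if self.entry is None:
            return False
        entry, self.entry = self.entry, None
        self.results.append(entry(self.core_id))
        return True


def boot(core_count):
    """Core 0 keeps running the OS; every other core is parked."""
    return [ParkedCore(core_id) for core_id in range(1, core_count)]


def release(core, entry):
    """Core 0 posts code for a parked core to run on its next poll."""
    core.entry = entry


cores = boot(4)                        # cores 1, 2 and 3 are parked
release(cores[0], lambda cid: f"monitor started on core {cid}")
ran = [core.poll() for core in cores]  # only core 1 has work
```

Once a parked core has been released into a monitor, later requests would go through a shared-memory channel instead of this boot-time slot.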

Feb 4, 2015 8:20pm Steve Pampling (1551) 8170 posts	“Yes, they were available from a third party developer” Sounds like the Win Risc package from Armed Forces Software

Feb 4, 2015 8:29pm David Feugey (2125) 2709 posts	No. Armedit. WinRisc aim was to provide Windows applications directly on the RISC OS desktop. And FPEPC to use the FPU of the PC Card :) Perhaps not very efficient, but very fun. I hope that AMP will be as fun, as it’s a no_limit/do_what_you_want solution.

Feb 4, 2015 9:54pm Rick Murray (539) 13840 posts	“No. Armedit.” Ah. I’ve used that. Cool, but requires the whole x86 enviro to be set up. “WinRisc aim was to provide Windows applications directly on the RISC OS desktop.” Nice idea, but unbelievably slow. I think too much data was being squeezed through too small a hole.

Feb 9, 2015 6:53pm Eric Rucker (325) 232 posts	It’s also worth noting that the “lightweight OS on each core that punts most SWIs back to the main CPU” approach has already been done on RISC OS, too: http://www.simtec.co.uk/products/AUHYDRA/files/api.txt However, as I understand, Hydra was incapable of running Hydra threads on the main CPU. I’d argue that this is critical to adoption by developers, so you can run the multiprocessor support library on single-core systems (and then, multiprocessor-capable programs would have more of their code running in a PMT environment even on a single-core machine – CMT code could still bring the machine to a halt, but there’d be less of it).

Feb 9, 2015 7:24pm David Feugey (2125) 2709 posts	“It’s also worth noting that the ‘lightweight OS on each core that punts most SWIs back to the main CPU’ approach has already been done on RISC OS, too: http://www.simtec.co.uk/products/AUHYDRA/files/api.txt” It is not a coincidence :) “However, as I understand, Hydra was incapable of running Hydra threads on the main CPU. I’d argue that this is critical to adoption by developers, so you can run the multiprocessor support library on single-core systems (and then, multiprocessor-capable programs would have more of their code running in a PMT environment even on a single-core machine – CMT code could still bring the machine to a halt, but there’d be less of it).” True. That’s why I also suggest a kind of Wimp2 remake, to give access to a PMT scheduler in the CMT environment (1), to make a more responsive desktop possible (2)… and to get light threads on the main core (3). See here: https://www.riscosopen.org/forum/forums/8/topics/3080

Feb 9, 2015 9:19pm Rick Murray (539) 13840 posts	“It’s also worth noting that the ‘lightweight OS on each core that punts most SWIs back to the main CPU’ approach has already been done on RISC OS” The difference being, once the system is running (not early days, but when it is more developed) the software being hosted on the mini-RISC OS won’t know if it is a mini-RISC OS or the real thing. Apart from probing specifically for it, both will look alike to the standard API. Well, that’s the idea… “However, as I understand, Hydra was incapable of running Hydra threads on the main CPU.” Uh… Yeah… What? Part of the attraction of Tube was that you could run special Tube aware and Tube only software on the co-processor… but you could also run legally written normal software on the co-processor. “I’d argue that this is critical to adoption by developers,” I completely agree. If something wants to detect that there are n cores and go whee-hee! then that is fine, however “standard” software using the environment should be able to downgrade to running on a single core (with the performance hit as a given). Not just for developing, but also to aid in reducing the potential marginalisation of requiring this or that in order to have the program work at all. Did you ever use PCs in the mid-late ‘90s? If so, games often said that they needed this graphics card or that graphics card, and if you had something else, you’d be lucky to make it to a menu. I think this was part of the reason for the birth of DirectX. At any rate, the more systems that the plan can be capable of supporting, the more likely people will be to actually use it.
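The "downgrade to running on a single core" requirement can be sketched like this. It is a Python illustration under stated assumptions: `run_tasks` is a hypothetical library call, and Python worker threads merely stand in for extra cores (they are not a model of how a RISC OS library would schedule work). The caller gets the same results, in the same order, whether one core or several are available.

```python
from concurrent.futures import ThreadPoolExecutor


def run_tasks(func, items, cores):
    """Apply `func` to every item, using extra workers when available.

    With one core (or fewer) the work simply runs inline on the caller,
    so programs written against this call still work on a single-core
    machine, just more slowly.
    """
    items = list(items)
    if cores <= 1:
        return [func(item) for item in items]
    with ThreadPoolExecutor(max_workers=cores) as pool:
        # map() yields results in input order, whatever order the
        # workers finish in.
        return list(pool.map(func, items))


def square(value):
    return value * value


single = run_tasks(square, range(5), cores=1)
multi = run_tasks(square, range(5), cores=4)
```

A program that wants to detect n cores and use them all can still ask for the count, but it never has to.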

Feb 10, 2015 8:58am Eric Rucker (325) 232 posts	Hydra code only ran on Hydra processors, normal software only ran on the main processor, as I understand. So, similar to the Tube-only situation except legally-written normal software didn’t run there either (which, there was nothing to take advantage of by having normal software run on other CPUs, all the CPUs were identical in a Hydra). And, I thankfully missed the worst of that era, not having powerful enough hardware to actually get into that era’s current software. Everything I played just required MCGA and a Sound Blaster compatible. In any case, I do think it may be counterproductive to have multicore support and PMT support being discussed separately – really, it’s worth looking at the evolution of Mac OS here. DayStar developed the original Mac OS Multiprocessing Services, used until Mac OS 8.6, for their own Mac clones (back when Apple was licensing the OS to cloners), and Apple actually based their MP Macs on DayStar’s tech. In any case, Multiprocessing Services simply ran inside the CMT environment, and provided PMT threads (including on a single-CPU machine) and MP support for those threads. All PMT threads had to have a stub running inside the CMT environment, that non-thread-safe calls were made through, as I understand. Later, in Mac OS 8.6, Apple revamped the OS, so that there was a “nanokernel” running underneath everything. The CMT environment was just a highly privileged process (the only one allowed to make most calls) running alongside everything else (so your PMT threads kept going even if the CMT environment died, at least until they needed to call through it), and more of the OS was pushed out into PMT.

Feb 10, 2015 10:21am David Feugey (2125) 2709 posts	“In any case, I do think it may be counterproductive to have multicore support and PMT support being discussed separately” Not really. With an AMP system, you can use PMT to get threads, or you can use a whole OS on the second core (for example a CMT Wimp), or even dedicate one core completely to a specific task. See my other bounty proposal: https://www.riscosopen.org/forum/forums/8/topics/3080 I separate the two because, even if linked, these two projects can be done separately. The PMT part of the AMP project just needs to use the same API as this PMT scheduler for RISC OS. Or not. This is just one option. Dedicating a core to a specific task will probably be the first use of an AMP system. And it’s very interesting. SMP systems don’t offer this, not because it’s not useful, but because they don’t have the right to do this (patents). OS/2 did (I believe).

Feb 17, 2015 9:26am h0bby1 (2567) 480 posts	aaaaa