Thinking ahead: Supporting multicore CPUs
David Feugey (2125) 2709 posts |
Note: for an AMP system, there is no need to change RISC OS. Each core is completely independent. You ‘just’ need to set up the AMP environment at boot (i.e. to reserve part of memory for the other cores), or perhaps even after boot. Then everything can be done from a user space application or module (on the main core).
In fact, it would even work with single-tasking kernels. You could for example use one core to drive GPIO pins, and only for this. AMP is quite simple: each core has its own memory, and doesn’t know that there are other resources available. Each core can access all I/O, so you must decide which core does what. With an OS without I/O, you can set up a working AMP system just by launching one session of the OS on each core. ARM provides the tools to do this (only a few lines of code). IMHO, the first step is to have tools to set up the AMP environment, then to send code to other cores and share data. Some people will use this to provide light threads for their own use, to run another session of RISC OS, etc. Free choice. |
Jeffrey Lee (213) 6048 posts |
Nope. (Note – all the OS source is in CVS. SVN is only used for the website source)
Sounds good!
Maintaining an acceptable level of backwards compatibility is certainly going to be the tricky part.
Yes, I think that’s a good way to go. Perhaps even the only way to go – there simply aren’t enough developers available to implement a change of this magnitude in a reasonable timeframe. We need to be able to do it bit-by-bit so that we can continue to work on other aspects of the OS in parallel.
“I could, for example, try and write something using LDREX and STREX (or perhaps other opcodes depending on which RISC OS CPU target was being built, i.e. SWP for legacy) – which I think are intended for semaphore/lock construction.”
I’m an experienced ARM coder and I’m hesitant to try implementing thread synchronisation primitives from scratch. I’m not sure how I feel about an inexperienced (unexperienced?) coder attempting the same :-)
In the past we’ve talked about the idea of porting RISC OS to run on top of a microkernel which provides the threading functionality. However a blocker to that has always been being able to find a suitable microkernel to use (free, open source, supports everything ARMv3 and above). But one approach that I don’t think we’ve discussed much or at all is to take the thread scheduler & thread synchronisation primitives from BSD. That way we’ll get code which is (hopefully) robust and efficient, and which will provide us with good compatibility with other BSD code. Although many people seem to be focusing on the idea of multi-core/multi-threading as a way to get PMT, I’m more interested in it being used to help to implement OS-level components like the network stack, USB stack, and filesystem stack. Also, our ageing USB stack and geriatric network stack both came from BSD, so if we can get a broadly BSD-friendly threading implementation then it will make it a hell of a lot easier to update them to newer versions.
Alternatively we could build on top of the code we’ve already got, e.g. SyncLib, which is an implementation of some basic synchronisation primitives using LDREX/STREX.
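To make the LDREX/STREX idea concrete, here is a minimal sketch of a spinlock written with C11 atomics rather than hand-written assembler. This is not SyncLib's API: the names `spinlock_t`, `spin_trylock`, `spin_lock` and `spin_unlock` are made up for illustration. The assumption is that on an ARMv7 target the compiler will typically lower the test-and-set to an LDREX/STREX loop, while a legacy target would need a different implementation (e.g. SWP).

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical lock type for illustration; not taken from SyncLib. */
typedef struct {
    atomic_flag held;
} spinlock_t;

#define SPINLOCK_INIT { ATOMIC_FLAG_INIT }

/* Try once to take the lock. test_and_set returns the previous value,
 * so "false" means the lock was free and is now ours. */
static inline bool spin_trylock(spinlock_t *l)
{
    return !atomic_flag_test_and_set_explicit(&l->held, memory_order_acquire);
}

/* Busy-wait until the lock is acquired. A real kernel would probably
 * want to back off or sleep the core here instead of spinning flat out. */
static inline void spin_lock(spinlock_t *l)
{
    while (!spin_trylock(l)) {
        /* spin */
    }
}

/* Release the lock; the release ordering publishes our writes
 * to whichever core takes the lock next. */
static inline void spin_unlock(spinlock_t *l)
{
    atomic_flag_clear_explicit(&l->held, memory_order_release);
}
```

Leaving the instruction selection to the compiler means the same source can in principle be built for different CPU targets, though whether that suits the RISC OS build is an open question.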
Cooperation is the tricky part. With RISC OS, almost everything is global state, even the supposedly private address space of applications. So one of the barriers to getting PMT or concurrent CMT applications is going to be to make as much state as possible task-local (once you’ve actually taught the kernel about what a task/process is!), and to make sure any attempts to access global state behave in a suitable manner (e.g. global mutex lock).
For ‘threads implemented as tasks’ see DThreads. We also have RTSupport, where all the threads operate at a priority level just below that of interrupt handlers, but with the drawback of them having limited interaction with the rest of the OS.
“I wonder whether resources / operating system features that suffer contention in a multi-core/multi-process environment could be single threaded and service a job request FIFO queue that is polled cooperatively, rather than needing to be isolated with a new interface put on them, i.e. instead of changing interfaces, create an adaptor which the existing code calls. That would avoid that code being changed or made aware that the operating system feature or resource is no longer local to the core it is running on.”
The big problem with that approach would be memory accesses – input/output buffers for SWIs could be anywhere. They could be in the private address space of a task, or they could be on a global system stack (which would presumably have to be core or thread local in a multi-threaded version of the OS), or they could be in true global memory like the RMA. So rather than queueing up the requests and forwarding them all to one core, I think it would make more sense to queue up the requests and then service them on the core(/thread) they were issued from. That way you won’t have to worry about the memory map being different. |
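One way to picture "queue the requests, but service them on the issuing core" is a ticket queue: each caller takes a ticket, waits its turn in FIFO order, and then runs the service routine itself, so any buffer pointers are interpreted in the caller's own memory map. The sketch below is only an illustration under that assumption; `ticket_queue_t`, `queue_service` and `example_increment` are hypothetical names, not existing RISC OS interfaces.

```c
#include <stdatomic.h>

/* Hypothetical FIFO ticket queue guarding one contended OS resource. */
typedef struct {
    atomic_uint next_ticket;  /* next ticket to hand out */
    atomic_uint now_serving;  /* ticket currently allowed to run */
} ticket_queue_t;

typedef void (*service_fn)(void *buffer);

/* Queue a request and service it on the calling core once it reaches
 * the head of the queue. The wait loop is where a cooperative
 * implementation would poll or yield instead of spinning. */
static void queue_service(ticket_queue_t *q, service_fn fn, void *buffer)
{
    unsigned my = atomic_fetch_add_explicit(&q->next_ticket, 1,
                                            memory_order_relaxed);
    while (atomic_load_explicit(&q->now_serving,
                                memory_order_acquire) != my) {
        /* wait for our turn */
    }
    fn(buffer);  /* runs on the issuing core, so 'buffer' is valid here */
    atomic_fetch_add_explicit(&q->now_serving, 1, memory_order_release);
}

/* Example service routine (illustrative only): bump an int in the
 * caller's buffer. */
static void example_increment(void *buffer)
{
    *(int *)buffer += 1;
}
```

The existing code would call the adaptor (`queue_service` here) instead of the resource directly, which keeps the interface change small.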
Anthony Vaughan Bartram (2454) 458 posts |
Hi Jeffrey, Regarding experience: my RISC OS and ARM experience is low (except for ARM as a target from C/C++ on WinCE). But I’ve got ~19 years in the industry & 11 as a hobbyist. Various languages (lots of C/C++), but my assembler has been limited to 6502/68k and x86 (oddly, a game I wrote in 6502 is being distributed on Turbo MMC drives for BBC Micros (it’s not a particularly good game, mind you…)). Professionally I do a certain amount of reverse engineering and am enjoying looking at the RISC OS sources so far. Should we identify a task list to progress this work? Identified tasks/goals/objectives
No idea about estimates. I think I need a greater understanding of the code structure of RISC OS. I might start sketching on paper to understand the code. If a good enough task breakdown is identified, the tasks could be opened to a wider audience and sub-tasks could be taken by whoever was interested, i.e. a RAD approach or SCRUM approach (but I’m not sure how long the sprints would be (might be too slow to call a sprint)). Then at the end of each sprint (or slow amble) builds could be delivered that evolve the code toward multi-core with newer stacks (less geriatric). By the way – David: have you received the updated beta 2.1 of the game? (Sorry, I know this is the wrong thread…) |
David Feugey (2125) 2709 posts |
Yes :) Sorry. I’m late on everything this week… |
Jeffrey Lee (213) 6048 posts |
Sounds reasonable! My estimation skills aren’t that great either, but I can at least write a list ;-) Irrespective of whether we’re aiming for AMP or SMP, I think the following will be required in order for us to get a good foundation to build upon:
Once all that’s done we can then move on to more advanced things, e.g. threading & thread synchronisation on the slave core (and preferably the master ;-)), support for more cores, support for running multiple tasks/jobs at once, better integration with the master core (to allow ‘real code’ to start offloading tasks and see performance gains), etc. |
Dave Higton (1515) 3534 posts |
I’m wondering if another useful step would be to get a codec running on another core. It seems to me that this would require less infrastructure than even a basic kernel, and might be a useful speed-up in its own right, being a purely software task. |
Jeffrey Lee (213) 6048 posts |
What would the codec do if there’s no kernel to allow it to communicate with the outside world? How would you confirm that it’s even working? :-) In my eyes the first code we’d have running would just be something which spews characters over the serial port, or toggles an LED. Then add a bit more structure (e.g. echo characters received, set up basic processor vectors which will spit out a register dump if there’s a crash), then add the communications with the master core (doesn’t necessarily have to be reliant on interrupts – just polling I/O will do to begin with). Once that’s done you can then start building the extra framework needed to run ‘real’ code, e.g. whatever’s necessary to ensure cache/MMU coherency when code or data is being passed around, or the ‘OS’ level functionality like starting an arbitrary job running. Then interrupts, SWIs, and all the other stuff necessary to create a real kernel. |
Dave Higton (1515) 3534 posts |
I’m assuming inter-processor communication via shared memory, which I wouldn’t class as being a kernel. The shared memory could be set up by the master processor. Re. your first code: yes, of course. |
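As a rough illustration of polled inter-processor communication via shared memory, here is a single-producer/single-consumer mailbox sketched with C11 atomics. Everything in it (`mailbox_t`, `mbox_send`, `mbox_poll`, the slot count) is hypothetical. It assumes the master core places the structure in memory that both cores see coherently; on real hardware that means getting the cache/MMU attributes right, which this sketch does not address.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

#define MBOX_SLOTS 8u  /* must be a power of two */

/* Hypothetical mailbox living in memory shared by two cores.
 * Only the sender writes 'head'; only the receiver writes 'tail'. */
typedef struct {
    atomic_uint head;
    atomic_uint tail;
    uint32_t    slot[MBOX_SLOTS];
} mailbox_t;

/* Sender side: returns false if the mailbox is full. */
static bool mbox_send(mailbox_t *m, uint32_t word)
{
    unsigned head = atomic_load_explicit(&m->head, memory_order_relaxed);
    unsigned tail = atomic_load_explicit(&m->tail, memory_order_acquire);
    if (head - tail == MBOX_SLOTS)
        return false;
    m->slot[head % MBOX_SLOTS] = word;
    /* Release: the slot contents become visible before the new head. */
    atomic_store_explicit(&m->head, head + 1, memory_order_release);
    return true;
}

/* Receiver side: polled, no interrupts. Returns false if empty. */
static bool mbox_poll(mailbox_t *m, uint32_t *out)
{
    unsigned tail = atomic_load_explicit(&m->tail, memory_order_relaxed);
    unsigned head = atomic_load_explicit(&m->head, memory_order_acquire);
    if (head == tail)
        return false;
    *out = m->slot[tail % MBOX_SLOTS];
    atomic_store_explicit(&m->tail, tail + 1, memory_order_release);
    return true;
}
```

Two of these, one per direction, would give a master core and a slave core a simple polled command/response channel without any kernel on the slave side.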
Anthony Vaughan Bartram (2454) 458 posts |
Hi Jeffrey, “* Implement inter-core communications: Investigate different approaches (hardware FIFOs, shared memory, etc.)” This is because it’s, initially at least, a general investigation task, and whatever I come to understand is generally useful and is perhaps less immediately dependent on prior RISC OS knowledge. Regarding task tracking, I was thinking about the free-form discussion nature of these forums & the nature of the task tracking that I’ve used in different companies at work. In order to progress this and use a RAD-style development approach, perhaps the task list would need to be tracked in some form, rather than using a forum where it might disappear over time within other discussions. Is there currently a task tracking method in RISC OS Open for collaborative development of this type? If not, simple task tracking in a spreadsheet or text file could be sufficient, with names against it to track task owners. Given the volunteer nature of RISC OS, it is probably difficult to pin down dates for deliverables etc. Therefore, excluding target dates, the text file could have at a minimum the following information: Where task Status could be – not started / in analysis / in design / in code / in test / in review. Please let me know if this seems like a reasonable way forward, or if there is another approach that should be taken. Cheers, Tony. |
Jeffrey Lee (213) 6048 posts |
“I know that I have a steep learning curve to climb. However, perhaps what I usefully could try is:”
Yeah, that’s a good thing to start off with.
Not really. About the best thing we’ve got available is the wiki. E.g. each hardware port currently has a status page containing a list of tasks (OMAP3, OMAP4, IOMD, etc.)
Sounds good to me! Another task for the list would be to look at the HAL and work out what changes are required there. Mainly working out how much of the multi-core safety should be handled by the HAL and how much by the OS. E.g. we’d almost certainly want the HAL to make sure the IRQ & FIQ functions are multi-core safe. For timers, UARTs, IIC busses, etc. the HAL can probably assume that each device will only be used by a single core at once, so the only thing it needs to worry about is protecting accesses to common registers. By necessity initial versions of the slave kernel will be using direct hardware access. But once it’s running with the MMU on we’d probably want to get it talking to the HAL ASAP so that the code becomes more hardware-agnostic and gains access to more IO options. |
Richard Molyneux (1568) 6 posts |
Sorry, I haven’t time to read all the comments now, but I wanted to point out Wimp2 and this multi-core proposal in case they’d been overlooked. |
Steve Pampling (1551) 8172 posts |
Has been mentioned: put “Wimp2” into the search box at the top right of the page and follow threads. As has also been pointed out the developer of Wimp2 went on to do things for ARM – lots of things. |
James Cartner-Young (2649) 7 posts |
Hello, everyone. I am a new member on RISC OS Open. I am very keen to get RISC OS onto every desktop, laptop, STB and mobile device in existence. (Also, I’m not used to forums, so let me know of any transgressions in etiquette – I mean no harm :)
I had a BBC A, BBC B, Master 128 and A3000, and I used my A3010, which had been upgraded to an A3020, until 2001, when I moved on to a Windows 98 PC (Eurgh). I have also tried umpteen Linux distros, but found them a tad shabby, to be honest, apart from Ubuntu, which is now overly bloated. I used to write RISC OS software in my youth, and even touched on ARM assembler. I believe that we can get RISC OS back on top without having to resort to such foolish ideas as a mandatory pagefile or complex, power-sapping pre-emptive multitasking (which ignores the user’s priorities in favour of its own).
Suffice to say it’s been a while for me – I have a Pi with the new RISC OS, a SA-RPC 600 with a 486 running RO3.6, and an A7000 running RO3.7 with a 40GB HDD (yes, it’s problematic) lol – but I would LOVE to get involved with bringing RISC OS up to date and getting it back in front again, influencing lesser platforms again. ;) I have about 15 years of questions which have been left unasked. Please forgive my rabid curiosity…
1. Is there a digest/road-map/matrix I can check out to see where you guys are up to and what you’re looking at next, or should I read all the forum posts? :S
2. Do we still have John Kortink (Kortnik? spelling?) (Tranlatr/Creator author) in RISC OS? He was doing some work with AGP accelerators last I knew; I thought it could be good for updating graphics and acceleration.
3. As I said, I haven’t read every post in the forum, but is there a discussion regarding Wi-Fi functionality?
4. My penneth with regards to multi-threading/multi-core functionality: I would hate to be stuck in pre-emptive mode when I need time-critical, real-time execution. I always felt that the co-operative model suited the active power user better, as it focussed on what the user was currently doing, NOT what was happening in the background or what the user was doing 2 minutes ago.
5. Memory – I can’t believe I can now run RISC OS with over 24MB of RAM – it’s nuts running it in 512MB – there’s soooooooooo much free memory and no slow hard disk (read ‘record player’) slowing things down by paging stuff in and out while I’m working.
6. Is RISC OS now 64-bit, or still 32-bit? I don’t think that we’ll need to talk to 4GB+ of RAM just yet (unless Microsoft starts writing RISC OS software at least lol) but it would be cool to not have the limit in future.
7. …Anyway, so many questions still to ask. If I can help you guys with anything… I will. I may not be the world’s best coder, but I will try… I remember why RISC OS was the best OS around – at least for me – it didn’t get in the way, or try to be something it wasn’t. It was modest and ‘English’, which is so refreshing compared to Windows 7’s sensational 4GB+ footprint and 200+ multi-threaded behemoth of a desktop environment (without adding apps). |
Richard Walker (2090) 431 posts |
1. https://www.riscosopen.org/wiki/documentation/show/RISC%20OS%20Roadmap
2. I’m sure I’ve seen him post here, deffo on StarDot.
3. Simplest option is a £20 Ethernet-to-WiFi adapter. To support WiFi directly, we’d need a whole bunch of drivers, and an extra software stack for managing network availability etc.
6. 32-bit. |
Greg (2474) 144 posts |
Just a passing thought. Don’t know if this is of any use, but has anyone thought of obtaining the firmware for Android to see if there is anything useful you can glean to help better understand programming multi-cores? I know Samsung has just released the firmware for the new Galaxy S6 smartphone, as they do with all their smartphones. |
David Feugey (2125) 2709 posts |
ARM provides all the necessary code to boot a single-core OS in AMP mode. They also provide code to use the other cores. Another possibility is to use bare metal code made by the community, which can be used to access other cores AFTER boot. |
Theo Markettos (89) 919 posts |
I ought to go back and read all 16 pages of thread from the beginning, but just to point out rump kernels. The idea is it’s a very-stripped-down NetBSD encapsulated behind a simple interface (files, memory, timers, threading and concurrency primitives) where you can build a kernel to perform a single function (USB, network, filesystem). It wouldn’t be too hard to imagine replacing the network stack with calls into a rump kernel, and have NetBSD do the rest. Likewise implement USB that way, rather than building the NetBSD code into RISC OS directly. This does rather more expressly raise the scheduler problem: while you could try to layer rump kernels on top of RTSupport, I think that would be the wrong way to go. The other question about *BSD is… NetBSD worked on ARMv3 in 1996-8. How much has bitrotted since then? While NetBSD/acorn32 (and NetBSD/acorn26) still nominally exist, what would it take to make them first class platforms again? Or at least rip out the parts that you need – I don’t know what you would do about drivers, but for them to be concurrency-safe you probably want them on the BSD side of the divide. But then the acorn32 drivers probably aren’t actually safe in the first place. For a legacy-free approach I’d look at running on top of Xen on Cortex A7 and A15, but that doesn’t help existing hardware (even the RPi 2 has messed this up). Which brings us back to… RISC OS as *BSD process(es)? Some stumbling blocks:
Plenty of things to ponder (and apologies if this stuff has been discussed upthread)… |
Anthony Vaughan Bartram (2454) 458 posts |
I am starting to look at this again. There are 2 things that I am trying: 1) To define a task tracking/allocation system, as I believe this work could usefully be progressed by applying a SCRUM-like task/goal tracking method. Jeffrey Lee’s task list mentioned earlier on this thread could be broken down further into smaller estimated tasks and assigned. I have emailed a proposal to Steve Revill on this. 2) I am examining the ARM reference code for multi-core, specifically for “* Implement inter-core communications: Investigate different approaches (hardware FIFOs, shared memory, etc.)” |
David Feugey (2125) 2709 posts |
And that’s cool… Some people reported having working multicore code for RISC OS. Bare metal examples seem to work, if memory is reserved and the cache is configured as it should be. What would be great would be to have an option to launch BBC BASIC’s ASM code on another core :) |
Anthony Vaughan Bartram (2454) 458 posts |
Hi David – If there was a threading SWI that could run ASM on a different core, then we could schedule ASM on a different core from BASIC. Lots to do first though… I’ve been in dialogue with Andrew Hodgkinson and Steve Revill. The proposed task tracking system I’m going to evaluate is Trello. I’m looking at writing a detailed specification for code changes to help identify specific incremental tasks to underpin adding multi-core support. I am using Jeffrey Lee’s task list from February 18th 2015 as the main list to identify the specific code to be changed. I plan to write this document in an iterative fashion, so it can be criticised and reviewed. I believe the development tasks could be performed in parallel, followed by an integration development task to pull those features together. Trello might be useful in identifying whether all the required elements have been written. Basically I’m looking at project managing the effort to make RISC OS multicore & helping to write a specification. Jeffrey – does this sound ok? If you have any ideas, please reply to this forum with any specific code fragments, tasks or further pointers. |
Jeffrey Lee (213) 6048 posts |
Yep, that sounds good to me. I’ve heard of Trello, but don’t think I’ve ever used it, so I can’t really comment on its suitability. One other task for the todo list would be to add a HAL call to allow the other core(s) to be started, because there’s usually some platform-specific action required. For the Pi 2 – unless they’ve changed it recently (I spied some fixes in our CVS which are related to cores starting when they shouldn’t be) – you need to write to a hardware register (more info in this thread). For OMAP4/5 you also need to write to a hardware register (documented in the TRM). Not sure offhand about iMX6 or AM5732, but I believe all the required documentation is freely available. The HAL call itself should probably be kept as simple as possible – e.g. the only parameter to the call should be the (physical) address of the code to execute. It will then be expected to start all the extra cores (with the MMU disabled) and start executing the code. The OS should then be able to take care of most of the rest of the startup itself (e.g. identifying which core is which, enabling the MMU & caches, and booting the mini-kernel). Obviously we wouldn’t need that call right away, but it would be one of the things we’d need to sort out before we can start supporting multi-core on multiple platforms. |
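For illustration only, the shape of such a HAL call could look something like the sketch below. The register layout is invented (a per-core "release" register at a fixed stride) and is simulated with a plain array; `hal_smp_desc_t` and `hal_start_secondary_cores` are hypothetical names, not existing HAL entries. A real implementation would use whatever the platform's TRM specifies, and would likely need barriers and an event/interrupt to wake the waiting cores.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical description of how a platform releases its extra cores:
 * one register per secondary core, 'stride_words' apart. */
typedef struct {
    volatile uint32_t *release_reg;   /* register for the first secondary core */
    size_t             num_secondary; /* number of cores besides the boot core */
    size_t             stride_words;  /* distance between per-core registers */
} hal_smp_desc_t;

/* Hypothetical HAL call: the only parameter from the OS is the physical
 * address of the code to run. Each secondary core is assumed to start
 * there with the MMU disabled; the OS does the rest (working out which
 * core is which, enabling MMU and caches, booting the mini-kernel).
 * Returns the number of cores released. */
static size_t hal_start_secondary_cores(const hal_smp_desc_t *d,
                                        uint32_t entry_pa)
{
    if (entry_pa == 0)
        return 0;  /* in this sketch, 0 means "keep waiting" */
    for (size_t i = 0; i < d->num_secondary; i++)
        d->release_reg[i * d->stride_words] = entry_pa;
    /* Real hardware: memory barrier and wake-up event would go here. */
    return d->num_secondary;
}
```

Keeping the platform-specific part this small is the point: everything after the jump to `entry_pa` stays in common OS code.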
David Feugey (2125) 2709 posts |
That would open many possibilities.
That would be perfect. Today, 3 things are really needed (for me): All are coming. So I’ll be very happy, sooner or later. |
George T. Greenfield (154) 749 posts |
This is (almost certainly) a really stupid question, but if RISC OS /were/ able to run in multi-core mode, would that mean (1) that any given app would run faster or (2) that several apps would be able to run simultaneously and, effectively therefore, complete tasks sooner, or (3) both of the above. I’ve just acquired a Pi 2 with 4 cores; I appreciate that RISC OS is only accessing one of these (always the same one?), and I’m just wondering how much things would speed up if all 4 were in use. Would (for example) a 4-core chip running at 1GHz do the same amount of processing as a single-core chip running at 4GHz? |
Chris Evans (457) 1614 posts |
It all depends… For real efficiency with multiple cores you need ‘parallel processing’, which requires a special type of operating system and programs. See here. When (I’m an optimist) RISC OS has some support for multiple cores, I suspect it probably wouldn’t be able to use more than about 50% of the total processing power of, say, four cores, e.g. one core at 100% and the others at 33% usage each. So possibly 2GHz in your example. |
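The arithmetic behind that estimate is easy to check: one 1GHz core at 100% plus three at 33% gives 1.0 + 3 × 0.33 = 1.99, i.e. roughly a 2GHz single-core equivalent. The sketch below does that sum, and also includes Amdahl's law, which was not mentioned above but is the usual way to reason about question (1): how much faster a single app can get when only part of it runs in parallel. Function names are illustrative.

```c
#include <stddef.h>

/* Sum of clock speed times utilisation across cores, as a rough
 * "single-core equivalent" figure in GHz. */
static double effective_ghz(double clock_ghz, const double *utilisation,
                            size_t cores)
{
    double total = 0.0;
    for (size_t i = 0; i < cores; i++)
        total += clock_ghz * utilisation[i];
    return total;
}

/* Amdahl's law: speedup of one program on 'cores' cores when only
 * 'parallel_fraction' of its work can be spread across them. */
static double amdahl_speedup(double parallel_fraction, unsigned cores)
{
    return 1.0 / ((1.0 - parallel_fraction)
                  + parallel_fraction / (double)cores);
}
```

For example, an app where half the work can be parallelised gets a speedup of 1 / (0.5 + 0.5/4) = 1.6 on four cores, so four cores at 1GHz only match a single 4GHz core for work that parallelises perfectly.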