Utilite – a nice Cortex-A9 box
David Feugey (2125) 2709 posts |
To be precise, my suggestion is neither an AMP system nor an SMP system, but more of an offloading approach. The kernel manages all the tasks. Only core 0 is cooperative, without memory protection. The others implement a thin layer for memory and task management, plus mechanisms to communicate with the main OS. An SMP system behind a non-SMP one, in a way. I know, it’s a bit stupid :) |
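As a rough illustration of what that thin layer and its communication with the main OS might amount to, here is a hypothetical shared-memory mailbox (all names and the layout are invented for this sketch; real code would need barriers and cache-coherent or uncached memory):

    /* Hypothetical shared-memory mailbox between RISC OS on core 0 and the
       thin per-core layer on the worker cores.  One slot per worker core;
       core 0 fills a slot and sets 'state', the worker runs the job and
       marks it done.  The memory must be visible to every core. */

    #include <stdint.h>

    enum slot_state { SLOT_EMPTY = 0, SLOT_READY, SLOT_RUNNING, SLOT_DONE };

    typedef struct {
        volatile uint32_t state;        /* enum slot_state */
        void (*entry)(void *arg);       /* code the worker should run */
        void *arg;                      /* its private data block */
        volatile uint32_t result;       /* filled in by the worker */
    } work_slot;

    #define MAX_CORES 4
    extern work_slot mailbox[MAX_CORES];   /* shared, known to every core */

    /* Core 0 side: post a job to worker core 'c' (no queueing in this sketch). */
    static int post_work(unsigned c, void (*entry)(void *), void *arg)
    {
        if (mailbox[c].state != SLOT_EMPTY && mailbox[c].state != SLOT_DONE)
            return 0;                   /* worker still busy */
        mailbox[c].entry = entry;
        mailbox[c].arg   = arg;
        mailbox[c].state = SLOT_READY;  /* a DMB/SEV belongs here on real hardware */
        return 1;
    }

    /* Worker side: the whole "thin layer" is little more than this loop. */
    void worker_main(unsigned c)
    {
        for (;;) {
            if (mailbox[c].state == SLOT_READY) {
                mailbox[c].state = SLOT_RUNNING;
                mailbox[c].entry(mailbox[c].arg);
                mailbox[c].state = SLOT_DONE;   /* core 0 collects the result */
            }
            /* WFE here would save power; plain polling keeps the sketch simple */
        }
    }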
Trevor Johnson (329) 1645 posts |
The Roadmap lists “Multi-CPU support and preemptive multitasking (a simple job, clearly)”. Are light threads a logical sidestep along the road? If potential new developers attend the 2013 Portsmouth Show then the potential for multi-CPU support might be a(n additional) worthwhile topic of conversation. |
Theo Markettos (89) 919 posts |
Been sitting in multicore/multithread talks this morning and having a think…

My first thought was: we have dynamic areas as a convenient sandbox. So fill a DA with data and code, start off the code, and keep other cores away from it except via a defined communication mechanism. I’m assuming code written in C or other HLLs. The C library, and any libraries that use I/O or communicate with other parts of the system, would be replaced with libraries that are thread-aware and communicate with the non-threaded OS. That would mean vanilla C could be run just by relinking with different libraries (as long as it was position-independent). It’s borne out of the observation that many CPU-intensive processes aren’t very I/O intensive, so the I/O can be dealt with in a single-core way.

Then I thought… the existing application process behaviour provides some degree of sandboxing. So keep (single-tasking) application code. Any time it calls a SWI, pass that SWI over to the system core. To start with, all SWIs are marked as non-thread-safe. Over time, SWIs can acquire a thread-safe marker. Any SWI which returns addresses not in the client core’s address space, or modifies OS state, is not thread safe. If you’re executing a non-thread-safe API, the child core is paused until the operation is complete, which may be over more than one SWI call*.

There’s also a need for a simple scheduler which marks whether something is in multicore or single-core mode, and schedules around the multicore processes (there may be more processes than cores). Anything which is in single-core mode gets executed by the system core as normal. Anything WIMP-based would probably be strongly single-core due to all the communication, but essentially this is a multicore version of TaskWindow which abstracts more than just the terminal output.
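A rough sketch of the SWI hand-over described above, purely illustrative (the request block and its fields are not an existing RISC OS interface): the child core parcels up the SWI number and registers, and the system core replays the call and writes the results back.

    /* Illustrative request block for forwarding a SWI from a child core to
       the system core.  Nothing here is an existing RISC OS interface. */

    #include <stdint.h>

    typedef struct {
        volatile uint32_t pending;   /* 1 = child waiting for the system core */
        uint32_t swi_number;         /* e.g. an OS_File or Wimp SWI number */
        uint32_t regs_in[10];        /* R0-R9 on entry */
        uint32_t regs_out[10];       /* R0-R9 on exit */
        uint32_t error;              /* 0, or pointer to an error block */
    } swi_request;

    extern swi_request swi_bridge[4];      /* one per child core */

    /* Called from the child core's SWI vector.  SWIs carrying the
       "thread-safe marker" could be run locally; everything else blocks
       here until core 0 has executed it. */
    uint32_t forward_swi(unsigned core, uint32_t swi, uint32_t *regs)
    {
        swi_request *rq = &swi_bridge[core];
        int i;
        for (i = 0; i < 10; i++)
            rq->regs_in[i] = regs[i];
        rq->swi_number = swi;
        rq->pending = 1;                   /* system core polls, or gets an IRQ */
        while (rq->pending)                /* child core pauses, as described */
            ;                              /* WFE/doorbell in a real version */
        for (i = 0; i < 10; i++)
            regs[i] = rq->regs_out[i];
        return rq->error;
    }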
|
David Feugey (2125) 2709 posts |
“a simple job clearly”: I think it’s ironical :) Light threads are not a waste of time. They let us use the other cores, and software adapted to this technique will be ready for a true SMP kernel.

Theo, yep, your proposal is clearly a way to do it. A simple solution + progressive migration + we’ll see later. The problem is handling all the low-level things: firing up the cores, giving them a memory space, and setting up a small kernel/code that will manage the tasks on each core and the communication with the main core. People from Simtec, and also from Aleph1, made things very close to this, and perhaps have the talent to do it again. Otherwise I could look at it, but 1/ I have no Pandaboard (not yet) and 2/ even if I develop in C and sometimes in ASM, I’m not a hardware guy. I’m not very familiar with all these APIC things, and don’t want to borrow GPL code.

Anyway, the first step is to detect cores, fire them up inside a specific memory space (which could be the same as a DA) and have a simple monitor to execute code and exchange messages with the main kernel. A complete scheduler is another (interesting) step. A SWI bridge is yet another. Should we propose a bounty, or do some people have the knowledge that could help me, us, whoever? |
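The “fire them up and give them a simple monitor” step usually follows the holding-pen pattern on multi-core ARM: each secondary core spins on a known mailbox word until the main core writes an entry address into it. A minimal C sketch of that shape (the mailbox location, the wake-up mechanism and all names are placeholders; the actual release mechanism is SoC-specific, and differs between OMAP4 and i.MX6):

    #include <stdint.h>

    /* Placeholder: a word in memory (or a SoC register) that both cores can
       see.  The exact location and the wake-up method (SEV, IPI, SoC boot
       register) depend on the board, so treat this purely as the shape. */
    extern volatile uint32_t secondary_boot_addr;

    /* What each secondary core runs once the ROM/loader releases it:
       wait until core 0 publishes an entry point, then jump to it. */
    void secondary_holding_pen(void)
    {
        while (secondary_boot_addr == 0)
            ;                                   /* WFE + SEV pairing in practice */
        ((void (*)(void))secondary_boot_addr)();
    }

    /* Core 0 side: point the waiting cores at the small monitor/kernel that
       will manage tasks and talk back to RISC OS. */
    void release_secondaries(void (*monitor_entry)(void))
    {
        secondary_boot_addr = (uint32_t)monitor_entry;
        /* data barrier + SEV here on real hardware */
    }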
Rick Murray (539) 13840 posts |
Talking out of my backside…

…I do not think physical processor allocation will be too difficult. RISC OS already manages what happens on one processor; adding the concept of “cores” (from 1 to n) is only an expansion of this.

The problem that we will suffer is two-fold, and in a way related. The first thing is that no two cores should ever end up looking at the same thing, specifically if instructions or data are being cached. It ought to be “doable” if the MMU can permit regions of memory to be associated with one core (or even just marked as uncacheable?), and if so then this too may simply be an extension to the memory fiddling that RISC OS performs for task switching?

Remember – the challenge is to get multiple cores working in an OS that historically is tied to a rather linear way of working.

#insert <everything Theo said> |
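On the caching point: with the classic ARM short-descriptor page tables, marking a region uncacheable comes down to the C and B bits in its first-level entry, so describing per-core “private” regions is cheap. A hedged sketch of building a 1MB section descriptor that way (bit positions from the ARMv7 short-descriptor format; domains, TEX remap and error handling are ignored):

    #include <stdint.h>

    /* Build an ARM short-descriptor 1MB "section" first-level entry.
       Bits: [1:0]=0b10 section, B=bit2, C=bit3, domain=[8:5], AP[1:0]=[11:10].
       C=0,B=0 gives a strongly-ordered (uncacheable) mapping, the safe
       default for memory shared between cores that are not kept coherent. */
    static uint32_t section_entry(uint32_t phys_mb_base,
                                  unsigned cacheable, unsigned domain)
    {
        uint32_t d = (phys_mb_base & 0xFFF00000u)
                   | (0x3u << 10)           /* AP: read/write access */
                   | ((domain & 0xFu) << 5)
                   | 0x2u;                  /* section descriptor */
        if (cacheable)
            d |= (1u << 3) | (1u << 2);     /* C and B set: write-back cacheable */
        return d;
    }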
Theo Markettos (89) 919 posts |
Sometimes. Sometimes it just spins until data arrives – it won’t switch to another task while waiting for a blocking call.
This is potentially an opportunity though. If process A is blocked on I/O (e.g. waiting for an interrupt to say the DMA has happened), that means the system core is free to execute code in process B that doesn’t interfere with the I/O being waited for. When the DMA interrupt happens it can dump what it’s doing and drop back into the OS, and a non-system core can pick up process B at a later date.

How is the MMU organised in multicore ARMs? Are entries tagged with a core ID so that each core can get a different view of memory? If so, it might be possible for the non-system cores to have everything that’s not their process cause an exception. So any time the program calls a SWI, or tries to access some memory outside its address space, it gets handed over to the system core to do the access. That would be really slow for arbitrary accesses, but might suffice for people fixing programs to not address random parts of the RMA (for example).

This model is leaning towards splitting up programs into a UI and a backend, the backend being multicore-safe and the UI not. Which has its disadvantages. But maybe that enforced separation will encourage thought about how to distribute the work (which is a key question in any distributed system). |
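For the “faults get handed over to the system core” idea, the worker core’s data abort handler would decode the faulting access and queue it for core 0, much like the SWI bridge sketched above. Again purely illustrative; no such mechanism exists in RISC OS today:

    #include <stdint.h>

    /* Illustrative remote-access request that a worker core's abort handler
       could queue for core 0 when a program touches memory outside its view. */
    typedef struct {
        volatile uint32_t pending;
        uint32_t fault_address;     /* from DFAR (CP15 c6,c0,0) */
        uint32_t is_write;          /* decoded from DFSR / the faulting instruction */
        uint32_t value;             /* data to write, or data read back by core 0 */
    } remote_access;

    extern remote_access remote_req[4];   /* one per worker core */

    /* Called from the worker's data abort handler with the decoded fault. */
    uint32_t remote_access_word(unsigned core, uint32_t addr,
                                int is_write, uint32_t value)
    {
        remote_access *rq = &remote_req[core];
        rq->fault_address = addr;
        rq->is_write      = is_write;
        rq->value         = value;
        rq->pending       = 1;
        while (rq->pending)        /* slow, as noted, but only a fallback path */
            ;
        return rq->value;          /* for reads, the value core 0 fetched */
    }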
Jeffrey Lee (213) 6048 posts |
Ah, you didn’t mention that you were thinking of it as a short-term goal. Yes, for a short-term goal to act as a stepping stone to greater things, it makes perfect sense. In fact it’s probably the only option we have – when making the OS multi-thread/multi-core aware we’ve got to start with something, and basic kernel functionality is probably the best place to start.
I think that’s basically the approach we settled on last time we had this discussion (disclaimer: that may not actually be the last time we had this discussion. This discussion inevitably pops up in some form or another every few months). It’s certainly one of the most sensible ones from my perspective, as many modules are nice self-contained chunks of code which can easily be updated one at a time to be thread-safe.
The only knowledge I have is that I don’t have enough knowledge to implement a (good) microkernel on which the OS would run. So finding an open-source one would be nice, but there are quite a few factors to take into account unless we want to start ditching support for old machines. And the choices made may even affect whether third-party software would be able to run on any machines that don’t support the new threading APIs.
Each core has its own pointer to the level one page table. So depending on what the OS wants to do, each core could be given a completely separate view of memory, or they could all be made completely identical.
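Concretely, that per-core pointer is TTBR0 in CP15, so which view of memory a core sees is simply a question of what that core loads into the register. A minimal sketch with GCC-style inline assembly (TLB and barrier maintenance shown only in skeleton form; the low bits of the value also carry table-walk attributes):

    #include <stdint.h>

    /* Read and write this core's TTBR0 (CP15 c2,c0,0).  Each core has its
       own TTBR0, so each core can point at a different L1 page table. */
    static inline uint32_t read_ttbr0(void)
    {
        uint32_t v;
        __asm__ volatile("mrc p15, 0, %0, c2, c0, 0" : "=r"(v));
        return v;
    }

    static inline void write_ttbr0(uint32_t table_base)
    {
        __asm__ volatile("mcr p15, 0, %0, c2, c0, 0" :: "r"(table_base));
        /* this core's TLBs must then be invalidated and the change synced */
        __asm__ volatile("mcr p15, 0, %0, c8, c7, 0" :: "r"(0));  /* TLBIALL */
        __asm__ volatile("dsb");
        __asm__ volatile("isb");
    }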
Having a dedicated “UI” thread is the approach that operating systems seem to be heading towards, with the aim of keeping the UI responsive to the user. E.g. on Android you can’t do network access from the main (i.e. UI) thread, and aren’t meant to do file access either. On any of the current generation of games consoles you need to keep your game rendering at a fairly consistent framerate so that the OS can display its UI over the top when necessary (achievement notifications, error dialogs, etc.). I think Windows 8 “Metro” apps have quite a few restrictions as well, although I can’t remember exactly what they are. But basically software development has now reached a level of maturity where it looks incredibly amateurish to write a piece of software which becomes unresponsive to the user (or holds up the system entirely) during some operations. But whether enforcing such a system would be the right idea for RISC OS is another matter! |
David Feugey (2125) 2709 posts |
“Remember – the challenge is to get multiple cores working in an OS that historically is tied to a rather linear way of working.” Yep, so independent ‘light’ threads could be a solution. The main part of an application just launches them and lets them live.

“Are entries tagged with a core ID so that each core can get a different view of memory?” You can set up the cores inside their own MMU memory spaces. Interrupts can also be dedicated to a specific core, so when one answers an interrupt, the others can continue to work normally (if I’ve understood correctly). In a pure SMP system it can be more complex, since the interrupt can potentially lock the entire system.

“This model is leaning towards splitting up programs into a UI and a backend, the backend being multicore-safe and the UI not.” You get it! On one core, the Wimp, some I/O and the old applications. On the others, the main parts of the applications, with preemptive multithreading and memory protection.

“In fact it’s probably the only option we have – when making the OS multi-thread/multi-core aware we’ve got to start with something, and basic kernel functionality is probably the best place to start.” Couldn’t agree more. For more information, there is this: http://www.arm.com/products/processors/cortex-a/arm-mpcore-sample-code.php

The best solution could be to set up an SMP system with one core dedicated to RISC OS (which will manage I/O, sound, graphics, etc. as usual, all with its own cooperative scheduler) and the others dedicated to all the threads, but with limited I/O (just memory access). A bit like OS X :) Of course, only if it’s possible to mix SMP and AMP on the same processor. Note: if the SMP side manages RAM, it could also be used to create protected DAs, or protected application memory for compatible software.

The only big question is whether RISC OS, launched on core 0 inside its own MMU memory space and with all the I/O, will continue to work as usual or not. The only way to get the answer is to modify the bootstrap of RISC OS to add the multicore initialisation procedure (as provided by ARM), then load RISC OS on core 0 and see what happens :) ARM promises that it will work “as is”, and even that you can launch a different copy of RISC OS on each core. “Core 1 AMP + Other Cores SMP” should also be possible.

Here, no Panda yet, so I’ll need to wait. Luckily I have other RISC OS software to finish :) |
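For the “detect cores” step on a Cortex-A9 (the Panda’s OMAP4 or the Utilite’s i.MX6), the current core number comes from the MPIDR and the core count from the SCU configuration register, found relative to the private peripheral base. A sketch, assuming privileged execution and that the SCU region is already mapped (offsets per the Cortex-A9 MPCore TRM):

    #include <stdint.h>

    /* Which core am I?  MPIDR, CP15 c0,c0,5; the Cortex-A9 keeps the CPU
       number in the low bits. */
    static inline unsigned current_core(void)
    {
        uint32_t mpidr;
        __asm__ volatile("mrc p15, 0, %0, c0, c0, 5" : "=r"(mpidr));
        return mpidr & 3;
    }

    /* How many cores?  The SCU configuration register lives at offset 0x04
       from the private peripheral base (CP15 c15,c0,0 returns its physical
       address, so that region must already be mapped for this to work). */
    static inline unsigned core_count(void)
    {
        uint32_t periphbase;
        __asm__ volatile("mrc p15, 4, %0, c15, c0, 0" : "=r"(periphbase));
        volatile uint32_t *scu_config = (uint32_t *)(periphbase + 0x04);
        return (*scu_config & 3) + 1;       /* field holds "CPUs minus one" */
    }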
Rick Murray (539) 13840 posts |
That’s ’cos people these days have zero patience. <crusty old fart mode>In my day (When I Were A Lad, etc) it was commonplace to load stuff from a tape; modems (if you had them) ran at a mere 300 baud. That’s about 25 bytes per second (depending on word layout). To put that into context, you could download a RISC OS ROM image archive in 92274 seconds, or a little over a DAY. Add some time for the overheads (well, four bytes) plus the ACK/NAK response time of XMODEM; and note that it doesn’t auto-resume. And more time as our “uncorrected” data stream will probably fire a few NAKs along the way. Thankfully it is a series of 128 byte packets, so overheads there shouldn’t be too tedious. Unless, of course, line noise caused your ACK reply to be received as a CAN…</> Sorry… lost in the mists of reminiscence… |
WPB (1391) 352 posts |
I think this post by Ben Avison describes a very sensible and useful approach to moving towards this goal. (I know Jeffrey’s already linked to that whole thread, but this specific post seems particularly relevant in the short term.)
I think the “Making Applications Have PMT Behaviour” section at the bottom of this page of riscos.info makes a similar point and shows the possibility of a practical implementation right now (a rough sketch of that technique follows after this post).
From my own perspective, I don’t mind if I have to wait (a little) while for an operation to complete, as long as I’m informed by the app/OS that it’s working in the background. What irks me is when things just appear to lock up and you’re left wondering if it’s a crash or just a mad swapping of masses of memory or the like. Conversely, the modern approach of making UIs super responsive sometimes backfires because the UI updates too quickly to represent a state that the backend hasn’t quite caught up with yet. If you’re working fast, that can prompt you to move on to the next step in whatever you’re doing, only to find that the system isn’t actually ready. (Attempting to explain a phenomenon in very general terms, and not making a very good job of it. Hopefully some of you know what I mean!) |
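For reference, the usual way to give an existing application that PMT-like behaviour is to do a small, bounded chunk of the long job on each Wimp null poll. A minimal sketch (work_remaining and do_some_work are hypothetical stand-ins for the real job; the SWI number is the standard Wimp_Poll):

    #include "kernel.h"

    #define Wimp_Poll 0x400C7          /* standard Wimp SWI number */

    extern int  work_remaining(void);  /* hypothetical: is the long job finished? */
    extern void do_some_work(void);    /* hypothetical: one small, bounded chunk */

    static int poll_block[64];         /* 256-byte, word-aligned poll block */

    void event_loop(void)
    {
        _kernel_swi_regs r;
        for (;;) {
            r.r[0] = 0;                        /* mask 0: null events delivered */
            r.r[1] = (int)poll_block;
            _kernel_swi(Wimp_Poll, &r, &r);
            switch (r.r[0]) {
            case 0:                            /* null event: do one chunk, return */
                if (work_remaining())
                    do_some_work();            /* keep each chunk short */
                break;
            /* ... redraws, clicks, messages handled as usual ... */
            default:
                break;
            }
        }
    }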
nemo (145) 2546 posts |
There is a lot to be said for a ‘web workers’ type of light parallelism, both in terms of making the interface simple for the programmer, and the implementation realistically practical for this tiny team. A fully multithreaded RISC OS is much harder work (and I predict will never happen). As I said last time:
|
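For what a “web workers” style of light parallelism might look like to the programmer, here is a purely hypothetical API shape (none of this exists in RISC OS): workers get no direct I/O, only messages in and messages out, which is what would make them safe to run on another core. The stand-in implementation below runs the worker synchronously so the sketch is complete; a multicore version would queue the message to another core instead.

    #include <stdlib.h>

    /* Hypothetical web-workers-style interface.  A worker only receives
       messages, computes, and posts results back to the main side. */
    typedef struct worker worker;
    typedef void (*worker_fn)(worker *self, const void *msg, size_t len);
    typedef void (*reply_fn)(const void *msg, size_t len);

    struct worker {
        worker_fn entry;
        reply_fn  on_reply;      /* delivered back on the main/RISC OS side */
    };

    worker *worker_spawn(worker_fn entry)
    {
        worker *w = calloc(1, sizeof *w);
        if (w) w->entry = entry;
        return w;
    }

    void worker_on_reply(worker *w, reply_fn cb) { w->on_reply = cb; }

    /* Main side posts a message; in this stand-in the worker runs at once. */
    void worker_post(worker *w, const void *msg, size_t len)
    {
        w->entry(w, msg, len);
    }

    /* Worker side posts a result back to the main side. */
    void worker_reply(worker *w, const void *msg, size_t len)
    {
        if (w->on_reply) w->on_reply(msg, len);
    }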
Dave Higton (1515) 3525 posts |
As I have recently discovered (but my colleagues knew all along), all you have to do is launch a long series of SQL operations on Windows, and the computer becomes unusable. Worse, if you do that on a VM, all the VMs on the same host become unusable. So we shouldn’t beat ourselves up about it too much. |
nemo (145) 2546 posts |
The Utilite can now be ordered. |
André Timmermans (100) 655 posts |
There is a review of the machine on Phoronix |
Trevor Johnson (329) 1645 posts |
Some technical info about the Utilite is available (probably mentioned previously):
There seems to be a little more publicly available info for the Qseven conga-QMX6 COM (module-store.com, uk.mouser.com, avnet-embedded.eu), including revisions 0.1 and 0.2 of the User’s Guide:
However, there’s also the Hardware Development Guide for i.MX 6Quad, 6Dual, DualLite, 6Solo Families of Applications Processors (via the i.MX Community and search). Is any of that any good to someone with the time, skills and inclination? |
Trevor Johnson (329) 1645 posts |
Indeed! |
Jeffrey Lee (213) 6048 posts |
“a simple job clearly”: I think it’s ironical :) If only I had all the time in the world! |
Trevor Johnson (329) 1645 posts |
And how do we know you’re not considering funding some unpaid leave by temporarily suspending the RO work in order to win $10,000 for the Quake III VideoCore IV competition? ;-) |
Jeffrey Lee (213) 6048 posts |
Because me suspending RO work would be madness! ;) |
Trevor Johnson (329) 1645 posts |
madness! <> !Madness There was some of the latter at the SW show, courtesy of ROUGOL’s Bryan Hogan! |
Steffen Huber (91) 1953 posts |
@Jeffrey, obviously you are not suspending RO work to win the money, because you do the Quake III port to RO at the same time :-) |