Thinking ahead: Supporting multicore CPUs
David Feugey (2125) 2709 posts |
Niall just validated my approach. So it could work (the same way it does with Wimp2), but with some workarounds. The simplest way is to rely on its code, and so to keep the module approach (because the code is under the GPL licence). But of course with HAL_Timer, since it’s now available. Ah, hum, compilation of the module needs Bax… |
Steve Pampling (1551) 8172 posts |
Stepping round the use of a BASIC extension that replicates elements of the ASM/ObjAsm macro functionality is probably not a major item. |
Rick Murray (539) 13850 posts |
When I last looked at Wimp2, I was wondering how best to implement restoring state. Wimp2 operates outside of the API so it must do its thing and leave stuff exactly as expected. It’s a little more involved than a normal conversion of MOVS → MOV, etc. A rough outline of handling state if we are called (i.e. from a CallAfter/CallEvery [1]) from a USR mode task to the pre-emption routine:
We can’t push SPSR to CPSR before the unstack or we’ll be looking at R13[_USR] instead of R13_SVC for the saved registers; thus we must load return address to R14, update PSR, and then do the return. Well, that’s my understanding of the situation. There’s also a wodge of code that looks like it is hardwired with assumptions of content at certain absolute addresses (the OS_File chunk-loader parts; though perhaps given the speed of modern machines, we could just disable this part?). I have had a poke through Wimp2, but not enough to build a 32 bit version (yet). What has put me off so far is that it is fairly useless on its own [2]. I’d need to convert the helper too in order to get this working with existing applications.
[1] Does Wimp2 use these? I don’t remember.
[2] That’s not to say it is in itself useless; just existing apps won’t use it without the persuasion. ☺ |
Steve Pampling (1551) 8172 posts |
I think other people have been there before: MW’s armc and 32bitlib, macros in generic32 in the DDE, etc. [1]
Persuasion = WimpPatch —> source of WimpPatch won’t compile without support from Wimp2 and shows 94 instances of non-32bit code (possibly mostly our friendly neighbourhood MOVS – I haven’t looked). Looks like fun. [2]
[1] Because my learning pattern requires me to try and do something for the info to actually make it into the grey cells, I’ve spent a while looking at !PC and the 26bit macros used in it and how MSR/MRS need introducing to clean things up. I may never get it to work, but I will learn along the way.
[2] Fun :) |
David Feugey (2125) 2709 posts |
You know it’s all Chinese to me? :) |
Steve Pampling (1551) 8172 posts |
BASIC, Assembler, macros, or my naff typing? |
David Feugey (2125) 2709 posts |
The whole thing :) |
David Feugey (2125) 2709 posts |
Just some thoughts on preemptive multitasking: Why not just something between the two? > An interrupt that stops the current task after some delay, and nothing else. The delay: you choose. A vsync delay for a graphics workstation, a very small delay for an RT system, a very long delay for CPU-intensive systems, no delay for classic cooperative multitasking. Of course, you can change it ‘on the fly’. When you switch from one task to another, you check if the next task allows (or, better, does not disallow) preemption, then you set up a timer. Time for each task could be time_of_cycle/total_number_of_wimp_tasks, with the possibility to add slots of time for intensive tasks. If a task hands control back to the system earlier than expected, very cool: the complete cycle will be shortened. If some system tasks (interrupts, etc.) take too much time, some applications would need to wait for the next cycle (to protect the vsync delay on systems where the time of the complete cycle is the delay between two (or three for dual-buffering configurations) graphics frames). We would lose only a small amount of time (one slot max.) at the end of the cycle, or Benefits/problems: |
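As a rough illustration of the slot arithmetic in the post above, here is a minimal C sketch. It assumes microsecond units and a per-task record with a "preemption allowed" flag and a count of extra slots; the `task_info` type and both function names are hypothetical and do not correspond to anything in the Wimp or Wimp2.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-task record; none of these names exist in the Wimp. */
typedef struct {
    bool     allows_preemption; /* false = classic cooperative task */
    uint32_t extra_slots;       /* additional slots for intensive tasks */
} task_info;

/* Base slot length: the cycle time shared equally between all tasks.
   Returns 0 when there are no tasks, or when the cycle is 0 ("no delay"). */
uint32_t base_slot_us(uint32_t cycle_us, size_t task_count)
{
    if (task_count == 0)
        return 0;
    return cycle_us / (uint32_t)task_count;
}

/* Timer period to arm before switching to a task.
   A result of 0 means "do not arm the timer" (task runs cooperatively). */
uint32_t timer_for_task(const task_info *t, uint32_t cycle_us,
                        size_t task_count)
{
    if (!t->allows_preemption)
        return 0;
    return base_slot_us(cycle_us, task_count) * (1u + t->extra_slots);
}
```

With a 20000 µs cycle and four tasks, each ordinary task would get 5000 µs and a task holding two extra slots 15000 µs. Note that granting extra slots makes the total exceed one cycle, so a real scheduler would need to decide which tasks wait for the next cycle.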
David Feugey (2125) 2709 posts |
Another thing: we have preemptive multitasking inside taskwindows. It would be great to have it from the CLI too, so tools such as webjames could be adapted to work in CLI mode. Not something complex, but with a module to manage tasks, as monotask CLI sessions. |
Rick Murray (539) 13850 posts |
It exists. Look at the Wimp2 module by Niall Douglas (not 32bit). Pay attention to the source code. You can’t just give and take time slots from a task as the Wimp expects to be able to provide application feedback via the Wimp_Poll[Idle] mechanism, and the task expects that when a poll returns, the Wimp has something for us. There are, of course, a number of issues involved here. An application knows that it has sole control of the high level parts of the system between calls to Wimp_Poll[Idle]. Thus, it is fairly free to do all sorts of things provided that it restores the expected state afterwards. VFP? Sprite redirection? Messing with environment? File activities? If you kick out an app in the middle of doing something, in most cases it will be okay, but not always. And, well, the one app I remember taking forever on my A5000 and RiscPC is the one app we absolutely cannot touch. As for the planned ideas – Wimp2 already did a lot of that.
It would be nice if there was a way to tell the Wimp to ‘favour’ a specific application. When !UnTarBZ2 is running, it does so in a taskwindow. It would be great if I could say “concentrate on this”. Might make it take only half an eternity to unpack the sources. ;-) Otherwise – I don’t really see a problem here that can’t be resolved by application writers making better use of Wimp_PollIdle and the option to mask out null polls. Together, these can instruct the Wimp not to bother even switching to a task at all until there is something for it to do. Alternatively, if you need null polls (say, you check the state of something in the background that isn’t available as a message or poll event), you could PollIdle with a 25cs timeout. Four checks a second should be responsive enough, no?
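To make the 25cs suggestion concrete: Wimp_PollIdle is given an earliest return time in the same centisecond units as OS_ReadMonotonicTime, so the task only has to compute "now + 25" on each pass. The sketch below shows just that arithmetic in portable C, without the SWI calls themselves; the function names are made up for illustration, and the wrap-safe comparison assumes a 32-bit counter and two's complement conversion.

```c
#include <stdbool.h>
#include <stdint.h>

#define CHECK_INTERVAL_CS 25u  /* 25 centiseconds = four checks per second */

/* Earliest time at which the next null event should be delivered.
   Unsigned arithmetic wraps, matching a 32-bit centisecond counter. */
uint32_t next_null_time(uint32_t now_cs)
{
    return now_cs + CHECK_INTERVAL_CS;
}

/* True once now_cs has reached or passed deadline_cs, even if the
   counter wrapped in between (assumes two's complement conversion). */
bool deadline_reached(uint32_t now_cs, uint32_t deadline_cs)
{
    return (int32_t)(now_cs - deadline_cs) >= 0;
}
```

The value returned by `next_null_time` is what a task would hand to the idle poll; `deadline_reached` is only needed if the task also wants to check the deadline itself.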
How does one “force” a task to have done something in an arbitrary amount of time? Refresh rates vary, and can change depending on display mode; on some hardware it is entirely bogus. For example, my DisplayManager thinks my refresh should be 67Hz. My monitor says it is 75Hz. The last time I measured VSync, it appeared to be running at 50Hz; on the Pi there is a canyon-sized disconnect between video as seen by RISC OS and video as implemented in hardware. Search the forums, you’ll see. [try from here: https://www.riscosopen.org/forum/forums/5/topics/2207?page=2#posts-27657 ] While there is no doubt that the Wimp2 module was a great proof of concept, the complaint that I have with it is that not only is it tied extremely closely to old systems (IOC timer, for instance) but at the time Niall had “issues” with anything newer than RISC OS 2. This means that porting to newer systems may be ‘interesting’, and you will find delights such as this:
That’s actually painful to read. :-)
Err… A TaskWindow is a multitasking CLI. |
David Feugey (2125) 2709 posts |
Yep, I know.
Yes, but the idea is to keep cooperation AND to force applications to do what people forget to make them do.
By giving it only a specific time to run :) (of course with a way for the developer to know how much time is available before the next interruption). A sort of time-constrained cooperative multitasking (OK, not very English).
I mean, to make it available without wimp. |
David Feugey (2125) 2709 posts |
Note: if all tasks are done before the end of one cycle, then you could choose to 1/ start a new cycle or 2/ wait for the next one (and so you’ll have a big slot to sleep). Slots could also be ‘sleep’ ones, to save battery. |
David Feugey (2125) 2709 posts |
I’m not on OS things, but technically, it’s now simpler to do that than with Wimp2, since we have a generic microtimer that generates interrupts. The only problem is to interrupt the tasks only when non-system code is executing (i.e. not Wimp calls). I’m not sure if it’s easy to implement a timer that will act after a certain amount of time AND only when pure application code is executing. IMHO, that’s the main difficulty. We can assume that all the system code is, or will be, optimized to be cooperative and non-blocking. The problem is to find a solution for people that don’t write good application code. Translation: a solution for not very cooperative applications :) |
Steve Pampling (1551) 8172 posts |
Can you? Really? |
David Feugey (2125) 2709 posts |
I can’t, but we can assume that system code should be (or will be) reliable, no? The other option is to put everything under the governance of the scheduler and the framework(s). But we probably don’t talk of RISC OS anymore. IMHO RISC OS is more a set of services than a set of rules. But I’m really not on system things :) |
Steve Pampling (1551) 8172 posts |
That was the bit I was talking about. Assuming. |
Rick Murray (539) 13850 posts |
Awww <beeeeep!> You tell me this after I have wasted my life searching for the code pixies at the bottom of my monitor?!?!?!
On a more serious note – one might like to introduce David to the idea of calling the filesystem in a timed CallAfter. Once he gets his head around the “FileCore in use” and why it is happening, it might help knock out some of those assumptions. There are some dusty unkempt dark alleyways in the kernel where even the pimps and pushers fear to tread. |
David Feugey (2125) 2709 posts |
Yep, but not everything can be put under scheduler and memory protection. A question of perimeter… and performance too. And of course the big question: what is doable under RISC OS? Preemptive multitasking just for applications and memory protection just to keep the system safe would be much better than what we have today, no? I’m not against big plans, if they have a chance of becoming a reality. But to be honest, traditional multithreading and preemptive multitasking is perhaps not the right solution for RISC OS. Loss of performance can be very significant and power management is really a problem (see all the efforts and tweaks in the Linux kernel). Extending cooperative multitasking (as in Wimp2) could help old applications run on the new scheduler… or not :). Those were just my thoughts on the subject. |
David Feugey (2125) 2709 posts |
That’s why I suggest not interrupting the processor when it is running system code. A sort of BreakAfter xx cs, but not for the system, modules, etc. Just for applications. Like Wimp2 in fact, but in a cleaner way since we now have access to the source code, and with more options (for example the cycle=vsync possibility). Is there a way to know what the processor was doing before an interrupt? The idea is to set up a timer (easily doable now, since it’s inside the new 32bit HAL), and to force a Wimp_Poll[Idle] only if application code was running (not an easy part, since the processor mode will be different from user mode). In fact, my suggestion is to interrupt an application, but give control back to another one (Call_Wimp_Poll_After :) ). |
Rick Murray (539) 13850 posts |
Not a lot, for we have two main problems.
Personally, I’m not so concerned about application code. The one that frightens me is that system resources (modules) operate in the same level as the kernel and important things like the file system. If I had silly amounts of money to spaff on pet projects, I’d want the RMA to operate in SYS mode, with SVC reserved for core OS functions. The entire RMA would be read-only. Dynamic Areas would be read only except to whatever created them. And so on.
That’s a pretty big “if”. When I was younger, I had planet-sized plans. I now have much more realistic ambitions.
Well, my Android phone stiffed up last week to the extent that I had to pop out the battery to recover it (makes me wonder what happens when everybody hard wires their batteries inside). Everything can crash…
Isn’t the typical solution to a loss of power to get a faster processor? Look at every Windows platform ever. On a less facetious note, do you think RISC OS is lacking performance? For some reason file operations are very slow, but otherwise it is remarkably quick considering what the hardware is. My Android phone can have some really noticeable pauses, and while a sexy-awesome graphical interface with animated backdrops is way more complicated than RISC OS, it is running on a dual-core processor clocking twice what the Pi does. Surely it can be more fluid than that. If I want to take a photo, I don’t want to wait fifteen seconds for the camera app to start, and another ten seconds before it is capable of responding to input!
Is this the Linux kernel in general? The one that has to take into account dozens of different architectures all with their own quirks, and a massive acceptance on battery powered devices?
Why not just use the centisecond tick? Think of it as a 100Hz refresh. ;-)
I guess you could work out what registers/PC are stored? Go to SVC mode, jump to IRQ mode, read R13 and R14, return to SVC mode, then poke around what those point to?
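One piece of this can be sketched without touching the hardware: on 32-bit ARM the mode that was interrupted is recorded in the low five bits of the saved PSR, so once a handler has that word it can tell whether USR mode code was running. The C below is a minimal illustration only; how the saved PSR is obtained is left out, and the function name is hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

#define PSR_MODE_MASK 0x1Fu  /* M[4:0] field of a 32-bit ARM PSR */
#define PSR_MODE_USR  0x10u  /* user mode encoding */

/* True if the saved PSR shows that USR mode code was interrupted.
   Flag bits (N, Z, C, V) and interrupt masks are ignored. */
bool interrupted_user_mode(uint32_t saved_psr)
{
    return (saved_psr & PSR_MODE_MASK) == PSR_MODE_USR;
}
```

This only says that the processor was not in a privileged mode at the time; it does not show that the application is at a point where switching away from it is safe.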
Actually, that is “doable”. What I do in a program that wants to be able to do something on a regular tick without reentrancy problems is as follows:
The first thing to note is that CallEvery/CallAfter run off the system tick so will probably be entered in IRQ mode, and if not, in SVC mode but with the system “busy”. Essentially this code is barging in. Why don’t you have a crack at porting the Wimp2 module? See if the preemption idea stands up? At a rough look, I think it would involve the following:
|
David Feugey (2125) 2709 posts |
I agree
Same here
Exactly the same here :)
No, but we should preserve that. Multithreading and massive preemptive multitasking + memory protection have big impacts on performance.
Why not. My idea was to tell: choose. centisecond for power, less for pure reactivity, same as sync for games or graphics, etc.
Yep
Two reasons: 1/ it’s GPL code. 2/ most of the code is tweaks for the timer, ROS2 applications and the fact that the source code of the OS could not be modified. And of course, the main reason is that I have only coded user apps in ANSI C, so it would take me a lot of time to make something in ASM in the system space. |
David Feugey (2125) 2709 posts |
BTW, for your memory protection thoughts, it seems to be doable too. Just a few big zones, with a change of context (writable or not). It’s simpler than defining new zones for each context change (a zone specific to the running application or module). |
Jeffrey Lee (213) 6048 posts |
Thinking about things recently, I wonder if there would be any merit to splitting the kernel into multiple modules. Make the core kernel be much more like a microkernel which handles the bare minimum needed to start the rest of the OS (interrupts, memory, module chain, vectors, SWIs, etc.) and move all the rest (CLI, system variables, keyboard/mouse buffers, CMOS, VDU, etc.) into one or more external modules. Originally I was thinking of using this as a convenient way of solving the keyboard scan/CMOS reset and CMOS storage problems that modern machines are facing. By getting rid of any CMOS-reliant code from the core kernel it would allow us to separate the modules into two main groups. First the modules that don’t use CMOS will be initialised, then the keyboard scan + CMOS reset can be performed, and then the OS can go on to initialise the rest of the modules. However I’ve realised that this approach would also be useful for if/when we start making RISC OS fully multithreaded. If we go with the approach of adding a flag to the module header to indicate whether the module is thread-safe, we can start off by concentrating on making the core microkernel thread safe (and adding the thread management calls), and then simply mark the ancillary kernel modules as thread-unsafe. Then those modules will automatically fall back to using the global mutex to enforce single-threaded execution, allowing us to get a multi-threaded RISC OS up and running quicker than if we had to wait until the entire kernel was thread-safe (or if we had to add some nasty hacks to manually claim and release the global mutex on entry/exit to the unsafe areas) |
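The fallback described above can be sketched in a few lines of C. This is only a model of the idea, using a POSIX mutex to stand in for the global mutex; the `module_desc` structure, the `thread_safe` flag, the dispatcher and the example entry point are all hypothetical and are not taken from the RISC OS module header format.

```c
#include <pthread.h>
#include <stdbool.h>

typedef int (*module_entry)(int arg);

/* Hypothetical module descriptor carrying a thread-safety flag. */
typedef struct {
    const char  *name;
    bool         thread_safe; /* stands in for the proposed header flag */
    module_entry entry;
} module_desc;

static pthread_mutex_t global_mutex = PTHREAD_MUTEX_INITIALIZER;
static int global_lock_count = 0; /* instrumentation for the example */

/* Example entry point used only to exercise the dispatcher. */
static int example_entry(int arg)
{
    return arg + 1;
}

/* Call a module entry point. Thread-safe modules are called directly;
   all others are serialised behind the single global mutex. */
int call_module(const module_desc *m, int arg)
{
    if (m->thread_safe)
        return m->entry(arg);

    pthread_mutex_lock(&global_mutex);
    global_lock_count++;          /* protected by global_mutex */
    int result = m->entry(arg);
    pthread_mutex_unlock(&global_mutex);
    return result;
}
```

The point of doing it in the dispatcher is that unsafe modules need no changes of their own: they simply never see more than one caller at a time, while modules marked safe bypass the lock entirely.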
rob andrews (112) 200 posts |
This is a great idea. Can’t wait to see the results; it should make the OS more stable too. |
David Feugey (2125) 2709 posts |
That’s a very good idea. It’ll be easier too to upgrade or change components in the kernel (with modules and not vectors). |