Thinking ahead: Supporting multicore CPUs

636 posts, 79 voices


Mar 16, 2014 10:53pm David Feugey (2125) 2709 posts	Niall just validated my approach. So it could work (the same way it does with Wimp2), but with some workarounds. The simplest way is to rely on its code, and so keep the module approach (because the code is under the GPL licence), but of course with HAL_Timer, since it’s now available. Ah, hum, compilation of the module needs Bax…

Mar 17, 2014 7:41pm Steve Pampling (1551) 8172 posts	Ah, hum, compilation of the module needs Bax… Stepping round the use of a BASIC extension that replicates elements of the ASM/ObAsm macro functionality is probly not a major item. I think the amount of non-32 bit code is probably more of an issue.

Mar 17, 2014 8:54pm Rick Murray (539) 13850 posts	When I last looked at Wimp2, I was wondering how best to implement restoring state. Wimp2 operates outside of the API so it must do its thing and leave stuff exactly as expected. It’s a little more involved than a normal conversion of MOVS → MOV, etc. A rough outline of handling state if we are called (ie from a CallAfter/CallEvery¹) from a USR mode task to the pre-emption routine:
  Stack all important registers
  Copy SPSR (USR mode CPSR) to a register
  Stack that
  … blah blah – stuff happens …
  Unstack saved PSR back to its register
  Push that to SPSR
  Unstack all important registers – change LDMFD …,PC} to unstack to R14.
  Copy SPSR to CPSR ← this updates processor mode
  MOV PC, R14
We can’t push SPSR to CPSR before the unstack or we’ll be looking at R13[_USR] instead of R13_SVC for the saved registers; thus we must load return address to R14, update PSR, and then do the return. Well, that’s my understanding of the situation. There’s also a wodge of code that looks like it is hardwired with assumptions of content at certain absolute addresses (the OS_File chunk-loader parts; though perhaps given the speed of modern machines, we could just disable this part?). I have had a poke through Wimp2, but not enough to build a 32 bit version (yet). What has put me off so far is that it is fairly useless on its own². I’d need to convert the helper too in order to get this working with existing applications.
¹ Does Wimp2 use these? I don’t remember.
² That’s not to say it is in itself useless; just existing apps won’t use it without the persuasion. ☺
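
The ordering constraint in the outline above (unstack first, change mode afterwards) can be illustrated with a toy Python model. This is only a sketch: `ToyCPU` and `restore` are hypothetical names, and the model covers nothing but a banked stack pointer per mode, which is the one detail the argument depends on. It is not ARM code and not how Wimp2 does it.

```python
class ToyCPU:
    """Toy model: only the mode and the banked stack pointer (R13) exist."""

    def __init__(self):
        self.mode = "SVC"
        self.sp = {"USR": 0x8000, "SVC": 0x1000}  # one R13 per mode
        self.mem = {}                               # address -> word

    def push(self, words):
        # Full-descending stack: lowest-numbered register ends up lowest.
        for word in reversed(words):
            self.sp[self.mode] -= 4
            self.mem[self.sp[self.mode]] = word

    def pop(self, count):
        out = []
        for _ in range(count):
            # Reads go through whichever R13 the *current* mode sees.
            out.append(self.mem.get(self.sp[self.mode], 0xDEADBEEF))
            self.sp[self.mode] += 4
        return out


def restore(cpu, count, mode_first):
    """Return the unstacked words, switching to USR before or after."""
    if mode_first:
        cpu.mode = "USR"          # wrong order: R13 is now R13_usr
        return cpu.pop(count)
    words = cpu.pop(count)        # right order: still using R13_svc
    cpu.mode = "USR"
    return words
```

Popping after the mode change reads from the user stack, so the saved values are never seen; popping first returns them intact.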

Mar 17, 2014 9:42pm Steve Pampling (1551) 8172 posts	A rough outline of handling state if we are called (ie from a CallAfter/CallEvery¹) from a USR mode task to the pre-emption routine: I think other people have been there before: MW’s armc and 32bitlib, macros in generic32 in the DDE, etc.¹ That’s not to say it is in itself useless; just existing apps won’t use it without the persuasion. ☺ Persuasion = WimpPatch → source of WimpPatch won’t compile without support from Wimp2 and shows 94 instances of non-32bit (possibly mostly our friendly neighbourhood movs – I haven’t looked). Looks like fun.² ¹ Because my learning pattern requires me to try and do something for the info to actually make it into the grey cells, I’ve spent a while looking at !PC and the 26bit macros used in it and how MSR/MRS need introducing to clean things up. I may never get it to work, but I will learn along the way. ² Fun :)

Mar 17, 2014 10:37pm David Feugey (2125) 2709 posts	Stepping round the use of a BASIC extension that replicates elements of the ASM/ObAsm macro functionality is probly not a major item. You know it’s chinese for me? :)

Mar 17, 2014 11:02pm Steve Pampling (1551) 8172 posts	You know it’s chinese for me? :) BASIC, Assembler, macros, or my naff typing?

Mar 18, 2014 10:54pm David Feugey (2125) 2709 posts	The whole thing :)

Jul 6, 2014 10:06am David Feugey (2125) 2709 posts	Just some thoughts on preemptive multitasking: Why not just something between the two? > An interrupt that stops the current task after some delay, and nothing else. The delay: you choose. A vsync delay for a graphics workstation, a very small delay for an RT system, a very long delay for CPU-intensive systems, no delay for classic cooperative multitasking. Of course, you can change it ‘on the fly’. When you switch from one task to another, you check whether the next task allows (or, better, does not disallow) preemption, then you set up a timer. Time for each task could be time_of_cycle/total_number_of_wimp_tasks, with the possibility to add slots of time for intensive tasks. If a task hands control back to the system earlier than expected, very cool: the complete cycle will be shortened. If some system tasks (interrupts, etc.) take too much time, some applications would need to wait for the next cycle (to protect the vsync delay on systems where the time of the complete cycle is the delay between two (or three for dual-buffering configurations) graphics frames). We would lose only a little time (one slot max.) at the end of the cycle, or a vsync limit, or nothing, if you choose not to stick to the original cycle limit. Of course this option could be switched on and off ‘on the fly’ (a game could force it, for example with a run_sync command in the obey file, which would set the duration of the cycle to vsync and force the tasks not to run once vsync is done). Benefits/problems:
 1/ Applications will not be called more often depending on priority, but could ask for more time than the others: it’s better for power management. If you really want an RT system, the time of the complete cycle needs to be reduced.
 2/ With the possibility to force tasks to be completed before a screen sync, no vsync problem anymore. Potentially problematic with multiscreen configurations.
 3/ A task that hangs when interrupted could continue to work in cooperative mode.
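
The slot arithmetic in this post can be sketched in a few lines of Python. The names `slot_lengths` and `run_cycle` are hypothetical, and so is one reading of the proposal: the cycle is divided by the total number of slots requested (one per ordinary task, more for intensive ones), not by the raw task count. A task that yields early shortens the cycle; a task that overruns is cut off at its budget.

```python
def slot_lengths(cycle_us, slots_wanted):
    """Split one cycle into per-task budgets.

    slots_wanted maps task name -> slots requested (1 = ordinary task).
    """
    total_slots = sum(slots_wanted.values())
    slot_us = cycle_us // total_slots
    return {name: slot_us * n for name, n in slots_wanted.items()}


def run_cycle(budgets, used_us):
    """Return the real length of one cycle.

    used_us maps task name -> time the task wanted; a missing entry means
    the task ran for its whole budget. Nothing runs past its budget.
    """
    elapsed = 0
    for name, budget in budgets.items():
        elapsed += min(used_us.get(name, budget), budget)
    return elapsed


# A 20 ms cycle (50Hz vsync) shared by two ordinary tasks and one
# intensive task that asked for a double slot.
budgets = slot_lengths(20000, {"Edit": 1, "Draw": 1, "Render": 2})
cycle = run_cycle(budgets, {"Edit": 1200, "Render": 50000})
```

Here `Edit` yields after 1.2 ms and `Render` is pre-empted at its 10 ms budget, so the cycle finishes early instead of overrunning.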

Jul 6, 2014 10:11am David Feugey (2125) 2709 posts	Another thing: we have preemptive multitasking inside taskwindows. It would be great to have it from CLI too, so tools such as webjames could be adapted to work in CLI mode. Not something complex, but with a module to manage tasks, as monotask CLI sessions.

Jul 6, 2014 2:28pm Rick Murray (539) 13850 posts	Why not just something between the two? > An interrupt that stops the current task after some delay, and nothing else. It exists. Look at the Wimp2 module by Niall Douglas (not 32bit). Pay attention to the source code. You can’t just give and take time slots from a task as the Wimp expects to be able to provide application feedback via the Wimp_Poll[Idle] mechanism, and the task expects that when a poll returns, the Wimp has something for us. Wimp2 needed to ‘poll’ on behalf of the task, and fudge things so that it could manage its own list of events for the application to act as a go-between for the Wimp and the application. There are, of course, a number of issues involved here. An application knows that it has sole control of the high level parts of the system between calls to Wimp_Poll[Idle]. Thus, it is fairly free to do all sorts of things provided that it restores the expected state afterwards. VFP? Sprite redirection? Messing with environment? File activities? If you kick out an app in the middle of doing something, in most cases it will be okay, but not always. And, well, the one app I remember taking forever on my A5000 and RiscPC is the one app we absolutely cannot touch. !Printers. As for the planned ideas – Wimp2 already did a lot of that. Applications will not be called more often depending on priority, but could ask for more time than the others: it’s better for power management. If you really wan’t a RT system, the time of the complete cycle needs to be reduced. It would be nice if there was a way to tell the Wimp to ‘favour’ a specific application. When !UnTarBZ2 is running, it does so in a taskwindow. It would be great if I could say “concentrate on this”. Might make it take only half an eternity to unpack the sources. ;-) Otherwise – I don’t really see a problem here that can’t be resolved by application writers making better use of Wimp_PollIdle and the option to mask out null polls. 
Together, these can instruct the Wimp not to bother even switching to a task at all until there is something for it to do. Alternatively, if you need null polls (say, you check the state of something in the background that isn’t available as a message or poll event), you could PollIdle with a 25cs timeout. Four checks a second should be responsive enough, no? 2/ With possibility to force tasks to be completed before a screen sync, no vsync problem anymore. Potentially problematic with multiscreen configuration. How does one “force” a task to have done something in an arbitrary amount of time? Refresh rates vary, and can change depending on display mode; on some hardware it is entirely bogus. For example, my DisplayManager thinks my refresh should be 67Hz. My monitor says it is 75Hz. The last time I measured VSync, it appeared to be running at 50Hz; on the Pi there is a canyon-sized disconnect between video as seen by RISC OS and video as implemented in hardware. Search the forums, you’ll see. [try from here: https://www.riscosopen.org/forum/forums/5/topics/2207?page=2#posts-27657 ] While there is no doubt that the Wimp2 module was a great proof of concept, the complaint that I have with it is that not only is it tied extremely closely to old systems (IOC timer, for instance) but at the time Niall had “issues” with anything newer than RISC OS 2. This means that porting to newer systems may be ‘interesting’, and you will find delights such as this: `\ Apologies to RO3 owners, but the message list was always \ pointless. MOV R0,#200 LDR R1,finalisetask SWI "XWimp_Initialise"` That’s actually painful to read. :-) It would be great to have it from CLI too, Err… A TaskWindow is a multitasking CLI.
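
To put rough numbers on the Wimp_PollIdle suggestion above, here is a small Python model. Two things are real: Wimp_PollIdle takes an earliest return time for null events, and bit 0 of the poll mask suppresses null events entirely. The rest is an assumption made for illustration, in particular that the Wimp would otherwise offer one null event per centisecond. `count_null_events` is a hypothetical name.

```python
def count_null_events(run_cs, idle_timeout_cs=None, mask_null=False):
    """Count null events a task receives over run_cs centiseconds.

    idle_timeout_cs=None models plain Wimp_Poll; a number models
    Wimp_PollIdle called each time with (now + idle_timeout_cs).
    mask_null=True models setting bit 0 of the poll mask.
    """
    if mask_null:
        return 0
    delivered = 0
    earliest = 0
    for now in range(run_cs):
        if now >= earliest:
            delivered += 1
            if idle_timeout_cs is not None:
                earliest = now + idle_timeout_cs
    return delivered


per_second_plain = count_null_events(100)
per_second_idle = count_null_events(100, idle_timeout_cs=25)
per_second_masked = count_null_events(100, mask_null=True)
```

Under these assumptions the task is switched in 100 times a second with plain polling, four times with a 25cs timeout, and never for null events when they are masked.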

Jul 6, 2014 3:36pm David Feugey (2125) 2709 posts	As for the planned ideas – Wimp2 already did a lot of that. Yep, I know. I don’t really see a problem here that can’t be resolved by application writers making better use of Wimp_PollIdle Yes, but the idea is to keep cooperation AND to force applications to do what people forget to make them do. How does one “force” a task to have done something in an arbitrary amount of time? By giving it only a specific time to run :) (of course with a way for the developer to know how much time is available before the next interruption). A sort of time-constrained cooperative multitasking (OK, not very English). It would be great to have it from CLI too I mean, to make it available without the Wimp (for server environments).

Jul 6, 2014 3:38pm David Feugey (2125) 2709 posts	Note: if all tasks are done before the end of one cycle, then you could choose to 1/ start a new cycle 2/ wait for the next one (and so, you’ll have a big slot to sleep). Slots could also be ‘sleep’ ones, to save battery.

Jul 6, 2014 3:43pm David Feugey (2125) 2709 posts	I’m not on OS things, but technically, it’s now simpler to do that than with Wimp2, since we have a generic microtimer that generates interrupts. The only problem is to interrupt the tasks only when non system code is executed (ie wimp calls). I’m not sure if it’s easy to implement a timer that will act after a certain amount of time AND only when pure application code is executing. IMHO, that’s the main difficulty. We can assume that all the system code is, or will be, optimized to be cooperative and non blocking. The problem is to find a solution for people who don’t write good application code. Translation: a solution for not very cooperative applications :)

Jul 6, 2014 4:13pm Steve Pampling (1551) 8172 posts	We can assume that all the system code is, or will be, optimized to be cooperative and non blocking. Can you? Really?

Jul 6, 2014 7:56pm David Feugey (2125) 2709 posts	I can’t, but we can assume that system code should be (or will be) reliable, no? The other option is to put everything under the governance of the scheduler and the framework(s). But then we’re probably not talking about RISC OS anymore. IMHO RISC OS is more a set of services than a set of rules. But I’m really not on system things :)

Jul 6, 2014 8:28pm Steve Pampling (1551) 8172 posts	I can’t, but we can assume That was the bit I was talking about. Assuming. There are no code pixies.

Jul 6, 2014 9:21pm Rick Murray (539) 13850 posts	There are no code pixies. Awww <beeeeep!> You tell me this after I have wasted my life searching for the code pixies at the bottom of my monitor?!?!?! On a more serious note – one might like to introduce David to the idea of calling the filesystem in a timed CallAfter. Once he gets his head around the “FileCore in use” and why it is happening, it might help knock out some of those assumptions. There are some dusty unkempt dark alleyways in the kernel where even the pimps and pushers fear to tread. [an appropriate soundtrack]

Jul 6, 2014 9:29pm David Feugey (2125) 2709 posts	Yep, but not everything can be put under the scheduler and memory protection. A question of perimeter… and performance too. And of course the big question: what is doable under RISC OS? Preemptive multitasking just for applications and memory protection just to keep the system safe would be much better than what we have today, no? I’m not against big plans, if they have chance to become a reality. But to be honest, traditional multithreading and preemptive multitasking is perhaps not the right solution for RISC OS. Loss of performance can be very important and power management is really a problem (see all the efforts and tweaks in the Linux kernel). Extending cooperative multitasking (as in Wimp2) could help old applications run on the new scheduler… or not :). Those were just my thoughts on the subject.

Jul 6, 2014 9:39pm David Feugey (2125) 2709 posts	On a more serious note – one might like to introduce David to the idea of calling the filesystem in a timed CallAfter. Once he gets his head around the “FileCore in use” and why it is happening, it might help knock out some of those assumptions. There are some dusty unkempt dark alleyways in the kernel where even the pimps and pushers fear to tread. That’s why I suggest not interrupting the processor while it is running system code. A sort of BreakAfter xx cs, but not for the system, modules, etc. Just for applications. Like Wimp2 in fact, but in a cleaner way since we now have access to source code, and with more options (for example the cycle=vsync possibility). Is there a way to know what the processor did before an interrupt? The idea is to set up a timer (easily doable now, since it’s inside the new 32bit HAL), and to force a Wimp_Poll[Idle], only if application code was running (not an easy part, since processor mode will be different from user mode). In fact, my suggestion is to interrupt an application, but hand control to another one (Call_Wimp_Poll_After :) ).
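
On the question of knowing what the processor was doing before an interrupt: on 32-bit ARM, the status word saved on interrupt entry records the interrupted mode in its bottom five bits. A minimal Python sketch of that test follows. The mode encodings are the architectural ones; `interrupted_mode` and `may_preempt` are hypothetical names, and how a handler would get hold of the saved PSR on RISC OS is not shown.

```python
MODE_MASK = 0x1F  # M[4:0] field of a 32-bit ARM program status register

MODE_NAMES = {
    0x10: "USR",
    0x11: "FIQ",
    0x12: "IRQ",
    0x13: "SVC",
    0x17: "ABT",
    0x1B: "UND",
    0x1F: "SYS",
}


def interrupted_mode(saved_psr):
    """Name the mode that was running when the interrupt arrived."""
    return MODE_NAMES.get(saved_psr & MODE_MASK, "unknown")


def may_preempt(saved_psr):
    """True only if plain user-mode (application) code was interrupted."""
    return (saved_psr & MODE_MASK) == 0x10
```

This only answers "was user-mode code running?". It says nothing about whether the application was at a point where switching away is safe.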

Jul 6, 2014 11:00pm Rick Murray (539) 13850 posts	what is doable under RISC OS? Not a lot, for we have two main problems. The first is backwards compatibility. Our developer base is small enough that it is unfair on them and us to implement a change for the sake of change. The current multitasking is not perfect, but it has worked for us for a quarter century. The second is a lack of human resources, how many people are developing RISC OS at the moment? Would this be the best use of their time? If you believe so, then put up a bounty proposal and see if anybody bites. just to keep the system safe would be much better than what we have today, no? Personally, I’m not so concerned about application code. The one that frightens me is that system resources (modules) operate in the same level as the kernel and important things like the file system. If I had silly amounts of money to spaff on pet projects, I’d want the RMA to operate in SYS mode, with SVC reserved for core OS functions. The entire RMA would be read-only. Dynamic Areas would be read only except to whatever created them. And so on. Somewhere, down towards the end of the list, we’d have application paging. It shouldn’t be too hard to sort that. If the entire application space is kept locked off (no read/write) except for the current application, this could be handled while paging in the task. Just some more bits of the lookup tables fudge. ;-) I’m not against big plans, if they have chance to become a reality. That’s a pretty big “if”. When I was younger, I had planet-sized plans. I now have much more realistic ambitions. But to be honest, traditional multithreading and preemptive multitasking is perhaps not the right solution for RISC OS. Well, my Android phone stiffed up last week to the extent that I had to pop out the battery to recover it (makes me wonder what happens when everybody hard wires their batteries inside). 
Everything can crash… Loss of performance can be very important and power management is really a problem Isn’t the typical solution to a loss of power to get a faster processor? Look at every Windows platform ever. On a less facetious note, do you think RISC OS is lacking performance? For some reason file operations are very slow, but otherwise it is remarkably quick considering what the hardware is. My Android phone can have some really noticable pauses, and while a sexy-awesome graphical interface with animated backdrops is way more complicated than RISC OS, it is running on a dual-core processor clocking twice what the Pi does. Surely it can be more fluid than that. If I want to take a photo, I don’t want to wait fifteen seconds for the camera app to start, and another ten seconds before it is capable of responding to input! (see all the efforts and tweaks in the Linux kernel). Is this the Linux kernel in general? The one that has to take into account dozens of different architectures all with their own quirks, and a massive acceptance on battery powered devices? (for example the cycle=vsync possibility). Why not just use the centisecond tick? Think of it as a 100Hz refresh. ;-) Is there a way to know what the processor did before an interupt? I guess you could work out what registers/PC are stored? Go to SVC mode, jump to IRQ mode, read R13 and R14, return to SVC mode, then poke around what those point to? only if application code was running (not an easy part, since processor mode will be different from user mode). Actually, that is “doable”. What I do in a program that wants to be able to do something on a regular tick without reentrancy problems is as follows: I have a CallAfter that calls my code after a specified number of centiseconds have elapsed. The CallAfter handler then schedules a CallBack. The CallBack handler does the work, and then sets up the next CallAfter. 
The first thing to note is that CallEvery/CallAfter run off the system tick so will probably be entered in IRQ mode, and if not, in SVC mode but with the system “busy”. Essentially this code is barging in. A CallBack, by contrast, is handled by RISC OS when the system is no longer busy (supervisor stack is empty) and the system is about to drop back to USR mode for application benefit. This means it is a good time to request a bit of attention from the machine. What we really need is a timed CallBack. ;-) You will notice, by the way, that I repeatedly schedule CallAfter and do not use CallEvery. This is because you should only set up one callback, and we have no actual guarantee of when RISC OS will process the callback. “When it is no longer busy” is fairly vague. Why don’t you have a crack at porting the Wimp2 module? See if the preemption idea stands up? At a rough look, I think it would involve the following: rip out all the IOC code and replace it with calls to the HAL timer rip out the OS_File code – we can worry about getting that working later you may need to save/restore VFP contexts? Check the Wimp sources. please, for the love of god, stop it downgrading everything to a RISC OS 2 application – the message list exists for a reason! the usual 32bit conversion, though careful attention must be paid to CPSR and SPSR. We can’t assume that anything at all is safe to be modified.
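
The CallAfter → CallBack → CallAfter chain described above can be modelled in Python to show its timing. On RISC OS the real calls would be OS_CallAfter and OS_AddCallBack. Everything here is a hypothetical simulation (`simulate` and the `busy` set included): it steps through centisecond ticks and treats "busy" as "the callback cannot be delivered yet".

```python
def simulate(total_cs, interval_cs, busy):
    """Return the ticks at which the real work ran.

    busy is the set of ticks during which the system is busy, so a
    requested callback has to wait.
    """
    next_timer = interval_cs     # one-shot CallAfter, armed at start
    callback_pending = False
    work_ticks = []
    for now in range(total_cs):
        # CallAfter handler: runs in interrupt context, so it does the
        # minimum, which is to request a callback.
        if next_timer is not None and now >= next_timer:
            callback_pending = True
            next_timer = None    # nothing armed until the work re-arms it
        # CallBack: delivered only once the system is no longer busy.
        if callback_pending and now not in busy:
            callback_pending = False
            work_ticks.append(now)            # safe place to do the work
            next_timer = now + interval_cs    # schedule the next CallAfter
    return work_ticks


quiet = simulate(35, 10, busy=set())
delayed = simulate(60, 10, busy=set(range(8, 25)))
```

Because the timer is re-armed only from the callback, there is never more than one callback outstanding: when the system is busy from tick 8 to 24, the work simply slips to tick 25 and the later runs follow at ten-tick intervals from there.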

Jul 7, 2014 5:45am David Feugey (2125) 2709 posts	The first is backwards compatibility I agree. I now have much more realistic ambitions. Same here. Everything can crash… Exactly the same here :) On a less facetious note, do you think RISC OS is lacking performance? No, but we should preserve that. Multithreading and massive preemptive multitasking + memory protection have big impacts on performance. Why not just use the centisecond tick? Think of it as a 100Hz refresh. ;-) Why not. My idea was to say: you choose. Centisecond for power, less for pure reactivity, same as sync for games or graphics, etc. What we really need is a timed CallBack. ;-) Yep. Why don’t you have a crack at porting the Wimp2 module? See if the preemption idea stands up? At a rough look, I think it would involve the following: Two reasons: 1/ it’s GPL code. 2/ most of the code is tweaks for the timer, ROS2 applications and the fact that the source code of the OS cannot be modified. And of course, the main reason is that I have only coded user apps in ANSI C, so it would take me a lot of time to make something in ASM in the system space.

Jul 7, 2014 5:49am David Feugey (2125) 2709 posts	BTW, for your memory protection thoughts, it seems to be doable too. Just a few big zones, with change of context (writable or not). It’s simpler than defining new zones for each context change (zone specific to the running application or module).

Sep 3, 2014 12:51pm Jeffrey Lee (213) 6048 posts	Thinking about things recently, I wonder if there would be any merit to splitting the kernel into multiple modules. Make the core kernel be much more like a microkernel which handles the bare minimum needed to start the rest of the OS (interrupts, memory, module chain, vectors, SWIs, etc.) and move all the rest (CLI, system variables, keyboard/mouse buffers, CMOS, VDU, etc.) into one or more external modules. Originally I was thinking of using this as a convenient way of solving the keyboard scan/CMOS reset and CMOS storage problems that modern machines are facing. By getting rid of any CMOS-reliant code from the core kernel it would allow us to separate the modules into two main groups. First the modules that don’t use CMOS will be initialised, then the keyboard scan + CMOS reset can be performed, and then the OS can go on to initialise the rest of the modules. However I’ve realised that this approach would also be useful for if/when we start making RISC OS fully multithreaded. If we go with the approach of adding a flag to the module header to indicate whether the module is thread-safe, we can start off by concentrating on making the core microkernel thread safe (and adding the thread management calls), and then simply mark the ancillary kernel modules as thread-unsafe. Then those modules will automatically fall back to using the global mutex to enforce single-threaded execution, allowing us to get a multi-threaded RISC OS up and running quicker than if we had to wait until the entire kernel was thread-safe (or if we had to add some nasty hacks to manually claim and release the global mutex on entry/exit to the unsafe areas)
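
The fallback described in this post (a per-module thread-safe flag, with unflagged modules serialised behind one global mutex) can be sketched in Python. `Module`, `call_module` and `GLOBAL_MUTEX` are hypothetical names for illustration. A re-entrant lock is an added assumption, chosen so that one unsafe module can call another on the same thread without deadlocking.

```python
import threading

# Stand-in for the proposed OS-wide mutex (re-entrant by assumption).
GLOBAL_MUTEX = threading.RLock()


class Module:
    def __init__(self, name, handler, thread_safe=False):
        self.name = name
        self.handler = handler
        self.thread_safe = thread_safe  # models the proposed header flag


def call_module(module, *args):
    """Dispatch a call, serialising it unless the module is flagged safe."""
    if module.thread_safe:
        return module.handler(*args)
    with GLOBAL_MUTEX:
        return module.handler(*args)


# A legacy module with an unprotected read-modify-write on shared state.
counter = {"n": 0}


def legacy_bump():
    value = counter["n"]
    counter["n"] = value + 1


legacy = Module("Legacy", legacy_bump)   # flag left clear
workers = [
    threading.Thread(target=lambda: [call_module(legacy) for _ in range(50)])
    for _ in range(4)
]
for worker in workers:
    worker.start()
for worker in workers:
    worker.join()
```

The legacy handler needs no changes: the dispatcher gives it single-threaded execution, while a module flagged as safe bypasses the lock entirely.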

Sep 3, 2014 1:29pm rob andrews (112) 200 posts	This is a great idea, can’t wait to see the results. Should make the OS more stable too.

Sep 3, 2014 2:30pm David Feugey (2125) 2709 posts	That’s a very good idea. It’ll also be easier to upgrade or change components in the kernel (with modules and not vectors).