C is not a low level language (any more)
nemo (145) 2556 posts |
Define ‘running’

It’s clear that desktop protocols require strict serialisation. By definition that means that the Wimp dispatching a double-click event to the Filer has to result in a static ordering of Wimp_Poll message passing between all tasks until the message queue is exhausted. If there is pre-emption, then that message-passing chain must block until the pre-empted task is ready. That would be very poor design. Consequently the GUI thread is effectively a single thread.

It could be argued that subsets of serial tasks could be formed from the declared message lists – so that a task that does not respond to DataLoad/DataOpen (or any of the other broadcast invitations) could be ‘running’ in a parallel Wimp thread – but ‘running’ is misleading because of the single-threaded nature of most of the OS.

Redraw must be single-threaded because of the lack of graphical context (and the non-centralised nature of the graphics stack). Redraw can be deferred, but the visual effect will be poor (though passive surface capture could mitigate that to some extent… for reveals, though not for scrolling). The Wimp’s various window stacks cannot be mutated in multiple threads, so Open/Close (and Redraw/Update) poll events must be single-threaded. Keypress events are single-threaded and serial for the same reason as DataLoad/DataOpen. LoseCaret/GainCaret and PointerLeave/PointerEnter are obviously serial (e.g. an app must process LoseCaret before another app receives GainCaret). By definition clicks/scroll requests are single-threaded (unless you have two mice? Joke.) This is all painfully obvious.

The only problem left to solve is the number-crunching/recalculating problem caused by the conflation of UI handling with actual work. Our programming model has always been one of doing work in the UI thread… because on the whole there isn’t another thread. This is bad design, and is specifically NOT how OSes usually work. Even Psion’s EPOC split applications into the UI and the Engine. Windows has separate UI and work threads.

There’s no (reasonable) way of making the WindowManager’s UI protocols multithreaded in a backwards-compatible way. However, there’s plenty of scope for adopting a Web Worker style model that an application can use once it has worked out what the UI protocols require it to do. I’ve casually suggested the WindowManager ought to be split into two or three parts, with TaskManager gaining all the task-switching and message-passing responsibility, and WindowManager becoming purely a GUI. TaskManager could then gain additional thread/worker capability which would not be confused by being lumped in with “the Wimp” which, as I’ve explained, can’t itself be multithreaded. This is analogous to the experimentation we did Some Time Ago with a modified TaskWindow to provide multi-threading with a deliberately constrained API for the subordinate threads. I have some of Dan’s TaskWindows around here somewhere.

Passing spreadsheet recalculation or ChangeFSI processing or whatever out of the UI thread allows the system to decide how best to implement it – a single-core system would immediately call the ‘worker’, producing the same serial work-in-UI-thread behaviour we have now. A dual-core might start the worker on the other (asymmetric) core and return immediately to the UI thread (which will subsequently be returned to with a new WorkComplete event, for example). A multi-core might subdivide the work among multiple worker threads if it’s array-form processing.
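To make the shape of that concrete, here is a minimal sketch in C of the dispatch-and-WorkComplete model being described – purely illustrative, every name is invented, and only the single-core fallback (where the ‘worker’ runs inline) is shown:

#include <stdio.h>

/* Illustrative only: the UI thread hands a self-contained unit of work to
   a dispatcher and later sees a WorkComplete-style event.  On a single
   core the dispatcher simply runs the work inline, so the application is
   written against the asynchronous model even though execution is serial. */

typedef void (*worker_fn)(void *arg);

struct work_complete { int handle; void *arg; };

static struct work_complete pending_event;
static int have_event = 0;

static int dispatch_work(worker_fn fn, void *arg, int handle)
{
    /* Single-core fallback: run the worker immediately... */
    fn(arg);
    /* ...then queue the completion event the UI loop will see later.
       A multi-core implementation would instead return at once and
       deliver this event when the other core finishes. */
    pending_event.handle = handle;
    pending_event.arg = arg;
    have_event = 1;
    return 0;
}

static void recalc_sheet(void *arg)
{
    int *cells = arg;
    *cells += 1;            /* stand-in for a lengthy recalculation */
}

int main(void)
{
    int sheet = 41;
    dispatch_work(recalc_sheet, &sheet, 1);

    /* The UI loop polls as normal and acts on WorkComplete when it arrives. */
    if (have_event)
        printf("work %d complete, sheet = %d\n", pending_event.handle, sheet);
    return 0;
}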
But if we start thinking we can deliver DataLoad to one application while another is processing a Click, then we are inviting race conditions in every app that responds to messages, because of the sudden desynchronisation of what has always been completely synchronous. Most applications’ implementations of the desktop save protocol work mainly by coincidence. How many actually store an arbitrary number of my_refs on the off-chance that two transfers might be happening simultaneously? None. Try to multithread the UI thread and you break everything, really.
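For anyone unfamiliar with the my_ref point: the save protocol links replies to requests via my_ref/your_ref, so coping with overlapping transfers means remembering every outstanding reference rather than just the last one. A rough sketch in C (simplified, with invented field names – this is not the real message block layout):

#define MAX_TRANSFERS 8

struct pending_save {
    int my_ref;     /* my_ref from the Message_DataSave we sent          */
    int in_use;
    /* ...plus whatever state the transfer needs (filename, data, ...)   */
};

static struct pending_save saves[MAX_TRANSFERS];

static struct pending_save *save_for_reply(int your_ref)
{
    /* Match a reply (e.g. Message_DataSaveAck) to the transfer it belongs
       to by comparing its your_ref against our recorded my_refs. */
    for (int i = 0; i < MAX_TRANSFERS; i++)
        if (saves[i].in_use && saves[i].my_ref == your_ref)
            return &saves[i];
    return 0;       /* not one of ours, or already finished */
}
 |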
nemo (145) 2556 posts |
Those who have spent any time playing with Niall Douglas’s Wimp2 will recall that it could only perform pre-emption by having a separate UI thread for every application. Pre-emption is not multithreading, but in this respect it is analogous. |
Jeffrey Lee (213) 6048 posts |
sounds like you’re saying that only one task should be running at once, which sounds a bit barmy to me

Running = a processor core executing code at some given point in time. So, for example, you could have two tasks running at once if code from Edit was running on core 1 at the same time as code from NetSurf was running on core 2 – e.g. the user editing some text while NetSurf is parsing and performing layout calculations for a page it’s downloading.

I’m not disputing your statements that access to the Wimp needs to be carefully controlled (e.g. serialised). I’m just making sure you’re not saying that code which doesn’t interact with the Wimp also needs to be restricted, just because it belongs to a different task from the one that is currently running. |
nemo (145) 2556 posts |
No, I’m specifically saying that code could be running in parallel, but only one of those parallel threads can be the Wimp’s UI. The others would be ‘workers’, with a very restricted API available to them. That’s why I think The Tube is a useful metaphor, if not an efficient model. |
Rick Murray (539) 13851 posts |
Wow. What a long-winded way to eventually agree with me that the current behaviour of the entire GUI does not lend itself to spreading the current work around the available cores. But then, here we are.

Technically, it isn’t difficult – app #1 on core #0, app #2 on core #1, and so on as required. Except that, as you point out, the messaging, the graphics context, all of that will likely go wrong. Not to mention the lack of reentrancy in FileCore, which I’d imagine would be a pretty big sticking point for tasks using file I/O.

Instead, it will require specially written tasks which have the ability to say “do this”, where “this” is a restricted API that can be pushed to another core. And, as you say, the Tube protocol is extremely pertinent because it permits the “host” to handle all of the graphics, hardware, and file stuff. It may stall, pushing everything through the main OS core, but it’s workable. The alternative, not so much (without a fundamental rewrite of huge chunks of the OS).
It was my understanding that it put itself in between the application and the Wimp, so that the application would think it was running normally doing stuff, the Wimp would think that the task was polling regularly, and Wimp2 would be sort of fudging between the two; time-slicing the application, and queuing some messages while throwing others away (and downgrading everything to the sophistication of the RISC OS 2 Wimp “because”).
That would be nice, but can it be achieved easily? Actually – perhaps yes. Imagine if the Wimp had a SWI to say “this is a lengthy bit of code coming up”. On a single-core machine it would have no effect (the code would run as normal), but on a multi-core machine the Wimp could use it as an indication to farm the entire task out to another core (if available), then carry on running other applications. There would be caveats: all file access would need to pass through a Tube-like gateway, there would be no GUI stuff, and clearly Wimp events would either be queued or discarded. However, Wimp2 did prove that this was possible even for existing applications – I used to use it for ChangeFSI, which was pretty slow on my A5000. The “other core” mechanism could be stopped either by calling the same SWI with a different reason code (to mean ‘done now’), or by simply calling Wimp_Poll (which switches the task out anyway). And if this does happen, please let’s call the interface glue module “Tube”, for hysterical raisins.
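A minimal sketch, assuming the DDE’s _kernel_swi veneer, of what bracketing the work with such a SWI might look like from C – the SWI name, number and reason codes are entirely invented for illustration:

#include "kernel.h"

#define Wimp_LengthyOperation 0x400F8     /* invented SWI number */

static void lengthy_begin(void)
{
    _kernel_swi_regs r;
    r.r[0] = 0;                           /* reason 0: lengthy work starting   */
    _kernel_swi(Wimp_LengthyOperation, &r, &r);
}

static void lengthy_end(void)
{
    _kernel_swi_regs r;
    r.r[0] = 1;                           /* reason 1: done, rejoin the UI core */
    _kernel_swi(Wimp_LengthyOperation, &r, &r);
}

/* Usage: bracket the number-crunching, then return to Wimp_Poll as usual.
       lengthy_begin();
       process_image();                   -- ChangeFSI-style pixel crunching
       lengthy_end();
   On a single-core machine both calls would be no-ops; on multi-core the
   Wimp could treat them as a hint to migrate the task to another core. */
 |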
nemo (145) 2556 posts |
I was neither agreeing nor disagreeing with you; I was clarifying terminology with Jeffrey.
Your understanding is incomplete. Don’t conflate Andrew Teirney’s WimpPatch with Wimp2 itself; they are separate parts of the whole. WimpPatch attempts to provide the message-handling thread implementation for existing tasks. It is not entirely successful under all circumstances. Its purpose is to respond to messages even when the task has been pre-empted while busy. So yes, it is a separate message-handling thread, as I said – just one that the task itself is unaware of, because it wasn’t written for Wimp2. Code that is written for Wimp2 passes the address of its message handler in R2 during Wimp2_Initialise – that is the code that will get called in another thread even if the main thread has been pre-empted in the middle of doing something. It’s not “a fancy veneer”.
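Going only by that description (handler address in R2 at Wimp2_Initialise), a registration might look roughly like this in C – the SWI number is a placeholder, since I don’t have the real Wimp2 allocation to hand, and any other registers the call needs are omitted:

#include "kernel.h"

#define Wimp2_Initialise 0x58000          /* placeholder SWI number */

static void message_handler(void)
{
    /* Called on the message-handling thread, even while the main
       thread has been pre-empted mid-work. */
}

static void register_with_wimp2(void)
{
    _kernel_swi_regs r = {0};
    r.r[2] = (int)message_handler;        /* R2 = address of message handler */
    _kernel_swi(Wimp2_Initialise, &r, &r);
}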
No. The model would be:
The point being that what we think of as the Wimp program would become the Wimp UI thread, and actual work would (potentially) be carried out on another core, perhaps on multiple cores, ensuring that the UI remains responsive even while ChangeFSI is crunching pixels, for example. Note that this model cannot be imposed on an existing application, it’s a different programming model and must be coded appropriately. Quite how it could be implemented in BASIC (as opposed to MC-in-BASIC) I don’t know… but I know a guy who probably does.
We all are.

1 Or perhaps it would only return ‘finished’ if it completed before a pre-emption time-slice |
nemo (145) 2556 posts |
Note that though this form of thread-pooling and work subdivision is neither novel nor particularly complicated, I have no idea whether it can reasonably be achieved with RISC OS’s Sideways-ROM-like task paging. Do separate cores have their own memory tables? Do hyper-threaded cores have the equivalent? Can this only be achieved with Dynamic Areas that are always available? I’m outside my area of competence and can only formulate the questions. Someone with a lot more knowledge will be along in a moment…

I’m rather more fond of Dynamic Areas than some RO5 enthusiasts, it seems, but I think the main complaint about DAs is due to address starvation, which would (I think we’ve agreed) be mitigated by Relocatable Dynamic Areas. But that’s a detail. |
Colin Ferris (399) 1818 posts |
Has anyone had a go at 32-bitting Wimp2? :-) (Has the login page changed?) (OTT: Is there an easy way of disassembling an Internet/DLL module? Like dropping it in !Zap :-) 486 mode?) |
Jeffrey Lee (213) 6048 posts |
Yes.
I don’t think there’s a direct equivalent to hyper-threading on ARM. (If there is, then I suspect the answer would be “yes”.)

On ARMv6+ we do get two page table pointers per core (one for the low logical address space, one for the high, with the boundary address customisable), which fits in very nicely with RISC OS’s memory map, and is also a sensible way of handling things for multi-core. The OS currently doesn’t make use of this feature, but once it does it should make task switching an order of magnitude faster than the current lazy task swapping system.
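For concreteness – assuming what’s being described is the TTBCR.N split between TTBR0 and TTBR1, which is my inference rather than something stated here – with TTBCR.N = n the low (TTBR0) region covers 2^(32−n) bytes and its translation table shrinks to 2^(14−n) bytes: n = 0 gives the full 4GB with a 16KB table, n = 3 gives 512MB with a 2KB table, and n = 7 gives the minimum 32MB with a 128-byte table. |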
nemo (145) 2556 posts |
I couldn’t see how it could do anything useful otherwise, but it’s best to check with an expert.
Does that imply you’d need a complete page table per task, if you’re just swapping pointers? Or would you just cache a number of them for the largest/busiest tasks?
Colin asked about 32-bitting Wimp2:
Great minds and fools. I’ve never done the analysis of which of its routines actually need the flag preservation that Niall has thrown at everything. I’m not sure how I feel about Wimp2. It works, insofar as it does what it intends to. It’s a shame the thread handling wasn’t completed. But I don’t think anyone ever wrote anything for it (other than using Andrew’s WimpPatch). By comparison, Alexander Thoukydides’ Virtualise was a much more accomplished (and totally different) thing. Oodles of inexpensive memory has rendered Virtualise obsolete, just as Moore’s Law has done for Wimp2/WimpPatch. However, Wimp2 would be much easier to modernise. |
Jeffrey Lee (213) 6048 posts |
we do get two page table pointers per core

You do need a page table per task, but the per-task page tables don’t need to cover the entire 4GB logical memory map. They only need to be large enough to reach the boundary address between the upper and lower page tables. So for a 512MB wimpslot that would be 2KB of L1PT, and at minimum one page of L2PT (since we’ve got some pesky global pages from &0 to &8000). If we wanted to, we could save some memory by adjusting the boundary address on a per-task basis, right down to 32MB, although it’s probably only worth doing that once we start offering larger (1GB or 2GB) wimpslot sizes.

Each page table is associated with an “address space identifier”, which is where most of the performance boost comes from – cache and TLB entries are tagged by ASID, avoiding the need to flush them when swapping tasks. But this also means there’s a hard limit of 256 ASIDs. So once we hit the limit of 256 active tasks we’d need to start swapping ASIDs between tasks, which would slow things down a bit because cache/TLB maintenance would be required. No doubt there are a number of different algorithms which could be tried in order to minimise the cost associated with this (e.g. do we swap ASIDs on an LRU basis in the hope that swapping will be infrequent, or do we try to minimise the cost of each swap by looking for ASIDs with the least amount of memory associated with them, or by using lazy task swapping?)
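To illustrate why the ASID tagging is the win, here is a rough sketch in C of a per-task switch on that model – the CP15 registers (TTBR0, CONTEXTIDR) are real, but the task structure is invented and the synchronisation subtleties around changing ASID and TTBR0 together (reserved-ASID tricks, extra barriers) are glossed over:

#include <stdint.h>

struct task {
    uint32_t l1pt_phys;   /* physical address of this task's L1 page table */
    uint8_t  asid;        /* 0..255 address space identifier for this task */
};

static inline void switch_to(const struct task *t)
{
    uint32_t ttbr0 = t->l1pt_phys;   /* plus cacheability attribute bits in practice */
    uint32_t contextidr = t->asid;   /* ASID lives in bits 0..7 of CONTEXTIDR */

    __asm__ volatile("dsb");
    __asm__ volatile("mcr p15, 0, %0, c13, c0, 1" :: "r"(contextidr));  /* CONTEXTIDR */
    __asm__ volatile("isb");
    __asm__ volatile("mcr p15, 0, %0, c2, c0, 0" :: "r"(ttbr0));        /* TTBR0 */
    __asm__ volatile("isb");

    /* No TLB or cache flush is needed while the ASID is unique to the task;
       only when all 256 ASIDs are in use and one is recycled does a
       TLB invalidate-by-ASID become necessary. */
}
 |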
nemo (145) 2556 posts |
Which is looking cheap these days. |
Steffen Huber (91) 1953 posts |
For those with a deeper interest: Thouky has some RISC OS stuff on GitHub, here is Virtualise: https://github.com/thoukydides/riscos-virtualise |
nemo (145) 2556 posts |
Thanks for that. |