"Let's talk 64bit" presentation at ROUGOL next Monday, 19th August
Charles Ferguson (8243) 427 posts |
Hiya, I’m giving a presentation to ROUGOL at their next meeting, Monday 19th August, to talk about 64 bit RISC OS. I’ll be going through some of the differences that AArch64 has from the 32 bit architecture that we know, the issues that they cause, and some of the ways that they can be addressed. Although I’ll be doing this as a presentation, I’d like to encourage a degree of participation in the discussion. I have some clear ideas in some areas, and in others I’m more hazy – if there are clever people who want to lend their thoughts, please come along, and we can have a (hopefully) useful discussion about this.
Who am I, and why am I qualified to talk about this? I’m a RISC OS developer, who’s worked on RISC OS as a user for many years, before updating and maintaining the operating system for RISCOS Ltd [20]. In recent years, I’ve produced the only other version of RISC OS other than RISC OS Classic – RISC OS Pyromaniac [1]. I have produced online tools [2] to aid developers [3] on other platforms, evangelised regression testing [4] and safe practices which encourage developments of the system. I have produced documentation [5] and tools for creating documentation [6] in a collaborative way. I have made software available publicly for most of my work [7] and continue to release tools [8] and software for RISC OS and other systems [9] through my own websites and GitHub [10]. I tinkered with a few games [11] and lately, I’ve been doing demonstrations of coding for RISC OS as live streams on YouTube [12].
I’m not sure that I’m especially qualified to talk about 64 bit, but I have a little experience. And I know RISC OS quite well. So why not bring your own experience, and we can all learn things together.
[1] https://pyromaniac.riscos.online
[20] https://gerph.org/riscos/index.html, https://www.riscos.com/support/developers/riscos6/core/index.html |
Rick Murray (539) 13840 posts |
Ummmm…. 😂 |
Simon Willcocks (1499) 513 posts |
Thanks for the presentation, it was very interesting. |
Charles Ferguson (8243) 427 posts |
Thanks to everyone who came and contributed. There was a lot of interesting discussion. The slides and video for the presentation can be found here: https://presentation.riscos.online/64bit/ |
Rick Murray (539) 13840 posts |
* Stop writing code in assembler… 👏 |
Jon Abbott (1421) 2651 posts |
If no software is compatible, isn’t it a good opportunity to deprecate all the nasty design decisions? ie drop all SWIs that return flags, drop all SWIs that take flags in address fields, drop OS_CallASWI / R12 as they’re no longer required. Ditch FileCore, add partitioning from the outset, replace BASIC with a modern variant…the list is endless. SWI # in a register is definitely the way to go – I would not put it in the instruction as it mixes data/instructions, causing a cache miss, data cache pollution and potential cache maintenance issues. I’d completely forget about backward software compatibility when reworking the OS, that can be covered in software with legacy hardware/OS/ 26/32 bit via emulation. |
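To make the "SWI number in the instruction" point above concrete, here is a minimal sketch of the two encodings. The helper names are hypothetical; the bit layouts are the standard ARM32 `SWI` (always condition) and AArch64 `SVC` encodings, and the X bit is bit 17 of a RISC OS SWI number. It only illustrates that a handler has to read the instruction word back as data to recover an embedded number, and that `SVC` carries just 16 bits of immediate; it does not represent any particular 64 bit RISC OS design.

```python
# Illustrative only: helper names are hypothetical, not part of any RISC OS API.
X_BIT = 1 << 17          # RISC OS "X" (error-returning) bit in a SWI number
OS_WRITEC = 0x00
XOS_WRITEC = OS_WRITEC | X_BIT


def encode_arm32_swi(swi_number):
    """ARM32 SWI with the 'always' condition: 24-bit number in the low bits."""
    assert 0 <= swi_number < (1 << 24)
    return 0xEF000000 | swi_number


def decode_arm32_swi(instruction):
    """What a 32 bit handler does: load the instruction word as data and mask it."""
    return instruction & 0x00FFFFFF


def encode_aarch64_svc(imm16):
    """AArch64 SVC #imm16: the immediate field is only 16 bits wide."""
    assert 0 <= imm16 < (1 << 16)
    return 0xD4000001 | (imm16 << 5)


def fits_in_svc_immediate(swi_number):
    """True if a SWI number could be carried in the SVC immediate at all."""
    return 0 <= swi_number < (1 << 16)
```

With these definitions, `XOS_WRITEC` (0x20000) round-trips through the ARM32 encoding but does not fit in the `SVC` immediate, which is one practical reason for passing the number in a register instead.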
David J. Ruck (33) 1635 posts |
It is an opportunity to drop a lot of legacy stuff, but on the other hand there are only a few developers left, and the only way you are going to get them to port their software to the new 64 bit RISC OS is to make it as easy as possible. Otherwise the only thing the new OS would run is an emulator, and you can run those on top of anything. As for assembler, you can throw that out the window as it’s completely different, it’s just like the change from 8 bit, you’ll need the equivalent of the !65Host emulator to run it. This will allow the SWI calling conventions to be completely changed on 64 bit. However, from higher level languages it is important that compatibility is maintained. In the same way that RISC OS BASIC emulated the CALL OSBYTE, OSWORD, etc of the BBC’s MOS, RISC OS BASIC 64 needs to emulate the existing SWI calling mechanisms when the SYS keyword is used, and have a new SYS64 introduced for native 64 bit SWIs. Similarly in C, libraries such as OSLib can provide the same interface as legacy SWIs, but translate them to the new 64 bit SWIs. That still leaves fundamental things like making the Wimp pre-emptively multi-task which would inevitably need code changes to applications, but that isn’t 64 bit specific. |
Simon Willcocks (1499) 513 posts |
Not necessarily, I hope. I think combining cooperative multi-tasking for message passing, poll loops, etc. with the ability for other threads to update window contents between window stack adjustments would be a reasonable compromise. A video player, for example, can have a second thread that doesn’t get blocked by polling. That thread decodes the incoming data stream, passes audio to the audio mixer, and displays video to a window using Wimp_UpdateWindow/GetRectangle more or less traditionally. When a Wimp task tries to do anything to affect the window stack, it is paused until all programs updating their windows in the background have called GetRectangle and been given nothing more to do; the stack is then adjusted, and the Wimp tasks are informed as now, with RedrawWindow messages, etc. Existing programs can then execute unchanged, while new programs can make use of the new facilities. |
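The ordering rule described in the post above (stack changes wait until background updaters have finished their rectangles) can be modelled in a few lines. This is a single-threaded toy, not Wimp code: the `WindowStack` class and its method names are invented for illustration, and "pausing" a task is modelled simply as leaving its request pending.

```python
class WindowStack:
    """Toy model: stack adjustments are deferred while any updater is mid-update."""

    def __init__(self, order):
        self.order = list(order)   # current front-to-back window order
        self._updaters = set()     # background threads currently drawing
        self._pending = []         # stack adjustments waiting to be applied

    def begin_update(self, updater):
        """A background thread starts an update loop on its window."""
        self._updaters.add(updater)

    def end_update(self, updater):
        """The updater has been told there are no more rectangles to draw."""
        self._updaters.discard(updater)
        self._apply_pending()

    def request_adjust(self, new_order):
        """A task asks to change the stack; returns True if applied immediately."""
        self._pending.append(list(new_order))
        return self._apply_pending()

    def _apply_pending(self):
        # Nothing is applied while any background update is still in progress.
        if self._updaters or not self._pending:
            return False
        for new_order in self._pending:
            self.order = new_order
        self._pending.clear()
        return True
```

For example, a request made while a hypothetical "video" updater is active returns `False` and leaves `order` unchanged; it takes effect as soon as `end_update("video")` is called.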
David Pilling (8394) 96 posts |
In my stuff all SWI calls are via one C function os_swix (?). You could change the way SWIs work as long as you supplied a piece of glue that mapped new to old. Old apps could exist in a world of limited but familiar capabilities. Given the plan is to break all old apps to some small extent. |
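As a rough sketch of the "piece of glue that mapped new to old" idea: a single entry point takes a legacy SWI number and a register list, and forwards to new-style functions through a table. Everything here is an assumption made for illustration: `legacy_swix`, the `new_*` functions, and the `out` list standing in for the screen are all hypothetical. The two SWI numbers used (OS_WriteC = &00, OS_ReadMonotonicTime = &42) and the X bit are the existing 32 bit values.

```python
import time

X_BIT = 1 << 17  # legacy "return errors" bit; masked off before lookup


# Hypothetical new-style interfaces: plain arguments in, plain values out.
def new_write_char(char_code, out):
    out.append(chr(char_code & 0xFF))


def new_monotonic_centiseconds():
    return int(time.monotonic() * 100)


# Adapters speak the old convention: a list of register values in and out.
def _legacy_write_c(regs, out):
    new_write_char(regs[0], out)
    return regs


def _legacy_read_monotonic_time(regs, out):
    return [new_monotonic_centiseconds()] + regs[1:]


LEGACY_TABLE = {
    0x00: _legacy_write_c,              # OS_WriteC
    0x42: _legacy_read_monotonic_time,  # OS_ReadMonotonicTime
}


def legacy_swix(swi_number, regs, out):
    """Single entry point for old code: returns (error, regs), error None on success."""
    handler = LEGACY_TABLE.get(swi_number & ~X_BIT)
    if handler is None:
        return ("SWI not known", list(regs))
    return (None, handler(list(regs), out))
```

An application whose SWI calls all go through one function like this would only need that function replaced, which is the situation the post describes.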
Rick Murray (539) 13840 posts |
Interesting discussion as always. It amuses me those who want to cling on to “something that resembles the current API”. But, whatever, the speed things move at around these parts, I’ll be gaga before this happens. ;) For me, priority ought to be on maximising the abilities of existing hardware. Looking into ways to make more use of existing hardware can deliver real benefits with a lot less effort and hassle than would be required to shift the OS to an entirely different processor. |
Piers (3264) 43 posts |
There’s no reason for pausing apps – there’s almost no reason why most apps should care where their windows are. But the wimp needs to be able to handle redraw better outside the redraw loop. ie. it still sends the redraw message, but an app doesn’t need to respond immediately. This is the model other OSes use. The first obvious step (which would be compatible with existing apps) is for the wimp to handle the global origin for you – every app needs to pointlessly duplicate the origin code. Then redraw can more cleanly occur at a later time whatever state the wimp is in as your window may have moved in the time between you getting the redraw request and when you get a chance to redraw. There are plenty of ways of handling multi-threaded redraw – I wrote a graphics queue module for RISC OS Java which let any thread draw at any time to the screen by simply queuing them up and then using Wimp_UpdateWindow when it was convenient. The graphics queue handled the origin at this point. The queue was needed because X11 (which we based the AWT port off) lets you efficiently draw to a window at any time. I’ve not tried Wimp2 for a very long time, but I assume it took a similar approach to Java, albeit by intercepting graphics vectors for compatibility. You get a bit of lag (grey or chequerboard pattern) in redraw, but conversely, nothing blocks for the redraw so the desktop is smoother. Another (and probably more future-proof) approach would be for apps to have a dedicated sprite (double buffered) to draw into for their window. This has been the MacOS X approach from the outset and means the GPU can composite the windows without any input from apps (which allowed it to animate all sorts of things in the days when that was trendy).
I seem to recall the idea of RISC OS Gold was for old apps to run within a compatibility window (ie. appear like !PCEm, or VNC), rather than intermixed with existing apps. It never sounded very usable, but then I’m not aware of any work occurring on Gold. Mac Carbon gave MacOS 8 apps a near-source code compatible route to upgrade to MacOS X. It was an interim C API that mapped to Cocoa (NextSTEP). That ought to be possible here, though I do suspect the first decision is to decide how close this will be to RISC OS, given it’s a total rewrite. |
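The graphics queue idea mentioned in the post above (any thread queues drawing, and the origin is applied only when the queue is flushed) might look roughly like this. It is a sketch under assumptions, not the RISC OS Java module: `GraphicsQueue`, `draw`, `flush` and the `plot` callback are hypothetical names, and `flush` stands in for the point where the real code would run a Wimp_UpdateWindow loop.

```python
import queue
import threading


class GraphicsQueue:
    """Threads queue window-relative drawing ops; the owner flushes them later."""

    def __init__(self):
        self._ops = queue.Queue()  # thread-safe FIFO of (op, x, y)

    def draw(self, op, x, y):
        """Callable from any thread at any time; nothing is plotted yet."""
        self._ops.put((op, x, y))

    def flush(self, origin_x, origin_y, plot):
        """Run by the window's owner when convenient.

        The window origin is applied here, so a window that moved since the
        op was queued is still drawn in the right place. Returns the op count.
        """
        count = 0
        while True:
            try:
                op, x, y = self._ops.get_nowait()
            except queue.Empty:
                return count
            plot(op, origin_x + x, origin_y + y)
            count += 1


def demo():
    """Three worker threads queue points; the main thread flushes them."""
    gq = GraphicsQueue()
    workers = [threading.Thread(target=gq.draw, args=("point", n, n))
               for n in range(3)]
    for worker in workers:
        worker.start()
    for worker in workers:
        worker.join()
    plotted = []
    count = gq.flush(100, 200, lambda op, x, y: plotted.append((op, x, y)))
    return count, sorted(plotted)
```

Because coordinates stay window-relative until `flush`, the drawing threads never need to know where the window currently is, which is the property the post is arguing for.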
Piers (3264) 43 posts |
WebKit’s inherently single-threaded. OK, so a few bits aren’t (network fetches, HTML parsing, image decode…), and JS has a clunky threaded API (web workers) that 99% of websites don’t use. I haven’t tried Iris (I have no wifi as I’m on a Pi400), so what’s slow? If it’s redraw, that’s probably because it uses OpenGL throughout. From screenshots it looks like it’s using a GTK-like UI. Maybe that’s slow? Any 64-bit RISC OS should settle on OpenGL or Vulkan as a baseline requirement, though I don’t know how feasible that is for drivers. |
André Timmermans (100) 655 posts |
The idea is to define new clean interfaces for both 32 and 64-bit use. Let new apps/libraries make use of these new interfaces when present and let the legacy 32-bit SWIs call these clean interfaces. Trying to reuse existing code for 64-bit will quickly run into incompatible interfaces: even simple calls like OS_BGet and OS_GBPB, which return results in the C flag, already get you into problems. |
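A small sketch of that layering, using OS_BGet as the example: the clean call returns an ordinary value, and the legacy shim re-creates the "result in the carry flag" convention on top of it. The function names are hypothetical, a Python stream stands in for a file handle, and the value placed in R0 at end of file is an arbitrary choice for the sketch.

```python
import io


def read_byte(stream):
    """Hypothetical clean interface: next byte as an int, or None at end of file."""
    data = stream.read(1)
    return data[0] if data else None


def legacy_bget(stream):
    """Legacy-style shim: returns (r0, carry), with carry set at end of file."""
    byte = read_byte(stream)
    if byte is None:
        return 0, True   # R0 is not meaningful here; 0 is just a placeholder
    return byte, False
```

New code would call `read_byte` directly and never see a flag; only the shim needs to know how flags were returned on the old architecture.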
Rick Murray (539) 13840 posts |
I think (very brief look at the code) that it simply runs through the motions of redraw, queueing the rectangles given by the Wimp to later pass on to the app. Which means things might get “interesting” if the window moves or is closed.
Also means that moving windows around can be handled by the Wimp itself and not need loads of redraw events to get everything back to how it was.
…and the amount of money Apple had to throw at that, versus the budget capabilities of ROOL…?
Google Docs, for example. It’s a lot of work to get the web version running, but an average smartphone can do it, just about. RISC OS? Amazingly can do it, but it’s like treacle. …but I think it may have to be a TUBE-like co-processor arrangement, else all those “this code is not reentrant” things are going to bite. |
Simon Willcocks (1499) 513 posts |
The apps wouldn’t be paused, they’d simply be told there’s nothing to be done for now. As if their windows weren’t visible. The problem isn’t where the app’s windows are (that’s covered efficiently by the current system), it’s the position of overlying windows. “I’ve got a new frame, where should I display it?” “Nowhere? OK.” A redraw request can also be reported to a background thread that will request an update, instead.
In these days of huge amounts of memory, that might sound reasonable, but I’d rather the program made that choice! |
Paolo Fabio Zaino (28) 1882 posts |
As far as I remember, WebKit renders in software unless there is supported hardware acceleration like Metal or OpenGL. This means on RISC OS, it’s doing everything in software (which is the first performance issue). Additionally, I believe it’s using GTK (or maybe Qt), which also handles everything in software on RO. And yes, the multi-core aspects are obviously running on a single core and are concurrent. In the past, it also had to access the SD card, but now it has a launcher from RAMDisk, which should also host the cache (or maybe there’s a new cacheFS for it). However, this is less relevant now, considering quite a few RISC OS systems are using SATA and NVMe. I’m not sure if the JIT for JavaScript has finally been enabled or if it’s still using the interpreter. If it’s the interpreter, then on sites with massive JavaScript libraries and async/await protocols, it will obviously be quite slow. I agree that using the extra cores probably wouldn’t boost performance much. A future version of RISC OS that works with hardware-accelerated OpenGL might help more. However, right now, a side issue that may be impacting performance is that on RISC OS, everything is running on a single core: networking, Wi-Fi, USB, kernel, multiple apps (and ported apps from Linux also assume Preemptive MT which makes things clunky on RO). It’s impressive that it’s still running at all at this point! XD |
Piers (3264) 43 posts |
Interesting choice of site, given I’ve spent nearly a year working on browser support for that specific site. Have you tried it on a Pi in Chromium or Firefox? It doesn’t perform exactly brilliantly in either – on a Pi400, neither can keep up with my typing (it’s not miles behind but it’s disconcerting to touch type when the letters appear with such lag), and auto repeat is horrible (and error-prone for deleting). It’s like Acorn DTP on an A305.
Why? As soon as one app refuses, the whole desktop would go clunky because the redraw needs to be synchronised.
I suspect for a single app (or single tab in WebKit’s case), multiple cores make less of a difference than you’d think. Does Iris expose the WebKit inspector? Using it on my Mac M2 with Rick’s example of Google Docs, it shows that almost only the main thread is used (ignoring startup), and at an impressive 50% CPU – threads make up 10% of usage. (Go to inspector, then Timelines, then record. After a short recording, scroll down to ‘Details’ and expand that. It gives you a graph showing CPU usage for the main thread, workers, and other.) GPU usage won’t appear because it’s a CPU graph (though the CPU needed to create the OpenGL commands is included), and I suspect this is the area that would make the biggest difference (for the single app/tab use-case). |
Paolo Fabio Zaino (28) 1882 posts |
Sounds like something got lost in the translation here. I never said multicore support for WebKit would improve WebKit performance, sorry. What I meant there is running – other things – on different cores would free CPU cycles for WebKit. |
Martin Avison (27) 1494 posts |
That amazes me. |
Simon Willcocks (1499) 513 posts |
What, like the current system? It seems to work pretty well to me. |
Piers (3264) 43 posts |
You were very clear. I just picked up on your side issue as I suspect the use-case of Google Docs in a single tab has negligible external overheads. Very little IO occurs (typing, a bit of mouse, and a bit of async network activity to synchronise your document with other collaborators). There’s no disc IO (which is slow). If there’s a performance difference between RO and PiOS on the same hardware, I’d suspect it’s redraw. Google Docs does have a pulsating cursor (as opposed to flashing). Due to the way modern browsers composite the web page into layers, I wouldn’t be surprised if that’s doing a large sprite plot every frame, rather than a few pixels. With OpenGL this will be faster than the calculations needed to calculate the minimal redraw size. But, as I’ve said, I don’t have Iris so it’s just speculation. I’d be very happy if it were something simpler that was easy to address. Someone mentioned the JIT may be off. Or maybe it’s already the same performance as Chromium on the same hardware. |
Piers (3264) 43 posts |
The specs change. https://github.com/whatwg/html/commits/main/ |
Paolo Fabio Zaino (28) 1882 posts |
Oh, I see. In that case, I’d be a bit more precise and add JavaScript to the list, along with the GPU backend (in your example, probably more Apple Metal than OpenGL), given that Google Docs has some hefty JavaScript on the client side. This also aligns with the poor performance of other websites that rely heavily on JavaScript. WebKit should support a JavaScript JIT that, if I recall correctly, uses Thumb instructions, so it’s possible that it’s being used on RO. However, I’m not 100% sure, and even if it is in use, the WebKit JS JIT is notoriously less performant than V8, which is why I’d (cautiously) include JavaScript here along with the GPU-accelerated backend. It should be easy, though, to measure JavaScript performance in isolation on Iris to have a comparison and avoid any redraws, thereby excluding any OpenGL/Metal-related performance improvements. This might help narrow it down a bit more. To measure GPU vs. CPU load, Linux could be helpful. You could use either Epiphany or Minibrowser (both, like Iris, based on WebKit AFAIR) with the same test and run the usual GPU load analysis tools for Linux, like nvtop or radeontop. On a Mac, you could use Apple Instruments for very detailed reports if you have or use Xcode. |
Rick Murray (539) 13840 posts |
? Shouldn’t that be the sort of thing that doesn’t concern the app? It just redraws when asked to, and the windowing system decides if it’s to use a bitmap for the window and/or get the app to do it each time.
Well, they done gone and fixed FlightRadar24 for me ;) so that was the next “most difficult” I could think of. Actually, script.google.com doesn’t work, but then it does horrible things on mobile too. It works so long as nothing needs to be edited, and then it… well… let’s just say that the input/editor is peculiar on a machine with a keyboard. Without one, it copes spectacularly badly. Which is kind of poor given that it’s a creation of the same company that makes one of the most widely used mobile OSs.
My Pis have only ever run RISC OS, except for a brief foray with OSMC or Kodi or whatever it’s called now on the Pi Zero. I was comparing with Firefox on a middle-of-the-road smartphone back in the summer of 2019.
I’m not expecting brilliant. You’d need some grunt behind the browser for that to work. It was laggy on the phone. But, sadly, as is often the way, the website does multitudinous things that the app can’t do.
Ohhh, I remember that on an A3000 and… yeah… there was a reason I was happy that some bastard stole my discs from the “safe and secure” computer room at school. Because I was told to store it there, the school had to pay up. Went and got myself Ovation, never looked back.
I was pretty amazed by Google Docs. Given that I started this lark with HTML 2.0 and technology like Webite and ArcWeb, the idea of having a functioning word processor that’s basically a baby-version of Word in behaviour, is kind of mind-blowing.
Problem is, not only do standards change, as was pointed out above, but companies have a habit of bolting on stuff for their own little use cases. Additionally, there are those, like the dual examples of Google Docs and Google Script, who take what a browser is able to do and push it up to eleven.
It’s one of the reasons why I always thought it stupid that Apple bake Safari into the iOS releases. Between the security issues/fixes and the ever-evolving standards, browsers are a moving target. You can either be like NetSurf and aim for the baseline and skip the fancy stuff, or you join the grind and end up evolving into a Chrome clone… |
Simon Willcocks (1499) 513 posts |
Well, yes and no. In a video player, it makes sense to have the current frame to copy to the screen and the next frame that you’re working on (maybe one more, just for luck). If you’re a text editor, it makes more sense to redraw on demand. How is the Wimp supposed to know which it is? That said, there’s nothing stopping the Wimp from caching windows’ contents by getting the applications to redraw the whole thing to a bitmap instead of the screen and not bother the program’s redraw routines after that. However, that makes no sense if the program is updating the window tens of times a second, or has its own buffer. Or if the redraw takes a fraction of a second, which it will if it has its own buffer. It only really makes sense if you want windows to wobble as you drag them around, or whatever (and you could have a layer just for that window). Scrolling a window also has the same “problem”; you either have to trust the program to do it quickly or cache a bitmap of potentially millions of lines of text. (Right button scrolling of windows is one of the really, really great RO innovations.) |