ARMv8 support
Theo Markettos (89) 919 posts |
Now that we have a board that implements ARMv8 and can run RISC OS, I thought I’d start a thread to cover which ARMv8 features are and aren’t usable. So what’s the state of play with all the various aspects of ARMv8 under RISC OS, for example:
‘None at all’ and ‘never’ are valid answers :) |
Theo Markettos (89) 919 posts |
I suspect that the exception mode ordering enforced by the spec means that we can’t run any A64 instructions in supervisor mode, and hence not in userland either. Moving to AArch64 exceptions would also be tricky due to the different CPU modes (no IRQ, FIQ, SVC, etc). But I thought it was worth having the discussion anyway. |
Jeffrey Lee (213) 6048 posts |
The answer to all of the above is basically “no support yet”. However, you might be able to assemble some ARMv8 instructions (maybe even AArch64) using GCCSDK’s ‘as’ (I’m not sure offhand when GCC started gaining ARMv8 support).
Yeah, I doubt that switching between AArch32 and AArch64 on the fly is going to be easy – certainly not as easy as switching between ARM and Thumb. Maybe we could have an “OS_EnterOS64” SWI which you’d call in 32bit mode and it would return to you in 64bit mode. But since there are so many architectural differences (e.g. the hardware requires the stack pointer to be 16 byte aligned in AArch64) it would make more sense to put tight restrictions on where the execution state switches take place – e.g. only at the module or application boundary. So you wouldn’t be able to have a 32bit module which can drop into 64bit mode for key routines if it’s running on a 64bit machine, but you could have a 64bit module which supports being called from 32bit mode (probably with two SWI handlers – one for 32bit, and one for 64bit, both called in 64bit mode, but the 32bit one accepts a pointer to a 32bit register dump rather than taking the arguments directly in registers). This would of course require a 64bit kernel of some kind.
If you ask me, the best thing we could do to provide 64bit support would be to design a new kernel for the OS (call it RISC OS 7?). One which is RISC OS “in spirit” but doesn’t suffer from any of the shortcomings of the current kernel/OS (no process management, no threading support, etc.). Design it to work well on 32bit and 64bit. Then convert the ROM modules to be able to run on it – APIs would mostly stay the same but there’d be extra stuff needed to deal with thread and process management. And rewrite them all in C so that the 64bit version of the OS will actually work.
Running 32bit apps (even those which run on 32bit RISC OS 7) on a 64bit system might be tricky though, due to the memory management issues. Maybe we’d have to limit the amount of interaction, e.g. 32bit apps actually run in a VM which contains a stripped-down version of the 32bit OS with tight integration to the 64bit world. |
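As a rough illustration of the “two SWI handlers” idea above, here is a minimal C sketch. Everything in it is hypothetical: the `regs32_t` layout, the `swi_handler64`/`swi_handler32` names, the ten-register convention and the toy SWI are assumptions made for the example, not an existing RISC OS API. The alignment helper only illustrates the 16 byte stack pointer rule mentioned above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SWI_NREGS 10  /* assumption: R0-R9 carry SWI arguments and results */

/* Hypothetical dump of a 32bit caller's registers, saved by the kernel. */
typedef struct {
    uint32_t r[SWI_NREGS];
} regs32_t;

/* Hypothetical native handler: arguments arrive as full 64bit values. */
void swi_handler64(uint64_t r[SWI_NREGS])
{
    r[0] = r[0] + r[1];  /* toy SWI: return R0 + R1 in R0 */
}

/* Hypothetical entry point for 32bit callers. It runs as 64bit code but
 * receives a pointer to the caller's 32bit register dump, widens it,
 * reuses the native handler, then narrows the results again. */
void swi_handler32(regs32_t *dump)
{
    uint64_t wide[SWI_NREGS];

    assert(dump != NULL);
    for (size_t i = 0; i < SWI_NREGS; i++)
        wide[i] = dump->r[i];            /* zero-extend each register */

    swi_handler64(wide);

    for (size_t i = 0; i < SWI_NREGS; i++)
        dump->r[i] = (uint32_t)wide[i];  /* truncate results to 32 bits */
}

/* Round a stack pointer value down to a 16 byte boundary. */
uint64_t align_sp_down(uint64_t sp)
{
    return sp & ~(uint64_t)15;
}
```

The point of the wrapper is that only one real implementation of the SWI exists; the 32bit entry point is pure marshalling, which is where any pointer translation would also have to live.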
Ben Avison (25) 445 posts |
Well, objasm has had full support for the new ARMv8 AArch32 instructions (including the cryptographic instructions) since 4.02, February 2014 – well ahead of the curve! None of the Norcroft toolchain even attempts to think about AArch64 though. It’s worth noting that the relationship between AArch64 and AArch32 isn’t like that between ARM and Thumb. As I understand it, you can’t switch between them at will, only at exception entry and return, and it’s one-way – so you can have a 64-bit hypervisor running a mixture of 64-bit and 32-bit OSes, and you can have a 64-bit OS running a mixture of 64-bit and 32-bit applications, but not the other way round. In other words, as long as RISC OS remains 32-bit itself, the only way you’ll get 64-bit code running on the same chip is if you run RISC OS under a 64-bit hypervisor which is also running a different 64-bit OS at the same time. From RISC OS’s point of view, that’s basically no change – any 64-bit code would be running on a different virtual machine and wouldn’t be interacting with any normal RISC OS APIs. |
Theo Markettos (89) 919 posts |
Could you make a hypercall from the 32 bit world into a hypervisor that is running a second 64 bit OS, that just happens to share the same page table (ie memory map) as the 32 bit OS? This hypervisor and 64 bit OS are simply a wrapper to translate RISC OS ABIs (ie SWIs) between 64 and 32 bit worlds, which you’d have to do anyway given EABI (and the SVC instruction only having a 16 bit immediate field, not 24 as a SWI). 64 bit programs can share data with the 32 bit world and have access to OS APIs, but they wouldn’t be able to share instructions (not give function pointers to SWIs, no dynamic linking without specific support, no code in dynamic areas). If you ran this with only one core enabled you wouldn’t have the problem of RISC OS not being thread safe. That would conflict badly with wanting to run RISC OS under virtualisation, but it might be possible to patch the hypervisor to support this side-channel (at the expense of security, obviously). |
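To make the immediate-field point above concrete, here is a small C sketch. The field widths (24 bits for the A32 SWI/SVC instruction, 16 bits for the A64 SVC instruction) and the RISC OS X bit at bit 17 are as described in the thread and the architecture; the `a64_swi_call_t` structure, the `A64_SVC_ESCAPE` value and the “pass the SWI number in a register” convention are purely hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define A32_SVC_IMM_BITS 24u      /* A32 SWI/SVC immediate field */
#define A64_SVC_IMM_BITS 16u      /* A64 SVC immediate field */
#define SWI_X_BIT        0x20000u /* RISC OS error-returning ("X") bit */
#define A64_SVC_ESCAPE   0xFFFFu  /* hypothetical "number is in a register" marker */

/* True if `value` fits in an unsigned immediate field of `bits` bits. */
bool fits_in_imm(uint32_t value, unsigned bits)
{
    assert(bits > 0 && bits < 32);
    return (value >> bits) == 0;
}

/* Hypothetical description of how a 64bit caller would issue a SWI. */
typedef struct {
    uint16_t svc_imm;    /* value encoded in the SVC instruction */
    bool     uses_reg;   /* true if the SWI number travels in a register */
    uint32_t swi_in_reg; /* the SWI number, when uses_reg is true */
} a64_swi_call_t;

a64_swi_call_t encode_a64_swi(uint32_t swi)
{
    a64_swi_call_t c = { 0, false, 0 };

    assert(fits_in_imm(swi, A32_SVC_IMM_BITS));
    if (swi < A64_SVC_ESCAPE) {
        c.svc_imm = (uint16_t)swi;   /* small numbers fit directly */
    } else {
        c.svc_imm = A64_SVC_ESCAPE;  /* too big: use the escape value */
        c.uses_reg = true;
        c.swi_in_reg = swi;
    }
    return c;
}
```

For example, OS_Byte (0x06) fits in 16 bits, but its error-returning form XOS_Byte (0x20006) does not, so any X-form SWI would already need the register route under this scheme.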
Rick Murray (539) 13840 posts |
<ROTFL>
Just out of interest, who will be doing all this work? This might be a dumb question – but if we have a multicore processor, why can’t one core run RISC OS as normal in 32 bit mode, and another core execute 64 bit code independently of RISC OS? There will obviously need to be some sort of fudge so RISC OS and it can interact, but otherwise, can’t they run autonomously? |
Theo Markettos (89) 919 posts |
They can, it just depends what level of sharing you expect between them. With a little bit of hypervisor/OS shim, you could run 64 bit processes on a separate core passing OS calls back to the primary 32 bit core. You would experience difficulties because OS APIs aren’t thread safe (which we know, and is nontrivial to fix) but also because data structures aren’t thread safe. For instance, if you have the two cores accessing shared memory you need locks to ensure they don’t trample over each other. Likely most programs haven’t considered that. If you run them in separate address spaces you don’t need to worry about it, but instead need to worry about passing messages to other processes and build a thread-safe way to do that. In the end it comes down to the usual question in this kind of AMP context: if you aren’t going to make the OS multithreadable, what should the OS API look like for programs that run on other cores? It clearly can’t be the full SWI API, but what is the balance between a simple but restricted API (just pipes, maybe no files, no graphics) and an expressive API with complex concurrency behaviour (WIMP etc)? |
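The “you need locks” point above can be sketched with C11 atomics. This is only an illustration of guarding a shared structure with a spinlock; the `shared_counter_t` type and the function names are made up for the example, and a real cross-core scheme would also have to deal with cache and memory attributes, interrupts and fairness, none of which is shown.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical structure living in memory visible to both cores. */
typedef struct {
    atomic_flag locked;  /* set while one core owns the structure */
    uint32_t    counter; /* stand-in for any shared data */
} shared_counter_t;

void shared_init(shared_counter_t *s)
{
    assert(s != NULL);
    atomic_flag_clear(&s->locked);
    s->counter = 0;
}

void shared_lock(shared_counter_t *s)
{
    /* Spin until the flag was previously clear, i.e. we took the lock. */
    while (atomic_flag_test_and_set_explicit(&s->locked, memory_order_acquire))
        ;
}

void shared_unlock(shared_counter_t *s)
{
    atomic_flag_clear_explicit(&s->locked, memory_order_release);
}

/* Read-modify-write of the shared data, done entirely under the lock. */
uint32_t shared_increment(shared_counter_t *s)
{
    shared_lock(s);
    uint32_t v = ++s->counter;
    shared_unlock(s);
    return v;
}
```

Without the lock, two cores executing the increment at the same time could both read the old value and one update would be lost, which is the kind of trampling existing single-core code has never had to consider.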
Steve Pampling (1551) 8170 posts |
“(call it RISC OS 7?)” Er, yeah. Took me a while to stop choking and clean up the tea spatters. Edit: Of course really speaking as the next majorly significant version beyond RO5.x it should have been RO6, but since someone left a lump in the road we get bounced over that one. |
Jeffrey Lee (213) 6048 posts |
“And rewrite them all in C so that the 64bit version of the OS will actually work.” The same people who have been working on multicore support since the PandaBoard port came out. I.e. nobody. Personally, it will take me at least a year to finish off the features that I’m currently working on (iMx6 HDMI audio, zero page relocation/crash reporting, physical memory pools, GraphicsV enhancements). Probably even two years. After that, I’m not really sure what I’ll be looking at next – GraphicsV has been the thing I’ve been wanting to do for 6 years now, and the only significant part I’ve really tackled so far is support for new pixel formats. Hopefully not any new hardware ports, because they’re a massive time sink, both in terms of the initial work to get the OS to a “beta” status, and then in terms of pushing to stable and providing long-term support. But maybe if I’m lucky I’ll be in a position where working on multicore or 64bit support (and being able to work on it long enough to make some useful progress) is a possibility. |
Ben Avison (25) 445 posts |
In a sense, the rewrite I did of ADFS recently is one step in this direction. Not only does it use concurrency locks to support multi-core from day 1 (by virtue of being derived from SDFS) but it’s nearly all in C. I would argue that a module-by-module rewrite like that would be the way to go, rather than breaking everything and attempting to rebuild a new ecosystem from the ground up; it means that each new module can be tested and debugged largely in isolation. It’s just a shame there are so many more modules that would need the same treatment. One thing in our favour is that most of our inter-module APIs (SWIs, service calls etc) pass arguments and results (crucially including all pointers) in registers, and the registers are of course larger in AArch64. This limits the number of hoops a 64-bit RISC OS would need to jump through in order to support both 32-bit and 64-bit applications. For example, in the whole of ADFS, I think the only example of a data structure being part of the public API is the scatter list. In every other respect, the fact that pointer fields in ADFS’s internal data structures and function calls are wider in AArch64 would be an implementation detail that would be taken care of for free by the compiler.
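The scatter list is a good example of where a public data structure, rather than a register, carries pointers. As a hedged sketch only: assuming a scatter list is a sequence of address/length word pairs, a 64-bit OS serving a 32-bit caller would have to widen each entry before use. The type and function names below are hypothetical and this is not ADFS code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Layout a 32-bit caller would build: one address word, one length word. */
typedef struct {
    uint32_t address;
    uint32_t length;
} scatter32_t;

/* Hypothetical layout used internally by a 64-bit implementation. */
typedef struct {
    uint64_t address;
    uint64_t length;
} scatter64_t;

/* Widen `n` entries from `in` to `out`; returns the total byte count. */
uint64_t scatter_widen(const scatter32_t *in, size_t n, scatter64_t *out)
{
    uint64_t total = 0;

    assert(in != NULL && out != NULL);
    for (size_t i = 0; i < n; i++) {
        out[i].address = in[i].address;  /* zero-extended, never sign-extended */
        out[i].length  = in[i].length;
        total += in[i].length;
    }
    return total;
}
```

Arguments passed in registers need no such step, since a 32-bit value simply occupies the low half of a 64-bit register; it is only in-memory layouts like this one that change size.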
It would be a shame to cripple the potential to run RISC OS as a guest OS under a hypervisor. That’s probably the only way we’ll ever see RISC OS running on platforms where the manufacturers aren’t as open as TI, Freescale and Broadcom have been with their documentation (and arguably a HAL to interface to a hypervisor would be a better use of our limited developer time than continuing to write HALs to enable RISC OS as the host OS on lots of new platforms, now that most new CPUs will have the virtualisation extensions). |
Jeffrey Lee (213) 6048 posts |
One “solution” to the 32bit vs. 64bit pointer problem could be to make the initial 64bit version of the OS use an ABI that uses 32bit pointers (e.g. ILP32). Considering how little RAM RISC OS apps generally use, we could find that we could stick with a 32bit ABI for several years. Then when we actually need the extra logical address space we can switch over to a 64bit ABI (e.g. LP64) and start worrying about how to handle pointer/data marshalling with ILP32 code. If we can do pointer marshalling for ILP32 AArch32 apps, maybe we can also do pointer marshalling for ILP32 AArch64 modules (although I suspect it would be easier to just require modules to be rebuilt for LP64 – so that there aren’t any problems with LP64 modules calling into ILP32 modules. Or maybe the kernel could trap that and prevent it from happening – essentially making the ILP32 modules only accessible from other ILP32 code). I guess it would all depend on the reasons why we decide to make a 64bit version of the OS:
|
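The pointer marshalling between ILP32 and LP64 code discussed above boils down to an asymmetry, which a short C sketch can show. The `ptr32_t` type and both function names are assumptions for illustration: widening a 32bit pointer is always possible, but narrowing a 64bit one only works if the object lives in the low 4GB of the logical address space.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* A pointer as ILP32 code sees it: a 32bit logical address. */
typedef uint32_t ptr32_t;

/* ILP32 -> LP64: always safe, the address is simply zero-extended. */
uint64_t ptr_widen(ptr32_t p)
{
    return (uint64_t)p;
}

/* LP64 -> ILP32: fails (leaving *out untouched) if the address does not
 * fit in 32 bits, which is the case the kernel would have to trap or forbid. */
bool ptr_narrow(uint64_t p, ptr32_t *out)
{
    assert(out != NULL);
    if (p > UINT32_MAX)
        return false;
    *out = (ptr32_t)p;
    return true;
}
```

This is why LP64 code calling into ILP32 modules is the awkward direction: any buffer handed over has to be allocated where `ptr_narrow` would succeed, or copied there first.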
Jeffrey Lee (213) 6048 posts |
Some relevant information about the Pi ARM boot process:
I’m not sure what this means as far as the default EL for AArch64 – if Pi 2 starts in HYP by default, presumably selecting AArch64 would start you in EL3, and maybe kernel_old would be ignored. |
Ben Avison (25) 445 posts |
FWIW, I tweaked the BCM2835 HAL to adapt to being entered in HYP mode months ago as soon as I became aware of the change. |
William Harden (2174) 244 posts |
Jeffrey: I would think the main desire for migrating towards AArch64 would be that A32’s days are no doubt numbered. Probably with a lot of numbers yet, but numbered nonetheless. We know how long the migration took from 26→32 + emulation. I wonder whether the way forward would be to utilise a second core (use A32 on Core 0, AArch64 on Core 1), and work out a method to exchange SWI calls and data blocks between cores initially. There would also need to be some mechanism to keep a single module list so that both OSes know which module is being called. To start with, work towards 64-bit modules, SWI calls and service calls, with all applications sitting on core 0. This would allow progress towards a 64-bit ‘OS’ build, albeit one which only runs the actual OS and not application support. That approach might even allow the opportunity to redefine ‘modules’ conceptually – progressively switching off A32 modules in favour of AArch64 ones with a more modern architecture. |
Jeffrey Lee (213) 6048 posts |
Answer: There’s no AArch64 support in their ARM bootloader, so kernel_old=1 is necessary, and you should expect all cores to enter your code in the fresh-out-of-reset state. (Also, see the demo code further down that same thread.)