The 2011 release of the ARMv8 architecture added a whole new instruction set called AArch64 with a wider 64b register bank, and renamed what was previously referred to as the ARM instruction set (which RISC OS uses) as AArch32. The AArch64 instructions are not binary compatible with the older ones.
Until very recently, the ARMv8 targets we would like RISC OS to support have fortunately still contained an implementation of AArch32 for backwards compatibility, allowing RISC OS to run while entirely ignoring the AArch64 aspects. As of 2020 it was becoming clear that Arm intended to wind down ARMv7 and earlier (fewer than 50% of the cores available to license would run that way) to focus on their ARMv8 offerings, many of which dropped AArch32 support entirely.
In April 2021 the ARMv9 architecture was announced with AArch32 relegated to being a license option, in much the same way that 26 bit mode became an option in ARMv4 rarely taken up.
At the end of September 2023, Raspberry Pi announced their new Raspberry Pi 5, with a 64-bit quad-core Arm Cortex-A76 processor. While this still supports 32 bit instructions, it crucially does so only in ‘user’ mode. RISC OS makes heavy use of ‘supervisor’ mode, which this processor doesn’t implement for 32 bit code. As such, there is no straightforward way to port RISC OS to the Pi 5.
This proposal identifies aspects of RISC OS that will require attention in order to migrate away from AArch32, to help with making design decisions, and ultimately to set out a route to implementing the changes needed to ensure there are chips for RISC OS to run on in future.
DDI0847 – The ARMv8 architecture reference manual (A profile)
DDI0608 – The ARMv9 supplement to DDI0847
IHI0055 – Procedure Call Standard for the Arm 64-bit Architecture
No more big 32-bit cores for RISC OS from 2022
What would AArch64 BASIC look like?
ARMv8 support
Secondary relevance: Could you run on an Apple MacMini and RISC OS on hypervisor
AArch32 is the family of instructions with 32 bit wide integer registers addressing up to 2^32 bytes of memory, and also includes the 26 bit addressing mode.
AArch64 is the family of instructions with 64 bit wide integer registers addressing up to 2^64 bytes of memory.
Since RISC OS is coming to the 64 bit scene rather late, it has the advantage of being able to look at how other mainstream operating systems have approached the problem (though some solutions may not be practical on an Arm processor).
In the remainder of this analysis we assume approach (4) is the primary solution, though to assist development elements of (3) and (5) may be used as temporary measures. This is not unlike the approach taken when getting the first 32 bit versions of RISC OS to work – initially they kept the old memory map with 28MB application limit, and once everything stabilised those limits were raised and more and more of the OS was moved higher up to free up the large application slot we enjoy today.
The opcode under AArch32 has space for a 24 bit immediate value which encodes the call number, for a total of 16M possible unique values, though in practice some of the number space is used to encode other information, such as bit 17 holding the ‘X’ flag which requests that errors are returned to the caller rather than raised.
The opcode under AArch64 has a reduced size 16 bit immediate value, for a total of 64k possible unique values, which clearly is insufficient to directly encode all of the currently circulated allocations.
Despite the reduced number space, this does encompass the high value OS SWI block from &00 to &1FF. Since the advent of split I+D caches with the StrongARM it has not been convenient to dynamically generate supervisor call opcodes without incurring a cache flush penalty, and the strong recommendation has been to call OS_CallASWI or OS_CallASWIR12 instead. This indirect method, passing the call number via a register which is then passed for despatch via an OS SWI, could be used to allow existing SWI allocations to be retained regardless of the instruction set in use.
Programmers in C will be used to using the _swix() and _swi() functions, which are already indirect calling methods, and programmers in BASIC use SYS, which the interpreter can change to an indirect call as required.
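As a rough illustration of the number space problem described above, the C sketch below splits a 24 bit SWI number into its ‘X’ flag and base number, and checks whether a number would fit in a 16 bit immediate. The macro and function names are invented for this example and do not come from any RISC OS header; only the bit positions and field widths are taken from the text.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SWI_NUMBER_MASK 0x00FFFFFFu  /* 24 bit immediate of the AArch32 opcode */
#define SWI_X_BIT       (1u << 17)   /* 'X' flag, bit 17 as described above */
#define SVC_IMM16_MAX   0xFFFFu      /* largest value a 16 bit immediate can hold */

/* True if the SWI number has the 'X' flag set. */
bool swi_has_x_flag(uint32_t swi)
{
    return (swi & SWI_X_BIT) != 0;
}

/* The SWI number with the 'X' flag stripped. */
uint32_t swi_base_number(uint32_t swi)
{
    return swi & SWI_NUMBER_MASK & ~SWI_X_BIT;
}

/* True if the number could be encoded directly in a 16 bit immediate.
 * Anything larger would have to go via an indirect call (assumption:
 * the full number, including the 'X' flag, is what must fit). */
bool swi_fits_imm16(uint32_t swi)
{
    assert(swi <= SWI_NUMBER_MASK);
    return swi <= SVC_IMM16_MAX;
}
```

Note that under this assumption any number with bit 17 set falls outside the 16 bit range, so the ‘X’ forms of even the OS SWI block would need either a different flag encoding or the indirect route.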
Whereas currently in AArch32 R0-R15 are 32 bits wide, and therefore can address any location in the 4GB of logical address space, with AArch64 the general purpose register bank can be viewed either as X0-X30 (64 bits each) or W0-W30 (32 bits each). The AArch64 program counter is not directly visible.
Therefore, the AArch32 register bank could be viewed as a subset of the AArch64 register bank, in that parameters passed to a SWI executed as an AArch32 instruction could be losslessly passed to an AArch64 RISC OS kernel for handling by sign or zero extending as appropriate.
Note that in a mixed AArch32/AArch64 system it is not guaranteed that the narrower 32 bit registers (containing the SWI’s parameters) are zero extended when entering the AArch64 exception handler. The AArch64.CallSupervisor pseudo code in the ARMv8 architecture reference manual is the clearest place to follow this: through AArch64.TakeException you get to AArch64.MaybeZeroRegisterUppers, where the loop to clear is conditional on ConstrainUnpredictableBool.
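Because the upper halves cannot be relied upon, a 64 bit handler would presumably widen each incoming 32 bit parameter explicitly. The following C sketch shows the two widenings mentioned above; the function names are hypothetical, and a real kernel would more likely do this in assembler on entry to the exception handler.

```c
#include <assert.h>
#include <stdint.h>

/* Zero extend: discard whatever happens to be in bits 32-63 and treat
 * the low word as an unsigned quantity (e.g. an address or a size). */
uint64_t widen_zero_extend(uint64_t xreg)
{
    return (uint64_t)(uint32_t)xreg;
}

/* Sign extend: discard bits 32-63 and treat the low word as a signed
 * two's complement quantity (e.g. -1 used as a "no value" marker).
 * Written arithmetically to avoid implementation-defined conversions. */
int64_t widen_sign_extend(uint64_t xreg)
{
    uint32_t low = (uint32_t)xreg;
    if (low & 0x80000000u) {
        return (int64_t)low - INT64_C(0x100000000);
    }
    return (int64_t)low;
}
```

Which of the two is appropriate depends on the meaning of each parameter, so the choice would have to be made per SWI rather than globally.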
Assuming a register widening solution is adopted, the only places where the size of a pointer is of concern are where the pointer is passed in via a parameter block held in memory. This section surveys the core module SWIs for places where this technique is used in order to see how widespread a problem it might be.
The following lists modules which have been checked but whose SWIs don’t have any potential pointer issues. Modules which don’t implement any SWIs, such as application modules, are not listed here.
* AcornHTTP
* AcornSSL
* ADFS
* ATAPI
* BCMSupport
* BlendTable
* BootFX
* Buffer Manager
* CDFS, CDFSDriver
* ColourTrans
* CompressJPEG
* DDEUtils
* Debugger
* DeviceFS
* DHCP
* Dialler
* DOSFS
* DragASprite
* DrawFile
* FilerAction
* Filter Manager
* Font Manager
* Free
* Freeway
* FrontEnd
* FSLock
* GPIO
* Hourglass
* IIC
* Internet (Socket)
* InverseTable
* Joystick
* JPEG
* MakePSFont
* NetFS
* NetMonitor
* NetPrint
* NetTime
* NFS
* Parallel Device Driver
* PDriver
* PDumper
* Portable Manager
* RamFS
* RedrawManager
* ResourceFS
* RTC
* RTSupport
* ScreenBlanker
* ScreenFX
* ScreenModes
* SCSIFS, SCSIDriver
* SDFS
* ShareFS
* ShellCLI
* SMP
* Sound (Level 1), Sound (Level 2)
* Sound Control
* Squash
* SuperSample
* TaskWindow
* Toolbox
* URI
* VCHIQ
* VFPSupport
* ZLib
Assuming a register widening solution is adopted as for SWIs, the only places where the size of a pointer is of concern are where the pointer is passed in via a parameter block held in memory. This section surveys the core module service calls for places where this technique is used in order to see how widespread a problem it might be.
List of AArch32 affected service calls
The following lists service call ranges which have been checked but which don’t have any potential pointer issues. Modules which don’t implement any service calls are not listed here.
* ADFS (&10800)
* SCSI (&20100)
* Wimp (&400C0)
* NetPrint (&40200)
* Toolbox (&44EC0)
* SDIODriver (&81040)
* IIC (&81100)
* Window (&82880)
* URL (&83E00)
The SWIs OS_File and OS_GBPB include some subreasons which deal with load and execution addresses. These are currently 32b quantities, albeit deprecated in use. Various places store these as 32b quantities for example: in the extended attributes of a ZIP file, in file server messages, in the directory entries of FileCore discs.
The SWI OS_FSControl 12 (Add FS) and OS_FSControl 35 (Add image FS) pass a pointer to a FileSwitch FS Information Block which includes 32b offsets to functions to implement a filing system. Provided modules are not expanded beyond their existing maximum size of 64MB these 32b offsets will suffice because they are relative to the module base address.
The SWI OS_SpriteOp doesn’t make use of absolute addresses in memory. Provided sprites are not expanded beyond their existing maximum size of 2GB these 32b offsets will suffice because they are relative to the sprite area base.
The 4 word MessageTrans block is opaque to the caller, so while it may contain a pointer, its layout could be changed without impacting clients.
The SWI ResourceFS_RegisterFiles includes a block with a 32b offset to the next item to add in the chain, however that still allows blocks to be kept ±2GB apart.
Devices registered via the list in R1 to DeviceFS_Register include a 32b offset to the device name as the first word of the buffer. This limits the string to be within ±2GB of the block.
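The reason base-relative 32b offsets survive a move to wider pointers can be sketched in C as below: the offset stays 32 bits in the block, and only the addition against the base is done at full pointer width. The helper names are invented for illustration, and the offsets are treated as signed here to match the ±2GB cases above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Turn a 32 bit signed offset stored in a block into a full width
 * pointer, relative to the given base address. */
const void *resolve_relative(const void *base, int32_t offset)
{
    assert(base != NULL);
    return (const unsigned char *)base + offset;
}

/* Compute the 32 bit offset from base to target. Returns 1 on success,
 * or 0 if the two are more than about 2GB apart and so cannot be
 * described by a 32 bit signed offset. */
int encode_relative(const void *base, const void *target, int32_t *offset_out)
{
    intptr_t delta = (intptr_t)((uintptr_t)target - (uintptr_t)base);
    if (delta < INT32_MIN || delta > INT32_MAX) {
        return 0;
    }
    *offset_out = (int32_t)delta;
    return 1;
}
```

The failure path of encode_relative is the constraint the text describes: a registering client would need to keep its strings or chained blocks within range of the block that refers to them.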
Toolbox Res files include 32b offsets (to the body, strings, etc) some of which are relocated when the Res file is loaded into absolute addresses.
Mbuf Manager works with mbctl and mbuf structures, which contain both 32b function pointers and linked list pointers.
The AIF format has always included a flags word, with a small number of valid values allocated by Acorn or Arm for their use. Following dialogue with Arm there are still plenty of spare flag bits. Therefore, a flag bit can be allocated to denote the code was intended to be run on a 64 bit version of the OS and rejected on 32 bit versions.
However, up to now RISC OS doesn’t actively reject unknown flags in the AIF header when loaded – the error “Application is not 32-bit compatible” is actually generated when an attempt is made to initialise the SharedCLibrary using the APCS-R calling convention, rather than by FileSwitch as might have been expected. Therefore changing the flag word at offset +0x30 in the AIF header wouldn’t guarantee an error if attempting to run it on RISC OS 5 or earlier incarnations, which expect either a value of 32 or 26 or 0 in the bottom byte of the flags, since the application may have been written to not use the SharedCLibrary at all.
Since application binaries are central to the purpose of an application a new filetype could be allocated for a 64 bit executable, but unfortunately all the filetype numbers in the range &Fxx are already exhausted.
Instead, the AIF header will contain a minimal code sequence written using AArch32 instructions which throws an error. The flags word will contain 64 in its bottom byte, and the AIF header will be followed by a second header with extra fields and information relevant for a 64 bit environment.
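A loader-side check of the flags word might look something like the C sketch below. It reads only the bottom byte at offset +0x30 and maps the values mentioned above (0, 26, 32, and the proposed 64) to an enumeration. The enum, the function name, and the treatment of 0 as a separate "legacy" case are assumptions made for this example; the layout of the second header is not defined in the text and is not modelled here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define AIF_HEADER_SIZE  0x80u  /* the fixed AIF header is 128 bytes */
#define AIF_FLAGS_OFFSET 0x30u  /* flags word, as referred to above */

enum aif_mode {
    AIF_MODE_LEGACY,   /* bottom byte 0 */
    AIF_MODE_26,       /* bottom byte 26 */
    AIF_MODE_32,       /* bottom byte 32 */
    AIF_MODE_64,       /* bottom byte 64 (proposed value) */
    AIF_MODE_UNKNOWN   /* anything else, or a truncated header */
};

/* Classify an image from the bottom byte of its AIF flags word. The
 * header is little-endian, so the bottom byte is the first byte of
 * the word. */
enum aif_mode aif_classify(const uint8_t *header, size_t length)
{
    assert(header != NULL);
    if (length < AIF_HEADER_SIZE) {
        return AIF_MODE_UNKNOWN;
    }
    switch (header[AIF_FLAGS_OFFSET]) {
    case 0:  return AIF_MODE_LEGACY;
    case 26: return AIF_MODE_26;
    case 32: return AIF_MODE_32;
    case 64: return AIF_MODE_64;
    default: return AIF_MODE_UNKNOWN;
    }
}
```

A 64 bit OS could use a check of this kind to decide whether to look for the second header, while an existing 32 bit OS would never reach it and would instead execute the AArch32 stub that raises the error.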
Compare with the Portable Executable format (PE), which prefixes the 64 bit executable with a simple MS-DOS program which just gives the error “This program cannot be run in DOS mode”.
Plain binaries which are *Run and rely on the 32b load and execution addresses held in the RISC OS file attributes would either require a change to the FileCore logical format to support longer attributes, or be unsupported (the use of load/execution addresses has been deprecated for some time), or be limited to loading into the low 4GB of the memory map as presently.
Utilities run in User mode after being loaded into the RMA. The kernel currently runs utilities without type checks since in User mode any undefined instruction can only cause inconsequential damage. An optional “32OK” signature appended to the image is used by Aemulor to suppress its emulation, so precedent exists for adding “64OK” to denote the change of instruction set. Heuristics to detect AArch32 opcodes are likely to lead to false matches due to the AArch64 opcodes overlapping.
Since RISC OS 5 the module header has included a flags word to provide for future changes. Therefore, a flag bit can be allocated to denote the code was intended to be run on a 64 bit version of the OS, and rejected on 32 bit versions.
In addition, the first few words are expected to be a limited subset of AArch32 instructions (B, BL, MOV) to add confidence to the decision.
Podule loaders have four AArch32 instructions and an optional “32OK” signature at the start. Since AArch64 instructions are also 32b in size, and the signature could be changed, this can be accommodated should there be an ARMv8+ machine with a podule bus.
Modules and applications can currently be squeezed. The kernel includes a lightweight decompressor for modules, and applications are typically squeezed to reduce the time they take to load off disc.
Both of these compression algorithms are somewhat biased as the tables used to pick out frequently occurring values are based on common AArch32 instructions. Given the increased performance of 64 bit systems and modern hardware it is likely that the compression algorithm could be changed for something more common – for example ZLib or UNIX compress.
The BASIC interpreter could need to address memory above the 4GB boundary, for example through DIM or interacting with SWIs through the SYS keyword, but with its current 4 byte integer variables would not be able to do so.
Other dialects have already introduced 64 bit integers, so ARM BBC BASIC may be able to copy that syntax for declarations and indirection.
The integral types in C are unsigned/signed versions of char, short, int, long, long long, and pointers. C programmers are probably already familiar with the pitfalls moving between systems where an int has only 16 bits rather than 32, so how big might an int be in a 64 bit environment?
When specific sized variables are required, using one of the types from <stdint.h> is highly recommended. For everything else there’s a design decision to make about how to change the integral types, which has its own terminology:
Historically Windows defined lots of in-memory structures using the LONG type and decided to pick LLP64 as a means of minimising the impact on its API had all those LONG variables suddenly changed size, while most other operating systems chose LP64. Given most of RISC OS’s APIs use int to mean 32 bit word it would seem sensible to follow the LP64 data model, and this is also what Arm recommend in their AArch64 Procedure Call Standard.
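For readers unfamiliar with the naming, the C sketch below classifies a data model from the sizes of int, long and pointer, using the conventional meanings of ILP32 (all three 32 bit), LLP64 (only pointers and long long 64 bit) and LP64 (long and pointers 64 bit). The enum and function names are invented for this example, and 8 bit bytes are assumed.

```c
#include <assert.h>
#include <stddef.h>

enum data_model { MODEL_ILP32, MODEL_LLP64, MODEL_LP64, MODEL_OTHER };

/* Classify a data model from the sizes, in bytes, of int, long and
 * a data pointer. */
enum data_model classify_data_model(size_t int_bytes, size_t long_bytes,
                                    size_t ptr_bytes)
{
    if (int_bytes == 4 && long_bytes == 4 && ptr_bytes == 4) {
        return MODEL_ILP32;   /* e.g. today's 32 bit RISC OS */
    }
    if (int_bytes == 4 && long_bytes == 4 && ptr_bytes == 8) {
        return MODEL_LLP64;   /* the Windows choice described above */
    }
    if (int_bytes == 4 && long_bytes == 8 && ptr_bytes == 8) {
        return MODEL_LP64;    /* the model suggested for RISC OS */
    }
    return MODEL_OTHER;
}

/* The data model of whatever compiler builds this file. */
enum data_model host_data_model(void)
{
    return classify_data_model(sizeof(int), sizeof(long), sizeof(void *));
}
```

In all three models int stays at 32 bits, which is why APIs that use int to mean a 32 bit word are unaffected by the choice between them.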
In terms of change, this does mean that a careful inspection of source code for places where the size of a long or size of a pointer was assumed to be 32 bits will be needed. In general this is only important when fixed sized structures in memory exist or where casting between pointers and integers has occurred.
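The two risk areas named above can be made concrete with a short C sketch. The structure is a made-up example of a fixed layout block, not a real RISC OS structure, and the helper names are likewise hypothetical; the point is only that fixed width types pin the layout, and that uintptr_t rather than int or long should carry a pointer through an integer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* A hypothetical fixed layout block. Using <stdint.h> types keeps it
 * at 8 bytes under ILP32, LLP64 and LP64 alike. */
struct fixed_block {
    uint32_t flags;
    uint32_t name_offset;   /* an offset, not a pointer, so it stays 32 bit */
};

_Static_assert(sizeof(struct fixed_block) == 8,
               "fixed_block layout must not depend on the data model");

/* Safe pointer to integer conversion: uintptr_t is wide enough to
 * hold any data pointer and convert back without loss. */
uintptr_t pointer_to_integer(const void *p)
{
    return (uintptr_t)p;
}

void *integer_to_pointer(uintptr_t value)
{
    return (void *)value;
}

/* Returns 1 if storing this pointer in a 32 bit word would lose
 * information, i.e. code that casts it to a 32 bit integer is broken. */
int pointer_needs_more_than_32_bits(const void *p)
{
    return (uintptr_t)p > UINT32_MAX;
}
```

Source that currently casts a pointer to int, or stores one in a 32 bit structure field, is the kind of code the inspection would need to find.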
Phase | Status | Completion | Latest updates |
---|---|---|---|
Conceptual design | In progress | 20% | 03-Oct-2023 Document updated (see history) |
Mock ups/visualisation | - | - | - |
Prototype coding | - | - | - |
Final implementation | - | - | - |
Testing/integration | - | - | - |
v1.00 – 10-Apr-2021
v1.01 – 01-Aug-2021
v1.02 – 18-Apr-2022
v1.03 – 19-May-2022
v1.04 – 11-Jun-2022
v1.05 – 23-Jul-2022
v1.06 – 24-Sep-2023
v1.07 – 30-Sep-2023
v1.08 – 03-Oct-2023