The release of the ARMv8 architecture in 2011 added a whole new instruction set called AArch64, with a wider 64b register bank, and renamed what was previously referred to as the ARM instruction set (which RISC OS uses) as AArch32. The AArch64 instructions are not binary compatible with the older ones.
To date, the ARMv8 targets RISC OS supports have fortunately still contained an implementation of AArch32 for backwards compatibility, allowing RISC OS to run while entirely ignoring the AArch64 aspects. By 2020 it was becoming clear that Arm intended to wind down ARMv7 and earlier (fewer than 50% of the cores available to license would run that way) to focus on their ARMv8 offerings, many of which dropped AArch32 support entirely.
In April 2021 the ARMv9 architecture was announced with AArch32 relegated to being a license option, in much the same way that 26 bit mode became an option in ARMv4 which was rarely taken up.
This proposal identifies the aspects of RISC OS that will require attention in order to migrate away from AArch32, to help with making design decisions, and ultimately to provide a route to implementing the changes needed to ensure there are chips to run on in future.
DDI0487 – The ARMv8 architecture reference manual (A profile)
DDI0608 – The ARMv9 supplement to DDI0487
No more big 32-bit cores for RISC OS from 2022
What would AArch64 BASIC look like?
ARMv8 support
Secondary relevance: Could you run on an Apple MacMini and RISC OS on hypervisor
AArch32 is the family of instructions with 32 bit wide integer registers addressing up to 2^32 bytes of memory, and also includes the 26 bit addressing mode.
AArch64 is the family of instructions with 64 bit wide integer registers addressing up to 2^64 bytes of memory.
The opcode under AArch32 has space for a 24 bit immediate value which encodes the call number, for a total of 16M possible unique values, though in practice some of the number space is used to encode other information, such as bit 17 holding the ‘X’ flag that selects the variant which returns errors to the caller rather than raising them.
The opcode under AArch64 has a reduced size 16 bit immediate value, for a total of 64k possible unique values, which clearly is insufficient to directly encode all of the currently circulated allocations.
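The arithmetic above can be sketched in C. This is only an illustration: the macro and function names are hypothetical, and the only facts taken from the text are the field widths and the position of the X flag.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Illustrative constants; the names are hypothetical, the values follow the text. */
#define SWI_FIELD_AARCH32 0x00FFFFFFu /* 24 bit immediate in the AArch32 opcode */
#define SVC_FIELD_AARCH64 0x0000FFFFu /* 16 bit immediate in the AArch64 opcode */
#define SWI_X_BIT         (1u << 17)  /* 'X' flag held in bit 17 of the number  */

static_assert(SWI_FIELD_AARCH32 + 1u == 16u * 1024u * 1024u, "24 bits give 16M values");
static_assert(SVC_FIELD_AARCH64 + 1u == 64u * 1024u, "16 bits give 64k values");

/* Strip the X flag to recover the base call number. */
static uint32_t swi_base_number(uint32_t swi)
{
    return swi & SWI_FIELD_AARCH32 & ~SWI_X_BIT;
}

/* True when the whole call number, X flag included, fits the 16 bit field. */
static bool swi_fits_svc_immediate(uint32_t swi)
{
    return (swi & SWI_FIELD_AARCH32) <= SVC_FIELD_AARCH64;
}
```

With these definitions the OS block up to &1FF fits, a module SWI such as &400C0 does not, and neither does any number with the X flag set, since bit 17 lies outside a 16 bit field.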
Despite the reduced number space, this does encompass the high value OS SWI block from &00 to &1FF. Since the advent of split I+D caches with the StrongARM it has not been convenient to dynamically generate supervisor call opcodes without incurring a cache flush penalty, and the strong recommendation has been to call OS_CallASWI or OS_CallASWIR12 instead. This indirect method, in which the call number is passed in a register and then despatched via an OS SWI, could be used to allow existing SWI allocations to be retained regardless of the instruction set in use.
Programmers in C will be used to using the _swix() and _swi() functions which are already indirect calling methods, and programmers in BASIC use SYS which the interpreter can change to an indirect call as required.
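As a rough model of the indirect method, the sketch below despatches on a call number carried in a register rather than in the opcode, in the manner of OS_CallASWI (which takes the number in R10). Everything here is hypothetical scaffolding for illustration: the `swi_regs` type, the table, the demo handler and the number `DEMO_SWI` are not part of any RISC OS interface.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SWI_NUMBER_MASK 0x00FFFFFFu
#define SWI_X_BIT       (1u << 17)
#define DEMO_SWI        0x12345u /* hypothetical number, too wide for 16 bits */

typedef struct { uint64_t r[13]; } swi_regs; /* R0-R12, widened to 64 bits */
typedef int (*swi_handler)(swi_regs *regs);

/* Hypothetical handler: increments R0 so the effect is observable. */
static int demo_handler(swi_regs *regs)
{
    regs->r[0] += 1;
    return 0;
}

static const struct { uint32_t number; swi_handler handler; } swi_table[] = {
    { DEMO_SWI, demo_handler },
};

/* Model of an OS_CallASWI style entry: the real call number arrives in R10,
 * so the width of the opcode's immediate field no longer matters. */
static int call_a_swi(swi_regs *regs)
{
    assert(regs != NULL);
    uint32_t number = (uint32_t)regs->r[10] & SWI_NUMBER_MASK & ~SWI_X_BIT;
    for (size_t i = 0; i < sizeof swi_table / sizeof swi_table[0]; i++) {
        if (swi_table[i].number == number)
            return swi_table[i].handler(regs);
    }
    return -1; /* unknown call number */
}
```

Because the number travels in a register, the same 24 bit allocations remain usable whichever instruction set issued the call.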
Whereas currently in AArch32 R0-R15 are 32 bits wide, and therefore can address any location in the 4GB of logical address space, with AArch64 the general purpose register bank can be viewed either as X0-X30 (64 bits each) or W0-W30 (32 bits each). The AArch64 program counter is not directly visible.
Therefore, the AArch32 register bank could be viewed as a subset of the AArch64 register bank, in that parameters passed to a SWI executed as an AArch32 instruction could be losslessly passed to an AArch64 RISC OS kernel for handling by sign or zero extending as appropriate.
Note that in a mixed AArch32/AArch64 system it is not guaranteed that the narrower 32 bit registers (containing the SWI’s parameters) are zero extended when entering the AArch64 exception handler. The AArch64.CallSupervisor pseudo code in the ARMv8 architecture reference manual is the clearest place to follow this: going through AArch64.TakeException you get to AArch64.MaybeZeroRegisterUppers, where the loop to clear is conditional on ConstrainUnpredictableBool.
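Since the upper halves cannot be relied upon, a handler would have to widen each parameter itself. A minimal sketch of the two extensions follows; the function names are hypothetical, and which registers of a given SWI are signed or unsigned is something only that SWI's definition can say.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Zero extend a bank of registers in place, discarding whatever the
 * exception entry left in the upper 32 bits. Suits addresses and flags. */
static void zero_extend_bank(uint64_t *x, size_t count)
{
    assert(x != NULL);
    for (size_t i = 0; i < count; i++)
        x[i] = (uint64_t)(uint32_t)x[i];
}

/* Sign extend the low 32 bits of one register. Suits signed quantities
 * such as a -1 "no value" marker or a negative offset. */
static int64_t sign_extend_w(uint64_t x)
{
    uint32_t low = (uint32_t)x;
    if (low & 0x80000000u)
        return (int64_t)low - ((int64_t)1 << 32);
    return (int64_t)low;
}
```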
Assuming a register widening solution is adopted, the only places where the size of a pointer is of concern are those where the pointer is passed in via a parameter block held in memory. This section surveys the core module SWIs for places where this technique is used in order to see how widespread a problem it might be.
OS_Word 0 (Read line)
OS_Word 16 (Econet transmit)
OS_Word 17 (Econet open or receive)
OS_Word 21 (Define pointer shape)
OS_Word 22 (Write screen base address)
OS_FSControl 26 (Copy objects)
OS_ChangeEnvironment handler 6 (Error handler)
OS_ChangeEnvironment handler 7 (Callback handler)
OS_ChangeEnvironment handler 8 (BreakPoint handler)
OS_DelinkApplication and OS_RelinkApplication
OS_HeapSort[32]
OS_ReadMemMapEntries and OS_SetMemMapEntries and OS_FindMemMapEntries
OS_Memory 0 (General page block operations)
OS_SpriteOp 52 and OS_SpriteOp 56 and OS_SpriteOp 65
Wimp_Poll 13 (Pollword non-zero)
Wimp icon blocks include 32b pointers to sprite areas or validation strings overloaded with the 12 byte icon data.
Wimp menu blocks can include a 32b pointer to a submenu at offset +4 of the menu item data.
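To see why a pointer embedded in a memory block is harder to widen than one in a register, consider the menu item. The sketch below models the present 24 byte item and one hypothetical widened layout; the field names are illustrative, and the 64b layout is an assumption made purely to show how the offsets move, not a proposed format.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Present layout: every field is 32 bits wide. */
typedef struct {
    uint32_t menu_flags;    /* +0                          */
    uint32_t submenu;       /* +4  32b pointer to submenu  */
    uint32_t icon_flags;    /* +8                          */
    uint8_t  icon_data[12]; /* +12 the 12 byte icon data   */
} menu_item32;

/* Hypothetical widened layout, for illustration only. */
typedef struct {
    uint32_t menu_flags;    /* +0                          */
    uint32_t pad;           /* +4  keeps the pointer aligned */
    uint64_t submenu;       /* +8  64b pointer to submenu  */
    uint32_t icon_flags;    /* +16                         */
    uint8_t  icon_data[12]; /* +20                         */
} menu_item64;

static_assert(offsetof(menu_item32, submenu) == 4, "submenu pointer at +4");
static_assert(sizeof(menu_item32) == 24, "present item size");
static_assert(offsetof(menu_item64, submenu) == 8, "widened pointer moves");
static_assert(sizeof(menu_item64) == 32, "widened item grows");
```

Every later field shifts and the item size changes, so existing clients that build these blocks by hand could not simply be recompiled against a wider pointer.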
Podule_ReadInfo
FileCore_MiscOp 6 (Read processed FileCore_Create block)
FileCore_DiscOp[64]
Draw path block code 1 is a 32b continuation path block pointer. There are plenty of spare code numbers to assign a new one for 64b pointers.
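A sketch of how a path reader might treat both forms of continuation is given below. The code number 9 for the 64b form is an arbitrary placeholder rather than an allocation, and storing the 64b pointer low word first is likewise an assumption.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

enum {
    PATH_CONTINUE32 = 1, /* existing code: one word, 32b pointer           */
    PATH_CONTINUE64 = 9  /* hypothetical new code: two words, 64b pointer  */
};

/* If the element is a continuation, store the address of the next path
 * block in *target and return true; otherwise return false. */
static bool continuation_target(const uint32_t *element, uint64_t *target)
{
    assert(element != NULL && target != NULL);
    switch (element[0]) {
    case PATH_CONTINUE32:
        *target = element[1];
        return true;
    case PATH_CONTINUE64:
        /* Assumed layout: low word first, then high word. */
        *target = (uint64_t)element[1] | ((uint64_t)element[2] << 32);
        return true;
    default:
        return false;
    }
}
```

Existing paths using code 1 keep working unchanged, while only producers that need an address above 4GB would emit the new code.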
RemotePrinterSupport_EnumerateUSBPrinters
MimeMap_Translate
CompressPNG_Start
PDumperSupport_Claim/PDumperSupport_Free/PDumperSupport_Find
SharedCLibrary_LibInit[_A|_R|Module|APCS_32|ModuleAPCS_32]
Resolver_GetHost and Resolver_GetHostByName
ATA_PacketOp
USBDriver_ScheduleSoftInterrupt
Driver Information Blocks (DIB) include pointers to the name, address, module, and location which are 32b in size and surrounded by other structure members. Pointers to DIBs also appear in registers for Service_DCIFrameTypeFree and Service_DCIDriverStatus.
The transmit SWI also uses an mbuf chain which has 32 bit linked list pointers in it (see notes on MBuf Manager).
The following lists modules which have been checked but whose SWIs don’t have any potential pointer issues. Modules which don’t implement any SWIs, such as application modules, are not listed here.
* AcornHTTP
* AcornSSL
* ADFS
* ATAPI
* BCMSupport
* BlendTable
* BootFX
* Buffer Manager
* CDFS, CDFSDriver
* ColourTrans
* CompressJPEG
* DDEUtils
* Debugger
* DeviceFS
* DHCP
* Dialler
* DOSFS
* DragASprite
* DrawFile
* FilerAction
* Filter Manager
* Font Manager
* Free
* Freeway
* FrontEnd
* FSLock
* GPIO
* Hourglass
* IIC
* Internet (Socket)
* InverseTable
* Joystick
* JPEG
* MakePSFont
* NetFS
* NetMonitor
* NetPrint
* NetTime
* NFS
* Parallel Device Driver
* PDriver
* PDumper
* Portable Manager
* RamFS
* RedrawManager
* ResourceFS
* RTC
* RTSupport
* ScreenBlanker
* ScreenFX
* ScreenModes
* SCSIFS, SCSIDriver
* SDFS
* ShareFS
* ShellCLI
* SMP
* Sound (Level 1), Sound (Level 2)
* Sound Control
* Squash
* SuperSample
* TaskWindow
* Toolbox
* URI
* VCHIQ
* VFPSupport
* ZLib
Assuming a register widening solution is adopted as for SWIs, the only places where the size of a pointer is of concern are those where the pointer is passed in via a parameter block held in memory. This section surveys the core module service calls for places where this technique is used in order to see how widespread a problem it might be.
As noted above, a ResourceFS file block includes a 32b offset.
The limitation of a 32b address in the page block already causes a problem with systems whose physical memory map has memory above 4GB. For this Service_PagesUnsafe64 has been introduced.
The limitation of a 32b address in the page block already causes a problem with systems whose physical memory map has memory above 4GB. For this Service_PagesSafe64 has been introduced.
The linked list of DIBs assumes a link pointer fits into a 32b value in memory.
The linked list of statistics providers assumes a link pointer fits into a 32b value in memory.
Subreason 1 uses a linked list of USB service call blocks which assume a link pointer fits into a 32b value in memory.
The HAL device descriptors include several absolute function pointers and pointers to description strings which are 32b.
The render state block pointed to by R2 contains a number of 32b limited pointers.
The font state block pointed to by R2 contains a number of 32b limited pointers.
The following lists service call ranges which have been checked but which don’t have any potential pointer issues. Modules which don’t implement any service calls are not listed here.
* ADFS (&10800)
* SCSI (&20100)
* Wimp (&400C0)
* NetPrint (&40200)
* Toolbox (&44EC0)
* SDIODriver (&81040)
* IIC (&81100)
* Window (&82880)
* URL (&83E00)
The SWIs OS_File and OS_GBPB include some subreasons which deal with load and execution addresses. These are currently 32b quantities, albeit deprecated in use. Various places store these as 32b quantities, for example in the extended attributes of a ZIP file, in file server messages, and in the directory entries of FileCore discs.
The SWI OS_FSControl 12 (Add FS) and OS_FSControl 35 (Add image FS) pass a pointer to a FileSwitch FS Information Block which includes 32b offsets to functions to implement a filing system. Provided modules are not expanded beyond their existing maximum size of 64MB these 32b offsets will suffice because they are relative to the module base address.
The SWI OS_SpriteOp doesn’t make use of absolute addresses in memory. Provided sprites are not expanded beyond their existing maximum size of 2GB these 32b offsets will suffice because they are relative to the sprite area base.
The 4 word MessageTrans block is opaque to the caller, so while it may contain a pointer, its layout could be changed without impacting clients.
The SWI ResourceFS_RegisterFiles includes a block with a 32b offset to the next item to add in the chain, however that still allows blocks to be kept ±2GB apart.
Devices registered via the list in R1 to DeviceFS_Register include a 32b offset to the device name as the first word of the buffer. This limits the string to be within ±2GB of the block.
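The reason relative offsets survive a move to 64b addressing can be shown with a little arithmetic. In this sketch the function names are hypothetical; it treats the stored value as a signed 32b offset from a 64b base, which is what the ±2GB reach quoted above implies.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* True when target lies within the reach of a signed 32b offset from base. */
static bool offset_reaches(uint64_t base, uint64_t target)
{
    uint64_t forward  = target - base; /* distance if target is above base */
    uint64_t backward = base - target; /* distance if target is below base */
    return forward <= 0x7FFFFFFFu || backward <= 0x80000000u;
}

/* Compute the 32b offset to store in the block. */
static int32_t make_offset(uint64_t base, uint64_t target)
{
    assert(offset_reaches(base, target));
    uint64_t forward = target - base;
    if (forward <= 0x7FFFFFFFu)
        return (int32_t)forward;
    return (int32_t)(-(int64_t)(base - target));
}

/* Recover the full 64b address from the base and the stored offset. */
static uint64_t resolve_offset(uint64_t base, int32_t offset)
{
    return base + (uint64_t)(int64_t)offset;
}
```

The base may sit anywhere in the 64b address space; only the distance between the block and the item it refers to is constrained.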
Toolbox Res files include 32b offsets (to the body, strings, etc) some of which are relocated when the Res file is loaded into absolute addresses.
Mbuf Manager works with mbctl and mbuf structures, which contain both 32b function pointers and linked list pointers.
The AIF format has always included a flags word, with a small number of valid values allocated by Acorn or Arm for their use. Following dialogue with Arm there are still plenty of spare flag bits. Therefore, a flag bit can be allocated to denote that the code is intended to be run on a 64 bit version of the OS and should be rejected on 32 bit versions.
In addition, the first few words are expected to be a limited subset of AArch32 instructions (B, BL, MOV) to add confidence to the decision.
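One possible shape for such a check is sketched below. The flag bit `AIF_FLAG_64BIT` is hypothetical since no bit has been allocated, the opcode tests are a simplified approximation of the AArch32 B, BL and MOV encodings, and the way the flag and the opcode check are combined is an assumed policy rather than a decided one.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define AIF_FLAG_64BIT (1u << 31) /* hypothetical: no such bit is allocated */

typedef enum { IMAGE_AARCH32, IMAGE_AARCH64, IMAGE_UNKNOWN } image_kind;

/* AArch32 B or BL: bits 27-25 are 101; condition 1111 is excluded. */
static bool is_b_or_bl(uint32_t w)
{
    return (w >> 28) != 0xFu && (w & 0x0E000000u) == 0x0A000000u;
}

/* AArch32 MOV: data processing (bits 27-26 clear) with opcode 1101. */
static bool is_mov(uint32_t w)
{
    if ((w >> 28) == 0xFu)
        return false;
    if ((w & 0x02000090u) == 0x00000090u)
        return false; /* register form with bits 7 and 4 set is not data processing */
    return (w & 0x0DE00000u) == 0x01A00000u;
}

/* Assumed policy: trust the flag for 64 bit images, otherwise require the
 * leading words to look like the expected AArch32 subset. */
static image_kind classify_image(uint32_t flags, const uint32_t *words, size_t count)
{
    assert(words != NULL);
    if (flags & AIF_FLAG_64BIT)
        return IMAGE_AARCH64;
    for (size_t i = 0; i < count; i++) {
        if (!is_b_or_bl(words[i]) && !is_mov(words[i]))
            return IMAGE_UNKNOWN;
    }
    return IMAGE_AARCH32;
}
```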
Plain binaries which are *Run and rely on the 32b load and execution addresses held in the RISC OS file attributes would either require a change to the FileCore logical format to support longer attributes, or be unsupported (the use of load/execution addresses has been deprecated for some time), or be limited to loading into the low 4GB of the memory map as presently.
Utilities run in User mode after being loaded into the RMA. The kernel currently runs utilities without type checks since in User mode any undefined instruction can only cause inconsequential damage. An optional “32OK” signature appended to the image is used by Aemulor to suppress its emulation, so precedent exists for adding “64OK” to denote the change of instruction set. Heuristics to detect AArch32 opcodes are likely to lead to false matches due to the AArch64 opcodes overlapping.
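Checking for a signature is cheap compared with opcode heuristics. A minimal sketch follows, assuming the four character signature occupies the last four bytes of the image; the function name is hypothetical and “64OK” is the value suggested above, not an adopted one.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* True when the image ends with the given four character signature. */
static bool has_trailing_signature(const uint8_t *image, size_t length,
                                   const char *signature)
{
    assert(image != NULL && signature != NULL);
    return length >= 4 && memcmp(image + length - 4, signature, 4) == 0;
}
```

A loader could test for “64OK” first and fall back to the existing “32OK” handling, leaving unsigned utilities to be treated as they are today.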
Since RISC OS 5 the module header has included a flags word to provide for future changes. Therefore, a flag bit can be allocated to denote that the code is intended to be run on a 64 bit version of the OS and should be rejected on 32 bit versions.
In addition, the first few words are expected to be a limited subset of AArch32 instructions (B, BL, MOV) to add confidence to the decision.
Podule loaders have four AArch32 instructions and an optional “32OK” signature at the start. Since AArch64 instructions are also 32b in size, and the signature could be changed, this can be accommodated should there ever be an ARMv8+ machine with a podule bus.
Phase | Status | Completion | Latest updates |
---|---|---|---|
Conceptual design | In progress | 5% | 10-Apr-2021 Document updated (see history) 01-Aug-2021 Document updated (see history) 18-Apr-2022 Document updated (see history) 19-May-2022 Document updated (see history) |
Mock ups/visualisation | - | - | - |
Prototype coding | - | - | - |
Final implementation | - | - | - |
Testing/integration | - | - | - |
v1.00 – 10-Apr-2021
v1.01 – 01-Aug-2021
v1.02 – 18-Apr-2022
v1.03 – 19-May-2022