This guide exists to help programmers update their code to run correctly on ARMv6+ processors. This guide only covers differences between ARMv5 and the newer architectures – i.e. it is assumed that your code is already 32bit compatible and runs correctly on an Iyonix (and without the use of Aemulor!). Also, this guide only covers differences that are relevant to the ARM instruction set – Thumb-only details are not listed. Note that you shouldn’t confuse ARMv6 and ARMv7 (current generations of the ARM architecture) with ARM6 and ARM7 (processor families used in the Risc PC and A7000).
Note that whilst StrongARM processors are ARMv4, StrongARM RiscPCs are generally treated as ARMv3 or ARMv3m, due to the RiscPC memory bus not supporting halfword transfers. This is generally true of emulation solutions too.
The Raspberry Pi 2 has an ARMv7 CPU, whereas older models (A, B, B+, Compute) used an ARMv6 CPU. The ARMv6 models are able to take advantage of a compatibility mode (on by default for the standard RISC OS Pi distribution) to allow them to run non-ARMv7 compatible software – see below for more details.
If software is very old (pre-2002), it probably isn’t ‘32 bit compatible’. You might wish to read the Writing 32 bit code guide on how to convert it, in addition to following the details below.
If a program is written in plain C, chances are that a simple recompilation of the code is all that’s needed to ensure your program is compatible.
Norcroft users need to pass the -memaccess option to disable the use of unaligned loads/stores:
-memaccess -L22-S22-L41
The above will produce code which is compatible with all ARM architectures. Norcroft version 5.68 and above uses this setting by default. If you want to upgrade to the latest version, and are already using a ROOL branded copy of the Desktop Development Environment, you can upgrade for half price.
If you don’t require RiscPC compatibility, you can change ‘-L22-S22’ to ‘+L22+S22’ to enable the use of halfword loads/stores (note you will also need to use the appropriate -cpu option to enable use of the LDRH/STRH instructions).
To ensure compatibility with all machines, GCC version 4.7.4 release 3 (or newer) is recommended. Earlier releases of GCC 4.7 have ARMv8 compatibility issues in UnixLib (use of SWP instruction), and versions of GCC prior to 4.1.1 release 2 have issues with unaligned loads/stores.
The latest release of GCC is available from the RISC OS GCCSDK website.
There are several differences between ARMv7 and ARMv5 which can cause hand-crafted assembler to fail or malfunction. These issues are listed below.
For ARMv5 and below, LDM and STM always ignored the bottom two bits of the source/destination address, treating them as zero.
For ARMv7 and above, an abort will be raised if the source/destination address is not word-aligned. The abort will occur regardless of the alignment exceptions setting.
For ARMv6, the behaviour is configurable via the system control register – but for maximum compatibility your code should assume ARMv7 behaviour.
For ARMv5 and below, an unaligned LDR has the behaviour of a “rotated load”. The data is loaded from a word-aligned address, and then rotated by the number of bytes specified by the bottom two bits of the original address. I.e., for “LDR R1,[R0]”, the behaviour is as follows:
    BIC temp,R0,#3
    LDR R1,[temp]
    AND temp,R0,#3
    MOV temp,temp,LSL #3
    MOV R1,R1,ROR temp
For ARMv7 and above, an unaligned LDR has the behaviour of a “sequential load” – four bytes are loaded from sequential locations in memory. I.e., for “LDR R1,[R0]”, the behaviour is as follows:
    LDRB R1,[R0]
    LDRB temp,[R0,#1]
    ORR R1,R1,temp,LSL #8
    LDRB temp,[R0,#2]
    ORR R1,R1,temp,LSL #16
    LDRB temp,[R0,#3]
    ORR R1,R1,temp,LSL #24
For ARMv6, the behaviour is again configurable via the system control register.
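The two load behaviours can be compared with a small C model. This is only an illustrative sketch: the function names are hypothetical, memory is modelled as a little-endian byte array, and no real CPU or RISC OS API is involved.

```c
#include <stdint.h>

/* Assemble a little-endian word from four consecutive bytes of the model memory. */
static uint32_t read_le32(const uint8_t *mem, uint32_t addr)
{
    return (uint32_t)mem[addr]
         | ((uint32_t)mem[addr + 1] << 8)
         | ((uint32_t)mem[addr + 2] << 16)
         | ((uint32_t)mem[addr + 3] << 24);
}

/* Hypothetical model of the ARMv5-style "rotated load": read the word at the
   word-aligned address, then rotate right by 8 bits per byte of misalignment. */
uint32_t ldr_rotated(const uint8_t *mem, uint32_t addr)
{
    uint32_t word = read_le32(mem, addr & ~3u);
    unsigned rot = (addr & 3u) * 8u;
    return rot ? (word >> rot) | (word << (32u - rot)) : word;
}

/* Hypothetical model of the ARMv7-style "sequential load": four bytes are
   read from consecutive locations starting at the unaligned address. */
uint32_t ldr_sequential(const uint8_t *mem, uint32_t addr)
{
    return read_le32(mem, addr);
}
```

With the bytes A0 A1 A2 A3 B0 B1 B2 B3 in memory, a load from offset 1 gives &A0A3A2A1 under the rotated model and &B0A3A2A1 under the sequential model: the bottom three bytes agree and only the top byte differs.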
The important thing to realise is that for any particular unaligned load, the bottom N bytes of the data will be correct, while the top 4-N bytes will be “incorrect” (for an address that is k bytes past a word boundary, N = 4-k). This has caused problems with C compilers, where halfword loads were typically implemented using the following pseudocode:
    LDR R1,[R0 EOR #2]
    MOV R1,R1,(LSR|ASR) #16
This will produce different results in ARMv5 and ARMv7, and could easily lead to unexpected data corruption. This is why alignment exceptions are currently turned on by default for versions of RISC OS running on ARMv6+.
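To see why the halfword idiom breaks, it can be run against a C model of both load behaviours. The sketch below is an assumption-laden illustration: model_ldr and halfword_via_ldr are hypothetical names, memory is a little-endian byte array, and only the LSR (unsigned) variant is modelled.

```c
#include <stdint.h>

/* Hypothetical model of LDR on little-endian byte memory.
   sequential == 0: ARMv5-style rotated load.
   sequential != 0: ARMv7-style sequential load. */
uint32_t model_ldr(const uint8_t *mem, uint32_t addr, int sequential)
{
    uint32_t base = sequential ? addr : (addr & ~3u);
    uint32_t word = (uint32_t)mem[base]
                  | ((uint32_t)mem[base + 1] << 8)
                  | ((uint32_t)mem[base + 2] << 16)
                  | ((uint32_t)mem[base + 3] << 24);
    unsigned rot = sequential ? 0u : (addr & 3u) * 8u;
    return rot ? (word >> rot) | (word << (32u - rot)) : word;
}

/* The compiler idiom: LDR R1,[R0 EOR #2] followed by MOV R1,R1,LSR #16. */
uint32_t halfword_via_ldr(const uint8_t *mem, uint32_t addr, int sequential)
{
    return model_ldr(mem, addr ^ 2u, sequential) >> 16;
}
```

For memory holding the halfwords &1234, &5678, &9ABC at offsets 0, 2 and 4, the idiom returns &1234 for offset 0 under the rotated model but &9ABC under the sequential model. A halfword at offset 2 is read through an aligned LDR, so both models return &5678 there.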
Thus, the recommendation is to avoid the use of unaligned LDRs when running on ARMv6+.
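In C, one way to follow this recommendation is to build the value from byte loads, which behave identically on every architecture version. The helpers below are a minimal sketch with hypothetical names, assuming the data is stored little-endian; they are not taken from any RISC OS library.

```c
#include <stdint.h>

/* Read a little-endian 32-bit value from any address using byte loads only,
   so no unaligned LDR is needed. */
uint32_t load_u32_unaligned(const void *p)
{
    const uint8_t *b = (const uint8_t *)p;
    return (uint32_t)b[0]
         | ((uint32_t)b[1] << 8)
         | ((uint32_t)b[2] << 16)
         | ((uint32_t)b[3] << 24);
}

/* Read a little-endian 16-bit value from any address using byte loads only,
   avoiding both LDRH and the word-load idiom. */
uint16_t load_u16_unaligned(const void *p)
{
    const uint8_t *b = (const uint8_t *)p;
    return (uint16_t)(b[0] | (b[1] << 8));
}
```

The cost is a few extra instructions per access, so helpers like these are best reserved for data whose alignment genuinely cannot be guaranteed.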
On ARMv5 and below, STR ignores the bottom two bits of the destination address and treats them as zero.
On ARMv7 and above, STR performs a “sequential write” – i.e. writing bytes to sequential memory locations using the same addressing scheme as LDR.
For ARMv6, the behaviour is again configurable via the system control register.
Due to the differences in behaviour, and the use of alignment exceptions by default on ARMv6+, it is recommended that unaligned STRs are avoided when running on ARMv6+.
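The same byte-wise approach works for stores. The following is a sketch only, with a hypothetical function name, assuming little-endian layout in memory:

```c
#include <stdint.h>

/* Write a 32-bit value to any address in little-endian order using byte
   stores only, so no unaligned STR is needed. */
void store_u32_unaligned(void *p, uint32_t value)
{
    uint8_t *b = (uint8_t *)p;
    b[0] = (uint8_t)(value);
    b[1] = (uint8_t)(value >> 8);
    b[2] = (uint8_t)(value >> 16);
    b[3] = (uint8_t)(value >> 24);
}
```

Because only single bytes are written, the bytes on either side of the four-byte destination are left untouched regardless of how the CPU treats unaligned word stores.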
On ARMv6 and below, a non-halfword aligned LDRH/STRH has unpredictable behaviour.
On ARMv7 and above, non-halfword aligned LDRH/STRH performs a sequential load/store, as per LDR/STR.
Due to the differences in behaviour, and the use of alignment exceptions by default on ARMv6+, it is recommended that unaligned LDRH/STRH are avoided on all architecture versions.
Other, more exotic multi-byte memory access instructions (LDRD, STRD, LDC, STC, LDRT, STRT, etc.) have had their behaviour changed as well. However, since alignment exceptions are likely to be turned on, it is recommended that you avoid using unaligned loads/stores of any kind unless you are certain that your code will work in the intended manner.
On all ARM processors running in 32bit mode, any ALU-based instruction (MOV, ADD, BIC, etc.) that writes to the PC and updates the flags via the ‘S’ bit will behave in an unpredictable manner if executed in User mode or System mode. So if your code falls foul of the above rule then it clearly isn’t as 32bit-compatible as you thought when you read the notice in the Introduction.
Unfortunately, not all unpredictable behaviours are created equal – experience suggests that MOVS pc, lr behaves “correctly” on an IOP321 (i.e. Iyonix), but fails on OMAP3, causing code which was otherwise believed to be 32bit compatible to fail.
An LDM that loads the PC and uses the ^ flag has the exact same rules as MOVS pc, lr – the combination of loading the PC and using the ^ flag causes the CPU to try and set the PSR to the value of the (mode-specific) SPSR, but in User and System mode there is no SPSR, resulting in the instruction being unpredictable when executed in those modes. This instruction also seems to be another case of an unpredictable instruction that “works” on IOP321 but fails on OMAP3.
On ARMv7 and above, any ALU-based instruction (MOV, ADD, BIC, etc.) that writes to the PC performs what is known as interworking. Interworking is a process used to switch between ARM and Thumb mode, and is controlled by the bottom two bits of the target address. This means that unless the bottom two bits of the target address are zero, your code is likely to behave unexpectedly on ARMv7 and above. X1 = switch to Thumb, 00 = switch to ARM, 10 = unpredictable.
On ARMv6 and below, the bottom two bits are ignored and treated as zero.
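The rule for the bottom two bits can be written down as a small C classifier. This is purely an illustration of the rule stated above; the enum and function names are hypothetical.

```c
#include <stdint.h>

typedef enum {
    TARGET_ARM,           /* bottom bits 00: continue in ARM state */
    TARGET_THUMB,         /* bottom bits X1: switch to Thumb state */
    TARGET_UNPREDICTABLE  /* bottom bits 10: unpredictable         */
} branch_target;

/* Classify what an ALU write to the PC does on ARMv7 and above. */
branch_target classify_armv7_pc_write(uint32_t target)
{
    if (target & 1u)
        return TARGET_THUMB;
    if (target & 2u)
        return TARGET_UNPREDICTABLE;
    return TARGET_ARM;
}

/* On ARMv6 and below the bottom two bits are simply ignored. */
uint32_t armv6_pc_write_target(uint32_t target)
{
    return target & ~3u;
}
```

Code that keeps flags or other data in the bottom bits of a return address therefore needs to clear them (for example with BIC) before writing the value to the PC.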
In ARMv7 and above, an LDM with writeback, and with the base register in the register list, is an unpredictable instruction.
In ARMv6 and below, it was merely the final value of the base register that would be unpredictable, and not the entire instruction itself.
In ARMv7 and above the CPU is allowed to take the undefined instruction vector if an undefined instruction is “executed” but fails the condition code test. For example, code which identifies the CPU architecture and then uses conditional execution to execute the correct MRC/MCR instruction may fail. This code should be restructured to branch over the non-executed paths.
Generally this behaviour of taking the undefined instruction vector is seen on ARMv8 devices rather than on ARMv7.
In ARMv6, SWP/SWPB were deprecated in favour of the new LDREX/STREX instructions. In ARMv7VE support for the SWP/SWPB instructions is optional, and in ARMv8 support for the instructions has been removed completely.
Software which uses SWP/SWPB should be updated to use the newer LDREX/STREX suite of instructions. OS_PlatformFeatures 0 can now indicate which instructions are available.
In ARMv8, reading the CPSR from user mode will return an “unknown” value for the E, A, I, F, and M fields (essentially the lower 16 bits of the PSR). So far this behaviour has only been observed on ARMv8 CPUs which lack privileged-mode AArch32 support (which RISC OS is unable to directly run on), but the issue could become more significant as time passes (especially if you believe more recent editions of the ARMv7 ARM, which suggest that the same behaviour is also valid on ARMv4-ARMv7!)
It’s expected that the above behaviour will mainly impact code which attempts to determine whether the CPU is in a 26bit or 32bit mode; on a new CPU the mode field may be read as zero, which will cause the code to incorrectly believe the CPU is in a 26bit mode.
If you just want to detect 26 vs. 32bit CPU mode, the following code sequence will work on all CPUs and in all CPU modes:
    TEQ R0,R0 ; sets Z (can be omitted if not in User mode)
    TEQ PC,PC ; EQ if in a 32-bit mode, NE if 26-bit
Alignment exceptions (referred to as alignment faults in the ARM ARM) are a feature available in ARMv6 and above. If alignment faults are enabled, any improperly aligned memory access (i.e. loading a word from a 1, 2 or 3 byte offset, or loading a halfword from a byte offset) will trigger an alignment fault. At present RISC OS will report this with the standard “Abort on data transfer” message, so you may need to use the Debugger module to manually determine the exact cause of the error.
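As a rough aid when auditing code, the condition described above can be expressed as a C predicate. This is a simplified sketch with a hypothetical name that only models byte, halfword and word accesses; it does not cover instructions with other alignment requirements.

```c
#include <stdbool.h>
#include <stdint.h>

/* Returns true if an access of `size` bytes (1, 2 or 4) at `addr` is not
   naturally aligned, i.e. it would fault when alignment faults are enabled. */
bool would_alignment_fault(uint32_t addr, unsigned size)
{
    return (addr & (size - 1u)) != 0;
}
```

For example, a word access at &8002 is flagged, whereas a halfword access at the same address is not.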
Note that all ARMv6+ ROM images have alignment exceptions enabled by default. This is to help track down code which relies on unaligned LDR/STR behaviour that is incompatible with ARMv6+. It is unknown when, or if, alignment exceptions will be switched back to the “off” state, so it is advised that you modify your code accordingly.
For testing purposes only, the BASIC program below can be used to turn alignment exceptions off:
    10 DIM code% 256
    20 P%=code%
    30 [ OPT 0
    40 SWI "OS_EnterOS"
    50 MRC CP15,0,R0,C1,C0,0
    60 BIC R0,R0,#2
    70 MCR CP15,0,R0,C1,C0,0
    80 MSR CPSR_c,#&10
    90 MOV R0,R0
    100 MOV PC,R14
    110 ]
    120 CALL code%
The same program, but with an ORR instruction instead of BIC (line 60), can be used to turn alignment exceptions back on again.
Raspberry Pi comes with a Configure plugin to set different modes, on a whole-system basis. These are:
ARMv5 compatibility mode is the default, so do remember to flip into ARMv7 strict mode when testing.
Note that the ARMv5 compatibility mode is not available on the Raspberry Pi 2 or 3, due to the change to newer CPUs.
The LDREX/STREX instructions that were introduced with ARMv6 rely on a component of the CPU called the ‘local exclusive monitor’. For the load/store exclusive instructions to work correctly, the ARM requires that operating systems manually reset the exclusive monitor to the ‘open’ state before returning to any pre-empted code.
As far as RISC OS software is concerned, the exclusive monitor must be manually cleared under the following situations:
To reset the exclusive monitor, you should call the function returned by OS_PlatformFeatures 35. Alternatively, if you are certain about the requirements of the CPU you can either execute a STREX instruction to a dummy location, or use the CLREX instruction (if available – check with OS_PlatformFeatures 0).
Failure to correctly clear the exclusive monitor may result in a STREX succeeding in a situation in which it should have failed.
IRQ handlers and vectored callbacks (OS_AddCallBack) do not need to clear the exclusive monitor; the OS will do it for you on exit from the handler. The above list comprises the situations where code returns directly to the pre-empted foreground task rather than exiting via the OS.
There are some exceptions to the above rules, for situations where the monitor does not need to be cleared:
If any of the above are true then the monitor does not need to be cleared manually.
Note that RISC OS does not guarantee that the exclusive monitor is in the open state on return from SWIs; therefore if your code calls a SWI then it must take action to clear the monitor before it returns.
For full details of the operation of the local exclusive monitor (and the related ‘global exclusive monitor’), consult the ARM ARM. Understanding the operation of the exclusive monitors is critical to correctly writing software which uses the exclusive access instructions.
For much more detailed information about the differences between ARM architectures, it is recommended that you consult the relevant version of the ARM ARM.