Cortex A7 deprecated instructions
Jon Abbott (1421) 2651 posts |
Is there a reference of instructions that are now deprecated on the Cortex A7? Two I’ve found through trial and error are: STMxx Rd!, {< reglist containing Rd >} An example of the STM would be:
The LDR is a potential compiler issue, if it’s being lazy and encoding LDR R0,[R0] without setting the Prefix bit. Others I’ve found are: The following are deprecated but may still work: |
Jeffrey Lee (213) 6048 posts |
Appendix I of the ARMv7-AR ARM has some information, but most of it’s hidden away in the details for each individual instruction. E.g. in the assembler syntax for STM, they say this: The SP and PC can be in the list in ARM instructions, but not in Thumb instructions. However, And for STR (immediate, ARM), the pseudocode has this: if wback && (n == 15 || n == t) then UNPREDICTABLE; |
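As a rough illustration of how that pseudocode line maps onto the instruction encoding, here is a minimal Python sketch that flags word-sized LDR/STR (immediate, ARM) encodings with writeback onto a clashing base register. The function names are made up for this example. The STR rule is the one quoted above; the LDR rule (wback && n == t) is my reading of the manual and should be treated as an assumption, as should the decision to ignore byte and register-offset forms.

```python
def bits(word, hi, lo):
    """Return bits hi..lo (inclusive) of a 32-bit instruction word."""
    return (word >> lo) & ((1 << (hi - lo + 1)) - 1)


def check_single_transfer(word):
    """Flag LDR/STR (immediate, ARM, word-sized) with writeback to a clashing base.

    Returns a short reason string, or None if nothing is flagged.
    Hypothetical helper: only the two checks described in the lead-in are made.
    """
    if bits(word, 31, 28) == 0xF:
        return None  # unconditional space, not LDR/STR
    if bits(word, 27, 25) != 0b010:
        return None  # not the immediate-offset load/store class
    if bits(word, 22, 22) == 1:
        return None  # byte forms are left out of this sketch

    p = bits(word, 24, 24)      # 1 = pre-indexed or offset addressing
    w = bits(word, 21, 21)      # 1 = writeback requested
    load = bits(word, 20, 20)   # 1 = LDR, 0 = STR
    n = bits(word, 19, 16)      # base register
    t = bits(word, 15, 12)      # transfer register

    wback = (p == 0) or (w == 1)  # post-indexed forms always write back
    if not wback:
        return None
    if n == t:
        return "writeback with Rn == Rt"
    if not load and n == 15:
        return "STR with writeback to PC"  # from the quoted STR pseudocode
    return None


# LDR R0,[R0] encoded correctly (P=1, W=0) versus post-indexed (P=0)
print(check_single_transfer(0xE5900000))
print(check_single_transfer(0xE4900000))
```

The second word is the kind of encoding described in the opening post: the same LDR R0,[R0] but with the pre-index bit clear, which makes it a post-indexed access with writeback onto its own destination.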
Jon Abbott (1421) 2651 posts |
Thanks for the info Jeffrey, I did check the individual instructions, although obviously not in the correct area of the ARM ARM.
The behaviour I’m seeing with SP being in the list, is that it triggers an unpredictable Undefined instruction Abort – which is odd in itself. The STR also triggers an Undefined instruction Abort (I think – may have been a normal Abort) – certainly not an unpredictable result as it happens every time an offending instruction is decoded. |
Jon Abbott (1421) 2651 posts |
Now that I’ve coded up the misaligned Abort handler for LDR/STR/LDM/STM/SWP I’ve been running games on the Pi B+ and Pi 2 in parallel as comparative tests (with both set to generate alignment faults). The codebase in ADFFS is identical across the two CPUs, but I’m seeing differences. Examples include:
I’m working my way through the RISCOS issues and am reporting them once I’ve confirmed they are ARMv7 incompatibilities or bugs in general. It’s slow going though, as I have to first debug the games and confirm it’s not a bug in the game that’s triggering an issue in RISCOS.
The fact I’m seeing differences across the two machines does, however, imply there are possibly more deprecated features than the two I’ve listed in the OP, which could be affecting the OS in the assembler source, DDE compiled sections or possibly both. Does DDE take account of the deprecated features? I’m assuming it does. I’m also assuming the two I’ve listed above are general ARMv7 deprecations and not specific to the Cortex-A7 (I’ve yet to check).
There’s also the question of errata on the Cortex-A7. 823274: Load or store which fails condition code check might cause deadlock or data corruption – looks fairly fatal if it’s coming into effect. I suspect it’s not, but it looks fairly easy to reproduce accidentally.
One other odd thing I’m seeing is where Aborts are triggered in the OS. When the Abort handler tries to read the instruction, it triggers another Abort as if Abort mode can’t read the memory. SVC seems to be able to read the instruction okay. |
Steffen Huber (91) 1953 posts |
Pi B(+) is ARMv6, not v7. Or did I misunderstand? Not sure if this is relevant wrt alignment exception stuff. |
Jon Abbott (1421) 2651 posts |
Good point, I’ve modified my post accordingly. As you’ve noted it’s mainly related to alignment exception and deprecated ARMv7 issues. |
Jeffrey Lee (213) 6048 posts |
I’ve seen that myself, and have had to add little hacks to the OS to try and work around them (see the comments for VMSAv6 revisions 1.1.2.6 and 1.1.2.9). My theory is that they’ve all been caused by places where we haven’t been doing the correct cache/TLB maintenance when changing the page tables. Earlier this year a lot of fixes went in to try and make sure the OS does things properly (newer CPUs were seeing more and more issues), but I’m yet to pluck up the courage to try and remove some of the earlier hacks. So if you’re poking page tables manually then make sure you’re following all the correct cache/TLB maintenance rules as laid out in the ARM ARM (which I think I managed to summarise here). It may also be worth double-checking the ARMv7 compatibility primer for any of the unpredictable code sequences which ADFFS is allowing to be executed. |
Jon Abbott (1421) 2651 posts |
Almost certainly the case. Is all cache/TLB maintenance being done by the ARMOp’s? There’s no internal direct calls that someone’s put in for speed anywhere? It doesn’t explain why reading the ROM is generating Aborts though, as I’d expect it to all be paged in and nothing to be changing its TLB entries. I can check though, by adding a TLB and cache clean prior to loading the Aborting instruction [1].
I’ve now switched all my code to the relevant ARMOp’s and confirmed they’re all working as expected.
Although that page is useful as a primer, it doesn’t cover deprecated instructions such as the two I’ve detailed in the OP. I’m fairly certain there are other gotchas that I’ve not discovered yet; I’ll have to read the PDF small print to see what hidden gems ARM have thrown in.
I’ve implemented unaligned LDR/STR/LDM/STM in the Abort handler and verified they match ARM3 behaviour. SWP is also covered as, although you can use it, it’s been deprecated since ARMv6 for obvious reasons.
EDIT: [1] The sequence of events is that RISCOS generates an Abort trying to write to an appspace page that contains code (in USER); this is expected. The Abort handler then loads the aborting instruction and proxies it, performing any required cache maintenance. The Abort handler however is itself aborting whilst attempting to load the aborting instruction from ROM. The issue doesn’t occur on ARMv6 strict, only ARMv7 strict, so there’s something odd going on that’s specific to ARMv7. |
Jon Abbott (1421) 2651 posts |
I’ve finally tracked down what was causing Zarch to crash on Pi2/3. The root cause was a typo in the game code; however, the instruction in question (LDMIA R13!,{R0-R9,R14}^ in USER) has worked up until ARMv7, where it’s now treated as a NOP, resulting in stack corruption. I expect this was deprecated a long, long time ago. |
Jeffrey Lee (213) 6048 posts |
Yep – the revision B ARM ARM from 1996 states that (once you find the right page) that form of the instruction is unpredictable when used in user or system mode. |
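For anyone wanting to scan old game code for this form, a minimal Python sketch of the check might look like the following. The function names and the mode strings are hypothetical. The User/System mode rule comes from the posts above; the claim that the writeback bit should also be clear for this form is my recollection of the manual and is labelled as an assumption in the code.

```python
def is_ldm_user_registers(word):
    """True if word encodes LDM with the S bit (^) set and PC absent from the list."""
    cond_ok = (word >> 28) != 0xF              # skip the unconditional space
    is_block = ((word >> 25) & 0b111) == 0b100  # LDM/STM class
    load = (word >> 20) & 1                     # 1 = LDM
    s_bit = (word >> 22) & 1                    # the ^ suffix
    pc_in_list = (word >> 15) & 1
    return cond_ok and is_block and load == 1 and s_bit == 1 and pc_in_list == 0


def ldm_user_hazards(word, mode):
    """List reasons why this LDM ...^ is risky when executed in the given mode.

    mode is a hypothetical short name such as "usr", "sys" or "svc".
    """
    reasons = []
    if not is_ldm_user_registers(word):
        return reasons
    if mode in ("usr", "sys"):
        # Per the thread: unpredictable in User or System mode
        reasons.append("LDM ^ without PC executed in User/System mode")
    if (word >> 21) & 1:
        # Assumption: writeback is not permitted with this form
        reasons.append("writeback used with LDM ^ (user registers)")
    return reasons


# LDMIA R13!,{R0-R9,R14}^ as found in Zarch
print(ldm_user_hazards(0xE8FD43FF, "usr"))
```

Running the same word with mode "svc" leaves only the writeback warning, and the plain LDMIA R13!,{R0-R9,R14} (0xE8BD43FF) is not flagged at all.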
Clive Semmens (2335) 3276 posts |
Mea culpa – perhaps we should have had a specific table of deprecated instructions somewhere. Others should perhaps have thought of it too, but in the end it was down to me, for the first edition of the v7 ARM ARM anyway. Although on the whole, deprecated instructions were corner case instructions whose use was daft in the first place, so perhaps we might be excused… Unpredictable was always a funny choice of word, in that the results were usually absolutely predictable in reality, albeit possibly different from one implementation to another, and certainly not usable for random number generation for example! What it really means is “this isn’t a sensible instruction, it doesn’t have useful results, and the actual results may vary from implementation to implementation, or possibly even vary according to other factors on your particular implementation”. |
Jon Abbott (1421) 2651 posts |
The problem I find is instructions that change behaviour between revisions, the Zarch case being a fine example. There’s no way I would have spotted that it was deprecated in 1996, but remained functional up until ARMv7. I’d need every revision of the ARM ARM, which aren’t necessarily easy to track down, not to mention the amount of cross-checking required to find the changes.
I’ve spent two days so far this week trying to figure out which instruction has changed behaviour and is breaking Alone in the Dark and Dune II. It’s like looking for a needle in a haystack when there’s no immediate abort! As far as I can tell, it’s related to the C compiler used, so my only hope of tracking it down is to try games that were compiled with the same compiler in the hope that one will crash near the instruction at issue. When the crash occurs in CLib or UtilityModule, however, it leads to a lot of head scratching. Even code tracing doesn’t help, as invariably the code will run thousands of instructions ahead before the effects are noticed. It’s then a case of breakpointing on two ARM revisions and comparing the registers until they differ, which is a tedious, time-consuming task.
There is a glimmer of hope in that the RO5 debugger is very strict, so some incorrectly encoded instructions stand out in a memory dump. An example here is games compiled with RiscBASIC, which fails to set the Wb bit in post-indexed FPU instructions and so they decode as MCR/MRC. It’s usually a combination of all these techniques that is required to find the behavioural changes both in RISCOS and ARM revisions; in many cases the changes are documented, just not in obvious places. |
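The register comparison step could in principle be automated if both machines can dump their registers at each breakpoint. A minimal Python sketch, assuming a hypothetical trace format of one dictionary of register values per breakpoint hit (nothing here reflects an actual ADFFS or RISC OS debugger feature):

```python
def first_divergence(trace_a, trace_b):
    """Return (step, register, value_a, value_b) for the first mismatch, or None.

    Each trace is a list of dicts mapping register names to values, one dict
    per breakpoint hit, captured in the same order on both machines.
    """
    for step, (regs_a, regs_b) in enumerate(zip(trace_a, trace_b)):
        # Check every register seen on either side, in a stable order
        for name in sorted(regs_a.keys() | regs_b.keys()):
            if regs_a.get(name) != regs_b.get(name):
                return step, name, regs_a.get(name), regs_b.get(name)
    return None


# Made-up values: the stack pointer differs at the second breakpoint
armv6_trace = [{"R0": 1, "R13": 0x8000}, {"R0": 2, "R13": 0x7FD4}]
armv7_trace = [{"R0": 1, "R13": 0x8000}, {"R0": 2, "R13": 0x8000}]
print(first_divergence(armv6_trace, armv7_trace))
```

This only narrows things down to the pair of breakpoints between which behaviour changed; the offending instruction still has to be found by hand within that window.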
Clive Semmens (2335) 3276 posts |
The rotated load instructions were a real issue, of course, in that while they were a little strange, they could obviously occasionally actually be useful. Up to the beginning of ARM v7 + Thumb2 + Neon (I can’t speak for anything after that), I can’t think of another change that involved anything that anyone with any sense would have been using anyway. (I simplify slightly there: I can think of devious uses for some of them.) I feel very sorry for people who were using compilers that issued these weird instructions – but blame the compiler writers, not the architecture engineers or the architecture documentation! If there are non-weird instructions (other than the rotated loads) affected, do let me know. There’s nothing I can do about it now, but it would be nice to know. |
Rick Murray (539) 13840 posts |
I’m not entirely sure I see the point of a rotated load. If one needs to mask off the unwanted data afterwards…why not just normal load then rotate and mask? |
Jeffrey Lee (213) 6048 posts |
Speed. Let’s say we’re dealing with halfwords (which is what the C compiler often used rotated loads to access). The naive way of loading a halfword (without using LDRH or a non-atomic pair of LDRB’s) would be:
    LDR R0,[R0,#offset]
    MOV R0,R0,LSL #16
    MOV R0,R0,LSR #16 ; (or ASR if you wanted a signed halfword)
This will work with rotated and unaligned loads – although with unaligned loads there’s the danger it will read off the end of the current page and into unmapped memory. But if you know that the base register is word aligned and that the halfword is in the lower half of a word, on a system with rotated loads you can simplify it to the following:
    LDR R0,[R0,#offset+2]
    MOV R0,R0,LSR #16
This will work on a system with rotated loads (the LDR will put the desired halfword into the upper half of the register), but fail on systems with unaligned loads (the upper half of the register would contain the halfword at #offset+4 instead of the one at #offset).
Also, if the halfword is in the upper half of the word then you can use the same trick but with #offset-2 – load a word aligned location and then shift right to get the upper halfword into the lower half of the register. In fact this can be generalised as “#offset EOR 2” for any situation where the base register is word aligned.
As far as C code goes, structs are generally assumed to be word aligned, and the compiler will know the offsets of all of the members of the struct – so if you have a struct containing halfwords then the compiler can easily work out the “#offset EOR 2” values at compile time, allowing it to always use the shorter sequence. Only if you’re accessing halfwords by array index/pointer would it need to resort to the three-instruction version. |
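To make the difference concrete, here is a small Python model of the two LDR behaviours and the two halfword sequences described above. It is only a sketch: memory is a little-endian byte string, the function names are invented for the example, and page-boundary faults are not modelled.

```python
MEM = bytes(range(0x10, 0x20))  # 16 bytes of test data: 0x10, 0x11, ... 0x1F


def load_word_rotated(mem, addr):
    """Rotated load: fetch the aligned word, rotate right by 8 * (addr & 3)."""
    base = addr & ~3
    word = int.from_bytes(mem[base:base + 4], "little")
    rot = 8 * (addr & 3)
    return ((word >> rot) | (word << (32 - rot))) & 0xFFFFFFFF


def load_word_unaligned(mem, addr):
    """Unaligned load: fetch the four bytes starting at addr."""
    return int.from_bytes(mem[addr:addr + 4], "little")


def halfword_naive(load, mem, base, offset):
    """LDR at offset, then LSL #16 and LSR #16 to keep the low halfword."""
    word = load(mem, base + offset)
    return ((word << 16) & 0xFFFFFFFF) >> 16


def halfword_short(load, mem, base, offset):
    """LDR at (offset EOR 2), then LSR #16. Assumes base is word aligned."""
    return load(mem, base + (offset ^ 2)) >> 16


for load in (load_word_rotated, load_word_unaligned):
    print(load.__name__,
          hex(halfword_naive(load, MEM, 0, 0)),
          hex(halfword_short(load, MEM, 0, 0)))
```

With this test data the halfword at offset 0 is 0x1110. The naive sequence returns it under both load models, while the short sequence returns 0x1110 with rotated loads and 0x1514 (the halfword at offset 4) with unaligned loads, matching the failure described above.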
Rick Murray (539) 13840 posts |
Mmm, it’s a shame one can’t directly AND with &FFFF, then one could simply LDR aligned then barrel roll it into the right place, using AND to hack off the excess; though the naive double MOV isn’t such a bad thing either. |
Clive Semmens (2335) 3276 posts |
If you’ve got a lot of them to do (not that unlikely) you could have &FFFF sitting in a register. |
Rick Murray (539) 13840 posts |
So how does one pick up user mode registers from a privileged mode? Or is it only a NOP in user mode? |
Clive Semmens (2335) 3276 posts |
Yes. And it always was unpredictable in User mode. |
Jon Abbott (1421) 2651 posts |
Yes, now corrected. |