SWI “OS_EmulateTheNextInstruction”?
tymaja (278) 172 posts |
Does something like this exist? I would guess that Aemulor could easily do something like this, but I can’t test it (I run RISC OS via VNC, and the VNC client hates screen mode changes, so I can’t really test most older software via Aemulor). If it doesn’t exist, I do wonder whether something like this (not as an OS_ SWI) could be created. It would be slow, inefficient etc., but could be useful in some circumstances (admittedly more for backporting code to older systems). It could be useful for the occasional ARMv8 instruction on older systems, for example to patch an app that uses ARMv8 instructions that don’t generate an undefined instruction fault on StrongARM/ARM610. I have made some simple proof-of-concept code :D It returns quickly on ARMv8 (acting as a NOP, so the ARMv8 core can just execute the next instruction); otherwise it emulates a (very limited) subset of ARMv8 instructions. I can’t test it because my SA RPC power supply ‘switched off’ many months ago, and I haven’t got round to figuring out whether it took the entire system down with it… |
David J. Ruck (33) 1629 posts |
No, it doesn’t exist, and it doesn’t make any sense. There are two ways to handle newer instructions: the easiest is if the instruction causes an undefined instruction trap; otherwise the code needs patching, similar to how SWP is handled. Get yourself a better VNC client; Remmina on Linux handles mode changes without problems. |
Stuart Swales (8827) 1349 posts |
If software authors care about making their products work on older systems, they’ll compile for the lowest common denominator, either for all platforms or specifically release versions for earlier ones. Lack of ARMv7 instruction support is not going to be the only limiting factor running Iris (for example) on a RISC PC.
Why not fire up RPCEmu? |
Colin Ferris (399) 1809 posts |
Have a search for ‘Ace’ on this site. |
tymaja (278) 172 posts |
I agree – I was actually thinking of a situation where somebody wanted to get software working on an older system. The patching could be almost automatic: replace the problematic instruction with a B, which branches to code consisting of a SWI, the ‘modern’ instruction, and a B back to the instruction after the one that was just replaced. One use case could be a quick way to ‘port’ software in a ‘draft’ way, rather than manually recoding a random load of BFC, BFI, CLZ and similar instructions, each with different registers specified and each needing decisions about how to preserve data while it is recoded. While there would be little use for it now, there could be in the future (UDIV/SDIV instructions versus older Pis would be one example). Manually patching executables with ‘B, SWI, (copy/paste instruction), B’ might have some use in the future if the source is not available, particularly if a single CLZ, DIV or BFI type instruction was the only issue.

RE: RPCEmu – it is ages since I used it. It is good, but (like all emulators) it tends to generate a lot of heat, and because I use a MacBook, the thermal design is ‘not great’: the temperature quickly goes up to 100 degrees and stays there (this is within design spec, despite working fans, dust removed, good thermal paste etc.). I like the 17” display and don’t want to fry the ATI graphics chip. Also, I don’t know much about RPCEmu compatibility, but I guess I should dig out my PC laptop and give it a go! |
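A minimal C sketch of the ‘B, SWI, instruction, B’ patch layout, assuming ARM (32-bit) state, unconditional branches, and targets within the ±32MB reach of a B. `HYPOTHETICAL_EMULATE_SWI` is a placeholder number, not a real SWI (as noted above, none exists), and `encode_b`/`build_patch` are illustrative names. The fiddly part it shows is the branch offset: the PC reads as the instruction’s address + 8, and the 24-bit field counts words.

```c
#include <stdint.h>

/* Placeholder SWI number for the "emulate the next instruction" idea.
   Hypothetical: no such SWI is allocated. */
#define HYPOTHETICAL_EMULATE_SWI 0x000FFFFFu

/* Encode an unconditional ARM-state B at address `from` that jumps to `to`.
   The PC reads as from + 8, and the 24-bit field holds a word offset.
   Unsigned arithmetic yields the two's complement bits for backward
   branches. Assumes the target is within +/-32MB (not checked here). */
static uint32_t encode_b(uint32_t from, uint32_t to)
{
    uint32_t offset = to - (from + 8u);
    return 0xEA000000u | ((offset >> 2) & 0x00FFFFFFu);
}

/* Fill a three-word veneer located at address `veneer`:
     SWI <emulate>, <original instruction>, B <site + 4>
   and return the B that should overwrite the instruction at `site`. */
static uint32_t build_patch(uint32_t site, uint32_t original,
                            uint32_t veneer, uint32_t out[3])
{
    out[0] = 0xEF000000u | HYPOTHETICAL_EMULATE_SWI; /* SWI, condition AL */
    out[1] = original;                               /* instruction to emulate */
    out[2] = encode_b(veneer + 8u, site + 4u);       /* resume after the patch */
    return encode_b(site, veneer);
}
```

For example, `encode_b(0x8000, 0x8000)` gives 0xEAFFFFFE, the familiar ‘B .’ loop. One catch with moving an instruction into a veneer: anything that reads the PC (a PC-relative LDR from a literal pool, say) would see a different address and need fixing up separately.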
Rick Murray (539) 13806 posts |
Why would it? Debuggers work by patching things directly rather than issuing “do the next one” SWIs.
If the instruction exists (like ORR into R14 on the 26 bit systems), then it can be scanned for. Luckily the ARM is regular and every instruction is word aligned.
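Since every ARM-state instruction is one aligned 32-bit word, a scan really is just a loop with a mask. A C sketch, using CLZ (an ARMv5 instruction) as the example pattern; the mask/value pair comes from the A32 encoding `cond 0001 0110 1111 Rd 1111 0001 Rm`, and `find_clz` is an illustrative name.

```c
#include <stddef.h>
#include <stdint.h>

/* CLZ with the condition, Rd and Rm fields masked out. */
#define CLZ_MASK  0x0FFF0FF0u
#define CLZ_VALUE 0x016F0F10u

/* Walk the code one word at a time. Returns the number of matches and
   stores up to `max_hits` word indices in `hits`. */
static size_t find_clz(const uint32_t *code, size_t nwords,
                       size_t *hits, size_t max_hits)
{
    size_t found = 0;
    for (size_t i = 0; i < nwords; i++) {
        if ((code[i] & CLZ_MASK) == CLZ_VALUE) {
            if (found < max_hits)
                hits[found] = i;
            found++;
        }
    }
    return found;
}
```

A scan like this can also hit data (literal pools, tables) that happens to look like an instruction, so each match still needs checking before anything is patched.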
This. SimpleSeq is aimed at RISC OS 5 (and may or may not use some stuff within RISC OS 5, I don’t remember…), so the compile flags are -cpu Cortex-A8 -arch 5, which tells the compiler to use ARMv5 (Iyonix or later) instructions and to schedule in a way that is appropriate for an A8 core (Beagle-like). Something else, I forget which, schedules for an A53 core (Pi 3B+). Now, with Norcroft, this doesn’t actually seem to make much difference. It isn’t as if the ARMv5 code is full of weird instructions that don’t exist on earlier processors. I’ve just thrown the SimpleSeq executable into ARMalyser and it highlighted two instances of MRS, both in the CLib stubs, so nothing the compiler output.
Because I “only” have XP? ;) Still, for my purposes v0.91 or whatever is good enough.
I wonder if there wouldn’t be more important issues that crop up? I’m fairly easy-going with my compilation flags, given that they don’t seem to make much difference with the DDE (at least, not with the sort of code I write). Where it usually matters is with things that just don’t exist on the older systems. For example, in my BBS server, it’s the BBS module that initially picks up the incoming telnet connection. Once it has validated it (against a blacklist and a record of recent connections), it assigns a port to the connection and then starts up a linetask to handle the BBS bits. It does this by calling […]

It seems to me that, generally, if somebody is specifying certain sorts of systems and compiling code accordingly, there’s going to be a reason for it that may prove to be more of a show stopper than simply patching the instructions that don’t work. Case in point: one of my games, Virus, needed specific workarounds in order to work at all on a RiscPC-class machine. Triple-buffering in 16M colours required more than the 2MB of VRAM provided, so it detects this and degrades to a 256-colour mode, plotting sprites with a translation table and no background. It’s slower, but it works. Tested on RPCEmu.
Doesn’t MacOS allow you to control application priorities? I had a game, I think it might have been the open source incarnation of Duke Nukem, that made my XP box freak out, leading to the fan cranking up to maximum in under a minute. Since it practically never does that, I wasn’t expecting it to sound like somebody was hoovering my room. |
Graeme (8815) 106 posts |
If you just want those instructions to run, you can try my ACE module, as Colin suggested. It emulates lots of ARM instructions without having to change the source. It’s available at http://www.ro32.co.uk – there is a list of most of the instructions it can emulate. Raspberry Pi 3 and 4 users (and, I believe, some Pi 2 users) may also benefit from the emulated SWP instruction. For example, Perl can work on a Pi 4 but freezes without ACE. |
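Roughly what an instruction emulator has to do for each trapped instruction, sketched in C for CLZ: decode the word, read the source register from the saved register set, compute the result and write it back. This is not ACE’s code. `emulate_clz`, `soft_clz` and the `r[16]` saved-register array are assumptions, only AL (always) instructions are handled, and the sketch assumes the word actually reaches an undefined-instruction handler on the older core.

```c
#include <stdint.h>

/* Portable count-leading-zeros, as an emulator would compute it on a core
   without CLZ. soft_clz(0) is 32, matching the instruction. */
static uint32_t soft_clz(uint32_t x)
{
    uint32_t n = 0;
    if (x == 0)
        return 32;
    while ((x & 0x80000000u) == 0) {
        x <<= 1;
        n++;
    }
    return n;
}

/* Hypothetical handler core: `insn` is the faulting word, r[0..15] the saved
   registers. Returns 1 if emulated (the caller would then resume at the next
   instruction) or 0 to pass the fault on. */
static int emulate_clz(uint32_t insn, uint32_t r[16])
{
    unsigned rd, rm;
    if ((insn >> 28) != 0xEu)                 /* only AL handled in this sketch */
        return 0;
    if ((insn & 0x0FFF0FF0u) != 0x016F0F10u)  /* not a CLZ */
        return 0;
    rd = (insn >> 12) & 0xFu;
    rm = insn & 0xFu;
    if (rd == 15u || rm == 15u)               /* PC operands are UNPREDICTABLE */
        return 0;
    r[rd] = soft_clz(r[rm]);
    return 1;
}
```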
tymaja (278) 172 posts |
Very nice! I wasn’t aware of this, but it looks like a very useful tool! I wasn’t aware that so many instructions would cause the ‘undefined’ exception to occur. It pretty much covers the user mode instructions I was thinking of :) |
Graeme (8815) 106 posts |
ARM have added some lovely instructions which have descriptions that include: Signed Halving Add and Subtract with Exchange and Signed Halving Subtract and Add with Exchange.
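Those two descriptions are easier to follow as code. A C sketch of SHASX and SHSAX, working on the two signed 16-bit halves of each register: ‘halving’ shifts each 17-bit sum or difference right by one, so the result always fits back in 16 bits and these never overflow. The helper names are illustrative.

```c
#include <stdint.h>

/* Sign-extend the low 16 bits without implementation-defined narrowing. */
static int32_t sext16(uint32_t v)
{
    v &= 0xFFFFu;
    return (v & 0x8000u) ? (int32_t)v - 0x10000 : (int32_t)v;
}

/* floor(v / 2) for v >= -0x10000, which covers any 16-bit sum or
   difference. The bias keeps the shifted value non-negative, so the result
   does not depend on how the compiler shifts negative numbers. */
static int32_t halve(int32_t v)
{
    return ((v + 0x10000) >> 1) - 0x8000;
}

static uint32_t pack16(int32_t top, int32_t bottom)
{
    return (((uint32_t)top & 0xFFFFu) << 16) | ((uint32_t)bottom & 0xFFFFu);
}

/* SHASX: top = (Rn.top + Rm.bottom) / 2, bottom = (Rn.bottom - Rm.top) / 2 */
static uint32_t shasx(uint32_t rn, uint32_t rm)
{
    int32_t sum  = sext16(rn >> 16) + sext16(rm);
    int32_t diff = sext16(rn) - sext16(rm >> 16);
    return pack16(halve(sum), halve(diff));
}

/* SHSAX: top = (Rn.top - Rm.bottom) / 2, bottom = (Rn.bottom + Rm.top) / 2 */
static uint32_t shsax(uint32_t rn, uint32_t rm)
{
    int32_t diff = sext16(rn >> 16) - sext16(rm);
    int32_t sum  = sext16(rn) + sext16(rm >> 16);
    return pack16(halve(diff), halve(sum));
}
```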
As well as: Signed Multiply Accumulate Long (halfwords) multiplies two signed 16-bit values to produce a 32-bit value, and accumulates this with a 64-bit value. The multiply acts on two signed 16-bit quantities, taken from either the bottom or the top half of their respective source registers. The other halves of these source registers are ignored. The 32-bit product is sign-extended and accumulated with a 64-bit accumulate value.
Overflow is possible during this instruction, but only as a result of the 64-bit addition. This overflow is not detected if it occurs. Instead, the result wraps around modulo 2^64. I am surprised programmers are not using these instructions more often! (Sarcasm off now!) It took me a while, programming these, to come to the conclusion that many of them are for processing media such as sound and video. I may be wrong, but it seems that way. Because of this, there are a lot of similar instructions with different names. |
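The Signed Multiply Accumulate Long (halfwords) description, written out as a C model. It is a sketch: `smlalxy` and its two flag arguments, which pick the top or bottom half of each source register, are illustrative stand-ins for the BB/BT/TB/TT variants.

```c
#include <stdint.h>

/* Sign-extend the top (top != 0) or bottom 16 bits of a register. */
static int32_t half16(uint32_t reg, int top)
{
    uint32_t v = (top ? reg >> 16 : reg) & 0xFFFFu;
    return (v & 0x8000u) ? (int32_t)v - 0x10000 : (int32_t)v;
}

/* SMLAL<x><y> RdLo, RdHi, Rn, Rm: signed 16 x 16 -> 32-bit product,
   sign-extended and added to the 64-bit accumulator RdHi:RdLo.
   The addition is done in uint64_t, so overflow wraps modulo 2^64
   and nothing records that it happened. */
static void smlalxy(uint32_t *rdlo, uint32_t *rdhi,
                    uint32_t rn, int n_top, uint32_t rm, int m_top)
{
    int32_t product = half16(rn, n_top) * half16(rm, m_top); /* |p| <= 2^30 */
    uint64_t acc = ((uint64_t)*rdhi << 32) | *rdlo;
    acc += (uint64_t)(int64_t)product;   /* sign-extend, then add mod 2^64 */
    *rdlo = (uint32_t)acc;
    *rdhi = (uint32_t)(acc >> 32);
}
```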
tymaja (278) 172 posts |
The ‘Signed Multiply Accumulate Long (halfwords)’ instruction is actually used at least once in the RISC OS ROM! |
nemo (145) 2529 posts |
For the last twenty years at least, almost every decision about instructions is with an eye on the stupid compilers. i.e. “The compiler produces this terrible code for that common algorithm, how can we help it be less terrible?”. You could call this the principle of least surprise, but I can’t help feeling there’s a degree of compiler-author-tail wagging the CPU-designer-dog. |
tymaja (278) 172 posts |
I agree re: instructions; there are so many odd ones that were added to AArch32 over the years, many definitely with a view to compilers, but also with a definite focus on DSP! I used to dislike the idea of CLZ being added in ARMv5, but it does have many good uses! A lot of the weird multiplies and other things were also aimed at audio/video DSP; ARMv5 was used in things like PDAs, and the focus at the time was strongly on multimedia performance. And then came ARMv6, when Apple really got involved! |
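One of those good uses: on cores without UDIV, division is done in software, and CLZ lets a shift-and-subtract loop skip the quotient bits that are known to be zero. A portable C sketch; the `clz32` loop stands in for the CLZ instruction, `udiv32` is an illustrative name, and the divisor must not be zero.

```c
#include <stdint.h>

/* Stand-in for the CLZ instruction; clz32(0) is 32. */
static int clz32(uint32_t x)
{
    int n = 0;
    if (x == 0)
        return 32;
    while ((x & 0x80000000u) == 0) {
        x <<= 1;
        n++;
    }
    return n;
}

/* Shift-and-subtract unsigned division. CLZ lines the divisor's top bit up
   with the dividend's, so the loop runs only for quotient bits that can be
   non-zero. The divisor must be non-zero; *rem receives the remainder. */
static uint32_t udiv32(uint32_t n, uint32_t d, uint32_t *rem)
{
    uint32_t q = 0;
    if (d <= n) {
        int shift = clz32(d) - clz32(n);   /* >= 0 because d <= n */
        uint32_t dd = d << shift;
        for (; shift >= 0; shift--) {
            q <<= 1;
            if (n >= dd) {
                n -= dd;
                q |= 1u;
            }
            dd >>= 1;
        }
    }
    *rem = n;
    return q;
}
```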
David J. Ruck (33) 1629 posts |
That isn’t how new instructions are chosen. Large quantities of code implementing common algorithms is analysed (most produced by compilers, but also some handwritten assembler) and the benefits of introducing instructions which combine several different operations are modelled. DSP algorithms have benefitted from additions to instruction sets, as has cryptography. |
Cameron Cawley (3514) 156 posts |
It’s also worth mentioning that often programs such as FFmpeg that use the newer instructions have built-in runtime detection to switch between fast routines for newer ARM processors and generic routines for older ones. With that in mind, I’m not sure that emulating newer instructions on older processors is that useful given the performance overhead of handling undefined instructions and the fact that the only reason for using the new instructions is because the performance isn’t good enough on the machines that don’t have them. |
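The runtime-selection pattern, reduced to a C sketch: both routines are built and a function pointer is chosen once at start-up. The `cpu_has_clz` flag passed to `select_clz` is an assumption; how a program detects the CPU is platform-specific and not shown. It also assumes a 32-bit `unsigned int` for `__builtin_clz`, and in a real build the fast routine would live in its own source file compiled for ARMv5 or later, so the rest of the program still runs on older cores.

```c
#include <stdint.h>

typedef uint32_t (*clz_fn)(uint32_t);

/* Generic version: runs anywhere. */
static uint32_t clz_generic(uint32_t x)
{
    uint32_t n = 0;
    if (x == 0)
        return 32;
    while ((x & 0x80000000u) == 0) {
        x <<= 1;
        n++;
    }
    return n;
}

/* "Fast" version. With GCC or Clang targeting ARMv5 or later,
   __builtin_clz typically compiles to a single CLZ. It is undefined for 0,
   hence the guard. Other compilers fall back to the generic loop here. */
static uint32_t clz_fast(uint32_t x)
{
#if defined(__GNUC__)
    return x == 0 ? 32u : (uint32_t)__builtin_clz(x);
#else
    return clz_generic(x);
#endif
}

/* Choose once; every later call goes through the pointer. */
static clz_fn select_clz(int cpu_has_clz)
{
    return cpu_has_clz ? clz_fast : clz_generic;
}
```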
tymaja (278) 172 posts |
“DSP algorithms have benefitted from additions to instruction sets, as has cryptography.” This makes a lot of sense – CLZ being a good early example for AArch32. We are probably ‘on the same page’ to some degree (multimedia has been a huge thing, and the fact that these are mostly grouped as ‘media instructions’ in the ARMARM made me focus only on media), but I agree plenty of other stuff (including the internet!) has happened too. I wish I had had the chance to use the ‘saturating instructions’, which would have been great for software 3D rendering :) RE: FFmpeg etc. – that makes sense (runtime selection of instruction set). |
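For reference, a C model of two of the saturating instructions: QADD (a signed 32-bit add that clamps instead of wrapping, setting the sticky Q flag when it clamps) and UQADD8 (four unsigned bytes added in parallel, each clamped to 255). The second shows why they suit software rendering: brightening a packed four-channel pixel cannot wrap a channel round to black. The function names and the `int *q` stand-in for the Q flag are illustrative.

```c
#include <stdint.h>

/* QADD model: signed 32-bit add that clamps instead of wrapping.
   *q stands in for the sticky Q flag: set to 1 on saturation,
   otherwise left unchanged. */
static int32_t qadd(int32_t a, int32_t b, int *q)
{
    int64_t r = (int64_t)a + b;
    if (r > INT32_MAX) { *q = 1; return INT32_MAX; }
    if (r < INT32_MIN) { *q = 1; return INT32_MIN; }
    return (int32_t)r;
}

/* UQADD8 model: four unsigned bytes added lane by lane, each lane
   clamped to 255 rather than carrying into its neighbour. */
static uint32_t uqadd8(uint32_t a, uint32_t b)
{
    uint32_t r = 0;
    int shift;
    for (shift = 0; shift < 32; shift += 8) {
        uint32_t s = ((a >> shift) & 0xFFu) + ((b >> shift) & 0xFFu);
        if (s > 0xFFu)
            s = 0xFFu;
        r |= s << shift;
    }
    return r;
}
```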