When is it safe to use LDRH?
André Timmermans (100) 655 posts |
I am just starting to modify my sound modules to make them ARMv7 compatible. Most of the time this will mean duplicating routines using half-word aligned LDRs into routines using LDRH and selecting at runtime which one to use. So, the question is: how do I determine if LDRH (Iyonix, OMAP, A9, ...) is supported and safe to use (i.e. not the StrongARM on a RISCPC)? |
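As a rough illustration of the "two routines, pick one at runtime" approach described in the post above, here is a minimal C sketch. All names in it (`sample_reader`, `read_sample_word`, `read_sample_halfword`, `choose_sample_reader`) are hypothetical and not taken from the sound modules. The word-based variant uses an aligned word load plus shift and mask, which is a portable stand-in for the rotated half-word aligned LDR trick, not a copy of it. Little-endian layout is assumed.

```c
#include <stddef.h>
#include <stdint.h>

/* Common signature so the choice can be made once, at start-up. */
typedef uint16_t (*sample_reader)(const void *buffer, size_t index);

/* Variant for machines where halfword loads are not safe: fetch the
   containing 32-bit word and extract the 16-bit sample by shift/mask.
   Assumes little-endian layout: even samples sit in the low half. */
static uint16_t read_sample_word(const void *buffer, size_t index)
{
    const uint32_t *words = buffer;
    uint32_t word = words[index >> 1];
    return (uint16_t)((index & 1u) ? (word >> 16) : (word & 0xFFFFu));
}

/* Variant for machines with working halfword loads: a compiler that
   targets ARMv4 or later would normally emit LDRH for this access. */
static uint16_t read_sample_halfword(const void *buffer, size_t index)
{
    const uint16_t *halves = buffer;
    return halves[index];
}

/* Pick the routine once; how halfword_safe is decided is a separate
   question and is left to the caller here. */
sample_reader choose_sample_reader(int halfword_safe)
{
    return halfword_safe ? read_sample_halfword : read_sample_word;
}
```

The point of the function pointer is that the detection cost is paid once, and the inner loops never test a flag.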
John-Mark Bell (94) 36 posts |
OS_PlatformFeatures will probably help. Alternatively, given that none of the machines you’ve listed actually have a StrongARM in them, you could just detect that (and any ARM <=v3 architecture chips) and not use halfword accesses on them. Unless, that is, you want to support the Omega. :) |
Jeffrey Lee (213) 6048 posts |
I think OS_ReadSysInfo 8 is the SWI you want, not OS_PlatformFeatures. OS_ReadSysInfo 8 will return 7 for the A9 and 5 for all current RISC OS 5 machines. But since RISC OS 5 (just about) runs on RiscPCs, and could potentially be made to run on other hardware that doesn’t support LDRH/STRH, you’ll probably want to check the ARM ID code register as well, to determine the instruction set revision. I.e. enable LDRH/STRH for the A9, and any RISC OS 5 machine which is ARMv5 or above. |
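For the "ARMv5 or above" half of that suggestion, the following C sketch decodes an ARM ID code register value that has already been read (reading it needs a privileged coprocessor access, which is not shown). The function name `id_is_armv5_or_later` is hypothetical. The field layout follows the ARM Architecture Reference Manual as I understand it: the architecture field is bits [19:16], and older ARM Ltd parts whose part number starts with 0x0 or 0x7 use different layouts. Treat it as a sketch under those assumptions, not as a tested detection routine; the A9 case would still need the OS_ReadSysInfo 8 check described above.

```c
#include <stdint.h>

/* Returns 1 if the given ARM ID register value describes an ARMv5 or
   later core, 0 otherwise. Hypothetical helper, not a RISC OS API. */
int id_is_armv5_or_later(uint32_t id)
{
    uint32_t implementer = (id >> 24) & 0xFFu;  /* bits [31:24] */
    uint32_t part_top    = (id >> 12) & 0xFu;   /* bits [15:12] */
    uint32_t arch        = (id >> 16) & 0xFu;   /* bits [19:16] */

    /* Older ARM Ltd parts use different layouts: part_top 0x0 is the
       pre-ARM7 layout, 0x7 is the ARM7 layout (ARMv3 or ARMv4T).
       Neither can be ARMv5, so reject them before reading arch. */
    if (implementer == 0x41u && (part_top == 0x0u || part_top == 0x7u))
        return 0;

    /* 0x1 = ARMv4, 0x2 = ARMv4T, 0x3 to 0x6 = ARMv5 variants,
       0x7 = ARMv6, 0xF = described by the separate CPUID registers. */
    return arch >= 0x3u;
}
```

With this decoding, a StrongARM-style ID (architecture field 0x1) is rejected, while XScale-style (0x5) and Cortex-A8-style (0xF) IDs are accepted.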
Theo Markettos (89) 919 posts |
Silly question, but do LDRHs not work on Risc PCs? The memory bus will return a 32 bit word – is this not sorted out in the processor card, or do you get the wrong behaviour in the other 16 bits? STRH is a problem, obviously. Could you do a little test to find out?
You might need to invalidate the cache line so it really does go to DRAM. Is there some known-uncached memory around? And there might be problems on Kinetic (SDRAM is presumably OK, but EDO DRAM isn’t). That way it should use STRH on an emulated SA, assuming that implements STRH correctly. Not sure how feasible it is, but an idea… |
James Lampard (51) 120 posts |
AIUI LDRH will appear to work on a RiscPC when using data in the cache, but will fail when it actually has to access memory. So your test may succeed and give you a false sense of security. I suspect someone more intelligent will be along to explain this better shortly. |
André Timmermans (100) 655 posts |
On that topic I just made a post on the RPCEmu mailing list asking if these instructions will work on their StrongARM simulation since it shouldn’t suffer from the problem. It would certainly allow me to test my new routines without having an Iyonix or A9Home at hand. |
Steve Revill (20) 1361 posts |
I went through this several years ago and the conclusion is this: There is no simple and safe way to decide if you can or cannot use halfword operations on the RiscPC – sometimes they will appear to work (depending upon the CPU) but other times will fail. Unless it has a Kinetic card, in which case they will work again. I decided the safest way to do this was to not use them on a RiscPC or earlier hardware at all. And then I’ve no idea if they’d work properly on the Omega, A9home, RPCemu, VRPC, etc. If it’s an IYONIX or Beagle, then you’re probably safe (any ARMv5 CPU or later). Earlier than that and you’re probably in trouble. |
Peter Naulls (143) 147 posts |
I’ve followed this too. It is my understanding that it is RiscPC hardware alone that has this problem, but I am not 100% certain. I did ask Adv6 about the A9 when it came out and was told they worked ok on that. In addition, it’ll only work on Kinetic if it doesn’t come out of motherboard memory – which won’t be most of the time, but could still happen. Firefox is built with half word instructions. If someone wanted to be really certain, it shouldn’t be too hard to come up with a test program, including suitable cache flushes, etc. |
Ben Avison (25) 445 posts |
I’d be very surprised if LDRH/STRH don’t work on the A9, or any post-Risc PC hardware for that matter. The problem is caused by the fact that the Risc PC was interfacing the v4 StrongARM to a processor bus designed for v3 ARMs. The days of this bus being visible outside the silicon containing the CPU core are long gone. My suggestions are these:
But the only way to guarantee the optimal result on all machines is to have some bit of code “suck it and see” by turning off the cache and doing some test accesses to RAM to see if they’re broken – allowing for the fact that the Kinetic has two sorts of RAM, one of which works and one which doesn’t (so an algorithm would have to be chosen to select the correct RAM for the test). Presumably no emulator author would emulate the broken bus behaviour, so you’d also get to use LDRH/STRH on virtual machines this way. However, this is quite a specialised test to perform, so it’s really crying out for it to be something the kernel checks and makes available to everyone else via an OS_Memory or OS_PlatformFeatures call. |
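The comparison at the heart of such a "suck it and see" test might look like the C sketch below. It only shows the round-trip check. It does not turn off the cache, does not pick which RAM to probe on a Kinetic, and relies on the compiler emitting LDRH/STRH for the 16-bit accesses (so it would have to be built for ARMv4 or later). The names `probe_cell` and `halfword_round_trip_ok` are made up for this example, and little-endian layout is assumed.

```c
#include <stdint.h>

/* One 32-bit word, viewed either whole or as two 16-bit halves. */
typedef union {
    uint32_t word;
    uint16_t half[2];
} probe_cell;

/* Returns 1 if halfword reads and writes to *cell agree with word
   accesses, 0 otherwise. Assumes little-endian layout, so half[0]
   is the low 16 bits of word. The caller is responsible for making
   sure the accesses really reach RAM (cache off, suitable memory). */
int halfword_round_trip_ok(volatile probe_cell *cell)
{
    /* Word store followed by halfword loads. */
    cell->word = 0x12345678u;
    if (cell->half[0] != 0x5678u || cell->half[1] != 0x1234u)
        return 0;

    /* Halfword stores followed by word loads; the neighbouring half
       must not be disturbed by each store. */
    cell->word = 0xFFFFFFFFu;
    cell->half[0] = 0xBEEFu;
    if (cell->word != 0xFFFFBEEFu)
        return 0;
    cell->half[1] = 0xCAFEu;
    if (cell->word != 0xCAFEBEEFu)
        return 0;

    return 1;
}
```

On a machine with a working bus (or in cached memory, as noted earlier in the thread) this will always pass, which is exactly why the cache handling matters more than the comparison itself.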
André Timmermans (100) 655 posts |
I just made an attempt with RPCEmu, and the instructions are not supported, which is a pity because it would have allowed me to quickly test code changes. I will thus just check that OS_PlatformFeatures,0 has bit 7 set (26-bit mode not available) so that the LDRH/STRH code is used on A9, Iyonix and OMAP variants, while the old code is used on StrongARM. I doubt the LDRH code will make a difference in performance anyway as the instruction lacks the ability to specify a shift. |
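The check described in this post reduces to a single bit test on the flags word returned by OS_PlatformFeatures 0. A minimal C sketch, assuming the flags word has already been obtained from the SWI (the call itself is omitted) and using the bit number given in the post; the macro and function names are hypothetical:

```c
#include <stdint.h>

/* Bit 7 of the OS_PlatformFeatures 0 flags word, as described in the
   post: set when 26-bit mode is not available. Hypothetical name. */
#define PLATFORM_NO_26BIT_MODE (1u << 7)

/* Returns 1 if the LDRH/STRH routines should be selected, 0 if the
   older word-based routines should be kept. */
int use_halfword_routines(uint32_t platform_flags)
{
    return (platform_flags & PLATFORM_NO_26BIT_MODE) != 0;
}
```

This uses "26-bit mode not available" as a proxy for "post-RiscPC hardware", which matches the machines listed in the post but is not a direct test of the memory bus.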
André Timmermans (100) 655 posts |
Hi, I have recompiled KinoAmp and written an LDRH/STRH variant of the Desktop 32K/64K plotting routine which, as far as I can see, is the only change required. Could someone download and test it: http://www.riscos-digitalcd.net/image/kinoamp/kino041.zip sources: http://www.riscos-digitalcd.net/image/kinoamp/kino041src.zip |
Keith Dunlop (214) 162 posts |
André, Seems to work fine – without any sound of course! Proof with an MPEG2: http://epistaxsis.co.uk/beagleboard/screen2.png Nice work! Next stop DigitalCD once we have sound :-) |
Jeffrey Lee (213) 6048 posts |
That version seems to run fine on my Iyonix, and almost fine on my BeagleBoard – sometimes when dragging the playback window it crashes with a data abort while inside the shared C library’s trap/abort handler. It looks like it could be some kind of memory corruption, since both the original abort (which occurs in CallBackHandler in RISCOS_Lib.kernel.s.k_body) and the second abort (from within the trap handler) are due to sl being corrupted in the register dump. I’ll let you know what the cause is when (if!) I get to the bottom of it – but otherwise KinoAmp appears to work fine on the BeagleBoard, in 8/16/32bpp modes and at different scales. |
Keith Dunlop (214) 162 posts |
Hmm, that is interesting, as I can’t replicate that on my BeagleBoard. OK, I only have small MP2s and am playing them one at a time, because if you play more than one the second one gets very jerky (probably a USB bandwidth issue?). Having said that, I do sometimes get a line of noise passing across the screen, like when you are running the Acorn screensaver. |
Jeffrey Lee (213) 6048 posts |
Are you running a build with alignment exceptions turned on? The code that was crashing inside CallBackHandler was only crashing because the corrupt sl wasn’t word-aligned. I’ve fixed my copy of the C library to check for this, and KinoAmp no longer crashes. But my GrabError module (which I’ve finally made 32bit compatible to help me track down this issue) still detects an abort (from _kernel_unwind unwinding a bad stack frame, called from FindAndCallHandlers, from CallEventHandlers, from CallBackHandler), but the abort doesn’t seem to cause KinoAmp to crash or any error dialog box to appear. I think I’m going to have to stare at this one a lot longer to work out what the problem is :(
I think that line of noise is to do with the settings we’re using for the video DMA. It seems to happen in high-bandwidth modes when the CPU is stressing the memory bus. Hopefully if we tweak the settings a bit then it will go away. |
Jeffrey Lee (213) 6048 posts |
I’ve found the cause of the ‘silent abort’:
So, what have we learned today?
I’m sure there’s a more up to date reference for this, but looking at the APCS standard defined in the PRMs (PRM 4-403):
So it’s quite clear that the shared C library is one of those “particular operating systems” that requires fp, sp and sl to be valid at all times. Of course the bad news is that you’re certainly not the only person to corrupt fp, sp and sl, and to do it while outside a C$$code section. I’m fairly certain that none (or at least very little) of my C-linked assembler files contain code that isn’t in a C$$code section (although I usually use GCC, which might have different rules for what gets included in the C language descriptor block), and most of it corrupts fp and sl at will. Up until now I thought that just storing the originals on the stack would suffice, but I guess not. And it wouldn’t surprise me if I had some code somewhere that stores data on the stack without first claiming the space by modifying sp. So, in summary: Bah. I’ll try and fix the shared C library to perform better checks on fp when unwinding the stack (if someone can tell me what checks are valid!), but it looks like you’ll have to (at the least) make sure your assembler is covered by the language descriptor block (i.e. place it all in C$$code sections). |
André Timmermans (100) 655 posts |
Jeffrey, thanks for the time spent, it’s an interesting read. I would never have thought that fiddling with the fp and sl registers in assembler parts could affect anything other than abort/signal handlers and backtraces. Changing A$$code to C$$code will be an easy change, but getting rid of sl and fp without affecting performance in those routines will be harder.
That sure makes me remember the random crashes occurring once in a while in the first versions of the AudioMPEG, which I finally tracked down to the way the C compiler sometimes pushed values on stack before updating the stack pointer limit, which could then eventually be overwritten if an interrupt occurred just between the writing on the stack and the sp update. |
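The race described in that last post can be shown with a toy model. The C sketch below simulates a full-descending stack in an array and an "interrupt" that borrows the stack at whatever sp currently is. Everything here (`toy_stack`, `toy_interrupt`, the two push functions) is invented for illustration and has nothing to do with the actual compiler output or AudioMPEG code.

```c
#include <stdint.h>

#define STACK_WORDS 16

/* Toy full-descending stack: sp indexes the last word in use, and
   sp == STACK_WORDS means the stack is empty. */
typedef struct {
    uint32_t mem[STACK_WORDS];
    int sp;
} toy_stack;

void toy_init(toy_stack *s)
{
    s->sp = STACK_WORDS;
}

/* Simulated interrupt: pushes one word below the current sp and pops
   it again, leaving its scratch value behind in memory. */
void toy_interrupt(toy_stack *s)
{
    s->sp -= 1;
    s->mem[s->sp] = 0xDEADBEEFu;
    s->sp += 1;
}

/* Unsafe order: store below sp first, move sp afterwards. An
   interrupt in between reuses the same slot. Returns what ends up
   in the pushed slot. */
uint32_t push_store_first(toy_stack *s, uint32_t value, int interrupted)
{
    s->mem[s->sp - 1] = value;
    if (interrupted)
        toy_interrupt(s);
    s->sp -= 1;
    return s->mem[s->sp];
}

/* Safe order: claim the slot by moving sp, then store. An interrupt
   at either point only touches memory below the claimed slot. */
uint32_t push_claim_first(toy_stack *s, uint32_t value, int interrupted)
{
    s->sp -= 1;
    if (interrupted)
        toy_interrupt(s);
    s->mem[s->sp] = value;
    if (interrupted)
        toy_interrupt(s);
    return s->mem[s->sp];
}
```

In this model the store-first push loses its value whenever the simulated interrupt lands between the two steps, while the claim-first push keeps it, which is the same ordering rule as "claim the space by modifying sp first" mentioned earlier in the thread.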