When is it safe to use LDRH?

17 posts, 9 voices

Nov 9, 2009 9:05pm André Timmermans (100) 655 posts	I am just starting to modify my sound modules to make them ARMv7 compatible. Most of the time this will mean duplicating routines using half-word aligned LDRs into routines using LDRH and selecting at runtime which one to use. So, the question is: how do I determine if LDRH (Iyonix, OMAP, A9, ...) is supported and safe to use (i.e. not the StrongARM on a RISCPC)?

Nov 9, 2009 9:21pm John-Mark Bell (94) 36 posts	OS_PlatformFeatures will probably help. Alternatively, given that none of the machines you’ve listed actually have a StrongARM in them, you could just detect that (and any ARM <=v3 architecture chips) and not use halfword accesses on them. Unless, that is, you want to support the Omega. :)

Nov 9, 2009 10:07pm Jeffrey Lee (213) 6048 posts	I think OS_ReadSysInfo 8 is the SWI you want, not OS_PlatformFeatures. OS_ReadSysInfo 8 will return 7 for the A9 and 5 for all current RISC OS 5 machines. But since RISC OS 5 (just about) runs on RiscPCs, and could potentially be made to run on other hardware that doesn’t support LDRH/STRH, you’ll probably want to check the ARM ID code register as well, to determine the instruction set revision. I.e. enable LDRH/STRH for the A9, and any RISC OS 5 machine which is ARMv5 or above.
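
A minimal sketch of the ID-register part of that check, written in C for readability. It assumes the Main ID register value has already been read (on real hardware that needs an MRC from CP15 c0 in a privileged mode, which is not shown). The function names are hypothetical, and the field layout is as I understand it from the ARM Architecture Reference Manual: bits 19:16 give the architecture on post-ARM7 parts, and bit 23 separates v3 from v4T within the ARM7 family. Unknown encodings are treated conservatively.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Decode a minimum ARM architecture version from a Main ID register value.
 * Returns 3 (ARMv3 or earlier, or unknown), 4, 5, or 6 (ARMv6 or later). */
static int arm_arch_from_main_id(uint32_t id)
{
    uint32_t part_top = (id >> 12) & 0xFu;   /* bits [15:12] of the part number */
    uint32_t arch     = (id >> 16) & 0xFu;   /* bits [19:16], architecture field */

    if (part_top == 0x0u)                    /* pre-ARM7 parts (ARM3, ARM6 family) */
        return 3;
    if (part_top == 0x7u)                    /* ARM7 family: bit 23 set means v4T */
        return (id & (1u << 23)) ? 4 : 3;

    switch (arch) {
    case 0x1: case 0x2:                      /* ARMv4, ARMv4T */
        return 4;
    case 0x3: case 0x4: case 0x5: case 0x6:  /* ARMv5 variants */
        return 5;
    case 0x7: case 0xF:                      /* ARMv6, or "see CPUID registers" */
        return 6;
    default:                                 /* reserved: assume the worst */
        return 3;
    }
}

/* Policy from the post above: only trust LDRH/STRH on ARMv5 or later. */
static bool halfword_ok_by_arch(uint32_t id)
{
    int arch = arm_arch_from_main_id(id);
    assert(arch >= 3 && arch <= 6);          /* decoder only returns these values */
    return arch >= 5;
}
```

This deliberately rejects every ARMv4 part, including ones that are not in a RiscPC, which matches the "ARMv5 or above" rule rather than trying to be optimal.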

Nov 9, 2009 11:28pm Theo Markettos (89) 919 posts	Silly question, but do LDRHs not work on Risc PCs? The memory bus will return a 32 bit word – is this not sorted out in the processor card, or do you get the wrong behaviour in the other 16 bits? STRH is a problem, obviously. Could you do a little test to find out? Test to see if you’re on >= ARMv4 (is that the right number?). If so, write a known 32 bit value to a memory location. Try STRH at various adr+0, adr+1, adr+2 and see what happens when you load back the full word. You might need to invalidate the cache line so it really does go to DRAM. Is there some known-uncached memory around? And there might be problems on Kinetic (SDRAM is presumably OK, but EDO DRAM isn’t). That way it should use STRH on an emulated SA, assuming that implements STRH correctly. Not sure how feasible it is, but an idea…
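
A rough sketch of the probe described above, in C rather than assembler. It writes a known word, stores a halfword into each half, and reads the full word back. Everything here is an assumption for illustration: the names are hypothetical, the layout is little-endian, and only the aligned offsets (adr+0 and adr+2) are tried. It does nothing about the cache, so on real hardware it would only be meaningful if the cell sat in uncached memory.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* A word that can also be accessed as two halfwords. The union keeps the
 * type punning legal in C; volatile forces real loads and stores. */
typedef union {
    uint32_t word;
    uint16_t half[2];
} probe_cell;

/* Returns true if halfword stores and loads behave as expected on this cell.
 * Little-endian assumption: half[i] occupies bits 16*i .. 16*i+15. */
static bool halfword_probe(volatile probe_cell *cell)
{
    const uint32_t pattern = 0x11223344u;
    const uint16_t marker  = 0xBEEFu;

    assert(cell != NULL);
    for (int i = 0; i < 2; i++) {
        cell->word = pattern;                /* known 32-bit value */
        cell->half[i] = marker;              /* typically an STRH on ARMv4+ builds */

        uint32_t expected = (pattern & ~(0xFFFFu << (16 * i)))
                          | ((uint32_t)marker << (16 * i));
        if (cell->word != expected)          /* full word read back */
            return false;
        if (cell->half[i] != marker)         /* typically an LDRH on ARMv4+ builds */
            return false;
    }
    return true;
}
```

A production version would want to be hand-written assembler so that the exact instructions are known, since a C compiler is free to pick other sequences.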

Nov 10, 2009 7:04am James Lampard (51) 120 posts	AIUI LDRH will appear to work on a RiscPC when using data in the cache, but will fail when it actually has to access memory. So your test may succeed and give you a false sense of security. I suspect someone more intelligent will be along to explain this better shortly.

Nov 10, 2009 8:47am André Timmermans (100) 655 posts	“Silly question, but do LDRHs not work on Risc PCs? The memory bus will return a 32 bit word – is this not sorted out in the processor card, or do you get the wrong behaviour in the other 16 bits? STRH is a problem, obviously.” On that topic I just made a post on the RPCEmu mailing list asking if these instructions will work on their StrongARM simulation since it shouldn’t suffer from the problem. It would certainly allow me to test my new routines without having an Iyonix or A9Home at hand.

Nov 10, 2009 1:01pm Steve Revill (20) 1361 posts	I went through this several years ago and the conclusion is this: There is no simple and safe way to decide if you can or cannot use halfword operations on the RiscPC – sometimes they will appear to work (depending upon the CPU) but other times will fail. Unless it has a Kinetic card, in which case they will work again. I decided the safest way to do this was to not use them on a RiscPC or earlier hardware at all. And then I’ve no idea if they’d work properly on the Omega, A9home, RPCemu, VRPC, etc. If it’s an IYONIX or Beagle, then you’re probably safe (any ARMv5 CPU or later). Earlier than that and you’re probably in trouble.

Nov 10, 2009 2:42pm Peter Naulls (143) 147 posts	I’ve followed this too. It is my understanding that it is RiscPC hardware alone that has this, but I am not 100% certain. I did ask Adv6 about the A9 when it came out and was told they worked OK on that. In addition, it’ll only work on Kinetic if it doesn’t come out of motherboard memory – which won’t be most of the time, but could still happen. Firefox is built with half word instructions. If someone wanted to be really certain, it shouldn’t be too hard to come up with a test program, including suitable cache flushes, etc.

Nov 12, 2009 10:26pm Ben Avison (25) 445 posts	I’d be very surprised if LDRH/STRH don’t work on the A9, or any post-Risc PC hardware for that matter. The problem is caused by the fact that the Risc PC was interfacing the v4 StrongARM to a processor bus designed for v3 ARMs. The days of this bus being visible outside the silicon containing the CPU core are long gone. My suggestions are these: For ROM code – I expect Risc PC ROMs will always be built with the Hdr:CPU.Arch options set to require v3 compatibility, so as long as you provide a v3-compatible fallback for any LDRH/STRH code wrapped in a “NoARMv4” switch, you should be OK. For softloadable code (which you want to run on any machine) a run-time check for v5 or above is safe. If you want to allow for the A9Home (which I think is only v4) or any other theoretical non-Risc PC ARMv4 machines, you could add a check for the absence of IOMD using OS_Memory 9 – hopefully RISC OS 6 will correctly report it as absent on an A9Home. But you still need the v5 check because the Iyonix PC has a (cut-down version of) IOMD. But the only way to guarantee the optimal result on all machines is to have some bit of code “suck it and see” by turning off the cache and doing some test accesses to RAM to see if they’re broken – allowing for the fact that the Kinetic has two sorts of RAM, one of which works and one which doesn’t (so an algorithm would have to be chosen to select the correct RAM for the test). Presumably no emulator author would emulate the broken bus behaviour, so you’d also get to use LDRH/STRH on virtual machines this way. However, this is quite a specialised test to perform, so it’s really crying out for it to be something the kernel checks and makes available to everyone else via an OS_Memory or OS_PlatformFeatures call.
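
The run-time rule for softloadable code in the post above can be written down as a small decision function. This is only a sketch: the struct and function names are hypothetical, and it assumes the caller has already worked out the architecture version (from the CPU ID) and whether IOMD is present (for example from OS_Memory 9, as suggested).

```c
#include <assert.h>
#include <stdbool.h>

/* Facts the caller is assumed to have gathered already. */
typedef struct {
    int  arch_version;   /* 3, 4, 5, ... decoded from the CPU ID */
    bool iomd_present;   /* true on a Risc PC; also true on the Iyonix */
} machine_info;

/* Decide whether softloaded code may use LDRH/STRH. */
static bool use_halfword_softload(machine_info m)
{
    assert(m.arch_version >= 1);             /* must be a decoded version number */

    if (m.arch_version >= 5)                 /* v5 check first: Iyonix has an IOMD */
        return true;
    if (m.arch_version == 4 && !m.iomd_present)
        return true;                         /* ARMv4 without the Risc PC bus */
    return false;                            /* ARMv3, or ARMv4 behind IOMD */
}
```

Checking for v5 before looking at IOMD is what keeps the Iyonix on the fast path. Note that this still rules out a Kinetic card, which is the case only the "suck it and see" test can get right.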

Nov 14, 2009 8:20pm André Timmermans (100) 655 posts	I just made an attempt with RPCEmu, and the instructions are not supported, which is a pity because it would have allowed me to quickly test code changes. I will thus just check that OS_PlatformFeatures 0 has bit 7 set (26-bit mode not available) so that the LDRH/STRH code is used on A9, Iyonix and OMAP variants, while the old code is used on StrongARM. I doubt the LDRH code will make a difference in performance anyway as the instruction lacks the ability to specify a shift.
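
A sketch of the "two routines, pick one at run time" approach, keyed on the bit 7 test described above. The flags word is passed in as a parameter, so the actual OS_PlatformFeatures call is not shown. The function and macro names are hypothetical, the word-based variant assumes a little-endian layout, and memcpy is used so the C stays well defined; on ARMv4+ builds a 2-byte memcpy typically compiles to a single LDRH.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define PLATFORM_26BIT_UNAVAILABLE (1u << 7)   /* bit 7 of the flags word */

typedef uint16_t (*load16_fn)(const void *buf, size_t index);

/* Old-style variant: load the containing word, then shift and mask. */
static uint16_t load16_via_word(const void *buf, size_t index)
{
    uint32_t w;
    assert(buf != NULL);
    memcpy(&w, (const unsigned char *)buf + 4 * (index >> 1), sizeof w);
    return (uint16_t)((index & 1u) ? (w >> 16) : (w & 0xFFFFu));
}

/* Halfword variant: a direct 16-bit load. */
static uint16_t load16_direct(const void *buf, size_t index)
{
    uint16_t h;
    assert(buf != NULL);
    memcpy(&h, (const unsigned char *)buf + 2 * index, sizeof h);
    return h;
}

/* Choose the variant once, from the platform feature flags. */
static load16_fn select_load16(uint32_t platform_flags)
{
    return (platform_flags & PLATFORM_26BIT_UNAVAILABLE)
               ? load16_direct
               : load16_via_word;
}
```

Selecting a function pointer once at start-up keeps the test out of the inner loop. Using "no 26-bit mode" as the signal is a proxy rather than a direct test for working halfword accesses, so a 26-bit-capable machine on which they work would still get the old routine.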

Nov 15, 2009 9:57am André Timmermans (100) 655 posts	Hi, I have recompiled KinoAmp and written an LDRH/STRH variant of the Desktop 32K/64K plotting routine which as far as I can see are the only changes required. Could someone download and test it: http://www.riscos-digitalcd.net/image/kinoamp/kino041.zip sources: http://www.riscos-digitalcd.net/image/kinoamp/kino041src.zip

Nov 15, 2009 2:10pm Keith Dunlop (214) 162 posts	André, Seems to work fine – without any sound of course! Proof with an MPEG2: http://epistaxsis.co.uk/beagleboard/screen2.png Nice work! Next stop DigitalCD once we have sound :-)

Nov 15, 2009 2:29pm Jeffrey Lee (213) 6048 posts	That version seems to run fine on my Iyonix, and almost fine on my BeagleBoard – sometimes when dragging the playback window it crashes with a data abort while inside the shared C library’s trap/abort handler. It looks like it could be some kind of memory corruption, since both the original abort (which occurs in CallBackHandler in RISCOS_Lib.kernel.s.k_body) and the second abort (from within the trap handler) are due to sl being corrupted in the register dump. I’ll let you know what the cause is when (if!) I get to the bottom of it – but otherwise KinoAmp appears to work fine on the BeagleBoard, in 8/16/32bpp modes and at different scales.

Nov 15, 2009 3:49pm Keith Dunlop (214) 162 posts	hmm that is interesting as I can’t replicate that on my beagleboard. OK, I only have small MP2s and am playing them one at a time, because if you play more than one the second one gets very jerky (probably a USB bandwidth issue?). Having said that I do sometimes get a line of noise passing the screen like when you are running the acorn screensaver.

Nov 15, 2009 4:16pm Jeffrey Lee (213) 6048 posts	“hmm that is interesting as I can’t replicate that on my beagleboard.” Are you running a build with alignment exceptions turned on? The code that was crashing inside CallBackHandler was only crashing because the corrupt sl wasn’t word-aligned. I’ve fixed my copy of the C library to check for this, and KinoAmp no longer crashes. But my GrabError module (which I’ve finally made 32bit compatible to help me track down this issue) still detects an abort (from _kernel_unwind unwinding a bad stack frame, called from FindAndCallHandlers, from CallEventHandlers, from CallBackHandler), but the abort doesn’t seem to cause KinoAmp to crash or any error dialog box to appear. I think I’m going to have to stare at this one a lot longer to work out what the problem is :( “Having said that I do sometimes get a line of noise passing the screen like when you are running the acorn screensaver.” I think that line of noise is to do with the settings we’re using for the video DMA. It seems to happen in high-bandwidth modes when the CPU is stressing the memory bus. Hopefully if we tweak the settings a bit then it will go away.

Nov 15, 2009 9:46pm Jeffrey Lee (213) 6048 posts	I’ve found the cause of the ‘silent abort’:
1. KinoAMP enters one of the assembler plotter routines.
2. The OS generates a “key released” event as I let go of the left mouse button.
3. The event handler that was registered with OS_ChangeEnvironment gets called. Since KinoAMP was the active task at the time, the shared C library event handler (CallBackHandler in s.k_body) gets entered.
4. The shared C library attempts to pass the event on to the event handler that belongs to the runtime library for the currently executing code. (Although it’s obviously called the shared C library, it is designed to support interfacing with multiple languages which might have their own abort/signal/event/etc. handlers.)
5. On entry to CallBackHandler, R12 points to the C library’s static data. This contains the register dump for when the program was suspended by the OS (or at least I think that’s how it works!), along with the heap bounds (and lots of other variables which won’t be important to this discussion).
6. To work out if there’s a user-mode stack available, it uses the heap bounds to verify that the stored sp and sl registers are valid.
7. If sl looks valid, it loads [sl,#-560] and compares it against the magic stack chunk marker (&FC0690FF). This is where it was originally failing, because sl wasn’t word aligned.
8. If sl points to a stack chunk then it thinks “Hey, I’ve got a stack!” and copies the register dump into it, and switches into user mode with interrupts enabled in order to perform the rest of the processing. The stack pointer is set just beyond the register dump so it doesn’t get overwritten.
9. FindAndCallHandlers now gets called, in order to find the right language-specific handler to notify of the event.
10. FindAndCallHandlers loads the PC from the register dump, along with the base pointer & end pointer of the array of language descriptors (see PRM 4-254). It runs through each entry in the descriptor list, checking the PC against the code area limits to try and identify the language. But since all your assembler seems to be in A$$code segments (instead of C$$code), it wasn’t included in the limits of the C language descriptor. So FindAndCallHandlers fails to find the handler.
11. FindAndCallHandlers then checks the internal language descriptor of the shared C library itself to see if the C library was the active code at the time of the event. This check also fails.
12. Then, rather than give up, FindAndCallHandlers attempts to unwind the stack one level, and perform the same set of checks on the next stack frame. But because your assembler routine modified fp (to contain the completely garbage pointer &9F988E89) the code dies horribly (and for some reason never generates an abort – maybe I need to check that GrabError isn’t claiming the abort by mistake and silencing it).
So, what have we learned today?
- _kernel_unwind needs more sophisticated checks for whether ‘fp’ is valid. At the moment it just checks that the bottom two bits are clear. (Actually it just checks bit 1, since bit 0 is allegedly used for marking a stack extension – if this is true, would stack extensions run the risk of causing an abort on ARMv6/v7? I’m not quite sure of how fp gets used, i.e. whether any code LDMs/STMs from it without clearing the bottom two bits first, something that would have worked before but not any longer.) However I’m not really sure what the rules are for where stack extensions are allowed to be located – would it be valid to check against the heap limits (retrieved from the static data), or is it allowed for frames to be placed outside the heap? I’m guessing module code must use the SVC stack, so we’d need to check against the SVC stack limits as well. What other situations are there?
- Storing data on the stack without updating the stack pointer to reflect this is a bad thing, because if your program gets suspended and an event occurs the C library runs the risk of clobbering the data with its register dump (which needs to be copied to the stack so it can safely be unwound by _kernel_unwind. I think. Or at least it ensures the code is re-entrant since most of the processing is performed in user mode with interrupts enabled).
- Similarly, invalidating sp, sl, or fp may result in your code missing event callbacks because the C library is unable to find the right handler to call (or in extreme cases it could end up calling the wrong handler entirely).
- And finally, if sp and sl are valid, you’d better make sure that any code which invalidates fp lies within one of the regions pointed to by the language descriptor table. If not then the callback will try unwinding the stack and fail because fp doesn’t point to a stack frame.
I’m sure there’s a more up to date reference for this, but looking at the APCS standard defined in the PRMs (PRM 4-403): “This standard does not define the values of fp, sp and sl at arbitrary moments during a procedure’s execution, but only at the instants of (external) call and return. Further standards and restrictions may apply under particular operating systems, to aid event handling or debugging. In general, you are strongly recommended to preserve fp, sp and sl at all times.”
So it’s quite clear that the shared C library is one of those “particular operating systems” that requires fp, sp and sl to be valid at all times. Of course the bad news is that you’re certainly not the only person to corrupt fp, sp and sl, and to do it while outside a C$$code section. I’m fairly certain that none (or at least very little) of my C-linked assembler files contain code that isn’t in a C$$code section (although I usually use GCC, which might have different rules for what gets included in the C language descriptor block), and most of it corrupts fp and sl at will. Up until now I thought that just storing the originals on the stack would suffice, but I guess not. And it wouldn’t surprise me if I had some code somewhere that stores data on the stack without first claiming the space by modifying sp.
So, in summary: Bah. I’ll try and fix the shared C library to perform better checks on fp when unwinding the stack (if someone can tell me what checks are valid!), but it looks like you’ll have to (at the least) make sure your assembler is covered by the language descriptor block (i.e. place it all in C$$code sections).
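
To make the handler lookup in that walkthrough concrete, here is a simplified model of it in C. It is not the shared C library's code: the struct is a cut-down, hypothetical stand-in for a language descriptor (only a name and code limits), and fp_plausible is just one possible form of the stricter check being asked about, with the bounds supplied by the caller. Whether such a bounds check is valid in all cases is exactly the open question in the post.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified language descriptor: just the code limits. */
typedef struct {
    const char *name;
    uintptr_t   code_start;   /* inclusive */
    uintptr_t   code_end;     /* exclusive */
} lang_desc;

/* Walk the descriptor array and return the entry whose code area contains
 * pc, or NULL if the interrupted code belongs to no registered language. */
static const lang_desc *find_language(const lang_desc *base,
                                      const lang_desc *end,
                                      uintptr_t pc)
{
    assert(base <= end);
    for (const lang_desc *d = base; d < end; d++) {
        if (pc >= d->code_start && pc < d->code_end)
            return d;
    }
    return NULL;
}

/* A stricter fp test than "bit 1 is clear": word aligned and inside a
 * region the caller believes can hold stack frames. */
static bool fp_plausible(uintptr_t fp, uintptr_t lo, uintptr_t hi)
{
    return (fp & 3u) == 0 && fp >= lo && fp < hi;
}
```

With code placed in an area outside every descriptor's limits, find_language returns NULL, which is the point where the real code falls back to unwinding via fp. The garbage value &9F988E89 happens to have bit 1 clear, so a bit-1-only test lets it through, while the alignment and bounds test above rejects it.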

Nov 17, 2009 12:02pm André Timmermans (100) 655 posts	Jeffrey, thanks for the time spent, it’s an interesting read. I would never have thought that fiddling with the fp and sl registers in assembler parts could affect anything other than abort/signal handlers and backtraces. Changing A$$code to C$$code will be an easy change, but getting rid of sl and fp without affecting performance in those routines will be harder. “And it wouldn’t surprise me if I had some code somewhere that stores data on the stack without first claiming the space by modifying sp.” That reminds me of the random crashes occurring once in a while in the first versions of AudioMPEG, which I finally tracked down to the way the C compiler sometimes pushed values onto the stack before updating the stack pointer; those values could then be overwritten if an interrupt occurred between the write to the stack and the sp update.
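
The store-before-claim hazard can be shown with a toy model. This is a simulation only, under invented assumptions: the stack is an array that grows downwards, an "event" writes a fixed-size dump immediately below sp, and the sizes and names are all made up for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define STACK_WORDS 32
#define DUMP_WORDS  16          /* stand-in for a register dump; size is invented */

typedef struct {
    uint32_t mem[STACK_WORDS];
    size_t   sp;                /* index of the lowest claimed word */
} sim_stack;

/* Model of an event arriving: something pushes a dump just below sp. */
static void simulate_event(sim_stack *s)
{
    assert(s->sp >= DUMP_WORDS);
    for (size_t i = 1; i <= DUMP_WORDS; i++)
        s->mem[s->sp - i] = 0xDEADDEADu;
}

/* Unsafe order: write below sp, get interrupted, then claim the space. */
static uint32_t store_then_claim(sim_stack *s, uint32_t value)
{
    s->mem[s->sp - 1] = value;  /* data sits in unclaimed space */
    simulate_event(s);          /* event lands in the gap */
    s->sp -= 1;                 /* claim comes too late */
    return s->mem[s->sp];       /* value has been overwritten */
}

/* Safe order: claim the space first, then write into it. */
static uint32_t claim_then_store(sim_stack *s, uint32_t value)
{
    s->sp -= 1;
    s->mem[s->sp] = value;
    simulate_event(s);          /* dump goes below the claimed word */
    return s->mem[s->sp];       /* value survives */
}
```

Starting both from an empty stack (sp equal to STACK_WORDS), the first routine returns the dump filler and the second returns the stored value.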
