ARM on ARM discussion
Steve Pampling (1551) 8170 posts |
The usual place to find various manuals/datasheets/etc is Chris’s Acorns – and that doesn’t have anything… |
Jeffrey Lee (213) 6048 posts |
Application Note 295, which is indeed available from Chris’s Acorns. |
Andrew McCarthy (3688) 605 posts |
And this one from Peter Howkins’ site: StrongARM Data Sheet V2.0 |
Steve Pampling (1551) 8170 posts |
That’s what I get for expecting the title to give a clue. |
Jon Abbott (1421) 2651 posts |
Some “StrongARM compatible” titles rely on *CACHE OFF working. The StrongARM version of Lemmings 2 falls into this category. Implementing *CACHE OFF in a JIT can be problematic as it relies on the JIT cache being turned off and essentially switching to emulation mode. I’ve implemented it in ADFFS by immediately flushing the whole JIT cache and switching on the self-modifying code detection, which is a bit of a hack but should be sufficient. Retesting Lemmings 2, it now gets past the decode stage and into the game but hangs when starting a level, so it’s possible there’s still some self-modifying code left in the game somewhere. |
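For anyone following along, here is a very rough sketch of the "flush everything and turn on self-modifying code detection" idea described above. It is a toy model only, not ADFFS code: the `ToyJit` class, its fields and the string "translations" are all made up for illustration.

```python
class ToyJit:
    """Toy model of a JIT translation cache (hypothetical, not ADFFS)."""

    def __init__(self):
        self.translations = {}      # guest address -> translated block
        self.smc_detection = False  # self-modifying code checks enabled?

    def translate(self, addr, guest_word):
        # Translate a guest instruction once and reuse the result afterwards.
        if addr not in self.translations:
            self.translations[addr] = f"native<{guest_word:08X}>"
        return self.translations[addr]

    def cache_off(self):
        # Model of *CACHE OFF: drop every cached translation and make the
        # JIT start checking for self-modifying code from here on.
        self.translations.clear()
        self.smc_detection = True


jit = ToyJit()
jit.translate(0x8000, 0xE1A00000)
jit.cache_off()
print(len(jit.translations), jit.smc_detection)
```

The point of the sketch is only that nothing translated before the *CACHE OFF can be trusted afterwards, so the cheapest safe response is to throw it all away.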
Jon Abbott (1421) 2651 posts |
I debugged the RPC release of SpeedBall 2 last night, which hung during loading. It’s a squeezed AIF that self-expands but fails to call OS_SynchroniseCodeAreas before branching back to 8000. Looking at the code, it’s been compiled specifically for StrongARM support, as it relies on STM R13!, {PC} storing PC+8, so the issue is that v5.00 of squeeze was used, which wasn’t StrongARM compatible at the time. This was a known issue that is resolved by the UnSqzAIF Module, which patches squeezed AIFs at the Service_UKCompression stage during loading. So in this instance, although it’s an issue in the original code, it’s actually a problem with my code, which is not mirroring RISC OS behaviour by detecting the AIF header and patching the unsqueeze code. Once I’ve resolved that, I’ll retest the failures to see how many rely on the OS for the unsqueeze. I’m almost finished with testing all the games I can and it’s around a 1:3 ratio of failure. EDIT: Mirroring UnSqzAIF behaviour only fixed SpeedBall 2 |
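To illustrate the STM point above: a store-multiple that includes R15 is easy to spot from the instruction encoding, and the value it stores is PC+8 on StrongARM but PC+12 on the earlier ARMs. A minimal sketch, assuming a plain 32-bit instruction word and ignoring the PSR bits that share R15 in 26-bit modes; the function names are my own.

```python
def is_stm_storing_pc(instr: int) -> bool:
    """True if the ARM instruction word is an STM with R15 in its register list."""
    is_block_transfer = (instr & 0x0E000000) == 0x08000000  # bits 27..25 == 0b100
    is_store = (instr & 0x00100000) == 0                    # L bit (bit 20) clear
    has_pc = (instr & 0x00008000) != 0                      # bit 15 selects R15
    return is_block_transfer and is_store and has_pc


def stored_pc(instr_addr: int, strongarm: bool) -> int:
    """Value written for R15 by an STM at instr_addr (PSR bits ignored)."""
    offset = 8 if strongarm else 12
    return (instr_addr + offset) & 0xFFFFFFFF


STMFD_R13_PC = 0xE92D8000  # STMFD R13!, {PC}
print(is_stm_storing_pc(STMFD_R13_PC))
print(hex(stored_pc(0x8000, strongarm=True)), hex(stored_pc(0x8000, strongarm=False)))
```

Code that assumes one offset will compute the wrong address on the other core, which is why a binary relying on +8 can reasonably be taken as built with StrongARM in mind.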
Jon Abbott (1421) 2651 posts |
It’s taken months of head scratching, but I’ve managed to implement cache mismatch checking. It’s not perfect, as it can cause some games to hang due to IRQs being off too long, but it’s sufficient for my purposes. It’s currently only scanning memory when the JIT is entered, so I do need to put a scheduled memory trawl in place, as game code can become native very quickly. It does not check each instruction as they execute – that would require either emulation or every instruction to have a JIT wrapper around it. Instead it scans Appspace and/or the JIT RMA and checks that every instruction the JIT has previously seen matches what the JIT thinks it should be. This latter bit is what’s taken the time, as I’ve had to unpick optimisations and account for 1:1 instruction replacements, codelets and code fixups. I’ll post the findings on my forum thread instead of polluting this thread, as it’s all rather specific to games. Initially I’m testing games released from 1996 onward and will eventually expand this to all games that appear to run on StrongARM. The caveat being we can never be 100% sure they’re truly StrongARM compatible without checking the source, but it will highlight the obvious failures. |
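The basic shape of a scan like the one described above can be sketched as follows. This is a simplification under my own assumptions: memory is a flat little-endian byte image, and `expected` is a hypothetical record of the word the JIT saw at each address. It deliberately skips the hard part mentioned in the post (unpicking optimisations, 1:1 replacements, codelets and fixups).

```python
import struct


def read_word(image: bytes, base: int, addr: int) -> int:
    """Read a little-endian 32-bit word at guest address addr."""
    (word,) = struct.unpack_from("<I", image, addr - base)
    return word


def find_mismatches(image: bytes, base: int, expected: dict) -> list:
    """Return the addresses whose current contents differ from what was recorded."""
    return [addr for addr in sorted(expected)
            if read_word(image, base, addr) != expected[addr]]


# Three words loaded at 0x8000; the middle one has changed since it was recorded.
image = struct.pack("<3I", 0xE1A00000, 0xE3A01001, 0xEF000011)
expected = {0x8000: 0xE1A00000, 0x8004: 0xE3A01000, 0x8008: 0xEF000011}
print([hex(a) for a in find_mismatches(image, 0x8000, expected)])
```

Any address reported here would be code that was modified after translation without the cache being synchronised, which is the kind of failure the scan is meant to highlight.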
David Feugey (2125) 2709 posts |
Thanks for the hard work :) |
David J. Ruck (33) 1635 posts |
Just to be impossibly late to the party… the overhead of the SWI dispatcher will quickly add up if you start calling SWIs millions of times a second. My !SWIstat application (http://www.armclub.org.uk/free) is showing 1.23 million SWIs per second on an ARMx6 Mini.m (the old Iyonix is only 101,000 per second). |
Jon Abbott (1421) 2651 posts |
Have you analysed what’s raising them? If the OS is raising that many SWIs there’s some optimisation needed somewhere. |
David J. Ruck (33) 1635 posts |
I’ve only got just under a million SWIs/sec tonight, the vast majority in the utility module. Ones with 100,000 calls per second and above are: OS_Byte. I can filter on application, and the apps doing over 20,000 SWIs per second are: Messenger Pro. !APPstat shows that there aren’t any null claimers whirring away in the background. About 200 task swaps per second, with most apps using 10cs Wimp_PollIdles, except for NetSurf, which is getting around 70 nulls per second. |
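For reference, per-second figures like the ones quoted above boil down to differencing two counter snapshots and dividing by the elapsed time. A minimal sketch, assuming per-SWI call counters are available as dictionaries; the counter values are invented for the example and this is not a description of how !SWIstat works internally.

```python
def swi_rates(before: dict, after: dict, elapsed_seconds: float) -> dict:
    """Calls per second for each SWI between two counter snapshots."""
    if elapsed_seconds <= 0:
        raise ValueError("elapsed_seconds must be positive")
    return {name: (count - before.get(name, 0)) / elapsed_seconds
            for name, count in after.items()}


def busiest(rates: dict, threshold: float) -> list:
    """Names of SWIs at or above threshold calls per second, busiest first."""
    names = [name for name, rate in rates.items() if rate >= threshold]
    return sorted(names, key=lambda name: rates[name], reverse=True)


# Invented snapshots taken two seconds apart.
before = {"OS_Byte": 1000, "OS_ReadC": 50}
after = {"OS_Byte": 251000, "OS_ReadC": 150}
rates = swi_rates(before, after, 2.0)
print(busiest(rates, 100_000))
```

The same differencing works per application or per module, provided the counters are kept separately for each.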
Rick Murray (539) 13840 posts |
Interesting. On my Pi2 (ARMv7) right now, I’m getting around 500,000 calls per second to UtilityModule SWIs (the others pale into insignificance, even the Wimp SWIs). Several apps do things on either null polls or short PollIdles, and they call maybe a couple of dozen to a couple of hundred SWIs, with the exception of CoolSwitch, WebJames, and Zap, which make maybe a few thousand because of background activity. The big offenders, by far, with ~200,000 SWIs per sec to UtilityModule are HearSay, and managing to top that by half again as many (~300,000 UtilityModule SWIs) is Paint. There’s a bit of Schrödinger’s Cat here. If I quit HearSay and Paint, then the other three I mentioned take up the slack and keep the per-second rate bouncing around from 50,000 to 1,200,000 per second (though it seems to usually settle around 86,000 with 85,000 being to UtilityModule). It’s strange that Paint makes so many SWI calls given that it doesn’t appear to null poll. Ah, I see. I’ve reloaded Paint and it’s making no calls now. I wonder what threw it off before (nothing was loaded but I had used it). HearSay is claiming about 10%-33% of time while doing nothing. The rest is system overhead, as most of the rest of the applications idle. Even as I write this text into NetSurf, it’s barely pushing 1%. :-) Hmm, HearSay is making a ridiculous number of calls to OS_ValidateAddress and OS_Memory. Is it the Wimp that’s actually doing that (switching in/out on each null poll)?
Is that it? I’m seeing HearSay hit around 8,000 (if CPU speed is 400MHz) to 50,000 (CPU at 900MHz) null polls per 1000ms. Small quirk in your programs, by the way. Open a window for stats. |
Jon Abbott (1421) 2651 posts |
At least they’re all 3rd party apps and not the OS. You should probably take all this to a new thread, as it’s completely unrelated to ARM on ARM and more specific to code optimisation. That said, are you certain the SWIs are coming from apps and not Modules? Is the source available so I can see how they’re being measured? Whilst we’re on the subject, please consider releasing the source for ARMalyser. I emailed you about it many years ago, but got no response. It’s a utility I use heavily, so I would like to extend it and resolve the issues as I come across them. I’d like to extend it to support arbitrary code with entry points other than 8000 for starters. |
Andy S (2979) 504 posts |
“The big offenders, by far, with ~200,000 SWIs per sec to UtilityModule are HearSay, and managing to top that by half again as many (~300,000 UtilityModule SWIs) is Paint.” Heh, that doesn’t surprise me about Paint. “It’s strange that Paint makes so many SWI calls given that it doesn’t appear to null poll.” Yeah, it does use nulls quite a bit; for example, quite a few of the tools, like the brush, enable them to track the mouse pointer and keep laying down paint. It’s supposed to disable them again when you stop using the tool. That’s the idea, anyway. |