Module cache alignment
Jon Abbott (1421) 2651 posts |
RMA allocations are always &xxxxxxx4 – which on an ARM3 was sufficient to allow module code to be cache aligned. This doesn’t seem to have been updated as cache lines have grown, so modules are invariably not cache aligned. Are there any plans to correct this in RO5 now that cache lines are 32 bytes, and even 64 bytes in more recent ARM chips? |
Dave Higton (1515) 3526 posts |
Now that the clock speeds of ARMs are greater by a factor of over 100 compared with the old days when that cache alignment was put in place, my question is whether you would perceive any difference. |
Jon Abbott (1421) 2651 posts |
I suppose it depends on what your module is doing and how often the code is called. For the most part, you’re correct, it won’t make the blindest bit of difference – which begs the question, why retain the &xxxxxxx4 in RO5? If it’s going to be retained, at least update it so Module code can be cache aligned. It’s another one of those legacy decisions that Acorn made, where the original purpose has been lost over time. Very similar to Service_ModulePreInit. |
Jeffrey Lee (213) 6048 posts |
I think you’ll find that the intent was to keep the RMA block headers cache/burst aligned, not the block content. Remember that directly preceding the user area of any heap block there’s a 4 byte header containing the block length. Keeping the block headers aligned will help speed up any of the O(N) heap ops (i.e., most of them!) where it has to walk through the allocated (or free) block list looking for a block which satisfies certain criteria.
Originally all RMA allocations were forced to a multiple of 16 bytes to facilitate this, but then for RISC OS 5 it looks like it was updated to 32 bytes – although the fact that the heap header is 16 bytes in size means that the block headers actually start at a 16 byte offset into each cache line (doh!).
Attempting to align the heap blocks to make code cache aligned would be pretty pointless, considering that module headers are variable-length, code entry points are at arbitrary offsets within modules, etc.
Another possible reason for rounding up the allocation size would be to help cut down on memory fragmentation caused by differently-sized small blocks. If you allocate a 4 byte block, free it, and then allocate a 28 byte block, then it will be able to use the space that just came from freeing the 4 byte block, instead of leaving a 4 byte hole in the middle of the heap which (potentially) won’t get used for a long time. |
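The rounding and header-offset arithmetic in the post above can be sketched in C. This is only an illustration under stated assumptions: the 4 byte block header, 16 byte heap header and 32 byte rounding unit are the figures quoted in the post, the heap base is assumed to be 32 byte aligned, and the function names are hypothetical rather than anything from the RISC OS sources.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed figures, taken from the post rather than from the kernel source. */
#define BLOCK_HEADER 4u   /* length word stored just before the user area */
#define HEAP_HEADER  16u  /* fixed header at the start of the heap */
#define GRANULE      32u  /* assumed RISC OS 5 rounding unit */

/* Total bytes consumed by a request, including its 4 byte block header,
   rounded up to a whole number of granules. */
size_t block_size(size_t request)
{
    return (request + BLOCK_HEADER + GRANULE - 1u) & ~(size_t)(GRANULE - 1u);
}

/* Offset of a block header within its 32 byte line, for a block that starts
   'preceding' bytes after the heap header. Assumes a GRANULE-aligned heap base. */
size_t header_offset_in_line(size_t preceding)
{
    return (HEAP_HEADER + preceding) % GRANULE;
}

/* Hypothetical self-check of the two claims made in the post. */
void check_heap_rounding(void)
{
    /* A 4 byte and a 28 byte request occupy the same 32 byte slot,
       so the second can reuse the space freed by the first. */
    assert(block_size(4) == 32);
    assert(block_size(28) == 32);
    /* 29 bytes no longer fits beside the header and takes two granules. */
    assert(block_size(29) == 64);
    /* Every block header lands 16 bytes into a line, whatever precedes it. */
    assert(header_offset_in_line(0) == 16);
    assert(header_offset_in_line(block_size(4) + block_size(100)) == 16);
}
```

Under these assumptions the user area starts 20 bytes (&14) into each 32 byte line, which is still consistent with addresses of the form &xxxxxxx4.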
Jon Abbott (1421) 2651 posts |
Quoting PRM1-207:
For an ARM3 you ALIGN your code to 12, for SA/ARM9 it’s 28. The size of the header doesn’t really matter as ALIGN is based on a modulus. |
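The modulus arithmetic behind those ALIGN figures can be checked with a short C sketch. It assumes the module's base address is known modulo the cache line size; the function name and the example addresses are hypothetical, and the &14 case simply applies the 16 byte heap header figure from the earlier post to show how the answer depends on the base.

```c
#include <assert.h>
#include <stdint.h>

/* Bytes to skip from 'base' so that the next address is a multiple of
   'line'. 'line' must be a power of two. */
uint32_t offset_to_line(uint32_t base, uint32_t line)
{
    return (line - (base & (line - 1u))) & (line - 1u);
}

/* Hypothetical example addresses, chosen only to show the arithmetic. */
void check_align_offsets(void)
{
    /* Base of the form &xxxxxxx4 with 16 byte lines: skip 12 bytes. */
    assert(offset_to_line(0x02100004u, 16u) == 12u);
    /* Base equal to 4 modulo 32 with 32 byte lines: skip 28 bytes. */
    assert(offset_to_line(0x02100004u, 32u) == 28u);
    /* Base equal to &14 modulo 32, as a 16 byte heap header would give:
       the offset needed becomes 12 rather than 28. */
    assert(offset_to_line(0x02100014u, 32u) == 12u);
}
```

The 12 and 28 figures therefore hold only when the base address is 4 modulo the line size, so the offset has to be chosen against the actual base modulo the line size.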