Does this GCC compiled code make sense?
Simon Willcocks (1499) 509 posts
Or have I made a stupid mistake? (I wouldn’t be at all surprised!)

Background: I want some stack space for an array of words, but on a 16-byte boundary. So, I allocate more than I need on the (word-aligned) stack, and find the first word on a 16-byte boundary using a loop that will execute at most three times. I could also do (p + 15) & ~15, but whatever.

Here’s a minimal file:

    extern void interested( int *n );

    00000000 <aligning>:
       0:   e52de004    push    {lr}            ; (str lr, [sp, #-4]!)
       4:   e24dd07c    sub     sp, sp, #124    ; 0x7c
       8:   e28d3078    add     r3, sp, #120    ; 0x78
       c:   e3130008    tst     r3, #8
      10:   e1a0000d    mov     r0, sp
      14:   1a000002    bne     24 <aligning+0x24>
      18:   e2800004    add     r0, r0, #4
      1c:   e310000f    tst     r0, #15
      20:   1afffffc    bne     18 <aligning+0x18>
      24:   ebfffffe    bl      0 <interested>
      28:   e28dd07c    add     sp, sp, #124    ; 0x7c
      2c:   e49de004    pop     {lr}            ; (ldr lr, [sp], #4)
      30:   e12fff1e    bx      lr

Where does that tst r3, #8 come from, and why?

    (gdb) print /x 0x100-0x7c

That seems to call interested with 0x84, which is not aligned!
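A minimal sketch of the loop approach described above, as a standalone C function. The name `first_aligned_loop` is hypothetical, and it casts through `uintptr_t` (from `<stdint.h>`) rather than `(int)p`; that is an assumption made so the same code also behaves on 64-bit hosts where `int` is narrower than a pointer.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: step forward one int at a time until p sits on a
 * 16-byte boundary, as the loop in the original post does. */
static int *first_aligned_loop(int *p)
{
    /* Precondition: p is word (4-byte) aligned, so at most three steps
     * are needed when sizeof(int) == 4. */
    assert(((uintptr_t)p & 0x3) == 0);
    while (((uintptr_t)p & 0xf) != 0)
        p++;
    return p;
}
```

Called as `first_aligned_loop(space)` on a 30-element `int` array, it returns a pointer into the first four elements, leaving at least 27 aligned words to hand to `interested()`.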
Stuart Swales (8827) 1348 posts
And why is it looking at the alignment of the TOP of the array¹?

…

    while ((0xf & (int)p) != 0) p++;

can be written as

    while ((0xc & (int)p) != 0) p++; // because the bottom two bits of that pointer will be zero

Gets a little odder with GCC 12 which uses

[Aside #1: I see that (compiler explorer) with

Amusingly if you pass in

        b       .L8
    .L3:
        adds    r0, r0, #4
    .L8:
        lsls    r3, r0, #28
        bne     .L3

[Aside #2: I hate it when compilers just stick unnecessary ‘S’ bits on instructions like that ADD.]

¹ AAPCS32 function entry constraints make this the easy test (below).
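The 0xc shortcut relies on the bottom two bits already being zero. The sketch below (function names are hypothetical) checks that the two masks agree on every word-aligned address in a range, and shows that they can disagree once that precondition is dropped.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Full test: is the address 16-byte aligned? */
static bool aligned16_mask_f(uintptr_t a) { return (a & 0xf) == 0; }

/* Shortcut test: only valid when the bottom two bits of a are already
 * zero, i.e. a is the address of a word-aligned int. */
static bool aligned16_mask_c(uintptr_t a) { return (a & 0xc) == 0; }

/* Returns true if both tests agree for every word-aligned address in [0, limit). */
static bool masks_agree_on_word_aligned(uintptr_t limit)
{
    for (uintptr_t a = 0; a < limit; a += 4) {
        if (aligned16_mask_f(a) != aligned16_mask_c(a))
            return false;
    }
    return true;
}
```

For an address that is not word-aligned, such as 0x2, the 0xc test wrongly reports alignment, which is why it is only safe on an `int *`.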
David J. Ruck (33) 1629 posts
You don’t need a loop: add 15 and clear the bottom 4 bits.
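The add-and-mask suggestion, written as a sketch in C. `align_up16` and `align_ptr16` are hypothetical names; the mask is written as `~(uintptr_t)15` so that it is as wide as the address.

```c
#include <assert.h>
#include <stdint.h>

/* Round an address up to the next multiple of 16, leaving it unchanged if it
 * is already one: add 15, then clear the bottom four bits. */
static uintptr_t align_up16(uintptr_t a)
{
    return (a + 15) & ~(uintptr_t)15;
}

/* Pointer wrapper for an over-allocated int array (hypothetical name). */
static int *align_ptr16(int *p)
{
    return (int *)align_up16((uintptr_t)p);
}
```

Unlike the loop, this makes no assumption about the starting alignment. When the array is already word-aligned, the rounding skips at most 12 bytes, so three spare `int`s are enough.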
Stuart Swales (8827) 1348 posts
Indeed, and the OP acknowledges that. The loopy code is still wrong¹ though, isn’t it?

…

No.

¹ It DOES actually make sense under the AAPCS32 ABI – sp will be 8-byte aligned at the entry of the function and the code works: it only has to test one bit to see whether sp was also 16-byte aligned on entry, hence the
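To see why testing one bit is enough, here is a C model (a sketch, not GCC's own reasoning) of the GCC 9 disassembly from the first post, compared against the plain loop for every 8-byte-aligned entry sp in a range. The 32-bit arithmetic and the function names are assumptions for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* What the source loop computes: first 16-byte-aligned word at or above base. */
static uint32_t loop_result(uint32_t base)
{
    while (base & 0xf)
        base += 4;
    return base;
}

/* Model of the generated code for a given sp on entry to aligning(). */
static uint32_t generated_code_result(uint32_t entry_sp)
{
    uint32_t sp = entry_sp - 4;   /* push {lr} */
    sp -= 124;                    /* sub sp, sp, #124 */
    uint32_t r3 = sp + 120;       /* add r3, sp, #120 */
    uint32_t r0 = sp;             /* mov r0, sp */
    if ((r3 & 8) == 0) {          /* tst r3, #8 ; bne skips the loop if bit 3 set */
        do {
            r0 += 4;              /* add r0, r0, #4 */
        } while (r0 & 15);        /* tst r0, #15 ; bne */
    }
    return r0;
}

/* Compare model and loop for every 8-byte-aligned entry sp in [lo, hi);
 * lo is expected to be a multiple of 8. */
static bool model_matches_loop(uint32_t lo, uint32_t hi)
{
    for (uint32_t e = lo; e < hi; e += 8)
        if (generated_code_result(e) != loop_result(e - 128))
            return false;
    return true;
}
```

The flip side: if sp were only 4-byte aligned on entry (0x104, breaking the AAPCS32 rule), the model returns the unaligned 0x84, while the loop would have found 0x90.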
Paolo Fabio Zaino (28) 1853 posts
@Simon, I think the compiler just did not understand your intentions correctly; have you tried this?

    void aligning( )
    {
        int __attribute__ ((aligned (16))) space[30];
        int * p = space;
        while ((0xf & (int)p) != 0) p++;
        interested( p );
    }

With -O2 it should produce something like:

    ...
    aligning:
        str     lr, [sp, #-4]!
        sub     sp, sp, #132
        add     r0, sp, #15
        bic     r0, r0, #15
        bl      interested
        add     sp, sp, #132
        ldr     lr, [sp], #4
        bx      lr
    ...

It looks vaguely like something it would do on x86 with the test instruction, so maybe it is something that has been converted without much thought for ARM?
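The attribute suggestion moves the alignment requirement into the declaration itself. A sketch comparing GCC's `aligned` attribute with the standard C11 `_Alignas` specifier; `is_aligned16` and the two check functions are hypothetical, with `is_aligned16` standing in for `interested()`. Whether the search loop is then removed entirely depends on the compiler and optimisation level.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for interested(): reports whether it was handed a
 * 16-byte-aligned pointer. */
static int is_aligned16(const int *n)
{
    return ((uintptr_t)n & 0xf) == 0;
}

/* GCC's aligned attribute, as in the suggestion above. */
static int check_gcc_attribute(void)
{
    int __attribute__ ((aligned (16))) space[30] = {0};
    return is_aligned16(space);
}

/* The C11 alignment specifier asks for the same thing without a GCC extension. */
static int check_c11_alignas(void)
{
    _Alignas(16) int space[30] = {0};
    return is_aligned16(space);
}
```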
Simon Willcocks (1499) 509 posts
@Paolo, I think it might have worked with beq instead of bne.

@Stuart, the compiler was arm-none-eabi-gcc-9.2.1, so not the AAPCS32 ABI, I guess?

@Druck, I did the +15&~15 thing; it works.
Stuart Swales (8827) 1348 posts
OK, but the ARM EABI also states that the stack is 8-byte aligned at any public interface, so the original code matches that. For your calculations, remember that it has pushed lr before dropping sp.
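That reminder, worked through with the entry sp of 0x100 that Simon's gdb calculation assumes:

```latex
\begin{aligned}
\mathit{sp}_{\text{entry}} &= \texttt{0x100} \\
\mathit{sp}_{\text{after push \{lr\}}} &= \texttt{0x100} - 4 = \texttt{0xFC} \\
\mathit{sp}_{\text{after sub}} &= \texttt{0xFC} - \texttt{0x7C} = \texttt{0x80} \equiv 0 \pmod{16} \\
r_3 &= \texttt{0x80} + \texttt{0x78} = \texttt{0xF8}, \qquad \texttt{0xF8} \mathbin{\&} 8 = 8 \neq 0 \\
&\Rightarrow \texttt{bne} \text{ skips the loop, so } r_0 = \texttt{0x80}
\end{aligned}
```

Leaving out the push gives 0x100 - 0x7c = 0x84, the unaligned value from the first post.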
Simon Willcocks (1499) 509 posts
Ooh! Bum. Thanks for that.
Stuart Swales (8827) 1348 posts
It’s the sort of thing I’m sure I’d forget (a few times) when interworking C and asm in this environment!