SharedCLibrary_LibInitAPCS_R documentation
Pages: 1 2
Jon Abbott (1421) 2651 posts |
Is there any documentation on what this call and the other 26bit calls do? I need to hypervise it into a 32bit call. As far as I can tell (from using CLib 5.53 and calling it on RO3.1/3.71) it builds a table of branch instructions. On RO3.1 these branch to 18Bxxxx upwards and on RO3.71 to FE20xxxx upwards. Why are they invalid addresses? Will calling the equivalent 32bit SWI do the same thing? Or do I need to change the table passed via R0 somehow? |
Jeffrey Lee (213) 6048 posts |
Because you’re doing it wrong? :) IIRC the PRMs have some documentation about the SWI (table formats, list of functions that get filled in, etc.), so it’s probably worth checking there. The actual implementation is in s.initmodule
It looks like you’ll need to change the table; the old 26bit code just used to insert a branch address since all addresses would have been within the +/-32MB branch limit. But 32bit versions insert an LDR PC,[PC,#xxx] and use a second table held elsewhere (just after each table?) to store the actual routine address. So to map a 26bit LibInit call to a 32bit one you’d have to reserve some space somewhere in order to fit the larger 32bit table, patch the 26bit table to branch to the 32bit table, and then call the 32bit SWI to fill in the 32bit table. And do whatever is necessary to deal with the APCS-R vs. APCS-32 ABI changes (i.e. you need to make sure the PSR condition code flags are preserved over the calls) |
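As a rough illustration of the two table styles described above, here is a minimal Python sketch that builds the instruction words for each. The placement of the address words immediately after the instructions is an assumption (the post itself only guesses at it), and the function names are made up for this example; none of this is taken from CLib's source.

```python
# Sketch only: layout assumptions are labelled and not taken from CLib's source.

B_ALWAYS = 0xEA000000    # B (always) with a 24-bit signed word offset
LDR_PC_PC = 0xE59FF000   # LDR PC,[PC,#imm12] with a positive offset


def branch_word(entry_addr, target):
    """Encode 'B target' for an instruction located at entry_addr."""
    offset = (target - (entry_addr + 8)) >> 2   # PC reads as entry_addr + 8
    return B_ALWAYS | (offset & 0x00FFFFFF)


def table_26bit(base, targets):
    """26-bit style: one B instruction per entry (hypothetical helper)."""
    return [branch_word(base + 4 * i, t) for i, t in enumerate(targets)]


def table_32bit(targets):
    """32-bit style: N LDR PC instructions followed by N address words.

    ASSUMPTION: the address words sit immediately after the instructions.
    """
    n = len(targets)
    offset = 4 * n - 8   # identical for every entry, because PC is 8 ahead
    if not 0 <= offset < 4096:
        raise ValueError("offset does not fit a positive 12-bit immediate")
    return [LDR_PC_PC | offset] * n + list(targets)
```

The 32-bit form needs twice as many words as the 26-bit form for the same number of entries, which is why extra space has to be reserved when mapping one onto the other.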
Jon Abbott (1421) 2651 posts |
Doing it wrong!!! LOL I put a breakpoint after a working call to see what changed before/after. I thought it odd, as I was expecting branches into application space – seems to work though. Thanks for the info…just what I needed. |
Rick Murray (539) 13840 posts |
The LibInit call creates a branch table in application space of the relevant calls within the C module in module space. So if your routine calls It might be worth taking a poke around the sources to the AMPLE player module (a page or two back in Recent Posts history) as it was an interesting thing – a module written in assembler that jacked in to CLib to use various C functions. [I would write a blog post about doing this if I could find specifics on the APCS-32 tables.] |
Alex Farlie (1992) 44 posts |
I strongly suspect that this enquiry is related to the previous one.. I had my suspicions that the AmpMod was compiled C code… So assuming someone knows how the 26bit C compiled it, it will be easier to unpick future code.. |
Alex Farlie (1992) 44 posts |
The original !Ample is of 3.70 vintage apparently… |
Jon Abbott (1421) 2651 posts |
I’ve not found any APCS-32 specific detail, however quoting the page on heyrick: “We also know that, with the newer SharedCLibrary, you can run APCS-32 code alongside APCS-R. So long as you don’t start using processor-specific instructions (UMULL and MRS), the same code should work across the entire range of RISC OS machines” That implies SharedCLibrary_LibInitAPCS_R calls should still work, unless it means should still work on a 26bit OS. I don’t see how it could still work on a 32bit OS as it can only address 32MB. I’ve not tried it yet, but I’m guessing a call to SharedCLibrary_LibInitAPCS_R will error on a 32bit OS. I’m planning on implementing Jeffrey’s suggestion; it’s a simple matter of adding another codelet to hypervise the calls in/out of CLib, ensuring the PSR flags are preserved. What I don’t know yet is whether any CLib functions are expected to set the PSR on exit, in which case each CLib function will need its own managed entry/exit to set flags accordingly. |
Jeffrey Lee (213) 6048 posts |
Modern versions of CLib come in two versions, depending on whether they’re for 26bit or 32bit machines:
AFAIK there aren’t any which set the flags on exit. E.g. the SWI call functions return the flags in registers rather than in the PSR. However there are the _kernel_irqs_on and _kernel_irqs_off functions which manipulate the IRQ state – so you’ll want to make sure it’s only the NZCV flags which are preserved and not the full PSR. |
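To make the "only NZCV" point concrete, here is a small Python sketch of the masking a veneer would have to do. It relies only on the fact that N, Z, C and V occupy bits 31 to 28 of the PSR; the function name is hypothetical and the real veneer would of course be ARM code.

```python
NZCV_MASK = 0xF0000000   # N, Z, C, V live in bits 31..28 of the PSR


def restore_flags(psr_before, psr_after):
    """Return psr_after with only N, Z, C, V taken from psr_before.

    Everything else (interrupt disable bits, mode bits) is kept from
    psr_after, so a change made by _kernel_irqs_on/_kernel_irqs_off
    during the call survives.
    """
    return (psr_after & ~NZCV_MASK & 0xFFFFFFFF) | (psr_before & NZCV_MASK)
```

Restoring the whole saved PSR instead would silently undo any IRQ state change made inside the call.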
Rick Murray (539) 13840 posts |
…which was true, when using 26/32 code on a 26 bit machine with the appropriate version of CLib loaded. Since then, other issues have arisen (unaligned loads, for example). Even so, an APCS_32 application using the correct CLib should still work on old and new… but it won’t be using APCS_R. 32 bit machines don’t offer APCS_R, just like how older CLib no longer offered APCS_A… |
Jon Abbott (1421) 2651 posts |
Using MemoryI to look at the branch table created using APCS_32 or APCS_R with the latest 32bit CLib on RO3.71 you get:
Which on initial inspection is an invalid branch table. It looks like CLib is actually trying to set the branch table to:
Which is beyond the 32MB branch limit, although it seems to work somehow. |
Jeffrey Lee (213) 6048 posts |
I’m fairly certain that because the CPU is in 26bit mode the PC will wrap around within the 64MB address space when a branch instruction applies its offset. (0x30C88 + (0x884636<<2) + 8) & 0x3FFFFFF = 0x2242568 So in this case CLib is fine and it’s a fault in the debugger module for not taking into account the address space wrapping when it disassembles the instruction. |
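The arithmetic in the post above can be checked with a short Python sketch. The address 0x30C88 and the offset field 0x884636 come from the post; the full instruction word 0xEA884636 (an unconditional B) is an assumption made for the example, and the function names are hypothetical.

```python
def sign_extend_24(field):
    """Treat a 24-bit branch offset field as a signed value."""
    return field - (1 << 24) if field & 0x800000 else field


def branch_target_32(addr, instr):
    """Target as computed in a flat 32-bit address space (no 26-bit wrap)."""
    offset = sign_extend_24(instr & 0x00FFFFFF) << 2
    return (addr + 8 + offset) & 0xFFFFFFFF


def branch_target_26(addr, instr):
    """Target when the PC wraps within the 64MB 26-bit address space."""
    offset = sign_extend_24(instr & 0x00FFFFFF) << 2
    return (addr + 8 + offset) & 0x03FFFFFF


instr = 0xEA884636   # ASSUMED encoding: B with offset field 0x884636
print(hex(branch_target_32(0x30C88, instr)))
print(hex(branch_target_26(0x30C88, instr)))
```

The unwrapped result lands in the FE2xxxxx range, which is consistent with the addresses reported on RO3.71 earlier in the thread, while the wrapped result is the 0x2242568 given in the post.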
Jon Abbott (1421) 2651 posts |
That was my initial assumption, but I wasn’t sure if ARM610/ARM710 did wrap the PC as it’s not mentioned in the datasheets. The Debugger “bug” in that case is present in versions of RO above RO3.11, they must have been thinking ahead for 32bit support ;) |
Rick Murray (539) 13840 posts |
Does anybody know offhand which source file sets up the CLib jump table? As far as I remember, the 32 bit version adds a few things, so I’d like to see what – given as the PRMs document the 26 bit version… |
Jeffrey Lee (213) 6048 posts |
s.initmodule (Try reading the second post in this thread ;-)) |
Rick Murray (539) 13840 posts |
Thanks. That’s just the code to initialise the tables and pointers. I was looking for where the contents of the tables themselves are defined. It looks (only looked briefly) as if it may be: CLib: Likewise for Kernel: |
Jon Abbott (1421) 2651 posts |
Someone in the know needs to properly document the APCS 32bit side of things; I’m having to wade through source code simply to translate APCS_R to APCS_32. What should have been a five min job has turned into a month long exercise in guesswork! |
Jon Abbott (1421) 2651 posts |
Is it sufficient to reserve 48 words for Chunk Id 1 (Kernel) and 183 words for Chunk Id 2 (CLib) on the current CLib, or have these changed since the RO3.5 documentation? EDIT: Have the static data space requirements changed for the chunks since RO3.5? Or are they still &31C for Chunk Id 1 (Kernel) and &B48 for Chunk Id 2 (CLib)? Is there any documentation for SharedCLibrary_LibInitAPCS_32? |
Jeffrey Lee (213) 6048 posts |
For APCS-R, I believe you can reserve the same amount of memory for the branch table and static data as was used with RISC OS 3.5. After all, if those sizes had increased then it would mean that the new CLib wouldn’t be backwards compatible with old software, which wouldn’t be very useful. However:
Not that I’m aware of. However it should be basically the same as LibInitAPCS_R except it’ll need the extra space for the jump table and it’ll obviously install APCS-32 versions of the routines instead of APCS-R. |
Jon Abbott (1421) 2651 posts |
It looks like chunk 2 (CLib) has two extra entry points as it’s now 185 entries. Cross checking with the source, the following entries have been appended at the end of the jump table:
It also doesn’t do any checking on the buffer length, as it’s happily going over the limit passed to it and overwriting code. EDIT: Shouldn’t it fill up to the entry vector limit for backward compatibility? Probably not so much of an issue on a 32bit OS, but on 26bit the newer CLib will be overwriting two words at the end of the chunk 2 vector table. EDIT2: These two calls are documented here |
Jeffrey Lee (213) 6048 posts |
From looking at the source, it looks like LibInitAPCS_32 expects there to be empty space after the jump table for it to insert the table of addresses. I.e. the table size you report in the library chunks should be the same as for LibInitAPCS_R, but in reality you want to have a buffer of spare space after the jump table for it to patch the addresses into. |
Jeffrey Lee (213) 6048 posts |
It looks like it fills any unknown entries with MOV PC,#0. It’ll certainly get the job done, but I do wonder why it doesn’t just exit with an error if the caller requests more entries than are available. |
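A minimal Python sketch of that padding behaviour, for illustration only: the function name and the list representation are made up here, and this is not a transcription of CLib's code. The only hard fact used is the encoding of MOV PC,#0, which is 0xE3A0F000.

```python
MOV_PC_0 = 0xE3A0F000   # encoding of MOV PC,#0


def fill_entries(requested, known_entries):
    """Return `requested` words: known entries first, MOV PC,#0 for the rest."""
    table = list(known_entries[:requested])
    table += [MOV_PC_0] * (requested - len(table))
    return table
```

A caller asking for more entries than the library knows about gets a full-sized table, with the surplus slots jumping to address zero if they are ever called.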
Jon Abbott (1421) 2651 posts |
The problem will appear on the 26bit version (which uses B instead of LDR PC), where the client specifies a buffer of &B48 as per the RO3.5 documentation. The current CLib requires &B50 for the extra two entry points and as there’s no checks being done and it doesn’t observe the buffer limit parameter, it simply overruns the end of the buffer. In my veneer code, I only fill up to the buffer limit parameter, which is probably what the 26bit version of CLib should do, to retain compatibility on 26bit OS’s. |
Jeffrey Lee (213) 6048 posts |
How are you specifying the limits? Via r1, r2, r4 and r5 on entry, or via the library chunk structures pointed to by r0? From looking at the source it looks like it works out how much to copy and to where by looking at the chunk structures given in r0. |
Jon Abbott (1421) 2651 posts |
Specifying the limits in the stub structure (+4 = vector base, +8 = vector limit). I think I’m getting my clients confused: when specifying 183*2 words as the vector limit, it writes 185 LDR PC’s and 185 addresses, so it’s 32bit that isn’t observing the limits in the stub structure. That’s probably not an expected scenario, although it does make the vector limit somewhat redundant. Chunk id 1 is okay as the size hasn’t changed; it’s only an issue in chunk id 2. As there’s no documentation, I’m not sure if the two extra entries were added when 32bit CLib was created, so it may or may not be an issue. The 32bit side should probably take the vector limit and fill to half, so it doesn’t overrun the limit specified by the client. |
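The "fill to half" suggestion above could look something like the following Python sketch. It models the proposal in the post, not CLib's actual behaviour; the function names are hypothetical, and the assumption is that a 32bit client's base-to-limit span covers both the instructions and the address words.

```python
WORD = 4   # bytes per ARM word


def entries_that_fit(vector_base, vector_limit, apcs32):
    """How many entries the client's vector area can hold.

    ASSUMPTION: for APCS-32 the span holds one instruction word plus one
    address word per entry, so only half the words are entries.
    """
    words = (vector_limit - vector_base) // WORD
    return words // 2 if apcs32 else words


def entries_to_write(vector_base, vector_limit, available, apcs32):
    """Never write more entries than the client made room for."""
    return min(available, entries_that_fit(vector_base, vector_limit, apcs32))
```

With a limit of 183*2 words and 185 entries available, this writes 183 entries instead of running two past the end.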
Jeffrey Lee (213) 6048 posts |
Well I just took the simplest C program I could, found the library chunks at the end of the program, reduced the CLib chunk size from 185 entries to 183, patched in SWI XOS_Exit after the call to LibInitAPCS_32 and ran it. On both my Iyonix and RiscPC (32bit CLib 5.80 on the Iyonix, 26bit CLib 5.46 on the RiscPC) the new chunk size is respected.
There might not be documentation, but there is CVS! The history only goes as far back as RISC OS 3.60, which still has both _swi and _swix listed, so they were either added for 3.60 or sometime before then. |