RISCOS Hypervisor
Jon Abbott (1421) 2651 posts |
I’d like to gauge how much interest there is for a Hypervisor in the community, and how willing the RISCOS devs are to assist in extending RISCOS where appropriate. From the experience I’ve gained over the past 15 months coding the JIT and the partial VMM/Hypervisor that manages software running under it within ADFFS, I believe it is possible to code a full Type 2 Hypervisor that works from ARMv3 (ARM610) upwards.

On ARMv3 (ARM610) through ARMv5 (80321) there’s no hardware virtualization, but that’s not really a limiting factor. The CPU mode can be paravirtualized (i.e. all code runs in USER mode) to ensure the Hypervisor runs at a higher security level, with separate user/privileged page tables maintained to ensure page security is correct in the appropriate guest CPU mode.

ARMv6 (ARM11) adds the Fast Context Switch Extension (FCSE), which is ideally suited to hosting 32MB machines without the hit of flushing the TLB on every Virtual Machine Manager (VMM) switch. 32MB would be fine for hosting Archimedes-class VMs (A3xx through A5xxx), but may be a limiting factor for A7000/RiscPC VMs. However, as the limit on application space is ~27.5MB, there’s nothing to stop you firing up multiple VMs to run every app on an independent machine. Unfortunately ARM has changed the implementation several times with no backward compatibility, so it may be simpler to take the hit on flushing the TLB and not use it.

ARMv7-A adds the Virtualization Extensions (alongside the Large Physical Address Extension), which get closer to Intel/AMD virtualization extensions but aren’t quite where they need to be, so are probably not worth pursuing. I can also see two issues: getting the CPU into HYP mode initially (which has to be done at reset), and the way they’ve implemented the hardware vectors, which are offered to the guest before the Hypervisor.

I’m proposing to code a Hypervisor that will run on ARMv3 upwards and sits between RISC OS 5 and the VMMs. It will take over the hardware vectors, steer IRQs to the host OS and, where appropriate, trigger guest IRQs (e.g. VSync/Timers etc). This is already implemented in the upcoming release of ADFFS and just needs extending to trigger the appropriate mouse/keyboard IRQs on the guest when it has focus.

On top of the Hypervisor will sit various selectable VMMs, allowing you to host anything from ARM3/IOC up to StrongARM/IOMD. These will provide both the virtual hardware and the Hypercall code to resolve incompatible CPU instructions. Although ADFFS has a fast-performing JIT (50%+ host speed), it’s done with codelets. These have the benefit that all code ends up running natively on the host CPU and the JIT eventually stops processing instructions, but they do mean maintaining a heap of codelets and removing them as the original instructions are overwritten. I’m proposing to start from scratch for the Hypervisor and use Hypercalls instead of codelets for sensitive instructions, either emulating them or using hardcoded code blocks where appropriate inside the Hypervisor. MOVS PC, R14 for example could have a fixed Hypercall code block (sketched below), but LDR R0, [PC, #24] may be emulated.

ADFFS already handles self-modifying code and I’m proposing to take that across as-is; it’s done through judicious use of memory access protection and an Abort handler that either proxies memory writes or cleans the cache as appropriate. The downside is that it has to split code and data into separate memory areas; however, switching to Hypercalls means the code doesn’t have to sit within the first 32MB, so it can run outside the VM’s memory.
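To illustrate the Hypercall idea, a minimal sketch (the SWI name, context-block offsets and dispatcher label are all made up for the example):

    ; Guest code contains the sensitive 26-bit return:
    ;       MOVS    PC, R14
    ; The JIT rewrites it as a hypercall into the Hypervisor:
    ;       SWI     XHypervisor_Return26    ; hypothetical hypercall SWI
    ; ...which lands on a fixed code block such as:
    Hyper_Return26
            LDR     R1, [R12, #vR14]        ; R12 -> guest context block (assumed)
            AND     R0, R1, #&FC000003      ; N,Z,C,V, I,F and mode bits of 26-bit R14
            BIC     R1, R1, #&FC000003      ; the 26-bit return address
            STR     R0, [R12, #vPSR]        ; becomes the guest's virtual PSR
            STR     R1, [R12, #vPC]         ; guest resumes at the return address
            B       Hyper_Resume            ; back to the dispatch loop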
Performance is currently an unknown; the biggest hit on what ADFFS can achieve is flushing the TLB when switching VMs and exiting out to the host. This could be mitigated to a certain degree by allowing a VM to take priority where appropriate. If you’re playing a game full screen, for example, you’re probably not worried about other VMs running in the background, so the only context switching is going to be when passing IRQs back to the host OS. Initially I’ll start with an A440/1 VMM and, with community support, build other VMMs for the RiscPC/Iyonix etc. I’m not proposing to go beyond an A440/1 myself as emulating a RiscPC is a fair chunk of work, but the Hypervisor will be designed to allow other VMMs to be loaded as Modules, allowing it to be easily extended and future-proofed. It will be the responsibility of the module to provide both the virtual hardware and the CPU instruction interpretation into Hypercalls, with code appropriate for ARMv7. There are however some outstanding questions that need to be resolved:
|
Rick Murray (539) 13850 posts |
Some random comments:
Another problem, which affects a recent discussion regarding the supposed relative ease of porting 26-bit software to 32-bit, is when non-obvious things are done to registers. There was one program that went a little like this:
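(Reconstructed from memory, so treat the exact instructions as illustrative:)

            MOV     R5, LR              ; stash the return address - on 26-bit
                                        ; the PSR flags ride in the top bits
            ...
            ORR     R5, R5, #&20000000  ; sneak C set into the saved PSR bits
            MOVS    PC, R5              ; return, restoring flags: caller sees C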
Making this 32-bit by getting rid of the MOVS meant the program now failed in two ways; most notably, PC ended up as some bogus value, so the thing crashed.
Very much so. I think we need not only a standardised high-resolution timer API, but also something akin to CallBack that can work in millisecond units, not centisecond ones. I was idly looking at documentation for porting WebKit the other day (no, I don’t plan to; I was just interested in seeing if any interesting obstacles were encountered) when I stumbled upon http://www.cranksoftware.com/blog/webkit-porting-tips-the-good-the-bad-and-the-ugly/ which said:
If by “T” you mean timer: are you looking at it from a host or an emulated point of view? I think most hosts these days have fewer available timers (the Pi has four, but it seems only two are available to the OS and RISC OS itself uses one…).
This is something that may need to be investigated along the way as well. For example, my suggestion for cheap’n’cheerful PMT absolutely requires the Wimp to do this natively (not a third-party hack like Wimp2), because that appears to be about the only place where it is possible to associate tasks and task switching. As far as I can determine, none of this mechanism has been documented anywhere, so it is fun trying to figure out WTF is going on. I think it is the AMB stuff in the kernel that does the actual memory fudging (under direction of the Wimp) but I have only had a cursory examination of the code. |
Jon Abbott (1421) 2651 posts |
Thankfully, the JIT in ADFFS and the new one I’m proposing remove the need to manually patch 26-bit apps. You’re right though: sometimes it does take a few passes to figure out exactly what code is doing when it’s manipulating the PSR.
Sorry, I should have made myself a bit clearer. I was referring to triggering the four IOC Timers on a guest VM running an IOC-based chipset – nothing to do with the host OS. Realistically, taking the Pi as an example, there’s only one free timer, so without some API for sharing it, it’s not much use. I can write my own internal API to share the timer, but that will only work provided there are no HAL_Timer subscribers on the host OS. Perhaps RTSupport can be extended to support higher resolutions; the framework is already there, but it would need modifying to use a HAL_Timer instead of TickerV (I think it’s currently based on TickerV) and to deal with overlapping and closely triggering events.
It was your PMT suggestion that got me thinking in more detail about a Hypervisor, as the requirements are much the same. My thought was along the lines of running every app in its own VM and paravirtualizing the Wimp SWIs back to the host OS to seamlessly integrate them into the host – akin to XP Mode on Windows, if you like. I’ll implement Wimp support in ADFFS along those lines, with ADFFS being the Wimp task and isolating the actual tasks running under the JIT so they’re hypervised. At that point ADFFS could break into the task at an appropriate point and provide PMT outside of the guest app calling Wimp_Poll. I’ve not looked at Wimp2, but from the discussions so far it sounds like it needs perfectly behaved apps to work, so it isn’t ideal. ADFFS is far more integrated into the host, as it takes over all hardware vectors whilst the JIT is running; in this context I believe all of the issues mentioned with Wimp2 can be avoided. ADFFS has more technical challenges than a hypervised VM though, as it has to make the host OS think it’s seeing a RO5 app and the app think it’s running on a RO3 OS/machine. With a VMM it’s running the original OS, so compatibility issues are no longer a concern from the host’s perspective. The Wimp Module could then simply be replaced with a shim that passes everything back to the host OS via Hypercalls.
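As a sketch of the shim idea (the hypercall SWI is hypothetical; R11 carries the offset into the module’s SWI chunk on entry to a module SWI handler, which tells the host which Wimp call is wanted):

    ; Replacement Wimp module inside the guest: every Wimp SWI is
    ; forwarded to the host OS rather than implemented locally.
    SWIEntry
            STMFD   R13!, {LR}
            SWI     XHypervisor_WimpCall    ; hypothetical: host runs the real SWI
            LDMFD   R13!, {PC}              ; V flag from the hypercall flags errors
|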
David Feugey (2125) 2709 posts |
Damned… RiscPC emulation is really needed :) Could I suggest something that’s less work than a complete RiscPC emulation, but more useful for modern uses? A QEMU-like emulation, for Linux, NetBSD and possibly even RISC OS 5. It could also be a Pi emulation. That would solve all our problems with browsers (by running Linux), security (by isolation), etc. Of course, 32MB is not a lot, but for containers it could be enough. To get more memory, hardware virtualization is an option too, on the Pi2. Or secure mode on Cortex? Anyway, I like the idea :) |
David Feugey (2125) 2709 posts |
It could solve many problems with a ROS5 > ROS5 translation (just to add PMT). |
Jon Abbott (1421) 2651 posts |
I hear you, but there aren’t enough hours in the day; I’ve already put in around 400 hours on ADFFS in the past month, and 17 hours today alone. I’ve been slowly adding RiscPC support to ADFFS over the past year; the VIDC translation, for example, was all internally done as VIDC20 from the outset, so the blitter is already emulating a VIDC20. In the upcoming release of ADFFS I’ve started adding extensions to get RiscPC games to work, but it’s still nowhere near where a VMM would need to be. I still need to look at adding ARMv4 extensions to the JIT, not to mention IOMD emulation. The latter is the time-consuming part, as the documentation is patchy at best. For a VMM it would need accurate MMU and cache emulation, which hasn’t been done to date in any emulator. Red Squirrel is the closest so far, but it’s closed source so not much help. It would probably be a case of a lot of trial and error to get a working StrongARM.
I did consider RO5 > RO5, with a RiscPC/StrongARM VMM it would certainly work. |
David Feugey (2125) 2709 posts |
I know (hence the smiley).
Support of HOMM2 would probably please many users.
I was more thinking of the Wimp emulation part only (without VMM), to provide things like PMT (with small overhead). |
Jon Abbott (1421) 2651 posts |
It needs adding to ADFFS at some point, so I could probably create a cut-down module for it at a later stage. |
David Feugey (2125) 2709 posts |
Cool :) |
rob andrews (112) 200 posts |
Look what appeared on the interweb today: http://genode.org/documentation/articles/arm_virtualization |
Jeffrey Lee (213) 6048 posts |
Actually, FCSE was introduced in ARMv5. In ARMv6 it’s deprecated, and in ARMv7 support for it is optional.

ARMv7 adds a much better way of doing things – dual translation table base registers. You can configure the system so that all virtual addresses below N use TTBR0 while all addresses above N use TTBR1 (where N is a power of two between 32MB and 2GB inclusive). Using the short-descriptor format (i.e. not using LPAE), TTBR1 is designed to be used for global memory while TTBR0 is designed for process-specific memory, with an 8-bit address space identifier (ASID) to identify the process. I’ve long had a goal of using this to replace the page table manipulation during Wimp task swapping, as it reduces the work required for a task swap to just a couple of CP15 register writes and some sync instructions (see the sketch at the end of this post). If we combined it with a version of the OS which has high processor vectors enabled (and if we also relocated/removed scratch space), and were to implement support for sparse application slots, then it would allow us to give each Wimp task complete freedom in how it maps and manages its memory.

Incidentally, if anyone feels like modifying FPEmulator to not require a word of workspace in zero page then that would be appreciated – at the moment that’s the only module, apart from the kernel itself, which needs to know at compile time whether zero page is high or low. Also feel free to start lobbying for ROOL to host ROM downloads with high processor vectors enabled (or to make high processor vectors the default); the OS has supported it for over 3 years now, but probably hasn’t had much exposure to all the nasty third-party apps which are full of null pointer dereferences or which deliberately access zero page locations.

If I was to start a new hardware port I’d definitely make it so that high processor vectors were enabled by default, forcing programmers to fix their badly-behaved software if they want it to work, but so far it seems that everyone else is jumping on all the new hardware before I get the chance!
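The task swap itself would boil down to something like this – a minimal sketch, assuming TTBCR.N has already been set up for the TTBR0/TTBR1 split, and ignoring anything an SMP port would need:

    ; Switch TTBR0 + ASID to the incoming task.
    ; R0 = task's L1 page table base (with table-walk attributes)
    ; R1 = task's 8-bit ASID
            MOV     R2, #0
            MCR     p15, 0, R2, c13, c0, 1  ; CONTEXTIDR = reserved ASID 0
            ISB                             ; so no fetch pairs old ASID with new TTBR
            MCR     p15, 0, R0, c2, c0, 0   ; TTBR0 = new task's page tables
            ISB
            MCR     p15, 0, R1, c13, c0, 1  ; CONTEXTIDR = new task's ASID
            ISB                             ; no TLB flush - entries are ASID-tagged
|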
Jon Abbott (1421) 2651 posts |
I did struggle to establish exactly when it was introduced. ARM should really improve their documentation! The immediate deprecation of it just highlights ARM’s somewhat scattergun approach to virtualization. ARMv7’s way of doing it, which on the face of it does look like a workable solution, is badly let down by their implementation of the hardware vectors. As far as I can establish, they’re passed to the guest VM before the Hypervisor – which is a major mistake. I believe they’ve learned the error of their ways and are correcting it in ARMv8.1? I’ve gleaned all this from reading numerous attempts to code VMMs on ARM from the likes of VMware etc, so don’t quote me on it – the detail is hard to find via ARM’s site. As with cache flushing, virtualization is now an utter mess on ARM and somewhat in flux, so we have to fall back to the worst-case scenario… hence my plan to code all this for ARMv4 and ignore ARMv5+ extensions, at least until they settle on a final design and we’re on the Pi3..4..5 etc. We could use FCSE etc where appropriate to reduce the hit on the TLB, but we just need to be mindful of the limitations, which are pretty big in some cases… the 32MB limit being an example.
I can probably do this…dip my toe into RISCOS development so to speak. Might need some hand holding to figure out how to use the tools/compiler though.
Who controls this kind of decision? Why do we not have a regular steering committee to cover these sorts of decisions? Even meeting once every six months would cover most of these big ticket changes.
From the work I’ve done in the next ADFFS release to handle LDRs in page zero, this is a big can of worms. If you look at this post on the JASPP site, I’ve detailed the games (out of most of the supported ones on the Pi) that read from page zero inadvertently due to bugs. I opted to fix the bugs, but in reality we’d need to handle them on-the-fly. My plan here requires vectors to go high so we get Aborts; I even considered switching ADFFS to require high vectors on the Pi and forcibly moving them whilst it’s running. With vectors high, ADFFS’ JIT would see a massive improvement in speed as it wouldn’t need to emulate LDRs on first pass to establish if they were reading from page zero. Writes are dealt with via an Abort, as I restrict the pages whilst ADFFS’ JIT is running – all very clean and simple to fix up in real-time. I can code an Abort handler for this if you like, to allow vectors to go high and proxy any reads/writes to page zero. We’d have to write off &0-&4000 as unusable space though – which I think is probably the intention anyhow. What’s your opinion on this?
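Something along these lines – a sketch only: it assumes a veneer has already saved the faulting context at R12, handles just the LDR Rd,[Rn,#imm] form for brevity, and R3 holds the relocated zero-page base, which would really be looked up via OS_ReadSysInfo 6:

    ZeroPageAbort                           ; entered from the Data Abort vector
            MRC     p15, 0, R0, c6, c0, 0   ; DFAR: the faulting address
            CMP     R0, #&4000
            BHS     NextHandler             ; outside page zero - not ours
            LDR     R1, [LR, #-8]           ; fetch the aborting instruction
            AND     R2, R1, #&0F700000
            CMP     R2, #&05100000          ; LDR Rd,[Rn,#imm] only, for brevity
            BNE     NextHandler             ; anything else: abort as normal
            LDR     R0, [R3, R0]            ; read the relocated copy instead
            AND     R2, R1, #&0000F000      ; Rd field of the instruction
            STR     R0, [R12, R2, LSR #10]  ; patch Rd in the saved registers
            B       RestoreResume           ; veneer restores and skips the LDR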
From what I’ve seen to date in legacy games (ref. link above), the bulk of the issues are caused by:
Yes, I’ve noted this as well. Were you involved in the Pi2 process? Was there any kind of peer review of the build before it was publicly released? The way it magically appeared and was then quickly fixed to resolve some minor issues certainly raises some questions about the current process. That’s not to denigrate any of the work the somewhat elusive elite team of developers do to bring these builds to release, but it doesn’t seem like a very “Open” process. Having said that, I can understand there may be commercial secrecy around the process – the Pi2 was certainly sprung on the world out of the blue. Obviously a decision was made between Eben and ROOL to bring a RISCOS build out for day 1, which I wouldn’t expect to be made public. It’s a tricky one, I grant you. |
Jeffrey Lee (213) 6048 posts |
No idea – I’ve mostly been ignoring ARMv8+. Incidentally, if anyone feels like modifying FPEmulator to not require a word of workspace in zero page then that would be appreciated The beginner’s guide to ROM builds is probably where you’ll want to start.
FPEmulator’s undefined instruction handler needs to be able to find FPEmulator’s workspace pointer. Because of the way the OS-agnostic core code (which was written by ARM, AIUI) operates, I don’t think there’s a spare register which can be used to hold the value – instead it uses the AdrWS macro to look it up on demand. Presumably Acorn just stuck it in zero page because that was the easiest solution, or maybe they were still in the BBC mindset of statically allocated workspace being a good thing. The solution I was thinking of was to move the initial undef entry point into the RMA, using a PC-relative LDR to get the workspace pointer (or maybe an ADR if the entry point lives in the workspace itself), then work out some way of plumbing the value through the core code so that all of the places which need it can still access it (making R12 or some other register the workspace pointer would be the obvious solution). It looked like it was possible last time I looked at the code, but I haven’t actually tried doing it yet, so I clearly still have some reservations ;-)

Also feel free to start lobbying for ROOL to host ROM downloads with high processor vectors enabled

I suspect you’ll need ROOL to answer those questions!
Yeah, some kind of abort handler to provide compatibility for old/buggy software would be good. It’s something I was planning on doing myself but evidently never got around to. Making the first 16K of memory, and eventually the first 32K, completely unmapped by default would be the eventual goal of the changes.

With zero page relocation enabled, the first 16K of workspace and the processor vectors get moved to &ffff0000. However, the low 16K of address space isn’t completely empty – there’s one page (at &1000 IIRC) for the Debugger to use as its workspace. The reason for this is that the Debugger wants to be able to use “MOV PC,#xxx” in order to jump from any breakpoint into its code. We’d probably want to fix that by changing it to use the BKPT instruction, roughly as sketched below (upside: no more static workspace needed; downside: corrupts some registers in ABT mode).

Also note that rather than have your abort handler assume the location of the relocated workspace values, you’d want to look up the locations using OS_ReadSysInfo 6. At the moment that SWI only lists the values that are used internally by RISC OS, so if there are other values which have leaked out over time then we’d probably need to extend it to expose those as well.

so far it seems that everyone else is jumping on all the new hardware before I get chance!

Well, I’m not really complaining. There are still plenty of things left to do before we come close to using the full potential of a BeagleBoard or a Pi 1, let alone all the multi-core machines that have come after them.
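The change at a breakpoint would be roughly this (a sketch; the immediate value is arbitrary):

    ; Today: the breakpointed instruction is overwritten with a jump
    ; into the Debugger's page at &1000 - hence the static workspace:
            MOV     PC, #&1000
    ; Proposed: plant a BKPT instead. The immediate is ignored by the
    ; CPU, so Debugger could use it as an index into its breakpoint table:
            BKPT    &0001
    ; The resulting prefetch abort is then recognised by the abort
    ; handler and routed to the Debugger module.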
Nope. The hardware + software release was as much of a surprise to me as it was to (almost) everyone else here. |
Jon Abbott (1421) 2651 posts |
Ah. I avoided that issue by not using one in my Undefined handler ;)
Is the core code not in the Module then?
I’d need to code it for ADFFS if we’re going to put vectors high, so can spin it out into a dedicated Module easily enough.
This would need resolving; we’d need page zero completely clean for any fix-up Module to work. I can look at rewriting the Debugger so it doesn’t have that requirement. I’ve not looked at BKPT so will have to do some reading. From a quick look at the Debug exceptions page it looks like ABT_r14 is the only register altered? I’d expect that though; is there more going on? |
Jeffrey Lee (213) 6048 posts |
The solution I was thinking of was to move the initial undef entry point into the RMA, using a PC-relative LDR to get the workspace pointer (or maybe an ADR if the entry point lives in the workspace itself). Then work out some way of plumbing the value through the core code so that all of the places which need it can still access it

Yes. The undef entry point would be in the RMA (so it can find the workspace pointer), then it would call into the core code held in the module.
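Something like this, say (a sketch – the labels are invented, and the core code is assumed to take the workspace pointer in R12):

    FPEUndefEntry                           ; claims the undefined instruction vector
            STMFD   SP!, {R12}              ; UND-mode stack; caller's R12 preserved
            LDR     R12, FPEWorkspace       ; PC-relative load - no zero-page word
            B       FPE_CoreUndef           ; into the module's core code
    FPEWorkspace
            DCD     0                       ; filled in at module initialisation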
I think that page may be talking specifically about when CPU debugging is enabled (e.g. via JTAG). If the system is running normally then it looks like BKPT operates by generating a prefetch abort (I’m not overly familiar with the behaviour myself – I vaguely remember having a couple of issues when I tried to use BKPT for JTAG debugging once). I’m guessing the IFSR will show the abort cause as being an ‘instruction debug abort’, so I’d expect to see the following registers changed: ABT_r14 (address of the aborting instruction, plus 4), SPSR_abt (a copy of the interrupted CPSR), the CPSR itself (switched to ABT mode with IRQs disabled), and the IFSR.
It might be desirable to make some changes to the prefetch abort handler in the kernel so that it can cleanly pass the abort on to the debugger module; otherwise it will go straight to the prefetch abort environment handler, which will most likely end up triggering a crash dump from the C runtime or whatever. |
Rick Murray (539) 13850 posts |
Looking at the description, it looks like BKPT would be ignored if there is a branch in the pipeline. Umm… is this really what we want in a debugger? |
Jeffrey Lee (213) 6048 posts |
Yes, that’s exactly what we want :-)

Let’s say you have a branch instruction at &8008. If your program is entered by a branch to &8000 then the CPU might actually start off by prefetching &8000-&8020 into the pipeline. Until the branch instruction at &8008 reaches the execute stage of the pipeline, the CPU might not know whether the instructions at &800C-&801C are actually needed (especially if it’s a conditional branch). If the branch is taken, the CPU will flush the instructions from &800C-&801C out of the pipeline and start fetching instructions from the new location (if it hasn’t already started fetching them). If the branch isn’t taken, those instructions will remain in the pipeline, and if &800C happened to be a breakpoint then it will now find itself in the execute stage of the pipeline and the prefetch abort will be triggered.

So what the description is saying is that although the BKPT instruction causes a prefetch abort, the abort only occurs once the instruction has made it far enough through the pipeline that the CPU will try to execute it. This is the same as the most common type of prefetch abort, where you’re trying to execute unmapped memory or memory for which you don’t have read permissions – the abort only occurs when you try executing the location, not when the CPU first tries accessing the memory and fails. |
Rick Murray (539) 13850 posts |
Thanks for the lucid explanation. That makes perfect sense. Why couldn’t ARM have just said it like that? |
Jon Abbott (1421) 2651 posts |
I’m guessing the IFSR will show the abort cause as being an ‘instruction debug abort’, so I’d expect to see the following registers changed:

Normal Abort behaviour then; that’s fine – we’re not expecting Debugger to handle Aborts in Abort mode, are we? It’s aimed at general programmer debugging for USER/SVC/IRQ? I did implement re-entrant Aborts in my Abort handler, but it was a bit messy and I did eventually disable it.

It might be desirable to make some changes to the prefetch abort handler in the kernel so that it can cleanly pass the abort onto the debugger module, otherwise it will go straight to the prefetch abort environment handler, which will most likely go straight to triggering a crash dump from the C runtime or whatever.

We can either have a permanent hook in the Prefetch Abort handler that passes on to the Debugger Module – or have the Debugger Module independent, adding itself to the Prefetch Abort handler as required. Considering the amount of use it’s likely to get, the latter may be the better option as it keeps Debugger self-contained – the obvious caveat being that there’s currently no method to cleanly insert/remove1 from the hardware vectors if they’ve since been taken over by someone else. I’m thinking keep Debugger self-contained.

Where’s the FPEmulator source? I can’t seem to find it in the CVS tarball – it’s not obvious at any rate.

1 Can’t we fix that problem with a jump table that acts as a middle-man, and have it pass the jump table address instead of the actual handler address when handing out addresses to new claimants? You’d then just shuffle the table entries up/down to insert/remove handlers. |
Jeffrey Lee (213) 6048 posts |
Probably not.
The way I see it, there are three approaches:

1. A permanent hook in the kernel’s abort handler which passes aborts on to the Debugger.
2. Keep Debugger self-contained and have it claim the hardware vector itself, with the insertion/removal problems that brings.
3. A registration interface, where claimants describe to the kernel the address ranges/conditions they want to handle, and the kernel’s abort handler dispatches to the registered claimant.
The third option is obviously the one I’m leaning towards, although since one of the main users would be the abortable DAs we’d probably want to wait until we’re ready to implement that before we try changing the abort handling.
mixed.RiscOS.Sources.HWSupport.FPASC. When looking for things, ModuleDB is your friend. |
Jon Abbott (1421) 2651 posts |
This was my thinking; I wasn’t aware of the third option, although I can see the advantages. Sounds like we’ll have to go with the hardware vector approach initially and revisit at a later date once your abort descriptor method is in place. Might be worth starting a separate thread on this if you’ve not already defined the API.

mixed.RiscOS.Sources.HWSupport.FPASC.

Cheers, another hidden gem in the source code!! |
Jon Abbott (1421) 2651 posts |
From what I’ve seen to date in legacy games (ref. link above), the bulk of the issues are caused by:

I’ve been looking into issue 2, more specifically how it’s causing Conqueror and Pac-mania to read from page zero. I’ve tracked the problem down to the GateOn entry not always being called when the sound is first used, and as a consequence the Voices are not initialising the SCCB with their working variables. Unless I’m misunderstanding how it should work, I believe GateOn should always get called once when the sound is first used, and again whenever the envelope changes (which may not have been implemented in RISCOS?). The problem doesn’t occur on Arthur, but does from RISCOS 2 onward, so it’s an issue that’s been around a while.

In the Level 1 instantiate code it doesn’t appear to set the GateOn flag when the sound is instantiated, but instead initialises the flags to ForceFlush, so instantiating doesn’t force a call to GateOn. The only place I can see that sets the GateOn flag is in the SoundControl code: it checks R1 for bit 7 and sets the GateOn flag if it’s clear. Looking further up the code, it looks like R1 is “emulation of amp/(env !)” or the amplitude, depending on the result of R4 here. There’s also a comment here suggesting GateOn is only set if R1 is &101-&17F, so it may be possible to make a sound without GateOn being called if R1 is &181-&1FF.

Most game Voice handlers that use working variables in the SCCB suffer from this issue, which makes me believe that either the documentation wasn’t clear at the time – or the behaviour changed in RISCOS 2 such that it’s possible for the Fill entry to be called before the GateOn entry. I’m not sure how this should be fixed though; possibly by changing the instantiation code to force GateOn as well as ForceFlush? GateOn should really be called when a sound is used for the first time, regardless of the amplitude – if I’m understanding the purpose of GateOn correctly (a defensive workaround for Voice handlers is sketched at the end of this post). The PRM statement for GateOn says:
I’ve no documentation from the Arthur days to see if that contained more detail on it. I vaguely remember speaking to Acorn directly back in 1987 to find out how to code Voice handlers, but I can’t find the documentation; it’s probably long been lost.
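In the meantime, a Voice handler can defend itself by treating its first Fill call as an implicit GateOn – a sketch only: the register usage and SCCB offset are illustrative, the point being that the ‘initialised’ flag lives in the voice’s own SCCB workspace:

    FillEntry                               ; Fill code, R9 -> SCCB (assumed)
            LDR     R0, [R9, #SCCB_Ready]   ; our own 'variables set up' flag
            TEQ     R0, #0
            BLEQ    InitVariables           ; GateOn never arrived - do it now
                                            ; (InitVariables sets SCCB_Ready)
            ...                             ; normal buffer fill continues
|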
Adrian Lees (1349) 122 posts |
Seconded. I really think that any other approach is asking for trouble, given the number of system extensions/features that need to handle aborts. There are now a large number of instructions (including coprocessors, remember) that may access memory, and duplicating this decode (quite possibly incompletely/incorrectly) in each of these extensions seems ridiculous. I think the interface between the kernel’s abort handler and the vector claimant – which should specify one or more address ranges when registering itself – should be quite high-level, transferring a number of bytes/halfwords/words/doublewords without regard for the signedness/interpretation of the raw data. Akin, if anybody else has hardware experience, to the interface presented on the AMBA AXI bus. It’d also be expedient, when specifying the interface, to consider the possibility of the kernel caching information so that repeated hits to a particular address range may be processed as quickly as possible with minimal decoding work. Bonus points for considering the impact upon DMA transfers :) What happens if I decide to call OS_File to load data from a DMA-using filing system into the ‘emulated’ address range? |
Jon Abbott (1421) 2651 posts |
I think all three of us are agreed this is the better route to take; it needs defining in detail, and we need to consider the impact and work required on the incumbent Abort handler – or simply decide to replace it and start from scratch. I presume we’d do the instruction decode in the Abort handler and pass the Vector the instruction type (LDR/STR/SWP/LDM/STM + modern variants etc – a rough classifier is sketched below). In my Abort handler I do the initial decode and pass the instruction as well, so LDM/STM can do post-correction. We could have the Vector pass back a parameter if it doesn’t want post register fix-up to occur, and have the Abort handler deal with all post-corrections. This would make sense, centralising as much code as possible.

Another thing to be aware of is that emulators don’t always accurately emulate Early Abort Mode, so you can’t rely on OS_PlatformFeatures 0 bit 4. I perform the test on LDR and LDM separately and treat them independently in the Abort handler; the bit in OS_PlatformFeatures assumes both instructions behave the same – which isn’t always the case under emulation.

Register-wise, do we store all 14? Or try to avoid the banked registers if possible, to avoid switching CPU states? I don’t think the overhead is that great on newer CPUs; SA probably takes a hit though. As well as DFSR/IFSR we’d want to pass FAR, possibly caching info as well. We need to consider the scenarios it’s going to be used for and ensure we cover them all, for example: Chocolate, virtual memory, Sparse DAs, alignment faults, rotated load faults, protected access, page zero read/write etc.
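The initial classification needn’t cost much – the top bits cover the common cases (ARM encodings only; a sketch, with the class labels invented):

    ; R1 = the aborting instruction (LDR R1,[LR,#-8] in a data abort)
            AND     R2, R1, #&0C000000
            CMP     R2, #&04000000          ; 01xxxxxx: LDR/STR single transfer
            BEQ     Class_LDRSTR
            AND     R2, R1, #&0E000000
            CMP     R2, #&08000000          ; 100xxxxx: LDM/STM block transfer
            BEQ     Class_LDMSTM
            LDR     R2, =&0FB00FF0          ; SWP/SWPB mask (too wide for an immediate)
            AND     R2, R1, R2
            LDR     R0, =&01000090
            CMP     R2, R0                  ; SWP/SWPB?
            BEQ     Class_SWP
            B       Class_Other             ; halfword, coprocessor etc follow on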
Wouldn’t it break? Won’t DMA routines need to validate the memory before issuing the DMA? |
Jeffrey Lee (213) 6048 posts |
What happens if I decide to call OS_File to load data from a DMA-using filing system into the ‘emulated’ address range?

We’d probably want to make it so that DMAManager can detect that the memory isn’t ‘real’ and have it use a bounce buffer instead. Or DMAManager could negotiate with the owner of the memory in order to lock it in place while the DMA operation is in progress – e.g. if an abort handler is being used to track writes to memory (for something like BPP conversion in video drivers, where apps read/write a fake screen buffer and the driver translates the data to a second buffer used by the video hardware) then it could DMA to the memory as normal and then notify the owner of the memory that the contents have changed, so that it can do a bulk conversion of the entire block. |