VFP context switching
Jeffrey Lee (213) 6048 posts |
I spent most of yesterday reading through the VFP docs, and after a while was able to write some code to enable the VFP/NEON unit and perform some simple maths. So now we know how to make it work, the question is how we want to handle the context switching. The way I see it, we have three approaches:
As you can probably guess, I’m leaning towards option 3. However, what I’d really want is for VFP to be a true ‘opt-in’ system – i.e. if a task hasn’t asked for VFP, or if no context is active, then attempting to use VFP/NEON instructions will result in an abort. This will help make sure that programs are using the context system properly instead of just running the risk of clobbering another task’s context. Unfortunately I’m not sure whether doing this is such a good idea, since it will make it hard for well-behaved programs to make use of VFP (e.g. a SWI may want to do some light VFP calculations, for which a full context save/restore would represent a large overhead). But since VFP exceptions are asynchronous, enforcing context switching would be desirable in order to allow the blame to be placed on the correct piece of code, as the context switch would provide a good place for the exception to be checked for. Enforcing context switching would also provide a way to make sure that the VFP unit is in a sensible state when a VFP-using SWI attempts to use it.
Essentially then, I’d envisage there would be two types of contexts – “light” and “heavy”. “Heavy” contexts are used by Wimp tasks and the like, and store the full VFP state and registers. “Light” contexts only store the VFP configuration and none of the registers; registers must be stored manually as and when they are needed, which makes light contexts better suited to transient code like SWI calls. Admittedly the SWI call could just do all the context switching manually, but due to the variations in the different VFP implementations, and the possibility of further variations in future revisions, I think it’s for the best if common code is used where possible, even if it results in the overhead of an extra couple of SWIs. 
If necessary the SWI overhead could be removed by allowing the context switch code to be called directly from privileged modes, making the code practically as fast as if the VFP using SWI performed manual context switching. Also, since the VFP/NEON register file is quite large (32×64bit registers), and VFP operations can be quite slow, it would be good if the system was designed to support lazy context switching, as described in the ARM ARM - i.e. when a context switch is requested the OS merely disables access to the VFP/NEON coprocessors and then performs the context switch when a failed access results in an undefined instruction abort. Although in a worst-case scenario lazy context switching will hurt performance, I think that in most situations (i.e. wimp tasks responding to mundane events) it will result in an improvement. With that in mind, I’d propose an API similar to the following:
Now, about exception handling. There are two bits of good news – the first is that ARM provide some standard code with app note 98 for handling all the possible exceptions that VFPv1/v2 can produce (i.e. both exceptions due to unimplemented features, and exceptions due to bad math). The second bit of good news is that everything I’ve read indicates that the above code isn’t needed with VFPv3, so we don’t (yet) have to worry about integrating the code with RISC OS. This is also part of the reason why I’ve been going with the name VFPSupport instead of VFPEmulator – since we won’t (yet) be providing full VFP emulation for VFP-less systems (and ARM’s support code doesn’t provide a full emulation anyway, so it would be a significant effort if we wanted to provide full emulation).
As usual writing this post has helped me to answer most of my questions myself, so over the coming days I’ll probably start work on the VFPSupport module using option 3 + light/heavy contexts + lazy context switching and see what issues I run into. So if anyone has any objections then it would be best if they spoke up before the code hits CVS! (Although I will give ROOL a quick poke in this direction if they haven’t responded before I’ve taken things too far) |
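As a rough illustration of the opt-in, lazy scheme described in the post above, here is a minimal C sketch that simulates it with plain variables rather than real VFP hardware. Everything in it is hypothetical: the names (`vfp_context`, `vfp_change_context`, `vfp_trap`, `vfp_write_reg`), the context layout, and the use of a boolean to stand in for the coprocessor access enable are assumptions made for the example, not the VFPSupport API. It does not handle a context being destroyed while its registers are still held in the "hardware".

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define NUM_DREGS 32            /* 32 x 64-bit registers, as on VFPv3/NEON */

/* Hypothetical saved context: configuration word plus the register file. */
typedef struct {
    uint32_t fpscr;
    uint64_t d[NUM_DREGS];
} vfp_context;

/* Simulated hardware state (stands in for the real VFP unit). */
static uint64_t hw_d[NUM_DREGS];
static uint32_t hw_fpscr;
static bool     hw_enabled;     /* models the coprocessor access enable */

static vfp_context *hw_owner;   /* context whose registers are in the "hardware" */
static vfp_context *active;     /* context the current task has asked for */
static unsigned full_switches;  /* number of real save/restore passes performed */

/* Request a context switch. This is cheap: it only records the request and
 * revokes access unless the requested context already owns the registers. */
void vfp_change_context(vfp_context *next)
{
    active = next;
    hw_enabled = (next != NULL && next == hw_owner);
}

/* Models the undefined-instruction handler taken on the first VFP access
 * after access was revoked. Returns false if the access is a genuine abort. */
static bool vfp_trap(void)
{
    if (active == NULL)
        return false;           /* opt-in: no active context means an abort */
    if (hw_owner != NULL) {     /* save the previous owner's state */
        hw_owner->fpscr = hw_fpscr;
        memcpy(hw_owner->d, hw_d, sizeof hw_d);
    }
    hw_fpscr = active->fpscr;   /* load the requested context */
    memcpy(hw_d, active->d, sizeof hw_d);
    hw_owner = active;
    hw_enabled = true;
    full_switches++;
    return true;
}

/* Stand-in for a VFP instruction that writes a register.
 * Returns false where real code would take an abort. */
bool vfp_write_reg(unsigned reg, uint64_t value)
{
    if (reg >= NUM_DREGS)
        return false;
    if (!hw_enabled && !vfp_trap())
        return false;
    hw_d[reg] = value;
    return true;
}

/* Accessor so callers can observe how many full switches were needed. */
unsigned vfp_full_switch_count(void)
{
    return full_switches;
}
```

In this sketch, switching to a task that never touches VFP and then back again costs no register copying at all, which is the case the post expects to be common for Wimp tasks responding to mundane events.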
Ben Avison (25) 445 posts |
I haven’t studied the ARM VFP support code in detail either, but I think it’s important that any RISC OS SWI interface is designed to fit in with it, in much the same way that (as everyone can now see) the FPEmulator SWIs fit with the ARM FPA support code. (Not that I’m that intimately familiar with that either, sadly.) Some random observations…
|
Jeffrey Lee (213) 6048 posts |
Good point, I should definitely take a closer look at the support code before settling on the API.
Yes. I’d probably be a bit more keen on trying to integrate the support code with this first version if I wasn’t put off/confused by all the legalese in the licence! (You can’t use it on a dual monitor computer without purchasing a floating licence, you can’t copy the documentation even though it’s freely available for viewing/download on the ARM site, etc.)
True, although if it’s a program compiled specifically for FP-less hardware then GCC’s softfloat library would still be better :)
Yes. As far as I know the VFP support code is re-entrant enough to cope with switching contexts while in the middle of processing a bounced instruction, although I’ll certainly check to make sure. Other than that it should be trivial to provide the fast, SWI-less context switching API that I touched on in my post.
Yes, I was considering having more than just the two context types available, but was hesitant in case it made it too complex. “light”, “heavy”, and “EABI” sounds like it could be a sensible compromise. |
Terje Slettebø (285) 275 posts |
Just to add to this: Using Jeffrey’s VFP setup code, I’ve made a simple test of the VFP, and put it up at the extASM page. If you test it, be sure to get the latest version of extASM, as well, as the earlier version didn’t have support for the “generic” coprocessor instructions, needed to set up the VFP unit. It’s just a simple test, adding two floating point numbers and storing the result, but it shows that the VFP is alive and kicking on RISC OS. :) |
Jeffrey Lee (213) 6048 posts |
After having a quick look through the VFP support code docs:
|
Jeffrey Lee (213) 6048 posts |
I’ve been looking through the VFP/NEON docs some more today, and it looks like there are many different architecture variants available. This could result in quite a complex API if we want the VFPSupport module to be able to indicate to the code that a particular feature isn’t available. For VFP the factors are as follows:
If NEON is present then there’ll always be the 32×64bit register file, and if VFP and NEON are present then it’ll be some version of VFPv3 with 32×64bit registers and single precision float support (double precision is optional).
Detecting which VFP/NEON features are available is also a bit complex because the MVFR0 and MVFR1 registers only became part of the VFP spec with ARMv7/VFPv3. So writing code to handle VFPv3 is fairly easy, but for VFPv2/VFPv1 the code may have to rely on some external interface to determine what hardware is available and what features it has.
Also, about those deprecated FLDMX/FSTMX instructions: Because a double precision value occupies two single precision registers, ARM added the FLDMX/FSTMX instructions to allow the coprocessor to load/store data in an implementation-defined way for situations where the CPU doesn’t know what format of data is in the registers (i.e. during context switching). VFPv1 defines two formats FLDMX/FSTMX can use to store the data – “standard format 1”, where the registers are stored straight to memory, and “standard format 2”, where a header word is inserted to flag which 64bit registers contained doubles and which contained singles. Standard format 2 seems to have never been used, so in VFPv2/ARMv6 they deprecated FLDMX/FSTMX and changed the spec to make it clear that the other load/store instructions can just be used instead without resulting in any data corruption. |
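To make the MVFR0-based detection mentioned above a little more concrete, here is a small C sketch that decodes three fields of an MVFR0 value. The field positions used (bits 3:0 for the register bank, 7:4 for single precision, 11:8 for double precision) are as I understand them from the ARMv7 architecture manual and should be checked against it; the `vfp_features` struct and `decode_mvfr0` name are made up for the example. On real hardware the value would come from a privileged read of the register, so the sketch simply takes it as a parameter.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical summary of the features a caller might care about. */
typedef struct {
    unsigned num_dregs;         /* 16 or 32 doubleword registers, 0 if unknown */
    bool single_precision;
    bool double_precision;
} vfp_features;

/* Extract one 4-bit field starting at bit 'lsb'. */
static unsigned mvfr_field(uint32_t value, unsigned lsb)
{
    return (value >> lsb) & 0xFu;
}

/* Decode the register-bank and precision fields of an MVFR0 value.
 * Assumed layout: [3:0] register bank, [7:4] single, [11:8] double. */
vfp_features decode_mvfr0(uint32_t mvfr0)
{
    vfp_features f;

    switch (mvfr_field(mvfr0, 0)) {
    case 1:  f.num_dregs = 16; break;   /* 16 x 64-bit registers */
    case 2:  f.num_dregs = 32; break;   /* 32 x 64-bit registers */
    default: f.num_dregs = 0;  break;   /* none, or an encoding not handled here */
    }
    f.single_precision = mvfr_field(mvfr0, 4) != 0;
    f.double_precision = mvfr_field(mvfr0, 8) != 0;
    return f;
}
```

For VFPv1/VFPv2 parts, where these registers are not part of the spec, the same struct could be filled in from whatever external interface ends up reporting the hardware configuration.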
Jeffrey Lee (213) 6048 posts |
For the past few days I’ve been mulling over the specification for the SWIs. To keep things simple the initial specification is ignoring all aspects of hardware variants, except for the absolute minimum (which is the number of doubleword registers supported by the hardware/emulator). But there are still problems remaining with how to handle nested contexts, and how to handle reentrant pieces of code (i.e. SWIs). Basically the nesting problem can be summed up as follows:
The solution to this problem is obviously to make sure that all of thread A’s registers get saved out to memory at some point by the VFPSupport module. But the hard part is finding a sensible place to do so that doesn’t hurt performance (compared to a fully lazy system), and minimises the risk of people making mistakes when writing VFP/context switching code. At the moment the specification I’ve got allows you to pass various flags to the context switching code to indicate when/if the full context should be saved, but I think that there are too many options and it will just confuse people, or it will be all too easy to make a mistake and introduce a situation where the code will fail.
I’ll try simplifying it to a simple ‘sync’ operation and see what happens – then it’s quite easy to say in the spec that whenever a thread switch occurs a sync call should be made so that VFPSupport knows to save the full register file the next time a context switch is made. But to get that to work right it would have to be hooked into the OS at various places – the kernel IRQ dispatcher would need to call it so IRQ handlers can use VFP, the callback dispatcher would need to call it for each callback, etc. But on the other hand this may be a non-issue for IRQ handlers and callbacks, since most OSes don’t class them as threads anyway (and thus they’d be banned from doing anything that could require a thread context switch).
For reentrant pieces of code there are a couple of issues at play – there needs to be some way of initialising the context with the values you want to use each time (i.e. the FPSCR register), and there needs to be some way of dynamically allocating memory to store the context’s registers should the code get preempted (e.g. a reentrant SWI that calls itself, or two threads that call the same reentrant SWI, etc.). In practice this would probably mean storing the registers on the stack using a save area that gets specified when the reentrant context is activated. 
But I don’t really like the idea of that since I want to keep the context save areas somewhere safe where bad code can’t corrupt them (and where we’d be free to increase the size of the context dumps without worrying about knock-on effects for programs with limited stack space). |
Jeffrey Lee (213) 6048 posts |
A draft spec for the VFPSupport SWIs is now on the wiki. As indicated above, I’ve kept it simple for now by ignoring the complexities of the different VFP configurations. I’ve also gone with the solution of using user-allocated context save areas for reentrant code, and eliminated all nesting-related problems by simply dropping support for light contexts. However light context-like functionality can still be obtained by creating stack-allocated contexts that are large enough to contain just as many registers as your code requires. Note that Ben’s suggested EABI context currently isn’t supported either, since it can introduce similar problems with nesting. In terms of things missing from the spec, I/we still need to work out:
|
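The idea in the post above of reentrant code using a small, stack-allocated context can be sketched in C roughly as follows. This is a simulation under stated assumptions, not the draft spec: the `vfp_ctx` layout, the `vfp_ctx_init`/`vfp_enter`/`vfp_leave` names and the `hw_d` array standing in for the register file are all invented for the example, and the switch is done eagerly rather than lazily to keep it short.

```c
#include <stddef.h>
#include <stdint.h>

#define NUM_DREGS 32

/* Simulated register file and configuration word (not real hardware). */
static uint64_t hw_d[NUM_DREGS];
static uint32_t hw_fpscr;

/* Hypothetical context: the caller supplies storage for only as many
 * registers as its code needs. */
typedef struct vfp_ctx {
    struct vfp_ctx *previous;   /* context to reinstate on leave */
    uint32_t fpscr;
    unsigned num_regs;
    uint64_t *regs;             /* caller-allocated save area */
} vfp_ctx;

static vfp_ctx *current;

static void save_current(void)
{
    if (current == NULL)
        return;
    current->fpscr = hw_fpscr;
    for (unsigned i = 0; i < current->num_regs; i++)
        current->regs[i] = hw_d[i];
}

static void load_current(void)
{
    if (current == NULL)
        return;
    hw_fpscr = current->fpscr;
    for (unsigned i = 0; i < current->num_regs; i++)
        hw_d[i] = current->regs[i];
}

/* Initialise a context over caller-provided storage (e.g. on the stack). */
void vfp_ctx_init(vfp_ctx *ctx, uint64_t *storage, unsigned num_regs,
                  uint32_t fpscr)
{
    if (num_regs > NUM_DREGS)
        num_regs = NUM_DREGS;
    ctx->previous = NULL;
    ctx->fpscr = fpscr;
    ctx->num_regs = num_regs;
    ctx->regs = storage;
    for (unsigned i = 0; i < num_regs; i++)
        storage[i] = 0;
}

/* Save the outgoing context into its own area, then activate the new one. */
void vfp_enter(vfp_ctx *ctx)
{
    save_current();
    ctx->previous = current;
    current = ctx;
    load_current();
}

/* Deactivate the current context and reinstate whichever one it replaced. */
void vfp_leave(void)
{
    if (current == NULL)
        return;
    save_current();
    current = current->previous;
    load_current();
}

/* A reentrant routine: each invocation keeps its one register in a
 * context on its own stack, so nested calls cannot clobber it. */
uint64_t nested_sum(unsigned depth)
{
    uint64_t save[1];
    vfp_ctx ctx;

    vfp_ctx_init(&ctx, save, 1, 0);
    vfp_enter(&ctx);
    hw_d[0] = depth;                    /* stands in for a VFP write to d0 */
    uint64_t below = depth ? nested_sum(depth - 1) : 0;
    uint64_t mine = hw_d[0];            /* restored by the inner vfp_leave */
    vfp_leave();
    return mine + below;
}
```

Each nested call overwrites the simulated d0, yet every level reads back its own value afterwards because the inner call's `vfp_leave` reloads the outer context from its stack save area.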
Jeffrey Lee (213) 6048 posts |
After much delay, the first version of VFPSupport is now in CVS. I’ve also added it to the OMAP ROM build, and updated the API docs to match the API that’s now in use. Remaining issues:
But the good news is that the stuff that is implemented does seem to work properly :) |
Trevor Johnson (329) 1645 posts |
That sounds brilliant! Until there’s a new ROOL ROM build uploaded (or when I get the build tools running again) I can’t see what the testing (with Test/test1,ffb) does. Never mind, but I noticed that the file isn’t a tokenised BASIC program – however, Zap sorts it out if you change the Mode to BASTXT and then save as BASIC. Is this me missing something obvious, or is it intentional? Thanks so much for all the effort that’s gone into this, Jeffrey. |
Jeffrey Lee (213) 6048 posts |
Nothing particularly interesting unless you like lots of indecipherable debug output! Next time I update the code I’ll probably add a proper description of what output to look for.
BASIC (or the RISC OS 5 version, at least) will happily load and run detokenised files just by double-clicking on them. It will spit out a “program renumbered” message because I didn’t put any line numbers in, but that’s a small thing compared to the tons of output the program itself produces. And, since you’re no doubt wondering – it’s a detokenised file so that it can be diff’d by CVS. Although admittedly it will all end in tears if someone checks in a tokenised version by accident, so I might change it to a tokenised version (or drop the ,ffb extension) just to be on the safe side. |
Trevor Johnson (329) 1645 posts |
It’s the ,ffb which threw me. Zap didn’t like it but I never thought to actually just try to run it…
...which is because I wasn’t aware of that! Thanks for the explanation.
You mean CVS/SVN/whatever it is can’t decode tokenised BBC BASIC? ;-) |
Bryan Hogan (339) 595 posts |
BASIC has dealt with text format files in that same manner all the way back to RISC OS 2 at least. I used to take advantage of that when using the early versions of Edit which couldn’t load tokenised programs. This was pre-Zap of course :-) |
Trevor Johnson (329) 1645 posts |
Well I never! (Mind you, I’m not a programmer.) |
Steve Revill (20) 1361 posts |
There’s a new OMAP build on the ROOL site now. |
Jeffrey Lee (213) 6048 posts |
Cheers Steve, although it’s already out of date since I just checked in the code to add VFP context switching to the Wimp ;) To keep things simple the context switching isn’t tied down to Wimp_Initialise version numbers or Wimp_Poll flags or anything like that; instead it just performs context switching for all tasks. Each task (created by Wimp_StartTask) starts with the null context active, so it’s down to each program to create/destroy contexts as needed.
After looking into this, it looks like FPEmulator only updates the FP register dump when an FP exception occurs. So since standard VFPv3 doesn’t generate exceptions, if we followed FPEmulator’s lead then the VFP register dump would never get populated. |
Trevor Johnson (329) 1645 posts |
If you’ll excuse the naive question, is the intention that the VFP/NEON work could potentially enable RISC OS support for Adobe’s Flash Player? Version 10.1 has been updated to r105, but I guess the sources aren’t available because it’s a proprietary system. However, I wonder what conditions Adobe’s production agreement would impose. |
Jeffrey Lee (213) 6048 posts |
VFP/NEON support is certainly a big step towards getting the flash player working. If we got access to the flash player source code then (once GCC is ready) it would probably be fairly trivial to get it working. But if we can’t get hold of the source code then things are going to be a lot harder. With regards to GCC - a week or two ago I did some initial work to get VFP/NEON working with the (work-in-progress) RISC OS port of GCC 4.6. The compiler seems to be working OK, but before I can release the changes I need to work out the best way of getting unixlib properly integrated with the VFPSupport module. This will probably also require a few changes to VFPSupport itself, to come up with an API that works well in the real world instead of just in my head. |
Trevor Johnson (329) 1645 posts |
I’ve put a question on the Talk page of the above TI wiki page. If it’s not answered there, perhaps ROOL would consider engaging in advance discussions with TI/Adobe regarding this. It’d be useful to know the conditions under which the source code could be obtained.
Yes, I saw (but naturally couldn’t fully comprehend) your discussion and considered posting this ARM hardfloat info there but decided against it because it’s over my head¹.
¹ ...And here I propose the ROOL “forum noise” Christmas challenge: Try to meaningfully terminate your posts with the last two words of the preceding post! (Obviously a coincidence in this case.) |
Trevor Johnson (329) 1645 posts |
"The production agreement referred to, is for customers who are in business selling a physical product, like an Android Phone, or a Linux based media player." from someone who’s made enough contributions to probably know what they’re talking about1. So I guess it’s over to ROOL to consider making contact on behalf of potential resellers. I don’t think it’s realistic to expect a number of separate small RISC OS companies to individually contact Adobe, nor realistic to expect any meaningful dialogue with such a giant. ROOL, however, is an ARM OS partner. IMHO we need an idea of cost before exploring other options (and granted that OS support is yet a long way off too). (beagleboard.org discussion of some relevance too.) 1 Edit:
It turns out that was Prabindh Sundareson’s blog, and that he’s a graphics manager at TI. |
Trevor Johnson (329) 1645 posts |
So do you think there’ll also be scope for porting GStreamer? |
Jeffrey Lee (213) 6048 posts |
Probably. |
Trevor Johnson (329) 1645 posts |
Thanks. And (back to Flash) there may possibly be something later on that can be used from the Pandora community. |
Jeffrey Lee (213) 6048 posts |
I’ve just checked in some VFPSupport changes, to make life easier for the code that’ll go into GCC (and probably anything else that has to manage contexts). Basically the changes legalize the act of placing contexts in application space, and legalize the ability for programs to move/copy/delete inactive contexts without having to tell VFPSupport about it. This means that you can now safely create a context in application space and (hopefully) not have to worry about cleaning it up when your program exits, since the Wimp should safely deactivate it when the task dies. Unfortunately things are still a bit fuzzy for programs launched via system() or similar, where control will return to the parent on exit – at the moment it’s down to the parent program to set everything up properly before/after running the new program. I’ve also made one backwards-incompatible change: VFPSupport_ExamineContext now returns the context size in R4 instead of the context ID. (There’s little point in it returning the context ID anymore, since ‘context ID’ is now synonymous with ‘context pointer’). Obviously once programs/compilers start making proper use of the module I won’t be making any more backwards-incompatible changes, but for now it should be safe enough. Hopefully in a day or two I’ll be able to start properly testing my development version of GCC. |
Trevor Johnson (329) 1645 posts |
Now 10.2/10.3. "Request for access is approved on a case-by-case basis to end equipment manufacturers/customers." |