VIDC1 / RO3.1 frame store emulation
Jon Abbott (1421) 2651 posts |
I’m proposing to implement something that sits between the OS and the current video driver, to present a RO3.1 / VIDC1 emulation to the OS and applications and then translate that to the actual frame store on VSync, correcting for VIDC1 registers and unsupported bit depths. The stumbling block is getting the OS to write to &1FD8000, for example, instead of the HAL frame store. Is this possible? Will any of the proposed changes to GraphicsV help? Providing a second virtual display, for example? |
Jeffrey Lee (213) 6048 posts |
Yes, this is one of the situations where the GraphicsV changes will be a big help. You’ll be able to create your own video driver and tell the OS to use that instead of the standard one. Your driver can then decide on a case by case basis whether to pass calls straight on to the original driver or perform some kind of translation (e.g. providing a fake screen buffer to emulate support for low colour modes). I’m hoping to get the changes required to support this checked in sometime in the next month or two. The only tricky bit would be getting the OS to map the memory to &1FD8000 – at the moment the GraphicsV interface only allows drivers to dictate the physical address of the memory, not the logical address. |
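As a rough illustration of that pass-through-or-translate structure (not actual ADFFS or RISC OS code – the names are hypothetical, and the veneer that attaches this to the GraphicsV vector via OS_Claim is not shown), an interposing driver boils down to a per-call dispatch:

```c
/* Sketch of a GraphicsV interposer sitting ahead of the real driver.
   Hypothetical names; attaching this to GraphicsV (vector &2A) needs an
   assembler or CMHG veneer which is not shown here. */
#include <stdint.h>

typedef struct { uint32_t r[10]; } gv_regs;          /* R0-R9 as seen on the vector    */
typedef enum { GV_PASS_ON, GV_CLAIMED } gv_result;   /* pass to the real driver or not */

static uint32_t fake_framestore_phys;   /* physical base of the emulated buffer */
static uint32_t fake_framestore_size;   /* e.g. the 2MB block ADFFS allocates   */

gv_result graphicsv_interpose(gv_regs *r)
{
    switch (r->r[4])                      /* GraphicsV call number is carried in R4 */
    {
    case 9:                               /* framestore address/size               */
        r->r[0] = fake_framestore_phys;   /* report the emulated buffer to the OS  */
        r->r[1] = fake_framestore_size;
        r->r[4] = 0;                      /* R4 = 0 marks the call as handled      */
        return GV_CLAIMED;                /* the real driver never sees it         */
    case 2:                               /* set mode                              */
    case 6:                               /* update DAG                            */
        /* rewrite the registers here (see later posts) before passing on */
        return GV_PASS_ON;
    default:
        return GV_PASS_ON;                /* everything else goes straight through */
    }
}
```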
Jon Abbott (1421) 2651 posts |
Excellent, I’ll await the updates as it sounds like you’re adding exactly what we require. Regarding the dreaded &1FD8000, can’t I work out the physical address from the page number? I think I’ve seen an SWI for that somewhere? If not, I’ll do it the hard way via L2PT. |
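The SWI being half-remembered here is presumably OS_Memory 0, which converts between page numbers, logical addresses and physical addresses. A minimal sketch (the flag bits are quoted from memory, so check them against the OS_Memory documentation):

```c
/* Sketch: convert a physical page number to a physical address with
   OS_Memory 0. Flag bits assumed: bit 8 = page number supplied,
   bit 13 = physical address wanted. */
#include "swis.h"
#include <stdint.h>

int page_to_physical(uint32_t page_number, uint32_t *phys)
{
    uint32_t block[3];            /* page number, logical addr, physical addr */
    block[0] = page_number;
    block[1] = 0;
    block[2] = 0;
    if (_swix(OS_Memory, _INR(0,2),
              (1u << 8) | (1u << 13),   /* given: page number; wanted: physical */
              block, 1))                /* one entry in the page block          */
        return 0;                       /* SWI returned an error                */
    *phys = block[2];
    return 1;
}
```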
Jon Abbott (1421) 2651 posts |
Following up on my comment about using an SWI to determine the physical address: I’m not in a position to look at the ADFFS source at the minute, but I’m fairly certain I recoded the latest version to avoid all the RISC OS memory-related SWIs for physical/logical memory mapping, to avoid issues I was seeing with memory page flag corruption and bugs when remapping duplicate pages (all known issues). ADFFS is in fact quad-mapping pages on IOMD, which would not be possible without direct L2PT changes anyhow. So, answering my own question, I’m fairly certain I already know the physical address, so the fact that GraphicsV requires a physical rather than logical address hopefully shouldn’t be an issue. |
Jeffrey Lee (213) 6048 posts |
At some point I probably will be extending GraphicsV to allow drivers to specify the logical address of the memory, as there are some situations (like this one) where it would be useful. When I get a chance I’ll have a look at the kernel source and see how difficult a change it would be. |
Jon Abbott (1421) 2651 posts |
I’ve read through the GraphicsV documentation and the proposed changes several times now, and it looks like I might be able to implement this as things stand, by hijacking GraphicsV 7, 8 and 9 and passing on the new physical address for the frame buffer, the relevant feature set and mode validation. One question though: when do GraphicsV 8 and 9 get called? On a MODE change, or only when the driver is initialised? The documentation doesn’t make this clear.

Regarding the proposed changes, hardware overlays are a possibility, although it’s not clear whether the NDA on the Pi GPU documentation will allow this or not. I like the idea of starting/stopping a GraphicsV driver, although I’m not sure at this point how I would interact with the existing GraphicsV driver to pass palette and MODE changes etc. through. Would I simply pass the calls on to the default driver?

Extending support for logical addressing would be very handy, although not essential if I can ensure the frame store is mapped in one contiguous physical block. I’m assuming that when DA2 is sized, RO does map it in one physical block – is this the case? I’m proposing to simply allocate 2MB to it as ADFFS loads and then logically map it to the four memory areas to cover the RO3.1 and RO3.5 video memory blocks when a game is loaded. The frames will then be copied to the actual video memory at the FPS pace of the game, adjusting for lower bit depths. |
Jeffrey Lee (213) 6048 posts |
You’d also need to intercept GraphicsV 2 (to map the pixel depth to something the real driver supports) and GraphicsV 6 (to translate the screen start/end/init addresses to the buffer used by the real driver).
The OS spams GraphicsV 8 quite a bit (including before and after mode changes), so you shouldn’t have to worry about cached values causing problems. However once your driver is started you’ll probably want to give the ScreenModes module a kick (e.g. by reloading the current MDF), as it uses a cached list of valid modes. So if you don’t get it to rebuild that list then software might not think that low-colour modes are available.

GraphicsV 9 only gets called on a mode change, when the OS resizes the screen dynamic area for the new mode. However the timing of the call can be controlled by the setting of bit 5 of R0 returned by GraphicsV 8. If the bit is clear, the OS will call GraphicsV 9 and resize the screen dynamic area before the mode change (i.e. before the call to GraphicsV 2). If the bit is set it will do it after the mode change (i.e. just after GraphicsV 2). So if you wanted to turn your fake screen buffer on and off depending on the screen mode you’d probably want the bit to be set so that you can let the OS use the real buffer directly wherever possible.

(The default behaviour the OS takes of asking for the memory before changing the mode may seem a bit counter-intuitive, but I think it was designed mainly for systems with dedicated VRAM, where asking for memory before changing the mode would allow the OS to complain if the mode was trying to use more memory than possible. But if you have bit 5 of R0 set then the OS assumes that the driver does its own checks (i.e. during GraphicsV 7) for running out of memory.)
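For an interposing driver, one way of arranging that (a sketch only, reusing the gv_regs/gv_result types from the earlier dispatch sketch, with a hypothetical variable holding the real driver’s feature word) is to answer GraphicsV 8 itself and force bit 5 on:

```c
/* Sketch: claim GraphicsV 8 and hand back the feature word the OS should
   see, with bit 5 set so the screen memory / GraphicsV 9 handling happens
   after the mode change, as described above. */
static uint32_t real_driver_features;   /* captured from the real driver elsewhere */

static gv_result handle_features(gv_regs *r)
{
    r->r[0] = real_driver_features | (1u << 5);
    r->r[4] = 0;                        /* mark the call as handled */
    return GV_CLAIMED;
}
```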
Pretty much, yeah.
Yes, if the driver doesn’t provide its own memory then the OS will fall back to the old behaviour of using DA 2 and allocating a contiguous range of physical pages starting from physical page number zero. Note that if a driver provides its own memory, DA 2 isn’t used at all – the OS just maps the memory in manually using OS_Memory 13 (This may change at some point, as OS_Memory 13 isn’t the ideal SWI to use if you’ve got a framebuffer which changes address on each mode change).

To start with you might be able to get by with letting the OS allocate your screen memory for you using the screen dynamic area – i.e. don’t return any address from GraphicsV 9. That way all you’ll have to worry about is mirroring the memory to &1FD8000, and manually mapping in the memory used by the original driver. This should work fine on the Pi and Iyonix (where the original driver won’t have been using the screen DA), but won’t work on OMAP or IOMD (where the screen DA is used by default). |
Jon Abbott (1421) 2651 posts |
Leaving the OS to map DA2 solves that issue rather neatly. IOMD isn’t an issue as ADFFS already supports it on the Pi. I think I have enough info to start coding a solution now, thanks Jeffrey. |
Jon Abbott (1421) 2651 posts |
“GraphicsV 6 (to translate the screen start/end/init addresses to the buffer used by the real driver)” Am I correct in assuming that, as DA2 starts at physical address 0, I simply need to add the original framestore physical address to R1 and then pass the call on? Tied in with this is a mem copy of DA2 to the actual framestore that happens on a frame swap. I’ve tested that separately with Zarch (with GraphicsV unclaimed) so I know that bit works. I should be seeing the contents of DA2 on the screen, but get nothing, which makes me think it’s the DAG used by the hardware that’s wrong. |
Jeffrey Lee (213) 6048 posts |
No. It starts at physical page zero, not physical address zero. You’ll probably want to use something like the following logic:
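Something along these lines, perhaps (a sketch of the kind of translation being suggested, not a definitive reconstruction; it reuses the page_to_physical helper and gv_regs/gv_result types sketched earlier, and real_framestore_phys is a hypothetical variable holding the GPU framestore’s physical base):

```c
/* Sketch: rebase a GraphicsV 6 address from DA 2's physical space onto the
   real framestore. DA 2's pages start at physical page zero, whose physical
   address has to be looked up once - it is not address zero. */
static uint32_t da2_phys_base;          /* physical address of physical page 0 */
static uint32_t real_framestore_phys;   /* physical base of the GPU framestore */

static void init_translation(void)
{
    page_to_physical(0, &da2_phys_base);        /* OS_Memory 0, as sketched above */
}

static gv_result translate_dag(gv_regs *r)
{
    uint32_t offset = r->r[1] - da2_phys_base;  /* offset within DA 2's memory    */
    r->r[1] = real_framestore_phys + offset;    /* same offset in the real buffer */
    return GV_PASS_ON;                          /* let the real driver program it */
}
```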
One extra thing I’ve just thought of – you’ll probably want to block GraphicsV 13 (i.e. claim the service call but return with R4 unaltered) to stop the real driver from trying to hardware accelerate copies/fills. |
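In the interposer that blocking might look like this (sketch, same hypothetical types as before):

```c
/* Sketch: block GraphicsV 13 (render operations). The call is intercepted so
   the real driver never tries to accelerate copies/fills into its own buffer,
   but R4 is deliberately left alone so the OS falls back to software rendering. */
static gv_result block_render(gv_regs *r)
{
    (void)r;               /* nothing to do: R4 stays non-zero ("not handled") */
    return GV_CLAIMED;     /* ...yet the call goes no further                  */
}
```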
Jon Abbott (1421) 2651 posts |
I’m starting to think that by letting the OS take over and use DA2, it’s causing an underlying issue. I simply can’t get an image to appear, even when hard-coding the addresses.

EDIT: If the OS uses DA2, GraphicsV 9 doesn’t work as it doesn’t have a separate framestore. Testing on IOMD, GraphicsV 9 returns R0=0, R1=0 and GraphicsV 6 is offset from 0 as I suspected.

EDIT2: Just a thought – if I modify R1 in the GraphicsV 6 code and then pass the call on, does it remain modified? Also, will my code be called before it hits the hardware code?

EDIT3: Everything works provided I don’t let the OS claim GraphicsV 9. Does the OS blank the video device or do anything related to the GPU when it takes over GraphicsV 9? |
Jon Abbott (1421) 2651 posts |
Found the problem. The issue lies with getting the GPU framestore logical address. I’m currently doing the following on Service_ModeChange:

1. Pass-thru GraphicsV 9, allowing the GPU to own it

OS_ReadVduVariables is returning the DA2 address and I’m not sure how to temporarily force it to return the GPU framestore. I’ve tried OS_Memory 0 to convert the GPU framestore physical address to logical, but it returns 0 for some reason – is that a bug, or a feature? What triggers parameter 149 to be updated?

That issue aside, I’ve proved that by hijacking GraphicsV 9 the OS can be pointed at DA2, and by hardcoding the GPU framestore logical address I have Zarch working beautifully.

Once I have this issue resolved, my next problem is figuring out how to change the GPU screen geometry when VIDC registers change. James Pond, for example, wants a MODE of 318 × 224 pixels; how do I implement that and, more to the point, in a way that results in a valid resolution that a VGA monitor will display? I’m hoping the GPU is intelligent enough to auto-scale to a valid resolution. |
Jeffrey Lee (213) 6048 posts |
The answer is that you probably shouldn’t try and temporarily force it to return the GPU framestore. If you keep switching things back and forth between your buffer and the GPU’s buffer then that’s likely to be a recipe for disaster.
It’s a limitation of the API. OS_Memory 0 only knows how to do physical to logical translations for regular RAM pages (it uses the physical RAM map to determine the page number, then checks in the CAM to see where that page is mapped). On the Pi and the Iyonix the OS treats the GPU framestore as IO memory, not RAM, so OS_Memory 0 doesn’t know how to find it when doing physical-to-logical translations (but logical-to-physical works fine, as they go directly via the page tables). You should really be using OS_Memory 13 to map in the GPU framestore yourself – if you’re overriding GraphicsV correctly then the OS won’t be attempting to map in the GPU’s framestore for you; you have to do it yourself. OS_Memory 13 will return a pointer to an existing mapping if there is one; if not, it will create a new one. It’s a bit convoluted, but take a look at ModeChangeSub in the kernel sources to see what the OS does on a mode change.
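A sketch of that mapping call (register usage as I understand OS_Memory 13 – physical base in R1, size in R2, logical address back in R3 – so treat it as a starting point rather than gospel):

```c
/* Sketch: map the GPU framestore into logical address space with OS_Memory 13,
   much as the kernel's mode change code does. Cacheability/policy flags in the
   upper bits of R0 are omitted here. */
#include "swis.h"
#include <stdint.h>

void *map_gpu_framestore(uint32_t phys_base, uint32_t size)
{
    void *log = NULL;
    if (_swix(OS_Memory, _INR(0,2) | _OUT(3),
              13,            /* reason code: map in I/O / physical space */
              phys_base,     /* physical base of the framestore          */
              size,          /* size in bytes                            */
              &log))
        return NULL;         /* SWI error: no mapping made               */
    return log;
}
```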
You’ve got a few approaches at your disposal:
In terms of GPU intelligence, the Iyonix is pretty dumb and can produce corrupt displays if given mode timings it doesn’t like. So some kind of software scaling will almost certainly be required to get low-res or unusual modes to work properly with both the graphics card and the user’s monitor. The Pi is fairly smart, since we aren’t actually changing the mode timings at all – we’re just changing the parameters of the overlay which the GPU then scales to fit the screen. |
Jon Abbott (1421) 2651 posts |
The solution to the GPU framestore issue turned out to be remarkably simple. When GraphicsV 6 is called with R0=0 or R0=1, I simply change R1 to the original DAG address. Everything else is passed through. In effect, RISC OS is doing the frame buffer swapping inside DA2 but using only one GPU framebuffer. This has greatly simplified the code, removing the need to convert memory addresses between 8-bit and the current mode bit depth, and means I can leave the DAG handling to the existing drivers/OS… hopefully making it forward compatible.

Regarding the screen geometry, I missed the fact it’s all done through type 3 VIDC lists. What I’ll do is buffer up the VIDC register writes and issue a GraphicsV 2 on the next VSync. This will work, provided it doesn’t force the screen memory to be cleared. It should be pretty quick to code as the Abort handler is already in place and handling VIDC1 writes. I’m only concerned with Pi compatibility; unfortunately my Iyonix is still dead so I can’t do any testing.

I now have both Zarch (8-bit mode) and Jet Fighter (4-bit mode) working… I just need David’s 26-bit CPU interpreter to start testing more games. The final piece of the jigsaw that I need to add is doing the frame blit from DA2 to the GPU at the game’s pace; currently I’ve fixed it at every alternate VSync for simplicity. |
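A sketch of the substitution described above (same hypothetical gv_regs/gv_result types as earlier; original_dag stands for whatever DAG address was captured at the mode change):

```c
/* Sketch: for GraphicsV 6 with R0=0 or R0=1 (the frame buffer DAGs), force R1
   back to the original DAG address so the GPU keeps scanning out one fixed
   buffer while RISC OS swaps banks inside DA2. Everything else passes through. */
static uint32_t original_dag;          /* DAG address captured at mode change */

static gv_result pin_dag(gv_regs *r)
{
    if (r->r[0] == 0 || r->r[0] == 1)
        r->r[1] = original_dag;        /* ignore the bank swap: one GPU buffer */
    return GV_PASS_ON;
}
```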
Jon Abbott (1421) 2651 posts |
I’ve posted some photos of a few games running under this on the JASPP forum: |
Jon Abbott (1421) 2651 posts |
While implementing the 4-bit palette change support, I’ve noticed that GraphicsV 9 (11) is being called by the OS with more than 16 palette entries – it’s the flashing colour code that’s doing it. Is that a bug or a feature? I’ve worked around it by restricting it back down to 16 before passing it on. |
Jeffrey Lee (213) 6048 posts |
Odd… I guess that’s a bug. I’ll look into it. |
Jon Abbott (1421) 2651 posts |
I’m using GraphicsV 2 to set the screen geometry, but it’s not doing what I’m expecting. What’s the correct way to change the screen geometry on RO5? After a MODE change, I take a copy of the VIDC type 3 list passed to GraphicsV and update the horizontal/vertical display size. If, for example, in MODE 13 (320×256) I use GraphicsV 2 to change the width to 300 pixels, it simply reduces the visible screen down to the left 300 pixels of the 320 used by the MODE. I was expecting pixel 301 to become the first pixel on the next raster line. |
Jeffrey Lee (213) 6048 posts |
GraphicsV 2 is the correct way, but what you’re running into is a limitation of the hardware (I guess I was lying when I said that the Pi is fairly smart!). The Pi hardware seems to have a restriction that the start of each row must be on a 32 byte boundary. So although you can request a screen width of 300 (and the display will be cropped to that size), the stride of each row will be rounded up to 320 bytes.

At the moment the way the OS deals with this is that any call to GraphicsV 7 (vet mode) will fault the mode if the rows aren’t a multiple of 32 bytes long. So if you check with GraphicsV 7 before attempting to set the mode you should at least be able to detect whether the hardware will support it properly (assuming the driver implements that call – the current version of the NVidia driver doesn’t!). Unfortunately GraphicsV 7 doesn’t have any way of indicating why the mode isn’t supported or suggesting an alternative (the docs say that minor edits to the VIDC list/workspace values are allowed, but so far I don’t think any driver makes use of that – and it’s not documented what the OS considers to be a minor edit).

In the future you should be able to use the “extra bytes” control list item to request a 300 pixel wide mode with 20 extra bytes on the end of each row. But so far I think it’s only the NVidia driver which supports that item – I haven’t hooked it up to the other drivers yet because I’m not sure if the OS understands it (not such an issue for you, but it would be an issue if the setting were automatically used to allow for odd-sized screen modes in the desktop). And on a related note, I don’t think any drivers complain if they’re given control list items which they don’t recognise. Welcome to the party! |
Jon Abbott (1421) 2651 posts |
The power of documentation ;) Sounds like I need to recode the blit to do full emulation when the pixel width MOD 32 is not 0, and round the pixel width up to the next 32-pixel boundary. That’s not so bad.

I have basic VIDC1 and VIDC20 emulation done now, so once I’ve sorted this MOD 32 issue out I’ll be moving on to supporting palette changes mid-frame. Am I correct in thinking you coded the VIDC emulation for ArcEM? How did you track the timing of the palette changes? My initial thought is that I need to emulate the IOC timers, note the time at the start of VSync, note the time at each palette change and calculate the effective vertical raster, then use that table to change the effective palette when blitting the frame to a 16/24-bit mode. I don’t suppose it’s possible to change the palette mid-frame on the Pi. |
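The rounding itself is trivial; a quick sketch in C (the Pi restriction being on bytes per row, which for 8bpp is the same as pixels):

```c
/* Sketch: round a requested width up so each row is a multiple of 32 bytes,
   as the Pi requires. bpp = bits per pixel (1/2/4/8/16/32). */
#include <stdint.h>

static uint32_t rounded_stride_bytes(uint32_t width_px, uint32_t bpp)
{
    uint32_t bytes = (width_px * bpp + 7) / 8;   /* bytes actually needed per row  */
    return (bytes + 31) & ~31u;                  /* round up to a 32 byte multiple */
}

/* e.g. 300 px at 8bpp -> 320 bytes (matching the cropped display described
   above); 318 px at 4bpp -> 159 -> 160 bytes. */
```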
Jeffrey Lee (213) 6048 posts |
Correct.
There are two techniques ArcEM uses, and (for RISC OS at least) the user can switch between them in order to get better performance where possible.

There’s the full emulation approach (used by the ‘standard’ display driver) where we emulate VIDC timings down to the scanline level. For each scanline an event fires in the emulator which triggers the video emulation code to copy one row of video data to the screen, converting from palettised to 16bpp/32bpp. The conversion from palettised to true colour is important as it means the emulator doesn’t have to care whether the host hardware supports mid-frame palette swaps (and neither does it have to worry about getting the emulator timing precisely right so that the emulated machine is in sync with the real video hardware). Once the end of the display region is reached the video emulation sleeps for a little while until it’s time to fire off the fake VSync interrupt, then sleeps again until it’s time to start writing out display pixels again. I can’t remember off the top of my head how the border colour is handled (ArcEM renders the border manually, since it’s another feature we can’t rely on the host to support), but I think that for each display scanline the border is updated in real time as each video scanline is output, and then it does the top and bottom border regions as two big rectangles when it does the pre/post VSync sleep.

However the full emulation approach is a bit slow, especially considering most games don’t need it, so there’s a secondary video driver (the ‘palettised’ one) which only updates the screen once per frame, on the emulated VSync interrupt. This doesn’t perform the true colour conversion, but does convert from 1/2/4/8bpp up to 8bpp if needed.

There’s also a whole mass of (optional) logic to track memory writes and palette changes, to try and reduce CPU usage in the video emulation wherever possible. For simple games or software which doesn’t make use of multiple screen banks this can help a bit, but for games where the screen needs to be entirely redrawn on a regular basis (whether due to palette changes, hardware scrolling, or lots of pixel updates) this state tracking can be a significant overhead. So there’s a mode which can be enabled which disables the memory tracking (the main cause of the overheads) and instead relies on just changes to the DAG and palette registers to decide when to redraw the screen. For the ‘standard’ driver, once a DAG/palette change has been detected it causes the next frame to run the full per-scanline logic (to ensure the frame is redrawn in time with the palette updates the game is performing), while for the ‘palettised’ driver it just checks the DAGs at the end of each frame and then redraws everything if they’ve changed. Palette changes don’t trigger a redraw as they just get passed through to the host palette. There’s also a failsafe timer which makes sure that if the DAGs/palette suddenly stop updating (e.g. due to a loading screen or menu not making use of screen banks) the emulator will force a full redraw of the screen and then temporarily switch the memory write tracking back on until the game starts using screen banks again.

Of course you have the advantage of not running full CPU emulation, so you should be able to get by with not having as many performance tweaks as ArcEM.
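For illustration only (this is not ArcEM’s source, just the shape of the technique), the per-scanline conversion amounts to a palette lookup per pixel:

```c
/* Sketch: convert one scanline of 8bpp palettised pixels to 32bpp using the
   palette as it stood when this scanline was emitted. Doing this per scanline
   is what makes mid-frame palette changes work with no help from the host. */
#include <stdint.h>

static void convert_scanline_8to32(const uint8_t *src,       /* emulated screen row  */
                                   uint32_t *dest,           /* host framebuffer row */
                                   const uint32_t *palette,  /* 256 entries, already
                                                                in host pixel format */
                                   unsigned width)
{
    for (unsigned x = 0; x < width; x++)
        dest[x] = palette[src[x]];     /* straight lookup, one pixel at a time */
}
```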
It’s not possible. From my own experience when implementing the driver it looks like the GPU only responds to one palette change request per frame. To cope with this I’ve made it so that the driver batches up any requested palette changes internally and only submits them to the GPU on the next centisecond timer event. |
Jon Abbott (1421) 2651 posts |
I think the approach I’m going to take is to store the palette changes and process them as the frame is converted to the 16/32-bit GPU framebuffer; that will reduce the cache misses, as it can pre-cache the palette, the palette conversion buffer, the frame and the conversion code.

Which leads me on to HAL_Timer – are these general purpose timers? Can you simply claim the first timer not in use? I want to set one up as a high precision timer at the start of VSync, which I can then read to determine which scanline the palette change should occur on. I want to use another for emulating the IOC timers.

Final question: is GraphicsV 1 (VSync occurred) triggered at the top or bottom of the frame on the Pi? |
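The scanline calculation itself is just a division (a sketch, with the timing values taken from whatever the emulated VIDC1 mode implies; 64µs is the usual TV-rate line period):

```c
/* Sketch: estimate which displayed scanline a palette write lands on from the
   time elapsed since VSync, read from a free-running microsecond timer. */
#include <stdint.h>

static uint32_t scanline_for_time(uint32_t us_since_vsync,
                                  uint32_t us_per_line,          /* e.g. 64 for TV-rate modes */
                                  uint32_t lines_before_display) /* VSync + border lines      */
{
    uint32_t line = us_since_vsync / us_per_line;
    return (line > lines_before_display) ? line - lines_before_display : 0;
}
```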
Jeffrey Lee (213) 6048 posts |
Yes.
Yes – although there isn’t actually an API to allow them to be claimed yet. The only hard rule is that timer 0 is used by the OS, for the other timers it’s a bit of a free for all.
No idea! |
Dave Higton (1515) 3526 posts |
Now that we’ve got rid of CRT monitors, does it make any significant difference? |
Jon Abbott (1421) 2651 posts |
It does if you need to calculate the scanline as a function of time beyond VSync. I recall some discussion previously about OMAP or the Pi GPU triggering VSync at a different time to classic Arcs, but I can’t find it. On classic Arcs I believe the VSync is triggered at the start of flyback. |