VIDC1 / RO3.1 frame store emulation
Jon Abbott (1421) 2651 posts |
One solution I suppose is to trigger my own VSyncs, at 50Hz. How can I block the GPU from passing its VSync through to the OS though? |
Jeffrey Lee (213) 6048 posts |
You’ll receive the GraphicsV 1 call before the OS does, so all you need to do is to claim it (e.g. set R4 to zero) and that should stop the OS from seeing it. Just remember to add a flag to your code so that you can pass through the call that your code generates for the fake VSync! You’ll probably have to use a fake VSync for the Pi anyway, as we don’t have any control over the mode which the hardware uses – all that happens when we change screen mode is the GPU resizes the overlay that we use as our framebuffer. So we don’t have any control over the mode timings. |
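A minimal sketch of a handler doing that, assuming it has been installed with OS_Claim on GraphicsV (vector &2A); the labels and the flag are illustrative, not anything taken from ADFFS:

    gv_handler
            AND     R12, R4, #&FF           ; low bits of R4 carry the GraphicsV reason code
            TEQ     R12, #1                 ; GraphicsV 1 = VSync occurred?
            MOVNE   PC, LR                  ; not a VSync, pass the call on untouched
            LDR     R12, fake_vsync_flag
            TEQ     R12, #0
            MOVNE   PC, LR                  ; our own fake VSync, let the OS see it
            ; ...do the real-VSync work (e.g. blit DA2 to the GPU framestore) here...
            MOV     R4, #0                  ; claim it, so the OS never sees this VSync
            MOV     PC, LR
    fake_vsync_flag
            DCD     0                       ; set non-zero around the fake GraphicsV 1 this code generates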
Jon Abbott (1421) 2651 posts |
I’ll go the fake VSync route then, as it allows VSync to be triggered with the same timing a VIDC1/20 would, based on the requested pixel rate etc. |
Rick Murray (539) 13840 posts |
:rolls eyes:

A resource I have (for knocking VGA out of a microcontroller) times it for 640×480 as the VSync starting a minimum of 0.45ms after the last line. The pulse stays low for 64μs, and then the first line of the next frame will begin a minimum of 1.02ms after VSYNC ends. Obviously this changes depending on the resolution/refresh rate.

LCD monitors using analogue VGA use exactly the same mechanism as CRT monitors. Here’s a shocker. So do digital monitors – the HDMI spec describes HSYNC and VSYNC signalling (in the TMDS data, not as discrete signals), as it is the device’s way of knowing when new lines and frames begin. Even with 480p (525 lines, NTSC style) an HDMI signal has a VBI period.

Looking at the above, 0.45ms prior and 1.02ms after, it would seem logical that the VSYNC would be triggered close to the end of the current frame. It serves a purpose lost in history (there’s no need to wait for an electron beam to fly back, and indeed most modern CRT monitors could do it a lot quicker than the sync allowed for), however it is also useful for the programmer as a time to draw into screen buffers or switch buffering or… Without this, computer games would look really poor.

When, and how, the sync signals occur is an important thing, and is precisely defined in the relevant standards.

tl;dr summary: yes, it’s important. |
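To put numbers on it, the standard 640×480@60Hz timing uses a 25.175MHz pixel clock with 800 clocks per line and 525 lines per frame, so roughly:

    line time         ≈ 800 ÷ 25.175MHz ≈ 31.8µs
    vertical blanking ≈ (525 − 480) × 31.8µs ≈ 1.4ms per frame

which is in the same ballpark as the 0.45ms + 64µs + 1.02ms minimums quoted above, and is the window the programmer gets to do that buffer work in.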
Rick Murray (539) 13840 posts |
? The code says:
And the Pi forum says:

Note the GPU uses timers 0 and 2. 3 is reserved for linux, so would be most suitable for a bare metal OS. 1 is currently unallocated, so could be used.

Which might restrict your options slightly… |
Theo Markettos (89) 919 posts |
Rick, there are two sets of timings: GTF, which is designed for CRTs, and CVT, which is intended for LCDs. The advantage of CVT is that more of each frame is spent plotting pixels and less is dead time, which is quite handy because it reduces the required video/pixel bandwidth for a given frame rate. While this is bad news for the programmer (less blanking time to work in), it means machines that are limited by bandwidth (like the Risc PC and perhaps the BeagleBoard) can squeeze out larger modes. I’m not sure if it’s useful for VIDC1 machines as they’re typically limited by video RAM size anyway (unless you fancy black and white ‘high res’ modes). |
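The arithmetic behind that: the pixel clock a mode needs is total columns × total lines × refresh rate, blanking included. Using the standard 1280×1024@60Hz timing (1688×1066 including blanking) as an illustration:

    pixel clock = 1688 × 1066 × 60Hz ≈ 108MHz

Shrink the blanking totals and the same visible resolution at the same refresh rate needs a lower clock, which is where the bandwidth saving comes from.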
Jeffrey Lee (213) 6048 posts |
Can you simply claim the first timer not in use?

Nothing I’ve said contradicts the sources. The HAL exposes GPU timers 1 and 3 to the OS. But since the HAL API requires timers to be numbered sequentially starting from zero, the HAL renumbers them such that HAL timer 0 corresponds to GPU timer 1, and HAL timer 1 corresponds to GPU timer 3. On all systems HAL timer 0 is used by the OS for the centisecond timer, and the other timers (however many there may be) are free for other software to use. |
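Spelling the renumbering out, combining the Pi forum note above with Jeffrey’s description:

    GPU timer 0 – used by the GPU firmware (not exposed)
    GPU timer 1 – HAL timer 0 – OS centisecond timer
    GPU timer 2 – used by the GPU firmware (not exposed)
    GPU timer 3 – HAL timer 1 – free for other software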
Rick Murray (539) 13840 posts |
Thank you for the clarification. |
Rick Murray (539) 13840 posts |
Ah, the “reduced blanking” option; the Pi uses that in monitor-style modes to run an insanely high (120Hz?) refresh.
No thanks. I used to do DTP work in MODE 23 on a rather underpowered machine way back when. Mmm… Wasn’t that the one where the screen would blank out while accessing the floppy disc? ;-) |
Jeffrey Lee (213) 6048 posts |
FYI: I’ve just thrown a bit of a spanner in your works by checking in this set of changes. Specifically, you’ll have to watch out for the following things stopping your code from working:
Once that’s done you should probably consider making ADFFS register itself as a proper GraphicsV driver instead of just intercepting calls to the original one. This will be the best for long-term compatibility, although in the short term I suspect there’ll still be some changes I’ll be making which will break things (e.g. for multiple head support your code will probably need to be aware of which head it should be writing stuff to). There’s a brief overview at the bottom of the GraphicsV page for how the registration/deregistration process works. |
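From memory the registration goes through OS_ScreenMode; the reason codes and register usage below are recalled rather than checked, so verify them against the GraphicsV page Jeffrey mentions before relying on them:

    ; Register as a GraphicsV driver (sketch only - verify against the GraphicsV docs)
            MOV     R0, #64                 ; OS_ScreenMode 64: register graphics driver
            MOV     R1, #0                  ; flags
            SWI     XOS_ScreenMode
            STR     R1, driver_number       ; the OS hands back a driver number
                                            ; (in a real module this would live in workspace)
    ; In the GraphicsV handler, only act on calls addressed to this driver
            LDR     R12, driver_number
            TEQ     R12, R4, LSR #24        ; driver number sits in the top byte of R4
            MOVNE   PC, LR                  ; not for us, pass the call on
    driver_number
            DCD     0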
Jon Abbott (1421) 2651 posts |
I should really get the code over to you to look at; you could potentially include it in RO. I’m not actually doing much – the OS does all the work as it switches the frame buffer to DA2. All my code does is force the GPU to mirror the OS MODE, but in 8 bits, and copy the frame buffer from DA2 to the GPU frame buffer with conversion to 8-bit when the GPU triggers a VSync. There’s a bit of code to copy palette changes, to speed up the conversion, but apart from that the OS is doing all the work.

The one thing it does need is a means to convert the physical address of the GPU frame buffer to its logical address and vice versa. Extending existing SWIs to support IO memory would be the most sensible route.

I’ve not actually touched the code for a few weeks. I put it on hold whilst I code the ARM3 JIT, as I need some overscan games running on the Pi to check the blitter routines take account of the VIDC1/20 registers correctly and can add mid-frame palette change support. The JIT is now up to StrongARM compatibility, so I’m not far off full 32-bit compatibility.

I have a few games running on the Pi that use 4-bit modes for you to look at. Terramex is working; Pac-mania is running on StrongARM although crashing on the Pi, and I’m busy debugging the 32-bit code to track the issue down at the minute. |
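For anyone following along, the conversion described above can be done a byte at a time with a lookup table; a rough sketch of such an inner loop (register allocation and names are illustrative, not ADFFS code):

    ; R1 -> DA2 source (4bpp), R2 -> GPU framebuffer (8bpp), R0 = source bytes to convert,
    ; R4 -> 256-entry table of halfwords mapping one source byte (two 4bpp pixels)
    ;       to the corresponding pair of 8bpp pixels
    expand_loop
            LDRB    R3, [R1], #1            ; fetch two 4bpp pixels
            MOV     R3, R3, LSL #1          ; index into the halfword table
            LDRH    R3, [R4, R3]            ; look up the two 8bpp pixels
            STRH    R3, [R2], #2            ; write them to the GPU frame
            SUBS    R0, R0, #1
            BNE     expand_loop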
Jeffrey Lee (213) 6048 posts |
Sure, feel free to send the code my way. I’m interested in seeing exactly why you’re having trouble getting the GPU framebuffer logical address! As I’ve said before, I believe it’s your responsibility to map in the memory using OS_Memory 13, so I’m interested in seeing why that’s not working, or why you might have implemented things differently to how I would have done it (or would have tried doing it).

Having said that, I don’t think we’re yet at the stage where including the code in the OS would make sense. At the moment handling of screen memory is too primitive – ideally we’d need OS-level support for multiple pools of screen memory (so that the OS can handle allocation of both the GPU framebuffer and the emulation framebuffer), and an improved data abort handler (to allow the mode emulation code to track page writes so it knows which pages need translating). Both of those features would also help with other things, so I’m hoping to implement them at some point. |
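For reference, my understanding of the OS_Memory 13 call being referred to (register usage from memory, and the physical address below is purely illustrative):

    ; Map a chunk of physical (IO) address space and get its logical address back
            MOV     R0, #13                 ; OS_Memory reason 13: map in physical memory
                                            ; (flag bits in R0 select cacheable/bufferable mappings)
            LDR     R1, =&5E000000          ; physical base of the region (illustrative value only)
            LDR     R2, =&00100000          ; size of the region in bytes
            SWI     XOS_Memory
            ; on exit R3 = logical address the region has been mapped at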
Jon Abbott (1421) 2651 posts |
Sent… along with the ARM3 JIT and the original Terramex floppy image, to test both.

The GPU’s HAL layer is mapping the memory; I don’t need to touch it. You’ll see from the code it’s doing very little – it’s less than 100 instructions if you ignore the blitter code.

You’ve given me an idea though: instead of waiting for OS_Memory 0 to be extended to support IO memory, I could monitor OS_Memory 13 and create my own IO physical>logical map. Is there a reason OS_Memory 0 doesn’t cover IO memory? I can’t think why it wouldn’t, unless there was either a problem with it doing so, or it was simply an oversight and it wasn’t extended when OS_Memory 13 was added? |
Jon Abbott (1421) 2651 posts |
Actually, I don’t think that will work, as the GPU framestore will have already been mapped during POST. I’ll put my thinking cap on; I’m sure there is a reliable way of converting a physical address to a logical address – direct from the L1/L2 page tables, I guess. |
Jon Abbott (1421) 2651 posts |
Jeffrey – in reply to your message (email is down, so I can’t reply), firstly I’m sorry for the CMOS issue. I forgot to mention that you’ll need SparkFS or similar loaded before using “Boot floppy”, as the Boot scripts are in a ZIP file. The floppy normally contains the script as well, but I’ve not added it to the floppy I sent you… sorry.

Regarding CMOS protection, ADFFS does that already and the floppy is flagged as requiring CMOS protection… I’ve yet to add the code to force it on when the floppy is mounted though. |
Jeffrey Lee (213) 6048 posts |
No problem! |
Jon Abbott (1421) 2651 posts |
“Is GraphicsV 1 (VSync occurred) triggered at the top or bottom of the frame on the Pi?”

Answering my own question: it starts at the top of the frame. By the time the Pi has copied 80KB for MODE 13, it’s advanced by ~6 rasters. I’ll probably have to implement dual frames on the GPU to get around that, as it will affect games that palette swap. |
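As a rough sanity check on that figure, assuming the ~6 rasters are rasters of the emulated 50Hz, 64µs-per-line VIDC-style mode (an assumption, not something stated above):

    6 × 64µs ≈ 384µs for the 80KB copy, i.e. a copy rate in the region of 200MB/s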
Jon Abbott (1421) 2651 posts |
This is now all coded and a beta available on the JASPP site. It won’t work on RO5.21 alphas past 14-12-13 though due to the GraphicsV changes – that will be corrected in a later beta. Pac-mania is also available and runs under this and the ARM3 JIT on RO5. |
Jon Abbott (1421) 2651 posts |
Does the DAG start address have to be aligned on the Pi? I’m trying to frameswap, but the second buffer is always shifted – as if the DAG has to be aligned. I’ve tried 1K, 4K, 32K, 64K, they all seem to produce the same result. |
Jeffrey Lee (213) 6048 posts |
The only requirement for the Pi is that the address needs to be aligned to the start of a scanline, relative to the start of the GPU screen memory. This is because we can’t directly specify the DAG addresses to the GPU; instead we’re limited to displaying a 2D subrectangle of a 2D framebuffer. So what we actually do is request a buffer that’s N times taller than the mode RISC OS wants, and then we adjust the vertical offset of the displayed rectangle according to which screen bank needs to be displayed. |
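A worked example of that constraint for a 320×256, 8bpp mode (the numbers are just the obvious ones for such a mode, not taken from the driver):

    one scanline   = 320 bytes
    one whole bank = 320 × 256 = &14000 bytes
    bank n offset  = n × &14000, which is automatically a whole number of scanlines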
Jon Abbott (1421) 2651 posts |
Next problem: does a call to GraphicsV 6 trigger a VSync? When I make the call I’m seeing the VSync rate double! |
Jeffrey Lee (213) 6048 posts |
Try setting BCMVideo$ScreenBanksEnabled to 2. Remember that:
|
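For reference, that’s an ordinary system variable, so it can be set from the command line or a boot script, e.g. (assuming it just needs to be set before the driver next looks at it):

    *Set BCMVideo$ScreenBanksEnabled 2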
Jon Abbott (1421) 2651 posts |
BCMVideo 2 fixes the bizarre VSync issue; however, palette changes then stop working! |
Jon Abbott (1421) 2651 posts |
What triggers RISC OS to reset the palette? Is it a GraphicsV 1 call, Event 4 or something else?

To abstract the GPU VSync from the VSync software gets, I use the (true VSync) call to GraphicsV 1 to blit the frame from DA2 to the GPU and then return claiming the call. This, as far as I can tell, prevents RO from seeing the VSync event. I then issue a GraphicsV 1 call at 50Hz which my code exits as unclaimed, letting RO do its thing. RO should then reset the palette and trigger Event 4 etc at 50Hz – which it does, and once RO has done its thing, games change the palette back to their own; music plays at the correct speed, as do games.

This worked okay until I added frame swapping on the GPU to avoid tearing. The immediate issue was that VSync went from 50Hz to 100Hz, causing everything to go too quickly – the visible frame rate however halved, due to the VSync delay in the HAL drivers when changing the DAG. Using BCMVideo 2 resolved these issues, however RO is now changing the palette after the game has. One of two things may be happening:

1. The palette change code in RO isn’t triggered by GraphicsV 1

The first thing I did was download the latest alpha and see if anything has changed post the GraphicsV updates; unfortunately the build was so unstable I had to go back and didn’t get to really test it. I will try again in a few days though.

EDIT: Could the HAL buffer be filling up perhaps? How many palette changes, frame swaps etc can it buffer when BCMVideo is set to 2?

EDIT2: It looks like the buffer is reaching its limit: if I cache all the palette changes and then issue a GraphicsV 11 before switching frame, I don’t see the problem. However, this introduces more problems:

1. The palette change doesn’t occur until the frame after next (that’s with the GraphicsV 11 call before or after GraphicsV 6) |
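A sketch of that EDIT2 work-around as a direct vector call; the vector number and the layout of R4 are from memory, and the data registers for each reason code are left to the GraphicsV documentation rather than guessed at here:

    ; Per frame: flush the cached palette writes in one block, then swap banks
            ; (set up R0.. with the cached palette block, per the GraphicsV 11 docs)
            MOV     R4, #11                 ; GraphicsV 11: write palette entries
            MOV     R9, #&2A                ; GraphicsV vector number
            SWI     XOS_CallAVector
            ; (set up R0.. with the new bank's DAG address, per the GraphicsV 6 docs)
            MOV     R4, #6                  ; GraphicsV 6: set DAG / displayed bank
            MOV     R9, #&2A
            SWI     XOS_CallAVector
            ; (newer builds with the December 2013 GraphicsV changes also expect the
            ;  driver number in the top byte of R4)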
Jon Abbott (1421) 2651 posts |
There seems to be another alignment requirement in addition to being the start of a scanline on the Pi. When in 320×256×32, if I try to get the DAG to +&50500 (ie 320*257) the display is shifted. Is there a restriction on where the 2nd screen buffer can start in the vertical buffer?

I’ve had a quick look at the Pi’s GraphicsV driver, although the code contains no comments, so it’s not clear what it’s actually doing.

EDIT: Ignore me, I was adding size to get the 2nd buffer instead of size*4.

Could GraphicsV be extended so graphics driver restrictions can be discovered? The start alignment and width modulus differ across video drivers, but there’s no way of discovering what they are. Likewise, minimum screen width and height would be useful. And a GraphicsV call to return the logical DAG address would be useful, instead of having to re-map the memory via OS_Memory 13 to discover it. |
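For the record, under the whole-screen-bank layout described earlier the numbers for 320×256×32 work out as (illustrative arithmetic only):

    one scanline   = 320 × 4 = 1280 = &500 bytes
    one whole bank = 320 × 256 × 4 = &50000 bytes
    so the 2nd bank starts at +&50000, and +&50500 is exactly one scanline beyond it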