Unmapping IO memory
Jon Abbott (1421) 2652 posts |
Is there a way to unmap IO memory? OS_Memory 13 maps it in, but I can’t see any obvious way of unmapping it. I’ve noticed that if I map the GPU screen memory enough times via OS_Memory 13, eventually the screen ends up corrupt or blank. I’m not sure what the actual issue is yet; possibly logical address space is being exhausted because the previous mappings aren’t being removed?
Jeffrey Lee (213) 6048 posts |
No. (Unless you’re using OS_Memory 14/15, but you wouldn’t want to use those for your use-case)
The kernel will reuse an existing mapping if it can find one, including if you’re trying to map in a smaller section of a larger region that’s already been mapped. So if you were to make one request to map in all of the GPU’s memory, then any future requests wouldn’t need to map in anything extra.

Unfortunately this falls down a bit as soon as doubly-mapped memory comes into the picture – instead of looking for a mapped range with (base R1, size R2), it needs to look for two mappings of that range which are right next to each other. So it effectively needs an exact match on the base & size of an existing mapping in order to reuse that mapping for the new call. And since the GPU is in charge of allocating the framebuffer on the Pi, it’ll likely move around quite a bit from one mode to the next.

I should probably look into seeing how hard it would be to disable the double-mapping of screen memory for drivers like the Pi (external framestore + no hardware scrolling). Software shouldn’t be relying on the double-mapped nature of the memory, so it’ll hopefully just be a few bits of kernel code to unpick regarding address calculations. Then BCMVideo can just map the entirety of GPU memory on startup, and there won’t be any need to create any additional mappings for the framebuffer or any overlays.
Jon Abbott (1421) 2652 posts |
I’m mapping the video memory at every emulated mode change, as cacheable/bufferable but not double-mapped. The size changes on each mode change, as I’m only mapping in enough for three frame buffers. I believe I coded it that way because I’d previously found it impossible to get the logical address of the GPU framebuffer (OS_Memory 0 doesn’t work for IO memory?) – your advice was to remap the memory to get the logical address. As it’s switching between my GraphicsV driver and the OS one, I can’t think of any other way to get the logical framebuffer address than to remap it on every mode change. I suppose I could change it to use the GPU framebuffer size, if it’s big enough, to avoid size changes.
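For reference, OS_Memory 13 takes the physical base in R1 and the size in R2, and returns the logical address in R1. A minimal C sketch of that register interface follows; note the `os_memory` stub stands in for a real SWI call (on RISC OS you’d use `_kernel_swi` from `kernel.h`), and the cacheable/bufferable flag-bit positions are my assumption from the OS_Memory documentation, so double-check them against the PRM:

```c
#include <stdint.h>

/* Assumed flag bits in R0 for OS_Memory 13 ("map in physical range"). */
#define OSMEMORY_MAP_IO      13u
#define OSMEMORY_BUFFERABLE  (1u << 9)   /* assumption: bit 9 = bufferable */
#define OSMEMORY_CACHEABLE   (1u << 10)  /* assumption: bit 10 = cacheable */

/* Stub standing in for the real SWI dispatch: on RISC OS this would be
   _kernel_swi(OS_Memory, ...). It echoes a fake logical address so the
   sketch stays self-contained and runnable anywhere. */
typedef struct { uint32_t r[10]; } regs_t;
static int os_memory(regs_t *in, regs_t *out) {
    *out = *in;
    out->r[1] = 0xF0000000u | (in->r[1] & 0xFFFFFu); /* fake logical addr */
    return 0;
}

/* Map a physical framebuffer range; returns the logical address that a
   blitter could write to. phys/size come from GraphicsV 9. */
static uint32_t map_framebuffer(uint32_t phys, uint32_t size) {
    regs_t in = {{0}}, out;
    in.r[0] = OSMEMORY_MAP_IO | OSMEMORY_BUFFERABLE | OSMEMORY_CACHEABLE;
    in.r[1] = phys;   /* physical base of the GPU framebuffer */
    in.r[2] = size;   /* e.g. three frame buffers' worth */
    os_memory(&in, &out);
    return out.r[1];  /* logical address (R1 on exit) */
}
```

As the thread notes, there is no corresponding reason code to undo this mapping, which is what makes repeated calls with differing base/size leak logical space.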
What’s double-mapping the memory? Surely not BCMVideo, as that would be pointless – the GPU works on physical memory and doesn’t touch the MMU as far as I know.
Jeffrey Lee (213) 6048 posts |
If it’s cacheable, are you taking care to flush the CPU cache before each mode change? You don’t want write-back data being written to the framebuffer after the GPU has released it and started using the memory for something else.
Correct. Since you’re not double-mapping the memory, you could have ADFFS read the base & size of the GPU memory on startup (via the mailbox interface) and then map in the entire area. That’ll avoid the OS creating extra mappings for your future requests.
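The mailbox query mentioned here is the firmware property tag 0x00010006 (“Get VC memory”), which returns the base and size of the RAM reserved for the GPU. A sketch of building the request buffer, following the documented property-interface layout (actually writing it to the mailbox registers is platform code and is omitted):

```c
#include <stdint.h>

#define TAG_GET_VC_MEMORY 0x00010006u  /* "Get VC memory" property tag */

/* Fill an 8-word mailbox property buffer requesting the GPU memory
   base/size. On return from the firmware, buf[5] holds the base and
   buf[6] the size of the GPU's reserved memory. */
static void build_get_vc_memory(uint32_t buf[8]) {
    buf[0] = 8 * sizeof(uint32_t); /* total buffer size in bytes */
    buf[1] = 0;                    /* request code */
    buf[2] = TAG_GET_VC_MEMORY;    /* tag identifier */
    buf[3] = 8;                    /* value buffer size: two words */
    buf[4] = 0;                    /* request/response indicator */
    buf[5] = 0;                    /* response slot: GPU memory base */
    buf[6] = 0;                    /* response slot: GPU memory size */
    buf[7] = 0;                    /* end tag */
}
```

Mapping the whole of that region once at startup, as suggested above, would let the kernel’s mapping-reuse logic satisfy all later framebuffer requests from the same mapping.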
It’s the kernel that does it.
Jon Abbott (1421) 2652 posts |
It’s cleaned at the end of every blit, and the blitter is shut down during a mode change, so yes, the CPU/GPU are in sync.
This would work if the framebuffer base/size were fixed from boot and never changed. Do any of the GPU OS builds change the GPU framebuffer base/size?
I understand the legacy reasons for video memory being double-mapped, which was initially a quirk of the way physical RAM started at &2000000 on MEMC. This was later mirrored on the RiscPC via the MMU, for games I guess, but do we still need it? Unless there’s some underlying reason to double-map video memory, such as the OS using it, I wouldn’t have thought any 32-bit apps/games would be relying on it, as triple buffering is pretty much required for tear-free screen updates.
Jeffrey Lee (213) 6048 posts |
“Read the base & size of the GPU memory on startup (via the mailbox interface) and then map in the entire area. That’ll avoid the OS creating extra mappings for your future requests.”
Yes, the framebuffer base and size will change – but it’ll always be located within the memory that the firmware reserves for the GPU on startup. So if you make a request to map all of the GPU memory on startup, and you’re not using double-mapping, any subsequent requests you make will just reuse the existing mapping.
That’s the unanswered question! |
Jon Abbott (1421) 2652 posts |
The problem I’ve got is that GraphicsV 8 can indicate that the framebuffer base/size returned by GraphicsV 9 may change, which means I have to map the memory based on the current values returned by GraphicsV 9 when my driver takes over the mode.
I’m not sure I can do that, due to the base/size potentially changing. Can I assume that the GraphicsV 9 base/size don’t change for ROM-based GraphicsV drivers? If so, there will be three mappings of the GPU RAM: the double mapping from the existing driver, plus my mapping. That’s not going to cause any issues, is it? And will the kernel remove my mapping if the existing GraphicsV driver were to remap its allocation?
Perhaps we should take the Page Zero approach and see what breaks via a nightly build? I’m fairly certain it’s only going to affect games that rely on memory wrapping, or buggy software that’s writing directly to screen memory and overrunning. EDIT:
I’ve just tried this approach, but it doesn’t work. I’m guessing the physical base address is changing when I change mode on the GPU via GraphicsV 2 |
Jeffrey Lee (213) 6048 posts |
To confirm:
1. You’re using the “get VC memory” mailbox message to read the GPU memory
Although now I think about it, the GPU cache mode of the range returned in step 1 might not match the cache mode that it decides to use for the framebuffer in step 3 (the top two bits of the physical address control whether the GPU’s L2 cache is used). So that might be why it’s not working for you – and unless you start guessing about what cache mode the GPU is going to use for the framebuffer (or you waste address space by mapping in all possible variants), this approach isn’t really workable.
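The “top two bits” point refers to the BCM283x bus-address aliases: the GPU’s view of SDRAM is repeated four times, and bits 31:30 of the bus address select the cache behaviour the GPU/DMA sees (e.g. the 0x4xxxxxxx alias is commonly the L2-coherent one and 0xCxxxxxxx the uncached one; check the BCM2835 datasheet for exact meanings). A small sketch of separating the alias from the ARM physical address:

```c
#include <stdint.h>

/* Which of the four SDRAM aliases a GPU bus address falls in (0..3).
   The alias encodes the cache mode the GPU/DMA uses for the access. */
static unsigned bus_cache_alias(uint32_t bus_addr) {
    return bus_addr >> 30;
}

/* Strip the alias bits to recover the ARM physical address, which is
   what a driver would hand to OS_Memory 13. As noted later in the
   thread, the ARM's own accesses aren't affected by the alias bits. */
static uint32_t bus_to_arm_phys(uint32_t bus_addr) {
    return bus_addr & 0x3FFFFFFFu;
}
```

This is why the “map the whole region once” plan stumbles here: if the firmware hands back framebuffer addresses in a different alias than the region queried at startup, the raw addresses won’t match an existing mapping unless the alias bits are masked off first.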
No. (example: BCMVideo)
Since your driver is creating a cacheable mapping (of BCMVideo’s framebuffer), there are potential issues if a GraphicsV 13 call makes it through to BCMVideo, since the OS/drivers don’t do any cache maintenance when accelerated ops are performed. But other than screen corruption nothing should catch fire.
No – the kernel maps it in using OS_Memory 13 (which, as we’ve discussed, has no corresponding “unmap” call).
“That’s the unanswered question!”
Yeah – if it works for me with a brief bit of testing then I’ll just check it in and see if anyone else can break it.
Jon Abbott (1421) 2652 posts |
I pass GraphicsV calls on to the existing driver unless it’s a mode my driver is going to take over. It then calls GraphicsV 9 to get the original driver’s physical framebuffer base/size, and then OS_Memory 13 to remap it with caching/buffering and no double map, and to get a logical address for the blitter to write to. At no point do I touch the hardware directly; that’s handled by the existing driver.
Nothing will get through to BCMVideo, as my driver will have stopped passing GraphicsV calls to it. When you go back to a mode my driver isn’t implementing, I start passing GraphicsV on to the existing driver, which carries on using its existing framebuffer allocation. Everything works, except for the logical memory leak that occurs on every driver switch, because I can’t release my IO mapping.
Jeffrey Lee (213) 6048 posts |
Disregard that – while looking at the code yesterday I was reminded that ARM physical addresses aren’t affected by the GPU cache mode. RAM accesses by the ARM are always performed using whatever the default GPU cache mode is. DMA controllers (and perhaps a couple of other bits?) can use the top two bits of the address to control which cache mode is used, so programming the cache mode correctly is relevant for those, but the logic for that is all within the BCM-specific drivers, so that higher-level drivers/software only have to deal with regular ARM physical addresses.
“That’s the unanswered question!”
Today’s ROM contains a change to disable double-mapping of VRAM for drivers which use GraphicsV 9 and don’t support hardware scrolling. Also, BCMVideo will now map in all of the GPU’s memory on startup, so when the OS maps in the framebuffer, or when BCMVideo maps in overlays, there shouldn’t be any new IO mappings created. If you use similar code to lines 524-554 to map in the memory as cacheable, then that should avoid ADFFS creating extra mappings on mode change. (Just don’t copy the register mixup bug that’s in there – RSBHI r1, r2 should be RSBHI r2, r1)
Jon Abbott (1421) 2652 posts |
Thanks, I’ll give it a try once the build is available. This does, however, only resolve the leak when ADFFS is used with BCMVideo. I don’t have other platforms to test, but I suspect it will also leak logical memory space under other GPU drivers such as OMAP/Iyonix/Titanium etc.
Jon Abbott (1421) 2652 posts |
I’ve finally got around to revisiting this issue, but I’m no further forward. I can’t use Jeffrey’s advice above as it’s specific to the BCMVideo driver – I need one of the following:
1. A means of unmapping IO memory that was mapped in by OS_Memory 13
2. A means of getting the logical address of the existing GraphicsV framebuffer mapping
Without one of these, there is currently no way of getting the GraphicsV framebuffer logical address or preventing memory exhaustion. |
Jon Abbott (1421) 2652 posts |
I think I’m going to code up a repro for this and see just how quickly memory space is exhausted, depending on how the framebuffer mapping is requested. The way I currently have it coded, it seems to happen quite quickly – possibly after 20-30 mode changes, but as I’ve never actually counted, that’s just a guess. Is there a method to read the current IO memory mappings? I’d be interested to see if normal day-to-day use can cause IO memory to leak as devices are switched and the memory mapping not released.
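Back-of-envelope for the repro: if every mode change maps a fresh region and nothing is ever unmapped, the free logical space divided by the mapping size gives the number of mode changes before exhaustion. A sketch with illustrative numbers only (the free-space figure below is a placeholder, not the real size of RISC OS’s IO region, which depends on the kernel’s dynamic-area layout):

```c
#include <stdint.h>

/* Estimate how many leaked OS_Memory 13 mappings fit before logical
   address space runs out, assuming each mode change leaks one mapping
   of the same size. Purely illustrative arithmetic. */
static uint32_t mode_changes_until_exhaustion(uint64_t free_logical_bytes,
                                              uint64_t mapping_bytes) {
    if (mapping_bytes == 0) return 0;
    return (uint32_t)(free_logical_bytes / mapping_bytes);
}
```

For example, three 1920x1080 32bpp buffers are roughly 24 MiB per mapping, so a hypothetical 768 MiB of spare logical space would survive only about 32 mode changes, which is in the same ballpark as the 20-30 observed above.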