Thoughts on GraphicsV memory management

9 posts, 3 voices

Oct 28, 2015 12:02am Jeffrey Lee (213) 6048 posts	Here are my current thoughts on how video/GraphicsV memory management should be extended to cope with all the cool stuff that’s been on my wishlist for the past few years. First, there are a few major assumptions: For multi-monitor setups with spanned displays, we want the Wimp and all other software to see it as one framebuffer. E.g. for two 1080p monitors side by side the OS would tell you that you’re in a 3840×1080 screen mode, and it would act no different to as if you had a single ultra-wide display. All the complexity of configuring the GPUs to read from different parts of the framebuffer (and setting the correct mode timings for each monitor) will be handled by the VDU driver and GraphicsV. Likewise for display rotation, the Wimp and other apps won’t be aware that anything is going on. E.g. a 1080p monitor rotated 90 degrees will be seen by the OS as being in a 1080×1920 screen mode, and the responsibility of making sure the data gets rotated on its way to the monitor will be the responsibility of the GraphicsV driver. I’m hoping that it won’t be an insurmountable task to teach the OS about screen modes where there’s a gap at the end of each line (i.e. the LineLength mode variable isn’t just the screen width multiplied by the bytes-per-pixel). If this isn’t possible then some of the approaches I’m proposing will need a lot more CPU support. But, AFAIK, these kinds of mode have been used in the past (e.g. the interlace hack that was used on some STBs), so hopefully there won’t be too much which needs fixing up. Some advanced functionality (e.g. software BPP conversion, or multi-monitor support) will be implemented in the form of ‘wrapper’ drivers – e.g. when multi-monitor is in use, the OS will talk to a multi-monitor manager driver, which will then talk to two or more drivers under the hood. With that in mind, along with several other complex scenarios (mixing multi-monitor setups with display rotation, scaling, pixel format conversion, cacheable screen memory, etc.), here are my thoughts on how things will need to be restructured in order to work sensibly: PMPs PMPs are designed to give flexible control over memory mapping, so all relevant dynamic areas will be based on those So that we don’t lose any existing functionality, PMPs will be extended to add support for doubly-mapped memory. However I haven’t quite decided what the best approach would be for managing the size/offset of the second mapping – i.e. whether it should be managed automatically by the kernel, based on the current logical size of the DA (as with standard doubly-mapped DAs), or whether there should be an explicit ‘set offset of second mapping’ call. The former would require the kernel to be more restrictive with what you can do with a PMP (e.g. you’d only be able to add and remove pages from the end of the area, rather than unmap a big hole in the middle), while the latter would (probably) be easier to implement and would allow for some more exotic setups (e.g. a doubly-mapped bit at the start of the DA followed by a single-mapped bit). But on the other hand it may also make it non-obvious to software exactly which bits of the DA are and aren’t valid. PMPs will be extended to allow non-RAM pages to be used. This is required for ‘external’ VRAM (Iyonix, Pi), and for any situations where special hardware units are used (e.g. display rotation hardware) Physical memory management GraphicsV will be extended to allow drivers to provide a list of which address ranges their GPU can access, along with any other relevant constraints (e.g. max gap between the end of one scanline and the start of the next). This will help the other components in the system decide where and how to place framebuffers. In essence it’ll be an extended version of GraphicsV 9. For drivers which have ‘external’ VRAM, there’ll also be a GraphicsV interface to allow other systems to allocate memory from that VRAM. E.g. the OS will call into GraphicsV in order to allocate a framebuffer, and will then make another call tell the driver to use that framebuffer for the display. However this may need revising a bit to cope with the Raspberry Pi, where memory allocation is handled by the GPU during the mode change. Maybe the driver just needs a way of responding to the GraphicsV 9-alike with a response that is “I can only access my own memory, and I handle memory allocation myself” – kind of like bit 5 of GraphicsV 8 Logical memory management GraphicsV drivers will not be responsible for mapping memory into logical address space. For now let’s assume this responsibility lies with the kernel (like it currently does) When a driver first registers with the kernel and is issued its first mode change command, the sequence of events will roughly be as follows: The kernel will create a new, doubly-mapped, PMP-based DA to provide a logical mapping of the framebuffer The kernel will query the driver to determine which pages the GPU can access The kernel will use a suitable API to allocate the physical memory needed, from an address range which the GPU can access (e.g. for external VRAM it would allocate directly from the driver via a GraphicsV call, or for GPUs with unrestricted memory access it would use something like OS_Memory 12 to find a free, physically contiguous range of RAM pages). Either way, the end result will be a list of pages to use for the framebuffer. The kernel will pass that page list into the PMP physop call in order to add them to the physical pool The kernel will then issue a PMP logop call in order to map the pages in (plus do whatever extra actions are necessary for setting up the second mapping). If you have two separate displays which you want to turn into one spanned display, the process would be roughly as follows: Both existing DAs would be destroyed and the memory allocated to the framebuffers would be freed A new DA will be created The kernel will query both drivers to determine which address ranges they can access. If there’s a sufficiently large overlap, memory management will effectively be the same as above (e.g. allocate one large framebuffer and then configure the drivers to use display different portions of it). However if there’s no overlap (e.g. two different video cards in an Iyonix) the kernel will allocate memory for each head individually (e.g. two 1080p framebuffers), and turn them into a single buffer in logical space by using the MMU to interleave the pages (via specifying appropriate page lists to the PMP operations). This is assuming you’re after a side-by-side setup, vertical setups should be easier. Things will get a bit complicated if the combined LineLength isn’t a multiple of 4K – this is where the end-of-line gap mentioned at the start of this post will come in handy. E.g. for side-by-side 1080p in 16m colours, the LineLength will be 15360 (15K). By adding 512 bytes of padding to each buffer (512 bytes to the left of the left display, 512 bytes to the right of the right display) the join will lie on a page boundary, so the MMU can be used to interleave different rows from each buffer and create one large logical buffer (which starts at 512 bytes into a page and has a 1K gap between each row) For three displays things will get even more complicated. You’ll want to have a buffer setup as below: Physical buffers: (note - alignment is for illustration purposes only) \|....\|....\|....\|....\|....\|....\|....\|....\| (page boundary guide) +-----------+--+ \| \|==\| LineLength = 3 pages \| display 1 \|==\| \| \|==\| +-----------+--+ +-+-----------++ \|=\| \|\| LineLength = 3 pages \|=\| display 2 \|\| \|=\| \|\| +-+-----------++ +---+-----------+---+ \|===\| \|===\| LineLength = 4 pages \|===\| display 3 \|===\| \|===\| \|===\| +---+-----------+---+ Logical mapping: \|....\|....\|XXXX\|....\|XXXX\|....\|....\|....\| (page boundary guide) +---------+-+-----------++----------+---+ \| \|=\| \|\| \|===\| LineLength = 8 pages \| display \|=\| display 2 \|\|display 3 \|===\| \| \|=\| \|\| \|===\| +---------+-+-----------++----------+---+ The pages in the columns marked with XXXX will need special handling, for they are the locations where the physical framebuffers are overlapping. To cope with this it’s expected that a custom abort handler will be used to detect writes to the overlapping pages, so that any pixel data written to the out-of-bounds area can then be copied to the correct display (e.g. wait until VSync and then either have the CPU copy the data, or use memory-to-memory DMA). If the CPU is to copy the data then it will require the relevant pages from display 1 and 3 to be mapped in somewhere else (perhaps in a completely different DA, especially if PMPs take the ‘restricted’ approach to supporting doubly-mapped areas) Observant people may spot that the buffer setup for this example isn’t optimal, and we could get by with one XXXX column instead of two – relax, it’s just an example. The final code will hopefully be smart enough to organise things in an optimal manner. The abort handler mechanism will need to be able to cope with other situations as well – primarily, drivers which are reliant on being told when screen memory has been updated. This covers ‘wrapper’ drivers which perform BPP conversion, software display rotation, etc. in order to extend the functionality of a real driver. It also covers some real drivers, e.g. the DisplayLink driver needs to encode any dirty pages and send them to the device over USB. Or any driver which uses cacheable screen memory will want to know which pages do and don’t need flushing from the cache. I’m not yet sure exactly how all this will work. Perhaps we need a generic ‘memory watcher’ API, which allows multiple clients to register their interest in specific lists of pages. On the other hand it may make more sense to make it a video-specific thing, so that some hardware operations (e.g. rectangle copy/fill) can be coped with in a neater manner than all the watchers simply being told “some DMA has updated this page”. What if a driver needs to access a logical mapping of its own memory? E.g. DisplayLink, or any of the ‘wrapper’ drivers. For this to work, there’ll have to be an API (most likely a SWI?) to allow a driver to request a logical mapping of a given rectangle of a framebuffer. The result of this SWI will be a list of rectangles – e.g. in the three-monitor setup above, if a mapping of all of display 1 was requested, there’d be one rectangle for the lefthand portion (which would be part of main screen memory), and one rectangle for the righthand portion (which might be off in some other DA). There’ll also have to be a “release mapping” call, which will allow any temporary mapping which was created to be released (although, the kernel will probably just take a lazy approach and leave the pages mapped in just in case they’re needed again later) What about the HAL video API? Most of these changes won’t reach the HAL video API; the required interactions between the driver and the OS are going to become too complex. So it’s probably best to consider the HAL video API to be deprecated. What about ADFFS/Aemulor? When it comes to screen memory, they’re mostly interested in BPP conversion, so can follow the same method as a BPP conversion wrapper driver (e.g. give the OS different page lists depending on whether the current pixel format is one the hardware supports or not). But they also want to emulate the Arc-era memory map, so will want to control the logical address of where the screen memory is mapped. To cope with this it’s probably best to just have a GraphicsV call to allow drivers to specify the base address of the DA that the kernel is about to create. What about the screen dynamic area? When you think about it, there are only really two choices – keep it (and make sure it works sensibly), or get rid of it. But I’m not sure which would be best. If we get rid of it, some legacy apps may break, and it will complicate ADFFS/Aemulor even more. They’d either have to intercept a bunch of SWIs to emulate the DA, or we perhaps could extend the ‘specify base address’ GraphicsV call to also allow the DA number to be specified. If we keep it, then we’ll need to make sure that it’s always associated with the active driver. One way of doing this would be to remap the relevant memory (e.g. change ownership of the memory from one DA to another) whenever the active driver changes. However this would also present problems for ADFFS/Aemulor – if the DA never dies, they’ll have no way of specifying the base address. So maybe we should use OS_DynamicArea 4 – whenever the active driver changes, renumber the DAs so that the active driver is always DA 2? So when the DA is created ADFFS/Aemulor will be given the opportunity to specify its base address, and then when the driver becomes active the DA will be renamed to become DA 2. That’s all I can think of for now (or at least, all I can afford to write tonight). If anyone has any comments – good or bad – feel free to share them before I start implementing all of this! (which won’t be right away due to other tasks continually popping up, but it is the next big thing I want to do) [edit – For some reason textile has decided it wants to use a different font for HAL, API, and ADFFS within those h2. sections. Just ignore its silly formatting.]

Oct 28, 2015 2:31am Jon Abbott (1421) 2651 posts	What about the screen dynamic area? From an ADFFS perspective, it doesn’t matter where DA2 is or who owns it – so long as RISCOS is using it for it’s VDU output, it’s at a legal address that’s always available regardless of appspace mappings and its separate to the GPU memory. When a MODE is entered from an app that’s being Hypervised, ADFFS will map the RO3.1 double map (eg 1F88000 / 2000000) for DA2 for compatibility. When the MODE changes, ADFFS will unmap the RO3.1 double map and only remap it if the MODE is legacy and the task is one being Hypervised. As you note, if DA2 were to go, ADFFS would have to intercept the relevant SWI’s and emulate the DA. If it’s kept, under ADFFS it’s always associated with the primary driver as RISCOS neatly falls back to DA2 if there’s no alternative GraphicsV driver. ADFFS achieves this by Hypervising GraphicsV and passing calls to either RISCOS or the GPU GraphicsV driver as appropriate, and in some cases doubles them up so both are aware of the call. Switching DA2 to the active (GPU) driver would break ADFFS, as it’s relying on DA2 and the GPU being separate entities. If however ADFFS was to become an active GraphicsV driver and not Hypervise GraphicsV, it would carry on working. Task switching would cause big complications here though, as ADFFS would have to register/unregister with GraphicsV so VDU output goes to the correct driver as tasks are switched. As it’s currently Hypervising GraphicsV, this is easily achieved by checking the DomainID and passing the call on if the task isn’t one it’s Hypervising.

Oct 28, 2015 10:22am Jon Abbott (1421) 2651 posts	I should probably add that if DA2 is to change, don’t let ADFFS hold it up and I certainly don’t expect RISCOS to include any botches to keep ADFFS working, I’ll recode to fit around RISCOS; we’d just need to coordinate the changes. Short of a few tweaks to the blitter, general bug fixing and ARMv7 support, ADFFS is almost complete in terms of RO5 providing a single IOC / IOMD VMM. It may need additional SWI’s either Hypervised or Paravitualized to get the odd RO2/3 game working, but as a whole it’s near complete. So that we don’t lose any existing functionality, PMPs will be extended to add support for doubly-mapped memory. However I haven’t quite decided what the best approach would be for managing the size/offset of the second mapping – i.e. whether it should be managed automatically by the kernel, based on the current logical size of the DA (as with standard doubly-mapped DAs), or whether there should be an explicit ‘set offset of second mapping’ call. I’d forget double-mapping as a separate entity, just provide a means to map the same physical address space multiple times and leave the driver to manage it. If a driver then needs a double map for hardware scrolling for example, it could just claim double the screen memory it requires and map the same physical space twice. This shifts the management of them out of the OS and gives total flexibility on how they’re mapped. The only thing RISCOS might need to track is the fact there’s multiple MVA’s that might need cleaning in the event of a L2 cache flush – although from your notes and what I discovered yesterday, provided the multiple maps are all in-sync flag wise, this may not be necessary on current hardware. Logical memory management Agree with all your points here, the only thing I’d add is that we need an easier means to get logical <> physical address translation. Having to remap the physical address to get the logical as is currently the case isn’t a long term solution. Things will get a bit complicated if the combined LineLength isn’t a multiple of 4K – this is where the end-of-line gap mentioned at the start of this post will come in handy. This will heavily rely on the hardware supporting borders to blank out the padding, and RISCOS / GraphicsV drivers / software to account for the padding. I’m certainly using it in ADFFS and I believe the Iyonix uses something similar where screen widths aren’t divisible 32 pixels, so agree it’s the correct way to go. Do the iMX6 / Titanium / Pandaboard etc all support this though? It probably needs technical input from Chris Evans, Andrew Rawnsley and a few others to confirm. The abort handler mechanism will need to be able to cope with other situations as well – primarily, drivers which are reliant on being told when screen memory has been updated. This covers ‘wrapper’ drivers which perform BPP conversion, software display rotation, etc. in order to extend the functionality of a real driver. It also covers some real drivers, e.g. the DisplayLink driver needs to encode any dirty pages and send them to the device over USB. I’ve been looking at this since we last mentioned it as a means to only blit the display when it’s actually updated. As pony as the RO4 implementation was (moving the screen DA to a different Domain and trapping Domain access violations) it does seem like the only viable option and not really a massive overhead, as once you know the screen has been written to, you alter the TLB for the whole DA until the next VSync. Perhaps we need a generic ‘memory watcher’ API, which allows multiple clients to register their interest in specific lists of pages. On the other hand it may make more sense to make it a video-specific thing, so that some hardware operations (e.g. rectangle copy/fill) can be coped with in a neater manner than all the watchers simply being told “some DMA has updated this page”. Almost certainly needs an API, so the pages are only raising Aborts if a driver specifically requires it. The client should register it’s interest in being notified about writes to it’s logical space and leave the API to handle the detection of writes and TLB changes required. Two options should be available here: Notify of 1st write. ie the driver only needs to know the screen has change, to cover full screen blitters that aren’t interested in per-page delta changes Notify of pages changed. ie SPI / USB display etc where you’d want to send delta changes only. Where it’s known that a write is about to take place that’s DMA based, the DMA initiator should notify the API of the rectangle it’s about to modify and leave the API to work out the page list that’s subsequently sent to a delta change based driver. This puts a slight overhead on the DMA initiator and may cover a few pages that haven’t changed but is possible a good compromise for speed. What about ADFFS/Aemulor? When it comes to screen memory, they’re mostly interested in BPP conversion, so can follow the same method as a BPP conversion wrapper driver … But they also want to emulate the Arc-era memory map, so will want to control the logical address of where the screen memory is mapped. To cope with this it’s probably best to just have a GraphicsV call to allow drivers to specify the base address of the DA that the kernel is about to create. To cope with multi-tasking / Wimp based tasks, the legacy DA2 needs to map in/out as the task switcher switches apps being covered by ADFFS/Aemulor. Something similar needs to happen to provide 26bit Module support without restricting the whole machine to a 32mb appspace limit, but on a per-Module basis. As things stand currently, only the DA2 side can be implemented (via MODE based service calls or Paravirtualizing Wimp_Poll / Wimp_PollIdle. I have some ideas on how to coerce RISCOS into working around the 32mb appspace limit, but ideally some fundamental changes to the way taskswitcher tracks per-app / per-module memory maps would be the preferred route. If a 26bit app is running and it’s switched in, the legacy DA2 map can be switched in as part of the appspace TLB changes in one hit and unmapped when it switches out. Likewise if a 26bit Module is entered, the relevant pages below the 32mb limit are mapped in/out by taskswitcher. My current idea here, which I’ve partially implemented, is that ADFFS creates a stub Module in the actual RMA which then maps in the relevant pages below 32mb temporarily whilst the Module is doing it’s thing. The stub Module acts as an entry/exit Hypervisor that initiates the required memory map changes. I guess what I’m getting at here is that Aemulor/ADFFS could handle 26bit address space as things stand without OS support, it all depends on how far we want to push legacy support in the OS. I personally don’t think the OS should be concerned in this regard, as the requirement is specific to ADFFS / Aemulor but if we’re considering changes to GraphicsV to handle DA2 then its worth at least considering. In the longer term, extending taskswitcher to allow VMM’s to operate efficiently could be a way around the multi-core / pre-emptive multitasking issue, allowing RISCOS to remain single tasking and leave the VMM to deal with the multi-core / pre-emptive implementation by running multiple copies of sandboxed RISCOS’ within a Type 2 Hypervisor. Way beyond the topic of this discussion though and already raised elsewhere as a separate thread. What about the screen dynamic area? Making DA2 the active screen DA does make sense, with multi-head displays sharing the DA2 memory map as you’ve described with padding either side for alignment etc. Ironically, in the past week I’ve been considering implementing a near identical implementation for ADFFS so it can handle screen geometry changes between successive frames. This would be implemented as a triple head display in the GPU buffer with only one being visible at any one time, but in theory its a triple-head display and you could have all three shown at once. Changing DA2 to only be the active display in a multi-head setup however could add major complications, with DA2 switching continually depending on which GraphicsV driver is being entered at the time. Huge potential for TLB overheads here. Having GraphicsV drivers sharing one DA2 in an interleaved fashion is probably the better route. In this scenario, ADFFS would simply touch a rectangle within DA2 when it blits. There is however the complication of dual/triple buffering, which we need to cover for gaming. ADFFS is currently dual buffering both the GPU and legacy DA2 and I’m testing triple buffering on the GPU for the next release. In a multi-head display setup, you could implement it as it currently is, with buffers one after the other. eg Head 1 buffer 0:Head 2 buffer 0:Head 1 buffer 1:Head 2 buffer 2. This would however break hardware scrolling, alternatively implementing as: Head 1 buffer 0:Head 1 buffer 1:Head 2 buffer 0:Head 2 buffer 1 would get around this, but possibly add complications elsewhere as there’s the potential for one head to switch from single to dual/triple buffering and cause the logical address of Head 2/3 to change, so where would need to be a means to notify drivers of a logical address change to their view of DA2.

Oct 28, 2015 2:37pm Jeffrey Lee (213) 6048 posts	I’d forget double-mapping as a separate entity, just provide a means to map the same physical address space multiple times and leave the driver to manage it. Supporting arbitrary multiple mapping of pages is harder than supporting double-mapping. And as you’ve discovered, bad things can happen if multiple mappings of pages aren’t handled correctly – which is why I’d want the kernel to be fully aware of any multiple mappings, instead of leaving it to the software which created the mapping to deal with all of the headaches. E.g. if we’re using aborts to track changes to pages, and that page is multiply-mapped, we’d typically want all mappings to be subject to abort trapping – something the kernel could easily manage but external software might not be aware of. The main data structure which the kernel uses to keep track of memory is the CAM. It’s a simple table which is indexed by physical page number (which means it’s limited to coping with RAM pages), and for each entry it stores the current logical address of the page and the page flags. if you look deep enough into the kernel you’ll see the CAM being referred to as the “soft CAM”, which makes me think it was originally a direct softcopy of the MEMC-format page tables (and as I’m sure you’re aware, MEMC required things to be specified in terms of physical → logical mappings, rather than the logical → physical page tables which are used now). Supporting doubly-mapped areas are “easy” because if the kernel can work out which DA a doubly-mapped page belongs to, it can easily work out the offset of the second mapping from the primary mapping (since it’s equal to the current size of the DA). But historically the CAM hasn’t stored the DA association, so that’s where things like OS_SetMemMapEntries get unstuck if you try modifying a doubly-mapped page. With the PMP changes, the CAM has been extended to allow it to track which PMP a RAM page belongs to. This was required to allow reclaiming of PMP pages to work correctly, but the code could also easily be extended to allow the DA association of regular pages to be stored (although that would increase the risk of breaking legacy software which likes to be able to remap pages at will). Other solutions to allow doubly-mapped areas to be tracked properly would be to (if the doubly-mapped area is a PMP) use the PMP association to work out the second mapping, and to add a check to OS_SetMemMapEntries (and any other relevant APIs) to make sure that non-PMP doubly-mapped areas aren’t interacted with in dangerous ways (e.g. OS_SetMemMapEntries does now contain a check to make sure that pages belonging to PMPs aren’t interacted with in dangerous ways – if the OS loses track of the PMP association of pages then bad things are likely to happen). Safely supporting arbitrary multiple mapping of pages would require the kernel to keep a list of all the logical addresses the page is currently mapped to. It’s not an impossible task, but it would add a fair bit of extra complexity, so it’s something I’d like to avoid (s.ChangeDyn is already seven and a half thousand lines of near-incomprehensible code!). So I’d prefer to avoid supporting arbitrary multiple mappings. But if we find out that that is the only sensible way to deal with certain situations then maybe I won’t have a choice! Logical memory management Agree with all your points here, the only thing I’d add is that we needs an easier means to get logical <> physical address translation. Having to remap the physical address to get the logical as is currently the case isn’t a long term solution. Yes. OS_Memory 0 is the standard way of doing logical to physical translation, but it currently has the limitation that it only works with 4K page sizes (so is useless for memory mapped by OS_Memory 13). OS_Memory 0 will definitely need expanding once PMPs can map IO memory, probably using the same scheme as ROL, PhysicalPageNumber = (PhysicalAddress>>12) + (1<<30). So it’s possible I’ll extend it to support non-4K page sizes at the same time. Physical to logical translation is a trickier matter, since the only generic way the OS would be able to do it would be by scanning the page tables, which would be a bit slow (OS_Memory 0 can do physical to logical translation, but only for RAM pages, where it can easily work out the physical (i.e. RAM) page number and then have a quick peek at the CAM). Things will get a bit complicated if the combined LineLength isn’t a multiple of 4K – this is where the end-of-line gap mentioned at the start of this post will come in handy. This will heavily rely on the hardware supporting borders to blank out the padding, and RISCOS / GraphicsV drivers / software to account for the padding. I’m certainly using it in ADFFS and I believe the Iyonix uses something similar where screen widths aren’t divisible 32 pixels, so agree it’s the correct way to go. Do the iMX6 / Titanium / Pandaboard etc all support this though? It probably needs technical input from Chris Evans, Andrew Rawnsley and a few others to confirm. This is fine for all TI chips (Titanium, Pandaboard, etc.), the Pi and Iyonix. iMX6 I’m not 100% sure about, but I’d be surprised if it didn’t support it The only machine I know of which will definitely doesn’t support gaps between the rows is IOMD, so it may be that some features are either a lot slower (more software emulation) or simply aren’t supported at all (e.g. anything requiring abort trapping may be tricky for chips which use the base-updated abort model or suffer from the abort restart bug). But on the other hand, we don’t have any USB drivers for iOMD, or any drivers for the video podules, so you’re highly unlikely to be in a situation where multiple spanned displays are needed. I have some ideas on how to coerce RISCOS into working around the 32mb appspace limit, but ideally some fundamental changes to the way taskswitcher tracks per-app / per-module memory maps would be the preferred route. If a 26bit app is running and it’s switched in, the legacy DA2 map can be switched in as part of the appspace TLB changes in one hit and unmapped when it switches out. Likewise if a 26bit Module is entered, the relevant pages below the 32mb limit are mapped in/out by taskswitcher. One of the things on my todo list is to work out how to make application memory more flexible, essentially granting PMP-level control of memory mapping to applications (so they can map and unmap pages anywhere they want within the entire 512MB application space window). This would allow you to create your own (fake) dynamic areas within application space (I think you’re doing this already?), but have the kernel manage mapping them in and out on task switches. However this won’t solve the problem where you’re wanting to (a) doubly-map memory, and (b) doubly-map memory which doesn’t even belong to you. Another option (perhaps in addition to the above) would be to implement the “area bound to client application” DA flag. Then you could legitimately create a doubly-mapped area within application space (by manually specifying the DA base), which would only leave the problem of wanting to doubly-map memory which doesn’t belong to you. Or (as I’ve mentioned many times before) you could give up on the idea of running old games directly and go down the full system emulation route like ArcEm ;-) In a multi-head display setup, you could implement it as it currently is, with buffers one after the other. eg Head 1 buffer 0:Head 2 buffer 0:Head 1 buffer 1:Head 2 buffer 2. This would however break hardware scrolling, alternatively implementing as: Head 1 buffer 0:Head 1 buffer 1:Head 2 buffer 0:Head 2 buffer 1 would get around this, but possibly add complications elsewhere as there’s the potential for one head to switch from single to dual/triple buffering and cause the logical address of Head 2/3 to change, so where would need to be a means to notify drivers of a logical address change to their view of DA2. Yeah, hardware scrolling is a bit of a bitch. There are some situations it would work (e.g. vertical scrolling would be fine if the displays are arranged horizontally, and all the same height), but for most multi-monitor situations it probably wouldn’t be possible. But, you’re unlikely to be running a game (especially an old one which wants to use hardware scrolling) that’s spanned across multiple monitors. When you start the game the OS would switch into single-monitor mode, allowing the DA to use a standard flat memory mapping, which would then allow for the same level of hardware scrolling that’s available now.

Oct 28, 2015 4:20pm Rick Murray (539) 13840 posts	Just a small question: would this proposal permit two entirely different things on each screen? I’m thinking of the primary display being the…primary display (!) and one of the others (such as composite video) being a lower resolution debug/trace display. Something along those lines, but would apply also to stuff where the control happens in one screen and the output in another; where each display is an entity to itself (and likely different resolution) and not simply a continuation of the primary. Well, small question less small explanation. ;-)

Oct 28, 2015 4:39pm Jeffrey Lee (213) 6048 posts	Just a small question: would this proposal permit two entirely different things on each screen? Yes.

Oct 28, 2015 7:14pm Jon Abbott (1421) 2651 posts	We could probably do with some input from Adrian around how Aemulor provides legacy DA2 support. If there’s no legal way to provide the double (it’s actually tertiary) map of DA2, then lets not worry about it. I can still directly modify L2PT on entry/exit as is currently happening, and if that eventually turns out to not work, I’ll figure out a workaround. Or (as I’ve mentioned many times before) you could give up on the idea of running old games directly and go down the full system emulation route like ArcEm ;-) Where’s the challenge in that! ADFFS is a stepping stone to a full type 1/2 Hypervisor, one of the key things with a Hypervisor is the code runs natively on the CPU and only falls back to Paravirtualization where absolutely necessary (eg CPU privilege level). The only bits that require emulation (or more correctly virtualizing) should be the IO and in our case, some CPU behaviour to match 26bit CPU’s. You just want me to speed up ArcEm … admit it ;-) Just a small question: would this proposal permit two entirely different things on each screen? Yes, what’s being proposed here is to extend GraphicsV and relevant areas of RISCOS to support multiple graphics cards and multiple monitors. We’re also trying to figure out how to provide legacy MODE support for 1/2/4/8 bpp MODE’s on GPU’s that only support 24/32bit bpp, in a manageable legal way that allows ADFFS/Aemulor to work and eventually allow RISCOS to provide bpp upscaling natively.

Jul 6, 2017 1:19pm Jeffrey Lee (213) 6048 posts	One of the things that’s been on the todo list for a while is implementing support for ROL’s OS_ScreenMode reason codes 7-10. On the surface they look trivial, but OS_ScreenMode 7 has always confused me a bit, preventing me from proceeding. Reading through the memory management docs today, I think it’s finally clicked into place: Drivers have full control over their memory allocation (in both logical & physical space) Banks are no longer contiguous in logical or physical space Drivers can unmap banks from the logical address space if the bank isn’t active. Since you can only have two active banks – VDU and display – this means the driver only needs logical address space large enough for two banks to be mapped at once. This also suggests that if you deactivate and then reactivate a bank, it may be assigned a new logical address Memory for a screen bank will be allocated the first time the bank is used OS_ScreenMode 7 effectively returns the number of screen bank that have been created so far, plus the maximum number of additional screen banks that could be created given the driver’s current free memory level Since DA 2 is no longer in use, there’s no need to use OS_ChangeDynamicArea to manually reserve the required amount of memory for the number of screen banks you want (AIUI on IOMD machines the video driver is in full control over how VRAM is allocated; possibly it no longer shares it with the rest of the system) There’s no way of guaranteeing that you’ll be able to create all the screen banks you want, either before switching into a mode (the OS_ScreenMode 7 docs say as much), or even after you’ve switched into the mode (“The number of banks may change with system use depending on the driver. Drivers may allocate space in order to cache private data or for other purposes which may affect the number of banks available.” – which suggests to me that even after OS_ScreenMode 7 has given you a number, the driver might suddenly decide to use some of the free memory for itself, preventing you from creating all the banks you wanted). Hardware screen scrolling is only supported to the minimum extent that is required by the OS (VIDC-style vertical scrolling within the confines of one screen bank); since the banks are no longer required to be contiguous in logical or physical space there’s no way you can scroll a framebuffer which is taller than the screen. Presumably hardware scrolling still results in the screen logical address changing (with the scrolled bank doubly-mapped) A lot of the properties of ROL’s approach sound good for us (giving drivers more control over memory management, simplifying screen scrolling support). But there are some bits that feel a bit nasty (lack of strong guarantees that you’ll be able to get all the screen banks you want/need), and I suspect removing support for scrolling tall framebuffers will cause problems with games. To start with, I think we can implement OS_ScreenMode 7-10 without any significant changes to our memory management: For drivers which use DA 2: OS_ScreenMode 7 will return the number of available screen banks as the maximum size of DA 2 divided by the screen size. (Potentially also adding a check to make sure there’s enough memory in the free pool to allow DA 2 to grow to max – so low-memory systems won’t give obviously incorrect answers) OS_ScreenMode 8/9 (or OS_Byte 112/113) will automatically try to grow DA 2 to the required size For drivers which use external framestores, OS_ScreenMode 7 can just use the framestore size divided by the screen size, and OS_ScreenMode 8/9 shouldn’t need to allocate any memory This should allow software to use multiple screen banks without having to worry about resizing DA 2 manually, as per RISC OS Select (and it should also help them avoid the pitfall of resizing DA 2 on systems that don’t use it). Once the basics are implemented we can then have another look at memory management overall and see if we can come up with a design which gives drivers more control over things without breaking any important use-cases. Perhaps we should take a leaf from the Pi’s book, and have separate “virtual” and “physical” framebuffer sizes? So software can explicitly request a framebuffer which is larger than the screen, and then scroll the display freely (ish) within that area, potentially with VIDC-style wrap around (if the hardware supports it). Single-monitor setups/modes could use this approach for handling multiple screen banks, while more complex scenarios (multi-monitor modes, hardware display rotation, etc.) can use the more restrictive ROL-style approach where each screen bank is a separate block of memory.

Apr 17, 2018 12:58pm Jeffrey Lee (213) 6048 posts	Some updates: I’m hoping that it won’t be an insurmountable task to teach the OS about screen modes where there’s a gap at the end of each line This has been dealt with via GraphicsV 19 and the pre-existing ExtraBytes control list item (although BCMVideo is the only driver that implements GV 19 at the moment). So that we don’t lose any existing functionality, PMPs will be extended to add support for doubly-mapped memory. However I haven’t quite decided what the best approach would be for managing the size/offset of the second mapping The thought I’ve had today, is that for each (doubly-mapped) PMP the kernel could maintain a list which specifies the address offset(s) of the additional mapping(s). The list would store the details for ranges of addresses/pages, rather than storing the data on a per-page basis – avoiding excess memory overheads while keeping lookups reasonably fast. New OS_DynamicArea reason codes would be used to create/destroy/modify the multiply-mapped areas within the DA/PMP.