category: Specification

<div id="toc_heading"></div><div id="toc"></div>

h2. Goals

Traditionally, 32 bit ARM CPUs have had a 32 bit address bus. ARMv6 was the first architecture version to break this limit, by adding support for supersections - page table entries that can support a physical address up to 40 bits wide, but with the restriction that the mapping granularity is 16MB. This was fine for IO devices, but awkward for RAM, where the OS will typically want the freedom to use 4KB pages.

To improve on this ARM devised a new page table format (the so-called "Long descriptor" format), and released it as part of the ARMv7 Large Physical Address Extensions. Amongst other changes, this new format allows all page sizes to be used across the whole address space. For AArch32, the supported page sizes are 4KB, 2MB and 1GB, with a maximum physical address space of 40 bits. For AArch64 the physical address space can reach a size of 52 bits.

The Raspberry Pi 4B can support up to 8GB of RAM, with half of that RAM being located above the 32 bit physical address space barrier supported by the older "Short descriptor" page table format that RISC OS 5 has previously been limited to. Similarly, the 4GB version of the IGEPv5 also places half of its RAM above the 32 bit barrier. On current versions of RISC OS 5 this means that for both machines, only half of the fitted RAM can be used.
The goals of the changes are therefore:

* To add support for the Long Descriptor page table format, to the degree where the OS can use it in place of the Short Descriptor format, without most existing programs needing to know or care about the difference
* To extend APIs which deal with physical addresses, to add support for 40 bit (or wider) physical addresses, specifically to allow device drivers to perform DMA to or from addresses wider than 32 bits
* To extend APIs which deal with pools of RAM, to add support for RAM amounts of more than 2GB
* To remove other limits within the OS which prevent more than 4GB of RAM from being used by the OS
* To extend the BCM2835 & OMAP5 HALs to allow for the full 8GB & 4GB of RAM to be reported to the OS for the Raspberry Pi 4B & IGEPv5, respectively
* To extend any relevant hardware drivers to ensure they still offer DMA to/from RAM above the 4GB physical address barrier, where possible
* To extend the softload tool to operate correctly on systems where the long descriptor format is being used

h2. Existing documentation

h3. Relevant specifications

* "DDI0406C.d":https://developer.arm.com/documentation/ddi0406/cd/?lang=en - ARMv7-A & -R architecture reference manual

h3. Relevant forum threads

* "Long descriptor page table support":/forum/forums/3/topics/14885

h2. Terminology

*Low RAM* - RAM which can be accessed using a 32 bit physical address

*High RAM* - RAM which cannot be accessed using a 32 bit physical address (because it needs a 33 bit or larger address)

*Soft CAM* - Originally this was a soft copy of the MEMC page tables (which were stored in "Content Addressable Memory" that was write-only from the perspective of the CPU). On ARMv3+ CPUs a completely different page table scheme is used by the integrated MMUs, but the "Soft CAM" is still an important bookkeeping data structure to help the OS to keep track of where RAM is mapped, who owns it, and its access permissions.

h2. Detail

h3. Initial Long Descriptor page table support

This historic batch of work comprised the following notable changes:

"Preparation for long descriptor page table support":https://gitlab.riscosopen.org/RiscOS/Sources/Kernel/-/merge_requests/50

* Moving and refactoring various bits of kernel code, to make it easier to add long descriptor support, and to make those future changes easier to see/understand/review
* From an OS/software point of view the kernel should look and function exactly the same as before

"Initial long descriptor page table support":https://gitlab.riscosopen.org/RiscOS/Sources/Kernel/-/merge_requests/51

* Implement support for the long descriptor page table format within the kernel, but restricted to a 32 bit physical address space
* Add build-time support for selecting whether the kernel will use the long or short descriptor format
* Adjust the memory map to remove the 16MB size limit on the soft CAM; the 16MB limit would have restricted the OS to 4GB of RAM. With the limit removed, the OS can easily support much more RAM (although the maximum of 1TB of RAM possible under the long descriptor format would result in the soft CAM consuming the full 4GB logical address space, so at some point we may need to change the CAM to be dynamically mapped to avoid it consuming excessive amounts of logical space on systems with 32+GB of RAM)
* Extend the [[RISCOS_AddRAM]] entry point used by the HAL, to allow HALs to register any high RAM which may be present. With this version of the kernel this extra memory will be ignored, but it allows the initial BCM2835 & OMAP5 HAL changes to be merged.
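
The soft CAM figures quoted above can be sanity-checked with some simple arithmetic. The sketch below assumes one fixed-size CAM entry per 4KB page; the 16 byte entry size is not stated in this document, but is the value implied by a 16MB CAM covering 4GB of RAM. The function and constant names are illustrative only.

```python
KB, MB, GB, TB = 1 << 10, 1 << 20, 1 << 30, 1 << 40
PAGE_SIZE = 4 * KB  # 4KB pages

# Assumption: each page has one fixed-size soft CAM entry. The size is
# derived from the statement that a 16MB CAM limits the OS to 4GB of RAM.
CAM_ENTRY_SIZE = (16 * MB) // ((4 * GB) // PAGE_SIZE)


def soft_cam_size(ram_bytes: int) -> int:
    """Return the logical address space (in bytes) needed by the soft CAM."""
    return (ram_bytes // PAGE_SIZE) * CAM_ENTRY_SIZE


for ram in (4 * GB, 8 * GB, 32 * GB, 1 * TB):
    print(f"{ram // GB}GB RAM -> {soft_cam_size(ram) // MB}MB soft CAM")
```

Under that assumption, 1TB of RAM needs a soft CAM as large as the whole 4GB logical address space, which matches the concern raised in the list above.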
"8GB RAM support":https://gitlab.riscosopen.org/RiscOS/Sources/HAL/HAL_BCM2835/-/merge_requests/17 for BCM2835 & "4GB RAM support":https://gitlab.riscosopen.org/RiscOS/Sources/HAL/HAL_OMAP5/-/merge_requests/4 for OMAP5

* Extending the HALs to report the extra memory to the OS
* Changing the BCM2835 HAL to rely on the OS to perform the RAM clear, since the HAL's software-based RAM clear will only function on low RAM (and it's significantly slower than letting the kernel do a software RAM clear)
* Changing the OMAP5 HAL to report the full 34 bit memory map from HAL_PhysInfo (the BCM2835 HAL already reported the full memory map for the Pi 4)

Also note that current kernel builds do not enable the long descriptor page table support.

h3. Support for RAM banks with high physical addresses

"This kernel merge request":https://gitlab.riscosopen.org/RiscOS/Sources/Kernel/-/merge_requests/53 comprised several major changes to the kernel and memory-related APIs:

* The PhysRamTable data structure that the kernel uses to keep track of memory was changed to store the address of each RAM bank in terms of (4KB) pages instead of bytes, effectively allowing it to support a 44 bit physical address space.
* [[OS_Memory 12]] was extended to allow R4-R7 to be used to specify a (64 bit) physical address range which the recommended pages must lie within. This is to help ensure hardware drivers which need to allocate memory are given memory which is within an address range which the hardware can actually access. For backwards compatibility this defaults to 0-4GB.
* Add [[OS_PlatformFeatures 0]] bit 21 (aka @CPUFlag_HighRAM@), which will be set if any high RAM is present, and thus indicates whether software should prefer to use APIs which support 64 bit physical addresses over the older APIs that are limited to 32 bit addresses. The older APIs may fail with an error or have sub-optimal support when used with high RAM locations.
* Add [[OS_Memory 64]], an extended form of [[OS_Memory 0]] which uses 64 bit addresses instead of 32 bit. Using 64 bit physical addresses allows conversions to/from physical addresses to be performed on pages with large physical addresses. Using 64 bit logical addresses provides us some future-proofing for an AArch64 version of RISC OS, with a 64 bit logical memory map.
* Extend [[OS_Memory 19]] for 64 bit physical addresses (and fix it to understand non-DMAable memory)
* Add [[Service_PagesUnsafe64]] and [[Service_PagesSafe64]], versions of [[Service_PagesUnsafe]] and [[Service_PagesSafe]] which use 64 bit address fields. See the [[Service_PagesUnsafe]] documentation for details of how the different service calls are issued.
* The page replacement logic (which is responsible for issuing the PagesUnsafe/PagesSafe set of service calls) will prevent a low RAM page from being replaced with a high RAM page. This is necessary to ensure that users of the old 32 bit APIs see the page replacement take place. However it does mean that programs will be unable to claim pages of low RAM which are in use if there are not enough free low RAM pages in the free pool.
* More refactoring of the long & short descriptor page table code to allow for runtime selection of whether the long or short format should be used. This is necessary to avoid needing to introduce different ROM variants for different Raspberry Pi models, since the long descriptor format isn't supported on earlier models (e.g. ARMv6 Pi 1)
* [[RISCOS_LogToPhys]] entry point used by the HAL improved to add support for all types of page table entry
* [[OS_Memory 65]] introduced, which exposes [[RISCOS_LogToPhys]] via a SWI, to allow logical to physical address conversions performed by software to work with all page table entry types (unlike previous APIs, which only worked with 4KB RAM pages)

The above API changes apply to all kernel builds, regardless of whether long descriptor page tables are supported/enabled or not. Also note that current kernel builds do not enable the long descriptor page table support.

This set of changes (with the correct kernel build options) allows the full amount of RAM on the 8GB Pi 4 and 4GB IGEPv5 to be used by the OS.

h3. SATA driver improvements

"SATADriver was changed":https://gitlab.riscosopen.org/RiscOS/Sources/HWSupport/ATA/SATADriver/-/merge_requests/1 to use OS_Memory 19 if (a) high RAM is being used by the OS and (b) the SATA controller supports 64 bit physical addresses, to ensure that transfers aren't needlessly performed via bounce buffers. However in practice this doesn't yield any improvements for current devices, because the bus/interconnect architecture on the OMAP5/IGEPv5 means that the high RAM is only accessible to the CPU.

h3. Dealing with the lack of "user read-only, privileged read-write" access permission

The short descriptor page table format supported a "user read-only, privileged read-write" access permission (aka [[Memory Map Page Access|AP 1]]) which RISC OS has made heavy use of. The long descriptor page table format lacks support for this access mode, and the temporary workaround employed by earlier changes of widening the permissions to "user read-write, privileged read-write" is clearly a step backwards in terms of security.
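
The gap can be seen by listing the stage 1 access permission encodings of the two formats, as given in the ARMv7-A architecture reference manual. This is only an illustrative sketch: the table and function names are made up for this example and do not correspond to anything in the kernel sources.

```python
# Each value is (privileged access, user access), where "-" means no access,
# "ro" means read-only and "rw" means read-write.

# Short descriptor format, AP[2:0] (encoding 0b100 is reserved, so omitted)
SHORT_AP = {
    0b000: ("-", "-"),
    0b001: ("rw", "-"),
    0b010: ("rw", "ro"),  # user read-only, privileged read-write (AP 1)
    0b011: ("rw", "rw"),
    0b101: ("ro", "-"),
    0b110: ("ro", "ro"),  # deprecated encoding
    0b111: ("ro", "ro"),
}

# Long descriptor format, AP[2:1]
LONG_AP = {
    0b00: ("rw", "-"),
    0b01: ("rw", "rw"),
    0b10: ("ro", "-"),
    0b11: ("ro", "ro"),
}


def missing_in_long():
    """Permissions expressible by short format AP bits but not long format."""
    return sorted(set(SHORT_AP.values()) - set(LONG_AP.values()))


print(missing_in_long())
```

The "no access" combination is also reported as missing from the long format AP encodings, but that is a separate matter from the AP 1 problem discussed here.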
A review of memory areas revealed that the ROM makes use of this permission in the following areas:

* FileCore's buffer/map DAs - revised to remove usermode access in "FileCore 3.77":https://gitlab.riscosopen.org/RiscOS/Sources/FileSys/FileCore/-/merge_requests/3
* The supervisor stack - speculatively revised to remove usermode access in "this kernel change":https://gitlab.riscosopen.org/RiscOS/Sources/Kernel/-/merge_requests/46, under the assumption that almost nothing would have been relying on this level of access (since the SVC stack is typically empty when the CPU is in user mode). "DragAnObj needed a fix":https://gitlab.riscosopen.org/RiscOS/Sources/Desktop/DragAnObj/-/merge_requests/1, but otherwise the change appears to have been a success.
* Kernel workspace, kernel buffers, zero page, and HAL workspace - These are harder to fix because they contain many values which usermode software expects to be able to access.

To deal with the kernel/HAL areas, and the wider problem of dynamic areas created by third-party software, the decision was taken to have the kernel map the areas as "user none, privileged read/write" and to employ an abort handler to emulate any usermode read operations. A natural extension of this was to provide "a full implementation of the OS_AbortTrap mechanism":https://gitlab.riscosopen.org/RiscOS/Sources/Kernel/-/merge_requests/55 that first appeared in RISC OS Select:

* An implementation of [[OS_AbortTrap]], including support for all ARM, FPA, and VFP/NEON load/store instructions
* A Select-compatible implementation of [[Abortable Dynamic Areas]]
* [[OS_Memory 24]] updated to flag abortable DAs as being abortable (as per RISC OS Select)
* OS_AbortTrap API extensions to allow the OS to request that the handler should map in the memory with (at minimum) the requested set of permissions. This is necessary to ensure correct operation of some instructions (e.g. LDREX/STREX), and also means that code can be executed from AbortTrap areas (the prefetch abort handler will request that the page be mapped in with suitable execute permissions)
* A fix to [[OS_ReadSysInfo 7]] so that it tracks both prefetch and data aborts (previously only data aborts were being tracked)
* Fixes to lazy task swapping to ensure it works correctly with instructions which cross page boundaries (Thumb-2, Jazelle)
* [[OS_PlatformFeatures 34]] extended to report the presence & writability of a few useful abort processing CP15 registers (DFAR, DFSR, IFAR, IFSR, AIFSR, ADFSR)
* Build system & kernel memory management changes to allow C code to easily be used in the kernel (including a new memory area, 17, from [[OS_Memory 16]])

Once AbortTrap was implemented it was fairly easy to have the kernel emulate usermode access to AP 1 areas. Apart from emulating usermode reads when the long descriptor page table format is in use, this also removes the usermode execute permission, so it's possible that this may break some third-party software.

h3. ARMEABISupport fixes

The ARMEABISupport module used by GCC currently uses [[OS_Memory 0]] to attempt to generate a unique identifier for the running task, based on the physical address of the first page of application space. This is unsuitable for a couple of reasons:

* It will break if page replacement ([[Service_PagesUnsafe]] etc.) causes the physical pages associated with the task to be replaced with other pages
* It will break if the first page is located in high RAM ([[OS_Memory 0]] will generate an error due to the large physical address)

A modified version of the module was developed which instead uses the Wimp task handle to identify the task, with the help of Wimp pre-poll and post-poll filters to ensure the code always knows what the active task is (the module contains an abort handler that needs to know the active task). However pre-poll and post-poll filters were found to be insufficient for this task. "Some other solutions for identifying the active task were suggested on the forums":/forum/forums/3/topics/16272, but as yet no decision has been made on which approach to take.

h3. Kernel page allocation flaws

The decision to restrict the page replacement strategy (used by [[Service_PagesUnsafe]] etc.) so that low RAM pages can only ever be replaced by other low RAM pages can result in OS functionality being seriously impaired if all the low RAM pages have already been allocated to DAs, because other programs which need to use those pages are prevented from doing so.

The page allocation strategy can also have a negative impact on the performance of device drivers which are only capable of performing DMA to/from low RAM. E.g. on IGEPv5, "creating a 2GB RAM disc on startup within Predesk will severely impair SATADriver's performance":/forum/forums/3/topics/14885?page=2#posts-117529.

Solutions to these problems need to be investigated (for SATADriver, simply increasing the bounce buffer size may be sufficient to adequately reduce the performance impact).

h3. DMAManager extended to support 64 bit physical addresses

No current system requires this, but it's a useful future-proofing step. The DMA controllers on the Pi 4 are able to use high RAM addresses, so would be an obvious choice to use when testing the changes.

h3. Softload tool

The softload tool needs significant changes in order to allow it to work on systems with long descriptor page tables, and to fix a number of historic flaws which somehow haven't caused any problems with existing platforms where the tool is supported:

* On modern OS versions, PhysRamTable is only used for debug output, so there's no need to update the code which reads it to cope with the new format (addresses specified in 4KB units instead of byte units)
* The kernel will need updating to expose the logical addresses of the long descriptor L1PT (and possibly L2PT & L3PT), e.g. via [[OS_ReadSysInfo 6]]

Flaws in the current code (startnew.s):

* Soft unloading assumes the ROM physical address is zero
* Soft loading assumes the physical address of the first page of application space (rounded down to MB boundary) doesn't match the logical address of any of the following (rounded down to MB boundary):
** First MB of appspace
** SVC stack
** ARMA cleaner space
** MMU_Changing ARMop

It should be possible to fix these problems by changing the code to take the following approach:

# Make a note of which logical regions are needed during the "MMU off" sequence, and search for a physical page whose address won't alias those (using OS_Memory 0/64 to walk through all the pages might be good enough)
# Use a NeedsSpecificPages dynamic area to claim & map in the page
# Copy a short section of code to the claimed page
# (do any other prep work that's required)
# Enter SVC mode with IRQs+FIQs disabled
# Issue Service_PreReset
# Flush and disable the caches, reset vectors to zero, etc (as per current code)
# Overwrite the appropriate page table entry to create a flat mapping of the page that was claimed earlier (so that its logical address is the same as its physical address)
# Drain write buffer & flush TLB
# Jump to the code that was placed in the page
# Disable the MMU
# Jump to the new ROM

Performing the steps in the above order should minimise the number of memory regions which could conflict with the flat-mapped page, making it easier to find a page that will work (and decreasing the chances of us missing a dependency and then having the code break due to an address clash).

When disabling the caches it may also be necessary to disable cacheable pagetables, however the kernel routine that's responsible for this (MakePageTablesNonCacheable) doesn't seem to be reachable any more.
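
Step 1 of the proposed approach could be sketched along the following lines. This is not code from the softload tool: the function name, the region list and the example addresses are all hypothetical, and a real implementation would obtain the candidate physical pages from OS_Memory 0/64 rather than from a Python list. The check is made at 1MB granularity by default, matching the rounding used when describing the flaws in the current code.

```python
MB = 1 << 20


def find_flat_map_page(candidate_pages, needed_regions, granularity=MB):
    """Return the first physical page address that is safe to flat-map.

    candidate_pages: iterable of physical page addresses (in bytes)
    needed_regions:  list of (logical_start, length) pairs which must stay
                     mapped during the "MMU off" sequence
    granularity:     size of the mapping that will be overwritten
    """
    # Collect every granule touched by a region that must not be disturbed
    blocked = set()
    for start, length in needed_regions:
        first = start // granularity
        last = (start + length - 1) // granularity
        blocked.update(range(first, last + 1))

    for phys in candidate_pages:
        if phys >= 1 << 32:
            continue  # logical == physical is impossible above 4GB
        if phys // granularity not in blocked:
            return phys
    return None


# Hypothetical example values, for illustration only
regions = [(0x00008000, MB), (0xFA200000, 0x8000)]
pages = [0x00100000, 0xFA204000, 0x100000000, 0x3A000000]
print(hex(find_flat_map_page(pages, regions)))
```

Note that high RAM pages are skipped outright, since a page can only be flat-mapped if its physical address also fits within the 32 bit logical address space.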
It's also worth considering making the kernel responsible for performing the last few steps itself, as this will provide the best protection against future changes or device/implementation-specific quirks.

h3. Memory amount clamps

Around the time [[Physical Memory Pool|Physical Memory Pools]] were implemented, a number of memory-related APIs were updated to clamp the max amount of memory they'd report to 2GB-4KB, to try and avoid any 32 bit signed number overflow bugs. Now that we have machines with 4GB or more of RAM (notably 4GB & 8GB Raspberry Pi 4B's, which will have 4GB of RAM available in short descriptor versions of RISC OS), it's been discovered that the 2GB-4KB limit is insufficient to prevent some software from failing. Some of this software has already been fixed, but if bugs in other software persist then it may be worth investigating reducing the limits further to try and avoid further problems (e.g. to match how much memory would typically be available on a 2GB system with no boot sequence).

h2. Implementation progress

table(bordered).
|_<. Phase |_=. Status |_=. Completion |_<. Latest updates |
|<. Conceptual design |=. In progress |=. 70% |<. 26-Dec-2021 Document created<br>16-Jan-2023 Document updated with mostly-complete historic information |
|<. Mock ups/visualisation |=. - |=. - |<. - |
|<. Prototype coding |=. - |=. - |<. - |
|<. Final implementation |=. In progress |=. 70% |<. 16-Jan-2023 Most of the required changes were implemented & merged many months ago |
|<. Testing/integration |=. In progress |=. 30% |<. 16-Jan-2023 API & short descriptor code changes have been in the wild for many months. Long descriptor & high RAM support has seen minimal end-user testing due to not being enabled in nightly ROMs |

h2. Document history

v1.00 - 26-Dec-2021

* Outline added

v1.01 - 16-Jan-2023

* Document fleshed out with lots of historic details relating to what's been implemented so far, and rough details of things that haven't been done yet

v1.02 - 17-Jan-2023

* Some future/unfinished work (ARMEABISupport fixes, page allocation flaws, softload tool) elaborated on