Physical memory allocation
Jeffrey Lee (213) 6048 posts |
I’m slowly coming up with a plan of attack on how to upgrade the kernel/OS to allow for memory to be claimed but not mapped in to the logical address space. I figured I might as well share my thoughts here, as it’s relevant to some recent discussions, and some other people might have some thoughts on how best to implement things. Basically, the main goal would be to allow memory to be claimed (as per a dynamic area), but unlike a dynamic area the memory won’t be (immediately) mapped in to the logical address space. Instead the owner of the memory will be able to control when and where it gets mapped in (similar to a sparse dynamic area), or whether it ever gets mapped in at all. Adding support for this kind of memory allocation to the OS would allow for the following kinds of things to be implemented:
Taking the above into account, the system would need to be able to cope with the following:
In terms of implementation, it’s probably easiest to consider a physical memory pool as being a special type of dynamic area. So when creating the dynamic area you’d specify both the maximum logical size (which could be zero, if the memory is for non-CPU use) and the maximum physical size. Then you’d use calls similar to the sparse dynamic area map in/out calls to manipulate the logical mapping, passing in a page list of which pages you want mapped in (instead of letting the kernel pick for you). For manipulating the underlying physical pool you’d probably need two extra calls – one for claiming pages and one for releasing pages. Perhaps more if we allow pages to be locked/unlocked on demand.

And I’d expect that the physical pool would have to be sparse as well – pages within the physical pool will be identified by their index within that pool, rather than by the actual physical page number or address. That way it should simplify the code that’s necessary to deal with page reclaiming (for something like the filecore cache, if it uses pool-local page indices to specify which pages to map in/out then it means that it won’t care at all if the kernel has to reallocate some of the physical pages behind the scenes because another dynamic area has requested a specific page which it was using).

Another big implementation hurdle would be allowing the kernel to work out the owner of a physical page. This would be necessary to allow it to efficiently support the above situation of a dynamic area (or physical pool) requesting a physical page which is in use by another physical pool. Detecting whether the page is locked is trivial (that’s just a flag in the CAM), but once the kernel has worked out that it’s unlocked it needs to be able to locate the physical pool which owns the page so that it can update its page list. I’m thinking the cleanest way of supporting this would be to expand the CAM from two words per page to three or four words per page; you could just about get by with two words per page (overloading the logical address field to store the physical memory pool info when the page isn’t mapped in), but adding an extra word or two would make things much cleaner when dealing with reclaiming pages which are already mapped in.

Some of you may have already realised that we already have a system in place for claiming physical memory but not mapping it in – the AMBControl system that’s used for managing wimpslots for wimp tasks. Although this system works (it’s able to correctly cope with the kernel reclaiming pages for other purposes), the implementation isn’t entirely optimal – it listens out for Service_PagesSafe and then manually searches through the AMB control blocks to work out if any of the pages it manages have been remapped. So it’s not the best code to base any new implementation on top of, but it’s something that could be improved if it was migrated to the new system.
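As a rough illustration of the idea (not the actual kernel data structure – the field names and layout here are purely hypothetical), an extended CAM entry might look something like this:

#include <stdint.h>

/* Hypothetical extended CAM entry: the current CAM holds two words per RAM
   page (logical address + page flags); the extra words would let the kernel
   find the owning physical memory pool even when the page is mapped out. */
typedef struct {
    uint32_t logical_addr;  /* where the page is currently mapped, if anywhere */
    uint32_t page_flags;    /* access/cacheability flags, lock bit, etc. */
    uint32_t owner_pool;    /* handle of the owning physical memory pool, or 0 */
    uint32_t pool_index;    /* the page's index within that pool's page list */
} cam_entry_ext;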
David Feugey (2125) 2709 posts |
Very good plan :)
Should the whole block of RAM be marked as used, or only the parts of the block that are actually allocated and used (and so locked)? Not sure I’m really clear here :)

If you give us the possibility to reserve blocks of memory without allocating them at first, should it be possible to use memory outside of the physical space? Exceptions could then be raised when it’s used. Virtual memory, on demand.

To keep it simple, it’s probably better to have several memory types: non_protected/in_memory (direct access / not locked) – protected/in_memory (direct access / locked) – on_disk/uncompressed (access via VM exception) – on_disk/compressed (the same), and why not some in_memory/compressed for FS cache.

Perhaps the compression idea could be a filter applied before a VM access (compression, transformation of data [big<>little endian / color conversion], etc.). |
Sprow (202) 1153 posts |
One extra thing to muse on if you’re thinking about paged memory schemes: some of the more recent ARMs have the Large Physical Address Extension (LPAE), which allows a 40-bit physical address space (1TB of RAM on a 32-bit architecture) for those that support it. |
Jeffrey Lee (213) 6048 posts |
“If a (unlocked) physical page should get claimed by someone else, whether the contents of that page should be preserved”

Everything will be handled on a per-page basis. So even if you’re only using one byte within a page the kernel will consider the full page to be in use. That might not be the most optimal way of doing things, but it will keep the implementation nice and simple (both for the kernel and for the programs).
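As a trivial illustration of the per-page accounting (assuming the usual 4KB page size):

#define PAGE_SIZE 4096u

/* Even a one-byte claim costs a whole page; 4097 bytes cost two. */
static unsigned pages_needed(unsigned bytes)
{
    return (bytes + PAGE_SIZE - 1) / PAGE_SIZE;   /* ceil(bytes / PAGE_SIZE) */
}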
I think there are three things we’d need to get virtual memory working:
“I’m slowly coming up with a plan of attack on how to upgrade the kernel/OS to allow for memory to be claimed but not mapped in to the logical address space.”

Yes, I’ve already mused on that to a certain extent. There’s an IGEPv5 model with 4GB of RAM, so the hardware to test it on is within our reach. But for now I think I’ll be happy with just these changes – updating everything which deals with physical addresses to treat them as 40 bit (effectively 64 bit) would be a lot of work! |
William Harden (2174) 244 posts |
Presumably this plays well into the multi-core discussions too? I presume from above that the GPU isn’t really a ‘special case’ – it’s ‘other PUs’ that can share physical address space, of which a GPU is one such example. If the SCU has control over the physical pages (and presumably that could be part of the changes even at this stage), it would be possible to implement the ‘two RISC OSes, one on each core’ starting point previously discussed. The CPU0 RISC OS and CPU1 RISC OS use different physical pages, with the logical paging being allocated by the SCU. Clearly that doesn’t allow the two CPUs to cooperate in any way at this stage – but it would at least set the scene to then build cooperative memory structures, even if the two ARM cores use different logical memory maps for their work? |
Jeffrey Lee (213) 6048 posts |
Not really… in a multi-core world I’d assume that we’d be going full SMP and so would be able to freely share memory between cores/processes using the standard RISC OS APIs. I.e. all cores see the same dynamic areas mapped, the only difference in the memory maps would be which wimpslots are mapped in. Maybe in future we’ll support something more advanced (e.g. the “dynamic area bound to task” flag), but for first-pass SMP keeping it simple would be the best approach.
The SCU doesn’t do anything with regards to mapping or allocating memory. Its main purpose is to maintain cache coherency between the different cores. http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.ddi0407i/CHDFJICC.html For giving each core its own unique memory map all you need to do is set different page table base addresses and you’re good to go.
That can “easily” be implemented with the current kernel. All you’d need to do is tweak the HAL to report different RAM chunks as being available to different cores. E.g. the lower 50% to CPU0 and the upper 50% to CPU1. Maybe keep 1MB or so reserved which can then be mapped in to both for inter-core communication (mapping the memory in via OS_Memory 13, bypassing the standard RAM allocation mechanisms). |
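As a very rough sketch of that shared-buffer idea – the OS_Memory 13 register usage shown here (R1 = physical base, R2 = size in bytes, logical address returned in R3) is my reading of the documentation and should be double-checked, and the physical address is made up:

#include "kernel.h"
#include "swis.h"

#define SHARED_PHYS 0x9FF00000u  /* hypothetical 1MB chunk left out of both cores' RAM lists */
#define SHARED_SIZE 0x00100000u

/* Map the reserved physical chunk into this core's logical address space. */
static void *map_shared_buffer(void)
{
    void *log = NULL;
    if (_swix(OS_Memory, _INR(0,2)|_OUT(3),
              13, SHARED_PHYS, SHARED_SIZE, &log) != NULL)
        return NULL;  /* error returned, mapping failed */
    return log;
}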
David Feugey (2125) 2709 posts |
Sounds good to me.
For me it would be perfect. IMHO, AMP is much better than SMP for RISC OS. It doesn’t break anything, and it can be done now. And it’s so flexible that you can even choose… not to use it. A good point from an embedded point of view. The big problem would be to get a subset of the normal system (without the Wimp) for the second core. Perhaps a sort of ready-to-run ROM image to load, with a very simple HAL (nothing except timers, some interrupts and memory access). I’ve already said it before, but I will pay for such a bounty. Probably 300 pounds. And probably more if associated with a WIMP2 remake (on a different basis, to avoid GPL issues), i.e. some time limit on tasks. |
Jeffrey Lee (213) 6048 posts |
In terms of OS code, extending the CAM doesn’t look like it’ll be too painful. There’ll be some fallout in terms of OS_ReadMemMapEntries and friends (they directly get/set CAM entries), and the softload tool will need updating, but that’s about it for anything outside of the kernel. Of course there’s also Aemulor and Geminus to worry about, so if they access the CAM directly then maybe the better approach would be to just add a second table somewhere for the page → physical pool lookup. The end result should be pretty much the same, so it’s not a big deal if extending the CAM isn’t feasible. In fact I might just go with that approach just to avoid the hassle + risk of having to update all the CAM/memory code instead of just some of it. |
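Purely for illustration, the “second table” alternative could be as simple as one extra record per RAM page, indexed by physical page number and sitting alongside the existing two-word CAM (names invented):

#include <stdint.h>

typedef struct pmp pmp;       /* opaque physical memory pool record */

typedef struct {
    pmp     *owner;           /* owning pool, or NULL for non-PMP pages */
    uint32_t pool_index;      /* the page's index within the owner's page list */
} pmp_lookup_entry;

static pmp_lookup_entry *pmp_lookup;  /* one entry per physical RAM page, sized at kernel init */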
Steve Revill (20) 1361 posts |
This is embarrassing; I remember sparse dynamic areas being specified and the API added to OS_DynamicArea but my recollection was that the actual ‘sparseness’ was never implemented. When was this done? |
Jeffrey Lee (213) 6048 posts |
It looks like the initial implementation of sparse DA support was complete (see this change to the Ursula kernel, from 1998) |
Jeffrey Lee (213) 6048 posts |
I now have a first-pass version of the ‘physical memory pool’ DAs working – enough to operate as a proof-of-concept, but without some of the features which will make the system worthwhile. As I’m sure many programmers can relate, getting the first version of some code up and running represents the passing of a significant hurdle (even if it’s only a psychological one), so hopefully progress on this will be a little bit quicker from now on. My rough plan for finishing off the system is as follows:
Plus some other assorted odds-and-ends like making the task manager display sensible information for PMP DAs, fixing up bugs/implementation issues, etc. In terms of the API, in addition to the extra flag and parameters needed for the OS_DynamicArea ‘create’ call, I’ve currently got two additional reason codes defined:
Both calls operate on unordered page lists and are capable of mapping in and out multiple arbitrary pages as part of their operation, unlike the sparse DA map/unmap calls which only allow you to map/unmap one contiguous logical region at a time. You can also specify the page flags (cacheability, etc.) on a per-page basis. However this is all subject to change if I find that the extra flexibility causes one too many headaches! |
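To give a feel for the shape of those page lists (purely illustrative – the exact layout is whatever the final API ends up being; see the API post further down the thread):

#include <stdint.h>

/* One entry per page; an unordered array of these is passed to the map
   in/out call, so arbitrary pages can be (re)mapped in one go with
   per-page flags. */
typedef struct {
    uint32_t da_page;     /* logical page number within the PMP's window */
    uint32_t pmp_page;    /* the physical page's index within the pool */
    uint32_t page_flags;  /* per-page flags, e.g. cacheability */
} pmp_map_entry;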
Jeffrey Lee (213) 6048 posts |
8 months later, and I finally found some time to work on this again! Nothing to show yet, but I think the code is almost in the state where a PMP version of RAMFS would be possible (there were a few issues that needed resolving first – like how to deal with OS_ChangeDynamicArea being used to resize a PMP).

I’ve also spent lunch reading through the LPAE documentation (since that’s one of the things that PMP support will eventually lead on to). For the most part it looks fine, but there is one nasty incompatibility – unless I’m mistaken there’s no equivalent of the “user RO, privileged RW” access permission. With the long descriptor format (or with the short descriptor format with the access flag enabled) you have one bit to control whether memory is RO or RW and one bit to control whether access is allowed from all modes or just privileged modes.

So I guess under an LPAE version of the OS we’d have to make OS_DynamicArea, etc. throw an error if “user RO, privileged RW” is requested, and we’ll have to change any bits of the OS which do use it (kernel workspace?) to do something different instead – e.g. move all the user-readable bits to a user RW page, and live with the fact that user software may overwrite it at any given time. Or if it’s something critical to the kernel, have two copies of the data – an internal copy which the kernel reads/writes and an external copy which user code gets to see.
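For reference, these are the four combinations the long-descriptor format gives you (AP[2] selects read-only, AP[1] selects whether unprivileged access is allowed) – note the missing “user RO, privileged RW” case:

/* Long-descriptor (LPAE) stage 1 access permissions, AP[2:1] */
enum lpae_ap {
    AP_PRIV_RW = 0,  /* 00: read/write, privileged only */
    AP_ALL_RW  = 1,  /* 01: read/write, user + privileged */
    AP_PRIV_RO = 2,  /* 10: read-only, privileged only */
    AP_ALL_RO  = 3   /* 11: read-only, user + privileged */
};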
Rick Murray (539) 13805 posts |
This has the potential to be a nightmare. Many, many APIs return pointers to things in privileged workspace. This has always been okay because the “system” could modify the data, but the user could only read it. Take that away (what are you thinking, ARM?) and we are left with a big problem.

I have said in the past that we ought to have a two-level RMA so that the code can be completely zoned off from user mode access; however, we also need to have data accessible to it. Therefore, the “Module area” should operate as normal, but with the restriction of being SVC mode only (a USR R/W RMA is unthinkable). The secondary area, “Module data”, should be user read/write, and OS_Module claims can be directed there. However, OS_Module should have a flag to specify that the module wants the data to be allocated from the Module area (inaccessible to user mode) so modules can choose what to share and what to keep private. Bearing in mind that any user mode app could pee all over the RMA data area, one might prefer to keep filing system state someplace where only the filing system can touch it.

Though in the longer term we ought to consider making APIs that will copy data into a user supplied block instead of just passing pointers. Yuck.

File this one under “oh look, ARM has made another incompatibility that breaks RISC OS”. ;-) |
Jeffrey Lee (213) 6048 posts |
Yeah, I’m not really sure why ARM have dropped support for it. Looking at the ARMv8 docs, it looks like the same permissions scheme is used there when in 64bit mode, so it doesn’t look like user RO, privileged RW is going to come back any time soon. As far as RISC OS is concerned though, the following places seem to be using that access level (n.b. not an exhaustive search):
I suspect that’s the approach that we’re supposed to be taking. After all, in a multi-core/multi-threaded system, it’s no good returning a pointer to something in a shared location (or at least, not without the object containing a mutex) because someone else could come along and update it while you’re in the middle of using it. There may be some security considerations too (e.g. it’s better to expose just the bits the program asks for, by copying them into the program’s workspace, than to return a pointer to a page containing the requested data and about a hundred other things).
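A minimal sketch of that style of API (everything here is hypothetical – the point is just that the caller supplies the buffer and only ever gets a copy):

#include <string.h>

typedef struct { unsigned flags; unsigned size; } thing_info;

static thing_info master_copy;  /* lives in privileged workspace */

/* The caller passes a buffer in its own workspace; it never sees a pointer
   into kernel/module space, and can't corrupt the master copy. */
void read_thing_info(thing_info *out)
{
    memcpy(out, &master_copy, sizeof *out);
}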
Rick Murray (539) 13805 posts |
The RMA is currently able to be written to from user mode. Are there plans to change this? Surely it should be done to aid system security? Problem is in working out what should be accessible from user mode, and what shouldn’t.
That too. Making this information easily accessible encourages people to access it, even if Acorn fell over themselves [1] right from the days of the Beeb to emphasise using the proper API…

[1] It turned out to be quite important for software that could run on the co-processor across the Tube. |
Jeffrey Lee (213) 6048 posts |
It’s not on my current todo list. Maybe one day, but at the moment I’ve got bigger fish to fry in terms of delivering bang-for-buck. |
Jeffrey Lee (213) 6048 posts |
I had a fairly productive weekend, so I now have a PMP version of RAMFS (1MB logical window into arbitrary-size physical space), and I’ve extended the CAM to be 16 bytes per entry instead of 8 bytes per entry (so that each RAM page can store its PMP association, so that reclaiming of PMP pages will work correctly). This means I’m now ready for step 4 of the plan, to convert the free pool into a PMP. However:
I’m worried that the extra complexity involved in options 1 and 2 (having the lists resize themselves for every page added/removed) will introduce some edge cases which the code won’t be able to cope with cleanly – e.g. what happens if the free pool is empty, you try and insert a page into it, but in order to store the reference to the page you need to expand the length of the page list, but there isn’t any free space in the system heap and you need to claim a new page to extend the heap? Also option 2 will increase the complexity of some use cases for PMPs, e.g. disc caching – you’d want to be able to discard a page from the middle of the cache.

Out of options 3 and 4, option 3 is the one I’m leaning towards. Partly due to the security aspect (we don’t want bad programs scribbling over the RMA and corrupting the PMP page lists), and partly because of the way dynamic areas are initialised during ROM init (when the DA init function is called, the system heap will have 32KB of memory mapped to it – which is enough to store a small page list to boot-strap the creation of the free pool DA). So keeping the code using the system heap will be easier than changing the ROM init to initialise the RMA earlier.

Does anyone have any thoughts on the above? Option 3 is the one I’m leaning towards, but since one of the aims of the changes is to free up a bunch of logical address space it seems a bit wrong to immediately take a big chunk of it back again. I guess it’s worth pointing out that PMPs can resize their page list – so a well-behaved PMP (like the RAMFS one) will be able to make sure that its page list doesn’t contain any unallocated entries. But until the system starts getting used by third-party software it’s a bit hard to tell exactly how much memory an average system will be using to store page lists.
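For a rough sense of scale – assuming 4 bytes per page-list entry (the figure quoted later in the thread) and 4KB pages – a fully-populated free pool’s page list costs about 2MB on a 2GB machine (4MB on a 4GB machine):

#define PAGE_SIZE   4096u
#define ENTRY_BYTES 4u

/* e.g. pagelist_bytes(2GB) = 524288 entries * 4 bytes = 2MB */
static unsigned long pagelist_bytes(unsigned long long ram_bytes)
{
    return (unsigned long)(ram_bytes / PAGE_SIZE) * ENTRY_BYTES;
}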
Rick Murray (539) 13805 posts |
Whatever system is chosen, it is going to have to eventually fail gracefully when memory runs out. Unless you have some sort of swapping mechanism [1] in the back of your mind, memory will eventually run out. So, the API must absolutely be designed to fail safely. If the extra memory required cannot be found, then stop and leave things as they were.

Absolutely. It really gives me the (OS_)HeeBeeGeeBees to think of such sensitive information as that being in the RMA, a user-mode read/write area. It needs to be in a DA where it can be locked off from user mode access.

The way I understand it, RISC OS can pretty much make use of the full 4GiB addressing potential of the ARM processor; however, things start to get ugly if more than ~2GiB is actually fitted, because the allocation method “pre-reserves” space in the memory map (as explained here).

You claim 4MiB and you give back 2GiB. Really, who is going to complain about that?

[1] Nobody ever did respond to my question regarding the “swapping” built into the Wimp… |
Jeffrey Lee (213) 6048 posts |
“Partly due to the security aspect (we don’t want bad programs scribbling over the RMA and corrupting the PMP page lists)”

It looks like the joke’s on us – the system heap is currently read/write in user mode. I guess that’s another thing to go on my todo list!

“So, the way I understand this is the start of making some (all?) of these areas able to be claimed dynamically. You don’t need to pre-allocate 256MiB of memory to the RMA when most systems will probably need 4-16MiB; you don’t need to pre-allocate xxxMiB to the RAMdisc when many systems won’t even use it, and we really don’t need to <euphemism for urinate> an entire 2GiB for the free pool.”

Not exactly. The RMA is a global resource, so it will still need to be a regular dynamic area, and will still need 256MiB-ish of logical address space allocating to it even though it may only have 4-16MiB of pages mapped in. But the free pool will be completely gone from the logical address map, and the RAM disc will only take 1MiB of space. From the point of view of freeing up logical address space, it’s not clear what other DAs would benefit from turning into PMPs (filecore disc maps, perhaps?). But the free pool is the one that’s going to give the biggest bang for buck. |
Rick Murray (539) 13805 posts |
I noticed that earlier this evening. I cobbled together a program to list dynamic areas and their access permissions (hoping greatly that the drive map WASN’T user accessible – phew!).
Ah, I see. So the way the PMP works is that a small area is mapped in to be a window into a larger area (hence a half gigabyte RAMdisc claiming only 1MiB space (how does this affect RAMdisc’s read/write speed?)); but this wouldn’t work for the RMA. Shame the RMA can’t auto-extend when necessary from a more modest allocation.
Well, yeah. Wiping out half of the potential addressing capacity is a bit of a horrible thing now that 4GiB devices exist… |
Jeffrey Lee (213) 6048 posts |
Correct.
I haven’t done any real-world performance tests yet (e.g. ROM compile), but I have done a bit of testing with RISCOSmark on a Pandaboard:
So it’s slower for large accesses, but not terribly so. (Note also the performance gains from some changes I made recently to improve our support for the different VMSA memory types/attributes.) If necessary I can always make the logical window bigger, and/or switch to a better LRU cache implementation. |
Jeffrey Lee (213) 6048 posts |
Yesterday I checked in the first version of the code. The main headline changes are that RAMFS and the free pool are now physical memory pools. The free pool has no logical mapping, so that frees up a massive chunk of logical address space on modern systems. Meanwhile RAMFS has a 1MB logical mapping (may be tweaked in future), and the max ram disc size has been increased to 508MB (and should go higher in future once the filesystem issues are resolved). But you can’t make an omelette without breaking eggs, so here’s a list of things that people (mainly developers) need to be aware of:
This is just a first iteration of PMP support – once the initial bugs have been fixed I’ll likely start work on the next iteration, which will focus on improving the functionality to allow it to be used by more OS systems (GraphicsV being the main one I’m interested in – there’s lots of complex use cases I have in mind). So don’t be surprised if some of the APIs end up changing (not that they’re publicly documented anywhere yet – when I get a chance I’ll probably post the current API to this thread) |
Jeffrey Lee (213) 6048 posts |
API details:

OS_DynamicArea 0

If bit 20 of the flags (R4) is set, this indicates the DA is a PMP.
At the moment the handler routine is only called with one reason code, a new reason code of 6. This reason code is called whenever OS_ChangeDynamicArea is called on your PMP – it allows you to perform the grow/shrink operation yourself. Arguments are:
As with other handler routines, all registers should be preserved on exit. If returning an error, V should be set, and R0 should either be an error pointer or 0 for a generic error message to be returned. The way the call operates with respect to OS_ChangeDynamicArea is that the OS will convert the change amount from bytes to pages, call the handler routine, and then check to see how much the PMP’s physical size has changed, returning that value as the result of the OS_ChangeDynamicArea operation.

When a PMP is created it will start off with no pages assigned to it; pages must be assigned with OS_DynamicArea 21.

OS_DynamicArea 2, OS_ReadDynamicArea, etc.

As mentioned in the post above, these return the physical size of the PMP (in bytes) rather than the logical size. To avoid any signed number overflows, the size will be clamped to 2GB-4KB. A PMP can be detected by the fact that it will have bit 20 of the flags set.

OS_DynamicArea 21 (PMP PhysOp)

New reason code to claim/release physical memory pages in a physical memory pool.

in:  r0 = reason code (21)
     r1 = area number
     r2 = pointer to array of (PMP page index, phys page index, PMP page flag) tuples
     r3 = number of entries
out: r0-r1 preserved (error if not all of region successfully updated)
     r2 advanced to first entry not processed (or end of list)
     r3 updated to number of entries not processed (or 0)

PMP pages are sequentially numbered from 0 to the current ‘max size’ of the PMP. Physical page indices should be:
Currently the only supported page flag is bit 15, which is used to lock a page to prevent other DAs from claiming it (similar to a regular DA which requests specific pages). Other bits are ignored. Flags are also ignored when releasing pages.

The page list that’s provided is processed in order. If an error occurs then R2 and R3 will be updated to point to the first entry that has not been processed. However this may not be the entry that generated the error (it might be caused by an entry further on in the list).

Attempting to release or swap a page which is currently mapped in will fail.

OS_DynamicArea 22 (PMP LogOp)

New reason code to map/unmap pages from logical memory.

in:  r0 = reason code (22)
     r1 = area number
     r2 = pointer to array of (DA page number, PMP page index, page flags) tuples
     r3 = number of entries
out: r0-r1 preserved (error if not all of region successfully updated)
     r2 advanced to first entry not processed (or end of list)
     r3 updated to number of entries not processed (or 0)

DA pages are sequentially numbered from 0 to the max logical size of the PMP. PMP page indices should be:
If a valid PMP page index is given, then the following page flags can be specified:
Other bits are ignored. Currently, flags are ignored when mapping out pages – if you need to change the flags of a page when mapping it out, either change the flags before mapping it out or issue a PhysOp call afterwards.

Similar to PhysOp, the page list is processed in order, and if an error occurs then R2 and R3 will have been updated but they may not point to the entry that caused the error.

Currently a PMP page can only exist in one location at once, so if you make a request to map page X to location Y then it will first be removed from its current location. As far as the sequential order of operations is concerned, this all happens within the context of the entry that requests the page to be moved.

For performance reasons, when making large numbers of changes you should map out any cacheable pages first, then non-cacheable pages, then map in new pages (in any order). The kernel will scan ahead through the list to work out how many pages (and of what type) are being mapped out so that it can work out whether it should perform global or per-page cache/TLB invalidation.

OS_DynamicArea 23 (PMP resize)

New reason code to change the physical ‘max’ size of a PMP.

in:  r0 = reason code (23)
     r1 = area number
     r2 = resize amount (positive/negative page count)
out: r0-r1 preserved (error if not all of region successfully updated)
     r2 = amount area has changed by (unsigned page count)

This allows you to change the value of R9 that was passed to OS_DynamicArea 0 when the PMP was created. The memory overhead for a PMP is 4 bytes per page (i.e. 4*R9). So for large PMPs which may spend most of their time empty (e.g. RAMFS) it’s recommended to dynamically adjust the physical size of the PMP to avoid wasting memory.

Note that when shrinking a PMP, the shrink will only succeed if the last N entries of the page list are unclaimed pages. If there are pages allocated then they must be released first.

OS_DynamicArea 24 (Get info on PMP/DA)

New routine that acts as an OS_DynamicArea 2 replacement.

in:  r0 = reason code (24)
     r1 = area number
out: r2 = current logical size of area (bytes)
     r3 = base logical address
     r4 = area flags
     r5 = maximum logical size of area (bytes)
     r6 = current physical size of area (pages)
     r7 = maximum physical size of area (pages)
     r8 -> title string

Although designed for use with PMPs, this call works with regular DAs too (it just returns logical page counts for r6 & r7).

OS_DynamicArea 25 (Get PMP page mapping)

New routine to examine the state of a PMP’s pages.

in:  r0 = reason code (25)
     r1 = area number
     r2 = pointer to input/output array:
          +0: PMP page index (filled in on entry)
          +4: phys page index (filled in on exit, -1 if none)
          +8: PMP PhysOp page flags (filled in on exit, 0 if none)
          +12: DA page index (filled in on exit, -1 if not mapped)
          +16: page flags (filled in on exit, 0 if not mapped)
     r3 = number of entries
out: r0-r3 preserved
     Array updated with page details

This call will probably end up being revised, due to “PMP PhysOp page flags” and “page flags” being the same thing (PMP page flags are just a subset of the full page flags). Also there’s a bug where the current implementation returns 0 for the PhysOp page flags. If you used -2 in PhysOp to let the kernel pick a page for you, then reading back the details using this call will return the page number that was allocated rather than -2.

OS_SetMemMapEntries and friends

Page flags read using these SWIs will have bit 20 set if the page is a member of a PMP, and bit 15 set if the page is locked.
Altering bit 20 via these SWIs is prohibited (the kernel will force the current value of the bit to be retained), and altering bit 15 (especially clearing it) is likely to have bad consequences too. We should probably consider deprecating OS_SetMemMapEntries, or severely limiting its capabilities.

General

In the future the PMP PhysOp page flags may be extended to cover the full range of page flags. The aim of this will be to allow the CAM to store the details of mapped out pages, and as a consequence will allow us to reduce the amount of cache flushing which we perform when mapping/unmapping pages (on modern ARMs, caches are physically indexed/tagged – so you only need to flush the cache if the cacheability attributes of the underlying physical page change, rather than doing it on a logical page basis like we do now). So you might want to get into the habit of passing in the full set of flags you want to the physop calls.

This is the code review forum, so if anyone has any thoughts on the above API changes (good or bad) then feel free to share them!
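To make the shape of the calls a bit more concrete, here’s a rough C sketch of creating a small PMP, claiming one page and mapping it in. The OS_DynamicArea 0 registers follow the usual DA-create interface, plus bit 20 set in the flags and the maximum physical size (in pages) in R9 as implied by the resize call above; the flag values, create parameters and error handling are simplified and partly assumed, so treat this as illustrative rather than definitive:

#include <stdint.h>
#include "kernel.h"
#include "swis.h"

int pmp_example(void)
{
    _kernel_oserror *e;
    int area;

    /* Create a PMP: bit 20 set in the flags, no pages assigned yet,
       1MB maximum logical window, up to 256 pages (1MB) of physical memory. */
    e = _swix(OS_DynamicArea, _INR(0,9)|_OUT(1),
              0,              /* reason: create */
              -1,             /* let the OS allocate the area number */
              0,              /* initial logical size */
              -1,             /* let the OS pick the base address */
              1u<<20,         /* flags: bit 20 = physical memory pool */
              1<<20,          /* maximum logical size: 1MB */
              0, 0,           /* no handler routine / workspace */
              (int)"ExamplePMP",
              256,            /* maximum physical size, in pages */
              &area);
    if (e) return 1;

    /* PhysOp: claim one page. Tuple = (PMP page index, phys page index, flags);
       -2 asks the kernel to pick a physical page for us. */
    int32_t claim[3] = { 0, -2, 0 };
    e = _swix(OS_DynamicArea, _INR(0,3), 21, area, (int)claim, 1);
    if (e) return 1;

    /* LogOp: map PMP page 0 in at DA page 0 with default flags.
       Tuple = (DA page number, PMP page index, page flags). */
    int32_t map[3] = { 0, 0, 0 };
    e = _swix(OS_DynamicArea, _INR(0,3), 22, area, (int)map, 1);
    return e ? 1 : 0;
}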
Sprow (202) 1153 posts |
Blurg. Surely any software that understands PMPs doesn’t yet exist, so there’s no need to tippy-toe around with signed numbers. Anything predating the PMP work could never have received a number bigger than 2^31, since the free pool would have consumed the remaining logical address space. I suggest unsigned, limited at 4GB-4KB. |
Jeffrey Lee (213) 6048 posts |
I’ve been doing a bit more PMP-related work over the past few weeks. Primarily, I’ve now rewritten the AMBControl code so that it uses PMPs under the hood. This has allowed for the removal of a few of the inefficiencies in the old implementation (use of service calls to fix things up after CAM/memory map changes, some O(N) list/array traversal, forcing areas to be non-lazy when growing/shrinking appspace, etc.), and it’s also allowed for the removal of a bunch of places where the CAM/page tables were being manually peeked/poked. The lazy task swapping code has also been tidied up a bit so that it can be switched out completely if necessary (e.g. the future ASID-based system for ARMv6+ won’t have any need for lazy task swapping).

However it’s not all good news, as profiling has revealed that the new implementation has introduced a few new inefficiencies all of its own. Part of this is down to limitations in the current PMP APIs (e.g. mapping out large numbers of pages is slow due to needing to construct a page list – so adding a “ranged map out” operation is likely to be the solution there). Part of it may also be down to the fact that the manual CAM/page table poking routines which AMBControl used were actually pretty good – none of the other page mapping code in the kernel makes use of loop unrolling; instead everything is sent through the somewhat bloated BangCamUpdate routine one page at a time. So I’ll be doing a bit more profiling around that area (e.g. performance of AMB vs. regular dynamic areas), and if the AMB routines prove to be a winner I’ll probably try adapting them so that the kernel can use them for most of its bulk page remapping operations.

Also, it’s a bit of a late reply, but regarding OS_DynamicArea 2: if the clamp was at 4GB-4KB then any software which does use signed numbers will break as soon as it (e.g.) sees that a machine with 4GB of RAM has >2GB of that memory in the free pool. If clamping at 2GB-4KB allows that software to continue to run then we might as well use that as our clamp value – as you say, anything predating the PMP work could never have received a number bigger than that, so retroactively adding a clamp to the return values shouldn’t cause any problems for existing software on existing machines.
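For clarity, the clamp being discussed amounts to no more than this (sizes in bytes):

#include <stdint.h>

/* Report PMP physical sizes in bytes, but never above 2GB-4KB, so existing
   callers doing signed arithmetic on the result never see a "negative" size. */
static uint32_t clamp_da_size(uint64_t bytes)
{
    const uint32_t limit = 0x7FFFF000u;  /* 2GB - 4KB */
    return (bytes > limit) ? limit : (uint32_t)bytes;
}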