SMP-friendly DMA
Jeffrey Lee (213) 6048 posts |
For the past few days I’ve been thinking about how best to handle the cache maintenance requirements of DMA to cacheable locations – because on modern CPUs, and particularly with SMP, making arbitrary pages non-cacheable is a big no-no. I think I’ve arrived at a solution that I’m happy with, so the question is, is there anything I’ve missed?

OS_Memory 19

In:
Out:
The idea is that this will act as a replacement for OS_Memory 0 (for the task of making pages uncacheable for DMA). But rather than require the caller to use a specific data structure for input/output, function pointers are used to allow the kernel to read the input list and produce the output list. One of the reasons behind this choice is that the output list may end up being a different length to the input list – for a DMA write to RAM, any areas which aren’t cache line aligned will need breaking into two or more parts (bounce buffer at start, RAM in middle, bounce buffer at end). So to avoid the kernel allocating memory which the caller may or may not have a use for, this API cuts out the middle man by letting the caller control the output format directly (e.g. the caller will typically want the output in a format which the DMA controller can use).

Input function

In:
Out:
Or, R0 = error to abort the operation

The input function will be called multiple times, until it either returns an error or returns with R1 equal to zero. It’s expected that the memory regions returned will be in the same sequence as will be used for the DMA transfer, although this isn’t strictly necessary.

Output function

In:
Out:
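As a rough illustration of the contract so far – the kernel repeatedly calling the caller's input function until it returns an error or a zero length, and splitting any areas that aren't cache line aligned into a bounce-buffer head/tail plus an aligned middle – here is a hedged plain-C sketch. The real interface is register-based (R0/R1/R3 etc.), so every name below, and the 64-byte cache line size, is an invented assumption for illustration only:

```c
/* Plain-C sketch of the input/output function contract described in this
 * post. The kernel pulls regions from the input function until it returns
 * a zero length (R1 = 0) or an error, and splits each region for a DMA
 * write so that partial cache lines at the edges are flagged for bounce
 * buffers (bit 0 of the flags, as with R3). All names are hypothetical. */
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define CACHE_LINE  64u /* illustrative; real code would ask the kernel */
#define FLAG_BOUNCE 1u  /* "not suitable for DMA, use a bounce buffer"  */

typedef struct { uint32_t addr, len; } dma_region;

/* Input fn: fills in a region; len == 0 ends the list; non-NULL aborts */
typedef const char *(*dma_input_fn)(dma_region *out, void *state);
/* Output fn: receives each contiguous block plus flags */
typedef void (*dma_output_fn)(uint32_t addr, uint32_t len,
                              uint32_t flags, void *state);

static void emit(dma_output_fn out, void *st,
                 uint32_t addr, uint32_t len, uint32_t flags)
{
    if (len != 0)
        out(addr, len, flags, st);
}

/* Split one region for a DMA write to RAM: bounce head, aligned middle,
   bounce tail (the middle is the only piece that is DMA'd directly). */
static void split_for_write(uint32_t addr, uint32_t len,
                            dma_output_fn out, void *st)
{
    uint32_t first = (addr + CACHE_LINE - 1) & ~(CACHE_LINE - 1); /* up   */
    uint32_t last  = (addr + len) & ~(CACHE_LINE - 1);            /* down */

    if (first >= last) {            /* no whole cache line: bounce it all */
        emit(out, st, addr, len, FLAG_BOUNCE);
        return;
    }
    emit(out, st, addr, first - addr, FLAG_BOUNCE);
    emit(out, st, first, last - first, 0);
    emit(out, st, last, (addr + len) - last, FLAG_BOUNCE);
}

/* Kernel-side loop: pull regions until the input fn errors or terminates */
static const char *process_regions(dma_input_fn in, void *in_st,
                                   dma_output_fn out, void *out_st)
{
    for (;;) {
        dma_region r;
        const char *err = in(&r, in_st);
        if (err) return err;        /* input function aborted the op */
        if (r.len == 0) return NULL;
        split_for_write(r.addr, r.len, out, out_st);
    }
}
```

Note that this sketch leaves out two things the post says the kernel actually does: translating logical addresses to physical, and joining successive blocks that happen to be physically contiguous.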
The kernel will call the output function for each contiguous block of physical memory. The address ranges will be in the same sequence as returned by the input function, but they may have been split (e.g. to cope with discontiguous physical pages or bounce buffer usage) or joined (e.g. if two successive input blocks happen to be physically contiguous).

If bit 0 of R3 is set, it indicates that the address range described by the region is not suitable for DMA and a bounce buffer must be used. Typically this will only be the case for DMA writes to partial cache lines, but in the future it could be extended to allow for other situations, e.g. memory which the kernel/HAL knows can’t be used for DMA.

Cache maintenance implementation

When the SWI is called, the kernel will perform the appropriate cache maintenance for the different memory regions. Depending on the flags in R0, there are essentially four variants of the routine:
Technically, the start of a DMA write only needs to invalidate the cache, since there’s no need to write back any dirty cache lines. But if the DMA is cancelled before it starts, or if the DMA terminates early, then this has the potential to destroy the old buffer contents. So for safety the cache is flushed instead. This behaviour also allows for read-write type DMA.

DMA to overlapping regions

If multiple DMA operations target the same area of memory, then the most sensible way of dealing with it would be to act as if each cache line has its own read-write lock that allows multiple clients to hold a read lock but only one client to hold a write lock (and no read locks while a write lock is held). However, it is effectively a programming error if this situation is encountered – the initiator of the DMA or the owner of the memory should be the one to make sure any concurrent DMA accesses are safe, the same way that it should make sure CPU access to DMA regions is safe. So the initial implementation isn’t expected to perform any memory locking, but the option of adding it is there if in the future we find that it’s necessary.

Potential tweaks/improvements
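If per-cache-line locking were ever added, the read-write lock behaviour described above could be sketched as follows. This is a hedged, single-threaded illustration with invented names – a real SMP implementation would need atomics or spinlocks around the counters:

```c
/* Sketch of per-cache-line read-write locking: each line has a counter,
 * where 0 = free, >0 = that many readers (DMA reads), -1 = one writer
 * (DMA write). Multiple readers may coexist; a writer is exclusive.
 * Not real kernel code, and not SMP-safe as written. */
#include <assert.h>

#define NLINES 1024
static int line_lock[NLINES]; /* index = (phys addr / line size) % NLINES */

static int try_read_lock(unsigned line)
{
    if (line_lock[line] < 0) return 0;   /* writer active: refuse */
    line_lock[line]++;
    return 1;
}

static int try_write_lock(unsigned line)
{
    if (line_lock[line] != 0) return 0;  /* readers or writer active */
    line_lock[line] = -1;
    return 1;
}

static void unlock_read(unsigned line)  { line_lock[line]--; }
static void unlock_write(unsigned line) { line_lock[line] = 0; }
```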
I believe DMAManager, SATADriver and AHCIDriver are the only components which make use of OS_Memory 0 for controlling the cacheability of pages, so they’ll all want to be updated to use the new call. Also, as with OS_Memory 0, it’s going to be the caller’s responsibility to listen out for Service_PagesUnsafe. Does this sound OK to everyone? Anything that I might have missed? |
Clive Semmens (2335) 3276 posts |
I used to bend my brain around questions like these, but that was ten years ago. I don’t think anything I’d contribute now would be worth the rent on the pixels it was printed on. Sorry 8~( |
Jeffrey Lee (213) 6048 posts |
The design I went with is pretty similar to the above. API-wise, the differences are that the input and output functions are both allowed to use R9 as a way of keeping track of their state. The input function is also capable of returning the “use bounce buffer” flag – which is useful for ensuring that SATA transfers are halfword aligned.

One thing I didn’t foresee, though, is that using callback functions, deferred allocation of bounce buffers, etc. makes Service_PagesUnsafe a lot harder to deal with. Because the callback functions might trigger Service_PagesUnsafe, the kernel can’t cache any logical → physical translation across calls to the input/output functions (or it must be capable of invalidating the cached translation if Service_PagesUnsafe is triggered). Likewise, if the input/output functions perform memory allocation in a manner which might trigger Service_PagesUnsafe, they must also be capable of either updating or discarding the results that have been received so far.

Updating the results (either during the DMA op, or during the OS_Memory call) is also going to be tricky, because the physical contiguity of regions may have changed – so any scatter list which is being constructed may need to grow, requiring more memory allocations, triggering more Service_PagesUnsafe, etc. So the easiest/best option is likely to be to throw everything away and call OS_Memory again. |
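The “throw everything away and retry” strategy from the last post could be sketched like this – a hedged plain-C mock in which the flag, the names and the scatter-list build are all invented stand-ins for the real Service_PagesUnsafe machinery:

```c
/* Sketch of discard-and-retry: if (a mock) Service_PagesUnsafe fires while
 * the scatter list is being built, the partial results can't be trusted
 * (logical -> physical translations may be stale), so they are thrown away
 * and the whole OS_Memory-style call is made again from scratch. */
#include <assert.h>
#include <stdbool.h>

static volatile bool pages_unsafe; /* set by the (mock) service call */

/* Stand-in for the scatter-list build; returns false if the page state
   changed underneath it, i.e. the results must be discarded. */
static bool build_scatter_list(int *entries)
{
    pages_unsafe = false;
    *entries = 0;
    for (int i = 0; i < 4; i++) {
        (*entries)++;
        if (pages_unsafe)          /* translation may now be stale */
            return false;
    }
    return true;
}

static bool build_with_retry(int *entries, int max_tries)
{
    while (max_tries--) {
        if (build_scatter_list(entries))
            return true;           /* built without interference */
        /* else: discard everything and start again */
    }
    return false;
}
```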