Network & USB inefficiencies
Jeffrey Lee (213) 6048 posts |
After many years of hiding from it in fear, I’ve finally taken another look at ticket #324. I haven’t observed any crashes yet, but I am seeing some behaviour that indicates the system is close to crashing (e.g. excessive CPU usage), and some undesirable side-effects which are perhaps related to that (e.g. excessive packet loss).

Profiling on my BB-xM showed that when subjected to a packet flood, about 30% of the CPU time was spent in BufferManager, copying the received USB data into the DeviceFS buffer. Apart from the slow performance of the copy routine, the copy was also being performed with interrupts disabled, and seemed to be taking long enough to break the age-old RISC OS 3 PRM rule of not spending more than 100µs with interrupts disabled (although the overhead of the profiling may have contributed to that). After that, the data would be copied out of the DeviceFS buffer and into mbufs (again with IRQs disabled, albeit much quicker this time due to being in cacheable memory) for processing by the Internet module (which, thankfully, is performed with IRQs enabled).

Presumably the high packet loss when the system is running at a lower clock speed is a symptom of the interrupt-driven USB → DeviceFS copy routines taking far too much time compared to the callback-driven Internet code which is processing the packets (with very little time, if any, left for foreground programs to run and actually process those packets). This suggests we have several areas for improvement.
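To make the data flow concrete, here is a rough sketch of the receive path described above, in C. All names and buffer sizes are hypothetical – this is an illustration, not the actual EtherUSB/BufferManager/Internet code:

  #include <stddef.h>
  #include <string.h>

  /* Hypothetical buffers standing in for the real ones. */
  static unsigned char usb_data[2048];      /* data arriving over USB      */
  static unsigned char devicefs_buf[2048];  /* uncacheable DeviceFS buffer */
  static unsigned char mbuf_store[2048];    /* cacheable mbuf storage      */

  /* Copy 1 - performed in interrupt context with IRQs disabled: the
     slow copy into the uncacheable DeviceFS buffer, where ~30% of the
     CPU time was measured. */
  static void usb_to_devicefs(size_t n)
  {
      memcpy(devicefs_buf, usb_data, n);
  }

  /* Copy 2 - also with IRQs disabled, but quicker because the
     destination is cacheable: DeviceFS buffer into mbufs. */
  static void devicefs_to_mbufs(size_t n)
  {
      memcpy(mbuf_store, devicefs_buf, n);
  }

  /* Finally the Internet module processes the mbufs from a callback,
     with IRQs enabled. */
  static void internet_process(size_t n)
  {
      (void)n;  /* header parsing, socket queueing, etc. */
  }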
I’m yet to do any profiling on my Iyonix (hopefully tonight) – it’ll be interesting to see what the bottleneck is there, since there’s no USB/BufferManager involved. Perhaps it’ll be a similar problem, i.e. EtherK might be copying data into mbufs from within its IRQ handler. |
Rick Murray (539) 13850 posts |
Aside: A few lines down in the source linked, it says “this is meant to be called from a kernel debugger”. |
Jeffrey Lee (213) 6048 posts |
I suspect that comment hails from BSD-land, where good debugging tools grow on trees. |
Steve Pampling (1551) 8172 posts |
The note by Sprow suggests that the problem is buffer-related – pinging with a packet size bigger than the MTU will involve the packet being split and re-assembled at the destination, and the re-assembly is either failing or causing the buffer to fill while the packet re-assembly delay is active. Interesting from my viewpoint, as I regularly test for underlying problems by increasing the packet size for the ping test¹, where duplex mis-match issues and other bandwidth-affecting problems tend to show up. In the declared scenario I wonder what happens if you set -f on the ping sent from an interface with a larger MTU. If it’s a buffer issue the problem shouldn’t arise, as the interface won’t (well, shouldn’t) accept the larger packet.

¹ On a PC, ping 8.8.8.8 -l 2048 gives a packet size of 2048 bytes, for those interested. |
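For reference, the two variants of the test Steve describes (Windows syntax; the annotations are mine, not his):

  ping 8.8.8.8 -l 2048        (2048-byte payload, fragmentation permitted)
  ping 8.8.8.8 -l 2048 -f     (Don't Fragment set: if the payload exceeds
                               the path MTU this typically reports "Packet
                               needs to be fragmented but DF set" rather
                               than splitting the packet)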
Jeffrey Lee (213) 6048 posts |
I’m fairly certain my original testing was done with packets which were smaller than the MTU. In any case, my testing over the past few days hasn’t tried going above the MTU. |
Steve Pampling (1551) 8172 posts |
Regular use case but hardly likely to trigger errors |
Jeffrey Lee (213) 6048 posts |
The Iyonix profile results are a lot harder to read – lots of jumping around between MManager, EtherK and Internet. After hastily adding a histogram option to profanal, it looks like 43% of the time is spent in MManager, 29% in Internet, 14% in EtherK, and 5% in the SCL – but it’s hard to say exactly which functions are taking up all the time, since I don’t have a convenient way of getting the addresses of C ‘static’ functions. And of course MManager is closed source (but if the BB results were anything to go by, I’d guess that most of the time in there is spent copying from the NIC receive buffer). |
Rick Murray (539) 13850 posts |
Management? Hmm, on second thoughts, naaaaaahhh…. |
Rick Murray (539) 13850 posts |
Seriously, though, Steve is right. If you design something to be capable of “x”, you shouldn’t really be testing it with values less than “x”. Preferably more, but that depends on what “x” is – there’s a big difference between packet sizes and volts… Is there no way of getting hold of the MManager source code? It’s a little worrying that nearly half the time is spent within it. Heck, for all we know it could be doing a dumb single-register LDRB-STRB copy for everything…? |
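For illustration of why that would matter – a sketch of mine, not the actual copy routine, which we can’t see – compare a byte-at-a-time loop with a word-aligned copy:

  #include <stddef.h>
  #include <stdint.h>

  /* Byte-at-a-time: compiles to a LDRB/STRB pair per byte, so every
     byte pays the full memory round trip - especially painful when the
     source or destination is uncacheable. */
  void copy_bytes(uint8_t *dst, const uint8_t *src, size_t n)
  {
      while (n--)
          *dst++ = *src++;
  }

  /* Word-at-a-time for 4-byte-aligned buffers: four words per
     iteration gives the compiler the chance to emit LDM/STM, moving
     16 bytes per loop instead of 1. */
  void copy_words(uint32_t *dst, const uint32_t *src, size_t words)
  {
      while (words >= 4) {
          dst[0] = src[0]; dst[1] = src[1];
          dst[2] = src[2]; dst[3] = src[3];
          dst += 4; src += 4; words -= 4;
      }
      while (words--)
          *dst++ = *src++;
  }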
Steve Pampling (1551) 8172 posts |
Typical manufacturer testing of items like telephone switching systems¹ will usually involve running the tested item at PSU output levels a percentage above the rated value. The hope is to test the limits of any particular batch.

¹ Personal experience, years ago. |
Colin (478) 2433 posts |
As I see it there are 2 problems:

1) High CPU usage due to buffer bouncing in EtherUSB
2) Packet loss once the supply of mbufs is exhausted

For me the best solution for 1 is to use the NetBSD interface directly. I have submitted a demonstration of how to do it to ROOL (I’ve moved the keyboard and mouse – which use the NetBSD interface – into separate modules without any changes to the NetBSD code) for them to evaluate.

2 is a bit trickier. I’ll ignore the fact that GB Ethernet can supply data quicker than USB or some Ethernet PHYs can handle (if you are using a GB adapter/PHY). The incoming packets are read from the device’s DMA buffer and converted into mbufs. This happens with interrupts disabled. Large transfers continually create mbufs until they are exhausted, and mbufs can never be consumed while more data is arriving, because mbufs are consumed/released in callbacks (I think). So after mbufs are exhausted, packets are dropped.

If you delay the reading of the device – by, for example, reading the device in a callback, effectively using the interrupt to trigger a new callback if there isn’t one already in progress (see the sketch below) – the delay from interrupt to callback can be 0.25 sec. This may result in the device missing packets, as it isn’t being emptied quickly enough. Once a device misses a packet, a transfer can be subject to long delays waiting for missed replies – which I think is what happens in ShareFS. Unfortunately a lack of PMT means waiting doesn’t multitask. |
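A minimal sketch of the interrupt-to-callback pattern described above, assuming a CMHG-generated veneer (rx_callback_entry) for the callback handler and hypothetical device hooks – an illustration of the idea, not EtherUSB code:

  #include "kernel.h"
  #include "swis.h"

  /* Hypothetical hardware/driver hooks, for illustration only. */
  extern void device_mask_rx_irq(void);
  extern void device_unmask_rx_irq(void);
  extern int  device_rx_ready(void);
  extern void convert_packet_to_mbufs(void);

  extern void rx_callback_entry(void);   /* CMHG-generated veneer */

  static volatile int callback_pending = 0;

  /* IRQ handler: do no copying here - mask the device's receive
     interrupt and schedule a transient callback if one isn't already
     pending. */
  void rx_irq_handler(void *pw)
  {
      device_mask_rx_irq();
      if (!callback_pending) {
          callback_pending = 1;
          /* OS_AddCallBack: R0 = handler address, R1 = R12 value */
          _swix(OS_AddCallBack, _INR(0,1), rx_callback_entry, pw);
      }
  }

  /* Callback handler: runs with IRQs enabled, so draining the DMA
     buffer into mbufs no longer starves the rest of the system. */
  void rx_callback_handler(void *pw)
  {
      (void)pw;
      while (device_rx_ready())
          convert_packet_to_mbufs();
      callback_pending = 0;
      device_unmask_rx_irq();
  }

As Colin notes, the catch is the latency: the gap between the interrupt and the callback can be long enough for the device’s own buffer to overflow instead.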
Colin (478) 2433 posts |
Buffer insertion is done in USBDriver so there is no need to modify each backend. |
Jeffrey Lee (213) 6048 posts |
Yes, using the BSD interface directly would certainly help. Hopefully your changes get accepted! |
Steve Pampling (1551) 8172 posts |
And there people were wondering what on earth RO could usefully do with an unused core on a multicore system. Note that on PeeCee systems there is typically a smallish processor in the NIC that you can offload stuff to in order to achieve better throughput. Not sure whether the on board NIC of the Iyonix has that but the equivalent chipset on a PCI card does. |
David Feugey (2125) 2709 posts |
Yep. And on Intel cards it was probably sometimes an XScale :) |
Jeffrey Lee (213) 6048 posts |
The irony being that the Iyonix uses an XScale and currently can’t deal with 100Mb/s of traffic, let alone 1Gb/s? ;-) |
Steffen Huber (91) 1953 posts |
Can someone remind me why MBufManager is such a “magic piece” of code that no one has ever dared to replace it with something better (and with source available)? |
Colin (478) 2433 posts |
MBufManager isn’t a problem – it’s just a means of buffering incoming packets which is optimised for the Internet module – though I do think it might be better if rx and tx had their own mbuf pools. It’s a lot easier to move packets around and add headers to packets with MBufManager than using a buffer like USB has, for example. The problem is the buffer size isn’t infinite. There are 2 situations:

1) Packets come in small bursts smaller than the total remaining mbufs, or mbuf creation is quicker than the arriving packets.
2) Packets arrive in large chunks greater than the remaining mbufs.

For 1, packets are converted into mbufs and linked together – so you get a list of mbufs – until the Ethernet device has no more packets to convert, at which point the interrupt returns and the mbufs get a chance to be consumed/released.

For 2, the interrupt doesn’t return until all mbufs are used. No replies can be made until a received mbuf is consumed and released, but even then any reply may not get the chance, as you may still be receiving interrupts which may consume the released mbuf. You need to: 1) switch the network interrupt off after the first packet arrives |
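To illustrate the chaining and pool exhaustion Colin describes, a simplified sketch – the field names are illustrative, not the real DCI4 mbuf layout, which has more fields:

  #include <stddef.h>
  #include <string.h>

  struct mbuf {
      struct mbuf  *m_next;      /* next mbuf in this packet's chain */
      size_t        m_len;       /* bytes of data held in this mbuf  */
      unsigned char m_data[128]; /* payload storage                  */
  };

  /* A fixed pool, as MbufManager effectively has: once it's empty and
     more data arrives under interrupt, packets have to be dropped. */
  static struct mbuf  pool[32];
  static struct mbuf *free_list;

  static void pool_init(void)
  {
      for (int i = 0; i < 31; i++)
          pool[i].m_next = &pool[i + 1];
      pool[31].m_next = NULL;
      free_list = &pool[0];
  }

  static struct mbuf *mbuf_alloc(void)
  {
      struct mbuf *m = free_list;
      if (m) { free_list = m->m_next; m->m_next = NULL; }
      return m;                  /* NULL = pool exhausted */
  }

  static void mbuf_free(struct mbuf *m)
  {
      m->m_next = free_list;
      free_list = m;
  }

  /* Copy one received frame into a chain of mbufs; if the pool runs
     dry part-way, the whole frame is dropped (returns NULL). */
  static struct mbuf *frame_to_chain(const unsigned char *p, size_t n)
  {
      struct mbuf *head = NULL, **tail = &head;
      while (n) {
          struct mbuf *m = mbuf_alloc();
          if (!m) {
              while (head) {          /* drop: release partial chain */
                  struct mbuf *dead = head;
                  head = head->m_next;
                  mbuf_free(dead);
              }
              return NULL;
          }
          m->m_len = (n < sizeof m->m_data) ? n : sizeof m->m_data;
          memcpy(m->m_data, p, m->m_len);
          p += m->m_len;
          n -= m->m_len;
          *tail = m;
          tail = &m->m_next;
      }
      return head;
  }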
Jeffrey Lee (213) 6048 posts |
Plus if there was a zero-copy option. |
Colin (478) 2433 posts |
Isn’t that what the ‘unsafe’ mbuf is for? I’m not too sure – my memory of the details is sketchy. |
Colin (478) 2433 posts |
Regarding zero-copy options: does non-cached memory come from the same pool as cached memory? I’m just wondering if it should be used sparingly. Take audio, for example. If you read a file from a callback created from a 2cs CallEvery event, you need to queue about 0.25 sec of audio – callbacks can take a while to happen – which would take a large chunk of uncached memory at high resolutions (at 192kHz 32-bit stereo, for instance, 0.25 sec works out at roughly 384KB). Is uncached memory scarce? Should I minimise its use? |
Jeffrey Lee (213) 6048 posts |
I don’t believe so – or at least, I’m fairly certain I’ve seen a comment somewhere in the OS sources saying that zero-copy mbufs are still on the wishlist.
Fundamentally, yes, cached & non-cached memory come from the same pool. Pages which are in the free pool DA can be allocated to either cached or non-cached DAs. So apart from some known issues with how physically contiguous memory is handled, there aren’t any limits on non-cached memory usage beyond those which would also apply for cached memory.

The main reason you’d want to avoid making heavy use of non-cached memory is because it’s very slow for CPU access, especially read operations (writes can usually be buffered to an acceptable degree). DMAManager can work around that by allowing you to use cacheable memory directly (with DMAManager doing the appropriate cache/TLB management to make this usage pattern safe). But not everything makes use of DMAManager, partly because not every DMA controller is suited to the way DMAManager does things. And more often than not, the stuff that doesn’t use DMAManager simply uses regular uncacheable memory allocated via PCI_RAMAlloc. |
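For anyone wanting to experiment with a non-cached DA from C, a hedged sketch – the register usage follows the OS_DynamicArea 0 (create) documentation, but the flag bits here are from memory (PRM: bit 4 = not bufferable, bit 5 = not cacheable) and should be checked before relying on them; the area name is made up:

  #include "kernel.h"
  #include "swis.h"

  /* Create a non-cacheable dynamic area; returns the area number, or
     -1 on error. The base address is written to *base_out. */
  int create_uncached_da(int size, void **base_out)
  {
      int number;
      void *base;
      _kernel_oserror *e = _swix(OS_DynamicArea,
          _INR(0,8) | _OUT(1) | _OUT(3),
          0,               /* reason code 0: create                  */
          -1,              /* area number: let the OS choose         */
          size,            /* initial size                           */
          -1,              /* base address: let the OS choose        */
          (1<<4) | (1<<5), /* flags: not bufferable, not cacheable
                              (assumed bit positions - check the PRM) */
          size,            /* maximum size                           */
          0,               /* no handler routine                     */
          0,               /* no handler workspace                   */
          "UncachedDA",    /* area name (hypothetical)               */
          &number, &base);
      if (e != NULL)
          return -1;
      *base_out = base;
      return number;
  }

As Jeffrey says, CPU reads from an area like this will be slow, so it’s best kept for DMA buffers that the CPU mostly writes to or leaves alone.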