Slow Pandaboard Transfer Speeds
Pages: 1 2
Chris C. (2322) 197 posts |
Hi All, I’m trying to troubleshoot my PandaBoard ES transfer speeds. I max out at about 900Kb/s on my 50Mb connection. I went through troubleshooting the PandaBoard ES this morning and made a fresh firmware card with the 5.24 ROM. I’ve got a 1G switch connected to my cable modem; the router shows a 100M connection from the PandaBoard ES, and the PandaBoard ES confirms the same. Are there any tweaks I can make to speed up transfers, or is that speed normal? I’ve tried transferring a 100MB file from https://speedtest.tele2.net/100MB.zip and it takes roughly 20 minutes. Cables are about 2 years old and so is the router. Some stats:
*help etherusb
*ejinfo
ej0: SMSC95xx, USB bus 1, device 7, Devices:$.USB7, up
Interface driver : ej
Standard clients: Type 8035 (AddrLvl 01, ErrLvl 00) handler=(fc33487c/30003504) |
Timothy Baldwin (184) 242 posts |
The RISC OS implementation of TCP is slow. What client software are you using? Netsurf is/was slow. What is the round-trip time for this connection? You can measure that using ping. |
Chris C. (2322) 197 posts |
I used wget and FTPc after I read about NetSurf being so slow. |
Chris C. (2322) 197 posts |
*ping speedtest.tele2.net
speedtest.tele2.net ping statistics
7 packets transmitted, 6 packets received, 14% packet loss |
Martin Avison (27) 1498 posts |
I would not have thought speed would have much effect on a ping.
speedtest.tele2.net ping statistics
101 packets transmitted, 96 packets received, 4% packet loss |
Chris C. (2322) 197 posts |
OK, I found and tried a speed test server closer to me (California) and got a much better result: the 100MB download test ran at about 900KB/s. Now that’s what I am talking about. The other test results must have been poor due to their distance from me. |
Rick Murray (539) 13862 posts |
Not necessarily distance. |
Rick Murray (539) 13862 posts |
Just remembered, I have Ookla Speedtest on my phone. Unfortunately it only shows a list of “nearby” servers (under 500km away); it’d have been interesting to have tried other countries (like South Korea, US). Anyway… The results are backwards (most recent at top). So the first test that I ran was to Cowes, Isle of Wight. It managed a pretty decent speed. Technically the best of the lot.
|
Chris C. (2322) 197 posts |
I think I’m set. I just wanted to make sure everything is set up correctly on my end. It was puzzling why I was topping out at about. The fun part was that I had a 5.22 ROM on my PandaBoard ES SD card but somehow I was still booting and showing 5.24 until I changed the SD card for a fresh one. Got that all sorted now. Phew. |
Chris C. (2322) 197 posts |
Kind of curious as to what you see on your end.
wget http://speedtest-ca.turnkeyinternet.net/100mb.bin
*ping speedtest-ca.turnkeyinternet.net
speedtest-ca.turnkeyinternet.net ping statistics
5 packets transmitted, 5 packets received, 0% packet loss |
David J. Ruck (33) 1637 posts |
ChrisC: The RISC OS network stack is certainly slow; we can run Linux on Raspberry Pis and ARMX6 devices and compare directly with RISC OS on the same hardware. But it’s not that slow: on my Mini.M, RISC OS can manage up to about 200 Mbit/s upload and download to other local gigabit devices, so clearly it can more than saturate a 100 Mbit Ethernet. However, you won’t get anywhere close to measuring the speed of your internet connection using a RISC OS browser; they are all just too slow. Most of the speed measurement pages rely on a lot of JavaScript, and JavaScript engines are slow on RISC OS. Even a simple file download with NetSurf only manages a few hundred KBytes per second on a connection from which all the other machines can get 9.1 MBytes/s. Your best bet is to use a file transfer protocol rather than a browser, such as FTP, but check from another machine as well, as some ISPs throttle FTP heavily. If your router supports a VPN to a remote site where you can download files with Lanman, that will also show better results. |
Timothy Baldwin (184) 242 posts |
But that is a local network with presumably sub-millisecond round trip time, the route that Chris was testing had 160 milliseconds of round trip time. The bandwidth that RISC OS can achieve is inversely proportional to the round trip time. TCP flow control works by the receiver informing the sender how many more bytes it may send. RISC OS will give the sender permission to send just 17376 bytes, and the sender will not send more than that until it receives an acknowledgement. It might go as follows:
That results in 17376 * 5 = 86880 bytes per second. You can test it locally using an artificial delay such as NetEm in Linux, as I discuss here and here. Also see the Wikipedia article on Bandwidth-delay product. |
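The arithmetic in the post above can be sketched as a small calculation. This is only an illustration of the window-per-round-trip bound, not a model of the RISC OS stack: the function names are made up for this example, and it assumes exactly one 17376-byte window is delivered per round trip. The figure of five windows per second corresponds to a 200 ms cycle; at the 160 ms round trip quoted for Chris's route the same bound comes out a little higher.

```python
def window_limited_throughput(window_bytes: int, rtt_seconds: float) -> float:
    """Upper bound in bytes/s when only one receive window can be in flight per round trip."""
    if rtt_seconds <= 0:
        raise ValueError("rtt_seconds must be positive")
    return window_bytes / rtt_seconds


def transfer_time(size_bytes: int, bytes_per_second: float) -> float:
    """Seconds needed to move size_bytes at a steady rate."""
    return size_bytes / bytes_per_second


WINDOW = 17376          # bytes, the receive window quoted in the post
FILE_SIZE = 100_000_000  # assumption: the "100MB" test file taken as 10**8 bytes

# 1 ms stands in for a local network; 160 ms and 200 ms are the remote cases.
for rtt in (0.001, 0.160, 0.200):
    rate = window_limited_throughput(WINDOW, rtt)
    minutes = transfer_time(FILE_SIZE, rate) / 60
    print(f"RTT {rtt * 1000:6.1f} ms: {rate / 1000:9.1f} kB/s, 100 MB in {minutes:6.2f} min")
```

With the 200 ms cycle the bound is 86880 bytes per second, which puts a 100 MB download at a little over 19 minutes, in line with the roughly 20 minutes reported in the first post.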
Chris C. (2322) 197 posts |
I saw your article Timothy. Cool stuff. I wanted to try some transfers with UDP just for fun, just to see the speeds. For now, I’m satisfied with the results I got. I’ll take a look at that article. |
David J. Ruck (33) 1637 posts |
The real question is why is RISC OS so much worse on higher-latency connections than other OSes? My point was that locally it can just about exceed 100Mb Ethernet, but over a 72Mb/s broadband connection it can be 50x slower than the same device running Linux. |
Colin (478) 2433 posts |
My guess is lack of PMT (pre-emptive multitasking). Without PMT, when sending, the Ethernet driver has no way of giving control back to the desktop, so when the device is overrun with packets from RISC OS it has to drop them and rely on retries, which slows things down. There should have been a system to inform the socket module that a packet wasn’t consumed so that it could issue an EWOULDBLOCK, and it looks like the driver has that feature – but it doesn’t work. When receiving, you can only read from the device during the interrupt until mbufs are exhausted; then the device starts dropping packets, so the other end has to retry, slowing things down. The mbufs are processed in a callback (which makes using sockets from a module problematic). Then you have to wait for the app to be paged in to consume the mbufs, freeing them for receiving more data. I’ve never programmed PMT devices but envisage a system where interrupts and processes have their own threads, making the whole system much simpler – though my model of PMT may be utopian. DMA would probably help somewhere in the chain but it’s beyond my paygrade to figure out where. I sometimes see mbufs cited as being a problem. I don’t think so: they are just the socket module’s solution to buffering interrupt data back to user mode, which all drivers face. That’s how I see things anyway. |
Rick Murray (539) 13862 posts |
My guess is a lack of competent buffering. Remember, it’s not just networking that is slow. The filesystem is too (one of the main reasons I’ve not built myself a newer ROM: it takes forever to delete/copy the thousands of files). What worked with 10baseT and PIO hard discs doesn’t really scale to modern networking and storage technologies. |
Rick Murray (539) 13862 posts |
Perhaps simpler and easier for the programmer, but within the system the processor is only able to do one thing at a time (we’ll gloss over multiple cores and hyperthreading for now), storage devices only handle one request at a time, etc etc. But, all that said, Ethernet ought to be capable of much better buffering and the apps able to understand that data might come in in 256K chunks. That way, nothing needs to be dropped unless things get really badly held up. |
Colin (478) 2433 posts |
Yes, but with threads you have flow control; RISC OS has no flow control. When sending, buffers have little impact. When an app is paged in and sends something, it is essentially a single-tasking machine, and unlike receiving, where you have to get data from an interrupt context back to the app, sending should have a direct link with the device and ‘should’ be as fast as you can get. However, the driver is written with the RISC OS multitasking environment in mind and does not block if it can’t put data on the device, and I think that is the main bottleneck. If the send was in a thread, the thread could block until the device was free, so retries were avoided. One problem I see with mbufs is that send and receive share an mbuf pool, so you could get the situation where receiving exhausts mbufs and replies can’t be sent as there are no mbufs to carry the data to the device. |
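For readers unfamiliar with the EWOULDBLOCK style of flow control mentioned above, here is a minimal sketch using ordinary BSD-style sockets from Python on a host OS. It says nothing about how the RISC OS driver or socket module behave internally; `send_all_nonblocking` and `demo` are names invented for this example. The point it illustrates is that a sender told "would block" can wait until there is room and then carry on, so nothing has to be dropped and retried.

```python
import select
import socket
import threading


def send_all_nonblocking(sock, payload, timeout=5.0):
    """Send payload on a non-blocking socket, waiting whenever the send buffer is full."""
    sock.setblocking(False)
    view = memoryview(payload)
    sent = 0
    stalls = 0
    while sent < len(view):
        try:
            sent += sock.send(view[sent:])
        except BlockingIOError:
            # EWOULDBLOCK/EAGAIN: the buffer is full, but no data has been lost.
            stalls += 1
            _, writable, _ = select.select([], [sock], [], timeout)
            if not writable:
                raise TimeoutError("peer stopped draining the socket")
    return sent, stalls


def demo(size=4 * 1024 * 1024):
    """Push `size` bytes through a local socket pair while a thread drains the far end."""
    a, b = socket.socketpair()
    received = bytearray()

    def drain():
        while len(received) < size:
            chunk = b.recv(65536)
            if not chunk:
                break
            received.extend(chunk)

    reader = threading.Thread(target=drain)
    reader.start()
    sent, stalls = send_all_nonblocking(a, bytes(size))
    reader.join()
    a.close()
    b.close()
    return sent, len(received), stalls


if __name__ == "__main__":
    sent, got, stalls = demo()
    print(f"sent {sent} bytes, received {got} bytes, waited for buffer space {stalls} times")
```

The number of stalls depends on the host's socket buffer sizes and scheduling, so it will vary from run to run; what should not vary is that every byte sent arrives.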
Jeffrey Lee (213) 6048 posts |
I was hoping I’d have something more useful to contribute, but here it is anyway:
A FIQProf profile of a (single-tasking?) file upload/download over a high-latency link might allow you to quickly identify where all the busy-wait loops are and what the triggers are for leaving those loops. FIQProf made it trivial to identify that USB mass storage was slow because SCSISoftUSB was issuing one USB transfer per TickerV tick. So basically my contribution is “stop speculating and profile the code, you big dummies!” ;-) |
Colin (478) 2433 posts |
I wish I didn’t have a goldfish memory; then I could save repeating my drivel and the internet would have a little less to clog it up. |
Colin (478) 2433 posts |
It would be handy if you could limit the size of the buffer used by Filer_Action by some means other than the next slot for transferring files. If I transfer a 1GB file with LanManFS from my ARMX6 to my Pi 4 running Raspbian with a USB3 HDD, it takes 182 secs with the 4MB next slot I normally use and only 58 secs with a 640k next slot. You can see the transfer pausing with the 4MB next slot. |
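To put those two timings into throughput terms, a quick calculation helps. It assumes "1GB" means 10**9 bytes, which the post does not specify, and the helper name is made up for this example.

```python
FILE_BYTES = 1_000_000_000  # assumption: "1GB" taken as 10**9 bytes

# Transfer times in seconds, as reported in the post above.
timings = {"4MB next slot": 182, "640k next slot": 58}


def throughput_mb_per_s(size_bytes: int, seconds: float) -> float:
    """Average rate in megabytes (10**6 bytes) per second."""
    return size_bytes / seconds / 1_000_000


for label, seconds in timings.items():
    print(f"{label}: {throughput_mb_per_s(FILE_BYTES, seconds):.1f} MB/s")

speedup = timings["4MB next slot"] / timings["640k next slot"]
print(f"The smaller slot is about {speedup:.1f} times faster")
```

Under that assumption the 4MB slot averages about 5.5 MB/s and the 640k slot about 17.2 MB/s, roughly a threefold difference.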
David J. Ruck (33) 1637 posts |
I’ve got a 16MB slot on my machines so I can do GCC stuff in TaskWindows without messing around, and most of the copying is done with !DirSync, which uses a buffer the size of the next slot. I assume the reason a smaller buffer is faster is that it is the only situation in which RISC OS can do some degree of overlapped I/O, i.e. read up to the socket buffer size of data while the data is being written to disc, or write out a TCP buffer while the disc is being read. The “use big buffers” habit probably dates back to copying files between floppy discs with a single drive, but as RISC OS has never supported overlapped disc I/O (i.e. reading from one file while writing to another), which is effective with small buffers, we’ve never changed the disc tools. Now that those disc tools are being used with remote filing systems, they are working non-optimally. |
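As an illustration of what overlapped I/O means here, the following sketch copies a stream in small chunks, with one thread reading ahead while the main thread writes. It relies on the pre-emptive threads of a host OS, which RISC OS does not provide, so it shows the concept rather than something that could be dropped into the RISC OS disc tools. The function name and the chunk and queue sizes are arbitrary choices for this example.

```python
import io
import queue
import threading


def overlapped_copy(src, dst, chunk_size=64 * 1024, depth=4):
    """Copy src to dst, reading the next chunk while the previous one is being written."""
    chunks = queue.Queue(maxsize=depth)  # bounded, so the reader runs at most `depth` chunks ahead
    errors = []

    def reader():
        try:
            while True:
                data = src.read(chunk_size)
                chunks.put(data)  # an empty bytes object marks end of input
                if not data:
                    break
        except Exception as exc:
            errors.append(exc)
            chunks.put(b"")  # unblock the writer so the error can be reported

    thread = threading.Thread(target=reader, daemon=True)
    thread.start()

    total = 0
    while True:
        data = chunks.get()
        if not data:
            break
        dst.write(data)
        total += len(data)

    thread.join()
    if errors:
        raise errors[0]
    return total


if __name__ == "__main__":
    source = io.BytesIO(b"x" * 1_000_000)
    target = io.BytesIO()
    copied = overlapped_copy(source, target, chunk_size=4096)
    print(f"copied {copied} bytes")
```

The in-memory streams only demonstrate that the copy is correct. Any speed benefit would appear only when the source and destination are genuinely slow devices, such as a network socket and a disc, where one side can make progress while the other is waiting.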
Rick Murray (539) 13862 posts |
Every day, YouTube serves a billion hours of video to users (and about 70% of them being mobile devices). And this is only YouTube. There’s also Facebook, Instagram, blah blah, not to mention all sorts of foreign services like Dailymotion and Youku. That entire message of yours is smaller than Google’s tracking cookie. Just putting your comment into context. ;-) |
Colin (478) 2433 posts |
I’ve been trying to wrap my head around this for ages. The etherth device has a 16 packet ring buffer (about 24kB) so I don’t see DMA being a factor – it’s not like USB where you can give it a large block of memory and say fill it and do something else in the meantime – which RISC OS doesn’t. Generally you get to the stage where a packet buffer isn’t available and you have to drop the packet – in the etherth driver there is a tiny delay before dropping in the hope that a packet buffer becomes available and you don’t have to drop. This would imply that the transfer is going at 1Gb/s, except that it is not, as you have to drop packets and that causes delays. A 320kB next slot is slower than a 640kB next slot – it’s about 70% of the speed – and 4MB is 33% of the speed. With the current system, without any way to stop TCP retries (flow control), I can envisage a situation where a better stack takes up less CPU time, is faster, but transfer speeds are worse. When I get to this point in my ponderings I usually decide to accept RISC OS for what it is and forget about its problems – and hope Jeffrey fixes them :-). Edit: 480kb/s not 1GB/s; the ARMX6 uses the USB clock for Ethernet. |
Steffen Huber (91) 1958 posts |
480 MBit/s hopefully :-) |