Large socket send buffer size causes terminal ENOBUFS error?
Jeffrey Lee (213) 6048 posts |
Something a bit worrying that I’ve spotted while trying to improve the throughput of vncserv. vncserv uses a non-blocking socket with the default buffer size of ~17k. It writes to it every centisecond, using a callback from TickerV. 17k*100Hz = 1.7MB/s, which is a bit poor when most machines will be using 100Mbit (~10MB/s) or gigabit connections. vncserv has its own (ridiculously large) internal buffer in which data is stored prior to writing out over the socket, so there’s very little overhead in the socket send code, and there can easily be several megabytes of data waiting for transmit.

Long-term the fix is to modify the code to listen out for the socket events so that it can write new data to the socket as soon as the buffer empties, but short-term I figured I’d at least try increasing the socket send buffer size to a more sensible level. A 128k buffer size would give 12.8MB/s throughput, which would be enough to saturate a 100Mbit connection, so I decided to give that a go. But I’m finding that sometimes the socket gets stuck with the send buffer full, and further send requests result in ENOBUFS (as opposed to the usual EWOULDBLOCK or EAGAIN). Furthermore, this situation seems to deadlock all the other sockets in the system – e.g. ShareFS stops working, and inetstat gets stuck during name resolution. I’ve seen this happen on both Pi and iMX6, so it feels like a problem with the network stack as a whole rather than a specific driver (although I think the Pi was only failing when I was using a 256k buffer size, and 128k worked fine – can’t remember exactly).

Killing the socket eventually allows the system to recover (the socket gets stuck in FIN_WAIT for a while, but once it fully dies the system comes back), so it definitely feels like a temporary resource shortage in the stack rather than a permanent problem. A quick look at the network stack sources suggests that everything goes via MbufManager, which kind of makes me think that might be to blame. Running the code with the RMA, system heap and PCI heap resized to max fails in exactly the same way, so I don’t think it’s an issue with (e.g.) a failed DA grow from an IRQ handler.

Before I go any deeper, I was wondering if anyone else had spotted any similar issues? The only slightly odd thing I can think of with how vncserv does things is that when data is being prepared for sending, vncserv will attempt to write it straight into the socket. So there could be lots of writes of 3k-4k of data before the socket buffer fills and the code falls back to doing repeated large writes (50k+) from the TickerV callback. So if there’s a hard limit to the number of available MBufs, maybe the small writes cause them all to be used up, and then there are no MBufs spare to receive the TCP ACK packets from the remote machine. |
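For reference, here is a minimal BSD-style sketch of the pattern described above: enlarging the send buffer, making the socket non-blocking, and treating EWOULDBLOCK/EAGAIN as "buffer full, retry next tick" while flagging ENOBUFS as the anomalous case. It uses the generic setsockopt/ioctl/send calls rather than vncserv's actual code; on RISC OS these would go through the Socket_* SWIs or socketlib equivalents.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <sys/ioctl.h>

/* One-off setup: enlarge the send buffer and make the socket non-blocking.
 * 128KB is the size discussed above; the stack default is around 17KB. */
static int setup_socket(int sock)
{
    int sndbuf = 128 * 1024;
    unsigned long on = 1;

    if (setsockopt(sock, SOL_SOCKET, SO_SNDBUF, &sndbuf, sizeof(sndbuf)) < 0)
        return -1;
    if (ioctl(sock, FIONBIO, &on) < 0)   /* non-blocking mode */
        return -1;
    return 0;
}

/* Called from the ticker-driven code: push as much pending data as the
 * socket will accept. Returns bytes accepted, or -1 on a hard error. */
static int push_pending(int sock, const char *buf, size_t len)
{
    size_t done = 0;

    while (done < len) {
        ssize_t sent = send(sock, buf + done, len - done, 0);
        if (sent > 0) {
            done += (size_t)sent;
            continue;
        }
        if (errno == EWOULDBLOCK || errno == EAGAIN)
            break;   /* send buffer full - keep the rest and retry next tick */
        if (errno == ENOBUFS)
            break;   /* the failure mode described above: the stack itself
                      * appears to be out of buffer memory, not just this
                      * socket's send queue */
        return -1;   /* genuine error */
    }
    return (int)done;
}
```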
Chris Hall (132) 3560 posts |
Could you please explain what vncserv does? Otherwise an ordinary user won’t know what symptoms to look out for. Does it get used for random read/write (BGET/BPUT) and/or LOAD/SAVE over the network (LanMan98/LanManFS/ShareFS), for example? |
Colin (478) 2433 posts |
Doing large sends over sockets generally isn’t a problem – LanManFS and ShareFS can do transfers > 1.7MB/s. The problem may be that you are running from an interrupt context and callbacks aren’t being triggered. To eliminate this you could try using TickerV to queue a callback, and then keep doing a non-blocking send in the callback until EWOULDBLOCK – ensuring you don’t get re-entered by another callback. The size of the write shouldn’t matter if you are non-blocking. |
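A rough sketch of the pattern Colin describes. The queue_callback() and socket_send_nonblocking() helpers are hypothetical stand-ins for the TickerV/OS_AddCallBack plumbing and the non-blocking send; the point is the re-entrancy guard and draining the socket until EWOULDBLOCK.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical glue - stand-ins for the RISC OS callback plumbing and the
 * non-blocking socket send (bytes sent, or -1 with errno set). */
extern void queue_callback(void (*fn)(void));
extern int  socket_send_nonblocking(const char *data, size_t len);

static const char   *pending;                    /* data still waiting to go out */
static size_t        pending_len;
static volatile bool callback_queued = false;    /* re-entrancy guard */

/* Runs outside interrupt context: keep sending until the stack refuses
 * more data (EWOULDBLOCK) or the pending buffer is empty. */
static void drain_callback(void)
{
    callback_queued = false;

    while (pending_len > 0) {
        int sent = socket_send_nonblocking(pending, pending_len);
        if (sent > 0) {
            pending     += sent;
            pending_len -= (size_t)sent;
        } else if (sent < 0 && errno == EWOULDBLOCK) {
            break;               /* socket full - wait for the next tick */
        } else {
            break;               /* hard error - handled elsewhere */
        }
    }
}

/* Called every centisecond from TickerV (interrupt context): just queue the
 * callback, never touch the socket here. The guard stops a second callback
 * being queued while one is already outstanding. */
static void ticker_handler(void)
{
    if (!callback_queued && pending_len > 0) {
        callback_queued = true;
        queue_callback(drain_callback);
    }
}
```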
Jeffrey Lee (213) 6048 posts |
VNC server. http://www.phlamethrower.co.uk/riscos/vnc_serv.php (Not quite sure why I’ve got it labelled as vnc_serv on my website, I’m fairly certain the module name is ‘vncserv’!) After neglecting its performance for many years (always figured the main bottleneck was scanning the screen for changes) I’ve been prompted to have another look and it turns out that there’s a fair bit of work which can be done to make it better.
It’s already doing all its processing in a callback from TickerV. The desktop is responsive while the sockets are blocked, and some debug code shows that during that time the server callback is still running and trying to send data.
It does matter – the network stack will only take in as much data as will fit into the buffer. Anything which won’t fit in the socket buffer, you have to keep yourself and try sending again later. If you’re calling Socket_Send from a callback running off of TickerV, the maximum throughput you’ll be able to achieve (in bytes/sec) is 100 times your buffer size. |
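To put numbers on that relationship (a throwaway illustration, not vncserv code): with one send opportunity per 100Hz tick, the ceiling is simply buffer size × 100, which reproduces the 1.7MB/s and 12.8MB/s figures quoted earlier in the thread.

```c
#include <stdio.h>

int main(void)
{
    /* Best case is one full socket buffer accepted per centisecond tick,
     * so throughput = buffer_size * 100. Rough arithmetic (1MB ~ 1000KB),
     * matching the figures used in the posts above. */
    const int ticker_hz = 100;
    const int sizes_kb[] = { 17, 128, 256 };   /* default, and the two sizes tried */

    for (int i = 0; i < 3; i++) {
        double mb_per_sec = sizes_kb[i] * ticker_hz / 1000.0;
        printf("%3dKB buffer -> %.1f MB/s max\n", sizes_kb[i], mb_per_sec);
    }
    return 0;
}
```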
David Feugey (2125) 2709 posts |
Could it explain my ‘no socket left’ errors on the webserver, after several thousand requests? |
Colin (478) 2433 posts |
I thought we were talking about mbuf exhaustion, in which case the size of the send doesn’t matter. You can do 16 16KB sends or 1 256KB send – it shouldn’t make any difference; the system will only accept so much before blocking or returning EWOULDBLOCK. All non-blocking transfers do this – keep sending until EWOULDBLOCK and then multitask. So any use of non-blocking send would cause mbuf exhaustion if it was a bug with socklib, and you have to ask what is different about your situation. It seems to me that the backend is overrun. EtherUSB tx used to have a system of queuing mbufs and kicking the buffer in the background from a callback. It caused problems with the Pi and faster machines, grinding them down to a virtual halt, and it doesn’t work this way any more. One thing I did note when looking at EtherUSB was that ERR_TX_BLOCKED seems to be ignored by sockets, so it was of no use for flow control. |
Colin (478) 2433 posts |
That may be down to the server initiating the socket close. The side of the connection that initiates the close has to wait for it to actually close. I think early HTTP standards had the server closing the connection after serving a web page, and later standards had the client close the connection. |
Steve Pampling (1551) 8173 posts |
You might recall¹ comments I made when you released the updated VNC server relating to possible buffer issues that were eased when the (PC client) refresh/update rate was reduced.

¹ Assuming a 100% memory retention for minute details. |
Jeffrey Lee (213) 6048 posts |
You may be right – some testing on my Iyonix didn’t reveal any problems there (although on the first attempt the VNC client did disconnect with an error message caused by some kind of data corruption).
Buffering of mouse events? That problem is still there (if the client spams mouse events at high speed the mouse movement on the server won’t keep up). At the moment I’m just working around it by telling the client to not be so silly, but I guess I should have a go at fixing it properly at some point. |
Jeffrey Lee (213) 6048 posts |
I think I might have discovered the cause of this problem now – after finding a copy of the DCI 4 spec that fell off the back of a server, it turns out that there’s a Mbuf_Memory SWI which controls the maximum amount of memory MbufManager can use. Default setting appears to be 256K, so a 256K socket send buffer (as I was trying) would clearly be capable of consuming the entire quota, preventing other buffers from functioning correctly. |
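A defensive sketch of how a client of the stack might cope with that, assuming (rather than knowing) a 256KB MbufManager quota. The real fix would presumably be to raise the quota via Mbuf_Memory or simply pick a smaller send buffer; the constant, the halving heuristic and the helper below are illustrative only, and the Mbuf_Memory register interface isn't shown because it isn't documented in this thread.

```c
#include <sys/types.h>
#include <sys/socket.h>

/* Assumed MbufManager quota (the DCI 4 default mentioned above). In reality
 * this could be queried or raised via the Mbuf_Memory SWI. */
#define ASSUMED_MBUF_QUOTA   (256 * 1024)

/* Leave at least half the quota free for receive-side mbufs (TCP ACKs etc.)
 * rather than letting one socket's send buffer consume all of it. */
static int set_safe_sndbuf(int sock, int wanted)
{
    int limit  = ASSUMED_MBUF_QUOTA / 2;
    int sndbuf = (wanted > limit) ? limit : wanted;

    return setsockopt(sock, SOL_SOCKET, SO_SNDBUF, &sndbuf, sizeof(sndbuf));
}
```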
Andrew Conroy (370) 740 posts |
There is a much later version (v4.07, I think) available if you sign the relevant ROOL NDA and then let ROOL vet any code you might produce using it thereafter. |
Steffen Huber (91) 1954 posts |
Hiding specs behind NDAs has to be one of the more braindead things Acorn did. A bit sad to see that ROOL continues that “tradition”. Or am I missing something? |
Jeffrey Lee (213) 6048 posts |
I think it’s ANT who are to blame for that, not ROOL. https://www.riscosopen.org/forum/forums/4/topics/3647#posts-46805 |
Colin (478) 2433 posts |
As I see it the main problem with the internet stack is that packets are read under interrupts (creating mbufs for each packet) and these mbufs are consumed on callbacks. So large transfers create lots of mbufs in the interrupt until mbufs are exhausted. This is what happens with ShareFS. With LanManFS the maximum transfer size is determined by the server: Linux servers (NAS) tend to use large buffer sizes, Windows uses a small buffer. Socket sends are essentially done in the foreground and are dependent on mbufs being available – which there won’t be if incoming packets have exhausted mbufs. I think the solution is for the interrupt to queue a callback and read the device from the callback – but it seems a lot of work to find out if that is a good idea. |
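For illustration, a rough shape of the restructure Colin suggests: the interrupt only queues work, and the mbuf allocation and packet reads happen later in a callback, backing off when the mbuf quota is hit. All of the device and mbuf helpers here are hypothetical stand-ins, not the actual DCI driver interface.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the driver and stack interfaces. */
extern bool   device_has_rx_data(void);
extern size_t device_read_packet(void *buf, size_t max);  /* 0 when empty */
extern void  *mbuf_alloc(size_t len);                     /* NULL if quota hit */
extern void   deliver_to_stack(void *mbuf, size_t len);
extern void   queue_callback(void (*fn)(void));

static void rx_callback(void);
static volatile bool rx_callback_queued = false;

/* Interrupt handler: no mbuf allocation here - just acknowledge the device
 * and queue a callback to do the real work outside interrupt context. */
static void rx_interrupt(void)
{
    if (!rx_callback_queued) {
        rx_callback_queued = true;
        queue_callback(rx_callback);
    }
}

/* Callback: allocate mbufs and pull packets while both are available.
 * If the mbuf quota is exhausted, stop and try again on the next kick
 * instead of starving the rest of the stack. */
static void rx_callback(void)
{
    rx_callback_queued = false;

    while (device_has_rx_data()) {
        void *m = mbuf_alloc(1536);           /* one Ethernet-sized frame */
        if (m == NULL)
            break;                            /* quota hit - back off */
        size_t len = device_read_packet(m, 1536);
        if (len == 0)
            break;
        deliver_to_stack(m, len);
    }
}
```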