Recovering from a broken TCP connection
Dave Higton (1515) 3534 posts |
I’m not entirely happy with the thread’s title, but here goes anyway. My heating control apps run on a Raspberry Pi and communicate with a Maxcube (Ethernet to wireless gateway, proprietary to the radiator valves). They get their mains from different feeds, so one can go off but the other remains on. A couple of times in the last few days, the Pi has lost its mains while the cube remained powered. When the Pi powers up again, it can’t connect to the cube. This afternoon, after the second occurrence (caused by my clumsiness), I sat there at a VNC terminal re-trying every minute for over 10 minutes, without success. Eventually I gave up and came back a while later, whereupon it came up immediately. The cube must be keeping open its end of the old TCP connection, but the Pi end has been irretrievably lost. Does it do any good to keep re-trying, or does each retry restart a timer that has to time out before a new connection can be established? Whatever the right solution is, I have to automate it. |
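For the automation part, a retry loop along these lines might be a starting point. It is only a sketch, assuming a Python environment with the standard `socket` module; the function name `connect_with_retry`, its parameters and the address in the usage note are hypothetical, not anything the cube or the existing apps define. Each attempt opens a fresh socket, so the client side gets a new dynamically assigned port every time.

```python
import socket
import time


def connect_with_retry(host, port, attempts=30, delay=60.0, timeout=10.0,
                       sleep=time.sleep):
    """Try to open a TCP connection, pausing `delay` seconds between attempts.

    Returns a connected socket, or None if every attempt failed.
    All names and default values here are illustrative assumptions.
    """
    for attempt in range(1, attempts + 1):
        try:
            # create_connection makes a new socket on each call, so every
            # attempt leaves from a fresh ephemeral source port.
            return socket.create_connection((host, port), timeout=timeout)
        except OSError as exc:
            print(f"attempt {attempt} of {attempts} failed: {exc}")
            if attempt < attempts:
                sleep(delay)
    return None
```

It could be called as `sock = connect_with_retry("192.0.2.10", 12345)`, with the placeholder address and port replaced by the cube's real ones, and the caller checking for `None` before carrying on.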
Leo Smiers (245) 56 posts |
As you said, the most likely explanation is that the Maxcube thinks the other end is still open and functioning. The quickest way out of this is to reboot the Maxcube, but that might not be practical. I suspect that the protocol has some kind of alive mechanism and that either the Maxcube or the Pi has to send a message/response sequence. You could check whether the interval of this mechanism can be configured in the Maxcube and then lower it to a more acceptable value (minutes instead of tens of minutes). I do not think that re-trying will restart any timer in the Maxcube, so you can try to open your connection as often as you want. |
Chris Hall (132) 3559 posts |
One thing I would do is to make sure that each unit, which will presumably obtain a DHCP address from your router, always gets the same IP address by using a MAC address DHCP reservation. One fewer thing to go wrong. |
Jon Abbott (1421) 2651 posts |
When a TCP connection is lost, invariably the session is closed. Some software will recover, but it will rely on both ends retaining the sequence number. In your case, the Pi will be attempting to start a new session with a fresh sequence number, so the Maxcube should drop the existing session and start a new one. Behaviour will, however, depend on the TCP source/destination port, be it fixed or dynamic. If it’s fixed ports, it may have to time out if the device doesn’t realise the sequence has reset. For dynamic ports, each new sequence should get its own session via a new port, while the existing one times out. Long story short, there’s possibly a timeout involved, if the problem isn’t elsewhere, like DHCP or DNS after power is recovered. |
Dave Higton (1515) 3534 posts |
No matter how hard I try, I always miss out some important information. The cube is a server. Since it remained powered up, it retained its IP address, but anyway it has a constant DHCP allocation. It is accessed via a fixed port, unsurprisingly. The Pi has a static IP address, but tries to make the connection from a dynamically assigned port. |
Leo Smiers (245) 56 posts |
Because the Pi lost its power, the TCP/IP connection wasn’t closed cleanly. Because of that, the cube has not detected that the client has disappeared, so its socket is not closed. The cube software can now only detect that the client is no longer alive if it has not received a message for a certain period. Mostly this is implemented by keep-alive messages/responses. There is nothing that you can do from the Pi other than wait for the cube to close its socket and become willing to accept a new connection. |
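To illustrate the kind of mechanism described above, here is a sketch of how a server whose code you control could ask the TCP stack itself to probe an idle peer. This is TCP-level keepalive, which is only one way of doing it; whether the cube uses this, an application-level heartbeat, or nothing at all is not known from this thread. The function name `enable_keepalive` and the timing values are assumptions, and the three tuning options exist only on some platforms (Linux among them), hence the `hasattr` checks.

```python
import socket


def enable_keepalive(sock, idle=60, interval=10, count=3):
    """Turn on TCP keepalive probes for an already created socket.

    idle:     seconds of silence before the first probe is sent
    interval: seconds between unanswered probes
    count:    unanswered probes before the connection is dropped
    The values are illustrative, not taken from any real device.
    """
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    # The per-socket tuning options below are platform specific,
    # so only set the ones this platform exposes.
    if hasattr(socket, "TCP_KEEPIDLE"):
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPIDLE, idle)
    if hasattr(socket, "TCP_KEEPINTVL"):
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPINTVL, interval)
    if hasattr(socket, "TCP_KEEPCNT"):
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPCNT, count)
    return sock
```

With values like these, a peer that vanished without closing its end would be noticed after roughly `idle + interval * count` seconds (about a minute and a half here) on a platform that honours all three options, rather than after whatever default the stack uses.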