LanManFS losing directory contents after a while
Colin (478) 2433 posts |
If it wasn’t for the fact that reinitialising LanManFS fixes the problem I’d say that’s what it looks like – the odd thing is that the server returns any files at all. There are instances of the continuation process working after the reset – to list the previous batch of filenames to the continuation that failed. I just have to figure out what the difference is between enumerating before and after the reset. |
Colin (478) 2433 posts |
Can anyone try LanManFS_02.zip? It should fix the problem. It won’t fix the 40 second delay – that is caused by a reply timeout which is effectively used to detect a remote disconnection – I may look into that. I think the more frequent timeout ping fix may have stopped the remote server timing out LanManFS, so stopping reconnections, and this fix should stop the problem if the remote end disappears temporarily. I think the ‘ping’ may not be necessary as reconnecting should now work. |
Will Ling (519) 98 posts |
Getting there… test-02 survives the reconnect but, unfortunately, after that, mounting another share from a different device (or in fact dismounting and re-mounting the same share) brings the problem back. I think the more regular ping is a good thing: it makes it far less likely that the remote end will drop the connection at all, which avoids the timeout on reconnect. In normal use it was nice not to see the hourglass every time I returned to my Iyonix while testing over the last week. It may not need to be quite so regular, though I’ve no idea what the optimum would be, unless of course you can find a way to make the disconnect detection faster. I’d assume that on a Samba restart the connection closes cleanly and detectably; hitting the reset button less so. |
Colin (478) 2433 posts |
Ok try LanManFS_03.zip. Hopefully that will work ok. |
Will Ling (519) 98 posts |
Sorry, test-03 seems the same… Hopefully to be clear, |
Colin (478) 2433 posts |
How about LanManFS_04.zip? This version should also remove the 40 sec timeout if the remote server shuts down gracefully. |
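As a rough illustration of why a graceful shutdown can be noticed instantly while a hard reset still needs the reply timeout, here is a minimal sketch using plain stream sockets. It is not LanManFS code: `probe` is a hypothetical helper, and a local `socket.socketpair()` stands in for the connection to the server. A peer that closes cleanly makes the next read return end-of-stream at once; a peer that simply vanishes sends nothing, so only a timeout reveals it.

```python
import socket


def probe(sock, timeout):
    """Classify the peer's state from one non-consuming read (hypothetical helper)."""
    sock.settimeout(timeout)
    try:
        # MSG_PEEK leaves any pending data in the buffer for the real reader.
        data = sock.recv(1, socket.MSG_PEEK)
    except socket.timeout:
        return "silent"   # nothing arrived: peer may be idle, hung, or gone
    except ConnectionError:
        return "reset"    # the connection was torn down abruptly
    return "closed" if data == b"" else "data"


# Graceful shutdown: the far end closes, so the read returns end-of-stream at once.
near, far = socket.socketpair()
far.close()
graceful = probe(near, timeout=5.0)

# Unresponsive peer: the far end stays open but says nothing, so we wait out the timeout.
near2, far2 = socket.socketpair()
unresponsive = probe(near2, timeout=0.05)

print(graceful, unresponsive)
```

In this sketch the first probe returns without waiting for its 5 second timeout, while the second only returns once its timeout expires, which mirrors the difference between a clean server restart and pressing the reset button.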
Will Ling (519) 98 posts |
Loving the instant re-connect :-) Sadly though, although test-04 makes it harder to break, it can still be got into the bad state. Instead of the last two steps above, if I mount a different share from my NAS, that shows the problem, which then seems to carry back to the first share when I re-open sub directories. |
Colin (478) 2433 posts |
The problem is that there are only 2 search contexts regardless of how many windows are open. One is used to verify the path and the other to download the filenames. The 2 searches are interleaved. Each search is given a search ID by the server. When the server is restarted the search contexts are not re-initialised and both searches end up with the same search ID. After the search which verifies the path, because the two searches now share an ID, the continuation of the second search is in the wrong directory, so the continuation file isn’t found and the directory gets truncated. I can’t just initialise the contexts every time as then the search ID is never closed on the server. I’ve been trying to find somewhere to initialise the search contexts when the server has disconnected but it’s proving to be tricky. |
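The truncation described above can be reproduced with a toy model. This is only a sketch under assumptions, not the LanManFS source or real SMB: `ToyServer`, `list_directory`, the `TREE` layout and the batch size are all hypothetical, and the caller passes the search IDs in directly so that a collision can be forced (a real server chooses the ID itself, and how the two contexts come to share one after a restart is not modelled here).

```python
class ToyServer:
    """Hypothetical stand-in for a server's search table (not real SMB)."""

    def __init__(self, tree):
        self.tree = tree      # directory path -> ordered list of names
        self.searches = {}    # search ID -> directory being enumerated

    def find_first(self, sid, directory, batch):
        # Assumption: the caller supplies the ID so a collision can be forced.
        self.searches[sid] = directory
        return self.tree[directory][:batch]

    def find_next(self, sid, resume_after, batch):
        names = self.tree[self.searches[sid]]
        if resume_after not in names:
            return []         # continuation file not found: the search ends early
        start = names.index(resume_after) + 1
        return names[start:start + batch]


def list_directory(server, directory, list_sid, verify_sid, batch=2):
    """Download a listing, interleaving a path-verification search between batches."""
    parent, _, _leaf = directory.rpartition("/")
    names = []
    chunk = server.find_first(list_sid, directory, batch)
    while chunk:
        names.extend(chunk)
        # Verification search: looks in the parent directory for the leaf name.
        server.find_first(verify_sid, parent, batch)
        # Continuation of the listing search, resuming after the last name seen.
        chunk = server.find_next(list_sid, names[-1], batch)
    return names


TREE = {"share": ["docs"], "share/docs": ["a", "b", "c", "d", "e"]}

healthy = list_directory(ToyServer(TREE), "share/docs", list_sid=1, verify_sid=2)
collided = list_directory(ToyServer(TREE), "share/docs", list_sid=1, verify_sid=1)

print(healthy)
print(collided)
```

With distinct IDs the whole directory comes back. When both contexts carry the same ID, the verification search repoints that ID at the parent directory, the continuation cannot find its resume name there, and the listing stops after the first batch.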
Colin (478) 2433 posts |
I can’t repeat the problem with 04 though I had seen the problem with previous test versions. This is exactly what I did. There are two shares on the same server – share1 and share2. In my case the server is samba on Armbian.
If at stages 3 and 6 I browse a few sub directories instead of just opening the root directory it makes no difference. These tests showed a problem in earlier test versions for me. I gather that you are testing on a Raspbian server and a NAS. Do you get the problem on both servers? If the problem occurs on the Raspbian server, do you know how to use Wireshark? If so, a Wireshark capture file taken once the directory shows the problem would be useful. If not, I’ve uploaded a debug version of 04, LanManFS_04_debug.zip. The reporter output when it fails may be useful to see what is different about your server. |
Will Ling (519) 98 posts |
I get missing files with both. Perhaps it doesn’t help that I copied the directory tree I was testing from one share to the other, so they are the same path, though I assume the path checking includes the share name?
I don’t know if it is significant, but I’ve never seen an issue on the root directory. |
Colin (478) 2433 posts |
I’m also testing with a copy of the tree on the other share. I don’t have 2 servers so am testing with both shares on the same server. I’ve just tried your list, substituting my second share on the same server for your second share on a different server, and it works fine. I’ll have to see if I can get another version of Linux working. I’ll look into your report tomorrow.
Neither have I. I just found that before restarting the server I didn’t need to browse directories. |
Will Ling (519) 98 posts |
I’ve set up a Pi 3, which seems able to run Wireshark, in place of the Pi 1 I was using, and I’ve saved the capture, hopefully in the form you need. I did the following to minimise the amount of data.
And, just to say, I’ve not managed to find issues using one server with two shares. |
Colin (478) 2433 posts |
Try LanManFS_05.zip then. I was able to replicate it with two servers so this fixes it for me. |
Will Ling (519) 98 posts |
That’s the badger! I’ve tried all manner of mounting, dismounting and resetting, using multiple shares over three servers. I cannot fault it. |