LanManFS losing directory contents after a while
Jeffrey Lee (213) 6048 posts |
A few months ago when I was working on something I think I did have some success getting wiresalmon working to allow me to perform a capture on RISC OS – possibly all it needed was a recompile. |
Steve Pampling (1551) 8170 posts |
Frequently the NAS units still support SMBv1 and, as Jeffrey noted, MS have been stealthily switching off SMBv1 in Windows 10, so Windows 10 will behave differently from Win 7 & 8 (which use SMB2 and fall back to SMB1) and from the pre-Win7 builds that were SMB1 by default. I don’t decry the switch-off, as it pushes increased security on people. What I do think is bad is its stealthy nature: I believe it should be done with a pop-up notification telling you that you should do it, repeated at regular intervals, with no way to kill the nag. |
Jeff Blyther (1856) 47 posts |
OK, after resetting the system I hit the filer hard to see if I could make it go wrong more quickly (just opening and closing lots of directories for a few minutes), but no luck: it wouldn’t go wrong. However, on returning to the system after 15 minutes or so, a click to open a directory caused a long hourglass pause of about a minute, and when the filer window opened its contents were wrong. Does LanMan do anything in an idle moment? Later on I will reset the system and find out whether a pause after a lot of usage makes it go wrong, and as Rick has asked about other computers using the NAS, I will test for that as well. Colin also asked whether it’s random files/folders that are not being displayed (to me it looked random). This time I kept a “cat” record of the directories, and it shows that the missing files are not random: the two cat lists are identical up to the point where the wrong list ends (what order does cat list files in?). |
André Timmermans (100) 655 posts |
Cat depends on the order in which the underlying filesystem provides entries; usually they are provided in the order in which the directory entries are stored in the directory structure on the disk. On FileCore-based disks (i.e. ADFS, SCSIFS, …) the directory entries are kept in sorted order. For FAT32 disks (DOSFS, Fat32FS) new entries are just added to the end of the list unless an entry which was marked as deleted can be reused, so cat basically lists the entries in the order they were created on disk. For LanManFS it will depend on the server on the remote side, but I guess they use the same principles. The filer basically requests the directory list like cat does but sorts the received list, which makes it more difficult to understand what happens. So we now have a first hint: when things go wrong, LanManFS returns truncated directory content lists. The long hourglass pause you mention could indicate a timeout of some kind: a network timeout maybe, or some kind of corruption which prevents it from requesting the next parts of the directory listing. |
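The observation that the good and bad cat lists are identical up to the point where the bad one ends can be checked mechanically. Below is a minimal sketch in Python (not RISC OS code; the function name and the sample names are made up for illustration), assuming each listing has been saved as a list of names in the order the filesystem returned them:

```python
def truncation_point(good, bad):
    """Return the index at which `bad` stops if it is a strict prefix of `good`.

    Returns None if `bad` is not a shorter prefix of `good`, i.e. the
    difference is something other than a truncated listing.
    """
    if len(bad) < len(good) and good[:len(bad)] == bad:
        return len(bad)
    return None


# Hypothetical listings, in the order the server returned them.
good = ["Apps", "Docs", "Images", "Music", "Video"]
bad = ["Apps", "Docs", "Images"]

print(truncation_point(good, bad))   # 3: listing was cut off after 3 entries
print(truncation_point(good, good))  # None: nothing missing
```

This only works on unsorted cat output; the filer sorts what it receives, so a truncated list would show up there as scattered gaps rather than a clean cut.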
Colin (478) 2433 posts |
That makes sense,
or
to list the directory or current directory instead of
Do you use !Reporter? If so, you could try this debug version of LanManFS, which will output debug data to a Reporter window. If you are going to try the test module, do you have a folder that always shows the problem? If so, could you:
Before loading Omni
1) run !Reporter
You should now have 4 files: the Reporter window and directory listing when there is no problem, and the Reporter window and directory listing when there is a problem. Could you send me the files at the email address on this web page and I’ll see if it sheds any light on the subject. |
Jeff Blyther (1856) 47 posts |
Thanks Colin. Last night I tried various ways to make it go wrong quickly, but with no consistent success. The only thing that does seem consistent is that it always goes wrong after it’s had a pause (10 minutes or longer) from me trying to make it go wrong, but that could be chance. |
Dave Higton (1515) 3526 posts |
There seems to be a timeout of some sort, after which something behaves differently in some way. Repeatedly poking it, even at 60 minute intervals, seems to me to be likely to prevent the problem from being observed. The timeouts look like they could be different on different systems, which suggests that the timeout may be in the server. Dunno. Just ideas. |
Colin (478) 2433 posts |
It does an ‘Anti IdleOut’ callback every so often to keep the link alive. |
Chris Evans (457) 1614 posts |
I wonder what happens if after the problem occurs you try saving a file to the directory that is not being listed correctly. Does it error? Is it there next time you are able to read the directory? |
Martin Avison (27) 1494 posts |
I have now run my LMcheck program on my Iyonix for 12 hours at a time, and on my RPi3 it is still running after 24 hours, all without a problem (yet). They are both using RO5.24 with LanManFS 2.61 and OS_GBPB to list 200+ directories and 33K unchanging files on my Synology NAS every hour. The Iyonix has been used, but the RPi is doing nothing significant. Have there been any confirmed problems using a Synology? Or an Iyonix? Or OS_GBPB? My program LMcheck is now available if anyone else wants to try it. I am also now running Colin’s debug version of LanManFS on my Iyonix … but until I have a problem the output is no real use :-(( |
Colin (478) 2433 posts |
Martin, your app doesn’t do the same as the filer when enumerating a directory. The filer just reads the directory; it doesn’t read subfolders. I don’t know if it makes any difference, though. |
Martin Avison (27) 1494 posts |
LMcheck will indeed enumerate subfolders if there are any, because I wanted to be able to check as many folders as practical. In my case here it is reading many folders at the lowest level with lots of files and no sub-directories. I can easily try using it on a single directory with just 900+ files, if you think it a test worth doing. Until we discover a more reliable way of provoking the problem, debugging is difficult. It is playing hard to pin down … but will probably be bloody obvious when it is found! |
Will Ling (519) 98 posts |
That was interesting… I’ve been checking the folders I knew I had an issue with in the past, trying to provoke the problem, with no luck. I then ran LMcheck (thanks Martin), and ran into the (still to be looked into) directory not found error on enumerating files within a folder with too many spaces/special chars in the name. At that point, on reopening the folders I’d just checked, there were files missing. 3 levels down from root, each folder only listed 14 items (all have more). Reinit LanManFS and they are restored. Edit: just realised I’m still on 29th April 5.25. I’ll get that up to date before doing more tests. |
Martin Avison (27) 1494 posts |
Did LMcheck and the Filer agree?
What is too many? LMcheck on my Pi is still running ok after 30+ hours, so I am starting to think that it is not going to provoke the error with those test conditions. Which is interesting, but not directly useful! |
Will Ling (519) 98 posts |
I didn’t check the output from lmcheck, it left the log file open when it died each time and never completed a run. Had to wait a minute between runs to open a different file. I think more than 2 is too many. Can’t check now but “a / b” will probably do it. I managed about 4 fails in an hour so hopefully I can test more tonight. |
Martin Avison (27) 1494 posts |
@Will: I had not realised LMcheck had failed. If you send me the error message I will improve the error handling! My Pi has just passed 48 hours running LMcheck without a problem, so I have concluded that my conditions are not going to provoke the problem, and I have stopped it. If I can think what to change I will try again! |
Rick Murray (539) 13840 posts |
Basically, it knows you are looking. |
Rick Murray (539) 13840 posts |
More usefully, what does LMCheck actually do? It seems to me, from the above, that longish periods of idle may be a contributing factor? Maybe LMCheck checks too much, or too often? |
Martin Avison (27) 1494 posts |
LMcheck starts from a configured directory, and enumerates all entries within it using OS_GBPB,10. If it finds a directory, it enumerates that (recursively). All entries are listed to a timestamped file, and counted. After a configured time (I have been using 60 minutes) it does it again, raising an error if the counts have changed. This seemed to match the reports of missing files in directory lists after a period of inactivity. However, using LMFS v2.61 to my Synology from my RPi it lasted 48+ hours without faltering. My Iyonix lasted 12 hours on two days. So somehow it is avoiding the problem. But, as you say, it knows I am looking! |
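For anyone wanting to run a similar check from another machine on the same share, the approach described above can be sketched roughly as follows. This is a Python analogue written from the description only, not LMcheck's actual code: it uses `os.scandir` rather than OS_GBPB, omits the timestamped log file, and the function names and the `passes` parameter are invented for the example.

```python
import os
import time


def count_entries(root):
    """Recursively count (files, directories) below `root`."""
    files = dirs = 0
    with os.scandir(root) as entries:
        for entry in entries:
            if entry.is_dir(follow_symlinks=False):
                dirs += 1
                sub_files, sub_dirs = count_entries(entry.path)
                files += sub_files
                dirs += sub_dirs
            else:
                files += 1
    return files, dirs


def check_once(root, previous=None):
    """Count entries and raise if they differ from the previous pass."""
    current = count_entries(root)
    if previous is not None and current != previous:
        raise RuntimeError(f"Counts changed: {previous} -> {current}")
    return current


def run(root, interval_seconds, passes):
    """Repeat the check `passes` times, sleeping between passes."""
    previous = None
    for i in range(passes):
        previous = check_once(root, previous)
        if i + 1 < passes:
            time.sleep(interval_seconds)
    return previous
```

Calling `run(path, 3600, 24)` on a mounted share would mirror the hourly test over a day. A longer or randomised interval may be worth trying, given the suspicion in this thread that regular access keeps the connection from idling out.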
Jeff Blyther (1856) 47 posts |
I have to agree with Rick, it knows we’re up to something! I’ve been running Colin’s test module with !Reporter for over a day now and it’s still running OK… |
Will Ling (519) 98 posts |
It wasn’t really LMcheck’s fault. It got “Directory ‘LanMan::Volume_1.$.lt.LtankWeb.Laser Tank Ver 1/60 For RISC OS_files.Newer’ not found at line 70” due to the case where LanManFS presents the folder and content with spaces and / in the name, but then bails out saying not found when asked for the content. |
John Williams (567) 768 posts |
LMcheck obviously frightens the merde out of LanManFS and can now be used as a preventatif. Bit like a patch, but perhaps more like a horse’s head on the pillow? Ok I’m on my third or … Lost count! |
Rick Murray (539) 13840 posts |
I think Schrödinger’s Cat, you think…The Godfather…? <puzzled> |
Rick Murray (539) 13840 posts |
Maybe this is the problem? What happens if you pick a directory and watch only that? |
Will Ling (519) 98 posts |
This is proving more elusive than I thought; I’m really struggling to get a failure now. Anyway, some more detail from when it is in the fail state… Failed listings
Should have been..
Clearly the items not displayed are still accessible, as I was able to *cat $.lt.tank, which wasn’t visible in the $.lt listing, although at the time the filer was showing something different, as below for $.lt. Likewise, in the fail state the lt.tank folder had 14 items in the filer, 7 of which showed in *cat and 7 not, with the rest just missing. Another observation: when items are not displaying, new files and folders can be created, but don’t show up until a reinit of LanManFS. |