LanManFS debugging
Pages: 1 2
André Timmermans (100) 655 posts |
I have been trying to debug “LanMan in use” issues with Colin’s LanMan_timout debug version of the module, but instead of the expected issue I run into aborts. Basically, I play MP3s stored on a USB key connected to my router and at some random point I get an abort. I have collected the info in Report. File ReportList02 contains the !Reporter file, ShowRegs a register dump of the first abort (somewhere in the SharedCLibrary), and ShowRegs2 a later abort within LanManFS itself. As Jeffrey mentioned the possibility of collecting abort stack dumps, I retried today, which gives the additional execdump file (though no reporter file, since my machine stopped responding before I saved the report). |
André Timmermans (100) 655 posts |
There are also other ways to cause issues with LanManFS, like manipulating archives with SparkFS (I created a new archive, added a folder with one of my projects to it, then attempted to delete files from the “o” subfolder, and it reported errors as if the share didn’t exist) or performing transfers in parallel in both directions. |
Colin (478) 2433 posts |
The showregs2 file is the same bug as Will reported and occurs in debug code (DumpBuffer) which is only used in the debug version for tracing reason codes which are not used in the non-debug version of the module – ie the non-debug version ignores these reason codes. LanManFS_timeout2.zip is a debug version which bypasses the tracing of these reason codes so should stop this particular abort. I haven’t had chance to look at your other reports. |
Colin (478) 2433 posts |
I can’t repeat that at the moment – but then I haven’t had much luck reproducing any problems here. Is this a problem with the timeout version, or was it something you noticed before? In the original version of LanManFS you could have problems with files/dirs that are inside a directory which has characters that need translating. I thought I’d fixed that. |
André Timmermans (100) 655 posts |
I have now tested with the timeout2 version. All these issues are old ones that I have already reported here in the past, and I just thought I would use the debug version to see if it could highlight the problems. Well, I was not expecting aborts instead of the “LanManFS in use” errors which I get with the normal 2.61 version. |
André Timmermans (100) 655 posts |
For the archive manipulation issue, I have no crash but a “File sharing violation” issue. See Report3.zip |
Colin (478) 2433 posts |
Can you repeat the bugs easily? If so, does WireSalmon work for you on a Pi3? I’ve only just got it working on my ArmX6, as it needed alignment exceptions turned off before it would work – it may be the same on a Pi3. If you can get it working and can reproduce the problem easily, could you run WireSalmon, start a capture, do what you need to do to reproduce the bug, then stop the capture? The captures may be quite large if you leave it recording for a long time. Can you then send the captures to me – or make them available on your web site. The Access Violation error is an error returned by the server, so hopefully the WireSalmon output will show if something looks wrong. Thanks. |
André Timmermans (100) 655 posts |
I will give WireSalmon a try tomorrow. |
André Timmermans (100) 655 posts |
Sorry, but WireSalmon doesn’t work on the Pi3. |
Colin (478) 2433 posts |
It doesn’t matter, I think I’ve found the problem – see other thread. Could you try the LanManFS_02 and see if it crashes? I couldn’t replicate any crashes. |
André Timmermans (100) 655 posts |
I tried the LanManFS_04 version; it made no difference, either to the archive manipulation issue or to the crash while playing music from files stored on the NAS. I noticed one thing though with the dump generated for the music playing issue. This is a little part of the dump: fa207b78 : 0000000a : | R0 \ CMHG veneer kernel_swi_regs? According to the Socket_Recv doc: while the crash occurs in the Internet module for instruction STRGEB R3,[R0,#-1]! with R0 = 4469421f. R0 clearly fits within the buffer limits, so either the buffer size provided by LanManFS is incorrect or this buffer was maybe deallocated. |
Colin (478) 2433 posts |
Changes in 04 were unlikely to fix your crashes. How frequent are these crashes? I’ve done a bit of testing streaming 24bit flacs with my IsocPlayer program over LanManFS without problems so far. Are you sure the bug isn’t in your program? The receive buffer location and the len comes directly from the OS_GBPB call as far as I can tell and are just passed through to the socket recv function. |
André Timmermans (100) 655 posts |
Definitely a problem in LanManFS. I retried today and since it didn’t crash the system directly I had time to collect some info. The caller is DiskSample, which for the call tries to fill in dynamic area “DiskSample (Input1)” located at 52fa6000, with data header size (&40) and circular buffer size (&40000); free offset is &10000 and start of filled part is at offset (&20084). This can be seen in the stack trace register list of the dump during the transition from FileSwitch to LanManFS. At some point within LanManFS the stack starts referring to 530e6040 (which is somewhere in another dynamic area) and the Internet module crashes while trying to write to 530e7af8. As you can see there is a jump of &120000 bytes in the buffer address. I should have added the dump to the post but I did not take a copy of it: I had to reboot in order to be able to use NetSurf, and of course the dump got overwritten due to another crash during the shutdown. I will have to reproduce the problem again. |
André Timmermans (100) 655 posts |
Test with your LanManFS_05. Trying to fill from 48bc5040 to 48bd5040 (+&10000). Error block: 80000002 Internal error: abort on data transfer at &FC177A78 R15 = fc177a78 = SharedCLibrary +c9bc = memmove +28c SVC stack: R14_usr = 0006429c = +5c29c in application memory = Task_PollIdle +68 USR stack: End of dump |
André Timmermans (100) 655 posts |
Note that since I noticed that the “Pre-load next track” option was active in DigitalCD, I disabled it to ensure that I only access one file at a time, but I still managed to produce the dump, though here address 44683040 seems to become 44693063, i.e. a change of +&10023 this time. |
André Timmermans (100) 655 posts |
I have been looking at the sources of SMB_Read and noticed: SMB_TxWords0 = fid; SMB_TxWords1 = min(len_left, MAX_RX_BLOCK_SIZE); SMB_TxWords2 = offset & 0xFFFF; SMB_TxWords3 = (offset >> 16 ); SMB_TxWords4 = (len_left); Is the value of SMB_TxWords4 normal? Should it not be “SMB_TxWords4 = SMB_TxWords1;”? |
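A note on the question above, as a sketch only: if these five words form a classic SMB_COM_READ request (an assumption, the thread does not say which command is being built), the MS-CIFS layout is FID, CountOfBytesToRead, the 32-bit ReadOffsetInBytes split across two words, and EstimateOfRemainingBytesToBeRead, so putting the remaining length in the fifth word would be expected. The function name, the struct, the value of MAX_RX_BLOCK_SIZE and the clamping of the fifth word to 16 bits are all hypothetical and not taken from the LanManFS sources.

```c
#include <stdint.h>

/* Hypothetical value; the real constant in LanManFS is not quoted in the thread. */
#define MAX_RX_BLOCK_SIZE 0x8000u

/* Hypothetical container for the five 16-bit parameter words. */
typedef struct { uint16_t w[5]; } read_words;

read_words build_read_words(uint16_t fid, uint32_t offset, uint32_t len_left)
{
    read_words r;
    uint32_t count = len_left < MAX_RX_BLOCK_SIZE ? len_left : MAX_RX_BLOCK_SIZE;

    r.w[0] = fid;                           /* FID */
    r.w[1] = (uint16_t)count;               /* bytes requested in this call */
    r.w[2] = (uint16_t)(offset & 0xFFFFu);  /* file offset, low 16 bits */
    r.w[3] = (uint16_t)(offset >> 16);      /* file offset, high 16 bits */
    /* Estimate of remaining bytes; clamping to 16 bits is an assumption
       made here so that the value fits in one word. */
    r.w[4] = (uint16_t)(len_left > 0xFFFFu ? 0xFFFFu : len_left);
    return r;
}
```

With a file offset of &320000 and &10000 bytes left, this yields a count of &8000, offset words &0000 and &0032, and a clamped estimate of &FFFF.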
Colin (478) 2433 posts |
That’s not where the bug is, it’s in the loop containing SMB_ReadRaw. You have these functions
And this debug output
So you are calling OS_GBPB with a buffer at 48bc5040 and len 00010000. SMB_ReadRaw from the loop in SMB_Read is being called with address 48ce5040. As SMB_ReadRaw is done in 0x8000 byte chunks it should not be called with an address >= 48bd5040. |
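The bounds argument in the post above can be checked with a few lines of C. This is only a sketch of the arithmetic: chunk_in_bounds is a hypothetical helper and is not part of LanManFS.

```c
#include <stdint.h>

/* Returns 1 if the chunk [where, where + chunk_len) lies entirely inside
   the caller's buffer [buf, buf + buf_len), otherwise 0. */
int chunk_in_bounds(uint32_t buf, uint32_t buf_len,
                    uint32_t where, uint32_t chunk_len)
{
    if (where < buf)
        return 0;
    uint32_t off = where - buf;   /* distance from the start of the buffer */
    return off <= buf_len && chunk_len <= buf_len - off;
}
```

For the buffer at 48bc5040 with length &10000, chunks of &8000 starting at 48bc5040 and 48bcd040 pass the check, while the observed address 48ce5040 lies &120000 bytes past the start and fails it.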
André Timmermans (100) 655 posts |
Indeed my remark hasn’t anything to do with the SMB_ReadRaw calls from the stack, it was just something curious I noticed. Looking at the differences between the content of fa207c80-8c and of fa207ccc-d8, it looks as if the SMB_ReadRaw loop in the SMB_Read code has increased both “where” and “offset” by &12000 while “len_left” was decreased by &8000. I can think of a way for the loop to continue past its limit if SMB_ReadRaw signals having returned more bytes than was requested: len_left becomes negative, but since it has type “uint” it is seen as a large positive number and the loop continues. If we assume &8000 is the value of “len” in SMB_ReadRaw after its limitation by “if (len>RDRAW_BLOCK_SIZE) len=RDRAW_BLOCK_SIZE;” it would make sense. |
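The wrap-around described in the post above is easy to demonstrate in isolation. The sketch below does not reproduce the SMB_Read loop; it only models its bookkeeping, under the assumption that each raw read reports a fixed byte count n_read. Both function names and the max_calls cap (present only so that the demonstration terminates) are hypothetical.

```c
#include <stdint.h>

#define RDRAW_BLOCK_SIZE 0x8000u  /* chunk limit quoted in the thread */

/* Models a loop that trusts the reported byte count. Returns how far the
   write pointer has advanced after at most max_calls iterations. */
uint32_t advance_unguarded(uint32_t len, uint32_t n_read, int max_calls)
{
    uint32_t len_left = len, where = 0;
    int calls = 0;
    while (len_left > 0 && calls < max_calls) {
        where += n_read;
        len_left -= n_read;   /* unsigned: wraps if n_read > len_left */
        calls++;
    }
    return where;
}

/* Same bookkeeping, but refuses a count larger than what was requested.
   Returns 0 on success and -1 on an over-long read; *advanced receives
   the distance covered before stopping. */
int advance_guarded(uint32_t len, uint32_t n_read, int max_calls,
                    uint32_t *advanced)
{
    uint32_t len_left = len, where = 0;
    int calls = 0;
    while (len_left > 0 && calls < max_calls) {
        uint32_t want = len_left < RDRAW_BLOCK_SIZE ? len_left : RDRAW_BLOCK_SIZE;
        if (n_read > want) {
            *advanced = where;
            return -1;        /* fail instead of writing past the buffer */
        }
        where += n_read;
        len_left -= n_read;
        calls++;
    }
    *advanced = where;
    return 0;
}
```

With len = &10000 and a reported count of &9000 per call, the unguarded version has already advanced &24000 bytes after four calls, well past the end of the buffer, because len_left wrapped to a huge value on the second call. The guarded version rejects the first over-long read.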
Colin (478) 2433 posts |
Yes it looks like SMB_ReadRaw is returning a value in n_read > len_left so that Would you say the crash happens at the end of the music? |
Colin (478) 2433 posts |
Would you like to try LanManFS_Test_06a.zip? I’ve just added some debug output to see if n_read is ever > len_left, and to fail gracefully if it is. Hopefully it will show n_read > len_left. |
André Timmermans (100) 655 posts |
Not quite. According to the dumps, offset (which I assume corresponds to the position in the file) was 320000 and 4F0000, so even with 10000 of length to fill it doesn’t reach the sizes of the first 3 files in my playlist, which have a size of 50xxxx. I have been trying to provoke a new crash just to give it another go, but I am now at the 16th file in the playlist and it still hasn’t crashed, while it usually crashes within the first 2 files. |
Colin (478) 2433 posts |
Could you try this LanManFS_06b.zip? It turns out that SMB_ReadRaw and SMB_WriteRaw didn’t have a re-entrancy guard on them, whereas all other commands go through Do_SMB, which does have a re-entrancy guard. If you are connected to a Windows server, SMB_ReadRaw and SMB_WriteRaw aren’t used, so re-entrancy is checked for when read/writeraw isn’t used. So I’ve added the re-entrancy guard to these functions. It may explain an intermittent problem. Note: it is the re-entrancy guard which triggers the ‘LanMan in use’ errors you said you had earlier. |
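For readers unfamiliar with the idea, a re-entrancy guard of the kind described in the post above can be sketched as a busy flag around the operation. This is a minimal illustration, not the code in Do_SMB: every name here (smb_busy, ERR_IN_USE, guarded_call and the demo operations) is hypothetical, and a plain flag is only adequate on the assumption that the nested call arrives on the same core, for example from a callback, rather than from a truly concurrent thread.

```c
#include <stddef.h>

#define ERR_IN_USE (-1)   /* hypothetical stand-in for the "LanMan in use" error */

static volatile int smb_busy = 0;   /* set while an operation is in progress */

typedef int (*smb_op)(void *arg);

/* Runs op unless another guarded operation is already in progress. */
int guarded_call(smb_op op, void *arg)
{
    if (smb_busy)
        return ERR_IN_USE;   /* nested entry: refuse rather than corrupt state */
    smb_busy = 1;
    int rc = op(arg);
    smb_busy = 0;
    return rc;
}

/* Demo operations: outer_op tries to re-enter the guard while it is held. */
int inner_op(void *arg) { (void)arg; return 42; }

int outer_op(void *arg)
{
    int *nested_rc = arg;
    *nested_rc = guarded_call(inner_op, NULL);   /* simulates a callback re-entering */
    return 7;
}

/* Returns the result seen by the nested call. */
int demo_reentry(void)
{
    int nested_rc = 0;
    (void)guarded_call(outer_op, &nested_rc);
    return nested_rc;
}
```

In this sketch the nested call made from outer_op is refused with ERR_IN_USE, while a later top-level call to inner_op succeeds because the flag has been cleared again.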
André Timmermans (100) 655 posts |
I tested version 06b, it had no effect, still the same crash. |
Colin (478) 2433 posts |
I’ve found the problem. The crash is made worse because the anti-idle happens more often – it’s a re-entrancy issue. Stopping the anti-idle happening stops the crash. The ROM version also crashes while playing, but it takes between 10 and 18 mins into the recording for it to happen. Fixing that crash is easy enough, but it has highlighted a bigger problem: while playing via DigitalCD, doing anything else with LanManFS will cause problems. The problem isn’t specific to DigitalCD, it just makes the problem more apparent. I presume that you are filling the DigitalCD buffer from a callback? |
Chris Johnson (125) 825 posts |
This is interesting. I was using DigitalCD yesterday, playing some mp3s from the NAS via LanManFS. I had several crashes (real crashes needing a big finger on the reset button). In I think two of the crashes I was looking at filer windows on the NAS at the time. Reverting to playing files stored on a local drive gave no problems. This was on a Titanium, but I have always found LanManFS rather flaky for music streaming, probably more so than LanMan98. I’ll certainly give any test versions you produce a good go. |