Ticket #459 (Fixed)Wed Nov 14 12:40:56 UTC 2018
LanManFs corrupting files
Reported by: | Rik Griffin (98) | Severity: | Critical |
Part: | RISC OS: Module | Release: | |
Milestone: | Status | Fixed |
Details by Rik Griffin (98):
This is an odd one.
RISC OS 5.24 on Iyonix. LanManFS 2.61.
Using OmniClient and LanManFS to access a share on a SUSE 42.2 PC.
Saving files to the SMB share seems to work ok.
Loading files from SMB results in corruption, if the file is between 4304 and 4356 bytes in size. The last x bytes in the file are replaced with zeros, where x = 4356 – <file_size>.
Looking in the Linux PC’s logs (journalctl -xb) shows a lot of messages like this:
Nov 14 12:28:09 ruthenium smbd24084: [2018/11/14 12:28:09.955620, 0] ../source3/smbd/reply.c:3863(reply_read)
Nov 14 12:28:09 ruthenium smbd24084: reply_read: requested read size (4320) is greater than maximum allowed (4304/4356)
Note the presence of those two numbers (4304, 4356) in the SMB logs.
Happy to test fixes etc if required.
Changelog:
Modified by Sprow (202) Thu, November 15 2018 - 08:18:48 GMT
LanManFS declares up front a maximum receive size of 3*1452 (MAX_RX_BLOCK_SIZE) when it’s negotiating with the server. So, if the server has allowed it to connect then it doesn’t have much justification to later reject requests of that size. This size was chosen as it matches the default Windows uses, so is hopefully maximally compatible.
A WireShark pcap file showing the initial sign on would be interesting to confirm this.
Modified by Rik Griffin (98) Thu, November 15 2018 - 15:56:48 GMT
- Attachment added: lanmanfs.pcap
Wireshark capture (actually tcpdump) as requested.
Server is 192.168.2.136
Client (RISCOS) is 192.168.2.42
Share name: test
Username: denbridge
Password: fish
Command: tcpdump -w lanmanfs.pcap -na host 192.168.2.42
I started tcpdump, then on the RISC OS side, I mounted the share, opened the root directory, and double clicked the file called “file”, causing it to be loaded into StrongEd. The file is 4320 bytes in size, and it was corrupted with the last 16 bytes being replaced by zeros.
Modified by Sprow (202) Thu, November 15 2018 - 21:26:18 GMT
In the session setup (frame 9) the max buffer of 4356 is declared, and the response (frame 10) says “OK” and no error.
Later when the read is requested of 4320 (frame 41) the server responds with only 4304 bytes (frames 42 & 43).
LanManFS’ data copying loop has an exit condition if the number of bytes delivered < the number requested, assuming that to be EOF, because in all other cases it sends 4356 as declared in the session setup. That logic’s been in SMB.c since the first revision 1.1 in 1998, so it doesn’t appear that a change in the RISC OS side has broken things.
Something’s changed in Samba that’s ignoring the declared buffer (or undersizing it perhaps to allow for overheads?).
Does your server behave differently if “raw” reads and writes are enabled?
Modified by Rik Griffin (98) Fri, November 16 2018 - 09:52:08 GMT
Adding these lines to /etc/samba/smb.conf:
read raw = yes
write raw = yes
and restarting the server had no effect on the problem. Setting those options to “no” also had no effect.
Interestingly, the linux system logs show this when I load the test file from the SMB share:
Nov 16 09:49:40 ruthenium smbd10089: [2018/11/16 09:49:40.880659, 0] ../source3/smbd/reply.c:3863(reply_read)
Nov 16 09:49:40 ruthenium smbd10089: reply_read: requested read size (4320) is greater than maximum allowed (4304/4356). Returning short read of maximum allowed for compatibility with Windows 2000.
So this looks like a deliberate choice on the part of the server. Server version is 4.4.2-9.1, I’m going to try a newer one.
Modified by Rik Griffin (98) Fri, November 16 2018 - 10:37:09 GMT
Here’s the code from smbd, reply.c line 3861…
<pre><code>
/*
- The requested read size cannot be greater than max_send. JRA.
/
maxtoread = xconn->smb1.sessions.max_send – (smb_size + 52 + 3);
Returning short read of maximum allowed for compatibility with Windows 2000.\n",
(unsigned int)numtoread, (unsigned int)maxtoread,
(unsigned int)xconn->smb1.sessions.max_send));
numtoread = maxtoread;
}
</code></pre>
So I’m not familiar with the code, but <pre>xconn</pre> must be some sort of connection context, and we’re loooking at the <pre>max_send</pre> value that LanManFS has sent to the server. Deduct the size of a minimal SMB packet (<pre>smb_size</pre>) and an extra 13 bytes (?) gives us the discrepancy between the requested max size of 4356 and the actual size delivered (4304).
So, the server thinks that the packet header size is deducted from <pre>max_read</pre>, and LanManFS doesn’t?
Modified by Rik Griffin (98) Fri, November 16 2018 - 10:37:46 GMT
Gah, sorry, I’ve messed up the markup in that post. Can I edit somehow?
Modified by Rik Griffin (98) Fri, November 16 2018 - 12:08:01 GMT
- Attachment added: lanman-diff.txt
Wonder if I can get the formatting right. Here’s a patch for LanManFS.c.SMB – also as an attachment.
pre.
- ADFS::Rik2.$.riscos.src-iyo-dev/5/27/tar.TungstenDev.apache.RiscOS.Sources.Networking.Omni.Protocols.LanManFS.c.SMB 2018-11-16 12:03:49.0 0000
++ ADFS::Rik2.$.riscos.apache.RiscOS.Sources.Networking.Omni.Protocols.LanManFS.c.SMB 2018-11-16 12:03:49.0 +0000
@ -111,6 +111,12 @
#define WRRAW_BLOCK_SIZE 16384
#define PRN_BLOCK_SIZE 1024
+/*
+ * SMB packet size as defined in smbd/reply.c
+*/
+#define SMB_SIZE 39
+#define MAX_RX_DATA_SIZE (MAX_RX_BLOCK_SIZE – (SMB_SIZE + 5*2 + 3))
+
/* This structure relies on Norcroft C packing the bytes & words
properly!
*/
@ -2619,7 +2625,7 @
err_t SMB_Read ( int FH, uint offset, ui
while ( len_left > 0 )
{
SMB_TxWords0 = fid;
- SMB_TxWords1 = min(len_left, MAX_RX_BLOCK_SIZE);
+ SMB_TxWords1 = min(len_left, MAX_RX_DATA_SIZE); // was MAX_RX_BLOCK_SIZE
SMB_TxWords2 = offset & 0xFFFF;
SMB_TxWords3 = (offset >> 16 );
SMB_TxWords4 = (len_left);
@ -2648,7 +2654,7 @
err_t SMB_Read ( int FH, uint offset, ui
- if ( n_read < MAX_RX_BLOCK_SIZE ) /* Reached end of file /
+ if ( n_read < MAX_RX_DATA_SIZE ) / Reached end of file */ // was MAX_RX_BLOCK_SIZE
break;
}
Modified by Rik Griffin (98) Fri, November 16 2018 - 12:09:05 GMT
For flips sake, how do you format text? None of what’s described in the alleged help actually works.
Modified by Rik Griffin (98) Fri, November 16 2018 - 12:11:47 GMT
I do have write access to the repository but I haven’t done so in ages, and I don’t want to cock it up.
Therefore, if these changes meet approval, can someone else check it in and at that point close this ticket?
Modified by Sprow (202) Sun, November 18 2018 - 17:50:39 GMT
- Attachment added: load4320_win7.pcap
> For flips sake, how do you format text? None of what’s described in the alleged help actually works.
I don’t know what markup the tickets use either, except that it doesn’t appear to be Textile (nor can you edit a comment, nor preview to see how bad it looks). Stick to attachments!
However, I don’t think your change is making the code any better – you’ve ended up having to bake in lots of magic numbers which are tied to the way a specific server (smbd) is designed. It looks like other clients tripped up https://lists.samba.org/archive/samba-technical… with this workaround, so my suggestion would be that you raise a ticket against Samba to make the logic only apply when the client in Win2k (the client states its name in the session setup). Or apply the patch from JRA in https://lists.samba.org/archive/samba-technical…. Or come up with a loop construct that handles both cases without compromising the (current-ish) Windows case.
For comparison attached is a load of a 4320 byte file in one chunk with Win7: no problem. I could probably dig up a WinXP machine for comparison.
Modified by Rik Griffin (98) Mon, November 19 2018 - 09:48:43 GMT
I agree, it’s a bit of a nasty specific fix. I’ll investigate further and see what I can come up with.
Modified by Rik Griffin (98) Mon, November 19 2018 - 12:52:44 GMT
- Attachment added: lanman-diff.txt
Yeah OK I just realised that:
1 – the server returns the number of bytes read
2 – LanManFS quits out of the read loop if this number is less than the number requested.
3 – The read loop would otherwise terminate when we have read the total required bytes.
So, the fix is even simpler than I thought. Simply delete lines 2651 and 2652, which are the early exit from the read loop if n_read is less than MAX_RX_BLOCK_SIZE.
My concern with this simple change is that if the server returns 0 bytes, we’d get stuck in an infinite loop. So, add a break out of the loop if n_read == 0, and return error code EDATALEN.
Diff attached.
Modified by Rik Griffin (98) Mon, November 19 2018 - 12:57:11 GMT
I really wish I could edit this to cover my embarrassing mistakes!
That doesn’t work either. Back to the drawing board.
Modified by Sprow (202) Tue, November 20 2018 - 09:01:53 GMT
It’s worth looking at what happens if Filer_Action triggers the copy. Two calls result, one for 4096 and another for 512 (which I assume corresponds to FileSwitch direct copying as much as possible, then 1x buffer refill to do the partial).
While I haven’t yet tried your 2nd patch, what would happen is the 512 byte request would be partly fulfilled, which would reduce len_left to 0 and quit the while loop – sounds OK.
What I’m not convinced of is what happens if there’s a discrepancy between the true file size and the one RISC OS believes (ie. something on the server grows or shrinks the file between the time that RISC OS asked for its size and comes to load it). I believe that’s what the apparently superfluous “if ( n_read < MAX_RX_BLOCK_SIZE )” is for.
Modified by Sprow (202) Tue, November 20 2018 - 09:02:24 GMT
- Attachment added: fileraction_win7.pcap
Attached Filer_Action example for reference.
Modified by Bert (7232) Sat, January 12 2019 - 19:11:11 GMT
Rik,
Did you make any more progress?
I was expecting your posting/patch of Mon, November 19 2018 – 12:52:44 GMT to work. What did it end up doing?
Also can you clarify the initial bug report: the “where x = 4356 – <file_size>” bit doesn’t fit my understanding of the bug and doesn’t match the example of the 4320 byte file with 16 bytes corrupted at the end. Should it have been x=<file_size>-4304?
Cheers
Bert
Modified by Christopher Collins (2981) Fri, November 22 2019 - 00:41:55 GMT
I’ve confirmed Rik’s bug exists and have put up a PR with a (highly similar) fix.
The issue lies with the definitions within CIFS itself and a misinterpretation of what the CIFS MaxBufferSize is.
As we’re passing MAX_RX_BLOCK_SIZE as the MaxBufferSize parameter in the SMB_COM_SESSION_SETUP_ANDX request, the spec defines this as: “The maximum size, in bytes, of the largest SMB message that the
client can receive. This is the size of the largest SMB message that the server can send to
the client. SMB message size includes the size of the SMB header, parameter, and data
blocks”.
SMB_COM_READ further makes it clear that the read response (which includes the headers) must fit into the buffer size nominated.
LanManFS treats MaxBufferSize as if it’s the largest permitted data body for the server to send back to us.
As a result, the largest requests from LanManFS exceed the negotiated MaxBufferSize and the server should reject it. Because this bug is pretty common it turns out, Samba truncates the read to fit in the nominated buffer.
Modified by Christopher Collins (2981) Fri, November 22 2019 - 02:34:14 GMT
Even better, it’s highly likely there’s also a bug in samba where it miscalculates the maximum permitted size as smb_size includes the transport (NBT) header, but the transport overheads are excluded from the size calculation.
I’ve made a further patch to use the correct value, and to survive the short-read behaviour in the buggy Samba without violating the spec.
Modified by Sprow (202) Tue, September 01 2020 - 13:02:16 GMT
The RISC OS half of the fix is in LanManFS 2.66, however you’ll need to wait for https://bugzilla.samba.org/show_bug.cgi?id=14443 if using Samba for the other half.
Windows clients are unaffected.
Modified by Sprow (202) Tue, September 01 2020 - 13:03:09 GMT
…sorry that should say clients of Windows. Windows being the server.
Modified by Sprow (202) Thu, June 09 2022 - 07:14:24 GMT
- Status changed from Open to Fixed
And the Samba half of the fix has now been accepted too
https://gitlab.com/samba-team/samba/-/commit/17…
This will appear in “4.16.next, 4.15.next”.
At time of writing that would be 4.16.2 and 4.15.8 since 4.16.1 and 4.15.7 are the current releases.