SMB2/3 protocol
Grahame Parish (436) 481 posts |
I don’t know if it makes sense as I don’t understand the complexities of coding at that level, but would it be possible/practical to make the filename conversion routines available to the rest of the system, as this could help with NFS, FAT32 and other non-native filesystems? |
Dave Higton (1515) 3525 posts |
Yes. Space is an illegal character within RISC OS filenames. 0xa0 is the accepted substitute. |
David J. Ruck (33) 1635 posts |
What is needed is a set of filename conversions which are:-
|
Richard Walker (2090) 431 posts |
Not at all. I find them welcome. This isn’t exactly a heavy-traffic forum! |
Rob McKay (8401) 4 posts |
I’ve dug a bit deeper into Apple’s open source libraries and have found their latest SMB client for MacOS 13 here: https://github.com/apple-oss-distributions/SMBClient/tree/SMBClient-382.60.5 I’ve had a quick look at it and I’m considering swapping my module’s backend from libsmb2 to Apple’s SMBClient purely because of the licensing. I’m not sure how Apple will respond to pull requests but I’m sure that their code is likely to be maintained and tested. |
Clive Semmens (2335) 3276 posts |
Umm. Pretty certainly true, but with a caveat. Here’s an interesting example of why I say that: the latest version of MacOS, Ventura, when it first came out, had a bug in its disc sharing such that while another Mac running Ventura could reliably see shared discs on it, older Macs that could only run older versions of MacOS (never mind any non-Macs) either saw them unreliably or not at all. It took Apple months to issue an upgrade to Ventura that fixed this issue – several upgrades went by without fixing it first. |
Rick Murray (539) 13840 posts |
Well, not actively supporting “old stuff” isn’t a surprise. |
Clive Semmens (2335) 3276 posts |
Oh, I don’t think so – they might not care about cutting off ancient Macs like mine, but any Mac that couldn’t run Ventura was liable to be cut off, some of them not very old at all. And everything that wasn’t Apple. |
Chris Mahoney (1684) 2165 posts |
It surprises me. I’ve filed dozens of bugs with Apple over the past three years and my current score is:
So yes, I’m actually surprised that Apple fixed something! |
Clive Semmens (2335) 3276 posts |
Unless I missed it, that’s the case here too. I didn’t file a bug report; I guess someone or several someones must have. But there’s plenty of mention of the problem in discussion groups. I’ve not checked to see whether there’s any mention of the fix in discussion groups – I just discovered for myself that it’s been fixed. It’s a fix in Ventura – no fix required in Big Sur or Monterey for their end of it. |
Jake Hamby (8915) 21 posts |
I have some good progress to report on the LanManFS integration with SmbX. I’ve worked through all of the issues that require a custom RISC OS design, and I can work on coding those up starting tomorrow. For testing purposes, I’ve decided to strip out all of the old SMB1 client code and create a toy filesystem with hardcoded sample directories and files that will also allow me to test creating and writing/reading files and directories to the RAM cache to make sure that read-ahead and write-behind work as expected (more on that in a second).

I finished my tables to map between what I called “ROFTF-8” (RISC OS Filename Transformation Format 8-bit) and the 16-bit UTF-16 (little-endian) used by SMB. Thanks for the info on mapping 0x20 to 0xa0. There’s a similar mapping trick that I learned that Samba, SmbX, and others know about called “SFM”, for NT Services for Macintosh. You have to use the Internet Archive to read the original Microsoft notice: http://web.archive.org/web/20150315001238/http://support.microsoft.com:80/en-us/kb/117258

SFM was an AFP (Apple Filing Protocol) server for NT 4.0 and Microsoft needed a way to handle files with embedded 0x01-0x1b control characters, as well as “?”, “/”, “\”, “*”, “"”, “<”, “>”, and “|”, and spaces and periods at the end of filenames, all of which were illegal in NTFS. So the SFM convention is to map those characters to Unicode in the 0xf000-0xf027 range. For RISC OS purposes, being able to remap “?”, “<”, and “>” in filenames in the same way everyone else does means there’s no need to escape any characters going in the RISC OS to SMB filename direction.

I did figure out quite quickly that the Acorn 8-bit code page isn’t Windows-1252.
Here’s what the ROM font has for all of the 0x80-0x9F characters:

RISC OS  CP-1252  UTF-16  Comment
-------  -------  ------  -------
0x80     0x80     U+20AC  Euro symbol
0x81     --       U+0174  Capital Latin W with circumflex
0x82     --       U+0175  Small Latin W with circumflex
0x83     --       U+25F0  White square with upper left quadrant (my guess)
0x84     --       U+2718  Heavy ballot "X" mark (via RISC OS manual)
0x85     --       U+0176  Capital Latin Y with circumflex
0x86     --       U+0177  Small Latin Y with circumflex
0x87     --       --      Unmapped ("8^7" filler glyph in one font; blank in others)
0x88     --       U+21D0  Left double arrow (via RISC OS manual)
0x89     --       U+21D2  Right double arrow
0x8A     --       U+21D3  Down double arrow
0x8B     --       U+21D1  Up double arrow
0x8C     0x85     U+2026  Ellipsis
0x8D     0x99     U+2122  Trademark symbol
0x8E     0x89     U+2030  Per mille symbol (O/oo)
0x8F     0x95     U+2022  Bullet symbol
0x90     0x91     U+2018  Left single quote
0x91     0x92     U+2019  Right single quote
0x92     0x8B     U+2039  Left single angle quote
0x93     0x9B     U+203A  Right single angle quote
0x94     0x93     U+201C  Left double quote
0x95     0x94     U+201D  Right double quote
0x96     0x84     U+201E  Lower double quote
0x97     0x96     U+2013  En dash
0x98     0x97     U+2014  Em dash
0x99     --       U+2212  Minus sign? (my guess)
0x9A     0x8C     U+0152  Capital Latin OE ligature
0x9B     0x9C     U+0153  Small Latin OE ligature
0x9C     0x86     U+2020  Single dagger symbol
0x9D     0x87     U+2021  Double dagger symbol
0x9E     --       U+FB01  Latin small ligature "Fi"
0x9F     --       U+FB02  Latin small ligature "Fl"

The conservative behavior would be to only assume the presence of symbols that are in both the RISC OS code page and Windows-1252 (and therefore also in Unicode), and not the “W and Y with circumflex” (those are for the Welsh language, I figured out), double arrows, etc.
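As a rough illustration of that conservative behaviour, here is a minimal Python sketch that maps one RISC OS byte to a Unicode code point using only the rows of the table that also have a Windows-1252 equivalent. The names `SAFE_C1` and `riscos_byte_to_unicode` are made up for this example, and the space/0xa0 substitution and any escaping of unmapped bytes are deliberately left out.

```python
# RISC OS byte -> Unicode code point, limited to the table rows that also
# exist in Windows-1252 (the "conservative" subset described in the post).
SAFE_C1 = {
    0x80: 0x20AC, 0x8C: 0x2026, 0x8D: 0x2122, 0x8E: 0x2030, 0x8F: 0x2022,
    0x90: 0x2018, 0x91: 0x2019, 0x92: 0x2039, 0x93: 0x203A, 0x94: 0x201C,
    0x95: 0x201D, 0x96: 0x201E, 0x97: 0x2013, 0x98: 0x2014, 0x9A: 0x0152,
    0x9B: 0x0153, 0x9C: 0x2020, 0x9D: 0x2021,
}


def riscos_byte_to_unicode(byte):
    """Return the code point for one RISC OS byte, or None if it is not
    in the conservative subset and would need escaping by the caller."""
    if byte < 0x80 or byte >= 0xA0:
        # ASCII, and 0xA0-0xFF which the post says match Windows-1252
        # (and therefore Latin-1 code points) one-to-one.
        return byte
    return SAFE_C1.get(byte)
```

For instance, `riscos_byte_to_unicode(0x8C)` gives the ellipsis U+2026, while 0x81 (W with circumflex) returns `None` because it has no Windows-1252 counterpart.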
Everything is the same from 0xA0 to 0xFF, but 0x80 to 0x9F needs to be assumed to be the RISC OS code page, with the Base64 escaping I described previously applied to anything outside of whatever the user defines as the “safe” range in the 0x81-0x9F area (I think 0x80 for the Euro symbol is the only one that’s the same). BTW, the Base64 encoding would need to be further modified to use, say, “=” instead of “/” for the encoding of %111111, since “/” would be the translation of “.” on the SMB side and I want to avoid ambiguity with filename extensions. Since “=” isn’t needed to pad the end of the Base64, it can replace “/” in the lookup table.

My thinking is that users may have many files on the SMB side with mostly Acorn code page friendly characters and a few emoji or symbols that don’t map, or even filenames that are entirely in a non-Latin language. The user needs to be able to copy-and-paste a representation of that file where the runs of Base64-encoded UTF-16 look like runs of encoded text, and the start and end of the Latin part of the filename is clear. UTF-8 isn’t suitable as an encoding where backwards compatibility is important because you have too much else going on, like 0x20 turning into 0xa0, and the range from 0x81 to 0x9F being unsafe in terms of the user’s font having glyphs for all of the characters. So I think I have all the details of name translation sorted out.

Filename and file content caching is the next big challenge. I figured out I don’t need ModMalloc because the C library malloc()/free() will use the RMA automatically when called from SVC mode, which LanManFS always is in because it’s being called from SWIs. And my only question about the Mbuf Manager is whether it will improve performance to raise the minimum and maximum mbuf allocation size from the current 112 and 1280 bytes, which are tuned for SMB1, to the values that Apple’s kernel mbuf uses, which appear to be 256 and 2048 bytes. 
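The modified Base64 idea can be sketched in a few lines of Python. This is only an illustration under assumptions: the function names are hypothetical, and how an escaped run is delimited inside a filename is not shown because the post doesn't spell that out here. Padding is stripped first, and only then is “/” swapped for “=”, so the two uses of “=” can't collide.

```python
import base64


def encode_run(text):
    """Encode a run of unmappable characters as padding-free Base64 of
    UTF-16LE, with '=' standing in for '/' (value %111111)."""
    raw = text.encode("utf-16-le")
    b64 = base64.b64encode(raw).decode("ascii").rstrip("=")  # drop padding
    return b64.replace("/", "=")


def decode_run(escaped):
    """Reverse encode_run: restore '/', re-add padding, decode UTF-16LE."""
    b64 = escaped.replace("=", "/")
    b64 += "=" * (-len(b64) % 4)  # padding needed by the standard decoder
    return base64.b64decode(b64).decode("utf-16-le")
```

The result never contains “/”, so it can't be mistaken for a translated “.” extension separator; `decode_run(encode_run(s))` returns `s` for any string that UTF-16 can represent.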
Those values are passed from the client when opening the Mbuf Manager, so I can play around with them.

For the filename translation and filetype lookup (more on that in a second), and for readahead and write-behind, I want to create a separate dynamic area split into a small Heap Manager-managed part to hold cached filenames and extents used by the larger portion of the dynamic area, which will be a direct map of runs (extents) of 4 KiB data blocks mapping directly to 4K-aligned data from the SMB server. I’m going to set the default filename and data cache size to the system free physical RAM (not counting RAM used by apps or the RAM disk) / 128, which works out to about 7.5 MB for a 1GB RAM PC, or 31.5 MB for a 4GB PC. By only putting cached data in the dynamic area, the user can be allowed to resize it and we can throw away any/all of the contents safely, and also the malloc and free of the filenames and data structures related to the block cache won’t fragment the shared RMA heap and usage / fragmentation can be tracked separately.

I found an LWN post about the importance of readahead especially for CIFS (SMB) that gave me a good idea for how to do readahead and writebehind: https://lwn.net/Articles/897786/ I’ll save most of the details for later. I’m thinking of using 1/16 of the dynamic area for the heap to hold translations and extents mapping 4K blocks in the other 15/16 of the area to SMB files.

What I’m really happy about is what I’ve worked out for filetype mapping, in addition to the usual “,fff” appendages to filenames to encode the 12-bit RISC OS type manually. 
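For readers unfamiliar with the “,fff” convention, a minimal Python sketch of adding and stripping the suffix might look like the following. The helper names are invented for this example, and emitting lower-case hex is an assumption rather than something the thread specifies.

```python
import re

# A name, a comma, then exactly three hex digits at the very end.
_FFF_SUFFIX = re.compile(r"^(?P<name>.+),(?P<type>[0-9a-fA-F]{3})$")


def split_filetype(server_name):
    """Return (name, filetype) if the name ends in ',fff', else (name, None)."""
    m = _FFF_SUFFIX.match(server_name)
    if m is None:
        return server_name, None
    return m.group("name"), int(m.group("type"), 16)


def add_filetype(name, filetype):
    """Append the 12-bit RISC OS filetype as ',fff' (lower-case hex assumed)."""
    if not 0 <= filetype <= 0xFFF:
        raise ValueError("RISC OS filetypes are 12-bit")
    return f"{name},{filetype:03x}"
```

So `split_filetype("ReadMe,fff")` yields `("ReadMe", 0xFFF)`, and a name such as `photo.png` passes through with a filetype of `None`, leaving the caller to fall back on the extension.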
Because the client code is from Apple, they have an “AAPL” extension negotiated in SMB2/SMB3 mode (this code supports up to SMB 3.0.2, which is more than new enough), which Samba supports, especially if you enable the “fruit” extension in your smb.conf: https://wiki.samba.org/index.php/Configure_Samba_to_Work_Better_with_Mac_OS_X With SMB extended attributes (supported by NTFS, Samba, Apple’s SMB server, etc.), with or without their optimized “Mac-to-Mac” support, we can get a metadata attribute containing the By starting with the https://developer.apple.com/library/archive/documentation/Miscellaneous/Reference/UTIRef/Articles/System-DeclaredUniformTypeIdentifiers.html I’ll still create and decode the “,fff” filename extensions by default, but my reasoning for wanting to integrate Apple’s already-existing “UTI” file type scheme is it would let Mac users work with .png, .gif, .jpg, and all the other standard file types and change the filename extension to whatever they want, and share the files to a RISC OS machine and it would be able to use the Mac’s file type in preference to the extension, and wouldn’t need to add or remove a “,fff” in that case. One last idea I had was dealing with notifications if other users have modified files in a directory or specific files. It’s possible to do this in both SMB1 and SMB2/3, and Apple’s client can do that. The problem on the RISC OS side is there’s no way for clients to register interest in specific directories. Filer is able to catch all file writes and modifications from the OS_UpCall 3 event, which would work for local changes to the file share but wouldn’t notice if you were adding/deleting/appending to files in directories opened by the Filer on another client (or on the SMB server itself). 
This isn’t something that’s in the bounty work items, but I’ll see if it’s possible, once everything else is working, to add an SMB notification for directories that are open in the Filer (perhaps only for SMB3, or wherever it’s most efficient), so that the user won’t have to select the Refresh menu item to see changes from other machines. |
Andrew McCarthy (3688) 605 posts |
It’s great to see this level of understanding, contribution, cooperation, and interest from the various participants. I’m not sure how useful the following link will be, but rather than trying to synthesize the main points I’ll just put the reference here. I think it might be useful, as it concerns a discussion around file name translation which may or may not feed into your thinking; it’s an old thread mainly about Git. |
Jake Hamby (8915) 21 posts |
Wonderful, thanks for the link! The reason I was excited about the idea to piggyback on Apple’s proprietary UTI filetype scheme is that they’ve been using it for quite a while and it has a straightforward way to add fallbacks in, say, “org.riscosopen.*”. In the ideal world, a Samba or macOS machine out of the box (and hopefully Windows too, but with a few more roundtrip SMB queries?) would see that the SMB server supports Apple’s method of extended attributes for metadata, and then all the types that Apple has registered as a “UTI” string would have a one-to-one correspondence to a RISC OS filetype, and every other type would be saved with a UTI content type of If you can preserve RISC OS filetypes round-trip to/from an SMB server using Apple’s UTI content type attribute, then that makes the experience friendlier for everyone, since no desktop OS recognizes extensions ending in “,” plus 3 hex digits. The only real reason to modify the filename in that way is so the type gets preserved when you .zip or .tar.bz2 a directory. Since some people will want to do that, I suppose it needs to be a setting users can toggle on/off depending on whether they want to always modify the filenames to add the RISC OS type or only when the server doesn’t support attributes. At any rate, there’s no need for me to care about the MIME type in the SMB client, since the UTI is a slightly simpler and less ambiguous string that the SMB server may already have for any/all of its files, so my modified Theoretically, someone could then write a Mac GUI editor for BBC BASIC programs that registered itself as being able to handle files of any extension and also to be the default for |
Jake Hamby (8915) 21 posts |
This project is quite humbling, I’ll say that much. I knew in advance it’d be a lot of work to tackle, and I was really deluding myself to imagine that I could write protocol implementations of SMB2/3 entirely on my own, without Apple’s battle-tested codebase already doing the heavy lifting and now only having to handle the RISC OS-specific features. Forget what I wrote about having special mappings for custom attributes. The “,fff” and “,llllllll,eeeeeeee” extensions are easy enough to add/remove and already a de facto standard. If there’s a “.” extension that maps onto something in a MimeMap lookup (and it’s easy to keep a small cache of those), then no need to add/remove the extension. That’s basically how LanManFS appears to behave now. I spent most of the past day or two finishing my plans for what would be a simple but effective block cache, considering the latency of SMB and the TCP/IP stack on RISC OS from end to end. For testing all of this in isolation, my current plan is to write a “Toy_SMB” simulated set of hardcoded SMB servers with hardcoded files and that will accept writes and throw away the data once it’s out of the block cache, and then replace it with “lorem ipsum” text if the user reads it back. All of the ops will have callbacks to simulate both sync and async delays. Taking 1/128th of the user’s physical RAM and rounding to the next MB seems reasonable (8MB for a 1GB system, or 32MB for a 4GB machine), and I don’t need to partition any of it off separately because I’m not planning to use very much RAM in the RMA space for the filename translations. By upper-casing everything on the RISC OS side and hashing to the uppercased name, then it’s an O(1) lookup into the open file table to find pointers to the filename itself, the entry for the next file with the same hash, in the event of a collision, and a pointer to the smb_node. So hashing and comparing on the full RISC OS canonical path seems easiest. 
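The open file table described above can be pictured with a small Python sketch. It is only a model under assumptions: the class name, the bucket count, and the hash function are invented here, and a plain Python object stands in for the pointer to the smb_node. Each entry holds the path, the handle of the next entry in the same bucket, and the node, so a collision just means following a short chain.

```python
class OpenFileTable:
    """Hash table keyed on the upper-cased canonical RISC OS path."""

    def __init__(self, buckets=256):
        self.heads = [None] * buckets  # bucket -> handle of first entry
        self.entries = []              # handle -> (path, next_handle, node)

    def _bucket(self, path):
        # Hypothetical hash: any function of the upper-cased path would do.
        h = 0
        for ch in path.upper():
            h = (h * 31 + ord(ch)) & 0xFFFF
        return h % len(self.heads)

    def insert(self, path, node):
        """Add an entry at the head of its bucket chain; return its handle."""
        b = self._bucket(path)
        handle = len(self.entries)
        self.entries.append((path, self.heads[b], node))
        self.heads[b] = handle
        return handle

    def lookup(self, path):
        """Return the handle for path (case-insensitive), or None."""
        key = path.upper()
        handle = self.heads[self._bucket(path)]
        while handle is not None:
            name, next_handle, _node = self.entries[handle]
            if name.upper() == key:
                return handle
            handle = next_handle  # collision: follow the chain
        return None
```

Lookups are O(1) on average as long as the chains stay short; with the default cap of around 250 open files mentioned in the next paragraph, a few hundred buckets would keep collisions rare.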
Apple’s code keeps everything in UTF-8 wherever possible, so debug statements just use “%s”, and since I have what I think is a good plan for bidirectionally converting UTF-8 and something RISC OS-safe to print, then I can treat that as if it’s UTF-8, and not waste my time trying to actually deal with any strings as UTF-16 (except where Apple’s code already handles it). By limiting the max open files by default to something like 250, and with a user-customizable max under 65535 (if you plan to have >65535 open file handles on one RISC OS machine, I’d like to know what you’re up to!) then I can use 16-bit handles to find the smb_node (via the pointer from the open file table) including to track the owner of each page in the cache, and for file path hash collisions, as I just mentioned. For each 64KB page in the allocated page cache (a separate dynamic area), I can have a free bitmap and a dirty bitmap (write-behind caching is as important here as read-ahead caching) and a 16-bit clock that’s updated each time someone reads or writes, so that it’s simple to scan forward through the bitmap to find an allocation space for up to 256KB or perhaps 1MB of intended data, and if pages need to be evicted, it’s also simple to scan through the 16-bit clocks and add up to find the range with the oldest time collectively, and then evict those pages. Any dirty write pages will be excluded from this, unless the buffer is completely full of dirty pages and some have to be written out, but that’s unlikely to happen with an enforced ratio of max 50% (by default, or a user-defined percent ratio) of write space vs. read space. That way the read cache won’t be destroyed by a big sequence of writes. Much work lies ahead for me, but so far it’s very much been a labour of love, in addition to a project with potential for both direct and indirect financial incentives if everything lines up the way I hope it will. :-) |
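The eviction scan described above (add up the per-page clocks over a window and pick the oldest window, skipping dirty pages) can be sketched as follows. This is a simplified model, not the planned implementation: the function name is hypothetical, Python lists stand in for the bitmaps, and 16-bit clock wrap-around is ignored.

```python
def pick_eviction_run(clocks, dirty, run_len):
    """Return the start index of the run of run_len clean pages whose
    clock values sum to the smallest (oldest) total, or None if every
    candidate run contains a dirty page."""
    best_start, best_sum = None, None
    for start in range(len(clocks) - run_len + 1):
        window = slice(start, start + run_len)
        if any(dirty[window]):
            continue  # dirty pages are excluded from eviction
        total = sum(clocks[window])
        if best_sum is None or total < best_sum:
            best_start, best_sum = start, total
    return best_start
```

With `clocks = [5, 1, 2, 9, 0, 0]`, page 4 dirty, and `run_len = 2`, the scan picks index 1: pages 1 and 2 have the lowest combined clock among the runs that contain no dirty page.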
Jake Hamby (8915) 21 posts |
I just thought of a question about user expectations: what sort of SMB printers do people want to print to, what do they look like, and what drivers do you use on the RISC OS side? PostScript? PCL? Epson dot-matrix printers? Do people have Samba set up to accept PostScript and translate it into the native printer language? I can imagine all of these possibilities. Do people ever set up their SMB server with a fake printer share that generates PDF files? I’m extremely curious to find out what types of printers people actually use, or would hope to be able to use. Thanks! |
Rick Murray (539) 13840 posts |
I never got SMB printing to work. Apparently the Livebox supports one plugging in a USB printer to share it, but it never worked from Windows, so I didn’t even bother with RISC OS. For my printing, I had previously used RemotePrinterFS to toss LJ6 bitmaps to my laser on port 9100. This isn’t mentioned at all in anything to do with the printer, but it works. ;) Now, I print to the laser using either RemotePrinterFS or AirPrint (part of Dave’s IPP work). Much less hassle than printing to PDF and transferring it to my phone to print. Now I can print directly from RISC OS, even in colour. 👍 I used to use SMB to transfer files to/from Windows and RISC OS, however with Android in the mix these days it’s often quicker to run a file manager that has the option of a built in FTP server that can be turned on. It’s a bit long-winded, but it’s much simpler than the deplorable Google Files app which is shamefully bad (and fails to sync at all far too often). For big things or lots of files, simplest way, honestly, is dump the files on a FAT formatted USB key and just, you know, use that. ;) |
Dave Higton (1515) 3525 posts |
SMB printing can only apply AFAIK to old-style printers. The number that still remain must be decreasing as they wear out and/or develop faults. Those that are present have been connected up to something for a long time, and are unlikely to need moving. For roughly 10 years or so now, printers have been accessed using IPP or AirPrint (much the same thing), which requires bidirectional data transfer using HTTP or HTTPS over the LAN, often wireless. So I wouldn’t worry about it. |
Jake Hamby (8915) 21 posts |
Quick update: I’m making good progress on integrating SmbX source files into the OmniLanManFS repo for the core data structs, UTF-16 conversion, and error handling/conversion. I spent some time looking at FileSwitch and FileCore and I think my strategy of pre-allocating a relatively large dynamic area divided into 64 KB blocks to use for read-ahead and write-behind shared among all open files is sensible, since SMB 2.1 uses a “credits” system where the server meters out credits to clients which they use to pay for payloads (the variable-size data portion of the SMB packet, excluding headers). The formula is:

CreditCharge = (max(SendPayloadSize, ExpectedResponsePayloadSize) - 1) / 65536 + 1

This means there’s an incentive to read/write in multiples of 64 KiB where possible because your “cost” is rounded up to the next whole credit. Conversely, it makes sense to limit the max read-ahead to let’s say 256 KiB, to conserve credits, and also so the SMB client won’t have to copy and retain too many pages of data that the caller may never actually request. This strategy is reasonable for SMB1 as well.

What I learned from the FileCore / FileSwitch code is that FileSwitch expects filing systems to return a natural block size (on a per-file basis, in the output of the open call) in the range of 64 to 4096 bytes, a limit that was raised from 1K to 2K (for CDFS) and then 4K (for advanced-format hard drives). FileSwitch handles reads/writes that are misaligned by reading/writing individual blocks (512b/1K/2K/4K) at the start and/or end (whichever is misaligned), with one big GetBytes/PutBytes call for the block-aligned middle of the transfer. It uses file buffers to cache only these sub-block transfers, while the large, block-size-aligned transfers aren’t cached. 
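To make the credit arithmetic concrete, here is the formula from this post written as a small Python function. The function name is hypothetical, and treating a request with no payload as costing one credit is an assumption added here so the integer division behaves sensibly at zero.

```python
def credit_charge(send_payload_size, expected_response_payload_size):
    """Credits consumed by one request, per the formula quoted above."""
    largest = max(send_payload_size, expected_response_payload_size)
    # Assumption: a request with an empty payload still costs one credit.
    largest = max(largest, 1)
    return (largest - 1) // 65536 + 1
```

A 100 KiB read costs `credit_charge(0, 100 * 1024)` = 2 credits, exactly the same as a full 128 KiB read, which is why transfers in multiples of 64 KiB get the most data per credit. A 256 KiB read-ahead costs 4 credits.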
There’s some read-ahead and write-behind file caching logic in FileCore (see After reading the FileCore code, I realized I need to call the per-file lists that will hold locations and sizes of the corresponding read/write segments in the shared file cache “scatter lists”, not “extents”, which is already a synonym for the file size in RISC OS. I’ll post more later. I figured some of you might be interested in what I discovered about how FileSwitch works. |
Jake Hamby (8915) 21 posts |
I forgot to mention something else I learned: the current LanManFS expects to see I looked at the “native-zip” patches from GCCSDK for RISC OS cross-development, and it recognizes Does anyone know of apps that actually generate and/or consume the load/execute-address form of the |
David J. Ruck (33) 1635 posts |
Yes, there is lots of stuff with load/exec files, particularly games, old BBC stuff etc, and it all gets thoroughly shafted when backing up over the network as nothing seems to support ,llllllll-xxxxxxxx correctly. I did look at trying to get sunfish to do it years ago, but never got it working in the time available. |
Richard Walker (2090) 431 posts |
As druck says, it’s not just LanManFS. There will be similar in LanMan98, NFS (?), and other modern approaches like RPCEmu/Arculator HostFS and the Econet-compatible file servers (see StarDot for info on those). I don’t know if anyone has written a formal specification, nor if each implementation is consistent! |
Dave Higton (1515) 3525 posts |
I have no direct experience of using load/exec addresses. But: General advice is to be strict in what you generate and lenient in what you accept. So you should generate full 8-character strings in each field, separated by a hyphen. You should accept fields that contain up to 8 hex chars in each field, separated by a hyphen. There’s a question as to whether anything with fields of less than 8 characters, but with a leading zero, is really RISC OS load/exec addresses. It wouldn’t be a natural thing to do. Bearing in mind that a filename could have that format but be nothing to do with RISC OS, I’d be inclined to leave it untranslated. The above is what I’d do if I were coding it, anyway. |
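The “strict in what you generate, lenient in what you accept” advice above could be sketched like this in Python. The function names are made up, lower-case hex on output is an assumption, and the rule for rejecting short fields with a leading zero is just one reading of the suggestion to leave such names untranslated.

```python
import re

# Name, comma, then two hex fields of 1-8 digits separated by a hyphen.
_LOAD_EXEC = re.compile(r"^(.+),([0-9a-fA-F]{1,8})-([0-9a-fA-F]{1,8})$")


def format_load_exec(name, load, exec_addr):
    """Strict output: always two full 8-digit hex fields."""
    for value in (load, exec_addr):
        if not 0 <= value <= 0xFFFFFFFF:
            raise ValueError("load/exec addresses are 32-bit")
    return f"{name},{load:08x}-{exec_addr:08x}"


def parse_load_exec(server_name):
    """Lenient input: return (name, load, exec) or None to leave the
    filename untranslated."""
    m = _LOAD_EXEC.match(server_name)
    if m is None:
        return None
    name, load_hex, exec_hex = m.groups()
    for field in (load_hex, exec_hex):
        # A short field with a leading zero is unlikely to be a real
        # load/exec address, so treat the whole name as ordinary text.
        if len(field) < 8 and field.startswith("0"):
            return None
    return name, int(load_hex, 16), int(exec_hex, 16)
```

Under these rules `Game,ffff0e10-00008000` round-trips, `notes,1f-2a` is accepted as load 0x1f and exec 0x2a, and `notes,01f-2a` is left alone.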
Jake Hamby (8915) 21 posts |
Thanks again for the info! This project has been quite a challenge, but I think the last puzzle pieces have fallen into place in my plan and I actually will be able to do this. A few days ago I committed a checkpoint of my initial SMBX client import work for those who want to follow along. I realized last week that I needed to start a doc describing the SMBX client port so that’s what I did. I used Markdown format so it looks pretty on GitHub (and I accidentally discovered that VS Code has a Markdown preview if you press Shift-Ctrl-V) while still being readable as a text file. I was disappointed to discover that Apple didn’t open-source all of their crypto libraries, but only a portion, called “Common Crypto”, which doesn’t contain all the functionality needed by their SMB client. Apple’s “corecrypto” libraries were released for interested parties to review the code, but only licensed for “internal evaluation for 90 days”. So that was a bit disappointing. Fortunately, all the crypto/signing stuff is contained in I’m currently working on finishing up my adaptation of Mbuf Manager to the The last “puzzle piece” that I alluded to was the question of how to handle threading and sleeping on mutexes. I figured out that the appropriate library to use is the relatively new in RISC OS terms (from 2004) RTSupport library, which I can use to register realtime kernel threads to handle upcalls and asynchronous timeouts without having to modify the original code’s use of The way the Apple/NetBSD “SMBX” client works is similar to the BSD TCP/IP stack itself, using chains of mbufs to hold packet headers and data. 
The current LanManFS client does the same thing, but with its own “BufLib” API that I’ve removed and replaced with SMBX’s custom The internal kernel socket calls that SMBX uses can send and receive directly to a chain of mbufs, so I’m mapping that to the Speaking of which, I discovered the On Mac OS, small mbufs are 256 bytes and large mbufs are 2048 bytes, and I’m curious if it’s possible to change the system-wide pool based on the advisory minimum and maximum buf size passed when the client opens the Mbuf manager session. At the very least, the pool probably needs to be increased to 1 MiB or more, which I’m guessing is a boot-time option? Sadly, the Mbuf manager is one of the few remaining binary blobs in the RISC OS codebase, so I can’t just see what it expects. I’m still a few weeks away from completion, optimistically speaking, but at least now I know it’s possible to finish the plan I set up for myself. I also had the secondary goal to research the RISC OS TCP/IP stack and filesystem in order to figure out how much work the other open bounties are. The TCP/IP bounties in particular are going to be more difficult than what I’m currently working on, not less. As others have noted, the current stack is mostly mid-1990’s 4.4 BSD ported to mid-1990s RISC OS. So the stack has its own internal fake spl implementation and doesn’t use RTSupport, which wouldn’t exist for another 10 years. I’m hoping to be able to show how to integrate newer BSD kernel code in a way that will support SMP, 64-bit file offsets, and other enhancements to RISC OS. |
Dave Higton (1515) 3525 posts |
This is a naive suggestion: AcornSSL uses mbedTLS, which has some encryption and decryption functions, so you may find it’s worth a look. |
André Timmermans (100) 655 posts |
Beware that someone worked on the Mbuf Manager for the ROD network stack. This is work that has not yet been merged into the ROOL repository; you should check with ROD to avoid incompatibilities. |