SMB2/3 protocol
Paolo Fabio Zaino (28) 1882 posts |
Fact: it has now been calculated that 64% of all internet traffic is “bot” traffic (crawlers, scanners, plus malware and spam), not human-generated. (Read: full stop, i.e. not an opinion.) Crawling and web scraping are techniques that you can now use even to build your own web analysers. A few libraries to back this up with data: In Golang: In Rust: https://github.com/mattsse/voyager There are obviously libraries in every language, including Python, JS, C++, you name it.
Crawlers can obviously be specialised to look for code, news, or specific topics such as political activism. Bots can be designed to connect to social networks and chat systems; this includes old-school stuff like IRC, but also Discord etc. There are also commercial tools like Maltego that can gather information across the internet on whatever topic you are interested in. For instance, the RISC OS search engine (which keeps appearing and disappearing) is built using data collection techniques and crawling.
Datasets for AI can be built from any resource that can be defined as “reliable”, where in this case reliable means that the code presented works and is therefore a good source of examples. ARM assembly is still a valuable source, for instance, not just for ChatGPT amenities, but also for finding malware in executables etc. So not all AI training is done to trigger Rick’s posts on here ;)
One more misinformed horn in the orchestra… (facepalm). Here is the response of the guy who started this, with his apologies for the mistake: https://www.jeffgeerling.com/blog/2023/i-was-wrong Red Hat is not breaching the GPL, end of story. They are now part of IBM, and IBM’s profit levels are most likely higher than those of a company that was originally trying to be disruptive in the market, not to mention that IBM has to recover the ROI on acquiring RH (which was a very expensive acquisition). So the reasons behind their changes are purely economic. They can’t break the GPL, because if they do, they are the ones who will lose the sources. Hope this makes sense; if it doesn’t, read the YouTuber’s apologies again.
This is true, can’t argue with that. But the original post was about “safe” public repositories of code, and the answer to that is: if you are on a crusade against AI (or, more generally, against people taking your code from whatever public repository, including your own website), then DON’T publish it at all :) As others have mentioned, you can still use git (which has nothing to do with any of the above) and run GitLab on your own premises, without exposing it to crawlers. |
Stuart Swales (8827) 1357 posts |
At least the blank sections don’t contain utterly duff code! |
Paolo Fabio Zaino (28) 1882 posts |
Agreed, this is an example of how to generate them automatically from git on GitHub (but the action should work on GitLab as well):
There are also pre-built actions, which are even easier to use: https://github.com/marketplace/actions/release-notes-generator In general, everything that can be done with legacy tools can also be done with tools like GitHub and GitLab. The automation side can also be implemented on legacy systems, via some scripting and similar. |
Paolo Fabio Zaino (28) 1882 posts |
That code looks suspiciously like an old U-Boot patch I have seen somewhere else, just with a different MEMC define value… it’s GNU ASM, so possibly a mix of ARM code / patches with values that may come from old Archimedes documentation. If so, technically it isn’t copying, or at least no worse copying than a human doing the same… But again, if AI dataset crawlers are a problem for you, there is only one thing to do: keep your code fully private. Now, I think we have gone very far off topic, and I hope this discussion can get back to the SMB2/3 work :( P.S. My apologies, I have contributed to moving it even further off topic. |
Rick Murray (539) 13839 posts |
Given that I pretty much said that, exactly how am I a misinformed horn? |
Paolo Fabio Zaino (28) 1882 posts |
Sorry, then I misunderstood what you meant. |
Steve Fryatt (216) 2105 posts |
There’s Harriet Bazley’s SideDiff, although I appear to have a more recent version than the one available for download!
The problem is that it isn’t as easy to do with a copy/paste of directories as it is with a proper VCS. That’s the reason why people started to write the things in the first place… :-) |
Jake Hamby (8915) 21 posts |
Dave, thanks for the suggestion to look at Mbed TLS. I didn’t know about that possibility, and it does look like it could be smaller and simpler. The most unusual requirement of SMB 3 is to use a specific “NIST 800-108 section 5.1” key derivation algorithm, which I’ve just now learned from looking at the Samba source file that implements
The benefit of OpenSSL/LibreSSL is that it has a plug-in architecture that lets you type commands like:
openssl kdf -keylen 10 -kdfopt digest:SHA2-256 -kdfopt key:secret -kdfopt salt:salt -kdfopt info:label HKDF
As I see now, it would eventually look up, through its plug-in architecture, the same sequence of SHA-256 hashes that Samba simply implements inline, using GnuTLS in Samba’s case. That’s very interesting: I would’ve guessed that Samba would have more than one option for encryption libraries.
For the past few hours, I’ve been pondering the difference between two Wireshark captures of downloading the same PDF scan of an Acorn User magazine using the built-in Python 3 web server (
Replacing or upgrading the TCP/IP stack won’t solve the underlying issue with extremely slow download speeds, which is that the TCP window fills up while the sender (Web server) is waiting for ACK responses, which invariably arrive about 10 ms later. Now what runs at 100 Hz in RISC OS? That’s right: TickerV.
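To put rough numbers on that diagnosis: if every full TCP window has to wait one 100 Hz tick for its ACK, throughput cannot exceed one window per tick. The sketch below is back-of-the-envelope arithmetic only; the window sizes are illustrative assumptions (the capture's actual window size is not given in the post), and the 50 MB/s target is the figure quoted later in the same post.

```python
# Rough ceiling on TCP throughput when each full window waits one OS tick
# (about 10 ms) for its ACK. Window sizes below are assumptions for illustration.

TICKS_PER_SECOND = 100  # RISC OS tick rate mentioned in the post (TickerV)


def ceiling_bytes_per_second(window_bytes: int) -> int:
    """At most one full window can be delivered per tick."""
    return window_bytes * TICKS_PER_SECOND


def bytes_per_tick(target_bytes_per_second: int) -> int:
    """Data that must be consumed every tick to sustain a target rate."""
    return target_bytes_per_second // TICKS_PER_SECOND


if __name__ == "__main__":
    for window in (16 * 1024, 64 * 1024, 512 * 1024):  # hypothetical windows
        rate = ceiling_bytes_per_second(window)
        print(f"{window // 1024} KiB window: at most {rate / 1e6:.2f} MB/s")
    print(f"50 MB/s needs {bytes_per_tick(50_000_000)} bytes handled per tick")
```

The second function is where a figure like 500 KB per tick for 50 MB/s comes from.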
Basically, the download is so fast that the TCP/IP client app, at least the GUI ones, can’t catch up without calling
If my diagnosis is correct, and I think the evidence from the packet capture (TCP window full, then waiting 0.01 sec for an ACK before sending more) is fairly conclusive, then I shouldn’t have much to worry about with my socket abstraction layer for the SMB client. The combination of being driven by Internet_Event messages, using non-blocking socket calls, and running I/O background threads at real-time priority should enable the file data cache I’ve already designed to window the file data into the 500 KB or more per OS tick (for a 50 MB/s download speed) that the app has to process in order to achieve that read speed. Since in my ideal future the SMB2/3 client will be even faster than the onboard SD card reader of a typical SBC, and everything will just work and be fast, I’m encouraged to keep working on it, because I don’t think the GUI Internet client bottleneck will apply to the code I’ve written.
I’m unsure what the best strategy is for speeding up apps that aren’t written with RISC OS in mind, that is, apps without custom code that creates RTSupport threads, handles Internet_Event callbacks and uses SyncLib mutexes. I haven’t looked at the UnixLib code lately, and I remember it’s quite tricky, but I suspect there must be ways to improve select(), poll(), and the emulated UNIX processes and threads so that they wake up the process that received data (or whose TCP window is now empty and can send more data, since the same issue will happen in reverse with uploads) without delaying to the next OS tick. The problem isn’t having a 100 Hz system tick, but that the OS prefers, for some reason, to idle until the next tick rather than wake up the app that needs to handle the data. |
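Returning to the key derivation mentioned earlier in this post, here is a minimal sketch of the NIST SP 800-108 section 5.1 counter-mode construction with HMAC-SHA256 as the PRF. The function name is hypothetical, and the label, context and 128-bit output length are assumptions based on one reading of how the SMB 3.0 signing key is derived; they should be checked against MS-SMB2 and the Samba source before being relied on. Note that the openssl command quoted above selects HKDF, which is a different KDF from this counter-mode one.

```python
import hashlib
import hmac
import struct


def kdf_counter_hmac_sha256(key_in: bytes, label: bytes, context: bytes,
                            out_bits: int = 128) -> bytes:
    """SP 800-108 counter-mode KDF sketch (hypothetical helper name).

    Each block is HMAC-SHA256(key_in, [i] || label || 0x00 || context || [L]),
    with the counter i and the output length L encoded as 32-bit big-endian.
    """
    digest_len = hashlib.sha256().digest_size   # 32 bytes
    out_len = out_bits // 8
    blocks = -(-out_len // digest_len)          # ceiling division
    result = b""
    for i in range(1, blocks + 1):
        data = (struct.pack(">I", i) + label + b"\x00" + context
                + struct.pack(">I", out_bits))
        result += hmac.new(key_in, data, hashlib.sha256).digest()
    return result[:out_len]


if __name__ == "__main__":
    session_key = bytes(16)  # placeholder all-zero key, not real key material
    # Assumed label/context (with their terminating NULs); verify against MS-SMB2.
    signing_key = kdf_counter_hmac_sha256(session_key,
                                          b"SMB2AESCMAC\x00", b"SmbSign\x00")
    print(signing_key.hex())
```

For a 128-bit key only one HMAC block is needed, which is presumably why an implementation can get away with doing this inline rather than through a general KDF framework.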
Chris Mahoney (1684) 2165 posts |
Bingo.
On that note, I should probably put a robots.txt on my site that disallows downloads. Of course, the next question is figuring out how to get them to delete anything they’ve already harvested, which I suspect won’t happen automatically! |
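As a rough illustration of what such a robots.txt might contain, the sketch below uses Python's standard urllib.robotparser to check a rule set. The /downloads/ path and the ExampleBot agent name are hypothetical, and robots.txt is only advisory: a crawler that ignores it will not be stopped.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rule set: ask every crawler to stay out of /downloads/.
ROBOTS_TXT = """\
User-agent: *
Disallow: /downloads/
"""


def allowed(path: str, agent: str = "ExampleBot") -> bool:
    """Return True if a well-behaved crawler may fetch the given path."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(agent, path)


if __name__ == "__main__":
    for path in ("/index.html", "/downloads/archive.zip"):
        print(path, "allowed" if allowed(path) else "disallowed")
```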
Steve Pampling (1551) 8170 posts |
Modern RO hardware is multicore, with all except one core doing sod all. |
Paolo Fabio Zaino (28) 1882 posts |
Guys, Jake’s new update almost got lost in the usual off-topic discussions… :( |
Steve Pampling (1551) 8170 posts |
I thought that pointing out that Pi hardware had cores lying idle and that every other OS seems to offload network i/o to another processor might be on topic. |
Colin Ferris (399) 1813 posts |
Using a spare core to do the Printing :-) |
Rick Murray (539) 13839 posts |
Steve – yeah, but since network stuff tends to get offloaded to the network interface rather than just a different core in the main processor unit (which other OSes would be using all of already)… we’d still have n>1 cores doing sod all. |
Rick Murray (539) 13839 posts |
Some reason is this – I have a poll speed monitor. It simply sits on null poll and counts how many times it gets called in a second. Well, the output is messed up as a modern machine not doing anything gets far more than 1,000 polls per second (it was written in the days of my A5000 so it only copes with three digits). So if the desktop can do upwards of a thousand polls a second, and your switcher is working at 100ths of a second increments, there’s a lot of potential for the system to get bored and fall asleep. ;) Did the idea of the FastTickerV (millisecond) and related FastCallAfter ever gain any traction? |
Dave Higton (1515) 3525 posts |
Why should the app wait for an OS tick to do something? My inclination is to process everything that’s there as soon as it’s there, and only yield when either there isn’t anything there, or a small number of OS ticks have occurred, whichever is the sooner. Rather than waiting for an OS tick and then doing something, wait for an OS tick and do nothing :-) |
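The policy described above can be sketched with a non-blocking socket: keep reading while data is available, and hand control back only when the socket is empty or a small tick budget has been used up. This is a generic illustration, not RISC OS code; the function name, the two-tick budget and the 64 KiB read size are assumptions, and "yield" here simply means returning to the caller.

```python
import socket
import time

TICK_SECONDS = 0.01          # one 100 Hz OS tick
MAX_TICKS_BEFORE_YIELD = 2   # hypothetical budget before handing control back


def drain_until_idle(sock: socket.socket, handle,
                     max_ticks: int = MAX_TICKS_BEFORE_YIELD) -> int:
    """Process data as soon as it is there; return (yield) when nothing is
    pending, the peer has closed, or the tick budget is spent."""
    deadline = time.monotonic() + max_ticks * TICK_SECONDS
    total = 0
    while time.monotonic() < deadline:
        try:
            chunk = sock.recv(65536)
        except BlockingIOError:
            break            # nothing waiting: yield now
        if not chunk:
            break            # peer closed the connection
        handle(chunk)
        total += len(chunk)
    return total


if __name__ == "__main__":
    sender, receiver = socket.socketpair()
    receiver.setblocking(False)
    sender.sendall(b"x" * 4096)
    pieces = []
    print(drain_until_idle(receiver, pieces.append), "bytes handled before yielding")
    sender.close()
    receiver.close()
```

The caller would invoke this from its event or poll handler, so the budget bounds how long other tasks are kept waiting.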
Steve Pampling (1551) 8170 posts |
It does? I must talk to one of those IT guys ;) |
mikko (3145) 123 posts |
Hi @Jake, did you make any further progress with this? It was looking quite promising! |
Paul Sprangers (346) 524 posts |
I’d be interested too… |
Richard Walker (2090) 431 posts |
It was a while back, so maybe the earlier links Jake posted have been forgotten, so: it looks like Jake has helpfully put his work-in-progress onto GitHub: https://github.com/jhamby/RiscOS-OmniLanManFS/tree/jhamby_smb2 Browsing through the commits, I can understand the initial smaller changes, but when the Apple SMB2 library drops in… :-o. For someone who is familiar with such things (RISC OS, C, SMB protocol) it looks like a great starting point. |