URL_Fetcher and AcornHTTP
Dave Higton (1515) 3534 posts |
It has become apparent that there needs to be a way to send much larger data payloads, perhaps exceeding the available RAM space. The present API doesn’t permit this. Here’s a possible way that I’m putting up for review.

At present, URL_GetURL and presumably HTTP_GetData (although that doesn’t appear to be fully documented) require R4 to point to any extra header lines and the body data, which must be contiguous, and R5 to be the length of the block in its entirety.

My proposal is that, if flags (R0) bit 2 is set, R4 points to a null-terminated string containing any extra header lines, and R5 points to a data file’s name, null-terminated. The header lines must be textual, so the module doesn’t need to be given the length of the block. The data body’s length can be found from the file’s length. If there are no extra header lines, R4 can either point to a null, or be null.

OK, what does the team think? |
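A rough sketch in C of how a module might interpret the registers under this proposal. The flag bit and the R4/R5 meanings are exactly as described above, but every name here (the macro, struct and function) is invented for illustration; nothing like this exists in the published API:

```c
#include <stddef.h>
#include <stdint.h>

#define URLFLAG_BODY_FROM_FILE (1u << 2)  /* proposed flags (R0) bit 2 */

/* Illustrative decode of the two body-passing conventions. */
typedef struct {
    const char *extra_headers; /* null-terminated, or NULL if none   */
    const char *body_filename; /* null-terminated file name (new API)*/
    const void *body_block;    /* contiguous headers+body (old API)  */
    size_t      body_length;   /* length of the old-style block      */
    int         from_file;     /* nonzero for the new file variant   */
} request_body;

/* Interpret R4/R5 the way the proposal describes: with bit 2 set,
 * R4 is a header string (possibly NULL or empty) and R5 is a file
 * name; with bit 2 clear, R4/R5 are the block and its length. */
static request_body interpret_registers(unsigned flags,
                                        const void *r4, uintptr_t r5)
{
    request_body rb = {0};
    if (flags & URLFLAG_BODY_FROM_FILE) {
        const char *hdrs = (const char *)r4;
        rb.extra_headers = (hdrs && *hdrs) ? hdrs : NULL;
        rb.body_filename = (const char *)r5;
        rb.from_file = 1;
    } else {
        rb.body_block  = r4;
        rb.body_length = (size_t)r5;
    }
    return rb;
}
```

The point of the sketch is that the old and new conventions can coexist behind a single flag test, so existing callers are unaffected.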
Matthew Phillips (473) 721 posts |
Sorry I’ve been a bit quiet: trying to get some programming done, and now hampered by a computer failure!

Would it make sense for the Content-Type header to be generated with reference to the filetype of the data file? Probably not, as we wouldn’t want people to have to register precious filetypes just because they need to submit stuff to APIs.

It might also be useful for there to be a way to call URL_GetURL repeatedly to pass chunks of the request body one at a time, to allow it to be done in RAM if an application would work better that way.

One tricky aspect is how the URL Fetcher module would know that the underlying protocol module can support the new flags. We might need changes to the protocol register SWI so that the protocol module (AcornHTTP) can indicate its API capabilities. The URL Fetcher docs envisage the URL Fetcher being enhanced in the future to add extra protocol SWIs, and to fail gracefully if a SWI is not known by the underlying module, but the docs do not envisage the API of an existing SWI being enhanced. |
Dave Higton (1515) 3534 posts |
Definitely not, for the reason you stated.
Yes, that would be good. In the case of the printer dumper I’m working on, that might save having to create a temporary file, with all the savings of time and drive life that would accrue. So would this just be another two flag bits, one to say get from RAM, and another to say this is a continuation chunk? |
Matthew Phillips (473) 721 posts |
Sounds good.

There are other changes that need to be made to HTTP_GetData (which is essentially the AcornHTTP implementation of Protocol_GetData) in order to provide better cookie support. If I hadn’t got distracted by another project for the last two weeks, I had intended to draft a proposal.

Essentially, for the latest cookie specification to work, the fetcher needs to know the URL of the parent page and also, I think, the URL of the referrer. This is in order to be able to implement SameSite restrictions. For example, if the HTML of a page from domain-one.com includes an image link with the source being from domain-two.com, the cookie rules mean that you might have to avoid sending domain-two.com cookies when fetching the image, whereas if the image from domain-two.com was embedded in a page from the same domain, you would send those cookies.

This needs a little bit more thought, and will definitely require some extra registers and alterations both in the URL_Fetcher module and AcornHTTP. No reason to delay firming up your proposals, though. If we can put both API changes through together, that would be nice. |
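The SameSite rule described above can be sketched as a small decision function. Two big simplifications, loudly flagged: hostnames are compared for exact equality (real code must reduce each host to its registrable domain using the public-suffix list), and all names here are invented for discussion:

```c
#include <string.h>

typedef enum { SAMESITE_NONE, SAMESITE_LAX, SAMESITE_STRICT } samesite;

/* Simplification: two hosts are "same site" only if they match
 * exactly.  Real code compares registrable domains instead. */
static int same_site(const char *host_a, const char *host_b)
{
    return strcmp(host_a, host_b) == 0;
}

/* Decide whether a cookie may accompany a request, given the host of
 * the page that triggered it and the host being fetched. */
static int should_send_cookie(samesite policy,
                              const char *page_host,
                              const char *request_host,
                              int top_level_navigation)
{
    if (same_site(page_host, request_host))
        return 1;                      /* same-site: always allowed    */
    switch (policy) {
    case SAMESITE_NONE:   return 1;    /* cross-site OK (needs Secure) */
    case SAMESITE_LAX:    return top_level_navigation;
    case SAMESITE_STRICT: return 0;    /* never sent cross-site        */
    }
    return 0;
}
```

This is why the fetcher needs to be told the parent page URL: without `page_host` there is no way to make the first comparison at all.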
Dave Higton (1515) 3534 posts |
I’m all for working together, but I’m not clear what I can do to help beyond discussing the API and testing the results. But if I can do something, I will! |
Dave Higton (1515) 3534 posts |
Does anyone know off the top of their head when the fetcher modules started to support https? |
Rick Murray (539) 13850 posts |
Depends upon what you mean. I think it always has, as the fetchers were released by Acorn to support the URL module, all of which was a part of Browse. If you mean the contemporary version, August 2018 was when AcornSSL was modernised: https://www.riscosopen.org/forum/forums/8/topics/11950 |
Chris Mahoney (1684) 2165 posts |
Edit: Rick sneaked in there with a more detailed answer :) |
Dave Higton (1515) 3534 posts |
… and I think Rick knows why I asked, and also that the answer may not have any bearing on the problem that caused me to ask :-) But thanks, anyway! |
Rick Murray (539) 13850 posts |
Okay, Dave has asked me to chip in, so I shall. Hold on, here goes! First up, I had three printers.
I say “had” because the Epson was a deplorable piece of [poop emoji]. After setting it up the other day, I hooked it up yesterday and some of the nozzles were clogged (already!?). A clean fixed that, but it took something like six minutes (walking across the field to feed kitty took less time). After doing some testing, it died. Like motherboard-is-a-short-circuit style failure. So out of the box the scanner was faulty, and it managed to work “for around four or five hours”. Trust me, with build quality like that, I will never touch Epson again.

A quick note – AirPrint uses IPP as a transport mechanism; in as much as I suspect that IPP Everywhere developed from what Apple was doing with AirPrint. The primary difference is that IPP Everywhere uses an open and documented raster format (PWG) whereas AirPrint uses a mostly undocumented format (URF) which has been reverse engineered, sort of. There are three primary PWG forms, as far as I can tell: 1 bpp (probably aimed at lasers), 8 bpp mono (often supported by both lasers and inkjets), and 24 bpp RGB (inkjets). I suppose colour lasers too, but that sort of thing isn’t in my budget. ;-)

There are two methods that I use to talk to my printers. The first is a set of programs that I wrote myself using direct socket access. This worked on all of my printers (they don’t use encryption). One program (FindIPP) will enquire and parse the IPP data block to understand the printer’s capabilities.

The second method is a pair of programs written by Dave. Much like mine, there’s a program called “Proto” that enquires the IPP data block, but doesn’t yet parse it. It also seems to lose a lot of data (it says 631 bytes received from the laser, which will have sent ~10K), but that might be because it looks like it only dumps the data received when the connection finishes, so if it loops for a few blocks, it’ll only report the last of them. Not a big deal; it’s easily fixed, and it’s just a test to check that something happens. 
As far as I can determine from looking at the source code, what Dave is putting together to send to the printer is the same as I have. We’re pretty much sending the bare minimum necessary to port 631 (IPP).

Now to AcornHTTP. The Epson and the laser both responded to Proto with an HTTP 200 code and a block of data; but the HP inkjet refused the connection with a 505 error. I have finally cracked this. I tweaked Proto to ask for the headers to be included in the returned data, and added logging of everything received. It begins thus:

BDF8 ¦48 54 54 50·2F 31 2E 30|20 35 30 35·20 48 54 54 : HTTP/1.0 505 HTT : +0
BE08 ¦50 20 56 65·72 73 69 6F|6E 20 4E 6F·74 20 53 75 : P Version Not Su : +16
BE18 ¦70 70 6F 72·74 65 64 0D|0A 53 65 72·76 65 72 3A : pported..Server: : +32
BE28 ¦20 48 50 20·48 54 54 50|20 53 65 72·76 65 72 3B :  HP HTTP Server; : +48

So we now have a working example of something that will refuse to connect to an HTTP 1.0 client. I tried changing the word at +6DA4 in AcornHTTP (in memory!) to a simple “1.1”. So count this as a +1 for some form of HTTP/1.1 support.

The next, and potentially larger, problem is that POST requests are seriously flawed. You see, in the days when AcornHTTP was written, it was likely that POST would be used for things like form requests and the like. It’s always been possible to POST pictures and such, but remember, this was the dial-up days. ;-) However, the POST method absolutely does not scale up to dumping a megabyte to a device.

I am going from memory here – I didn’t save the data, and the Epson is a dodo so I can’t repeat it – however the URL_Status SWI returned alternately 0,1,0,1,0,1 (etc.) until it did something like &1F and reported a 200 response. This was with a file that was about a megabyte (a page at 300dpi). Higher resolution? Well, that would be a much larger file. URL / AcornHTTP has, as far as I can tell, absolutely no way of knowing that it’s actually sending the POST content to the server, or how far along it is. 
I would suggest a new SWI, URL_Status2, which can handle reporting the activity in both directions, rather than attempting to find ways to bodge the values into the existing URL_Status SWI. Of course, if sending large payloads is done as a set of chunks (as discussed above), then there may be some overlap here with being able to tell the status of the transmission.

There. My €0,02 worth. Inflation is a bitch, huh? :-)

However, to end on a happier note… while Dave’s code is still very experimental, I did successfully print from OvationPro to a file, and send that file via IPP/WiFi to a printer (the Epson) to get a correct printed page out of the printer. As much as it is test code at this stage, it can work (except for fussy printers – looking at you, HP!), which means that IPP support is that much closer for RISC OS. |
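One possible shape for such a Status2 reply, purely as a discussion aid. Every field, type and function name below is invented; nothing like this exists in the current URL_Fetcher API:

```c
/* Hypothetical reply block for a "URL_Status2"-style call, covering
 * both directions of the transfer.  All names are invented. */
typedef struct {
    unsigned long bytes_sent;      /* request body bytes sent so far   */
    unsigned long bytes_to_send;   /* total request body size          */
    unsigned long bytes_received;  /* response body bytes so far       */
    unsigned long bytes_expected;  /* from Content-Length; 0 = unknown */
    int           http_status;     /* 0 until the status line arrives  */
} url_status2;

/* Percentage of the upload completed, for client progress bars. */
static int upload_percent(const url_status2 *s)
{
    if (s->bytes_to_send == 0)
        return 100;                /* nothing to send: call it done    */
    return (int)((s->bytes_sent * 100UL) / s->bytes_to_send);
}
```

The existing URL_Status SWI would carry on unchanged; a client that wants upload progress for a megabyte POST would poll the new call instead.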
Rick Murray (539) 13850 posts |
One thing I will add, which isn’t anything to do with URL/AcornHTTP but may well be important for proper IPP support… RISC OS really needs to have a “Bonjour” service. As far as I can tell from about thirty seconds of Googling, it appears to be some sort of modified DNS broadcast, to which interested devices can respond. This is important for two reasons. Firstly, it’s a lot better for locating IPP printers than cycling through every IP address (x.x.x.1 → x.x.x.254); but more importantly, while it is normal for the IPP path to be …

My HP is a 505 (doesn’t like HTTP/1.0); but my code uses … |
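For the record, “Bonjour” is Apple’s name for mDNS/DNS-SD: an ordinary DNS query sent by UDP multicast to 224.0.0.251 port 5353, asking for PTR records of the service name _ipp._tcp.local; printers answer with their instance names. A minimal sketch of building that query datagram (the multicast socket send is omitted, and this has not been tried against real printers):

```c
#include <stddef.h>
#include <string.h>

/* Build a DNS-SD question for _ipp._tcp.local: a 12-byte DNS header
 * with QDCOUNT=1, then the QNAME as length-prefixed labels, then
 * QTYPE=PTR (12) and QCLASS=IN (1).  The resulting datagram would be
 * sent by UDP multicast to 224.0.0.251:5353.  Returns bytes written,
 * or 0 if the buffer is too small. */
static size_t build_mdns_ipp_query(unsigned char *buf, size_t cap)
{
    static const char *labels[] = { "_ipp", "_tcp", "local" };
    size_t i, pos = 12;

    if (cap < 33)                      /* 12 + 17 (QNAME) + 4          */
        return 0;
    memset(buf, 0, 12);                /* ID 0, flags 0, all counts 0  */
    buf[5] = 1;                        /* QDCOUNT = 1                  */
    for (i = 0; i < 3; i++) {
        size_t len = strlen(labels[i]);
        buf[pos++] = (unsigned char)len;
        memcpy(buf + pos, labels[i], len);
        pos += len;
    }
    buf[pos++] = 0;                    /* root label ends the QNAME    */
    buf[pos++] = 0; buf[pos++] = 12;   /* QTYPE  = PTR                 */
    buf[pos++] = 0; buf[pos++] = 1;    /* QCLASS = IN                  */
    return pos;
}
```

The discovery answers then carry SRV and TXT records giving the host, port and, usefully here, the printer’s actual IPP path, which is exactly the information a scan of x.x.x.1–254 can’t give you.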
Richard Walker (2090) 431 posts |
Silly question… If AcornHTTP is bad at uploads, has anyone tried using Browse to upload a large file via HTTP? I assume that uses the same code paths in URLFetcher and AcornHTTP. |
Chris Mahoney (1684) 2165 posts |
HTTP 505 is actually the wrong error code¹ for that, but your analysis is probably still correct. I suspect that the printer does indeed require 1.1 but isn’t indicating this correctly.

¹ The spec (RFC 7231, section 6.6.6)² says that 505 means “the server does not support the major version of HTTP that was used in the request”. It’s only the minor version that’s changed in this case (RFC 7230, section 2.6).

² I knew printers were evil! |
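Chris’s point is easy to check mechanically: RFC 7230 defines the version token as “HTTP/” DIGIT “.” DIGIT, and 505 is only supposed to concern the first digit. A toy parser illustrating the distinction (function names are mine, not from any fetcher source):

```c
#include <ctype.h>
#include <string.h>

/* Extract the major HTTP version digit from a request or status line,
 * e.g. "GET /ipp HTTP/1.0" -> 1.  Returns -1 if no version found. */
static int http_major(const char *line)
{
    const char *p = strstr(line, "HTTP/");
    if (p == NULL || !isdigit((unsigned char)p[5]))
        return -1;
    return p[5] - '0';
}

/* Per RFC 7231 section 6.6.6, a server should only send 505 when it
 * cannot support the *major* version of the request. */
static int should_send_505(int server_major, const char *request_line)
{
    int req = http_major(request_line);
    return req < 0 || req != server_major;
}
```

By this rule an HTTP/1.1 server receiving “HTTP/1.0” should not 505 at all, which is what makes the HP’s behaviour non-conformant rather than merely strict.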
Steve Pampling (1551) 8172 posts |
Nasty. Some network services are “chatty”; Bonjour tends toward the verbal-diarrhoea end. |
Dave Higton (1515) 3534 posts |
We tried an old Brother HL2250DN last night at RONWUG. It wants /ipp. But it doesn’t understand PWG-Raster or URF or, well, anything much other than PCL or text. |
Dave Higton (1515) 3534 posts |
Conversation last night suggested that chunked transfer is what we need to support big uploads. A single flag bit to say “more to come”, which keeps it compatible with previous software. |
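From the client’s side that could look like the loop below: keep submitting chunks with the “more to come” bit set, and clear it on the last one. The sketch only simulates the module’s accumulation of the body; the flag value and every name are invented for illustration:

```c
#include <stddef.h>
#include <string.h>

#define URLFLAG_MORE_DATA (1u << 3)   /* invented bit: more body to come */

/* Toy stand-in for the module's internal request-body state. */
typedef struct {
    char   body[256];
    size_t length;
    int    complete;   /* set once a chunk arrives without the flag */
} pending_request;

/* What the module might do on each call: append the chunk, and only
 * treat the body as finished (so its Content-Length is now known and
 * the request can be sent) when URLFLAG_MORE_DATA is clear. */
static void submit_chunk(pending_request *req, unsigned flags,
                         const void *data, size_t len)
{
    if (req->complete || req->length + len > sizeof req->body)
        return;                        /* toy bounds check only        */
    memcpy(req->body + req->length, data, len);
    req->length += len;
    if (!(flags & URLFLAG_MORE_DATA))
        req->complete = 1;
}
```

Old clients never set the bit, so every call looks like a final chunk and behaves exactly as today, which is the backwards compatibility Dave is after.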
Alan Adams (2486) 1149 posts |
Coincidence of numbers? |
Rick Murray (539) 13850 posts |
There’s no code for “I’m an arse and I’m going to whinge about something inconsequential”.
I’ll happily throw my support behind anything better that works in real life.
I’m not saying to replace the Status SWI. That’ll keep working as before. But a Status2 SWI can provide more information to those programs that might need better feedback than is currently provided. |
Rick Murray (539) 13850 posts |
Strange it supports IPP. Brother’s site says it doesn’t do AirPrint (URF), and if it doesn’t do that it’s safe to assume it won’t cope with IPP Everywhere (PWG). Hmm… |
Rick Murray (539) 13850 posts |
Is anybody able to send me a copy of the AcornHTTP module hacked to always identify itself as HTTP/1.1? |
Steffen Huber (91) 1953 posts |
In the light of this
and that
I have to ask… I have no practical experience with using AcornHTTP and friends, but all I have gathered so far is that it has already consumed quite a lot of debugging time, and is severely underpowered and out-of-date in various respects.

I would suggest just using libcurl and forgetting about all the homebrew RISC OS stuff (providing a shim module around libcurl to provide API compatibility can surely be done if someone insists on having client code running as OS code). The AcornSSL/mbedTLS stuff also seems to be a dead end; mbedTLS is now widely considered (e.g. by the (lib)curl community) as no longer recommended for use, not least because it still has not gained official TLS 1.3 support.

Or is there some hidden beauty of URLFetcher/AcornHTTP/AcornSSL that I am missing? |
Rick Murray (539) 13850 posts |
Perhaps not entirely official, but there is support for it, so it’s not a complete unknown.
The curl community says: All versions of SSL and the TLS versions before 1.2 are considered insecure and should be avoided. Use TLS 1.2 or later.
Not terribly surprising given that the fetcher core dates from the Acorn era. It is worth noting that the SSL side of things is separate to the HTTP document fetching. HTTP has been updated, the fetcher is still to do. It hasn’t really had any love in a quarter century. Remember Browse? That’s what we’re talking about.
I doubt it, but it exists and stuff uses it. Patching up what we have to be less grotty may well be an order of magnitude simpler than trying to bash libcurl into some sort of form that can be sanely used from RISC OS. Oh, and as a native module with SWIs, not an ELF library or anything like that. If there’s money and developers, then curl would be a good idea, since it appears to support numerous protocols out of the box, like IMAP, for example. But, as it often is, that’s not the case here. Unless you’re volunteering? ;) |
Rick Murray (539) 13850 posts |
According to SSLlabs, both ssl.com and curl.se run on servers that only support TLS 1.2. This site only goes up to 1.2, but gets slapped down for supporting 1.0 and 1.1 as well. Google does 1.3 (as one would expect), but it also supports 1.0 and 1.1 so… |
Dave Higton (1515) 3534 posts |
Look what I just found in Networking.Fetchers.HTTP.c.header:

/* Client does NOT need to be aware that we are using HTTP/1.1. We
 * can lie to it safely. We MUST do this in case the server used a
 * chunked transfer encoding (which we are removing) since if we were
 * to leave an HTTP/1.1 response without an encoding and without a
 * content-length header, our client MUST reject the message as invalid.
 */
header = "HTTP/1.0";

Also I’ve just started reading about Chunked Transfer, and found that it has a very specific meaning, and it’s normally used for responses, AFAICS.

What I was thinking of was much simpler, and something that would not be visible at all as part of the HTTP transaction; merely the client app telling AcornHTTP that there will be more data to come, so don’t close the connection until the client app says that there will not be any more to come. So it would just be part of the API between AcornHTTP and the client application. |
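For contrast, the wire-level HTTP/1.1 chunked coding that Dave is distinguishing his idea from looks like this: each chunk is a hex length, CRLF, the data, CRLF, and the body ends with a zero-length chunk. A minimal encoder sketch (no output bounds checking; illustration only, not fetcher code):

```c
#include <stdio.h>
#include <string.h>

/* Encode inlen bytes of body as HTTP/1.1 chunked transfer coding,
 * splitting into chunks of at most chunk_size bytes.  Returns the
 * number of bytes written.  The caller must supply an out buffer
 * large enough for the encoded form (illustration only). */
static size_t chunk_encode(const char *in, size_t inlen,
                           size_t chunk_size, char *out)
{
    size_t used = 0;
    while (inlen > 0) {
        size_t n = inlen < chunk_size ? inlen : chunk_size;
        used += (size_t)sprintf(out + used, "%lx\r\n",
                                (unsigned long)n);   /* hex chunk size */
        memcpy(out + used, in, n);                   /* chunk data     */
        used += n;
        out[used++] = '\r';
        out[used++] = '\n';
        in    += n;
        inlen -= n;
    }
    used += (size_t)sprintf(out + used, "0\r\n\r\n"); /* last chunk    */
    return used;
}
```

Because each chunk declares its own length on the wire, no Content-Length header is needed, which is exactly why the quoted comment has to strip the encoding before claiming “HTTP/1.0” to the client. Dave’s “more to come” flag, by contrast, lives purely in the API and never appears on the wire.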
Dave Higton (1515) 3534 posts |
I think some of the criticisms of the fetchers are unjustified. Yes, they need to support HTTP/1.1 (and preferably before HTTP/2 becomes the norm). Yes, they need to offer a way to send arbitrarily large POST payloads. But a lot of debugging time? I only know of one instance – Rick’s 505 above. All the other debugging I’m aware of is the fault of the client apps, not the fetchers. Oh, and we can fairly blame the docs too.

And underpowered? I don’t even know what you mean, Steffen. Perhaps you’d like to explain. When I’ve used the URL module, the required code has been very simple, and handles plain and encrypted with zero effort. Zero.

So I do not see any reason to throw compatibility out of the window. I do see reasons to update the fetchers. |