URL_Fetcher and AcornHTTP

59 posts, 10 voices

Mar 6, 2022 5:42pm Dave Higton (1515) 3525 posts	It has become apparent that there needs to be a way to send much larger data payloads, perhaps exceeding the available RAM space. The present API doesn’t permit this. Here’s a possible way that I’m putting up for review. At present, URL_GetURL and presumably HTTP_GetData (although that doesn’t appear to be fully documented) require R4 to point to any extra header lines and the body data, which must be contiguous; and R5 to be the length of the block in its entirety. My proposal is that, if flags (R0) bit 2 is set, R4 points to a null-terminated string containing any extra header lines, and R5 points to a data file’s name, null terminated. The header lines must be textual, so the module doesn’t need to be given the length of the block. The data body’s length can be found from the file’s length. If there are no extra header lines, R4 can either point to a null, or be null. OK, what does the team think?
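
A minimal sketch, in plain C, of how a module might decode the convention proposed above. Nothing here is existing URL_Fetcher or AcornHTTP code: `FLAG_BODY_FROM_FILE`, `request_body`, `file_length` and `decode_file_request` are hypothetical names, and the registers are modelled as ordinary function arguments rather than a real SWI entry.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical: flags (R0) bit 2 set means "R5 names a file holding the body". */
#define FLAG_BODY_FROM_FILE (1u << 2)

typedef struct {
    const char *headers;     /* extra header lines; "" when there are none */
    const char *body_file;   /* name of the file holding the body */
    long        body_length; /* length of that file, or -1 if it cannot be read */
} request_body;

/* Return the length of a file in bytes, or -1 if it cannot be opened. */
long file_length(const char *name)
{
    FILE *f = fopen(name, "rb");
    long len = -1;
    if (f != NULL) {
        if (fseek(f, 0, SEEK_END) == 0)
            len = ftell(f);
        fclose(f);
    }
    return len;
}

/* Returns 1 if the new convention applies and *out was filled in,
 * 0 if the flag is clear (existing behaviour: R4 = block, R5 = length),
 * -1 if the flag is set but no filename was supplied. */
int decode_file_request(unsigned flags, uintptr_t r4, uintptr_t r5,
                        request_body *out)
{
    assert(out != NULL);
    if ((flags & FLAG_BODY_FROM_FILE) == 0)
        return 0;
    if (r5 == 0)
        return -1;
    /* R4 may be null, or point to an empty string, when there are no headers. */
    out->headers = (r4 != 0) ? (const char *)r4 : "";
    out->body_file = (const char *)r5;
    out->body_length = file_length(out->body_file);
    return 1;
}
```

Because the header block is a null-terminated string and the body length comes from the file, neither length needs to be passed in a register, which is the point of the proposal.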

Mar 8, 2022 9:37pm Matthew Phillips (473) 721 posts	Sorry I’ve been a bit quiet: trying to get some programming done, and now hampered by a computer failure! Would it make sense for the Content-Type header to be generated with reference to the filetype of the data file? Probably not, as we wouldn’t want people to have to register precious filetypes just because they need to submit stuff to APIs. It might also be useful for there to be a way to call URL_GetURL repeatedly to pass chunks of request body at a time, to allow it to be done in RAM if an application would work better that way. One tricky aspect is how the URL Fetcher module would know that the underlying protocol module can support the new flags. We might need changes to the protocol register SWI so that the protocol module (AcornHTTP) can indicate its API capabilities. The URL Fetcher docs envisage the URL Fetcher being enhanced in the future to add extra protocol SWIs, and to fail gracefully if the SWI is not known by the underlying module, but the docs do not envisage the API of an existing SWI being enhanced.

Mar 8, 2022 11:01pm Dave Higton (1515) 3525 posts	“Would it make sense for the Content-Type header to be generated with reference to the filetype of the data file? Probably not, as we wouldn’t want people to have to register precious filetypes just because they need to submit stuff to APIs.” Definitely not, for the reason you stated. “It might also be useful for there to be a way to call URL_GetURL repeatedly to pass chunks of request body at a time, to allow it to be done in RAM if an application would work better that way.” Yes, that would be good. In the case of the printer dumper I’m working on, that might save having to create a temporary file, with all the savings of time and drive life that would accrue. So would this just be another two flag bits, one to say get from RAM, and another to say this is a continuation chunk?

Mar 9, 2022 8:39pm Matthew Phillips (473) 721 posts	Sounds good. There are other changes that need to be made to HTTP_GetData, which is essentially the AcornHTTP implementation of Protocol_GetData, in order to provide better cookie support. If I hadn’t got distracted by another project for the last two weeks I had intended to draft a proposal. Essentially, for the latest cookie specification to work, the fetcher needs to know the URL of the parent page and also, I think, the URL of the referrer. This is in order to be able to implement SameSite restrictions. For example, if the HTML of a page from domain-one.com includes an image link with the source being from domain-two.com, the cookie rules mean that you might have to avoid sending domain-two.com cookies when fetching the image, whereas if the image from domain-two.com was embedded in a page from the same domain, you would send those cookies. This needs a little bit more thought, and will definitely require some extra registers and alterations both in the URL_Fetcher module and AcornHTTP. No reason to delay firming up your proposals, though. If we can put both API changes through together that would be nice.
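
The image example above can be sketched as a small decision function. This is an illustration only, not the fetcher's code: the names are hypothetical, and "same site" is approximated by an exact comparison of lower-case host names, whereas a real implementation would compare registrable domains.

```c
#include <assert.h>
#include <string.h>

typedef enum { SAMESITE_NONE, SAMESITE_LAX, SAMESITE_STRICT } samesite_t;

/* Simplifying assumption: two hosts are the same site only if the strings
 * match exactly. Real code would compare registrable domains instead. */
int same_site(const char *a, const char *b)
{
    assert(a != NULL && b != NULL);
    return strcmp(a, b) == 0;
}

/* Decide whether a cookie may accompany a request.
 *   page_host     - host of the parent page (or the page that was linked from)
 *   request_host  - host being fetched
 *   top_level_get - non-zero for a top-level navigation using a safe method
 *                   such as GET; zero for sub-resources such as images */
int may_send_cookie(samesite_t policy, const char *page_host,
                    const char *request_host, int top_level_get)
{
    if (policy == SAMESITE_NONE || same_site(page_host, request_host))
        return 1;
    /* Cross-site from here on. */
    if (policy == SAMESITE_LAX)
        return top_level_get != 0;
    return 0; /* SAMESITE_STRICT: never sent cross-site */
}
```

The function needs both the parent page's host and the kind of request, which is why the fetcher API would have to carry the parent URL in addition to the URL being fetched.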

Mar 10, 2022 8:42pm Dave Higton (1515) 3525 posts	“If we can put both API changes through together that would be nice.” I’m all for working together, but I’m not clear what I can do to help beyond discussing the API and testing the results. But if I can do something, I will!

Mar 13, 2022 8:39pm Dave Higton (1515) 3525 posts	Does anyone know off the top of their head when the fetcher modules started to support https?

Mar 13, 2022 9:26pm Rick Murray (539) 13840 posts	Depends upon what you mean. I think it always has, as the fetchers were released by Acorn to support the URL module, all of which was a part of Browse. If you mean the contemporary version, August 2018 was when AcornSSL was modernised: https://www.riscosopen.org/forum/forums/8/topics/11950

Mar 13, 2022 9:27pm Chris Mahoney (1684) 2165 posts	Looks like 1998. Edit: Rick sneaked in there with a more detailed answer :)

Mar 13, 2022 10:03pm Dave Higton (1515) 3525 posts	… and I think Rick knows why I asked, and also that the answer may not have any bearing on the problem that caused me to ask :-) But thanks, anyway!

Mar 16, 2022 6:56pm Rick Murray (539) 13840 posts	Okay, Dave has asked me to chip in, so I shall. Hold on, here goes! First up, I had three printers: an HP 3630 inkjet (AirPrint + IPP), a Samsung M-2022W laser (AirPrint), and an Epson XP-345 (AirPrint + IPP). I say “had” because the Epson was a deplorable piece of [poop emoji]. After setting it up the other day, I hooked it up yesterday and some of the nozzles were clogged (already!?). A clean fixed that, but it took something like six minutes (walking across the field to feed kitty took less time). After doing some testing, it died. Like motherboard-is-a-short-circuit style failure. So out of the box the scanner was faulty, and it managed to work “for around four or five hours”. Trust me, with build quality like that, I will never touch Epson again. [oh, and a quick Google suggests that Epson printers have issues with going off and never coming back on again] A quick note – AirPrint uses IPP as a transport mechanism; in as much as I suspect that IPP developed from what Apple was doing with AirPrint. The primary difference is that IPP Everywhere uses an open and documented raster format (PWG) whereas AirPrint uses a mostly undocumented format (URF) which has been reverse decoded, sort of. Dave’s IPP driver currently outputs PWG. If it did URF as well, in theory this would allow support for older AirPrint-capable devices. The underlying transport appears to be the same for both. There are three primary PWG forms that I can tell: 1bpp (probably aimed at lasers), 8bpp mono (often supported by both lasers and inkjets), and 24bpp RGB (inkjets). I suppose colour lasers too, but that sort of thing isn’t in my budget. ;-) [I’m sure there’s also CMYK and some other fruity stuff, but they aren’t important for us] There are two methods that I use to talk to my printers. The first is a set of programs that I wrote myself using direct socket access. This worked on all of my printers (they don’t use encryption). 
One program (FindIPP) will enquire and parse the IPP data block for understanding the printer’s capabilities. The second program of mine is one that I threw together to send a PWG raster image to the printer. It “sort of” works, in as much as you might expect when you dump about seven hundred kilobytes to a buffer that seems to be about 20K in size. It starts to print correctly, and then goes wrong. Obviously. ;-) But that’s okay, the important part is that it starts correctly. The second method is a pair of programs written by Dave. Much like mine, there’s a program called “Proto” that enquires the IPP data block, but doesn’t yet parse it. It also seems to lose a lot of data (it says 631 bytes received from the Laser, which will have sent ~10K), but it might be because it looks like it’s only dumping the data received when the connection finishes, so if it loops for a few blocks, it’ll only be reporting the last of them. Not a big deal, it’s easily fixed and just a test to check that something happens. The second program is “PrintIPP” which sends the PWG raster to the printer. The primary difference between these is that Dave uses the URL module, which means that it can easily cope with SSL, which is something my code doesn’t do. Plus, you know, it’s supposed to just work. ;-) As far as I can determine from looking at the source code, what Dave is putting together to send to the printer is the same as I have. We’re pretty much sending the bare minimum necessary to port 631 (IPP). Now to AcornHTTP. The Epson and the laser both responded to Proto with an HTTP 200 code and a block of data; but the HP inkjet refused the connection with a 505 error. I have finally cracked this. I tweaked Proto to ask for the headers to be included in the returned data, and added logging of everything received. 
It begins thus:
BDF8 ¦48 54 54 50·2F 31 2E 30|20 35 30 35·20 48 54 54 : HTTP/1.0 505 HTT : +0
BE08 ¦50 20 56 65·72 73 69 6F|6E 20 4E 6F·74 20 53 75 : P Version Not Su : +16
BE18 ¦70 70 6F 72·74 65 64 0D|0A 53 65 72·76 65 72 3A : pported..Server: : +32
BE28 ¦20 48 50 20·48 54 54 50|20 53 65 72·76 65 72 3B :  HP HTTP Server; : +48
So we now have a working example of something that will refuse to connect to an HTTP 1.0 client. I tried changing the word at +6DA4 in AcornHTTP (in memory!) to a simple `MOV R9, #1` rather than a conditional, but this didn’t work. It looks like it will bump up to HTTP/1.1 if one is doing a GET but if it’s a POST then it’s a straight HTTP/1.0. Time, I think, that this gets modernised. ;-) So count this as a +1 for some form of HTTP/1.1 support. The next, and potentially larger, problem is that POST requests are seriously flawed. You see, in the days when AcornHTTP was written, it was likely that POST would be used for things like form requests and the like. It’s always been possible to POST pictures and such, but remember this was the dial-up days. ;-) However, the POST method absolutely does not scale up to dumping a megabyte to a device. I am going from memory here, I didn’t save the data and the Epson is dodo so I can’t repeat; however the URL_Status SWI returned alternately 0,1,0,1,0,1 (etc) until it did something like &1F and reported a 200 response. The problem is that the initial prints kept failing because Dave’s code was timing out. It worked when I bumped up the timeout to a silly large value. This was with a file that was about a megabyte (a page at 300dpi). Higher resolution? Well, that would be a much larger file. URL / AcornHTTP has, as far as I can tell, absolutely no way of knowing that it’s actually sending the POST content to the server, and how far along it is. Looking at the status words, it seems as if it’s all concerned with receiving data, with no real thought regarding sending larger amounts of data. 
I would suggest a new SWI, URL_Status2 which can handle reporting the activity in both directions, rather than attempting to find ways to bodge the values into the existing URL_Status SWI. Of course, if sending large payloads is done as a set of chunks (as discussed above) then there may be some overlap here with being able to tell the status of the transmission. There. My €0,02 worth. Inflation is a bitch, huh? :-) However, to end on a happier note… while Dave’s code is still very experimental, I did successfully print from OvationPro to a file, and send that file via IPP/WiFi to a printer (the Epson) to get a correct printed page out of the printer. As much as it is test code at this stage, it can work (except for fussy printers – looking at you HP!) which means that IPP support is that much closer for RISC OS.
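
As a rough illustration of why two-way status reporting would help with the timeout problem described above, here is a sketch of a client-side stall detector. Everything in it is an assumption: `transfer_status` is a guess at what a URL_Status2 call might report (no such SWI exists), and `stall_watch` and `stalled` are hypothetical helpers.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical shape of the information a URL_Status2 call might return. */
typedef struct {
    unsigned status;         /* status word, as URL_Status returns today */
    size_t   bytes_sent;     /* request body bytes handed to the network so far */
    size_t   bytes_to_send;  /* total size of the request body */
    size_t   bytes_received; /* response bytes received so far */
} transfer_status;

/* Client-side bookkeeping between polls. */
typedef struct {
    size_t   last_sent;
    size_t   last_received;
    unsigned idle_polls;     /* consecutive polls with no movement */
} stall_watch;

/* Call once per poll. Returns 1 only when nothing has moved in either
 * direction for more than `limit` consecutive polls. */
int stalled(stall_watch *w, const transfer_status *s, unsigned limit)
{
    assert(w != NULL && s != NULL);
    if (s->bytes_sent != w->last_sent ||
        s->bytes_received != w->last_received) {
        w->last_sent = s->bytes_sent;
        w->last_received = s->bytes_received;
        w->idle_polls = 0;   /* progress was made, so restart the clock */
        return 0;
    }
    w->idle_polls++;
    return w->idle_polls > limit;
}
```

With something like this, a client would not need a "silly large" fixed timeout for a megabyte upload: the timeout restarts whenever the byte counts advance, so only a genuinely stuck transfer is abandoned.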

Mar 16, 2022 7:12pm Rick Murray (539) 13840 posts	One thing I will add, which isn’t anything to do with URL/AcornHTTP but may well be important for proper IPP support… RISC OS really needs to have a “Bonjour” service. As far as I can tell from about thirty seconds of Googling, it appears to be some sort of modified DNS broadcast, to which interested devices can respond. This is important for two reasons. Firstly it’s a lot better for locating IPP printers than cycling through every IP address (x.x.x.1 → x.x.x.254); but more importantly, while it is normal for the IPP path to be `/ipp/print`, this isn’t actually mandatory. My older HP inkjet (yes, I have a fourth, but this is really old and only just about manages AirPrint) actually uses `/ipp/printer`. There was no status output from the printer (either printed or via the built-in server) that mentioned this. I had to install an app on my phone (Service Browser by Andriy Druk) to pick up the information and report on the expected paths. It may be that if `/ipp/print` is a 404, Bonjour may be the only way to tell what is expected. My HP is a 505 (doesn’t like HTTP/1.0; but my code uses `/ipp/print`). The Epson required `/ipp/print`, the ancient HP required `/ipp/printer` and the Samsung doesn’t care (it says it wants `/ipp/print` but it will happily respond to `/` via the IPP port).
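
For context, Bonjour is Apple's name for multicast DNS plus DNS-based service discovery, and printers that advertise an IPP service normally publish the resource path in the `rp` key of the service's TXT record (for example `rp=ipp/print`, without the leading slash). The sketch below shows only the last step, pulling a key out of raw TXT record data, which is a sequence of length-prefixed `key=value` strings. `txt_lookup` is a hypothetical helper, it matches keys case-sensitively as a simplification, and no mDNS querying is shown.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Search DNS TXT record data (length-prefixed "key=value" strings) for `key`.
 * On success copy the value into `out` as a NUL-terminated string and
 * return 1; return 0 if the key is absent, the data is truncated, or the
 * value does not fit. Simplification: keys are compared case-sensitively. */
int txt_lookup(const unsigned char *txt, size_t txt_len, const char *key,
               char *out, size_t out_size)
{
    size_t key_len, pos = 0;

    assert(txt != NULL && key != NULL && out != NULL && out_size > 0);
    key_len = strlen(key);
    while (pos < txt_len) {
        size_t len = txt[pos++];          /* one length byte per string */
        if (len > txt_len - pos)
            return 0;                     /* string runs past the record */
        if (len > key_len && txt[pos + key_len] == '=' &&
            memcmp(txt + pos, key, key_len) == 0) {
            size_t val_len = len - key_len - 1;
            if (val_len >= out_size)
                return 0;
            memcpy(out, txt + pos + key_len + 1, val_len);
            out[val_len] = '\0';
            return 1;
        }
        pos += len;
    }
    return 0;
}
```

A client that found `ipp/printer` this way would know to POST to `/ipp/printer` without having to guess between the paths listed above.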

Mar 16, 2022 9:57pm Richard Walker (2090) 431 posts	Silly question… If AcornHTTP is bad at uploads, has anyone tried using Browse to upload a large file via HTTP? I assume that uses the same code paths in URLFetcher and AcornHTTP.

Mar 16, 2022 10:27pm Chris Mahoney (1684) 2165 posts	“So count this as a +1 for some form of HTTP/1.1 support.” HTTP 505 is actually the wrong error code¹ for that, but your analysis is probably still correct. I suspect that the printer does indeed require 1.1 but isn’t indicating this correctly. ¹ The spec (RFC 7231, section 6.6.6²) says that 505 means “the server does not support the major version of HTTP that was used in the request”. It’s only the minor version that’s changed in this case (RFC 7230, section 2.6). ² I knew printers were evil!
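
The major/minor distinction can be made concrete with a small status-line parser. This is a sketch, not AcornHTTP's parser: `http_status`, `parse_status_line` and `differs_only_in_minor` are hypothetical names, and `sscanf` is more lenient than a strict parser would be.

```c
#include <assert.h>
#include <stdio.h>

typedef struct {
    int major;  /* HTTP major version from the status line */
    int minor;  /* HTTP minor version from the status line */
    int code;   /* three-digit status code */
} http_status;

/* Parse a line such as "HTTP/1.0 505 HTTP Version Not Supported".
 * Returns 1 on success, 0 if the line does not start with a status line. */
int parse_status_line(const char *line, http_status *out)
{
    assert(line != NULL && out != NULL);
    return sscanf(line, "HTTP/%d.%d %d",
                  &out->major, &out->minor, &out->code) == 3;
}

/* Returns 1 when two versions share a major number but differ in minor,
 * which is the HTTP/1.0 versus HTTP/1.1 situation discussed above. */
int differs_only_in_minor(int major_a, int minor_a, int major_b, int minor_b)
{
    return major_a == major_b && minor_a != minor_b;
}
```

Applied to the first line of the dump in Rick's post, the parser yields major 1, minor 0, code 505. By the wording quoted from the spec, a 505 is meant for a major-version mismatch, which is not what separates 1.0 from 1.1.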

Mar 17, 2022 8:44am Steve Pampling (1551) 8170 posts	“RISC OS really needs to have a ‘Bonjour’ service.” Nasty. Some network services are “chatty”, Bonjour tends toward the verbal diarrhoea end.

Mar 17, 2022 9:23am Dave Higton (1515) 3525 posts	We tried an old Brother HL2250DN last night at RONWUG. It wants /ipp. But it doesn’t understand PWG-Raster or URF or, well, anything much other than PCL or text.

Mar 17, 2022 9:26am Dave Higton (1515) 3525 posts	Conversation last night suggested that chunked transfer is what we need to support big uploads. A single flag bit to say “more to come”, which keeps it compatible with previous software.
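
A sketch of how a single “more to come” bit might behave, modelled in plain C with no networking. The bit position, the `upload` structure and both functions are hypothetical. It also assumes the total body length is declared with the first call, on the grounds that the request would presumably still need a Content-Length header.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical flag bit: set on every chunk except the last. */
#define FLAG_MORE_TO_COME (1u << 3)

typedef struct {
    size_t declared_total; /* body length announced up front (assumption) */
    size_t sent;           /* bytes accepted so far */
    int    finished;       /* 1 once the final chunk has been accepted */
} upload;

void upload_begin(upload *u, size_t total)
{
    assert(u != NULL);
    u->declared_total = total;
    u->sent = 0;
    u->finished = 0;
}

/* Accept one chunk of `len` bytes. Returns 0 on success, or -1 if the
 * upload is already finished, the chunk would overrun the declared total,
 * or the final chunk leaves the body short. */
int upload_chunk(upload *u, unsigned flags, size_t len)
{
    size_t new_sent;

    assert(u != NULL);
    if (u->finished || len > u->declared_total - u->sent)
        return -1;
    new_sent = u->sent + len;
    if ((flags & FLAG_MORE_TO_COME) == 0) {
        if (new_sent != u->declared_total)
            return -1;         /* final chunk but the body is incomplete */
        u->finished = 1;
    }
    u->sent = new_sent;
    return 0;
}
```

Existing software never sets the bit, so its single call carrying the whole body is treated as the final chunk, which is how the scheme stays compatible.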

Mar 17, 2022 11:41am Alan Adams (2486) 1149 posts	“It also seems to lose a lot of data (it says 631 bytes received from the Laser,” … “We’re pretty much sending the bare minimum necessary to port 631 (IPP)” Coincidence of numbers?

Mar 17, 2022 1:09pm Rick Murray (539) 13840 posts	“HTTP 505 is actually the wrong error code¹ for that” There’s no code for “I’m an arse and I’m going to whinge about something inconsequential”. “Nasty.” I’ll happily throw my support behind anything better that works in real life. Until then, it’s Bonjour in all its bogosity. “which keeps it compatible with previous software.” I’m not saying to replace the Status SWI. That’ll keep working as before. But a Status2 SWI can provide more information to those programs that might need better feedback than is currently provided.

Mar 17, 2022 4:40pm Rick Murray (539) 13840 posts	“But it doesn’t understand PWG-Raster or URF” Strange it supports IPP. Brother’s site says it doesn’t do AirPrint (URF), and if it doesn’t do that it’s safe to assume it won’t cope with IPP Everywhere (PWG). Hmm…

Mar 17, 2022 4:41pm Rick Murray (539) 13840 posts	Is anybody able to send me a copy of the AcornHTTP module hacked to always identify itself as HTTP/1.1?

Mar 17, 2022 5:11pm Steffen Huber (91) 1953 posts	In the light of this: “Conversation last night suggested that chunked transfer is what we need to support big uploads.” and that: “Is anybody able to send me a copy of the AcornHTTP module hacked to always identify itself as HTTP/1.1?” I have to ask… I have no practical experience with using AcornHTTP and friends, but all I have gathered so far is that it has already consumed quite a lot of debugging time, and is severely underpowered and out-of-date in various respects. I would suggest just using libcurl and forgetting about all the homebrew RISC OS stuff (providing a shim module around libcurl to provide API compatibility can surely be done if someone insists on having client code running as OS code). The AcornSSL/mbedTLS stuff also seems to be a dead end: mbedTLS is now widely considered (e.g. by the (lib)curl community) as no longer recommended for use, not least because it still has not gained official TLS 1.3 support. Or is there some hidden beauty of URLFetcher/AcornHTTP/AcornSSL that I am missing?

Mar 17, 2022 6:40pm Rick Murray (539) 13840 posts	“not least because it still has not gained official TLS 1.3 support.” Perhaps not entirely official, but there is support for it, so it’s not a complete unknown. https://github.com/ARMmbed/mbedtls/releases “(e.g. by the (lib)curl community” The curl community says: “All versions of SSL and the TLS versions before 1.2 are considered insecure and should be avoided. Use TLS 1.2 or later.” https://github.com/curl/curl/blob/master/docs/SSL-PROBLEMS.md “so far is that it has already consumed quite a lot of debugging time, and is severely underpowered and out-of-date in various respects” Not terribly surprising given that the fetcher core dates from the Acorn era. It is worth noting that the SSL side of things is separate to the HTTP document fetching. HTTP has been updated, the fetcher is still to do. It hasn’t really had any love in a quarter century. Remember Browse? That’s what we’re talking about. “Or is there some hidden beauty” I doubt it, but it exists and stuff uses it. Patching up what we have to be less grotty may well be an order of magnitude simpler than trying to bash libcurl into some sort of form that can be sanely used from RISC OS. Oh, and as a native module with SWIs and not an elf library or anything like that. If there’s money and developers, then curl would be a good idea since it appears to support numerous formats out of the box, like IMAP, for example. But, as it often is, that’s not the case here. Unless you’re volunteering? ;)

Mar 17, 2022 7:08pm Rick Murray (539) 13840 posts	According to SSLlabs, both ssl.com and curl.se run on servers that only support TLS 1.2. This site only goes up to 1.2, but gets slapped down for supporting 1.0 and 1.1 as well. Google does 1.3 (as one would expect), but it also supports 1.0 and 1.1 so…

Mar 17, 2022 7:21pm Dave Higton (1515) 3525 posts	Look what I just found in Networking.Fetchers.HTTP.c.header:
/* Client does NOT need to be aware that we are using HTTP/1.1. We
 * can lie to it safely. We MUST do this in case the server used a
 * chunked transfer encoding (which we are removing) since if we were
 * to leave an HTTP/1.1 response without an encoding and without a
 * content-length header, our client MUST reject the message as invalid.
 */
header = "HTTP/1.0";
Also I’ve just started reading about Chunked Transfer, and found that it has a very specific meaning, and it’s normally for responses AFAICS. What I was thinking of was much simpler, and something that would not be visible at all as part of the HTTP transaction; merely the client app telling AcornHTTP that there will be more data to come, so don’t close the connection until the client app says that there will not be any more to come. So it would just be part of the API between AcornHTTP and the client application.
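
To show the “very specific meaning” that the API-level idea should not be confused with, here is a sketch of the HTTP/1.1 chunked transfer coding as it appears on the wire: each chunk is its size in hexadecimal, CRLF, the data, then CRLF, and a zero-length chunk ends the body. `encode_chunk` is a hypothetical helper, not part of AcornHTTP. For what it is worth, the coding is defined for HTTP/1.1 messages generally, so it is not limited to responses.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Encode one chunk into `out` using the HTTP/1.1 chunked transfer coding.
 * Returns the number of bytes written, or 0 if `out` is too small.
 * Passing len == 0 produces "0\r\n\r\n", the final chunk with no trailers. */
size_t encode_chunk(const void *data, size_t len, char *out, size_t out_size)
{
    int head;

    assert(out != NULL && (data != NULL || len == 0));
    /* Chunk size in hex followed by CRLF; snprintf reports the full length
     * it wanted to write even when the buffer is too small. */
    head = snprintf(out, out_size, "%zx\r\n", len);
    if (head < 0 || (size_t)head + len + 2 > out_size)
        return 0;
    if (len > 0)
        memcpy(out + head, data, len);
    memcpy(out + head + len, "\r\n", 2);   /* CRLF after the chunk data */
    return (size_t)head + len + 2;
}
```

The size lines and terminators are part of the HTTP message itself, whereas the “more to come” idea would leave the bytes on the wire untouched and only change how the client hands data to AcornHTTP.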

Mar 17, 2022 7:38pm Dave Higton (1515) 3525 posts	I think some of the criticisms of the fetchers are unjustified. Yes, they need to support HTTP 1.1 (and preferably before HTTP 2 becomes the norm). Yes, they need to offer a way to send arbitrarily large POST payloads. But a lot of debugging time? I only know of one instance – Rick’s 505 above. All the other debugging I’m aware of is the fault of the client apps, not the fetchers. Oh, and we can fairly blame the docs too. And underpowered? I don’t even know what you mean, Steffen. Perhaps you’d like to explain. When I’ve used the URL module, the required code has been very simple, and handles plain and encrypted with zero effort. Zero. So I do not see any reason to throw compatibility out of the window. I see reasons to update the fetchers.
