AcornHTTP upgrade
Dave Higton (1515) 3534 posts |
The work I’ve done on printing via IPP has shown up a couple of deficiencies that need solving. One of them is in AcornHTTP.

IPP requires that the print job be complete before the file can be sent via HTTP. It’s easy to see that job files can become so large that they will not fit in RAM. AcornHTTP has no way to transmit data directly from a file, nor can it transmit data in sections. (I’m avoiding use of the word “chunk”, since “chunked transfer” has a specific meaning in HTTP, and AIUI it’s not the same as what I’m discussing here.)

There is another detail: print files really have two parts; the first (small) part is specific to the printer and its URL, the second (large) part is printer-independent. The two parts will have to be concatenated to be sent. There are also some printers that refuse to accept HTTP 1.0, which is what AcornHTTP claims to support.

So I’d like to see the above deficiencies in AcornHTTP remedied. As to how: is anyone interested in collaborating on that upgrade, in two stages: 1) agree a new specification; 2) implement to it? We ought to be able to discuss a new specification here, and use a wiki page to hold the WIP specification for collaborative work.

I’d welcome input from anyone who has worked on AcornHTTP and knows the code to any extent, anyone who is familiar with the workings of HTTP, and anyone who has a wider overview, so that the specification makes complete sense. |
Matthew Phillips (473) 721 posts |
As discussed with Dave at Bradford last week, I’d be happy to be involved but I have limited time available till September. I will contribute what I can, but it would be really good to have input from others too. |
Dave Higton (1515) 3534 posts |
I’m suggesting that, to permit sending huge data payloads, we add two flags: one to say we’re doing it like that, the other to say more data is to follow. Then: instead of a length in R5, it’s a pointer to a 2-word block: overall length (which must not be changed during the transfer), and length of this lump. It’s the user’s responsibility to read lumps from file into RAM for transfer.

The session info contains the length transferred so far. If this is 0, the header is sent. This saves having to implement a flag to signify the first lump. |
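For illustration, here is a minimal C sketch of what that parameter block could look like; the flag bits and all the names are hypothetical, purely to pin the proposal down:

    /* Hypothetical flag bits for the proposed multi-lump send API */
    #define HTTP_FLAG_BODY_IN_LUMPS   (1u << 8)  /* R5 points to a block, not a length */
    #define HTTP_FLAG_MORE_TO_FOLLOW  (1u << 9)  /* further lumps will be sent */

    /* Block pointed to by R5 when HTTP_FLAG_BODY_IN_LUMPS is set */
    typedef struct {
        unsigned int overall_length;  /* total body size; fixed for the whole transfer */
        unsigned int lump_length;     /* number of bytes supplied on this call */
    } http_lump_block;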
Dave Higton (1515) 3534 posts |
Does anyone have any knowledge or experience of “chunked transfer”, which has a specific meaning in HTTP? AIUI, anything that claims to implement HTTP 1.1 must support chunked transfer. The comments in AcornHTTP indicate that it doesn’t, which is why it still can only claim to support HTTP 1.0. When I read about chunked transfer, it doesn’t look difficult, so I’ve probably not understood it correctly. If we implement it, we would also have to know how to test it, so, again, all ideas are welcome. |
Rick Murray (539) 13850 posts |
https://en.wikipedia.org/wiki/Chunked_transfer_encoding Doesn’t seem difficult, but does anybody know of a server that can be a reliable test? |
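For reference, the wire format of chunked encoding is simple: each chunk is a hexadecimal length followed by CRLF, then that many bytes of data, then another CRLF; a zero-length chunk ends the body. A 15-byte body sent as two chunks would look like this on the wire (CRLFs shown explicitly):

    7\r\n
    Hello, \r\n
    8\r\n
    printer!\r\n
    0\r\n
    \r\n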
Jeffrey Lee (213) 6048 posts |
Receiving chunked transfers in responses is already supported, so the only thing that needs testing would be sending chunked transfers in POST/PUT bodies. Since it’s a core part of HTTP 1.1, you should be able to test it with any (well-written) server which accepts POST/PUT transfers. |
Dave Higton (1515) 3534 posts |
Interesting. I’ve only ever seen references to chunked transfer from server to client. And the client is not, AFAICS, obliged to declare Transfer-Encoding as chunked. |
Rick Murray (539) 13850 posts |
My understanding of the text is that, for a POST, the server is obliged to accept it from an HTTP 1.1 connection, but the client is not obliged to use it. For a GET, it’s the inverse: the client must be able to accept it, but the server doesn’t have to use it. Chunked transfers seem to be more useful for cases where the server is offering data “on the fly” and so doesn’t necessarily know how much data there will be until the end (so cannot fill in Content-Length). This cannot be used to send strips piecemeal to the printer because, IIRC, the IPP header needs to contain the data length, so the payload size needs to be known at the start. |
Dave Higton (1515) 3534 posts |
Jeffrey, are you saying that the comments in AcornHTTP saying chunked transfer is not supported, and the refusal to declare HTTP 1.1, are out of date and/or inaccurate? |
Jeffrey Lee (213) 6048 posts |
I haven’t looked at the code/documents in any detail, but my understanding is that:
- receiving chunked transfers in responses is supported;
- sending chunked transfers in request bodies is not.
So if the documentation is contradicting the code then yes, the docs need updating. |
Rick Murray (539) 13850 posts |
That part should have been clear back when I was diddling around earlier and noted that POST used 1.0 while GET used 1.1. |
Matthew Phillips (473) 721 posts |
I can vouch for the fact that AcornHTTP supports receiving data from the server as chunked transfer. I had to submit a fix a few months ago to improve the support, as AcornHTTP erroneously assumed that any chunk length beginning with a zero digit was of length zero. The WikiData API padded its lengths with leading zeros which meant AcornHTTP could not reliably receive content from it. |
Dave Higton (1515) 3534 posts |
Ah, yes, I remember testing that, but I had completely forgotten what it was about! |
Dave Higton (1515) 3534 posts |
Yes. My statements about HTTP 1.1 were badly phrased; they made it look like a blanket statement from me that 1.1 isn’t supported. Mea culpa.

Bearing in mind all the postings above, I wonder if the declaration of 1.0 for POST is because there isn’t an API to send multiple chunks, so the only case that can currently be supported is the trivial one of a single chunk, which is pointless?

I’m wondering if chunked transfer is the way to go for IPP, or indeed anything else that has to transmit data bigger than available RAM. The other nice thing is that chunked transfer provides positive confirmation of the end of data (a zero-length chunk), so it isn’t even necessary to declare Content-Length in the header, and there is no need to pass in both the total length and the length of each chunk. |
Dave Higton (1515) 3534 posts |
The wiki won’t let me use NetSurf to edit a page. I don’t know why this is, as I’ve done it before. The complaint is that I must enable Javascript and enable cookies, but JS is enabled and I can’t see any way to disable or enable cookies. It may be a few days before I have a Linux box again. So, until I can put up a proposal page in the wiki, here’s a slightly fuller explanation of what I currently have in mind. Please pick all the holes in it that you can!

HTTP_GetData adds another flag, which must be set to use chunked transfer encoding on sending a POST. If it is set, R5 is the length of the current chunk. Data are sent by multiple calls to HTTP_GetData (usually through the URL_GetURL SWI). The first chunk must include any additional header lines and a CRLF; it may also contain any amount of body data. Subsequent chunks may contain any amount of body data. The transfer is ended with a chunk of zero length. I’m assuming that there is no point in calling HTTP_ReadData until the last, zero-length, chunk has been sent.

Internally, the module keeps a flag to say the header has been sent, initially cleared to zero when the session is opened, and set on the first call to HTTP_GetData. Subsequent calls to HTTP_GetData, on seeing the flag set, do not attempt to re-send the header. The flag is cleared by the call that sends the zero-length chunk. (This allows re-use of the session – is there any point?)

The module adds the chunk information. The module does not send a Content-Length header line. The module adds a Transfer-Encoding header line if it is not already present, and adds “chunked” as the last argument on the line. If the user adds a Transfer-Encoding line, she should not include the argument “chunked”.

As far as I can see, this would satisfy the needs of printing via IPP, as the IPP header does not have to be prepended to the IPP raster data, and the IPP raster data can be of any length – the only requirement is for each chunk to fit in RAM. |
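To make the proposed flow concrete, here is a rough C sketch of how a client might drive it; url_send_chunk() is a made-up wrapper standing in for the URL_GetURL/HTTP_GetData call, and the flag bit is hypothetical:

    #include <stdio.h>
    #include <string.h>

    #define URL_FLAG_CHUNKED  (1u << 2)   /* hypothetical: body sent as chunks */
    #define SLICE 16384

    /* Made-up wrapper around the URL_GetURL SWI: hands one chunk of data
       to the module (len == 0 terminates the transfer). */
    extern int url_send_chunk(unsigned flags, void *session, const char *url,
                              const void *data, size_t len);

    int send_print_job(void *session, const char *url,
                       const char *extra_headers, FILE *job)
    {
        static char buf[SLICE];
        size_t n;

        /* First call: additional header lines plus terminating CRLF */
        if (url_send_chunk(URL_FLAG_CHUNKED, session, url,
                           extra_headers, strlen(extra_headers)))
            return -1;

        /* Body data, one RAM-sized slice per call; the module adds the
           chunk framing and the Transfer-Encoding: chunked header line */
        while ((n = fread(buf, 1, SLICE, job)) > 0)
            if (url_send_chunk(URL_FLAG_CHUNKED, session, url, buf, n))
                return -1;

        /* Zero-length chunk ends the transfer; HTTP_ReadData can follow */
        return url_send_chunk(URL_FLAG_CHUNKED, session, url, NULL, 0);
    }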
Matthew Phillips (473) 721 posts |
Looking at it from the point of view of URL_GetURL, would you envisage it having another bit (bit 2, say) in R0 to indicate that the accompanying data falls in several chunks? This would only be valid if bit 1 was also set, I imagine. And again, R5 would be the length of the current chunk, with the process being concluded by a call having R5=0 but bit 2 still set? R3 and R6 need not be valid after the initial call, I would suggest.

I wonder whether we should consider what modifications might be necessary to support maintaining a session for multiple requests?

On a separate matter, but one which it might make sense to consider when amending the API between the URL Fetcher module and its (sole supported) fetcher module AcornHTTP, I would like to add support for segregating cookies between applications. I think perhaps the best approach would be a further flag in R0 to indicate that R7 carries a pointer to a protocol-specific data block. This could then contain a version number or flags word, and a pointer to a “cookie jar”, which could be obtained via a new SWI in AcornHTTP. It’s not possible for AcornHTTP to distinguish tasks automatically because of the lack of a process model in RISC OS. Besides, applications may wish to have more than one jar, as in a browser with “private browsing” support.

URL_ProtocolRegister will need modifying so that the protocol handler can indicate to the URL module that it supports chunked transfer, and the protocol-specific data block if we go down that route for cookies. |
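A minimal sketch of what that R7 block might look like, in C; the layout and names are invented here just to pin the idea down:

    /* Hypothetical protocol-specific data block passed in R7 when the
       corresponding flag bit in R0 is set */
    typedef struct {
        unsigned int version;     /* or a flags word, for future extension */
        void        *cookie_jar;  /* handle from a new AcornHTTP SWI;
                                     NULL could mean the shared default jar */
    } urlfetch_protocol_block;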
Dave Higton (1515) 3534 posts |
If you look at the meaning of bit 1 clear, I don’t think it makes sense in the context of chunked sending. So either bit 1 or the new bit set would require R5 to hold a valid length.
Yes.
Agreed. |
Dave Higton (1515) 3534 posts |
I’m not sure I understand the code well enough yet to make a firm assertion, but I think the code may be substantially simpler if we restrict the API so that the first packet contains only the additional header lines and the first empty line. It avoids the need to insert a chunk header between the empty line and the body data. Does the module ever change or remove any of the user-supplied header lines? I haven’t worked that out yet. |
Rick Murray (539) 13850 posts |
Yes, it sanitises them. Stuff that doesn’t belong (like Content-Length in a GET) is stripped. Such things in a POST are checked and fixed if believed to be wrong. This is very much E&OE, as the code for dealing with header lines is fairly complex. I’m not sure how it handles cookies previously sent by the server.

I think something in here is why Manga no longer works (not blaming the module; I haven’t looked deeply, but I think there’s some sort of complicated interaction going on¹).

¹ Already discovered that there are two image links, the first of which is bogus. |
Dave Higton (1515) 3534 posts |
If I understand the code correctly, the state machine code is called from the ReadData SWI, which means the whole thing is predicated on a single send. Reading the data before the POST is completed is pretty much guaranteed to get an error response of some kind. I was hoping to send a chunk and keep reading status until all the data are sent.

This is all about CMT (cooperative multitasking). The data are copied into a buffer in the module, and there they reside until sent – and of course you mustn’t write new data into the buffer or you’ll probably overwrite wanted data. But the status SWI doesn’t appear to move the state machine, so nothing happens. And if my analysis is right, that’s going to be a nightmare to disentangle. |
Rick Murray (539) 13850 posts |
That’s my impression too. It seems like the entire API was predicated on the idea of only sending tiny amounts of data, so there’s no useful status for sending, only for receiving. It’s also why my first attempts at using it to send data to a printer (not an HP!) kept failing with timeouts. There’s literally no way to tell that anything is actually happening during the sending stage. That’s why I suggested a richer Status SWI a few weeks back: leave the current status behaving as it does, and have a new SWI duplicate that, but with a richer set of status reports, like how much of the payload has been sent.
It’s actually a rather peculiar implementation, in that there is a lot of activity available (status, what’s happening, how much, HTTP code…) for receiving, but nothing at all for sending. You’ll be twiddling your thumbs and hoping it works until bit 2 is finally set. Really not sure how the idea of sending data in chunks would fit into this… |
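For the sake of argument, the result block for such a richer status SWI could look something like this; the layout is invented purely for illustration:

    /* Hypothetical result block for a richer HTTP_Status-style SWI */
    typedef struct {
        unsigned int state;        /* e.g. connecting / sending / receiving */
        unsigned int bytes_sent;   /* payload bytes handed to the network so far */
        unsigned int bytes_total;  /* 0 if unknown (e.g. chunked sending) */
        unsigned int http_status;  /* response code, once the header arrives */
    } http_rich_status;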
Rick Murray (539) 13850 posts |
Funny how “I wonder if I can make a WiFi printer work” turns into something…bigger. |
Dave Higton (1515) 3534 posts |
Yes. I’m seriously thinking that it might be easier to roll my own HTTP transport from scratch. I really don’t want to reinvent the wheel. There are times, though, when it genuinely looks easier than trying to modify something that’s already there, no matter how long- and well-established. |
Dave Higton (1515) 3534 posts |
I’ve just rolled an experimental HTTP POST transport from scratch, in BASIC. It sends in slices of up to 16384 bytes. Because it’s written to handle an arbitrarily large file a slice at a time, it doesn’t need to go to chunked transfer. At the moment it’s HTTP only (no HTTPS), but that ought to be relatively easy to add (whose famous last words are they?…) In view of the simplicity of this approach, any ideas of upgrading AcornHTTP are now dead. |
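Dave’s transport is in BASIC; for illustration, here is a minimal C sketch of the same slice-at-a-time approach, assuming a BSD-style socket API. The host, port and path are placeholders, error handling is trimmed, and reading the response is left as an exercise:

    #include <stdio.h>
    #include <string.h>
    #include <unistd.h>
    #include <sys/socket.h>
    #include <netdb.h>

    #define SLICE 16384

    /* send() may write less than asked; loop until the buffer is gone */
    static int send_all(int s, const char *p, size_t n)
    {
        while (n > 0) {
            ssize_t w = send(s, p, n, 0);
            if (w <= 0) return -1;
            p += w; n -= (size_t)w;
        }
        return 0;
    }

    int post_file(const char *host, const char *port,
                  const char *path, const char *filename)
    {
        FILE *f = fopen(filename, "rb");
        if (!f) return -1;
        fseek(f, 0, SEEK_END);
        long length = ftell(f);        /* total size known up front, so   */
        fseek(f, 0, SEEK_SET);         /* Content-Length can be declared  */

        struct addrinfo hints = {0}, *ai;
        hints.ai_socktype = SOCK_STREAM;
        if (getaddrinfo(host, port, &hints, &ai)) { fclose(f); return -1; }
        int s = socket(ai->ai_family, ai->ai_socktype, ai->ai_protocol);
        if (s < 0 || connect(s, ai->ai_addr, ai->ai_addrlen)) {
            freeaddrinfo(ai); fclose(f); return -1;
        }
        freeaddrinfo(ai);

        char hdr[256];
        int n = snprintf(hdr, sizeof hdr,
            "POST %s HTTP/1.1\r\nHost: %s\r\n"
            "Content-Type: application/ipp\r\n"
            "Content-Length: %ld\r\nConnection: close\r\n\r\n",
            path, host, length);
        if (send_all(s, hdr, (size_t)n)) { close(s); fclose(f); return -1; }

        static char buf[SLICE];
        size_t got;
        while ((got = fread(buf, 1, SLICE, f)) > 0)   /* one slice at a time */
            if (send_all(s, buf, got)) break;

        /* ...read and check the HTTP response here... */
        close(s); fclose(f);
        return 0;
    }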
Rick Murray (539) 13850 posts |
Sad to hear, but perfectly understandable. You need an HTTP(S) transport mechanism, not the headache of fixing fundamental design issues¹.

¹ I won’t say “deficiencies”, because I rather suspect that suggesting support for tossing a few megabytes of data at a printer…might have got you laughed out of the room back in ~1996. |