Incomplete fetch via AcornHTTP

17 posts, 6 voices

Feb 5, 2022 9:29pm Matthew Phillips (473) 721 posts	I am working on an application which uses the URL_Fetcher module, AcornHTTP and AcornSSL to fetch about 2MB of XML from an API. I am processing the data as it arrives, and because of inefficiencies in my part of the process, it can take quite a long time. I find that after about 8 minutes of slowly fetching the data and processing it, URL_ReadData suddenly reports that the fetch is complete (R0 = 32) and that there are no bytes read and no more expected. The data received is incomplete, stopping part way through an XML tag. The previous call to URL_ReadData had the number of bytes still to read (R5) showing about 900K more.

Using a debugging version of AcornHTTP I can see that the module is fetching an uncompressed, unchunked response from the SWI AcornSSL_Recv and that call is returning zero in R0 and does not generate an error. I suspect that the server at the other end has got bored with the slow progress and closed the socket, but maybe AcornSSL has got bored. I’m not sure how to tell.

If the server closed the socket, is there any way that that could be distinguished from a completed fetch? As far as I understand it, after the server has transmitted all the bytes, it would be quite at liberty to close the socket, so the only way we could tell the transmission was incomplete would be with reference to the Content-Length header, and we cannot necessarily rely on that.

(I expect I have to solve this problem by speeding up my processing, and by trapping the XML parsing error that results from an incomplete transmission. If there could be a way for the modules to distinguish between a complete transmission and an aborted one, I suppose I could look into writing an enhancement, but after a couple of weeks messing around with AcornHTTP I would be quite glad to be told that it’s impossible.)

Feb 5, 2022 9:48pm Chris Mahoney (1684) 2165 posts	“I suspect that the server at the other end has got bored with the slow progress and closed the socket, but maybe AcornSSL has got bored. I’m not sure how to tell.”

Is it possible to fetch the same data using something other than AcornSSL/HTTP, such as NetSurf? Edit: Sorry, misunderstood. That, of course, won’t be slow enough.

Feb 5, 2022 9:55pm Matthew Phillips (473) 721 posts	Correct: NetSurf will fetch the complete 2MB response quite happily. So can I if I just save it to a file and don’t process as I go. The problem may seem academic, as I can probably avoid the problem in various ways, but having encountered it I would like to make the code more robust so that it detects the incomplete transfer at the earliest stage possible.

Feb 5, 2022 10:19pm Dave Higton (1515) 3526 posts	Most weeks, I receive an email with 10MB or so of attachments. This is fetched securely using AcornSSL. It never fails, but it completes much more quickly than in your situation. I wonder if something times out deliberately because such a slow transfer implies a possible security problem? Maybe that’s far-fetched. But can you use the same code except not slow down to process it, i.e. just dump the data, and see if the behaviour is any different? Edit: Oops, I see you already said you tried it and it works.

Feb 5, 2022 10:20pm Dave Higton (1515) 3526 posts	Get all the data to file, then parse from file?

Feb 5, 2022 11:07pm Paolo Fabio Zaino (28) 1882 posts	@Matthew,

“I find that after about 8 minutes of slowly fetching the data and processing it, URL_ReadData suddenly reports that the fetch is complete (R0 = 32) and that there are no bytes read and no more expected.”

AFAIR, an HTTP server can have several timeouts set (depending on the server). The two most common types are:
- Request Timeout (generally used during the open phase)
- Read Timeout (which might be what is happening to you), sometimes also known as “Connection Timeout” (be aware that the term “Connection Timeout” is ambiguous and some server software uses it for both)

What is possibly happening to you is that you are holding a connection for longer than the maximum time allowed on that specific server. It is possible that the server is sending you a status that AcornHTTP is not using, or that the server is sending either an RST or just a FIN, which may be interpreted as the connection having completed. Do you have more details on the packets you are receiving from that server? You may be able to monitor the traffic between the two using either Wireshark (if you have a PC that can sample that connection) or WireSalmon on RISC OS itself. By capturing the traffic you’ll certainly see which of the two ends is sending the FIN or the RST packet. Hope this helps.
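To make the FIN/RST distinction concrete: on a BSD-style sockets API, such as the one the RISC OS Internet stack provides, an orderly close (FIN) from the peer normally shows up as recv() returning 0, whereas an RST usually makes recv() fail with errno set to ECONNRESET. The sketch below is illustrative only; the enum and function names are hypothetical and not part of AcornSSL or AcornHTTP. Note that a FIN looks the same whether the server finished sending or gave up part way, which is exactly the ambiguity raised at the start of the thread.

```c
#include <assert.h>
#include <errno.h>
#include <sys/types.h>

/* Hypothetical outcome of one recv() call on a TCP socket. */
typedef enum {
    RECV_DATA,        /* n > 0: payload bytes arrived */
    RECV_PEER_FIN,    /* n == 0: orderly shutdown, peer sent FIN */
    RECV_WOULD_BLOCK, /* non-blocking socket with nothing to read yet */
    RECV_PEER_RESET,  /* peer aborted the connection with RST */
    RECV_TIMED_OUT,   /* local stack gave up (retransmission/keepalive) */
    RECV_OTHER_ERROR
} recv_outcome;

/* n is recv()'s return value; err is errno captured straight after the call. */
recv_outcome classify_recv(ssize_t n, int err)
{
    assert(n >= -1);                   /* recv() returns -1 on error */
    if (n > 0)
        return RECV_DATA;
    if (n == 0)
        return RECV_PEER_FIN;          /* says nothing about HTTP completeness */
    if (err == EAGAIN || err == EWOULDBLOCK)
        return RECV_WOULD_BLOCK;
    if (err == ECONNRESET)
        return RECV_PEER_RESET;
    if (err == ETIMEDOUT)
        return RECV_TIMED_OUT;
    return RECV_OTHER_ERROR;
}
```

Only the RST and timeout outcomes are unambiguous failures at this level; a clean FIN has to be judged by the layers above.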

Feb 6, 2022 8:35am Matthew Phillips (473) 721 posts	@Paolo Thanks, WireSalmon is a good idea. As you’ll appreciate, I have got as far as finding what is happening inside AcornHTTP, a module I am now reasonably familiar with. What I don’t know is whether AcornSSL could be any better informed as to the reason for the traffic ceasing, and whether it could give better information to AcornHTTP. As I am not keen to delve into AcornSSL just yet, I’ll try the WireSalmon angle.

@Dave I agree, fetching all the data before processing it would help, and I may well need to do that anyway. It was just that, having encountered the problem and found that my application could not tell whether the transmission was complete, I thought it was worth exploring this a bit to see if the modules could be improved.

The reason I am taking so long over the processing is that the API returns hundreds of objects which I want to plot on a map in RiscOSM. If I process them as they come in, to keep my memory requirements down, that involves my application sending a series of GeoData Wimp messages to RiscOSM. RiscOSM acts on these messages immediately, but as the number of objects reaches the hundreds, its redraw takes a long time, so my application does not get polled very much and hits the timeout issue. It would clearly be a good idea for me to accumulate the messages and only send them to RiscOSM after I have finished the complete fetch. Hilary is also working on improving the redraw in RiscOSM so that it is more efficient at handling the incoming messages.

Feb 6, 2022 6:30pm Matthew Phillips (473) 721 posts	Where can you find WireSalmon these days? It was written by Alex Waugh, but I cannot find his site anymore.

Feb 6, 2022 6:52pm Andrew Conroy (370) 740 posts	“Where can you find WireSalmon these days? It was written by Alex Waugh, but I cannot find his site anymore.”

Have you tried here?

Feb 6, 2022 11:20pm Matthew Phillips (473) 721 posts	Thank you. I could not find it via Google.

Feb 16, 2022 10:44am Matthew Phillips (473) 721 posts	I’ve now had time to play with WireSalmon (on RISC OS) and look at the captured packets with Wireshark (on Linux). I’ve never looked at the packet level before, so I don’t understand much about it.

Quick reminder of background. I was finding that a remote server was cutting the connection before the complete HTTPS response had come through, because my application was being very slow about processing the incoming data. I have solved those issues, but I wanted to see whether it might be possible to enhance the URL_Fetcher / AcornHTTP / AcornSSL modules so that the premature termination of the connection could be detected and a signal passed to the client application so that it knew that a failure had occurred, because at present the URL_ReadData calls look no different from a successful fetch.

My first test was to see what happens if I make an HTTPS request via the URL_Fetcher module and it succeeds. I fetched a fairly small amount of data to keep the number of packets down. I can see from my application logs that the HTTP response, including header, was 3355 bytes in total, but that will be after AcornHTTP has doctored it.

The screenshot below covers almost all the conversation. There is some SYN ACK stuff just off the top of the screen. I imagine that frame 18 is where my HTTP request gets transmitted to the server. The next four frames carry enough bytes to be the full response, but I do not know why 22 is described as protocol TLSv1.2 and the others only as TCP. Frame 22 has a “Secure Sockets Layer” section in Wireshark which says “Length: 3398”, which is consistent with what I get back at the application. I’m puzzled as to why that is the last packet, and as there isn’t compression involved, the actual response must be divided across all four packets. I don’t know the significance of the “Encrypted Alert” in frame 23. Then it looks like the socket gets closed with the “FIN, ACK” from each end.
[Screenshot: Wireshark capture of the successful fetch]

Next I fetched a much larger quantity of data and made sure that my application deliberately waited a long time between calls to URL_ReadData, in order to encourage the remote server to terminate the connection before the transmission was complete. The following screenshot shows the tail end of the conversation. You will see a lot of black frames, which I think must be where the process is waiting until the RISC OS TCP/IP stack has more room available to accept more data. Frames 921-923 look like normal transmissions, similar to frames 20 and 21 above. But then we do not get anything like frames 22 and 23, and instead get a frame (924) with flags “FIN, PSH, ACK”. After this the conversation seems to wrap up in a similar way to the successful one above.

[Screenshot: tail end of the capture for the failed fetch]

My conclusion is that it may be possible for the operating system to tell that the response from the web server is incomplete, but I have no idea how any of this might get surfaced at the AcornSSL stage, or how the information could be passed up to AcornHTTP, the URL_Fetcher and the application.
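On the question of the PSH flag: TCP flags are single bits in the segment header (RFC 793). PSH asks the receiver to pass buffered data to the application promptly and says nothing about whether a transfer is complete, so “FIN, PSH, ACK” on frame 924 most likely just means that the server’s last data and its FIN travelled in the same segment. The small decoder below, with hypothetical names, shows how the flag bits map onto the labels seen in the captures above.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* TCP header flag bits (RFC 793), low byte of the flags field. */
enum { TCP_FIN = 0x01, TCP_SYN = 0x02, TCP_RST = 0x04,
       TCP_PSH = 0x08, TCP_ACK = 0x10, TCP_URG = 0x20 };

/* Write a comma-separated list such as "FIN, PSH, ACK" into buf. */
void tcp_flags_to_string(unsigned flags, char *buf, size_t size)
{
    static const struct { unsigned bit; const char *name; } names[] = {
        { TCP_FIN, "FIN" }, { TCP_SYN, "SYN" }, { TCP_RST, "RST" },
        { TCP_PSH, "PSH" }, { TCP_ACK, "ACK" }, { TCP_URG, "URG" },
    };
    assert(buf != NULL && size > 0);
    buf[0] = '\0';
    for (size_t i = 0; i < sizeof names / sizeof names[0]; i++) {
        if (!(flags & names[i].bit))
            continue;
        if (buf[0] != '\0')                       /* separator after the first name */
            strncat(buf, ", ", size - strlen(buf) - 1);
        strncat(buf, names[i].name, size - strlen(buf) - 1);
    }
}
```

For example, the byte 0x19 decodes to “FIN, PSH, ACK” and 0x11 to “FIN, ACK”; neither contains RST, so both are orderly closes as far as TCP is concerned.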

Feb 16, 2022 11:00am Jeffrey Lee (213) 6048 posts	“The next four frames carry enough bytes to be the full response, but I do not know why 22 is described as protocol TLSv1.2 and the others only as TCP.”

It’s probably just a quirk of Wireshark’s protocol analysers. Unless you explicitly tell it “this is TLS”, it’s probably erring on the side of caution and defaulting to “TCP” instead of assuming that everything after the TLS negotiation phase is valid TLS traffic. If you’re new to Wireshark, one very useful thing you can do is right-click one of the entries and select “follow TCP stream”. That’ll set a filter so that you only see the packets from that stream.

Feb 16, 2022 3:45pm Paolo Fabio Zaino (28) 1882 posts	@Matthew

“The next four frames carry enough bytes to be the full response, but I do not know why 22 is described as protocol TLSv1.2 and the others only as TCP.”

The first packets (where an encrypted protocol is agreed between the two parties) are not encrypted, so Wireshark can read the full payload and tell you everything about it. The packets that come after encryption starts cannot be decrypted by Wireshark (unless you know the secret key of the specific asymmetric encryption protocol and cipher suite used), so they appear to Wireshark as generic TCP packets. Remember, during an encryption handshake only public keys can be exchanged; those are used to encrypt traffic, but cannot be used to decrypt it. So the reason you see TCP in the subsequent packets is that this is as far as Wireshark can understand those encrypted ones. To isolate the specific packet stream use a filter, for instance: ip.addr == 3.9.2.66 && ip.addr == 192.168.1.17 && tcp

“My conclusion is that it may be possible for the operating system to tell that the response from the web server is incomplete, but I have no idea how any of this might get surfaced at the AcornSSL stage, or how the information could be passed up to AcornHTTP, the URL_Fetcher and the application.”

Your conclusion is correct, in the sense that the FIN ACK is part of ISO/OSI L4 and is therefore handled by the OS TCP stack. In a module or code that is aware of the nature of the response, you can add checks by parsing the response content: for instance, if it’s HTTP, does it have a tag at the end? Anyway, the good news is you now have a way to know who is sending the first FIN. If it’s the web server, you know you need to speed up your stream processing, or cache it and then post-process it.
On performance, this is what I am trying to do for my FetchAURL module:
A) An app requests it to pull down a page, and it does everything in privileged mode and then returns the data to the app. For small responses this is faster than trying to pull them down using the NULL event. BTW, it also seems to improve RISC OS bandwidth for things other than HTTP.
B) An app can make a request and then FetchAURL will set a POLLWORD (like SocketWatch does), so the app gets the data without using the NULL event. This helps a bit with performance when we have large responses.

Maybe this could help you to improve the performance of your own work, good luck! :)

@Jeffrey

“If you’re new to Wireshark, one very useful thing you can do is right-click one of the entries and select ‘follow TCP stream’. That’ll set a filter so that you only see the packets from that stream.”

Didn’t that display the content of the entire TCP stream and, in this case, it would display gibberish given the encryption?

Feb 16, 2022 3:51pm Jeffrey Lee (213) 6048 posts	“Didn’t that display the content of the entire TCP stream and, in this case, it would display gibberish given the encryption?”

Yes, it will pop open a window showing a dump of the stream, which will be useless for TLS. But it also sets up a filter so that the main window only shows the packets from that stream.

Feb 16, 2022 4:09pm Matthew Phillips (473) 721 posts	“Your conclusion is correct, in the sense that the FIN ACK is part of ISO/OSI L4 and is therefore handled by the OS TCP stack. In a module or code that is aware of the nature of the response, you can add checks by parsing the response content: for instance, if it’s HTTP, does it have a tag at the end?”

I was hoping for a method that would not require the AcornHTTP module to understand the response. The response might be binary data, and might not have had the Content-Length declared in advance.

“Anyway, the good news is you now have a way to know who is sending the first FIN. If it’s the web server, you know you need to speed up your stream processing, or cache it and then post-process it.”

The web server sent the first FIN in both cases. One difference I observed is that in the failure case, the FIN, ACK from the web server also has PSH, and I don’t know whether that is significant. The other difference is that the FIN is not preceded by the two frames understood by Wireshark, numbers 22 and 23 in the above example. If the TLS message ought to be concluded by something recognisable like this, then AcornSSL should be able to spot the difference between a premature finish and a complete transmission, and pass the information up to AcornHTTP, URL_Fetcher, and ultimately the client. I don’t know enough about SSL to know what that would look like in the AcornSSL module code, however. For all I know, the secure socket SWIs used by AcornHTTP may already provide a route to this information.

I’m not looking for a way to speed up my stream processing: I’ve largely got that sorted out now. My object in investigating further was to find out whether the operating system could be enhanced so that URL_Fetcher clients get told if a response is incomplete. It struck me as unsatisfactory that my application did not get any warning, through status codes or otherwise, that the transmission had not finished properly.
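For what it is worth, TLS does define the kind of end marker asked about here. In TLS 1.2 (RFC 5246, section 7.2.1) each side is required to send a close_notify alert before closing the write side of the connection, so the “Encrypted Alert” in frame 23 is probably that alert (an assumption: Wireshark cannot decrypt it to confirm). A TLS layer could therefore report a TCP FIN that arrives without a preceding close_notify as a possible truncation. The sketch below uses hypothetical names and is not AcornSSL code; also bear in mind that some servers omit close_notify in practice, so its absence is a warning sign rather than proof.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* TLS alert description value for close_notify (RFC 5246, section 7.2). */
#define TLS_ALERT_CLOSE_NOTIFY 0u

/* Hypothetical per-connection flags a TLS wrapper could maintain. */
typedef struct {
    bool close_notify_seen; /* peer's close_notify alert was decrypted */
    bool tcp_eof_seen;      /* underlying recv() returned 0 (peer FIN) */
} tls_close_state;

typedef enum {
    TLS_STILL_OPEN,
    TLS_CLOSED_CLEANLY,
    TLS_POSSIBLY_TRUNCATED
} tls_close_status;

/* Call with the description byte of each decrypted alert record. */
void tls_note_alert(tls_close_state *s, unsigned description)
{
    assert(s != NULL);
    if (description == TLS_ALERT_CLOSE_NOTIFY)
        s->close_notify_seen = true;
}

/* Call when the TCP layer reports end of stream. */
void tls_note_tcp_eof(tls_close_state *s)
{
    assert(s != NULL);
    s->tcp_eof_seen = true;
}

tls_close_status tls_close_status_of(const tls_close_state *s)
{
    assert(s != NULL);
    if (s->close_notify_seen)
        return TLS_CLOSED_CLEANLY;     /* peer said it had finished sending */
    if (s->tcp_eof_seen)
        return TLS_POSSIBLY_TRUNCATED; /* FIN with no close_notify, like the failed capture */
    return TLS_STILL_OPEN;
}
```

A status like TLS_POSSIBLY_TRUNCATED is the sort of clean-versus-dirty closure information that could be passed up to the HTTP layer, which can then combine it with Content-Length or chunking where available.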
(As it happens, as I am receiving XML, it’s pretty obvious when it fails to parse, but there are formats which do not give that level of certainty.)

Feb 16, 2022 4:58pm Jeffrey Lee (213) 6048 posts	AcornSSL has no knowledge of the HTTP protocol: if a server randomly decides to close the connection halfway through a response then it won’t be able to help you, other than forwarding on the reason why the connection was closed (e.g. whether it was a clean closure due to FIN or a dirty closure due to a network timeout or other error).

There’s a reason why HTTP servers should include Content-Length headers for responses (or use chunked responses): both of those methods allow the receiver to determine if the response has been fully received. If you’re dealing with a server which just does a basic transfer with no Content-Length, and the connection closes cleanly, then there’s no way for a generic HTTP client like AcornHTTP to know whether all the data was received or not.

See RFC 7230, section 3.4: https://datatracker.ietf.org/doc/html/rfc7230#section-3.4

“A message body that uses the chunked transfer coding is incomplete if the zero-sized chunk that terminates the encoding has not been received. A message that uses a valid Content-Length is incomplete if the size of the message body received (in octets) is less than the value given by Content-Length. A response that has neither chunked transfer coding nor Content-Length is terminated by closure of the connection and, thus, is considered complete regardless of the number of message body octets received, provided that the header section was received intact.”
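The three rules quoted from RFC 7230 section 3.4 translate fairly directly into a check a client could run when the connection closes. A minimal sketch, assuming the HTTP layer records what it has seen in a struct like the hypothetical response_state below (not AcornHTTP’s real data structures). Following RFC 7230 section 3.3.3, chunked transfer coding takes precedence over any Content-Length.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef enum { BODY_COMPLETE, BODY_INCOMPLETE } body_status;

/* Hypothetical record of what the client saw before the connection closed. */
typedef struct {
    bool     headers_complete;    /* header section received intact */
    bool     chunked;             /* Transfer-Encoding: chunked */
    bool     last_chunk_seen;     /* zero-sized terminating chunk received */
    bool     has_content_length;  /* a valid Content-Length header was present */
    uint64_t content_length;      /* its value, if present */
    uint64_t body_bytes_received; /* message body octets, before any Content-Encoding is undone */
} response_state;

/* Apply the completeness rules of RFC 7230 section 3.4 at connection close. */
body_status classify_on_close(const response_state *r)
{
    assert(r != NULL);
    if (!r->headers_complete)
        return BODY_INCOMPLETE;               /* "header section received intact" proviso */
    if (r->chunked)                           /* Transfer-Encoding overrides Content-Length */
        return r->last_chunk_seen ? BODY_COMPLETE : BODY_INCOMPLETE;
    if (r->has_content_length)
        return r->body_bytes_received < r->content_length ? BODY_INCOMPLETE
                                                          : BODY_COMPLETE;
    return BODY_COMPLETE;                     /* close-delimited: complete by definition */
}
```

The final branch is the case Jeffrey describes: with neither chunking nor Content-Length, a clean close has to be treated as a complete response.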

Feb 16, 2022 9:55pm Matthew Phillips (473) 721 posts	That’s pretty conclusive: thank you for the reference. It does mean that AcornHTTP could use the Content-Length or chunking to spot incomplete fetches. I think in my own example there was a Content-Length, and the module could perhaps do more to check the length of what comes in.
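Matthew’s suggestion, that AcornHTTP could check the length of what comes in against the Content-Length, might look something like the following running count. This is a sketch with hypothetical names, not code from AcornHTTP. The parser accepts only the 1*DIGIT form allowed by RFC 7230 section 3.3.2 (plus surrounding spaces or tabs) and treats anything else as “length unknown”. The remaining count plays the same role as the “bytes still to read” figure Matthew saw from URL_ReadData; a close with a non-zero remainder could then be reported to the client as a failed fetch rather than a normal completion.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-fetch tracker; not part of AcornHTTP. */
typedef struct {
    bool     known;    /* a valid Content-Length was seen */
    uint64_t expected; /* declared body length */
    uint64_t received; /* body octets delivered so far */
} length_tracker;

/* Parse a Content-Length value. Returns false (length unknown) for
   empty, non-numeric, trailing-junk or overflowing values. */
bool tracker_init(length_tracker *t, const char *value)
{
    assert(t != NULL);
    t->known = false;
    t->expected = 0;
    t->received = 0;
    if (value == NULL)
        return false;
    while (*value == ' ' || *value == '\t')
        value++;                                    /* optional leading whitespace */
    if (!isdigit((unsigned char)*value))
        return false;
    uint64_t n = 0;
    for (; isdigit((unsigned char)*value); value++) {
        unsigned d = (unsigned)(*value - '0');
        if (n > (UINT64_MAX - d) / 10)
            return false;                           /* would overflow */
        n = n * 10 + d;
    }
    while (*value == ' ' || *value == '\t')
        value++;                                    /* optional trailing whitespace */
    if (*value != '\0')
        return false;                               /* junk after the digits */
    t->known = true;
    t->expected = n;
    return true;
}

/* Record n more body octets handed to the client. */
void tracker_feed(length_tracker *t, size_t n)
{
    t->received += n;
}

/* Octets still expected; 0 if the length is unknown or already reached. */
uint64_t tracker_remaining(const length_tracker *t)
{
    return (t->known && t->received < t->expected) ? t->expected - t->received : 0;
}

/* At connection close: true if the fetch should be reported as incomplete. */
bool tracker_truncated_on_close(const length_tracker *t)
{
    return t->known && t->received < t->expected;
}
```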
