URL_GetURL not receiving the body
Jon Abbott (1421) 2651 posts |
Why does the repro below not download the HTML body from the NetSurf download site? Do I need to pass a specific User-Agent string for that server to respond? The same code works if I switch the URL to this site, so the problem seems to be specific to the NetSurf site. Enabling body+header does return the header, but again no body.

```
data_limit%=65536
DIM url% 512, data% data_limit%
?url%=0
get_header%=FALSE
SYS "URL_Register",0 TO ,S%
ON ERROR SYS "XURL_Deregister",0,S%:PRINT REPORT$;" at line ";ERL:END
IF get_header% THEN Z%=2 ELSE Z%=0
SYS "URL_GetURL", 1<<30, S%, 1, "http://www.netsurf-browser.org/downloads/riscos/", url%, Z%, 0
offset%=0
REPEAT
  SYS "URL_ReadData", 0, S%, data%+offset%, data_limit%-offset% TO A%,,,,C%,L%
  IF C%>0 THEN offset%+=C%
UNTIL (A% AND %1100000)>0 OR offset%=data_limit%
SYS "URL_Deregister",0,S%
IF (A% AND %1000000) THEN PRINT "Transfer aborted"
PRINT offset%;" bytes received"
IF offset%>0 THEN OSCLI "MEMORY B "+STR$~data%+"+"+STR$~offset%
```

EDIT: My test setup: RPCEmu 0.9.4 running RISC OS 5.31 (25-March-23)
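The repro's REPEAT…UNTIL loop stops when either of two bits in the status word returned in R0 by URL_ReadData is set: %1100000 is bits 5 and 6, and the program reports bit 6 (%1000000) as "Transfer aborted". As I understand the AcornURL documentation, bit 5 means "all data received". As a language-neutral illustration, here is a minimal Python sketch of the same termination logic; `drain` and its `(status, chunk)` input are hypothetical stand-ins for successive URL_ReadData results, not a real binding to the SWI.

```python
# Bits of the status word tested by the BASIC loop.
ALL_DATA_RECEIVED = 1 << 5   # %0100000: loop stops, transfer finished
TRANSFER_ABORTED = 1 << 6    # %1000000: reported as "Transfer aborted"
DONE_MASK = ALL_DATA_RECEIVED | TRANSFER_ABORTED  # %1100000


def drain(reads, data_limit):
    """Mimic the REPEAT...UNTIL loop.

    `reads` is an iterable of (status_word, chunk) pairs standing in for
    successive URL_ReadData results (hypothetical data, not real SWI calls).
    Returns (bytes_received, aborted).
    """
    buf = bytearray()
    status = 0
    for status, chunk in reads:
        room = data_limit - len(buf)
        buf += chunk[:room]  # like passing data_limit%-offset% as the buffer size
        if status & DONE_MASK or len(buf) == data_limit:
            break
    return bytes(buf), bool(status & TRANSFER_ABORTED)
```

Feeding it a completed transfer that carries no data returns an empty buffer without the abort flag, which is the pattern that distinguishes "server sent nothing" from "transfer aborted".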
Kevin (224) 322 posts |
wget seems to work without a User-Agent set.
Replace xx with tt
Stuart Swales (8827) 1357 posts |
wget DOES supply a default User-Agent. Try it with -d, and then again with --user-agent ""
Jon Abbott (1421) 2651 posts |
I’ve since tried passing a known User-Agent and still didn’t get the body, so I can probably rule that out. Neither HTTP nor HTTPS receives the body from that URL. The fact that the site works via wget and RISC OS browsers suggests there might be something wrong with URL_GetURL or the protocol handlers.
Sprow (202) 1158 posts |
A possible difference is that the NetSurf site appears to respond with HTTP 200 OK and then use gzip-encoded, chunked transfers. There was a bug with chunked transfers fixed in AcornHTTP 1.06, but you don’t mention which version you’ve tried, so my observation could be irrelevant.
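If the body arrives both chunked and gzip-compressed, a client has to strip the chunked transfer coding first and then gunzip the result. Here is a minimal Python sketch of that decode order. It assumes the whole raw body is already in memory. `dechunk` and `decode_body` are hypothetical helpers, not anything in AcornHTTP, and they ignore chunk extensions and trailer fields.

```python
import gzip


def dechunk(body: bytes) -> bytes:
    """Remove HTTP/1.1 chunked transfer coding from a complete raw body.

    Each chunk is: hex size [;extensions] CRLF, data, CRLF.
    A zero-size chunk ends the body; any trailer fields after it are ignored.
    """
    out = bytearray()
    pos = 0
    while True:
        eol = body.index(b"\r\n", pos)                # end of the chunk-size line
        size_field = body[pos:eol].split(b";", 1)[0]  # drop chunk extensions
        size = int(size_field.strip(), 16)
        pos = eol + 2
        if size == 0:
            return bytes(out)                          # last-chunk reached
        out += body[pos:pos + size]
        pos += size + 2                                # skip data and its CRLF


def decode_body(body: bytes, gzipped: bool) -> bytes:
    """Undo the chunked transfer coding first, then the gzip coding."""
    data = dechunk(body)
    return gzip.decompress(data) if gzipped else data
```

The order matters: gunzipping before dechunking fails, because the chunk-size lines are not part of the gzip stream.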
Jon Abbott (1421) 2651 posts |
I’ve amended the OP with my test setup details. I haven’t checked in the past few weeks, but I think I’m running the latest available module versions. You could be on to something with gzip’d transfers, though. I guess the test would be to see whether another site that uses gzip exhibits the same issue, although I’m not sure how to find such a site.
Stuart Swales (8827) 1357 posts |
https://ougs.org is one that returns gzip if asked for.
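"Returns gzip if asked for" means the server only compresses when the request carries an `Accept-Encoding: gzip` header. Whether it did so shows up in the `Content-Encoding` response header, and chunking shows up in `Transfer-Encoding`. Below is a minimal sketch that parses a raw response head, for example one captured by running the OP's code with get_header% set to TRUE. The function names are hypothetical. The sketch assumes CRLF line endings and a well-formed head.

```python
def parse_response_head(raw: bytes):
    """Split a raw HTTP/1.x response head into (status_code, headers).

    Header names are lower-cased; repeated fields are joined with ", ".
    Anything after the blank line (the body) is ignored.
    """
    head = raw.split(b"\r\n\r\n", 1)[0].decode("iso-8859-1")
    status_line, *field_lines = head.split("\r\n")
    _version, code, *_reason = status_line.split(" ", 2)
    headers = {}
    for line in field_lines:
        name, _, value = line.partition(":")
        key, value = name.strip().lower(), value.strip()
        headers[key] = f"{headers[key]}, {value}" if key in headers else value
    return int(code), headers


def codings(headers):
    """Return (content_codings, transfer_codings) as lower-case token lists."""
    def tokens(field):
        return [t.strip().lower() for t in headers.get(field, "").split(",") if t.strip()]
    return tokens("content-encoding"), tokens("transfer-encoding")
```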
Jon Abbott (1421) 2651 posts |
Looking at the response headers for the NetSurf URL in the Edge Developer Tools, the page comes back with status code 304 (Not Modified), not the usual 200 (OK). Could that be relevant? (EDIT: it’s not.) I also don’t see NetSurf returning gzip encoding, so that might be browser-dependent. Stuart, the URL you suggested does show gzip encoding in Edge, and the code in the OP retrieves its body okay.
Jon Abbott (1421) 2651 posts |
Having read RFC 9110, which defines 304 Not Modified as:
That would imply (although I’m not sure how to check this) that GetURL has indicated to the server that it already holds a cached copy of the page, so the server responded with 304 and no body. EDIT: Ignore all that, it’s responding with 200 when I check via URL_Status.
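RFC 9110 only allows a 304 in reply to a conditional GET or HEAD, that is, one carrying If-None-Match or If-Modified-Since. Edge sends those when it already holds a cached copy. That is one plausible reason its Developer Tools showed 304 while URL_Status reported 200 for a fresh fetch. Here is a minimal sketch of that rule. `may_receive_304` is a hypothetical helper that takes the request method plus a dict of request headers.

```python
# Preconditions that can yield 304 for GET/HEAD (RFC 9110 section 13.1).
# If-Match / If-Unmodified-Since failures give 412 instead, and If-Range
# falls back to a full 200, so they are not listed here.
VALIDATORS_FOR_304 = {"if-none-match", "if-modified-since"}


def may_receive_304(method: str, request_headers: dict) -> bool:
    """True if a server could legitimately answer with 304 Not Modified."""
    names = {name.lower() for name in request_headers}
    return method.upper() in ("GET", "HEAD") and bool(names & VALIDATORS_FOR_304)
```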