RISC OS Open: Forum: Possible URL fetcher/documentation bug

Nov 2, 2019 11:24am

Rick Murray (539) 13850 posts

With reference to the URL fetcher specification:
https://gitlab.riscosopen.org/RiscOS/Sources/Networking/Fetchers/URL/raw/master/Docs/APISpec

If I call URL_GetURL according to the specification, it fails and returns the headers:

R0 = 0 (no length in R5, no user-agent line)
R1 = session ID
R2 = 1 (HTTP GET)
R3 = URI
R4 = ""
R5 = 2
R6 = ""

The problem comes down to the fact that the documentation states, for R5:

If R0:1 is set, length of data in R4 data block.
If R0:1 is clear, must be 2 (backwards compatible ‘Method dependent flags’ which are otherwise in R2).

This is, put simply, wrong.
Doing that returns the headers, because ‘2’ in R5 (taken as the backwards compatible method flags) is the magic value that tells the fetcher to return the headers (otherwise 2<<8 in R2).

The correct approach is to either specify a length in R5 and set it to zero, or just ignore the API and set R5 to 0 instead of 2 if not using a length.
In both cases, the body content will be correctly returned, and not the headers.
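
To make the difference concrete, here is a rough Python sketch that only builds the register values for the three variants described above; it does not call the SWI. The helper name `geturl_regs` and the constant names are made up for illustration, and the meaning of R0 bit 0 (user-agent in R6) is inferred from the register listing, not stated outright.

```python
# Hypothetical names; only the numeric values come from the post above.
R0_USER_AGENT_IN_R6 = 1 << 0   # assumption: inferred from "no user-agent line"
R0_LENGTH_IN_R5 = 1 << 1       # "R0:1" in the quoted documentation
METHOD_HTTP_GET = 1            # R2 = 1 (HTTP GET)


def geturl_regs(session, uri, variant):
    """Return R0-R6 for URL_GetURL as a dict, for one of three variants.

    "documented":  R0:1 clear, R5 = 2 (observed to return only the headers)
    "zero_length": R0:1 set,   R5 = 0 (an explicit length of zero)
    "ignore_spec": R0:1 clear, R5 = 0
    """
    regs = {"R0": 0, "R1": session, "R2": METHOD_HTTP_GET,
            "R3": uri, "R4": "", "R5": 0, "R6": ""}
    if variant == "documented":
        regs["R5"] = 2
    elif variant == "zero_length":
        regs["R0"] |= R0_LENGTH_IN_R5
    elif variant != "ignore_spec":
        raise ValueError(f"unknown variant: {variant}")
    return regs
```

Only R0 and R5 differ between the three; everything else is as in the listing at the top of the post.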

Nov 2, 2019 11:49am

Rick Murray (539) 13850 posts

While I’m here writing this, other stuff in URL that is annoying:

Correctly returns a status of 301 (etc) if there is a redirect; but does not return any body content indicating where the redirection is to.
It is necessary to switch to fetching the headers (using a magic value not mentioned in the API document) and parse them for yourself.
URL ought to either follow redirections or return the location in the payload. Ideally, it should be capable of both with a flag to specify which behaviour is preferred.
All method-dependent flags ought to be documented – in the one place.
Why does ReadData not return the HTTP status code? It seems dumb to have to call the Status SWI to get most of the same information as was already provided by ReadData, only with this one extra thing.
The API ought to specify which HTTP headers the URL module automatically applies, so you know what to supply/not supply.
Does AcornHTTP (via URL) handle cookies? Can the client access/block them?
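
For the redirect case, the do-it-yourself parsing amounts to something like the sketch below. It is Python purely to show the idea; `find_location` is a made-up helper, and it assumes the header block arrives as raw bytes with the status line first and a blank line at the end. Folded (continuation) header lines are not handled.

```python
def find_location(raw_headers):
    """Return the Location header value from a raw HTTP header block, or None."""
    text = raw_headers.decode("iso-8859-1")
    for line in text.splitlines()[1:]:       # skip the status line
        if not line:
            break                            # a blank line ends the headers
        name, sep, value = line.partition(":")
        if sep and name.strip().lower() == "location":
            return value.strip()
    return None
```

The client would then start a new fetch on the returned URI when the status is 301 (etc).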

Nov 2, 2019 2:00pm

nemo (145) 2554 posts

This is, put simply, wrong.
Doing that will return the headers as ‘2’ in R5 is indeed the magic value to tell the fetcher to return the headers

2=Head+Body surely? So it’s not “wrong”, it’s just failing to mention that other television listings magazines are available.

Nov 2, 2019 2:25pm

Rick Murray (539) 13850 posts

2=Head+Body surely?

No – it’s only returning the headers.

Probably wants 3 for the head and shoulders… Of course, if this was documented…

So it’s not “wrong”

Documentation states ‘X’.
‘X’ cocks it up.
Therefore ‘X’ is wrong.

Nov 2, 2019 2:52pm

nemo (145) 2554 posts

So you’re saying that the HTTP documentation is wrong, not the URL documentation.

Nov 2, 2019 4:41pm

Rick Murray (539) 13850 posts

Oh, I see, it’s documented elsewhere under something else.

Maybe. I don’t know. I am using URL, not directly talking to AcornHTTP. I’m guessing that the value is passed through, but…

Note: in the link, change “blob” for “raw” to have a hope of seeing it on NetSurf.

Nov 2, 2019 7:38pm

Chris Mahoney (1684) 2165 posts

URL is a “generic” interface to other protocol modules, although these days AcornHTTP is probably the only one in common use.

Until last year, AcornHTTP would always return both the headers and the body regardless of the value of R2 (note that I’m talking about AcornHTTP’s interface, not URL’s). This was despite the docs saying that 0 returns the body, 1 returns the headers, and 2 returns both.

AcornHTTP 0.98 fixed that, and R2 now does what the docs say. But I can’t say whether URL is doing the right thing! I had a suspicion that the “must be 2” comment in the URL docs was to work around the bug in AcornHTTP. Now that that’s fixed, the URL docs might need an update…

It doesn’t help that there are two different versions floating around (GitLab, Wiki).

Nov 2, 2019 8:29pm

Rick Murray (539) 13850 posts

although these days AcornHTTP is probably the only one in common use.

<cough> One of the primary benefits of the URL module is that changing “http://” to “https://” just works.

It’s not hard to write a simple HTTP fetcher.
But with URL, handling SSL is just as easy.

I had a suspicion that the “must be 2” comment in the URL docs was to work around the bug in AcornHTTP.

Hmmm… So stick a ‘fix’ into the docs instead of fixing the bug in AcornHTTP? Way to go, Acorn…

Nov 3, 2019 12:31am

Chris Mahoney (1684) 2165 posts

Well technically it’s AcornHTTP (not URL) that’s calling AcornSSL, so I stand by my claim that AcornHTTP is the only commonly-used URL fetcher :)

Nov 3, 2019 12:34am

nemo (145) 2554 posts

stick a ‘fix’ into the docs instead of fixing the bug

Of course the documentation must describe what actually happens on the end-user’s machine, and not what we would like to happen. Hence my comments about PreFilters the other day, and Service_International,7 some time ago.

Nov 3, 2019 4:55am

Rick Murray (539) 13850 posts

Well technically it’s AcornHTTP (not URL) that’s calling AcornSSL,

Are you sure about that? I thought that fetchers registered themselves with the URL module according to what protocol they supported. Re. *URLProtoShow

Nov 3, 2019 4:56am

Rick Murray (539) 13850 posts

Of course the documentation must describe what actually happens on the end-user’s machine, and not what we would like to happen.

Seems to be something of a running theme, doesn’t it?

Nov 3, 2019 5:35am

Chris Mahoney (1684) 2165 posts

Are you sure about that?

No :)
