RISC OS Open: Forum: Problems fetching from WikiData API

Nov 27, 2021 10:32pm

We’ve been noticing for a while that when displaying links for objects on a RiscOSM map the lookup of potential links using WikiData has not been working properly. More often than not the lookup has been timing out.

I found some time to investigate the problem, and I have produced a simple BASIC programme that exhibits the issues

You can find it via the link. The programme uses the URL_Fetcher, AcornHTTP, and AcornSSL modules to perform a query via WikiData’s API. It’s the exact same query that RiscOSM uses to try to find useful links for the place under the mouse pointer.

There is a !!ReadMe file in the zip file which explains the problem in more detail. I would be grateful if a few other people could try the programme, let me know whether the fetches succeed or fail, and report what versions of AcornHTTP, AcornSSL and URL_Fetcher are on your machine.

(You don’t need RiscOSM to be able to run the test.)

Nov 28, 2021 1:05am

Chris Mahoney (1684) 2165 posts

0/6 succeeded here. Output is here.

URL             0.58 (12 May 2018)
Acorn HTTP      1.04 (22 Apr 2020)
AcornSSL        1.06 (03 Jun 2020) mbedTLS 2.16.8

Nov 28, 2021 2:16am

Bryan (8467) 468 posts

0/6 succeded for me as well

But if I change FNfetch to use Wget instead of URL_GetUrl and URL_ReadData , then it works every time.

Nov 28, 2021 2:36am

Bryan (8467) 468 posts

Alternatively, if you change line 890 to read

WHILE ?q%:VDU ?q%:q%+=1:ENDWHILE: q%-=1: ?q%=13

then it works 6/6 every time.

i.e. change the URL terminator to 13 instead of zero

Nov 28, 2021 8:37am

Matthew Phillips (473) 721 posts

It is very surprising that changing the URL terminator to 13 has any effect. Your code as state above doesn’t just change the terminator. It also strips the closing } from the end of the query. When the loop exits, q% is pointing to the byte with the zero in, so your code will replace the final D of the urlencoded %7D with a carriage return. Although the fetches then all succeed, what you are getting back is an error to say that the query cannot be parsed, along with a backtrace from the jetty engine.

I should have given the module version I had tested with:
Raspberry Pi 3:



URL             0.58 (12 May 2018)

Acorn HTTP      1.04 (22 Apr 2020)

AcornSSL        1.06 (03 Jun 2020) mbedTLS 2.16.8

ARMX6:



URL             0.58 (12 May 2018)

Acorn HTTP      1.04 (22 Apr 2020)

AcornSSL        1.04 (26 Jan 2018) mbedTLS 2.16.0

Iyonix



URL             0.58 (12 May 2018)

Acorn HTTP      1.04 (22 Apr 2020)

AcornSSL        1.05 (09 Sep 2019) mbedTLS 2.16.6

I didn’t realise I had so much variation in OS versions between the machines! Normally none of them fetch the content properly, but this morning the ARMX6 fetched 3/6.

Nov 28, 2021 8:39am

Matthew Phillips (473) 721 posts

If you change the actual terminator from 0 to 13, you get a Bad Request response back from the module.

I suppose it is interesting that sending a badly-formed request to the WikiData server (as in Bryan’s modification above) results in successful fetching of the error. I’m not sure what it tells us. The server can no doubt fail to parse the request quite quickly, and reply with the error, whereas a real request takes a little longer to process. I suppose it tells us that the HTTPS aspect is working well, and that the problem with fetching the body of the response probably lies with the AcornHTTP or URL_Fetcher modules.

Nov 28, 2021 10:20am

Bryan (8467) 468 posts

Sorry. It was late last night and I was still looking at the file returned by my test with Wget (which does work)

Nov 28, 2021 11:51am

Martin Avison (27) 1494 posts

Here none succeeded out of 6.

RISC OS 5.29 (11 Jul 2021)  Titanium
URL  0.58 (12 May 2018)
Acorn HTTP 1.05 (03 Jul 2021)
AcornSSL 1.06 (03 Jun 2020) mbedTLS 2.16.10

Nov 28, 2021 12:42pm

Doug Webb (190) 1180 posts

I didn’t realise I had so much variation in OS versions between the machines!

that the problem with fetching the body of the response probably lies with the AcornHTTP or URL_Fetcher modules.

Well if you look at this thread you will see that putting your faith in narrowing it down to a specific version of AcornHTTP/URL_Fetcher may not be easy as a changes to the build system don’t always result in a positive outcome or indeed an easy way to identify who is running botched modules etc..

I’m not going to go over old ground to state the obvious as much has been said before.

AcornSSL-specific errors

By the way ran the test andf success 1 out out of 6.

And for what it is worth.

ARMX6 – RISCOS5.29 (03-Nov-20)
URL – 0.58 (12 May 2018)
Acorn HTTP – 1.05 (03 Jul 2021)
AcornSSL- 1.06 (03 Jun 2020) mbedTLS 2.16.11

Nov 28, 2021 2:17pm

Matthew Phillips (473) 721 posts

There is an outstanding merge request for AcornSSL but I don’t think it will help with my problem, as it’s clear the connection is being made successfully.

Nov 28, 2021 2:39pm

Doug Webb (190) 1180 posts

Though the thread title is AcornSSL it highlights issues Rick Murray had with broken AcornHTTP versions with Manga to higlight how a module with the same version/date may not be the same and that the actual module size can show they are different.

Given the issues with Manga that use AcornHTTP and I think URL_Fetcher then I would look at those to see if they are broken and see what module sizes people have..

My AcornHTTP 1.05 is 42912 bytes and the URL_Fetcher 0.58 is 13720 bytes.

Nov 28, 2021 4:49pm

Dave Higton (1515) 3525 posts

There is an outstanding merge request for AcornSSL but I don’t think it will help with my problem

You are correct.

Nov 28, 2021 6:14pm

Matthew Phillips (473) 721 posts

I have built myself a debug build of the AcornHTTP module, version 1.05 using the current Disc build sources (ROOL downloads, not GitLab).

Using this build I get a big log file. From that I can see the response headers before they get altered by the module. They include:

content-type: application/sparql-results+xml;charset=utf-8
Content-Encoding: gzip
Accept-Ranges: bytes
Transfer-Encoding: chunked

It’s then clear that after the 1191 bytes of header it then receives around 1466 more bytes which look like gibberish, so are consistent with being gzipped.

The module then is clearly waiting for more, as this state is repeated multiple times until the timeout is reached and the BASIC programme closes the fetcher. I have updated the zip file to add AcornHTTPD (debug version of the module) and an excerpt of the HTTP_Trace file (with some of the repeated material cut out).

Please feel free to download and examine it if you understand this stuff!

I think my next move will be to see if there is a way of asking the server not to use gzip — I am wondering if this is the problem. Or I might try to reconstitute the binary from the hex dump and see if what I have been sent by the server appears to be complete.

I am not au fait with chunked transfer-encoding so I will need to do some reading-up of specifications to get anywhere with this.

(I’m not confident that my debug module is properly built, as I do not generally have a build environment set up for building RISC OS itself. I had to hack around a bit to get it to build at all.)

Nov 28, 2021 10:26pm

Dave Higton (1515) 3525 posts

You’d think it made more sense to use a POST request and shove it all in the body, wouldn’t you?

But I haven’t managed to find how to format a suitable POST request.

Vexingly, all the stuff I can find on line is how to generate a query, not how to transmit it – except as a potentially huge query on the tail of the URL.

Nov 28, 2021 10:48pm

Matthew Phillips (473) 721 posts

WikiData seems to be accepting the query all right — I don’t think there is any problem there.

Got bogged down with other things this evening. Hope to continue investigations tomorrow night.

Nov 28, 2021 11:07pm

Steve Fryatt (216) 2105 posts

But I haven’t managed to find how to format a suitable POST request.

Is this documentation what you’re after?

Nov 29, 2021 1:36am

Chris Mahoney (1684) 2165 posts

For what it’s worth, I tried putting one of the URLs into my HTTPLib (which calls URL/AcornHTTP itself) and it failed there too. This would indicate – as suspected – that it’s an issue with the modules and not with Matthew’s code.

Nov 29, 2021 8:42am

Matthew Phillips (473) 721 posts

I’ve located the problem. The debugging available in AcornHTTP is very helpful.

In chunked transfer encoding, the response is divided into chunks. Each has a header consisting of the length of the chunk in bytes expressed in hexadecimal, followed by CR LF, the binary chunk data (with the stated number of bytes) and then CR LF.

The first chunk returned by the server in my example started “00a” CR LF, i.e. ten bytes. It was followed, correctly, by ten bytes and CR LF. The second chunk was longer and the chunk header started “005aa” CR LF. (I forget the exact figure, 5aa is just an example.)

Putting the chunks back together manually from the debugging output obtained a gzipped stream which uncompressed to the expected SPARQL response. So far, so good.

The debug output from AcornHTTP tells a different story:

Found one! `00a ’
Chunk size declaration: `00a ’
ZERO chunk size – moving to reading footer state

Searching the source for “ZERO chunk” took me to line 539 of Sources.Networking.Fetchers.HTTP.c.header which says

if (*buffer == '0' || ses->chunk_bytes == 0)

For some reason the module seems to assume that the hexadecimal chunk sizes will not have leading zeros for a non-zero value. The code immediately preceding is:

if (isxdigit(*buffer)) {
    ses->chunk_bytes = (int) strtol(buffer, NULL, 16);
} else {
    ses->chunking = FALSE;
    return consumed;
}

So by the time we get to line 539 we know that *buffer is a hex digit, and the value of the number will be in ses->chunk_bytes. I cannot therefore see why a separate check for *buffer equalling ‘0’ is necessary.

I have removed it from my copy, recompiled, and the problem is then fixed.

I had probably better read the HTTP specification to find out whether there is any justification for this caution on the part of the AcornHTTP module, before submitting a patch.

Nov 29, 2021 8:54am

Matthew Phillips (473) 721 posts

Reviewing the HTTP 1.1 specification I cannot see any justification for bailing out when *buffer is ‘0’. There isn’t any comment in the code to explain why this check was included. As well-behaved servers are allowed to have leading zeros in a chunk size, the WikiData server is behaving entirely properly. It is odd that the WikiData server seems to include two leading zeros whether the length is expressed in 1 or 3 hex digits, but it’s allowed to do that.

I’ll see if I can remember how to submit a patch to GitLab.

Nov 29, 2021 2:31pm

Dave Higton (1515) 3525 posts

Well researched, Matthew. I’ve removed that first, faulty, condition and rebuilt AcornHTTP. The result promptly fetched 6/6.

Nov 29, 2021 3:26pm

Doug Webb (190) 1180 posts

Well researched, Matthew.

+1

If one of you are willing to share the rebuilt module with me then I will test it against Manga and see if sorts out an issue that I’m seeing with the latest version.

Nov 29, 2021 7:34pm

Chris Mahoney (1684) 2165 posts

Well done. There’s a particular ‘bad URL’ that I’ve had issues with in the past and it’ll be interesting to see whether the same fix applies there (although this is dependent on being able to find my notes!)

Nov 29, 2021 7:55pm

Steve Pampling (1551) 8170 posts

Question: is the chunk size value actually a three character tag? i.e 00a, or 5aa or even fff?

I’ll lave people to do the hex to decimal conversion to see where my thought is headed.

Nov 29, 2021 8:22pm

Matthew Phillips (473) 721 posts

No, the specification which I linked to above states that the chunk size is represented by one or more hex digits, or in the case of the last chunk, one or more zeros. There is no preferred length for hex numbers in the specification.

Nov 29, 2021 9:08pm

Matthew Phillips (473) 721 posts

I’ve submitted a merge request

Problems fetching from WikiData API

Reply

Search forums

Social

ROOL Store

Donate! Why?

RISC OS IPR

Description

Voices

Options

Nov 27, 2021 10:32pm Matthew Phillips (473) 721 posts	We’ve been noticing for a while that when displaying links for objects on a RiscOSM map the lookup of potential links using WikiData has not been working properly. More often than not the lookup has been timing out. I found some time to investigate the problem, and I have produced a simple BASIC programme that exhibits the issues You can find it via the link. The programme uses the URL_Fetcher, AcornHTTP, and AcornSSL modules to perform a query via WikiData’s API. It’s the exact same query that RiscOSM uses to try to find useful links for the place under the mouse pointer. There is a !!ReadMe file in the zip file which explains the problem in more detail. I would be grateful if a few other people could try the programme, let me know whether the fetches succeed or fail, and report what versions of AcornHTTP, AcornSSL and URL_Fetcher are on your machine. (You don’t need RiscOSM to be able to run the test.)

Nov 28, 2021 1:05am Chris Mahoney (1684) 2165 posts	0/6 succeeded here. Output is here. URL 0.58 (12 May 2018) Acorn HTTP 1.04 (22 Apr 2020) AcornSSL 1.06 (03 Jun 2020) mbedTLS 2.16.8

Nov 28, 2021 2:16am Bryan (8467) 468 posts	0/6 succeded for me as well But if I change FNfetch to use Wget instead of URL_GetUrl and URL_ReadData , then it works every time.

Nov 28, 2021 2:36am Bryan (8467) 468 posts	Alternatively, if you change line 890 to read WHILE ?q%:VDU ?q%:q%+=1:ENDWHILE: q%-=1: ?q%=13 then it works 6/6 every time. i.e. change the URL terminator to 13 instead of zero

Nov 28, 2021 8:37am Matthew Phillips (473) 721 posts	It is very surprising that changing the URL terminator to 13 has any effect. Your code as state above doesn’t just change the terminator. It also strips the closing } from the end of the query. When the loop exits, q% is pointing to the byte with the zero in, so your code will replace the final D of the urlencoded %7D with a carriage return. Although the fetches then all succeed, what you are getting back is an error to say that the query cannot be parsed, along with a backtrace from the jetty engine. I should have given the module version I had tested with: Raspberry Pi 3: URL 0.58 (12 May 2018) Acorn HTTP 1.04 (22 Apr 2020) AcornSSL 1.06 (03 Jun 2020) mbedTLS 2.16.8 ARMX6: URL 0.58 (12 May 2018) Acorn HTTP 1.04 (22 Apr 2020) AcornSSL 1.04 (26 Jan 2018) mbedTLS 2.16.0 Iyonix URL 0.58 (12 May 2018) Acorn HTTP 1.04 (22 Apr 2020) AcornSSL 1.05 (09 Sep 2019) mbedTLS 2.16.6 I didn’t realise I had so much variation in OS versions between the machines! Normally none of them fetch the content properly, but this morning the ARMX6 fetched 3/6.

Nov 28, 2021 8:39am Matthew Phillips (473) 721 posts	If you change the actual terminator from 0 to 13, you get a Bad Request response back from the module. I suppose it is interesting that sending a badly-formed request to the WikiData server (as in Bryan’s modification above) results in successful fetching of the error. I’m not sure what it tells us. The server can no doubt fail to parse the request quite quickly, and reply with the error, whereas a real request takes a little longer to process. I suppose it tells us that the HTTPS aspect is working well, and that the problem with fetching the body of the response probably lies with the AcornHTTP or URL_Fetcher modules.

Nov 28, 2021 10:20am Bryan (8467) 468 posts	Sorry. It was late last night and I was still looking at the file returned by my test with Wget (which does work)

Nov 28, 2021 11:51am Martin Avison (27) 1494 posts	Here none succeeded out of 6. `RISC OS 5.29 (11 Jul 2021) Titanium URL 0.58 (12 May 2018) Acorn HTTP 1.05 (03 Jul 2021) AcornSSL 1.06 (03 Jun 2020) mbedTLS 2.16.10`

Nov 28, 2021 12:42pm Doug Webb (190) 1180 posts	I didn’t realise I had so much variation in OS versions between the machines! that the problem with fetching the body of the response probably lies with the AcornHTTP or URL_Fetcher modules. Well if you look at this thread you will see that putting your faith in narrowing it down to a specific version of AcornHTTP/URL_Fetcher may not be easy as a changes to the build system don’t always result in a positive outcome or indeed an easy way to identify who is running botched modules etc.. I’m not going to go over old ground to state the obvious as much has been said before. AcornSSL-specific errors By the way ran the test andf success 1 out out of 6. And for what it is worth. ARMX6 – RISCOS5.29 (03-Nov-20) URL – 0.58 (12 May 2018) Acorn HTTP – 1.05 (03 Jul 2021) AcornSSL- 1.06 (03 Jun 2020) mbedTLS 2.16.11

Nov 28, 2021 2:17pm Matthew Phillips (473) 721 posts	There is an outstanding merge request for AcornSSL but I don’t think it will help with my problem, as it’s clear the connection is being made successfully.

Nov 28, 2021 2:39pm Doug Webb (190) 1180 posts	Though the thread title is AcornSSL it highlights issues Rick Murray had with broken AcornHTTP versions with Manga to higlight how a module with the same version/date may not be the same and that the actual module size can show they are different. Given the issues with Manga that use AcornHTTP and I think URL_Fetcher then I would look at those to see if they are broken and see what module sizes people have.. My AcornHTTP 1.05 is 42912 bytes and the URL_Fetcher 0.58 is 13720 bytes.

Nov 28, 2021 4:49pm Dave Higton (1515) 3525 posts	There is an outstanding merge request for AcornSSL but I don’t think it will help with my problem You are correct.

Nov 28, 2021 6:14pm Matthew Phillips (473) 721 posts	I have built myself a debug build of the AcornHTTP module, version 1.05 using the current Disc build sources (ROOL downloads, not GitLab). Using this build I get a big log file. From that I can see the response headers before they get altered by the module. They include: content-type: application/sparql-results+xml;charset=utf-8 Content-Encoding: gzip Accept-Ranges: bytes Transfer-Encoding: chunked It’s then clear that after the 1191 bytes of header it then receives around 1466 more bytes which look like gibberish, so are consistent with being gzipped. The module then is clearly waiting for more, as this state is repeated multiple times until the timeout is reached and the BASIC programme closes the fetcher. I have updated the zip file to add AcornHTTPD (debug version of the module) and an excerpt of the HTTP_Trace file (with some of the repeated material cut out). Please feel free to download and examine it if you understand this stuff! I think my next move will be to see if there is a way of asking the server not to use gzip — I am wondering if this is the problem. Or I might try to reconstitute the binary from the hex dump and see if what I have been sent by the server appears to be complete. I am not au fait with chunked transfer-encoding so I will need to do some reading-up of specifications to get anywhere with this. (I’m not confident that my debug module is properly built, as I do not generally have a build environment set up for building RISC OS itself. I had to hack around a bit to get it to build at all.)

Nov 28, 2021 10:26pm Dave Higton (1515) 3525 posts	You’d think it made more sense to use a POST request and shove it all in the body, wouldn’t you? But I haven’t managed to find how to format a suitable POST request. Vexingly, all the stuff I can find on line is how to generate a query, not how to transmit it – except as a potentially huge query on the tail of the URL.

Nov 28, 2021 10:48pm Matthew Phillips (473) 721 posts	WikiData seems to be accepting the query all right — I don’t think there is any problem there. Got bogged down with other things this evening. Hope to continue investigations tomorrow night.

Nov 28, 2021 11:07pm Steve Fryatt (216) 2105 posts	But I haven’t managed to find how to format a suitable POST request. Is this documentation what you’re after?

Nov 29, 2021 1:36am Chris Mahoney (1684) 2165 posts	For what it’s worth, I tried putting one of the URLs into my HTTPLib (which calls URL/AcornHTTP itself) and it failed there too. This would indicate – as suspected – that it’s an issue with the modules and not with Matthew’s code.

Nov 29, 2021 8:42am Matthew Phillips (473) 721 posts	I’ve located the problem. The debugging available in AcornHTTP is very helpful. In chunked transfer encoding, the response is divided into chunks. Each has a header consisting of the length of the chunk in bytes expressed in hexadecimal, followed by CR LF, the binary chunk data (with the stated number of bytes) and then CR LF. The first chunk returned by the server in my example started “00a” CR LF, i.e. ten bytes. It was followed, correctly, by ten bytes and CR LF. The second chunk was longer and the chunk header started “005aa” CR LF. (I forget the exact figure, 5aa is just an example.) Putting the chunks back together manually from the debugging output obtained a gzipped stream which uncompressed to the expected SPARQL response. So far, so good. The debug output from AcornHTTP tells a different story: Found one! `00a ’ Chunk size declaration: `00a ’ ZERO chunk size – moving to reading footer state Searching the source for “ZERO chunk” took me to line 539 of Sources.Networking.Fetchers.HTTP.c.header which says if (buffer == '0' \|\| ses->chunk_bytes == 0) For some reason the module seems to assume that the hexadecimal chunk sizes will not have leading zeros for a non-zero value. The code immediately preceding is: if (isxdigit(buffer)) { ses->chunk_bytes = (int) strtol(buffer, NULL, 16); } else { ses->chunking = FALSE; return consumed; } So by the time we get to line 539 we know that buffer is a hex digit, and the value of the number will be in `ses->chunk_bytes`. I cannot therefore see why a separate check for buffer equalling ‘0’ is necessary. I have removed it from my copy, recompiled, and the problem is then fixed. I had probably better read the HTTP specification to find out whether there is any justification for this caution on the part of the AcornHTTP module, before submitting a patch.

Nov 29, 2021 8:54am Matthew Phillips (473) 721 posts	Reviewing the HTTP 1.1 specification I cannot see any justification for bailing out when *buffer is ‘0’. There isn’t any comment in the code to explain why this check was included. As well-behaved servers are allowed to have leading zeros in a chunk size, the WikiData server is behaving entirely properly. It is odd that the WikiData server seems to include two leading zeros whether the length is expressed in 1 or 3 hex digits, but it’s allowed to do that. I’ll see if I can remember how to submit a patch to GitLab.

Nov 29, 2021 2:31pm Dave Higton (1515) 3525 posts	Well researched, Matthew. I’ve removed that first, faulty, condition and rebuilt AcornHTTP. The result promptly fetched 6/6.

Nov 29, 2021 3:26pm Doug Webb (190) 1180 posts	Well researched, Matthew. +1 If one of you are willing to share the rebuilt module with me then I will test it against Manga and see if sorts out an issue that I’m seeing with the latest version.

Nov 29, 2021 7:34pm Chris Mahoney (1684) 2165 posts	Well done. There’s a particular ‘bad URL’ that I’ve had issues with in the past and it’ll be interesting to see whether the same fix applies there (although this is dependent on being able to find my notes!)

Nov 29, 2021 7:55pm Steve Pampling (1551) 8170 posts	Question: is the chunk size value actually a three character tag? i.e 00a, or 5aa or even fff? I’ll lave people to do the hex to decimal conversion to see where my thought is headed.

Nov 29, 2021 8:22pm Matthew Phillips (473) 721 posts	No, the specification which I linked to above states that the chunk size is represented by one or more hex digits, or in the case of the last chunk, one or more zeros. There is no preferred length for hex numbers in the specification.

Nov 29, 2021 9:08pm Matthew Phillips (473) 721 posts	I’ve submitted a merge request