Problems fetching from WikiData API
Matthew Phillips (473) 721 posts |
We’ve been noticing for a while that when displaying links for objects on a RiscOSM map the lookup of potential links using WikiData has not been working properly. More often than not the lookup has been timing out. I found some time to investigate the problem, and I have produced a simple BASIC programme that exhibits the issues. You can find it via the link. The programme uses the URL_Fetcher, AcornHTTP, and AcornSSL modules to perform a query via WikiData’s API. It’s the exact same query that RiscOSM uses to try to find useful links for the place under the mouse pointer. There is a !!ReadMe file in the zip file which explains the problem in more detail. I would be grateful if a few other people could try the programme, let me know whether the fetches succeed or fail, and report what versions of AcornHTTP, AcornSSL and URL_Fetcher are on your machine. (You don’t need RiscOSM to be able to run the test.) |
Chris Mahoney (1684) 2165 posts |
0/6 succeeded here. Output is here. URL 0.58 (12 May 2018), AcornHTTP 1.04 (22 Apr 2020), AcornSSL 1.06 (03 Jun 2020), mbedTLS 2.16.8 |
Bryan (8467) 468 posts |
0/6 succeeded for me as well. But if I change FNfetch to use Wget instead of URL_GetUrl and URL_ReadData, then it works every time. |
Bryan (8467) 468 posts |
Alternatively, if you change line 890 to read WHILE ?q%:VDU ?q%:q%+=1:ENDWHILE: q%-=1: ?q%=13 then it works 6/6 every time. i.e. change the URL terminator to 13 instead of zero |
Matthew Phillips (473) 721 posts |
It is very surprising that changing the URL terminator to 13 has any effect. Your code as stated above doesn’t just change the terminator. It also strips the closing } from the end of the query. When the loop exits, q% is pointing to the byte with the zero in, so your code will replace the final D of the urlencoded %7D with a carriage return. Although the fetches then all succeed, what you are getting back is an error to say that the query cannot be parsed, along with a backtrace from the jetty engine. I should have given the module versions I had tested with:
ARMX6:
Iyonix
I didn’t realise I had so much variation in OS versions between the machines! Normally none of them fetch the content properly, but this morning the ARMX6 fetched 3/6. |
Matthew Phillips (473) 721 posts |
If you change the actual terminator from 0 to 13, you get a Bad Request response back from the module. I suppose it is interesting that sending a badly-formed request to the WikiData server (as in Bryan’s modification above) results in successful fetching of the error. I’m not sure what it tells us. The server can no doubt fail to parse the request quite quickly, and reply with the error, whereas a real request takes a little longer to process. I suppose it tells us that the HTTPS aspect is working well, and that the problem with fetching the body of the response probably lies with the AcornHTTP or URL_Fetcher modules. |
Bryan (8467) 468 posts |
Sorry. It was late last night and I was still looking at the file returned by my test with Wget (which does work) |
Martin Avison (27) 1495 posts |
Here none succeeded out of 6. |
Doug Webb (190) 1180 posts |
Well, if you look at this thread you will see that putting your faith in narrowing it down to a specific version of AcornHTTP/URL_Fetcher may not be easy, as changes to the build system don’t always result in a positive outcome, or indeed an easy way to identify who is running botched modules. I’m not going to go over old ground to state the obvious as much has been said before. By the way, I ran the test and succeeded 1 out of 6. And for what it is worth. |
Matthew Phillips (473) 721 posts |
There is an outstanding merge request for AcornSSL but I don’t think it will help with my problem, as it’s clear the connection is being made successfully. |
Doug Webb (190) 1180 posts |
Though the thread title is AcornSSL, that thread highlights issues Rick Murray had with broken AcornHTTP versions with Manga, showing how a module with the same version/date may not be the same, and that the actual module size can show they are different. Given that Manga uses AcornHTTP and, I think, URL_Fetcher, I would look at those to see if they are broken and see what module sizes people have. My AcornHTTP 1.05 is 42912 bytes and the URL_Fetcher 0.58 is 13720 bytes. |
Dave Higton (1515) 3535 posts |
You are correct. |
Matthew Phillips (473) 721 posts |
I have built myself a debug build of the AcornHTTP module, version 1.05, using the current Disc build sources (ROOL downloads, not GitLab). Using this build I get a big log file. From that I can see the response headers before they get altered by the module. They include: content-type: application/sparql-results+xml;charset=utf-8 It is then clear that after the 1191 bytes of header it receives around 1466 more bytes which look like gibberish, so are consistent with being gzipped. The module is then clearly waiting for more, as this state is repeated multiple times until the timeout is reached and the BASIC programme closes the fetcher. I have updated the zip file to add AcornHTTPD (debug version of the module) and an excerpt of the HTTP_Trace file (with some of the repeated material cut out). Please feel free to download and examine it if you understand this stuff! I think my next move will be to see if there is a way of asking the server not to use gzip, as I am wondering if this is the problem. Or I might try to reconstitute the binary from the hex dump and see if what I have been sent by the server appears to be complete. I am not au fait with chunked transfer-encoding so I will need to do some reading-up of specifications to get anywhere with this. (I’m not confident that my debug module is properly built, as I do not generally have a build environment set up for building RISC OS itself. I had to hack around a bit to get it to build at all.) |
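As a rough way of testing that last idea, the bytes recovered from the HTTP_Trace dump could be pasted back together and fed through zlib to see whether they form a complete gzip stream. The following is only a sketch, not anything from the modules: the file name dump.bin is made up, it assumes the chunk bodies have already been reassembled by hand, and it needs linking with -lz.

    #include <stdio.h>
    #include <zlib.h>

    int main(int argc, char **argv)
    {
        FILE *f = fopen(argc > 1 ? argv[1] : "dump.bin", "rb");
        unsigned char in[4096], out[8192];
        z_stream z = {0};
        int ret = Z_OK;

        if (!f) { perror("open"); return 1; }
        /* 16 + MAX_WBITS tells zlib to expect a gzip header, not raw deflate */
        if (inflateInit2(&z, 16 + MAX_WBITS) != Z_OK) return 1;

        while (ret != Z_STREAM_END && !feof(f)) {
            z.avail_in = (uInt) fread(in, 1, sizeof in, f);
            z.next_in = in;
            do {
                z.avail_out = sizeof out;
                z.next_out = out;
                ret = inflate(&z, Z_NO_FLUSH);   /* decompress what we have so far */
                if (ret == Z_DATA_ERROR || ret == Z_MEM_ERROR) break;
                fwrite(out, 1, sizeof out - z.avail_out, stdout);
            } while (z.avail_out == 0);
            if (ret == Z_DATA_ERROR || ret == Z_MEM_ERROR) break;
        }

        inflateEnd(&z);
        fclose(f);
        /* Z_STREAM_END means the gzip stream was complete; anything else suggests
           the response was truncated, or the dump was reassembled wrongly. */
        return ret == Z_STREAM_END ? 0 : 1;
    }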
Dave Higton (1515) 3535 posts |
You’d think it made more sense to use a POST request and shove it all in the body, wouldn’t you? But I haven’t managed to find how to format a suitable POST request. Vexingly, all the stuff I can find online is how to generate a query, not how to transmit it – except as a potentially huge query on the tail of the URL. |
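For what it’s worth, the SPARQL 1.1 Protocol does allow the query to be POSTed in the request body rather than appended to the URL. The sketch below simply prints what such a request could look like on the wire; the trivial query, the Content-Length, and the exact header set are illustrative, and whether the URL_Fetcher/AcornHTTP interface makes it easy to supply a body like this is a separate question.

    #include <stdio.h>

    /* Shape of a SPARQL query sent by POST: the urlencoded query goes in the
       body as "query=...", so the URL itself stays short.  The query here
       ("SELECT ?item WHERE {} LIMIT 1") is a throwaway example. */
    int main(void)
    {
        const char *request =
            "POST /sparql HTTP/1.1\r\n"
            "Host: query.wikidata.org\r\n"
            "Content-Type: application/x-www-form-urlencoded\r\n"
            "Accept: application/sparql-results+xml\r\n"
            "Content-Length: 51\r\n"
            "Connection: close\r\n"
            "\r\n"
            "query=SELECT%20%3Fitem%20WHERE%20%7B%7D%20LIMIT%201";

        fputs(request, stdout);
        return 0;
    }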
Matthew Phillips (473) 721 posts |
WikiData seems to be accepting the query all right — I don’t think there is any problem there. Got bogged down with other things this evening. Hope to continue investigations tomorrow night. |
Steve Fryatt (216) 2106 posts |
Is this documentation what you’re after? |
Chris Mahoney (1684) 2165 posts |
For what it’s worth, I tried putting one of the URLs into my HTTPLib (which calls URL/AcornHTTP itself) and it failed there too. This would indicate – as suspected – that it’s an issue with the modules and not with Matthew’s code. |
Matthew Phillips (473) 721 posts |
I’ve located the problem. The debugging available in AcornHTTP is very helpful. In chunked transfer encoding, the response is divided into chunks. Each has a header consisting of the length of the chunk in bytes expressed in hexadecimal, followed by CR LF, then the binary chunk data (with the stated number of bytes) and then CR LF. The first chunk returned by the server in my example started “00a” CR LF, i.e. ten bytes. It was followed, correctly, by ten bytes and CR LF. The second chunk was longer and the chunk header started “005aa” CR LF. (I forget the exact figure, 5aa is just an example.) Putting the chunks back together manually from the debugging output yielded a gzipped stream which uncompressed to the expected SPARQL response. So far, so good. The debug output from AcornHTTP tells a different story: it reports Found one! `00a’ but then treats that chunk as if it were the terminating zero-length one. Searching the source for “ZERO chunk” took me to line 539 of Sources.Networking.Fetchers.HTTP.c.header which says if (*buffer == '0' || ses->chunk_bytes == 0) For some reason the module seems to assume that the hexadecimal chunk sizes will not have leading zeros for a non-zero value. The code immediately preceding is: if (isxdigit(*buffer)) { ses->chunk_bytes = (int) strtol(buffer, NULL, 16); } else { ses->chunking = FALSE; return consumed; } So by the time we get to line 539 we know that *buffer is a hex digit, and the value of the number will be in ses->chunk_bytes, which makes the extra test on *buffer redundant at best and wrong whenever a non-zero size is written with a leading zero. I have removed that check from my copy, recompiled, and the problem is then fixed. I had probably better read the HTTP specification to find out whether there is any justification for this caution on the part of the AcornHTTP module, before submitting a patch. |
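To make the failure mode concrete, here is a minimal stand-alone sketch (not the module’s actual code) of parsing a chunk-size line so that leading zeros are harmless: the end of the body is signalled only by a parsed size of zero, not by the first character being ‘0’.

    #include <ctype.h>
    #include <stdio.h>
    #include <stdlib.h>

    /* Parse the hex chunk size at the start of a chunk header line.
       Returns the size in bytes, or -1 if the line is malformed.
       A return value of 0 means this was the terminating chunk. */
    static long parse_chunk_size(const char *line)
    {
        if (!isxdigit((unsigned char) *line))
            return -1;
        return strtol(line, NULL, 16);   /* "00a" -> 10, "005aa" -> 1450, "0" -> 0 */
    }

    int main(void)
    {
        printf("%ld\n", parse_chunk_size("00a\r\n"));   /* 10: more data follows   */
        printf("%ld\n", parse_chunk_size("005aa\r\n")); /* 1450: more data follows */
        printf("%ld\n", parse_chunk_size("0\r\n"));     /* 0: last chunk           */
        return 0;
    }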
Matthew Phillips (473) 721 posts |
Reviewing the HTTP 1.1 specification, I cannot see any justification for bailing out when *buffer is ‘0’. There isn’t any comment in the code to explain why this check was included. As well-behaved servers are allowed to have leading zeros in a chunk size, the WikiData server is behaving entirely properly. It is odd that the WikiData server seems to prefix two leading zeros whether the significant part of the length is 1 or 3 hex digits, but it’s allowed to do that. I’ll see if I can remember how to submit a patch to GitLab. |
Dave Higton (1515) 3535 posts |
Well researched, Matthew. I’ve removed that first, faulty, condition and rebuilt AcornHTTP. The result promptly fetched 6/6. |
Doug Webb (190) 1180 posts |
+1 If one of you is willing to share the rebuilt module with me then I will test it against Manga and see if it sorts out an issue that I’m seeing with the latest version. |
Chris Mahoney (1684) 2165 posts |
Well done. There’s a particular ‘bad URL’ that I’ve had issues with in the past and it’ll be interesting to see whether the same fix applies there (although this is dependent on being able to find my notes!) |
Steve Pampling (1551) 8173 posts |
Question: is the chunk size value actually a three character tag? i.e. 00a, or 5aa or even fff? I’ll leave people to do the hex to decimal conversion to see where my thought is headed. |
Matthew Phillips (473) 721 posts |
No, the specification which I linked to above states that the chunk size is represented by one or more hex digits, or in the case of the last chunk, one or more zeros. There is no preferred length for hex numbers in the specification. |
Matthew Phillips (473) 721 posts |
I’ve submitted a merge request |