I hate Resolver
Pages: 1 2
Rick Murray (539) 13850 posts |
So it appears there is an issue on my system where something crashes and it takes out Hearsay. Attempting to use Nettle instead causes an instant die, with a report of a failed LDRB (the base address is something ridiculous like &6xxxxxxx – no memory there).

Fatal signal received: Segmentation fault
Register dump at 001effb4:
a1: 6f6e532f a2: fa207eec a3: ff6e7fef a4: 736262
v1: 2029f1d4 v2: 2029f17c v3: 0 v4: 0
v5: 20000113 v6: 1eedb0 sl: fa20021c fp: fa207f88
ip: 35bad0d5 sp: 1eed74 lr: 1eedac pc: fc18675c
cpsr: 20000113

fc186748 : ...à : e0031000 : AND R1,R3,R0
fc18674c : ..Bà : e0420001 : SUB R0,R2,R1
fc186750 : .ŵ½è : e8bd8200 : LDMIA R13!,{R9,PC}
fc186754 : . Ðä : e4d02001 : LDRB R2,[R0],#1
fc186758 : .0Ñä : e4d13001 : LDRB R3,[R1],#1
fc18675c : .0Rà : e0523003 : SUBS R3,R2,R3

(I’ve omitted the stack backtrace as it is clearly wrong – it literally unwinds the last three functions which, stupidly, are the three functions leading up to the backtrace rather than the functions involved in the error!)

Using Debugger’s annotated dump, this turns out to be a failed strcmp call within Resolver’s lookup. Of course, I cannot reinitialise Resolver because it’s such a badly written pile of rubbish that this happens:

*RMKill Resolver
*** unrecoverable error in run time system: free failed
Exit called
*

And, of course, like the ghost of a bad zombie, Resolver fails to die.
Which, obviously, has knock-on effects:

*ping heyrick.co.uk
Internal error: abort on data transfer at &FC18675C
Postmortem requested
Arg1: 0x0001ef59 126809
ae7c in function gethostbyname
Arg2: 0x0001ef40 126784 -> [0x0001ef54 0x0001ef59 00000000 0x3e694c3c]
Arg1: 0x00000002 2
8394 in function main
Arg2: 0x00008120 33056 -> [0xe1a0c00d 0xe92ddbf3 0xe24cb004 0xe24dcf57]
Arg1: 0x0001ea60 125536 -> [0x676e6970 0x79656820 0x6b636972 0x2e6f632e]
fc1776e0 in shared library function
a538 in anonymous function
*

That address, by the way, equates to:

SVC stack:
fa207ec4 : 0001ef59 : - R9
fa207ec8 : fc3938f4 : | R14: fc3938f4 (ASM call to fc1866d8)
         :          : | fc3938f4 = Resolver +1d14
         :          : | fc1866d8 = SharedCLibrary +13030
         :          : | = strcmp +0
fa207ecc : ffffffff :
[...snip...]
         :          : | fc018ebc = +18ebc in the Kernel
         :          : | = CallVector +0
         :          : | fc0106f0 = +106f0 in the Kernel
         :          : | = VectorUserSWI +8
fa207fe8 : 40000110 : - PSR?
fa207fec : 00066000 : | SWI XResolver_GetHostByName
fa207ff0 : fc18748c : | R14: fc18748c
         :          : | = SharedCLibrary +13de4
         :          : | = _swix +58

Now, what has changed on my system recently that has anything to do with Resolver is my hosts file. The important line being:

127.0.0.1 localhost raspberrypi heyrick.ddns.net srv loopback facebook.com fbcdn.com

In order to allow “heyrick.ddns.net” to point to this machine (and not the Livebox via looking up the public IP address), “srv” as a shortcut to typing in something longer, and also to sink any attempts for facebook rubbish to call home. I’m guessing this might be too long and Resolver is too rubbish to deal with it properly? I’m going to reboot (no choice now it’s a zombie) with the hosts file now being:

127.0.0.1 localhost heyrick.ddns.net srv
127.0.0.2 facebook.com fbcdn.com

See if that works better.
I don’t need “raspberrypi” any more, and no need for “loopback” as “localhost” does the same thing… <sarcasm>And, of course, since the Resolver sources are available, I could drop in some calls to DADebug to work out what’s going wrong here – maybe a subtle buffer overrun or blatant crap like strcpy’ing a too-long string into a fixed size buffer because nobody ever thought to strncpy…</sarcasm> |
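To make the strcpy point concrete, here is a minimal sketch of a bounded copy into a fixed-size buffer. It is not taken from Resolver (whose source is not shown in this thread); the helper name `copy_bounded` is hypothetical and only illustrates the general technique. Note that plain strncpy does not NUL-terminate when the source is too long, which is why the sketch terminates explicitly and reports truncation to the caller.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, for illustration only: copy src into dst, which
 * has room for dst_size bytes. The result is always NUL-terminated.
 * Returns 1 if the whole string fitted, 0 if it had to be truncated. */
static int copy_bounded(char *dst, size_t dst_size, const char *src)
{
    assert(dst != NULL && src != NULL && dst_size > 0);

    size_t len = strlen(src);
    if (len >= dst_size) {
        /* Too long: keep what fits and terminate explicitly. */
        memcpy(dst, src, dst_size - 1);
        dst[dst_size - 1] = '\0';
        return 0;
    }

    /* Fits: copy the string including its terminator. */
    memcpy(dst, src, len + 1);
    return 1;
}
```

A caller that gets 0 back can reject or log the over-long name instead of silently trashing whatever sits after the buffer.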
Steve Pampling (1551) 8172 posts |
I think I said something similar over a decade ago with respect to reverse lookup and early attempts at DNS blacklist use in !AntiSpam. I’ve repeated variants ever since.
I love that.
Don’t you just wish? (Well, obviously) |
Richard Walker (2090) 431 posts |
Rick, disappointing… I expected your next step to be trawling through a disassembly of Resolver! :) |
Steve Pampling (1551) 8172 posts |
To look at exactly why it doesn’t work properly? Start with a collection of shortcuts, a few “this will sort of do for now”s and a bug or two, and stew for years. |
Jon Abbott (1421) 2651 posts |
“/Sno” in ASCII. Buffer overrun causing stack corruption?
This tells me it’s a stacked SWI frame from an application (Nettle?) that triggered the problem, passing a value Resolver_GetHostByName can’t handle. The value in a1 implies it’s passed a full URL instead of just the host name.
Is that correct? Should it not be one entry per line:
An open source OS with closed source code? Oh, the humanity! |
Rick Murray (539) 13850 posts |
Just to follow up…

127.0.0.1 localhost heyrick.ddns.net srv
127.0.0.2 facebook.com fbcdn.com

This works. The machine doesn’t have random internet stuff crash and fail. All back to the way it was. So, there’s an apparently undocumented limit on either the line length or the number of entries that can be placed on a line in the hosts file. That said, it looks like it is supposed to be one entry per line as Jon suggested, however the default supplied with RISC OS seems to imply it’s all on one line with whitespace between each entry. So no idea if this would even work… BTW, a comment in the hosts file warns me not to get rid of the “loopback” entry. I did and nothing seems any different. Old info or does something actually use it (in preference to “localhost”)? |
Steve Pampling (1551) 8172 posts |
127.0.0.1 entries like that produce the same effect as a DNS poisoning (deliberate entry of a similar style in your DNS) but with a DNS build you can do things in a granular or sweeping way. |
Jeffrey Lee (213) 6048 posts |
127.0.0.1 localhost raspberrypi heyrick.ddns.net srv loopback facebook.com fbcdn.com

I’d hope that multiple hostnames per line will work – that’s the convention that’s supported on other OSes. Of course, it is possible that there’s a line length limit hidden in Resolver (or elsewhere?) which is causing problems. |
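As a rough illustration of that convention (one address followed by any number of whitespace-separated names), the sketch below splits a hosts line while refusing to store more names than the caller has room for. It says nothing about how Resolver actually parses the file; `parse_hosts_line` and `MAX_NAMES` are hypothetical, and the limit of 8 is chosen purely for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_NAMES 8   /* hypothetical capacity, for illustration only */

/* Split one hosts-file line in place. On success *addr points at the
 * address field and names[0..n-1] at the hostnames; n is returned.
 * Returns -1 for a blank or comment-only line. If the line carries
 * more than max_names names, the extras are dropped and *truncated is
 * set to 1 rather than writing past the end of names[]. */
static int parse_hosts_line(char *line, char **addr, char *names[],
                            int max_names, int *truncated)
{
    assert(line != NULL && addr != NULL && names != NULL && truncated != NULL);

    *truncated = 0;
    line[strcspn(line, "#\r\n")] = '\0';   /* strip comment and line ending */

    char *tok = strtok(line, " \t");       /* first field: the address */
    if (tok == NULL)
        return -1;
    *addr = tok;

    int count = 0;
    while ((tok = strtok(NULL, " \t")) != NULL) {
        if (count == max_names) {          /* out of room: stop, report it */
            *truncated = 1;
            break;
        }
        names[count++] = tok;
    }
    return count;
}
```

Fed the seven-name line quoted above with room for eight names, this returns 7 with nothing truncated. A parser with a smaller fixed array and no such check is one way to end up with the kind of overrun suspected earlier in the thread. strtok is not re-entrant, which is acceptable for a sketch but worth replacing in real code.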
Rick Murray (539) 13850 posts |
Does ROOL have access to the sources? There’s a limit with either line length or read failure (like using strcpy and trashing subsequent data) or the number of elements that can be parsed. It might be worth somebody taking a quick peek to see if anything stands out, given that this does eventually crash Resolver and mucks up everything that uses it and the length appears to be quite short… |
Steve Pampling (1551) 8172 posts |
All things considered do we really care about patching a multiply broken item? |
Chris Evans (457) 1614 posts |
https://www.riscosopen.org/bounty/polls/29 has no mention of Resolver! |
Steve Pampling (1551) 8172 posts |
I’m working off the news page which links to the stage 2 of 4 “TCP/IP stack overhaul” that you linked to. If anyone wanted to spend time fiddling with Resolver (and probably MBufManager too) they would need to sign an NDA. ROOL might well point any competent person at working on the update rather than patching. |
Stefano Bertinetti (2986) 11 posts |
Premise: I don’t have the technical background necessary to code something of the complexity of an OS part. |
Rick Murray (539) 13850 posts |
Except the part that actually crashes is a call to strcmp in CLib. ;-)
I know. I’m not offering, not because I don’t want to have a rummage, but because I’ve gotten nowhere in years with something that didn’t require the legalese paperwork… It would be nice, if not a fix, to at least know what the limitations are. Because, as you’ll observe from my top post, it wasn’t like I was giving it four hundred characters to deal with. |
Chris Hall (132) 3558 posts |
Except the part that actually crashes is a call to strcmp in CLib. ;-)

Isn’t that the function in CLib that was ‘debugged’ between RISC OS 5.16 and 5.20, causing the bundled !Writer to crash, as everything had worked around the bug previously but fell over once the bug was corrected? Try an earlier CLib. The problem was something to do with string or buffer length. |
Steffen Huber (91) 1953 posts |
One of the nice things about the Apache license is that it does not care about “other” components, it is highly compatible with anything else that is compatible itself (i.e.: yes, there were problems with GPL compatibility!). I guess it was one of the main reasons why it was chosen as the new RISC OS license in the first place – RISC OS consists of parts under CDDL, BSD and whatever-closed-source-license, so license compatibility is an important point. |
Alan Adams (2486) 1149 posts |
Here I copy the hosts file between systems – including Windows. I suspect copying one without loopback might upset Windows. |
Steve Fryatt (216) 2105 posts |
|
Chris Hall (132) 3558 posts |
You’ve found the thread I was looking for! Yes, you are right. |
Steve Pampling (1551) 8172 posts |
Ah, sounds like it’s a mongrel then. No wonder no-one with access to the source has done anything. |
Colin Ferris (399) 1818 posts |
Assemble code but :- Bit of ‘Baldrick’ if it is the Resolver code I am looking at. It seems to make direct branches into the ‘CLib’. |
Rick Murray (539) 13850 posts |
Not a surprise. All the C things in the ROM are fixed, CLib in ROM is fixed, you can’t build a jump table in read only memory, and many of the ROM C things are started before the OS has a chance to try softloading a newer CLib (you know how well that normally goes). Solution? You know where everything is and you know it isn’t going to move. So when the ROM is built, simply wang in all the correct addresses. Job done.
There’s a part of me that suspects all this NDA nonsense is so they don’t get the urine extracted by a reasonably bright eight year old… ;-) |
Chris Johns (3727) 40 posts |
I was looking at the internet update bounty thing earlier. I would imagine that resolver would be replaced by something less awful as part of that. |
Steve Pampling (1551) 8172 posts |
Some years ago I investigated DNS blacklist use for AntiSpam and to make my hacky test program work I did a quick and dirty port of a GPL code tool that worked better. A charitable view is that Resolver, sort of, mostly works. 1 I think your network switch is faulty because when I plug in this unauthorised kit it won’t connect. (Nope, it’s working perfectly and my RADIUS logs tell me exactly when you plugged it in and what switch port. Rinse, repeat with next supplier, await variant) |
Rick Murray (539) 13850 posts |
As in: congratulations, you just tried to plug an unknown device into a network that you did not have permission to connect to. In other words, you failed the first test. Don’t slam the door on your way out. |