Weak TCP/IP stack
David Feugey (2125) 2709 posts |
More and more, I think the two problems are the same. I did have a problem with disk accesses, but I also have a problem with non-responding sockets. The network blinks (= request received), the disk does not (= nothing in the WebJames log). The problem seems to have been here for months or years, as the RC12a ROM has this bug too. To be honest, I don’t know what to do. I think I’ll simply shut down the whole riscos.fr website while I find an alternative, as I am sinking in the Google ranking. The curious point is that nobody noticed the problem until today. I probably have no visitors :) Note: perhaps this problem is the same as the “no sockets left” error that I get after around one month of WebJames use. (It would be cool to see this solved for ROS 5.22 :) ) |
David Feugey (2125) 2709 posts |
I went back to my fastest setup: the PandaBoard ES. It’s the fastest to wake up, but also the fastest to see its sockets die. The strange point is that once active, sockets don’t die. I should write an app to ping my server. Does anyone have simple BASIC code to make an HTTP call? :) |
David Feugey (2125) 2709 posts |
My batch file on a PC :START Guess what: no more lag on the server. All sockets stay active. If some sockets are dead, you must wait several minutes for them to revive, but everything gets back to normal later… or almost (sometimes a request fails). Problem: my SD card will be dead in two days with this ‘fix’. |
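The batch file itself is not shown in the post, so the following is only a minimal sketch of the general idea (poll the web server at a fixed interval from another machine so that its sockets stay active). The function names, the 60 second interval and the timeout are assumptions made for illustration, not David's actual script.

```python
import http.client
import time
import urllib.error
import urllib.request


def poll_once(url, timeout=10.0):
    """Fetch url once; return the HTTP status code, or None if nothing answered."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            response.read(1)  # read a byte so the server really has to reply
            return response.status
    except urllib.error.HTTPError as err:
        return err.code  # the server answered, even if with an error page
    except (OSError, http.client.HTTPException):
        return None  # refused, timed out, DNS failure, broken reply...


def keep_alive(url, interval=60.0, rounds=None, poll=poll_once, sleep=time.sleep):
    """Poll url every interval seconds (rounds=None: forever); return failure count."""
    failures = 0
    done = 0
    while rounds is None or done < rounds:
        status = poll(url)
        if status is None:
            failures += 1
        print(time.strftime("%H:%M:%S"), url, "no answer" if status is None else status)
        done += 1
        if rounds is None or done < rounds:
            sleep(interval)
    return failures
```

Calling keep_alive with the server's URL and no rounds argument polls until interrupted. Nothing is written to disk by the sketch itself, so any extra wear on the server's SD card would come from the server's own logging of each request.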
David Feugey (2125) 2709 posts |
Now I need to code this in BASIC, otherwise I will need two servers: one for RISC OS and one to revive it. |
Rick Murray (539) 13840 posts |
Two things. One – Google’s rank is unimportant. You will sink anyway because you are not an https site and Google prefers that these days. I’m the same, but I’ve never cared about my Google status. I got disillusioned with the whole thing when I took a peek in the “SEO” world and how obsessed people were with “monetizing” their website, talking about pulling content that was not turning a profit. These people aren’t interested in sharing information, they just want to cash in. And thanks to that, a lot of Google’s search results are crap. Two – perhaps you could write a short program to (periodically, like maybe on the hour?) shut down the webserver task, close open files, then call OS_Reset to kick the entire machine. I think a shutdown-reset cycle should take around 30s (a total guess!), that might not cure your problems, but may at least stop them from being show stoppers. I am looking at your site right now and sometimes it doesn’t respond, other times everything comes up quickly. If you want to make a basic HTTP call, you could perhaps call wget via OSCLI? Tell it to fetch something. ;-) |
David Feugey (2125) 2709 posts |
Not only. The fact that the website crashes each evening when the Google robots come is not good. It also means that there is no cache and no indexing of the content.
No. The problem appears only a few seconds/minutes after the launch of WebJames.
As expected, it’s not really useful. Only one socket is alive, so another connection could make everything fail. The script is only useful to revive sockets faster (or to get the timeouts instead of you, if you prefer). The strange point is that ShareFS, which uses sockets too, works perfectly. Ping too. So it’s really linked to sockets called via the C interface. |
Colin (478) 2433 posts |
Have you tried any of the ROMs in my omap4 network test thread? They have changes to EtherUSB which may be relevant. |
David Feugey (2125) 2709 posts |
After 30 minutes, it failed to connect to the web server, and the connection does not come back. Ping still works, as does ShareFS. |
David Feugey (2125) 2709 posts |
I switched to the company server, as it’s not possible to use it like this any more :) |
Rick Murray (539) 13840 posts |
To follow this up… I have been running a small site on my Pi for two days now without problems, using the current WebJames, the latest firmware, and the most recent (at the time) version of RISC OS (5th April). Yuck, all my times are an hour out according to CLib. :-/ I have just made a clone of riscos.fr and testing over the LAN using my phone shows no unexpected problems, even though the pages are somewhat more complex. I did see David’s oflafofla (or, these days, dofla, that’s surely “ofla” in a post Homer Simpson world):
Looking in the other log, I see this:
Does WebJames support KeepAlive? I didn’t think it did, but maybe…? Just been at it with the iPad and doubled the size of the log file. ;-) I can’t say the server has been taking it easy due to few people knowing it is available, as my logfile shows:
and so on. Lots of rubbish like this. Is the phpMyAdmin link broken? Surely the “//” can’t be right? Right – I’m off to watch a cute zombie called Liv Moore… ho ho. |
Rick Murray (539) 13840 posts |
Tomorrow I’ll revert back to my modified older RISC OS. An unexpected consequence of using a different timezone to the UK expected seems to be that NetTime is getting things horribly wrong.1 My computer thinks it is 0h53 (it is 0h28) and the status says NetTime last synced a day ago. What? 2 :-/ Update: well . . . I entered NetTime_Kick in a task window and the machine has frozen. Since I have to reset, I’ve reverted back to my build of RISC OS. Well, I guess that’s one way to clear all the crap in the log files, huh? 1 Surely I can’t be the only person with Timezone +1 & DST ? 2 Half an hour in a day is kind of poor. Could NetTime be a little more intelligent here and disable slewing if it can’t check the time for whatever reason? Surely it would be best to try to keep the time as it is rather than continue to slew and have it get more and more incorrect? |
Malcolm Hussain-Gambles (1596) 811 posts |
Talking of NetTime, this guy is to blame – Genius at work. No that’s not sarcasm. |
Rick Murray (539) 13840 posts |
What’s today? The 25th? The server has been running through the week and my server has needed a couple of restarts (plus the code being updated requiring the update to be loaded). |
David Feugey (2125) 2709 posts |
No, as I also used HTTPServ without vhost, with the same problems. |
Rick Murray (539) 13840 posts |
I don’t use ShareFS. Did you ever try without it active? |
Steve Pampling (1551) 8170 posts |
I did suggest that as I’ve had problems with WebJames stability on Iyonix in the past. Adding the opportunity for alignment issues to the mix isn’t going to make it more stable. |
Rick Murray (539) 13840 posts |
Can you please be more specific about what you mean by the stability issues, namely did you notice if the problem was something repeatable? (I wonder if it is related to the dofla – that looks like a null pointer being used). A question for both of you – are you using the simple version or the PHP build? I didn’t need PHP so I’m using the simpler one. I did notice that the resolve IPs option is extremely crashy – logging in from 127.0.0.1 shouldn’t cause the server to instantly die. ;-) |
Colin (478) 2433 posts |
The dofla string will be caused by printing a NULL pointer: try printf("%s\n", (void*)0); Since it happens when the command to fetch a page is printed to the logfile, it would appear that the command is not always set at that point. It doesn’t mean there’s a problem, other than the display in the logfile. The command string may be checked for NULL after the logfile output. |
Steve Pampling (1551) 8170 posts |
It was a while back, but as I recall it was leaking sockets1 when clients had intermittent connections (ropy old laptop) and randomly crashed or froze after a few hours or days of use. 1 After a period of time it would crash, and any attempt to restart or run anything else using IP sockets would report a socket in use |
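One common way for a server to leak sockets like this is an error path that returns without closing the client connection. The sketch below is not WebJames code; it only illustrates, under that assumption, the usual pattern of closing the connection whether or not the request handler fails. The name serve_client and the handler argument are hypothetical.

```python
import socket


def serve_client(conn, handler):
    """Run handler(conn), then close conn whether or not the handler failed."""
    try:
        handler(conn)
        return True
    except Exception as exc:
        # A failed lookup or a client that vanished ends up here instead of
        # taking the whole server down.
        print("request failed:", exc)
        return False
    finally:
        # Runs on success and on failure alike, so the socket is always released.
        conn.close()


def broken_handler(conn):
    """Stand-in for a request handler that fails part-way through."""
    raise RuntimeError("lookup failed")


server_side, client_side = socket.socketpair()
print("handled cleanly:", serve_client(server_side, broken_handler))
client_side.close()
```

With this shape, a client on an intermittent connection can make an individual request fail, but each failure still gives its socket back to the system.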
Rick Murray (539) 13840 posts |
It appears to be a little more stable in this respect.

*inetstat -a
Active Internet connections (including servers)
Proto Recv-Q Send-Q  Local Address           Foreign Address        (state)
tcp        0      0  raspberrypi.home.http   akirei.home.38648      ESTABLISHED
tcp        0      0  raspberrypi.home.telne  akirei.home.52736      ESTABLISHED
tcp        0      0  raspberrypi.home.http   akirei.home.46667      CLOSE_WAIT
tcp       37      0  raspberrypi.home.49196  91.203.57.172.443      CLOSE_WAIT
tcp       37      0  raspberrypi.home.49195  91.203.57.172.443      CLOSE_WAIT
tcp       37      0  raspberrypi.home.49194  91.203.57.172.443      CLOSE_WAIT
tcp       37      0  raspberrypi.home.49193  91.203.57.172.443      CLOSE_WAIT
tcp       37      0  raspberrypi.home.49192  91.203.57.172.443      CLOSE_WAIT
tcp        0      0  raspberrypi.home.49189  68.232.35.121.http     LAST_ACK
tcp        0      0  *.http                  *.*                    LISTEN
tcp        0      0  *.telnet                *.*                    LISTEN
udp        0      0  *.49152                 *.*
udp        0      0  *.netbios-ns            *.*
udp        0      0  *.bootpc                *.*

Akirei is my phone, I just did a port scan to make sure RISC OS isn’t responding to anything else (though locally; the Livebox is only allowing telnet and http through from outside).

And since the last reboot, for tcp:
  70 connection requests
  293 connection accepts
  0 bad connection attempts
  0 listen queue overflows
  307 connections established (including accepts)
  367 connections closed (including 117 drops)

And for all:
  5812 packets for this host
  5952 packets for unknown/unsupported protocol
  0 packets forwarded
  0 packets not forwardable
  31171 packets received for unknown multicast group
  0 redirects sent
  4636 packets sent from this host
I got that when the IP lookup failed (as it frequently did). It seems that WebJames neither attempts to trap the exception and try to deal with it sensibly, nor does it then try to close open ports. |
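When watching for leaked sockets over days, it can help to reduce a listing like the inetstat output above to a count per TCP state, since a steadily growing CLOSE_WAIT count suggests connections that the local side never closed. The following is a rough sketch that assumes the six-column layout shown in the post; the function name is made up for illustration, and the sample rows are copied from the listing.

```python
from collections import Counter

# TCP and UDP rows copied from the inetstat listing in the post.
SAMPLE = """\
tcp 0 0 raspberrypi.home.http akirei.home.38648 ESTABLISHED
tcp 0 0 raspberrypi.home.telne akirei.home.52736 ESTABLISHED
tcp 0 0 raspberrypi.home.http akirei.home.46667 CLOSE_WAIT
tcp 37 0 raspberrypi.home.49196 91.203.57.172.443 CLOSE_WAIT
tcp 37 0 raspberrypi.home.49195 91.203.57.172.443 CLOSE_WAIT
tcp 37 0 raspberrypi.home.49194 91.203.57.172.443 CLOSE_WAIT
tcp 37 0 raspberrypi.home.49193 91.203.57.172.443 CLOSE_WAIT
tcp 37 0 raspberrypi.home.49192 91.203.57.172.443 CLOSE_WAIT
tcp 0 0 raspberrypi.home.49189 68.232.35.121.http LAST_ACK
tcp 0 0 *.http *.* LISTEN
tcp 0 0 *.telnet *.* LISTEN
udp 0 0 *.49152 *.*
udp 0 0 *.netbios-ns *.*
udp 0 0 *.bootpc *.*
"""


def count_tcp_states(text):
    """Count TCP connections per state in inetstat-style output."""
    states = Counter()
    for line in text.splitlines():
        fields = line.split()
        # TCP rows have six columns, the last being the state; UDP rows have
        # no state column and headers do not start with "tcp".
        if len(fields) == 6 and fields[0] == "tcp":
            states[fields[5]] += 1
    return states


for state, count in sorted(count_tcp_states(SAMPLE).items()):
    print(state, count)
```

Running the same count at intervals and comparing the CLOSE_WAIT figure would show whether the number of half-closed connections keeps growing between reboots.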