Weak TCP/IP stack
David Feugey (2125) 2709 posts |
I’ve just realised that the TCP/IP stack is very weak with recent ROMs. After a few page loads, I get this request in WebJames’s log: Then: new connections are impossible (WebJames just doesn’t receive any requests). Conclusion: riscos.fr is unavailable, and I have no idea when it’ll be OK again. |
David Feugey (2125) 2709 posts |
A few clarifications: |
David Feugey (2125) 2709 posts |
Note: if you have any older ROMs for the Panda (end of 2014?), please send them to me. Not being able to reboot is a problem, but a broken website is just impossible :) |
Steve Pampling (1551) 8170 posts |
Pick a date and I will do the nearest if there isn’t an exact match. |
David Feugey (2125) 2709 posts |
Thanks: let’s try first with an end-of-December release… |
Steve Pampling (1551) 8170 posts |
December 22nd 2014. It should be in your mailbox by the time you read this. |
David Feugey (2125) 2709 posts |
Yep, thanks Steve. I ran a test. It seems to work much better. The ‘sleeping socket problem’ is gone. To explain: with the current ROM, everything works OK, even if I load a lot of pages (of course, there is a limited number of sockets, so massive slowdowns can appear). But if I stop loading pages and wait for the server to drop to 300 MHz, then it’s over: there is no way to load pages any more. I just need to wait 5-10 minutes for a new period of normal use. With the 20141222 ROM, this problem is clearly not present. Yeeeeesss. |
Steve Pampling (1551) 8170 posts |
The next task then is to look through the CVS updates for things that have changed that affect the network provision. With the OMAP boards that would be USB as well as the logical network stack elements. If you narrow down the specific items people stand more chance of identifying the cause. |
David Feugey (2125) 2709 posts |
I spoke too soon. There are still problems waking up the server when the speed is low (300 MHz), but it seems to wake up in less time (around 30 seconds vs. around 10 minutes). I suspect something is wrong with frequency management on the classic PandaBoard. The PandaBoard ES probably works better (it did, but I swapped it for a classic PandaBoard). Perhaps if I force the board to use a higher slow speed… |
David Feugey (2125) 2709 posts |
I tried 800 MHz as the slow speed and 1 GHz as the high speed to reduce timing issues linked to the change of frequency. Slowdowns are less severe, but seem much more frequent. I’ll check again tomorrow, as the Internet is a bit slow tonight (and so is my server). |
Rick Murray (539) 13840 posts |
I’ve noticed my server sometimes stops responding to connections and just acts dead. I have built the module with a load of tracing information to see if the problem is my code (probably!) or RISC OS (hope not!) but as is often the case, the problem doesn’t show itself when I’m looking for it! Making changes to the module means reloading the module which means closing and re-opening the socket, which means everything will work again. Hmm! On the other hand, it does mean that if I identify this as a real problem, then a workaround while investigation is “in progress” could be to close and re-open the socket after a period of time has elapsed? [Pi B, self-built ROM of 2nd November 2014 vintage] |
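The close-and-re-open workaround mentioned above could be sketched roughly as follows. This is a Python model of the idea only, not the module's code: the class name `ReopeningListener`, the `max_idle` parameter and the injectable `opener`/`clock` arguments are all assumptions made for illustration.

```python
import socket
import time


def open_listener(host, port, backlog=5):
    """Create a non-blocking TCP listening socket."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    sock.bind((host, port))
    sock.listen(backlog)
    sock.setblocking(False)
    return sock


class ReopeningListener:
    """Listener that is closed and re-opened after max_idle seconds of silence.

    Hypothetical workaround sketch: it assumes a silent socket may have
    gone deaf, which is only a guess about the real fault.
    """

    def __init__(self, host, port, max_idle,
                 opener=open_listener, clock=time.monotonic):
        self.host, self.port, self.max_idle = host, port, max_idle
        self.opener, self.clock = opener, clock
        self.reopen_count = 0
        self.sock = opener(host, port)
        self.last_activity = clock()

    def poll(self):
        """Return (conn, addr) if a client is waiting, else None."""
        try:
            conn, addr = self.sock.accept()
        except BlockingIOError:
            conn = None  # nothing queued on the non-blocking socket
        if conn is not None:
            self.last_activity = self.clock()
            return conn, addr
        if self.clock() - self.last_activity >= self.max_idle:
            # Idle for too long: throw the socket away and make a new one.
            self.sock.close()
            self.sock = self.opener(self.host, self.port)
            self.reopen_count += 1
            self.last_activity = self.clock()
        return None
```

Calling `poll()` a couple of times per second would be enough; `reopen_count` gives a cheap way to see in a log how often the workaround actually fired.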
Rick Murray (539) 13840 posts |
For what it is worth, I am testing my server and I have the Livebox forwarding the port to the Pi so I can test it outside of the LAN by using my phone on GSM/3G. Connections on port 23 from:
Connections on port 80 from… nobody that isn’t me. Interesting. Logs show no login attempts. The person/script probably aborts as soon as it fails to look like a standard Unix login. The lesson? If you write world-facing code, it should be fairly bulletproof as there are those scanning IP ranges and I would imagine not for friendly purposes (my server has not been announced anywhere and it is dynamic IP anyway…). Still, at least I haven’t heard from Szechuan yet today…must be a quiet day in China. Or maybe our threats now come from the Middle East? [update – the DADebug output now records the time of connection attempt] |
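As a small illustration of the "fairly bulletproof" point, here is a minimal sketch of defensive parsing for whatever arrives on a world-facing port. The function name `parse_request_line`, the 1024-byte limit and the accepted methods are assumptions chosen for the example; it is not taken from any server discussed here.

```python
def parse_request_line(data, max_len=1024):
    """Return (method, path) for a plausible HTTP request line, else None.

    Anything oversized, incomplete, non-ASCII or oddly shaped is rejected
    rather than interpreted.
    """
    if not data or len(data) > max_len:
        return None
    line, sep, _ = data.partition(b"\r\n")
    if not sep:
        return None  # no complete line received
    try:
        text = line.decode("ascii")
    except UnicodeDecodeError:
        return None  # e.g. telnet negotiation bytes or binary probes
    if not text.isprintable():
        return None  # control characters inside the line
    parts = text.split(" ")
    if len(parts) != 3:
        return None
    method, path, version = parts
    if method not in ("GET", "HEAD") or not path.startswith("/"):
        return None
    if version not in ("HTTP/1.0", "HTTP/1.1"):
        return None
    return method, path
```

The design choice is to reject by default: a scanner's probe returns `None` and can be logged and dropped without reaching any other code path.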
David Feugey (2125) 2709 posts |
I found where the problem is. In fact, there are two of them: My solution: And voilà! All seems OK now. Please run some tests. Note: the integrated cache does not seem to work very well inside WebJames. Perhaps I’ll have more luck with HTTPServ… |
David Feugey (2125) 2709 posts |
Grumpf. No more file write problems, but after a few minutes of inactivity the Panda still forgets to answer requests. |
Rick Murray (539) 13840 posts |
Ah, we appear to have different problems then. Mine is specifically the socket appears to just cease responding – but as said I’ve not identified where or why this is happening. It could well be my code messing up. I really hate these “random” problems, it is a pain to try debugging that which isn’t constant. That’s why I have left the port open to the world, I wonder if some of the connection attempts are sending malformed packets or somesuch… it is grasping at straws, yes, but I can’t do diddly when it is working like it should. ;-) |
David Feugey (2125) 2709 posts |
The same here. It seems to wake up faster on the ES. That’s really strange. After around 10 minutes of inactivity, the first request gets a timeout. The second is OK, but very slow. Then it gets faster and faster, and after 5-10 loads, all is OK again… until the next period of inactivity. I simply suspect a sync problem when going from 1.2 GHz to 350 MHz. So an SD problem + a socket problem. |
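The pattern described above (timeout, then slow, then back to normal) could be put into numbers from another machine with something like the sketch below. It is only a measuring aid under assumptions: the names `probe` and `wake_up_profile` are made up for the example, and the URL is whatever server is under test.

```python
import time
import urllib.request


def fetch(url, timeout):
    """Fetch url and read the whole body."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        resp.read()


def probe(url, timeout=10.0, fetcher=fetch, clock=time.monotonic):
    """Return the seconds taken to fetch url, or None on a network failure."""
    start = clock()
    try:
        fetcher(url, timeout)
    except OSError:  # covers timeouts, refused connections and URLError
        return None
    return clock() - start


def wake_up_profile(url, idle_seconds, attempts=10,
                    sleeper=time.sleep, **probe_kwargs):
    """Leave the server idle, then time `attempts` back-to-back requests."""
    sleeper(idle_seconds)
    return [probe(url, **probe_kwargs) for _ in range(attempts)]
```

For example, `wake_up_profile("http://<server>/", idle_seconds=600)` returns a list where `None` marks a request that failed or timed out; comparing the lists for different boards or clock settings would make the "wakes up faster on the ES" impression measurable.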
David Feugey (2125) 2709 posts |
Update: the problem seems to vanish (almost completely) when switching to a PandaBoard ES. The only good news is that I now have a super-optimized setup. |
Rick Murray (539) 13840 posts |
Finally heard from China…
It is 5am over there. 19C, and raining. |
David Feugey (2125) 2709 posts |
The problem came back this afternoon, but the PandaBoard ES stays silent for a shorter time than the PandaBoard. I make some requests, the Ethernet LED blinks, but WebJames doesn’t react (and doesn’t receive the request). Ethernet blinks again and again (my web browser :) ), then, suddenly, everything wakes up, the request is received, disc access is made, and the answer is sent. Strange. Network packets or sockets seem to disappear. |
David Feugey (2125) 2709 posts |
I tried a new experiment: setting the slow and fast speeds to the same value (700 MHz). I’ll see whether the socket issue is linked to sync problems caused by changes of frequency. |
David Feugey (2125) 2709 posts |
(I can confirm that at 700 MHz, the problem occurs almost immediately. 920 MHz works better, but with some almost permanent lags. 1.2 GHz is really better, but not perfect.) CORRECTION: Lags are the same at all speeds, but appear sooner at low speeds. |
David Feugey (2125) 2709 posts |
The same with HTTPServ. It’s even worse. It works OK, then, after some idle time, it does not work any more. Completely. Data arrives at the network card, then is lost. Conclusion: I have no solution. I just can’t use RISC OS as a web server any more. Old ROMs did not let me reboot (hey, I cannot live near my Panda). New ROMs have a very big issue with the network. |
David Feugey (2125) 2709 posts |
Same setup under a Pi Model B+, generic boot + latest RC14 ROM. |
David Feugey (2125) 2709 posts |
Test finished. I confirm the dead sockets problem on all tested setups: Pi Model B (wakes up after 5-10 sec), PandaBoard non-ES (does not wake up easily), PandaBoard ES (wakes up a bit faster than non-ES, but slower than Pi). The RC12A ROM works about the same, but slowly, because of other network issues. |
Rick Murray (539) 13840 posts |
I had the problem just now. What my code does is it sets a CallAfter to fire in 50cs. This then schedules a CallBack which returns whenever (close enough to not worry). My code will then do:
So, essentially, the module checks the socket twice per second, so it doesn’t impact the system but responds fast enough to be covered by general network latency. ;-) After each eventuality has been handled, the CallAfter is scheduled anew and life carries on. I have a suspicion that something is causing the CallAfter to not be scheduled. I don’t see any obvious exit-without-doing-it in the code, so I have added a debug command to tell me if the module thinks a CallAfter is pending (it should always be at any point when I can issue the command). That will tell me if it was failing to be set or if something else is going on. It may also work to take all of this out and replace it with a simple CallEvery instead. The socket handling code doesn’t take anything remotely near half a second to do its work; and anyway there is an interlock to prevent a CallBack being set if one is pending, so it should be safe to handle the CallEvery on time – if a CallBack is pending, another won’t be set. I should point out (again) that my Pi ROM is 2nd November 2014. I have not updated the sources more recently; and since I need localtime() to work for an additional CE(S)T timezone in the UK territory, it rather implies some specific code patches. ;-) In short – David’s problem concerns me – but I don’t think the causes to our problems are the same despite superficially looking similar. |
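The ticker/callback arrangement and its interlock, as described above, can be modelled in a few lines. This is a toy Python model for reasoning about the failure mode, not the module's code: `PollModel` and its method names are hypothetical, and the real thing presumably uses the OS CallAfter and CallBack mechanisms rather than anything shown here.

```python
class PollModel:
    """Toy model of a one-shot ticker plus a callback, guarded by an interlock."""

    def __init__(self, handle_socket, period_cs=50):
        self.handle_socket = handle_socket
        self.period_cs = period_cs
        self.ticker_due = None         # centisecond time of the pending ticker
        self.callback_pending = False  # interlock flag

    def arm_ticker(self, now_cs):
        """Stand-in for scheduling the one-shot ticker."""
        self.ticker_due = now_cs + self.period_cs

    def on_tick(self, now_cs):
        """Ticker handler: request a callback only if none is pending."""
        self.ticker_due = None
        if not self.callback_pending:
            self.callback_pending = True

    def run_callback(self, now_cs):
        """Callback: do the socket work, then re-arm the ticker."""
        if not self.callback_pending:
            return
        try:
            self.handle_socket()
        finally:
            # Re-arm on every exit path, including errors, so polling
            # cannot silently stop.
            self.callback_pending = False
            self.arm_ticker(now_cs)

    def status(self):
        """Model of the debug command: is anything scheduled right now?"""
        return {"ticker_pending": self.ticker_due is not None,
                "callback_pending": self.callback_pending}
```

The invariant worth checking is that `status()` never shows both flags false while the server is meant to be running; if it does, some exit path skipped the re-arm, which is exactly what the debug command is meant to catch.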