Ticket #324 (Open) - Wed Oct 24 13:00:51 UTC 2012
Machine will crash when subjected to UDP packet flood
Reported by: Jeffrey Lee (213)
Severity: Major
Part: RISC OS: General
Release:
Milestone:
Status: Open
Details by Jeffrey Lee (213):
While trying to track down a potentially related networking bug, I’ve discovered that spamming a RISC OS machine with UDP packets will often cause it to repeatedly crash with data aborts. I’ve seen both an Iyonix and a BB (or Pi? I can’t remember which) crash due to this, so it’s likely to be a general network stack issue rather than a bug in a specific driver or the USB stack.
One odd fact is that if the machine receiving the packets is expecting to receive them (e.g. there’s a program running which will gobble up the packets as they’re received by a socket) then the crash doesn’t seem to occur (or at least it didn’t crash with my test program, which runs single-tasking and sits in a tight loop waiting for data). So it might be some buffer overflow edge case which is the cause of the problem.
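For reference, a minimal sketch of that kind of single-tasking receiver, written against the plain BSD socket API; the port number and buffer size are arbitrary illustrative choices rather than those of the original test program:
<pre>
/* Minimal single-tasking UDP receiver: bind a socket and sit in a tight
   loop draining whatever arrives. Port 12345 is an arbitrary example. */
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>

int main(void)
{
    char buf[2048];
    struct sockaddr_in addr;
    int s = socket(AF_INET, SOCK_DGRAM, 0);
    if (s < 0) { perror("socket"); return 1; }

    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_ANY);
    addr.sin_port = htons(12345);
    if (bind(s, (struct sockaddr *)&addr, sizeof(addr)) < 0) {
        perror("bind");
        return 1;
    }

    for (;;) {
        /* Read and discard; the point is simply to keep the socket drained */
        (void) recvfrom(s, buf, sizeof(buf), 0, NULL, NULL);
    }
}
</pre>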
I’m not sure offhand what data/packet rate is required to trigger the crash; I’ll try and find the time to do some further testing over the next few days.
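If it helps with reproducing this, a rough flood sender along the following lines can be run from a non-RISC OS machine; the target port and payload size are arbitrary illustrative values, and adding a small delay inside the loop is a crude way of varying the packet rate:
<pre>
/* Rough UDP flood sender: blasts fixed-size datagrams at the target as
   fast as sendto() will go. The port (12345) and 1024-byte payload are
   illustrative values only. */
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>

int main(int argc, char **argv)
{
    char payload[1024];
    struct sockaddr_in dst;
    int s;

    if (argc < 2) {
        fprintf(stderr, "usage: %s <target-ip>\n", argv[0]);
        return 1;
    }

    s = socket(AF_INET, SOCK_DGRAM, 0);
    if (s < 0) { perror("socket"); return 1; }

    memset(payload, 0xAA, sizeof(payload));
    memset(&dst, 0, sizeof(dst));
    dst.sin_family = AF_INET;
    dst.sin_port = htons(12345);
    dst.sin_addr.s_addr = inet_addr(argv[1]);

    for (;;) {
        /* No pacing: insert a usleep() here to reduce the packet rate */
        if (sendto(s, payload, sizeof(payload), 0,
                   (struct sockaddr *)&dst, sizeof(dst)) < 0)
            perror("sendto");
    }
}
</pre>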
Changelog:
Modified by Sprow (202) Sun, February 03 2013 - 10:10:11 GMT
Possibly a variant on this: while investigating the change in EtherH-4_59 I happened to spot that RISC OS stiffs on receipt of a UDP ping > MTU.
Using:
- Internet 5.56
- SharedCLibrary 5.73

with
<pre>ping -s 1600 my_windows_xp_machine</pre>
it’s all fine (i.e. sending to something other than RISC OS), but with
<pre>ping -s 1600 another_risc_pc</pre>
one of two failure modes occurred:
- In the desktop the mouse would keep moving, but F12 did nothing, suggesting interrupts were enabled but execution was stuck somewhere in SVC mode?
- Outside the desktop, sometimes Ctrl-Break would kill the ShellCLI and get you back to a prompt. Typing *EBINFO (or the equivalent for whichever driver you have) showed that packets were still being received but 100% of them were being discarded because there were no filters set up, implying that the Internet module had quit (and hence deregistered).
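If this needs reproducing without ping, a plain UDP datagram larger than the Ethernet MTU should exercise the same fragmentation/reassembly path (untested here). A rough sketch, where the 1600-byte payload mirrors the -s 1600 above and the target port is an arbitrary choice:
<pre>
/* Sends one 1600-byte UDP datagram; on a standard 1500-byte MTU Ethernet
   link the sending stack fragments it. Port 12345 is arbitrary. */
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>

int main(int argc, char **argv)
{
    char payload[1600];
    struct sockaddr_in dst;
    int s;

    if (argc < 2) {
        fprintf(stderr, "usage: %s <target-ip>\n", argv[0]);
        return 1;
    }

    s = socket(AF_INET, SOCK_DGRAM, 0);
    if (s < 0) { perror("socket"); return 1; }

    memset(payload, 0x55, sizeof(payload));
    memset(&dst, 0, sizeof(dst));
    dst.sin_family = AF_INET;
    dst.sin_port = htons(12345);
    dst.sin_addr.s_addr = inet_addr(argv[1]);

    if (sendto(s, payload, sizeof(payload), 0,
               (struct sockaddr *)&dst, sizeof(dst)) < 0)
        perror("sendto");
    return 0;
}
</pre>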
Modified by Timothy Baldwin (184) Fri, September 13 2013 - 10:44:25 GMT
I have observed a similar problem. On a Linux x86-64 computer I repeatedly run Nmap 6.00:
<pre>nmap -sU -p1-65,70-32769,32772-49151,49153-49170,49172-65534 riscpc</pre>
The RiscPC 600, with a StrongARM processor and an I-Cubed EtherLan 600A, running RISC OS 5.20 softloaded over RISC OS 3.7, crashes within a few minutes with data aborts at either FC2BA57C or FC2C31DC.
The port numbers were chosen to avoid open ports, but even with Access and AUN disabled and no third-party software running (just the ROM image, disc image and NIC drivers) it still crashes.
I cannot reproduce this in RPCEmu 0.8.10.
Modified by Jeffrey Lee (213) Sun, February 11 2018 - 22:07:37 GMT
My original notes say that spamming an Iyonix or a Pi with unwanted packets will cause a crash, but now that I’ve finally returned to this I can’t seem to provoke a crash in either case. Wah!
Communication between EtherUSB / SMSC95xx backends (Pi & BB-xM) seems fine, with no crashes, unusual behaviour, or obvious data corruption when the link is being flooded with UDP packets.
Communication with an Iyonix is a bit different. Transmitting data from the Iyonix is fine, but spamming it with packets results in the system slowing to a crawl (stuck in SVC/IRQ?), whether something is receiving the packets or not. Only about 7% of the packets appear to be received, with a high “mbuf allocation failed” count shown by *EKInfo. So it’s close to crashing, but not quite there. If I lower the transmit rate then things behave more sensibly. Hopefully it won’t be too hard to spot what the problem is.
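For anyone unfamiliar with that symptom, the pattern behind an “mbuf allocation failed” counter is roughly the one sketched below. This is a purely hypothetical illustration, not the Internet module’s or any driver’s actual code, and every name in it is a stand-in:
<pre>
/* Hypothetical illustration of the receive-side pattern: each incoming
   frame needs a buffer from a finite pool; when the pool is exhausted the
   frame is dropped and a failure counter is bumped. Not real RISC OS code. */
#include <stddef.h>
#include <string.h>

#define POOL_SIZE 32                  /* deliberately small, finite pool */

struct pkt_buf { char data[2048]; size_t len; int in_use; };

static struct pkt_buf pool[POOL_SIZE];
static unsigned long rx_delivered;
static unsigned long rx_alloc_failed; /* analogue of the count *EKInfo shows */

static struct pkt_buf *buf_alloc(void)
{
    int i;
    for (i = 0; i < POOL_SIZE; i++)
        if (!pool[i].in_use) { pool[i].in_use = 1; return &pool[i]; }
    return NULL;                      /* pool exhausted */
}

void driver_rx(const char *frame, size_t len)
{
    struct pkt_buf *b = buf_alloc();
    if (b == NULL) {
        /* Under a sustained flood this branch dominates: frames are simply
           dropped, which would explain only a fraction being received */
        rx_alloc_failed++;
        return;
    }
    if (len > sizeof(b->data))
        len = sizeof(b->data);
    memcpy(b->data, frame, len);
    b->len = len;
    rx_delivered++;
    /* ...hand the buffer to the protocol layer; released after delivery... */
    b->in_use = 0;
}
</pre>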
I’ve still got the Wandboard and IGEPv5 to try; maybe I’ll be able to get some interesting behaviour out of those as well. I could try a RiscPC too, but that would be purely academic since we don’t have any 32-bit compatible / open source unipod network drivers yet.
Modified by Jeffrey Lee (213) Tue, February 13 2018 - 22:09:29 GMT
Some more observations:
My testing on the 11th was with a Pi 3, but my original testing would have been on a Pi 1. Re-testing on a Pi 1 now, I get similar problems to those seen with the Iyonix, which suggests that a lot of the problems are related to CPU speed rather than to individual network drivers.
Retesting on a BB-xM, but with the clock speed reduced to 300MHz, shows that when subjected to a packet flood around 80% of the CPU time is spent dealing with the incoming packets. But the system does remain responsive, unlike with the Pi 1 or Iyonix. At that clock speed there’s also significant packet loss (about 50%) – not as bad as the Pi 1 or Iyonix, but still indicative of a problem.
Next stop: running profilers to try and discover where all the time is going.