Ticket #47 (Fixed) Tue Aug 15 17:38:03 UTC 2006
Collaboa FCGI can hang
Reported by: | Andrew | Severity: | Blocker |
Part: | Web site: Collaboa (bugs and subversion browsing) | Release: | 2nd public site release |
Milestone: | 2nd public site release completed | Status: | Fixed |
Details by Andrew:
There is some specific operation, either by client request or as a background operation, that seems to be able to cause Collaboa to hang. Only Collaboa has been seen to do this so far. The Ruby process ends up taking 100% of CPU time until the FCGI process is forced to exit by Lighttpd, leaving that particular section of the site reporting a 500 error.
The cause is unknown, but the frequency is quite high – if the server is not restarted then within a couple of days the Collaboa installation has usually hung. The irony of the site’s most severe bug to date being in the bug tracking system itself has not been lost upon me…
Changelog:
Modified by Andrew Hodgkinson (6) Mon, October 23 2006 - 16:47:42 GMT
- Milestone changed from Prototype site ready for release to 2nd public site release completed
- Release changed from Prototype site to 2nd public site release
I have been unable to track this down. Provided periodic checks are made and the server is restarted should there be a problem, this isn’t enough to hold up the first release any more.
I’ve deferred this to the 2nd site release.
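A periodic check of the kind mentioned above could be sketched roughly as follows. This is only an illustration: the ticket does not say how the checks were actually made, so the probe URL, the timeout and the `restart` callback are all assumptions. The sketch treats either a missing response or any 5xx status (such as the 500 error described in this ticket) as a sign that the backend has hung.

```python
import urllib.error
import urllib.request


def probe_status(url, timeout=10.0):
    """Return the HTTP status code for url, or None if no response arrived."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        # The server answered, but with an error status such as 500.
        return err.code
    except (urllib.error.URLError, OSError):
        # Connection refused, DNS failure or timeout.
        return None


def needs_restart(status):
    """Treat a missing response or any 5xx status as a hung backend."""
    return status is None or status >= 500


def check_once(url, restart, probe=probe_status):
    """Probe url once and call restart() if the backend looks hung.

    restart is a hypothetical callback; what it does (for example,
    restarting the web server) is left to the caller.
    Returns True if a restart was triggered, False otherwise.
    """
    if needs_restart(probe(url)):
        restart()
        return True
    return False
```

Run from a scheduler at a fixed interval, `check_once` would restart the service only when the probe fails, so a healthy site is left alone.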
Modified by Andrew Hodgkinson (6) Sat, December 16 2006 - 19:47:26 GMT
Now that the site is live and under greater load, the fault is manifesting itself again – not too surprising. Logs indicate a series of internal failures to find files, probably due to spiders crawling Collaboa. The reduction in frequency of the failure on the prototype site coincided with the introduction of a robots.txt file requesting that spiders did not crawl that site. The activity logs on the live site, which presently has no such robot instructions, would seem to confirm that there is a particular sequence of actions that leads to the failure.
Eventually the Rails logs show a couple of unexpected template errors followed by silence, as the web server reaches the point where it refuses to run the application at all. At this point, the web server log shows a huge number of entries along the lines of:
2006-12-16 19:32:05: (mod_fastcgi.c.2599) fcgi-server re-enabled: 0 /home/rool/live/rails/fcgi_sockets/tracker_fcgi
2006-12-16 19:32:08: (mod_fastcgi.c.2821) backend is overloaded, we disable it for a 2 seconds and send the request to another backend instead: reconnects: 0 load: 131
Modified by Andrew Hodgkinson (6) Sun, March 25 2007 - 17:41:34 GMT
- Status changed from Open to Fixed
Closed; robots exclusion has prevented the fault from repeating. The fault still exists but is clearly enough of an edge case that it is not worth investigating further. Collaboa is due for an upgrade to 0.6 anyway, which may fix this or indeed introduce new problems; I’ll deal with those as they manifest themselves.
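To illustrate how the robots exclusion keeps well-behaved spiders away from the tracker, here is a small sketch using Python's standard `urllib.robotparser`. The rules shown are an assumption (a blanket exclusion for every user agent); the ticket does not record the actual contents of the site's robots.txt, and the URL is a placeholder.

```python
from urllib.robotparser import RobotFileParser

# Assumed rules: ask every crawler to stay away from the whole site.
ROBOTS_TXT = """\
User-agent: *
Disallow: /
"""


def crawl_allowed(robots_txt, user_agent, url):
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)
```

With these rules, `crawl_allowed(ROBOTS_TXT, "ExampleBot", "http://example.invalid/tracker/")` is false for any agent name. Note that robots.txt is only a request: it stops compliant crawlers from triggering the fault but does nothing about the underlying hang.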