Wiki offline
Andrew Hodgkinson (6) 465 posts |
I’ve been forced to take the Wiki offline. The PostgreSQL database supporting it has become corrupted at the filesystem level. The cause is unknown. Our hosting service is looking into possible issues with the new server we’re using. I’m attempting to recover data, but recent changes may be lost. I do not know when the Wiki will become available again. Apologies for the inconvenience. |
Andrew Hodgkinson (6) 465 posts |
OK, hopefully everything is running properly now and data has been either restored or reconstructed where necessary. We still don’t know what prompted the problem – the hardware had been running without a problem for quite a while before the ROOL account moved to it, for example, though it’s always possible that a new fault has developed. We’ll be keeping an eye on it anyway. If you added pages to the Wiki or edited any in the last couple of days, please make sure they’re in the page list and have your changes. If you spot any problems, please e-mail us. Thanks. |
Jan Rinze (235) 368 posts |
Yet another case for having good backups… |
Andrew Hodgkinson (6) 465 posts |
Backing up every single change to the database is impractical; instead, backups are taken daily. With the precise point of filesystem corruption unknown and a window of roughly two days, simply restoring from backup would have lost all Wiki changes made since the backup was taken. Instead, I elected to reconstruct the corrupted data by comparing the missing elements against their out-of-date equivalents in the backup, then producing an up-to-date replacement object. It was then further necessary to dump and re-import all databases in case the corruption had spread elsewhere (propagated indexing faults, for example, are a possibility). Creating new databases and importing the exported SQL data should clean up any such problems. This is the first fault of that nature seen since the site started in 2006, and there was nothing of use in any of the logs, so it’s the usual frustrating “once in a blue moon” bug which is hard to diagnose. We were actually lucky that the bit of the database it took out was quite easy to replace. In a more severe case, we might be forced to roll back some or all of the databases to an earlier backup, losing some or all changes to all applications since that point. |
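[Editor's note: the dump-and-reimport step described above can be sketched with the standard PostgreSQL client tools (`pg_dump`, `dropdb`, `createdb`, `psql`). The database names, backup path and `postgres` role below are hypothetical placeholders, not ROOL's actual setup; treat this as an illustrative outline only, and keep a copy of the dumps before dropping anything.]

```shell
#!/bin/sh
# Sketch of a dump-and-reimport cycle: export each database to plain SQL,
# then rebuild it from scratch so that any corrupted on-disk structures
# (damaged indexes, for example) are recreated rather than carried over.
set -eu

BACKUP_DIR=/var/backups/pg          # hypothetical location
STAMP=$(date +%Y%m%d)

for db in rool_wiki rool_forum; do  # hypothetical database names
    # 1. Export the live database as plain SQL.
    pg_dump -U postgres -f "$BACKUP_DIR/${db}-${STAMP}.sql" "$db"

    # 2. Drop and recreate the database, then re-import the SQL dump.
    #    All tables and indexes are rebuilt cleanly from the exported data.
    dropdb -U postgres "$db"
    createdb -U postgres -O postgres "$db"
    psql -U postgres -d "$db" -f "$BACKUP_DIR/${db}-${STAMP}.sql"
done
```

Rebuilding from a logical (SQL-level) dump, rather than copying database files, is what clears filesystem-level corruption: the new cluster files are written fresh from the exported rows.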
Andrew Hodgkinson (6) 465 posts |
It’s just kind of ironic that we move to a new server with a “this’ll help!” announcement, then promptly lose a bit of DB to a mysterious failure mode :-) Been one of those days really. We’re trying to organise new merchandise for Wakefield but the fire in the BT exchange this morning took out the payment processor for the company we were using, so when we came to pay, the process failed. We’re trying BACS now. The delivery window was already quite tight so fingers crossed that everything comes through in time. |
W P Blatchley (147) 247 posts |
My changes to *MemoryI and *MemoryA were fairly recent, and they’re still there. Hopefully you recovered everything. Sorry you’re having a bad day – it’ll all come good in the end! The Wiki in particular is becoming extremely useful. In my opinion, it’s set to become by far the most useful RISC OS reference we have. Keep up the good work! |