Backup, restore and verify
Dave Higton (1515) 3526 posts |
I’ve just been stress testing my DBack (backup) and DRest (restore) apps. As we’re aware, one key thing is to test whether the restored files are indeed the same as the originals. I backed up my Apps folder – 17542 files, which are substantially over 500MB when backed up, then restored the backup to another folder. The backup and restore each took roughly 9 minutes. I wrote a file compare app some time ago, which is extremely fast, but only compares one pair of files, and is entirely manually controlled. So I added remote control via Wimp messages. Over the last few days I’ve written a tree compare app, which reads the backup’s IndexLog file to get all the files in one tree (remember that DBack can optionally include or exclude a list of files, so any other list would have to take the same into consideration when created). It does a simple conversion to build the second path for each file, then commands FileComp to compare each pair of files. 17542 pairs of files compared in 28 minutes; 1 file different, which was a stats file for CPUClock, and was indeed different. Not bad. That’s a RasPi 3B+ with a USB-connected SSD. I wrote the tree compare app in two parts, the second of which is sort of a plugin (LIBRARY command, the BASIC file is specified in the !Run file) so that alternative file lists could be used (not specifically parsing DBack’s IndexLog file), and alternative algorithms to create the second filename could be used. |
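The tree compare described above (take a list of files, derive the second path for each, compare each pair byte for byte) can be sketched roughly as follows. This is a minimal Python illustration, not the BASIC/Wimp-message implementation used by TreeComp and FileComp. The function name `compare_trees` and its parameters are hypothetical, and the list of relative paths is assumed to have already been extracted from whatever index is available.

```python
import filecmp
from pathlib import Path


def compare_trees(relative_paths, original_root, restored_root):
    """Return the relative paths whose two copies are missing or differ.

    relative_paths is assumed to come from an existing file list (for
    example, one parsed from a backup index); it is not discovered here.
    """
    different = []
    for rel in relative_paths:
        first = Path(original_root) / rel
        # The "simple conversion": same relative path under the second root.
        second = Path(restored_root) / rel
        # shallow=False forces a comparison of file contents, not just stat data.
        if not (first.is_file() and second.is_file()
                and filecmp.cmp(first, second, shallow=False)):
            different.append(rel)
    return different
```

Driving the comparison from a supplied list rather than walking the directories means any include/exclude rules applied at backup time are honoured automatically.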
Chris Hall (132) 3554 posts |
“I wrote a file compare app some time ago” You could always try !Cat – it computes a CRC of the first 4 MB of each file and also compares file datestamps. You can do a ‘before’ and ‘after’ and it will show any differences. You just drag the directory you want to catalogue onto its icon bar icon. |
David J. Ruck (33) 1635 posts |
I’ve always used !DirSync as it checks what needs to be copied, copies, and checks it has copied. You can either do a quick check using the file metadata, or a full CRC of the contents. The full check did spot a couple of corruptions when copying from the Iyonix to an SMB fileserver, but I’ve never had any problems with the later machines. |
Colin Ferris (399) 1814 posts |
How do you check ‘Pictures’ – have them running in an album and see if you can spot defects? |
Stuart Swales (8827) 1357 posts |
@Dave: Could DBack produce an optional IndexCRC akin to IndexLog so that backups can be tested independently some years down the line? I used to do something like this at Acorn: CRCTree could be run as part of a backup, which produced a list of files, metadata and CRCs that were stored alongside the backed-up data. The CRCTree output could be used to verify the integrity of individual files during a restore. We had some flaky media which would verify immediately after writing but bit rot would set in shortly thereafter! |
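An index of this kind (a list of files with size and CRC, stored alongside the data and checked later without the originals) might look something like the sketch below. It is written in Python purely for illustration. `build_index`, `verify_index` and `crc_of_file` are hypothetical names, the index layout is an assumption and not the format of CRCTree or IndexLog, and `zlib.crc32` is a stand-in that is not claimed to produce the same values as SWI OS_CRC.

```python
import os
import zlib


def crc_of_file(path, chunk_size=65536):
    """CRC-32 of a file's contents, read in chunks to bound memory use."""
    crc = 0
    with open(path, "rb") as f:
        while chunk := f.read(chunk_size):
            crc = zlib.crc32(chunk, crc)
    return crc


def build_index(root):
    """Return sorted (relative_path, size, crc) tuples for every file under root."""
    entries = []
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            full = os.path.join(dirpath, name)
            rel = os.path.relpath(full, root)
            entries.append((rel, os.path.getsize(full), crc_of_file(full)))
    return sorted(entries)


def verify_index(root, entries):
    """Return the relative paths that are missing or no longer match the index."""
    bad = []
    for rel, size, crc in entries:
        full = os.path.join(root, rel)
        if (not os.path.isfile(full)
                or os.path.getsize(full) != size
                or crc_of_file(full) != crc):
            bad.append(rel)
    return bad
```

Because each file carries its own CRC, a single damaged file is reported on its own and the undamaged files can still be trusted.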
David J. Ruck (33) 1635 posts |
For JPEGs, the jpeginfo tool on Linux (there might be a port somewhere) can check they are valid. This is how I first noticed that pictures uploaded to the NAS by the Iyonix had been corrupted, and after that I switched !DirSync from fast metadata checks to slow content checks. Luckily all the corrupted (usually truncated) pictures were still on the Iyonix’s disk, so I didn’t lose anything. |
Dave Higton (1515) 3526 posts |
I think a point may have been missed. I’m stress testing a pair of backup and restore apps that I’ve written and released. My recent efforts are to test that the total process, from backup to restore, has resulted in a full set of restored files that are identical in every bit to the originals. Fortunately they are, whether the backup is plain or encrypted (Blowfish), in the tests I’ve done. FileComp is also already released, but the recent upgrade I’ve done makes it worth a new release – and TreeComp too, in case anyone else has the need to compare two trees for absolute equality. Yeah, I know, it’s not a common need… The idea of CRC calculation is an interesting one. The metadata are currently stored in the IndexLog file (in the next release, this will be split into Index and Log), so adding a CRC could be done optionally. I’d imagine that a proper CRC calculation might slow the process down; OTOH, a checksum (I don’t know if that’s what you meant, Stuart, but some people seem to say CRC when they mean checksum), although much quicker, doesn’t offer much protection. I’d have to have a look. And there would have to be another app to check the CRC from the data files against the CRC from the index, if you really do want to check the integrity of the backed up data without restoring them. So maybe not with the next release. |
Stuart Swales (8827) 1357 posts |
I do mean SWI OS_CRC (or even sha256sum these days!). If you care about the quality of your backups, a little more time producing them doesn’t matter. You needn’t do it for daily backups, but if you’re archiving, that’s a different matter. Something to ponder for the future, perhaps. Why not just CRC/sha256sum the entire archive? Well, it depends how easy it is to extract individual objects. Perhaps there’s just one byte of damage in a 100MB archive – should that stop me restoring all the files which are not damaged? Remember that when restoring, you may not have access to the originals to compare against! Also, these days, people will generally have enough storage to just restore the data and check CRC (if present) as a side-effect rather than needing a separate tool. [Edit: Added the SWI bit! Might save others some time.] |
Dave Higton (1515) 3526 posts |
Having thought about your ideas a bit more, it occurs to me to use the SHA1 algorithm, as it’s only an integrity check, and DBack/DRest already use SHA1 via a module. (The Blowfish encryption doesn’t use the key text directly, rather it uses the SHA1 digest of the key text.) Also a verify-only operation can of course be done by the DRest app itself, simply not creating the files. I’ll see what can be done! |
Dave Higton (1515) 3526 posts |
I’ve just discovered SWI OS_CRC, which I did not previously know existed. That can be chained over as many data chunks as I need, unlike SHA1, and can of course be called from BASIC. |
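Chaining a CRC over successive chunks means feeding the running CRC value back in as the starting value for the next chunk, so the result matches a single pass over the whole data. A small Python illustration follows. It uses `zlib.crc32` as a stand-in, which is an assumption for demonstration only and is not claimed to match the algorithm or calling convention of OS_CRC. `chained_crc32` is a hypothetical helper name.

```python
import zlib


def chained_crc32(chunks):
    """CRC-32 over a sequence of byte chunks, carrying the value between calls."""
    crc = 0
    for chunk in chunks:
        # The previous result is passed in as the starting value for this chunk.
        crc = zlib.crc32(chunk, crc)
    return crc


data = b"123456789"
# Splitting the data at any point gives the same result as one pass.
assert chained_crc32([data[:4], data[4:]]) == zlib.crc32(data)
```

This property is what lets a large file be processed through a fixed-size buffer rather than loaded whole.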
Dave Higton (1515) 3526 posts |
Like for like comparison. |
Simon Willcocks (1499) 513 posts |
Newer processors, ARMv8.1+, have CRC32 instructions, in Byte, Half-word, and Word flavours. |