Self-extracting archives
Pages: 1 2
Steve Revill (20) 1361 posts |
There are now self-extracting versions of the UnTarBZ2 tool and HardDisc4 images on the download pages. These simply have to be downloaded onto a RISC OS system, given the ‘Utility’ filetype (&FFC) and then run. They will decompress their contents into the same location as the archive. The important benefit of doing things this way is that these archives are stand-alone. You do not require any additional software to decompress them – unlike zip, tar and bz2 archives. There is also a self-extracting version of the tool which we used to create these archives. This is supplied by 7th Software and is free for anyone to download and use to create their own archives. ROOL will probably move over to using self-extracting archives for all RISC OS binary releases in the near future. Eventually, we hope to supplement our downloads pages with a proper software packaging system. |
James Lampard (51) 120 posts |
Er, why? Surely the universally supported PKZIP format is a far smarter choice than a file that you have to set to type &FFC and then run. Any of the standard archive types can be read on any platform; your oddball format would seemingly be limited to RISC OS. |
Theo Markettos (89) 919 posts |
I haven’t tried CreateSEC, but if it outputs in PKZIP format with a self-extracting header it should be OK. Infozip is capable of skipping any extraneous stuff and finding an embedded PKZIP. I don’t know if other tools like WinZip do, though. I do poke around RISC OS Zips on other platforms from time to time, so something that was only decompressible on RISC OS would be a pain. |
Jeffrey Lee (213) 6048 posts |
I’m in agreement with James and Theo here – if you’re going to replace everything with self-extracting archives then PKZIP would be a better internal format (even if it would most likely add a dependency on the Shared C Library being present). Apart from allowing people on other platforms to extract the archives, and providing a better compression ratio than Squash, it would also provide a handy solution to the problem of being unable to extract archives that are larger than RAM (since you should just be able to load them into any half-decent RISC OS zip program instead). Judging by Wikipedia, it looks like PKZIP’s “Shrink” format uses LZW - so in theory you could create a PKZIP-compatible self extractor that can still be decompressed using the Squash module (thus keeping the current system’s advantage of a small header size and no SCL dependency) |
Andrew Hodgkinson (6) 465 posts |
The reason for doing this is sanity. Recently, both Steve and I tried to set up RPCEmu from clean using public components only, to test the IOMD ROM. Before trying an unstable ROM, though, it made sense to try RISC OS 3.71 first. So I got hold of a 3.71 ROM, downloaded RPCEmu and:
So…
It’s not quite so bad just for RISC OS 3.71 because you could ignore UnTarBZ2, mostly; in fact you’d probably ignore the ROOL site altogether since you already have your own ROM and thus presumably your own RISC OS machine anyway. If you want RISC OS 5, though, you have this ridiculous bootstrapping process and all sorts of trouble with 26-bit and 32-bit variants, with much of the trouble boiling down to filetypes and HostFS.
Something like UnTarBZ2 which instead unzipped things has been discussed in another thread. The CLI zip tool would need to understand RISC OS filetype information. Having no dependency on an external CLib is critical because of the problems and incompatibilities that can arise due to a half-brought-up system, just when you don’t need that kind of pain. It might perhaps be possible to have some kind of Zip-compatible thing which self-extracts on RISC OS. If someone wants to write and test a tool which creates such things and can demonstrate them working on HostFS then great! We’ll use it. You’ll need to pay close attention to filetypes, particularly text and data, and ideally dealing with hard spaces when executing under <= RISC OS 3.71 if possible – our current code doesn’t manage that and HostFS barfs. Otherwise, this new tool certainly saves me huge amounts of pain trying to actually use components from the ROOL site, which is half the point of them being there. There is nothing to say we might not put Zip and self-extracting archives side by side so other operating systems could be used to examine contents in the rare case where this is necessary. Since the most likely use case for self-extracting components at the current time is via HostFS, having ”,ffc” on the end of filenames might be wise also. Harmless on ADFS-like filesystems, very handy on HostFS-like filing systems. Means the end user can just download and run. |
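The ",ffc" suffix convention mentioned above can be illustrated with a minimal Python sketch. The function names are hypothetical, and the sketch assumes the usual form of a comma followed by three hexadecimal digits; it is not taken from any ROOL or HostFS source.

```python
HEX_DIGITS = set("0123456789abcdefABCDEF")


def append_filetype(name, filetype):
    """Return name with a ',xxx' suffix for a 12-bit RISC OS filetype."""
    if not 0 <= filetype <= 0xFFF:
        raise ValueError("RISC OS filetypes are 12-bit values")
    return f"{name},{filetype:03x}"


def split_filetype(name):
    """Split 'name,xxx' into (name, filetype); filetype is None if absent."""
    base, sep, suffix = name.rpartition(",")
    if sep and len(suffix) == 3 and all(c in HEX_DIGITS for c in suffix):
        return base, int(suffix, 16)
    return name, None


if __name__ == "__main__":
    # &FFC is the 'Utility' filetype used for the self-extracting archives.
    print(append_filetype("UnTarBZ2", 0xFFC))
    print(split_filetype("UnTarBZ2,ffc"))
```

A download named this way carries its type with it on HostFS-like filing systems, which is the "just download and run" behaviour described above.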
Martin Bazley (331) 379 posts |
Ahem. Infozip? |
Andrew Hodgkinson (6) 465 posts |
The self-extractor requires a 32-bit C library which you can’t obtain without being able to decompress a Zip file :D |
Steve Revill (20) 1361 posts |
Given the mixed response, I have no problem with supplying the binary downloads (note: because they are BINARY downloads, they only work in RISC OS anyway!) in both zip and self-extracting formats. It’d just be one extra step in the process (which is already not fully automated and a PITA right now) for creating the binary downloads. |
Steve Revill (20) 1361 posts |
Oh, and the reason I didn’t use zip as a self extraction file format is because that would have been no fun – I wanted to do something fun for a change. So ner! :p |
James Woodcock (307) 32 posts |
I have written a very quick and dirty tool to extract these self extracting archives on a Linux box. The code can be improved greatly. File types are appended to file names in the usual way, so there is a small advantage to using this tool rather than unzip when extracting under Linux. I haven’t packaged it up yet, but code is at github: git://github.com/mjwoodcock/unsec.git |
Andrew Hodgkinson (6) 465 posts |
Oooh, git. Very modern. Nice tool, many thanks – we may not rely on the self-extractor for everything but it’ll certainly be used for a few things, and having a cross-platform tool to access the archives will be useful. You shouldn’t need to reverse engineer the file format, though (which is what you’ve done, if the Github readme is to be believed). It’s documented in the Help file. Steve – hope you don’t mind – I’ve extended the Wiki documentation for the software to include a large chunk copy & pasted from the software. https://www.riscosopen.org/wiki/documentation/pages/Software+information%3A+CreateSEC |
James Woodcock (307) 32 posts |
Oh, yes. Read the documentation. I must admit, I hadn’t thought of that. Thanks for the link. I’ll have a look sometime shortly. |
Andrew Hodgkinson (6) 465 posts |
I note reading it again that it gives details of the archive format, but does not go into details about squash format. That’ll be in the PRMs. The Squash API is described in volume 4 from page 103 and the file format in volume 4 from page 499. You can find these in PDF format here: http://foundation.riscos.com/Private/manuals/PRMs/ ...though given the URL I’m not sure they’re meant to be open for public access. Still, they are, so you may as well grab a copy. The PRM information, looking at it right now, is actually unusually poor. Even more annoyingly, we have not been able to secure rights to release the source code to this component. Looking at your code, most of the work you’ve done seems to have been on the actual decompression side, so in fact you may well have had no choice but to reverse engineer the lion’s share of it anyway. |
James Woodcock (307) 32 posts |
Thanks for the link. I did manage to find a copy when I was developing the tool. squash_compress gives the most relevant information: the algorithm is 12-bit LZW as used by Unix compress command. I had some LZW code from nspark (arcfs and spark dearchiver for various OSs) that was nearly there – I just had to cater for the unix compress header in the stream. |
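For readers unfamiliar with the Unix compress header referred to above, here is a minimal Python sketch of parsing it. It relies on the standard compress(1) layout: magic bytes 0x1F 0x9D, then a flags byte whose low five bits give the maximum code width and whose top bit marks "block mode" (code 256 acting as a clear code). Whether Squash output carries exactly this header is an assumption based on the remark above, and the function name is hypothetical.

```python
COMPRESS_MAGIC = b"\x1f\x9d"


def parse_compress_header(data):
    """Return (max_bits, block_mode) from a Unix compress stream header."""
    if len(data) < 3 or data[:2] != COMPRESS_MAGIC:
        raise ValueError("not a Unix compress stream")
    flags = data[2]
    max_bits = flags & 0x1F          # maximum LZW code width in bits
    block_mode = bool(flags & 0x80)  # True: code 256 clears the table
    return max_bits, block_mode


if __name__ == "__main__":
    # 0x8C = block mode flag plus a 12-bit maximum code width.
    print(parse_compress_header(b"\x1f\x9d\x8c"))
```

A decoder would skip these three bytes and then feed the remainder to the LZW routine with the reported code width.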
Steve Revill (20) 1361 posts |
I have tweaked the format of the binary so that you have an offset to the start of the compressed data structure near the start of the file. Thus, reading the word at offset 24 (bytes) gives you an offset (bytes) from the start of the file to the start of the compressed data structure (i.e. the “rsqs” word). Note: the word immediately preceding the “rsqs” ID word is the size of the structure (bytes), if you care – which you probably don’t because you’ve already loaded the file so know how big it is. Still, it’s a useful extra sanity check. I’ve also added a -n switch to CreateSEC so that you can build an archive using our format that doesn’t include the self-extraction code. Finally, I rebuilt the self-extracting code downloads that were on our site (and noticed that the HardDisc4 one was broken, oops!). |
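The lookup described above could be sketched in Python roughly as follows. This is an illustration only: it assumes words are 32-bit little-endian (as on ARM under RISC OS) and that the ID word appears in the file as the ASCII bytes "rsqs"; the function name is hypothetical and the authoritative layout is the one in the CreateSEC Help file.

```python
import struct

POINTER_OFFSET = 24   # byte offset of the word that points at the structure
ID_WORD = b"rsqs"     # assumed on-disc byte order of the ID word


def locate_compressed_data(data):
    """Return (offset_of_rsqs_word, structure_size) for an archive image."""
    if len(data) < POINTER_OFFSET + 4:
        raise ValueError("file too short to hold the offset word")
    (offset,) = struct.unpack_from("<I", data, POINTER_OFFSET)
    if offset < 4 or offset + 4 > len(data):
        raise ValueError("offset word points outside the file")
    if data[offset:offset + 4] != ID_WORD:
        raise ValueError("no 'rsqs' ID word at the stated offset")
    # The word just before the ID holds the structure size (sanity check).
    (size,) = struct.unpack_from("<I", data, offset - 4)
    return offset, size


if __name__ == "__main__":
    # Build a tiny synthetic image purely to exercise the function.
    image = bytearray(64)
    struct.pack_into("<I", image, POINTER_OFFSET, 40)
    struct.pack_into("<I", image, 36, 20)
    image[40:44] = ID_WORD
    print(locate_compressed_data(bytes(image)))
```

The size word is returned rather than enforced, since the post describes it only as an optional sanity check.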
James Woodcock (307) 32 posts |
That sounds useful – thanks a lot. I’ll update my code to deal with that at some stage soon. |
James Lampard (51) 120 posts |
Then why don’t you produce your own version, using your self extractor without the dependency? You could throw in any additional required modules. I’ve also seen the InfoZip back end binaries compiled with GCC.
I’ve written a program called LM98Util (available from http://www4.webng.com/resurgam/) which on RISC OS will strip these and set the filetypes. |
Steve Revill (20) 1361 posts |
Erm, because we don’t have to? |
Theo Markettos (89) 919 posts |
Don’t get me wrong, self-extracting archives are a good idea. I’ve suffered the RPCEmu !Boot shuffle enough times to be fed up with it. So your solution is welcomed from that perspective. My only worry was about suggesting that distribution would switch to SEAs. Given the aim for greater cross-compile support, which means things like manipulating archives on other platforms, I was concerned that this would be made more difficult. Filetypes are an annoyance but, given the right unzip tool with support for the ”-,” option (append ,xxx types) I find it easier to unpack on the host system (or NFS server) than in the emulator. So having both possibilities is good. (which reminds me, anyone know the status of -, going into mainline infozip?) |
W P Blatchley (147) 247 posts |
Steve, I think this is going to be a huge help for people setting up RPCEmu. Nice work. I did the ‘setup shuffle’ a few days back, and as Andrew describes above in great detail, it’s not a fun dance to do! Is there any mileage in the suggestions to switch to ZIP format? It seems like the compression algorithms used could be compatible (though I’m not sure if Squash’s particular brand of LZW is supported by PKZIP), and the following suggests that a RISC OS executable header could be appended without straying outside the ZIP spec.: http://en.wikipedia.org/wiki/ZIP_%28file_format%29#Combining_ZIP_with_other_file_formats Seems like, if ROOL intend to start distributing SEAs, that would allow you to just put one file for each component up on the website – which could be self-extracted on RISC OS, or just accessed as a regular ZIP archive on other OSes – possibly saving some hassle in the long run? |
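The idea of a ZIP archive with an executable header in front of it can be tried out with a minimal Python sketch. It uses only the standard zipfile module, which locates the central directory from the end of the file and so tolerates leading data. The stub bytes and function names here are placeholders, not a real RISC OS extractor.

```python
import io
import zipfile


def make_prefixed_zip(stub, members):
    """Build a ZIP in memory and return it with stub bytes placed in front."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        for name, payload in members.items():
            zf.writestr(name, payload)
    return stub + buf.getvalue()


def read_prefixed_zip(blob):
    """Read every member back, ignoring whatever precedes the ZIP data."""
    with zipfile.ZipFile(io.BytesIO(blob)) as zf:
        return {name: zf.read(name) for name in zf.namelist()}


if __name__ == "__main__":
    stub = b"\x00" * 128  # placeholder for a self-extraction header
    blob = make_prefixed_zip(stub, {"ReadMe,fff": b"hello"})
    print(read_prefixed_zip(blob))
```

Note that this sketch leaves the offsets stored inside the archive unadjusted, so stricter readers may still complain about the result.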
Jeffrey Lee (213) 6048 posts |
I’m looking at the code now, and it looks like there are some important differences between the two LZW formats – specifically, how they handle code 256. Unix ‘compress’ 2.0 and below treats it as a standard data token, while >2.0 treats it as a “clear code tree” command. PKZIP, on the other hand, expects any 256 code to either be followed by a 1 (for “increase code size”) or a 2 (for “partial clear code tree”). So unfortunately it doesn’t look like there’s any sensible way of getting files which can be decompressed by both Squash and PKZIP. Of course, I’ve only just discovered that to create a self-extracting zip archive all you need to do is prepend a RISC OS build of ‘unzipsfx’ to the zip file and then run ‘zip -A’ to correct the zip header. So, how about this for a compromise:
Unfortunately a quick check suggests that SparkFS handles self-extracting zip archives, but SparkPlug doesn’t – which could make life a bit annoying for people without SparkFS who want to browse the zipfiles as image filing systems. |
Peter Howkins (211) 236 posts |
Just a quick note to let you know that the self extracting archives have been compiled with strh (ARMv4) instructions in, that don’t work on ARM6/7/7500 and don’t work reliably on the SA (when used in a RPC). |
Jeffrey Lee (213) 6048 posts |
Are you sure? I can’t see any sign of STRH in the CreateSEC archive, nor in the BASIC program that generates the archive headers. The only place where I do see STRH is the occasional one inside the compressed data stream – which should obviously never get executed. |
Matthew Howkins (373) 3 posts |
I have tried the self-extracting archives in RPCEmu. They always fail when attempting the opcode 0xe18c10b3. This disassembles as an STRH instruction. I can’t find any references to this opcode in ‘CreateSEC.util’, so it is probably not at fault. However I can find one example in the IOMD ROM - is it possible this instruction is present in the ROM, and just happens to get called when running one of the self-extracting utilities? |
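The opcode quoted above can be checked against the ARM halfword-transfer encoding with a minimal Python sketch. It tests for bits 27-25 clear, bits 7-4 equal to 1011 and the load bit (bit 20) clear, and scans a little-endian binary for matching words. The function names are hypothetical, and the check ignores the condition field, so it is a rough filter rather than a disassembler.

```python
def is_halfword_store(word):
    """True if a 32-bit ARM instruction word looks like an STRH encoding."""
    halfword_form = (word & 0x0E0000F0) == 0x000000B0  # bits 27-25=000, 7-4=1011
    is_store = (word & (1 << 20)) == 0                 # L bit clear means store
    return halfword_form and is_store


def find_halfword_stores(code):
    """Return byte offsets of word-aligned STRH-like words in a binary."""
    hits = []
    for offset in range(0, len(code) - 3, 4):
        word = int.from_bytes(code[offset:offset + 4], "little")
        if is_halfword_store(word):
            hits.append(offset)
    return hits


if __name__ == "__main__":
    print(is_halfword_store(0xE18C10B3))  # the opcode reported by RPCEmu
    print(find_halfword_stores(bytes.fromhex("b3108ce10010a0e1")))
```

As noted earlier in the thread, a raw scan like this also matches words inside compressed data that never get executed, so hits need checking by hand.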
Jeffrey Lee (213) 6048 posts |
Yeah, it looks like it’s the squash module that’s at fault. Due to source licensing issues the version in CVS is just a binary blob, and it looks like the IOMD ROM builds are using the newer >=ARMv5 version of the module instead of the older <=ARMv5 version. |