Compressed ROM images
Jeffrey Lee (213) 6048 posts |
It’ll be a while before I look at this in earnest, but I was wondering if anyone had any ideas on how best to add support for this. The key requirements are:
Initially I was thinking that we should go for an approach that maximises the amount of compressed data. This would have involved splitting the HAL and OS into two separate parts – the pre-RISCOS_Start part and the post-RISCOS_Start part – with the decompression code inserted between the two. On most systems I’d expect the size of the uncompressed code to be under 8K, maybe even under 1K (not including the size of the decompression code). But since this would involve making significant changes to the existing HALs, I’m thinking that a simpler approach would be better in terms of saving time and avoiding breaking things. So instead I’m leaning towards the simplest approach of all – Sprow’s idea of just compressing the module chain. This has the following advantages:
The only downside is that about 270K of the image would remain uncompressed. But if we can get a 50% (or better) compression ratio then I suspect it will be a while before we start worrying about those last few hundred KB. |
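As a rough illustration of the module-chain approach described above, here is a hypothetical header and map for such an image in C; every name, field and size is illustrative, not something the build system actually defines.

#include <stdint.h>

/* Hypothetical descriptor for a ROM image whose module chain is
 * compressed. The HAL, kernel and decompression stub stay
 * uncompressed (the ~270K mentioned above) so they can run as-is;
 * only the module chain behind them is deflated. */
struct compressed_rom_header {
    uint32_t magic;             /* marks the image as compressed      */
    uint32_t compressed_size;   /* bytes of compressed module chain   */
    uint32_t uncompressed_size; /* size the chain inflates back to    */
    uint32_t workspace_size;    /* RAM the decompressor needs, if any */
};

/* Image map:
 *   +--------------------------+  start of ROM
 *   | HAL + kernel             |  uncompressed, runs in place
 *   +--------------------------+
 *   | decompression stub       |  uncompressed
 *   +--------------------------+
 *   | compressed_rom_header    |
 *   +--------------------------+
 *   | compressed module chain  |  inflated early in ROM init
 *   +--------------------------+
 */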
Sprow (202) 1155 posts |
I’ve never understood what that’s all about. Presumably if there are several subtle differences in the NVidia cards out there then the BIOS ROM must be different on each (so the PC pokes the subtle bits in some subtle manner). Therefore, just do a CRC16 of the BIOS ROM (which gets mapped in somewhere in PCI memory space) and use that to select from a table of corresponding poke differences in the NVidia module. Simples. |
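For what it’s worth, the CRC16 Sprow suggests is only a few lines of C. A minimal bitwise sketch follows; the polynomial and seed (CRC-16/CCITT) are arbitrary choices here, since any fixed CRC-16 variant would do for telling BIOS images apart, and the poke table is of course hypothetical.

#include <stddef.h>
#include <stdint.h>

/* Bitwise CRC-16 (CCITT polynomial 0x1021, seed 0xFFFF) over the
 * card's BIOS ROM as mapped into PCI memory space. */
static uint16_t crc16(const uint8_t *data, size_t len)
{
    uint16_t crc = 0xFFFF;
    while (len--) {
        crc ^= (uint16_t)(*data++) << 8;
        for (int bit = 0; bit < 8; bit++)
            crc = (crc & 0x8000) ? (uint16_t)((crc << 1) ^ 0x1021)
                                 : (uint16_t)(crc << 1);
    }
    return crc;
}

/* The NVidia module would then match the result against a table of
 * known BIOS CRCs and apply the corresponding set of pokes (table
 * contents entirely hypothetical). */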
Jess Hampshire (158) 865 posts |
85% easy vs. 99% difficult, I don’t think there’s any question as to the obvious route, especially when there’s not a major shortfall yet. Would the decompression system be available for the whole system after loading? |
Jeffrey Lee (213) 6048 posts |
Not for this initial version, but it’s a potential future enhancement. |
Matthew Phillips (473) 719 posts |
Do I take it you’re hoping the ROM decompression code could double-up as an image filing system for zip files? I’m afraid, if so, you have no inkling of the amount of additional work, over and above just implementing a decompression algorithm for a dedicated purpose, that even a read-only image filing system represents! There are already perfectly good tools for this, and I think David Pilling was offering a read-only SparkFS for the standard disc image anyway. If I’ve misunderstood what you had in mind, then please explain! |
Jess Hampshire (158) 865 posts |
Yes.
I was thinking in terms of the compression system being one that was already implemented, trimmed down for the purpose.
I was thinking read-only.
Were it available in the ROM it would have benefits, e.g. a (read-only) !Boot on filesystems that don’t support the correct names/attributes. |
Jeffrey Lee (213) 6048 posts |
There’s a copy of the zlib library (which uses the same ‘deflate’ algorithm as zip files) in CVS. The licence is permissive enough to allow us to include the code in the ROM image, so I’ll probably be using it as a basis for the compressed ROM system. A quick test with the zdeflate/zinflate command line tools shows that an Iyonix can compress a 4MB ROM image in 36 seconds, and decompress it in 3 seconds. The decompression time is OK, but the compression time is a bit slow. Hopefully there’ll be an easy way to boost the compression performance – it’s hard to be excited about working on a feature that will add an extra minute/half-minute to my ROM build times! Also I’m a bit disappointed that there isn’t built-in support for in-place decompression, but it’s not a deal breaker. |
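For reference, the one-shot calls involved here are part of zlib’s real utility API; the sketch below deflates an in-memory image with them. The buffer names are placeholders, and the lack of in-place decompression mentioned above is why inflating needs a second buffer.

#include <stdlib.h>
#include <zlib.h>

/* Deflate a loaded ROM image in one shot using zlib's utility API.
 * On success, *out/*out_size describe a malloc'd compressed copy. */
int compress_rom(const Bytef *rom, uLong rom_size,
                 Bytef **out, uLongf *out_size)
{
    uLongf bound = compressBound(rom_size);  /* worst-case output size */
    Bytef *buf = malloc(bound);
    if (buf == NULL)
        return Z_MEM_ERROR;

    /* Compression happens once, at build time, so favour ratio over
     * speed; lowering the level is the easy way to speed up builds. */
    int err = compress2(buf, &bound, rom, rom_size, Z_BEST_COMPRESSION);
    if (err != Z_OK) {
        free(buf);
        return err;
    }
    *out = buf;
    *out_size = bound;
    return Z_OK;
}

/* Decompression at boot is the mirror image:
 *   uncompress(dest, &dest_size, buf, compressed_size);
 * where dest must be a separate area at least rom_size bytes long. */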
Jeff Doggett (257) 234 posts |
As I understand it, only the Iyonix flash image needs to be compressed. But the flash loader needs an uncompressed image to patch the NVidia module. So it would appear that the compression would need to be done by the flasher utility rather than the ROM builder. |
Jeffrey Lee (213) 6048 posts |
Very true. And most of the time I’d be wanting uncompressed ROM images so I can easily look at the disassembly in StrongED. Rather than build the compression code into romlinker I’ll probably be writing a separate tool for compression/decompression. That way it can easily be left out of the main ROM build process and only used when necessary. I think I’ll also have a go at updating our zlib sources to the latest version. Although I don’t think there are any new features that we want/need, it would be good to make sure we have all the latest bugfixes. Plus there have been a couple of optimisations made which might help a bit. |
Andrew Rawnsley (492) 1443 posts |
If you’re updating zlib, would there be any mileage in creating a ROOL version of the zlib RISC OS module present in Adjust etc.? There’d certainly be more chance of coders using that feature if it were available in both OS branches. I must admit I’ve never bothered with it for this reason. |
Sprow (202) 1155 posts |
See post 2 in this thread. Patching the ROM on a per machine basis and fixing up the CRC/checksum entirely invalidates any point in having the CRC/checksums in the first place. Fixing the NVidia module to detect the cards properly is the right thing to do, independent of ROM compression. |
Martin Bazley (331) 379 posts |
Ooh, yes please! I’d definitely use one of those. |
Theo Markettos (89) 919 posts |
Stupid question time… what about the ‘squeeze’ algorithm (and the tool used by Norcroft C etc.)? ISTR that’s purpose-designed for ARM code, and so gets a better compression ratio. And since the kernel is mostly code… In fact, is it actually documented anywhere? Ah, the code is in:

* squeeze takes an ARM executable image file and compresses it, usually
* to about half the size, adding decompression code so that the image
* will automatically expand itself when it is run.
*
* For details of the compression scheme, see doc.squeeze. Briefly,
* the image is treated as a sequence of 32-bit words, and each word
* is encoded in one of four ways, specified by a 4-bit nibble:
*   zero -> nibble 0
*   the 7*256 most common word values are encoded with one byte extra
*   as an index into a table
*   the 7*256 most common upper-3-byte values are encoded with one byte
*   extra as index into another table, with the low byte separate
*   anything else is given in full as 4 bytes.
*
* The tables of common values are sorted into ascending order
* and encoded in a devious way. |
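For the curious, here is a hypothetical decoder for the scheme that comment describes. The comment gives neither the nibble-to-encoding assignments nor how the nibbles are packed, so the mapping below (1–7 for the word table, 8–14 for the upper-3-byte table, 15 for a literal) and the little-endian literal are guesses; doc.squeeze would have the real details.

#include <stdint.h>

/* Decode one 32-bit word of a squeeze-style image, given the next
 * 4-bit selector nibble and a byte stream (*in) to consume extra
 * bytes from. Nibble packing and the "devious" table encoding are
 * not shown, and the nibble assignments are guessed, per the
 * lead-in above. */
static uint32_t decode_word(unsigned nibble, const uint8_t **in,
                            const uint32_t word_tab[7 * 256],
                            const uint32_t hi3_tab[7 * 256])
{
    if (nibble == 0)
        return 0;                       /* zero word, no extra bytes  */

    if (nibble <= 7)                    /* common full word value:    */
        return word_tab[(nibble - 1) * 256 + *(*in)++];

    if (nibble <= 14) {                 /* common upper 3 bytes, with */
        uint32_t hi = hi3_tab[(nibble - 8) * 256 + *(*in)++];
        return hi | *(*in)++;           /* the low byte separate      */
    }

    /* nibble 15: word given in full as 4 bytes (assumed little-endian,
     * as for an ARM image) */
    uint32_t w = 0;
    for (int i = 0; i < 4; i++)
        w |= (uint32_t)(*(*in)++) << (8 * i);
    return w;
}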
Jeffrey Lee (213) 6048 posts |
It doesn’t look like it should be too hard to produce a clone of ROL’s zlib module. Most of the SWIs look like they map directly to zlib functions.
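As a flavour of how thin such a module could be, here is a sketch of a buffer-to-buffer decompression entry point built on zlib’s real streaming API; the actual SWI names and register conventions of ROL’s module aren’t shown or verified here.

#include <zlib.h>

/* The kind of wrapper a hypothetical "inflate this buffer" SWI would
 * reduce to: set up a z_stream over caller-supplied buffers and run
 * zlib's streaming inflate over them in one go. */
int module_inflate(unsigned char *in, unsigned int in_len,
                   unsigned char *out, unsigned int out_len)
{
    z_stream zs = {0};            /* zalloc/zfree/opaque = Z_NULL */
    int err = inflateInit(&zs);
    if (err != Z_OK)
        return err;

    zs.next_in = in;
    zs.avail_in = in_len;
    zs.next_out = out;
    zs.avail_out = out_len;

    err = inflate(&zs, Z_FINISH); /* whole buffer in a single call */
    inflateEnd(&zs);
    return (err == Z_STREAM_END) ? Z_OK : err;
}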
I’ve just given it a quick test, and it shrank the ROM image to just under 3MB. So while it’s not as good as zlib/deflate it’s at least on par with ordinary LZW algorithms like Squash. The tiny size of the decompression code, support for in-place decompression, high decompression speed (somewhere under a second), and nil memory requirements would make it a good fit for any slow or memory-constrained systems.
About 750k of a ROM image is the contents of ResourceFS. I would have thought that would impact the compression ratio, but after cutting it out of the image the compression ratio was about the same. |
Steve Revill (20) 1361 posts |
Any further thoughts on this? On a related note, I got the “Live RISC OS” ROM build working on a BeagleBoard – it boots entirely from a disc image in ResourceFS, with temp files held in RAMFS. This is great, but the ROM is 28MB in size and takes a bleedin’ long time to load! I can’t quite believe how long it takes (a couple of minutes) given it only takes 5 seconds to write the ROM onto the SD card on my laptop – how can reading a single 28MB image be so slow? Is it that the caches are off in the bootloader or something? Until we figure out this particular issue, I don’t think the “Live RISC OS” concept is a goer. |
Jeffrey Lee (213) 6048 posts |
No further thoughts at the moment. I’m just working on clearing a few other things off of my todo list so I can start work on this without worrying about neglecting my other duties.
I believe u-boot runs with the caches off, but I’d have to check the sources to be sure. It’s also possible that it isn’t driving the SD interface in the most efficient manner. |
Steve Revill (20) 1361 posts |
I suspect improving the u-boot performance would be a much more significant win than implementing compressed ROM images. Given that I know you can write the whole ROM in five seconds in (very) high-performance code, I’d’ve thought reading it, even in not-super-optimal conditions, should be doable in under ten seconds. Certainly, taking half of “ages” isn’t going to be a great deal better than taking “ages”. :) |
Sprow (202) 1155 posts |
I did have 2 random thoughts on the slow-booting live image. First, is it in any way related to the dawdle when you type *desktop and there’s a big page shuffle shimmy? Has the extra ROM and RAM disc made things even more complex? Also, I looked in ResourceFS and for every item registered it goes off and recaches and resorts all its nodes immediately. Could this be deferred? That way you can add loads of stuff to ResourceFS but the penalty of sorting thousands of nodes only happens once. Health warning: these could be red herrings. I hate fish. |
Jeffrey Lee (213) 6048 posts |
From my reading of the sources, it looks like it only rebuilds the quick index list the next time FindFileOrDirectory is called. I can see a few bits which could be improved (e.g. it could insert the new entries directly into the quick index instead of rebuilding the whole thing from scratch), but it’s probably better to do some profiling before we get distracted by too much red fish :-) Steve, are all your latest fixes in CVS? I might have some time over the weekend to have a quick look and see where all the time is being spent. (Although I can see it’s still dependent on the closed source LiveDisc component; I’m guessing you won’t have had the time to fix that yet) |
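The “insert directly into the quick index” idea above amounts to a binary-search insert into a sorted array. A generic sketch follows; the types and names are illustrative, not ResourceFS’s own, and the caller is assumed to have grown the array by one slot already.

#include <stddef.h>
#include <string.h>

/* Illustrative stand-in for a ResourceFS node. */
typedef struct node { const char *name; /* ... */ } node;

/* Insert one node into an already-sorted index of pointers instead
 * of rebuilding the whole index from scratch: O(log n) to find the
 * slot plus one memmove, rather than a full re-sort per file. */
static void index_insert(node **index, size_t *count, node *new_node)
{
    size_t lo = 0, hi = *count;
    while (lo < hi) {                  /* binary search for the slot */
        size_t mid = lo + (hi - lo) / 2;
        if (strcmp(index[mid]->name, new_node->name) < 0)
            lo = mid + 1;
        else
            hi = mid;
    }
    /* shift the tail up one entry and drop the new node in */
    memmove(&index[lo + 1], &index[lo], (*count - lo) * sizeof *index);
    index[lo] = new_node;
    (*count)++;
}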
Sprow (202) 1155 posts |
I’m thinking that AddApp might well do directory enumeration, and since there are a lot of things in the disc image this could be quite expensive (even if the action of adding one application is not). Profiling is definitely a better use of time than just randomly recoding! |
Steve Revill (20) 1361 posts |
The longest delay (which is at least a minute, if not more) is before anything is printed on the screen (i.e. while the ROM is being loaded into RAM). There’s a bit of a delay around the time it tries to mount the file system, but Ben had an idea about what that could be. The rest seems OK speed-wise. I think the OMAP3Live components’ source code is all in CVS, except LiveDisc, as you say. The LiveDisc component is pretty easy to re-create – it’s little more than a component with a copy of the Disc build in its Resources folder, which can then be copied into the correct location during the export resources phase. |
Theo Markettos (89) 919 posts |
One of the stated design philosophies of u-boot is to turn on caches ASAP, so I don’t think it’s that. Looking at the u-boot SDHCI source (which is a slimmed-down copy of the Linux code), there’s an #ifdef for CONFIG_MMC_SDMA. If it isn’t set, the SD driver just does PIO (this code can never do ADMA). The only board to set this flag is arch-pantheon; the rest (including arch-omap3/4/5) don’t, so use PIO. It also doesn’t do any of the higher-speed access modes for SDHCI (which are more flaky and require a lot more messing around getting the timing right). It might be worth an experiment recompiling u-boot with the SDMA flag turned on (to see what happens, assuming a means of unbricking your board), but lack of the high-speed modes is really the killer. The full Linux source does support them, and could potentially be dropped in, but the risk of flakiness is high. |
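For anyone tempted by that experiment, it would amount to something like the following in the board’s config header; the exact file name is a guess based on u-boot’s usual layout, and as Theo says, have an unbricking route ready first.

/* In e.g. include/configs/omap3_beagle.h (path is a guess -- check
 * the u-boot tree in use). Switches the SDHCI driver from PIO to
 * SDMA transfers. The higher-speed SDHCI modes are a separate
 * matter: they'd need code ported across from Linux. */
#define CONFIG_MMC_SDMA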
Jeffrey Lee (213) 6048 posts |
As long as the flakiness factor is only related to the controller hardware and not to the SD card being used, I guess it would be OK to enable SDMA and the high-speed code for the OMAP versions of U-Boot (or at least the BB/BB-xM version). But I’m not volunteering to try it just yet ;-) Back on topic, I did start work on the compressed ROM support over the weekend. Another day or two and I should have a simple algorithm (e.g. Squeeze) hooked up so I can test the code. Then I can work on something a bit trickier like zlib. |
Steve Revill (20) 1361 posts |
Cheers Theo, Jeffrey. I must admit we’re buried under other (related) ROOL work ATM which is why we haven’t looked harder at these performance issues. The compressed ROM support should make a significant difference. Once it’s there, I’ll time compressed vs uncompressed with our big fatty of a ROM to see how it performs. If anyone wants a copy of our LiveDisc component (it’s closed source so you’ll have to populate the disc image from the HardDisc4 distribution yourself) just let me know… |
Chris Hall (132) 3554 posts |
“If anyone wants a copy of our LiveDisc component (it’s closed source so you’ll have to populate the disc image from the HardDisc4 distribution yourself) just let me know…”
Yes please. |