Support for files of 2GB or larger in C applications
Ben Avison (25) 445 posts |
I’ve just added 64-bit file pointer support to the shared C library. This support follows the so-called LFS extensions which are now common in Unix-style operating systems, to the extent that they apply to a strictly conforming ISO C library such as ours. At present, since the 64-bit file pointer FileSwitch API hasn’t yet been defined, you still have a 4GB file limit. But because the C library’s file pointer manipulation API uses signed numbers, the old 32-bit calls imposed a 2GB limit, so moving to the 64-bit API effectively raises the limit even now.
The big advantage is that once a C application has been recompiled against this new C library API, no further changes will be required to make it work with the new FileSwitch API once that has been implemented – at least, provided the application uses stdio.h exclusively for file operations. So, if you are responsible for a C application, whether open or closed source, and you might reasonably want it to handle files larger than 2GB, you should seriously consider migrating to the new API. A summary of the 64-bit C library API and implementation details for RISC OS can be found on this wiki page. |
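For anyone planning the migration, a minimal sketch is below. It assumes the shared C library follows the usual LFS spellings (an off_t type plus fseeko and ftello); check the wiki page above for the exact RISC OS declarations and any feature macro needed to expose them.

    #include <stdio.h>

    /* Before: long caps the result at 2GB-1. */
    long old_file_size(FILE *f)
    {
        if (fseek(f, 0L, SEEK_END) != 0)
            return -1L;
        return ftell(f);
    }

    /* After: off_t is 64-bit, so the 2GB signed limit goes away
       (4GB for now, pending the new FileSwitch API). */
    off_t new_file_size(FILE *f)
    {
        if (fseeko(f, (off_t)0, SEEK_END) != 0)
            return (off_t)-1;
        return ftello(f);
    }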
Sprow (202) 1158 posts |
This is a significant step forward, removing the 2GB file limit for C programs. I was amused to see the LFS spec is dated 1996! A few comments and questions (added here to avoid proliferating threads)
|
Jess Hampshire (158) 865 posts |
Would it be possible to add support for virtual large files? Edit: (The aim being that programs could store large files on old filing systems, similar to the way they could store files with long names on early ADFS systems using !LongFiles.) Any file larger than 2GB would be stored as a folder containing parts (as has been used by some PC programs to overcome the 4GB limit on FAT). Any program not using the system (including the current Filer) would see a folder containing sub-2GB files; programs using it would just see one large file. |
Sprow (202) 1158 posts |
Please fix your C programs properly, per Ben’s notes. |
Jess Hampshire (158) 865 posts |
When it is defined, could it also be non-blocking? |
Ben Avison (25) 445 posts |
Yes, as and when !System gets rebuilt, the copies of the C library binaries will be updated. In particular, note that owners of the C Tools receive a broader licence to redistribute the !System that’s supplied alongside them than you would by obtaining them under the shared source licence (i.e. it can also be included with commercial products).
Those pages belong to Castle/Iyonix Ltd. We at ROOL don’t have control over them or (AFAIK) rights to edit and republish their contents. We could try to raise it with them, but from past experience, I wouldn’t hold your breath…
Historical, I suspect. AMBControl was introduced in RISC OS 3.7, coincidentally the same version that introduced StrongARM support. It’s possible that at some stage, builds of the C library targeted at earlier OSes were built using a copy of Hdr:RISCOS which had AMBKernel set false. That’s not true of any version of the header in CVS, so all builds of the C library have the StrongARM switch set true.
You wouldn’t do that at the level of the C library. If you did implement such a thing, it would be a transitional arrangement – much like !LongFiles was made obsolete by native support for long filenames in FileCore from RISC OS 4 onwards. Personally, I wouldn’t prioritise it, because (a) engineering time is too precious to spend on transitional schemes at the moment, and (b) I’m not keen on that sort of hidden complexity at the best of times. At least when !LongFiles screwed up, you just ended up with some files which needed renaming; if a virtual large file went wrong, you’d potentially be looking at gigabytes of corrupted data. I know which I’d prefer to encounter…
I could be facetious and say yes, if you’re volunteering to make all the changes yourself :) Seriously though, I see the introduction of non-blocking file operations as a completely separate task from 64-bitting the file pointers. The former involves issues of reentrancy and memory allocation (you can’t just stick things on the stack if you’re required to return to the caller for operations that haven’t yet completed – see the sketch below) and involves changes deep in the filing system stack, down to the block driver level. The latter is “just” a matter of making some data structures larger. There are whole parts of the FileSwitch API that don’t involve file pointers at all – such as the “read or write a block of data from the current position in the file” calls – just the sort of calls you’re actually most likely to want to block on! I also have my doubts about how much software would ever be adapted to use a non-blocking file access API – witness the fact that the ISO C library doesn’t make any such calls available. If, on the other hand, by non-blocking you’re including calls which yield the current task if the filing system is busy, then no changes to the API are needed. We’ve had that for decades – see PipeFS for the oldest example. |
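Purely as an illustration of the reentrancy point above (all names here are hypothetical, not a proposed API): a non-blocking call has to return to its caller before the transfer finishes, so the operation’s state can’t live in the caller’s stack frame – it has to be heap-allocated and survive until completion.

    #include <stdlib.h>

    /* Hypothetical non-blocking read: the state outlives the call. */
    typedef struct async_read {
        int    handle;                             /* file handle */
        void  *buffer;                             /* destination */
        int    bytes_left;                         /* outstanding bytes */
        void (*on_done)(struct async_read *, int); /* completion callback */
    } async_read;

    async_read *start_async_read(int handle, void *buffer, int length,
                                 void (*on_done)(struct async_read *, int))
    {
        async_read *op = malloc(sizeof *op);       /* heap, not stack */
        if (op == NULL)
            return NULL;
        op->handle = handle;
        op->buffer = buffer;
        op->bytes_left = length;
        op->on_done = on_done;
        /* ...queue op with the filing system / block driver here... */
        return op;   /* caller regains control before the read completes */
    }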
Sprow (202) 1158 posts |
Good stuff. I see this as of similar significance to the addition of the long long maths functions, so I wanted to be sure that “I can’t run it on my A7000” wasn’t going to be a barrier to wide adoption.
Fair point. I think it’s worth tying a note to a carrier pigeon’s leg to ask, though, as it’s hardly ‘new’ material as advertised any more – from Castle’s point of view, a potential commercial licensee is hardly going to be encouraged by reading so many abandoned websites/domains.
Ah OK. Well, as that’s the only thing in the source tree using that switch I think it’s time it got binned, along with several other bits of hdr:RISCOS that I just deleted from my local copy! |
W P Blatchley (147) 247 posts |
Some of the docs there are basically lifted straight out of the repository, aren’t they? For example, the Font Manager specs. In that case, presumably there’d be no issue about integrating them into the Wiki here?
Tidying up these legacy quirks in the repository would be an excellent step forward, since unless you’re very familiar with the history of the development of the OS, they can really throw you off track. |
Jess Hampshire (158) 865 posts |
The reason I asked was that it might be possible to get programs rewritten once, but I can’t see it happening twice. Would it work to define the new API as non-blocking, so programs are written not to expect blocking, but actually implement the non-blocking part at a later date?
So these would need new definitions too (which would initially just map to the old ones). It seems like the non-blocking issue is a bigger problem than not being able to use files bigger than 4GB, because the programs that would need to access big files are likely to run poorly due to the blocking issue anyway (e.g. video playback). |
Matthew Phillips (473) 721 posts |
The rewriting for using large files is pretty easy for applications written using stdio in C, and probably rather more complicated for anything written in assembly language or BASIC (or using SWIs from C) because of the need to handle larger file pointers yourself – see the sketch below. So if folk are prepared to update code for large files, they’ll be willing to update for a future non-blocking interface if it comes along. As a programmer, you probably wouldn’t want to update and test your code for both changes at once — it’s a good idea not to introduce more than one change at a time. |
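To see why SWI-level code has to change, here is a hedged sketch of reading the sequential file pointer with the current OS_Args interface: the pointer comes back in a single 32-bit register, so a 64-bit FileSwitch API will necessarily look different, and callers will have to be updated for it.

    #include "kernel.h"

    #define OS_Args 0x09

    /* Read the sequential file pointer of an open file (OS_Args 0).
       R2 is one 32-bit register, so this tops out at 4GB. */
    int read_file_pointer(int handle, unsigned int *ptr)
    {
        _kernel_swi_regs r;
        r.r[0] = 0;        /* reason 0: read sequential file pointer */
        r.r[1] = handle;
        if (_kernel_swi(OS_Args, &r, &r) != NULL)
            return -1;     /* SWI returned an error */
        *ptr = (unsigned int)r.r[2];
        return 0;
    }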
Sprow (202) 1158 posts |
That’s true, that could be wikified. The other content is worth having too, subject to approval – then the abandoned copy can be deleted.
I’m pretty brutal with my code strimmer. With CVS you can always get the crusty pastry back if you want. |
Ben Avison (25) 445 posts |
I don’t see that as a strong argument. A lot of programs only use file operations to load and save whole files to/from RAM, which means they’re fundamentally limited to files smaller than 2GB anyway. These would not need rewriting for 64-bit file pointers, nor would many of them particularly benefit from non-blocking IO, so they would probably happily continue to use the existing calls. That leaves programs that handle things like disc images, archives or video, and programs that handle arbitrary data like Filer_Action or diff tools. These are relatively few in number, which immediately reduces any rewriting burden. Things like video decoders are already at the mercy of rapidly changing fashions in codecs as well as rapidly obsoleted hardware acceleration support, so if you can’t get them adapted for enhancements to the filing system API, then you have bigger things to worry about! The same is true, perhaps to a lesser extent, for disc images and archive formats. I don’t believe you have yet declared what sort of non-blocking calls you had in mind. I can think of three main ways you might achieve it:
I imagine that you’re thinking of one of the latter two options if you believe an API change is needed. But why force authors to add extra complexity in how they make filing system calls at this point, just in order to give them access to 64-bit file pointers, on the off-chance that at some point in the future non-blocking file operations are implemented, and implemented using something other than the first API option above? And even if they were implemented, a common use case will always remain that an application can’t continue with its functioning until a file operation has completed, so for the sake of reducing code duplication you’d probably want the OS to provide a blocking version of the calls in addition anyway.
As it happens, I think I’d favour the first option in any case. Even if it weren’t simpler in terms of ease of implementation and reduced need for redesign and testing, the other two options wouldn’t give any benefit to applications written in standard C or BASIC, which has got to be a large proportion of RISC OS software. And since multi-core ARMs seem to be The Future, we ought to be thinking more in terms of running lots of threads of execution and allowing them to block from time to time, as this is a better design when there are multiple CPUs in a system.
I don’t think you should worry too much about the case of a media player, either. At the sort of bitrates that compressed media use, the time spent streaming the data from disc should take a relatively insignificant proportion of CPU time even with fully blocking IO. Even if it did, existing media players like Replay (or the RISC OS STB software stack) can and do use the CBAI or RTSupport modules to do processing when the CPU would otherwise be idle during disc operations, and ideally most of the processing is actually done on a separate DSP core in any case, so CPU usage is not the limiting factor.
Sorry, this has turned into a bit of a rant. But I really don’t think we should confuse the two issues. The work that needs doing on the filing systems is daunting enough as it is; we really need to divide it into manageable chunks, then focus on them and not get sidetracked. |
André Timmermans (100) 655 posts |
I can agree with you here, file IO on RISC OS is really slow. For example, when people complained on the forum about executable compression incompatibilities with RISC OS, I told them that compressing executables to save a few hundred KB of disc space is unnecessary these days and that they should just leave them uncompressed. I was told: you can’t imagine how many seconds of loading time we gain with large applications like Firefox on the Beagleboard. Also, in all the comparative file read/write tests I have seen, RISC OS was left far behind the other systems.
Another problem on RISC OS is my DiskSample module, which performs sound decoding and IO from a callback so that the sound is not interrupted by other tasks performing long operations. It comes with all sorts of tricks for network filing systems to try to ensure that I do not attempt to read data while the FS cannot do it (non-reentrant ShareFS, etc.). With some old versions of RISC OS I remember being unable to play files from data CDs; probably it itself relied on callbacks to complete the IO.
This all means that either we wait 50 years to get a multithreaded, timesliced multitasking RISC OS which yields the CPU to another task when the current one is blocked by IO, or we implement a socket-like file IO API, or you release and document those CBAI and RTSupport modules, which are totally unknown to me.
While we are speaking of file IO speed and API changes, a little extension to the OS_Args (OS_File?) SWI for modifying the extent of a file, so that you can tell it not to zero the extra bytes allocated to the file, would be welcome (the current call is sketched below). It would certainly allow for faster copies of files from Samba, LanManFS, etc. |
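For reference, this is roughly the existing call being asked about – a hedged sketch using the documented OS_Args reason code 3; the suggested “don’t zero” variant would presumably be a new reason code or flag, which is not defined anywhere yet.

    #include "kernel.h"

    #define OS_Args 0x09

    /* Preallocate space by setting a file's extent (OS_Args 3).
       Any newly allocated region is zero-filled by the filing system,
       which is the slow part for large files over a network share. */
    int set_file_extent(int handle, unsigned int extent)
    {
        _kernel_swi_regs r;
        r.r[0] = 3;        /* reason 3: write file extent */
        r.r[1] = handle;
        r.r[2] = (int)extent;
        return _kernel_swi(OS_Args, &r, &r) == NULL ? 0 : -1;
    }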
Ben Avison (25) 445 posts |
CBAI (Call Back After Interrupt) was written by WSS, but Replay’s ARPlayer used it, if it was installed, to reduce the latency experienced by the decoders. RTSupport is an implementation of the same concept, but done with the cooperation of the kernel, in order to avoid certain drawbacks of the CBAI module. This has been available on the ROOL website for a couple of years now, I think. Source is here and documentation is here.
By all means add a new reason code that does this, but I believe that zeroing the files was a deliberate security feature of the existing implementation. Imagine a network of machines, where a remote machine can create and extend a file in your publicly-writeable share. This would mean that the remote machine can effectively read back the contents of the free space on your hard disc, which may include deleted copies of files that you didn’t wish to share. |
Sprow (202) 1158 posts |
I think you mean copies to shares via LanManFS. LanManFS just does a CREATE (OSFile 7) then starts sending, but you have to wait for Windows to zero out the space. It’s not that RISC OS wants it filled with zeroes; that’s just what Windows does (for the security reason Ben mentions). |
André Timmermans (100) 655 posts |
I don’t know what the *CREATE command does, but using OS_Args 3 to set the extent of a file certainly does zero the data. The Samba server sets the extent of the file to ensure (I think) that there will be enough space on disc for the whole file. For large files this took so much time, due to the zeroing of the data, that the PC side thought the connection was lost, and I had to patch my copy of the RISC OS server to skip this call. |