File handles are rubbish
Pages: 1 2
nemo (145) 2571 posts |
I tried to open a zip file yesterday, but SparkFS said it was corrupt. I then did a search through the file system, and a few zips reported corruption. Very weird. Then I checked the dates – all changed the same day in April. Loaded one into Zap, and the end of the file is full of ASCII – debugging info from something.

So, four months ago I was running something that was sending debugging info to a file. Somehow, its file handle got closed. Thereafter, whenever SparkFS opened a zip, the program (whatever it was) spaffed debugging info onto and into various zips. Thanks, mystery program. Fortunately I use VRPC on top of Google Drive, and Drive does automatic versioning, so I was able to revert the ten zips that got killed (or I’d still be Quite Cross Indeed).

File handles should not be reused. I’m thinking of writing something that will add a cycle number to FileCore’s apparently 8-bit file handles to stop this kind of stupidity happening again. Some things treat file handles as bytes… in which case I can allow them through (optionally), fire warnings, whatever. But all those programs that correctly treat them as words would then be protected from overwriting the wrong file.

(Just for the record, file handles are rubbish on all systems, and used to cause me lots of trouble inside printers too) |
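The cycle-number idea can be sketched roughly like this – a minimal C illustration, not nemo's actual design (all names here are invented): the 8-bit handle becomes the low byte of a 32-bit word, and a per-slot reuse counter fills the upper bits, so a stale wide handle no longer validates once the slot has been reallocated. How many reuses it takes before a false match becomes possible depends on how many bits are spent on the cycle.

```c
#include <stdint.h>

#define SLOT_BITS 8u
#define SLOT_MASK 0xFFu

static uint32_t slot_cycle[256];   /* per-slot reuse counter (hypothetical) */

/* Called when the underlying 8-bit handle is (re)allocated:
 * advance this slot's generation and return the widened handle. */
uint32_t widen_handle(uint8_t slot)
{
    slot_cycle[slot]++;
    return (slot_cycle[slot] << SLOT_BITS) | slot;
}

/* Returns the 8-bit handle if still current, or -1 if the slot
 * has been reused since this wide handle was issued. */
int narrow_handle(uint32_t wide)
{
    uint8_t slot = (uint8_t)(wide & SLOT_MASK);
    if ((wide >> SLOT_BITS) != slot_cycle[slot])
        return -1;
    return slot;
}
```

A program holding a stale wide handle then gets an error instead of silently writing into somebody else's newly opened file.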
Matthew Phillips (473) 721 posts |
That sounds a very worthwhile enhancement. |
Rick Murray (539) 13862 posts |
Why so complicated? The stupidity here is mostly in the logic where the “let’s find a free file handle” search starts at 255 and counts downwards. It’s a lame-ass saving in that if the counter reaches zero then we’ve run out of handles, instead of bothering to actually count them.

All it needs to fix a lot of these issues is to simply patch FileSwitch (not FileCore!) to decrement the stream count for each request, wrapping when it reaches zero. This will mean separately keeping track of how many handles were allocated as well as the number to start from the next time around. That way, much much less possibility of one program trampling on the handles of another.

The code is in …Sources.FileSys.FileSwitch.s.StreamBits in the function AllocateStream:

        MOV   r1, #MaxHandle                  ; Allocate handles from MaxHandle..1
        ADR   r2, UnallocatedStream
        ADR   r4, streamtable
05      LDR   r14, [r4, r1, LSL #2]           ; Empty slot found?
        TEQ   r14, r2
        BEQ   %FT20
        SUBS  r1, r1, #1
        BNE   %BT05                           ; If reached handle 0, naff up
10      addr  r0, ErrorBlock_TooManyOpenFiles
        BL    CopyError
        EXIT |
Had that back in the Archimedes days. Exact same problem, exact same reason. That’s why I prefer a non-file-based debugging method. It also means the system isn’t trying to write stuff to a file if it crashes. DADebug is my friend. There’s also a tweaked version that periodically dumps the info to the debug serial port, where it can be looked at on another machine if necessary. |
nemo (145) 2571 posts |
Your suggestion, if I understand correctly, simply cycles through 8-bit file handles. So corruption can still happen after only 240-ish OS_Finds (I think you’d be surprised how often file handles get allocated – try a grep). My suggestion means a single file handle would have to be reused 8 million times before corruption could occur. I prefer mine. |
David Feugey (2125) 2709 posts |
Hum, it’s probably SparkFS itself. It corrupts most of my zip files when I try to modify them. All works OK… but after a reboot, it’s not the same. |
Steffen Huber (91) 1958 posts |
You seem to have a deeply-embedded-somewhere filing system problem on your RISC OS systems. SparkFS never ever corrupted an archive for me, not a Sparkive, not a ZIP, not a TAR. And I have used it very very extensively. I remember you also had problems with Sunfish. I never had a data corruption problem there either, and I’m shifting multi GiB of data around. |
Rick Murray (539) 13862 posts |
It’s probably something messing with the file handles. Do you have anything that may accidentally close all of the handles?
True, and a valid point. However, we are looking at the problem of: file is closed unknown to an application, different file is opened and allocated the same handle, first application continues to access the file unaware that it is something else. There are two factors at play – the first is that an application really shouldn’t keep handles open across a poll and expect it to be the same thing on return. That’s not to say your eight million is a bad idea, it isn’t. I just think my blu-tack fix is a much quicker way to reduce (not eliminate) the problem.
I wonder how many things would stop working correctly due to the assumption of 255→1? Certainly every “show open files” program I have seen (including the two I wrote myself) work on the basis of starting the enumeration at 255, because that’s how it is defined and there is no API to say otherwise.
That implies something else is corrupting the files. It seems okay because SparkFS’ cached copies of the data are intact. After a reboot, the file is damaged and SparkFS has nothing cached.

When I said above my files were corrupted, it was often archives and they contained bits of ARCbbs log files. But, importantly, this corruption also nuked some other files – there was a database, I forget which. What they all had in common (after some digging) was that they kept files open over Wimp polls. It was “fixed” by trapping CLOSE#0. I don’t know what was doing that – I suspect that something might have closed a handle, set its variable to zero, then closed the handle again (without checking whether it was already zero).

There is no way for an application to tell that all of the file handles have been closed. Ideally the Wimp should broadcast a notification message (because hooking into events and such is not something sensibly done from application code). But the obvious response is that ideally apps shouldn’t keep files open across times when they aren’t in control of the machine. And thus we end up at the current situation where a bunch of dumb things equals a problem. |
Rick Murray (539) 13862 posts |
Just out of interest, David, have you tried DiscKnight? Is your logical structure intact? If your maps are awry, all sorts of chaos will ensue, progressively getting worse. Trust me, DiscKnight is a tenner well spent – your disc structure has to be good (there’s no maybe, either it is good or it is broken and will corrupt data) and RISC OS is too clueless to be able to fix itself. Hell, I don’t think *CheckMap has worked for anything other than a simple yes/no since the days when we all had floppy discs. https://armclub.org.uk/products/discknight/ |
Frank de Bruijn (160) 228 posts |
Not my experience. https://www.freelists.org/post/davidpilling/SparkFS-and-application-directories-in-tars |
nemo (145) 2571 posts |
David wondered
No. The unwanted content was a load of hex octets that looked like exactly the sort of thing I might have been spraying into a debugging file. The interesting thing to ponder is whether the guilty code was using XOS_BPut (or equivalent), or whether it just happened to not output anything while the handle was closed. Rick declared
The only restriction about keeping files open during Wimp_Poll relates to Wimp_OpenTemplates. Quite apart from the novel restriction on Wimp programs, how’s “Rick’s Rule” going to work with command line utilities in TaskWindows? There is an argument for protecting a file handle opened in Task A from being touched by Task B, but that’s a more complicated thing to enforce than the cycle number wheeze.
…Apart from OS_FSControl,58 you mean.
Yes, one would have hoped that OS_FSControl,59 would have obliged, but it isn’t even filing system specific. And yes, I have written a program that had to write two files to a directory in order to detect that it would instead have to delete one and create a continuation directory instead.
I’m sure a |
Rick Murray (539) 13862 posts |
Haven’t TaskWindows always been something of a special case?
Because the Wimp, itself, only uses one file handle…and doesn’t give a handle to the caller (left over Arthur legacy or something?).
I find it useful to make no assumptions about what happened in the duration that the app yielded to Wimp Poll – to the point where something that might have checked/used the time should read it again. While it is “unlikely”, there is technically nothing at all that states that the poll didn’t return from the command line three days later with the alphabet set to Eastern European, the keyboard set to Dvorak, and an entirely different SD card mounted.
Brilliant stuff! Thanks, I missed that one, was still using OS_Args… |
David Feugey (2125) 2709 posts |
With SparkFS, the corruption happens mostly when I reboot the OS or when I leave the SparkFS filesystem. Note: it happens with big files. By big file, I mean 300 MB or more. I move/copy/modify about 60 GB of data every month on my main setup. And several thousand small files every day.
When you reboot? True: RISC OS closes the handles :) It’s as if SparkFS didn’t (have the time to) apply the changes to the file. Or forgets to do it. Note: I use a big SparkFS cache: 32 MB. And if you want to laugh, add a file to a zip stored inside a zip, then leave the SparkFS filesystem, and check your zip.
Yes: no problem. The problem occurs with my very big zip files (made initially with 7Zip on a PC). For Sunfish it was on a clean install of RISC OS 5.22 on a Pandaboard. One of the problems was when I edited network files with StrongED: sometimes, additional data was added at the end of the file. I had the same problem with RISC OS 5.20 to 5.26 on various configurations (Pi 1, 2, 3, clean installs or complex ones). I can create or read archives without problems, but it’s always tricky to modify something in an existing zip. Most of the time it’s just data added at the end of the zip (7Zip will complain about it when checking the file). And sometimes it’s just data corruption on the new files added to the archive. Sometimes, it’s the archive itself that is corrupted. |
nemo (145) 2571 posts |
Will you tell the file corruption or shall I? The point is that as far as the desktop is concerned, files shall be left open during polling. If anything it’s

Note that FontManager keeps its file handles open (lots), but silently retries if it finds they’ve closed. I haven’t investigated what happens if one of FontManager’s handles gets closed and then a FindFont gets the same handle again. Unpleasantness, I expect.
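The "silently retries" behaviour described for FontManager can be sketched as a check-then-reopen wrapper – a hypothetical C illustration, not FontManager's actual code; `open_fn` and `valid_fn` are invented stand-ins for whatever the real module does:

```c
typedef struct {
    int handle;                 /* 0 = not currently open */
} cached_file;

typedef int (*open_fn)(void);   /* returns a new handle */
typedef int (*valid_fn)(int);   /* nonzero if the handle is still ours */

/* Validate the cached handle before every use; if something closed it
 * behind our back, silently reopen instead of failing. */
int ensure_open(cached_file *f, open_fn reopen, valid_fn still_valid)
{
    if (f->handle == 0 || !still_valid(f->handle))
        f->handle = reopen();
    return f->handle;
}

/* --- tiny simulation, purely for illustration --- */
static int sim_opens;
static int sim_open(void)   { return ++sim_opens + 100; }
static int sim_valid(int h) { return h == 101; }  /* pretend only handle 101 survives */
```

Of course, as nemo points out, the validity check itself is the weak spot: if the old handle number has meanwhile been reissued to someone else, the check can pass while pointing at the wrong file.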
The (internal) template file handle could have been in per-task storage instead of global… but it’s difficult to argue a case for wanting to keep your Templates open. But as with most things in RO, it isn’t even policed properly – Wimp_Poll ought to return an error if the template file is still open. As is rather common, it works by coincidence.
My |
Martin Avison (27) 1498 posts |
There are probably other solutions around, but if anyone is interested I have a small module StopClose0, which is designed to stop all files and devices being closed at once by program or by accidental command, which can cause some programs problems as discussed above.

It will trap instances of the command *Close, and BASIC CLOSE#0 – indeed anything that results in a SWI OS_Find with r0 and r1=0, which indicates all files on the current filing system are to be closed. When OS_Find,0,0 is detected, a prompt will be issued to Allow or Fail: if the Desktop is running, this will be in an error box; if single-tasking it is a simple line prompt. If Allowed, all open files will be closed as ‘normal’.

On System CloseDown, the module is deactivated in case the closedown process does a Close All as a last resort, although I am not sure this is required. If/When there is agreement that failing these requests does not cause any problems, the prompts could be removed and they would always fail.

Note that StopClose0 does not trap Shut and Shutdown commands – although they also close all files, they are closed individually, not by OS_Find,0,0.

It can probably be improved … but only if I receive feedback. The module is available from my website |
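The core of the trap can be illustrated in a few lines of C – a sketch of the idea only, not StopClose0's real source: sit on the OS_Find call path and single out the form where both the reason code (r0) and the handle (r1) are zero, which means "close every open file". `prompt_fn` stands in for the Allow/Fail dialogue.

```c
typedef int (*prompt_fn)(void);  /* returns nonzero to allow */

/* Returns nonzero to pass the call through to the real OS_Find,
 * zero to fail the request and leave all files open. */
int filter_os_find(int reason, int handle, prompt_fn prompt_user)
{
    if (reason == 0 && handle == 0) {      /* the "close ALL files" form */
        if (!prompt_user())
            return 0;
    }
    return 1;                              /* anything else is untouched */
}

/* stand-in prompts, purely for illustration */
static int always_allow(void) { return 1; }
static int always_fail(void)  { return 0; }
```

Closing a single named handle (r0=0, r1=handle) sails through unmolested; only the wholesale CLOSE#0 gets questioned.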
Martin Avison (27) 1498 posts |
nemo wrote:
From my notes *Shut does not use OS_Find,0,0 but loops round closing individual file handles. But I agree there is a case for suppressing it! It seems a very dangerous command today. |
Dave Higton (1515) 3543 posts |
Sorry, Rick, that’s rubbish. I wrote a pair of backup and restore apps that concatenate lots of source files into a small number of large files. The result is very fast, and multi-tasks with silky smoothness. A backup takes a significant time – a couple of hours is an entirely reasonable time if there are thousands of source files totalling some gigabytes.

The only reasonable way is to keep a file handle open until its maximum size (according to the app’s rules) is reached. To close it, reopen it, and reposition to the full extent would be crazy – it would take many times as long, and would have a good go at wearing out any flash destination medium.

If some app closes one or more file handles that it isn’t entitled to, that’s clearly a bug and should be fixed. Programming apps defensively around such bugs would be ludicrous. I would support attempts to prevent *Shut or similar from working. |
Grahame Parish (436) 481 posts |
Is there a way to implement a scheme where each task has its own ‘local’ file handles that are not seen or shared outside of the task? So if a task does shut all open files it will only be for the handles it owns. |
nemo (145) 2571 posts |
Martin delivered
Thanks!
Yeah. Perhaps your module should trap that just like OS_Find,0? :-D
Well, in my case it’s usually “Something has just exploded leaving multiple files open…” I used to have a library obey file that closed all files, reinitialised all filing system modules in the correct order and basically rebooted the whole filing system, for when I’d broken something at a deep level. It was only one letter different from

Grahame asked
I’d touched upon that earlier. It’s difficult to police because it’s very difficult to clearly define ownership:

Task calls OS_Find – this is straightforward: the task owns the handle. I’m not aware of any protocols that require Task A to pass a file handle to Task B, but if there were one, a Filter could detect the explicit sharing of ownership.

Task calls a module which calls OS_Find (eg FontManager) – this is much more problematic: whether the handle belongs only to this one task is implementation defined. In the case of the FontManager it’s definitely shared. In the case of SysLog it isn’t.

So, as I said, as the problem in my case was caused by illiberal use of
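The two ownership cases above could be modelled with a per-handle owner table – a hypothetical C sketch of the policing problem, not a real RISC OS mechanism: a handle opened directly by a task records that task as owner, while a module-owned handle (the FontManager case) is marked shared and escapes the check entirely, which is exactly why the scheme is hard to make watertight.

```c
#define NO_OWNER 0   /* task id 0 = unowned/shared */

static int owner[256];   /* indexed by 8-bit file handle */

void note_open(int handle, int task) { owner[handle] = task; }
void note_shared(int handle)         { owner[handle] = NO_OWNER; }

/* Nonzero if this task may close this handle: either it owns it,
 * or nobody does (module/shared handles cannot be protected). */
int may_close(int handle, int task)
{
    return owner[handle] == NO_OWNER || owner[handle] == task;
}
```

A close-all from Task B would then skip every handle for which `may_close(h, B)` is false – but any handle it opened via a module remains fair game, which is the implementation-defined hole described above.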
This is likely to be via OS_Find,0 and may well be unintentional (trying to close a file that wasn’t actually opened). Martin’s module protects against that. |
Stuart Swales (1481) 351 posts |
*Phut ? ;-) |
nemo (145) 2571 posts |
Martin, I’ve just had a look at your module. Task names returned from TaskManager are only guaranteed to be ctrl-terminated, not null-terminated. TaskManager currently truncates task names to 32 bytes including terminator… but that’s an implementation detail, not a contract. I’d expect that to increase significantly once Unicode is supported properly (because worst-case that’s only a few characters!). Note that the German Word of the Year, 1999, is Rindfleischetikettierungsüberwachungsaufgabenübertragungsgesetz, which is 65 bytes long in UTF-8. And whereas the capital of England in English is six bytes long, Ulaanbaatar in Mongolian is 39 bytes: ᠤᠯᠠᠭᠠᠨᠪᠠᠭᠠᠲᠤᠷ.

Where was I? Oh yes, task names are restricted by the 400C2 TaskInitialise message that the Wimp broadcasts. So they must be less than 228 bytes long. Naturally there’s no checking. Of course there isn’t. This is what RO4 does if you try to use a 230 byte long task name, for example: So don’t do the equivalent of a strcpy unless your buffer is at least 228 bytes, or you’ll get similarly disappointing results.

You’re also using a lot of non-X SWIs in a vector handler, which is… brave, especially when using a 12 byte buffer to read the current filing system name – there’s no such restriction I know of (I just created an FS with a 20 chr name, for example). Also OS_Confirm would like to remind readers that reports of its death are exaggerated.

But bravo. This could easily handle FSControl,22 too. |
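The byte counts quoted above are easy to check directly (assuming UTF-8 source encoding): each u-umlaut is 2 bytes and each Mongolian letter is 3, which is why "characters" is the wrong unit for sizing a task-name buffer.

```c
#include <string.h>

/* 63 characters, two of them 2-byte umlauts -> 65 bytes */
static const char *german    = "Rindfleischetikettierungsüberwachungsaufgabenübertragungsgesetz";
/* 13 characters, each 3 bytes in UTF-8 -> 39 bytes */
static const char *mongolian = "ᠤᠯᠠᠭᠠᠨᠪᠠᠭᠠᠲᠤᠷ";

/* strlen counts bytes, not characters, which is exactly the point */
size_t utf8_bytes(const char *s) { return strlen(s); }
```

So a buffer sized in "characters" can be short by nearly a factor of three for scripts like Mongolian, and the 228-byte TaskInitialise limit leaves room for surprisingly few of them.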
David Feugey (2125) 2709 posts |
Note: I never had these problems under the 26-bit versions of RISC OS. On the other hand, I did not manipulate files of these sizes, and the uptime of my RISC OS 4 system was in hours and not days. |
nemo (145) 2571 posts |
Using VRPC, my uptime can be weeks at a time. I often forget it’s running. |
David Feugey (2125) 2709 posts |
I mean the uptime while using it with many different and sometimes buggy apps. RPCEmu is quite stable with RO4, if you don’t launch unknown apps. |
Martin Avison (27) 1498 posts |
I am happy for you! Do you ever hibernate your machine? re StopClose0 – Thanks to nemo for constructive feedback, which I will look into. |
nemo (145) 2571 posts |
No, it’s a desktop. But when I used to use a laptop I think it was fine… but don’t quote me.
<grits teeth> it is… fortunate… that I tend to use it full screen anyway. BUT, its windowed pointer handling is beyond irritating. BY FAR the worst sin is that if the VRPC window gets the focus (because some other Windows window has closed) the RO pointer starts moving (with the mouse, but separate from the Windows ptr) and then, if you click in some other Windows window it remembers that you pressed the mouse button even though you didn’t click in VRPC and it has lost the focus. THEN when (perhaps days later) you deliberately change back to VRPC it then processes the focus gain as if you’ve clicked where the RO pointer happens to be… closing a window, or clicking on “Yes, do delete the contents of my disc” or… you get the picture. THAT is shoddy. RPCEmu has far better mouse handling. |