Odd bug when renaming/removing files
Andrew Rawnsley (492) 1445 posts |
Hi folks, I have just spent about 2 hours tracking a very odd fault, and I just wanted to sanity check things. Basically, my code is designed to work on the <Choices$Write>.Boot.PreDesk.Configure.Monitor file. It starts by making a file called Monitor2 in the same folder, and reads from Monitor, and writes modified data to Monitor2. So far so good. After various checks and double checks, I can confirm Monitor2 is repeatably correct, both on VA and RISC OS 5.28/5.29. In the next step, I issue three consecutive wimp_starttask() commands (but could be os_cli() if you prefer), with no wimppolls between – just three C lines. The first *remove MonitorOld (remove deletes without erroring if no file) Hopefully this logic makes sense. This works absolutely fine on VRPC. The correct date ends up in a file called Monitor. On RISC OS 5.28/29 (Pi4, SDFS), however, things go very squiffy. The first line of new Monitor is correct, but the second line is from the original file. To confirm, Monitor and Monitor2 are both correct before the renames occur, but the resulting “Monitor” file contains one good line, and one faulty line. (Basically all this is to cope with situations where EDID0 is not created due to headless boot, causing PreDesk to crash out and error-lock the desktop – another bug!). We really need more sanity checks putting on files created by !Configure in PreDesk. After 2 hours, my solution is to Filer_run an Obey file instead which issues exactly the same commands in an Obey file (copy/pasted). This works correctly on VRPC and Pi4 5.28/29. Any thoughts? I’m very tired now! |
Stuart Painting (5389) 714 posts |
I had a quick look at Wimp_StartTask and it appears to operate in a fire-and-forget fashion. Is there a chance that all three Filer operations would happen concurrently instead of consecutively? |
Martin Avison (27) 1494 posts |
I also use the technique of creating a new file then renaming, as it is fail-safe. But I have never created 3 wimp tasks to do it – I cannot explain the file corruption, but I wonder if they execute ok and in the desired order. I use code like… as the program retains control and can check return codes/errors.
|
Steve Fryatt (216) 2105 posts |
I doubt concurrently (this is RISC OS, after all), but I wonder – without checking the documentation – if it is actually defined anywhere in what order the queued commands will be executed? If this is called from code and not an Obey file, Martin’s approach feels a lot safer. |
Jon Abbott (1421) 2651 posts |
An interesting and potentially worrying bug. If that C function uses Wimp_StartTask it should not run them concurrently, they should each run to completion in order. Can it be Repro’d in assembler using Wimp_StartTask? ie is it a C quirk, or potentially a FileCore issue? |
Colin (478) 2433 posts |
You would have to post your code so we can play with it. I solved the edid problem by changing configure.monitor to
Configure.EDID0 is just a copy of Resources.ScreenMode.Monitors.EDID0 that was there when the hdmi cable was connected. |
Rick Murray (539) 13850 posts |
The bug here is calling Wimp_StartTask to do something like this – why not XOS_CLI or the SWIs provided? |
Andrew Rawnsley (492) 1445 posts |
Just to be clear, wimp_starttask was one attempt to resolve the problem. I started with os_cli(). Perhaps system() would have been best, but I haven’t had much luck with that in the past. Yes, in hindsight, calling swis would be better. However, should there really be much/any difference in calling three *commands in successive lines of code? Certainly one wouldn’t expect file corruption as a result (ie. half of one file, half the other). I chose to use *remove because it doesn’t error if the file isn’t there. code was literally: os_cli(“remove [backup]”); Each file was two lines long, pretty much exactly like Colin’s example! Essentially the code takes an existing monitor config and re-writes it akin to Colin’s example. Ultimately – lesson learned, use SWIs not *commands. But worrying that I ended up with file corruption on SDFS 5.29, when VPRC (HostFS/4.39) worked exactly as I expected. Probably just luck I guess. |
Andrew Rawnsley (492) 1445 posts |
Just to be clear, wimp_starttask was one attempt to resolve the problem. I started with os_cli(). Perhaps system() would have been best, but I haven’t had much luck with that in the past. Yes, in hindsight, calling swis would be better. However, should there really be much/any difference in calling three *commands in successive lines of code? Certainly one wouldn’t expect file corruption as a result (ie. half of one file, half the other). I chose to use *remove because it doesn’t error if the file isn’t there. code was literally: os_cli(“remove [backup]”); Each file was two lines long, pretty much exactly like Colin’s example! Essentially the code takes an existing monitor config and re-writes it akin to Colin’s example. Ultimately – lesson learned, use SWIs not *commands. But worrying that I ended up with file corruption on SDFS 5.29, when VPRC (HostFS/4.39) worked exactly as I expected. Probably just luck I guess. os_cli() is part of os.h/os.c which is part RISC_OSLib IIRC. It was pointed out to me that the standard C library (stdio.h) includes remove() and rename() functions. Would these be better choices? |
Alan Adams (2486) 1149 posts |
I’m struggling to see a mechanism for the sort of corruption you are seeing. None of those commands should be touching the file contents, only the directory entries. The only way I can see for that issue to occur would be if both files were long enough to be held on disc in multiple fragments, and the directory manipulation failed to keep the fragments together. Now someone who knows the file system internals might be able to say whether even that mechanism is feasible. Thinks: you did say SDFS, so I assume this is on an SD card? Could the card’s re-mapping be involved here? |
Rick Murray (539) 13850 posts |
Can you post a snippet of code, with the actual filenames as given? I’ll see if it does the same thing here. And, as Alan says, messing with the names of things shouldn’t alter their contents. Are you sure that:
A brief look at the code suggests that StartTask ought to be deterministic in that you are paged out, and you will be paged back in when either the command exits (there will be no handle) or the command starts itself up as a task and polls (you’ll get the task’s handle). It’s still weird, mind you, when calling OS_CLI would do the job…
That’s not a surprise. system() is a legacy function (part of the ‘standard library’) that works by literally relocating your program up high in your memory slot, calling the command, then putting your application back…assuming it or its workspace wasn’t trashed by the program being called.
Uh… You do know what the ‘X’ in X-form SWIs means, right? If you’re calling it correctly, then it doesn’t matter if *Wipe errors, because the error will be passed to you, and you’re free to ignore it.
I’m not familiar with how os_cli() works, as I use DeskLib, however I’d be quite surprised if it was happy to allow errors to happen. I’d have expected it to return an error pointer or NULL if no error. Take a look at _kernel_oscli() in kernel.h.
It’s an EDID. They’re a couple of kilobytes tops. Plus, I’d have thought if rename was that bugged that it would have shown up long before now.
I concur. This doesn’t make sense. You’re changing the meta data, not the file data. If you can’t post enough code to reproduce, or don’t have time with the meeting coming up, could you mail me a copy of this program and a quick “do this then this” so I know what to do, and I’ll run it on my Pi2 (also SDFS) and report back what happens. |
Rick Murray (539) 13850 posts |
Remembered I have the DDE archive on my phone. The definition is: os_error *os_cli(char *cmd); Shouldn’t it suffice to call that, ignoring what it returns back, in the case of the file not existing? As I said, I use DeskLib, so that would be: if File_Exists(filename) File_Delete(filename); Hmm, there’s no File_Rename() or File_Copy(), maybe these ought to be added? ;-) |
Jon Abbott (1421) 2651 posts |
From my personal experience with ADFFS, I learned *commands should never be used inline in assembler as all manor of corruption can occur. Unrelated, but many, many years ago I raised the issue of there being no SWI equivalent for *DISMOUNT; my personal take is that OS *commands should only ever be fronts to underlying SWI to avoid inline CLI calls. If what you’ve found is consistently reproducible, there’s a potentially serious bug lurking somewhere. If you can strip the code down to just the relevent commands and it still occurs, please post a Repro. It would also be worth testing if it occurs on another machine, another SD and on SATA or USB to rule out SDFS. |
Andrew Rawnsley (492) 1445 posts |
For clarity, the use of star-commands is laziness on my part, really. I know them more-or-less off by heart, use them in obey files, and Remove is easier to remember than the various ~ flags for star-wipe. Conversely, SWI versions require me to look them up on rool docs, then set r0 onwards and call the swi – slightly more work. Lesson learned though. Ironically, most of the time I do use the swis, but figured there’d be no material difference here. BTW, I have a soft spot for Remove – I just like that it exists (handy in obey files) so tend to use it if it makes sense to do so. If a command is there and does precisely what you want, why not use it. But it is the rename pair that seem to cause the trouble. The filenames in question are <Choices$Write>.Boot.PreDesk.Configure.Monitor and <Choices$Write>.Boot.PreDesk.Configure.Monitor2 Rick – I’ll email you the code. Basically it works fine up til the star-commands are issued. #if FALSE the star-commands out, Monitor and Monitor2 files exist exactly as you’d expect. After the renames, the resulting Monitor file (previously Monitor2) has a materially different second line, notably the second line that was in Monitor. At this point it is academic, as time is so pressing before the show, but as Jon said, it felt like lurking horror. |
Rick Murray (539) 13850 posts |
I’ll look out for it and see what it does on my system.
If two rename operations causes file corruption, that would be an understatement.
If it is reproducible in my machine, I’ll try to narrow it down too. [note to self – do SD backup first!] |
Rick Murray (539) 13850 posts |
There are also lurking gotchas like the Quick option in *Copy, which is not something you want to call from a program… |