Iyonix - Disc Error 20
Jon Abbott (1421) 2651 posts |
They are? Personally I would lower the timeout to a sensible time and reset it after each sector; increasing the timeout for larger transfers makes no logical sense (to me at any rate!). When you boot a machine that’s configured for a HD, but it’s unplugged, the 30 second timeout seems a little excessive, as it takes several minutes for it to tell you there’s a HD fault. I did spot the 30sec/10sec discrepancy, but figured it’s probably expecting the timeout to be set by the user and that 10 seconds was a failsafe. Is the timeout that’s triggering Disc Error 20 going to be patched at some point, either increased or removed to resolve the Iyonix issue? |
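To illustrate the per-sector idea above, here is a minimal sketch in Python. It is not ADFS code: the function names and the simulated per-sector response times are assumptions made up for the example. It compares a short timeout that is reset after every sector with a single timeout covering the whole transfer.

```python
from typing import Optional, Sequence


def first_timeout_per_sector(sector_secs: Sequence[float],
                             timeout: float) -> Optional[int]:
    """Timeout is reset after each sector: only a single slow sector fails.

    sector_secs holds the (simulated) time each sector took to respond.
    Returns the index of the first sector that timed out, or None.
    """
    for index, secs in enumerate(sector_secs):
        if secs > timeout:
            return index
    return None


def first_timeout_whole_transfer(sector_secs: Sequence[float],
                                 timeout: float) -> Optional[int]:
    """One timeout covers the whole transfer: elapsed time accumulates.

    Returns the index of the sector during which the timeout expired,
    or None if the transfer finished in time.
    """
    elapsed = 0.0
    for index, secs in enumerate(sector_secs):
        elapsed += secs
        if elapsed > timeout:
            return index
    return None
```

With 100 healthy sectors at 0.5 seconds each, a 2 second per-sector timeout never fires, whereas a 30 second whole-transfer timeout expires partway through. That is why a whole-transfer timeout has to grow with transfer size while a per-sector one does not.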
Jeffrey Lee (213) 6048 posts |
Timeouts are always a bit tricky when things get broken down into sub-operations. I think it’s the discrepancy between the requirements of real-time systems (“I need this done by this time or else”) and detecting hardware/network faults (“hmm, this device hasn’t responded, maybe it’s broken?”). The driver should be the component that manages hardware fault timeouts (because – in theory – it should know the maximum response times for any operation), while the client should dictate any real-time timeouts. But the source/documentation doesn’t always make it clear what type of timeout is being used in any given situation.
That’s another problem – if the user/client can specify a timeout, how often is that used? How often is it sensible? A program that uses 1-second timeouts because it works on the author’s SSD isn’t going to work very well on a HDD that spins down to save power.
Yes, I’m planning on taking a look at it sometime in the next few weeks. |
Jon Abbott (1421) 2651 posts |
I was hoping to track down the Disc Error 23 issue that some devices have when reading by now, but haven’t managed to find a device that fails yet. When I get back from the London Show, I’ll try swapping the drives around in all my machines to see if I can provoke it. |
Rick Murray (539) 13840 posts |
(my emphasis) Many years ago I had an HCCS harddisc mini podule inside my A3000. It looked for all the world like the IDE interface was half of a 6522. ;-) |
Steve Pampling (1551) 8170 posts |
There’s a rather large number of systems that work on the basis of timeout = 0 being “never timeout” |
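A small sketch of that convention, in Python. The function names and the use of centiseconds are assumptions for illustration and are not taken from any particular API. The point is that zero has to be special-cased before a deadline is computed, otherwise "never" turns into "immediately".

```python
from typing import Optional


def deadline_from_timeout(timeout_cs: int, now: float) -> Optional[float]:
    """Convert a timeout in centiseconds into an absolute deadline in seconds.

    A timeout of 0 means "never time out" and is represented as None.
    """
    if timeout_cs < 0:
        raise ValueError("timeout must not be negative")
    if timeout_cs == 0:
        return None
    return now + timeout_cs / 100.0


def expired(deadline: Optional[float], now: float) -> bool:
    """A None deadline never expires."""
    return deadline is not None and now >= deadline
```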
Jon Abbott (1421) 2651 posts |
Despite testing 20+ devices today, I’ve been unable to reproduce Disc Error 23 to do any diagnosis. Ironically the Iyonix was randomly generating it at power on at the London Show, which is Sod’s law as it’s now a brick. The device works fine in a RiscPC so it’s definitely shouting “timing issue” to me. |
Jeffrey Lee (213) 6048 posts |
I can confirm that I can reproduce disc error 20 on my RiscPC when using an SD card. It doesn’t happen very often; I think the key is to perform lots of small FS operations (thus higher ops/sec) rather than focusing on large ops. But I do at least have a repro. I’ve had it once when copying a 600MB, ~18000 file folder (near the start of the op), and once when deleting the folder, after a successful copy! I’ve also had a closer look at the ADFS code and the ATA spec. As you’ve already stated, when the drive asserts DRQ for the first sector of a write op there’s no interrupt generated. The only interrupt that should happen during this time is if the drive fails the request with an error. So that means we don’t have to worry much about drive interrupts occurring while waiting for DRQ. Plus the structure of the ADFS code means it should be pretty straightforward to offload the DRQ wait + initial sector xfer to the TickerV routine. So in a day or two I should hopefully have that code ready for testing. |
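As a rough illustration of offloading the DRQ wait to a periodic tick, here is a simulation in Python. It is only a sketch of the general idea, not the ADFS implementation: the class, its callbacks and the tick budget are hypothetical. The BSY, DRQ and ERR masks are the standard ATA status register bits.

```python
from typing import Callable

# ATA status register bits
BSY = 0x80  # drive busy; other status bits are not valid while set
DRQ = 0x08  # drive is ready to transfer data
ERR = 0x01  # drive failed the command


class PendingWrite:
    """Polls for DRQ once per tick instead of busy-waiting in the foreground."""

    def __init__(self, read_status: Callable[[], int],
                 send_first_sector: Callable[[], None],
                 timeout_ticks: int) -> None:
        self.read_status = read_status              # hypothetical register read
        self.send_first_sector = send_first_sector  # hypothetical PIO transfer
        self.ticks_left = timeout_ticks
        self.state = "waiting"

    def on_tick(self) -> str:
        """Called once per timer tick; returns the current state."""
        if self.state != "waiting":
            return self.state
        status = self.read_status()
        busy = bool(status & BSY)
        if not busy and status & ERR:
            self.state = "error"
        elif not busy and status & DRQ:
            # No interrupt is raised for the first sector of a write,
            # so the tick handler starts the transfer itself.
            self.send_first_sector()
            self.state = "transferring"
        else:
            self.ticks_left -= 1
            if self.ticks_left <= 0:
                self.state = "timeout"
        return self.state
```

Once the state becomes "transferring", the remaining sectors would be driven by the drive’s interrupts as usual, so the tick handler has nothing more to do.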
Jon Abbott (1421) 2651 posts |
Good news. Is it possible to also build ADFS for older OS versions or produce a patcher easily? Just pondering the best way to retrofit the changes back to 3.x. I’ve not yet managed to reproduce Disc Error 23. From various chats this week, it sounds like possibly a specific SD converter chip (ST368 – don’t currently have one) and a specific SD on an A4000/A5000 might repro it. I took the A5000 apart yesterday and will start testing later in the week with what I have to hand. I’ve certainly had no luck producing it on a RiscPC running 3.70. |
Jeffrey Lee (213) 6048 posts |
I’ll definitely be looking into some kind of solution for older machines. I’ll have to look into the pros and cons of hot-patching vs. patching & reloading vs. loading the latest version of the module. I expect that ROM patching (i.e. hot-patching) will be possible on 3.5+. On MEMC this won’t be possible (since you can’t remap RAM to be on top of ROM), so it would have to be patch & reload. And in theory the current ADFS can be built for older machines, but I doubt it’s been tested at all (especially Arc-era features like ST506 support). There have also been various problems with ADFSBuffers, so care needs to be taken to not interfere with any patchers which are designed to deal with those. Ideally we’ll be able to produce one patcher (e.g. a newer ADFSUtils / ROMPatch) which can deal with everything. |
Martin Avison (27) 1494 posts |
Talking of ADFS buffers, is there any reason why the size is limited to 255 KB? With modern memory sizes, would there be any advantage in increasing it? The same question also applies to the ADFS Directory cache, and to other FS buffers. It seems the limits were set waaay back when 256KB was big! |
Chris Evans (457) 1614 posts |
Possibly related: Also some IDE CF/SD interface/network card combinations[1] stop networking from working!
[1] It may have been all combinations; I’m not certain now. |
Jon Abbott (1421) 2651 posts |
I suspect it’s the same issue as on the RiscPC. I’m going to be testing the Iyonix as soon as I get one of mine working, the flash programmer turned up today so fingers crossed, I can get them reflashed and booting. The HForm thread goes into more detail about the issue.
Never heard of that, can you detail Repro steps? Make/model of adapter and card? |
Colin Ferris (399) 1814 posts |
Just out of interest – do you have to remove the ROM – then reflash it – solder it back in – or use jumper leads across to the Iyonix motherboard? |
Jeffrey Lee (213) 6048 posts |
Today’s Iyonix + IOMD ROMs should contain my disc error 20 fix. Next thing on my todo list is an updated ROMpatch for 3.5-4.??, and then after that hopefully something for earlier machines. |
Jon Abbott (1421) 2651 posts |
I’ll grab them tomorrow evening and at least give the IOMD build a test, whilst I’m waiting for the Iyonix flash to arrive. |
Chris Evans (457) 1614 posts |
IIRC all (six? mixed makes/models) of Compact flash cards we had. Atomwide NICs may have worked to some extent but were very unreliable. Not a networking card clash but |
Jon Abbott (1421) 2651 posts |
Ahh…that might explain this issue then – I’ll pull the CF from the machine and see if the NIC starts working again. Would never have guessed that in a million years!
That sounds like Disc Error 20, so possibly -IOCS16 isn’t pulled low on the adapter. No SD adapters in the devices I’m testing, only CF and SATA. From discussions elsewhere, it does sound like SD is the primary source for Disc Error 23.
Yes, all tests I’ve done to date have been on a RiscPC and I’m about to start testing on an A5000. I’ll move onto the Iyonix when I get one of them working. |
Jeffrey Lee (213) 6048 posts |
Over here is an updated ROMPatch for RISC OS 3.50/3.60/3.70/3.71/4.02 which includes the disc error 20 fix. If suitably brave people can give it a test then I’ll get the changes checked into CVS. The usual installation place is in !Boot.Choices.Boot so that it will be loaded during PreDesk. It’s probably a good idea to back up your current ROMPatch before overwriting it, just in case this new one doesn’t work.
Note that it also includes CallASWI + CLib since the ROMPatch app was built as APCS-32. This fits with the current way ROMPatch is distributed (as part of the ROOL disc image, which will load CallASWI + 32bit CLib at startup). If people need to run older CLib versions then it should be possible to RMKill the softloaded CLib once the patches have been loaded.
3.70 I’ve been able to test on my RiscPC (with CF card); the other OS versions I’ve tested under RPCEmu (including triggering the main code paths of the patch), so I’m fairly sure that they won’t corrupt your hard disc, but I’m not making any promises!
Feedback from RISC OS 4.02 users would be appreciated, since I know there are a few different ROM versions floating around which may confuse ROMPatch. E.g. the “virtually free” version of 4.02 seems to have some patches already applied, which will cause ROMPatch to refuse to do anything. I’m not sure if the same is true with the “easy upgrade” (i.e. softload) version. So possibly this version of ROMPatch will only work on physical 4.02 ROM chips. If necessary I can make ROMPatch smarter so that it can apply each patch on a conditional basis.
You can easily check if ROMPatch loaded correctly by looking for a “ROM patches X.XX/Y” dynamic area (e.g. “ROM patches 4.02/6”). |
Chris Evans (457) 1614 posts |
A quick test:
OEM (silver, no label) 512MB CF card as slave to Fujitsu 1GB master: cannot see shares at all.
OEM (silver, no label) 512MB CF card as master: can initially see shares but cannot connect to them. Shares then slowly drop out.
The results are the same with or without Jeffrey’s ROM patch. |
Jon Abbott (1421) 2651 posts |
If the drive isn’t being accessed when the NIC is “failing”, it could be a hardware issue. Noise possibly on the power rail affecting the NIC? I can’t think of any way a specific IDE device could knock out the NIC as they’re unrelated in software terms. If IRQs were being lost perhaps, but if you don’t need to access the drive to kill the NIC, there won’t be any IRQs being generated. That said, CFs are really slow at responding, so I suppose there’s a possibility of affecting the network stack if IRQs are off at any stage in the ADFS read/write cycle. I’m not sure that’s the case though, as any long delays are handled via TickerV. I’ve not had a chance to pull the CFs in my machine to see if it solves the NIC issue; I’m planning on resuming testing tomorrow. |
Doug Webb (190) 1180 posts |
Also, what module versions are in the NIC/RiscPC disc image? It may be worth trying the latest ROOL NIC drivers from the download pages just to cover that one off as well. |
Jon Abbott (1421) 2651 posts |
If my NIC issue is this problem, then I’ve already tried the latest NIC drivers to no avail. |
Doug Webb (190) 1180 posts |
Jon, understood, and it does sound as if it is not going to help. But if Chris does it and still has issues then we have a second validation, and it then comes back to what else the adaptors/CFs are causing. |
Jon Abbott (1421) 2651 posts |
Jeffrey, this looks to be specific to pre-RISC OS 3.5 machines and is related to large transfers. I’ve reproduced the issue on an A5000 with a CF and if I cap the transfers to 1024 bytes, the error doesn’t occur. Do you want to pursue a patch including your Disc Error 20 fix? As there’s no source for ADFS 2.67 (I think that was the last ADFS build pre-RISC OS 3.5) it’s going to entail poking around in a disassembly and creating a compilable BASIC version of it, so we can recompile it if extensive changes are required. |
Jeffrey Lee (213) 6048 posts |
Yes, creating a patch for that would be nice (maybe an optional one though – I’d imagine that capping all reads to 2 sectors would slow things down a bit). I’m not sure yet how I want to tackle patching pre-3.5 machines. I did have a go at fetching and building the oldest ADFS sources from CVS, but that looked like it was going to be a fool’s errand (too much IOMD code has crept in, so even if the code was fixed to build with the latest build environment there’d still be a lot of work to go through and rip out IOMD bits and restore any lost Arc functionality). The disc error 20 ROMPatch is fairly simple (you just need to stare at the disassembly of the module for a few minutes to locate a few key patch points), so I was thinking of going down the route of creating a new ADFSUtils module which will:
I had a brief look at the copy of ADFSUtils that’s in the 3.11 ROM; it looks like it hangs off the floppy IRQ vector and peeks & pokes ADFS’s workspace to work around the bug(s). So that’s another bunch of offsets to keep track of. If disc error 23 is just a case of truncating read ops to a certain length then it should be relatively straightforward to slot that in as well, since ADFS already contains code to truncate to 256 sectors (since that’s the limit for the IDE op).
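The truncation idea can be sketched as follows, in Python. This is only an illustration of splitting one request into capped ops: the function name and the (start, count) tuple layout are assumptions, and nothing here is taken from the ADFS source.

```python
from typing import List, Tuple


def split_transfer(start_sector: int, sector_count: int,
                   max_sectors: int) -> List[Tuple[int, int]]:
    """Split one transfer into (start_sector, count) ops of at most max_sectors."""
    if max_sectors <= 0:
        raise ValueError("max_sectors must be positive")
    ops = []
    while sector_count > 0:
        count = min(sector_count, max_sectors)
        ops.append((start_sector, count))
        start_sector += count
        sector_count -= count
    return ops
```

With max_sectors set to 256 this mirrors the existing IDE op limit; with max_sectors set to 2 (1024 bytes, assuming 512-byte sectors) it mirrors the cap that avoided the error on the A5000, at the cost of issuing many more ops for the same amount of data.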