RISCOS' IRQ handler
Jon Abbott (1421) 2651 posts |
Where is the IRQ handler in the RISCOS source that hooks onto the IRQ hardware vector? I’ve searched, but just can’t seem to find it. |
Rick Murray (539) 13851 posts |
https://www.riscosopen.org/viewer/view/castle/RiscOS/Sources/Kernel/s/NewIRQs?rev=4.10.2.26 ? |
Jon Abbott (1421) 2651 posts |
Thanks Rick. Is it normal for R13_irq to be reset when the IRQ handler exits? This is the behaviour I’m seeing on the Pi when I shim the IRQ vector. Repro:
NOTE: This will stiff your machine, illustration purposes only! |
Jon Abbott (1421) 2651 posts |
The code example above works on the latest IOMD RO5.21 build, on the Pi however R13_irq is incorrect when the IRQ handler exits and instantly locks the machine. Does the Pi have a different IRQ handler to the one linked above? EDIT: It also works on the Iyonix, so it’s looking like the issue is platform specific. |
Jon Abbott (1421) 2651 posts |
I’ve tracked the problem down to the UnthreadV vector call. The RTSupport module hangs off UnthreadV and if you RMKill RTSupport, R13_irq isn’t corrupted and the above Repro code works as expected. Incidentally, when you RMKill RTSupport it throws an “SWI missing” error. |
Jeffrey Lee (213) 6048 posts |
RTSupport makes various assumptions about the layout of the IRQ stack (e.g. stack frame layout the kernel IRQ handler uses), and it will deliberately reset the stack pointer when it’s about to return to the foreground thread. I’m not really sure if it would be possible to change that behaviour – RTSupport’s thread scheduling code is pretty hairy. If what you’re interested in doing is getting some code to run after an IRQ (but without having to wait for a callback) then you’ll probably find it a lot easier to work with RTSupport rather than work against it. The entire point of the module is to allow code to run at a priority level that’s lower than interrupts but higher than any foreground tasks (user code, SWIs, callbacks, etc). See the docs for more info. Note that the module isn’t included on all systems (it relies on SYS mode, so will only work on ARMv4+), so you’ll still need a fallback path using OS_ClaimProcessorVector or similar.
Chances are that’s from something trying to use RTSupport – the Pi makes heavy use of it. In fact I’m quite surprised the system continued running well enough for you to be able to test your code! |
Jon Abbott (1421) 2651 posts |
I’ve been looking through the source code today and it certainly looks “hairy”, as you put it. Resetting R13_irq when it exits to the foreground is a real PITA, although at least I know it’s by design and can look for a workaround.
What I’m trying to do is analyse what address was interrupted by an IRQ, before it’s actioned, whilst the JIT is running. This works on all platforms except the Pi, although by the sound of it that may be because the other platforms don’t use RTSupport and the Pi does.
One option is to hang off IRQ1V and assume that R14 is at R13+8*4, the stack being IRQsema, R1, R2, R3, R11, R12, SPSR, R0, R14. Is IRQ1V still allowed, or has it been deprecated? I suppose I could also use my own local stack, which may be better for future compatibility.
One question: if RTSupport is making assumptions about the IRQ stack, isn’t it likely to trash the machine if an IRQ handler stacks registers, enters SVC and then re-enables interrupts? I’ll have to test this tomorrow by seeing if killing RTSupport fixes the IRQ-related hangs I’m seeing on the Pi.
What started this investigation was random hanging when enabling IRQs during a frame blit from DA2 to the GPU. Whichever way I go about it, the calling routine is always hanging off the IRQ vector, so IRQ re-entrancy becomes a problem. I’m assuming the IRQ vector is still re-entrant and that switching to SVC and enabling IRQs is the correct thing to do for IRQ handlers that take time. I can’t use a callback, as they’re not guaranteed to happen on time and the blit has to happen on VSync; otherwise the game could be part way through a frame update.
Now that I know about RTSupport, I could use it as a 50Hz ticker to trigger the VSync and blit. It sounds a lot more reliable than the CallBacks that I’m using currently. I just need to fix the machine locks when I re-enable IRQs, which, as you say, working with it instead of against it may resolve. I’ll code it up and give it a try.
Yes, just figured that myself when I found the documentation for it. |
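The IRQ1V frame assumption in the post above (R14 at R13+8*4) can be sanity-checked with a small Python model. This is only a sketch: the frame order is taken as listed in the post and is not verified against the kernel source, `frame_offset` and `read_stacked_r14` are hypothetical helper names, and memory is modelled as a plain dictionary of word addresses. Whether the stacked R14 has already been adjusted by the kernel is not established in the thread, so the helper returns the raw stored word.

```python
# Frame order as listed in the post, lowest address first (an assumption).
IRQ_FRAME = ["IRQsema", "R1", "R2", "R3", "R11", "R12", "SPSR", "R0", "R14"]
WORD_BYTES = 4  # ARM registers are stacked as 32-bit words


def frame_offset(name, frame=IRQ_FRAME):
    """Byte offset of a stacked register from R13_irq under the assumed layout."""
    return frame.index(name) * WORD_BYTES


def read_stacked_r14(memory, r13_irq):
    """Return the raw word stored in the R14 slot of the assumed frame."""
    return memory[r13_irq + frame_offset("R14")]
```

With this layout `frame_offset("R14")` evaluates to 32, which matches the 8*4 figure quoted in the post.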
Jeffrey Lee (213) 6048 posts |
Would you be able to do the analysis at the start of your IRQ handler, and then tail-call the OS’s handler? That way you won’t have to worry about the stack being clobbered.
The only way I can see of replacing it is to directly access a hardcoded location in kernel workspace. So although replacing it would work, I think it’s safe to say it’s not a very future-proof thing to do.
That situation should be fine. RTSupport uses ‘IRQsema’ to work out whether nested IRQs are occurring – the IRQsema value in kernel workspace and in the IRQ stack frames form a linked list, with the number of entries in the list being the same as the level of IRQ nesting.
Correct. |
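The IRQsema linked list described above can be modelled in a few lines of Python. This is a sketch under stated assumptions: memory is a dictionary of word addresses, the kernel workspace word holds either zero or a pointer to the newest IRQ stack frame, and the first word of each frame holds the previous IRQsema value. The function name and the addresses used are hypothetical.

```python
def irq_nesting_depth(memory, irqsema_addr):
    """Count entries in the IRQsema chain; 0 means no IRQ is in progress."""
    depth = 0
    ptr = memory[irqsema_addr]      # head of the list in kernel workspace
    while ptr != 0:
        depth += 1
        ptr = memory[ptr]           # each frame stores the previous IRQsema
    return depth


# Example: two nested IRQs (addresses are made up for illustration).
IRQSEMA = 0x100
memory = {IRQSEMA: 0x2000, 0x2000: 0x2100, 0x2100: 0}
depth = irq_nesting_depth(memory, IRQSEMA)
```

In this model a non-zero IRQsema word is what code in the foreground would see while an interrupted IRQ frame is still on the stack.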
Jon Abbott (1421) 2651 posts |
A few noob questions relating to RTSupport:
1. Is it available on RO5 IOMD? It makes use of System mode, so I guess not.
Can GraphicsV 1 be called in SYS mode? Is it re-entrant?
I’ve coded up a test that calls GraphicsV 50 times a second, but it just produces an “Abort on data transfer” on every call. What am I doing wrong?
|
Jeffrey Lee (213) 6048 posts |
At the moment it’s not included in the ROM, partly because we don’t need it yet and partly due to the SYS mode issue.
Internally the code makes use of special pollword pointers for some situations. But as far as the user is concerned the only important thing is whether their pollword is non-zero or not. AIUI a routine which is currently idle will only be considered for running if its pollword is non-zero (or its timeout time has passed). Once it’s started running you can safely clear the pollword, and the routine will continue to run until it explicitly yields (calls RT_Yield/RT_TimedYield or uses R14 to return back to the scheduler). If a routine yields but its pollword is still set then you can assume that the routine will soon start again (although another routine at the same or higher priority might run first, or in extreme situations RTSupport might let the foreground thread run for a while if the RTSupport threads have been using too much CPU time).
If a thread’s pollword is non-zero at the time the thread is created (and its priority is high enough) then RT_Register will start running the thread right away, so this has implications for code which needs to make a note of the thread handle that’s returned.
One important thing to bear in mind is that RTSupport has no concept of an idle thread. So if all threads (including the foreground) become blocked waiting for their pollword to become non-zero then the behaviour is a bit funny. I’m not sure exactly what the behaviour is, but I think it will end up returning to the foreground (which would have had to have explicitly yielded via RT_Yield).
This basically means that if you want to yield until a certain state is reached (e.g. you’re implementing a mutex and you want to yield until the mutex enters a claimable state), you need to call RT_Yield in a loop, with interrupts enabled. The loop is required to make sure you cope with any unexpected return from RT_Yield (e.g. if all other threads are blocked), and interrupts need to be enabled to make sure the entire system won’t grind to a halt (e.g. if a certain IRQ handler is responsible for setting the pollwords of some threads, or if a thread is waiting on a timeout). So for locking a mutex you’ll basically end up with something similar to this:
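A minimal Python model of that loop, not the original ARM listing: `lock_mutex`, the `mutex` dictionary and the `rt_yield` callable are hypothetical stand-ins, with `rt_yield` playing the role of a call to RT_Yield made with interrupts enabled. In real code the test-and-claim step would also have to be atomic with respect to other threads, which this model does not attempt to show.

```python
def lock_mutex(mutex, rt_yield, me):
    """Yield repeatedly until the mutex is free, then claim it.

    Returns the number of times the caller had to yield.
    """
    yields = 0
    # Re-test after every return from the yield: it may come back early,
    # for example when every other thread is blocked.
    while mutex["owner"] is not None:
        rt_yield()              # stand-in for RT_Yield, interrupts enabled
        yields += 1
    mutex["owner"] = me         # claim (must be atomic in real code)
    return yields
```

The important property is that the state is re-checked after every yield rather than assumed to be claimable once the yield returns.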
Yes.
Yes. RTSupport doesn’t touch your SYS mode stack, so you can use r13_sys for anything you want.
There’s no such thing as CPSR_svc! The SPSR won’t be preserved, if that’s what you meant.
Yes.
Timeouts are processed from a routine running off of TickerV. So if you specify a time which is in the past (as long as it’s less than 2^31 centiseconds ago) then the timeout will only be considered as expired once the next timer interrupt occurs.
Yes to both (although I haven’t actually tried calling it from an RTSupport routine before, so it’s possible there’s a bug lurking there somewhere)
I haven’t tried running the code yet, but two problems I can spot are that R0 is wrong when returning from the routine (should be %100 to specify a timeout), and you never clear the pollword value to zero (not that that should cause it to crash). |
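The "less than 2^31 centiseconds ago" condition in the timeout answer above is what you get from comparing 32-bit monotonic times by their wrapped difference. The Python sketch below shows that standard technique; it is an assumption that RTSupport compares times this way, and `timeout_expired` is a hypothetical name.

```python
MASK32 = 0xFFFFFFFF


def timeout_expired(now, deadline):
    """True if deadline is at, or less than 2^31 centiseconds before, now.

    Both values are 32-bit monotonic centisecond counts that may wrap.
    """
    diff = (now - deadline) & MASK32    # wrapped unsigned difference
    return diff < 0x80000000            # top bit clear: deadline not in the future
```

Because only the difference is examined, the comparison keeps working when the centisecond counter wraps from &FFFFFFFF back to zero.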
Jon Abbott (1421) 2651 posts |
I mean CPSR in SVC. If you switch into SVC you’re liable to corrupt the SVC flags if they’re not preserved when you switch back. As RTSupport switches in/out of SVC to alter R13_svc, I just wondered if it also preserved CPSR_svc as well.
I’ve corrected the R0 error, which has stopped the Aborts, but it still locks the machine. If I understand correctly, I should set the pollword during the RT_Register to >0 and immediately clear it on entry to stop re-entry. I then leave it at 0 and return Monotonic+2 in R2 to force another call in 2cs. eg.
|
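The scheme described in the post above (pollword non-zero at registration, cleared on entry, next wake time returned as Monotonic+2) can be modelled as a tick-by-tick simulation. This Python sketch is hypothetical throughout: it ignores priorities, other threads and counter wraparound, and it treats each loop iteration as one centisecond tick on which the scheduler re-evaluates the routine.

```python
def simulate(ticks, period_cs=2):
    """Return the tick numbers on which the modelled routine runs."""
    pollword = 1          # non-zero at registration, so the first run is immediate
    wake_time = None      # no timeout until the routine has run once
    runs = []
    for now in range(ticks):
        timed_out = wake_time is not None and now >= wake_time
        if pollword != 0 or timed_out:
            pollword = 0                  # routine clears its pollword on entry
            runs.append(now)
            wake_time = now + period_cs   # "Monotonic+2" handed back as the timeout
    return runs


schedule = simulate(7)
```

In this model the routine runs on ticks 0, 2, 4 and 6, that is, once at registration and then every 2cs driven purely by the timeout.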
Jon Abbott (1421) 2651 posts |
I’ve been pondering the issues raised by RTSupport overnight.
1. Should RTSupport not check SPSR before it returns to the foreground and only flatten R13_irq if it’s returning to a mode other than IRQ? Why does it even need to flatten the IRQ stack? This will almost certainly break any future claimants of the IRQ hardware vector.
2. Sitting where it does in the IRQ chain could cause IRQ and system latency issues when there’s a high rate of IRQs. Would this module not have a detrimental effect on USB Isochronous or Bulk transfers, for example? I’m guessing they’re a high priority RTSupport registrant already, so possibly not.
3. How does this module fit in with HAL_Timers? Are the Timer IRQs a high priority RTSupport registrant to minimize latency?
As the module exists it should really be the default method for anything that’s IRQ related, but it’s not widely advertised. Should the HAL_Timers API and IRQ hardware vector documentation not link to RTSupport and provide examples of how to use it in their context? The IRQ vector should certainly mention that RTSupport flattens the stack on exit.
The other issue I see is that as it’s not possible to backport it to IOMD, it’s not a RO5-wide API, and consequently anyone writing IRQ-driven code needs to be aware that beyond IOMD you should ideally be using RTSupport for IRQ code, and on IOMD hook into the IRQ vector.
It looks like a very useful API though. I’m busy recoding ADFFS to make full use of it for the next release; it should hopefully resolve a lot of the IRQ-related issues I’m seeing on the Pi. |
Jeffrey Lee (213) 6048 posts |
There is only one CPSR register. Switching into SVC mode won’t corrupt CPSR_svc, because it doesn’t exist.
No idea!
RTSupport shouldn’t cause (much) IRQ latency as IRQs are higher priority than RTSupport routines. But it will obviously add latency to anything that’s lower priority than RTSupport (the foreground process, callbacks, etc.)
All IRQs are higher priority than RTSupport routines.
With all the flaws you’re pointing out you want it to be the default way of doing things? :) I see RTSupport as simply being a stopgap solution until something better comes along. Until that time, stick to doing things the traditional way wherever possible, and only use RTSupport for situations where you need thread-like behaviour or want a callback that has lower latency than the standard callback system. |
Jon Abbott (1421) 2651 posts |
I’ve managed to get ADFFS’ blitter running as a thread under RTSupport, however there’s a knock-on effect… For a reason I’ve yet to establish, registering code with RTSupport has broken parts of the SWI vector handler, more specifically SWIs that are called when you swap discs.
The way this currently works is that as my SWI handler exits, if it’s exiting to a CPU mode with IRQs enabled and a disc swap is pending, the CPU is switched to SVC, the disc swap is done, and the handler returns to the original CPU mode, restores R14/SPSR and then exits back to “user land”. This works on all machines/RO versions I’ve tested… until I register code with RTSupport.
In an attempt not to fight the system, I’ve moved the disc swap code to another RTSupport thread. This also switches to SVC before performing the disc swap and also hangs at the same point. I’m going to debug to see where the issue is specifically, but I suspect either ZLib, FileCore, HourGlass or related SWIs are not happy with RTSupport replacing the SVC stack with its own or with it collapsing the IRQ stack.
EDIT: It looks like OS_File is breaking whilst RTSupport is active. OS_File 17 returns “FileCore in use”.
EDIT2: One of the entry checks in FileCore is failing. Could it be the IRQsema check failing whilst RTSupport is active?
EDIT3: I’ve confirmed it’s the IRQsema check that’s failing. Whilst RTSupport is active, IRQsema points to a non-zero value and consequently FileCore is being called in “IRQ mode”. I think my workaround is to go back to how I was doing things, by checking as the SWI handler exits but adding an additional check on IRQsema to match FileCore’s; that should prevent SWIs called under RTSupport from triggering the disc swap. All fairly obvious once you know!
EDIT4: Even checking IRQsema is zero before starting the disc swap doesn’t fix things. I suspect the problem is that RTSupport kicks in after I’ve done the check and we end up with chained IRQs on the stack which then trigger FileCore’s IRQsema check. Following this train of thought, I added an IRQsema check loop before every FileCore SWI and can now swap discs, but it will lock eventually. It would seem RTSupport doesn’t sit well with FileCore SWIs. I’m not sure where to go from here, as I essentially need to shut down RTSupport before performing any FileCore actions, or run the risk of an IRQ occurring as FileCore’s SWI handler is entered. |
Jon Abbott (1421) 2651 posts |
I’ve finally got to the bottom of the FileCore issues whilst RTSupport is in use. The solution, which I came to through trial and error, is:
1. Release the thread via RT_Deregister
Step 2 is the critical step. I can only guess that either RTSupport is leaving something on the IRQ stack after a call to RT_Deregister, or the thread is still on the IRQ stack after RT_Deregister is called if it was previously interrupted and needs to clear down.
The root cause seems to be related either to the amount of CPU time an RTSupport thread is consuming (the more CPU time, the more likely a FileCore entry will happen whilst interrupted IRQs are sitting on the IRQ stack), or to raising an event from within an RTSupport thread.
There is a slim possibility that random FileCore failures will occur in RO5.21 with RTSupport as it is. It needs checking through a debug build, but my suspicion is that if RTSupport threads are interrupted and become suspended on the IRQ stack when RTSupport hands back to the foreground process, any subsequent FileCore calls will fail as FileCore’s IRQsema check fails.
There are currently three RISCOS registered RTSupport threads, two of which are only used on the Pi (BCMSound and HWPointer); the third is Sound0HAL. The chance of the above behaviour occurring is currently zero; however, as further RTSupport threads are added, and with them the possibility of being interrupted, the problem may start to show itself. It can possibly be repro’d by creating a CPU-intensive RTSupport thread, or by raising an event in an RTSupport thread, whilst running a foreground process that’s continually performing FileCore operations. |
Steve Pampling (1551) 8172 posts |
Random thought: some people have reported file system corruptions on Pi installs, and the same people are, from my recollection, regular users of sound facilities. |
Jon Abbott (1421) 2651 posts |
Hard to tell until I can create a repro outside of ADFFS. Possible but unlikely, I’d say: should the sequence that triggers the problem happen, any FileCore SWIs will generate a “FileCore in use” error, which I’d expect to get reported back to the user. Random corruption is more likely to be the age-old Pi power issue, as when you load the CPU with IRQs the power drops, and writing to USB is also power intensive. In my experience you shouldn’t attempt writing to USB sticks powered directly off the Pi; everything needs to go through a powered hub. Some Linux folk even recommend avoiding SD cards, which shows you how bad the Pi’s power issues are. Having said that, I’ve had a long-running battle trying to persuade the Pi devs that there’s possibly an issue that’s causing USB drive corruption that isn’t power related. |
Rick Murray (539) 13851 posts |
From the linked forum:
Hmm… I had a teeny-tiny 4GiB Kingston USB device that looked a bit like a microSD card only with USB connections. It lasted about four months of moderate use in my PVR until it started trashing the directory structure (and a lot of the files). Beside me I have a 32GiB SanDisk SD card (colourful) which worked well until one day. When I connect it to any SD reader, it provides no files, no data, nothing. It just gets alarmingly hot. Perhaps the best approach is to use flash media for temporary storage and keep important stuff on spinning rust (and in the case of devices like the Pi, powered externally). |
Jeffrey Lee (213) 6048 posts |
“I’m going to debug to see where the issue is specifically, but I suspect either ZLib, FileCore, HourGlass or related SWIs are not happy with RTSupport replacing the SVC stack with its own or with it collapsing the IRQ stack.”
Are you trying to use the filing system from within an RTSupport routine? As far as the OS is concerned, RTSupport routines are “background” code, the same as IRQ handlers. You can only safely use re-entrant SWIs from them. If you need to use a non-re-entrant SWI then you need to follow the same steps as usual, i.e. register a callback and run your code from there. |
Jon Abbott (1421) 2651 posts |
No, the FileCore SWIs are performed as the SWI hardware vector exits back to the user, if it’s exiting back with IRQs enabled and IRQsema is zero. This works until I register the blitter with RTSupport, at which point IRQsema changes between the initial check and the first FileCore SWI, which is about a dozen instructions off the top of my head. The workaround hinted at something happening IRQsema-wise with RTSupport after the blitter had been unregistered. Perhaps it puts something on the IRQ stack for the next IRQ entry, but that’s just a guess. I’m in a hotel most of this week and next, so I’ll try to produce a repro. |
Jon Abbott (1421) 2651 posts |
A thought on this, could RT_Deregister be returning whilst the thread being unregistered is still active? The documentation states otherwise, but it may explain why IRQsema isn’t clear immediately after it returns. RT_Deregister’s Use comments state:
Is this statement actually correct? Should it not read “does not return immediately.”? |
Jeffrey Lee (213) 6048 posts |
When used on the active thread, RT_Deregister should kill it immediately. I’ve written code in the past which relies on this and haven’t run into any problems with it (e.g. SCSISoftUSB will produce some debug output if the routine isn’t killed properly).
If the routine in question is currently executing then this call does not return. The routine which is killed will never be returned to.
If the routine is currently executing then it means RTSupport will look for another routine to execute. If it isn’t, then it will just kill it and then return to the calling routine. |
Jon Abbott (1421) 2651 posts |
What happens if a foreground process issues the RT_Deregister? I realise now that they’re referring to the thread issuing the RT_Deregister within itself, in which case it doesn’t return – which is understandable. |
Jon Abbott (1421) 2651 posts |
I’m now seeing R13_irq reset to FA102000 on RO5 IOMD as well, although randomly. I’d take a guess that a certain sequence of IRQs is causing the IRQ handler to reset R13. Under what condition(s) does RISCOS 5 reset R13_irq? EDIT:
Does RISCOS also make assumptions about the layout of the IRQ stack? For example, what happens in the following scenario: 1. Device Vector handler is entered |
Jeffrey Lee (213) 6048 posts |
Off the top of my head, the only built-in thing I can think of which would reset the IRQ stack would be the default abort handler, when it’s about to raise an error. If that’s not it, then it might be worth just searching the kernel source for IRQSTK and seeing what turns up. The IIC driver makes some assumptions about the layout of the IRQ stack (it walks IRQsema to allow it to process any in-progress transfers if OS_IICOp is re-entered), but I don’t think it will reset the stack or have any other side effects. |