[Cortex] Kernel/HAL interrupt handling improvements
Jeffrey Lee (213) 6048 posts |
I’m going to start making new threads for the more important issues I find while developing the Cortex port – it should make things easier to keep track of. While working on getting USB working I’ve encountered a problem which I can only think of as being a deficiency in RISC OS’s IRQ handling. Specifically, RISC OS doesn’t notify the HAL when IRQ processing has completed. This causes problems for the OMAP port because the interrupt controller needs to be told when it can begin processing interrupts again. If it doesn’t get told, it will keep claiming the same device is causing an interrupt until the device is serviced – even if a higher priority device starts interrupting, or if you mask the device in the interrupt controller. At the moment I’ve catered for this by restarting the controller inside HAL_IRQSource/HAL_FIQSource. This works OK for situations where the interrupt handler keeps interrupts disabled, but fails for the USB modules because at some point they appear to re-enable interrupts. This causes them to either get stuck in recursive interrupt handlers until running out of stack space, or report the interrupt to RISC OS as unhandled (causing it to mask the IRQ and stop the drivers from functioning). I’ve been aware of this potential problem for a while, but haven’t acted on it until now just to make sure that it is a problem. The only solution I can see to the problem is to add two new HAL entries – HAL_IRQComplete and HAL_FIQComplete. The kernel will call these entries at the end of interrupt processing, allowing the interrupt handler to do whatever is required to restart processing. As a parameter the calls would take the IRQ/FIQ device number of the device that has completed processing, to allow correct behaviour with nested interrupt handlers. This would allow the OMAP interrupt handling to be rewritten so that HAL_IRQSource masks the interrupt in the interrupt controller, then re-enables the controller. 
HAL_IRQComplete would simply unmask the indicated interrupt. Other changes would also be needed to the HAL_IRQEnable/HAL_IRQDisable implementations to work on soft-copies of the IRQ masks. This would then allow interrupts to function in the intended manner – i.e. re-enabling interrupts in the CPSR will allow other devices to be serviced, but you won’t ever receive an interrupt from a device that is already inside its interrupt handler. Correct me if I’m wrong on this intended behaviour! A couple of extra notes:
Thoughts? Comments? Questions?
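The scheme proposed above can be sketched roughly as follows. This is only an illustration run against a simulated controller, not real OMAP register code: the `sim_*` state and helpers are hypothetical, the lower-case `hal_*` functions merely mirror the HAL entries named in the post (`hal_irq_complete` stands in for the proposed HAL_IRQComplete), and treating the lowest device number as the highest priority is an assumption.

```c
#include <assert.h>
#include <stdint.h>

#define NUM_DEVICES 32u
#define NO_DEVICE   (-1)

/* Simulated controller state (hypothetical model, not real hardware registers). */
static uint32_t sim_line;                   /* devices asserting their request line */
static uint32_t sim_hw_mask = 0xFFFFFFFFu;  /* 1 = masked in the controller         */
static int      sim_active  = NO_DEVICE;    /* device the controller is reporting   */

/* HAL-side soft copies, as suggested in the post. */
static uint32_t soft_disabled = 0xFFFFFFFFu; /* disabled via HAL_IRQDisable        */
static uint32_t in_service;                  /* devices currently in their handler */

/* While idle, the controller picks the lowest-numbered unmasked pending device.
 * Once it reports a device it keeps reporting it until restarted. */
static void sim_step(void)
{
    uint32_t pending = sim_line & ~sim_hw_mask;
    if (sim_active != NO_DEVICE)
        return;
    for (unsigned d = 0; d < NUM_DEVICES; d++) {
        if (pending & (1u << d)) { sim_active = (int)d; return; }
    }
}

static void sim_restart(void) { sim_active = NO_DEVICE; sim_step(); }

/* Test helper: a device raises or lowers its interrupt line. */
void sim_set_line(unsigned dev, int asserted)
{
    assert(dev < NUM_DEVICES);
    if (asserted) sim_line |= 1u << dev; else sim_line &= ~(1u << dev);
    sim_step();
}

/* The hardware mask is always the union of the two soft copies. */
static void write_hw_mask(void)
{
    sim_hw_mask = soft_disabled | in_service;
    sim_step();
}

void hal_irq_enable(unsigned dev)
{
    assert(dev < NUM_DEVICES);
    soft_disabled &= ~(1u << dev);
    write_hw_mask();
}

void hal_irq_disable(unsigned dev)
{
    assert(dev < NUM_DEVICES);
    soft_disabled |= 1u << dev;
    write_hw_mask();
}

/* Report the active device, mask it, then restart the controller so that
 * other devices can interrupt once IRQs are re-enabled in the CPSR. */
int hal_irq_source(void)
{
    int dev = sim_active;
    if (dev == NO_DEVICE)
        return NO_DEVICE;
    in_service |= 1u << (unsigned)dev;
    sim_hw_mask = soft_disabled | in_service;
    sim_restart();
    return dev;
}

/* Proposed new entry: unmask the device whose handler has finished. */
void hal_irq_complete(int dev)
{
    assert(dev >= 0 && (unsigned)dev < NUM_DEVICES);
    in_service &= ~(1u << (unsigned)dev);
    write_hw_mask();
}
```

In this model a device that is still asserting its line when its handler completes is simply reported again, while a device that is inside its handler can never interrupt itself.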
Ben Avison (25) 445 posts |
I don’t think it helps that the documentation on the HAL’s interrupt management is rather patchy, and rather assumes that you’re familiar with the IOC/IOMD interrupt controller. Subsequent platforms have all been broadly similar, but as far as I can see, the OMAP seems to have added a layer of complexity that is tripping you up. But first I think it’s useful to recap the interrupt controller model the HAL expects. Apologies if this is familiar to you, but hopefully it will be of interest to the broader community even so. The CPU has only two interrupt inputs – IRQ and FIQ - which cause the CPU to switch processor mode and begin executing either the kernel IRQ handler (which is then dispatched to registered interrupt handlers) or the currently-installed FIQ handler (of which there can only be one at once under RISC OS). However, there are many devices, both on-board and external, which may want to trigger an IRQ (or FIQ), and it’s the job of the interrupt controller to moderate between these competing devices. Typically, any interrupt controller will have a set of registers, each bit of which affects one device. For each device, you’d normally expect there to be one bit in a register to do each of these jobs:
Usually the interrupt lines entering the interrupt controller are in a fixed pattern of active high or active low, so for some lines there will be an inverter involved (or again that may be configurable). However, more rarely, some interrupts are edge-triggered. In these cases, the interrupt controller contains a latch, and this is where the interrupt clear register comes into play: the interrupt is asserted from the time the edge is detected until the time software writes to clear the interrupt. A common use for edge-triggered interrupts is timers (presumably because the timers issue a brief pulse when the timer wraps around). The typical sequence of events for a device driver would be:
Usually that’s all that is required; however, for a small minority of devices (such as the aforementioned timers), the interrupt handler will need to clear the interrupt in the interrupt controller at step 6 because the interrupt controller’s input line is unable to prompt it. Normally, prioritisation of interrupts is a simple matter in RISC OS - the kernel simply scans the interrupt request register(s) (the HAL ANDs the interrupt mask and interrupt status if there isn’t one) for set bit(s) and dispatches calls to the corresponding interrupt handler(s) in bit order. The RISC OS device vector number is simply formed by combining the interrupt register number and the bit number within the register. Now, I see that the OMAP interrupt controller has encroached upon this part of the kernel/HAL interrupt handler’s functionality – it seems to sort interrupts by priority and tell you which is the highest priority currently-interrupting device. In this case, I think the neatest thing to do is to get the interrupt controller to re-evaluate the interrupts on a call to HAL_IRQClear, since the timing at which you’d perform this matches that for clearing latched interrupts (that is, after you’ve stopped the device from asserting its interrupt line, but before IRQs are re-enabled in the CPSR). This will require a lot of device drivers to acquire calls to HAL_IRQClear, but fortunately these will be no-ops on HALs for traditional interrupt controllers and therefore harmless. I think your point about re-enabling IRQs in the CPSR inside the interrupt handler is actually a separate issue, since even with traditional interrupt controllers, you had to ensure the device was no longer driving its interrupt request line before enabling IRQs in the CPSR. You certainly shouldn’t be relying on the kernel’s default interrupt handler code to mask an interrupt if it is unhandled – that’s really just a safety net for buggy interrupt handlers or for software under development.
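As a rough illustration of the traditional dispatch model described above (AND the mask with the status, then scan for set bits in bit order), here is a small sketch. The function name `pending_devices`, the 32-bit register width, the "1 = enabled" mask convention, the lowest-bit-first scan and the `reg * 32 + bit` numbering are all assumptions made for the example; the post only says the device number combines the register number and the bit number.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define BITS_PER_REG 32u  /* assumed register width */

/* Writes the device numbers of all enabled, asserted interrupts to out[],
 * in register order and then bit order, and returns how many were written.
 * At most out_len entries are stored. */
size_t pending_devices(const uint32_t *status, const uint32_t *mask,
                       size_t nregs, unsigned *out, size_t out_len)
{
    size_t n = 0;
    for (size_t reg = 0; reg < nregs; reg++) {
        /* Derive the request bits when the hardware has no request register. */
        uint32_t request = status[reg] & mask[reg];
        for (unsigned bit = 0; bit < BITS_PER_REG; bit++) {
            if ((request & (UINT32_C(1) << bit)) && n < out_len)
                out[n++] = (unsigned)(reg * BITS_PER_REG) + bit;
        }
    }
    return n;
}
```

A kernel-style dispatcher would then call the handler registered for each returned device number in turn; a device whose status bit is set but whose mask bit is clear never appears in the list.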
RISC OS has always taken a rather cavalier approach to IRQ latency, and it does rather a lot of processing in IRQ handlers with IRQs disabled – the most notable exception in the desktop OS being the sound fill routines, which do a lot of number-crunching and data shuffling with IRQs enabled but with the IRQ stack still threaded. There was one STB where a video IRQ handler took too long to execute for the interrupt latency required by an audio device, and a rather hacky solution was used whereby while the video IRQ handler executed, it masked all interrupts in the interrupt controller other than that audio IRQ and then enabled CPSR IRQs. Your suggestion of masking the interrupt request(s) corresponding to the currently-executing interrupt handler(s) sounds similar to me – it’s not ideal, especially on platforms which use shared interrupts (such as all PCI platforms). Instead, if you’re concerned about interrupt latency caused by there being too much processing going on inside an IRQ handler, I suggest you investigate the RTSupport module. I initially wrote this to support codec processing, but it would be equally applicable to “deprioritising” heavyweight interrupt handlers – Dan Ellis and I briefly discussed its applicability to the USB stack once, but I don’t think anything came of it. You can also compare this to the Internet stack, which defers processing of data which was ultimately received from Ethernet driver interrupts until transient callbacks are triggered. However, since callbacks only get processed when the OS is idle, you may have to wait for a long time until the data gets processed. Despite many improvements we made to callback latency for STBs (which make heavy use of the Internet stack) this wait may be too long for USB - hence the usefulness of the RTSupport module.
Jeffrey Lee (213) 6048 posts |
Good point about the shared interrupts on PCI platforms – that’s something I hadn’t considered. I suppose HAL_IRQClear is the most sensible route to take, even if it means tricky debugging sessions working out which interrupt handler the OS is getting stuck in when something goes wrong.
Jeffrey Lee (213) 6048 posts |
I’ve now made the IRQ handling changes, and it seems to have solved the USB problems, although I’m not really sure why. To complete the changes I’ve moved the interrupt controller reset from HAL_IRQStatus/HAL_FIQStatus to HAL_IRQClear/HAL_FIQClear, and added calls to HAL_IRQClear to all the appropriate parts of the OS (at the moment, the IRQ silencing code for unclaimed IRQs, and the USB modules). However I encountered a bit of a conundrum with HAL_FIQDisableAll – it’s used in situations where FIQs need silencing, but it doesn’t take an FIQ device number as input, so it can’t be followed by a call to HAL_FIQClear in order to make sure the interrupt controller recalculates the FIQ source. At the moment I’ve worked round this by adding the interrupt controller reset to the end of HAL_FIQDisableAll. Since HAL_FIQDisableAll is only used in circumstances where the ARM wants to stop receiving FIQs I can’t see any reason against this change. What are your thoughts on this? And were you envisaging that HAL_IRQDisable and HAL_FIQDisable should restart interrupt processing also? (Currently they don’t; it’s only the Clear and DisableAll functions that do.)
Ben Avison (25) 445 posts |
Well, in a traditional interrupt controller, HAL_IRQDisable would be enough by itself to cause the CPU IRQ input to be deasserted or, if another interrupt is pending, the kernel interrupt handler to be re-entered and the other interrupt to be serviced instead. So logically, on OMAP, HAL_IRQDisable would reset the interrupt controller’s IRQ sorter, in the same way as HAL_IRQClear does. By extension, the FIQ sorter would be reset on HAL_FIQDisable and HAL_FIQDisableAll. For what it’s worth, don’t expect HAL_FIQDisable to be called much, if at all, from RISC OS. Because RISC OS only supports one installed FIQ handler at once, every time RISC OS disables FIQs, it disables them all, for safety. This was a particularly simple operation on IOC/IOMD since all FIQ sources were managed by a single FIQ mask register, which would be written with 0 – that’s what HAL_FIQDisableAll is intended to mirror. HAL_FIQDisable is really only there to permit alternative OSes to be built on top of the HAL.
Jeffrey Lee (213) 6048 posts |
Yeah, that’s pretty much what I was thinking. I’ll update it so that if IRQDisable/FIQDisable disables the currently firing interrupt it will restart the priority sorting – that should ensure things work in the most compatible way. That should also allow me to get rid of the IRQClear call I added to RISC OS in the unhandled IRQ handler, so that in the end the only change I’m making to the OS/drivers is to make sure that IRQClear gets called whenever a device driver clears the interrupt flag in a device.
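The behaviour settled on above might look something like the following sketch, again against a simulated controller rather than real OMAP registers. The `sim_*` state is hypothetical, the `hal_*` functions only mirror the HAL entries named in the thread, lowest-number-first priority is an assumption, and restarting unconditionally in `hal_irq_clear` is one reading of the description rather than the actual implementation.

```c
#include <assert.h>
#include <stdint.h>

#define NUM_DEVICES 32u
#define NO_DEVICE   (-1)

/* Simulated controller state (hypothetical model). */
static uint32_t sim_line;               /* devices asserting their request line */
static uint32_t sim_enabled;            /* 1 = unmasked in the controller       */
static int      sim_active = NO_DEVICE; /* device the priority sorter reports   */

/* The sorter latches a device and keeps reporting it until it is restarted. */
static void sim_step(void)
{
    uint32_t pending = sim_line & sim_enabled;
    if (sim_active != NO_DEVICE)
        return;
    for (unsigned d = 0; d < NUM_DEVICES; d++) {
        if (pending & (1u << d)) { sim_active = (int)d; return; }
    }
}

static void sim_restart(void) { sim_active = NO_DEVICE; sim_step(); }

/* Test helper: a device raises or lowers its interrupt line. */
void sim_set_line(unsigned dev, int asserted)
{
    assert(dev < NUM_DEVICES);
    if (asserted) sim_line |= 1u << dev; else sim_line &= ~(1u << dev);
    sim_step();
}

/* Reporting the source no longer restarts the sorter. */
int hal_irq_source(void) { return sim_active; }

void hal_irq_enable(unsigned dev)
{
    assert(dev < NUM_DEVICES);
    sim_enabled |= 1u << dev;
    sim_step();
}

/* Restart the sorter only if the interrupt being disabled is the one
 * currently being reported. */
void hal_irq_disable(unsigned dev)
{
    assert(dev < NUM_DEVICES);
    sim_enabled &= ~(1u << dev);
    if (sim_active == (int)dev)
        sim_restart();
}

/* Called by a driver after it has silenced its device: make the controller
 * re-evaluate which device to report. */
void hal_irq_clear(unsigned dev)
{
    assert(dev < NUM_DEVICES);
    sim_restart();
}
```

With this arrangement an unhandled interrupt that the kernel disables no longer needs a separate clear call, because disabling the active device already restarts the sorter.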