[Cortex] Kernel/HAL interrupt handling improvements

6 posts, 2 voices

May 4, 2009 3:20pm Jeffrey Lee (213) 6048 posts	I’m going to start making new threads for the more important issues I find while developing the Cortex port – it should make things easier to keep track of.

While working on getting USB working I’ve encountered a problem which I can only think of as being a deficiency in RISC OS’s IRQ handling. Specifically, RISC OS doesn’t notify the HAL when IRQ processing has completed. This causes problems for the OMAP port because the interrupt controller needs to be told when it can begin processing interrupts again. If it doesn’t get told, it will keep claiming the same device is causing an interrupt until the device is serviced – even if a higher priority device starts interrupting, or if you mask the device in the interrupt controller.

At the moment I’ve catered for this by restarting the controller inside HAL_IRQSource/HAL_FIQSource. This works OK for situations where the interrupt handler keeps interrupts disabled, but fails for the USB modules because at some point they appear to re-enable interrupts. This causes them to either get stuck in recursive interrupt handlers until running out of stack space, or report the interrupt to RISC OS as unhandled (causing it to mask the IRQ and stop the drivers from functioning). I’ve been aware of this potential problem for a while, but haven’t acted on it until now just to make sure that it is a problem.

The only solution I can see to the problem is to add two new HAL entries – HAL_IRQComplete and HAL_FIQComplete. The kernel will call these entries at the end of interrupt processing, allowing the interrupt handler to do whatever is required to restart processing. As a parameter the calls would take the IRQ/FIQ device number of the device that has completed processing, to allow correct behaviour with nested interrupt handlers.

This would allow the OMAP interrupt handling to be rewritten so that HAL_IRQSource masks the interrupt in the interrupt controller, then re-enables the controller. HAL_IRQComplete would simply unmask the indicated interrupt. Other changes would also be needed to the HAL_IRQEnable/HAL_IRQDisable implementations to work on soft-copies of the IRQ masks. This would then allow interrupts to function in the intended manner – i.e. re-enabling interrupts in the CPSR will allow other devices to be serviced, but you won’t ever receive an interrupt from a device that is already inside its interrupt handler. Correct me if I’m wrong on this intended behaviour!

A couple of extra notes:

- HAL_IRQClear/HAL_FIQClear could be candidates to use instead of new HAL_IRQComplete/HAL_FIQComplete calls. However it’s currently the responsibility of individual interrupt handlers to call IRQClear/FIQClear, and the calls seem to be focused more towards clearing the interrupt flag in devices managed by the kernel/HAL rather than for devices in general.
- The OMAP interrupt controller can detect spurious interrupts – i.e. if the device that’s currently identified as the interrupt source stops interrupting. When it detects a spurious interrupt it automatically resets and begins looking for new interrupt sources. But this is the only situation where it will automatically begin reprocessing. E.g. if you mask an IRQ in the interrupt controller and then re-enable interrupts on the ARM then the interrupt controller will still try telling you that the device you masked is interrupting. This means that at some point the HAL must tell the interrupt controller to begin reprocessing, otherwise RISC OS dies early on (because it can’t work out why devices it’s masked via HAL_IRQDisable are still causing interrupts).
- Overloading HAL_IRQSource and HAL_FIQSource to mask the interrupt is a bit hacky in itself – should these functions be renamed or have a note attached indicating that they’re only used at the start of interrupt processing? Or should new functions be introduced to perform the start-of-interrupt-handler processing? (e.g. to avoid the overhead of another HAL call in the interrupt handler, rename the existing OMAP HAL_IRQSource to HAL_IRQBegin, change the interrupt veneer to call that, and write a new HAL_IRQSource which performs the intended action of identifying the IRQ source without performing any activity that could alter the state of the interrupt controller).

Thoughts? Comments? Questions?
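
The mask-on-source, unmask-on-complete scheme proposed above might look roughly like this in C. This is only a sketch against a simulated controller, not real OMAP register access or actual HAL code: the `sim_` names, the `sim_intc` structure and the lowest-number-first priority order are all assumptions made for illustration, and HAL_IRQComplete itself is only a proposal at this point in the thread.

```c
#include <stdint.h>

#define NO_IRQ (-1)

/* Simulated controller with 32 sources (hypothetical model, not OMAP registers). */
typedef struct {
    uint32_t pending;    /* 1 = device is asserting its interrupt line */
    uint32_t soft_mask;  /* 1 = disabled by the OS (soft copy of the mask) */
    uint32_t in_service; /* 1 = handler running, between Source and Complete */
    uint32_t hw_mask;    /* 1 = masked in the controller itself */
    int      latched;    /* source held by the sorter, or NO_IRQ */
} sim_intc;

static void sim_init(sim_intc *c)
{
    c->pending = c->soft_mask = c->in_service = c->hw_mask = 0;
    c->latched = NO_IRQ;
}

/* The hardware mask is the OS's soft mask plus every source currently in service. */
static void sim_update_hw_mask(sim_intc *c)
{
    c->hw_mask = c->soft_mask | c->in_service;
}

/* Sorter model: a new source is only chosen when nothing is latched. */
static int sim_sorter_output(sim_intc *c)
{
    if (c->latched == NO_IRQ) {
        uint32_t req = c->pending & ~c->hw_mask;
        for (int dev = 0; dev < 32; dev++) {
            if (req & (UINT32_C(1) << dev)) {
                c->latched = dev;
                break;
            }
        }
    }
    return c->latched;
}

/* Proposed HAL_IRQSource behaviour: identify, mask, then restart the sorter. */
static int sim_irq_source(sim_intc *c)
{
    int dev = sim_sorter_output(c);
    if (dev != NO_IRQ) {
        c->in_service |= UINT32_C(1) << dev;
        sim_update_hw_mask(c);
        c->latched = NO_IRQ; /* let the controller look for other sources */
    }
    return dev;
}

/* Proposed HAL_IRQComplete behaviour: the handler for dev has finished. */
static void sim_irq_complete(sim_intc *c, int dev)
{
    c->in_service &= ~(UINT32_C(1) << dev);
    sim_update_hw_mask(c);
}

/* HAL_IRQDisable and HAL_IRQEnable equivalents work on the soft copy. */
static void sim_irq_disable(sim_intc *c, int dev)
{
    c->soft_mask |= UINT32_C(1) << dev;
    sim_update_hw_mask(c);
}

static void sim_irq_enable(sim_intc *c, int dev)
{
    c->soft_mask &= ~(UINT32_C(1) << dev);
    sim_update_hw_mask(c);
}
```

In this model, with devices 3 and 7 both pending, two successive calls to `sim_irq_source` return 3 and then 7, because 3 stays masked while its handler is running. Once `sim_irq_complete` has been called for device 3 it can be reported again, unless the OS disabled it in the meantime.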

May 6, 2009 7:20pm Ben Avison (25) 445 posts	I don’t think it helps that the documentation on the HAL’s interrupt management is rather patchy, and rather assumes that you’re familiar with the IOC/IOMD interrupt controller. Subsequent platforms have all been broadly similar, but as far as I can see, the OMAP seems to have added a layer of complexity that is tripping you up.

But first I think it’s useful to recap the interrupt controller model the HAL expects. Apologies if this is familiar to you, but hopefully it will be of interest to the broader community even so.

The CPU has only two interrupt inputs – IRQ and FIQ – which cause the CPU to switch processor mode and begin executing either the kernel IRQ handler (which then dispatches to registered interrupt handlers) or the currently-installed FIQ handler (of which there can only be one at once under RISC OS). However, there are many devices, both on-board and external, which may want to trigger an IRQ (or FIQ), and it’s the job of the interrupt controller to moderate between these competing devices.

Typically, any interrupt controller will have a set of registers, each bit of which affects one device. For each device, you’d normally expect there to be one bit in a register to do each of these jobs:

- interrupt mask (enable or disable) to control whether that interrupt is ignored or not
- interrupt status (whether the line is asserted) and/or interrupt request (the result of combining the status and mask bits)
- interrupt clear – see below
- interrupt type – IRQ or FIQ (though for some controllers like IOC/IOMD, which interrupt requests are IRQs and which are FIQs is hardwired)

Usually the interrupt lines entering the interrupt controller are in a fixed pattern of active high or active low, so for some lines there will be an inverter involved (or again that may be configurable). However, more rarely, some interrupts are edge-triggered. In these cases, the interrupt controller contains a latch, and this is where the interrupt clear register comes into play: the interrupt is asserted from the time the edge is detected until the time software writes to clear the interrupt. A common use for edge-triggered interrupts is timers (presumably because the timers issue a brief pulse when the timer wraps around).

The typical sequence of events for a device driver would be:

1. Device asserts its interrupt line
2. If the interrupt is unmasked in the interrupt controller, it raises the IRQ line to the CPU
3. The CPU disables interrupts in the CPSR, jumps to IRQ mode and executes the kernel IRQ handler, which ends up calling the device driver’s IRQ handler
4. The device driver determines what actions it needs to take and does the necessary processing (all with IRQs disabled in the CPSR)
5. The device driver pokes a register in the device which causes it to deassert its interrupt line (or for some devices, reading the reason for the interrupt will deassert the interrupt line, and so this will happen earlier on)
6. The interrupt controller spots that the active interrupt is deasserted and so deasserts the CPU IRQ input
7. The device driver returns to the kernel, which exits IRQ mode and re-enables IRQs in the CPSR

Usually that’s all that is required; however, for a small minority of devices (such as the aforementioned timers), the interrupt handler will need to clear the interrupt in the interrupt controller at step 6 because the interrupt controller’s input line is unable to prompt it.

Normally, prioritisation of interrupts is a simple matter in RISC OS – the kernel simply scans the interrupt request register(s) (the HAL ANDs the interrupt mask and interrupt status if there isn’t one) for set bit(s) and dispatches calls to the corresponding interrupt handler(s) in bit order. The RISC OS device vector number is simply formed by combining the interrupt register number and the bit number within the register.

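
The scan described above could be sketched in C roughly as follows. It is an illustration only, using a simulated controller: the `sim_` names, the two-register layout and the exact way the register and bit numbers are combined (`reg * 32 + bit`) are assumptions, not the kernel’s actual code.

```c
#include <stdint.h>

#define NUM_REGS     2   /* assumption: two 32-bit interrupt registers */
#define BITS_PER_REG 32

/* Simulated controller state; real hardware would expose these as registers. */
typedef struct {
    uint32_t status[NUM_REGS]; /* 1 = interrupt line asserted */
    uint32_t mask[NUM_REGS];   /* 1 = interrupt enabled */
} sim_controller;

/* Request bits: status ANDed with mask, for controllers with no request register. */
static uint32_t sim_request(const sim_controller *c, int reg)
{
    return c->status[reg] & c->mask[reg];
}

/* Assumed numbering: register index in the high part, bit index in the low part. */
static int sim_device_number(int reg, int bit)
{
    return reg * BITS_PER_REG + bit;
}

/* Scan registers in bit order; return the first requesting device, or -1 if none. */
static int sim_first_pending(const sim_controller *c)
{
    for (int reg = 0; reg < NUM_REGS; reg++) {
        uint32_t req = sim_request(c, reg);
        for (int bit = 0; bit < BITS_PER_REG; bit++) {
            if (req & (UINT32_C(1) << bit))
                return sim_device_number(reg, bit);
        }
    }
    return -1;
}
```

Under this assumed numbering, bit 3 of register 1 maps to device 35, and an asserted line whose mask bit is clear is never reported.
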
Now, I see that the OMAP interrupt controller has encroached upon this part of the kernel/HAL interrupt handler’s functionality – it seems to sort interrupts by priority and tell you which is the highest priority currently-interrupting device. In this case, I think the neatest thing to do is to get the interrupt controller to re-evaluate the interrupts on a call to HAL_IRQClear, since the timing at which you’d perform this matches that for clearing latched interrupts (that is, after you’ve stopped the device from asserting its interrupt line, but before IRQs are re-enabled in the CPSR). This will require a lot of device drivers to acquire calls to HAL_IRQClear, but fortunately these will be no-ops on HALs for traditional interrupt controllers and therefore harmless.

I think your point about re-enabling IRQs in the CPSR inside the interrupt handler is actually a separate issue, since even with traditional interrupt controllers, you had to ensure the device was no longer driving its interrupt request line before enabling IRQs in the CPSR. You certainly shouldn’t be relying on the kernel’s default interrupt handler code to mask an interrupt if it is unhandled – that’s really just a safety net for buggy interrupt handlers or for software under development.

RISC OS has always taken a rather cavalier approach to IRQ latency, and it does rather a lot of processing in IRQ handlers with IRQs disabled – the most notable exception in the desktop OS being the sound fill routines, which do a lot of number-crunching and data shuffling with IRQs enabled but with the IRQ stack still threaded. There was one STB where a video IRQ handler took too long to execute for the interrupt latency required by an audio device, and a rather hacky solution was used whereby while the video IRQ handler executed, it masked all interrupts in the interrupt controller other than that audio IRQ and then enabled CPSR IRQs. Your suggestion of masking the interrupt request(s) corresponding to the currently-executing interrupt handler(s) sounds similar to me – it’s not ideal, especially on platforms which use shared interrupts (such as all PCI platforms).

Instead, if you’re concerned about interrupt latency caused by there being too much processing going on inside an IRQ handler, I suggest you investigate the RTSupport module. I initially wrote this to support codec processing, but it would be equally applicable to “deprioritising” heavyweight interrupt handlers – Dan Ellis and I briefly discussed its applicability to the USB stack once, but I don’t think anything came of it. You can also compare this to the Internet stack, which defers processing of data which was ultimately received from Ethernet driver interrupts until transient callbacks are triggered. However, since callbacks only get processed when the OS is idle, you may have to wait for a long time until the data gets processed. Despite many improvements we made to callback latency for STBs (which make heavy use of the Internet stack) this wait may be too long for USB – hence the usefulness of the RTSupport module.
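
A rough C sketch of the HAL_IRQClear approach suggested above, again against a simulated controller. The `sim_` names and the latching model are assumptions for illustration; in particular the model leaves out the OMAP controller’s own spurious-interrupt detection, and the driver handler is hypothetical.

```c
#include <stdint.h>

#define NO_IRQ (-1)

/* Simulated priority-sorting controller (hypothetical model, 32 sources). */
typedef struct {
    uint32_t pending; /* 1 = device is asserting its interrupt line */
    uint32_t mask;    /* 1 = interrupt enabled */
    int      latched; /* source held by the sorter, or NO_IRQ */
} sim_intc;

static void sim_init(sim_intc *c)
{
    c->pending = 0;
    c->mask = 0;
    c->latched = NO_IRQ;
}

/* HAL_IRQSource equivalent: reports the latched source, sorting only if idle. */
static int sim_irq_source(sim_intc *c)
{
    if (c->latched == NO_IRQ) {
        uint32_t req = c->pending & c->mask;
        for (int dev = 0; dev < 32; dev++) {
            if (req & (UINT32_C(1) << dev)) {
                c->latched = dev;
                break;
            }
        }
    }
    return c->latched;
}

/* HAL_IRQClear equivalent: here it just makes the sorter re-evaluate.
 * On a traditional level-triggered controller this would be a no-op. */
static void sim_irq_clear(sim_intc *c, int dev)
{
    (void)dev;
    c->latched = NO_IRQ;
}

/* Hypothetical driver handler: quieten the device first, then call clear,
 * all before IRQs are re-enabled in the CPSR. */
static void sim_driver_handler(sim_intc *c, int dev)
{
    c->pending &= ~(UINT32_C(1) << dev); /* device deasserts its line */
    sim_irq_clear(c, dev);               /* controller may now re-sort */
}
```

In this model the source query keeps reporting the same device until the driver’s handler has run, after which the next pending device is reported.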

May 6, 2009 7:58pm Jeffrey Lee (213) 6048 posts	Good point about the shared interrupts on PCI platforms – that’s something I hadn’t considered. I suppose HAL_IRQClear is the most sensible route to take, even if it means tricky debugging sessions working out which interrupt handler the OS is getting stuck in when something goes wrong.

May 6, 2009 11:41pm Jeffrey Lee (213) 6048 posts	I’ve now made the IRQ handling changes, and it seems to have solved the USB problems, although I’m not really sure why. To complete the changes I’ve moved the interrupt controller reset from HAL_IRQStatus/HAL_FIQStatus to HAL_IRQClear/HAL_FIQClear, and added calls to HAL_IRQClear to all the appropriate parts of the OS (at the moment, the IRQ silencing code for unclaimed IRQs, and the USB modules).

However I encountered a bit of a conundrum with HAL_FIQDisableAll – it’s used in situations where FIQs need silencing, but it doesn’t take an FIQ device number as input, so it can’t be followed by a call to HAL_FIQClear in order to make sure the interrupt controller recalculates the FIQ source. At the moment I’ve worked round this by adding the interrupt controller reset to the end of HAL_FIQDisableAll. Since HAL_FIQDisableAll is only used in circumstances where the ARM wants to stop receiving FIQs I can’t see any reason against this change. What are your thoughts on this?

And were you envisaging that HAL_IRQDisable and HAL_FIQDisable should restart interrupt processing also? (Currently they don’t, it’s only the Clear and DisableAll functions that do)

May 7, 2009 12:26pm Ben Avison (25) 445 posts	Well, in a traditional interrupt controller, HAL_IRQDisable would be enough by itself to cause the CPU IRQ input to be deasserted, or if another interrupt is pending, the kernel interrupt handler to be re-entered and the other interrupt to be serviced instead. So logically, on OMAP, HAL_IRQDisable would reset the interrupt controller’s IRQ sorter, in the same way as HAL_IRQClear does. By extension, the FIQ sorter would be reset on HAL_FIQDisable and HAL_FIQDisableAll. For what it’s worth, don’t expect HAL_FIQDisable to be called much if at all from RISC OS. Because RISC OS only supports one installed FIQ handler at once, every time RISC OS disables FIQs, it disables them all, for safety. This was a particularly simple operation on IOC/IOMD since all FIQ sources were managed by a single FIQ mask register, which would be written with 0 – that’s what HAL_FIQDisableAll is intended to mirror. HAL_FIQDisable is really only there to permit alternative OSes to be built on top of the HAL.

May 7, 2009 12:38pm Jeffrey Lee (213) 6048 posts	“Well, in a traditional interrupt controller, HAL_IRQDisable would be enough by itself to cause the CPU IRQ input to be deasserted”

Yeah, that’s pretty much what I was thinking. I’ll update it so that if IRQDisable/FIQDisable disables the currently firing interrupt it will restart the priority sorting – that should ensure things work in the most compatible way. That should also allow me to get rid of the IRQClear call I added to RISC OS in the unhandled IRQ handler, so that in the end the only change I’m making to the OS/drivers is to make sure that IRQClear gets called whenever a device driver clears the interrupt flag in a device.
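
The behaviour settled on here, where disabling the currently firing interrupt restarts the priority sorting, might be sketched in C like this. It uses a simulated controller; the `sim_` names and the structure are assumptions for illustration and this is not the actual OMAP HAL code.

```c
#include <stdint.h>

#define NO_IRQ (-1)

/* Simulated priority-sorting controller (hypothetical model, 32 sources). */
typedef struct {
    uint32_t pending; /* 1 = device is asserting its interrupt line */
    uint32_t mask;    /* 1 = interrupt enabled */
    int      latched; /* source held by the sorter, or NO_IRQ */
} sim_intc;

static void sim_init(sim_intc *c)
{
    c->pending = 0;
    c->mask = 0;
    c->latched = NO_IRQ;
}

/* Source query: reports the latched source, sorting only when nothing is latched. */
static int sim_irq_source(sim_intc *c)
{
    if (c->latched == NO_IRQ) {
        uint32_t req = c->pending & c->mask;
        for (int dev = 0; dev < 32; dev++) {
            if (req & (UINT32_C(1) << dev)) {
                c->latched = dev;
                break;
            }
        }
    }
    return c->latched;
}

static void sim_irq_enable(sim_intc *c, int dev)
{
    c->mask |= UINT32_C(1) << dev;
}

/* Disable one source; restart sorting only if it is the one currently firing. */
static void sim_irq_disable(sim_intc *c, int dev)
{
    c->mask &= ~(UINT32_C(1) << dev);
    if (c->latched == dev)
        c->latched = NO_IRQ;
}

/* DisableAll equivalent: mask everything and always restart sorting. */
static void sim_disable_all(sim_intc *c)
{
    c->mask = 0;
    c->latched = NO_IRQ;
}
```

Disabling a source other than the latched one leaves the sorter alone, which matches the traditional controller model where masking an unrelated interrupt has no effect on the one being serviced.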
