Audio Recording API
Dave Higton (1515) 3534 posts |
When I wrote the USBAudio module, I always envisaged that there would be another layer above it so as to fit in with the sound API of RISC OS. It looked to me like it would be easier to do in two layers than one (and it still does). |
Jason Tribbeck (8508) 21 posts |
I’m hoping that the USBAudio module would effectively become an audio driver – i.e. where I’d put “HAL+Hardware”, so it would provide ‘interrupts’ to say ‘fill me’, or ‘empty me’ (depending on whether you’re playing or recording). |
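To make the ‘fill me’/‘empty me’ idea concrete, here is a minimal C sketch of the kind of callback pair such a driver layer might expose – every name here is hypothetical, nothing like this exists in USBAudio today:

    /* Hypothetical sketch only: callbacks the driver could invoke so the
     * layer above is told "fill me" (playback) or "empty me" (recording). */

    #include <stddef.h>
    #include <stdint.h>

    /* Driver needs 'frames' more sample frames to play; return frames written. */
    typedef size_t (*audio_fill_fn)(void *handle, int16_t *buffer, size_t frames);

    /* Driver has 'frames' captured sample frames ready; return frames taken. */
    typedef size_t (*audio_empty_fn)(void *handle, const int16_t *buffer, size_t frames);

    typedef struct {
        audio_fill_fn  fill;     /* NULL if not playing          */
        audio_empty_fn empty;    /* NULL if not recording        */
        void          *handle;   /* passed back to the callbacks */
    } audio_client;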
Steve Pampling (1551) 8172 posts |
One of those sad bits of history, which ROD are trying to address. The means of addressing things is largely by reinventing, because the alternative method (using the RO4.39 source) is not an available option. This might change, but that change is outwith the control of anyone associated with RO5, although the door is open. Back on topic, I’m happy to wait a few weeks for Jason to complete his work :) |
Colin (478) 2433 posts |
I wrote a replacement SoundDMA module to work just with USBAudio, to see what was needed to fit in with RISC OS audio – it didn’t work – I vaguely remember the API was not a good fit for USBAudio. Can’t remember the details; it was a long time ago. |
Jason Tribbeck (8508) 21 posts |
Okay – we’ll cross that bridge later. If a new USB stack is on the cards, it might be easier then… |
Colin (478) 2433 posts |
I’d also point out that if I had an expensive DAC plugged in, I’d want the data arriving at my DAC unmolested by mixers or any other digital filter, apart from changing the size of the sample to fit the device. There is also the possibility that the device does MP3 decoding or, as mentioned elsewhere, multi-channel output. |
Colin (478) 2433 posts |
In that case I’d forget about USB. Bluetooth can also be USB. USB has always held the promise of drivers for everyone, instead of just for the Pi or some other computer; it’s a shame it has taken second place to HAL devices. |
Jason Tribbeck (8508) 21 posts |
I’m in two minds about SharedSound clients. If they use the helper functions, then it should be okay, but if they do their own mixing, then there’ll be problems with supporting higher resolutions. I was worried about those doing mixing themselves – I don’t suppose anyone knows if any do? |
Jeffrey Lee (213) 6048 posts |
I think all the SharedSound clients that I’ve written do the mixing themselves. And it wouldn’t surprise me if that’s the same for most other programs. The fact that SharedSound’s mixing routine only supports stereo 16-bit input (with “RISC OS native” stereo ordering) means that it’s useless for a great number of cases (mono input, stereo-reversed input, non-LPCM input, any per-sample processing you want to perform like volume fading or a dynamic voice generator, etc.). Mixing into a temporary buffer which is in a SharedSound-friendly format, and then passing that buffer to SharedSound’s own mixer, would typically be less efficient than doing a direct mix into SharedSound’s buffer. |
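As an illustration of the sort of direct mixing described above (not the SharedSound API itself – the names here are invented), a client adding a mono 16-bit source straight into a stereo 16-bit mix buffer with saturation might look like this:

    #include <stddef.h>
    #include <stdint.h>

    /* Clamp a 32-bit intermediate value back into the 16-bit sample range. */
    static inline int16_t sat16(int32_t x)
    {
        if (x >  32767) return  32767;
        if (x < -32768) return -32768;
        return (int16_t)x;
    }

    /* 'mix' holds interleaved L/R pairs already containing other clients'
     * contributions; the mono source is added to both channels. */
    void mix_mono_into_stereo(int16_t *mix, const int16_t *mono, size_t frames)
    {
        for (size_t i = 0; i < frames; i++) {
            mix[2 * i]     = sat16((int32_t)mix[2 * i]     + mono[i]);  /* left  */
            mix[2 * i + 1] = sat16((int32_t)mix[2 * i + 1] + mono[i]);  /* right */
        }
    }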
Doug Webb (190) 1180 posts |
Where would something like SharedSoundBuffer and StreamManager – provided by John Duffell and used by RDPClient to direct and play sound on either the Windows server being attached to or the RISC OS client – come into this? I am not sure client-based sound works on more modern machines. SharedSoundBuffer is used to play raw sound from any application via the SharedSound module, and StreamManager buffers it. |
Jason Tribbeck (8508) 21 posts |
Okay – I’ve written my thoughts on SharedSound and the problems with it – and why it’s something to consider down the line. And a few other bits and pieces brought up in this thread. |
Doug Webb (190) 1180 posts |
Thanks Jason for a very informative document covering SharedSound and also for taking a look at the two modules I mentioned as well. Look forward to seeing what comes of all this. |
Colin (478) 2433 posts |
I don’t think your description of the implementation of the device using a ping-pong buffer is relevant to the API; Bluetooth and USB won’t work like that. If I were starting from scratch I’d do something like this:
Each device has a features list, which may include channels for 2-channel PCM stereo, 5-channel surround, MP3, etc. The mixer is implemented as a device with a 2-channel stereo feature list, and the selector chooses the device picked by the user; the selector would only choose from devices with 2-channel PCM stereo. The mixer has the same API as a device. Each app can choose any channel on any device – as long as it is available.
The device interface for the 2-channel stereo channel is in the native format of the device, so may be 16-bit, 24-bit or 32-bit. Sound is output by registering a buffer fill function with the device. The fill function also informs an app that the device has disconnected.
The device manager API that apps use has a 16-bit, 24-bit and 32-bit interface for 2-channel PCM stereo. The app selects the size depending on the sample size in the music it has – it doesn’t resize samples. The device manager resizes the samples to fit the device: if the input sample size equals the output sample size it just passes the app’s buffer fill function to the device, otherwise it registers its own function with the device and does the conversion. If the app is able to output sound in any sample size it can ask the device manager for the native format. Other feature channels are passed through to the device. The device manager can list devices for an application to match to its needs.
Given that as a starting point, I think the existing sound API could remain with its modules, and the modules could be made to redirect through the new system. |
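A very rough C sketch of the device-manager idea above, purely to pin down the moving parts – every name and type here is hypothetical and nothing like this exists yet:

    #include <stdbool.h>
    #include <stddef.h>
    #include <stdint.h>

    typedef enum {
        FEATURE_PCM_STEREO,      /* 2-channel PCM stereo    */
        FEATURE_PCM_SURROUND_5,  /* 5-channel surround      */
        FEATURE_MP3              /* compressed pass-through */
    } audio_feature;

    typedef enum { SAMPLE_16, SAMPLE_24, SAMPLE_32 } sample_size;

    /* Fill callback: write up to 'frames' frames, return frames written;
     * 'connected' is cleared when the device has gone away. */
    typedef size_t (*fill_fn)(void *handle, void *buffer, size_t frames,
                              bool *connected);

    typedef struct {
        const char          *name;
        const audio_feature *features;    /* what the backend reports  */
        size_t               n_features;
        sample_size          native_size; /* native 2-channel PCM size */
    } audio_device;

    /* The app asks for the sample size matching its material; the device
     * manager only inserts a conversion stage if 'wanted' differs from the
     * device's native size, otherwise the fill function goes straight through. */
    int audio_register_fill(audio_device *dev, audio_feature feature,
                            sample_size wanted, fill_fn fill, void *handle);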
Colin (478) 2433 posts |
As we are registering a fill routine with an output device, we need an input device to supply an emptying function, so that the input can be tied directly to the output. |
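A small sketch, again with invented names, of how an input device’s ‘empty’ callback could be tied straight to an output device’s ‘fill’ callback through a ring buffer:

    #include <stddef.h>
    #include <stdint.h>
    #include <string.h>

    #define RING_FRAMES 4096           /* stereo 16-bit frames */

    static int16_t ring[RING_FRAMES * 2];
    static size_t  ring_in, ring_out;  /* running frame counts */

    /* Input device: "empty me" – captured frames are pushed into the ring.
     * (Overrun and locking are not handled in this sketch.) */
    size_t capture_empty(void *handle, const int16_t *frames, size_t n)
    {
        (void)handle;
        for (size_t i = 0; i < n; i++) {
            ring[(ring_in % RING_FRAMES) * 2]     = frames[2 * i];
            ring[(ring_in % RING_FRAMES) * 2 + 1] = frames[2 * i + 1];
            ring_in++;
        }
        return n;
    }

    /* Output device: "fill me" – drain the ring, padding with silence on underrun. */
    size_t playback_fill(void *handle, int16_t *out, size_t n)
    {
        (void)handle;
        size_t done = 0;
        while (done < n && ring_out < ring_in) {
            out[2 * done]     = ring[(ring_out % RING_FRAMES) * 2];
            out[2 * done + 1] = ring[(ring_out % RING_FRAMES) * 2 + 1];
            ring_out++;
            done++;
        }
        memset(out + 2 * done, 0, (n - done) * 2 * sizeof(int16_t));
        return n;
    }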
André Timmermans (100) 655 posts |
Most of them do the mixing on their own. The first reason is that the mixing routines were only added in SharedSound 1.00, IIRC. Next, the routines are limited to stereo data, so they are not useful for old-school tracker music or MIDI emulation (you’re mixing between 0 and 128 sounds, each with its own sample rate/pitch), mixing multiple MP3s, or downsampling 5.1 audio. The routines also don’t perform any kind of interpolation.
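For illustration, the kind of per-voice mix loop a tracker or MIDI player needs – each voice stepping through its sample at its own rate, with linear interpolation – might look like this (hypothetical code, not taken from any existing player):

    #include <stddef.h>
    #include <stdint.h>

    typedef struct {
        const int16_t *sample;   /* mono source data                  */
        size_t         length;   /* in samples                        */
        uint32_t       pos;      /* 16.16 fixed-point play position   */
        uint32_t       step;     /* 16.16 step: voice rate / out rate */
        int32_t        volume;   /* 0..256                            */
    } voice;

    /* Add one voice into a mono 32-bit accumulation buffer, resampling it
     * with linear interpolation as it goes. */
    void mix_voice(int32_t *accum, size_t frames, voice *v)
    {
        for (size_t i = 0; i < frames && (v->pos >> 16) + 1 < v->length; i++) {
            uint32_t idx  = v->pos >> 16;
            int32_t  frac = (int32_t)(v->pos & 0xffff);
            int32_t  d    = v->sample[idx + 1] - v->sample[idx];
            int32_t  s    = v->sample[idx] + (int32_t)(((int64_t)d * frac) >> 16);
            accum[i] += (s * v->volume) >> 8;
            v->pos   += v->step;
        }
    }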
Looks fine to me: if the destination is configured to use [16-bit, 2ch], you pass the filling buffer directly to the clients; if not, you use an intermediate [16-bit, 2ch] buffer for the standard clients, use the filling buffer for the other clients, and finally merge the intermediate buffer into it afterwards. Client-wise, for non-standard clients, I think they should provide an enumeration of the formats they support; they should always support [16-bit, 2ch] and should not expect to always be called with the same format. Let’s say that you unplug your 32-bit high-fidelity USB card (or your HDMI monitor) and SharedSound reverts to internal sound: non-standard clients will now be expected to fill [16-bit, 2ch] buffers. |
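A sketch of that final merge step, assuming for illustration a 32-bit filling buffer and a [16-bit, 2ch] intermediate buffer (names invented):

    #include <stddef.h>
    #include <stdint.h>

    /* Standard clients have already mixed into 'intermediate'; non-standard
     * clients have mixed into 'filling'.  Scale the intermediate mix up to
     * 32 bits and add it in, with saturation. */
    void merge_intermediate(int32_t *filling, const int16_t *intermediate,
                            size_t samples /* total L+R sample count */)
    {
        for (size_t i = 0; i < samples; i++) {
            int64_t sum = (int64_t)filling[i] + (int64_t)intermediate[i] * 65536;
            if (sum > INT32_MAX) sum = INT32_MAX;
            if (sum < INT32_MIN) sum = INT32_MIN;
            filling[i] = (int32_t)sum;
        }
    }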
Jeffrey Lee (213) 6048 posts |
Yes, having all the 16-bit stereo SharedSound clients mix into a shared 16-bit buffer sounds good to me. |
Colin (478) 2433 posts |
16-bit doesn’t matter: just because a device supports 16-bit audio, it doesn’t follow that the device uses a 16-bit sample size. You let the application do what it can and have the operating system fit it to the device if necessary. |
André Timmermans (100) 655 posts |
It’s an idea, but you may end up with application A using [16-bit, 2ch], B [32-bit, 2ch], C [16-bit, 5.1], … and the need to re-merge all these buffers for the target device. |
Colin (478) 2433 posts |
No – the only mixing is on 16-, 24- or 32-bit 2-channel stereo PCM channels; all other channel formats are for use by applications as necessary, on a one-to-one basis. The target channels for the system mixer are only those devices that support 2-channel PCM, and are selected via a plugin – all backends report their capabilities via the device manager. Apps don’t need to use the system mixer; they can do what they like, even provide an alternative mixer for use by the selector. It’s all modular, based on the backend API; apps only see the device manager API.
At some stage you have to merge these three sizes into the device sample format. The app doesn’t want to do this, and the device driver doesn’t want to do this, so let the operating system do it if it is necessary – that makes writing the app and writing the device driver easier. A surround-sound mixer could be added as a separate device if it was needed. You could extend capabilities by adding ‘devices’ to process incoming data and then output through actual devices, like the mixer module in my sketch – you could add an MP3 processor, for example. |
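The sample-size conversion being pushed onto the operating system here is straightforward; a sketch (illustrative only – no such calls exist in the current Sound system) of widening 16-bit samples to a 32-bit device format and narrowing back again:

    #include <stddef.h>
    #include <stdint.h>

    /* Widen 16-bit samples into a 32-bit device buffer (sample in the top bits). */
    void widen_16_to_32(int32_t *dst, const int16_t *src, size_t samples)
    {
        for (size_t i = 0; i < samples; i++)
            dst[i] = (int32_t)src[i] * 65536;
    }

    /* Narrow 32-bit samples back to 16-bit, discarding the low bits. */
    void narrow_32_to_16(int16_t *dst, const int32_t *src, size_t samples)
    {
        for (size_t i = 0; i < samples; i++)
            dst[i] = (int16_t)(src[i] / 65536);
    }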
jim lesurf (2082) 1438 posts |
Ok, some views/thoughts… :-) The main feeling I have is that:
1) Being able to both record and play out via USB is vital. Pretty much the rest of the non-studio world of audio has gone that way. If you want a good ADC or DAC it will be one that uses USB. Only being able to use non-USB ‘sockets’ on a machine ties you to their level of performance, whereas USB sets you free to choose. FWIW I can now use studio/lab quality ADCs and DACs, and so do many serious audiophiles and pros. But they require USB. Given that Colin and Dave gave us a way in here some time ago, it is the rest of the RO system that lags. That then shows up with irritations like the fact that I can’t use DigitalCD to play out via USB, so have to use USB-aware specific software like the demo programs they and I wrote. Which work, but don’t compete with something like DigitalCD in terms of facilities/flexibility.
2) It should be possible to choose if the data goes through a ‘mixer’ or ‘direct’. I’ve now also been using Linux for years, and when sorted, it works well for audio via USB. But the persistent way distro developers take for granted that all users will want a ‘mixer’ in the path can be a PITA. Fortunately, the application-level programs generally let you choose. Alas, standard RO software doesn’t! (yet!!)
3) These days 16-bit isn’t enough. So if the main or ‘mixer’ or ‘SharedSound’ setup is to catch up with other OSs it must handle 24-bit and even 32-bit.
I’m probably largely echoing the thoughts of others; if so, this is my way of saying ‘hi’ to this thread. I do hope we can get somewhere and I can use my USB kit as easily with RO as with Linux sometime soon! |
jim lesurf (2082) 1438 posts |
TBH I tend to feel that multichannel is better suited to HDMI, as a distinct issue for AV rather than general audio. |
Steve Pampling (1551) 8172 posts |
Perhaps a video conference call involving Jason and the three would speed things up. Chat through possibilities and then come back to the forum with any revisions? |
Colin (478) 2433 posts |
It’s simple really: I’m all for freezing the current system and writing a totally new one – with the SWI names based on ‘Audio’ and not ‘Sound’ to separate old from new. Then patch the old system to use the new. |
jim lesurf (2082) 1438 posts |
Getting !DigitalCD to ‘talk to’ USB would probably be a very good idea in the short/medium term, as it would at least make it easier for more people to find that they can use USB DACs and get improved results. I suspect the key challenge there is 24-bit, not the sample rates. Which leads on to Colin’s comment about maybe adding a new set of SWIs, etc, with a prefix like ‘Audio’ for these ‘new’ uses, and having it bypass the parts of the old ‘Sound’ call setup that are either irrelevant or not up to more than 16 bits per sample. I think this is what we really need for the longer term. So the question is whether ‘fixing !DigitalCD’ would detract from this or hold it up / divert effort? Also need to bear in mind the ‘recording’ side of this, as per the thread title. Linux has Pulse as well as ALSA (not to mention at least one other audio system!), although I admit I’d personally like Pulse to be beaten to death with a big heavy stick! 8-] Above said, I’m looking at this as a mere (would-be) user, so leave the magic incantations to those who know. |
Jason Tribbeck (8508) 21 posts |
Jumping around a bit…
USB and Bluetooth will be different, but the process would be broadly the same: the buffer gets filled (probably under a timer interrupt), and then the USB HAL interface can pull data off the buffer and put it into the USB stack’s buffer, which then goes to the USB device’s buffer – so it is kinda multi-buffered (actually, I2S can also be multi-buffered – for example the Broadcom chip in the Raspberry Pi has a very much smaller buffer that gets filled by DMA [16 words?]).
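A rough sketch of that two-stage flow – a timer tick topping up a staging buffer, and a (hypothetical) USB HAL interface pulling transfer-sized chunks off it; all names are invented for illustration:

    #include <stddef.h>
    #include <stdint.h>
    #include <string.h>

    #define STAGE_BYTES    8192
    #define TRANSFER_BYTES 1024   /* payload of one USB transfer */

    static uint8_t stage[STAGE_BYTES];
    static size_t  stage_used;

    /* Timer tick: ask the client to top up the staging buffer.
     * (Locking against the USB side is omitted in this sketch.) */
    void on_timer_tick(size_t (*fill)(uint8_t *buf, size_t bytes))
    {
        stage_used += fill(stage + stage_used, STAGE_BYTES - stage_used);
    }

    /* USB side: pull one transfer's worth of data when the stack asks for it. */
    size_t on_usb_request(uint8_t *transfer_buf)
    {
        size_t take = stage_used < TRANSFER_BYTES ? stage_used : TRANSFER_BYTES;
        memcpy(transfer_buf, stage, take);
        memmove(stage, stage + take, stage_used - take);
        stage_used -= take;
        return take;
    }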
I had considered only having a single mixer as an intermediate step (where SharedSound only works with one piece of hardware at a time), and I felt that multiple devices would be preferred by others. But now I’m wondering whether having multiple LinearHandlers running at the same time even makes sense. What do people think? Do people only use one audio device at a time? Is there any benefit in having more?
I have a number of USB interfaces on my PC for audio creation, and they are pretty sweet! However, I don’t really want to get into the USB side at this moment in time – especially if the USB stack is going to be rewritten.
Just in case anyone didn’t know, applications can choose to implement LinearHandler or SharedSound to give both options. I like that as an option, and I’m a little uncomfortable with forcing applications to have their sound mixed with other apps. But having apps only require a single API would make development easier.
Creating a new one from scratch would mean application developers would need to support maybe three sound APIs initially (depending on how far they want their software to be used). That worries me a bit: do developers want to do this? The fact that all three of them ultimately end up going through the same code doesn’t really make any difference.
I’m up for it. Just not when I’m at work :) |