Audio Recording API
David Feugey (2125) 2709 posts |
No, but perhaps the new API (limited to 16-bit) could rely on SharedSound and SoundDMA on old systems, and SharedSound and SoundDMA could rely on the new API on new systems. So everyone would be happy. |
Jason Tribbeck (8508) 21 posts |
I’m not sure that’ll help – the underlying implementation doesn’t really matter for developers. If I was developing a new app, then which API do I support? If I need the new features, it’ll be the new API – no question. But if I want to support older machines, why would I use the new API, since the old ones still work? |
Colin (478) 2433 posts |
Which is how SerialUSB works, and it’s horrible – but that’s another story.

With the interface I suggested, all that would be required of a device driver is a SWI to register a buffer fill/empty callback, a SWI to supply information on the channels it supports, and a control SWI. A device could register any and all input or output channels it has. Channel info could be a block of XML text, and the Audio Manager can supply functions to interrogate the XML block. That way the device driver just needs to supply a block of text with its capabilities for each channel it registers, so the device driver has a minimal interface. An actual device always fills in its native format – it is up to the client to decide if it can use the channel. I would write a USB audio module with that interface – if USB changes I would write it again. Existing HAL devices just need their interface changing to the new one.

So, if you started with an Audio Device Manager module: take the guts out of SharedSound and make a mixer device with the new device API, registering an output channel with a PCM 16-bit interface. Take the output from the mixer, resizing it for the channel chosen by the selector from the list of output channels registered with the Audio Device Manager. The implementation of SharedSound would change to use the new Audio Manager; it would select the 16-bit PCM interface. So now you have all the existing apps working with a makeshift 16-bit mixer, and the user can select from all compatible HAL output channels. SoundDMA can be changed to connect to the HAL via the new Audio Device Manager. So basically you just change the implementations of the current system to use the new one. That makes all legacy programs work at 16 bits; you’ve just revamped the interface.

At this point I can write a USB audio back end and register a 16/24/32-bit PCM channel depending on the device. This would appear as a selection to the user and they can choose to use any compatible output channel, so you would have USB audio with legacy apps relatively easily. New apps can use the new API directly to access any registered audio channel at any resolution. At some point you replace the system mixer with one that mixes 32-bit and outputs at the resolution of the selected device – it doesn’t affect apps. At that point everything will work up to 32 bits.

Whether all the flexibility is useful or not is, as you say, debatable, but you could have supported apps register with the Audio Device Manager and the user can select the device for the program. You may have a headset on and be watching YouTube where the sound isn’t mixed – who knows what weird and wonderful things people want to do. Whether the system is capable of using the flexibility is another moot point. |
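As a rough illustration of how small that driver-side surface could be, here is a hypothetical C sketch. The SWI names, reason codes and structures are invented for the example and are not part of any existing RISC OS API:

#include <stdint.h>
#include <stddef.h>

/* Reason codes the manager might pass to the driver's single entry point. */
enum adm_reason {
    ADM_FILL_BUFFER  = 0,   /* output: client data is copied into 'buffer'   */
    ADM_EMPTY_BUFFER = 1,   /* input: captured data is handed to the client  */
    ADM_CONTROL      = 2    /* start/stop, rate change, etc.                 */
};

/* One callback per registered channel; 'buffer'/'bytes' are only meaningful
   for the fill/empty reasons. Returns 0 on success. */
typedef int (*adm_channel_fn)(int reason, void *buffer, size_t bytes, void *pw);

/* Invented registration calls: the driver hands over a capability block
   (XML text, as suggested above) plus its callback, and gets a handle back. */
extern int ADM_RegisterChannel(const char *capability_xml,
                               adm_channel_fn fn, void *pw,
                               uint32_t *channel_handle_out);
extern int ADM_DeregisterChannel(uint32_t channel_handle);

/* A USB DAC back end might register only its native format: */
static const char usb_dac_caps[] =
    "<channel direction='output' encoding='pcm'"
    " bits='24' rates='44100 48000 96000' channels='2'/>";

static int usb_dac_entry(int reason, void *buffer, size_t bytes, void *pw)
{
    switch (reason) {
    case ADM_FILL_BUFFER: /* copy 'bytes' of 24-bit PCM into 'buffer' */ return 0;
    case ADM_CONTROL:     /* handle start/stop/rate change            */ return 0;
    default:              return -1;
    }
}

The point of the sketch is that the driver only ever states what it natively supports; deciding whether a client can use the channel is left to the manager and the client.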
Colin (478) 2433 posts |
Why would you use any extensions to the existing interface? At the very least, adding devices becomes independent of the HAL system and ROOL. |
jim lesurf (2082) 1438 posts |
WRT a question asked upthread: I’d certainly find it useful to be able to use more than one USB audio device at the same time. This enables the user to operate a USB ADC and a USB DAC simultaneously – very useful for test purposes, e.g. for doing measurements on some audio kit. Also useful if someone wants to play some audio out into headphones whilst they are capturing their own singing or playing in sync.

HOWEVER, I suspect only a small minority will want to do such things. Most people will simply want to ‘play audio’. And some will want to do something like use a USB ADC to capture a digital transfer of an old CD, cassette, or some other analogue material. Thus having one device at a time would probably be fine for most users, but a minority would like the ability to run two (or more!).

Bottom line IMV: we really need ONE device at a time to be supported, but more would be good. Having the ability to play/capture stereo at rates > 48k and depths > 16 is the key. |
Rick Murray (539) 13850 posts |
Ah, but surely for that you would want to monitor what was being recorded? Certainly to know when to correctly start and stop recording…plus sometimes one has to fiddle with settings to get a good recording (my EeePC was fine recording with additional bass boost, but it seems to swamp the ADC on the desktop PC).
For playback, yes. If it can support on-the-fly switching (like directing to Bluetooth headphones if switched on, for example 1) then I would imagine for most use cases one device at a time would suffice.
While I would agree that this should be in the spec (along with consideration for the likes of 5.1), how necessary is it that this be implemented in an audio system intended for domestic computers? 1 Selecting and switching output device isn’t the sound system’s problem; but being able to redirect output to a different device without shutting down and restarting everything may well be… |
jim lesurf (2082) 1438 posts |
WRT monitoring a recording: usually the user will have a monitor on the analog input to be able to hear what is being recorded. The recording software at the user level should have decent meters – FWIW that tends to be PPMs for serious work. If you look you can see I included these in the demo recorder I wrote for USB. So although it may be handy for the computer to record and play back (almost) simultaneously via USB, it isn’t vital. FWIW these days I use a Benchmark ADC and DAC, but I used to use a Scarlett 2i2, which provides its own monitoring when recording. That’s a typical device that someone doing stereo recordings or transcriptions will use.

More generally: the reality is that these days it is routine for anyone with a keen interest in audio to record at least at 96k/24. This gives more head and elbow room to avoid clipping or sampling filter imperfections. And 96k/24 is often preferred by people with good hi-fi kit. (Although personally I then dither/shape down to 96k/16 having scaled the levels optimally.) Given that Linux/Mac/Doze these days all support well above 48k/16, and so do most decent ADCs and DACs, it would look silly not to cover this. And I’ve had no problems at all running 96k/24 and 192k/24 through my ARMX6 using the items developed by others. The hardware has no problem with it; we just need the system to deal with it.

BTW a lot of AV is now 24-bit, or at least 20-bit, which then needs 3 bytes per sample. Yes, the Iyonix was crap in this respect, but there is no need at all for such a limit on USB. 96k/24 or 192k/24 isn’t a problem. Indeed, nor is 192k/32, because the usual mode for the USB2 DACs/ADCs is to use 4 bytes per sample with the ‘lowest’ as discardable padding after the transfer.

If someone wants to experiment, have a look at the demo recorders and players. I’ve done both RO and Linux (ROX) examples. You can find them here: http://www.audiomisc.co.uk/software/index.html It’s crap, but it works and makes the point.

Colin: is your player available? That has the advantage of playing FLAC. I never bothered to add that as I know my programming is rubbish. |
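To make the 4-bytes-per-sample point concrete, here is a small illustrative C snippet. It assumes the device left-justifies each 24-bit sample in a 32-bit slot with the low byte as don’t-care padding, which matches the description above; the function names are made up for the example:

#include <stdint.h>

/* Widen a signed 24-bit sample (held in the low 24 bits of sample24)
   into a 32-bit transfer slot: sample in the top 3 bytes, low byte padding. */
static inline uint32_t pack24_to_slot32(int32_t sample24)
{
    return ((uint32_t)sample24 & 0x00FFFFFFu) << 8;
}

/* Recover the 24-bit sample from a received 32-bit slot. */
static inline int32_t unpack_slot32_to24(uint32_t slot)
{
    uint32_t s = slot >> 8;                  /* discard the padding byte */
    if (s & 0x00800000u)                     /* sign-extend from bit 23  */
        s |= 0xFF000000u;
    return (int32_t)s;
}

So a 192k/24 stream costs the same bus bandwidth as 192k/32; the extra byte per sample is simply thrown away after the transfer.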
Colin (478) 2433 posts |
At the risk of frightening people off – shame about the lack of response; a simple “ain’t gonna happen, move on” would do :-)

Alternative 2: I think it can be simplified further. If you go back to basics to see what you want to achieve, you have:

1) The user, who wants to use program x with any device that will work with it.
2) The program, which can only output certain formats, i.e. a simple program to stream a WAV file can only output WAV file formats.

So the user wants to know what devices they can use with a particular program and to be able to select the appropriate one. The device may only do 44100Hz 16-bit 2-channel stereo, but that doesn’t matter – if the user wants to use it, that’s OK. So the Audio Device Manager becomes a means of matching a program to a device it supports. From the programming point of view we want programs and devices to be as easy to write as possible, e.g. we don’t want device drivers to have to implement interface features which don’t apply to the device.

From an audio driver perspective: an audio driver registers a single function with the Audio Manager (AM) for each input or output channel it has. There would be a required set of reason codes for interfacing with the AM. The trick is that each channel registered doesn’t have the same interface; each has the interface of a specific audio class. So an mp3 player would have reason codes to play mp3s, a 2-channel PCM stereo class would have reason codes to play 2-channel PCM stereo, an output channel would have reason codes for an output stream, and an input channel would have reason codes for an input stream. Also the device only implements those class capabilities that are relevant to it. The most important reason code would be a capabilities function which tells the device manager what the device can do and what functions it has, and which can be used to determine if the device is suitable for a program. The 2-channel PCM audio output class could be specified as a list of capabilities –
some are functions, others are info. An audio channel may then advertise a capabilities list of that kind.
The driver would do nothing about capabilities it didn’t have.

Channel IDs are 16-byte hmac_md5 hashes, so programs are just handling 16-byte arrays instead of variable-length strings. For USB the hash can include the location on the USB bus or serial number if available, where it is plugged in, whether it is input or output, what USB interface is used – basically any information which can be used to identify a specific channel, so that the user can leave it plugged in and save it for use next time. The advantage of the hash function is that any amount of data can be thrown at it to identify the device and it gets crunched down to 16 bytes. hmac_md5 is available through the mbedtls library and the AM can provide functions to use it.

From the application: the application can then ask the AM to enumerate the devices which have the capabilities it needs. It provides the AM with a list of the capabilities it requires
to get a list of compatible channels it can handle, and whether they are in use or not. When selecting a compatible channel from the AM, the AM returns a handle to access a specific device. The application can then ask the AM what options are available by passing the handle to the AM with a list of options it would like to see, e.g. which frequencies and level controls are supported.
For the channel specified earlier, the AM then returns the options that channel actually supports.
So the app can use a frequency changer reason code with one of the frequencies listed, but can’t use the volume up reason code. The application can allow the user to choose the device from the list, and as it indicated the in-use devices it gives the user the opportunity to force the use of a specific device. To make things easy for the app, the AM has a selector pop-up window available: the app supplies the list of needed capabilities, and a window pops up with just the channels that have those capabilities for the user to choose from. An mp3 device would have a different class specification, as would a mixer, so an app can ask if any mp3/mixer channels are available and again the options are negotiated like the PCM class.

With that implemented you have a device driver which just implements its capabilities without wondering what to do about an interface that doesn’t quite fit, and an app which just asks for devices it can handle and just uses the functions returned from the feature negotiation.

From the Audio Manager’s point of view, extendability is simple: classes are trivially extendable – if more capabilities are wanted in a class specification, just add them. The device doesn’t care, and a legacy app doesn’t care because it hasn’t asked for them.

Fitting it to RISC OS: after the above is implemented, all audio devices are converted to the new back-end API – each just implementing the features it has and listing them in the capabilities list.
A configure plugin uses the AM channel selector window to allow the user to pick a channel for SharedSound from the list of devices that have SharedSound capabilities. So all current apps work as now, but users can use any compatible audio channel. DigitalCD, for example, could offer the choice of staying with SharedSound or using a device directly with the same capabilities as SharedSound. A 32-bit mixer could be added later if wanted.

Audio input: just add an input class specifying the capabilities of input devices, and the input channel can be registered with the manager, available for use. |
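None of this API exists yet, so purely as an illustration of the shape being proposed, here is a hypothetical C sketch of a driver declaring a capability list and of a 16-byte channel ID being derived with HMAC-MD5 via mbedtls. The tags, structures and AudioMan behaviour are all invented for the example:

#include <stdint.h>
#include <stddef.h>
#include "mbedtls/md.h"

/* A capability entry: a tag plus either an info value or the reason code
   the channel implements.  Tags and layout are placeholders only. */
typedef struct {
    uint32_t tag;
    uint32_t value;
} adm_capability;

enum { CAP_CLASS = 1, CAP_BITS = 2, CAP_RATE = 3, CAP_CHANNELS = 4,
       CAP_FILL_FUNCTION = 5, CLASS_PCM2CH_OUT = 0x40 };

/* What a simple 2-channel 16-bit PCM output device might declare. */
static const adm_capability pcm_out_caps[] = {
    { CAP_CLASS,         CLASS_PCM2CH_OUT },
    { CAP_BITS,          16 },
    { CAP_RATE,          48000 },
    { CAP_CHANNELS,      2 },
    { CAP_FILL_FUNCTION, 0 },     /* reason code 0 = fill buffer */
};

/* Crunch any identifying data (bus location, serial number, direction...)
   down to the fixed 16-byte channel ID described above. */
static void adm_make_channel_id(const void *ident, size_t len,
                                uint8_t id_out[16])
{
    static const unsigned char key[] = "AudioDeviceManager";   /* fixed key */
    const mbedtls_md_info_t *md5 = mbedtls_md_info_from_type(MBEDTLS_MD_MD5);
    /* A real implementation would check md5 != NULL and the return code. */
    (void)mbedtls_md_hmac(md5, key, sizeof key - 1, ident, len, id_out);
}

An application would then hand a similar array of required capabilities to a hypothetical AudioMan_EnumerateChannels call and get back the IDs of the channels that satisfy it; anything not in the list is simply never offered to that application.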
Colin (478) 2433 posts |
Jim. IsocPlayer |
Grahame Parish (436) 481 posts |
@Colin – That sounds very sensible and flexible to me. It would be great to be able to make full use of the sound capabilities of the hardware (and whatever additional hardware gets plugged in). Hopefully this will be available across as many of the hardware platforms as it is possible to support, not just the RasPi. |
Dave Higton (1515) 3534 posts |
It is entirely reasonable to want one audio input device and one audio output device to work simultaneously. But is there a good enough reason to support more than one output device, or more than one input device, simultaneously? I think not. I can see the importance of multiple synthesised sound sources being mixed, for playing music. Add in a system beep too. |
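Mixing several synthesised sources (plus a beep) is, at its simplest, just summing samples with some headroom and clamping; a rough C illustration, not tied to any existing RISC OS mixer:

#include <stdint.h>
#include <stddef.h>

/* Sum 'nsrc' 16-bit streams into one output buffer, clamping so the sum
   saturates rather than wrapping when it exceeds the 16-bit range. */
static void mix16(int16_t *out, const int16_t *const *src,
                  size_t nsrc, size_t nsamples)
{
    for (size_t i = 0; i < nsamples; i++) {
        int32_t acc = 0;                       /* 32-bit headroom */
        for (size_t s = 0; s < nsrc; s++)
            acc += src[s][i];
        if (acc >  32767) acc =  32767;
        if (acc < -32768) acc = -32768;
        out[i] = (int16_t)acc;
    }
}

A real mixer would also apply per-source gain, and would probably work at 32 bits internally and only reduce to the output device’s resolution at the end, as suggested elsewhere in the thread.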
Rick Murray (539) 13850 posts |
Yeah… you’re giving away the fact that you are not an average user. The average user will have a turntable or walkman or something, which has a headphone jack. A cable between the device and the computer. And, of course, the expectation that the computer will be able to reproduce the sounds that it is recording.
You’ll find the meters tend to be colour coded (green, then yellow, then red) as the average person probably cannot read them, other than “green good, red bad”. Plus, knowing how loud the recording is doesn’t tell you anything about how the recording sounds. Perhaps there’s a buzz because one of the plugs isn’t in all the way. Or maybe you need to restart the recording because your mobile phone spewed da-duh-da-duh-brrr-da-duh-da-duh in the middle of it?
Just noticed, to stir things up a little, that the default recording setting in Audacity appears to be 44.1kHz 32bit float. |
Rick Murray (539) 13850 posts |
2 output devices? Nothing comes to mind. |
Alan Robertson (52) 420 posts |
One device will send to a Bluetooth speaker. Another device sends output to the computer’s own speakers. |
jim lesurf (2082) 1438 posts |
The ‘average user’ may depend on who you are making your average over. However, as far as I can tell the Scarlett 2i2 is about the most popular ADC people use, and it is both an ADC and a DAC with its own monitor. The point here is that we are discussing using USB audio, not a jack socket on the casework. That doesn’t exclude someone else ensuring such a non-USB input works, even if it does a crappy job – like the old Iyonix. FWIW the 2i2 I have is the original version; the current one is better – goes to 192k/24 IIRC. It also has level indicator LEDs and you can monitor it. Anyone with a record deck is going to need an amp, etc., to listen to it anyway, and thus can use that to monitor.

Audacity uses 32-bit float internally IIRC, as this minimises the risk of damaging the audio info when processing. You can therefore also work in this format. Even studios tend to record no more than 24 or 32-bit int, because that’s what ADCs tend to operate on. Thus in practice your main concern is to ensure that you record – “pass the parcel” – what the ADC outputs through USB, even if the data is then going into Audacity. (Although personally I prefer to record direct 24-bit to a file to ensure this is pass the parcel.)

I don’t want to stop anyone from ensuring you can run an ADC and DAC in parallel to monitor IF it doesn’t hamper getting what is basically required. If it is easy enough, fine. But if we want RO to compete in terms of decent audio quality we will need USB audio working for both capture and playback. And devices like the 2i2 provide monitoring for the user. |
jim lesurf (2082) 1438 posts |
IIRC even some of the cheap-as-chips Behringer ADCs provide a (headphones) monitor even though they have no level indicators. The reality, though, with using the old-fashioned input socket on a computer is that it tends to be poor quality compared with even a modest USB device. |
jim lesurf (2082) 1438 posts |
Colin: Thanks for adding the link to your player and for what you are doing. |
Colin (478) 2433 posts |
This is how I see it working in practice
|
Colin (478) 2433 posts |
We just need the HAL audio device taken out of SoundDMA – a sort of HAL abstraction. |
Jason Tribbeck (8508) 21 posts |
So we’ve identified two types of users: standard, and power users. We don’t want two APIs, but we also don’t want to overengineer something that isn’t going to be used. The basic options for hardware support are:
There’s probably no point in going to “two…”, because once you have more than one, you may as well make it “many…”.

Do we want arbitrary sample rates, or just the “favourites” (as mentioned above)? Supporting a small list would simplify playback/record code (although dropping support for 44.1kHz would make it even easier – but I suspect controversial!).

I’m not entirely convinced by the enumerations above – I would probably have a bit-field-like arrangement, so bits 24..31 would be the class, 16..23 would be the ‘function’ (grouping?), and 0..15 would be a bit field. For example:
#define Class_PCM2ChOut       (0x40 << 24)

// Standard functions
#define Function_Resolution   (0x01 << 16)
#define Function_Frequency    (0x02 << 16)
#define Function_ChannelCount (0x03 << 16)

// PCM2ChOut functions

This would be a bit more compact, and easier to get what the capabilities are – although searching for a particular capability would be a little harder. We’d need to consider what happens when we run out of something, and I was considering class-specific functions being 0x80..0xff. A device would declare a list such as:
(Note: Not used to code formatting yet!) |
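As a hypothetical illustration of how a device might declare such a list under that class/function/bit-field scheme – the value bits in the low 16 bits are invented for the example – a declaration could look like this:

#define Class_PCM2ChOut       (0x40 << 24)

#define Function_Resolution   (0x01 << 16)
#define Function_Frequency    (0x02 << 16)
#define Function_ChannelCount (0x03 << 16)

// Example value bits (invented): resolutions and sample rates the device does.
#define Res_16Bit             (1 << 0)
#define Res_24Bit             (1 << 1)
#define Freq_44k1             (1 << 0)
#define Freq_48k              (1 << 1)
#define Freq_96k              (1 << 2)

// A 2-channel output device supporting 16/24-bit at 44.1/48/96kHz might
// declare itself with a zero-terminated list of capability words:
static const unsigned int device_caps[] = {
    Class_PCM2ChOut | Function_Resolution   | Res_16Bit | Res_24Bit,
    Class_PCM2ChOut | Function_Frequency    | Freq_44k1 | Freq_48k | Freq_96k,
    Class_PCM2ChOut | Function_ChannelCount | (1 << 1),   /* bit 1 = 2 channels (a guess) */
    0
};

Searching such a list for a particular capability is then a masked compare of the top 16 bits of each word against the wanted class and function.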
David Feugey (2125) 2709 posts |
Just to have one code for old and new computers.
And for this too :) |
Rick Murray (539) 13850 posts |
I would think that it might make sense to have provision in the specification for custom sample rates. But for what is actually implemented, stick to the common ones or it’ll never be finished. :-) Oh, and…
…given it’s the official CD audio standard, all of my mp3s are encoded at this sample rate (using various different tools along the way – so I’m probably not alone), and YouTube uses 48kHz for Opus/WebM but 44.1kHz for M4A audio. |
Rick Murray (539) 13850 posts |
Just checked, the ones I’ve bought from Amazon… my phone says they’re 44kHz. |
Colin Ferris (399) 1818 posts |
I suppose one couldn’t have 24-bit samples used by older progs. |
Colin (478) 2433 posts |
If your code has no blank lines, put ‘bc. ’ – that’s a space after the dot – at the beginning of the first line. If the code has a blank line in it, use ‘bc.. ’ – again a space after the dot – at the beginning of the first line, and ‘p. ’ at the start of a normal paragraph after it.

With my system you can inherit other classes; for example my PCM2ChOut_Class inherits the channel class. The channel class can be extended at any time with, say, DescriptionLong. You never have to worry about bit availability if, for example, an interface becomes deprecated. With AudioMan_EnumerateChannels I can match capabilities for any class. I can add capabilities as tags. I can add a HAL class and have HAL devices use these codes; I can then find a PCM2ChOut_Class, HAL_Class, HAL_Default. I can list HAL devices for people to pick a default device in the event that a device is unplugged. Enumeration is important, as the capabilities list will be used to filter the devices presented in a pop-up window for users to select from. It’s trivial to implement – I wrote the basic module in a couple of hours – 206 lines of code. The SWI code just looks like this.
|
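For readers unfamiliar with RISC OS modules, a SWI handler in a CMHG-built module generally takes the shape sketched below; the SWI offsets, names and error text here are invented and are not Colin’s code:

#include "kernel.h"   /* _kernel_swi_regs, _kernel_oserror (RISC OS DDE) */

/* Hypothetical SWI offsets within the module's SWI chunk. */
enum {
    AudioMan_RegisterChannel_Offset   = 0,
    AudioMan_EnumerateChannels_Offset = 1,
    AudioMan_DeregisterChannel_Offset = 2
};

/* 0x1E6 is the standard "no such SWI" error number. */
static _kernel_oserror bad_swi = { 0x1E6, "Unknown AudioMan operation" };

/* CMHG-style SWI handler: 'swi_offset' is the SWI number minus the chunk
   base; register values come in and go out through 'r'. */
_kernel_oserror *swi_handler(int swi_offset, _kernel_swi_regs *r, void *pw)
{
    (void)pw;
    switch (swi_offset) {
    case AudioMan_RegisterChannel_Offset:
        /* r->r[0] = capability list, r->r[1] = channel function, ...   */
        return NULL;
    case AudioMan_EnumerateChannels_Offset:
        /* r->r[0] = wanted capabilities; matches returned in r->r[0..1] */
        return NULL;
    case AudioMan_DeregisterChannel_Offset:
        return NULL;
    default:
        return &bad_swi;
    }
}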