Audio Recording API
Colin (478) 2433 posts |
If fully comprehensive samplerates (I changed my mind and decided against frequency) are required, I think we need to go the USBAudio class route – a list of samplerate ranges with a step size through each range. It can always be added later, but it’s not necessary in order to get SoundDMA to output sound to any compatible device; after all, we only need to support a certain range for the current system. Once you start thinking about what is possible you end up with USB descriptors. |
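To make the "list of samplerate ranges with a step size" idea concrete, here is a minimal sketch. The `RateRange` type, the `supports` helper and the example device are all hypothetical names made up for illustration; they are loosely inspired by the way USB audio devices describe rates as low/high/step ranges, not taken from any existing RISC OS API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RateRange:
    """One range of sample rates in Hz (hypothetical structure)."""
    low: int    # lowest rate in the range
    high: int   # highest rate in the range (inclusive)
    step: int   # spacing between rates; 0 means "only the rate `low`"

    def contains(self, rate):
        """True if `rate` is one of the rates described by this range."""
        if rate < self.low or rate > self.high:
            return False
        if self.step == 0:
            return rate == self.low
        return (rate - self.low) % self.step == 0


def supports(ranges, rate):
    """True if any range in the device's list contains `rate`."""
    return any(r.contains(rate) for r in ranges)


# Made-up device: 44.1k as a single rate, plus 48k to 192k in 48k steps.
EXAMPLE_DEVICE = [
    RateRange(44100, 44100, 0),
    RateRange(48000, 192000, 48000),
]
```

With a structure like this, a selection window or an application only needs `supports(device_ranges, wanted_rate)` to decide whether a device should be offered.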
Colin (478) 2433 posts |
We could go the samplerate enumeration function route. I had envisaged letting the selection window generate its list from a list of capabilities passed to it by an application in a Wimp message, but we could just pass a list of handles. I was thinking that a device’s samplerate range may not be suitable for selection, so you wouldn’t want it to appear, but if we just pass handles then the app can decide whether it wants a device to appear. |
jim lesurf (2082) 1438 posts |
WRT sample rates, etc: In general people stay with 44.1k * N and 48k * N. i.e. those base rates and their integer multiples. Pretty much all USB DACs/ADCs support these up to a given value of N. There is some use of the 32k rate. I’ve seen some USB DACs that support this. It is a hangover from the BBC inventing and using it. Which they still do for sending audio to their FM transmitters. But in general pro TV, etc long ago changed to 48k/24 for stereo. There will still be some ‘ridiculously-low-rate’ material about. Indeed, I recently committed the sin of putting up some 16k sample rate ultra-low mp3 items on the web. But of course this can simply be upsampled by a player as the sound quality is crap anyway in hifi terms. Short version - I’d say support 44.1k * N and 48k * N and if simple, also 32k. Ramble - I’d regard the rest as the problem for the playing software unless you plan to add in a resampler call/layer. Which complicates things if you want good performance. Chances are any crap-rate material will be an /N of the above, so could be linearly interpolated fairly simply given the poor start-quality not requiring anything better. (If it does, the user should get sox and use it first! :-) ) |
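As a rough illustration of the "short version" above, the sketch below checks whether a rate belongs to the 44.1k * N / 48k * N families (plus 32k) and upsamples an /N-rate signal by simple linear interpolation. The function names and the `max_multiple` limit are assumptions for the example; linear interpolation is only the "good enough for poor material" approach described in the post, not a high-quality resampler.

```python
BASE_RATES = (44100, 48000)


def is_core_rate(rate, max_multiple=4, allow_32k=True):
    """True for 44.1k*N or 48k*N (N = 1..max_multiple), optionally 32k."""
    if allow_32k and rate == 32000:
        return True
    return any(rate == base * n
               for base in BASE_RATES
               for n in range(1, max_multiple + 1))


def upsample_linear(samples, factor):
    """Upsample a mono sample list by an integer factor.

    Each gap between neighbouring samples is filled with evenly spaced
    values on the straight line between them; the final sample is held.
    """
    if factor < 1:
        raise ValueError("factor must be a positive integer")
    out = []
    for i, cur in enumerate(samples):
        nxt = samples[i + 1] if i + 1 < len(samples) else cur
        for k in range(factor):
            out.append(cur + (nxt - cur) * k / factor)
    return out
```

For instance, 16k material would be passed through `upsample_linear(samples, 3)` to reach 48k, which `is_core_rate` accepts.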
jim lesurf (2082) 1438 posts |
Having written the above I can’t resist adding: http://jcgl.orpheusweb.co.uk/history/people/Tapes/Tapes.html if anyone is interested, the above page has the series of ultra-low-rate mp3 files I referred to. Given that they are speech, they survive the low rate surprisingly well, so some of them can be used as a demo/test. Although others were badly recorded in the first place and are noisy, etc. NOT ‘hi-fi’ material! 8-] |
Steffen Huber (91) 1953 posts |
I think it was actually an international broadcasters’ agreement on digital radio transmissions. German “DSR” (Digitales Satelliten Radio) certainly was 16bit/32kHz when it started in 1989, and I think the first prototype receivers go back to the beginning of the 80s. |
jim lesurf (2082) 1438 posts |
Yes. Sorry, what I wrote was muddled as I was in a hurry! The BBC adopted LPCM and 32k for internal distribution and subsequently devised NICAM, then used that rather than LPCM. But stayed with the 32k rate. Potted history as recounted to me by the BBC engineers is here: http://www.audiomisc.co.uk/BBC/PCMandNICAM/History.html They started PCM to the Wrotham (for London) TX in the early 1970s. NICAM is still used for BBC distribution to FM TXs. But the rest of the BBC audio is now based on 48k/24bit for stereo. |
Ronald (387) 195 posts |
“BBC audio is now based on 48k/24bit for stereo” It’s getting harder to browse some sites for information, but there is a reference to the BBC once accepting 48k/16bit as a minimum from contributors. There is still reference to the 2017 high resolution concert, but it seemed to me that 320kbit AAC is the medium being used for streaming, with a hint that lower resolution flac was being trialled, with web browsers being issued with the flac capability. |
jim lesurf (2082) 1438 posts |
The default standard specified for contributions/submissions for programme material tends to be 48k with 24 bit preferred over 16 bit. But they may accept other formats. Depends on circumstances. All the internal distribution for stereo audio in the BBC is 48k/24. This is what is then fed into the iPlayer system. Which then will send out to end-users various formats. The ‘highest’ openly available is the 320kbps aac. However some lower rates have been available. Some home ‘net radios’ get the BBC via some intermediary and that intermediate company may down-convert the 320k to something else. Hence anyone using such a commercial box to listen may not get 320k aac but something inferior. This isn’t under the control of the BBC. (The user also risks this stream ceasing because it isn’t the BBC’s job to support the commercial closed boxes. There are already examples of such ‘orphan’ devices.) The 48k/24 gets processed into NICAM for distribution to the FM TX chain. These days it goes there via a commercial network whose details aren’t controlled by the BBC. All they pay for and contract is that it duly arrives at the TXs the same as it was sent by the BBC. :-) Hard to answer the resolution question as NICAM is more akin to ‘floating point’ than standard LPCM. But the result for end users with a superb FM signal and tuner can be assessed via this http://www.audiomisc.co.uk/BBC/FMandNICAM3/FMandNICAM3.html Note in particular that the ‘magazine review measurements’ on tuners tend to be misleadingly optimistic. They just wanted some standard ‘numbers’, not a real assessment which is far more complex to do and understand. In practice you are more likely to notice either: 1) Imperfections due to a poor tuner or poor reception. 2) Level compression applied by the broadcasters. This is rife outwith R3. Mostly due to the actual studio/radio end people, but also to some extent applied at the FM TXs. Bottom line: if you want the best sound, go for R3 320k aac iPlayer, not FM. 
Nor a commercial closed box ‘net device’. :-) |
Rick Murray (539) 13850 posts |
YouTube, as an alternative, prefers 44.1kHz/24 for their ContentID system. https://support.google.com/youtube/answer/6039860?hl=en
A lot (most?) net radio runs MP3 at 128kbit. I’ve just checked PPN Radio (44100,128). Eagle doesn’t give a sample rate in the request header, but given it runs at 128k, I’d guess 44.1kHz. Others I’ve checked were similar, although a shout out to Heart 80s which appears to be a 44.1kHz AAC stream at 48kbit! I guess it is a trade-off between quality and the ability to receive (plus the costs of bandwidth). Personally, I’d accept a lower quality in order to be able to listen to a station that caters for my taste in music. As for the BBC, mom’s ghost would like to remind you that around these parts, it’s Radio 4 on longwave, so pretty much two tins and a piece of wet string would be better. ;-)
The BBC also have a variable and random approach to sending broadcasts to other countries. My current R4 feed (I rarely listen to it, it was for mom when she wanted an alternative to cricket) is from a URL at llnwd.net. I found this in a web search a few years ago when BBC’s stream player app (about the time they changed from BBC player to BBC sounds, or something like that) threw a fit because my IP address was France…
I think you’d really need a sample (as a WAV or very good bandwidth AAC) that has one signal in one ear and the other in the other ear. Because for all of the pretty charts, it doesn’t compare to directly hearing it. Many years ago when Orange were trying to push the Livephone, they had a test number that played some canned music so you could assess that everything worked. It sounded like music sounds over the phone. After about ten seconds, somebody cut in and said words to the effect of “that’s rubbish, we can do better” and then you got to hear it the Livephone way – which was remarkably better.
I can’t comment on R3 as I don’t listen to it, but I think pretty much anybody born since the early ‘90s will have a messed up sense of what music should sound like, given the horrible job of mastering CDs; and that so many online broadcasters seem to feel that 128kbps is good enough. While some streaming services are better (Amazon Music, 320kbps), I’ve noticed more than a few albums of olden days have been “remastered” to fit into modern aesthetics (read: broken). At any rate (lame attempt to be vaguely on-topic), I would consider any audio system that can’t natively support 44.1kHz, 16 bit audio to be broken by definition. Of course, it should also support, at the very least, 48kHz and 24 bit. :-) It might be worth considering 16kHz, as this is typically used by VoIP. Usefully, it’s easy to translate to 48kHz. |
Ronald (387) 195 posts |
“if you want the best sound, go for R3 320k aac iPlayer” Probably theoretical, but would using the same breed of AAC encoder to decode back into 48/24 for playback be a better fit than other variations? |
jim lesurf (2082) 1438 posts |
Re 24bit from the aac – I can’t say for sure because: 1) I’ve never checked that. 2) Nor have I ever asked if the BBC explicitly convert to assumed 16bit for the aac. 3) I just play the aac in general. Not analysed it for years except when I compared it with flac during the BBC’s experiment with flac streaming Proms on R3 a couple of years ago. You can check, though, by using a good decoder to give lpcm output as a 24bit file. Then check the noise level. Does it ever fall to, or below, about -95dBFS? If it is always above this, then outputting as 24 bit is probably a waste of space. |
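The noise-level check described above could be roughed out as follows. This is only a sketch under stated assumptions: the samples are already decoded to signed 24bit integers in a Python list, the level is a plain RMS relative to a full-scale value of 2^23, and the 4096-sample block size is arbitrary. The function names are made up for the example.

```python
import math

FULL_SCALE_24 = 2 ** 23  # magnitude of full scale for signed 24bit samples


def rms_dbfs(samples, full_scale=FULL_SCALE_24):
    """RMS level of a block of integer samples in dB relative to full scale."""
    if not samples:
        raise ValueError("need at least one sample")
    mean_square = sum(s * s for s in samples) / len(samples)
    if mean_square == 0:
        return float("-inf")  # digital silence
    return 10 * math.log10(mean_square / (full_scale * full_scale))


def quietest_block_dbfs(samples, block=4096):
    """Lowest RMS level found over consecutive blocks of the signal."""
    return min(rms_dbfs(samples[i:i + block])
               for i in range(0, len(samples), block))
```

If `quietest_block_dbfs(decoded)` never gets down to about -95, the bottom 8 bits are probably carrying nothing useful. Digital silence returns minus infinity, so a file with silent gaps would need those blocks excluded before drawing a conclusion.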
jim lesurf (2082) 1438 posts |
The reason I recommend avoiding closed ‘net radio’ devices is that you have no control over the quality or over whether the device will abruptly cease working entirely. The situation outwith the UK is complex because in general the BBC are not allowed to provide the 320k aac streams beyond the UK. (With a few exceptions.) Thus you will either get lower rates / poorer codecs or a ‘restream’ from a third party who may not be sanctioned. The UK Government don’t really want the BBC to be able to gain income from allowing access abroad, but have shoved the costs of WS onto them, which they are required to provide. In addition other countries aren’t always keen on letting the BBC in. So this is a political and cost ‘hot potato’ for the BBC. In addition, a fair number of BBC programmes these days are made out-of-house, and the makers would want more money for world access… which the BBC can’t afford when it can’t charge people abroad to hear it. The situation with CD ‘mastering’ is much like radio. Some ‘classical’ (and jazz) is very well mastered. But most ‘pop’ is crap in terms of mastering as they want impact and sales, and to hell with sound quality. |
Jason Tribbeck (8508) 21 posts |
Note that I haven’t gone away – I’ve been writing my thoughts as to whether we need a new API, and the shape of it if we do. Since I’ve been at it all day (in between sanding the walls of my kitchen!), I think I need a bit of a rest, so won’t publish it today. It’s not finished anyway, and I’m certainly not happy about the end of it as it stands, so a fresh look tomorrow will help, I think. |
Ronald (387) 195 posts |
“Nor have I ever asked if the BBC explicitly convert to assumed 16bit for the aac” I recall that in the day of my lowly 48k/24 bit device the view was to record and process in 24bit, and release final material as 48k/16bit. |
jim lesurf (2082) 1438 posts |
Yes, I’ve never asked about it, but it seems rational to use 24 bit for recording and storage/intermediate purposes, but then dither/shape down to 16 bit for final use. FWIW I tend to capture 96k/24 and work on that. Then shape down to 16bit, either 96k or 48k. |
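For anyone unfamiliar with the "dither down to 16 bit" step, a minimal sketch is below. It assumes signed 24bit integer input and uses plain TPDF (triangular) dither spanning plus or minus one 16bit LSB before rounding; it does not do any noise shaping, which is a separate refinement. The function name and the optional `rng` parameter are choices made for this example only.

```python
import math
import random


def dither_24_to_16(samples, rng=None):
    """Reduce signed 24bit samples to signed 16bit with TPDF dither.

    One 16bit step equals 256 units at 24bit, so the dither noise is the
    difference of two uniform random values scaled to +/-256 before the
    value is rounded to the nearest 16bit step and clipped.
    """
    rng = rng or random.Random()
    out = []
    for s in samples:
        noise = (rng.random() - rng.random()) * 256
        q = math.floor((s + noise) / 256 + 0.5)
        out.append(max(-32768, min(32767, q)))
    return out
```

Passing a seeded `random.Random(1234)` makes the result repeatable, which is handy when comparing runs.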
Jason Tribbeck (8508) 21 posts |
Okay – I’ve now written my thoughts on a new API. I’ve been a Software Architect a few times at work, and this is an extension of that. |
Andrew Rawnsley (492) 1445 posts |
Sorry Jason, but for me that URL isn’t working – server’s IP address could not be found. However (a bit of trial and error later) the following does seem to work… http://www.tribbeck.com/index.php/download_file/2208/0 |
David Feugey (2125) 2709 posts |
Very interesting :) |
Jason Tribbeck (8508) 21 posts |
Note: I made a mistake on input terminals and output terminals for USB – I’ll correct it later today (I corrected my understanding towards the end, but didn’t change it when I started looking at it). Thanks, Andrew, for that – I have a ‘www2.tribbeck.com’ at home which masquerades as my normal site. I hadn’t realised that the laptop I was using last night was still using it (my desktop, where I wrote the other documents, uses the correct one). The actual URL is, as you’ve stated: |
Colin Ferris (399) 1818 posts |
Any chance of updating the 8bit sound generator to handle 16bit sound samples? |
jim lesurf (2082) 1438 posts |
I’ve had a read through Jason’s API doc. A lot of it is ‘above my pay grade’ in terms of various details, so I’ll have to leave that to people who have a clue about the requirements. Prompts me to add some general comments though, without being sure how relevant/appropriate they are… I think it is a good idea to adopt 32bit values as the basis of buffers, etc, on the basic data level. It is, I guess, easy to use these for 8/16/24 bit values, simply padded. And that’s what most modern USB devices do anyway for LPCM transfers. I’d raise a concern of audio pros and enthusiasts. This is the wish to have the ability to avoid any form of ‘mixer’ that risks changing the values without you knowing. There is a clash here because many general users may want to hear a ‘bong’ to flag an event – e.g. when mail arrives – even when they are playing some music, etc. But for serious use this is a problem. Hence although in some situations having two sources ‘mixed’ for output is fine, in others you’d want the system to ‘block’ on an “I’ve started something else that I’ve not finished yet” basis. ALSA also provides two forms of spec for play/record. One uses a ‘plughw’ spec. This applies “any necessary conversions” to get output. ‘hw’ applies no conversions and just ‘passes the parcel’. If the output target can’t cope it just fails. Again as with the above, which you’d use depends on the user’s intent. (FWIW having commands for these is also handy when checking a setup. The ‘aplay’ and ‘arecord’ commands are worth a look as they do provide a way for a user to sort out problems. A parallel for RO might be nice.) Good idea to allow for a range of formats, not just LPCM. But I wonder if it would be wiser initially to focus on LPCM, and then add other codecs later on, having allowed a space for them in the way the API is built. Sample rate conversions to aid ‘plughw’ type of use are a good idea. But that raises the question of ‘how good does the rate conversion need to be’? 
The reason being that a good conversion requires more processing in general. Changing the number of bits per sample is relatively easy compared with going between 44.1k and 96k and doing it well! And on Doze/Linux I’ve lost count of the number of people who’ve been doing 44.1k/48k conversions needlessly because their system did it by default. (And it is hard to get right, and generally unnecessary these days.) |
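The ‘hw’ versus ‘plughw’ distinction could translate into an open call along these lines. This is purely a hypothetical sketch: `Format`, `FormatNotSupported` and `negotiate` are invented for illustration and are neither ALSA calls nor part of any proposed RISC OS API. With conversion disallowed the request either matches exactly or fails; with conversion allowed the nearest device format is chosen, preferring one at the same sample rate because changing bit depth is the cheap conversion.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Format:
    rate: int      # sample rate in Hz
    bits: int      # bits per sample
    channels: int  # number of channels


class FormatNotSupported(Exception):
    """Raised when a stream cannot be opened as requested."""


def negotiate(device_formats, wanted, allow_conversion):
    """Return (device_format, needs_conversion) for a requested format.

    allow_conversion=False behaves like a 'hw' style open: exact match
    or failure. allow_conversion=True behaves like a 'plughw' style
    open: the system may convert to the closest format on offer.
    """
    if wanted in device_formats:
        return wanted, False
    if not allow_conversion:
        raise FormatNotSupported(f"device cannot play {wanted} unchanged")
    candidates = [f for f in device_formats if f.channels == wanted.channels]
    if not candidates:
        raise FormatNotSupported(f"no format with {wanted.channels} channels")
    best = min(candidates,
               key=lambda f: (f.rate != wanted.rate,
                              abs(f.rate - wanted.rate),
                              abs(f.bits - wanted.bits)))
    return best, True
```

A serious player would call this with `allow_conversion=False` and report the failure to the user, while a ‘bong’ style system sound would allow conversion.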
jim lesurf (2082) 1438 posts |
One other comment. In general USB tends to transfer stereo LPCM. HDMI is much more likely to be used for the formats like ac3, surround formats, etc. But HDMI also uses LPCM, so this is the common denom. |
Colin (478) 2433 posts |
I’m still formulating an opinion, but the standouts on first reading for me are the 32bit path and mix-everything for PCM. I would want neither. The output from SoundDMA – which is the output of all RISC OS sounds – is basically CD quality. The output from SoundDMA is where the new api starts. To have the current sound system working with the new api we just need to point it at a compatible device, which is the user’s decision. I have a USB device with a 16bit resolution and 2 byte sample size, another with 24bit resolution and 3 byte sample size in USB1 mode and 24bit resolution and 4 byte sample size in USB2 mode. I would consider all to be compatible devices. One matches SoundDMA output exactly; the others just need sample size conversion, which is relatively trivial, has no impact on the sound and is absolutely necessary to play the 2 byte stream. It seems silly to change SoundDMA output to 4 bytes (I’ll use sample sizes, as bit resolution has only a loose connection to sample size) only to downsize the sample to 2 bytes. Remember, mixing has already occurred for legacy programs so none is required. As a first phase I don’t think mixing/processing is required at all. If we don’t write new audio processing units, what have we achieved, even if nothing else gets done? 1) the user can select from any compatible devices for legacy audio. So DigitalCD, for example, can be used with non-HAL devices and have the same features as it has now. 2) We have a new api to give an application access to any stream on any device we have drivers for, with the device being selectable by the user. The desktop interface for user selection would be provided by the new api for a consistent user experience across apps. The application would have a way to avoid ‘system’ processing. 3) device drivers have an api to write to, to make a device work with any legacy/new program that is compatible with the stream format. 
4) api extendability to cope with adding devices with alien stream formats, and any program using the new format would have a way to select between similar streams. So in a first phase you have mixing using the old api and a new api with a common interface to all devices. Problems: 1) you can’t use the new api and mix/have the system do whatever it does to the input stream to get sound. So you’ve done phase 1, it’s working and it’s extendable. Phase 2 can be about mixing and processing. |
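The "relatively trivial" sample size conversion mentioned above might look like the sketch below. It assumes little-endian signed 16bit PCM on input and a device that wants the significant bits at the top of a wider 3 or 4 byte sample, so the extra low-order bytes are simply zero. The function name is made up, and a real driver would need to confirm the justification its device expects.

```python
def widen_16bit(data, out_bytes):
    """Widen little-endian signed 16bit PCM to 2, 3 or 4 byte samples.

    The original two bytes become the most significant bytes of each
    output sample; the added low-order bytes are zero, so the audio
    itself is unchanged apart from the container size.
    """
    if len(data) % 2:
        raise ValueError("input length must be a multiple of 2 bytes")
    if out_bytes not in (2, 3, 4):
        raise ValueError("out_bytes must be 2, 3 or 4")
    pad = bytes(out_bytes - 2)  # zero bytes placed below the sample
    out = bytearray()
    for i in range(0, len(data), 2):
        out += pad + data[i:i + 2]
    return bytes(out)
```

Because only zero bytes are added beneath the existing bits, the 2 byte stream from the legacy sound system can feed a 3 or 4 byte device without any change to the audio itself.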
Andrew Rawnsley (492) 1445 posts |
Whilst I’d like to thank Jim and Colin for two very considered replies/posts above (I think they both have a lot of really meaty goodness in them – the tricky bit is pulling it all together!), I’m a bit worried that the topic is derailing from “Audio Recording API” into “improving audio output”. Both are worthy goals, but I’d personally prefer to see the audio input side solved first, as we have nothing on that front right now, vs at least some solutions on the output side. I’m just getting a “worry vibe” that the audio input side could get delayed/derailed/aborted because we can’t settle on how to handle the output side! |
Colin (478) 2433 posts |
The audio recording api falls out virtually automatically from the audio api, as there is no legacy, and is essentially just about saving the format generated by the device and having a standardised api/selector. The problem comes when you want to play what you have recorded, which is why the discussion is centring on playback. Unless of course someone thinks something has been missed/would like to see. One thing I would like to see for both recording and playback is an api for application mode. The buffer system proposed wouldn’t work from a desktop app – just like the current system doesn’t. You wouldn’t use it for a robust audio application because of the problem with RISC OS multitasking, but someone may want to hack something together that is good enough, and in that situation playing and recording should be no more difficult than loading/saving a file. |