How to do USB audio
Dave Higton (1515) 3526 posts |
As you note, there is only so much information available from USB. (You’ve seen nameless devices; so have I; I also have a device where one of the outputs is a “generic digital output”, which seems like an oxymoron to me.) Internally, i.e. between software components, I think the only reliable identifier we can use is the USB device name, e.g. “USB12” – and we have to recognise that it’s volatile, i.e. it only persists as long as that device remains plugged in.

As to how to choose the format: that’s something I was about to start posting about. The existing sound API allows all the sample rates, for example, to be listed. With USB it’s a little more complicated. Some sample rates are actually a range, i.e. you can pick absolutely any sample rate between (say) 4500 Hz and 48010 Hz. And yes, there are real devices out there that will accept samples at below 5 kHz; I have one. That same device also has different sets of sample rates for the same resolution and number of channels. I don’t know why.

So there are two problems: 1) listing what’s available, so a user can choose; 2) picking a “good enough” set of parameters to open an endpoint, if an exact match is not available. Both affect the USBAudio API, so I’m interested in opinions and suggestions.

As for synchronisation: the USBAudio module has to parse a descriptor in great detail, so my suggestion is that the synchronisation endpoint should be specified to the USB driver as part of the open string. USBAudio could do its part of the job very well (notwithstanding the error that it’s making right now, which I hope to correct very soon – this evening, all being well). For example, one could add a field “sync131;” to use input endpoint 3 as the synchronising endpoint. |
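The “good enough” matching described in point 2 could look something like this sketch, which treats every capability as a [lo, hi] range (a discrete rate being a range with lo == hi) and clamps the requested rate into each one. All the names here are hypothetical, not part of any existing USBAudio API.

```c
#include <assert.h>

/* Hypothetical capability entry: a discrete rate has lo == hi,
   a continuous range has lo < hi. */
typedef struct { unsigned lo, hi; } rate_entry;

/* Return the supported rate closest to 'want' across all entries. */
static unsigned nearest_rate(const rate_entry *e, int n, unsigned want)
{
    unsigned best = 0, best_diff = (unsigned)-1;
    for (int i = 0; i < n; i++) {
        /* Clamp the request into this entry's range. */
        unsigned r = want < e[i].lo ? e[i].lo
                   : want > e[i].hi ? e[i].hi
                   : want;
        unsigned diff = r > want ? r - want : want - r;
        if (diff < best_diff) { best_diff = diff; best = r; }
    }
    return best;   /* 0 if there are no entries at all */
}
```

A caller that gets back something other than the rate it asked for then knows an exact match was not available and can decide whether the nearest rate is acceptable.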
Dave Higton (1515) 3526 posts |
On the point of device names: my best suggestion is to allow the user to add descriptive device names as environment variables keyed by VID and PID. If they’re there, USBAudio could present them through the existing API, perhaps by using descriptor number 255. e.g.

USB$4321-0078 SuperDac on the top shelf |
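A minimal sketch of the variable-name scheme suggested here, assuming a key format of USB$vvvv-pppp with hex VID/PID. On RISC OS the lookup would really go through OS_ReadVarVal; getenv() merely stands in, and both function names are invented for illustration.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Build the suggested variable name "USB$vvvv-pppp" from a vendor
   and product ID. */
static void device_var_name(char *buf, size_t len,
                            unsigned vid, unsigned pid)
{
    snprintf(buf, len, "USB$%04X-%04X", vid, pid);
}

/* Look up the user's descriptive name, if any set.  On RISC OS this
   would go through OS_ReadVarVal; getenv() stands in for the sketch. */
static const char *friendly_name(unsigned vid, unsigned pid)
{
    char key[32];
    device_var_name(key, sizeof key, vid, pid);
    return getenv(key);   /* NULL if the user has not set one */
}
```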
Colin (478) 2433 posts |
Why the concern over synchronisation? It’s working now. An asynchronous OUT interface has 2 endpoints, a data endpoint and a feedback endpoint; open the data endpoint and the feedback endpoint will work. You think there is a problem? The problem with names for the user to choose from is not that they could change but that the user needs to identify them in the first place. I’m also concerned that you don’t consider the need to select terminals instead of devices. |
jim lesurf (2082) 1438 posts |
As an ‘aside’ to the main discussion, I can say that the ability to take a range of rates is fairly common in some situations. e.g. the BBC have such ‘asynchronous’ digital-to-digital converters in line with some feeds. This is to avoid having all the audio sources and outputs of the BBC needing to be phase-locked to just one master clock somewhere. The usual method is to ‘upsample’ to a far higher rate, then have the output pick the samples it wants out of the upsampled series. However, you still need to specify a ‘sample rate’ to enable the upsampling to know what filter settings to employ. Jim |
Dave Higton (1515) 3526 posts |
I thought you suggested above that the parsing for the synchronisation endpoint could not be relied upon. It’s not so much that I think there’s a problem, though, as that I don’t think the lower levels should need to parse the descriptors; but USBAudio does, and the synchronisation information can be got out easily and reliably. The demarcation between the layers looks neater to me if handled that way.
What seems to happen is that the terminal is selected by means of a selector unit. Yes, I need to add some means to list the available terminals, and some means to select them. (Note that there is no consistency in the naming of the terminals. I ask you to consider what a “generic digital output” terminal is.) The stuff that you and Jim have doesn’t appear to possess this functionality, but one of the cheap boxes I have does. |
Colin (478) 2433 posts |
No. There’s no problem. There’s plenty of other stuff that shouldn’t be in the USBDriver, so selecting the synchro endpoint just added to the list – it was trivial anyway once the main endpoint is known. Isochronous is not going to be all singing, all dancing with the DeviceFS interface. It’s not flexible enough. The programming interface I’ve been trying to add – before I got sidetracked into audio – would allow you to do stuff you can currently only do inside the USBDriver, without the risk of breaking something. If it wasn’t for the fact that USBDriver handles all USB data storage devices I would have submitted it to ROOL long ago. More esoteric feedback techniques, where one feedback endpoint services multiple data endpoints, will have to be done outside the driver, as will any different requirements for other device classes.
You are just adding 2 layers of selection: one to select the device and another to select the terminal in the device, which may or may not go through a selector. Yes, a device may have multiple outputs and a single USB input stream, but it may have an input terminal for each output terminal. The selector unit can be hidden by the module. |
Dave Higton (1515) 3526 posts |
I’ve been thinking some more about isochronous endpoints with feedback, and reading the USB 1.1 document again on that topic. The feedback mechanism is generic to isochronous, i.e. not at all related to class of device. Parsing of the descriptor, as you have done, is class specific. So, while the driver you have done will work for an audio class device, it won’t work for a Bluetooth audio device (for example) that requires synchronisation because the dongle that provides the host end of the link has isochronous endpoints but is not an audio class device. I really think that:
If you do it any other way, you’re blurring the responsibilities of the various components. You would require the USB driver module to be updated to handle any other class of device with an isochronous endpoint that required synchronisation. If you keep the functionality separate and properly layered, the USB driver remains the same. You can add a class driver, or write an application for the device, either of which calls on the generic synchronisation functionality provided by the stack. Don’t you agree? |
Colin (478) 2433 posts |
Totally agree. But the callbacks from the USB transfers should also be in a class-specific module. Having lots of special field options which are class specific is crazy. Best ignore DeviceFS altogether; that way class drivers can be independent, with control at the lowest level, and easily replaceable. I made Class 1 Audio a special case. I don’t intend to add any more. Hopefully a new interface will make the DeviceFS interface redundant. Consider if we had been doing this before Audio Class 2 came out. If I had added special fields for everything and Class 2 came out, then the Audio module could not be made to work without changing the USBDriver. If we didn’t have access to the USBDriver source you could never write a Class 2 driver. We really do need modularity, but it needs to be true modularity. I should add that I don’t mind anyone else bodging the DeviceFS interface in USBDriver, but I can’t bring myself to do it. |
Dave Higton (1515) 3526 posts |
I’m not sure what you are referring to with callbacks; do you mean what is required in order to automate file-to-stream transfer and put it into the background? I should add that I’ve never tried using callbacks, so I’m unfamiliar with them.
You’re right; it would be indefensible. What I’m suggesting is that the string used to open a stream should contain some strings that specify the related synchronising endpoint in a generic way. How about:

syncep131;

and there’s room for some discussion here. In my example above, 131 is endpoint 3 IN, with the extra 128 specifying the direction, exactly as USB does it. But I think it would be equally sensible to say:

syncep3;

and let the USBDriver assume that the direction must be opposite to that of the main isochronous endpoint.

But I also can’t help noticing that USB 1.1 states that the synchronising endpoint is of type interrupt, but these DACs that you and Jim have contain an isochronous synchronising endpoint. It’s not a big deal in principle, as both interrupt and isochronous modes are there to guarantee bandwidth. The question is whether the USBDriver needs to know what type it’s being asked to open. The synchronising endpoint also has a delay parameter. Do the lower levels of the USB stack need to know this value too? The worst case would be something like:

syncep131,3,5;

meaning endpoint 3 IN, isochronous mode, delay 5 milliseconds (please forgive me if I’ve got the middle value wrong – this is only an example of what we might do). Treating the information like that is completely class independent. It would be easy for me to add such fields to the USBAudio module. What do you think of this suggestion? |
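The proposed field could be assembled mechanically. This sketch uses the “worst case” three-value form, with the direction bit ORed into the endpoint number exactly as USB encodes it; the helper name is invented, and the field itself is only a proposal at this point. For the middle value, the USB bmAttributes encoding (1 = isochronous, 3 = interrupt) would be one reasonable convention.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Append a hypothetical "syncep<ep>,<type>,<delay>;" special field to
   an open string.  The direction bit (0x80 for IN) is ORed into the
   endpoint number as USB encodes it, so endpoint 3 IN comes out as
   131.  'type' follows bmAttributes: 1 = isochronous, 3 = interrupt. */
static void append_syncep(char *open_str, size_t len,
                          unsigned ep, int in, unsigned type,
                          unsigned delay_ms)
{
    size_t used = strlen(open_str);
    snprintf(open_str + used, len - used, "syncep%u,%u,%u;",
             ep | (in ? 0x80u : 0u), type, delay_ms);
}
```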
Dave Higton (1515) 3526 posts |
I’m thinking about how to find out sample rates. The simplest solution seems to be to call USBAudio_EnumerateSampleRates with a number of channels and a resolution (either of which, but not both, can be zero to specify “any”), a direction and an empty block. The block is returned with a list of sample rates. There has to be a bit of formatting in the list because the answer can come back as a range, a list of discrete frequencies, or possibly both. Is there a simpler way to do it? The number of parameters that has to be involved suggests to me that they should be passed in via the block. |
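One possible shape for such a block, with the request parameters passed in and a list of entries returned, where an entry with min_hz == max_hz denotes a discrete rate. All the names and the layout are hypothetical, offered only to make the discussion concrete.

```c
#include <assert.h>

/* Hypothetical request half of the block: the caller fills this in. */
typedef struct {
    unsigned channels;    /* 0 = any */
    unsigned resolution;  /* bits; 0 = any (but not both zero) */
    unsigned direction;   /* 0 = out (playback), 1 = in (record) */
} rate_request;

/* Hypothetical result entries: min == max is a discrete rate,
   otherwise a range stepped by step_hz (0 meaning continuous). */
typedef struct {
    unsigned min_hz;
    unsigned max_hz;
    unsigned step_hz;
} rate_result;

static int rate_is_discrete(const rate_result *r)
{
    return r->min_hz == r->max_hz;
}
```

This shape deliberately lets discrete rates and ranges coexist in one list, since (as noted above) a single device can report both.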
Dave Higton (1515) 3526 posts |
And while we’re at it, there has to be a way to enumerate the available resolutions for a given direction and number of channels. |
Colin (478) 2433 posts |
My favoured method at the moment is the equivalent of
which finds an endpoint with numchannels and a bit resolution
so your enumeration would be
mute would be
etc. |
Colin (478) 2433 posts |
Enumeration of frequencies can’t work as I’ve shown; you need to return a frequency range and step size. See the Clock Source Control descriptor for Class 2 audio. May as well return the list returned from GET RANGE, which returns
The Class 2 device I have returns 6 frequencies
It may be better to just select the one you want and return the nearest, i.e. have
Then you can try a couple of frequencies and find the one you want by trial and error. |
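For reference, a Class 2 RANGE response for the sampling frequency control is a 16-bit subrange count followed by (dMIN, dMAX, dRES) triples of 32-bit little-endian values, a discrete rate being a triple with dMIN == dMAX and dRES == 0. A sketch of parsing such a response (helper names invented):

```c
#include <assert.h>
#include <stdint.h>

/* Decode a little-endian 32-bit value from the wire. */
static uint32_t le32(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8)
         | ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* wNumSubRanges: the first two bytes of the RANGE response. */
static unsigned range_count(const uint8_t *resp)
{
    return resp[0] | (resp[1] << 8);
}

/* Fetch subrange i: each is dMIN, dMAX, dRES, 4 bytes apiece. */
static void range_entry(const uint8_t *resp, unsigned i,
                        uint32_t *min, uint32_t *max, uint32_t *res)
{
    const uint8_t *p = resp + 2 + 12 * i;
    *min = le32(p);
    *max = le32(p + 4);
    *res = le32(p + 8);
}
```

A device like the one described above, offering 6 discrete frequencies, would simply report 6 subranges each with dMIN == dMAX.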
Dave Higton (1515) 3526 posts |
Don’t forget there is a direction too. I feel uneasy about opening a stream to an endpoint whose resolution is greater than that requested. If you have 16 bit data, you need to treat those data very differently if the device’s resolution is 16 bits or more than 16 bits – i.e. 16 bits and you simply pass the data on, more than 16 bits and you have to pad out every sample of every channel. In your scheme you don’t even return information about the mismatch. |
Colin (478) 2433 posts |
You just ask for it and decide to accept it or not.
Note subSlotSize (the size of a channel in bytes) may be greater than (bitResolution + 7) DIV 8. The device I have has 24-bit channels and a 32-bit subSlotSize. I should add that subSlotSize is the number of bytes used to transport each channel. |
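Promoting 16-bit samples into a wider subslot amounts to left-justifying each sample and zero-filling the padding bytes below it, since USB audio samples travel little-endian and left-justified within the subslot. A sketch (the function name is invented):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Promote 16-bit PCM to a wider subslot (subslot >= 2 bytes).
   Samples are little-endian and left-justified within the subslot,
   so the original two bytes land at the top and the padding below
   them is zero. */
static void widen_16(const int16_t *in, uint8_t *out,
                     size_t nsamples, size_t subslot)
{
    for (size_t i = 0; i < nsamples; i++) {
        for (size_t b = 0; b + 2 < subslot; b++)
            *out++ = 0;                          /* low padding bytes */
        *out++ = (uint8_t)(in[i] & 0xFF);        /* original LSB */
        *out++ = (uint8_t)((in[i] >> 8) & 0xFF); /* original MSB */
    }
}
```

With subslot = 4 this covers the DacMagicXS case above (16-bit data into 32-bit subslots); with subslot = 3 it covers ordinary 16 → 24 promotion.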
jim lesurf (2082) 1438 posts |
At what level/stage are you thinking things like sample size or rate conversions would be applied when needed? In the Wimp app, or the modules? I’ve been assuming that your modules will just establish what is available and then only perhaps do simpler changes like 16 → 24 by byte padding, whilst leaving ‘harder’ conversions to the application.

FWIW, in practice I suspect that almost all decent-spec USB DACs will be OK at 44.1k/48k and the integer multiples.¹ So when playing such files you’d ask for the rate which matches the file. Beyond that, I assume it is down to the playing application rather than the module layers to decide what to do?

Ditto for resolution. Is your plan to deal with resolution (or rate) alterations in the playing app or in a module? When reducing 24 bit input to play on a 16 bit DAC you’d have a choice to make about the method. The crudest is to just hack off the lowest bytes. But better would be to employ at least dither or shaping, or if possible, a mix of these and upsampling. I guess that anything more complex than hacking off the bytes would be better done at the application level than in a module. But you’re obviously better placed than myself to judge that!²

Jim

¹ OK, 32k is also an industry ‘standard’ rate, but rarely used beyond some special areas. You’re unlikely to find many 32k files on audio enthusiast machines. And the chances are that anyone with such files also knows how to do their own rate conversions! :-)

² That said, it seems unlikely anyone with a 16 bit DAC will be wanting to collect and play 24 bit files. Given the full metal jacket of dither, shaping, and upsampling it can be done well, but if they are keen on 24 bit they’d almost surely have a 24-bit DAC. So the need for 24 → 16 conversions is probably less likely than 16 → 24. |
Colin (478) 2433 posts |
If I want to play audio, I want to play this file on that device. You, I think, are trying to answer a different question: can I play this file natively on any device? So as a user I need to select the device. I then need to find the most suitable endpoint on that device, and if you find a higher bit resolution it is suitable, as you can use the endpoint to play the file losslessly.

If the device has 2 Input Streaming terminals then just finding the first suitable endpoint on the device is not good enough, as the 2nd output would never be found; and if your speakers are plugged into the 2nd and your headphones into the 1st, data will always be routed to the headphones. Deciding on the device by class is only useful if you only plug in 1 audio device at a time. The only real way to decide which device to use is matching Vendor, Product, Version and Output terminal – and that fails if you plug in identical devices and you want to use the 2nd device. |
Colin (478) 2433 posts |
Jim, whilst I’m not doing the module, I’d expect the module to do no conversions. It’s trying to match your requirements to an endpoint on a device (an endpoint is where you actually send the data). |
jim lesurf (2082) 1438 posts |
OK, that seems quite sensible to me. In principle the process should be to discover what rates and sample sizes are acceptable to the DAC, then let the playing app decide what to do if none of the ‘DAC values’ matches the file to be played.

WRT sizes: it’s trivially easy to go 16 → 24 bit. And although decent 24 → 16 is harder, it isn’t likely to arise much for the reasons I gave. If needed, simply dumping the LSBytes would probably pass muster, as anyone that keen on 24 bit audio quality would be using a 24 bit DAC anyway.

WRT rates: again, anyone serious about quality will be playing files with rates in the 44.1k/48k ‘families’ and ensuring their DAC can play these ‘native’. As you can see from your own DAC, these days it is becoming taken as normal that a good USB DAC will do the multiples up to 192k/24bit. Other cases may be OK with either linear interpolation or a simple upsample. The advantage of USB DACs here is that the user can choose. So they can get a DAC that covers what they want ‘natively’ and minimise conversions. Thus the only common situation where a conversion is needed for those serious will be playing 16 bit files on a 24 bit DAC – which is about the most trivial conversion to get right with minimal processing. I’d assume any playing app can do that trivially if the modules don’t do it.

FWIW, I only left out ‘promotion’ from 16 to 24 in my Upsampling demos because there is currently no way to actually get 24 bit values through SoundDMA into the (24 bit) DAC on a PandaBoard. If anything it would be easier to have it than not if SharedSound/SoundDMA handled it!

Jim |
Colin (478) 2433 posts |
If you wanted to use the PandaBoard DAC fully, it would probably be easier to RMKill the RISC OS sound modules and deal with the hardware directly. Anyway, you’ll be pleased to know that I’ve heard 96000/24 music over USB2. Can’t get the feedback working and it’s flaky, but getting nearer.
Padding isn’t just required for increasing the file’s bit resolution to the device’s bit resolution, i.e. 16 → 24. The 24 bit DacMagicXS I have requires 32 bits per channel in USB2 mode (24 in USB1 mode) even though its bit resolution is only 24 bits, so the file’s bit resolution needs promoting to 32 bits. |
jim lesurf (2082) 1438 posts |
That’s interesting. It makes me think they’re planning to allow for 32 bit input at a later date. That in turn may help eventual DSD/DXD input. It’s becoming more common for the DAC chips to work internally with 32 bit values. Does the XS have a front panel legend to indicate this? The Pro and 851C have “24/384” to indicate the internal size and rate used. I’ve been expecting these numbers to get bigger with later models. :-) Is that – so far – on an Iyonix? Jim |
Colin (478) 2433 posts |
Yes
That amused me :-) Given that it’s only the size of a matchbox but half as thick, the term ‘front panel’ doesn’t really fit. It doesn’t even say ‘Swan Vesta’ on it. If you can be bothered to switch one of your devices over to USB2, you may like to try USBDescriptors.zip to see what subslotsize your DACs expect in USB2 mode. USBDescriptors.zip will now read Class 2 audio, and after running it a search for “subslotsize” will show what it expects, with the bit resolution just below it – for me they are 4 and 24. |
jim lesurf (2082) 1438 posts |
Ah! I hadn’t realised it was that small. The Chinese must have even smaller electrons than the Japanese. :-)
Did I not do that when I switched my ‘Plus’ to Class 2 some time ago? I thought I had. I’m happy to switch it to Class 2. However, given the rigmarole it gave me getting it back to Class 1, I’m delaying that until you and Dave are happy that you’ve sussed Class 1, so that I can start testing Class 2 without needing to switch it back again. FWIW, from then on I’ll use the Halide Bridge into the DAC for Linux and the Class 2 input of the Plus for RO. :-) Do you think Class 1 is now ‘done’ so far as the tests I need to run are concerned? If so, I’ll switch the DAC to Class 2. Jim |
Dave Higton (1515) 3526 posts |
Interesting discussion. Lots of things to respond to.
There is nothing you can do that is better in any way than hacking off the bottom 8 bits. You can do worse; dithering is one example. All it does is add noise. Truncating allows the DAC to output its closest approximation to the original signal. Rounding is no better than truncating; the difference is a DC shift of half an LS bit. Inaudible; it’s equivalent to a change in the atmospheric pressure (and much smaller than the change in pressure between good weather and bad weather – but I digress!). You have to look at it from a mathematical perspective. Upsampling permits an easier design of analogue reconstruction filter.
It could be in a Wimp app, or it could be in a module – but it doesn’t belong in the USBAudio module. Its application is not specific to USB. Think sensible layering here.

But there is a big question about what hardware should do any complicated rate conversion. If you have a DSP available, it would be nice to use it. I’m not familiar with the performance of the VFP or similar units in some later ARM processors, but I am familiar with sample rate conversion in the general sense. It requires large numbers of multiply-accumulates per sample, so be very afraid.

To answer Colin’s points: I’m trying to replicate some of the functionality of earlier modules, such as SoundDMA, which allows reading the sample rates. Earlier modules didn’t allow much control over resolution as there wasn’t much point – the hardware does what it does and can never change. USB devices can.

As for devices with multiple endpoints: has anyone found one? I have a device that supports up to 8 channels out, and supports input from line, microphone and S/PDIF. There is no choice as to output terminal. The input terminals all feed into a selector unit, whose output in turn feeds one streaming endpoint. I’m all for supporting multiple endpoints in a device, but I’d like to be shown that it is more than a theoretical possibility. I suspect that, in reality, no-one would try to support multiple endpoints because it’s not much practical use. Why would two independent sound sources be used simultaneously in close proximity? And if you do, why wouldn’t you just have two separate USB devices? |
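The 24 → 16 reduction by plain truncation argued for above is essentially a one-line operation. A sketch, assuming the 24-bit samples arrive sign-extended in 32-bit words (the function name is invented):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Reduce 24-bit samples (held sign-extended in 32-bit words) to
   16 bits by simple truncation: shift off the bottom 8 bits. */
static void truncate_24_to_16(const int32_t *in, int16_t *out, size_t n)
{
    for (size_t i = 0; i < n; i++)
        out[i] = (int16_t)(in[i] >> 8);   /* keep the top 16 of 24 bits */
}
```

The arithmetic right shift preserves the sign, so negative samples truncate correctly without any special handling.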
Dave Higton (1515) 3526 posts |
Class 2 support: I only began to read the Audio Class 2 specification yesterday. I would be happy to add support, but there is the problem of testing; I don’t have a Class 2 device. However, I’ve added a way to test from arbitrary device descriptors, so I would very much appreciate receiving some dumps of same. If you can run the app from http://davehigton.me.uk/Audio/ModuleTest1.zip in a TaskWindow and send me the result, that should do the job. |