Audio: future improvements?

61 posts, 13 voices

Pages: 1 2 3

Jul 27, 2013 11:55am jim lesurf (2082) 1438 posts	I’d like to introduce some ideas for possible improvements in the ways RO can handle audio. There are a number of aspects to this which center around three things: mixing, resampling, and quality. We now have in place some foundation for a general audio ‘mixer’. This lets a user use more than one sound source, and output the results via more than one device. Useful for flexibility. However unless regard is taken to some points this can also cause problems. For example, if the system is expected to play two sound inputs symultaneously which have different sample rates. This means that at least one of them will need resampling/interpolation. Unfortunately, one of the hardest resamplings to do well is between 44.1k/48k – the two rates that have been most common. At present RO software, etc, tends to use linear interpolation which does such a conversion quite poorly. The above also – as exampled at present by the ARMiniX/PandaBoard – have implications for trying to do something like play a source at 48k using a machine (mixer/system) rate at 44.1k. From these examples mixing tends to require fairly good resampling processes if the quality is not to suffer. OTOH some users who are concerned about quality will wish to avoid mixing. If for not other reason as to avoid ‘noises’ cropping up when listening to music. This means a mixer may need to be easily setable to ‘block’ any other sources when one is currently being played out. The above isn’t simply a matter of not hearing unwanted noises. There is also a related question of what happens to the gains and risk of clipping if a new source starts. When that happens you may either require gains to be adjusted to avoid the sum clipping, or have the combination clip. There is, of course, more to this. But it means that flexibility means we may need to think about a bigger range of situations. Introducing and using a mixer means some things can occur which can’t with a setup that only has one source and one output at any time. At the heart of this I suspect are the following issues: A) Choice of system playout rate. And if this should be fixed, or ‘follow the source’. B) Ability to choose between blocking and mixing. C) Ability to choose between scaling (volume controls) and pass-the-parcel from source to output so that the output device gets sound data that has not been altered along the way once it has been converted to LPCM. The ‘follow the file’ rate point seems significant to me, as exampled again by the ARminiX/PandaBoard. At present, 44.1k material plays best with a system rate of 88.4k, whereas 48k material (and 96k material which is increasingly available) playes best with a system rate of 96k. Use the ‘wrong’ rate and the linear interpolation tends to put anharmonic aliasing down in the audible range. This is avoidable if the player automatically switches the system rate to ‘follow the file’. But in turn that may be a problem for a ‘mixer’ that may be confused by having the system rate suddenly change. There are some other issues which I suspect mainly concern audio drivers. e.g. ‘gapless playback’ which has become the norm elsewhere. This matters for some kinds of music. And of course beyond this are 96k/24bit playback and ‘surround’. All of which makes me feel we need to think about the roadmap for this area and plan out an extended API, etc. Be interested to see any reactions. :-) Jim

Jul 27, 2013 3:43pm Steffen Huber (91) 1953 posts	I am following the whole computer audio stuff only with half an eye, but my impression for a number of years is that everybody from the ambitious home user to the professional don’t really care about analogue output from the computer. The presence of all kind of electromagnetic radiation inside a computer makes it very hard to create a high quality analogue signal. So everyone has gone digital. So my prio #1 would be to enable digital audio output in a “straight” fashion – i.e. supply whatever digital data goes into SoundDMA to the digital out – PandaBoard and Raspberry Pi should be able to do that via HDMI, not sure where that leaves the BeagleBoard. For volume control, rely on the external device which handles the DA conversion. Bonus points for adding output control for the user (i.e. being able to specify which kind of signal is allowed for output). My prio #2 would be to do audio via USB. Prio #3 to #n: revive the RISC OS software market to create powerful audio software. Prio #n+1: fix whatever strange things happen to analogue audio But overall, I think there are many more pressing things wrt RISC OS than the whole sound issue. E.g. porting Firefox (or helping the NetSurf guys to speed up JavaScript development). Even porting Java – Oracle is now supplying Java 8 previews for various ARM platforms, the ARM HotSpot JIT is getting faster all the time, and JavaFX is a seriously nice GUI toolkit. It just currently lacks a RISC OS Look&Feel :-) Sorry Jim, I guess that is not the kind of reaction you are looking for :-)

Jul 27, 2013 4:15pm jim lesurf (2082) 1438 posts	I am following the whole computer audio stuff only with half an eye, but my impression for a number of years is that everybody from the ambitious home user to the professional don’t really care about analogue output from the computer. The presence of all kind of electromagnetic radiation inside a computer makes it very hard to create a high quality analogue signal. So everyone has gone digital. I’d agree on the whole. Those serious about audio have tended to migrate to using USB DACs fed via asynch/iso transfer with sample-perfect data that has been ‘pass the parcel’ delivered from the source file or stream. However there are also people who still find an ‘all in one box’ attractive. I suspect that for them, a box as small and potentiall quiet (mechanically) as the ARMiniX/PandaBoard with its SD and USB sockets would be attractive. In the end, if it works well, it would find some users. After all it is far cheaper (and smaller!) than many of the top end USB DACs. And the measurements I’ve made so far show it doesn’t really suffer from the kinds of clock, etc, hashes that can concern people. I’d agree with your point about HDMI being a potentially good idea. But that also has a bad name with audiophiles who suspect it of high jitter levels. So whilst Ok for home theater it is unattractive to the same people who’d rule out analogue on the assumption of problems of the types you mention. To me, your priority #2 would in an ideal world be my #1. But no-one seemed to even be willing to take that one when I tried to draw attention with a small bounty. Too much work, too little interest, too little money. However the measurements I’ve made would already stand up to competition wrt 96k/16bit, which is pretty close to being home. Particularly given that the ARMiniX chip is 24bit, so this is a HAL/API problem from the RO POV. 96k/24 is the current ‘standard’ for serious computer audio. I’m not surprised by your reaction. Its pretty much what I’d expect. But I’ll just point out that I’m fairly familiar with the serious domestic audio market and the required engineering. I think the new machines running RO could find a niche there. And although only a minority of people spend seriously on audio, there are I suspect rather more of them at present than committed RO users. So although small wrt the general population, they could bring a lot more users – and cash and interest – to RO if attracted. Whatever, the reality is that people using RO will want to play audio. And the shift is already under way to > ‘CD quality’ computer files. Given the movement I don’t really see why we should simply mimic the failures of the past to spot such changes and opportunities when the hardware is clearly capable. I must admit I’m puzzled by the idea that because we’d like some other things, we should simply ignore this. What I’m suggesting is thinking about the API and roadmap. Not implimenting 384k/32bit/multichannel by next week! If we don’t, new users who come for new reasons won’t materialise because we haven’t realised that they may want things which aren’t the same as the wishes of long-term RO users. And that a compact, low-power, quiet platform can have an edge here. (I also suspect it would be useful for other ARM based machines.) I fear you may be falling into a rather inward-looking view of what is in essence ‘what many existing users are thinking they’d like’ rather than looking out of the window. :-) And I’d suggest that dealing with the immediate problems like clipping or system rates should be a rather easier thing to deal with than porting an up to date FF with all the trimmings. However that isn’t what I was raising here as it is a short term – relatively minor – snag. My point is that it will take time to decide what the roadmap/API should be, so we won’t be caught out later on. So I’m not surprised by what you say. I just hope, though, that people will take a step back and think of this from a fresh POV of potential users who have different requirements to most of those already ‘inside the tent’. Getting new users may be a different matter to getting some of the things established users who’ve adopted RO anyway would like. May mean some new thinking… :-) Jim

Jul 27, 2013 4:21pm Jeffrey Lee (213) 6048 posts	So my prio #1 would be to enable digital audio output in a “straight” fashion – i.e. supply whatever digital data goes into SoundDMA to the digital out – PandaBoard and Raspberry Pi should be able to do that via HDMI, not sure where that leaves the BeagleBoard. BeagleBoard (and PandaBoard?) have headers available which expose the I2S data stream from the SoC (i.e. the data SoundDMA is producing). Of course that would require you to fit your own I2S-compatible DAC, and your own mixer hardware (since you’d be bypassing the onboard audio chip which is in charge of the mixing). Plus you’d probably need some kind of control signals to tell the DAC what sample rate it should be using, so maybe you’d need something wired up by I2C as well. So, it’s not quite the same as a standard consumer-friendly optical/digital audio output, but it’s definitely a way at getting at the pure, unmodified digital signals.

Jul 27, 2013 4:28pm jim lesurf (2082) 1438 posts	Having a digital output (particularly optical to avoid loops) can be attractive. But I suspect the reality these days is that people will divide into three categories: A) Use a USB DAC B) Use analogue output C) Use HDMI Interest in spdif is fading given (A) which is available over a wide range of price and performance as well as size, etc. I’d like to see all three. Indeed, my real preference would be (A). But given the clear lack of work on (A) my feeling is that the other options make most sense as things stand. But bear in mind that audiophiles are wary of HDMI. So we should not assume that would be taken as ‘better’. But regardless, none of that changes my points about API, etc, for things like mixing vs blocking, 24b source material, etc. That applies however the output emerges. That’s what I was raising here. Jim

Jul 27, 2013 5:06pm Colin (478) 2433 posts	A) Use a USB DAC B) Use analogue output C) Use HDMI Aren’t A and C effectively the same. They are just different paths to send isochronous digital audio to an external DAC aren’t they? And shouldn’t B be the same data delivered to an internal DAC?

Jul 27, 2013 5:30pm jim lesurf (2082) 1438 posts	Yes, in theory they all should deliver the same sequence of values. But the reality is that the trend is to use USB DACs for audio. HDMI tends to be used by TV sets, displays, and ‘home theatre receivers’ – which expect surround as well as stereo. As I think people will know well from my attempt in past years – I’d love to see RO support USB asynch/iso audio. It would immediately free us for any HAL porting concerns about audio provided the USB is ported (which would be essential anyway!) HDMI is disliked because measurements show that a lot of HDMI kit has timing problems orders of magnitude bigger than spdif or USB. There is no theoretical reason for this having to be the case. But HDMI is ‘general consumer market’ whereas the iso/asynch USB audio method was explicitly developed for high quality audio purposes. Hence there are differences in practice which theory tells us need not exist. Yes, B should be the same. But a lot of consumer grade computers skimp on audio. Afraid I’d include the Iyonix as a RO example in that. However ARM based hardware is already used in audio, and on a range of devices people are using. So RO could gain a better take-up here. I need to re-distinguish two things though. 1) The question of what kind of output to use A-B-C, and the problems of a specific instance like the ARminiX clipping. 2) The fact that we have already introduced ‘mixing’ and being able to play at various system rates into RO. My real point in starting this thread is that (2) already has implications in practice. e.g when someone has two different sound sources playing symultaenously which may have different sample rates and then plays them out. The mixer will have to run at one rate, so resampling is needed. OR the user can get to choose only to play one and ‘block’ the other. This means a user choice of block vs mix. There are similar choices to be made wrt system rate, and if it should ‘follow the file’ or not. And questions like what to do given the DAC is already 96k/24bit capable when the user wants to play a 96k/24 file. The point is that this – and the other issues I raised – do need thinking about. People are doing these things now on other platforms. Simply failing to think about this and make some plans will become a factor that puts new users off. So my point isn’t that we can get the hardware to do all the things by next week. It is that we have to consider them so we are aware of what is going to be needed, and then put in place a suitable API and roadmap so we aren’t simply left behind. Personally, I think it is a shame we have igored USB audio. ¹ But the reality is that many people buy squeezeboxes, etc. A variety of small player boxes, tablets, etc, that can have their own DAC giving analogue output. And the ARMiniX – despite the current snags – shows quite clearly that the results can compete if we ensure we can actually play the music in the ways users (as distinct from computer enthusiasts) expect. In practice it’s no disaster for me if RO misses this boat at a time when people are currently adopting new ways to play music. I can get good results from my Linux-based system. But it seems a shame for RO to lose out if the only reason is that people already involved don’t want to think about it. Jim ¹ Yes, I understand that the USB audio simply may be too much to expect as things stand. Fair enough. I can regret that, but understand why. Doesn’t change my other points though. BTW apologies if someone sees this change. I keep having to edit it as I still find typing into a narrow window a real PITA!

Jul 27, 2013 5:49pm Colin (478) 2433 posts	It seems to me that usb audio may not solve your problems. The obvious way to use it on RISC OS is to write an audio class driver as a back end to the normal sound system. In the same way as SCSISoftUSB is a class driver for mass storage devices and presents to the user as a SCSI device. Just because its a USB backend should make no difference to the RISC OS sound system.

Jul 27, 2013 6:38pm Rick Murray (539) 13850 posts	All: BeagleBoard (and PandaBoard?) have headers available which expose the I2S data stream from the SoC I thought it was only the Beagle xM that had the easy-to-access McBSP? Though it is worth noting that this is I2S; if you need something else (S/PDIF?) then you’ll need additional hardware, which could get messy/complicated as Jeffrey points out. Steffen: My prio #2 would be to do audio via USB. […] Prio #n+1: fix whatever strange things happen to analogue audio What makes you think USB audio will be so good? Essentially, given audio is analogue, you are taking the digital to analogue stage and placing it somewhere else. If RISC OS is seemingly having problems with existing hardware, wouldn’t these be propsgated with USB audio? What needs to be done is for somebody who knows enough about audio to understand what Jim is talking about to look at the audio system to see what is going on. What I suspect would make a big difference is to modify the sound system to be capable of switching the entire system to the sample rate of the predominant data passing, so it would switch down to 44.1kHz if I was listening to MP3s. In other words, it should know what the hardware can do and only resample as a last resort. Likewise, legacy audio should be auto-resampled to match the current pure audio data rate. Furthermore, it might be an idea to pass off the volume controls to the audio chip if it is capable, and avoid controlling the audio in software if there is another option. There’s no point having the audio output on max and stepping down the samples being played – that’ll just introduce noise. That’s my 2p worth. Might be an idea to see how other systems do it.

Jul 28, 2013 8:29am jim lesurf (2082) 1438 posts	Just because its a USB backend should make no difference to the RISC OS sound system. It will if – as I currently think is the case – the problems with clipping, etc, are down to the HAL/OMAP. And the real advantage of USB audio is that it would mean a choice of sound devices with the provision more easily ‘portable’ to each new RO machine. Removes the need to write a specific audio HAL for every new machine. Jim

Jul 28, 2013 8:36am jim lesurf (2082) 1438 posts	What makes you think USB audio will be so good? Essentially, given audio is analogue, you are taking the digital to analogue stage and placing it somewhere else. If RISC OS is seemingly having problems with existing hardware, wouldn’t these be propsgated with USB audio? Audio only become ‘analogue’ at the point when the series of digital info is converted into an ‘analogue’ waveform (pair for stereo). The main problems I’ve been uncovering seem to be with the way the digital data is handled and fed to the hardware. I’d agree that this needs to be done correctly and appropriately whatever hardware is used – inc USB. However the point of USB is that it is akin to the old argument for Java. Write once, run everywhere. If we had a correctly working USB audio interface that would then ‘just work’ on new machines as well as existing ones given the requirement that the USB stack is ported. Which is an essential anyway! However the point of this thread was that I was raising some other questions that I don’t think we can continue to ignore about how audio is already developing. If we don’t deal with them we’d be in a situation akin to if we’d stayed with ‘8bit sound only’ today. And I’ve (regretfully) accepted that no-one is going to impliment USB audio as things stand. Chicken and egg. Whilst no RO users use it, none expect it. Jim

Jul 28, 2013 10:44am Jess Hampshire (158) 865 posts	Wouldn’t it make sense to have a distinction between a quality audio channel and an alert audio channel? The quality channel would be expected to output at the system defined sample rate or at half of it. The alert channel(s)would not have a fixed rate. The user would then choose how to deal with these. So you might send the quality channel(s) digitally to your DAC on the hifi and the alerts might come out of the internal speaker. Or you might mix them, or mix them with the alerts being lowered in volume if there is quality content, or the quality channel(s) might mute the alert channel(s)

Jul 28, 2013 11:31am jim lesurf (2082) 1438 posts	The idea of ‘alert’ and ‘quality’ channels seems a reasonable one in terms of user meaning/distinction and choice. The present assumptions (e.g. in Linux) which RO developers seem to be following seem to be to have one overall ‘mixer’. I tend to think of the choice as being between mixing and blocking as that is a traditional way to make the distinction. In some ways that may be simpler for the API as there is one basic choice mixing versus blocking. Personally, I’d not want any ‘alerts’ to come out of any speakers I could hear at all if I’m listening to music. When I’m doing that I’m listening to music not using a computer. But if someone wants to hear ‘alerts’ then the system should allow that to occur. So again, what seems to really matter to me here is that the user can choose what they prefer. Your approach may make things clear in user terms. But the risk of what you suggest is that it just shifts the way someone chooses mixing/blocking to another part of the sound setup. May simply make things more complicated for the programmers and system. And as soon as we have mixing the system may have to resample in a way that changes with any changes in inputs (and outputs) so it means taking resampling or interpolation more seriously. Blocking systems can find this much easier as they may be able to have the rate follow the file, and either avoid resampling or keep to easy conversions like the x2 rate which is about as easy as they come. (Whereas mixing 44.1 and 48 is a comparitive nightmare to get right. Despite currently being the most common two domestic user rates.) With Linux the basic choice is between mixing and blocking. Typified by a trend to allowing something like Pulse audio to automagically guess what you want, or taking user control to block and go ‘direct’ via ALSA. (Which on Linux is the only way I’ve found that works correctly in the purist audio sense.) Getting Pulse to do something the distro builders assumed you wouldn’t want can be a nightmare best avoided. Although Pulse promises flexibility. my experience of it is that it is hard to do something that the distro builders didn’t think you’d want. Ends up like fighting ‘nanny knows best’. Not something I’d like to see happen to RO. So I’d be wary of adding more ‘layers’ like Linux has. It seems to bring added ‘too many cooks spoil the broth’ opportunities. However I’d be interest to see what others think of your suggestion. I’ve just been thinking of a three of general switches for the user: Mixing/blocking resampling/direct System volume/pass-the-parcel But there may be another approach that gives the same user choice in a more convenient manner. Of course, this still leaves questions like being able to handle 24bit audio… Jim

Jul 28, 2013 12:22pm Jess Hampshire (158) 865 posts	With further thought, the term channel is confusing. (i.e does an audio channel consist of a a left and right channel, for example.) What I was thinking of was the audio streams coming out of programs. It has occurred to me that more that two types are needed. (From the point of view of destination, not content). Possibly the system should define these 1. High quality (Must be at system sample rate.) 2. Telephony 3. Alert 4. Default (Any program defined types that haven’t got a defined destination) In configure you would have these entries (plus any addition ones required by programs) to allow these to be mapped to the available hardware. (Where they go, whether they block or mix and priority.) Programs would be aware if they had been blocked. (So if a phone call blocked media replay, it could be paused, for example, or a system alert could flash an LED perhaps).

Jul 28, 2013 1:28pm jim lesurf (2082) 1438 posts	Accepting the idea of having the added layer of complexity, there are still some snags. For example it may not be possible on given hardware for the ‘High quality’ to always meet “must be at system rate”. (Or, indeed, match in sample size since people are using 32bit for some sampling as well as 24bit.) So some kind of resampling may be needed as a fallback, even if the ideal is to pass though without any alterations if the hardware allows. So we’d still need to resolve how things like resampling are handled at ‘high quality’ – which is quite demanding if you have both 44.1k and 49k involved. Alas, this is where USB audio would help since the best USB audio DACs provide 44.1k/48k/88.2k/96k/176.4k/192k at 24bit (or more). ¹ So allow the user to ensure that almost any music they play doesn’t require any mixing or resampling unless they want to mix two streams/files to hear them symultaenously. Most internal ‘soundcards’ support relatively few rates natively, so need a ‘mixer’ of some kind. e.g. in Linux ALSA the difference between ‘hw:’ and ‘plughw:’ driving. Again, Linux gives the user the choice. (Although the documentation is awful.) An advantage of Linux is that the resampling layer provided by ALSA can work very well indeed. But that means having the host CPU/OS do the work, not leave it to a specialist audio chip. The bottom line on that is that with ALSA if you choose ‘hw:’ with a suitable USB DAC and choose blocking (i.e. no mixer) you can play things pass-the-parcel with the host system needing to do almost no work for LPCM. And get best possible results. But we don’t have the option as things stand… So will have to allow for resampling to be used for some material even when not mixing sources. Some music files these days (e.g. the Beatles USB ‘apple’ stick) may provide something like 44.1k/24bit files, although 96k/24 is actually far more common. It also becomes more complex if some would want an alert ‘delayed’, others, sent to some other output, etc, rather than just being blocked/discarded. Hence although your idea makes sense in terms of user-level descriptions and labelling, it may make getting a result harder. However I can see the logic and the attraction. It is still based on getting blocking, mixing, etc, options working well under the skin, though. Jim ¹ This is actually becoming the minimum spec. People are already using USB DACs that accept 384k/32bit. Many others use this internally already. The DAC beside the monitor I’m looking at does so.

Jul 28, 2013 2:55pm Colin (478) 2433 posts	In a hypothetical sound system why would there be any problem with mixers and resamplers. If you send an audio stream to the sound system presumably the sound system knows the sound devices capabilities and can determine whether resampling is required. It also knows that no other audio is being played on a particular device so can determine whether mixing is required. So if you only play one stream to one device and the device can handle the input data without resampling you get what you call pass the parcel. If you play 2 audio streams to one device they get mixed but thats your problem. If you don’t want mixing don’t play 2 audio streams to the same device. The capabilities of the audio device doesn’t matter, if the audio stream doesn’t match the audio device’s capabilities it needs resampling. If you are right and the main problem you are experiencing is in the HAL then if you can’t get that fixed you won’t get anything else done as it is a much smaller problem than redesigning the sound system.

Jul 28, 2013 2:57pm Jess Hampshire (158) 865 posts	For example it may not be possible on given hardware for the ‘High quality’ to always meet “must be at system rate”. (Or, indeed, match in sample size since people are using 32bit for some sampling as well as 24bit.) Sample size isn’t that relevant, because converting is pretty simple. Since to go to anywhere but default, software would need to be written with that in mind, it would be up to the player to deal with. The player would be able to switch system sample rate. If the sound rate was wrong the player would have to deal with it by either re-sampling itself, or by sending it to another destination (perhaps defining an extra output that is a duplicate of the High quality one, but without the sample rate restriction). My suggestion is that the high quality destination is no compromise.

Jul 28, 2013 4:42pm jim lesurf (2082) 1438 posts	In a hypothetical sound system why would there be any problem with mixers and resamplers. In a hypothericallly ‘perfect’ system there won’t be any problems. Reality is another matter, though… :-) Ditto for some of your other comments. Yes it may be the user’s ‘problem’ when mixing takes place. The aim is to minimise or avoid such problems. Unless the user has to option to prevent mixing or other resamplings they may get poorer results. All hinges on how the real system differs from a hypothertical ‘perfect’ one. The capabilities of the audio device do matter precisely because they need to match the source materials to be able to avoid resamplings. That’s one of the points of having DAC hardware that can accept all the rates you wish to play. This thread isn’t about the HAL problems. But I expect we can, indeed, fix the main ones like the clipping. Jim

Jul 28, 2013 4:46pm jim lesurf (2082) 1438 posts	Sample size isn’t that relevant, because converting is pretty simple. Please explain why you think conversions between, say 44.1k <=> 48k are “pretty simple”. Sample size is relevant because the quantisation floor will depend upon it. Bear in mind also that not all music is relentlessly loud. As an experiment, earlier today I hacked the ‘PowerBars’ plugin for DigitalCD to behave more like a PPM (Peak Programme Meter). That gave it a log display with markers for -3dB/-10dB/-20dB/-30dB/-40dB and with a fast attack and slow decay. This makes it far easier to assess the levels of a lot of ‘classical’ music or small acoustic jazz where the levels can spend a lot of time down below -20dB or -30dB. Consider that in terms of the levels wrt the quantisation floor of 16bit. Jim

Jul 28, 2013 6:18pm Colin (478) 2433 posts	To the system software the capabilities of the device doesn’t matter the only thing that matters is how to play the audio stream. Maximizing quality is only a matter of tweaking. To play an audio stream to a device you have the following steps. 1 read Audio Data 2 optional Decoding 3 optional Resampling 4 optional Volume adjust 5 optional Mixing 6 send data to device Decoding is done if the Audio data isn’t in the format required for the device Resampling is done if the Audio data is not supported by the device Volume is changed if the gain you set isn’t 0db Mixing is done if more than 1 stream is trying to use the device. The only way to avoid 2-5 is for the user to ensure they are not required by matching the audio data to the device. A device having 16 Audio standards is no better than one with 1 standard if that matches the input stream you always use. It may well be the case for example that an all singing all dancing usb audio device just upsamples everything to its highest standard on the device itself and is in effect a single standard device. Anyway my point is if I was writing it I wouldn’t care about the device I’d not be writing it for 1 device I’d be writing it for all devices. From the system softwares point of view you have to be able to resample between any 2 standards even if the algorithm to do so is rubbish. The input stream tells you the ‘from’ the device tells you the ‘to’.

Jul 28, 2013 6:26pm Jess Hampshire (158) 865 posts	Please explain why you think conversions between, say 44.1k <=> 48k are “pretty simple”. I don’t, hence the whole suggestion of avoiding changing sample rates for hifi streams. Why do you think that? Sample size is relevant because the quantisation floor will depend upon it. Of course, but I was referring to relevance of it matching, not the quality. If the output system has more bits than the source, then pad with zeros if less drop some bits. Obviously a bit more processing than that would be desirable to deal with the quantization noise, but this would only be an issue if you chose to mix in other channels, and if you were going to do that, then odds are, ultimate quality isn’t an issue for that particular system.

Jul 28, 2013 6:55pm Rick Murray (539) 13850 posts	Jim: However the point of USB is that it is akin to the old argument for Java. Write once, run everywhere. The problem with that is that you risk not having any sound until you purchase a USB audio widget even if the hardware is capable of making sounds. It also creates potential problems for USB connectivity. To give an example, on my Beagle I have four USB ports. These are connected as follows: Flash (ADFS) Mouse Flash (FAT32) Keyboard When I am using MIDI, I can remove the FAT32 flash and put MIDI there instead. To add anything else is likely to require a USB hub. Oh, and the sound system will need to cope gracefully with the audio thingy appearing and disappearing at any time. Are all of the USB audio devices the same in terms of programming, or are we likely to end up with a situation like WiFi where they are all slightly different depending on the chipset used? Jess: Wouldn’t it make sense to have a distinction between a quality audio channel and an alert audio channel? This is why in my earlier suggestion I kept legacy audio separate. My feeling is that the primary audio system should switch rates to match what is being played, with resampling in hardware only if the native hardware cannot match the desired sampling rate (though I think there needs to be a mechanism for the audio system or the HAL to specify a range of acceptable rates – I think it may be easier/better for an MP3 player to decode to 48kHz if 44.1 is unavailable, than for it to decode to 44.1 and have the sound system subsequently convert it to 48). The legacy audio, and I would certainly imagine system beeps would be legacy audio, would always be resampled to match the primary sound system rate. I mean, sure, the quality might suck in making a 22kHz beep play at 96kHz or whatever – but will anybody notice/care for a simple “ding!”? Nobody specific: For the primary audio, the system should – in as much as is possible – do as little to the data as it can. For example, if the output device offers volume control and it can accept 48kHz 16 bit stereo samples, the OS ought to set the desired volume in hardware (not software) and throw the data directly at the hardware. It should only intervene and fiddle with the data when the hardware cannot cope, like if the data is in some weird format or sampling rate that is not supported. This is what Jim calls pass-the-parcel. And, for sure, doesn’t it make sense to dump the task onto hardware when appropriate instead of doing a lot of (potentially unnecessary) work by itself? As for the issue of alerts – this should be simple to handle for people who do/do not want to hear them. If alerts use the legacy audio, then just mute it in the mixer stage… As for the issue of playing multiple things at different sample rates via the primary channel; I feel in this case the locked rate of the sound system should be the rate of the first thing that started playing, with the sound system resampling anything else on the fly. If the first app should close and thus no longer require audio, the system can then switch rates to match the next task to register itself into the sound system. The application doesn’t necessarily need to know this has happened. But it does imply a register/deregister process so the sound system knows who is where and what they want. Think akin to Wimp_Initialise … Wimp_CloseDown. Jim asks about 24 bit audio. These ideas should, if implemented [by whom, though?] potentially permit 24 bit audio. If the hardware can cope, it’ll be thrown at it, if not, just resample. Perhaps the sound system can offer two forms of resample, a quick’n’dirty and a high quality. I suggest this because people such as Jim would clearly want a decent resampling method for the OS to play stuff the hardware can’t do natively (see all those pretty charts), however for cases like two programs using the sound system at the same time, it might be acceptable for the one using a different sampling rate to get a quick’n’dirty conversion. I mean, how likely is it that two programs will be playing audio at the same time except for system beeps and starting another audio app by accident? Jess: Programs would be aware if they had been blocked. (So if a phone call blocked media replay, it could be paused, for example, or a system alert could flash an LED perhaps). …ah, you looked at any Android phone. ;-) Though I think the audio system does NOT block music during a phone call. Instead, it fires a notification of some kind (that is why things that play media always ask for the “Phone state and identity” permission) and it is up to the app to pause itself. Likewise for the LED flashing, the app raises a message to make a notification, and you can have a choice of vibrate, blink some sort of LED, make a noise. This is handled by the OS, and highly inconsistently. For example, my “du-da-du!” (Miyuki’s catchphrase) notification tone will not play on the speaker when my phone is in silent mode, but it will play in the headphones if I’m listening to music. Perhaps there needs to be a disconnection here with a “Notification Provider” module that handles beeps and alerts and such. The sound system should not be concerned with flashing LEDs. If the application wants a beep, it should get a beep (unless the beep is muted). Anything more complex should be indirected though another module that decides how to notify the user. The app can then say “it’s a warning” or “epic fail, sound the sirens!” and the notification provider can figure out what to do. Jim: Some music files these days (e.g. the Beatles USB ‘apple’ stick) may provide something like 44.1k/24bit files, although 96k/24 is actually far more common. … Mine are 256/320kbit MP3. I avoid any outfit attempting to sell 128kbit MP3. I think they’re fairly universally 44.1kHz. I’ve highlighted in italics part of your quote to highlight that your audio requirements are a world away from the Mass Market Sheep who download stuff from iTunes, Amazon, a dozen similar outfits (like Deezer), listen direct via streaming sites, or look for rips on MediaFire etc. I once obtained a FLAC. Took me for-e-ver to find something that would play it. I think that says it all, really. (^_^) However, as always, the main problem has been highlighted above. People making snazzy hardware which is put on “open source” boards, with patchy documentation, if any. <rant>To give an example – one of the early ideas of an open source device is the Neuros OSD (a PVR). The whole thing was designed to be open source, however the media codecs were closed source because a third party supplied them. It was envisaged that this would get the box to market and eventually the codecs and system devices would be rewritten in an open source manner. It all looked good until it became clear that TI required an NDA to be signed in order to access any technical documentation on the DM320 chip, and that TI had absolutely NO interest in dealing with private individuals whatsoever [a copy of the message TI sent me is here]. I note that TI are (slowly) getting better, but Broadcom isn’t there yet. At any rate, the lack of documentation essentially killed the OSD when people realised that the most they could do is hack the OS or the application software, and that adding support for alternative types of video is bordering on the impossible. Indeed, Neuros wrote software to play 240p YouTube videos by ripping apart the FLV at runtime and faking up together as a standard H.263 file (that the codec understands). It is something of an ugly hack. I, likewise, would like to save my files as AVI instead of MP4 because my DVD players can technically play the recordings, but they only understand AVI wrappers. Again – it’s a dream, never a reality. I just wish these companies wouldn’t be so damn protective of their intellectual property to consider developer documentation to be “top secret”. Here’s a reality for you. The Chinese probably already have a copy of your NDA’d documents. Before the hardware hits the market. Anything else is just annoying for developers, especially if the companies are going to play pick and mix with who gets the info and who does not. I eventually found copies of the DM320 datasheets (on a Chinese site, natch) and it’s like the OMAP datasheets. There’s nothing super-special-whoo-hoo there to justify all the NDA nonsense, in my opinion. But you can see from this that a lack of proper documentation, and a lack of willingness to provide such, can have a knock on effect, not only on a home hacker but a whole minority operating system and its users. Maybe we ought to have “The chipset hall of shame” listing all the devices in contemporary RISC OS machines and their levels of documentation openness? </rant>

Jul 28, 2013 8:41pm Jess Hampshire (158) 865 posts	I’m reasonably sure I have made my suggestion as clear as mud. :( (Probably because I’m not a programmer) I’ll try and explain what I mean from a different angle. (With additions to the original idea to deal with issues pointed out). Each sound stream would have an additional piece of metadata (generated by the program) that represents the intended use of the stream and where the sound is produced from. The sound system uses this metadata to decide what to do with the stream, based on user choice of how to use the available hardware. There would be predefined values. HiFi – Only one stream at a time, sample rate must be supported by hardware. (All other destinations resample as needed) Media – Goes to the same hardware as hifi, but allows resampling (a media player would choose this if refused hifi due to lack of sample rates) Telephony Alert -intended for system bleeps Legacy – all streams with no Identifier Default – all streams with Identifiers with no destination entry Configure would have system where the user could map the destinations and options of the system. (Which would be machine dependent, and if pluggable audio systems are supported there would be options for with and without ). Each destination would have options including which hardware it uses, and whether it blocks or attenuates other streams. For example hifi on a DAC going to a hifi would normally block anything else trying to use the same hardware, but do nothing to sounds going to the internal speaker. But where there is only one sound system, it would allow other sounds to be mixed in. Telephony would mute all the other sounds, with the exception of alerts. The user might choose to send legacy to a DAC, to support an old media player. All these would be user defined, but with sensible defaults. If a sound stream is blocked a message would be sent, (it would be up to the programmer whether this is acted upon) Pausing play (I don’t have an Android phone) or flashing an LED, would be up to the program itself, not the system. What I am taking ideas from is the way skype can use a handset while the main speakers produce music. Hope this is less unclear.

Jul 28, 2013 10:55pm Rick Murray (539) 13850 posts	(Probably because I’m not a programmer) I suspect some of the issues also are due to a lack of understanding of the complex parts. To give an example, sometimes on badly encoded lower bitrate MP3s I can hear a sort of twangy whispering that seems almost as if it is a third channel between the left and the right. Either that or I’m going crazy. Now Jim might be able to offer a description like pre-tensile subharmonic contra-cerebral phase distortion, but for me it’s just “phantom twangy whispering”. I get that if we perform 44.1kHz to 48kHz resampling at 88/92kHz, then we have more room to play in, kind of like if you are trying to sample something that does not have a separate clock pulse (serial comms, PCM data), you don’t sample at the same rate as the data is coming in, you sample – if possible – at 3x or 4x to be more immune to jitter. However the more complicated parts of how audio resampling works is beyond my understanding. So I look at how a sound system might ideally work, with the resampling being a black box part written by somebody way smarter. ;-) HiFi – Only one stream at a time, sample rate must be supported by hardware. (All other destinations resample as needed) What happens if the HiFi input is not in a format understood by the hardware? This is why I broke my idea down to “legacy” and “primary”, with the specification that the good audio (your HiFi/media) would use the primary audio path, which will be passed as much as possible unmodified through the system. But modification might be required depending upon the hardware in use. Essentially it is like your “HiFi” and “media” rolled into one; but simplified in that a program does not have to ask to use HiFi and, if refused, ask for media. It will just ask for the primary audio path and the OS should do as little as possible to the data. As I said before, it seems silly to do a lot of unnecessary processing when the data could instead be thrown at the hardware. Alert -intended for system bleeps Legacy – all streams with no Identifier I figured that there wouldn’t be too much point in providing a high quality playback method just for a “beep”. Come on, how many people are still using WaveSynth-Beep for their error boxes? This is why I lumped alerts in with legacy – as it is likely that legacy style code will be used to play the beeps, and it gets around the issue of how to sync legacy to the primary sound stream – a quick’n’dirty resample will suffice. Each sound stream would have an additional piece of metadata (generated by the program) that represents the intended use of the stream and where the sound is produced from. I like the idea, but I think SWI calls might be better than metadata. If you embed metadata into the audio stream, how do you set up the system? Do you attempt to play and alter the metadata if the playback fails? Under RISC OS we have SWIs so (under your proposal) the app could first ask for audio capabilities, then ask if playback of such-and-such is available, then set up how to play. Or, under my (primary/legacy only) proposal, register how it wants to play and then play – with the audio system dealing with things in a more transparent manner. If the application spits out 24bit 96kHz samples and the audio system has to munge this into 16 bit 44.1kHz samples as that’s all the hardware can cope with, the application doesn’t necessarily need to know this. The application asks for sound to be played, the sound system should honour this request to the best of its abilities. Any conversions necessary should be justified by the hardware, otherwise, pass-the-parcel. The user might choose to send legacy to a DAC, to support an old media player. One thing my proposal is weaker with is multiple audio device support. I like the idea of a configure plug-in to determine what device sounds are directed to (with a fallback, of course, if the chosen device is not present); though I will point out that there are plenty of quirks in existing hardware – only recently (June update) has the RaspberryPi been capable of outputting audio to the via HDMI and the headphone jack at the same time. Without low-level technical details from Broadcom, quite likely it won’t be possible to direct different audio data to each output. Pausing play (I don’t have an Android phone) or flashing an LED, would be up to the program itself, not the system. I must disagree here, as it is tasking the program with the job of knowing the specifics of a wide range of hardware and a never-ending cycle of updates for potential future hardware; not to mention the issue of “what if multiple programs want to make a notification?” plus “what if I want the LED to blink more slowly?”. Functionality like this is best handled via an intermediary module that can marshall notification requests within the hardware capabilities (RiscPC can blink the FDD LED; Beagles can do all sorts of cool things like pulsing LEDs; the RaspberryPi has no user wibbable LEDs) and the associated configuration (beep, don’t beep, etc). It shouldn’t be up to each application to do this for itself, and certainly an application should not be messing around with the hardware by itself. Each destination would have options including which hardware it uses, and whether it blocks or attenuates other streams. Nice, but add “…or mixes with other streams”. And in the case of attenuation or mixing, you might need to specify how. If you consider such-and-such an app might be using one sampling rate and another a different sampling rate, how do you sanely mix together two different sampling rates? There’s a clever method, and a nasty method. ;-)

Jul 28, 2013 11:56pm patric aristide (434) 418 posts	I once obtained a FLAC. Took me for-e-ver to find something that would play it. I think that says it all, really. Huh? Even DigitalCD on RISC OS handles FLAC as do several players on more popular platforms.

Pages: 1 2 3

Reply

To post replies, please first log in.

Forums → Wish lists →

Audio: future improvements?

Reply

Search forums

Social

ROOL Store

Donate! Why?

RISC OS IPR

Description

Voices

Options