Audio: future improvements?
jim lesurf (2082) 1438 posts |
I’d like to introduce some ideas for possible improvements in the ways RO can handle audio. There are a number of aspects to this, which centre on three things: mixing, resampling, and quality.

We now have in place some foundation for a general audio ‘mixer’. This lets a user use more than one sound source, and output the results via more than one device. Useful for flexibility. However, unless some points are taken into account, this can also cause problems. For example, if the system is expected to play two sound inputs simultaneously which have different sample rates, at least one of them will need resampling/interpolation. Unfortunately, one of the hardest resamplings to do well is between 44.1k/48k – the two rates that have been most common. At present RO software, etc, tends to use linear interpolation, which does such a conversion quite poorly. The above also – as exemplified at present by the ARMiniX/PandaBoard – has implications for trying to do something like play a source at 48k using a machine (mixer/system) rate of 44.1k. From these examples, mixing tends to require fairly good resampling processes if the quality is not to suffer.

OTOH some users who are concerned about quality will wish to avoid mixing, if for no other reason than to avoid ‘noises’ cropping up when listening to music. This means a mixer may need to be easily settable to ‘block’ any other sources when one is currently being played out. The above isn’t simply a matter of not hearing unwanted noises. There is also a related question of what happens to the gains, and the risk of clipping, if a new source starts. When that happens you may either require gains to be adjusted to avoid the sum clipping, or have the combination clip. There is, of course, more to this. But it means that flexibility may require us to think about a bigger range of situations. Introducing and using a mixer means some things can occur which can’t with a setup that only has one source and one output at any time.

At the heart of this I suspect are the following issues:

A) Choice of system playout rate – and whether this should be fixed, or ‘follow the source’.
B) Ability to choose between blocking and mixing.
C) Ability to choose between scaling (volume controls) and pass-the-parcel from source to output, so that the output device gets sound data that has not been altered along the way once it has been converted to LPCM.

The ‘follow the file’ rate point seems significant to me, as exemplified again by the ARMiniX/PandaBoard. At present, 44.1k material plays best with a system rate of 88.2k, whereas 48k material (and 96k material, which is increasingly available) plays best with a system rate of 96k. Use the ‘wrong’ rate and the linear interpolation tends to put anharmonic aliasing down in the audible range. This is avoidable if the player automatically switches the system rate to ‘follow the file’. But in turn that may be a problem for a ‘mixer’ that may be confused by having the system rate suddenly change.

There are some other issues which I suspect mainly concern audio drivers, e.g. ‘gapless playback’, which has become the norm elsewhere. This matters for some kinds of music. And of course beyond this are 96k/24bit playback and ‘surround’. All of which makes me feel we need to think about the roadmap for this area and plan out an extended API, etc.

Be interested to see any reactions. :-) Jim |
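To make the resampling and clipping points concrete, here is a minimal sketch in plain C – an illustration only, not any existing RISC OS code – of the naive linear-interpolation conversion criticised above, together with the scale-or-saturate choice a mixer faces when two sources sum past full scale:

#include <stddef.h>
#include <stdint.h>

/* Naive linear-interpolation resampler of the kind RO software
 * currently tends to use.  A straight line between neighbouring
 * samples is a poor low-pass filter, which is why 44.1k -> 48k
 * done this way folds anharmonic aliases into the audible band;
 * a windowed-sinc/polyphase design avoids that at more cost. */
size_t resample_linear(const int16_t *in, size_t in_len,
                       int rate_in, int rate_out, int16_t *out)
{
    double step = (double)rate_in / rate_out; /* input samples per output sample */
    double pos = 0.0;
    size_t n = 0;

    if (in_len < 2) return 0;
    while (pos < (double)(in_len - 1)) {      /* 'out' assumed big enough */
        size_t i = (size_t)pos;
        double frac = pos - (double)i;
        out[n++] = (int16_t)((1.0 - frac) * in[i] + frac * in[i + 1]);
        pos += step;
    }
    return n;
}

/* Mixing two streams: the sum can clip, so either scale the gains
 * down beforehand (losing level) or saturate (distorting) - the
 * trade-off described above. */
int16_t mix_saturating(int16_t a, int16_t b)
{
    int32_t sum = (int32_t)a + (int32_t)b;
    if (sum >  32767) sum =  32767;
    if (sum < -32768) sum = -32768;
    return (int16_t)sum;
}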
Steffen Huber (91) 1954 posts |
I am following the whole computer audio stuff only with half an eye, but my impression for a number of years is that nobody, from the ambitious home user to the professional, really cares about analogue output from the computer. The presence of all kinds of electromagnetic radiation inside a computer makes it very hard to create a high quality analogue signal. So everyone has gone digital.

So my prio #1 would be to enable digital audio output in a “straight” fashion – i.e. supply whatever digital data goes into SoundDMA to the digital out. PandaBoard and Raspberry Pi should be able to do that via HDMI; not sure where that leaves the BeagleBoard. For volume control, rely on the external device which handles the DA conversion. Bonus points for adding output control for the user (i.e. being able to specify which kind of signal is allowed for output).

My prio #2 would be to do audio via USB.

Prio #3 to #n: revive the RISC OS software market to create powerful audio software.

Prio #n+1: fix whatever strange things happen to analogue audio.

But overall, I think there are many more pressing things wrt RISC OS than the whole sound issue. E.g. porting Firefox (or helping the NetSurf guys to speed up JavaScript development). Even porting Java – Oracle is now supplying Java 8 previews for various ARM platforms, the ARM HotSpot JIT is getting faster all the time, and JavaFX is a seriously nice GUI toolkit. It just currently lacks a RISC OS Look&Feel :-)

Sorry Jim, I guess that is not the kind of reaction you are looking for :-) |
jim lesurf (2082) 1438 posts |
I’d agree on the whole. Those serious about audio have tended to migrate to using USB DACs fed via asynch/iso transfer, with sample-perfect data that has been ‘pass the parcel’ delivered from the source file or stream. However there are also people who still find an ‘all in one box’ attractive. I suspect that for them, a box as small and potentially quiet (mechanically) as the ARMiniX/PandaBoard with its SD and USB sockets would be attractive. In the end, if it works well, it would find some users. After all it is far cheaper (and smaller!) than many of the top end USB DACs. And the measurements I’ve made so far show it doesn’t really suffer from the kinds of clock, etc, hashes that can concern people.

I’d agree with your point about HDMI being a potentially good idea. But that also has a bad name with audiophiles, who suspect it of high jitter levels. So whilst OK for home theatre, it is unattractive to the same people who’d rule out analogue on the assumption of problems of the types you mention.

To me, your priority #2 would in an ideal world be my #1. But no-one seemed to even be willing to take that one when I tried to draw attention with a small bounty. Too much work, too little interest, too little money. However the measurements I’ve made would already stand up to competition wrt 96k/16bit, so we are pretty close to being home already. Particularly given that the ARMiniX chip is 24bit, so this is a HAL/API problem from the RO POV. 96k/24 is the current ‘standard’ for serious computer audio.

I’m not surprised by your reaction. It’s pretty much what I’d expect. But I’ll just point out that I’m fairly familiar with the serious domestic audio market and the required engineering. I think the new machines running RO could find a niche there. And although only a minority of people spend seriously on audio, there are, I suspect, rather more of them at present than committed RO users. So although small wrt the general population, they could bring a lot more users – and cash and interest – to RO if attracted.

Whatever, the reality is that people using RO will want to play audio. And the shift is already under way to beyond-‘CD quality’ computer files. Given the movement, I don’t really see why we should simply mimic the failures of the past to spot such changes and opportunities when the hardware is clearly capable. I must admit I’m puzzled by the idea that because we’d like some other things, we should simply ignore this. What I’m suggesting is thinking about the API and roadmap. Not implementing 384k/32bit/multichannel by next week! If we don’t, new users who come for new reasons won’t materialise, because we haven’t realised that they may want things which aren’t the same as the wishes of long-term RO users. And that a compact, low-power, quiet platform can have an edge here. (I also suspect it would be useful for other ARM based machines.)

I fear you may be falling into a rather inward-looking view of what is in essence ‘what many existing users are thinking they’d like’ rather than looking out of the window. :-) And I’d suggest that dealing with the immediate problems like clipping or system rates should be a rather easier thing to deal with than porting an up to date FF with all the trimmings. However that isn’t what I was raising here, as it is a short term – relatively minor – snag. My point is that it will take time to decide what the roadmap/API should be, so we won’t be caught out later on. So I’m not surprised by what you say. 
I just hope, though, that people will take a step back and think of this from a fresh POV of potential users who have different requirements to most of those already ‘inside the tent’. Getting new users may be a different matter to getting some of the things established users who’ve adopted RO anyway would like. May mean some new thinking… :-) Jim |
Jeffrey Lee (213) 6048 posts |
BeagleBoard (and PandaBoard?) have headers available which expose the I2S data stream from the SoC (i.e. the data SoundDMA is producing). Of course that would require you to fit your own I2S-compatible DAC, and your own mixer hardware (since you’d be bypassing the onboard audio chip which is in charge of the mixing). Plus you’d probably need some kind of control signals to tell the DAC what sample rate it should be using, so maybe you’d need something wired up via I2C as well. So, it’s not quite the same as a standard consumer-friendly optical/digital audio output, but it’s definitely a way of getting at the pure, unmodified digital signals. |
jim lesurf (2082) 1438 posts |
Having a digital output (particularly optical, to avoid loops) can be attractive. But I suspect the reality these days is that people will divide into three categories:

A) Use a USB DAC
B) Use analogue output
C) Use HDMI

Interest in spdif is fading given (A), which is available over a wide range of price and performance as well as size, etc. I’d like to see all three. Indeed, my real preference would be (A). But given the clear lack of work on (A), my feeling is that the other options make most sense as things stand. But bear in mind that audiophiles are wary of HDMI. So we should not assume that would be taken as ‘better’.

But regardless, none of that changes my points about API, etc, for things like mixing vs blocking, 24b source material, etc. That applies however the output emerges. That’s what I was raising here. Jim |
Colin (478) 2433 posts |
Aren’t A and C effectively the same? They are just different paths to send isochronous digital audio to an external DAC, aren’t they? And shouldn’t B be the same data delivered to an internal DAC? |
jim lesurf (2082) 1438 posts |
Yes, in theory they all should deliver the same sequence of values. But the reality is that the trend is to use USB DACs for audio. HDMI tends to be used by TV sets, displays, and ‘home theatre receivers’ – which expect surround as well as stereo. As I think people will know well from my attempt in past years, I’d love to see RO support USB asynch/iso audio. It would immediately free us from any HAL porting concerns about audio, provided the USB is ported (which would be essential anyway!). HDMI is disliked because measurements show that a lot of HDMI kit has timing problems orders of magnitude bigger than spdif or USB. There is no theoretical reason for this having to be the case. But HDMI is ‘general consumer market’, whereas the iso/asynch USB audio method was explicitly developed for high quality audio purposes. Hence there are differences in practice which theory tells us need not exist.

Yes, B should be the same. But a lot of consumer grade computers skimp on audio. Afraid I’d include the Iyonix as a RO example in that. However ARM based hardware is already used in audio, and on a range of devices people are using. So RO could gain a better take-up here.

I need to re-distinguish two things though:

1) The question of what kind of output to use (A-B-C), and the problems of a specific instance like the ARMiniX clipping.
2) The fact that we have already introduced ‘mixing’, and being able to play at various system rates, into RO.

My real point in starting this thread is that (2) already has implications in practice, e.g. when someone has two different sound sources playing simultaneously which may have different sample rates and then plays them out. The mixer will have to run at one rate, so resampling is needed. OR the user can get to choose only to play one and ‘block’ the other. This means a user choice of block vs mix. There are similar choices to be made wrt system rate, and whether it should ‘follow the file’ or not. And questions like what to do, given the DAC is already 96k/24bit capable, when the user wants to play a 96k/24 file.

The point is that this – and the other issues I raised – do need thinking about. People are doing these things now on other platforms. Simply failing to think about this and make some plans will become a factor that puts new users off. So my point isn’t that we can get the hardware to do all the things by next week. It is that we have to consider them so we are aware of what is going to be needed, and then put in place a suitable API and roadmap so we aren’t simply left behind.

Personally, I think it is a shame we have ignored USB audio.1 But the reality is that many people buy Squeezeboxes, etc – a variety of small player boxes, tablets, etc, that can have their own DAC giving analogue output. And the ARMiniX – despite the current snags – shows quite clearly that the results can compete if we ensure we can actually play the music in the ways users (as distinct from computer enthusiasts) expect. In practice it’s no disaster for me if RO misses this boat at a time when people are currently adopting new ways to play music. I can get good results from my Linux-based system. But it seems a shame for RO to lose out if the only reason is that people already involved don’t want to think about it. Jim

1 Yes, I understand that the USB audio simply may be too much to expect as things stand. Fair enough. I can regret that, but understand why. Doesn’t change my other points though.

BTW apologies if someone sees this change. 
I keep having to edit it as I still find typing into a narrow window a real PITA! |
Colin (478) 2433 posts |
It seems to me that USB audio may not solve your problems. The obvious way to use it on RISC OS is to write an audio class driver as a back end to the normal sound system, in the same way as SCSISoftUSB is a class driver for mass storage devices and presents to the user as a SCSI device. Just because it’s a USB backend should make no difference to the RISC OS sound system. |
Rick Murray (539) 13857 posts |
All:
I thought it was only the Beagle xM that had the easy-to-access McBSP? Though it is worth noting that this is I2S; if you need something else (S/PDIF?) then you’ll need additional hardware, which could get messy/complicated, as Jeffrey points out.

Steffen:
What makes you think USB audio will be so good? Essentially, given audio is analogue, you are taking the digital to analogue stage and placing it somewhere else. If RISC OS is seemingly having problems with existing hardware, wouldn’t these be propagated with USB audio?

What needs to be done is for somebody who knows enough about audio to understand what Jim is talking about to look at the audio system to see what is going on. What I suspect would make a big difference is to modify the sound system to be capable of switching the entire system to the sample rate of the predominant data passing through it, so it would switch down to 44.1kHz if I was listening to MP3s. In other words, it should know what the hardware can do and only resample as a last resort. Likewise, legacy audio should be auto-resampled to match the current pure audio data rate. Furthermore, it might be an idea to pass off the volume controls to the audio chip if it is capable, and avoid controlling the audio in software if there is another option. There’s no point having the audio output on max and stepping down the samples being played – that’ll just introduce noise.

That’s my 2p worth. Might be an idea to see how other systems do it. |
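As a sketch of the rate-switching policy just described – hypothetical types and names, nothing here is an existing RISC OS interface – the system would prefer the stream’s own rate, then an easy integer multiple, and resample only as the last resort:

#define MAX_RATES 16

typedef struct {
    int rates[MAX_RATES];   /* rates the hardware supports natively */
    int count;
} hw_caps_t;

/* Pick the system rate for a newly dominant stream. */
int choose_system_rate(const hw_caps_t *hw, int stream_rate)
{
    for (int i = 0; i < hw->count; i++)
        if (hw->rates[i] == stream_rate)
            return stream_rate;          /* pass the data straight through */
    for (int i = 0; i < hw->count; i++)
        if (hw->rates[i] == stream_rate * 2)
            return stream_rate * 2;      /* cheap x2 interpolation */
    return hw->count ? hw->rates[0] : 0; /* last resort: full resample */
}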
jim lesurf (2082) 1438 posts |
It will if – as I currently think is the case – the problems with clipping, etc, are down to the HAL/OMAP. And the real advantage of USB audio is that it would mean a choice of sound devices with the provision more easily ‘portable’ to each new RO machine. Removes the need to write a specific audio HAL for every new machine. Jim |
jim lesurf (2082) 1438 posts |
Audio only becomes ‘analogue’ at the point when the series of digital values is converted into an ‘analogue’ waveform (a pair for stereo). The main problems I’ve been uncovering seem to be with the way the digital data is handled and fed to the hardware. I’d agree that this needs to be done correctly and appropriately whatever hardware is used – inc USB.

However the point of USB is that it is akin to the old argument for Java: write once, run everywhere. If we had a correctly working USB audio interface, it would then ‘just work’ on new machines as well as existing ones, given the requirement that the USB stack is ported. Which is essential anyway!

However the point of this thread was that I was raising some other questions that I don’t think we can continue to ignore about how audio is already developing. If we don’t deal with them we’d be in a situation akin to if we’d stayed with ‘8bit sound only’ today. And I’ve (regretfully) accepted that no-one is going to implement USB audio as things stand. Chicken and egg. Whilst no RO users use it, none expect it. Jim |
Jess Hampshire (158) 865 posts |
Wouldn’t it make sense to have a distinction between a quality audio channel and an alert audio channel? The quality channel would be expected to output at the system defined sample rate, or at half of it. The alert channel(s) would not have a fixed rate. The user would then choose how to deal with these. So you might send the quality channel(s) digitally to your DAC on the hifi, and the alerts might come out of the internal speaker. Or you might mix them, or mix them with the alerts being lowered in volume if there is quality content, or the quality channel(s) might mute the alert channel(s). |
jim lesurf (2082) 1438 posts |
The idea of ‘alert’ and ‘quality’ channels seems a reasonable one in terms of user meaning/distinction and choice. The present assumptions (e.g. in Linux) which RO developers seem to be following are to have one overall ‘mixer’. I tend to think of the choice as being between mixing and blocking, as that is a traditional way to make the distinction. In some ways that may be simpler for the API, as there is one basic choice: mixing versus blocking.

Personally, I’d not want any ‘alerts’ to come out of any speakers I could hear at all if I’m listening to music. When I’m doing that I’m listening to music, not using a computer. But if someone wants to hear ‘alerts’ then the system should allow that to occur. So again, what seems to really matter to me here is that the user can choose what they prefer. Your approach may make things clear in user terms. But the risk of what you suggest is that it just shifts the way someone chooses mixing/blocking to another part of the sound setup. It may simply make things more complicated for the programmers and system.

And as soon as we have mixing, the system may have to resample in a way that changes with any changes in inputs (and outputs), so it means taking resampling or interpolation more seriously. Blocking systems can find this much easier, as they may be able to have the rate follow the file, and either avoid resampling or keep to easy conversions like the x2 rate, which is about as easy as they come. (Whereas mixing 44.1 and 48 is a comparative nightmare to get right, despite these currently being the two most common domestic user rates.)

With Linux the basic choice is between mixing and blocking. Typified by a trend to allowing something like PulseAudio to automagically guess what you want, or taking user control to block and go ‘direct’ via ALSA. (Which on Linux is the only way I’ve found that works correctly in the purist audio sense.) Getting Pulse to do something the distro builders assumed you wouldn’t want can be a nightmare best avoided. Although Pulse promises flexibility, my experience of it is that it is hard to do something that the distro builders didn’t think you’d want. Ends up like fighting ‘nanny knows best’. Not something I’d like to see happen to RO. So I’d be wary of adding more ‘layers’ like Linux has. It seems to bring added ‘too many cooks spoil the broth’ opportunities.

However I’d be interested to see what others think of your suggestion. I’ve just been thinking of three general switches for the user:

Mixing/blocking
Resampling/direct
System volume/pass-the-parcel

But there may be another approach that gives the same user choice in a more convenient manner. Of course, this still leaves questions like being able to handle 24bit audio… Jim |
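For anyone wondering why the x2 conversion is easy but 44.1k <=> 48k is a nightmare: the two rates stand in the ratio 147:160, so a proper rational-rate converter must interpolate by 160 and decimate by 147 through a sharp low-pass filter, rather than just doubling samples. A trivial check (plain C, illustration only):

#include <stdio.h>

static int gcd(int a, int b) { return b ? gcd(b, a % b) : a; }

int main(void)
{
    int from = 44100, to = 48000;
    int g = gcd(from, to);    /* 300 */
    printf("interpolate by %d, decimate by %d\n", to / g, from / g);
    /* prints: interpolate by 160, decimate by 147 */
    return 0;
}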
Jess Hampshire (158) 865 posts |
With further thought, the term channel is confusing (i.e. does an audio channel consist of a left and a right channel, for example?). What I was thinking of was the audio streams coming out of programs. It has occurred to me that more than two types are needed (from the point of view of destination, not content). Possibly the system should define these:

1. High quality (must be at system sample rate)

In Configure you would have these entries (plus any additional ones required by programs) to allow them to be mapped to the available hardware (where they go, whether they block or mix, and priority). Programs would be aware if they had been blocked (so if a phone call blocked media replay, it could be paused, for example, or a system alert could flash an LED perhaps). |
jim lesurf (2082) 1438 posts |
Accepting the idea of having the added layer of complexity, there are still some snags. For example it may not be possible on given hardware for the ‘High quality’ stream to always meet “must be at system rate”. (Or, indeed, match in sample size, since people are using 32bit for some sampling as well as 24bit.) So some kind of resampling may be needed as a fallback, even if the ideal is to pass through without any alterations if the hardware allows. So we’d still need to resolve how things like resampling are handled at ‘high quality’ – which is quite demanding if you have both 44.1k and 48k involved.

Alas, this is where USB audio would help, since the best USB audio DACs provide 44.1k/48k/88.2k/96k/176.4k/192k at 24bit (or more).1 So allow the user to ensure that almost any music they play doesn’t require any mixing or resampling, unless they want to mix two streams/files to hear them simultaneously. Most internal ‘soundcards’ support relatively few rates natively, so need a ‘mixer’ of some kind – e.g. in Linux ALSA, the difference between ‘hw:’ and ‘plughw:’ driving. Again, Linux gives the user the choice. (Although the documentation is awful.) An advantage of Linux is that the resampling layer provided by ALSA can work very well indeed. But that means having the host CPU/OS do the work, not leave it to a specialist audio chip. The bottom line on that is that with ALSA, if you choose ‘hw:’ with a suitable USB DAC and choose blocking (i.e. no mixer), you can play things pass-the-parcel with the host system needing to do almost no work for LPCM. And get best possible results. But we don’t have the option as things stand… so we will have to allow for resampling to be used for some material even when not mixing sources. Some music files these days (e.g. the Beatles USB ‘apple’ stick) may provide something like 44.1k/24bit files, although 96k/24 is actually far more common.

It also becomes more complex if some would want an alert ‘delayed’, others sent to some other output, etc, rather than just being blocked/discarded. Hence although your idea makes sense in terms of user-level descriptions and labelling, it may make getting a result harder. However I can see the logic and the attraction. It is still based on getting blocking, mixing, etc, options working well under the skin, though. Jim

1 This is actually becoming the minimum spec. People are already using USB DACs that accept 384k/32bit. Many others use this internally already. The DAC beside the monitor I’m looking at does so. |
Colin (478) 2433 posts |
In a hypothetical sound system, why would there be any problem with mixers and resamplers? If you send an audio stream to the sound system, presumably the sound system knows the sound device’s capabilities and can determine whether resampling is required. It also knows whether any other audio is being played on a particular device, so can determine whether mixing is required. So if you only play one stream to one device, and the device can handle the input data without resampling, you get what you call pass the parcel. If you play 2 audio streams to one device they get mixed, but that’s your problem. If you don’t want mixing, don’t play 2 audio streams to the same device. The capabilities of the audio device don’t matter: if the audio stream doesn’t match the audio device’s capabilities, it needs resampling.

If you are right and the main problem you are experiencing is in the HAL, then if you can’t get that fixed you won’t get anything else done – and it is a much smaller problem than redesigning the sound system. |
Jess Hampshire (158) 865 posts |
Sample size isn’t that relevant, because converting is pretty simple. Since, to go anywhere but the default, software would need to be written with that in mind, it would be up to the player to deal with it. The player would be able to switch the system sample rate. If the sample rate was wrong, the player would have to deal with it by either re-sampling itself, or by sending it to another destination (perhaps defining an extra output that is a duplicate of the High quality one, but without the sample rate restriction). My suggestion is that the high quality destination is no compromise. |
jim lesurf (2082) 1438 posts |
In a hypothetically ‘perfect’ system there won’t be any problems. Reality is another matter, though… :-) Ditto for some of your other comments. Yes, it may be the user’s ‘problem’ when mixing takes place. The aim is to minimise or avoid such problems. Unless the user has the option to prevent mixing or other resamplings, they may get poorer results. All hinges on how the real system differs from a hypothetical ‘perfect’ one. The capabilities of the audio device do matter, precisely because they need to match the source materials to be able to avoid resamplings. That’s one of the points of having DAC hardware that can accept all the rates you wish to play. This thread isn’t about the HAL problems. But I expect we can, indeed, fix the main ones like the clipping. Jim |
jim lesurf (2082) 1438 posts |
Please explain why you think conversions between, say, 44.1k <=> 48k are “pretty simple”. Sample size is relevant because the quantisation floor will depend upon it. Bear in mind also that not all music is relentlessly loud.

As an experiment, earlier today I hacked the ‘PowerBars’ plugin for DigitalCD to behave more like a PPM (Peak Programme Meter). That gave it a log display with markers at -3dB/-10dB/-20dB/-30dB/-40dB, and a fast attack and slow decay. This makes it far easier to assess the levels of a lot of ‘classical’ music or small acoustic jazz, where the levels can spend a lot of time down below -20dB or -30dB. Consider that in terms of the levels wrt the quantisation floor of 16bit. Jim |
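For illustration, a sketch of the PPM-style ballistics described – instant attack, slow exponential decay, log readout. The constants are invented for the example, not those of the actual PowerBars hack:

#include <math.h>
#include <stdint.h>

static double ppm_peak = 0.0;       /* held between sample blocks */

/* Feed a block of 16bit samples; returns the meter reading in dBFS. */
double ppm_update(const int16_t *block, int n)
{
    const double decay = 0.9995;    /* slow fall-back, per sample */
    for (int i = 0; i < n; i++) {
        double v = fabs((double)block[i]) / 32768.0;
        if (v > ppm_peak) ppm_peak = v;      /* fast attack */
        else              ppm_peak *= decay; /* slow decay  */
    }
    /* The 16bit quantisation floor sits near -(6.02*16 + 1.76),
     * i.e. about -98 dBFS - so music idling at -30 dB has only
     * ~68 dB of range left above that floor. */
    return 20.0 * log10(ppm_peak > 1e-9 ? ppm_peak : 1e-9);
}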
Colin (478) 2433 posts |
To the system software the capabilities of the device don’t matter; the only thing that matters is how to play the audio stream. Maximising quality is only a matter of tweaking. To play an audio stream to a device you have the following steps:

1. read audio data

Decoding is done if the audio data isn’t in the format required for the device. The only way to avoid 2-5 is for the user to ensure they are not required, by matching the audio data to the device. A device having 16 audio standards is no better than one with 1 standard if that matches the input stream you always use. It may well be the case, for example, that an all singing, all dancing USB audio device just upsamples everything to its highest standard on the device itself, and is in effect a single standard device.

Anyway, my point is: if I was writing it I wouldn’t care about the device. I’d not be writing it for 1 device, I’d be writing it for all devices. From the system software’s point of view you have to be able to resample between any 2 standards, even if the algorithm to do so is rubbish. The input stream tells you the ‘from’; the device tells you the ‘to’. |
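Colin’s point reduced to a sketch (all types hypothetical): the system compares the stream format with the device format, and each processing step is applied only when the two differ:

typedef struct { int rate; int bits; int channels; } fmt_t;

typedef struct {
    fmt_t fmt;             /* format the device is currently set to */
    int   streams_open;    /* streams already playing on the device */
} device_t;

/* Nonzero if the data can go 'pass the parcel' to the device:
 * formats match, and no second stream means no mixing. */
int needs_no_processing(const fmt_t *in, const device_t *dev)
{
    return in->rate     == dev->fmt.rate
        && in->bits     == dev->fmt.bits
        && in->channels == dev->fmt.channels
        && dev->streams_open == 0;
}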
Jess Hampshire (158) 865 posts |
I don’t, hence the whole suggestion of avoiding changing sample rates for hifi streams. Why do you think that?
Of course, but I was referring to the relevance of it matching, not the quality. If the output system has more bits than the source, then pad with zeros; if fewer, drop some bits. Obviously a bit more processing than that would be desirable to deal with the quantization noise, but this would only be an issue if you chose to mix in other channels, and if you were going to do that, then odds are ultimate quality isn’t an issue for that particular system. |
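The padding/dropping described is, in code, just a shift – a sketch; a careful converter would add dither when narrowing, as noted:

#include <stdint.h>

int32_t widen_16_to_24(int16_t s)
{
    return (int32_t)s * 256;       /* pad the low 8 bits with zeros */
}

int16_t narrow_24_to_16(int32_t s) /* s holds a 24bit sample */
{
    return (int16_t)(s >> 8);      /* drop the low 8 bits
                                      (arithmetic shift assumed) */
}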
Rick Murray (539) 13857 posts |
Jim:
The problem with that is that you risk not having any sound until you purchase a USB audio widget, even if the hardware is capable of making sounds. It also creates potential problems for USB connectivity. To give an example, on my Beagle I have four USB ports. These are connected as follows:

Flash (ADFS)
Mouse
Flash (FAT32)
Keyboard

When I am using MIDI, I can remove the FAT32 flash and put MIDI there instead. To add anything else is likely to require a USB hub. Oh, and the sound system will need to cope gracefully with the audio thingy appearing and disappearing at any time. Are all of the USB audio devices the same in terms of programming, or are we likely to end up with a situation like WiFi, where they are all slightly different depending on the chipset used?

Jess:
This is why in my earlier suggestion I kept legacy audio separate. My feeling is that the primary audio system should switch rates to match what is being played, with resampling in hardware only if the native hardware cannot match the desired sampling rate (though I think there needs to be a mechanism for the audio system or the HAL to specify a range of acceptable rates – I think it may be easier/better for an MP3 player to decode to 48kHz if 44.1 is unavailable, than for it to decode to 44.1 and have the sound system subsequently convert it to 48).

Nobody specific: For the primary audio, the system should – in as much as is possible – do as little to the data as it can. For example, if the output device offers volume control and it can accept 48kHz 16 bit stereo samples, the OS ought to set the desired volume in hardware (not software) and throw the data directly at the hardware. It should only intervene and fiddle with the data when the hardware cannot cope, like if the data is in some weird format or sampling rate that is not supported.

As for the issue of alerts – this should be simple to handle for people who do/do not want to hear them. If alerts use the legacy audio, then just mute it in the mixer stage…

As for the issue of playing multiple things at different sample rates via the primary channel: I feel in this case the locked rate of the sound system should be the rate of the first thing that started playing, with the sound system resampling anything else on the fly. If the first app should close and thus no longer require audio, the system can then switch rates to match the next task to register itself into the sound system. The application doesn’t necessarily need to know this has happened. But it does imply a register/deregister process, so the sound system knows who is where and what they want. Think akin to Wimp_Initialise … Wimp_CloseDown.

Jim asks about 24 bit audio. These ideas should, if implemented [by whom, though?], potentially permit 24 bit audio. If the hardware can cope, it’ll be thrown at it; if not, just resample. Perhaps the sound system can offer two forms of resample: a quick’n’dirty and a high quality. I suggest this because people such as Jim would clearly want a decent resampling method for the OS to play stuff the hardware can’t do natively (see all those pretty charts), however for cases like two programs using the sound system at the same time, it might be acceptable for the one using a different sampling rate to get a quick’n’dirty conversion. I mean, how likely is it that two programs will be playing audio at the same time, except for system beeps and starting another audio app by accident? |
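A sketch of what such a register/deregister interface might look like. Every name here is invented; nothing below is an existing RISC OS SWI:

typedef enum { RESAMPLE_QUICK, RESAMPLE_HQ } resample_quality;

typedef struct { int rate; resample_quality q; int in_use; } stream_t;

#define MAX_STREAMS 8
static stream_t streams[MAX_STREAMS];
static int system_rate = 0;          /* 0 = nothing playing */

/* First registered stream fixes the system rate; later streams are
 * resampled on the fly at the quality they asked for. */
int audio_register(int rate, resample_quality q)
{
    for (int h = 0; h < MAX_STREAMS; h++) {
        if (!streams[h].in_use) {
            streams[h] = (stream_t){ rate, q, 1 };
            if (system_rate == 0) system_rate = rate;
            return h;
        }
    }
    return -1;                       /* no free stream slots */
}

/* On deregistration the system is free to retune to the rate of the
 * next stream still registered - the app need never know. */
void audio_deregister(int h)
{
    streams[h].in_use = 0;
    system_rate = 0;
    for (int i = 0; i < MAX_STREAMS; i++)
        if (streams[i].in_use) { system_rate = streams[i].rate; break; }
}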
…ah, you looked at any Android phone. ;-) Perhaps there needs to be a disconnection here, with a “Notification Provider” module that handles beeps and alerts and such. The sound system should not be concerned with flashing LEDs. If the application wants a beep, it should get a beep (unless the beep is muted). Anything more complex should be indirected through another module that decides how to notify the user. The app can then say “it’s a warning” or “epic fail, sound the sirens!” and the notification provider can figure out what to do.

Jim:
… Mine are 256/320kbit MP3. I avoid any outfit attempting to sell 128kbit MP3. I think they’re fairly universally 44.1kHz. I’ve highlighted in italics part of your quote to show that your audio requirements are a world away from the Mass Market Sheep who download stuff from iTunes, Amazon, a dozen similar outfits (like Deezer), listen direct via streaming sites, or look for rips on MediaFire etc.

However, as always, the main problem has been highlighted above: people making snazzy hardware which is put on “open source” boards, with patchy documentation, if any.

<rant>To give an example – one of the early ideas of an open source device is the Neuros OSD (a PVR). The whole thing was designed to be open source, however the media codecs were closed source because a third party supplied them. It was envisaged that this would get the box to market, and eventually the codecs and system devices would be rewritten in an open source manner. It all looked good until it became clear that TI required an NDA to be signed in order to access any technical documentation on the DM320 chip, and that TI had absolutely NO interest in dealing with private individuals whatsoever [a copy of the message TI sent me is here]. I note that TI are (slowly) getting better, but Broadcom isn’t there yet.

At any rate, the lack of documentation essentially killed the OSD when people realised that the most they could do is hack the OS or the application software, and that adding support for alternative types of video is bordering on the impossible. Indeed, Neuros wrote software to play 240p YouTube videos by ripping apart the FLV at runtime and faking it up as a standard H.263 file (that the codec understands). It is something of an ugly hack. I, likewise, would like to save my files as AVI instead of MP4, because my DVD players can technically play the recordings, but they only understand AVI wrappers. Again – it’s a dream, never a reality.

I just wish these companies wouldn’t be so damn protective of their intellectual property as to consider developer documentation to be “top secret”. Here’s a reality for you: the Chinese probably already have a copy of your NDA’d documents. Before the hardware hits the market. Anything else is just annoying for developers, especially if the companies are going to play pick and mix with who gets the info and who does not. |
Jess Hampshire (158) 865 posts |
I’m reasonably sure I have made my suggestion as clear as mud. :( (Probably because I’m not a programmer.) I’ll try and explain what I mean from a different angle, with additions to the original idea to deal with issues pointed out.

Each sound stream would have an additional piece of metadata (generated by the program) that represents the intended use of the stream and where the sound is produced from. The sound system uses this metadata to decide what to do with the stream, based on user choice of how to use the available hardware. There would be predefined values:

HiFi – Only one stream at a time, sample rate must be supported by hardware.

Configure would have a system where the user could map the destinations and options of the system (which would be machine dependent, and if pluggable audio systems are supported there would be options for with and without). Each destination would have options including which hardware it uses, and whether it blocks or attenuates other streams. For example, hifi on a DAC going to a hifi would normally block anything else trying to use the same hardware, but do nothing to sounds going to the internal speaker. But where there is only one sound system, it would allow other sounds to be mixed in. Telephony would mute all the other sounds, with the exception of alerts. The user might choose to send legacy to a DAC, to support an old media player. All these would be user defined, but with sensible defaults.

If a sound stream is blocked, a message would be sent (it would be up to the programmer whether this is acted upon). Pausing play (I don’t have an Android phone) or flashing an LED would be up to the program itself, not the system. What I am taking ideas from is the way Skype can use a handset while the main speakers produce music.

Hope this is less unclear. |
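Jess’s metadata idea expressed as a data-structure sketch. The class names follow those mentioned in the post; everything else is invented for illustration:

typedef enum {
    STREAM_HIFI,        /* one at a time, rate must be supported */
    STREAM_ALERT,
    STREAM_TELEPHONY,
    STREAM_LEGACY
} stream_class;

typedef enum { POLICY_BLOCK, POLICY_MIX, POLICY_ATTENUATE } policy;

/* One row of the table Configure would let the user edit. */
typedef struct {
    stream_class cls;       /* intended use, set by the program     */
    int          device;    /* which output hardware it maps to     */
    policy       on_clash;  /* fate of other streams on that device */
} route_t;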
Rick Murray (539) 13857 posts |
I suspect some of the issues also are due to a lack of understanding of the complex parts. To give an example, sometimes on badly encoded lower bitrate MP3s I can hear a sort of twangy whispering that seems almost as if it is a third channel between the left and the right. Either that or I’m going crazy.
What happens if the HiFi input is not in a format understood by the hardware? This is why I broke my idea down to “legacy” and “primary”, with the specification that the good audio (your HiFi/media) would use the primary audio path, which will be passed as much as possible unmodified through the system. But modification might be required depending upon the hardware in use. Essentially it is like your “HiFi” and “media” rolled into one; but simplified in that a program does not have to ask to use HiFi and, if refused, ask for media. It will just ask for the primary audio path and the OS should do as little as possible to the data. As I said before, it seems silly to do a lot of unnecessary processing when the data could instead be thrown at the hardware.
I figured that there wouldn’t be too much point in providing a high quality playback method just for a “beep”. Come on, how many people are still using WaveSynth-Beep for their error boxes? This is why I lumped alerts in with legacy – as it is likely that legacy style code will be used to play the beeps, and it gets around the issue of how to sync legacy to the primary sound stream – a quick’n’dirty resample will suffice.
I like the idea, but I think SWI calls might be better than metadata. If you embed metadata into the audio stream, how do you set up the system? Do you attempt to play and alter the metadata if the playback fails? Under RISC OS we have SWIs so (under your proposal) the app could first ask for audio capabilities, then ask if playback of such-and-such is available, then set up how to play.
One thing my proposal is weaker with is multiple audio device support. I like the idea of a configure plug-in to determine what device sounds are directed to (with a fallback, of course, if the chosen device is not present); though I will point out that there are plenty of quirks in existing hardware – only recently (June update) has the Raspberry Pi been capable of outputting audio via HDMI and the headphone jack at the same time. Without low-level technical details from Broadcom, quite likely it won’t be possible to direct different audio data to each output. |
I must disagree here, as it is tasking the program with the job of knowing the specifics of a wide range of hardware and a never-ending cycle of updates for potential future hardware; not to mention the issue of “what if multiple programs want to make a notification?” plus “what if I want the LED to blink more slowly?”.
Nice, but add “…or mixes with other streams”. And in the case of attenuation or mixing, you might need to specify how. If you consider such-and-such an app might be using one sampling rate and another a different sampling rate, how do you sanely mix together two different sampling rates? There’s a clever method, and a nasty method. ;-) |
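For the record, the ‘nasty method’ in sketch form – nearest-neighbour resampling while mixing. Cheap enough for beeps, audibly rough for music; the ‘clever method’ would be a proper polyphase filter:

#include <stddef.h>
#include <stdint.h>

/* Mix 'src' (src_rate) into a mix bus running at out_rate by just
 * picking the nearest input sample for each output slot. */
void mix_nearest(int32_t *mix, size_t out_len, int out_rate,
                 const int16_t *src, size_t src_len, int src_rate)
{
    for (size_t n = 0; n < out_len; n++) {
        size_t i = (size_t)((double)n * src_rate / out_rate + 0.5);
        if (i >= src_len) break;
        mix[n] += src[i];    /* accumulate into a wide bus; clip later */
    }
}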
patric aristide (434) 418 posts |
Huh? Even DigitalCD on RISC OS handles FLAC, as do several players on more popular platforms. |