Iyonix audio
Rick Murray (539) 13840 posts |
Something that always bugged me was the assertion by Jim Lesurf (and others, from before I can point to a source) that the Iyonix outputs sound at 48kHz. To me this seems a bit crazy/lame given that the majority of available audio is going to be at 44.1kHz since that is the natural sample rate of a CD, hence a lot of MP3s will be at 44.1kHz as converting it to something else stands to degrade the sound further than is reasonable, even after the losses inherent to MP3 compression. There are two chips in the Iyonix to handle sound. The first is the sound “codec” (correct use of the word codec, not the “video format decoder” use!). This chip takes digital samples and generates sound from them, and vice versa. The second is an “Acer M5451 AC’97 controller”. I cannot find a brief or datasheet, however it appears to be an AC’97 (original spec) compliant device that links a PCI bus to the protocol used by the SigmaTel sound chip. This is where the problems begin. AC’97. If you know anything at all about audio, you will appreciate that the only way to get CD audio (at 44.1kHz) to run at 48kHz is to play most of the samples in order but to duplicate a few. My maths isn’t particularly good, but 48-44 is 4, which is an eleventh of 44; thus I would expect that something like 8-9% of samples are repeated in order to botch (there is no other word, well, “butcher” maybe) the audio into something that runs at 48kHz. Why am I talking about 48kHz? AC’97. A travesty against humanity. Poor Jim is looking for decent audio support, but back in ‘97 along marched Intel who defined a whoo-hoo new standard and, to quote Wikipedia: AC’97 defines a high-quality, 16- or 20-bit audio architecture with surround sound support for the PC. AC’97 supports a 96 kHz sampling rate at 20-bit stereo resolution and a 48 kHz sampling rate at 20-bit stereo resolution for multichannel recording and playback. 
From the bits of forum postings I could see in my searches, it looks as if the M5451 supports the original AC’97 spec, which means it is locked at 48kHz. [the Sigmatel codec supports a later version of AC’97 with a broader range of sampling rates, perhaps after Intel realised their folly; but by then the damage had been done] Did I read that right? Did Intel really push a sound protocol that did not originally natively support the most common sound sample rate around and by consequence risk setting PC audio support back almost a decade ? Unfortunately. |
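The "8-9%" estimate in the post above can be checked with exact arithmetic. This is only a sketch of the sums for the naive repeat-a-sample approach; the variable names are illustrative and nothing here is taken from the Iyonix driver.

```python
from fractions import Fraction

SRC_RATE = 44_100  # CD sample rate in Hz
DST_RATE = 48_000  # fixed output rate of original-spec AC'97 in Hz

# Exact ratio of output samples to input samples (reduces to 160/147).
ratio = Fraction(DST_RATE, SRC_RATE)

# Output slots per second that have no input sample of their own.
extra_per_second = DST_RATE - SRC_RATE

# Share of input samples that would have to be played twice.
repeated_inputs = Fraction(extra_per_second, SRC_RATE)

# Share of output samples that would be duplicates.
duplicate_outputs = Fraction(extra_per_second, DST_RATE)

print("output/input ratio:", ratio)
print("inputs repeated:   ", repeated_inputs, "=", float(repeated_inputs))
print("outputs duplicated:", duplicate_outputs, "=", float(duplicate_outputs))
```

So roughly 13 in every 147 input samples would be repeated, which sits inside the 8-9% guess.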
Dave Higton (1515) 3526 posts |
Bollocks, Rick. Sample rate conversion is something that I have lots more experience of than probably anyone on these fora. I have commissioned two ASICs to do exactly that. I became part of the design teams while they were being specified and designed. As the customer, and as a member of the design teams, I had to learn the mathematics of sample rate conversion. The proper way to do it involves some discrete-time mathematics, which boils down to computing each output sample from a finite set of input samples, and low pass filtering the result. You can compute a suitable low pass filter using the Remez Exchange Algorithm. Once you’ve done that, the result will be indistinguishable from the original, to the human ear. Going the other way (i.e. from 48 kHz to 44.1 kHz) involves a small reduction in bandwidth. Most (and I really do mean most) human ears would not notice the difference anyway. The above statements about inaudible differences assume a playback system that doesn’t generate reconstitution products below half the sampling frequency. I suspect that quite a few out there do, though; they shouldn’t, but some do, as a result of using a half-band filter for reconstitution, which is cheaper (fewer taps) than a proper low pass filter whose stop band starts at half the sampling rate. Note very carefully what I’ve said there: any audible differences are likely to be caused by inadequate reconstitution filters, and not by a proper resampling process. Resampling is fairly computationally intensive, and is far better handled on a DSP than on an ARM CPU. No, I’m not going to attempt to teach anyone the mathematics of discrete time systems or of sampling rate conversion here. They are standard DSP mathematics, which are open to anyone to study. |
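For readers who want to see the shape of "computing each output sample from a finite set of input samples", here is a minimal polyphase sketch for the 160/147 ratio. It is an illustration under stated assumptions, not the design described in the post: the low pass filter is a Hann-windowed sinc (swapped in for a Remez design so the example needs only the standard library), and `design_lowpass`, `resample` and `taps_per_phase` are hypothetical names.

```python
import math

def design_lowpass(up, down, taps_per_phase=16):
    """Hann-windowed sinc prototype at the upsampled rate, with gain `up`.

    Assumption: a windowed sinc stands in for a Remez-designed filter.
    """
    length = up * taps_per_phase
    cutoff = 0.5 / max(up, down)  # cycles per sample at the upsampled rate
    centre = (length - 1) / 2
    taps = []
    for i in range(length):
        t = i - centre
        if t == 0:
            sinc = 2 * cutoff
        else:
            sinc = math.sin(2 * math.pi * cutoff * t) / (math.pi * t)
        window = 0.5 - 0.5 * math.cos(2 * math.pi * i / (length - 1))
        taps.append(up * sinc * window)
    return taps

def resample(x, up=160, down=147, taps_per_phase=16):
    """Rational resampling: each output is a weighted sum of nearby inputs."""
    h = design_lowpass(up, down, taps_per_phase)
    out = []
    for m in range(len(x) * up // down):
        # Position of output m in the (conceptual) upsampled stream.
        n, phase = divmod(m * down, up)
        acc = 0.0
        for k in range(taps_per_phase):
            idx = n - k
            if 0 <= idx < len(x):
                # Only every `up`-th filter tap meets a non-zero sample.
                acc += h[phase + k * up] * x[idx]
        out.append(acc)
    return out

# Usage: 10 ms of a 1 kHz sine at 44.1 kHz becomes 480 samples at 48 kHz.
sine_44k1 = [math.sin(2 * math.pi * 1000 * i / 44_100) for i in range(441)]
sine_48k = resample(sine_44k1)
print(len(sine_48k))
```

The inner loop runs 16 multiply-accumulates per output sample per channel here; a filter good enough for critical listening would probably need more taps, which is where the computational cost mentioned above comes from.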
Rick Murray (539) 13840 posts |
If you say so.
And that’s the problem. Custom ASICs. You do understand, I trust, that budget computer hardware will do exactly none of that complicated stuff and will prefer instead to botch samples in real time with a simplistic bit of code to fit the limitations of a braindead protocol, right? If you don’t believe me, sources are available on-line. Custom anything carries an inherent cost, and given that your average listener may well not realise the differences, subtracting from the bottom line for something that might not even be an issue for the majority is likely not going to please the bean counters. From what I can see on Google, processor speeds in 1997 (when the fixed 48kHz spec was introduced) were 200-500MHz for the Pentium families. I rather think that for something as mundane as playing audio, it might use the method I describe and not the method you describe. Your solution would be reserved for pricey soundcards and not generic bog-standard motherboard audio. Feel free to look at this: …/RiscOS/Sources/HWSupport/Sound/Sound0Trid/s/Sound0 PS: Don’t mix up 2013 with 1997. |
Rick Murray (539) 13840 posts |
Furthermore, a comment regarding a 2006 version of the Linux sound driver “ALSA” [source]:
Another from Wiki about SoundBlaster cards [source]:
Another [source]:
I think that makes the point. You know a lot about resampling audio to make good results. It’s a shame that the manufacturers since the late ‘90s and through the ’00s weren’t as clever. Before you reply with a word like “Bollocks”, you might want to step out of your field of expertise and observe the muck that exists in the real world. Where a handful of instructions in a driver are so much cheaper than custom hardware, to support a badly thought out specification in the first place – there is no direct linear relationship between 44.1 and 48 (actually, 88.2 and 96, IIRC) – if the original spec supported both, all of the typical sampling rates could be converted losslessly (32kHz → x3 = 96; 22.05 → x4 = 88.2; etc etc). But it didn’t. So the market did it in the cheapest and nastiest way that got the job done… |
Dave Higton (1515) 3526 posts |
You made a blanket statement:
I see no restriction in there to some limited set of hardware. An Iyonix can do a proper conversion – but not in real time. It doesn’t require “fancy custom hardware”, but to do it properly requires using a suitable algorithm. |
Rick Murray (539) 13840 posts |
(my highlights) Was that not enough context to suggest I was talking about the Iyonix? I made a few blanket statements as it appears that this problem is not just specific to the Iyonix. It seems a number of PCs/drivers/soundcards working to the AC’97 specification also did it the cheap’n’simple way. Why? You yourself said it:
Which essentially means it cannot do a proper conversion in realtime ! Think about it – games, listening to music, streaming radio. There are a hundred uses for the audio system where one does not have the luxury of waiting for a pleasing conversion. Not only that, but the conversion that is performed is not expected to unduly laden the processor. Who wants a computer where playing sounds makes everything run like treacle? So it is done quickly, and not very pleasantly, and without luxuries (the parts I call “fancy hardware”) like DSPs and ASICs. Which is kinda what I said in the post at the top of this page. Yes, it is possible to convert 44.1kHz to 48kHz; just as it is possible to convert PAL to SÉCAM; or 720p to 480p etc etc using either good software (takes time) or custom hardware (for realtime). You might even get realtime out of a software+DSP combo. But that’s something of a straw man when we are talking about a specification designed in 1997 using simpler hardware as was available in 1997 and trying to do it all in software to make it inexpensive to produce. Without custom hardware, if the more recent Iyonix can’t do this properly in realtime, I very much doubt a Pentium could manage it in realtime. The only solution that fits into all of this is… a linear resample. Smarter/later ones might work with interpolation but the earlier ones (especially for low-end processors) might really be as ugly as “play X samples then repeat one” (which will need a secondary level of correction to counteract sync errors as 44.1→48 is not even). |
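The "play X samples then repeat one" scheme can be sketched with an integer phase accumulator, which also takes care of the uneven spacing (147 does not divide evenly by 13). This is a hypothetical illustration of the technique only; `repeat_resample` is an invented name and the code is not taken from any real driver.

```python
def repeat_resample(samples, src_rate=44_100, dst_rate=48_000):
    """Zero-order hold: every output slot replays the current input sample.

    Assumes src_rate <= dst_rate, so the input index advances by at most
    one per output sample.
    """
    out = []
    acc = 0  # accumulated input time, in units of 1/dst_rate input samples
    i = 0
    while i < len(samples):
        out.append(samples[i])
        acc += src_rate
        if acc >= dst_rate:
            # A whole input period has elapsed: move to the next sample.
            acc -= dst_rate
            i += 1
    return out

# Usage: one 147-sample block becomes 160 samples, 13 of them repeats.
block = list(range(147))
stretched = repeat_resample(block)
repeats = [j for j in range(1, len(stretched)) if stretched[j] == stretched[j - 1]]
print(len(stretched), len(repeats))
print([b - a for a, b in zip(repeats, repeats[1:])])  # spacing between repeats
```

The accumulator never drifts because it carries the remainder forward, so no separate sync correction is needed; the cost is that the repeats land at irregular intervals.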
Dave Higton (1515) 3526 posts |
I thought the title of the thread was what gave it away. We know that the Iyonix’s audio system, as provided by Castle, has some serious limitations. If you want an Iyonix to replay audio from a 44.1 kHz source, here are some suggestions:
1. Do it very badly in real time by repeating samples.
2. Do it very badly in real time by linear interpolation of samples. (We haven’t gone into the maths of this, but if you’re expecting high audio quality to result, you’ll be disappointed.)
3. Do it well, but not in real time, by doing a proper sample rate conversion to 48 kHz followed by replay. Then you’ll simply have to question the quality of the 48 kHz replay system.
4. Do it well in real time by buying a better sound card and writing drivers for it.
Btw I didn’t suggest ASICs are required to do real time sample rate conversion. I only mentioned them because that’s where I got my proper education in sample rate conversion. The conversion is within the capability of numerous modern general purpose DSPs. It may well be possible in the DSP in the BeagleBoard, for instance. Your mention of custom hardware seems rather odd. An audio card is custom hardware for the specific purpose of doing audio I/O. The Iyonix already has custom hardware, therefore – but still not adequate for the task that several people want. Upgrade of that custom hardware seems appropriate. If the idea of upgrading an old machine makes enough sense at all, that is. (Typed on my Iyonix.) I do have a second audio card in my Iyonix, which I bought some years ago… it seemed like a good idea at the time. I’ve never done anything with it, as far as I can recall. |
Rick Murray (539) 13840 posts |
Doh! <facepalm> ;-)
I would imagine it would be one of these for CD audio and many MP3s played in realtime. I hope the latter, but I’ve not examined the code in much detail.
I’d imagine it to be the sound equivalent of “artefacts” as seen in video. When you insert samples that don’t originally belong there, you are changing what the output is, right? Short inserted samples would, if I remember correctly, be a high frequency noise.
and:
Exactly. And I feel this is primarily because the initial version of AC’97 promoted only a 48kHz audio system on hardware that wasn’t powerful enough to perform proper conversions in realtime. I really really want to know what on earth was going through
But (IIRC! big TRM!) totally unnecessary as the hardware can handle the common sample rates so you don’t need to fudge in software to cover inadequacies of the hardware. |
Lee Shepherd (435) 51 posts |
I find this thread very interesting. I had long discussions with John Ballance about issues with ‘artefacts’ being added to the audio buffer; it’s audible as a crackling sound when playing MP3s and Wave files, and very annoying. http://www.drobe.co.uk/riscos/article.php?id=1438&nc=29 Lee |
Dave Higton (1515) 3526 posts |
There are lots of people who don’t understand audio at all, which has always struck me as odd because I’ve always found it simple. There are even fewer people who understand signal processing. I can understand why – it is a highly mathematical subject. Without an understanding of the mathematics or of audio, they have no idea of the implications of failing to handle common sample rates properly.
Right…
Depends on how you view it. There’s something there that shouldn’t be there, but that something is sidebands caused by a modulation process. To understand what it is, and why it is, you have to understand some of the mathematics of digital signal processing. |
Rick Murray (539) 13840 posts |
I don’t have an Iyonix; though it seems to me that there are two potential causes for crackling. The first, and something I noticed when splicing sound together in Nero WaveEdit is that if the samples are not correctly matched, there is a crackle. So it is possible that a very simplistic conversion from 44.1kHz to 48kHz resample may distort the audio in a way that can be heard. The second may be that the sound system cannot keep up with the data flow required by the SigmaTel chip. The full STAC9750 datasheet doesn’t mention buffering; I can’t find a datasheet for the M5451; so I don’t know if that buffers; worst case is we’re continually throwing data at the device at 1.5Mbps (48kHz x 16bit x 2 channels). While that works out to be around 190K/sec (/8 then /1024), it is 190K/sec that is quite time dependent. If the samples need to be specifically written to different registers, this could double… [I’ll await Dave to tell me how far off my maths is ;-) ] I think the test would be if somebody could modify the sound playback to play things at 48kHz without conversion (ie play it slightly fast). If there are still clicks, it might be an issue of getting the data to the chip in time. If the clicks go away, then it is likely to be something introduced by the 44.1→48 conversion. As for the forum linked, some of the comments were… amusing. Though I cringe when I think I used to find 128kbit MP3 to be acceptable. For those who don’t hear a difference, try to find a piece of music that uses both treble and bass at the same time in a distinctive way. My test music (great for showing up inadequacies of headphones) is “Shell” by Bana [Witch Hunter Robin OP]. |
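The data rate sums in the post above work out as follows. This is plain arithmetic on the figures quoted (48kHz, 16 bit, 2 channels) and assumes no protocol overhead.

```python
SAMPLE_RATE_HZ = 48_000
BITS_PER_SAMPLE = 16
CHANNELS = 2

bits_per_second = SAMPLE_RATE_HZ * BITS_PER_SAMPLE * CHANNELS
bytes_per_second = bits_per_second // 8   # the "/8" step
kib_per_second = bytes_per_second / 1024  # the "/1024" step

print(bits_per_second, bytes_per_second, kib_per_second)
```

That gives 1,536,000 bits per second, or 187.5K per second, so "1.5Mbps" and "around 190K/sec" are both close enough.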
nemo (145) 2546 posts |
Dave shone a bright beam of knowledge into the murky proceedings but then claimed:
I’m slightly surprised by ‘very badly’ here. Clearly that would rather depend on the input signal. But even in the worst case (Mr Nyquist) it’s going to be better than nearest neighbour, wouldn’t you say? I’d also have expected that various simple higher order interpolations could be carried out without over-taxing the ARM, though I’ve not thought about the log/lin aspect. Further insight would be welcomed. |
Dave Higton (1515) 3526 posts |
Yes – and your tastes, of course. How bad does it have to be before it’s very bad? There is a clear dependence on rate of change of input signal. To take it to the absurd, a DC level would not suffer at all from repeated or linearly interpolated samples. Linear interpolation uses proportion n of one original sample, and proportion (1-n) of the next original sample, where n is in the range 0 to 1. This is a two-tap finite impulse response filter except in the extreme cases where n is 1 or 0, whereupon it collapses to a one-tap filter. One tap is fine; the output equals the input. No change to the frequency response. Two taps with n = 0.5 is another case entirely. This is a low pass filter whose gain is 1 at DC and 0 at the Nyquist frequency. That’s quite a significant low pass filter, wouldn’t you say? The coefficients are varying with time in a regular pattern between those two extremes. I’m sure you can picture it. So the high frequencies are subject to regular amplitude modulation. The higher the signal frequency, the deeper the modulation. |
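The two-tap filter described above can be checked numerically. The sketch below evaluates the magnitude response of y[k] = n·x[k] + (1-n)·x[k-1]; the function name and the 44.1 kHz default are assumptions for illustration.

```python
import cmath
import math

def two_tap_gain(n, freq_hz, sample_rate_hz=44_100):
    """Gain of the filter y[k] = n*x[k] + (1-n)*x[k-1] at freq_hz."""
    omega = 2 * math.pi * freq_hz / sample_rate_hz  # radians per sample
    return abs(n + (1 - n) * cmath.exp(-1j * omega))

nyquist = 44_100 / 2
for n in (1.0, 0.75, 0.5):
    gains = [round(two_tap_gain(n, f), 3) for f in (0, nyquist / 2, nyquist)]
    print(n, gains)
```

With n = 1 the gain is 1 at every frequency; with n = 0.5 it is 1 at DC, about 0.707 at half Nyquist, and 0 at Nyquist. Because n cycles through its range as the two sample clocks slide past each other, the top end of the spectrum is pushed up and down between those curves, which is the amplitude modulation described.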
nemo (145) 2546 posts |
I can, yes. But isn’t that exactly what a 22KHz sine sampled at 48KHz would look like anyway? Edit: I think you’re being a bit harsh: In order those waves are:
|
Dave Higton (1515) 3526 posts |
Look like: no. Look similar to: yes. But here’s the interesting thing. If you sample a sine wave close to the Nyquist frequency, it looks like a dog’s breakfast, yes. But no information is lost! The sine wave can be reconstituted beautifully – without cheating. All that is required is to filter out the aliases – the reflection about the Nyquist frequency, and the further images about multiples of the Nyquist frequency. (Since the normal reconstitution filter is a low pass, it isn’t normally necessary to think about the further multiples.) An adequate reconstitution filter is pretty much impossibly difficult in the analogue domain. The better approach is to do proper digital interpolation to some multiple of the original sampling frequency. “Properly” involves a good low pass filter whose stop band begins at the new Nyquist frequency. Now the only remaining reconstitution aliases are at high enough frequencies that an analogue reconstitution filter of tolerable complexity can do a good enough job. The trouble with this subject is that it isn’t intuitive. You have to at least see and understand the maths – and even then it’s still not very intuitive. I’m in the very fortunate position of having seen it all done. It was stunning. All that maths? Just like they teach you at University? It really does work. |
Dave Higton (1515) 3526 posts |
To add a bit more specific detail about your example: Sampling causes double sideband suppressed carrier modulation of the signal, at the sampling frequency and all integer multiples of the sampling frequency. In your example, the main interfering product is at 26 kHz. If you filter out 26 kHz and all the higher frequencies – while still passing 22 kHz – you’ll see a clean 22 kHz sine wave reconstituted. That’s a tall order (pun intended!) for an analogue filter. |
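The sideband frequencies for the 22 kHz example can be listed directly: each multiple k of the sampling frequency carries a pair at k·fs − f and k·fs + f. A small sketch, with a hypothetical helper name:

```python
def sampling_images(signal_hz, sample_rate_hz, max_multiple=2):
    """Image frequencies around the first few multiples of the sample rate."""
    images = []
    for k in range(1, max_multiple + 1):
        images.append(k * sample_rate_hz - signal_hz)  # lower sideband
        images.append(k * sample_rate_hz + signal_hz)  # upper sideband
    return images

print(sampling_images(22_000, 48_000))  # [26000, 70000, 74000, 118000]
```

The nearest image, 26 kHz, sits only 4 kHz above the wanted 22 kHz tone, which is why the reconstitution filter has to be so steep.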