audio puzzles, etc
jim lesurf (2082) 1438 posts |
Hi, This is my first attempt to post to a web forum of any kind as I’m afraid I find them very difficult even to read! So apologies if this doesn’t come out right. There is a fair bit I want to raise wrt the ‘audio’ behaviour of RO hardware, etc, as I’ve been doing some tests to measure the performance of my new PandaBoard/ARMiniX. This threw up some things that puzzle me, etc. Trying to find some answers also made me feel there is a lack of clear, extensive, and up-to-date documentation on this that is easily available. Because of that I have a number of aims in mind. One is to be able to pull together and write some better documentation that people will find easier to read. Another is to determine where ‘oddities’ I’ve measured or observed are due to my mistakes, and where they indicate a snag or bug in software, and if so, at what level along the chain from source material (file or stream) to output (audio emerging from the DAC / audio chip). To start things off I’ll ask brief questions about a few topics I’m trying to work on. Can explain more as and when others may respond. 1) When doing my tests I’ve found that the *volume level does affect the output scaling for 16-bit source material. Yet the impression I’ve got from what I’ve read is that *volume is intended for 8-bit material. Have I misunderstood this? 2) I find that when I allow an application like PlaySound or DigitalCD and drivers like PlayIt to take note of the *volume then I have to set a *volume value of about ‘100’ to ensure that a source file with a full-range 16-bit waveform just fails to clip the audio chip on my ARMiniX. Yet the report back is that this value gives a ‘gain’ of well below 100%. There is more to it, but I’m having to type this into a tiny window that makes it hard for me to explain.
But the basic question is: What settings will ensure (either taking *volume into account or bypassing it) that full level 16-bit values in an LPCM source get those values to the DAC and give full range output with no scaling factors? [N.B. I do know that the chip on the PandaBoard is actually 24-bit and works via delta-sigma. That shows up in measurements of its output. But the above means that in effect 16-bit input values would be simply ‘promoted’ by an 8-bit shift preserving sign.] That’ll do for now. :-) Be interested to see what people make of it. In addition, I am looking at writing documentation, so I’d like to ask questions about that in due course, and will also eventually make the results public so people can point out my errors and omissions, which I will then correct, once I’ve understood. All this is prompted as I’m quite impressed with the audio performance you can get from an ARMiniX / PandaBoard. But at present you have to do some furtling around to get it to work well, and there are measured signs that some things could be further improved. FWIW I’m hoping also during the next month or so to write about RO audio and the new machines for a consumer mag. Want to have as positive and accurate a message for that as I can! :-) Cheers, Jim |
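As a rough illustration of the ‘promotion’ mentioned in the N.B. above, here is a minimal Python sketch of placing a signed 16-bit sample in the top 16 bits of a signed 24-bit word. The function names are made up for this example, and nothing here is taken from the RISC OS sound code or the PandaBoard HAL; it only shows the arithmetic.

```python
def promote_16_to_24(sample: int) -> int:
    """Place a signed 16-bit sample in the top 16 bits of a signed 24-bit word."""
    if not -32768 <= sample <= 32767:
        raise ValueError("not a signed 16-bit value")
    # Multiplying by 256 is the same as a left shift by 8 bits and keeps the sign.
    return sample * 256


def as_24bit_pattern(value: int) -> int:
    """Two's-complement bit pattern of a signed 24-bit value, as an unsigned int."""
    return value & 0xFFFFFF


if __name__ == "__main__":
    for s in (32767, 1, 0, -1, -32768):
        promoted = promote_16_to_24(s)
        print(f"{s:>7} -> {promoted:>9} (0x{as_24bit_pattern(promoted):06X})")
```

Note that a full-scale positive 16-bit sample lands 255 steps below the largest 24-bit value, since the low 8 bits are always zero. That difference is far too small to matter for level checks.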
nemo (145) 2546 posts |
I can tell you everything you could ever want to know about the 8bit audio… but I’ve no idea about the 16bit. Far too new for me. ;-) |
jim lesurf (2082) 1438 posts |
Hi, Well, the current situation is that serious domestic audio is already shifting to 96k/24bit being the standard, with some people going to higher rates and sample sizes, or even DSD, DXD, etc. The good news is that tests I’ve done show that the new PandaBoard + RO can play 96k/16 wav and flac very nicely. However, we have some way to go to catch up with some of the computer DACs, etc, that I and others use regularly with Linux! There 96k/24 is common, not 96k/16. :-/ This is a bit [ pun alert! ] frustrating given that the DAC chip on the PandaBoard does accept 24bit samples. Indeed, IIUC, it requires 24bit! So a challenge there for some future point is to define an extended API for RO to cover it, and to implement a way of getting 24bit values from source to chip. However that’s for another day, I guess. Much as I’d love to see it asap. :-) I should admit that my personal interest in ‘8-bit’ is limited. Little interest in using it myself. As far as I can see, 8-bit is fairly well documented by the PRM. So any new documents I generated could probably largely refer to that for what the PRM already contains. However it lacks a lot of the newer/16-bit details. e.g. the current state of SharedSound and *MixVolume. And my tests with 16bit sometimes seem to give results that don’t seem to agree with some of what I read. Which is one of my main puzzles at present. Perhaps given the above, you should focus on 24bit. ;→ BTW Is there a way to change the height of the ‘letterbox’ that NetSurf gives me to type a reply into? Or is the idea to make it hard for someone to write more than about 8 lines? :-) Or should I assume that I need to use FF rather than NetSurf? Does that make it easier to adjust what I see as I type, etc? Slainte, Jim |
Colin (478) 2433 posts |
Write it in a word processor/text editor and drag it in. Ideally paragraphs should be single lines – if you see what I mean. |
Martin Avison (27) 1494 posts |
Not that I am aware of with v3.0 as there is no ‘drag corner to resize’ visible. When writing anything significant into a forum or other web form (any platform) I tend to write & edit it in my favourite text editor for that platform, then when happy with it drag/drop or cut/paste it in to the entry box. That makes it much easier to see the whole text, check and correct it, and also avoids losing it when (not if) something happens that means you lose the web page! |
Colin Ferris (399) 1814 posts |
What about using StrongEd – spell checker :-) and then when finished – drag your msg over to NetSurf. Seems your blogs are going to compete in length with Heyrick :-) You may find some info about “sound” on http://www.duffell.riscos.me.uk/ You will have to look for the 32bit code. |
Trevor Johnson (329) 1645 posts |
Your posts have come out alright here, so congratulations! AFAIK the height can’t be changed under any browser. For longer posts, you could compose elsewhere and then paste in. (In fact, it’s best to keep a copy anyway, just in case there’s a problem!) |
Colin (478) 2433 posts |
If only you got so many answers to your audio questions :-) |
jim lesurf (2082) 1438 posts |
Hi, Thanks, Colin for the pointer to the duffell page. I’d actually found it a few days ago after some discussions on usenet. But as yet haven’t made sense of the software it offers! :-) And yes, more answers to the audio puzzles would be welcome! It does set me wondering about two issues, though. One is if something like that is a basis for ‘better than linear’ interpolation? The existing linear interpolation allows us to play 44.1k with a system rate of 88.2k and the linear interpolation works fairly well. i.e. although it generates anharmonic distortion, it’s all in the ultrasonic. Whereas better interpolation (e.g. as used by sox) avoids this problem and lets you use source/output rate ratios other than x2 without problems. The question being how to include better interpolations or resamplings when playing? The other issue is one people seem not to have considered at all: How to make use of the audio input of the PandaBoard? That can also work at 96k/24, which would be superb as an audio recorder and as the basis for measurement systems – e.g. scope, FFT, etc. I did write a simple scope/FFT for my Iyonix, but that relied on !AudioIn and only works at 48k with a less-than-16bit range. Would be really useful to have something better using PandaBoard + RO. Thanks for all the comments about using an external editor. I’ll try that as it probably will be easier. Also let me keep a local copy of what I’ve written for reference. I’m just hoping someone can answer my questions about the quirks I’ve found. But I suspect I may be: A) The only person (as yet) who has actually measured and analysed the output of the ARMiniX. So others may not have known what happens. B) Am I really that rare that I’m the only person – apart perhaps from Keith Dunlop :-) – who wants/expects to see a RO computer offer high quality audio? As I think I wrote earlier, I’d like to be able to mention the new RO machines as a possible alternative for audiophiles in magazine articles aimed at them.
Would be one more reason for new users to find RO of interest. FWIW Having quieted the fan noises of my ARMiniX it clearly has the potential of being a small and convenient music player as part of quite a good system. Just that it does still seem to have some rough edges that need dealing with. When I get a chance I’ll post a link to some graphics that show what I mean. Slainte, jim |
jim lesurf (2082) 1438 posts |
Ok, here’s an example of the measured results I’ve been getting. http://jcgl.orpheusweb.co.uk/temp/R88.png All with the ARMiniX system rate set to 88.2k (44.1k doesn’t work which seems absurd to me as I think the chip will handle it, but leave that for now…) The left-hand plots show an HF test signal. Top is a 44.1k rate file that is being linearly interpolated (presumably by PlayIt?). You can see aliases at about 50kHz which shouldn’t be present. Plus the ‘hill’ of ultrasonic hash caused by the delta-sigma process in the chip. Lower graph. Same test waveform but 88.2k sampled. Ta-daa! No aliases. This tells us that the result would be clearer for the 44.1k file if the interpolation was better than linear. Even 3rd/4th order would be lots better. Right hand side. Bottom graph. This shows a test sinewave file at 48k sample rate being played with a system rate of 88.2k. Now the aliases extend down into the audible range because linear interpolation makes a mess when the system rate isn’t exactly x2 the source data rate. All these results using PlaySound+PlayIt and lpcm 16bit wave files as the source. Shows why better interpolation is desirable. Particularly if you want to play on the ARMiniX anything with a rate that doesn’t currently follow the golden rule of ‘system rate = 2 x source rate’. Jim |
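To put a rough number on why linear interpolation leaves images behind at an exact x2 ratio, here is a small Python sketch. It is an idealised model rather than a description of what PlayIt does: it assumes x2 linear interpolation is equivalent to inserting zeros between samples and filtering with the kernel [0.5, 1, 0.5], whose gain at the output rate is 1 + cos(2*pi*f/output_rate). The function name is invented for this example.

```python
import math


def linear_2x_image_db(f_hz: float, source_rate_hz: float) -> float:
    """Level of the image at (source_rate - f) relative to the wanted tone at f,
    for x2 upsampling by linear interpolation (idealised model, see text)."""
    out_rate = 2.0 * source_rate_hz

    def gain(freq: float) -> float:
        # Frequency response of the [0.5, 1, 0.5] kernel running at the output rate.
        return 1.0 + math.cos(2.0 * math.pi * freq / out_rate)

    wanted = gain(f_hz)
    image = gain(source_rate_hz - f_hz)
    return 20.0 * math.log10(image / wanted)


if __name__ == "__main__":
    rate = 44_100
    for f in (1_000, 5_000, 10_000, 20_000):
        level = linear_2x_image_db(f, rate)
        print(f"{f:>6} Hz tone: image at {rate - f} Hz, {level:6.1f} dB relative")
```

In this model the image of a tone at f sits at (source rate - f), so for a 44.1k file it never falls below 22.05 kHz, and it gets stronger as the tone approaches half the source rate. The sketch does not cover ratios other than x2, such as the 48k into 88.2k case, where the unwanted products are no longer confined to the ultrasonic region.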
Rick Murray (539) 13840 posts |
You might be able to with Firefox if you write some code to hack the page content.
Perhaps somebody at ROOL might be persuaded to bump up the rows value if we ask really nicely? |
Rick Murray (539) 13840 posts |
This might be a side-effect of the asinine AC97 specification [1] and its legacy?
Ditto the Beagle. And likewise the Pi (I’m sure somebody has wired an electret to the GPIO – the BCM chip does have an ADC onboard, doesn’t it?).
Given that I listen (via my mobile phone) to a fair amount of MP3 converted from AAC from YouTube videos… I think I’m the last person to ask. I do, at least, convert 96/128kbit AAC to 160kbit MP3 (I can hear losses in 128kbit, the music sounds less lively), plus full stereo instead of joint stereo (JS can introduce some interesting aliasing effects in quiet parts, sounds like distant whispers). You asked this on comp.sys.acorn a while back, to the same deafening silence. I do hope you find another RISC OS audiophile, but I’m afraid I’m not the one. All I ask is that in-ear ear lugs provide some sort of bass response so my music doesn’t sound like it is being played on a Tandy radio… [1] In 1997, CDs were commonplace and many computers were capable of playing audio CDs. So guess the ONE sampling rate that Intel omitted from the AC97 spec. It’s nice to have 48/96kHz in 16/20 bit, with up to six channels… but to neglect the widespread 44.1kHz is kind of stupid IMHO. |
nemo (145) 2546 posts |
Probably. And I wouldn’t limit it to RO users. This is an MP3 world after all.
Anything will be better than linear, granted. But equally it’s only going to be noticeable for (what was at 44.1) “high” frequency. Similarly, the difference between 16bit and 24bit is only going to be discernible (I guess) in low signal/high amplification circumstances that also aren’t common. I imagine there’s as many RO users worried about 24bit audio as there are about the lingering death of ambisonics… Fascinating and educational graphs BTW. Sadly I’m a chocolate teapot as I don’t have any RO hardware capable of anything better than 8bit. Yes, truly. |
jim lesurf (2082) 1438 posts |
Sorry for this, but I have no idea how to quote from earlier postings in the way I would for email or usenet. So I’ll have to answer/comment more indirectly. FWIW The lack of 44.1k working properly (at present!) on the PandaBoard is duplicated for 48k as well. It isn’t anything to do with AC97. But down to the HAL, I think. When you set a 44.1/48k system rate the system actually still uses the TI audio chip at 88.2k or 96k. It does this in a (to me) crazy way. It needs to double up the number of samples somewhere along the way whilst letting RO think the rate is low. So the HAL/OMAP simply uses both the R and L values for both output channels. The result is a weird ‘mono with added ultrasonic crap’. Also confuses the hell out of some ported items. The assumption seems to have been that the chip doesn’t do 44.1k or 48k. But I suspect it does for various reasons, e.g.: 1) For ages 48k/16 has been the bog standard for ‘PCs’. Just as it has been for DVD-V, TV, DAB, etc. 2) Similarly Audio CD and nearly all mp3 files use 44.1k. So a hardware maker of a chip to go into a computer would need these days to be seriously crazy not to support these rates. Or at least one of them. And looking at the documents I think the TI chip is a delta-sigma resampling one. (Hence the HF hill in the spectrum.) That means it can accept almost any input rate as it is going to upsample anyway. Indeed, the spec sheets list a number of rates inc. 44.1k and 48k. So the current limits and problems seem due to not getting the data fed to the chip correctly IMHO. Really should be fixable. I guess I am unusual in being a RO user and an electronic engineer who has worked on consumer audio and a journalist who writes for HiFi and other consumer mags. But that means I can see the audio/AV consumer areas as well as RO. I know there is a lot of interest in serious audio, and that people pay good money for it, as well as some being into DIY.
Hence I’m sure that the PandaBoard (and some other new hardware) has serious potential in the audio area. There may not be many serious audiophiles who now use RO, but they could easily adopt RO if some fairly basic problems/limits I’ve seen were sorted. And that could mean manufacturers as well as people who want to play music. So there is a real opportunity here. Earlier hardware like the Iyonix simply couldn’t have coped. The hardware simply wasn’t up to it. But the PandaBoard has a 96k/24 capability, and tests show it can deliver good results. I suspect the RPi can do well also as it has an HDMI. I suspect the real question here is if people are simply satisfied for RO to ‘work on new hardware but be a bit faster’ – i.e. keep existing users happy. Or to see it able to expand to draw in many new users for new purposes. I’d like to see the latter, as well as being able to use it myself as a good audio player, etc. BTW One thing I do wonder about is: How many people who happily play moderate rate mp3 files have spent time listening to seriously good audio setups playing high resolution material. In my experience most people have never even experienced good stereo from speakers as it is quite hard to achieve. Stereo doesn’t simply mean two speakers with the sounds spread between them like washing on a line. :-) Yes, I’m sure most people these days use mp3 and don’t even think of anything that might just sound better. But there are also thousands of people into serious audio. I suspect that – at least pre Raspberry Pi! – there were far more audiophiles in the UK than RO users! Many of them willing to put a lot of effort and cash into it. Some of them really do put the ‘fanatic’ back into ‘fan’. 8-] Of course, these thoughts are subversive. 8-] They could mean becoming dissatisfied and wanting something better, or even welcoming more people to the party. :-) And no, better interpolation isn’t simply a question of “only discernible at high frequency”.
Unless you very carefully keep a x2 rate ratio the garbage will extend down to the audible range. Given that a lot of material is 48k these days, that implies something else that RO currently “doesn’t do”. Which is to have the system sample rate switch with the source automatically. When I play audio on my Linux boxes the samples are fed to the DAC at the rate-for-the-file. I can watch the LEDs on the front show this. The samples reaching the DAC are the ones listed in the file. No needless conversions. Actually makes the computer’s task easier as it is just feeding data. Whereas with RO I had to hack a copy of PlaySound to make it check the sample rate for each file as it started to play it, then change the system rate to suit. It works. But as a hack I can’t release it as it “isn’t the normal way” and would be a confusion to have more than one version of PlaySound. Having seen the duffell software, though, I may sometime try to put together some playing app of my own that changes rate to suit in this way. May even be able to work out how to improve the interpolation if someone who understands these issues, and the questions I’ve been raising, actually responds with the info I am asking about! Have I raised my questions on the wrong ‘section’ of the forum? I was hoping someone involved in the audio programming would respond with some answers. OK, only been a day so far, but nothing yet… I’m wondering if this is going the way of the old Goon Show joke, “Christmas Eve, and still no offers of pantomime!” :-) Jim |
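The ‘check the file, then pick a system rate to suit’ idea can be sketched in a few lines of Python. This is not the PlaySound hack itself and it does not touch any RISC OS call. SUPPORTED_RATES is a hypothetical list standing in for whatever the hardware and HAL really accept, and the helper names are invented. The selection rule is an assumption: use the source rate if it is available, otherwise the smallest supported exact multiple of it, so that a x2 ratio is kept where possible.

```python
import io
import wave

# Hypothetical: the only rates assumed to behave properly on the setup described above.
SUPPORTED_RATES = (88_200, 96_000)


def wav_sample_rate(fileobj) -> int:
    """Read the sample rate from a WAV file's header."""
    with wave.open(fileobj, "rb") as w:
        return w.getframerate()


def choose_system_rate(source_rate: int, supported=SUPPORTED_RATES):
    """Prefer the source rate itself, then the smallest exact multiple of it.

    Returns None when no supported rate fits, in which case the player
    would need proper resampling rather than a simple rate switch.
    """
    if source_rate in supported:
        return source_rate
    multiples = [r for r in sorted(supported) if r % source_rate == 0]
    return multiples[0] if multiples else None


def make_test_wav(rate: int) -> io.BytesIO:
    """Build a tiny silent 16-bit stereo WAV in memory, for demonstration only."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as w:
        w.setnchannels(2)
        w.setsampwidth(2)
        w.setframerate(rate)
        w.writeframes(b"\x00" * 4 * 10)
    buf.seek(0)
    return buf


if __name__ == "__main__":
    for rate in (44_100, 48_000, 96_000, 50_000):
        source = wav_sample_rate(make_test_wav(rate))
        print(source, "->", choose_system_rate(source))
```

A real player would then ask the sound system to switch to the chosen rate before starting playback; how that is best done on RISC OS is exactly the kind of detail the questions above are asking about.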
Colin (478) 2433 posts |
Jim, Can you go back and edit your message and put < blockquote> </ blockquote> around what you’ve quoted html style it’ll be easier to read then. thanks. I should add you can edit your existing post you don’t need to post a new one. |
Colin (478) 2433 posts |
Forget what I said above I see what you’ve done now. |
Colin (478) 2433 posts |
I know it’s not what you want to hear but the best person to do this may well be you. I can program, I like my audio better than I’ve heard from a computer but I know nothing about programming audio – I fried the audio chip in my Iyonix years ago – so I’d be useless. I agree RISC OS should maximise its audio abilities. If the RISC OS drivers are not good enough change them – with the open source you don’t have to put up with inadequate drivers. I would assume that the current drivers are a subset of what you want so if you got new drivers working the old API could just call the new. If you definitely don’t feel up for it maybe you could specify exactly what you want. What I’m suggesting is that given we may not have anyone capable of doing everything maybe the task can be split for someone else to do what you can’t. |
nemo (145) 2546 posts |
You are now our nominated audio expert.
Well that’s going to be the majority, yes. :-) However, lead and others will follow. Build it and they will come. etc
Indeed. I was referring to the obvious truism that linear interpolation works fine for low frequency signals but at frequencies closer to the sample rate it produces increasing aliasing, which definitely does appear in the audible range as you say.
Regrettably, my experience in the graphics, font and internationalisation fields has shown that the people who did the work and the people who understand the issues are not always the same people, and in any case both groups are rather thin on the ground these days. But what the scene lacks in numbers and experience it makes up for in dedication! (Me excluded :-/)
You’d have to find them first. |
jim lesurf (2082) 1438 posts |
Hi: Can someone explain how I can get to take parts from a previous posting and copy them here as a quote? I can’t find a way to do this with NetSurf. So I’m going to have to paraphrase, etc, again. :-/ Colin: wrt my programming in this area, I can explain what is required. But I have no idea how to write or modify something like one of the drivers, or modules. My programming skills are in a limited area, and I haven’t done any assembler in decades! (The only module I ever wrote – DrawGen – was done with the ABC compiler which meant it was entirely in BASIC. And I’ve not even used BASIC for anything beyond a ten-line quick prog for years.) I can summarise below what is needed, though, and explain in more detail as may be appropriate. Nemo: Yes, fair enough that in some ways I’m probably the closest to an ‘expert’ in general audio terms. I can design, build, and test audio hardware and have a fair idea how things like DACs, amps, etc, work. I also know a fair bit about the state of play of high-end consumer audio and what people expect or look for. But as I say above, I’m simply not of the right ‘grade’ when it comes to software at the assembler/module/driver/OS/HAL levels! One reason I’m asking questions is that I’m often not sure which part of the system is where something occurs or may need sorting. Above said, AIUI at present linear interpolation is done by something like the PlayIt driver. And that at present for a simple x2 upsample (e.g. playing a 44.1k sample rate file with a 88.2k system rate) this simply interpolates the average of the values ‘either side’ of the one to be inserted. If any of that is wrong, someone will have to tell me, and explain what does happen. However, a better interpolation for x2 would be to use the four values around the point to be interpolated. i.e. the samples one sample and two samples ‘before’ and those one and two after. If that isn’t clear, I’ll illustrate. For simplicity think of just one of the stereo channels.
We can represent the input samples as a series x[i] representing the waveform at times dt*i. Linear interpolation works out an intersample at a time dt*(i+0.5) as being y[i] = (x[i] + x[i+1])/2 and duly sends to the output x1 y1 x2 y2 x3 y3… as the sequence. So sending out values at x2 rate with linearly interpolated ones in between the values from the source. For the next practical order if we use the same method we’d have y[i] = A*(x[i] + x[i+1]) + B*(x[i-1] + x[i+2]) So the driver now has a slightly more complicated routine to perform for working out the intersample values. But the idea is much the same. In fact we don’t need to do the math as above because – as is general for digital filtering – there are different forms and approaches. The above is an ‘FIR’ filter. But we can employ alternatives that use the four ‘previous’ values. However that means we have to recalculate every output value. Not pass on the x[] ones. So this again is a matter of what turns out to be simplest from the programmer’s POV as well as what kind of results we want. One note of warning. The input values may reach up to full range for 16 bit. So we need to ensure that the calculation is done with ints bigger than 16 bit to avoid some intersamples being ‘clipped’ (overflowing). That can’t happen for linear but it can happen for higher order. Not obvious from the above, but we don’t have to have both A and B +ve. One of these is usually -ve. And ideally, any such calculation should also be dithered. Which again implies shifting the values a bit. [pun alert] But I suspect that will pass un-noticed if we ignored it. Hope that makes sense. Have to break off. Need to take some photos for an article. Slainte, Jim |
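The two x2 schemes described in the post above can be sketched in Python as follows. This is only an illustration, not the PlayIt code. The four-point weights used here, 9/16 for the two nearest samples and -1/16 for the two outer ones, are the standard cubic (Lagrange) midpoint weights and are an assumption for this example, since the post does not give values for A and B. The edge handling (repeating the end samples) is likewise just one simple choice.

```python
INT16_MIN, INT16_MAX = -32768, 32767


def clamp16(value: int) -> int:
    """Limit a wider intermediate result to the signed 16-bit range."""
    return max(INT16_MIN, min(INT16_MAX, value))


def upsample2x_linear(x):
    """x2 upsample of a non-empty list: source samples interleaved with neighbour averages."""
    out = []
    for i in range(len(x) - 1):
        out.append(x[i])
        out.append((x[i] + x[i + 1]) >> 1)  # average, rounded down; cannot overflow 16 bits
    out.append(x[-1])
    return out


def upsample2x_four_point(x):
    """x2 upsample of a non-empty list using four neighbours per intersample.

    Assumed weights: A = 9/16 (inner pair), B = -1/16 (outer pair).
    """
    out = []
    n = len(x)
    for i in range(n - 1):
        out.append(x[i])
        before = x[i - 1] if i > 0 else x[i]        # repeat the first sample at the start
        after = x[i + 2] if i + 2 < n else x[i + 1]  # repeat the last sample at the end
        acc = 9 * (x[i] + x[i + 1]) - (before + after)  # needs more than 16 bits
        out.append(clamp16((acc + 8) >> 4))          # round, divide by 16, then clip
    out.append(x[-1])
    return out


if __name__ == "__main__":
    ramp = [0, 100, 200, 300]
    print(upsample2x_linear(ramp))
    print(upsample2x_four_point(ramp))
    # A full-scale pattern whose four-point intersample overshoots 16 bits:
    print(upsample2x_four_point([-32768, 32767, 32767, -32768]))
```

The last line shows the warning in practice: with full-range input the four-point intersample between the two positive peaks works out above the 16-bit maximum, so the sum has to be held in a wider integer and clipped (or the signal scaled down beforehand).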
Dave Higton (1515) 3526 posts |
Pointless. The error is only plus or minus half an LSB, and bears no systematic relationship to the sample values. In effect it’s already dithered. There you are – I’ve already applied an optimisation for you :-) |
Dave Higton (1515) 3526 posts |
Control-C and control-V work in Netsurf too. Copy the text you want to quote, and insert it after the characters "bq. " (you must include the space). Do your experimentation in the “Tests” forum. See the “Formatting help” section at the bottom of every forum page. |
Rick Murray (539) 13840 posts |
There’s sadly this thing called reality. I’ve heard a high end sound system, demo’d by a guy who used bass-that-makes-you-hork as a benchmark. |
Jeffrey Lee (213) 6048 posts |
It’s my understanding that *volume should only affect 8-bit material. But it’s possible that I’ve missed something when checking the code. Or maybe some music players/sound generators aren’t following the rules properly and are using the wrong volume controls for things. 2) I find that when I allow an application like PlaySound or DigitalCD and drivers like PlayIt to take note of the *volume then I have to set a *volume value of about ‘100’ to ensure that a source file with a full-range 16-bit waveform just fails to clip the audio chip on my ARMiniX. Yet the report back is that this value gives a ‘gain’ of well below 100%. I suspect the easiest way of getting to the bottom of problems like this would be to add some debugging features to the OS to allow the output of the various sound systems to be captured. E.g. capture the raw 8-bit sound data, capture the output of each SharedSound client, capture the final data that SoundDMA sends to the hardware, capture all the different volume level settings, etc. Then it will be easy to see if each component is applying volume settings correctly and where any clipping is being performed.
At the moment I’d say the only way to ensure that’s the case is to:
The first three steps should ensure that the sound data you generate in your linear handler doesn’t get modified by the OS (although, the first two are technically unnecessary since your handler could just overwrite any 8-bit sounds if it wanted, and installing your handler in step three would remove the handler SharedSound had installed). The fourth step should ensure the hardware isn’t applying any amplification/attenuation to the sound (and if it is, then it’s probably a bug in the HAL which should be fixed). As far as any future major update to the RISC OS sound system goes, I expect we’d want to aim for a system which supports the following:
That should put us on par with most other modern OS’s. But it is a pretty big job, and will need someone like Jim to verify that everything is working as intended :) (At the moment the best I can offer is to pipe the output into my PC and record it through that. Not very high tech!) |
Dave Higton (1515) 3526 posts |
In an ideal world we would be using the on-chip DSP for these DSP tasks. TI do provide some stuff, but much of it remains proprietary to them and the rest seems to assume Linux. It’s not clear to me whether any of it could be re-used by RISC OS. I do have access to TI’s Code Composer Studio, but that’s no use unless we can find out how to get code into the DSP and data in and out of it. |
jim lesurf (2082) 1438 posts |
Hi Dave, “bq. "Control-C and control-V work in Netsurf too. Copy the text you want to quote, and insert it after the characters "bq. " (you must include the space). Afraid I’m still not following something wrt Netsurf. Although CtlV/C may work I can’t use the mouse to select any chunk of text from a previous posting that I wish to quote! The above was the result of my saving the entire page from NS as text and then dropping a selection back onto the letterbox from !DeskEdit. This makes it difficult for me to deal with points Jeffrey and others have raised. But I’ll do the best I can in another posting and hope someone can explain to me what I’m not understanding about how to copy previous postings from their list on the page down into the ‘reply’ letterbox. Jim |