FP enabled version of LAME?
Andrew Rawnsley (492) 1445 posts |
Did someone post some results from a version of LAME compiled for the BeagleBoard with FP enabled? I think I saw them somewhere. Thanks in advance |
Trevor Johnson (329) 1645 posts |
Are you on about this stuff that Kuemmel and Chris did? |
Rob Heaton (274) 515 posts |
I’ve just built LAME using the RISC OS autobuilder on Ubuntu. |
Andrew Rawnsley (492) 1445 posts |
Yes, it was the Kuemmel and Chris thread – the implication being, I think, that it could be compiled with hardware fp. Has this build been done that way? It just seems like a huge candidate for hardware fp benefits – off the top of my head, one of the few cases where it could show immediate benefits. |
Rob Heaton (274) 515 posts |
This build hasn’t been compiled with hardware fp. |
Matthew Phillips (473) 721 posts |
Did you mean that you “don’t know” or “now know”, Rob? |
Rob Heaton (274) 515 posts |
Ooops! I meant “Don’t Know” :D |
Kuemmel (439) 384 posts |
…hm, maybe somebody could help out with trying the hardware fp compiler option?… I tried googling for the autobuilder's compiler options but didn't succeed, and I don't have a Linux system or anything else to try it on myself :-( To clarify something: as far as I understand it, the difference between ‘softfp’ and ‘hard’ is not that one of them uses software fp. Both options use the fp hardware; only the calling conventions differ (…whatever the effect of that is, that's what I'm curious about…). Here is the relevant part of the GCC manual:
Specifying ‘soft’ causes GCC to generate output containing library calls for floating-point operations. ‘softfp’ allows the generation of code using hardware floating-point instructions, but still uses the soft-float calling conventions. ‘hard’ allows generation of floating-point instructions and uses FPU-specific calling conventions. The default depends on the specific target configuration. Note that the hard-float and soft-float ABIs are not link-compatible; you must compile your entire program with the same ABI, and link with a compatible set of libraries. |
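As a rough illustration of the three `-mfloat-abi` settings quoted above, here is a minimal C sketch that reports which float ABI a file was compiled for. It assumes an AAPCS-based 32-bit ARM GCC (for example an arm-linux-gnueabi or arm-linux-gnueabihf toolchain) and relies on the macros GCC predefines there: `__SOFTFP__` for soft, `__ARM_PCS_VFP` for hard, and `__ARM_PCS` for the base (soft/softfp) calling convention. The function names are hypothetical; on ARM targets that do not follow the AAPCS, none of these macros may be defined and the sketch falls back to "unknown".

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper: report the float ABI this file was compiled with,
   based on macros GCC predefines for 32-bit ARM AAPCS targets. */
const char *float_abi_name(void)
{
#if !defined(__arm__)
    return "not a 32-bit ARM target";
#elif defined(__SOFTFP__)
    /* -mfloat-abi=soft: FP operations become library calls. */
    return "soft";
#elif defined(__ARM_PCS_VFP)
    /* -mfloat-abi=hard: FP instructions, FP arguments in VFP registers. */
    return "hard";
#elif defined(__ARM_PCS)
    /* -mfloat-abi=softfp: FP instructions, FP arguments in core registers. */
    return "softfp";
#else
    /* An ARM target where none of the AAPCS macros above are defined. */
    return "unknown";
#endif
}

/* Sanity check: the reported name is one of the values listed above. */
void check_float_abi_name(void)
{
    static const char *const known[] = {
        "not a 32-bit ARM target", "soft", "hard", "softfp", "unknown"
    };
    const char *name = float_abi_name();
    int found = 0;
    for (size_t i = 0; i < sizeof known / sizeof known[0]; i++) {
        if (strcmp(name, known[i]) == 0) {
            found = 1;
        }
    }
    assert(found);
}
```

Building the same file with different `-mfloat-abi` values would change the reported name, but, as the GCC note says, a hard-float build also needs a matching toolchain and libraries to link.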
Kuemmel (439) 384 posts |
Regarding the differences between softfp and hard, I found some really interesting pages in the Unix community. Here you can find lots of benchmarks → Link1. POV-Ray in particular speeds up enormously with the ‘hard’ option; on the other hand, encoding MP3s with mplayer doesn't gain much. More explanation of the differences between the options can be found here → Link2:

The difference is mainly that a binary compiled with “soft” or “softfp” uses the standard ABI, where all floating point arguments are passed in the standard ARM register file (r0-r3). That works whether the function uses emulated or hardware floating point, but in the hardware case the data has to be copied to the real FPU first, which is something of a performance hit (it incurs a 20-cycle pipeline stall and flush while the data is copied to the FPU register file; compilers do reschedule the copies to try to hide it, but sometimes it's not possible to get rid of that much waiting time). It also impacts code density.

The VFP variant is what we call “hardfp”, which puts floating point arguments in the floating point register file (s0 or d0 onwards). On a chip with no VFP unit this would cause a coprocessor exception. Hardfp binaries are also incompatible with the standard ABI wherever floating point arguments are involved (an app that ONLY uses integer arguments would work just fine, but it's unsafe to assume this just from the binary object header).

Essentially Debian armel uses soft (forcing emulation), Ubuntu used soft in Jaunty and softfp in Karmic (optional hardware floating point with the penalties described above), and only Debian armhf has hardfp enabled right now (no penalties, except that it will not work without an FPU and requires a certain base level of CPU), but Android and Ubuntu may have it in the future (this time next year maybe).
So it seems like it would be quite useful to have that on RISC OS too… though I have no clue whether the GCC developers can do it, or how easy or difficult that would be… |
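To make the register-passing description concrete, here is a small C sketch with two hypothetical functions taking `float` and `double` arguments. The C code itself is ordinary portable C and computes the same results under every ABI; the comments describe where the AAPCS places the arguments under the base variant (soft/softfp) and the VFP variant (hard), which is where the copying overhead described in the quote comes from.

```c
#include <assert.h>

/* Hypothetical example: scale one audio sample by a gain factor.
 * soft/softfp (base AAPCS): sample in r0, gain in r1, result in r0.
 *   Under softfp the compiler moves r0/r1 into VFP registers (VMOV)
 *   before the multiply and moves the result back to r0 afterwards.
 * hard (AAPCS VFP variant): sample in s0, gain in s1, result in s0,
 *   so the values are already where the FPU needs them. */
float apply_gain(float sample, float gain)
{
    return sample * gain;
}

/* Hypothetical example: crossfade between a dry and a wet signal.
 * soft/softfp: each double takes a core register pair, so dry uses r0:r1,
 *   wet uses r2:r3 and amount, with no core registers left, is passed
 *   on the stack; the result comes back in r0:r1.
 * hard: dry, wet and amount arrive in d0, d1 and d2; result in d0. */
double mix(double dry, double wet, double amount)
{
    assert(amount >= 0.0 && amount <= 1.0);
    return dry * (1.0 - amount) + wet * amount;
}
```

Comparing the assembly that `gcc -S` produces for this file under `-mfloat-abi=softfp` and `-mfloat-abi=hard` is one way to see the extra register moves directly.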
Andrew Rawnsley (492) 1445 posts |
Just a quick follow-up: Chris Gransden sent me a version built with hardfp, and it ran about 3x faster than the previous version. It still isn't what you'd call speedy, but it seemed to be up to about realtime conversion speed, which is a big improvement over older versions, and I'd even say quite usable. |
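The speed comparisons in this thread come down to two ratios: the realtime factor (seconds of audio encoded per second of wall-clock time, where 1.0 means realtime) and the speed-up of one build over another on the same file. A minimal C sketch with hypothetical function names:

```c
#include <assert.h>

/* Realtime factor: 1.0 means the encoder keeps pace with playback,
   below 1.0 is slower than realtime, above 1.0 is faster. */
double realtime_factor(double audio_seconds, double encode_seconds)
{
    assert(audio_seconds >= 0.0);
    assert(encode_seconds > 0.0);
    return audio_seconds / encode_seconds;
}

/* Speed-up of a new build over an old one encoding the same file. */
double speedup(double old_encode_seconds, double new_encode_seconds)
{
    assert(old_encode_seconds > 0.0 && new_encode_seconds > 0.0);
    return old_encode_seconds / new_encode_seconds;
}
```

For example, with made-up timings for a 240-second track, a build taking 720 s runs at a realtime factor of about 0.33, while a build taking 240 s gives `speedup(720.0, 240.0)` of 3.0 and `realtime_factor(240.0, 240.0)` of 1.0, which is roughly the "3x faster, about realtime" picture described here.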
Chris Johnson (125) 825 posts |
I have been trying out LAME myself with a view to getting my own ripping/encoding software going on the ARMini. On the Iyonix I have found that the older LAME 3.92 is about four times faster than Rob Heaton's build of LAME 3.98. I haven't tried the hardfp version on the BeagleBoard yet. |