Benchmarks
Pages: 1 2 3 4 5 6 7 8 9 10 11 ... 18
Chris Hall (132) 3554 posts |
Using version 1.1 of Richard Spencer’s benchmarking programme I have added some further benchmarks to the original table. In the table below, the benchmarks are expressed as percentages, where 100% is equivalent to a StrongARM 202MHz Risc PC running RISC OS 4.02.
Note: on the basis of these figures, the Beagleboard XM is currently running at 800MHz until a ‘processor speed choice’ utility is completed, allowing 1GHz to be selected.
Edit: actually 600MHz, not 800MHz, allowing for its superscalar nature, so it will speed up a lot if we can get processor frequency switching implemented.
Edit: note that the HDD tests used a 160Gbyte USB HDD, not a USB pen drive.
Two of the benchmarks proved sensitive to screen resolution and so I repeated these at various resolutions:
(**) – mode not supported correctly due to colour matrix hardware |
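For readers who want to redo the normalisation themselves, here is a minimal sketch of how a raw score becomes one of the percentages described above. It assumes that a higher raw score is better and that the percentage is a simple ratio against the reference machine; the function name and the example scores are hypothetical and are not taken from the benchmark programme or the table.

```python
def relative_percent(score: float, baseline: float) -> float:
    """Express a raw benchmark score as a percentage of the baseline score."""
    if baseline <= 0:
        raise ValueError("baseline score must be positive")
    return 100.0 * score / baseline


# Hypothetical raw scores, purely for illustration (not from the table).
baseline_score = 250.0  # stands in for the 202MHz StrongARM reference machine
board_score = 1000.0    # stands in for some faster machine

print(relative_percent(baseline_score, baseline_score))  # 100.0
print(relative_percent(board_score, baseline_score))     # 400.0
```

If a particular test reported a time rather than a rate, the ratio would need inverting; that case is not covered here.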
Steffen Huber (91) 1953 posts |
I did a benchmark roundup with various synthetic and “real world” tests, mostly comparing the BeagleBoard with the IYONIX. Basically, the results are similar to what RISCOSmark discovered:
|
Jeffrey Lee (213) 6048 posts |
Actually, if you compare those figures to the ones I got for a beagleboard running at 500MHz then it looks like the default speed for the xM is only 600MHz. Remember that the Cortex-A8 is dual-issue superscalar, so it should get about twice as many MIPS per MHz as previous RO machines. Also it’s interesting to see the big improvement in the memory test results (1089% → 3556%). I know the L1 caches are now twice as big (32KB vs. 16KB), and that the L3 interconnect should be faster, but for a performance boost like that I’m guessing that the L2 cache must have received a clock boost too (I’m not sure offhand how the L2 cache is clocked).
First thing to do here is probably to sort out support for background transfers (and therefore disc caching) in SCSIFS. Then do some more profiling to see if there’s anything that can be done to speed up the USB stack/SCSISoftUSB as a whole. We should also look into whether it’s possible to do anything to reduce the long mount times of USB sticks (unless the support for background transfers magically fixes it)
Apart from implementing a fix for the slow left-to-right rectangle copy op, I think it will take a lot of hard work to get any other performance gains from the DMA controller. So rather than pour loads of time into that we should probably take the easier route of making screen memory cacheable, like it is in RO 4.
Yes please! Since it’s been sat gathering dust on my hard disc for a while now, I think that this weekend I’ll finish up and release the first version of the VFPSupport module. That way RISC OS will (finally) be ready for programs to start using the VFP/NEON unit.
Another item to add to the list: I/we still need to do a full investigation into why it takes so long to enter the desktop. I know it’s because OS_ChangeDynamicArea decides to flush the cache a few hundred/million times, but I’m not sure why, or how best to restructure the code. For a 256MB beagleboard it’s just about bearable, but after seeing how long it takes a 512MB touch book to enter the desktop I think this is something I’ll be looking at fairly soon after I get my -xM. |
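The clock-speed inference in the post above (comparing figures against a Beagleboard known to run at 500MHz) can be sketched as a simple proportion. This assumes a CPU-bound score scales linearly with clock frequency on the same core, which ignores memory and cache effects; the function name and the scores are hypothetical.

```python
def estimate_clock_mhz(ref_clock_mhz: float, ref_score: float,
                       observed_score: float) -> float:
    """Estimate an unknown clock from a CPU-bound score, assuming linear scaling."""
    if ref_score <= 0:
        raise ValueError("reference score must be positive")
    return ref_clock_mhz * observed_score / ref_score


# Hypothetical scores: a reference board at 500MHz and a board of unknown speed.
print(estimate_clock_mhz(500.0, 1000.0, 1200.0))  # 600.0
```

A memory-heavy test would not follow this proportion, which is consistent with the memory results improving far more than the clock ratio alone would suggest.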
Tank (53) 375 posts |
Just for info, my Devkit8000 results at various speeds
HDD is connected via a Toshiba port replicator II, from the standard USB port, and is a 38G IDE drive connected to a JMicron USB to ATA/ATAPI Bridge. |
Kuemmel (439) 384 posts |
From my fixed point Mandelbrot fractal code (Link) (I have one version using the normal 32bit MUL and one with the 64bit SMUL) I calculated the efficiency of the different CPUs. The unit of measurement of the efficiency is [1000 Iterations per MHz], so it is some kind of direct comparison of the computing power of the CPU core. The 32bit results show that the StrongARM/Iyonix are really the same kind of ‘family’; ARM9 and C-A8 are a bit worse, though overall I don’t understand the difference between the two C-A8 (maybe board revision, I don’t remember). The 64bit SMUL version is of course much faster as fewer instructions are needed, but here too the code runs similarly on all CPUs except the ARM9 and the 600 MHz C-A8. The better 64bit results (compared to 32bit) of the C-A8 might also be due to the longer pipeline, I guess. Now I would be really keen to see a 1000 MHz C-A8 run and especially, if ever possible, a C-A9 run (if the Beagle Board will use this chip some time next year…), as the C-A9 can work Out-Of-Order, which can help quite a lot…but maybe the code has to be optimized to benefit from that (creating different ‘instruction lines’ with different registers while loop unrolling, etc.). ...and of course I have to do a VFP version :-) ! |
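As a rough illustration of the efficiency unit used in the post above, the sketch below divides a throughput figure by the clock speed. It assumes the benchmark reports iterations per second and that "1000 Iterations per MHz" means thousands of iterations per second per MHz; both the interpretation and the numbers are assumptions, not values from the actual results.

```python
def efficiency(iterations_per_second: float, clock_mhz: float) -> float:
    """Thousands of iterations per second, per MHz of CPU clock."""
    if clock_mhz <= 0:
        raise ValueError("clock must be positive")
    return iterations_per_second / 1000.0 / clock_mhz


# Hypothetical figures: two cores with the same per-clock efficiency.
print(efficiency(120_000_000, 600))  # 200.0
print(efficiency(40_000_000, 200))   # 200.0
```

Two machines with the same value do the same work per clock cycle, so any difference in raw speed between them comes from the clock alone.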
Jeffrey Lee (213) 6048 posts |
That kind of optimisation will help with pretty much any pipelined CPU design, as it will help reduce dependencies between adjacent instructions. And it’ll be especially helpful for Cortex-A8, because the only way the CPU can dual-issue instructions is if the 2nd instruction doesn’t depend on the results of the 1st (along with some other restrictions, e.g. load/store instructions, and (IIRC) multiply instructions, can only execute from the first pipe) |
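The dependency rule described above can be illustrated with a toy model. This is only a sketch under strong assumptions: an in-order machine that may pair two adjacent instructions when the second neither reads nor writes the first one's destination register. It ignores result latencies and the pipe restrictions on loads, stores and multiplies mentioned in the post, and it is not a model of the real Cortex-A8 pipeline. All names are hypothetical.

```python
from typing import List, Tuple

# An instruction is (destination register, source registers).
Instr = Tuple[str, Tuple[str, ...]]


def cycles_dual_issue(program: List[Instr]) -> int:
    """Count issue cycles when adjacent independent instructions can pair."""
    cycles = 0
    i = 0
    while i < len(program):
        cycles += 1
        if i + 1 < len(program):
            dest, _ = program[i]
            dest2, srcs2 = program[i + 1]
            # Pair only if the second instruction does not depend on the first.
            if dest not in srcs2 and dest != dest2:
                i += 2
                continue
        i += 1
    return cycles


# One dependency chain: every instruction needs the previous result.
single_chain = [("r0", ("r0",))] * 4
# Two interleaved chains ('instruction lines') using different registers.
two_chains = [("r0", ("r0",)), ("r1", ("r1",)),
              ("r0", ("r0",)), ("r1", ("r1",))]

print(cycles_dual_issue(single_chain))  # 4
print(cycles_dual_issue(two_chains))    # 2
```

In this model the same four operations take half the issue cycles once they are split across two independent register chains, which is the effect that interleaving unrolled loop iterations aims for.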
Kuemmel (439) 384 posts |
@Jeffrey: I’ll try out some stuff regarding those instruction lines soon. I can really say that this kind of coding for fractals worked out really nicely on Intel CPUs…the different instruction lines combined with loop unrolling indeed multiplied the performance on the Core2Duo/i7 architecture (though this CPU really has massive parallel units and bandwidth) compared to the Pentium III, where they didn’t have any effect at all. I guess that could be the case for StrongARM compared to C-A8, too, we’ll see… |
Chris Gransden (337) 1207 posts |
Here are the benchmarks running at 800MHz.
|
Chris Hall (132) 3554 posts |
So … what ARM code did you use to switch processor speed within RISC OS? Or is it possible to switch speed before doing the fatload of riscos? |
Chris Gransden (337) 1207 posts |
I just changed the relevant values in this program. Oh and here’s the benchmarks at 1.2GHz. :-)
|
Jeffrey Lee (213) 6048 posts |
Except I removed it from my website a couple of weeks ago to make sure people couldn’t download it and use it to fry their -xM :) Once I’ve received my xM I’ll upload a new version which will understand the difference between the two OMAPs and allow the xM to be clocked to 800MHz. But for 1GHz I’m going to stick to what the datasheet says and wait until we have a SmartReflex driver (and the CPU frequency code) in RISC OS. |
Chris Gransden (337) 1207 posts |
I have a fan just in case. None of the chips on the board are even warm to the touch. I wonder how high it will go. CPUTmpMon is reading -40C. If you could suggest a fix for this then I might try 1.2GHz. |
W P Blatchley (147) 247 posts |
I’ll have a look at the datasheet for the xM and see if I can figure out what the problem is. I don’t have an xM myself, though, so I’ll need a brave person to test it out for me! |
W P Blatchley (147) 247 posts |
According to the DM37x TRM, the relevant registers (CM_FCLKEN3_CORE and CONTROL_TEMP_SENSOR) are absolutely identical. Does CPUTmpMon also report nonsense when you run at a lower clock speed? It could just be a timing issue (because I seem to remember employing a horribly ugly hack of a loop to wait for the temp. sensor conversion to start…) |
Chris Gransden (337) 1207 posts |
It reports -40C no matter what speed I run at. |
Kuemmel (439) 384 posts |
Dear Chris, you seem to have RISC OS running at 1 or even 1.2 GHz. Could you give my two benchmarks a run and post the results? The first is !FireBench and the second is !FixFrac (just try Set Nr. 2, that’s enough). |
Chris Gransden (337) 1207 posts |
Results at 1.2GHz. :-O
I also edited the post above. Results now for 1.2GHz. |
Chris Hall (132) 3554 posts |
What does ‘the datasheet’ say and what is a ‘SmartReflex’ driver? |
Chris Gransden (337) 1207 posts |
Perhaps a more useful real world benchmark. Watching DVDs is no longer a painful experience. At 1GHz KinoAMP averages over 23fps. |
W P Blatchley (147) 247 posts |
@Chris G, Trying to understand why CPUTmpMon doesn’t work. Could you try running the following and posting the output?
*memorya p 48004a08 6
*memorya p 48002524 100
*memorya p 48002524 0
Thanks! |
Jeffrey Lee (213) 6048 posts |
sprs685.pdf, page 141, section 4.3.4: The “Processor voltages without SmartReflex” table doesn’t list OPP1G, but the “Processor voltages with SmartReflex” table (on the next page) does. They don’t give much reasoning for its absence, except for this note:
OPP1G is a high performance operating point which has following requirements:
– ABB LDO must be set to FBB (Forward Body Bias) mode when switching to this OPP. It requires having a 1uF capacitor connected to cap_vdd_bb_mpu_iva.
– AVS (Adaptive Voltage Scaling) power technique must be used to achieve optimum operating voltage.
SmartReflex is the name TI has given to a collection of different power management techniques that they use in their products, and the associated hardware/software implementations of those techniques. Basically the main aim of SmartReflex is to allow devices to run on as little power as possible, thereby reducing heat, increasing battery life, and (I believe) increasing component lifetime. See here for some official blurb. |
Chris Gransden (337) 1207 posts |
|
Chris Hall (132) 3554 posts |
Jeffrey, I think you are right not to encourage 1GHz until the proper power management is in place for RISC OS. Certainly the hardware (1uF cap) must be present as the Linux GUI can operate inter alia at 600/800/1000 MHz on the XM. |
Chris Hall (132) 3554 posts |
What is your advice regarding the following hack code to change gear (i.e. processor speed)?
CM_CLKSEL1_PLL_MPU%=&48004940
CM_CLKSEL2_PLL_MPU%=&48004944
MASK_M%=&7FF00
MASK_N%=&3F
SHIFT_M%=8
MASK_M2%=&1F
:
cm1%=!CM_CLKSEL1_PLL_MPU%
cm2%=!CM_CLKSEL2_PLL_MPU%
m%=(cm1% AND MASK_M%)>>SHIFT_M%
n%=cm1% AND MASK_N%
m2%=cm2% AND MASK_M2%
PRINT "CPU frequency is ";52*m%/(n%+1)/m2%;" MHz"
!CM_CLKSEL2_PLL_MPU%=(cm2% AND NOT(MASK_M2%)) OR (m2%*600/800)
cm2%=!CM_CLKSEL2_PLL_MPU%
m2%=cm2% AND MASK_M2%
PRINT "CPU frequency is now ";52*m%/(n%+1)/m2%;" MHz" |
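The arithmetic in the posted code can be checked off-target with a small sketch. It reuses the masks, the shift and the constant 52 exactly as they appear in the post, and it works on plain integers rather than hardware registers. The register values below are hypothetical, chosen only so the formula yields round numbers, and the sketch says nothing about what the hardware needs for a safe frequency change.

```python
# Field masks and reference constant copied from the posted BASIC code.
MASK_M = 0x7FF00
SHIFT_M = 8
MASK_N = 0x3F
MASK_M2 = 0x1F
REF = 52


def mpu_frequency_mhz(clksel1: int, clksel2: int) -> float:
    """Apply the post's formula: REF * M / (N + 1) / M2."""
    m = (clksel1 & MASK_M) >> SHIFT_M
    n = clksel1 & MASK_N
    m2 = clksel2 & MASK_M2
    if m2 == 0:
        raise ValueError("an M2 divider of 0 has no meaning in this formula")
    return REF * m / (n + 1) / m2


# Hypothetical register contents: M = 150, N = 12, M2 = 1.
clksel1 = (150 << SHIFT_M) | 12
clksel2 = 1
print(mpu_frequency_mhz(clksel1, clksel2))                    # 600.0
# Raising M to 200 with the same N and M2.
print(mpu_frequency_mhz((200 << SHIFT_M) | 12, clksel2))      # 800.0
```

In this formula M2 is a divider, so the posted update `m2%*600/800` only produces a whole number when M2 is a multiple of 4; with the hypothetical M2 of 1 used here it gives 0.75, and the multiplier M would have to change instead.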
Jeffrey Lee (213) 6048 posts |
It won’t work, I can tell you that much :-) Although I can’t guarantee it will work (still no delivery from Farnell :-(), tonight I’ll upload a new version of my BASIC program that will allow the -xM to be clocked at up to 800MHz. And if you’re adventurous like Chris G. then I’m sure you’ll be able to work out how to alter the code to allow running at 1GHz+. |