C to BASIC struggle
Kuemmel (439) 384 posts |
@Jan: Here are the IgepV5 (1 GHz) results from your code:
Regarding the 4/2 pass choice it seems that for larger amounts (>1000000) of data the 2 pass solution becomes better. I agree about the beauty of the algorithm, according to Wikipedia it dates back to tabulating machiens from 1887 ! I’m intrigued by the idea to use the NEON unit finally for sorting. In the NEON Programmer’s Guide is a NEON enhanced Bitonic with Merge sort algorithm used for a median 7×7 filter…but that’s a lot of stuff to read into before getting that done…and still might be slower, who knows… |
Jeffrey Lee (213) 6048 posts |
That NEON programmer’s guide looks like a useful doc – thanks for pointing it out! It’s a bit of a shame though that it’s a bit patchy in some areas. e.g. it doesn’t give examples of how VRECPE/VRECPS and VRSQRTE/VRSQRTS should be used, and although it goes into detail about the Cortex-A8 NEON performance it has very little information about the other processors. But on the bright side, it does explain what the polynomial data type is meant to be used for, which I haven’t seen elsewhere at all. |
Martin Avison (27) 1494 posts |
@Jan: It is your code, so you can write it how you like. And as you say, your aim was to do a very valid investigation in to the effect of data sizes on performance of your sort code. I know from bitter experience that testing sorts is not easy. There are many variables to consider, and the ‘best’ will be a compromise. Some sorts can slow for relatively few duplicates, or runs in sequence (ascending or descending), or other ‘non-random’ distributions. |
Kuemmel (439) 384 posts |
@Jeffrey: A good example how to use VRECPE/VRECPS and VRSQRTE/VRSQRTS can be found in Lachlan Tychsen’s math function database => Link I extracted the lines here: You can “stop” the calculation before the last VRECPS or VRSQRTS, but of course you’ll get less accuracy, you could even only use the VRECPE or VRSQRTE, but end up even less accurate.
|
Jeffrey Lee (213) 6048 posts |
Yes, I know how to use them. The problem is that I doubt everyone else knows :-) I’ve seen people write general-purpose math code without using the ‘step’ instructions – not good! |
Chris Hall (132) 3554 posts |
what ‘step’ instructions? |
Jeffrey Lee (213) 6048 posts |
See what I mean? ;-) VRECPS and VRSQRTS are the step instructions (S = ‘step’), which are designed to be used after the VRECPE/VRSQRTE ‘estimate’ instructions in order to iteratively improve on the accuracy of the result. |
Steve Drain (222) 1620 posts |
I was interested in to see what these were, too, so I looked at the StrongHelp VFP manual I prepared from Jeffrey’s comprehensive lists of instructions. I knew I had not checked over the NEON bits and I have found that it is not good for the ones here. So, if you were going to to look there please wait for a couple of days until I do a proper job. ;-( |
Martin Avison (27) 1494 posts |
@Steve Drain: re program posted 27th Oct: I have been unable to get this to work! Does it only work for certain types of data? If I try to sort 100 random numbers from 0 to 255 it seems to sort the first 24 and leave the rest unsorted! Truly random signed integers seem even worse. And I have just cut’n’pasted your code! Honest!! |
Martin Avison (27) 1494 posts |
@Steve: Sorry for casting doubts on your program! I have now realised that the size% parameter needs to be set to 4*(elements+1)-1 for correct operation. |
Steve Drain (222) 1620 posts |
Yes, that was naughty of me – size% is the DIM size. I have been doing it that way for so long I forgot that others would not realise it. The effect is to change the end condition of the FOR loops when byte% is added. Sorry. |
jan de boer (472) 78 posts |
@Kuemmel: Thank you for the taking the trouble of doing all these tests. Quicksort is a nice example of O=2logN.N ! For Radixsort on IGEPv5, however, your code was faster (0.09 csec) so please stick to that code. My main reason for posting was, why do these timings seem to be so slow? Cache issues? What to do about it? |
Martin Avison (27) 1494 posts |
If anyone is still interested in testing and comparing various sort routines, then I have created SortTest which provides a test bed for integer sort routines, to help to make it easy to create consistent, comparable results, with plug-in sort programs that just do a sort. Test runs of the sorts are controlled by a simple script file. This should enable effort to be concentrated on the sort programs themselves! Some of the recent postings on this thread have been converted to plug-in sorts within SortTest. For a few further details and download please see my website Comments always welcome! |
Kuemmel (439) 384 posts |
I finally updated my !radix_sort app code with the latest input from Jan. You can find the latest version here It adds Jan’s “Radix,MSB first,then LSBs” and “Radix,MSB then LSB’s + Cache” routines, adding an even more major speed up on the Pandaboard (guess on other hardware too) regarding the task to sort integers. Also includes his code to experiment with the cache optimisations. |