C to BASIC struggle
Kuemmel (439) 384 posts |
I recently tried to port some nice radix integer-sort code (from Link), written in C, to BBC BASIC. Here’s the C code:
…and here’s my try:
…but it doesn’t work… kind of stuck now in that pointer or byte/word chaos… maybe somebody’s got a clue. |
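(For reference, a working BBC BASIC version of the usual LSD radix sort might look like the sketch below. This is an illustrative reconstruction, not the thread’s actual listing; it treats keys as unsigned words, so signed keys would need the top byte biased by &80.)

  REM LSD radix sort on four 8-bit digits; n% random words at src%
  n% = 1000
  DIM src% n%*4, dst% n%*4, cnt%(255)
  FOR i% = 0 TO n%-1 : src%!(i%*4) = RND : NEXT
  FOR pass% = 0 TO 3
    sh% = pass% * 8
    cnt%() = 0
    REM count how often each byte value occurs at this digit position
    FOR i% = 0 TO n%-1 : cnt%((src%!(i%*4) >>> sh%) AND &FF) += 1 : NEXT
    REM prefix-sum: turn the counts into starting offsets
    o% = 0
    FOR i% = 0 TO 255 : t% = cnt%(i%) : cnt%(i%) = o% : o% += t% : NEXT
    REM stable scatter into the other buffer, then swap the buffers
    FOR i% = 0 TO n%-1
      v% = src%!(i%*4)
      b% = (v% >>> sh%) AND &FF
      dst%!(cnt%(b%)*4) = v%
      cnt%(b%) += 1
    NEXT
    SWAP src%, dst%
  NEXT
  REM after an even number of passes the sorted data is back at src%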
Chris Hall (132) 3554 posts |
For sorting arrays and strings in BASIC, I think you are best off using ArmSort. It is excellent. |
Holger Palmroth (487) 115 posts |
Two things:
- You are dealing with 4-byte-wide addresses here, so just incrementing by 1 won’t do it.
- Here you were tricked by the C syntax: q will only be incremented afterwards, at the end of the loop.
Addresses!! :)
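In BASIC terms, those two points amount to something like this (a minimal sketch, not the actual routine):

  REM The C idiom  v = *q++;  uses the word at q first, THEN advances q.
  REM A BASIC address must also step by 4 bytes per word, not by 1:
  DIM buf% 16
  q% = buf%
  FOR i% = 0 TO 3
    v% = !q%   : REM fetch the 32-bit word at address q%
    q% += 4    : REM advance by one word (4 bytes), as *q++ does
  NEXT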
Thank you, that was fun! |
André Timmermans (100) 655 posts |
The following C line is puzzling:
I would decompose it as:
which means that the ++ serves no purpose. On second thought, considering the manipulation of count[] a few lines above, I guess it is:
|
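(The quoted lines aren’t reproduced above, but the usual radix-sort idiom of that shape, x[count[b]++] = v, decomposes into two BASIC statements like this; an illustrative sketch with hypothetical variable names:)

  DIM x% 40
  DIM cnt%(255)
  b% = 0 : v% = 42
  REM the C post-increment does two things in sequence:
  x%!(cnt%(b%)*4) = v%   : REM 1: store the value at the current slot for digit b%
  cnt%(b%) += 1          : REM 2: only then advance that digit's slot index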
Kuemmel (439) 384 posts |
@Chris: Thanks for the link, I’ll check that out, but it seems when it comes to speed for sorting huge amounts of integers the radix sort can’t be beaten… At first I also tried not to use the dummy “d”, but BASIC doesn’t seem to like a pointer to a pointer there, as it gives an error.
…now it’s time to port it to assembler :-) |
Chris Hall (132) 3554 posts |
BASIC is quite happy with pointer to pointer (to pointer, to pointer … ) if you get the arguments right. |
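For example (a minimal sketch of nested indirection, not taken from the thread):

  REM Each unary ! follows one pointer level.
  DIM cell% 4, p1% 4, p2% 4
  !cell% = 42        : REM a word holding the value
  !p1% = cell%       : REM a pointer to that word
  !p2% = p1%         : REM a pointer to the pointer
  PRINT !(!(!p2%))   : REM three dereferences: prints 42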
Martin Avison (27) 1494 posts |
I would be astonished if a radix sort written in Basic beats ArmSort (written in assembler) used from Basic! But I have been astonished before. |
Holger Palmroth (487) 115 posts |
Obviously …
… porting it to assembler is Kuemmel’s next step. |
Rick Murray (539) 13840 posts |
I had a lame bubble-sort that moved 512-byte blocks. Written in BASIC, it took a rather long time. I don’t recall the time exactly (it was around twenty years ago) but I think sorting a 200-entry array in 40-odd seconds on an ARM3 was about right.
The problem with BASIC is that it doesn’t really have any “memory”, so the same block of code will be parsed, interpreted, and executed again and again; instead of, say, interpret once, execute many, which is a very basic description of a compiler.
I do not believe that assembler would have offered much benefit. It would be capable of running faster, certainly, but compare that against the extra time it would have taken to write and test. The leap of ~40s → <1s is significant. It is harder to justify trimming one “under a second” value to a different “under a second” value. |
Steve Pampling (1551) 8170 posts |
ping response 400ms: naff link^1
All < 1 second. Harder to justify trimming? I must introduce you to a few of my 9000 userbase; I’m sure you’d have an interesting debate.
^1 Location of selected gateway varies depending on BT load |
Paul Sprangers (346) 524 posts |
Since you want to sort integers in a memory block in Basic, is there any particular reason for not using OS_HeapSort? It’s easy to implement and very fast. Actually, I wouldn’t be surprised if ArmSort were wrapped around that very call. But like Martin, I have been surprised before. |
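For instance, something along these lines sorts a block of words in place (a sketch, not Paul’s code; the R2 value selecting the built-in signed-integer comparison is an assumption, so check the OS_HeapSort entry in the PRM):

  REM Sort n% words at block% in place via the OS_HeapSort SWI.
  n% = 1000
  DIM block% n%*4
  FOR i% = 0 TO n%-1 : block%!(i%*4) = RND : NEXT
  SYS "OS_HeapSort", n%, block%, 3 : REM R2=3: built-in signed-integer compare (assumed)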
Rick Murray (539) 13840 posts |
If you are going to comment on a cherry-picked quote, some context might demonstrate that I wasn’t saying what you thought I was saying. I was referring to a specific case where compiler code would have been unlikely to be that much slower than pure assembler when implementing the same algorithm. In this case, a specific event was considered (on a much older machine) to take “less than a second”, as was its potential replacement. The envisaged difference did not justify the time expended on creating a third implementation. It would have been unlikely to offer the same dramatic difference as was offered by moving from BASIC to C. This was also for irregular events, so the gains from doing the same job twenty thousand times would be dispersed over several lifetimes of the software.
Yes, if you look around you can find many examples of where trimming milliseconds counts. That internet connection with the treacle-slow ping: how fast is it going to load a web page that refers to numerous files on numerous servers, as opposed to the one that can manage the trip in 8ms? As a cumulative effect, it will add up. Also, the freakishly long ping indicates a potential connection problem, so whilst the link may remain active, it could be retrying over and over in the background, the net result being that “it’s slow”.
Another example? My PVR claims that it is capable of D1 recording (which means it should cope with 720×576@25fps). It doesn’t. It manages 640×480, and I think part of the image decode resizes the vertical to 288px in order to reduce bandwidth, though I have not confirmed this. Perhaps optimised assembler might help eke out a bit more from this oversold and underpowered device? When you are trying to record live video to H.263 in real time, encode the audio samples as AAC at the same time, and run some sort of OS to tie it all together, optimisations of microseconds could make big differences.
Away from tech, differences of less than a second can be the difference between being a winner and being a corpse. Most forms of racing fall into this category. Did you ever watch “Tokyo Drift” (one of the Fast/Furious franchise)? There’s a scene where the cars “drift” through a road crossing with loads of people around (probably supposed to be that infamous crossing in Shibuya). A difference of less than a second would be the difference between drifting the crossing… and wrecking out, having slaughtered a significant proportion of the pedestrians. |
Steve Pampling (1551) 8170 posts |
I was pointing out that everything is relative. I could equally have pointed to an instance of a database query which contained a leftover from a test setup: it queried a non-existent test table and, rather than aborting, timed out.^1
Duplex mis-matches causing collisions and back-off; high traffic and prioritised traffic in a shaped link; traffic policing capping things (great fun – silent drops when the limit is reached); QoS slowing less important traffic where someone deems your ICMP echo to be less important. Yeah, I’ve played with a few variants. Which was the exact point I was making: cumulative effect.
I have seen TV clips of stuff from the genre. Strikes me as simulated game footage using live-action figures. I don’t do games – built a games machine a few decades ago and handed it over to my sisters as soon as the fun bit was done.
^1 The authors claimed it was a network problem; our capture of traffic showed the delay element happening repeatedly and generating the total delay. It was of course a quick fix, since we had already done the diagnosis. |
Martin Avison (27) 1494 posts |
In fact ArmSort can use OS_HeapSort, but for single, ascending, large integer arrays it is slower than the ArmSort code! ArmSort also starts with BASIC arrays… and there can be several of them included, with different data types and sort requirements. Try sorting a set of arrays by Age%(), FirstName$(), Surname$(), Address$() using HeapSort!
It is worth noting that sort times can vary dramatically depending on the original sequence of the data, the number of duplicate keys, and the length of the keys.
I was also amused to discover that the radix sort is the algorithm used by a mechanical card sorter to sort 80-column punched cards… which I first used in 1969!! One big disadvantage is that the time taken increases with the length of the sort key. |
Kuemmel (439) 384 posts |
@Martin: Just for comparison, can you check how fast your ArmSort can sort 1,000,000 random 32-bit integers? To create them I use:
I managed to do the assembler port and will put it online soon, after cleaning up the code. The results on my Panda (1200 MHz) are:
In command-line mode (no desktop yet) on my IGEPv5 (1000 MHz), the speed-up of the assembler version is like “wow”
|
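(The generator line itself isn’t shown above; a plausible equivalent, filling a word block with full-range random integers from BASIC’s RND, would be:)

  max% = 1000000
  DIM data% max%*4
  REM RND with no argument returns a random signed 32-bit integer
  FOR i% = 0 TO max%-1 : data%!(i%*4) = RND : NEXT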
Paul Sprangers (346) 524 posts |
Just for your interest: Basic with HeapSort does it in 1.60 seconds. So, that’s a big win for Assembler – and I’ve only timed the sorting, not the creation of the memory block. Here’s the code (without the printing to screen):
Timings are done on an ARMiniX. |
Martin Avison (27) 1494 posts |
@Kuemmel: For 1,000,000 random integers in a BASIC array, generated (with none deliberately set the same) by:
Times with ArmSort:
Note these times are for a BASIC array, not for an array of integers in storage (eg DIM array% max%*4). |
Kuemmel (439) 384 posts |
Find my code here. It runs three assembler versions (a basic implementation and two different ways of loop unrolling, not much faster), the BASIC version, and also, for comparison, OS_HeapSort. You can change the maximum amount of data at the top of the BASIC code and toggle s% to show the values… which doesn’t make much sense for high numbers, it’s basically for debugging purposes. I’m still trying to find a better way of unrolling the loops and creating a flow of independent instructions, but due to the nature of the algorithm most of the LDR/STRs depend on each other and on the content… let’s see if I can find something. @Martin: can you run it on your machines? Maybe something interesting comes up. I’m still impressed with the five-times speed-up on the Cortex-A15. |
Martin Avison (27) 1494 posts |
@Kuemmel: I downloaded your sort test. Note that the data used by each sort is random (ie different)… for comparisons it is better to use the same fixed seed for each test, so I have added:
Here are the results on my Iyonix.
Note: These sorts cheat! … the numbers after sorting are not the same as before sorting … so their times should be discounted. |
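(The added line isn’t shown above; a negative argument to RND reseeds BASIC’s generator, so a one-liner along these lines before each test makes every routine sort identical data:)

  seed% = RND(-7)   : REM fixed reseed: same sequence every run (the value is arbitrary)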
Jeffrey Lee (213) 6048 posts |
So who’s going to be first to try something like this? :-) http://researcher.watson.ibm.com/researcher/view_person_subpage.php?id=3990 (Note that the multi-core part isn’t a requirement… but you’ll obviously get bonus points if you get a multi-core implementation working on RISC OS ;-)) |
Kuemmel (439) 384 posts |
@Martin: The v2 and v3 are not meant to cheat… there must be something wrong with the code, I’ll check that out. @Jeffrey: I found that before on the net as well. Not really sure if NEON would help, as there’s so much memory-indexed addressing… but it would be worth a try someday. Nowadays you can google for graphics-card-accelerated sort algorithms that are up to 100 times faster on big amounts of data. |
Kuemmel (439) 384 posts |
@Martin: Here (Link) I’ve got a new version to test for you. I didn’t find any error for the 1,000,000-value test in the last one, but I made some loop code more “safe”. I also put in a comparison of all sorted data against OS_HeapSort, to check that my code produces exactly the same results. Maybe you can check again, then I’ll update my original link. I also made sure each routine gets the same random values, as you proposed. If successful, can you also run it on the Beagle and RPi?
P.S.: I sped up the BASIC version a bit, as it doesn’t like long variable names. |
Martin Avison (27) 1494 posts |
@Kuemmel: I will try your revised tests … but may not be until Monday after I have prepared for, attended & recovered from the show. |
Steve Drain (222) 1620 posts |
I realise that any assembler or C version is going to far outstrip anything in BASIC, but it is possible to speed up your routine in many ways besides shortening the variable names. However, such tweaks are not very significant in comparison to re-assessing the algorithm to use the strengths of BASIC, rather than transliterating C code. This runs about twice as fast as your routine, and there is still scope for quite a few easy tweaks before going into full crunching.
|
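(Steve’s listing isn’t reproduced above. Purely as an illustration of the kind of rework he describes, and not his code: one way to play to BASIC’s strengths is to cut the number of interpreted statements per element, for example by sorting on two 16-bit digits held in BASIC arrays instead of four 8-bit digits over raw memory:)

  REM Two 16-bit passes instead of four 8-bit ones: half the scatter loops.
  REM Unsigned treatment again; signed keys need the top bit biased.
  n% = 100000
  DIM d%(n%-1), w%(n%-1), c%(65535)
  FOR i% = 0 TO n%-1 : d%(i%) = RND : NEXT
  FOR p% = 0 TO 1
    s% = p% * 16
    c%() = 0   : REM whole-array assignment: a genuine BASIC strength
    FOR i% = 0 TO n%-1 : c%((d%(i%) >>> s%) AND &FFFF) += 1 : NEXT
    o% = 0
    FOR i% = 0 TO 65535 : t% = c%(i%) : c%(i%) = o% : o% += t% : NEXT
    FOR i% = 0 TO n%-1
      v% = d%(i%)
      k% = (v% >>> s%) AND &FFFF
      w%(c%(k%)) = v%
      c%(k%) += 1
    NEXT
    SWAP d%(), w%()
  NEXT
  REM after the two swaps the sorted data ends up back in d%()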