ARM NEON in ObJAsm
Pages: 1 2
Steve Drain (222) 1620 posts |
Three years ago there was a very extended topic in the Raspberry Pi forums about the suitability of various un-enhanced 1 languages to numeric tasks. This centred around finding the first fibonacci number with 1 million decimal digits. DavidS was there on the sidelines with his usual promotion of Assembler and BASIC, but the other contributors were fully engaged and there were some really interesting aspects discussed, so that I learned a great deal. To be part of it, I set about finding a solution in ARM BASIC, using array operations with some modifications of my own invention. I was able to get my solution to finish on my mini.m in about a day and a half! Towards the end, the experts, using various compiled languages, were shaving fractions off substantially less than a second on some quite fast machines. I then found the Numbers module that Gavin was maintaining. Using that, still in BASIC, I was able to get a result in only a few minutes. While working out how to use Numbers I wrote a StrongHelp manual for it, so if anyone is interested I could post a link. There are a couple of other things I learnt. First is that the algorithms used in Numbers have dated and there are significantly faster ones around now. The second is that there are C libraries that incorporate all the speed tricks for the fibonacci task and can produce a result in a single line. ;-) What is needed is a job like the RexEx module, which takes a C library and incorporates it into a RISC OS module. That is probably Zahl, of course, but I have not looked at it yet. 1 Un-enhanced was taken to mean no libraries or extensions. Edit: it was fibonacci number not factorial – oops. |
GavinWraith (26) 1563 posts |
I wrote an article The Triumph of Lazy Tables in PiAddict Vol 1. No 6.August 2013, comparing the performance of different algorithms for calculating the fibonacci series. Alas, that magazine is defunct, I believe. The contradiction between time and space (i.e. storage) is an interesting theme in computer science. Lua makes this explicit. Functions ( |
Steve Pampling (1551) 8170 posts |
That set me thinking back to school days. Our schoolboy minds quickly realised we could use a tetrahedron of values in the same way for (x+y+z)n |
GavinWraith (26) 1563 posts |
I had a long email correspondence with Richard Hallas, when he was editor of Foundation Risc User, about how people differ in their imaginations, particularly how they see numbers.
Good for your schoolboy minds. I think you were unusually quick. |
Steve Drain (222) 1620 posts |
Both the discussion and my reading around it introduced me to several of those. My own attempts used only slightly enhanced Karatsuba method.
Now there’s a concept that was new to me and cropped up a lot in the fibonacci topic. Nothing like that for BASIC, although I have learned that the use of precalculated data structures can save a lot of code and time. |
Steve Drain (222) 1620 posts |
I now realise that I did look at Zahl back then, but it is not the module I hoped for and not possible to integrate with BASIC, is it? |
Steve Drain (222) 1620 posts |
That reminds me of, first Mathematician’s Delight, but especially of the later Prelude to Mathematics, that introduced me to the idea of looking for such patterns. |
GavinWraith (26) 1563 posts |
Both excellent books. Alas, the IMath library, which Zahl uses, is in C and would be hard to integrate with BASIC. I felt a bit guilty that this aspect of Craig-Wood’s BigNumber module I was unable to keep alive. I think he went on to look at some other arithmetics that are possibly interesting for RISC OS. One was to choose the biggest prime, call it p (surprise?), smaller than 2^64. Its residues can be represented within 8 bytes. Modular arithmetic modulo p could be useful, and there are some clever ARM assembler tricks to make it fast. Another useful thing to consider is the Chinese Remainder Theorem. That tells you that modular arithmetic modulo a product of distinct primes can be done by working modulo each prime separately. Ideally you delegate a separate core for each prime. Something for the future, maybe? |
Pages: 1 2