No more big 32-bit cores for RISC OS from 2022
David J. Ruck (33) 1636 posts |
i.e. VLIW and predicates (conditionals on steroids) – interesting in theory, but not in practice: look how well that worked for Itanium. The first version was a decade late, subsequent iterations were desperately underwhelming, and conventional ISAs (such as ARM64) ran rings around it. ARM definitely made the right choice with AArch64. |
Terje Slettebø (285) 275 posts |
@DavidS Thanks a lot for your substantial reply, much appreciated. I’m not a microprocessor expert, and you’ve inspired me to look more closely into these issues. I’ll also read your reply carefully and follow up any references. I really, really like the elegance of the original ARM processor, and I understand from your reply that the approach taken with AArch64 was not the only one possible. Working with extASM has been a delight when writing ARM assembly, since you can write such compact and elegant code. Take this simple example, implementing the C sgn() function:

    CMP   R0,#0
    MOVLT R0,#-1   // extASM will transform this to MVNLT R0,#0
    MOVGT R0,#1

This executes in 3 S-cycles no matter what. The alternative with only conditional branches is much more convoluted and unpredictable performance-wise. This also works nicely together with optionally setting the PSR, so you may have code like this:

    // Some computations, setting the PSR to the result
    // Some more computations, executed unconditionally, not setting the PSR
    // Some conditionally executed instructions, based on the earlier result

Regarding the licensing of extASM: I don’t remember the exact terms now, but it’s certainly free for anyone to use and distribute, and I’d welcome its inclusion in various distributions, such as making it freely available on the RISC OS Open site.
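For comparison – and this is just my own sketch from reading the A64 instruction descriptions, so corrections are welcome – the same sgn() can still be done branch-free in AArch64, but the conditionality has to go through dedicated conditional-select instructions rather than being available on every instruction:

    CMP   W0,#0
    CSET  W1,NE          // W1 = (W0 != 0) ? 1 : 0
    CSNEG W0,W1,W1,GE    // W0 = W1 if W0 >= 0, else -W1

Still three instructions, as it happens, but the generality of the old condition field is gone.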
Thank you. :) I’ve been away from the RISC OS community for a long time, but using other systems like Windows has a way of reminding me just how great RISC OS is. Furthermore, the recent Cloverleaf campaign gave me hope for the continued evolution of RISC OS, although I later realised that the posted roadmap was based more on hope than on a plan with resources behind it. Coming back to RISC OS Open, I’m delighted to find a community that’s alive and kicking, and I realise of course that what brought us to where we are today is development done by earlier RISC OS copyright holders (such as Castle), as well as by developers in the RISC OS Open community. Coming back to ARM: even though AArch64 may not be what we’d like it to be, for better or worse it’s the 64-bit ARM architecture, and as such is the one that has and will have wide usage in mobile computing and beyond, with implications for RISC OS. With this thread in mind, I intend to explore the possibility of making a 64-bit version of extASM, i.e. still running on 32-bit RISC OS but taking 64-bit ARM assembly as input and producing 64-bit machine code. This would let us start playing with 64-bit code under RISC OS today, such as on the Raspberry Pi 4, which I’ve recently ordered for just this purpose.
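To give an idea of how modest the first step is: A64 instructions are still fixed 32-bit words, so a 64-bit extASM running under 32-bit RISC OS only has to emit different word values – and in the meantime individual instructions can even be hand-planted with DCD in today’s assemblers. For example (the encoding below is worked out by hand from the ARMv8 reference manual, so please double-check it before relying on it):

    MOVZ X0,#42           // what a 64-bit extASM would take as input...
    DCD  &D2800540        // ...and the 32-bit word it would emit for it

Hand-encoding obviously doesn’t scale beyond experiments, which is exactly why I want the assembler to do it. |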
Rick Murray (539) 13850 posts |
Might I suggest you get in touch with the British “European Research Group”? They believe in unicorns and fairy dust and seem to have a way of making bad ideas sound vaguely positive by completely ignoring all the potential negatives. |
Terje Slettebø (285) 275 posts |
@DavidS
I did a search for “microprocessor parallel pipeline implementation”, but didn’t really find that much, and what I found tended to be academic papers behind a paywall. If you could provide links to online resources or books, I’d appreciate it.
I’d love that.
Very interesting. I’d love to learn more about this.
Then we are on the same page. :)
I’d love more elaboration on this part. While I understand that LDM/STM may increase the worst-case IRQ latency, it seems you’re talking about something else here. To me, it doesn’t make a lot of sense to have to pile up instructions to save and restore registers if a single one would do, but I’d be happy to be educated on this point.
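To make the contrast concrete – the AArch64 side is my own sketch from the architecture documents, so take it with a pinch of salt – saving a handful of registers on function entry looks something like this in each world:

    // 32-bit ARM: one instruction stores the lot
    STMFD SP!,{R4-R11,LR}

    // AArch64: one STP per register pair
    STP   X29,X30,[SP,#-80]!
    STP   X19,X20,[SP,#16]
    STP   X21,X22,[SP,#32]
    STP   X23,X24,[SP,#48]
    STP   X25,X26,[SP,#64]

I gather the argument is that the single LDM/STM is exactly the sort of multi-cycle, interruptible-and-restartable instruction that complicates the core, which is presumably why A64 traded it for load/store pairs – but that is the part I would like to understand better.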
It was fun, wasn’t it? :) And it made implementing jump tables very easy.
That would have been great, and it should significantly reduce the need for a lot of branch prediction and rollback of wasted execution, as you pointed out earlier.
Someone should give them a stern talking to. :)
Groan. And with an architecture that is mostly “ARM” in name only, as you’ve pointed out. There’s never been a shortage of RISC processors, but ARM used to be special.
This is my dream. It’s not yet, but I’m moving there. Paul Vigay once wrote: “Windows is a stress inducer, RISC OS is a stress reducer.” This is how I feel as well. Continuing to use Windows is just not good for my heart, as I keep thinking “This would be so much easier in RISC OS!”. Yes, I know there’s Linux, and while it may be an improvement over Windows in some sense, it’s missing more or less everything I appreciate in RISC OS.
If your site is already up and running, even without these things, could you post a link? If not, please post one once you have put some of these things up.
I’d say that was lucky timing. :) From another posting:
Talk about turning the AArch64 philosophy on conditional execution on its head. For me, it never really made sense that having lots of little branches in the code – and consequently more instructions whose only job is to jump somewhere else – was supposed to be more efficient than intelligently using conditional execution, coupled with other things like optionally setting the PSR and shift-and-operate in the same instruction, which tends to lead to dense code that is, I must say, a delight to work with. However, I don’t yet have the knowledge necessary to compare these different approaches. I don’t know if I’d ever want to write 64-bit ARM assembly code, beyond potentially working on a translation or emulation layer for 32-bit ARM instructions on 64-bit ARM hardware. From yet another posting:
I, for one, sure am interested in such an alternative approach. Whether I’m one of the “correct people” is another matter. :) |
Terje Slettebø (285) 275 posts |
@DavidS Thanks a lot for your replies, and I’m very much appreciating all the work you’re putting into this. Make sure you get time off as well, and enjoy the holidays. :) |
Steffen Huber (91) 1953 posts |
You just need to find people who are happy to pay much more for much less performance, and are also happy to wait a few years – or perhaps decades – before such a solution may be available. Should be easy. |
David Feugey (2125) 2709 posts |
Do you expect all Pi4s to vanish once the Pi5 is out? :) |
Clive Semmens (2335) 3276 posts |
Blimey. I had 12 RISC PCs at one time, bought for a song when my former employer stopped using them. Gave a couple away, used a couple for years, gave some to a museum. The last two, that I used for years, both with StrongARMs, went to the dump once I’d got the Pi up and running. Little did I think they’d ever be worth anything… |
Terje Slettebø (285) 275 posts |
Is the idea to design a 64-bit ISA that retains the structure of the 32-bit ISA, with 64-bit registers and perhaps more of them, or is the idea to create a faster 32-bit processor through superscalar execution? If the former, and if we’re going to retain the 32-bit instruction size (which I think is a good idea) and increase the number of general-purpose registers to 32 (as in AArch64), I guess we’ll need an additional three bits, which we’ll have to find somewhere. I definitely love the idea, and if there is an architecture – as you’ve indicated – where the greater code density and reduced branching from conditional instructions is an advantage rather than a hindrance, then that’s great.
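(Just to spell out where the three bits I mentioned come from – back-of-the-envelope counting for a typical three-register data-processing instruction, not any particular encoding:

    // e.g. ADD Rd,Rn,Rm – three register fields per instruction
    // 16 registers: 3 x 4 bits = 12 bits
    // 32 registers: 3 x 5 bits = 15 bits, i.e. 3 bits more

so every such instruction has to give up three bits somewhere – condition field, shifter operand or immediate range – to pay for the larger register file.) |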
Clive Semmens (2335) 3276 posts |
Except in cryptography, where 64 bits is still far too small… The original ARM architecture was actually pretty good at extremely large integers. I wrote routines to handle arbitrarily large integers quite efficiently. One day I might update them to more recent versions of the architecture… only I doubt I’ll bother…
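(For anyone curious, the core of those routines is nothing more exotic than letting the carry flag do the work. From memory – so treat it as a sketch – a 128-bit add, with the least significant words in R0 and R4, is just:

    ADDS R0,R0,R4   // lowest words, set the carry
    ADCS R1,R1,R5   // add with carry, propagate it
    ADCS R2,R2,R6
    ADC  R3,R3,R7   // top words

and longer integers simply mean more ADCS in the middle.) |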
Clive Semmens (2335) 3276 posts |
No, it’s absolutely true of some kinds of encryption, and as you say, irrelevant in others. If your encryption involves (as extremely strong encryption does) modular arithmetic modulo products of very large prime numbers, then you can be handling very large integers. The work you’ve been doing is evidently in one particular branch of the subject; I’m retired, but one of my former hats was as a lecturer in Mathematics, and this is one of my areas of expertise. I got my job at ARM partly on the strength of my knowledge of ARM assembly language, mostly gained working in precisely this area. |
Terje Slettebø (285) 275 posts |
Interesting idea, and I agree with you: for the most part, I’d think the top 32 bits of 64-bit registers would be wasted, and they take twice as much space to store. Do you know of anybody else, i.e. outside your research institution, thinking along the same lines? It seems to me that “the world” has more or less accepted by default that we “need” 64 bits, thinking primarily of the increased address range and not considering how things like swappable banks could work. We might use the silicon saved this way for other purposes, such as more execution units and parallelism. |
GavinWraith (26) 1563 posts |
Or, products of lots of distinct medium-size primes, thanks to the Chinese Remainder Theorem. |
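(The point being that arithmetic modulo a big composite splits into independent, word-sized arithmetic modulo each prime factor, with the full result reconstructed at the end – e.g. knowing x ≡ 2 (mod 3) and x ≡ 3 (mod 5) pins x down to x ≡ 8 (mod 15).)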
David Feugey (2125) 2709 posts |
Why not? But anyway, the first problem with the ARM 32-bit ISA is “how far can I go without breaking the law”. The ARMv2 ISA is really too old, but some suggest that the ARMv4 ISA would be in the public domain now (or soon). That would need to be checked if you want a good and stable base for a new, more modern Amber core. An open source classic ARM core (possibly with optional 26-bit support) would be great for many people and could have a lot of uses (especially in the embedded market). |
Clive Semmens (2335) 3276 posts |
Indeed, and there are more examples in number theory where handling extremely large integers is fun, if not necessarily actually useful (yet…) That said, having registers much longer than 64 bits seems pretty silly, when it’s not hard to write software to handle extremely large integers in 32-bit registers & RAM, never mind 64-bit or more. Where the optimum place to draw the line is, I have no idea. But I’m inclined to the feeling that if one has any long registers at all, there’s absolutely no reason why all the registers need to be very long. As others have said, there are better things to do with the extra silicon… |
Clive Semmens (2335) 3276 posts |
I’m very sure it shouldn’t be, although I’d add a very speculative “yet.”
Exactly. A desire to address more than 4GB of memory is the main driver to >32bits, I’m pretty sure, having been privy to discussions among the engineers at ARM up to late 2007 – and then an unwillingness to contemplate a register size that wasn’t a power of two. Given that the instruction size remains 32 bits, I’ve never quite worked out why they couldn’t have most of the registers 32-bit, and maybe the first four or eight registers longer – possibly even implementation defined, starting at 40-bit or 48-bit and increasing by a byte or two as and when anyone feels the need. But that’s all water under the bridge now – unless someone feels like branching the architecture… gahh! |
Clive Semmens (2335) 3276 posts |
Of course you can page memory, ad infinitum should you be so inclined, no reason to stop at 256TB; that’s been done since CPUs had 8-bit registers. But it makes for more complicated memory systems. I can easily understand the desire to go beyond 32-bit registers, and I remember engineers at ARM discussing this 13 years ago and more. I don’t think they were being driven by marketing at that point. It’s the size of the step up that’s the big question – and I remember similar discussions about whether the step up from 8-bit to 32-bit was too big a step when most other companies had gone to 16-bit, or whether it made any sort of sense to go to 24-bit. The first computer I really got to grips with had 12-bit words…that was 1968. |
David J. Ruck (33) 1636 posts |
Right, can we get back to discussing RISC OS moving to 64-bit processors, rather than bringing back 32-bit or even 26-bit processors, and other flat earth nonsense. |
Rick Murray (539) 13850 posts |
I believe the issue is two related but different things. Firstly, in order to be able to manipulate pages of memory (they’re swapped around all the time, either in and out of “nowhere” like RISC OS does on Wimp polls; or to/from disc on systems with virtual memory), something needs to be able to see the entirety of the addressing space. Of course, it’s a better situation than the problems of having 128MiB fitted into a RiscPC with a processor stuck in a mode that can only directly address 64MiB (that’s why the limitations on application size, RMA size, etc on those machines).
There’s a difference between “flat address space” (i.e. no MMU) and “being able to see everything”. The second issue is the continuing evolution of the instruction set. While the original ARM set was aimed at programmers and had a beautiful flexibility practically unmatched elsewhere, the number of people writing code in assembler is negligible these days. It’s all compilers and such, which have decades of maturity in their code generation. It’s no longer seen as necessary to have such flexible things as LDM/STM, or to have most instructions be conditionally executable. Removing these things in the 64-bit world reduces the complication within the processor. Less complicated means fewer transistors. This can be translated into faster/cooler cores and better opportunities for pipelining and prediction. Don’t get me wrong here, I like the original instruction set. But it’s a different world now. |
Rick Murray (539) 13850 posts |
Just out of interest, has anybody benchmarked running a 32 bit Linux on a Pi3/Pi4, and then running a 64 bit build on the exact same setup? |
Kuemmel (439) 384 posts |
@Rick: As usual, it of course depends which benchmark matters to you… but there are quite a few indications that it’s worth going 64-bit => Link |
Rick Murray (539) 13850 posts |
It’s the same discussion as before. Do we…
While you’re wondering which option is the best and which is the most realistic, remember that we still, at this point in time, only utilise one of four cores (on multicore devices), we don’t have IPv6, nor onboard WiFi, nor anything that resembles a Bluetooth stack, nor a disc format that understands partitions, nor a way of loading arbitrary user code that doesn’t involve handing it the keys to the kingdom and saying “play nice”, nor anything that resembles acceleration for graphics/video operations… That last bit, that’s probably the real question right there. |
Rick Murray (539) 13850 posts |
Cheers – that’s exactly what I was looking for. ;-) |
David J. Ruck (33) 1636 posts |
Running what, though? I’ve benchmarked every variant of Pi in 32 bit, and those that can do it in 64 bit, with a couple of programs in g++, clang++, C#, pypy, pypy3, Python2, Python3 and Perl. It all comes down to what you are doing; tight CPU-bound arithmetic comes out a lot quicker, sometimes 50% to 100% faster, on 64 bit. Larger code which puts more pressure on the cache can be 20% slower. That holds across all the languages, except for pypy and pypy3, which don’t yet have an ARM 64-bit JIT, so are 20x-30x slower. What difference would it make for a 64-bit vs a 32-bit RISC OS? Well, a pure C 32-bit RISC OS will probably be a bit slower than the hand-crafted 32-bit assembler version, and compiled as 64-bit probably slower again on the same processor. But the difference is that faster 64-bit ARMs are coming along all the time, and we’ve already seen the fastest ever 32-bit ARM.
It’s the same as the argument for not embarking on a space program until you’ve solved all poverty. |
Kuemmel (439) 384 posts |
@DavidS: Click on the word “Link” !??? |