ARMv8 goes 64-bit !
Kuemmel (439) 384 posts |
…seems they want to move further into desktop, server and Windows (at least that’s my guess)→ |
Trevor Johnson (329) 1645 posts |
|
Eric Rucker (325) 232 posts | |
Kuemmel (439) 384 posts |
Finally : NEON / Adv SIMD with Double Precision Floats :-) |
Jeffrey Lee (213) 6048 posts |
From the contents of that PDF Eric linked, it looks like the AArch64 instruction set is going to drop support for pretty much all of the features that made the ARM instruction set unique:
So the big question is, will these changes help or hinder ARM when it comes to competing with other 64bit designs? Of course, a lot of those changes make sense when you consider that they’re sticking with a 32bit instruction size. 64bit instructions would simply have too much wasted space, and I think I remember reading somewhere that in modern processors the power cost of moving an instruction through the pipeline is often significantly greater than the power cost of executing the instruction, so it’s in your best interests to have instructions that are as small and easy to decode as possible.
Also, it’s safe to say that there’ll be no way that RISC OS will be able to run in 64bit mode unless we rewrite all the assembler portions in a high level language (or in AArch64 assembler, but that would be silly from a maintenance perspective). The CPU can switch between 32bit and 64bit modes at runtime, so compatibility with 32bit apps should be possible.
It would be nice if we could make the 64bit transition a point where we draw a line in the sand and redesign all the APIs to allow the 64bit RISC OS to fully support multithreading. But if using threading requires people to modify their apps to run in 64bit mode, then there’s the age-old question of exactly how many apps people will update, and what will happen to the old apps that aren’t getting updated (force them all to run on a virtualised 32bit version of RISC OS?). And then the question of whether the users would be happy with such a system… |
Kuemmel (439) 384 posts |
…I think I would always put multithreading capability over 64bit; if both are possible, of course, even better (…no idea if something like that could be done in the ‘situation’ RISC OS is in, regarding all the work…). But I think it’s no problem to stick with 32bit for some years…the benefit for desktops isn’t that big and there’s still Cortex A15 to come…I guess ARM really wants to prepare for the far future and servers. No phone or even tablet needs 64bit…the graphics stuff relies more and more on the GPU, as you can see from the GPU evolution from iPhone4 to iPhone4s and the benchmarks. …on the other hand multithreading for multiple cores is kind of crucial I think…is there really no chance to have access to the second core with the current RISC OS ? …and if it’s only like assigning applications to a certain core, even by ‘hand’ ? Isn’t there some small code (even assembler) from ARM that shows a possible way (I didn’t find any useful documentation…) ? |
Jess Hampshire (158) 865 posts |
Is that effectively porting it to a new processor? (i.e. similar work to making it run on a PowerPC or x86?)
Would that deny multithreading to existing hardware?
From comments I’ve read, that is possible; the issue is having the OS share cores evenly. Do you do nasty hacks to make all programs run on any core without modification, or do you run all existing code on one core and allow new code to be written to use any core? |
Terje Slettebø (285) 275 posts |
Which makes it even less attractive to program in ARM assembly code for me, at least. Some of the things that made that enjoyable are the things you mentioned: An orthogonal instruction set where any register may be used in any instruction (including the PC), conditional execution of instructions, optional updating of the PSR, LDM/STM to load/store a bunch of registers in one go, etc. [1] These changes probably wouldn’t matter that much, though, if most of RISC OS hadn’t been written in assembly code… I guess this doesn’t make the issue of rewriting at least parts of RISC OS in a higher-level language any less relevant, if RISC OS is to survive in what may eventually be a 64-bit only ARM world.
[1] Some of the good stuff of the new ISA includes a larger general-purpose register file, larger VFP/SIMD register file, and double-precision SIMD operations. |
Terje Slettebø (285) 275 posts |
More fodder: Apparently, Applied Micro is also working on an ARMv8 server processor. From the article, Paramesh Gopi, President and CEO of Applied Micro: “No wimpy fabric. Zero compromise. Do not start with the baggage. Clean slate. And at 3GHz from the get-go. No wimpy cores. This is not a wimpy computer. This is an ARM on steroids.” Edit: A video from the presentation with slides (requires registration) or without slides. |
Terje Slettebø (285) 275 posts |
I have difficulty understanding the reasons for some of these changes. I can understand some of them, like:
This makes two more general-purpose registers available.
Possibly deemed too expensive for mode switching. But what about:
Why not immediate constants? Why should you have to read from both the instruction cache and data cache to load a number into a register, if you could encode it in the instruction? Does this mean that rotate second operand by some amount is gone, as well? And maybe even the second operand?
What’s up with that? I thought predication was a good thing, especially with deep pipelines, so you avoid pipeline flushes from branch mispredictions?
And that one? Why would it be a good thing to fill the instruction cache with lots of load/store instructions, just to load/store a bunch of registers?
That I could understand, though: Avoiding polluting the data cache. Do all these changes mean that the design criteria that gave rise to them no longer apply? Would anyone like to make a guess at some of these, Acorn veterans or others…? :) |
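For reference on the immediate-constant question: in the classic 32-bit ARM encoding, a data-processing immediate is an 8-bit value rotated right by an even amount. A minimal Python sketch of that rule follows; the function name is made up for illustration and it says nothing about how AArch64 encodes constants.

```python
def encode_a32_immediate(value):
    """Return (rot, imm8) if value fits the classic A32 immediate form
    (an 8-bit value rotated right by 2*rot, rot in 0..15), else None."""
    value &= 0xFFFFFFFF
    for rot in range(16):
        shift = 2 * rot
        # Rotating left by `shift` undoes a rotate-right by `shift`.
        imm8 = ((value << shift) | (value >> (32 - shift))) & 0xFFFFFFFF
        if imm8 < 0x100:
            return rot, imm8
    return None


if __name__ == "__main__":
    for constant in (0xFF, 0xFF000000, 0x101, 0x12345678):
        print(hex(constant), encode_a32_immediate(constant))
```

Constants that return None are the ones that need a literal-pool load (or more than one instruction) on 32-bit ARM today.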
Jeffrey Lee (213) 6048 posts |
I have difficulty understanding the reasons for some of these changes. I can understand some of them, like:
Actually, I suspect that change is more to do with simplifying some of the internal logic.
But what about:
I suspect it’s because they found they didn’t have enough spare bits in the instruction encodings. Also I’m sure there’ll still be a MOV instruction which accepts an immediate constant; it’s just that most/all of the other instructions won’t accept them.
I doubt they’d be crazy enough to remove those features!
* Far fewer conditional instructions
Maybe they’re having issues with predicting the PSR flags. With condition codes on every instruction, it’s as if every instruction in the pipeline requires branch prediction. If you assume that an instruction isn’t going to be executed, and then start executing some following instructions which would have depended on the result of the first, then you’ve got to throw a lot of work away if you suddenly find that the first instruction did need executing. With Applied Micro’s chip using quad-issue out-of-order execution, I think you’ll agree that it’s pretty important for the dependencies between instructions to be as simple and clear as possible in order to keep the pipeline logic as simple as possible and to give the CPU the best chance of scheduling the instructions correctly.
* No LDM/STM
Maybe they just thought it would be too tricky to implement. With 30-ish registers, but only 32 bits in which to encode the instruction, there wouldn’t be enough space to include the full register list in the instruction. LDM/STM containing lots of registers are also bad for interrupt latency (although in ARMv7 they did add an option to enable interruptible LDM/STMs).
Do all these changes mean that the design criteria that gave rise to them no longer apply?
Supposedly they’ve spent a lot of time looking at Android and seeing how that utilises the instruction set. So it wouldn’t surprise me if a few of these decisions are based around analysing the output from compilers (*cough* GCC *cough*) that don’t always produce the best code for the platform. |
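The encoding-space point about LDM/STM can be put as simple arithmetic. On 32-bit ARM the register list is a 16-bit mask, one bit per register, inside a 32-bit instruction. The Python sketch below uses illustrative helper names and takes 31 as the register count standing in for "30-ish".

```python
INSTRUCTION_BITS = 32


def register_list_mask(registers):
    """Build an LDM/STM-style bitmask: bit n set means register n is in the list."""
    mask = 0
    for r in registers:
        mask |= 1 << r
    return mask


def bits_left_after_list(register_count):
    """Bits remaining in a 32-bit instruction once one bit per register is reserved."""
    return INSTRUCTION_BITS - register_count


if __name__ == "__main__":
    # R0, R1 and R14 selected, as in STMFD sp!, {r0, r1, lr}
    print(hex(register_list_mask([0, 1, 14])))
    # 16 registers leave 16 bits for condition, opcode, flags and base register.
    print(bits_left_after_list(16))
    # 31 registers would leave a single bit for everything else.
    print(bits_left_after_list(31))
```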
Terje Slettebø (285) 275 posts |
It could be due to instruction specialisation, yes.
Hm, good point… I guess what was a good idea for scalar execution is not such a good idea anymore for four-way, superscalar out-of-order execution… :) I guess they’ve taken the opportunity, when designing the 64-bit ISA, to start from a clean slate, and remove much of the stuff that was no longer relevant or counter-productive today, just like the original ARM processor removed unnecessary stuff from the current processors of the day.
Yeah, but they could have organised them into two banks of 16 registers each, with a bit selecting the bank.
Yeah, that could also be a factor.
That could be, although hopefully they are also based on sound engineering principles… :) Having got over the initial “shock” of these changes, this may turn out to be a pretty decent processor after all… It’s interesting that they’ve managed to keep the instruction size down to 32 bit, yet still increased the number of general-purpose registers to 32, so at least the instructions take no more space than today. However, I’ve always felt that 64-bit registers are rather wasteful for many of today’s tasks, as most of the numbers we’re dealing with will fit in 32 bit. Let’s say you have a loop from 1 to 10 in a program, and the counter will take 64 bits (8 bytes)… On the other hand, for bit manipulations, like processing long streams of bits, it’s great. |
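To put rough numbers on the loop-counter worry, here is a small Python sketch comparing 32-bit and 64-bit storage for the same values. The table size of 1000 entries is an arbitrary assumption for illustration.

```python
import struct

COUNTER_32 = struct.calcsize("<i")  # bytes in a 32-bit signed integer
COUNTER_64 = struct.calcsize("<q")  # bytes in a 64-bit signed integer


def table_bytes(entries, width):
    """Memory needed to store `entries` values of `width` bytes each."""
    return entries * width


if __name__ == "__main__":
    # A counter running from 1 to 10 needs only a handful of bits.
    print((10).bit_length())
    print(table_bytes(1000, COUNTER_32), table_bytes(1000, COUNTER_64))
```

The difference only shows up once values are written to memory; a counter that lives in a register for the whole loop costs the same either way.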
Trevor Johnson (329) 1645 posts |
I read a post about the following link, to which access is restricted. “This document provides a high-level overview of the ARMv8 instructions sets, being mainly the new A64 instruction set used in AArch64 state but also those new instructions added to the A32 and T32 instruction sets since ARMv7-A for use in AArch32 state. For A64 this document specifies the preferred architectural assembly language notation to represent the new instruction set.”: |
Jeffrey Lee (213) 6048 posts |
It looks like that document isn’t very restricted anymore (or wasn’t very restricted to start with) – it looks like anyone with a standard silver.arm.com account (i.e. anyone who can download the ARMv7 ARM) has access to it. |
Trevor Johnson (329) 1645 posts |
There’s also this Look at the 64-Bit ARMv8 Architecture, for anyone who’s not seen it already. |
Mark (1799) 2 posts |
Just want to point out that a zero register is a classic feature of RISC architectures which avoids the need for explicit compare instructions such as CMP & CMN. |
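One way to picture the zero-register point: a compare is a flag-setting subtract whose result is thrown away, and a register that ignores writes gives you somewhere to throw it. The toy Python model below is a sketch only; the class, the choice of index 31 for the zero register, and the two-flag status are simplifying assumptions.

```python
MASK64 = (1 << 64) - 1
ZR = 31  # index used for the zero register in this toy model


class ToyCpu:
    def __init__(self):
        self.regs = [0] * 31  # registers 0..30
        self.n = False        # negative flag
        self.z = False        # zero flag

    def read(self, r):
        # The zero register always reads as 0.
        return 0 if r == ZR else self.regs[r]

    def write(self, r, value):
        # Writes to the zero register are discarded.
        if r != ZR:
            self.regs[r] = value & MASK64

    def subs(self, rd, rn, rm):
        """Subtract, set the N and Z flags, store the result in rd."""
        result = (self.read(rn) - self.read(rm)) & MASK64
        self.z = result == 0
        self.n = bool(result >> 63)
        self.write(rd, result)

    def cmp(self, rn, rm):
        """Compare is just SUBS with the zero register as destination."""
        self.subs(ZR, rn, rm)


if __name__ == "__main__":
    cpu = ToyCpu()
    cpu.write(0, 5)
    cpu.write(1, 7)
    cpu.cmp(0, 1)
    print(cpu.n, cpu.z, cpu.read(0), cpu.read(1))
```

In this model `cmp` needs no logic of its own, which is the saving being described.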
Rick Murray (539) 13806 posts |
Since we’re resurrecting an old thread…
Since when has PC/R15 ever been a general purpose register? Just try using it as a loop counter and see how far you get. There’s a good way R0-R15 and PC/LR/SP… and there’s the cheap way… R0-R12 and don’t use the others. Then comes the crowd pleaser:
followed by:
Is there an El Reg style WTF icon? Now FIQ code will be that much slower due to the need to save/restore more state, but not only that, this change will make anything able to trash any register and remove the way to dump them to the stack in one go. What, we call STR a dozen times on entry to an abort handler or something? Unless there’s a new “stack all” type instruction, I don’t see the point of getting rid of the load/store multiples… |
Jeffrey Lee (213) 6048 posts |
SP & PC not general purpose registers
“general purpose” from the point of view that they’re currently members of the main register file and can be read and written via 90% of all instructions. Unlike, for example, the PSR, which can only be directly interacted with via a handful of special-purpose instructions.
Remaining as-is, I believe.
That’s OK, because there’s no FIQ mode either :P
But it’s practically the same as things are now.
Remember that one of the main principles behind RISC design is to make sure that instructions execute quickly. “stack all” is not quick, and neither is “unstack all”. Neither are they likely to be easy to schedule from the CPU’s point of view. Breaking them up into smaller, faster operations will improve interrupt latency and give the CPU/compiler more freedom with scheduling – resulting in better code performance. Yes, it’ll bloat the code a bit, but the benefits are (presumably) significant enough for ARM to feel safe in getting rid of it. |
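As a rough illustration of the code-size trade-off: A64 provides load/store-pair instructions (LDP/STP), so saving a dozen registers does not have to mean a dozen single stores. The Python sketch below only counts instructions under that assumption; the dictionary keys are descriptive labels, not mnemonics.

```python
def split_into_pairs(registers):
    """Group a register list into pair-sized chunks (the last chunk may be single)."""
    return [tuple(registers[i:i + 2]) for i in range(0, len(registers), 2)]


def instruction_counts(register_count):
    """Instructions needed to save `register_count` registers, by strategy."""
    return {
        "store_multiple": 1,                       # one LDM/STM-style instruction
        "store_pair": (register_count + 1) // 2,   # two registers per instruction
        "store_single": register_count,            # one register per instruction
    }


if __name__ == "__main__":
    print(split_into_pairs(list(range(12))))
    print(instruction_counts(12))
```

Each of the smaller instructions touches at most two registers, which is what makes them easier to schedule and quicker to interrupt.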
Stephen Leary (372) 272 posts |
Not really a fan of the new architecture. Not very “ARMesque”. I get the whys. Just not a fan. |
Theo Markettos (89) 919 posts |
LDM/STM has been awkward since the days of StrongARM, which had a famous bug: what happens when half your STM is on one side of a page boundary, and the other half is on the next page that isn’t mapped in? You have to abort, but first roll back the half already committed. SA rev J, K and S didn’t do this right, which was why lazy task switching had problems. Doing multiple writes in a single instruction is awkward to design logic for and makes you a hostage to fortune in future designs, and is not really needed since we have data caches now. However, the new arch is quite MIPS-like: it’s not really ARM any more to the assembler programmer. But then assembler programmers are rare these days… |
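The page-boundary case can be sketched in a few lines of Python. This assumes 4 KB pages, 4-byte registers and an ascending, word-aligned store starting at the base address; the function names are illustrative.

```python
PAGE_SIZE = 4096  # assumed 4 KB pages
WORD = 4          # bytes per 32-bit register


def pages_touched(base_address, register_count):
    """Page numbers written by an ascending multi-register store."""
    first = base_address // PAGE_SIZE
    last = (base_address + register_count * WORD - 1) // PAGE_SIZE
    return list(range(first, last + 1))


def words_before_boundary(base_address, register_count):
    """How many registers are written before the next page starts."""
    to_boundary = (PAGE_SIZE - base_address % PAGE_SIZE) // WORD
    return min(register_count, to_boundary)


if __name__ == "__main__":
    base = 0x1FF8  # 8 bytes below a page boundary
    print(pages_touched(base, 4))
    print(words_before_boundary(base, 4))
```

If the second page is not mapped, the words counted by `words_before_boundary` are the ones that have already been committed and must be undone before the abort is taken.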
Mark (1799) 2 posts |
This radical overhaul of the architecture makes it inevitable that we’ll see more ARM-based platforms in the near future which I believe to be a good thing even if we do lose some fans along the way. I suspect Apple is working on unifying iOS and OSX at this very moment, and to be honest, I can see the attractions. For starters, no more x86-based iOS Simulator. |
Jeffrey Lee (213) 6048 posts |
For anyone feeling adventurous (and with a silver.arm.com account): A ‘beta’ release of the ARMv8-A ARM is now available. http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.ddi0487a.b/index.html |
Terje Slettebø (285) 275 posts |
And a nice motherboard to go with it. “The Opteron A1100 features eight ARM Cortex-A57 cores clocked at 2.0 GHz (or higher). AMD has further packed in an integrated memory controller, TrustZone encryption hardware, and floating point and NEON video acceleration hardware. Like a true SoC, the Opteron A1100 supports 8 lanes of PCI-E 3.0, eight SATA III 6Gbps ports, and two 10GbE network connections. Interestingly, the Jaguar cores are used in both Xbox One and Playstation 4. |