Apple's M1
Steve Pampling (1551) 8172 posts |
My understanding is that there is a dedicated set of logic around the core doing all the translation |
Steve Pampling (1551) 8172 posts |
They are known generically as “management”, the higher in the structure, the lower the user rating. |
Paolo Fabio Zaino (28) 1882 posts |
@ Jon
Totally agreed on the second part of your comment: the number of instructions is just an academic discussion these days and means nothing for modern RISC/CISC architectures, which have “blurred their boundaries” as well. On the first part of your comment, I understand that the Internet has done the usual internet-ism and decided that the microcode backend of a CPU must be RISC. But the reality is quite different, and the details reveal it. For instance, microcode is not an architecture per se, since it doesn’t describe an abstract model that can be implemented; it is THE implementation of a microarchitecture, so it’s not standard and changes with each new CPU family. It is true, however, that microcode typically works as a load/store implementation, and this is historically a characteristic of RISC architectures, but that doesn’t make microcode RISC. For example, microcode can be clustered, and it can itself generate more microcode, and that last trick is quite CISCy if you want. This is also why the internal architecture of an x86 has RAM and ROM for the microcode: so it can have its own state. Some implementations I have studied in the past even presented a strongly stack-machine-based architecture, so each microcode operation (or most of them) constantly updated the stack. I could keep going with details, but I don’t want to get into my usual ultra-long posts, so I’ll limit myself to what helps to understand that an x86 backend is not necessarily a RISC architecture, although it does have elements taken from the RISC philosophy. |
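The stack-machine style of micro-engine mentioned above can be sketched in a few lines. This is purely illustrative (the opcode names and semantics are invented, not Intel’s actual microcode): it shows how a sequence of micro-ops that constantly update a stack can implement one CISC-level operation such as “add a constant to a memory location”.

```python
# Hypothetical sketch (NOT real microcode): a tiny stack-machine micro-engine
# where every micro-op reads/writes an internal stack, illustrating why such
# a backend has its own state and isn't a plain register-to-register RISC.

def run_stack_machine(program, memory):
    """Execute a list of (op, *args) micro-ops against a stack and memory."""
    stack = []
    for op, *args in program:
        if op == "push_mem":       # load a memory cell onto the stack
            stack.append(memory[args[0]])
        elif op == "push_const":   # push an immediate value
            stack.append(args[0])
        elif op == "add":          # consume two stack slots, push the sum
            b, a = stack.pop(), stack.pop()
            stack.append(a + b)
        elif op == "pop_mem":      # store the top of stack back to memory
            memory[args[0]] = stack.pop()
    return memory

# An x86-level "ADD [0x10], 5" expressed as four stack-machine micro-ops:
mem = {0x10: 7}
run_stack_machine([("push_mem", 0x10), ("push_const", 5),
                   ("add",), ("pop_mem", 0x10)], mem)
print(mem[0x10])  # 12
```

Note how the micro-ops themselves carry no register names at all; the implicit stack is the state, which is quite far from the “orthogonal registers” picture usually painted for RISC.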
Paolo Fabio Zaino (28) 1882 posts |
@ DavidS
That depends on how you measure performance. ARM has a much better performance-per-watt ratio, but it doesn’t yet have implementations that push the silicon as far as Intel/AMD have done. So, technically, the fastest architecture available right now in raw performance (ignoring power consumption) is AMD’s. If we do count power consumption, then at the same power budget ARM is always faster than Intel or AMD. So it’s a tricky definition that requires care in the choice of words, and Intel right now is messing quite a lot with wording in a desperate attempt to stay relevant after AMD chewed them up on x86 performance and ARM chewed them up on power consumption on mobile and now on laptops. But Intel is just reaping what they sowed during the 90s/00s/10s; there is plenty of proof that they were trying to control the market with lawyers and money instead of technological innovation. So now it’s time for them to learn the lesson we all had to learn: the tech market is about tech and innovation, not bollocks1 ;) 1 I hope I can use this term; if not please let me know and I’ll edit, thanks :) |
Steve Pampling (1551) 8172 posts |
tech market is about tech and innovation, not bollocks ;) I’ve been known to use it during meetings :) |
Rick Murray (539) 13850 posts |
Me too, once or twice. I even mumbled that one of my cow orkers was “a right proper spanner”. If anybody translated that into French… an appropriate key? (une clé appropriée, according to Google)
Ah, but how does one define “RISC”? My preferred definition is “a load/store architecture with a high degree of orthogonality”. Others like to add more constraints, such as a large number of registers, but then IA64 has more registers than traditional ARM and might have more than AArch64. Keep it simple: orthogonal load/store, and accept that some boundaries are blurred (the 6502 was CISC with a number of RISC-like features, for instance). As for the core of the x86, it’s probably a jealously guarded secret, but if what we know is a yellow aquatic rubber bird that quacks…
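The “orthogonal load/store” definition above can be made concrete with a toy interpreter. This is a hypothetical mini-ISA (not ARM): the point is that LDR/STR are the only instructions touching memory, while every ALU operation is register-to-register, with any register usable in any operand position.

```python
# Toy illustration of "orthogonal load/store" (hypothetical mini-ISA, not ARM):
# only LDR/STR touch memory; ALU ops work register-to-register only, and any
# register may appear in any operand slot -- that's the orthogonality.

def execute(program, regs, memory):
    for op, *a in program:
        if op == "LDR":            # LDR rd, [addr] -- the only memory read
            regs[a[0]] = memory[a[1]]
        elif op == "STR":          # STR rs, [addr] -- the only memory write
            memory[a[1]] = regs[a[0]]
        elif op == "ADD":          # ADD rd, rn, rm -- registers only
            regs[a[0]] = regs[a[1]] + regs[a[2]]
        else:
            raise ValueError(f"ALU ops never take memory operands: {op}")
    return regs, memory

regs = {f"r{i}": 0 for i in range(4)}
mem = {0: 2, 4: 3}
# mem[8] = mem[0] + mem[4], done the load/store way:
execute([("LDR", "r0", 0), ("LDR", "r1", 4),
         ("ADD", "r2", "r0", "r1"), ("STR", "r2", 8)], regs, mem)
print(mem[8])  # 5
```

A CISC machine could do the same in one `ADD [8], [0]`-style instruction with memory operands; the RISC version trades instruction count for simple, uniform instructions.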
Why would this make it non RISC? It’s a processor, it’s running a program, it needs some working memory.
You need to step out into the real world a little more often David. Everybody I know that isn’t here or on TheRegister fits the description of a computer user who knows nothing about programming. For all of them, whether it is a laptop or a mobile phone, it’s a tool. A way to watch the Winx Saga or a way to waste twenty hours creating the perfect PowerPoint presentation that is never used… It’s a camera, it’s a browser, oh turn it up I like Ayreon, die alien scum!, and our projected sales forecast for this quarter is….. I wouldn’t insult them by referring to them as “low end”. They, the masses, might lack many clues when it comes to nerdy topics, but it’s selling machines to people like them that is driving the market. It’s also why we have UIs instead of command lines. As each year passes, computers are getting more and more suitable for normal people. |
Paolo Fabio Zaino (28) 1882 posts |
@ Rick
Good point. So technically the essence of a RISC architecture is really in the “simple and highly optimised instructions”; that’s the core and most basic definition if someone wants to be super-short. It then comes almost “naturally” that if you want simple and highly optimised instructions, you need enough registers to make your ISA useful in most situations without losing the “highly optimised” part, hence you’ll need a “sufficient” number of registers. So there has never been a default number, but having a load/store architecture kinda implies that you’ll need more than the handful of user registers offered by the 6502 :) The “many registers” idea is more of a common internet-ism, if you want, that was relevant in the past because a CISC architecture did not “need” as many registers as a RISC architecture did in order to be useful in most situations without losing the “highly optimised” part.
Yeah, this one is another interesting definition. The truth is, it’s hard for a (for example) 32-bit ISA to also carry a 32-bit immediate value in a single instruction… hence “execute” and not “fetch”. But that “execute” implies pipelining and superscalarity, so really those two should be called the “characteristics”, while “execute in a single clock cycle” should be a requirement (and so not directly a definition of RISC). The reason I mention this is not that I’m trying to be overly precise; it actually matters in a RISC architecture, and here is why: complex instructions are hard to implement in a superscalar, pipelined fashion, because they require multiple clock cycles, etc. So someone picky may ask: OK, fine, but then the microcode is RISC, right? Well, no: clusterisation of microcode goes against the RISC principles above, and embraces more the CISCy side of the story.
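The point about a 32-bit ISA not being able to carry an arbitrary 32-bit immediate is easy to check for classic 32-bit ARM, where a data-processing immediate is encoded as an 8-bit value rotated right by an even amount (the remaining instruction bits hold the opcode and registers). A small checker, written from that encoding rule:

```python
# Classic 32-bit ARM data-processing immediates: an 8-bit value rotated
# right by an even amount (0, 2, ..., 30). Anything else needs a literal
# pool load (or MOVW/MOVT on later cores).

def arm_immediate_encodable(value):
    """Return True if `value` fits the ARM data-processing immediate form."""
    value &= 0xFFFFFFFF
    for rot in range(0, 32, 2):
        # Rotating left by `rot` undoes a right-rotation by `rot`; if the
        # result fits in 8 bits, the constant is encodable.
        rotated = ((value << rot) | (value >> (32 - rot))) & 0xFFFFFFFF
        if rotated < 256:
            return True
    return False

print(arm_immediate_encodable(0xFF))        # True  (fits in 8 bits directly)
print(arm_immediate_encodable(0xFF000000))  # True  (0xFF rotated into place)
print(arm_immediate_encodable(0x12345678))  # False (too many significant bits)
```

So `MOV r0, #0xFF000000` assembles fine, while `MOV r0, #0x12345678` does not; the assembler or compiler has to fetch such constants some other way, which is exactly the fetch/execute distinction being made above.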
The problem is that it can get quite a lot more complicated. Having its own state means that the microcode is not just a “partitioning” of the original x86 code, and some of the implementations, as I’ve mentioned, are more like a stack machine than a pure register machine, so one could argue a different logic altogether. Let me give you an example. A microinstruction cluster could be composed of:

0........64........128........192........224
|  Op1   |  Op2    |  Op3     | Seq Word |

Now, if it were a true RISC ISA, the operations wouldn’t need to be clustered, and they surely wouldn’t need a sequence word. BTW, in Intel microcode some refer to the cluster above as a “triad”, just as a note for people who go googling after this comment. Also, as I mentioned, microcode is not an ISA; it’s a microarchitecture used to implement an ISA, so it has different purposes (mostly maintainability and bug fixing, and also performance in a superscalar implementation of an ISA that is hard to make superscalar). Because microcode uses a sequence word, it obviously requires some form of microcode sequencer, which again adds complexity compared to the original meaning of RISC, especially on the matter of instruction decoding. Not to mention that to decode an x86 instruction into microcode Intel uses multiple decoders, including a vector decoder, which also implies the presence of an instruction buffer. And to generate the triad above you’ll also need an operation packer… Can you see where I am going? In simple terms, for non-tech people: microcode is still complex compared to the RISC philosophy, BUT at the same time simpler than the CISC philosophy. So if you really want to catalogue it as a form of ISA (again, it is not, but let’s assume so by hypothesis), then the closest word I’d use is “Hybrid”. Makes sense? |
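The triad layout described above can be modelled with a pack/unpack pair. The field widths here (three 64-bit micro-ops plus a 32-bit sequence word in a 224-bit cluster) follow the sketch in the post, but they are illustrative, not Intel’s actual format:

```python
# Hypothetical packing of the "triad" layout sketched above: three 64-bit
# micro-ops plus a 32-bit sequence word in one 224-bit cluster. Field widths
# and names follow the diagram in the post, not any real Intel format.

OP_BITS = 64   # width of each micro-op slot
SEQ_BITS = 32  # width of the sequence word

def pack_triad(op1, op2, op3, seq_word):
    """Pack three micro-ops and a sequence word into one 224-bit integer."""
    cluster = op1
    cluster |= op2 << OP_BITS
    cluster |= op3 << (2 * OP_BITS)
    cluster |= seq_word << (3 * OP_BITS)   # the part a "true RISC" wouldn't need
    return cluster

def unpack_triad(cluster):
    """Recover the three micro-ops and the sequence word from a cluster."""
    mask = (1 << OP_BITS) - 1
    ops = [(cluster >> (i * OP_BITS)) & mask for i in range(3)]
    seq = cluster >> (3 * OP_BITS)
    return ops, seq

ops, seq = unpack_triad(pack_triad(0xAAAA, 0xBBBB, 0xCCCC, 0x7))
print([hex(o) for o in ops], seq)  # ['0xaaaa', '0xbbbb', '0xcccc'] 7
```

The operation packer and sequencer mentioned in the post would be the hardware counterparts of `pack_triad` and of whatever walks the `seq` field, which is exactly the extra machinery a plain RISC decoder wouldn’t have.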
Steffen Huber (91) 1953 posts |
I would bet my money on POWER9 right now and POWER10 when it’s available. For classic CPUs, of course. Once you enter specific fields, it gets a lot more complicated (Nvidia Tesla et al). |
Glenn Moeller-Holst (8768) 16 posts |
More ARM laptop news: Feb 2021, Apple ‘M1X’ chip specification prediction appears on benchmark site. The rest of the usual PC crowd are also moving the ARM way?: 26 February 2021, Watch out Apple M1! Samsung and AMD may have a killer Windows on ARM solution |
Clive Semmens (2335) 3276 posts |
But an ARM so alien as to be almost irrelevant to us 8~( |
Chris Gransden (337) 1207 posts |
Here are some mainly floating point benchmarks comparing an M1 to an RPi400.
|
Chris Gransden (337) 1207 posts |
Dhrystone is single core integer. The rest are single core floating point.
There isn’t a native GCC11 port for RISC OS yet. GCC11-devel was slightly slower compared to clang. |
Clive Semmens (2335) 3276 posts |
Not intentionally – but perhaps I’ve not picked up correctly what CPUs you were talking about. If they’re pre-ARMv8 beasts then I take back what I wrote. |
Clive Semmens (2335) 3276 posts |
I’m happy about the idea of designing a CPU with a useful superset of the AARCH32 architecture (maybe call it AARCH32++); that’s right up my street. What wouldn’t be would be marketing it and raising the money to get it implemented. |
Rick Murray (539) 13850 posts |
If you’re referring to the Cupertino Chaos, then “little” isn’t quite the right word.
This. So I’d say that apart from budget and low-end things, most stuff is using the alien ARM these days. |
Rick Murray (539) 13850 posts |
The problem here is not so much the CPU design, but all the tweaks needed to ramp up the speed so it can match an average Pi3 or Pi4. That also means it would need to be an actual CPU, not some sort of FPGA, in order to get that kind of speed. Might be cheaper to try asking Broadcom if they can bake a few more of their Pi cores? They’d probably do it, too, if you could secure a demand for a large enough number (but it would have to be a big number). Perhaps a cheaper investment would be, when the next Pi is announced, if it isn’t 32-bit compatible, to buy a number of Pi4 boards. Then RISC OS won’t exactly be moving forward, but it can carry on going on the available boards, for as long as they last… |
Stuart Swales (1481) 351 posts |
…but everything else!!! |
Clive Semmens (2335) 3276 posts |
Agreed entirely. I was responding to DS’s comment; when I say “I’m happy about it” what I mean is that it’s something I’d know how to approach, not that I think it’s a sensible undertaking. |
Clive Semmens (2335) 3276 posts |
Then RISC OS won’t exactly be moving forward, but it can carry on going on the available boards, for as long as they last I really don’t find any scenario longer term than that credible at all, sadly. The market surely isn’t big enough. |
Steffen Huber (91) 1953 posts |
Please provide some links. I know of exactly zero “high end hobby designs” in the CPU department. |
Paolo Fabio Zaino (28) 1882 posts |
Hummm, sounds like we are at the “let me design a better ARM32 for you” discussion again… Here: this is a fully functional ARMv2a + MEMC + everything else for an Archie; whoever is interested in re-designing ARM32, please give us some support in improving it more and more. @ DavidS, I hope to see your progress being pushed there. Sorry, I’m not trying to be aggressive, just please, please, please remember that there is licensing in place: you just can’t go and redesign an ARMv6 or v7 without paying ARM Holdings and without having a full licence like Apple does. You can only do it on very old ARM designs that are fully abandoned, hence the link above. On top of that you need to know what you’re doing. I am not saying you don’t; I have never worked with you, so I don’t know whether you can or cannot. But putting some very raw ideas on a forum is one thing, and designing (or re-designing) a CPU architecture and testing it until it works is quite another: knowing the tools, knowing the pitfalls, knowing the tricks, knowing all the theory and all the details involved and, most of all, having the time to try/build/test/redo until it’s done. I do have some doubts, because I have read some of the (IMHO unacceptable) explanations you gave on 64-bit vs 32-bit in other threads, so I am not going to join another discussion on re-designing ARM32, but I sincerely wish you good luck, because by trying you’ll learn a lot, and that is always a good thing :) On the FPGA vs ASIC comments… guys, the CPU design process is and always will be done on FPGA (why? Convenience!). If, by some miracle, at the end of it you manage to convince a large enough number of buyers to buy your CPU, then, and only then, can you think of using your Verilog/VHDL to produce an ASIC. That is how the market works, unless you have a magic wand or something like that… P.S. Waiting for the comments from Druck; I am pretty sure he’ll be more direct than me… |
Steffen Huber (91) 1953 posts |
I don’t think you have ever provided meaningful links whenever I asked for evidence of some of your more interesting claims. Maybe in one of the many posts you decided to delete after a while?
Unless you come up with some evidence that what you propose has even a slight chance of happening, I remain sceptical. “Prior art” would be a good way to provide such evidence. You propose something that I have never heard of even being seriously attempted. To make any sense, your project needs to design a CPU including a memory controller, an FPU and a GPU. You need to produce it in quantities large enough to drive the price down to reasonable levels (i.e. matching existing SoCs), then put it onto a board (i.e. a proper design including prototypes) and get that produced, again in quantities large enough to drive the price down. And that quantity needs to come from the RISC OS market, because no one else has any interest in such a special AArch32 solution. To do all this, you need to assemble a team of highly qualified engineers working for free, because if you start to price in the working hours, your project is even less realistic. Then, when you are ready in, say, 10 years, you will have – if your team has performed well – produced something that is perhaps the speed of an RPi 1, but costs at least 10 times the money. I.e. something worse than the Pi that was released nearly ten years ago. In the meantime, we will likely have much, much faster SBCs with whatever CPU (maybe AArch64, maybe x64, maybe RISC-V, who knows), which will likely be faster when emulating an AArch32 CPU than your project running the code natively. In contrast, the alternative you so happily dismiss – a complete rewrite of RISC OS in C, along with an integrated AArch32 emulation solution – sounds like an easy job compared to your proposal. And it would produce something a lot more useful. And you would just need a bunch of software developers, who are available in vast numbers compared to CPU design experts.
Please provide at least a link to the post where you provided that bunch of links. You are making some very extraordinary claims in some of your posts. However, the evidence that any of these claims even approach workable ideas is very thin on the ground. The number of projects you have mentioned that you are working on is approaching triple digits, and the results are very near zero. The RISC OS world has seen such projects before. The Omega was one of those, albeit at a much reduced complexity compared to what you propose – it “just” recreated IOMD and VIDC in FPGAs. The result is well known – it was late, it was expensive, it was a lot less powerful than promised, and various “future features” never surfaced. As you might remember, I have proposed a testbed for a simple CPU design before, where you could show in the real world that some of your CPU-related ideas actually have merit. We can talk all day long about the theory – nothing beats the real world for making a valid point. |
Paolo Fabio Zaino (28) 1882 posts |
@ Steffen Huber
I definitely love the POWER10 design and yes, I can’t wait to see it available :) On the POWER9, I’m not so sure: it’s still 14nm FinFET, so AMD’s 7nm has quite an advantage there. Also, AMD Zen 3 has finally addressed the cache latency and unified the cache, so each core has basically the same ultra-low latency when accessing data in cache, and that is a big win for AMD. POWER9 has either SMT4 or SMT8 while AMD has SMT2, so for the same number of cores AMD Zen 3 may support fewer threads than POWER9, but it does have faster single-core performance, and that can counterbalance the lack of hardware threads… AMD Zen 3 has a pipeline with at least 19 stages while POWER9 has only 12… Anyway, the real deal will be when POWER10 comes out :))))) |
Steffen Huber (91) 1953 posts |
I am going by “real world performance” as measured with a quite complex component written in plain C at work. An AIX POWER9 machine outperformed the competition (e.g. an Intel Xeon Cascade Lake at nearly 4 GHz, IIRC) by quite a margin. That was basically testing single-core performance, but overall not only CPU but also I/O and the memory subsystem (because the customer was interested in a system-level comparison, of course). It is always difficult to attribute performance differences to the different subsystems, of course. But on price/performance, x64/Linux still won by a large margin. IBM AIX pricing is really quite eccentric. |
Paolo Fabio Zaino (28) 1882 posts |
@ DavidS
So why does this “better ARM32” keep popping up in threads where you are commenting? There is no way you can make a better ARM32; the limitation is not just technical, it’s also patent based. If someone is feeling inspired by the Apollo Accelerator’s work on the 68080, that was work carried on for more than 10 years, with big failures at the beginning, and it is only possible now because Motorola doesn’t care anymore about the 68K family. On top of all that, the Vampire chip is still FPGA-only and had to compete with, at most, a 68060 at 75/100 MHz: single core, non-superscalar… Compare that with the ARMv2a link I provided, clocked at 33/45 MHz… The 68080 now has a superscalar architecture and, when overclocked, it can perform like a StrongARM at 200 MHz… XD. So by the time it reaches the performance of an RPi, we’ll all be dead, buried and forgotten… But for the Amiga user that’s fine, because they have never experienced anything beyond 100 MHz, while we can buy an RPi400 today and have it clocked at 2.4 GHz, with 4 cores and 4 GB RAM (the next model will also have 8 GB RAM)… So again, DavidS, while I encourage you to pursue your studies and learn more, because that is ALWAYS good, please at least try to use wording that doesn’t present your theories as deliverable products (because they aren’t deliverable in a usable form), unless you’re actively working on them and have tests that can prove them. That’s all I am asking, sir; I hope it’s not too much to ask… |