Apple's M1
Steve Pampling (1551) 8172 posts |
My understanding is that there is a dedicated set of logic around the core doing all the translation |
Steve Pampling (1551) 8172 posts |
They are known generically as “management”, the higher in the structure, the lower the user rating. |
Paolo Fabio Zaino (28) 1882 posts |
@ Jon
Totally agreed on the second part of your comment: the number of instructions is just an academic discussion these days and means nothing for modern RISC/CISC architectures, which have “blurred their boundaries” as well. On the first part of your comment, I understand that the Internet has done the usual internet-ism and decided that the microcode backend of a CPU must be RISC. But the reality is quite different, and the details reveal it. For instance, microcode is not an architecture per se, since it doesn’t describe an abstract model that can be implemented; it is THE implementation of a microarchitecture, so it’s not standard and changes with each new CPU family. It is true, however, that microcode typically works as a load/store implementation, and this is historically a characteristic of RISC architectures, but that doesn’t make microcode RISC. For example, microcode can be clustered, and it can itself generate more microcode, and that last trick is quite CISCy if you want. This is also why the internal architecture of an x86 has RAM and ROM for the microcode: so it can have its own state. Some implementations I have studied in the past even presented a strongly stack-machine-based architecture, so each microcode operation (or most of them) constantly updated the stack. I could keep going with details, but I don’t want to get into my usual ultra-long posts, so I’ll limit myself to what helps to understand that an x86 backend is not necessarily a RISC architecture, although it does have elements taken from the RISC philosophy. |
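The stack-machine style of micro-engine mentioned above can be sketched in a few lines. This is purely illustrative (the opcode names and semantics are invented, not Intel’s actual microcode): it shows how a sequence of micro-ops that constantly update a stack can implement one CISC-level operation such as “add a constant to a memory location”.

```python
# Hypothetical sketch (NOT real microcode): a tiny stack-machine micro-engine
# where every micro-op reads/writes an internal stack, illustrating why such
# a backend has its own state and isn't a plain register-to-register RISC.

def run_stack_machine(program, memory):
    """Execute a list of (op, *args) micro-ops against a stack and memory."""
    stack = []
    for op, *args in program:
        if op == "push_mem":       # load a memory cell onto the stack
            stack.append(memory[args[0]])
        elif op == "push_const":   # push an immediate value
            stack.append(args[0])
        elif op == "add":          # consume two stack slots, push the sum
            b, a = stack.pop(), stack.pop()
            stack.append(a + b)
        elif op == "pop_mem":      # store the top of stack back to memory
            memory[args[0]] = stack.pop()
    return memory

# An x86-level "ADD [0x10], 5" expressed as four stack-machine micro-ops:
mem = {0x10: 7}
run_stack_machine([("push_mem", 0x10), ("push_const", 5),
                   ("add",), ("pop_mem", 0x10)], mem)
print(mem[0x10])  # 12
```

Note how the micro-ops themselves carry no register names at all; the implicit stack is the state, which is quite far from the “orthogonal registers” picture usually painted for RISC.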
Paolo Fabio Zaino (28) 1882 posts |
@ DavidS
That depends on how you measure performance. ARM has a much better performance-per-watt ratio, but it doesn’t yet have implementations that push the silicon as far as Intel/AMD have done. So, technically, the fastest architecture available right now in raw performance (ignoring power consumption) is AMD’s. If we do count power consumption, then at the same power budget ARM is always faster than Intel or AMD. So it’s a tricky definition that requires care in the choice of words, and Intel right now is messing quite a lot with wording in a desperate attempt to stay relevant after AMD chewed them up on x86 performance and ARM chewed them up on power consumption on mobile and now on laptops. But Intel is just reaping what they sowed during the 90s/00s/10s; there is plenty of proof that they were trying to control the market with lawyers and money instead of technological innovation. So now it’s time for them to learn the lesson we all had to learn: the tech market is about tech and innovation, not bollocks1 ;) 1 I hope I can use this term; if not please let me know and I’ll edit, thanks :) |
Steve Pampling (1551) 8172 posts |
tech market is about tech and innovation, not bollocks ;) I’ve been known to use it during meetings :) |
Rick Murray (539) 13850 posts |
Me too, once or twice. I even mumbled that one of my cow orkers was “a right proper spanner”. If anybody translated that into French… an appropriate key? (une clé appropriée, according to Google)
Ah, but how does one define “RISC”? My preferred definition is “a load/store architecture with a high degree of orthogonality”. Others like to add more constraints, such as a large number of registers, but then IA64 has more registers than traditional ARM and might have more than AArch64. Keep it simple: orthogonal load/store, and accept that some boundaries are blurred (the 6502 was CISC with a number of RISC-like features, for instance). As for the core of the x86, it’s probably a jealously guarded secret, but if what we know is a yellow aquatic rubber bird that quacks…
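The “orthogonal load/store” definition above can be made concrete with a toy interpreter. This is a hypothetical mini-ISA (not ARM): the point is that LDR/STR are the only instructions touching memory, while every ALU operation is register-to-register, with any register usable in any operand position.

```python
# Toy illustration of "orthogonal load/store" (hypothetical mini-ISA, not ARM):
# only LDR/STR touch memory; ALU ops work register-to-register only, and any
# register may appear in any operand slot -- that's the orthogonality.

def execute(program, regs, memory):
    for op, *a in program:
        if op == "LDR":            # LDR rd, [addr] -- the only memory read
            regs[a[0]] = memory[a[1]]
        elif op == "STR":          # STR rs, [addr] -- the only memory write
            memory[a[1]] = regs[a[0]]
        elif op == "ADD":          # ADD rd, rn, rm -- registers only
            regs[a[0]] = regs[a[1]] + regs[a[2]]
        else:
            raise ValueError(f"ALU ops never take memory operands: {op}")
    return regs, memory

regs = {f"r{i}": 0 for i in range(4)}
mem = {0: 2, 4: 3}
# mem[8] = mem[0] + mem[4], done the load/store way:
execute([("LDR", "r0", 0), ("LDR", "r1", 4),
         ("ADD", "r2", "r0", "r1"), ("STR", "r2", 8)], regs, mem)
print(mem[8])  # 5
```

A CISC machine could do the same in one `ADD [8], [0]`-style instruction with memory operands; the RISC version trades instruction count for simple, uniform instructions.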
Why would this make it non RISC? It’s a processor, it’s running a program, it needs some working memory.
You need to step out into the real world a little more often David. Everybody I know that isn’t here or on TheRegister fits the description of a computer user who knows nothing about programming. For all of them, whether it is a laptop or a mobile phone, it’s a tool. A way to watch the Winx Saga or a way to waste twenty hours creating the perfect PowerPoint presentation that is never used… It’s a camera, it’s a browser, oh turn it up I like Ayreon, die alien scum!, and our projected sales forecast for this quarter is….. I wouldn’t insult them by referring to them as “low end”. They, the masses, might lack many clues when it comes to nerdy topics, but it’s selling machines to people like them that is driving the market. It’s also why we have UIs instead of command lines. As each year passes, computers are getting more and more suitable for normal people. |
Paolo Fabio Zaino (28) 1882 posts |
@ Rick
Good point. So technically the essence of a RISC architecture is really in the “simple and highly optimised instructions”; that’s the core and most basic definition if someone wants to be super-short. It then comes almost “naturally” that if you want simple and highly optimised instructions, you need enough registers to make your ISA useful in most situations without losing the “highly optimised” part, hence you’ll need a “sufficient” number of registers. So there has never been a default number, but having a load/store architecture kinda implies that you’ll need more than the handful of user registers offered by the 6502 :) The “many registers” idea is more of a common internet-ism, if you want, that was relevant in the past because a CISC architecture did not “need” as many registers as a RISC architecture did in order to be useful in most situations without losing the “highly optimised” part.
Yeah, this one is another interesting definition. The truth is, it’s hard for a (for example) 32-bit ISA to also carry a 32-bit immediate value in a single instruction… hence “execute” and not “fetch”. But that “execute” implies pipelining and superscalarity, so really those two should be called the “characteristics”, while “execute in a single clock cycle” should be a requirement (and so not directly a definition of RISC). The reason I mention this is not that I’m trying to be overly precise; it actually matters in a RISC architecture, and here is why: complex instructions are hard to implement in a superscalar, pipelined fashion, because they require multiple clock cycles, etc. So someone picky may ask: OK, fine, but then the microcode is RISC, right? Well, no: clusterisation of microcode goes against the RISC principles above, and embraces more the CISCy side of the story.
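The point about a 32-bit ISA not being able to carry an arbitrary 32-bit immediate is easy to check for classic 32-bit ARM, where a data-processing immediate is encoded as an 8-bit value rotated right by an even amount (the remaining instruction bits hold the opcode and registers). A small checker, written from that encoding rule:

```python
# Classic 32-bit ARM data-processing immediates: an 8-bit value rotated
# right by an even amount (0, 2, ..., 30). Anything else needs a literal
# pool load (or MOVW/MOVT on later cores).

def arm_immediate_encodable(value):
    """Return True if `value` fits the ARM data-processing immediate form."""
    value &= 0xFFFFFFFF
    for rot in range(0, 32, 2):
        # Rotating left by `rot` undoes a right-rotation by `rot`; if the
        # result fits in 8 bits, the constant is encodable.
        rotated = ((value << rot) | (value >> (32 - rot))) & 0xFFFFFFFF
        if rotated < 256:
            return True
    return False

print(arm_immediate_encodable(0xFF))        # True  (fits in 8 bits directly)
print(arm_immediate_encodable(0xFF000000))  # True  (0xFF rotated into place)
print(arm_immediate_encodable(0x12345678))  # False (too many significant bits)
```

So `MOV r0, #0xFF000000` assembles fine, while `MOV r0, #0x12345678` does not; the assembler or compiler has to fetch such constants some other way, which is exactly the fetch/execute distinction being made above.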
The problem is that it can get quite a lot more complicated. Having its own state means that the microcode is not just a “partitioning” of the original x86 code, and some of the implementations, as I’ve mentioned, are more like a stack machine than a pure register machine, so one could argue a different logic altogether. Let me give you an example. A microinstruction cluster could be composed of:

0........64........128........192........224
|  Op1   |  Op2    |  Op3     | Seq Word |

Now, if it were a true RISC ISA, the operations wouldn’t need to be clustered, and they surely wouldn’t need a sequence word. BTW, in Intel microcode some refer to the cluster above as a “triad”, just as a note for people who go googling after this comment. Also, as I mentioned, microcode is not an ISA; it’s a microarchitecture used to implement an ISA, so it has different purposes (mostly maintainability and bug fixing, and also performance in a superscalar implementation of an ISA that is hard to make superscalar). Because microcode uses a sequence word, it obviously requires some form of microcode sequencer, which again adds complexity compared to the original meaning of RISC, especially on the matter of instruction decoding. Not to mention that to decode an x86 instruction into microcode Intel uses multiple decoders, including a vector decoder, which also implies the presence of an instruction buffer. And to generate the triad above you’ll also need an operation packer… Can you see where I am going? In simple terms, for non-tech people: microcode is still complex compared to the RISC philosophy, BUT at the same time simpler than the CISC philosophy. So if you really want to catalogue it as a form of ISA (again, it is not, but let’s assume so by hypothesis), then the closest word I’d use is “Hybrid”. Makes sense? |
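The triad layout described above can be modelled with a pack/unpack pair. The field widths here (three 64-bit micro-ops plus a 32-bit sequence word in a 224-bit cluster) follow the sketch in the post, but they are illustrative, not Intel’s actual format:

```python
# Hypothetical packing of the "triad" layout sketched above: three 64-bit
# micro-ops plus a 32-bit sequence word in one 224-bit cluster. Field widths
# and names follow the diagram in the post, not any real Intel format.

OP_BITS = 64   # width of each micro-op slot
SEQ_BITS = 32  # width of the sequence word

def pack_triad(op1, op2, op3, seq_word):
    """Pack three micro-ops and a sequence word into one 224-bit integer."""
    cluster = op1
    cluster |= op2 << OP_BITS
    cluster |= op3 << (2 * OP_BITS)
    cluster |= seq_word << (3 * OP_BITS)   # the part a "true RISC" wouldn't need
    return cluster

def unpack_triad(cluster):
    """Recover the three micro-ops and the sequence word from a cluster."""
    mask = (1 << OP_BITS) - 1
    ops = [(cluster >> (i * OP_BITS)) & mask for i in range(3)]
    seq = cluster >> (3 * OP_BITS)
    return ops, seq

ops, seq = unpack_triad(pack_triad(0xAAAA, 0xBBBB, 0xCCCC, 0x7))
print([hex(o) for o in ops], seq)  # ['0xaaaa', '0xbbbb', '0xcccc'] 7
```

The operation packer and sequencer mentioned in the post would be the hardware counterparts of `pack_triad` and of whatever walks the `seq` field, which is exactly the extra machinery a plain RISC decoder wouldn’t have.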
Steffen Huber (91) 1953 posts |
I would bet my money on POWER9 right now and POWER10 when it’s available. For classic CPUs, of course. Once you enter specific fields, it gets a lot more complicated (Nvidia Tesla et al). |
Glenn Moeller-Holst (8768) 16 posts |
More ARM laptop news: Feb 2021, Apple ‘M1X’ chip specification prediction appears on benchmark site. The rest of the usual PC crowd are also moving the ARM way?: 26 February 2021, Watch out Apple M1! Samsung and AMD may have a killer Windows on ARM solution |
Clive Semmens (2335) 3276 posts |
But an ARM so alien as to be almost irrelevant to us 8~( |
Chris Gransden (337) 1207 posts |
Here are some mainly floating point benchmarks comparing an M1 to an RPi400.
|
Chris Gransden (337) 1207 posts |
Dhrystone is single core integer. The rest are single core floating point.
There isn’t a native GCC11 port for RISC OS yet. GCC11-devel was slightly slower compared to clang. |
Clive Semmens (2335) 3276 posts |
Not intentionally – but perhaps I’ve not picked up correctly what CPUs you were talking about. If they’re pre-ARMv8 beasts then I take back what I wrote. |
Clive Semmens (2335) 3276 posts |
I’m happy about the idea of designing a CPU with a useful superset of the AARCH32 architecture (maybe call it AARCH32++); that’s right up my street. What wouldn’t be would be marketing it and raising the money to get it implemented. |
Rick Murray (539) 13850 posts |
If you’re referring to the Cupertino Chaos, then “little” isn’t quite the right word.
This. So I’d say that apart from budget and low-end things, most stuff is using the alien ARM these days. |
Rick Murray (539) 13850 posts |
The problem here is not so much the CPU design, but all the tweaks needed to ramp up the speed so it can match an average Pi3 or Pi4. That also means it would need to be an actual CPU, not some sort of FPGA, in order to get that kind of speed. Might be cheaper to try asking Broadcom if they can bake a few more of their Pi cores? They’d probably do it, too, if you could secure a demand for a large enough number (but it would have to be a big number). Perhaps a cheaper investment would be, when the next Pi is announced, if it isn’t 32-bit compatible, to buy a number of Pi4 boards. Then RISC OS won’t exactly be moving forward, but it can carry on going on the available boards, for as long as they last… |
Stuart Swales (1481) 351 posts |
…but everything else!!! |
Clive Semmens (2335) 3276 posts |
Agreed entirely. I was responding to DS’s comment; when I say “I’m happy about it” what I mean is that it’s something I’d know how to approach, not that I think it’s a sensible undertaking. |
Clive Semmens (2335) 3276 posts |
Then RISC OS won’t exactly be moving forward, but it can carry on going on the available boards, for as long as they last I really don’t find any scenario longer term than that credible at all, sadly. The market surely isn’t big enough. |
Steffen Huber (91) 1953 posts |
Please provide some links. I know of exactly zero “high end hobby designs” in the CPU department. |
Paolo Fabio Zaino (28) 1882 posts |
Hummm, sounds like we are at the “let me design a better ARM32 for you” discussion again… Here: this is a fully functional ARMv2a + MEMC + everything else for an Archie; whoever is interested in re-designing ARM32, please give us some support in improving it more and more. @ DavidS, I hope to see your progress being pushed there. Sorry, I’m not trying to be aggressive, just please, please, please remember that there is licensing in place: you just can’t go and redesign an ARMv6 or v7 without paying ARM Holdings and without having a full licence like Apple does. You can only do it on very old ARM designs that are fully abandoned, hence the link above. On top of that you need to know what you’re doing. I am not saying you don’t; I have never worked with you, so I don’t know whether you can or cannot. But putting some very raw ideas on a forum is one thing, and designing (or re-designing) a CPU architecture and testing it until it works is quite another: knowing the tools, knowing the pitfalls, knowing the tricks, knowing all the theory and all the details involved and, most of all, having the time to try/build/test/redo until it’s done. I do have some doubts, because I have read some of the (IMHO unacceptable) explanations you gave on 64-bit vs 32-bit in other threads, so I am not going to join another discussion on re-designing ARM32, but I sincerely wish you good luck, because by trying you’ll learn a lot, and that is always a good thing :) On the FPGA vs ASIC comments… guys, the CPU design process is and always will be done on FPGA (why? Convenience!). If, by some miracle, at the end of it you manage to convince a large enough number of buyers to buy your CPU, then, and only then, can you think of using your Verilog/VHDL to produce an ASIC. That is how the market works, unless you have a magic wand or something like that… P.S. Waiting for the comments from Druck; I am pretty sure he’ll be more direct than me… |
Steffen Huber (91) 1953 posts |
I don’t think you have ever provided meaningful links whenever I asked for evidence of some of your more interesting claims. Maybe in one of the many posts you decided to delete after a while?
Unless you come up with some evidence that what you propose has even a slight chance of happening, I remain sceptical. “Prior art” would be a good way to provide such evidence. You propose something that I have never heard of even being seriously attempted. To make any sense, your project needs to design a CPU including a memory controller, an FPU and a GPU. You need to produce it in quantities large enough to drive the price down to reasonable levels (i.e. matching existing SoCs), then put it onto a board (i.e. a proper design including prototypes) and get that produced, again in quantities large enough to drive the price down. And that quantity needs to come from the RISC OS market, because no one else has any interest in such a special AArch32 solution. To do all this, you need to assemble a team of highly qualified engineers working for free, because if you start to price in the working hours, your project is even less realistic. Then, when you are ready in, say, 10 years, you will have – if your team has performed well – produced something that is perhaps the speed of an RPi 1, but costs at least 10 times the money. I.e. something worse than the Pi that was released nearly ten years ago. In the meantime, we will likely have much, much faster SBCs with whatever CPU (maybe AArch64, maybe x64, maybe RISC-V, who knows), which will likely be faster when emulating an AArch32 CPU than your project running the code natively. In contrast, the alternative you so happily dismiss – a complete rewrite of RISC OS in C, along with an integrated AArch32 emulation solution – sounds like an easy job compared to your proposal. And it would produce something a lot more useful. And you would just need a bunch of software developers, who are available in vast numbers compared to CPU design experts.
Please provide at least a link to the post where you provided that bunch of links. You are making some very extraordinary claims in some of your posts. However, the evidence that any of these claims even approach workable ideas is very thin on the ground. The number of projects you have mentioned that you are working on is approaching triple digits, and the results are very near zero. The RISC OS world has seen such projects before. The Omega was one of those, albeit at a much reduced complexity compared to what you propose – it “just” recreated IOMD and VIDC in FPGAs. The result is well known – it was late, it was expensive, it was a lot less powerful than promised, and various “future features” never surfaced. As you might remember, I have proposed a testbed for a simple CPU design before, where you could show in the real world that some of your CPU-related ideas actually have merit. We can talk all day long about the theory – nothing beats the real world for making a valid point. |
Paolo Fabio Zaino (28) 1882 posts |
@ Steffen Huber
I definitely love the POWER10 design and yes, I can’t wait to see it available :) On the POWER9, I’m not so sure: it’s still 14nm FinFET, so AMD’s 7nm has quite an advantage there. Also, AMD Zen 3 has finally addressed the cache latency and unified the cache, so each core has basically the same ultra-low latency when accessing data in cache, and that is a big win for AMD. POWER9 has either SMT4 or SMT8 while AMD has SMT2, so for the same number of cores AMD Zen 3 may support fewer threads than POWER9, but it does have faster single-core performance, and that can counterbalance the lack of hardware threads… AMD Zen 3 has a pipeline with at least 19 stages while POWER9 has only 12… Anyway, the real deal will be when POWER10 comes out :))))) |
Steffen Huber (91) 1953 posts |
I am going by “real world performance” as measured with a quite complex component written in plain C at work. An AIX POWER9 machine outperformed the competition (e.g. an Intel Xeon Cascade Lake at nearly 4 GHz, IIRC) by quite a margin. That was basically testing single-core performance, but overall not only CPU but also I/O and the memory subsystem (because the customer was interested in a system-level comparison, of course). It is always difficult to attribute performance differences to the different subsystems, of course. But on price/performance, x64/Linux still won by a large margin. IBM AIX pricing is really quite eccentric. |
Paolo Fabio Zaino (28) 1882 posts |
@ DavidS
So why does this “better ARM32” keep popping up in threads where you are commenting? There is no way you can make a better ARM32; the limitation is not just technical, it’s also patent based. If someone is feeling inspired by the Apollo Accelerator’s work on the 68080, that was work carried on for more than 10 years, with big failures at the beginning, and it is only possible now because Motorola doesn’t care anymore about the 68K family. On top of all that, the Vampire chip is still FPGA-only and had to compete with, at most, a 68060 at 75/100 MHz: single core, non-superscalar… Compare that with the ARMv2a link I provided, clocked at 33/45 MHz… The 68080 now has a superscalar architecture and, when overclocked, it can perform like a StrongARM at 200 MHz… XD. So by the time it reaches the performance of an RPi, we’ll all be dead, buried and forgotten… But for the Amiga user that’s fine, because they have never experienced anything beyond 100 MHz, while we can buy an RPi400 today and have it clocked at 2.4 GHz, with 4 cores and 4 GB RAM (the next model will also have 8 GB RAM)… So again, DavidS, while I encourage you to pursue your studies and learn more, because that is ALWAYS good, please at least try to use wording that doesn’t present your theories as deliverable products (because they aren’t deliverable in a usable form), unless you’re actively working on them and have tests that can prove them. That’s all I am asking, sir; I hope it’s not too much to ask… |