ARM and x86 are not directly comparable: the performance gap may be several thousand times

Source: Internet
Author: User

Transferred from CNBETA, original address: http://www.cnbeta.com/articles/167883.htm

We will not go into the details here. Let's just talk about why ARM and x86 are not comparable. To answer this question, we first need to understand what an architecture is. Many people mention that "the architectures are different", but what does architecture actually mean? It is a fairly abstract concept and cannot be explained in a few words.

We need to understand that a CPU is an execution unit. It can execute because people build hardware circuits into it that perform various functions, and then use logic to make those circuits work in a certain order, so that it can complete the tasks assigned to it. In other words, if the CPU is regarded as a person, it must first be able to work (execution ability), then have enough logical ability to understand the order of the work, and finally be able to understand what others say to it (the instruction set). Together these constitute the so-called "architecture", which can be understood as a set of "tools", "methods", and "specifications". Different architectures may have different tools, methods, and specifications, which is also why they are incompatible with one another: give an Italian a cooking guide written in Chinese, and he certainly won't know what to do.


If you still don't follow, it doesn't matter. Let's continue. From the invention of the CPU to the present there have been many architectures, from the familiar x86 and ARM to the less familiar MIPS and IA64, and the gaps between them are very large. However, classified by their most basic logic, they fall into two categories: the "Complex Instruction Set" and the "Reduced Instruction Set" systems, the "CISC" and "RISC" we often see. The biggest difference between the two is the way their designers think about a problem. Take an example: if we want to order a person to eat, how should the order be given? We can directly give him the command "eat", or we can say "first take the spoon, then scoop up a spoonful of food, then open your mouth, then deliver it to your mouth, and finally swallow". Different people have different ideas about how to command others. Some think that if I first train the person who receives the commands and let him master various complex skills (that is, implement the corresponding complex functions in hardware), then afterwards a very simple command can make him do complicated things: say "eat" and he eats. Others think this makes things too complicated, because the things the person has to handle are already very complex. What if you now want him to eat a different dish? Do you train him all over again for it? Why not break everything down into very basic steps, so that a person who knows only a few basic skills can do the same job, at the cost of the commander working a little harder? For example, if I now want him to eat a different dish, I only need to replace "scoop up a spoonful of food" in the eating command with a spoonful of the other dish, and the problem is solved. How simple.
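
The eating example above can be sketched as a toy model. The instruction names (EAT, TAKE_SPOON, and so on) are invented for illustration and do not correspond to any real x86 or ARM instruction; the point is only that the same work can be requested with one complex command or with several simple ones.

```python
# Toy model only: these instruction names are hypothetical, not real opcodes.

# "CISC-style" program: one complex instruction the hardware already knows.
CISC_PROGRAM = ["EAT"]

# "RISC-style" program: the same task spelled out as basic steps.
RISC_PROGRAM = ["TAKE_SPOON", "SCOOP", "OPEN_MOUTH", "DELIVER", "SWALLOW"]

# What the complex instruction does internally (the "training" built into hardware).
CISC_HARDWARE = {
    "EAT": ["TAKE_SPOON", "SCOOP", "OPEN_MOUTH", "DELIVER", "SWALLOW"],
}


def run_cisc(program):
    """Expand each complex instruction into the basic actions the hardware performs."""
    actions = []
    for instruction in program:
        actions.extend(CISC_HARDWARE[instruction])
    return actions


def run_risc(program):
    """Each simple instruction is exactly one basic action."""
    return list(program)


# Both programs perform the same actions; they differ in how many
# instructions have to be issued by the "commander".
print("CISC instructions issued:", len(CISC_PROGRAM))
print("RISC instructions issued:", len(RISC_PROGRAM))
print("Same work done:", run_cisc(CISC_PROGRAM) == run_risc(RISC_PROGRAM))
```

In this sketch, changing the dish on the RISC side means editing one step of the program, while on the CISC side it means adding a new entry to the hardware table, which mirrors the trade-off described above.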


This is the logical difference between the "Complex Instruction Set" and the "Reduced Instruction Set". Some may say the reduced instruction set is simply better, but it is hard to judge which of the two is superior, because both are currently thriving and both are very successful: x86 is the representative of the Complex Instruction Set (CISC), while ARM is the representative of the Reduced Instruction Set (RISC). Even the ARM name states its technology directly: Advanced RISC Machine.


At this point you should understand why their performance cannot be compared directly: their design philosophies are too different. This difference has also led CISC and RISC to part ways. The former focuses on high-performance but high-power implementations, while the latter focuses on small, low-power fields. In fact, some things suit CISC better and others suit RISC better. For example, CISC has the advantage in compute-dense tasks, while RISC gains the upper hand in simple repetitive work. Suppose we hold an eating competition: CISC only needs to keep shouting "eat, eat, eat", whereas RISC has to repeat the whole eating procedure again and again. If the speaker is not fast enough (that is, the memory bandwidth is not large enough), RISC can hardly out-eat CISC. However, if the two only have to compete at scooping up food, CISC is in much more trouble, because CISC has no such simple scooping action, while RISC only needs to keep shouting "scoop, scoop, scoop".


That is the basic difference between CISC and RISC. In practice, though, the problem is much more complicated, because the designers in each camp want to improve the performance of their own architecture. The most common concept here is so-called "issue" width. What is issue? It is how many instructions can be executed at the same time. For example, dual-issue means the CPU can pick up two instructions at once, and triple-issue naturally means three. Very few modern advanced processors are single-issue. For example, Cortex A8 and A9 are both dual-issue RISC designs, while Cortex A15 is triple-issue. ATOM is a dual-issue CISC, and the Core series even reaches four-issue. In this respect the two camps are comparable, but do not forget that CISC instructions are more complex, which means each one does more. If CISC needs only one instruction for a meal while RISC needs five, then with the same memory bandwidth CISC achieves far more than RISC (five times as much). In practice, the memory bandwidth of the CISC Core i processors has exceeded 100 GB/s, while ARM is still struggling to reach 10 GB/s. An architecture that consumes more bandwidth yet has only about a tenth of the other side's bandwidth will naturally be severely constrained. This is also an important reason why ARM is inferior to x86. Different applications have different bandwidth requirements, and once a bandwidth bottleneck is hit, even an ARM processor with high computing performance gains nothing, because that performance cannot be put to use.
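
The bandwidth argument above is simple arithmetic and can be written out as a short sketch. It assumes, purely for illustration, that both machines are limited only by how fast instructions can be fetched and that every instruction is 4 bytes long; real x86 instructions vary in length, and caches change the picture, so the numbers only restate the article's reasoning rather than measure anything.

```python
def tasks_per_second(bandwidth_bytes_per_s, instructions_per_task, bytes_per_instruction=4):
    """Upper bound on tasks per second when instruction fetch is the only limit.

    bytes_per_instruction=4 is an assumption made for this sketch.
    """
    bytes_per_task = instructions_per_task * bytes_per_instruction
    return bandwidth_bytes_per_s / bytes_per_task


GB = 10**9

# Same bandwidth, one instruction per "meal" versus five.
cisc_same_bw = tasks_per_second(10 * GB, instructions_per_task=1)
risc_same_bw = tasks_per_second(10 * GB, instructions_per_task=5)
print(cisc_same_bw / risc_same_bw)  # 5.0

# Using the article's bandwidth figures: 100 GB/s versus 10 GB/s.
cisc_fast_bw = tasks_per_second(100 * GB, instructions_per_task=1)
print(cisc_fast_bw / risc_same_bw)  # 50.0
```

Under these assumptions the two effects multiply: five times fewer instructions per task and ten times the bandwidth give a fifty-fold difference in this fetch-bound toy case.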


By now we should understand the differences and characteristics of CISC and RISC. In short, CISC accepts a more complex processor in exchange for higher performance, while RISC hands the complexity over to the compiler, sacrificing program size and instruction bandwidth in exchange for a simple, low-power hardware implementation. Carried to the extreme, however, a CISC processor would grow ever larger to improve performance, and the memory bandwidth required by RISC would go through the roof, and both are limited by technology. Therefore, over the past decade or so, the distinction between CISC and RISC has gradually blurred. For example, since the P6 architecture (Pentium Pro), x86, the representative of CISC, has introduced the concept of micro-operations (uops). Correspondingly, a decoder was added inside the processor, responsible for "unpacking" a traditional CISC instruction into shorter uops. When a CISC instruction comes in, the decoder splits it into several uops, which are then sent to the processor's execution pipeline. This can be understood as a RISC core plus a CISC decoder. RISC, for its part, has also introduced instructions that are logically not so "reduced" in order to increase computing performance. Normally an x86 instruction is split into 2 to 3 uops on average. Therefore, at the same instruction density, the actual instruction execution capability of x86 is roughly three times that of ARM. But don't forget that this rests on the assumption of "the same instruction density". In fact, x86 can achieve ten or even a hundred times the instruction density of ARM.
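
The decoder idea described above can be illustrated with a small sketch. The table below is hypothetical: the instruction strings and their breakdown into uops are simplified for illustration and are not the actual uop encoding of any Intel processor. It only shows the shape of the mechanism, where one complex instruction becomes several simple internal operations.

```python
# Hypothetical decode table: complex instruction -> list of simple uops.
# The breakdowns are illustrative assumptions, not real Intel uops.
UOP_TABLE = {
    "ADD [mem], reg": [
        "LOAD tmp <- [mem]",
        "ADD tmp <- tmp + reg",
        "STORE [mem] <- tmp",
    ],
    "PUSH reg": [
        "SUB sp <- sp - 8",
        "STORE [sp] <- reg",
    ],
    "ADD reg, reg": [
        "ADD reg <- reg + reg",
    ],
}


def decode(program):
    """Split each complex instruction into the uops sent to the execution pipeline."""
    uops = []
    for instruction in program:
        uops.extend(UOP_TABLE[instruction])
    return uops


program = ["ADD [mem], reg", "PUSH reg", "ADD reg, reg"]
uops = decode(program)

print("instructions:", len(program))
print("uops:", len(uops))
print("average uops per instruction:", len(uops) / len(program))
```

For this made-up three-instruction program the average comes out at two uops per instruction, which happens to fall in the 2 to 3 range the text mentions; a different instruction mix would give a different average.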


The last thing to consider is instruction set extensions. They were introduced to accelerate the processor in particular applications and have been around for decades. In fact, in today's application environment, the instruction set extensions rather than the CPU core are the most decisive factor. The strength of the x86 architecture also derives from its powerful instruction sets. Take Atom: its x86 core is very weak, but because it supports SSE3, in many cases its performance can even exceed that of the Pentium M, whose core is far more powerful. That is the power of the instruction set. At present the x86 instruction set has evolved from MMX through SSE to AVX, while ARM still has only the simple and basic NEON. This disproportionate gap results in a performance gap of hundreds of times in actual applications. For example, even the most powerful ARM core still struggles with software decoding of 1080p H.264, while an ordinary mid-range Core i processor can compress 1080p H.264 video at nearly ten times playback speed. At least in this respect, the performance of the PC processor is many times that of ARM, which is irrefutable. In reality such examples are everywhere. This is also why I have said that the average performance of ARM is only a fraction of that of x86.
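
Extensions such as SSE, AVX, and NEON are SIMD instruction sets, meaning one instruction operates on several data elements at once. The sketch below only simulates that idea in plain Python by counting operations; it does not use real SIMD hardware, and the choice of 4 lanes is an assumption that corresponds to four 32-bit values in a 128-bit register.

```python
def scalar_add(a, b):
    """Add two lists element by element, one operation per element."""
    result = []
    ops = 0
    for x, y in zip(a, b):
        result.append(x + y)
        ops += 1
    return result, ops


def simd_add(a, b, lanes=4):
    """Add two lists in chunks of `lanes` elements, one operation per chunk.

    lanes=4 is an illustrative assumption (four 32-bit values in 128 bits).
    """
    result = []
    ops = 0
    for i in range(0, len(a), lanes):
        chunk = [x + y for x, y in zip(a[i:i + lanes], b[i:i + lanes])]
        result.extend(chunk)
        ops += 1
    return result, ops


a = list(range(16))
b = list(range(16))

scalar_result, scalar_ops = scalar_add(a, b)
simd_result, simd_ops = simd_add(a, b)

print("same result:", scalar_result == simd_result)
print("scalar operations:", scalar_ops)  # 16
print("simulated SIMD operations:", simd_ops)  # 4
```

The instruction count drops by the lane width in this idealized case; how much of that shows up as real speed depends on the workload, the memory system, and how well the code is vectorized.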


All of this is to say that although ARM is very powerful now, it is still far behind x86, and the distance has not shrunk with the advances of the past few years; in fact it is widening faster. After all, the two were designed from different starting points, so their performance is simply not comparable: x86 cannot reach ARM's power consumption, and ARM cannot reach x86's performance. This is also why Atom has never been successful. Intel is using its own weakness to confront the other side's strength, and the result is naturally not very good; without Intel's semiconductor process, the most advanced on the planet, Atom would not have been possible at all. Likewise, if ARM tries to compete head-on with x86, the result will naturally not be good either, for the reasons just explained. However, this does not mean that ARM will only occupy the low end in the future. Every architecture has its advantages, and once applications are optimized for it, it can play to its strengths and avoid its weaknesses. The prosperity of x86 is itself the result of the whole world optimizing for it. As long as suitable applications and fields can be found for ARM, ARM may well reach a higher level in the future.

Here is my own opinion: after reading this article, I remembered a report a few days ago about a breakthrough in battery technology. If batteries become powerful enough in the future, or the power consumption of the x86 architecture is reduced, wouldn't an Android system running on x86 be smoother than iOS? Even if a gap remained, it would not be something we could notice. I really look forward to the x86 architecture. The cost of developing software for iOS is too high.
