Multiprocessing and Multithreading: A Technical Analysis


For wireless system design engineers, it is critical to clearly understand the differences between multithreading (MT) on a single processor and multiprocessing (MP) across several processors. Cellular phones were the first large-scale application to adopt dual-core designs, but dual-core implementations also suit many other wireless applications that demand high performance and low power consumption.

 

Of course, MP and MT are relevant to many systems beyond wireless. The most common misconception is that MP and MT processors are equivalent technologies with the same software complexity. This should concern many design engineers.

 

Looking Back at Clock Speed

 

Over the past decade, desktop processor designs were differentiated by one simple thing: speed. Intel and AMD devoted themselves to raising the clock speeds of their processors, each keen to ship a higher-frequency part before the other. AMD emerged as the winner in the fierce race to the world's first 1 GHz processor. During this period, however, the industry gradually realized that the higher the clock speed, the greater the hardware complexity.

 

The industry also realized that clock speed could not be pushed up indefinitely and that other measures were needed. Besides making the processor itself more efficient, overall performance can be raised by exploiting thread-level parallelism through multiprocessing, using either MP or MT.

 

Intel was the first company to promote an MT technology, known as Hyper-Threading, while AMD positioned itself around dual-core and 64-bit processors. The contest has since turned into a dual-core race, with both companies doing everything they can to be the first to deliver true multiprocessor solutions for both home and business computing.

 

Recently, the desktop shift toward multiprocessing has carried over into embedded design, although embedded design engineers have used MP in their designs for years to deliver the required computing performance within a limited power budget.

 

The real change in the embedded market is that application software must now treat the general-purpose processor as a multiprocessing system in order to benefit from higher performance and lower power consumption. Although both MP and MT bring this complexity to software developers, examining the trade-offs between cost and complexity shows that the two are not equal.

 

Changing Processor Characteristics

 

To keep improving performance, IC design engineers must design next-generation processor architectures that provide the flexibility and scalability consumers need. But if a new architecture cannot run existing software, that alone will block any fundamental change in processor architecture. The history of computing is full of such architectures: however high their computing performance, they went unused because of the demands they placed on the software community and the disruption they caused.

 

Any transition to a multiprocessing architecture must take this into account. We therefore have to balance what multiprocessing makes theoretically possible against the needs of existing software.

 

In addition, the move to multiprocessing is being driven by the increasingly concurrent behavior of the application software and operating systems in embedded devices. This software concurrency favors MP, MT, or a combination of the two as the way to achieve the performance and efficiency that next-generation embedded devices require.

 

Multiprocessing and Multithreading

 

Both MP and MT aim to improve the overall performance of the processor and to reduce the processing time of any application that uses concurrent software threads. However, the two technologies take different hardware approaches to these goals, and so they succeed to different degrees on different kinds of code.

 

A common misconception is that MP and MT are comparable technologies that require the same level of software complexity. Looking at common multiprocessing programming interfaces reveals the differences between them, and shows that programmers must know whether their multiprocessing solution is based on MT, MP, or a combination of the two.

 

When a high-frequency processor accesses slow memory, the resulting latency leaves the execution units idle. MT aims to use these idle periods to raise the processor's overall performance: by scheduling another thread into the idle cycles, it improves core efficiency. History shows, however, that the benefits of this approach to multiprocessing are modest. MT is essentially a single-processor technology in which only the minimum of processor logic is replicated to support additional hardware threads. Because each hardware thread typically has its own programmer-visible register set and enough CPU supervisor state, the operating system can treat it as a virtual processor.
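The latency-hiding idea can be sketched with a deliberately idealized model. The function name `core_utilization` and the cycle counts are hypothetical: each thread is assumed to alternate a fixed burst of computation with a fixed memory stall, and the core is assumed to switch between ready hardware threads at no cost.

```python
def core_utilization(compute_cycles: int, stall_cycles: int, hw_threads: int) -> float:
    """Fraction of cycles in which the execution unit does useful work.

    Idealized model (an assumption, not a real core): every thread repeats
    `compute_cycles` of work followed by a `stall_cycles` memory wait, and
    the core can switch to any ready hardware thread at zero cost.
    """
    period = compute_cycles + stall_cycles
    # N threads can supply up to N * compute_cycles of work per period,
    # but utilization can never exceed 100%.
    return min(1.0, hw_threads * compute_cycles / period)


for threads in (1, 2, 4, 8):
    print(threads, core_utilization(compute_cycles=20, stall_cycles=60, hw_threads=threads))
```

With a hypothetical 20-cycle burst and 60-cycle stall, utilization rises from 0.25 with one thread to 0.5 with two and saturates at 1.0 with four. The model ignores contention for shared resources such as the cache, which is exactly where the gains erode in practice.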

 

The rest of the processor logic is shared among the threads, which introduces a serious problem: increased software complexity. To run two existing applications on a traditional single processor, the operating system must share the processor's resources between them, switching from one application to the other (a context switch) 10 to 100 times per second.

 

Context Switching

 

A running application keeps its execution state in the processor registers and in memory, and this state must be swapped with that of the other application. In an MT system, a context switch occurs whenever the execution unit stalls, so context switches may happen hundreds of times more often.

 

This large increase in switching frequency requires careful coordination between the operating system and the MT hardware design. The hardware must replicate enough state to limit the saving and reloading of execution state, so that switching does not become a major cost for the processor.
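Why replicated hardware state matters can be shown with simple arithmetic: the fraction of time lost to switching is the switch rate times the cost per switch, divided by the clock rate. In the sketch below the function name, the 500 MHz clock, the switch rates, and the per-switch cycle costs are all hypothetical values chosen for illustration.

```python
def switch_overhead_fraction(switches_per_second: float,
                             cycles_per_switch: float,
                             clock_hz: float) -> float:
    """Fraction of processor time spent saving and reloading execution state.

    A result of 1.0 or more means the core would spend all of its time
    switching and none doing useful work.
    """
    return switches_per_second * cycles_per_switch / clock_hz


CLOCK_HZ = 500e6  # hypothetical 500 MHz core

# Hypothetical software save/restore of the full execution state: 2,000 cycles.
os_time_slicing = switch_overhead_fraction(100, 2_000, CLOCK_HZ)
stall_driven = switch_overhead_fraction(50_000, 2_000, CLOCK_HZ)  # 500x more switches
# Hypothetical replicated per-thread hardware context: 1 cycle per switch.
replicated = switch_overhead_fraction(50_000, 1, CLOCK_HZ)

print(f"OS time slicing:          {os_time_slicing:.4%}")
print(f"stall-driven, software:   {stall_driven:.0%}")
print(f"stall-driven, replicated: {replicated:.2%}")
```

Under these assumptions, a cost that is negligible at OS time-slicing rates (0.04%) grows to 20% of the processor when switching happens hundreds of times more often, and only drops back (to 0.01%) once the state is replicated in hardware.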

 

The cache, however, is rarely replicated for each hardware thread. For software writers, this means they must understand very clearly how the higher context-switch rate affects the cache and their applications.

 

Take a simple example of two independent applications. If the operating system does nothing more than simple time-slicing of the two, the MT machine will run them more slowly than a single processor would. To benefit from MT, software writers must structure their software threads very carefully so that the execution state held in the cache can be shared effectively between the two threads.
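The cache effect behind this can be sketched with a small simulation. The class `LRUCache`, the helper `loop_trace`, the 64-line cache, and the 48-line working sets are hypothetical; the cache is modeled as fully associative with least-recently-used replacement, which is simpler than real set-associative hardware. Each application fits in the cache on its own, but two hardware threads interleaving accesses to one shared cache evict each other's data.

```python
from collections import OrderedDict


class LRUCache:
    """Fully associative cache with least-recently-used replacement."""

    def __init__(self, lines: int) -> None:
        self.lines = lines
        self.store: OrderedDict[int, None] = OrderedDict()
        self.hits = 0
        self.misses = 0

    def access(self, addr: int) -> None:
        if addr in self.store:
            self.store.move_to_end(addr)  # mark as most recently used
            self.hits += 1
        else:
            self.misses += 1
            self.store[addr] = None
            if len(self.store) > self.lines:
                self.store.popitem(last=False)  # evict least recently used

    def hit_rate(self) -> float:
        total = self.hits + self.misses
        return self.hits / total if total else 0.0


def loop_trace(base: int, working_set: int, passes: int) -> list[int]:
    """Addresses touched by an app that loops repeatedly over its working set."""
    return [base + i for _ in range(passes) for i in range(working_set)]


CACHE_LINES = 64
app_a = loop_trace(base=0, working_set=48, passes=10)
app_b = loop_trace(base=10_000, working_set=48, passes=10)

# One app alone: its 48-line working set fits in the 64-line cache.
alone = LRUCache(CACHE_LINES)
for addr in app_a:
    alone.access(addr)

# Two hardware threads sharing one cache: their accesses interleave.
shared = LRUCache(CACHE_LINES)
for a, b in zip(app_a, app_b):
    shared.access(a)
    shared.access(b)

print(f"alone:  {alone.hit_rate():.2f}")
print(f"shared: {shared.hit_rate():.2f}")
```

Alone, the application misses only on its first pass (a 0.90 hit rate). Interleaved, the combined 96-line working set cycles through a 64-line LRU cache, so every access misses. This is an extreme case chosen to make the effect visible, not a measurement.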

 

For applications on MT processors, this adds software complexity, because programmers must manage how their threads affect shared processor resources. There are also other hardware design issues to consider.

 

Multithreading Hardware

 

Adding hardware threads increases processor complexity. Even if the microarchitecture is not fundamentally changed, the added logic affects the peak clock speed the design can achieve. The added complexity also raises overall power consumption, even when only a single thread is executing. The complexity of MT therefore reduces overall application performance even when only one application or one thread is running.

 

Weighing all these costs of MT against its limited performance gains, it is clear why more and more companies in the industry have introduced dual-core and multicore MP solutions. MP replicates most of the processor design to maximize performance from multitasking software, without introducing the software complexity of managing shared processor resources.

 

In fact, if you run two independent applications simultaneously on two separate processors, the overall performance will exceed that of a single processor running at twice the clock speed. All operating-system context switches and all cache conflicts between the two applications are eliminated, and each application can run continuously at full speed.
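One reason for this can be sketched with a back-of-the-envelope model: doubling the clock shrinks compute time, but not the time spent waiting on memory, which is fixed in nanoseconds, and a single core must also pay for switching between the applications. The function names and all workload numbers below are hypothetical, and the model assumes a core that does nothing while stalled. Cache conflicts are not modeled at all; they would only widen the gap.

```python
def single_core_ns(instr_cycles: float, stall_ns: float, clock_ghz: float,
                   apps: int, switches: int, switch_cycles: float) -> float:
    """Time for one core to finish `apps` identical apps by time-slicing them."""
    compute_ns = apps * instr_cycles / clock_ghz  # compute scales with the clock
    stall_total_ns = apps * stall_ns              # memory latency does not
    switch_ns = switches * switch_cycles / clock_ghz
    return compute_ns + stall_total_ns + switch_ns


def multi_core_ns(instr_cycles: float, stall_ns: float, clock_ghz: float) -> float:
    """Time for identical apps, one per core, running in parallel."""
    return instr_cycles / clock_ghz + stall_ns


# Hypothetical workload: 1e6 cycles of compute and 0.2 ms of memory stalls per app.
one_fast_core = single_core_ns(1e6, 200_000, clock_ghz=2.0, apps=2,
                               switches=100, switch_cycles=2_000)
two_slow_cores = multi_core_ns(1e6, 200_000, clock_ghz=1.0)
print(f"one core at 2 GHz:  {one_fast_core / 1e6:.2f} ms")
print(f"two cores at 1 GHz: {two_slow_cores / 1e6:.2f} ms")
```

With these numbers the single 2 GHz core needs 1.50 ms and the two 1 GHz cores need 1.20 ms. With no memory stalls and no switching the two configurations tie, so the advantage comes entirely from the costs that a faster clock cannot remove.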

 

The obvious assumption about MP is that it doubles the design cost in silicon area, which would make MT a far more efficient way to add multiprocessing capability to an embedded system.

 

However, any such comparison must take the overall target performance into account. With the latest silicon implementation technology and cache designs, an MP processor can quite possibly deliver the same performance point as an MT processor in the same silicon area. MP also has a further advantage: it avoids MT's additional software problems.

 

Power consumption is a further consideration for multiprocessing. MT is essentially a more complex single processor, so it is limited to the power management techniques available to any single processor (such as clock gating, standby modes, and voltage and frequency scaling).

 

In a multiprocessor design, by contrast, each processor can use these single-processor techniques and can also be switched off entirely, eliminating its power consumption. MP can therefore always deliver maximum performance to the software while keeping power consumption directly tied to the work being done. Figure 1 shows the differences between multithreading and multicore processing.

 

 


Figure 1: Power saved by MP over MT when processing a set of multimedia workloads with the same requirements
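The power argument can be illustrated with the textbook CMOS dynamic-power estimate, P = activity × C × V² × f. The sketch below is hypothetical throughout: frequencies and capacitance are normalized, voltage is assumed to scale linearly with frequency (a simplification of real voltage and frequency scaling, which has a voltage floor), leakage is ignored, and work is assumed to split perfectly across cores.

```python
def dynamic_power(capacitance: float, voltage: float, freq: float,
                  activity: float = 1.0) -> float:
    """Textbook CMOS dynamic power: P = activity * C * V^2 * f."""
    return activity * capacitance * voltage ** 2 * freq


def cluster_power(core_freqs: list[float], capacitance: float = 1.0,
                  v_at_fmax: float = 1.0) -> float:
    """Total dynamic power of identical cores, frequencies normalized to f_max.

    Assumptions: voltage scales linearly with frequency, and a core with
    frequency 0 is power-gated and draws nothing. Leakage is ignored.
    """
    return sum(
        dynamic_power(capacitance, v_at_fmax * f, f)
        for f in core_freqs
        if f > 0  # switched-off cores contribute no power
    )


print(cluster_power([1.0]))       # one core at full speed        -> 1.0
print(cluster_power([0.5, 0.5]))  # two cores at half speed        -> 0.25
print(cluster_power([0.5, 0.0]))  # light load, second core off    -> 0.125
```

Under these assumptions, two half-speed cores deliver the same nominal throughput as one full-speed core for a quarter of the dynamic power, and a lightly loaded system can switch one core off completely. Real savings are smaller once leakage and the minimum operating voltage are taken into account.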

 

MT is really a technology that extracts performance from a gap: processor frequency and memory speed have not increased in proportion. In these circumstances MT is clearly a quick fix that hides this growing inefficiency, but such an approach has limited lifetime and applicability.

 

Desktop processor design has already demonstrated this: ever-increasing CPU clock speeds run into a hard limit set by power consumption, whether or not the processor supports MT.

 

If the software is concurrent, a more effective solution may be to move directly from a single processor to a scalable MP architecture. Once the MP design accounts for inter-processor communication and for data sharing across the distributed MP caches, there is almost no additional software cost. An MP system can therefore exploit efficient single-processor designs and their better power efficiency to reach a design point whose performance and power consumption surpass those of an MT processor.
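How much an MP system gains depends on how much of the software is actually concurrent. Amdahl's law gives the standard first estimate: if a fraction p of the single-processor run time can be spread over n processors, the speedup is 1 / ((1 − p) + p / n). The function name below is illustrative, and the formula deliberately ignores the inter-processor communication and cache-sharing costs mentioned above.

```python
def amdahl_speedup(parallel_fraction: float, processors: int) -> float:
    """Amdahl's law: speedup on `processors` cores when only
    `parallel_fraction` of the single-core run time can be spread across them."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if processors < 1:
        raise ValueError("processors must be at least 1")
    serial = 1.0 - parallel_fraction
    return 1.0 / (serial + parallel_fraction / processors)


for n in (1, 2, 4, 8):
    print(n, round(amdahl_speedup(0.9, n), 2))
```

For software that is 90% concurrent, the estimate is about 1.82 on two processors, 3.08 on four, and 4.71 on eight, approaching a ceiling of 10 no matter how many processors are added.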

 

Scalable Design and Performance

 

In essence, MP takes a divide-and-conquer approach. A multiprocessor is built on modular design principles by integrating multiple processing units, each of which can run an independent concurrent thread.

 

This makes the overall design less complex and less risky: system design engineers can simply add another processor as needed. This simplicity makes MP far more scalable than MT, because the design costs of ever-higher clock speeds in MT processors often limit their scalability, especially given the significant cost of missing every level of cache. Figure 2 illustrates the cost of an L2 cache access.

 

 


Figure 2: Cost of a thread's access to the L2 cache
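The growing cost of a cache miss at higher clock speeds can be sketched with the standard average memory access time (AMAT) formula for two cache levels: AMAT = L1 hit time + L1 miss rate × (L2 hit time + L2 miss rate × memory latency). All default values in the sketch are hypothetical. Main-memory latency is held fixed in nanoseconds, so it costs more cycles as the clock rises, while the cache hit times are assumed to stay fixed in cycles.

```python
def amat_cycles(clock_ghz: float, l1_hit: float = 1.0, l1_miss_rate: float = 0.05,
                l2_hit: float = 10.0, l2_miss_rate: float = 0.2,
                mem_ns: float = 100.0) -> float:
    """Average memory access time, in core cycles, for a two-level cache.

    AMAT = L1 hit + L1 miss rate * (L2 hit + L2 miss rate * memory latency)
    """
    mem_cycles = mem_ns * clock_ghz  # fixed DRAM latency, counted in core cycles
    return l1_hit + l1_miss_rate * (l2_hit + l2_miss_rate * mem_cycles)


for ghz in (0.5, 1.0, 2.0):
    print(f"{ghz} GHz: {amat_cycles(ghz):.2f} cycles per access")
```

In this hypothetical configuration the average access costs 2.00 cycles at 0.5 GHz, 2.50 at 1 GHz, and 3.50 at 2 GHz, so part of every clock-speed increase is spent waiting on memory rather than doing work.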

 

Another option is to deploy both MP and MT in a single design. However, experience has shown that existing multiprocessor operating systems and the software programming community have seriously underestimated the software complexity of this combination.

 

Such a design contains a fundamental contradiction: MT requires careful management of access to and sharing of processor resources, whereas MP is most efficient when running independent applications. Many system design engineers have found that they achieve higher performance once MT is disabled in their systems.

 

Given that many software applications may have been designed specifically around the characteristics of one solution or the other, it would be unwise to claim in general that one solution is better than the other. However, MP built from traditional single processors offers greater scalability. When selecting a development strategy, software design engineers can therefore benefit from a degree of flexibility, confident that their software architecture will not need to change in the future.

 

By John Goodacre

 

Multi-processor Program Manager

 

Arm 
