Fault Tolerance Without a Performance Bottleneck

Source: Internet
Author: User

In terms of positioning, x86 servers are general-purpose machines, while fault-tolerant servers are aimed at key business applications. x86 servers can be further divided into single-, dual-, and multi-socket models (also called single-, dual-, and multi-channel or multi-way servers), corresponding to Intel's 3000, 5000, and 7000 series processors. There is also a special 6000 series, an extension of the 7000 series for dual-socket applications. Dual-socket servers using 5000 series processors are the market mainstream: they suit most application scenarios and are relatively inexpensive, at about … yuan. Multi-socket servers using 7000 series processors cost much more; they are positioned for high-end applications that demand high reliability. Today, apart from the core business systems of the financial industry, more and more users are choosing x86 servers to run key applications, and high-end multi-socket servers have become their usual choice.

To improve reliability further, a common choice is to build a cluster from two multi-socket servers of the same model and configuration. A dual-host cluster is a software redundancy solution controlled by cluster software: when one machine fails, the other takes over its workload. In practice, however, a dual-host cluster is demanding to manage. Even when the switchover succeeds, recovery takes some time, which may interrupt service; if the switchover fails, recovery takes longer still. For some key business needs, therefore, the dual-host cluster cannot meet the reliability requirement.

By comparison, fault tolerance is an ideal solution. It is a hardware redundancy technology that uses its distinctive lock-step technology to protect the system at the architecture level. It provides fault tolerance not only at the server level but also for memory and I/O data. In a dual-host cluster, if a server fails suddenly, the data in its memory and the data waiting to be read or written through I/O cannot be protected synchronously; they can only be reconstructed with software techniques such as database rollback. Transactions are not lost, but restoring and rebuilding the service takes time, which is why dual-host clusters cannot maintain business continuity.
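The failover behaviour described above can be illustrated with a toy model. This is only a sketch under stated assumptions: `ClusterNode`, `run_active`, `fail_over`, and the `commit_every` parameter are hypothetical names invented for illustration and do not correspond to any real cluster product. It shows that a standby host can rebuild only what reached the shared, committed log, while work that existed only in the failed host's memory has to be redone.

```python
"""Toy model of dual-host cluster failover (hypothetical, for illustration only)."""


class ClusterNode:
    def __init__(self, name):
        self.name = name
        self.memory = {}  # volatile in-memory state, lost when the node fails


def run_active(node, shared_log, operations, commit_every):
    """Apply operations in memory; flush them to the shared log in batches.

    Returns the operations that were still only in memory (not yet committed).
    """
    pending = []
    for key, value in operations:
        node.memory[key] = value
        pending.append((key, value))
        if len(pending) == commit_every:
            shared_log.extend(pending)
            pending.clear()
    return pending


def fail_over(standby, shared_log):
    """Rebuild the standby's state by replaying the committed log.

    The number of replayed entries is a rough stand-in for recovery time.
    """
    for key, value in shared_log:
        standby.memory[key] = value
    return len(shared_log)


# Usage: the active node fails after five operations, committing every two.
shared_log = []
active = ClusterNode("active")
standby = ClusterNode("standby")
operations = [("a", 1), ("b", 2), ("c", 3), ("d", 4), ("e", 5)]
in_flight = run_active(active, shared_log, operations, commit_every=2)
replayed = fail_over(standby, shared_log)
print("in-flight work to redo:", in_flight)
print("entries replayed during recovery:", replayed)
```

In this model the committed operations survive, but the service is unavailable while the log is replayed and the in-flight operation is resubmitted.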

Fault tolerance technology, in contrast, keeps the processor, memory, and I/O data of the redundant hardware strictly synchronized on every processor clock cycle. When a single component fails suddenly, the business is therefore not interrupted: the system temporarily loses its fault tolerance, but it keeps running and the business is unaffected. Once the faulty part is replaced, the system returns to the fault-tolerant state. Even so, many users do not adopt fault-tolerant servers, and a performance bottleneck is a commonly raised concern. Current fault-tolerant server products are mainly based on dual-socket servers, which appear to be at a performance disadvantage compared with multi-socket servers.
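A minimal sketch of this behaviour follows, assuming a simplified software model in which each "clock cycle" is one call to `step`. The classes `Replica` and `LockstepPair` are hypothetical and only mimic the idea; real fault-tolerant servers do this in hardware.

```python
"""Toy model of a lock-step pair that survives the loss of one replica."""


class Replica:
    def __init__(self):
        self.state = 0
        self.healthy = True

    def step(self, value):
        # One synchronized unit of work: every healthy replica does the same thing.
        self.state += value
        return self.state


class LockstepPair:
    def __init__(self):
        self.replicas = [Replica(), Replica()]

    @property
    def fault_tolerant(self):
        # Redundancy exists only while both replicas are healthy.
        return all(r.healthy for r in self.replicas)

    def step(self, value):
        live = [r for r in self.replicas if r.healthy]
        if not live:
            raise RuntimeError("both replicas have failed")
        results = [r.step(value) for r in live]
        return results[0]

    def fail(self, index):
        self.replicas[index].healthy = False

    def replace(self, index):
        survivor = self.replicas[1 - index]
        if not survivor.healthy:
            raise RuntimeError("no healthy replica to resynchronize from")
        fresh = Replica()
        fresh.state = survivor.state  # resynchronize the new part from the survivor
        self.replicas[index] = fresh


# Usage: one replica fails mid-run, service continues, then the part is replaced.
pair = LockstepPair()
pair.step(1)
pair.step(2)
pair.fail(0)
degraded = pair.fault_tolerant        # redundancy is lost for now
during_fault = pair.step(3)           # work continues on the survivor
pair.replace(0)
restored = pair.fault_tolerant        # redundancy is back
after_repair = pair.step(4)
```

Because the surviving replica already holds identical state, there is no switchover or replay step in this model; the only change during the fault is that redundancy is temporarily unavailable.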

Can fault-tolerant server vendors provide multi-socket solutions? The answer is yes. A technical consultant at US Fault Tolerance said that fault tolerance for multi-socket servers faces no technical barrier. Historically, US Fault Tolerance once offered a multi-socket solution based on Proteus processors. The main reason it does not currently offer a fault-tolerant x86 multi-socket server is price, which users would often find hard to bear. The consultant pointed out that a dual-socket server with 8-core processors, 16 cores in total, is now comparable in performance to earlier 16-core systems and meets most users' needs, so from the user's point of view the fault-tolerant solution has no performance bottleneck. He said that not offering a multi-socket product is a choice of market strategy rather than a technical limitation; in other words, multi-socket fault tolerance is simply not necessary.

The consultant added that users' worries about performance sometimes stem not from performance itself but from reliability. As products, multi-socket servers are more reliable than dual-socket servers, which is why users prefer them. The reliability of a fault-tolerant solution, however, does not depend on the reliability of the individual product; it comes from the system architecture. With current technology, the probability that the same functional component fails in both dual-socket servers at the same time is very low, so a fault-tolerant server can fully meet users' reliability requirements.
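The reasoning about simultaneous failure can be made concrete with a small calculation. This sketch assumes that the two servers fail independently, which the article does not state, and the probability used in the usage lines is a hypothetical figure chosen only for illustration, not a measured failure rate.

```python
def both_fail_probability(p_single: float) -> float:
    """Probability that the same component fails in both servers in one time window.

    Assumes the two failures are independent (an assumption made for this sketch).
    """
    if not 0.0 <= p_single <= 1.0:
        raise ValueError("p_single must be between 0 and 1")
    return p_single * p_single


# Usage with a hypothetical per-server failure probability for a given window.
p_single = 0.001
p_both = both_fail_probability(p_single)
print(f"single server: {p_single:.6f}, both at once: {p_both:.9f}")
```

Under the independence assumption, the combined probability is the square of the single-server probability, so it is always smaller than either server's own failure probability.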

Beyond fault tolerance, the solution also provides trusted computing. The consultant noted that processor computations are generally not faulty. As an electronic device, however, a processor is inevitably affected by various factors that can occasionally cause a high or low logic level to be misjudged. Computers rely on high and low levels to determine "0" or "1", and when such an error occurs it goes unnoticed. In other words, computers also make mistakes. A fault-tolerant system uses lock-step technology to compare the computing results of the two devices and accepts only results that agree, so it effectively guards against these unexpected errors. This capability is unique to the fault-tolerant solution. For users of fault-tolerant systems, trusted computing is an added benefit that comes on top of high reliability.
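The comparison step can be sketched in software as follows. This is an illustrative model only: `run_in_lockstep`, `LockstepMismatch`, and the simulated bit flip in `square_with_bit_flip` are hypothetical, and actual lock-step comparison happens in hardware rather than in application code.

```python
"""Toy model of result comparison between two computing units."""


class LockstepMismatch(Exception):
    """Raised when the two units disagree, so the result is not accepted."""


def square(x):
    return x * x


def square_with_bit_flip(x, flip_on=3):
    # Simulates a silent logic-level error: one bit of the result is
    # inverted for a single input value.
    result = x * x
    if x == flip_on:
        result ^= 1 << 2
    return result


def run_in_lockstep(unit_a, unit_b, inputs):
    """Run both units on each input and accept only results that agree."""
    accepted = []
    for x in inputs:
        a = unit_a(x)
        b = unit_b(x)
        if a != b:
            raise LockstepMismatch(f"input {x}: units returned {a} and {b}")
        accepted.append(a)
    return accepted


# Usage: two healthy units agree; a unit with a flipped bit is caught.
clean = run_in_lockstep(square, square, [1, 2, 3])
try:
    run_in_lockstep(square, square_with_bit_flip, [1, 2, 3])
    detected = False
except LockstepMismatch:
    detected = True
```

A single unit running `square_with_bit_flip` alone would return a wrong value with no indication of trouble; the error becomes visible only because a second unit computes the same result independently.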
