Half-Year Development History of the Video Storage Server (2)

Source: Internet
Author: User

I personally think the most important quality of a server is stability, stability, and stability. The previous section mainly discussed program stability at the code level. This section looks at other mechanisms that further strengthen system stability.

To further improve the stability of a server program, we often write a separate guard program, also known as a watchdog. The principle is as follows. At startup, the service program registers with the watchdog, telling the dog that there is a door to guard. The service program then sends the watchdog a message at a regular interval. This is the so-called feeding the dog, and it tells the dog that the door is still safe. If the service program has not fed the dog for too long, the watchdog concludes that the service is in trouble (it has deadlocked, or exited because of a bug). The watchdog then bites: it forcibly kills the old service process and starts a new one. Of course, all of this assumes that the watchdog itself is stable. Since it is only a few hundred lines of code, it can be trusted after some testing.
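The register / feed / bite cycle can be sketched in a few lines of Python. This is only an illustration of the bookkeeping, not the guard program described here: the class name `Watchdog`, the `timeout_s` parameter, and the `restart` callback are all assumptions made for the example. In a real deployment the feed message would arrive over a pipe or socket, and `restart` would kill the old process and spawn a new one.

```python
import time


class Watchdog:
    """Tracks heartbeats ("feeds") from registered services (hypothetical sketch)."""

    def __init__(self, timeout_s, clock=time.monotonic):
        self.timeout_s = timeout_s   # how long a service may stay silent
        self.clock = clock           # injectable so the logic can be tested
        self.last_feed = {}          # service name -> time of last heartbeat

    def register(self, name):
        # The service tells the dog there is a door to guard.
        self.last_feed[name] = self.clock()

    def feed(self, name):
        # Periodic heartbeat: the door is still safe.
        if name not in self.last_feed:
            raise KeyError(f"service {name!r} is not registered")
        self.last_feed[name] = self.clock()

    def check(self, restart):
        """Call restart(name) for every service that missed its deadline."""
        now = self.clock()
        bitten = []
        for name, last in list(self.last_feed.items()):
            if now - last > self.timeout_s:
                restart(name)                # assumed to kill and respawn the service
                self.last_feed[name] = now   # the new process gets a fresh deadline
                bitten.append(name)
        return bitten
```

The watchdog would call `check` in its own loop, for example once per second. Using a monotonic clock rather than wall-clock time keeps the timeout from misfiring when the system time is adjusted.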

What counts as a stable storage server? The usual requirement is 99.9999%, that is, the packet loss rate must be kept under control (at that figure, no more than about one packet in a million may be lost). This is an arduous goal. Packets can be lost at the front-end encoder, at the central switch, on the storage server, or at any port or network link along the way. The storage server cannot solve all of these problems, but it can know whether packets are being lost and what the loss rate is. So the server should compute: packet loss rate = number of lost packets / total number of packets. When the loss rate is too high, it should raise an alarm promptly. My approach is to check the packet loss rate once a minute and send an alarm if it is too high.
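As a rough illustration of the per-minute check, the sketch below counts lost packets from gaps in sequence numbers and reports the loss rate for each window. The text does not say how loss is detected, so the sequence-number approach is an assumption, as are the names `LossMonitor`, `on_packet`, `close_window`, and `threshold`. Sequence-number wraparound is not handled.

```python
class LossMonitor:
    """Per-window packet loss counter (hypothetical sketch).

    Assumes each packet carries an increasing sequence number, so a gap
    between consecutive numbers means packets were lost upstream.
    """

    def __init__(self, threshold):
        self.threshold = threshold   # alarm when loss rate exceeds this
        self.received = 0
        self.lost = 0
        self.expected_seq = None     # next sequence number we expect

    def on_packet(self, seq):
        if self.expected_seq is not None and seq < self.expected_seq:
            return  # late or duplicate packet: ignored in this sketch
        if self.expected_seq is not None:
            self.lost += seq - self.expected_seq  # size of the gap, 0 if none
        self.received += 1
        self.expected_seq = seq + 1

    def close_window(self):
        """Return (loss_rate, alarm) for the window and reset the counters."""
        total = self.received + self.lost
        rate = self.lost / total if total else 0.0
        self.received = 0
        self.lost = 0
        return rate, rate > self.threshold
```

A timer would call `close_window` once a minute and send the alarm whenever the second return value is true. For the 99.9999% target, `threshold` would be `1e-6`.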

There are probably many other good software mechanisms as well.

Other mechanisms: see the appendix, which mainly covers hardware.

Appendix:


A server system must run for long periods at high speed and with high reliability. It cannot simply be powered off, shut down, or stopped. Even when a fault occurs, it must recover quickly, ideally automatically. Therefore, when designing a server, you must consider the efficiency, stability, reliability, and availability of the entire system architecture.


So what measures ensure the security and reliability of a server system? In fact, ever since servers first appeared, engineers have come up with many ways to improve their reliability. Some methods are still being refined, and new ones are still being developed. The following are only some of them:

1. Start with the processing capability of the bus and processor in the server hardware. The server system bus has evolved from 16-bit and 32-bit to today's 64-bit. Local I/O bus technology (such as AGP and PCI-Express) keeps improving. Other advances include the application of SMP (symmetric multiprocessing) and DP (dual-processor) technology, the development of hardware redundancy and load balancing, and progress in large-capacity memory checking, error correction, and dedicated memory technology.

2. Improved server hardware design. The hardware is highly modular, which makes fault diagnosis and maintenance easier. Key components are redundant, for example dual power supplies and dual CPUs (dual CPUs also improve performance). Other measures include high-power cooling systems and fault warning indicator lights.

3. High-speed, large-capacity, multi-disk storage. This includes support for high-speed SCSI hard drives and RAID technology, along with array cards and optical communication devices. External disk expansion array cabinets meet the need for large storage capacity and improve storage I/O performance. Intelligent arrays help ensure data security and integrity. A local RAID 1 pair of hard drives protects the OS against the failure of a single disk.

4. Support for clustering, hot standby, and load balancing. Clustering and load balancing give the server system overall fault tolerance and load-carrying capacity, so we worry far less about service outages or even system crashes caused by unexpected server faults or sudden traffic spikes.

5. System backup and disaster tolerance. High-performance backup software can back up the system so that the software stack (OS, database system, email system, financial software, etc.) can be restored promptly. Remote disaster tolerance and application-level disaster tolerance reduce the damage caused by data loss and improve the efficiency of disaster recovery.

 
