(3) C++ Distributed Real-Time Application Framework: System Management Module



 

Previous article: (2) ZeroMQ-based real-time communication platform

 

A distributed real-time system cluster can easily run to hundreds of machines. Such a cluster is effectively a "closed" system: an operator cannot log in to each machine and operate it individually, and traditional manual O&M has long been unable to meet these needs. Every operation on the cluster, or on a single node in it, must therefore go through interfaces provided by the system itself. A commercial distributed real-time system has to answer several questions: how to cope with sudden business peaks; how to detect faulty nodes in the cluster promptly so they can be dealt with; how to balance and adjust load across nodes with different processing capabilities; how to apply overload protection before heavy pressure crashes the system; and how to support phased release, with test containers and production containers running on the same network. Solving these problems is the job of the system management module, and they are also key indicators of whether a system is commercially usable and intelligent.

 

The system management module is divided into two parts: SmartService and SmartManger. SmartService exposes RESTful query and operation interfaces for the cluster to external callers, so management front ends on PCs, iOS, and Android can connect to it easily. The framework also offers simple secondary-development interfaces for adding system-specific interfaces, such as adjusting the log level, tracing logs for a single number, managing cluster configuration, and querying the cluster's real-time topology. For clusters of hundreds of machines, manual maintenance is no longer realistic; automatic detection and autonomous O&M become the key, and SmartManger's automatic load management provides them. In addition, the system management module works together with the status center and the communication platform.
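To illustrate the secondary-development idea, here is a minimal sketch of how system-specific interfaces might be registered with a RESTful dispatcher. `RestRouter`, `Request`, `Response`, and the `/log/level` path are hypothetical names chosen for illustration; HTTP parsing is omitted, and this is not SmartService's actual API.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <string>
#include <utility>

// Hypothetical request/response types. A real service would fill these
// from an HTTP request; parsing is out of scope for this sketch.
struct Request {
    std::string method;  // e.g. "GET", "PUT"
    std::string path;    // e.g. "/log/level"
    std::string body;    // raw payload, e.g. "debug"
};

struct Response {
    int status = 200;
    std::string body;
};

using Handler = std::function<Response(const Request&)>;

// Maps (method, path) pairs to handlers, so system-specific interfaces
// can be registered without changing the dispatch code.
class RestRouter {
public:
    void add(const std::string& method, const std::string& path, Handler h) {
        assert(!path.empty() && path[0] == '/');
        routes_[{method, path}] = std::move(h);
    }

    Response dispatch(const Request& req) const {
        auto it = routes_.find({req.method, req.path});
        if (it == routes_.end()) {
            return {404, "no such interface"};
        }
        return it->second(req);
    }

private:
    std::map<std::pair<std::string, std::string>, Handler> routes_;
};
```

A custom "adjust log level" interface would then be a single `add("PUT", "/log/level", ...)` call whose handler updates the logger, while the dispatch code stays unchanged.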

The following describes the features in detail:

I. Automatic Load Management

Using the latency, type, traffic, and other information reported by each business container node, the module combines data from all nodes in the cluster to determine whether any of the following situations currently exists, and responds accordingly:

1. A container has failed and can no longer process business normally: remove the faulty node from the network.

2. A container lacks processing capability and its service processing times out: throttle the traffic sent to that node.

3. Containers of a certain type lack processing capability, and containers of that type are timing out: scale out containers of that type.

4. Containers of a certain type have surplus processing capacity, and their traffic meets the scale-in conditions: scale in containers of that type.

5. The cluster's processing capacity has reached its limit and the system may crash: apply overload protection to the cluster.
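The decision step above could look roughly like the following sketch. It assumes each node reports a health flag, an average delay, and a utilization figure; `NodeReport`, `evaluate`, and all threshold values are hypothetical, and the real module presumably weighs more information (traffic, message types) than this. For simplicity, cluster-wide overload takes precedence over per-type scaling, and situation 2 is told apart from situation 3 by whether only some or all containers of a type are timing out.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Hypothetical per-node report; fields and thresholds are illustrative
// assumptions, not the framework's actual schema.
struct NodeReport {
    std::string id;
    std::string type;    // container type, e.g. "auth"
    bool healthy;        // false once the node reports an unrecoverable fault
    double delay_ms;     // average processing delay
    double utilization;  // fraction of the node's capacity in use, 0.0 to 1.0
};

enum class Action { RemoveNode, ThrottleNode, ScaleOutType, ScaleInType, OverloadProtect };

struct Decision {
    Action action;
    std::string target;  // node id, container type, or "cluster"
};

struct Thresholds {
    double timeout_ms = 500.0;   // delay above this counts as a processing timeout
    double scale_in_util = 0.2;  // average type utilization below this allows scale-in
    double cluster_limit = 0.9;  // average cluster utilization treated as the limit
};

std::vector<Decision> evaluate(const std::vector<NodeReport>& nodes,
                               const Thresholds& t = Thresholds{}) {
    assert(t.timeout_ms > 0.0);
    std::vector<Decision> out;
    std::map<std::string, std::vector<const NodeReport*>> by_type;
    double total_util = 0.0;
    int healthy = 0;

    for (const NodeReport& n : nodes) {
        if (!n.healthy) {  // situation 1: the faulty node leaves the network
            out.push_back({Action::RemoveNode, n.id});
            continue;
        }
        by_type[n.type].push_back(&n);
        total_util += n.utilization;
        ++healthy;
    }

    // Situation 5: the whole cluster is at its limit, so protect it first.
    if (healthy > 0 && total_util / healthy >= t.cluster_limit) {
        out.push_back({Action::OverloadProtect, "cluster"});
        return out;
    }

    for (const auto& entry : by_type) {
        const std::vector<const NodeReport*>& group = entry.second;
        std::size_t slow = 0;
        double util = 0.0;
        for (const NodeReport* n : group) {
            if (n->delay_ms > t.timeout_ms) ++slow;
            util += n->utilization;
        }
        if (slow == group.size()) {  // situation 3: the whole type is timing out
            out.push_back({Action::ScaleOutType, entry.first});
        } else if (slow > 0) {  // situation 2: only individual nodes are slow
            for (const NodeReport* n : group)
                if (n->delay_ms > t.timeout_ms) out.push_back({Action::ThrottleNode, n->id});
        } else if (group.size() > 1 && util / group.size() < t.scale_in_util) {
            out.push_back({Action::ScaleInType, entry.first});  // situation 4
        }
    }
    return out;
}
```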

 

 

II. Automatic Removal of Faulty Nodes from the Network

When a business node encounters an unrecoverable fault, it can no longer process business normally. The system management module automatically detects the faulty node and removes it from the business cluster, so that the rest of the cluster keeps operating normally.
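One common way to detect such a node is a heartbeat timeout combined with self-reported faults. The sketch below assumes that mechanism; the article does not say how the module actually detects faults, and `HealthMonitor`, its methods, and the timeout value are hypothetical.

```cpp
#include <cassert>
#include <chrono>
#include <string>
#include <unordered_map>
#include <vector>

using Clock = std::chrono::steady_clock;

// Tracks the last heartbeat of each business node. A node counts as faulty
// if it reports a fault itself or sends no heartbeat for longer than
// `timeout`. Both criteria are assumptions made for this sketch.
class HealthMonitor {
public:
    explicit HealthMonitor(Clock::duration timeout) : timeout_(timeout) {
        assert(timeout > Clock::duration::zero());
    }

    void heartbeat(const std::string& node, Clock::time_point now, bool faulty = false) {
        nodes_[node] = State{now, faulty};
    }

    // Returns the nodes that should leave the business cluster and stops tracking them.
    std::vector<std::string> sweep(Clock::time_point now) {
        std::vector<std::string> removed;
        for (auto it = nodes_.begin(); it != nodes_.end();) {
            const State& s = it->second;
            if (s.faulty || now - s.last_seen > timeout_) {
                removed.push_back(it->first);
                it = nodes_.erase(it);
            } else {
                ++it;
            }
        }
        return removed;
    }

    bool tracked(const std::string& node) const { return nodes_.count(node) > 0; }

private:
    struct State {
        Clock::time_point last_seen;
        bool faulty;
    };
    Clock::duration timeout_;
    std::unordered_map<std::string, State> nodes_;
};
```

Passing the current time in explicitly, rather than reading the clock inside `sweep`, keeps the detection logic deterministic and easy to test.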

 

III. Node Traffic Control

When a node lacks sufficient processing capability (for example, while it is performing log tracing), the system management module can reduce the number of messages sent to it according to its processing capability, balancing load in real time.
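Reducing the share of messages a node receives according to its capability can be done with weighted load balancing. The sketch below uses smooth weighted round-robin (the scheme nginx uses for weighted upstream servers) as a swapped-in technique, not necessarily what the framework does; `WeightedDispatcher` and the weights are hypothetical. Lowering a node's weight, for example while it traces logs, lowers the fraction of messages routed to it.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Distributes messages across nodes in proportion to their weights
// (processing capability) using smooth weighted round-robin.
class WeightedDispatcher {
public:
    void add(const std::string& node, int weight) {
        assert(weight >= 0);
        peers_.push_back({node, weight, 0});
    }

    // Lowers (or restores) a node's share, e.g. while it is busy tracing logs.
    void set_weight(const std::string& node, int weight) {
        assert(weight >= 0);
        for (Peer& p : peers_) {
            if (p.name == node) p.weight = weight;
        }
    }

    // On each call, every eligible peer adds its weight to a running score;
    // the highest score wins and gives back the sum of all weights. With fixed
    // weights, a cycle of (sum of weights) picks chooses each node `weight` times.
    // Returns an empty string if no node has a positive weight.
    std::string next() {
        Peer* best = nullptr;
        int total = 0;
        for (Peer& p : peers_) {
            if (p.weight == 0) continue;
            p.current += p.weight;
            total += p.weight;
            if (best == nullptr || p.current > best->current) best = &p;
        }
        if (best == nullptr) return {};
        best->current -= total;
        return best->name;
    }

private:
    struct Peer {
        std::string name;
        int weight;
        int current;
    };
    std::vector<Peer> peers_;
};
```

Compared with plain weighted round-robin, the smooth variant interleaves picks (for weights 5, 1, 1 it yields a a b a c a a) instead of sending bursts of consecutive messages to the heaviest node.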

 

 

IV. Dynamic Scaling

When a certain type of business container lacks processing capability, the system automatically scales that type out online, without affecting the business during the expansion. When there is surplus processing capability, the system automatically scales in online, freeing resources for the services that need them.
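A scaling decision for one container type might be sketched as a pair of utilization thresholds plus minimum and maximum container counts. `ScalePolicy`, `desired_replicas`, and the threshold semantics are assumptions for illustration; the article does not specify the conditions SmartManger uses.

```cpp
#include <cassert>

// Hypothetical scaling policy for one container type.
struct ScalePolicy {
    int min_replicas;
    int max_replicas;
    double scale_out_util;  // average utilization above which a container is added
    double scale_in_util;   // average utilization below which one is removed
};

// Returns the target container count for one type. The gap between the two
// thresholds acts as hysteresis, so the type does not flap between sizes.
int desired_replicas(const ScalePolicy& p, int current, double avg_util) {
    assert(p.min_replicas >= 1 && p.min_replicas <= p.max_replicas);
    assert(p.scale_in_util < p.scale_out_util);
    if (avg_util > p.scale_out_util && current < p.max_replicas) {
        return current + 1;
    }
    if (avg_util < p.scale_in_util && current > p.min_replicas) {
        // Shrink only if the remaining containers would not immediately
        // exceed the scale-out threshold (assumes load spreads evenly).
        double projected = avg_util * current / (current - 1);
        if (projected <= p.scale_out_util) return current - 1;
    }
    return current;
}
```

When `current` already equals `max_replicas` and utilization stays high, the type cannot grow any further; that is the situation in which overload protection becomes relevant.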

V. Node Overload Protection

When the processing capacity of the entire cluster has reached its limit and further expansion is not allowed, overload protection can be applied according to business conditions to prevent a system crash, for example by discarding initial authentication requests.
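The example policy of discarding initial authentication requests could be sketched as an admission check at the cluster's entry point. `OverloadGuard`, `RequestKind`, and the utilization limit are hypothetical; the sketch only illustrates the idea that requests which would start new sessions are shed first, while requests for sessions already in progress continue to be served.

```cpp
#include <cassert>
#include <cstdint>

enum class RequestKind { InitialAuth, InSession };

// Hypothetical admission check at the cluster's entry point. Once cluster
// utilization reaches the limit, requests that would start new sessions are
// discarded, while requests belonging to existing sessions are still served.
class OverloadGuard {
public:
    explicit OverloadGuard(double limit) : limit_(limit) {
        assert(limit > 0.0 && limit <= 1.0);
    }

    // Fed periodically with the cluster-wide utilization (0.0 to 1.0).
    void update(double cluster_util) { util_ = cluster_util; }

    bool admit(RequestKind kind) {
        if (util_ >= limit_ && kind == RequestKind::InitialAuth) {
            ++dropped_;
            return false;
        }
        return true;
    }

    std::uint64_t dropped() const { return dropped_; }

private:
    double limit_;
    double util_ = 0.0;
    std::uint64_t dropped_ = 0;
};
```

Shedding initial requests first is a deliberate choice: a rejected new request can simply be retried later, whereas breaking a session already in progress wastes the work done for it so far.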

 

VI. Phased (Gray) Release

The system supports gray release, which allows test nodes and normal business nodes to run on the same network. Requests for test numbers are routed to the test nodes for processing, without affecting any other (normal) numbers.
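Number-based gray routing can be sketched as a lookup of the request's number in a set of test numbers, followed by selection from the matching node pool. `GrayRouter`, its round-robin choice within each pool, and the fallback to normal nodes when no test node is deployed are all assumptions for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <unordered_set>
#include <utility>
#include <vector>

// Routes each request by the number it carries: numbers on the test list go
// to the gray (test) nodes, everything else goes to the normal nodes.
class GrayRouter {
public:
    GrayRouter(std::vector<std::string> normal, std::vector<std::string> test)
        : normal_(std::move(normal)), test_(std::move(test)) {
        assert(!normal_.empty());
    }

    void add_test_number(const std::string& number) { test_numbers_.insert(number); }

    // Picks a node round-robin within the chosen pool. If no test node is
    // deployed, test numbers fall back to the normal pool.
    std::string route(const std::string& number) {
        const bool to_test = !test_.empty() && test_numbers_.count(number) > 0;
        const std::vector<std::string>& pool = to_test ? test_ : normal_;
        std::size_t& cursor = to_test ? test_next_ : normal_next_;
        std::string node = pool[cursor % pool.size()];
        ++cursor;
        return node;
    }

private:
    std::vector<std::string> normal_;
    std::vector<std::string> test_;
    std::unordered_set<std::string> test_numbers_;
    std::size_t normal_next_ = 0;
    std::size_t test_next_ = 0;
};
```

Because the routing key is the number rather than a random fraction of traffic, a tester sees consistent behavior on every request, and normal numbers never reach the test nodes.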

 

To be continued...

 

Technical exchange and cooperation QQ group: 436466587. Discussion is welcome!
