Cluster Management essentials

Last Update:2015-10-26 Source: Internet

Author: User

Chapter One: RD and OP are actually writing the same distributed system

1. Every application is part of a cluster, and every RD (developer) has their own approach to cluster management

Some designs are very simple: a configuration file from which the IP and port of the database are read.

Some designs are very complex: they use a name service such as ZooKeeper, monitor themselves, deploy their own code, do service discovery, and so on.

RD rarely consider operations. From the RD point of view, cluster management is good enough as soon as the program can run.

RD cannot ignore cluster management entirely, because without it the program cannot run on its own.

RD also cannot think too much about cluster management, because most applications are open source and cannot assume a particular operations stack.

2. Every operations system is a patch on the applications, and every OP (operations engineer) is cleaning up after RD

Operations systems are no different from distributed applications. An operations system is really an adapter: it models each application it takes on as the same kind of distributed application.

OpDev are not really writing operations automation systems; they are writing the cluster management module of a distributed system.

OP are not really onboarding systems; they are fitting a variety of chaotic, unfinished distributed applications onto the operations model.

The result is a pile of systems that all differ from one another, each stitched together from unfinished work done separately by people in two roles, RD and OP.

=================

Chapter Two: Scenario-oriented, process-based thinking

Release: bringing a new version online

Change: any modification to an existing version

Configuration update: a kind of change; modify a configuration and use some interface to make it take effect

Scaling out and in: a kind of change; adding or removing machines

Opening a zone: a kind of change; adding a set for the business

Fault handling: a kind of change; repairing a faulty machine in production

Process, file: needs a common file transfer mechanism

Process, script execution: needs a common mechanism for getting IPs and executing scripts

Process, API call: needs a common mechanism for calling the APIs of cloud services

Process, composition: every action in every scenario gets its own process. A larger composition is assembled from a bunch of processes.

The result is a whole pile of scripts that cannot be reused, reviewed, or verified. The weakness of Bash, Ant, and similar languages is not the only problem, and a shortage of people to write scripts is not the only problem. The real problem is why so many scripts, most of them repetitive, are needed in the first place.

This way of thinking does not scale.

==============

Chapter Three: Modeling the state

Idea: model the state. Compare the actual state with the expected state to derive the actions.

The idea is good, but in practice it turns out to be full of pitfalls:

Problem 1, the state is described top-down: from how many processes run on each machine at the top, down to which files exist and what they contain at the bottom.

Problem 2, the whole cluster is described statically: you need to describe how many IP addresses there are and what is deployed on each IP, and the dependencies in every configuration file are fixed statically.

Problem 3, the actions are very hard to get right: the deployment scripts cannot be tested. Often a script fails on a clean machine even though it succeeds on my machine.

Problem 4, it runs too slowly and unreliably: it needs to download a bunch of things from the external network, which is very slow. Even without the downloads, running a bunch of apt-get commands is never going to be fast.
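As a rough illustration of this idea, the sketch below compares an actual state with an expected state and derives the actions needed to close the gap. The state model (service name mapped to instance count), the function name `diff_states`, and the service names are all assumptions made for this example, not part of any specific tool.

```python
from typing import Dict, List, Tuple

# Hypothetical state model: service name -> number of running instances.
State = Dict[str, int]
Action = Tuple[str, str, int]  # (verb, service, count)


def diff_states(actual: State, expected: State) -> List[Action]:
    """Compare actual and expected state and return the actions that close the gap."""
    actions: List[Action] = []
    # Walk every service that appears in either state, in a stable order.
    for service in sorted(set(actual) | set(expected)):
        have = actual.get(service, 0)
        want = expected.get(service, 0)
        if want > have:
            actions.append(("start", service, want - have))
        elif want < have:
            actions.append(("stop", service, have - want))
    return actions


# Illustrative data only.
actual = {"web": 2, "cache": 1, "legacy": 1}
expected = {"web": 4, "cache": 1, "worker": 2}
plan = diff_states(actual, expected)
for verb, service, count in plan:
    print(f"{verb} {count} x {service}")
```

Nobody writes a "scale out" or "retire service" script here; the actions fall out of the difference between the two states, and running the comparison again once the states match yields an empty plan.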

================

Chapter Four: Docker

The problem Docker solves is that state can be described at a much coarser granularity. There is no need to go down to the file level; all the dependencies of a process can be packaged into a black box. Whether it uses apt-get or yum inside no longer matters.

================

Chapter Five: Name service, service registration, service discovery

What SmartStack does is, in fact, unify the name service. It builds a name service at the process and service level (most operations setups start from a CMDB, which works at the IP level). Once the names of all services are unified in one place, network dependency problems can be solved.
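To make the difference between an IP-level inventory and a service-level name service concrete, here is a toy in-memory registry. It is only a sketch under assumptions: the `NameService` class, its method names, and the `orders-db` service are hypothetical, and this is not how SmartStack itself is implemented.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

Endpoint = Tuple[str, int]  # (ip, port)


class NameService:
    """Toy registry keyed by service name rather than by IP (hypothetical API)."""

    def __init__(self) -> None:
        self._instances: Dict[str, Set[Endpoint]] = defaultdict(set)

    def register(self, service: str, ip: str, port: int) -> None:
        """Service registration: an instance announces itself under a name."""
        self._instances[service].add((ip, port))

    def deregister(self, service: str, ip: str, port: int) -> None:
        """Remove an instance, for example after a failed health check."""
        self._instances[service].discard((ip, port))

    def discover(self, service: str) -> List[Endpoint]:
        """Service discovery: callers ask for a name, never for a fixed IP."""
        return sorted(self._instances.get(service, set()))


# Illustrative data only.
registry = NameService()
registry.register("orders-db", "10.0.0.5", 3306)
registry.register("orders-db", "10.0.0.6", 3306)
registry.deregister("orders-db", "10.0.0.5", 3306)
endpoints = registry.discover("orders-db")
print(endpoints)
```

Callers depend only on the name `orders-db`, so an instance can move to another machine without any configuration file on the consumer side changing.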

================

Chapter Six: Marathon & Helix

The two tools aim in a similar direction. Instead of describing the expected state in detail, you give the system certain rules and let it determine the best state automatically. Given 5 machines with different loads and 10 processes to place on them, Marathon works out which processes run on which machine.

Helix is similar: tell Helix how many partitions are needed and what state they should be in, and Helix does the assignment.
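The "5 machines, 10 processes" example can be sketched with a simple rule: always put the next process on the least-loaded machine. This is a deliberately naive greedy placement written for illustration; it is not the scheduling algorithm Marathon or Helix actually uses, and the function `place`, the uniform `cost` per process, and the machine names are assumptions.

```python
import heapq
from typing import Dict, List


def place(loads: Dict[str, float], processes: List[str], cost: float = 1.0) -> Dict[str, List[str]]:
    """Assign each process to whichever machine is least loaded at that moment."""
    # Min-heap of (current load, machine name); ties break on the name.
    heap = [(load, machine) for machine, load in loads.items()]
    heapq.heapify(heap)
    assignment: Dict[str, List[str]] = {machine: [] for machine in loads}
    for proc in processes:
        load, machine = heapq.heappop(heap)
        assignment[machine].append(proc)
        # Assume every process adds the same load to its machine.
        heapq.heappush(heap, (load + cost, machine))
    return assignment


# Illustrative data only: five machines with different starting loads.
loads = {"m1": 0.0, "m2": 1.0, "m3": 2.0, "m4": 3.0, "m5": 4.0}
processes = [f"p{i}" for i in range(10)]
assignment = place(loads, processes)
for machine, procs in assignment.items():
    print(machine, procs)
```

The operator states only the rule and the inputs; the per-machine layout is computed, so adding a machine or a process means rerunning the placement rather than editing a static description.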

==================

Chapter Seven: DETC

Current state: a pile of systems that all differ from one another, each stitched together from unfinished work done separately by people in two roles, RD and OP. These systems are managed with a large number of scripts written with process-based thinking.

Expected state: a set of systems that, although functionally different, are managed in a completely consistent manner. The cluster management work now carried by every RD and OP is taken over and handled holistically. Modeling follows state-oriented thinking; scripts still exist, but they are atomic and reusable.

This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only. This website makes no representation or warranty of any kind, either expressed or implied, as to the accuracy, completeness ownership or reliability of the article or any translations thereof. If you have any concerns or complaints relating to the article, please send an email, providing a detailed description of the concern or complaint, to info-contact@alibabacloud.com. A staff member will contact you within 5 working days. Once verified, infringing content will be removed immediately.
