Chapter One: RD and OP are actually writing the same distributed system
1. Every application is part of a cluster, and every RD (developer) has their own approach to cluster management
Some designs are very simple: read the database's IP and port from a configuration file.
Some designs are very complex: use a name service such as ZooKeeper, do your own monitoring, deploy your own code, do your own service discovery, and so on.
RDs rarely consider operations. From the RD's point of view, cluster management only needs to be good enough for the program to run.
An RD cannot ignore cluster management, because without it the program cannot run on its own.
An RD also cannot invest too much in cluster management, because most applications are open source and cannot assume a particular operations stack.
2. Every operations system is a patch on the applications, and every OP (operations engineer) is cleaning up after the RDs
Operations systems are no different from distributed applications. An operations system is really an adapter: it models every application it takes in as the same kind of distributed application.
Operations developers (opdev) are not really writing an operations automation system; they are writing the cluster management module of a distributed system.
What OP does is not really onboarding applications into a system; it is fitting a variety of chaotic, half-finished distributed applications into the model that operations uses.
The result is a bunch of systems that all differ from one another, each spliced together from half-finished work done separately by people in two roles, RD and OP.
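The adapter idea from point 2 can be sketched as follows. This is only an illustration: the class names (`ClusterView`, `ConfigFileApp`, `RegistryApp`), the `db_ip`/`db_port` keys, and the addresses are all hypothetical. One adapter wraps an application of the "simple" kind (addresses in a configuration file), the other an application of the "complex" kind (addresses looked up in a registry), and both are forced into one uniform model.

```python
from abc import ABC, abstractmethod
from typing import Dict, List


class ClusterView(ABC):
    """The uniform model the operations system wants every application to fit."""

    @abstractmethod
    def members(self) -> List[str]:
        """Return the 'ip:port' addresses the application depends on."""


class ConfigFileApp(ClusterView):
    """Adapter for an application that keeps addresses in a key=value config."""

    def __init__(self, config_text: str) -> None:
        # Parse "key=value" lines; the key names are assumptions for this sketch.
        self._config: Dict[str, str] = dict(
            line.split("=", 1) for line in config_text.splitlines() if "=" in line
        )

    def members(self) -> List[str]:
        return [f"{self._config['db_ip']}:{self._config['db_port']}"]


class RegistryApp(ClusterView):
    """Adapter for an application that looks its peers up in a name registry."""

    def __init__(self, registry: Dict[str, List[str]], service: str) -> None:
        self._registry = registry
        self._service = service

    def members(self) -> List[str]:
        return list(self._registry.get(self._service, []))


# The operations side only ever sees ClusterView, whatever the application does inside.
apps: List[ClusterView] = [
    ConfigFileApp("db_ip=10.0.0.5\ndb_port=3306"),
    RegistryApp({"cache": ["10.0.0.7:6379", "10.0.0.8:6379"]}, "cache"),
]
all_members = [m for app in apps for m in app.members()]
```

Every new application style needs one more adapter class, which is the patching work described above.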
=================
Chapter Two: Scenario-oriented, process-based thinking
Release: putting a new version online
Change: any modification to an existing version
Configuration update: a kind of change; modify a configuration and make it take effect through an interface
Scaling out and in: a kind of change; adding or removing machines
Opening a zone: a kind of change; adding a set for the business
Fault handling: a kind of change; repairing a faulty machine in production
Process step, file transfer: needs a common file transfer mechanism
Process step, script execution: needs a common mechanism for getting IPs and executing scripts
Process step, API call: needs a common mechanism for invoking cloud service APIs
Process, composition: every action in every scenario has its own process. Larger operations are assembled from a bunch of processes.
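The process-based approach listed above can be sketched roughly as follows. This is a minimal illustration, not real tooling: the step builders (`transfer_file`, `run_script`, `call_api`), the file and script names, and the IPs are hypothetical, and each step only records what it would do instead of touching a machine.

```python
from typing import Callable, List

# Each step appends a description to a log instead of acting on real machines.
Log = List[str]
Step = Callable[[Log], None]


def transfer_file(path: str, hosts: List[str]) -> Step:
    """Common step: copy a file to every host (simulated)."""
    def step(log: Log) -> None:
        for host in hosts:
            log.append(f"copy {path} -> {host}")
    return step


def run_script(script: str, hosts: List[str]) -> Step:
    """Common step: run a script on every host (simulated)."""
    def step(log: Log) -> None:
        for host in hosts:
            log.append(f"run {script} on {host}")
    return step


def call_api(action: str) -> Step:
    """Common step: call a cloud service API (simulated)."""
    def step(log: Log) -> None:
        log.append(f"api {action}")
    return step


def compose(*steps: Step) -> Step:
    """A process is a sequence of steps; a bigger process is a sequence of processes."""
    def step(log: Log) -> None:
        for s in steps:
            s(log)
    return step


hosts = ["10.0.0.1", "10.0.0.2"]
release = compose(transfer_file("app-v2.tar.gz", hosts), run_script("restart.sh", hosts))
scale_out = compose(call_api("create-machine"), release)

log: Log = []
scale_out(log)
```

Each scenario (release, scale-out, fault handling, and so on) needs its own composition like `release` or `scale_out`, so the number of processes grows with the number of scenarios.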
The result is a pile of scripts that cannot be reused, reviewed, or verified. The scripting languages (Bash, Ant and so on) are not the real problem, and neither is a shortage of people to write scripts. The real problem is why so many scripts, most of them repeating one another, are needed in the first place.
This way of thinking does not scale.
==============
Chapter Three: Modeling the state
Idea: model the state. Compare the actual state with the expected state to derive the actions.
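A minimal sketch of that comparison, under assumptions that are mine and not the author's: state is modeled as a mapping from host to the processes (and versions) on it, and the only actions are `start`, `stop`, and `upgrade`. Real tools model far more than this.

```python
from typing import Dict, List, Tuple

# Assumed state shape: {host: {process_name: version}}.
State = Dict[str, Dict[str, str]]
Action = Tuple[str, str, str]  # (verb, host, process)


def diff_state(actual: State, expected: State) -> List[Action]:
    """Derive the actions that would move `actual` to `expected`."""
    actions: List[Action] = []
    for host in sorted(set(actual) | set(expected)):
        have = actual.get(host, {})
        want = expected.get(host, {})
        for proc in sorted(set(have) | set(want)):
            if proc not in want:
                actions.append(("stop", host, proc))
            elif proc not in have:
                actions.append(("start", host, proc))
            elif have[proc] != want[proc]:
                actions.append(("upgrade", host, proc))
    return actions


actual = {"10.0.0.1": {"web": "1.0", "cron": "1.0"}, "10.0.0.2": {}}
expected = {"10.0.0.1": {"web": "1.1"}, "10.0.0.2": {"web": "1.1"}}
actions = diff_state(actual, expected)
# actions == [("stop", "10.0.0.1", "cron"),
#             ("upgrade", "10.0.0.1", "web"),
#             ("start", "10.0.0.2", "web")]
```

The operator only edits `expected`; the actions are computed rather than hand-written per scenario.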
The idea is good, but in practice it turns out to be full of pitfalls:
Problem 1: the state is described top-down, from how many processes run on each machine at the top, down to which files exist and what they contain at the bottom.
Problem 2: the whole cluster is described statically. You have to state how many IP addresses there are and what is deployed on each one, and the dependencies in every configuration file are fixed statically.
Problem 3: the actions are very hard to get right. The deployment script cannot be tested. It often fails on a clean machine even though it succeeds on my machine.
Problem 4: it runs slowly and unreliably. It has to download a lot from the public internet, which is very slow. Even without the downloads, running a bunch of apt-get commands is never going to be fast.
================
Chapter Four: Docker
The problem Docker solves is that the state can now be described at a much coarser granularity. Instead of going down to the file level, you package a process together with all of its dependencies into a black box. Whether the contents were installed with apt-get or yum no longer matters.
================
Chapter Five: Name service, service registration, service discovery
What SmartStack does is, in effect, unify the name service. Build a name service at the process and service level (most operations teams start from a CMDB, which works at the IP level), then unify the names of all services into that one place, and the problem of network dependencies can be solved.
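To make "a name service at the service level" concrete, here is a toy in-memory registry. It is a sketch under my own assumptions and is not SmartStack's interface: the `NameService` class, its method names, and the service name `mysql.orders` are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Endpoint = Tuple[str, int]  # (ip, port)


class NameService:
    """In-memory registry keyed by service name rather than by IP."""

    def __init__(self) -> None:
        self._instances: Dict[str, List[Endpoint]] = defaultdict(list)

    def register(self, service: str, ip: str, port: int) -> None:
        """Service registration: an instance announces itself under a name."""
        endpoint = (ip, port)
        if endpoint not in self._instances[service]:
            self._instances[service].append(endpoint)

    def deregister(self, service: str, ip: str, port: int) -> None:
        """Remove an instance, for example after it fails a health check."""
        endpoint = (ip, port)
        if endpoint in self._instances.get(service, []):
            self._instances[service].remove(endpoint)

    def discover(self, service: str) -> List[Endpoint]:
        """Service discovery: callers ask by name and never hard-code an IP."""
        return list(self._instances.get(service, []))


names = NameService()
names.register("mysql.orders", "10.0.0.5", 3306)
names.register("mysql.orders", "10.0.0.6", 3306)
names.deregister("mysql.orders", "10.0.0.5", 3306)
endpoints = names.discover("mysql.orders")
```

Because callers depend on the name `mysql.orders` and not on an address, replacing a machine changes the registry and nothing else.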
================
Chapter Six: Marathon & Helix
The two tools aim in a similar direction. Instead of describing the expected state in detail, you give the system certain rules and let it determine the best state automatically. Given 5 machines with different loads and 10 processes to place on them, Marathon decides for me which processes run on which machine.
Helix is similar: tell Helix how many partitions are needed and what state each should be in, and Helix does the assignment.
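The "5 machines, 10 processes" example can be illustrated with a simple greedy rule: always put the next process on the currently least-loaded machine. This is not Marathon's actual scheduling algorithm; the `place` function, the load numbers, and the fixed per-process cost are assumptions chosen only to show rule-based placement.

```python
import heapq
from typing import Dict, List


def place(loads: Dict[str, float], processes: List[str], cost: float = 1.0) -> Dict[str, List[str]]:
    """Greedy placement: each process goes to the machine with the lowest load."""
    heap = [(load, host) for host, load in loads.items()]
    heapq.heapify(heap)
    placement: Dict[str, List[str]] = {host: [] for host in loads}
    for proc in processes:
        load, host = heapq.heappop(heap)
        placement[host].append(proc)
        # Assume every process adds the same amount of load to its machine.
        heapq.heappush(heap, (load + cost, host))
    return placement


loads = {"m1": 0.0, "m2": 1.0, "m3": 2.0, "m4": 3.0, "m5": 4.0}
processes = [f"p{i}" for i in range(10)]
placement = place(loads, processes)
```

The operator supplies only the rule and the inputs. With these numbers the idle machine `m1` receives four processes and the busiest machine `m5` receives none.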
==================
Chapter Seven: DETC
Status quo: a bunch of systems that all differ from one another, each spliced together from half-finished work done separately by people in two roles, RD and OP. A large number of scripts manage these systems with process-based thinking.
Expected: a bunch of systems that, although functionally different, are managed in a completely consistent manner. Take over the cluster management work now carried by every RD and OP and handle the problem as a whole. Model with state-oriented thinking; scripts still exist, but they are atomic and reusable.
Cluster Management essentials