DockOne WeChat Share (117): Hujiang's Practice of Containerized Operations and Maintenance

"Editor's word" Hujiang current container technology main application scenario: OCS courseware business stateless application; Based on Apache Mesos+marathon implementation Hujiang container system scheduling management; Consul + Consul Template + nginx for automatic discovery and registration of services , Prometheus + Grafana + alertmanager alarm to implement container monitoring alarm. This sharing will be explained in the following ways:
    • Reasons for choosing container technology
    • Container technology selection
    • Container storage
    • Container network
    • Monitoring and alerting
    • Image management
    • Scheduling management
    • Service registration and discovery
    • Automated deployment
    • Automated scaling


"Shenzhen station |3 Day burning brain-type Kubernetes training camp" Training content includes: kubernetes overview, architecture, logging and monitoring, deployment, autonomous driving, service discovery, network solutions, such as core mechanism analysis, advanced article--kubernetes scheduling work principle, Resource management and source code analysis.

Reasons for Choosing Container Technology

    • Lightweight
    • Fast delivery (millisecond-level startup)
    • Standardized environments
    • Flexible deployment, migration, and management
    • High resource utilization
    • Natural fit for CI/CD
    • Multi-cloud platform support
    • Open source


Container Technology Selection

Storage:
    • Local storage (devicemapper direct-lvm)
    • Shared storage (Ceph)


Network:
    • Container-to-container interconnection (overlay)
    • Container-to-physical-network interworking (bridge, host)


Monitoring:
    • Prometheus + cAdvisor + Grafana (no business attributes)
    • Prometheus + Mesos exporter + Grafana (adds business attributes per service, enabling business-level monitoring)


Image management:
    • VMware Harbor


Scheduling Management Platform:
    • Mesos + Marathon


Deployment and release:
    • Jenkins + Marathon Deployment plugin


Automated scaling:
    • Mesosphere marathon-lb-autoscale


Storage

Storage classification:

In terms of storage for logs and shared data, Docker and Kubernetes differ somewhat. Docker, Inc. promotes the volume-driver concept: all storage is accessed through a driver, and local storage and network storage simply use different drivers.
    • Local storage (local disk)
    • Shared storage (Ceph file system)


Storage usage:
    • Local storage: log file persistence (a local volume mapped into the container)
    • Shared storage: state retention and data persistence for stateful services (via ceph-fuse, accessed as if it were local storage)


Storage FAQ:
    • Problem: with devicemapper in loop-lvm mode and container logs not mapped out to the host, a container producing large volumes of logs can drive the container's (or the local volume's) logical space to 100% usage, causing the container to go down abnormally.
    • Solution: persist container logs to local storage, strengthen container storage monitoring (with Docker >= 1.13.0, docker system df shows usage), switch the storage driver to devicemapper direct-lvm, and monitor the underlying logical volumes (Docker-level monitoring first); a configuration sketch follows below.
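For reference, here is a minimal sketch of the direct-lvm configuration and the usage check mentioned above; the thin-pool device path is an assumption and must match the LVM thin pool prepared on the host:

cat /etc/docker/daemon.json   # devicemapper direct-lvm; the thin-pool path is a placeholder
{
  "storage-driver": "devicemapper",
  "storage-opts": [
    "dm.thinpooldev=/dev/mapper/docker-thinpool",
    "dm.use_deferred_removal=true",
    "dm.use_deferred_deletion=true"
  ]
}

docker system df   # per-type disk usage (images/containers/local volumes), Docker >= 1.13.0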


Network

Network classification

There are several ways to enable container communication across hosts.

    • Layer 2: bridge the physical NIC and the container network onto the same Linux bridge. The container network can connect directly to the Linux bridge, or the containers can connect to the docker0 bridge, which is in turn bridged to the Linux bridge, putting the container network and the host network on the same Layer 2 network. Common implementations: host, Macvlan.

    • NAT: NAT is Docker's default networking mode; it uses iptables address translation to enable communication between host IPs and containers. A container's external IP is the host's IP. NAT's performance loss is relatively large, but as long as the hosts can reach each other at Layer 3, the containers can communicate.

    • Tunnel (overlay) mode: VPN, IPIP, VXLAN and so on are all tunneling technologies, which wrap the packets exchanged between containers in one or more extra protocol headers to achieve connectivity. Docker's libnetwork supports VXLAN overlay mode; Weave supports UDP and VXLAN modes, and flannel, Calico and others offer similar options. A global KV store (SDN controller, etcd, Consul, ZooKeeper) is generally required to hold the control information. With this approach the containers can interoperate as long as the hosts are reachable at Layer 3. In overlay mode containers have independent IPs and performance is generally better than NAT, though different overlay schemes vary greatly.

    • Routing: routing (SDN) schemes use route configuration to enable container-to-container and container-to-host communication, for example Calico's BGP routing scheme (non-IPIP). This approach generally applies within a single data center and is most often used inside a single VLAN; across VLANs, routing rules must be configured. The performance loss of routing schemes is very low, close to that of the host network.


Features of mainstream network solutions

Default support (bridge, none, container, host): the network modes Docker supports out of the box. Configuration and management are convenient and no third-party tools are needed, which keeps operational cost down. However, bridge mode has high network performance loss, host mode does not solve the port-conflict problem well in many scenarios, and an IP management tool is needed to allocate IPs and ports.

Overlay (VXLAN): a solution that balances performance and availability. It requires a global KV store and a relatively new kernel (> 3.16). Although packets must be encapsulated and decapsulated, this happens in the kernel, so performance is better than flannel's.

Flannel: by default flannel encapsulates packets over UDP; under high concurrency packet loss occurs, and because encapsulation happens on the client side there is some performance loss. It requires that all hosts be directly routable on the same network. It also causes IP drift and requires frequent encapsulation between source and destination hosts, consuming a large amount of CPU, while sharing the other characteristics of overlay networks.

Calico (BGP): requires the physical network to support the BGP routing protocol, and the container routes are quite intrusive to the physical network devices, especially when sharing those devices with existing core business.

Macvlan: performance is second only to host mode, and the container network and the physical network are fully open to each other, but the configuration bottlenecks of the existing network equipment must be evaluated; in particular, growth in the number of containers introduces problems that do not occur in a traditional network.

Performance comparison of mainstream network solutions

See a performance test report on UCloud cloud hosts: http://cmgs.me/life/docker-network-cloud.

CPU pressure depends mainly on load; the final ranking is host < Calico (BGP) < Calico (IPIP) = Flannel (VXLAN) = Docker (VXLAN) < Flannel (UDP) < Weave (UDP)


A comparison test of Macvlan against a bare NIC on a physical machine:

Network selection

Given the current state of Hujiang's network (zero changes to the physical network) and the fact that this project is still in a trial phase, we use the host and overlay networks: host mode gives SLB containers fixed IP addresses and ports and lets containers communicate with the external network, while the overlay network provides cross-host container connectivity.

Marathon Network configuration:

The overlay network configuration is straightforward: it relies on the distributed KV store ZooKeeper and is configured through the Marathon framework to provide connectivity between containers, and between containers and the physical network.

Overlay Network Architecture:

To create an overlay network:
docker network create -d overlay --subnet <value> <network-name>

Start the Docker daemon with the cluster KV store specified:
--cluster-store=zk://zk1.yeshj.com:2181,zk2.yeshj.com:2181,zk3.yeshj.com:2181,zk4.yeshj.com:2181,zk5.yeshj.com:2181/store
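As a sketch of how this flag fits into the daemon command line (the advertise interface and port are assumptions that depend on the host):

dockerd \
  --cluster-store=zk://zk1.yeshj.com:2181,zk2.yeshj.com:2181,zk3.yeshj.com:2181,zk4.yeshj.com:2181,zk5.yeshj.com:2181/store \
  --cluster-advertise=eth0:2376   # assumed interface:port other daemons use to reach this host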

Marathon configuration:
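As a hedged illustration, attaching a Marathon-launched container to a user-defined overlay network might look like the sketch below; the app id, image, and network name are placeholders, and the exact fields vary with the Marathon version (older versions can also pass the network via Docker parameters):

cat overlay-demo.json   # hypothetical example; id, image and networkName are placeholders
{
  "id": "/overlay-demo",
  "cpus": 0.5,
  "mem": 256,
  "instances": 2,
  "container": {
    "type": "DOCKER",
    "docker": {
      "image": "dockerhub.domain.com/ocs/demo:latest",
      "network": "USER",
      "forcePullImage": true
    }
  },
  "ipAddress": {
    "networkName": "my-overlay"
  }
}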



Note: we ran into quite a few pitfalls with Docker networking; for example, a kernel or Docker version that is too old causes instability. We recommend upgrading the Linux kernel to 4.4 and Docker to 1.12 or later. For network optimization, we plan to study Calico BGP networking and Macvlan in depth, hoping for a larger improvement in network performance.

Monitoring and Alerting

Common Docker Monitoring

cAdvisor, Datadog

Prometheus is ideal for monitoring container-based infrastructure. It has a high-dimensional data model in which a time series is identified by a metric name and a set of key-value pairs, and its flexible query language allows data to be queried and plotted. Advanced metric types such as summaries can build ratios over specified time spans or raise alerts whenever an anomaly occurs, and it has no external dependencies, which makes it a reliable system to debug with during an outage. It is precisely because of its flexible query and graphing statements that we finally chose Prometheus.

For a detailed comparison of monitoring systems, see: http://dockone.io/article/397.

Prometheus Features

    • Container-based basic monitoring; all components run as Docker containers, which makes deployment and maintenance convenient
    • Distributed architecture
    • Data collection scales out and does not depend on distributed storage
    • Supports multiple service-discovery mechanisms (Kubernetes, Consul, EC2, Azure out of the box)
    • Ready-made collectors (exporters) for common services (HAProxy, MySQL, PostgreSQL, Memcached, Redis, etc.)
    • Time-series storage with a very flexible query and graphing language


Monitoring solution selection

Selection: cAdvisor + Prometheus + Grafana

For Docker monitoring we first tried a traditional Zabbix + Python scheme that monitored containers from the physical hosts, but because container IPs change dynamically, the false-alarm rate of that monitoring system was especially high and historical data could not be traced. We then tried the cAdvisor + Prometheus + Grafana approach, which is the monitoring system shown here today.

System Architecture:

Component functions and characteristics

cAdvisor: container data collector. Keyed by container name or ID, it collects basic information about the host's containers such as CPU, memory, network, file system, and container status, and exposes a metrics API that Prometheus scrapes, making the data easy to persist and flexible to query and display.

Prometheus: time-series database with a flexible, SQL-like query language used for graphing.

Grafana: data presentation platform supporting many data sources (Prometheus, Zabbix, Elasticsearch, InfluxDB, OpenTSDB, Graphite, etc.).

Node exporter: host metrics collector. The open-source deployment currently has some problems, so for now we use our existing Zabbix solution for physical machine monitoring instead.

Alertmanager: since Grafana 4.0, alert management is supported, with e-mail, Slack, and other notification channels.
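Since all of the components run as Docker containers, a minimal deployment sketch is shown below; the image tags, ports, and the scrape targets in prometheus.yml are assumptions that would need to match the actual environment:

# cAdvisor on every Mesos slave (collects container metrics; 8080 is its default port)
docker run -d --name=cadvisor -p 8080:8080 \
  -v /:/rootfs:ro -v /var/run:/var/run:rw -v /sys:/sys:ro -v /var/lib/docker/:/var/lib/docker:ro \
  google/cadvisor:latest

# Prometheus scrapes every cAdvisor endpoint (targets below are placeholders)
cat prometheus.yml
scrape_configs:
  - job_name: 'cadvisor'
    static_configs:
      - targets: ['slave01:8080', 'slave02:8080']

docker run -d --name=prometheus -p 9090:9090 \
  -v $PWD/prometheus.yml:/etc/prometheus/prometheus.yml \
  prom/prometheus

# Grafana reads from Prometheus and draws the dashboards
docker run -d --name=grafana -p 3000:3000 grafana/grafana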

Monitoring page

cAdvisor




Grafana


Optimization

Optimization plan: Mesos exporter/metrics + Prometheus + Grafana

By collecting the Mesos masters' metrics centrally and adding a business tag to the container name, containers can be monitored with business attributes attached, enabling monitoring based on business logic.

Image Management

Tool selection

VMware Harbor, an open-source tool

System architecture


Back-end storage for the Docker image registry: Ceph (stores Docker images and MySQL data)

Harbor features

    • Project management: create, delete, modify, query
    • Project member management: image management, user management
    • Remote image repository replication (synchronization)
    • Search across projects, repositories, and images
    • System management
    • User management
    • Replication target management
    • Remote replication policy management
    • Docker client push and pull of images
    • Deletion of repositories and images


Harbor Management Platform Display




Multi-datacenter deployment

Each DC deploys its own Harbor instance, and images are synchronized across DCs via Harbor remote replication:

Office Harbor <=> DC01 Harbor <=> DC02 Harbor

DNS split-horizon (zone) resolution lets each DC pull images from its local Harbor, while push operations always go to the office Harbor.
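As an illustration only, the resulting client-side flow might look like the commands below; the project and image names are hypothetical, while the registry domain matches the one used elsewhere in this article:

# pulls resolve to the local DC's Harbor via split-horizon DNS
docker pull dockerhub.domain.com/ocs/demo:1.0

# pushes land on the office Harbor, which then replicates to the DC Harbors
docker tag demo:1.0 dockerhub.domain.com/ocs/demo:1.0
docker push dockerhub.domain.com/ocs/demo:1.0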

Scheduling management

Scheduling System Selection

    • Docker Swarm: Docker's official and simplest container cluster management tool. It sees less production use, but it is the default choice for many developers and has official community support. Since 1.12.5 it has been integrated into the Docker Engine by default, which simplifies cluster management, and it offers easy-to-understand strategies and filters; however, because it cannot handle node failures well, it is not recommended for real production environments. Swarm integrates well with the Docker environment, uses the same API as the Docker Engine, and works well with Docker Compose, so it is a good fit for developers who do not know much about other schedulers.
    • Kubernetes: the community is the most active and it is developing the fastest. Kubernetes' logic differs from standard Docker, but its concepts of pods and services make developers think about what combinations of containers really mean when using them, which is genuinely interesting. Google offers a very simple way to run Kubernetes on its cluster solution, making Kubernetes a reasonable choice for developers already in the Google ecosystem.
    • Mesos: open-source tooling; Mesos & Marathon form a complete solution. It can schedule tasks like other Mesos frameworks, has strong compatibility, supports plugin integration, is flexible, has a large base of enterprise users, and is the most stable of the three. It uses a JSON task description similar to Docker Compose for task configuration. These features make it an excellent solution for running containers on a cluster.


By comparing the characteristics of the three scheduling frameworks, we chose Mesos + Marathon as our container cluster resource management and scheduling solution.

System architecture


Scheduling system Components

Mesos: Apache's open-source unified resource management and scheduling platform, often called the kernel of distributed systems. It provides failure detection, task publishing, task tracking, task monitoring, low-level resource management, and fine-grained resource sharing, and can scale to thousands of nodes.

Marathon: a framework for managing long-running applications on Apache Mesos. It implements service discovery, provides a REST API for deployments with authentication and SSL, supports configuration constraints, and achieves service discovery and load balancing through HAProxy (or third-party tools such as Consul).

Chronos: a framework for managing short, scheduled, one-off tasks on Apache Mesos. It is a fault-tolerant job scheduler that handles dependencies and ISO 8601-based schedules, an open-source replacement for cron. Jobs can be orchestrated, and Mesos is used as the job executor.

ZooKeeper: an open-source distributed application coordination service, an open-source implementation of Google's Chubby and an important component of Hadoop and HBase. It provides consistency services for distributed applications, including configuration maintenance, naming, distributed synchronization, and group services.

Mesos + Marathon features

    • High availability: supports multiple master nodes with automatic failover (active/standby mode)
    • Supports multiple container runtimes (Docker and the native Mesos containerizer)
    • Supports stateful services such as databases
    • Web management interface for configuration, operations, and monitoring of system status
    • Constraint rules (constraints), e.g. restricting tasks to specific nodes, port allocation, etc.
    • Service discovery and load balancing (Mesos-DNS, marathon-lb)
    • Health checks for fault tolerance (HTTP, TCP)
    • Event subscriptions for integration with other systems
    • Metrics endpoint for centralized monitoring
    • Complete, easy-to-use REST API


Mesos Resource Management Scheduling principle

A Mesos framework is an application that runs distributed workloads and has two components:
    • Scheduler: interacts with Mesos, subscribes to resource offers, and launches tasks on the Mesos slave servers.
    • Executor: obtains its configuration from the framework's environment variables and runs tasks on the Mesos slave servers.


First, the Mesos master offers available resources to the scheduler; the scheduler then sends the tasks to launch back to the master; the master passes them to the slaves; each slave instructs its executor to launch the tasks; the executor runs the tasks and reports status back to the slave, which finally informs the scheduler. A slave manages multiple executors, and each executor is a container: Linux containers (LXC) were used originally, and Docker containers are used now.

Mesos failure recovery and high availability

The Mesos masters use ZooKeeper for leader election and discovery. A registrar records all running slaves and their information, and Multi-Paxos is used for log replication to keep it consistent.

Mesos also has a slave recovery mechanism: whenever a slave crashes, the user's tasks keep running. The slave persists key information, such as task status updates, to local disk, and on restart it can recover these tasks from disk (similar to passivation and activation in Java).

Scheduling System Management Platform

Marathon scheduling Platform

Marathon configuration JSON
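As a hedged illustration, a Marathon app definition combining a Docker container, a health check, and a placement constraint might look like the sketch below (the id, image, and ports are placeholders):

cat nginx-demo.json   # hypothetical example; id, image and ports are placeholders
{
  "id": "/nginx-demo",
  "cpus": 0.5,
  "mem": 128,
  "instances": 2,
  "constraints": [["hostname", "UNIQUE"]],
  "container": {
    "type": "DOCKER",
    "docker": {
      "image": "dockerhub.domain.com/base/nginx:latest",
      "network": "BRIDGE",
      "portMappings": [{ "containerPort": 80, "hostPort": 0 }]
    }
  },
  "healthChecks": [
    { "protocol": "HTTP", "path": "/", "gracePeriodSeconds": 30, "intervalSeconds": 10 }
  ]
}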


Marathon web UI configuration

Mesos


Chronos


Service Registration and Discovery

Scheme

Nginx/marathon-lb + Consul agent + Consul server + Consul Template

System components

    • Consul agent: collects the status of all containers on the current machine (service name, physical node IP, service port, etc.) and registers them with the Consul server.
    • Consul server: a distributed storage cluster that maintains the service information.
    • Consul Template: reads the services from the Consul server, renders a template to generate the configuration file, and reloads the configuration whenever it changes.


System architecture


Users access the front-end application (Nginx/marathon-lb/HAProxy), which uses its configuration file to obtain the services provided by the back-end applications and return them to the user. The Consul agent is installed alongside the back-end applications and joins the Consul cluster. Consul Template connects to the Consul cluster servers and dynamically pulls the back-end service information from the Consul service catalog, writing it into the front-end application's configuration file. After each write (i.e., whenever the back-end services change), consul-template automatically tells the front-end application, via a command, to reload, so that the front end dynamically discovers the back-end services and applies the new configuration file.

Taking Nginx + Consul Template for dynamic service discovery and registration as an example:
    • The Consul cluster consists of 3, 5, or 7 Consul servers, which guarantees high availability and leader election for the whole cluster.
    • A Consul agent is deployed on every Mesos slave node to capture the current node's service names, IP addresses, service ports, and service status, and to update the Consul cluster promptly.
    • Consul Template continuously reads the service information from the Consul cluster (service name, IP, service port, service status); whenever this information changes, it updates the Nginx configuration file and reloads Nginx.


Configuration and service management

Consul cluster (3 nodes):
docker run -d --net=host --name=consul dockerhub.domain.com/consul:0.6.4 -server -advertise consul_server01_ip -recursor dnsserver01_ip -recursor dnsserver02_ip -retry-join consul_server02_ip -retry-join consul_server03_ip


Consul Agent:
docker run -d --net=host --name=consul --restart=always dockerhub.domain.com/consul:0.6.4 -advertise consul_agent_ip -recursor dnsserver01_ip -recursor dnsserver02_ip -retry-join consul_server01_ip -retry-join consul_server02_ip -retry-join consul_server03_ip


Consul Template:
consul-template --zk=zk://zk1.yeshj.com:2181,zk2.yeshj.com:2181,zk3.yeshj.com:2181,zk4.yeshj.com:2181,zk5.yeshj.com:2181/mesos


Consul server/agent config:
cat consul_base.json
{
  "datacenter": "shanghai",
  "data_dir": "/data",
  "ui_dir": "/webui",
  "client_addr": "0.0.0.0",
  "log_level": "INFO",
  "ports": {
    "dns": 53
  },
  "rejoin_after_leave": true
}


Consul Template config:
/bin/sh -c 'echo "upstream app { {{range service \"$SERVICE\"}} server {{.Address}}:$CONTAINER_PORT; {{else}} server 127.0.0.1:65535; {{end}} } server { listen default_server; location / { limit_rate_after $LIMIT_AFTER; limit_rate $LIMIT_RATE; proxy_pass http://app; } }" > $CT_FILE; nginx -c /etc/nginx/nginx.conf & CONSUL_TEMPLATE_LOG=debug consul-template -consul=$HOST:8500 -template "$CT_FILE:$NX_FILE:nginx -s reload"'

consul-template -consul=consul_agent_ip:8500 -template "/etc/consul-templates/nginx.conf:/etc/nginx/conf.d/app.conf:nginx -s reload"


Automated Deployment

Scheme

Jenkins + Git + Marathon Deployment plugin + docker image build script

System components

Git: version management for the program source code and configuration files such as the Dockerfile and marathon.json.

Jenkins: the CI/CD tool, with a rich plugin ecosystem. The Marathon Deployment plugin wires the Marathon release process into Jenkins; you only need to provide the Marathon credentials and configuration to complete a release.

Implementation logic

Create a GitLab authentication credential -> build a Jenkins freestyle job -> add the GitLab info -> create the build trigger -> build step (build the Docker image, push the image) -> add the Docker registry credential -> post-build step (Marathon deployment).
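As a rough sketch only, the build step's image build-and-push script might look like the following; the registry project, image name, and tag convention are assumptions, and the Marathon Deployment plugin takes over in the post-build step:

# hypothetical Jenkins build step (project/image names and tag are placeholders)
IMAGE=dockerhub.domain.com/ocs/demo:${BUILD_NUMBER}
docker build -t ${IMAGE} .
docker push ${IMAGE}
# marathon.json in the repository references the pushed image; the Marathon Deployment plugin posts it to Marathon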

Automated Scaling

Scheme

Marathon-lb-autoscale

System architecture


Implementation method

Docker container

Startup parameters
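For reference, a hedged sketch of starting the container by hand with the same parameters used in the Marathon configuration file below (the endpoints are taken from that file):

docker run -d --net=host mesosphere/marathon-lb-autoscale \
  --marathon http://leader.mesos:8080 \
  --haproxy http://marathon-lb.marathon.mesos:9090 \
  --apps nginx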


Marathon configuration file

{
  "id": "marathon-lb-autoscale",
  "args": [
    "--marathon", "http://leader.mesos:8080",
    "--haproxy", "http://marathon-lb.marathon.mesos:9090",
    "--apps", "nginx"
  ],
  "cpus": 0.5,
  "mem": 16.0,
  "instances": 1,
  "container": {
    "type": "DOCKER",
    "docker": {
      "image": "mesosphere/marathon-lb-autoscale",
      "network": "HOST",
      "forcePullImage": true
    }
  }
}

Q&A

Q: How exactly do you use Prometheus? For example, how do you handle hot and cold storage of metrics data? Is alerting done only with Grafana 4.0?

A: You can look through the official documentation on your own; there is not enough time to cover it all here. Metrics data is stored in Prometheus as time series and we do no special hot/cold processing of it; it is queried with the SQL-like language, and query statements can filter or apply conditions. Only Grafana 4.0 and later supports alerting, and we also keep Zabbix for traditional monitoring alerts.
Q: How is QoS implemented for different applications? What about tenant management and the granularity of containers?

A: Flow control can be handled in HAProxy, or in Nginx acting as an API gateway. The containers are currently for our own use, so tenant management has not been broken down further.
Q: What is the Marathon deployment process: does the new version replace the old one, or is a new app created and the old one deleted? Any suggestions for app naming conventions? After a container is deployed, how is its IP associated with the upstream domain name, and how is it registered with Nginx?

A: New instances are first published in a certain proportion (e.g. 2 at a time), then the old instances are removed, ending with the same number of instances. Naming is best standardized to prevent conflicts. Nginx + Consul server + agent + Consul Template can answer all of those questions.
Q: Have you run into network problems causing system anomalies in this network environment?

A: We encountered a low-level bug in Docker 1.9.1 that caused the network to drop packets; after upgrading Docker to 1.12.5 the problem was resolved. We currently run kernel 4.4.18 and Docker 1.13.0 and have had no network problems.
Q: Is Prometheus deployed as a cluster? Did you consider OpenTSDB as the back-end database?

A: There is no performance bottleneck at present; if performance problems arise we can consider deploying per machine room, with the final display and alerting unified in Grafana. We do not use OpenTSDB.
The above content was organized from the group share on the evening of May 9, 2017. The speaker was Xiu Xudong, operations architect at Hujiang Education, currently focused on container technology research, maintenance of the Ceph distributed storage system, and operations for part of Hujiang's business. DockOne organizes weekly technical shares; interested readers are welcome to add WeChat: Liyingjiesz to join the group, and can leave us a message with topics they would like to hear or share.
