DockOne WeChat Share (91): Building an elastically scheduled container platform for data processing at tens-of-billions-of-requests scale

[Editor's note] This share introduces the Qiniu data processing team's experience with container technology, and how Qiniu built an easily scalable, easy-to-deploy, highly flexible, highly available, and high-performance data processing platform on a self-developed container scheduling framework.

The main content covers four aspects:
    1. Business scenarios of massive data processing
    2. Challenges of a massive data processing platform
    3. Introduction to the self-developed container scheduling framework
    4. Practice of the massive data processing platform


I. Data processing business scenarios

First, some background on Qiniu's data processing business. Qiniu currently serves more than 500,000 enterprise clients, hosting more than 200 billion images and over 1 billion hours of video. When users store these images and videos on Qiniu, they often need to process them: scaling, cropping, watermarking, and so on. These files come online continuously and are highly diverse, and it is uneconomical for users to download them for processing on their own machines. So Qiniu was among the first to provide data processing functions built on top of its storage. Such processing would normally run on the enterprise's client or server side; by calling the data processing interface on top of Qiniu cloud storage instead, users get rich real-time transcoding capabilities for images, audio, and video. The new files generated by transcoding are placed in Qiniu's cache layer for applications to fetch, without taking up storage space, which both saves cost for the enterprise and improves development efficiency. An example of image-cropping data processing:

Qiniu's file handler is abbreviated FOP (File OPeration); different file processing operations use different FOPs. Users need only upload a single original file and can obtain a rich variety of derived files through Qiniu's data processing functions. The flow of a file from upload and storage through processing to distribution:



II. Challenges of the massive data processing platform

The massive data on Qiniu has given its data processing platform, Dora, formidable processing capability; at present Qiniu's data processing service handles nearly ten billion requests per day. Facing such a huge volume of data processing requests, the original platform faced new challenges:

    1. Daily request volume in the tens of billions, CPU-intensive computation
      At present the system handles nearly ten billion data processing requests per day on a computing cluster of nearly a thousand machines; both the existing stock and the increment are very large. Most machines in the data processing cluster run image, audio, and video transcoding, which is CPU-intensive computation, meaning the backend needs many machines, and the more CPU cores the better. By the end of the year the computing cluster may double in size on its current basis, requiring rapid physical expansion and efficient, intelligent management.

    2. Unbalanced server load, low resource utilization
      Real-time online processing takes a short time per request but arrives in high volume, so many instances are needed to handle the concurrency; asynchronous processing takes a long time per request and also needs sufficient resources. When real-time traffic is light and asynchronous traffic grows, the asynchronous business cannot use the resources assigned to the real-time business. This static resource allocation mechanism leads to unreasonable allocation, unbalanced server load, and low resource utilization.

    3. Unpredictable bursts, large amounts of redundant resources
      The request volume of newly onboarded users cannot be forecast accurately. The original approach was to rapidly expand machines and verify them online, which takes a certain amount of time; for such unplanned requests, a large amount of redundant resources had to be kept on hand to absorb burst traffic.

    4. Cluster overload, no automatic on-demand scaling
      When an individual user bursts data processing requests, cluster load spikes: CPU processing slows, request latency grows, tasks accumulate, and other businesses are affected. The platform could neither expand quickly on top of existing resources nor automatically scale cluster instances on demand according to actual business pressure.

    5. Unknown quality and scale of user-defined applications (UFOP)
      Besides official data processing services, we also let customers deploy custom data processing modules into the computing environment closest to Qiniu cloud storage, avoiding the performance and traffic cost of reading and writing data remotely and satisfying users' multidimensional processing needs. However, with many UFOPs running on the same platform, quality problems in a UFOP, or excessive requests with insufficient allocated resources, may affect the normal operation of other services on the platform.


III. Introduction to the self-developed container scheduling framework

To solve the above problems, Qiniu developed a container scheduling framework (DoraFramework) on top of the resource management system Mesos, using container technology to build the easily scalable, easily deployed, highly flexible data processing platform Dora. The overall architecture diagram looks like this:

Description of each component:

Mesos: ZooKeeper, the Mesos Master, and Mesos Agents form the basis of the Mesos data center operating system. It manages all physical machines in the data center uniformly and handles resource-level scheduling; it is the most basic runtime environment of the two-tier scheduling system.

DoraFramework: the business-layer scheduling framework. Through Mesos, DoraFramework manages all physical machine resources and completes the scheduling and management of business processes.

Consul: an open-source cluster management system with service discovery, health checking, and KV storage. The DoraFramework scheduling system builds its basic service discovery on Consul's service discovery and health check mechanisms, and stores DoraFramework metadata in Consul's KV store.

Prometheus: An open-source monitoring system that enables monitoring at the machine level, at the container level, and at the business system level.

Pandora: Qiniu's internal log management system, responsible for aggregating and processing all logs from the production environment.

In this architecture, we implement elastic, real-time, cross-machine scheduling through container technology. The scheduling framework dynamically adjusts the number of containers according to actual business load, which solves the low resource utilization caused by static configuration, and the second-level startup of containers means services can be brought up quickly when a flood of sudden requests arrives. On the network side, since UFOPs are services deployed and run by users, we do not know which ports they may open, so we use bridge mode: ports that must be reachable externally are exposed through NAT, so whatever ports a service uses internally have no effect on the external environment. This gives the platform good security isolation.

For the scheduling system of the data processing platform, we chose Mesos plus a self-developed container scheduling framework (DoraFramework). We chose Mesos as the resource management system for three reasons. First, Mesos was more mature than other container scheduling systems at the time: Kubernetes released its first production-ready version in 2015 and Docker Swarm in 2016, and in our survey neither had much large-scale production experience, while Mesos had seven or eight years of history and had been used in production for resource management at large companies such as Apple and Twitter, so its stability was better. Second, Mesos supports scheduling tens of thousands of nodes; Qiniu has nearly a thousand physical machines and grows significantly every year, so a resource management framework that supports ultra-large-scale scheduling suits Qiniu's business growth. Third, Mesos is simple, open, and extensible: it is an open-source distributed elastic resource management system built around a two-tier scheduling model. In the first tier, Mesos collects resource information for the whole data center and offers resources to frameworks; in the second tier, each framework's own scheduler assigns those resources to its internal tasks. Mesos itself manages only the resource layer, and this simplicity brings stability. The container scheduling framework on top can be an open-source one such as Marathon or Chronos, or self-developed.
Kubernetes, although rich in functionality, is also more complex, with many components and concepts, and it lacked the openness and extensibility we wanted: we could only use the scheduling it provides and could not customize the scheduling framework to our own business, which would make us overly dependent on Kubernetes.
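The two-tier model described above can be sketched abstractly. The toy `Offer`/`Task` loop below is only an illustration of the concept, not the real Mesos API: the first tier hands resource offers to the framework, and the framework's scheduler decides which of its pending tasks to launch with each offer.

```go
package main

import "fmt"

// Toy types standing in for the first tier's resource offers and the
// framework's pending work; not the real Mesos API.
type Offer struct {
	Agent string
	CPUs  float64
}

type Task struct {
	Name string
	CPUs float64
}

// schedule is the second tier: given one offer from the resource
// manager, the framework packs as many pending tasks as fit into it.
func schedule(o Offer, pending []Task) (launched, rest []Task) {
	free := o.CPUs
	for _, t := range pending {
		if t.CPUs <= free {
			free -= t.CPUs
			launched = append(launched, t)
		} else {
			rest = append(rest, t)
		}
	}
	return launched, rest
}

func main() {
	pending := []Task{{"imageView", 2}, {"avthumb", 4}, {"imageMogr", 1}}
	launched, rest := schedule(Offer{"agent-1", 4}, pending)
	fmt.Println(len(launched), len(rest)) // 2 launched (2+1 CPUs), 1 left over
}
```

The point of the split is that the resource manager never needs to know what the tasks are; the framework never needs to know about machines it was not offered.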

Why develop our own framework rather than use Marathon, Mesos's best-known framework? Three reasons: 1. Marathon does not support some of the usage patterns we expected; for example, it does not do seamless service discovery well. 2. Marathon is written in Scala, which makes troubleshooting and secondary development inconvenient for us. 3. If we chose Marathon, we would still have to wrap another layer around it to serve as Dora's scheduling service, making the modules more complex to deploy and operate.

DoraFramework is a container scheduling framework that Qiniu developed in Go. DoraFramework implements business-process scheduling in the second tier of Mesos's two-tier scheduling system; it is the core component of the Dora scheduling system and provides an API by interacting with the Mesos and Consul components. The architecture diagram is as follows:

DoraFramework main features:
    • Automated application deployment
    • Service registration and discovery
    • Elastic scheduling of container counts
    • Load balancing
    • Adding or removing instances on specified machines
    • High availability
    • Application version and upgrade management
    • Retrieving instance status and log data
    • Business-level monitoring
    • Automatic repair of failed instances


A comparison of the DoraFramework and Marathon scheduling architectures:
    1. The DoraFramework scheduling system implements service registration and discovery with Consul. Consul is designed for service discovery and configuration of distributed systems; it supports discovering internal or external services across data centers and exposes a DNS interface. Marathon-LB does not support service discovery across data centers.
    2. Marathon discovers services through the servicePort or vhost of Marathon-LB nodes, which requires bridge network mode. Because Marathon-LB is also responsible for load balancing, in a large production environment a Marathon-LB failure affects the framework's service discovery.
    3. The Dora scheduling system can schedule more precisely and flexibly, because it uses not only resource-level monitoring but also business-level monitoring, and can schedule according to actual business pressure.
    4. The load balancer inside the Dora scheduling system distributes load by fetching the addresses of all available instances from Consul, and can distribute requests more accurately based on each instance's business load; Marathon-LB has no business-layer monitoring data.
    5. Consul provides system-level and application-level health checks, defined through configuration files or the HTTP API, and supports five check types: TCP, HTTP, script, Docker, and time-to-live (TTL). Marathon's default health check only inspects the task state in Mesos: a running task is considered healthy, so it cannot serve as an application-level check. Marathon can report application health through its REST API, but supports only TCP, HTTP, and command checks.
    6. The Dora scheduling system provides a monitoring stack. At runtime, business processes aggregate health metrics such as request counts and request latency and expose them through a standard HTTP monitoring endpoint whose output conforms to the Prometheus metrics format. Prometheus, configured with Consul as its service discovery source, obtains from Consul the list of business processes to scrape and pulls monitoring data from each process's HTTP monitoring endpoint.
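As a minimal sketch of item 6 above, a business process might expose its health metrics over HTTP in the Prometheus text exposition format. The metric names (`fop_requests_total`, `fop_failures_total`) and the `renderMetrics` helper are hypothetical illustrations, not Dora's actual interface; only the exposition format itself is real, and this uses nothing beyond the standard library:

```go
package main

import (
	"fmt"
	"net/http"
	"sync/atomic"
)

// Hypothetical per-process counters a data-processing worker might keep.
var (
	requestsTotal uint64 // total FOP requests handled
	failuresTotal uint64 // total failed requests
)

// renderMetrics formats the counters in the Prometheus text exposition
// format: one "# TYPE" line plus one sample line per metric.
func renderMetrics(requests, failures uint64) string {
	return fmt.Sprintf(
		"# TYPE fop_requests_total counter\n"+
			"fop_requests_total %d\n"+
			"# TYPE fop_failures_total counter\n"+
			"fop_failures_total %d\n",
		requests, failures)
}

// metricsHandler is the standard HTTP monitoring endpoint that
// Prometheus would scrape after discovering this process via Consul.
func metricsHandler(w http.ResponseWriter, r *http.Request) {
	w.Header().Set("Content-Type", "text/plain; version=0.0.4")
	fmt.Fprint(w, renderMetrics(
		atomic.LoadUint64(&requestsTotal),
		atomic.LoadUint64(&failuresTotal)))
}

func main() {
	atomic.AddUint64(&requestsTotal, 1)
	fmt.Print(renderMetrics(atomic.LoadUint64(&requestsTotal), 0))
	// A real worker would serve the endpoint instead:
	//   http.HandleFunc("/metrics", metricsHandler)
	//   http.ListenAndServe(":9100", nil)
	_ = metricsHandler
}
```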


We use Consul as the registry to implement service registration and discovery. Consul comes with key/value storage, is discoverable through a DNS interface, has flexible health checks, and supports service discovery across data centers. The API gateway queries the list of all available instances of a service through Consul's DNS interface and forwards requests accordingly.


    1. Automatic registration and deregistration of services
      When adding a microservice instance, the procedure is to wait until the instance is running, register the instance's access address through the Consul client's service registration API, configure a health check for the service, and then let the data synchronize to the service registry on the Consul servers.

      When removing an instance, the procedure is to remove the instance from the Consul servers' service registry, wait for a cool-down period, and then destroy the instance from the scheduling system. This completes automatic registration and deregistration of services.

    2. Service discovery
      When an external system wants to access a service, it queries, by service name through the DNS interface provided by the Consul servers, the access addresses of all healthy instances currently registered in Consul, and then sends its request to one of those instances.
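Besides DNS, Consul also exposes the same information over its HTTP health endpoint, `/v1/health/service/<name>?passing=true`, which returns one JSON entry per passing instance. The sketch below parses that response shape; the service name `imagefop` and the helper names are made up for illustration, and only the Consul endpoint and response fields are real:

```go
package main

import (
	"encoding/json"
	"fmt"
)

// healthEntry models the fields we need from one element of Consul's
// /v1/health/service/<name>?passing=true response.
type healthEntry struct {
	Service struct {
		Address string
		Port    int
	}
}

// parseHealthyAddrs extracts "host:port" addresses from the JSON body
// returned by the Consul health endpoint.
func parseHealthyAddrs(body []byte) ([]string, error) {
	var entries []healthEntry
	if err := json.Unmarshal(body, &entries); err != nil {
		return nil, err
	}
	addrs := make([]string, 0, len(entries))
	for _, e := range entries {
		addrs = append(addrs, fmt.Sprintf("%s:%d", e.Service.Address, e.Service.Port))
	}
	return addrs, nil
}

func main() {
	// In production the body would come from a local Consul agent, e.g.:
	//   http.Get("http://127.0.0.1:8500/v1/health/service/imagefop?passing=true")
	body := []byte(`[{"Service":{"Address":"10.0.0.5","Port":9000}},
	                 {"Service":{"Address":"10.0.0.6","Port":9000}}]`)
	addrs, err := parseHealthyAddrs(body)
	if err != nil {
		panic(err)
	}
	fmt.Println(addrs) // the API gateway would balance requests over these
}
```

Because the `passing=true` filter is applied on the Consul side, unhealthy instances never reach the gateway's candidate list.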


IV. Practice of the massive data processing platform

For configuration management in our production environment we use Ansible. Ansible connects over SSH by default and requires no extra software on managed nodes; it supports batch system configuration, batch deployment, batch command execution, and so on, which suits Qiniu's large IT environment well. Playbooks are a simple configuration management and multi-machine deployment system with a very readable syntax, well suited to deploying complex applications. Through Ansible we achieve one-click installation and removal of the data processing platform, adding and deleting nodes, component version upgrades and rollbacks, and batch configuration changes in production, simplifying complex operations and configuration management.

In practice, we choose one host as the control machine, install Ansible on it, set up SSH trust between this control machine and all remote hosts, and then configure playbook files on the control machine to operate on hosts in batches. A simple operation can be run with:
$ ansible-playbook main.yml -i hosts

All required operations are written in main.yml, and the IP addresses of all target hosts go in the hosts file; this is enough to operate in batch on every host in the hosts file. Complex operations are configured by writing playbooks. roles stores tasks for different roles: for example, the tasks executed on the Mesos master differ from those on the Mesos agents, so they can be placed in different roles, and Mesos, ZooKeeper, and Consul can likewise each be a role. tasks are the steps executed within a role, and handlers are tasks triggered from within tasks. templates holds template files; for instance, if we need a customized Consul default configuration, we can modify the configuration file in this directory, and it replaces the default configuration file during execution.

For monitoring, the data processing platform has a complete monitoring stack, covering host monitoring, container monitoring, service monitoring, traffic monitoring, and log monitoring. Host and container monitoring is done mainly through Prometheus's various exporters, collecting real-time CPU, memory, network, and disk usage. Service and traffic monitoring is done through Qiniu's own monitoring program, which watches service status, liveness, handle counts, and the request and failure counts for every processing command. Log monitoring goes through Qiniu's log platform, Pandora, and covers system logs, container logs, and business-process logs. By modifying the output of the open-source log collector Filebeat, all collected logs are shipped to the internal log monitoring system Pandora.

The monitoring data is displayed as follows:

The above is Qiniu's practice of building a cloud data processing platform on container technology. Today the data processing platform provides zero-operations, highly available, high-performance data processing services, letting users easily process images, audio, video, and other data in both real-time and asynchronous scenarios. The Qiniu data processing system handles not only requests from Qiniu cloud storage but also requests against non-Qiniu storage, and can directly process requests from Qiniu's CDN to speed up the processing of intermediate-origin data. Dora is also an open platform: it runs not only Qiniu's own data processing services but also user-defined ones, and its rich operations and management features free users from complex operations and architecture design to focus on implementing their data processing units. The business support capabilities of the platform are as follows:

Q&A

Q: What language was the management system developed in? Will it be open-sourced?

A: Dora's scheduling framework is developed in Go. It is not open source at present, but private deployment is available.
Q: I have just started looking at Mesos framework implementation. How do I call a custom executor from a custom scheduler?

A: Our scheduler and executor are built against Mesos's latest v1 HTTP API, so there is no incompatibility problem; it is just that the Go SDK for Mesos is somewhat old and slow to update, so we had to make some changes there.
Q: How large is the current Consul cluster? Have you considered Consul's performance bottleneck as it scales?

A: A Consul agent runs on every slave node; one of our data centers has more than 200 machines dedicated to data processing, so that is the scale of a single-data-center Consul cluster. There is no bottleneck for us at the moment, because our use of Consul is relatively simple. As a reliable store for metadata, the metadata is not updated very frequently; we consulted performance tests done by others as well as some of our own, and the performance meets our needs. The other use is service discovery and instance health checking. Health checks are run by the Consul agent on each machine, and the number of instances per machine is not particularly large, so this part is not under much pressure either. Of course this is related to business scale; if one day Consul scaling becomes our problem, it will mean our business volume has grown especially large, and we very much look forward to that day.
Q: Can Dora support automatic scaling for MySQL?

A: Dora's application scenario is stateless services running data processing commands. A system like MySQL is not suited to running directly on Dora. If you want MySQL to run on Mesos, you need to implement a dedicated MySQL scheduler, because scaling MySQL instances and repairing failed instances have MySQL-specific requirements. Containerization of stateful services like MySQL in our company is handled by another container platform. Our MySQL uses the Percona XtraDB Cluster scheme; we wrote a Percona XtraDB Cluster scheduler against that platform's API, and most operations on the cluster are automated on that container platform.
Q: Is your Ansible hosts file generated dynamically? Is code push also done through Ansible? How are node addition/removal and rollback implemented?

A: At the beginning it was not generated dynamically. We can in fact obtain the current cluster's nodes and some simple node configuration from Consul, so we later considered dynamically generating canary hosts files for Ansible from the node information in Consul. Code push also uses Ansible; GitHub can be used as well if the machine can reach the external network. Because the roles in our playbooks are divided by component, adding or deleting a node only requires modifying the hosts file and installing or removing the corresponding component on the corresponding node. A rollback, for example:
$ ansible-playbook rollback.yml -i hosts -e "hosts_env=xxx app_env=xxx version_env=xxx"

Parameter description:
  • hosts_env: the host group to roll back, e.g. master
  • app_env: the component to roll back, e.g. zookeeper
  • version_env: the version to roll back to, e.g. v1.0.1.20160918
Q: What is Dora's scheduling strategy? Could you briefly introduce it?

A: First, instances of the same data processing command are distributed evenly across different machines; then the load on each machine is kept balanced.
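That answer can be sketched as a two-level placement rule. The `Agent` struct and `pickAgent` function below are an illustrative reconstruction, not Dora's actual code: among the machines running the fewest instances of the given command, pick the one with the lowest load.

```go
package main

import "fmt"

// Agent is a hypothetical view of one machine as a scheduler sees it.
type Agent struct {
	Name      string
	Instances map[string]int // running instance count per FOP command
	Load      float64        // normalized machine load
}

// pickAgent spreads instances of one command evenly first, then
// balances machine load: it chooses the agent running the fewest
// instances of cmd, breaking ties by the lowest load.
func pickAgent(agents []Agent, cmd string) string {
	best := -1
	for i, a := range agents {
		if best == -1 ||
			a.Instances[cmd] < agents[best].Instances[cmd] ||
			(a.Instances[cmd] == agents[best].Instances[cmd] && a.Load < agents[best].Load) {
			best = i
		}
	}
	if best == -1 {
		return ""
	}
	return agents[best].Name
}

func main() {
	agents := []Agent{
		{"node-a", map[string]int{"imageView": 3}, 0.7},
		{"node-b", map[string]int{"imageView": 1}, 0.9},
		{"node-c", map[string]int{"imageView": 1}, 0.4},
	}
	// node-b and node-c tie on instance count; node-c wins on load.
	fmt.Println(pickAgent(agents, "imageView"))
}
```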
Q: Prometheus is currently standalone; what do you do when the data volume grows? Do you back Prometheus monitoring data with InfluxDB?

A: Currently we split Prometheus servers by business, and this supports our data volume. We did not use InfluxDB; we still use the native LevelDB-based local storage.
Q: With such a large file volume, do you do anything special in storage technology? How do you balance high performance against massive storage?

A: Qiniu cloud storage is designed for storing massive numbers of small files, so the first change relative to an ordinary file system is to drop the directory structure (a directory implies parent-child relationships). So Qiniu's storage is not a file system but a key-value store, or object store. Each large file is cut into small files for storage, with the metadata kept in a separate database; when a user makes a request, the business layer merges the pieces, processes them, and returns the result. So in principle the disks store only small files, and the performance of large-file storage and reads depends mainly on splitting and merging.
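The split-and-merge idea can be sketched as follows. The chunk size and the SHA-1 content keys are illustrative assumptions, not Qiniu's actual on-disk format: a large file is cut into fixed-size blocks, each stored under a content key, with the ordered key list kept as metadata so the business layer can reassemble the file on read.

```go
package main

import (
	"bytes"
	"crypto/sha1"
	"fmt"
)

// splitChunks cuts data into fixed-size chunks and returns the chunks
// together with their content keys (hex SHA-1). The ordered key list
// is the metadata a separate database would keep for each large file.
func splitChunks(data []byte, size int) (chunks [][]byte, keys []string) {
	for off := 0; off < len(data); off += size {
		end := off + size
		if end > len(data) {
			end = len(data)
		}
		c := data[off:end]
		chunks = append(chunks, c)
		keys = append(keys, fmt.Sprintf("%x", sha1.Sum(c)))
	}
	return chunks, keys
}

// merge reassembles the original file from its ordered chunks, as the
// business layer would do when serving a read request.
func merge(chunks [][]byte) []byte {
	return bytes.Join(chunks, nil)
}

func main() {
	file := bytes.Repeat([]byte("qiniu"), 1000) // a 5000-byte "large" file
	chunks, keys := splitChunks(file, 1024)     // assumed 1 KiB blocks for the demo
	fmt.Println(len(chunks), len(keys))         // 5 5
	fmt.Println(bytes.Equal(merge(chunks), file)) // true
}
```

With content keys, identical blocks in different files map to the same key, which is one reason key-value stores of this shape deduplicate well.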
The above content is organized from the group share on the evening of November 1, 2016. The speaker, Chen Aijin, is a Qiniu evangelist responsible for research and evangelism of DevOps, container, and microservices technologies, with years of experience in enterprise system operations and rich experience in the architecture design and operation of large-scale distributed systems. DockOne organizes technology shares every week; interested readers can add WeChat: liyingjiesz to join the group, and leave us a message with topics you want to hear about or share.