DockOne WeChat Share (131): Juice, a task cloud framework based on Mesos Framework

"Editor's words" in recent years, with the popularization of Mesos in the production environment, so that large-scale cluster management has become simple, and based on mesosframework development of the juice framework, can complete the distribution of distributed tasks, processing, for the improvement of resource utilization has great help, Let's introduce this framework to you today.

"3 days Burn brain container Storage Network Training Camp |" This training is themed on container storage and networking, including: Docker Plugin, Docker storage driver, Docker Volume Pulgin, Kubernetes Storage mechanism, container network implementation principle and model, Docker network implementation, network plug-in, Calico, Contiv netplugin, open source enterprise-level image warehouse Harbor principle and implementation.

Before introducing Juice, I'd like to talk about Mesos first. Mesos is called a two-level scheduling framework: the master completes the first-level scheduling from master to framework through its internal allocator, and the framework then completes the second-level scheduling, assigning resources to tasks through its scheduler.

About Mesos Framework

Let's first look at the overall architecture of Mesos and its frameworks.

A Mesos framework is divided into two parts: the scheduler and the executor.

Starting with Mesos 1.0, the scheduler is officially exposed through an HTTP-based REST API for external invocation and further development.

The scheduler handles callback events initiated by the master side (resource offers, task launches, task status notifications, etc.) and responds accordingly. When an agent receives a task assigned by the master, it processes it according to the task's container type. With the default container type 'MESOS', the agent first checks whether the framework's corresponding executor process has been started; if not, it starts the executor process and then submits the task to the executor for execution. When running a task with container type 'DOCKER', the Docker executor is started to handle it, and the running state of the task depends entirely on the processing and return value inside the Docker container.

Mesos Framework interaction API

The interaction APIs are divided into two parts, the Scheduler API and the Executor API, and each call is differentiated by its type. The specific flow is as follows:
  1. The scheduler submits a request (type = 'SUBSCRIBE') to the master (http://master-ip:5050/api/v1/scheduler) and needs to set 'subscribe.framework_info.id'. This ID is generated by the scheduler and must be unique within a Mesos cluster; Mesos uses the frameworkId to distinguish the tasks submitted by each framework. After sending, the scheduler waits for the master's 'SUBSCRIBED' callback event; the master's return is defined in the event object as event.type = 'SUBSCRIBED'. Note: after the 'SUBSCRIBE' request is initiated, the scheduler and the master maintain a session connection (keep-alive), and master-initiated event callbacks are delivered to the scheduler through this connection. (Interface 'SUBSCRIBE' in scheduler-http-api; a minimal request sketch follows this list.)
  2. The master initiates the 'OFFERS' event callback, notifying the scheduler which cluster resources are currently available for use; the event's event.type is 'OFFERS'. (Interface 'OFFERS' in scheduler-http-api.)
  3. The scheduler calls resourceOffers to schedule tasks onto the offers. When task assignment is complete, it initiates an 'ACCEPT' request to the master with the offers-tasks list. (Interface 'ACCEPT' in scheduler-http-api.)
  4. After the master receives the scheduler's task request, it sends the tasks to the agent corresponding to the offerId for execution.
  5. The agent receives the tasks and checks whether the tasks' executor has been started. If it has, the executor is called to execute the tasks; if not, launchExecutor() is called to create the executor object and execute initialize(). During executor initialization, registerExecutorMessage is called to register the executor on the agent, after which the tasks are accepted and execution begins. (Interface 'LAUNCH' in executor-http-api.)
  6. The executor notifies the agent of the task's task_status when execution completes or fails. (Interface 'UPDATE' in executor-http-api.)
  7. The agent synchronizes the task_status to the master, and the master invokes the 'UPDATE' event callback, notifying the scheduler to update the task status. (Interface 'UPDATE' in scheduler-http-api.)
  8. The scheduler sends an 'ACKNOWLEDGE' request to notify the master that the task status has been confirmed. (Interface 'ACKNOWLEDGE' in scheduler-http-api.)
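
To make step 1 concrete, here is a minimal sketch (not juice's actual code) of a SUBSCRIBE call against the v1 scheduler HTTP API. It assumes a master at http://master-ip:5050 and uses the JSON form of the API for readability; juice itself talks to the master in protobuf, as described later.

import okhttp3.*;

public class SubscribeSketch {
    public static void main(String[] args) throws Exception {
        String master = "http://master-ip:5050/api/v1/scheduler"; // assumed master address

        // Minimal SUBSCRIBE body; framework_info.id may also be set here
        // (juice generates its own frameworkId, as described above).
        String body = "{"
            + "\"type\": \"SUBSCRIBE\","
            + "\"subscribe\": {\"framework_info\": {\"user\": \"root\", \"name\": \"juice-demo\"}}"
            + "}";

        OkHttpClient client = new OkHttpClient();
        Request request = new Request.Builder()
            .url(master)
            .addHeader("Accept", "application/json")
            .post(RequestBody.create(MediaType.parse("application/json"), body))
            .build();

        // The response is a long-lived chunked stream of events (SUBSCRIBED, OFFERS, UPDATE, ...);
        // a real scheduler keeps this connection open and reads events from it.
        try (Response response = client.newCall(request).execute()) {
            System.out.println("HTTP " + response.code()
                + ", Mesos-Stream-Id: " + response.header("Mesos-Stream-Id"));
        }
    }
}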


Task status indication and agent downtime handling

To mark the running state of a task, Mesos defines 13 task_status types; the commonly used ones are the following:
    • TASK_STAGING: the task is in a ready state; the master has already assigned it to a slave, but the slave is not yet running it.
    • TASK_RUNNING: the task is already running on the agent.
    • TASK_FINISHED: the task has finished running.
    • TASK_KILLED: the task was actively terminated, via the 'KILL' interface in scheduler-http-api.
    • TASK_FAILED: the task execution failed.
    • TASK_LOST: the task was lost, which usually happens when a slave goes down.


When an agent goes down and causes TASK_LOST, how does Mesos handle it?

Between the master and the agents, the master periodically sends ping messages to each agent. If it receives no reply from an agent within the configured time (flag slave_ping_timeout, default 15s) for a certain number of attempts (flag max_slave_ping_timeouts, default 5), the master performs the following steps:
    1. Remove the agent from the master; from this point the agent's resources are no longer offered to schedulers.
    2. Traverse all tasks running on the agent, send the TASK_LOST status updates of those tasks to the corresponding frameworks, and remove the tasks from the master.
    3. Traverse all executors on the agent and delete them.
    4. Trigger a rescind offer to revoke the offers from this agent that were already made to schedulers.
    5. Remove the agent from the master's replicated log (the Mesos master relies on cluster configuration information persisted in the replicated log for failover/recovery).


Easy release and deployment of applications with Marathon

There are many open source frameworks based on Mesos Framework, such as Marathon. We have used Marathon in our production environment; it is commonly used to run long-running services/applications, and we rely on it to manage application services. It supports automatic/manual start and stop, horizontal scaling, health checks, and so on. We rely on Jenkins + Docker + Marathon to automate the release and deployment of our services.

Why Juice

Below I introduce Juice, a framework I developed on top of Mesos Framework. (Open source address: https://github.com/HujiangTechnology/Juice.git)

Before Juice was developed, all of my company's audio and video transcoding and slicing tasks ran on a queue-based assignment framework called TaskCenter, which has no distributed scheduling (resource allocation) capability, so the resource utilization of the cluster was always a problem. We therefore wanted to develop a new framework to replace the old TaskCenter, based on the following three requirements:
    1. A task-scheduling framework that utilizes (hardware) resources as fully as possible.
    2. The framework must be able to run various types of tasks.
    3. The platform must be stable.


With our experience of using Marathon and the Mesos documentation, we decided to develop a task-scheduling framework based on Mesos Framework. Given the features of Mesos and its frameworks described above, and since everything we need to run is packaged in Docker, the framework itself does not have to care about the type of task, so the boundary between business and framework becomes clear. For the framework, running a Docker task is also convenient: as just mentioned, the Docker executor built into Mesos can start Docker tasks perfectly, so very little development is needed on the agent side of our framework.

The Juice framework began its development in this context. We position it as a distributed task cloud system. Why call it a task cloud system? Because for callers, using Juice only requires two things: build a Docker image of the task to run and push it to the Docker repository, then submit a Docker-type task to Juice. Everything else is left to Juice; the caller does not care which physical machine the task will execute on and only needs to care about the execution of the task itself.

Juice Architecture

The Juice framework is divided into Juice-rest (the Juice interaction API layer, which handles external CRUD operations on Juice tasks) and Juice-service (the Juice core layer, responsible for interaction with the Mesos master, resource allocation, task submission, task status updates, etc.). A typical Juice-based deployment runs 1-n Juice-rest instances (depending on the system's TPS) and n Juice-service instances (Juice-service runs in master/slave mode, one master and multiple slaves, coordinated via ZooKeeper). For the same Mesos cluster, 1-n sets of the Juice framework can be deployed, distinguished by frameworkId; if multiple sets need to be deployed, set mesos.framework.tag to a different value in each Juice-service configuration file.

Juice-rest parameter settings

Juice-rest is written with Spring Boot (for the Juice API interface see: Https://github.com/HujiangTech ... nt.md) and handles externally initiated task CRUD operations. When submitting a task to Juice-rest, some parameters need to be set. An example of running a Docker task:
{
    "callbackUrl": "http://www.XXXXXXXX.com/v5/tasks/callback",
    "taskName": "demo-task",
    "env": {"name": "environment", "value": "dev"},
    "args": ["this is a test"],
    "container": {
        "docker": {
            "image": "dockerhub.xxxx.com/demo-slice"
        },
        "type": "DOCKER"
    }
}

The type in container currently only supports 'DOCKER'. We have not added the 'MESOS' container mode because the services within our project groups are already Docker-based, but the 'MESOS' type is reserved, so tasks of that type can be supported in the future.

The commands mode supports running Linux command-line commands and shell scripts, for example:
"commands": "/home/app/entrypoint.sh"

There are two reasons to support commands mode:
    1. Sometimes the caller may just want to run a script on one of the designated agents.
    2. Some other project groups within the company also use a jar-package launch mode; allowing a shell script as the entry point lets us support these projects.


An env setting example, setting the task's runtime environment to dev:
"env": {"name": "environment", "value": "dev"}

An args setting example, specifying a file path:
"args": ["/tid/res/test.mp4"]

PS: The args option is not supported when using commands mode.

In addition, Juice-rest supports user-defined resource sizes (currently only custom CPU and memory). If resources need to be specified, a resources object must be included in the request; otherwise the task runs with the default resource size. Juice-rest also supports resource constraints (constraints), that is, running a task only on agents with a specific hostname or rack_id attribute, by setting the constraints field in the request.

Middleware used by Juice (MQ, DB, etc.)

The following describes the REST layer processing model. When a task request comes in from outside, Juice-rest does not submit the received task directly to the Juice-service layer; instead it does the following two things:
    1. Put the task into an MQ (a minimal sketch of this queue pattern follows this list). Juice currently uses a Redis list as the default queue, with LPUSH on one side and RPOP on the other, i.e. FIFO. Why choose a Redis list as the queue rather than something like RabbitMQ or Kafka? First, Redis is a relatively lightweight middleware and its HA solutions are quite mature. At the same time, in my opinion the number of tasks waiting in the queue should stay below 10,000, otherwise the execution cycle of a task is stretched out very long. Taking my company's Juice system as an example, because audio/video transcoding and slicing tasks are time-consuming, 10,000 queued tasks would typically wait for more than a few hours; so when the number of tasks is large, consider expanding the processing capacity of the cluster rather than putting ever more tasks into the queue. On that basis, choosing a Redis list has no disadvantage compared with other, more traditional MQs. For special cases, Juice also allows the user to implement the CacheUtils interface to replace the Redis list with another MQ.
    2. Record the task information into the juice_tasks table, which amounts to persisting the data. Subsequent versions will implement the task retry mechanism based on this (already implemented in the current 1.1.0 internal development version), as well as task recovery after a failover switch, which is planned for the 1.2.0 release. (MySQL is currently used as the database.)
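
A minimal sketch of the LPUSH/RPOP pattern from point 1, using the Jedis client; the queue name comes from the article, while the Redis address is an assumption.

import redis.clients.jedis.Jedis;

public class TaskQueueSketch {
    private static final String TASK_QUEUE = "juice.task.queue"; // queue name as described in the article

    public static void main(String[] args) {
        try (Jedis jedis = new Jedis("127.0.0.1", 6379)) { // assumed Redis address
            // Producer side (juice-rest): LPUSH the serialized task.
            jedis.lpush(TASK_QUEUE, "{\"taskName\":\"demo-task\"}");

            // Consumer side (juice-service): RPOP, giving FIFO order relative to LPUSH.
            String task = jedis.rpop(TASK_QUEUE);
            System.out.println("popped task: " + task);
        }
    }
}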


When Juice-rest accepts and completes a task submission, it returns to the caller an 18-digit number (the juiceId, globally unique) as a voucher. When the task completes, Juice-rest initiates a callback request informing the caller of the result of the task (identified by this credential), provided the caller has set the callbackUrl. The caller can also use the juiceId to query, terminate the task, and so on.

In addition, a separate thread pool is maintained at the Juice-rest layer to handle the task status information (task_status) returned by the Juice-service side.

Juice-service internal processing flow

Juice-service can be seen as a Mesos framework; its communication protocol with the master is protobuf, and each event request is generated by a call of the corresponding type. Here Juice-service initiates a SUBSCRIBE request whose request body is generated by the subscribeCall() method, sent via OkHttp, and it maintains a long connection to the master.


It then enters a while loop and calls the onEvent() method whenever a notification event arrives from the master side.

Among the Mesos callback events, the main ones that require special handling are the following:
    1. SUBSCRIBED: when this event is received, the Juice framework records the frameworkId registered with the master into the database table juice_framework.
    2. OFFERS: when Juice-service receives this type of event, it enters the resource/task allocation phase, allocates resources to tasks and submits them to the Mesos master.
    3. UPDATE: when the agent finishes processing a task, the status notification for the task travels executor -> agent -> master -> Juice-service. Juice-service then puts the result into the result list.
    4. ERROR: something is wrong with the framework. There are usually two kinds of problems. One is more serious: for example, the frameworkId used by Juice-service has already been removed on the master side, and the master returns a "Framework has been removed" error message; Juice-service then throws an UnrecoverException:
      throw new UnrecoverException(message, true)


When Juice-service handles an error of the UnrecoverException class, it resets the service; when the exception's second argument is true, a new frameworkId is also regenerated.

For other types of errors, such as the long connection between the master and Juice-service being interrupted, only the service is reset. An illustrative sketch of this error handling follows.
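
The sketch below shows the two branches just described; the (String, boolean) constructor matches the call shown above, but the field name, accessor and the regenerate/reset helpers are assumptions, not juice's actual code.

// Illustrative only -- names other than the (String, boolean) constructor are assumptions.
class UnrecoverException extends RuntimeException {
    private final boolean regenerateFrameworkId;

    UnrecoverException(String message, boolean regenerateFrameworkId) {
        super(message);
        this.regenerateFrameworkId = regenerateFrameworkId;
    }

    boolean shouldRegenerateFrameworkId() {
        return regenerateFrameworkId;
    }
}

class SchedulerErrorHandlingSketch {
    void onError(Throwable error) {
        if (error instanceof UnrecoverException
                && ((UnrecoverException) error).shouldRegenerateFrameworkId()) {
            // e.g. "Framework has been removed": the old frameworkId is unusable,
            // so a new one is generated before re-subscribing.
            regenerateFrameworkId();
        }
        // In every case (including a dropped long connection), reset the service,
        // i.e. tear the session down and SUBSCRIBE to the master again.
        resetService();
    }

    private void regenerateFrameworkId() { /* generate and persist a new frameworkId */ }
    private void resetService()          { /* re-subscribe to the Mesos master */ }
}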

Next I would like to describe the second step (the OFFERS handling) in more detail. Let's look at the code that processes the 'OFFERS' request:

This code is the core code that allocates offers to tasks; note the following methods:

1. SchedulerService.filterAndAddAttrSys(): this method filters out non-conforming offers. As we know, attrs can be configured on a Mesos agent so that certain machines run special tasks, and the filtering here is based on that feature. For example, we can set Juice-service to use only resources with the following attr values (in the configuration file application.properties):
mesos.framework.attr=lms,qa,mid|big

After filtering by the SchedulerService.filterAndAddAttrSys() method, only resources that match the above attrs are selected to execute tasks. At the same time, non-conforming offers are added to a declines list and sent back to the master via AuxiliaryService.declineOffer() so that they are ignored (an illustrative sketch of this matching follows the attributes file below).

The agent's attrs are set in /etc/mesos-slave/attributes. This file usually looks like this:
cat /etc/mesos-slave/attributes

bz:xx;
env:xx;
size:xx;
rack_id:xx;
dc:xx
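
Here is an illustrative sketch of the attr matching. It is one plausible reading of the example value lms,qa,mid|big (each comma-separated entry must appear among the agent's attribute values, with '|' separating acceptable alternatives), not juice's actual filterAndAddAttrSys() implementation.

import java.util.*;

public class AttrFilterSketch {

    // e.g. mesos.framework.attr=lms,qa,mid|big (the example from the article)
    static boolean offerMatches(String configuredAttrs, Map<String, String> agentAttributes) {
        for (String required : configuredAttrs.split(",")) {
            // "mid|big" means any one of the alternatives is acceptable.
            Set<String> alternatives = new HashSet<>(Arrays.asList(required.split("\\|")));
            boolean found = agentAttributes.values().stream().anyMatch(alternatives::contains);
            if (!found) {
                return false; // this offer would go into the declines list (declineOffer())
            }
        }
        return true;
    }

    public static void main(String[] args) {
        Map<String, String> attrs = new HashMap<>();
        attrs.put("bz", "lms");
        attrs.put("env", "qa");
        attrs.put("size", "big");
        System.out.println(offerMatches("lms,qa,mid|big", attrs)); // true
    }
}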

2. SchedulerService.handleOffers(): this method implements the function of resourceOffers in the original Mesos framework. It assigns tasks to the offers and finally produces a TaskInfo list, which is sent by AuxiliaryService.acceptOffer() to the master to notify it to process the tasks.
Note: after sending an OFFERS event notification, the master keeps those resources in a waiting state until the framework side calls accept (AuxiliaryService.acceptOffer()) or decline (AuxiliaryService.declineOffer()) to tell the master whether the resources are being used, before it offers them to the next framework. (By default the master simply waits; if no notification ever comes, the apparent resource utilization of the Mesos cluster can reach 100%. This can be avoided by setting an offer timeout on the master side.)

Within Juice-service, the scheduler driver's interaction with the master, i.e. Juice-service's processing logic, is implemented by SchedulerService and AuxiliaryService.

SchedulerService handles the main logic of Juice, such as the resource allocation algorithm and the task priority algorithm; all master callback event handling methods are also defined in SchedulerService.

AuxiliaryService maintains several thread pools, each completing its own kind of work. The AuxiliaryService.acceptOffer() and AuxiliaryService.declineOffer() calls just mentioned are executed through the send pool inside AuxiliaryService, while management tasks (such as querying task status in real time, terminating a running task, etc.) go through the auxiliary pool. Calls into AuxiliaryService are therefore asynchronous.
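
A minimal sketch of that asynchronous call style is shown below; the pool sizes and method names are assumptions for illustration, not juice's actual AuxiliaryService.

import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

class AuxiliaryServiceSketch {
    private final ExecutorService sendPool = Executors.newFixedThreadPool(4);      // accept/decline calls
    private final ExecutorService auxiliaryPool = Executors.newFixedThreadPool(2); // management calls

    void acceptOffer(Runnable acceptCall) {
        sendPool.submit(acceptCall);          // returns immediately; the HTTP call runs in the pool
    }

    void reconcileOrKill(Runnable managementCall) {
        auxiliaryPool.submit(managementCall); // real-time status queries, kills, etc.
    }
}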

Description of the various queues in Juice

A Juice task is put into an MQ when it is submitted through Juice-rest; in Juice-service this MQ is called juice.task.queue. In addition, there are several other queues: juice.task.retry.queue, juice.task.result.queue and juice.management.queue. Let's go through the use of these queues one by one.

juice.task.retry.queue: Juice-service allocates tasks against each offer as it takes them. While an offer is being filled, if a task popped (RPOP) from the MQ does not fit the offer (for example, the required resources are greater than the offer's remaining resources, or the task carries constraints and the current offer does not match them), Juice-service puts the current task into juice.task.retry.queue to wait for the next offer allocation, which takes tasks from juice.task.retry.queue with priority. This touches on the priority between Juice's internal task queues; I used a fairly simple approach: each time a new offer's resources are allocated, a certain number of tasks (CACHE_TRIES = 5) are first taken from juice.task.retry.queue, and if resources remain, tasks are then taken from juice.task.queue until the offer is filled. In addition, juice.task.retry.queue has an elimination mechanism. The current mechanism follows two rules; if either one is triggered, the task is considered failed, its task_status is set to TASK_FAILED and it is put into juice.task.result.queue (a small sketch of the check follows this list). The elimination rules are:
    1. Expiration: the task has stayed in juice.task.retry.queue longer than TASK_RETRY_EXPIRE_TIME, so it is eliminated (DEFAULT_TASK_RETRY_EXPIRE_TIME = 86,400 seconds).
    2. Maximum fetch count exceeded: the task has been fetched and re-queued without being executed more than MAX_RESERVED times, so it is eliminated (DEFAULT_MAX_RESERVED = 1024).
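
A minimal sketch of the two rules, using the defaults quoted above; field and method names are assumptions, not juice's actual code.

class RetryEliminationSketch {
    static final long TASK_RETRY_EXPIRE_TIME_SECONDS = 86_400; // default, per rule 1 above
    static final int  MAX_RESERVED = 1024;                     // default, per rule 2 above

    static boolean shouldEliminate(long enqueuedAtEpochSeconds, int timesFetched, long nowEpochSeconds) {
        boolean expired   = (nowEpochSeconds - enqueuedAtEpochSeconds) > TASK_RETRY_EXPIRE_TIME_SECONDS;
        boolean overLimit = timesFetched > MAX_RESERVED;
        // Either rule triggers elimination: the task is marked TASK_FAILED and
        // pushed to juice.task.result.queue instead of being retried again.
        return expired || overLimit;
    }
}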


juice.task.result.queue: the task result queue. After Juice-service obtains the status of a task (not necessarily its final state), it puts the task's TaskResult object into juice.task.result.queue; the Juice-rest side takes the TaskResult out of this queue, and if it is already a final task state, such as TASK_FINISHED or TASK_FAILED, the task status is reported to the caller through the external callbackUrl callback that was filled in when the task was submitted.
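
A small sketch of that Juice-rest side decision, under the assumption that only terminal states trigger the callback; the class and method names are illustrative.

class ResultHandlingSketch {
    void handle(String taskStatus, long juiceId, String callbackUrl) {
        boolean terminal = "TASK_FINISHED".equals(taskStatus) || "TASK_FAILED".equals(taskStatus);
        if (terminal && callbackUrl != null) {
            // Report the final status (keyed by the 18-digit juiceId) to the caller.
            notifyCaller(callbackUrl, juiceId, taskStatus);
        }
        // Non-terminal states (e.g. TASK_RUNNING) only update the stored task record.
    }

    private void notifyCaller(String callbackUrl, long juiceId, String taskStatus) {
        /* HTTP POST of the result to the caller's callbackUrl */
    }
}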

juice.management.queue: the management queue. It accepts reconcile-class and kill-class tasks; through it, AuxiliaryService initiates a query to synchronize the status of a running task, or kills a running task.

Submit a task through the SDK

The currently open source version of Juice already provides a complete SDK for interacting with Juice-rest. The following is an example of submitting a Docker task:

The SDK uses a fluent (streaming) style, so the caller can simply issue a request to Juice-rest.
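
Since the SDK's fluent API is not reproduced here, the following sketch shows the equivalent raw HTTP submission of the Docker task from the earlier JSON example, using OkHttp; the Juice-rest endpoint path is an assumption, so check the juice-api document for the real one.

import okhttp3.*;

public class SubmitTaskSketch {
    public static void main(String[] args) throws Exception {
        String juiceRest = "http://juice-rest-host:8080/v1/tasks"; // hypothetical endpoint
        String task = "{"
            + "\"callbackUrl\": \"http://www.XXXXXXXX.com/v5/tasks/callback\","
            + "\"taskName\": \"demo-task\","
            + "\"container\": {\"docker\": {\"image\": \"dockerhub.xxxx.com/demo-slice\"}, \"type\": \"DOCKER\"}"
            + "}";

        OkHttpClient client = new OkHttpClient();
        Request request = new Request.Builder()
            .url(juiceRest)
            .post(RequestBody.create(MediaType.parse("application/json"), task))
            .build();

        try (Response response = client.newCall(request).execute()) {
            // Juice-rest answers with an 18-digit juiceId that is later used for
            // queries, termination and the callback to callbackUrl.
            System.out.println(response.body().string());
        }
    }
}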

Summary and future

Currently, the Juice 1.1.0 open source version is in beta. Besides fixing some bugs, the new version adds two features:
    1. Task queue jumping: by setting priority=1 in the request parameters, a task is placed at the front of the processing queue to raise its execution priority.
    2. Automatic retry on task failure: by setting the request parameter retry=1, a failed task is automatically retried, up to 3 times.


Facing complex business needs, some functions/features are not yet supported in the current version of Juice. The best way to help is to fork the project on GitHub, or contact me directly, so we can improve Juice together.

Q&a

Q: What are the differences between Juice and Elastic-Job?

A: I am not too familiar with Elastic-Job, so let me make a few casual points; please correct me if I am wrong:
Both Juice and Elastic-Job-Cloud are based on Mesos; for resource-task allocation Elastic-Job uses Fenzo (from Netflix), while Juice uses its own scheduling algorithm.
Juice does not require job registration: as long as the task image (Docker) is uploaded to the repository, the task can be triggered, whereas Elastic-Job requires jobs to be registered.
Juice's REST API is nearly identical to Marathon's, which is convenient for users who already deploy services with Marathon.
The current version of Juice does not support job sharding.
Q: Can you describe the task resource allocation algorithm in detail?

A: As briefly introduced with the code block above, resource allocation is triggered by receiving the 'OFFERS' event.
Since the offer object received is actually a list, the processing logic loops over each offer to assign specific tasks, and the total resources (CPU, memory, etc.) of the tasks assigned to each offer must be less than the offer's resources * RESOURCES_USE_THRESHOLD (the resource usage threshold, which can be set via the configuration item resources.use.threshold, default 0.8). After the task_infos for an offer have been allocated, an accept call is generated and handed to the send thread pool for delivery; the whole process is asynchronous and non-blocking.
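
A minimal sketch of that per-offer rule (total task resources kept within offer resources * threshold); the class and field names are assumptions for illustration, not juice's actual allocation code.

import java.util.*;

public class AllocationSketch {
    static final double RESOURCES_USE_THRESHOLD = 0.8; // default, per the answer above

    static class Task {
        final double cpus; final double memMb;
        Task(double cpus, double memMb) { this.cpus = cpus; this.memMb = memMb; }
    }

    static List<Task> fillOffer(double offerCpus, double offerMemMb, Deque<Task> pending) {
        double cpuBudget = offerCpus * RESOURCES_USE_THRESHOLD;
        double memBudget = offerMemMb * RESOURCES_USE_THRESHOLD;
        List<Task> assigned = new ArrayList<>();
        double usedCpus = 0, usedMem = 0;
        while (!pending.isEmpty()) {
            Task next = pending.peekFirst();
            if (usedCpus + next.cpus > cpuBudget || usedMem + next.memMb > memBudget) {
                break; // does not fit this offer; in juice it would wait in the retry queue
            }
            pending.pollFirst();
            assigned.add(next);
            usedCpus += next.cpus;
            usedMem  += next.memMb;
        }
        return assigned; // becomes the task_infos sent with the accept call for this offer
    }

    public static void main(String[] args) {
        Deque<Task> queue = new ArrayDeque<>(Arrays.asList(
            new Task(1, 512), new Task(2, 1024), new Task(2, 2048)));
        // With a 4-CPU / 4096 MB offer and a 0.8 threshold, only the first two tasks fit.
        System.out.println(fillOffer(4, 4096, queue).size()); // prints 2
    }
}
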
Q: All tasks are packaged in Docker; how do you handle temporary tasks?

A: Temporary tasks do produce some garbage images that need to be cleaned out of the Docker repository regularly; a typical cleanup cycle is one month.
Q: Does the task system help the user complete the Docker packaging?

A: Not currently, so users must have some basic Docker skills, at least being able to build an image and push it to the repository. Of course, some Docker settings, such as mounting volumes and the network mode (bridge, host), can be set by parameters when submitting a task.
Q: What are the advantages and disadvantages of Mesos and Kubernetes?

A: In fact, I mainly use Mesos. Compared with Kubernetes, Mesos is a heavier system; Mesos is more like a distributed operating system, while Kubernetes has more advantages in container orchestration (pods, etc.).
The above content was organized from the group sharing session on the evening of July 13, 2017. The speaker is Jia Xu, a Java engineer at Hujiang, author of the open source framework Juice, with more than 10 years of development experience. DockOne organizes weekly technical sharing sessions; interested readers are welcome to add WeChat: liyingjiesa to join the group, and you can leave us a message with topics you would like to hear or share.