A scheduled task execution engine for the log system

Overview

We have recently been strengthening the stability and reliability of the log system itself. A stable and reliable system cannot do without monitoring, and the monitoring we talk about here covers, beyond service liveness, the collection and capture of each component's core metrics; for that purpose we built scheduled task execution. Since the general idea and design have now taken shape, today I would like to share the technology selection and design of scheduled tasks in the log system.

Component run-time monitoring

From the articles I have shared before, it is easy to see the components we chose for our log system:

    • Collection agent: Flume-NG
    • Message system: Kafka
    • Real-time stream processing: Storm
    • Distributed search / log storage (temporary): Elasticsearch

This is a common stack for Internet-scale log solutions. However, when we investigated the monitoring options these components provide themselves and the third-party monitoring tools they support, the results were inconsistent:

    • Flume-NG: supports HTTP/JMX metrics; supported monitoring tool: Ganglia
    • Kafka: supports JMX metrics; supported monitoring tool: Yahoo! Kafka Manager
    • Storm: supports JMX metrics, plus the Storm UI
    • Elasticsearch: supports status requests over HTTP

Judging from the above, the components' monitoring capabilities and their ability to integrate with third-party monitoring systems are uneven. This obviously does not meet our expectations, and we care about several points:

    • Unified monitoring, rather than a heterogeneous mix of per-component tools
    • As the system stabilizes, the freedom to configure whichever metrics we consider important and must watch
    • Unified visualization: we want to see the metrics we care about at a glance on our own console

To summarize: although these components differ in monitoring capability, they do have something in common — each of them supports metrics requests over at least one of the following two protocols:

    • JMX
    • HTTP

In fact, unifying the monitoring is not hard to do: we could pick the mainstream open-source monitoring tool Zabbix (for JMX metrics collection, Zabbix provides native support through its Java gateway). But for personalized monitoring, such as extracting and presenting specific metrics, Zabbix needs to be customized. For various reasons, we are not using a Zabbix-based custom solution for the time being.

JMX Metrics Collection

Because Zabbix provides native support for JMX collection, and it is open-source software, our JMX metrics collection is customized based on the Zabbix Java gateway.

A quick look at the Zabbix Java gateway: Zabbix has provided native support for JMX since version 2.0, and its architecture is very simple.

Working principle:
When the Zabbix server wants to know a specific JMX value on a host, it asks the Zabbix Java gateway, and the gateway uses the JMX management API to query the particular application. The prerequisite is that the application is started with the -Dcom.sun.management.jmxremote parameter so that JMX queries are enabled.
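
To illustrate that query path, here is a minimal sketch of what such a JMX query looks like from Java, assuming the target application was started with remote JMX enabled on port 9999 and authentication disabled for brevity; the host, port, and MBean are illustrative, not our actual configuration:

    import javax.management.MBeanServerConnection;
    import javax.management.ObjectName;
    import javax.management.remote.JMXConnector;
    import javax.management.remote.JMXConnectorFactory;
    import javax.management.remote.JMXServiceURL;

    public class JmxQueryExample {
        public static void main(String[] args) throws Exception {
            // Assumes the target JVM was started with e.g.:
            //   -Dcom.sun.management.jmxremote.port=9999
            //   -Dcom.sun.management.jmxremote.authenticate=false
            //   -Dcom.sun.management.jmxremote.ssl=false
            JMXServiceURL url = new JMXServiceURL(
                    "service:jmx:rmi:///jndi/rmi://localhost:9999/jmxrmi");
            try (JMXConnector connector = JMXConnectorFactory.connect(url)) {
                MBeanServerConnection mbsc = connector.getMBeanServerConnection();
                // Read a standard JVM metric; component-specific MBeans
                // (Kafka's, Storm's, ...) are read the same way by object name.
                ObjectName memory = new ObjectName("java.lang:type=Memory");
                Object heap = mbsc.getAttribute(memory, "HeapMemoryUsage");
                System.out.println("HeapMemoryUsage = " + heap);
            }
        }
    }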

The Zabbix server has a dedicated type of process, the Java pollers (configured via the StartJavaPollers parameter), used to connect to the Java gateway. The Java gateway itself is a standalone Java daemon that works like a proxy, decoupling Zabbix from the components that expose JMX metrics.

We reused the Java gateway's JMX collection code (the JMXItemChecker.java class) and dump the collected metrics into our own database for display on the log system's console. Since we did not adopt the whole Zabbix mechanism, we will not dwell on irrelevant details.

HTTP Metrics Collection

HTTP metrics collection is primarily used to monitor Elasticsearch (because it does not support JMX). We use HttpClient to send the requests, and likewise store the collected information in our database.
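
As a sketch of this kind of polling, the following fetches Elasticsearch's cluster health endpoint. For self-containedness it uses the JDK's built-in java.net.http.HttpClient (Java 11+) rather than whichever HttpClient library we actually used; the host and endpoint are illustrative:

    import java.net.URI;
    import java.net.http.HttpClient;
    import java.net.http.HttpRequest;
    import java.net.http.HttpResponse;

    public class EsHealthPoller {
        public static void main(String[] args) throws Exception {
            HttpClient client = HttpClient.newHttpClient();
            // _cluster/health is a standard Elasticsearch status endpoint.
            HttpRequest request = HttpRequest.newBuilder()
                    .uri(URI.create("http://localhost:9200/_cluster/health"))
                    .GET()
                    .build();
            HttpResponse<String> response =
                    client.send(request, HttpResponse.BodyHandlers.ofString());
            // In the real engine this JSON would be parsed and the
            // interesting fields written to the metrics database.
            System.out.println(response.body());
        }
    }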

Choosing a scheduled task framework: Quartz

Quartz is an open-source, powerful, mainstream scheduled task execution framework. Let's briefly mention a few of its core concepts:

    • Job: defines the specific processing logic of a task
    • JobDetail: encapsulates the information the Quartz framework needs to execute a Job
    • Trigger: triggers the execution of a Job
    • JobDataMap: encapsulates the data needed during Job execution

Of course there are many other concepts in the Quartz framework, but for the purposes of this article, these are enough.
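
To show how these four concepts fit together, here is a minimal Quartz sketch; the job name, group, target URL, and 30-second interval are all illustrative:

    import org.quartz.*;
    import org.quartz.impl.StdSchedulerFactory;

    public class QuartzExample {
        // Job: the concrete processing logic of a task.
        public static class MetricsPollJob implements Job {
            @Override
            public void execute(JobExecutionContext context) {
                // JobDataMap: per-job data made available at execution time.
                String target = context.getMergedJobDataMap().getString("target");
                System.out.println("Polling metrics from " + target);
            }
        }

        public static void main(String[] args) throws SchedulerException {
            Scheduler scheduler = StdSchedulerFactory.getDefaultScheduler();

            // JobDetail: wraps the Job class plus its identity and data.
            JobDetail detail = JobBuilder.newJob(MetricsPollJob.class)
                    .withIdentity("es-health", "MetricsPoller")
                    .usingJobData("target", "http://localhost:9200/_cluster/health")
                    .build();

            // Trigger: when and how often the job fires.
            Trigger trigger = TriggerBuilder.newTrigger()
                    .withSchedule(SimpleScheduleBuilder.repeatSecondlyForever(30))
                    .startNow()
                    .build();

            scheduler.scheduleJob(detail, trigger);
            scheduler.start();
        }
    }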

Overall design of the scheduled task execution engine

We have discussed the open-source scheduled task framework Quartz, but the framework alone is not enough: we also need to plan, classify, manage, and distribute these tasks.

Types of Scheduled Tasks

For the time being, we divide our scheduled tasks into the following categories:

    • Simple offline calculation: OfflineCalc
    • Metrics collection: MetricsPoller
    • Other routine maintenance tasks of the log system, such as daily index management

Here, metrics collection is the main reason we introduced scheduled tasks, so we will use it as the main thread in introducing our scheduled task execution engine.

Metadata storage and design

Based on the Quartz concepts described above, and the task generalization we want to achieve, we need to think about how to make changes to the scheduled task execution engine more automated and the engine itself more extensible. This brings us to the metadata management required for scheduled task execution.

We designed a hierarchical organizational structure, from top to bottom:

    • Job category
    • Job type
    • Job
    • Job metadata
    • Job trigger

category divides jobs broadly, e.g. the OfflineCalc and MetricsPoller mentioned above. Quartz has a notion of job grouping (group), and we use category as the basis for grouping jobs.

type defines the type of a task and belongs to a category. type not only plays an organizational role for jobs; to some extent it corresponds to a job class, i.e. a group of jobs that follow the same processing logic — for example, the JMX and HTTP metrics pollers mentioned above.

job corresponds to a job in Quartz, and its granularity needs to be weighed. Take the JMX metrics poller type as an example: if you only need to collect the metrics of a single component, the granularity of a job can be the collection of one metric. But if you need to extract many metrics from multiple components, the job granularity cannot be that fine; one job may need to be responsible for extracting all the metrics of one component. It depends on your workload and on keeping the number of jobs in the scheduling framework reasonably under control.

job metadata stores the metadata a job needs at run time. As mentioned above, a job is an abstract execution unit for a class of identical business logic, but individual jobs are not exactly the same; what distinguishes them is the metadata their execution requires. The job-to-metadata correspondence is one-to-many. For the JMX metrics poller mentioned above, the metadata stores the collection of metrics (MBean object names and attributes) a job needs to extract.

job trigger corresponds to a Trigger in Quartz. The job-to-trigger correspondence is one-to-one.

All of the above data can be configured and managed from the control console.
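
To make the hierarchy concrete, here is an illustrative layout for one JMX metrics poller job; every name, MBean, and the cron expression below is a hypothetical example, not our actual configuration:

    MetricsPoller                      <- job category (also the Quartz group)
      jmx                              <- job type: jobs sharing the JMX poller logic
        kafka-broker-1                 <- job: extracts all metrics of one component
          metadata (one-to-many):
            kafka.server:type=BrokerTopicMetrics,name=MessagesInPerSec
            java.lang:type=Memory -> HeapMemoryUsage
          trigger (one-to-one): cron "0/30 * * * * ?"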

Life Cycle Automation Management

To improve the scalability and self-management of the scheduled task execution engine, we chose Zookeeper to store the job topology and metadata described above.

Zookeeper is a very good metadata management tool and a very mainstream distributed coordination tool. Its event (watch) mechanism makes automated management of the job lifecycle possible: by watching the children of each znode, we can dynamically perceive changes to jobs and nodes, and create or delete jobs accordingly.
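
As a minimal sketch of that mechanism, the following uses the plain ZooKeeper client to watch a job-type znode for child changes; the connection string and path are illustrative, and a real engine would diff the child list against the currently scheduled jobs and (un)schedule the corresponding Quartz jobs:

    import java.util.List;
    import org.apache.zookeeper.WatchedEvent;
    import org.apache.zookeeper.Watcher;
    import org.apache.zookeeper.ZooKeeper;

    public class JobWatcher {
        public static void main(String[] args) throws Exception {
            ZooKeeper zk = new ZooKeeper("localhost:2181", 30000, event -> {});

            Watcher childWatcher = new Watcher() {
                @Override
                public void process(WatchedEvent event) {
                    if (event.getType() == Event.EventType.NodeChildrenChanged) {
                        try {
                            // Re-register the watch (ZooKeeper watches are
                            // one-shot) and react to the new child list.
                            List<String> jobs =
                                    zk.getChildren(event.getPath(), this);
                            System.out.println("Jobs now: " + jobs);
                        } catch (Exception e) {
                            e.printStackTrace();
                        }
                    }
                }
            };
            // Initial read sets the first watch on the job-type znode.
            zk.getChildren("/logsys/MetricsPoller", childWatcher);
            Thread.sleep(Long.MAX_VALUE); // keep the demo process alive
        }
    }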

Summary

In this article, we took monitoring the log system's components as the practical requirement and described how we designed our tasks on top of the mainstream scheduled task execution framework Quartz to make the engine more scalable, and how we combined it with Zookeeper to give task management automation capabilities.
