Spark Streaming-based real-time automated operations for SQL services


Design Background

There are currently 10 Spark Thriftserver instances online. Monitoring port liveness alone has proven inaccurate, because in many failure cases the process does not exit, and manually checking the logs and restarting the service is very inefficient. We therefore designed a solution that uses Spark Streaming to collect the Spark Thriftserver logs in real time, decide from the log content whether the service has stopped serving, and trigger the corresponding automatic restart. The solution provides second-level, 24/7 uninterrupted monitoring and maintenance of the service.

Design Architecture

    • Deploy a Flume agent on each Spark Thriftserver node that needs to be monitored, tailing the service log (a Flume interceptor adds the host information to every event)
    • Flume delivers the collected log stream into Kafka
    • Spark Streaming consumes the Kafka log stream and checks the log content against custom keywords; if a keyword is hit, the service is considered unavailable and the host information carried by the event is written to MySQL
    • A shell script reads the host information from MySQL and restarts the service on that host (the shared MySQL table is sketched right after this list)
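
The MySQL table shared by the last two steps is not defined in the post. A minimal schema matching the INSERT statement used in the Spark Streaming code further below might look like the following; the table and column names come from that code, while the column lengths and connection parameters are assumptions:

mysql -h MYSQL_HOST -u MYSQL_USER -p -e "CREATE TABLE IF NOT EXISTS monitor (id VARCHAR(32) NOT NULL PRIMARY KEY, hostname VARCHAR(128) NOT NULL)"
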
Software Version and Configuration

Spark 2.0.1, Kafka 0.10, Flume 1.7

1) Flume Configuration and command:

Modify flume-conf.properties

agent.sources = sparkTS070
agent.channels = c
agent.sinks = kafkaSink

# For each one of the sources, the type is defined
agent.sources.sparkTS070.type = TAILDIR
agent.sources.sparkTS070.interceptors = i1
agent.sources.sparkTS070.interceptors.i1.type = host
agent.sources.sparkTS070.interceptors.i1.useIP = false
agent.sources.sparkTS070.interceptors.i1.hostHeader = agenthost

# The channel can be defined as follows
agent.sources.sparkTS070.channels = c
agent.sources.sparkTS070.positionFile = /home/hadoop/xu.wenchun/apache-flume-1.7.0-bin/taildir_position.json
agent.sources.sparkTS070.filegroups = f1
agent.sources.sparkTS070.filegroups.f1 = /data1/spark/logs/spark-hadoop-org.apache.spark.sql.hive.thriftserver.HiveThriftServer2-1-hadoop070.dx.com.out

# Each sink's type must be defined
agent.sinks.kafkaSink.type = org.apache.flume.sink.kafka.KafkaSink
agent.sinks.kafkaSink.kafka.topic = mytest-topic1
agent.sinks.kafkaSink.kafka.bootstrap.servers = 10.87.202.51:9092
agent.sinks.kafkaSink.useFlumeEventFormat = true

# Specify the channel the sink should use
agent.sinks.kafkaSink.channel = c

# Each channel's type is defined
agent.channels.c.type = memory

Run the command:

nohup bin/flume-ng agent -n agent -c conf -f conf/flume-conf.properties -Dflume.root.logger=INFO,LOGFILE &
2) Kafka Configuration and execution command:

Modify config/server.properties

broker.id=1
listeners=PLAINTEXT://10.87.202.51:9092
log.dirs=/home/hadoop/xu.wenchun/kafka_2.11-0.10.0.1/kafka.log
zookeeper.connect=10.87.202.44:2181,10.87.202.51:2181,10.87.202.52:2181

Run the command:

nohup bin/kafka-server-start.sh config/server.properties &
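
The post does not show how the topic referenced by the Flume Kafka sink (mytest-topic1) was created. On Kafka 0.10 it can be created with the bundled kafka-topics.sh; the ZooKeeper addresses below are the ones from server.properties, while the partition count and replication factor are assumptions:

bin/kafka-topics.sh --create \
  --zookeeper 10.87.202.44:2181,10.87.202.51:2181,10.87.202.52:2181 \
  --replication-factor 1 --partitions 3 \
  --topic mytest-topic1
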
Spark Streaming submit command:
/opt/spark-2.0.1-bin-2.6.0/bin/spark-submit --master yarn-cluster --num-executors 3 --class Sparktslogmonito
3) Shell script

Write a shell script that reads the host information from MySQL and restarts the Spark Thriftserver service on the affected host. A sketch is shown below.
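
The original post does not include the script itself. The following is a minimal sketch of what it could look like, assuming the MySQL connection placeholders (MYSQL_HOST, MYSQL_USER, MYSQL_PASSWORD, MONITOR_DB) and the ssh user are replaced with real values; it uses the stop-thriftserver.sh/start-thriftserver.sh scripts shipped in Spark's sbin directory and could be run periodically, for example from cron.

#!/bin/bash
# Sketch of the restart script described above (not part of the original post).
# MYSQL_HOST, MYSQL_USER, MYSQL_PASSWORD, MONITOR_DB and the ssh user are placeholders.
MYSQL="mysql -h MYSQL_HOST -u MYSQL_USER -pMYSQL_PASSWORD -D MONITOR_DB -N -s"
SPARK_HOME=/opt/spark-2.0.1-bin-2.6.0

# Hosts that the Spark Streaming job flagged as unavailable.
hosts=$($MYSQL -e "SELECT DISTINCT hostname FROM monitor")

for host in $hosts; do
    echo "$(date) restarting Spark Thriftserver on $host"
    # Restart the Thriftserver with the scripts shipped in Spark's sbin directory.
    ssh hadoop@"$host" "$SPARK_HOME/sbin/stop-thriftserver.sh; $SPARK_HOME/sbin/start-thriftserver.sh"
    # Remove the handled rows so the host is not restarted again on the next run.
    $MYSQL -e "DELETE FROM monitor WHERE hostname = '$host'"
done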

Core Spark Streaming code of the monitoring job

The Spark Streaming code below is shared as-is; it was arrived at after stepping into a few pitfalls and has been verified to work.

// Imports needed by the snippet. The surrounding object/main method and the creation of
// `stream` -- the Kafka direct stream of Flume Avro events -- are not shown in the original post.
import java.text.SimpleDateFormat
import java.util.Date

import org.apache.avro.io.DecoderFactory
import org.apache.avro.specific.SpecificDatumReader
import org.apache.avro.util.Utf8
import org.apache.flume.source.avro.AvroFlumeEvent

stream.foreachRDD { rdd =>
  rdd.foreachPartition { rddOfPartition =>
    // ConnectPool is the author's JDBC connection pool helper (not shown in the post).
    val conn = ConnectPool.getConnection
    println("conn:" + conn)
    conn.setAutoCommit(false) // set to manual commit
    val stmt = conn.createStatement()
    rddOfPartition.foreach { event =>
      // The Kafka sink was configured with useFlumeEventFormat=true, so each record value
      // is an Avro-serialized Flume event; decode it to recover the headers and body.
      val body = event.value().get()
      val decoder = DecoderFactory.get().binaryDecoder(body, null)
      val result = new SpecificDatumReader[AvroFlumeEvent](classOf[AvroFlumeEvent]).read(null, decoder)
      val hostname = result.getHeaders.get(new Utf8("agenthost"))
      val text = new String(result.getBody.array())
      if (text.contains("Broken pipe") || text.contains("No Active SparkContext")) {
        val dateFormat: SimpleDateFormat = new SimpleDateFormat("yyyyMMddHHmmssSSS")
        val id = dateFormat.format(new Date()) + "_" + (new scala.util.Random).nextInt(999)
        stmt.addBatch("INSERT INTO monitor (id, hostname) VALUES ('" + id + "', '" + hostname + "')")
        println("INSERT INTO monitor (id, hostname) VALUES ('" + id + "', '" + hostname + "')")
      }
    }
    stmt.executeBatch()
    conn.commit()
    conn.close()
  }
}
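
Before relying on the job, it is worth confirming that the two keywords really appear in the Thriftserver log when the service becomes unavailable. A quick manual check on a service node (the log path is the one tailed in the Flume configuration above):

grep -E "Broken pipe|No Active SparkContext" \
  /data1/spark/logs/spark-hadoop-org.apache.spark.sql.hive.thriftserver.HiveThriftServer2-1-hadoop070.dx.com.out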

The above is a typical introductory real-time processing application. We happened to run into exactly this kind of monitoring and operations problem, and applying this solution to it has worked well.

Transferred from: http://blog.csdn.net/xwc35047/article/details/75309350
