First, environment
Windows x64, Java 1.8
Scala 2.10.6, Spark 1.6.0, Hadoop 2.7.5
IntelliJ IDEA 2017.2
Nmap toolkit (its ncat command corresponds to the nc command on Linux)
Second, local setup
2.1 Setting environment variables: System Properties -> Environment Variables -> New, with a name of the form XXX_HOME and the root directory of the corresponding installation package as the value; then add %XXX_HOME%\bin to the Path variable.
1. Hadoop needs its environment variable set;
2. Scala: it is best to download and install the matching version, then set its environment variable;
3. Spark: simply decompress the package.
Reference: see the environment-setup links in section Three.
2.2 Build and test. The SBT tool makes the build very convenient; use SBT to create the Scala project. The generated project structure is as follows, where TestMain.scala is:
/**
 * notes: to test Scala, Spark and Hadoop
 * date: 2017.12.20
 * author: gendlee
 */
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.log4j.{Level, Logger}
import com.test.SparkStreaming

object Test {
  // silence Spark's verbose INFO logging
  Logger.getLogger("org").setLevel(Level.ERROR)

  def main(args: Array[String]): Unit = {
    SparkStreaming.printWebsites()

    // initiate Spark (the streaming call above blocks, so this part only
    // runs if printWebsites() is commented out)
    val conf = new SparkConf().setMaster("local[2]").setAppName("Test")
    val sc = new SparkContext(conf)
    // read a file from the local disk
    val rdd = sc.textFile("F:\\code\\scala2.10.6_spark1.6_hadoop2.8\\test.log")
  }
}
SparkStreaming.scala is:
/**
 * notes: to test Spark Streaming
 * date: 2017.12.21
 * author: gendlee
 */
package com.test

import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

object SparkStreaming {
  def printWebsites(): Unit = {
    val conf = new SparkConf().setMaster("local[2]").setAppName("PrintWebsites")
    val ssc = new StreamingContext(conf, Seconds(1))
    val output = "F:\\code\\scala2.10.6_spark1.6_hadoop2.8\\out\\gettedwebsites"
    // receive text lines from localhost:7777
    val lines = ssc.socketTextStream("localhost", 7777)
    // keep only lines that contain "http"
    val websiteLines = lines.filter(_.contains("http"))
    websiteLines.print()
    // websiteLines.repartition(1).saveAsTextFiles(output)
    ssc.start()
    ssc.awaitTermination()
  }
}
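For the SBT build mentioned in 2.2, the project needs the Spark dependencies on its classpath. Below is a minimal build.sbt sketch; the versions come from the environment list above, while the project name is my own placeholder:

```scala
// Minimal build.sbt sketch (project name assumed; versions from the post's environment)
name := "spark-streaming-test"

version := "1.0"

scalaVersion := "2.10.6"

// %% appends the Scala binary version (_2.10) to the artifact name
libraryDependencies ++= Seq(
  "org.apache.spark" %% "spark-core" % "1.6.0",
  "org.apache.spark" %% "spark-streaming" % "1.6.0"
)
```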
The goal is to extract the lines containing a URL (i.e., containing "http") from the input. Pitfall:
val conf = new SparkConf().setMaster("local[2]").setAppName("PrintWebsites")
Here the setMaster parameter must be local[2]: two threads are needed, one to run the receiver and one to process the data. With the default local (a single thread), no data will be received.
After compiling, run it and you will see output like this:
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
17/12/22 16:39:14 INFO Slf4jLogger: Slf4jLogger started
17/12/22 16:39:14 INFO Remoting: Starting remoting
17/12/22 16:39:14 INFO Remoting: Remoting started; listening on addresses: [akka.tcp://sparkDriverActorSystem@169.254.78.142:64905]
17/12/22 16:39:15 ERROR ReceiverTracker: Deregistered receiver for stream 0: Restarting receiver with delay 2000ms: Socket data stream had no more data
-------------------------------------------
Time: 1513931956000 ms
-------------------------------------------
-------------------------------------------
Time: 1513931957000 ms
-------------------------------------------
An error appears. No hurry: it only means that port 7777 is not delivering any data. Pause the program; we need to send data to port 7777.
With the socketTextStream() function, we can receive data from a specific port on the specified host. Let's look at how to send data on port 7777.
Open PowerShell or cmd on Windows and enter:
ncat -lk -p 7777
Then run the program in IDEA and type lines into the open cmd window; whenever an input line contains "http", the line is printed in IDEA's run window.
Filtered output on the IDEA side:
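If ncat is not available, the same test can be driven from a tiny Scala sender. The LineSender object below is my own minimal sketch, not part of the post: it listens on port 7777 and pushes one line to the first client that connects, which is enough for socketTextStream() to receive something.

```scala
import java.io.PrintWriter
import java.net.ServerSocket

// Hypothetical stand-in for "ncat -lk -p 7777": accept one client
// (the Spark socket receiver) and send it a single newline-terminated line.
object LineSender {
  def serveOnce(port: Int, line: String): Unit = {
    val server = new ServerSocket(port)
    try {
      val socket = server.accept()                       // block until a client connects
      val out = new PrintWriter(socket.getOutputStream, true)
      out.println(line)                                  // autoflush sends the line immediately
      socket.close()
    } finally server.close()
  }

  def main(args: Array[String]): Unit =
    serveOnce(7777, "visit http://example.com for details")
}
```

Unlike ncat -lk, this sketch serves a single client once and exits; it is only meant to confirm that the receiver gets data.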
There is a remaining problem: the filter does not handle cases like https specially, and it also matches lines where "http" is merely part of another word; a follow-up will look at how to filter more precisely.
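One way to tighten the filter is to match whole URLs with a regex instead of a bare contains("http"). The UrlFilter object and its pattern below are my own illustration, not the post's code: requiring the :// separator means https:// links are still captured, while words that merely contain "http" are skipped.

```scala
// Illustrative sketch: extract whole URLs with a regex instead of
// filtering on a bare substring match.
object UrlFilter {
  // matches http:// or https:// followed by non-whitespace, non-quote characters
  private val UrlPattern = """https?://[^\s"']+""".r

  // return every URL-looking token found in one input line
  def extractUrls(line: String): List[String] =
    UrlPattern.findAllIn(line).toList

  def main(args: Array[String]): Unit = {
    println(extractUrls("see http://example.com and https://foo.bar/x"))
    println(extractUrls("nohttphere"))   // "http" inside a word does not match
  }
}
```

In the streaming job this could replace the filter step, e.g. lines.flatMap(UrlFilter.extractUrls) instead of lines.filter(_.contains("http")).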
This completes the requirements of the exercise.
Third, references:
http://blog.csdn.net/gendlee1991/article/details/78066548
https://www.cnblogs.com/FG123/p/5324743.html