0073 Spark Streaming: Receiving Data from a Port for Real-Time Processing

Source: Internet
Author: User
1. Environment
Windows x64, Java 1.8, Scala 2.10.6, Spark 1.6.0, Hadoop 2.7.5
IntelliJ IDEA 2017.2, Nmap tools (Nmap's ncat command corresponds to the nc command on Linux)
2. Local setup
2.1 Environment variables
Method: in the Windows system settings, add a new environment variable named XXX_HOME whose value is the root directory of the corresponding installation, then add %XXX_HOME%\bin to the Path variable.
- Hadoop: set its environment variables as described above.
- Scala: preferably download and install the matching version, then set its environment variables.
- Spark: just extract the archive.
Reference: Environment Build Reference.
2.2 Build and test
SBT makes building very convenient; use SBT to create the Scala project. In the generated project structure, TestMain.scala is:
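/**
  * notes: to test Scala and Spark and Hadoop
  * date: 2017.12.20
  * Author: gendlee
  */
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.log4j.{Level, Logger}
import com.test.SparkStreaming

object Test {

  // Only show ERROR-level messages from the "org.*" loggers (Spark, Hadoop, ...)
  Logger.getLogger("org").setLevel(Level.ERROR)

  def main(args: Array[String]): Unit = {

    // Runs the streaming job; this call blocks in awaitTermination()
    // until the StreamingContext is stopped
    SparkStreaming.printWebsites()

    // initiate spark (batch test, reached only after the streaming job ends)
    // the app name "Test" is a placeholder
    val conf = new SparkConf().setMaster("local").setAppName("Test")
    val sc = new SparkContext(conf)

    // Read file from local disk
    val rdd = sc.textFile("F:\\code\\scala2.10.6_spark1.6_hadoop2.8\\test.log")

  }

}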

and SparkStreaming.scala is:
/**
  * notes: to test Spark Streaming
  * date: 2017.12.21
  * Author: gendlee
  */
package com.test

import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

object SparkStreaming {
  def printWebsites(): Unit = {

    // local[2]: two local threads, one for the socket receiver, one for processing
    val conf = new SparkConf().setMaster("local[2]").setAppName("PrintWebsites")
    // Process the stream in 1-second batches
    val ssc = new StreamingContext(conf, Seconds(1))

    val output = "F:\\code\\scala2.10.6_spark1.6_hadoop2.8\\out\\gettedwebsites"

    // Connect to localhost:7777 as a TCP client and read newline-delimited text
    val lines = ssc.socketTextStream("localhost", 7777)

    // Keep only the lines that contain "http" and print them for each batch
    val websiteLines = lines.filter(_.contains("http"))
    websiteLines.print()
    // websiteLines.repartition(1).saveAsTextFiles(output)

    ssc.start()            // start receiving and processing
    ssc.awaitTermination() // block until the context is stopped
  }

}
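Both files need Spark Core and Spark Streaming on the classpath. A minimal build.sbt sketch, assuming the versions listed in the environment section (Scala 2.10.6, Spark 1.6.0); the project name is hypothetical. spark-core 1.6.0 brings in log4j 1.2 transitively, which provides the org.apache.log4j classes imported in TestMain.scala.

```scala
// build.sbt: minimal sketch for Scala 2.10.6 + Spark 1.6.0
name := "spark-streaming-test" // hypothetical project name

version := "0.1"

scalaVersion := "2.10.6"

// %% appends the Scala binary version, e.g. spark-core_2.10
libraryDependencies ++= Seq(
  "org.apache.spark" %% "spark-core"      % "1.6.0",
  "org.apache.spark" %% "spark-streaming" % "1.6.0"
)
```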


The goal is to extract the input lines that contain a URL (that is, lines containing "http").
Pitfall:
val conf = new SparkConf().setMaster("local[2]").setAppName("PrintWebsites")
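The master must be local[2] (or higher) here: it starts two local threads, one occupied by the socket receiver and one that processes the received data. With plain local (a single thread), the receiver takes the only thread and the received data is never processed.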
After compiling and running, the program prints the following:
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
17/12/22 16:39:14 INFO Slf4jLogger: Slf4jLogger started
17/12/22 16:39:14 INFO Remoting: Starting remoting
17/12/22 16:39:14 INFO Remoting: Remoting started; listening on addresses :[akka.tcp://sparkDriverActorSystem@169.254.78.142:64905]
17/12/22 16:39:15 ERROR ReceiverTracker: Deregistered receiver for stream 0: Restarting receiver with delay 2000ms: Socket data stream had no more data
-------------------------------------------
Time: 1513931956000 ms
-------------------------------------------

-------------------------------------------
Time: 1513931957000 ms
-------------------------------------------


An error appears. Don't worry: it occurs because nothing is serving data on port 7777. Stop the program for now; we need to send data to port 7777.
With the socketTextStream() function, we can receive data from a specific port on a specified host. It connects to that host and port as a TCP client and reads newline-delimited text, so some program must be listening on the port and writing to it. Let's look at how to send data on port 7777.
Open PowerShell or cmd on Windows and enter:
ncat -lk -p 7777
(-l puts ncat in listen mode, -k keeps it listening across multiple connections, and -p sets the port.) Then run the program in IDEA and type lines into the cmd window; whenever a line contains "http", it is printed in IDEA's Run window.
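If Nmap is not installed, any program that listens on the port and writes newline-terminated text can feed socketTextStream(), since the receiver only connects to it as a client. Below is a minimal plain-Scala sketch (no Spark needed); the LineServer object and its methods are hypothetical, not part of the original project. Unlike ncat -k it serves a single client and then exits, so if Spark's receiver disconnects and reconnects, the server has to be started again.

```scala
import java.io.{BufferedWriter, OutputStreamWriter}
import java.net.ServerSocket
import java.nio.charset.StandardCharsets

// Hypothetical stand-in for `ncat -l -p 7777`: listens on a port, accepts ONE
// client (e.g. Spark's socket receiver) and sends it newline-terminated lines.
object LineServer {

  // Blocks until a client connects, writes every line to it, then closes the
  // connection. Each line is flushed at once so the receiver sees it immediately.
  def serveOnce(server: ServerSocket, lines: Iterator[String]): Unit = {
    val client = server.accept()
    try {
      val out = new BufferedWriter(
        new OutputStreamWriter(client.getOutputStream, StandardCharsets.UTF_8))
      for (line <- lines) {
        out.write(line)
        out.write("\n") // the receiver splits the byte stream on line breaks
        out.flush()
      }
    } finally {
      client.close()
    }
  }

  def main(args: Array[String]): Unit = {
    val port = if (args.nonEmpty) args(0).toInt else 7777
    val server = new ServerSocket(port)
    try {
      println(s"Listening on port $port; end input with Ctrl+D (Ctrl+Z then Enter on Windows)")
      // Lines typed on the console are forwarded as they are entered
      serveOnce(server, scala.io.Source.stdin.getLines())
    } finally {
      server.close()
    }
  }
}
```

Start it with the default port 7777 before running the Spark job, then type lines into its console instead of the ncat window.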
On the IDEA side, websiteLines.print() shows only the lines that contain "http".



There is a problem here: contains("http") also matches lines where "http" is only part of a longer word (https, for example), which is not what I want. I will follow up and work out a better filter that fully meets the requirement.
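One way to tighten the filter, assuming the goal is to keep only lines where "http" appears as a whole word (so "http://..." matches, while "https://..." and words like "xhttpy" do not), is a regular expression with word boundaries. A second helper extracts the URL tokens themselves with the pattern https?://\S+, for the case where both schemes should be kept. This is a plain-Scala sketch; the names HttpLineFilter, hasHttpWord and extractUrls are hypothetical.

```scala
import scala.util.matching.Regex

object HttpLineFilter {
  // "http" as a whole word: \b requires a non-word character (or the start/end
  // of the line) on both sides, so "http:" matches but "https" and "xhttpy" do not
  private val HttpWord: Regex = """\bhttp\b""".r

  // An http or https scheme followed by everything up to the next whitespace
  private val UrlToken: Regex = """\bhttps?://\S+""".r

  def hasHttpWord(line: String): Boolean =
    HttpWord.findFirstIn(line).isDefined

  def extractUrls(line: String): List[String] =
    UrlToken.findAllIn(line).toList

  def main(args: Array[String]): Unit = {
    val samples = Seq("see http://example.com", "see https://example.com", "xhttpy", "no url here")
    // Print whether each sample passes the whole-word filter
    samples.foreach(s => println(s"${hasHttpWord(s)}\t$s"))
  }
}
```

In SparkStreaming.printWebsites, the predicate would replace the contains call, e.g. lines.filter(HttpLineFilter.hasHttpWord), and extractUrls could be used with flatMap to emit only the URLs instead of whole lines.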
3. References
http://blog.csdn.net/gendlee1991/article/details/78066548
https://www.cnblogs.com/FG123/p/5324743.html

