There are two ways to hook them up: in the first, Spark Streaming listens on a port in the driver and Flume pushes the data to it; in the second, Spark Streaming pulls the data from Flume on a polling schedule.
At first I thought there was only the first method, but the damn problem is that the node the driver lands on isn't fixed, so every time I restarted the streaming job I also had to go change the Flume configuration, which was a huge pain. Later I found out there is a second method. OK, here is the code for both; they actually don't differ much. (The code is taken from the official GitHub examples.)
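For context, this is roughly what the Flume agent side looks like in each case; the agent name a1, sink/channel names, hosts, and port here are placeholders, not from the original post. Push uses a plain Avro sink pointed at the Spark receiver; pull uses the Spark Sink from the spark-streaming-flume-sink artifact, which buffers events inside the agent until Spark polls them.

# push: Avro sink pointed at the host:port where the Spark receiver listens
a1.sinks.k1.type = avro
a1.sinks.k1.hostname = <spark-receiver-host>
a1.sinks.k1.port = 9988
a1.sinks.k1.channel = c1

# pull: Spark Sink running inside the Flume agent; Spark polls this address
a1.sinks.k2.type = org.apache.spark.streaming.flume.sink.SparkSink
a1.sinks.k2.hostname = <flume-agent-host>
a1.sinks.k2.port = 9988
a1.sinks.k2.channel = c1

This is exactly why the push setup forced a Flume change on every restart: the Avro sink's hostname has to track wherever the driver's receiver comes up, while the Spark Sink's address is fixed to the agent itself.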
The first method, listening on a port (Flume pushes):
package org.apache.spark.examples.streaming

import org.apache.spark.SparkConf
import org.apache.spark.storage.StorageLevel
import org.apache.spark.streaming._
import org.apache.spark.streaming.flume._
import org.apache.spark.util.IntParam

/**
 * Produces a count of events received from Flume.
 *
 * This should be used in conjunction with an AvroSink in Flume. It will start
 * an Avro server at the requested host:port address and listen for requests.
 * Your Flume AvroSink should be pointed to this address.
 *
 * Usage: FlumeEventCount <host> <port>
 *   <host> is the hostname the Flume receiver will be started on
 *   <port> is the port the Flume receiver will listen on
 *
 * To run this example:
 *   `$ bin/run-example org.apache.spark.examples.streaming.FlumeEventCount <host> <port>`
 */
object FlumeEventCount {
  def main(args: Array[String]) {
    if (args.length < 2) {
      System.err.println("Usage: FlumeEventCount <host> <port>")
      System.exit(1)
    }

    StreamingExamples.setStreamingLogLevels()

    val Array(host, IntParam(port)) = args
    val batchInterval = Milliseconds(2000)

    // Create the context and set the batch size
    val sparkConf = new SparkConf().setAppName("FlumeEventCount")
    val ssc = new StreamingContext(sparkConf, batchInterval)

    // Create a flume stream
    val stream = FlumeUtils.createStream(ssc, host, port, StorageLevel.MEMORY_ONLY_SER_2)

    // Print out the count of events received from this server in each batch
    stream.count().map(cnt => "Received " + cnt + " flume events.").print()

    ssc.start()
    ssc.awaitTermination()
  }
}

The second method, polling: Spark Streaming actively pulls the data from Flume:
package org.apache.spark.examples.streaming

import java.net.InetSocketAddress

import org.apache.spark.SparkConf
import org.apache.spark.storage.StorageLevel
import org.apache.spark.streaming._
import org.apache.spark.streaming.flume._
import org.apache.spark.util.IntParam

/**
 * Produces a count of events received from Flume.
 *
 * This should be used in conjunction with the Spark Sink running in a Flume agent. See
 * the Spark Streaming programming guide for more details.
 *
 * Usage: FlumePollingEventCount <host> <port>
 *   `host` is the host on which the Spark Sink is running.
 *   `port` is the port at which the Spark Sink is listening.
 *
 * To run this example:
 *   `$ bin/run-example org.apache.spark.examples.streaming.FlumePollingEventCount [host] [port]`
 */
object FlumePollingEventCount {
  def main(args: Array[String]) {
    if (args.length < 2) {
      System.err.println("Usage: FlumePollingEventCount <host> <port>")
      System.exit(1)
    }

    StreamingExamples.setStreamingLogLevels()

    val Array(host, IntParam(port)) = args
    val batchInterval = Milliseconds(2000)

    // Create the context and set the batch size
    val sparkConf = new SparkConf().setAppName("FlumePollingEventCount")
    val ssc = new StreamingContext(sparkConf, batchInterval)

    // Create a flume stream that polls the Spark Sink running in a Flume agent
    val stream = FlumeUtils.createPollingStream(ssc, host, port)

    // Print out the count of events received from this server in each batch
    stream.count().map(cnt => "Received " + cnt + " flume events.").print()

    ssc.start()
    ssc.awaitTermination()
  }
}
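A side benefit of the pull model: the address being polled belongs to the Flume agent, not the driver, so restarting the streaming job no longer requires touching Flume at all. FlumeUtils.createPollingStream also has an overload that takes a list of agent addresses, so one stream can drain several agents. A minimal sketch, assuming ssc is the StreamingContext from the example above; the hostnames and port are placeholders:

import java.net.InetSocketAddress
import org.apache.spark.storage.StorageLevel
import org.apache.spark.streaming.flume._

// Poll the Spark Sink on two Flume agents from a single stream;
// replace the hostnames and port with wherever your agents run.
val addresses = Seq(
  new InetSocketAddress("flume-agent-1", 9988),
  new InetSocketAddress("flume-agent-2", 9988))
val stream = FlumeUtils.createPollingStream(
  ssc, addresses, StorageLevel.MEMORY_AND_DISK_SER_2)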