Analysis of Spark Streaming Principles: Data Receiving and Execution Process
When instantiating a StreamingContext, you pass in a SparkContext and specify the spark master url, which connects to the spark engine to obtain executors.
After instantiation, you must first specify how data will be received, for example:
val lines = ssc.socketTextStream("localhost", 9999)
In this way, text data is received from the socket.
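The classic first operation on such a text stream is a word count. Running it for real needs a live Spark cluster, so the sketch below imitates the same flatMap / map / reduceByKey chain on plain Scala collections (reduceByKey is emulated with groupBy); the `lines` value is a made-up stand-in for one micro-batch of the DStream above.

```scala
// Hypothetical batch of lines, standing in for one micro-batch of the DStream.
val lines = Seq("spark streaming test", "spark test")

// flatMap: split each line into words (same shape as lines.flatMap(_.split(" ")) on a DStream)
val words = lines.flatMap(_.split(" "))

// map each word to a (word, 1) pair, then emulate reduceByKey(_ + _) with groupBy + sum
val pairs = words.map(w => (w, 1))
val wordCounts = pairs.groupBy(_._1).map { case (w, ps) => (w, ps.map(_._2).sum) }
```

On a real DStream the same chain would end with `wordCounts.print()` and `ssc.start()`.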
The exception here occurs because Kafka is reading the specified offset range (here 264245135 to 264251742), and because the log is too large, the total size of the fetched messages exceeds the value of fetch.message.max.bytes (default 1024*1024), which causes this error. The workaround is to increase the value of fetch.message.max.bytes in the parameters of the Kafka client. For example:

    // Kafka configuration
    val kafkaParams = Map[String, String](
      "fetch.message.max.bytes" -> "10485760")  // e.g. raised to 10 MB
    val lines = KafkaUtils.createStream[String, String, StringDecoder, StringDecoder](
      ssc, kafkaParams, topicMap, StorageLevel.MEMORY_AND_DISK_SER).map(_._2)

There are still data loss issues after enabling the WAL. Why is data still lost even when the write-ahead log is configured as officially documented? Because when the task is interrupted, the receiver is also forcibly terminated, which causes data loss. The log output looks like this:

    0: Stopped by driver
    WARN BlockGenerator: Cannot stop BlockGenerator as its not in the Active state [state = StoppedAll]
    WARN BatchedWriteAheadLog: BatchedWriteAheadLog Writer que...
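The two fixes discussed above can be sketched as plain configuration maps (the addresses, group id, and sizes are made-up placeholder values): raising fetch.message.max.bytes on the consumer side, and the Spark property that turns on the receiver write-ahead log. Note the WAL additionally requires a checkpoint directory (`ssc.checkpoint(dir)`) so there is somewhere to write the log.

```scala
// Kafka consumer parameters: raise fetch.message.max.bytes above the largest
// expected fetch (10 MB here is an arbitrary example value, not a recommendation).
val kafkaParams = Map[String, String](
  "zookeeper.connect" -> "zk-host:2181",    // placeholder address
  "group.id" -> "my-consumer-group",        // placeholder group id
  "fetch.message.max.bytes" -> (10 * 1024 * 1024).toString)

// Spark property that enables the receiver write-ahead log; it would be set
// on the SparkConf before the StreamingContext is created.
val walConf = Map[String, String](
  "spark.streaming.receiver.writeAheadLog.enable" -> "true")
```

Even with the WAL enabled, a graceful shutdown (`ssc.stop(stopSparkContext = true, stopGracefully = true)`) is needed to avoid losing the data the receiver holds at termination time.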
Mode: CREATE TEMPORARY TABLE USING OPTIONS

Since Spark 1.2, creating a table over an external data source has been supported through the DDL syntax CREATE TEMPORARY TABLE ... USING ... OPTIONS:

    CREATE TEMPORARY TABLE jsonTable
    USING org.apache.spark.sql.json
    OPTIONS (path '/path/to/data.json')

1. Operation example: let's take the People.json file as an example.

    shengli-mac$ cat /users/shengli/git_repos/spark/exam
..., distinct, subtract, sample, takeSample
Cache type | cache, persist
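The semantics of these set-like transformations can be sketched on plain Scala collections (Spark's subtract corresponds roughly to diff here; sample and takeSample are random, so the sketch only checks the size of the result):

```scala
val a = List(1, 2, 2, 3, 4)
val b = List(3, 4)

// distinct: drop duplicate elements
val distinctA = a.distinct

// subtract: elements of a not present in b (Scala's diff removes one
// occurrence per match in b; Spark's subtract removes all matching keys)
val subtracted = a.diff(b)

// sample / takeSample: take a random subset; here, 2 elements without replacement
val sampled = scala.util.Random.shuffle(a).take(2)
```

cache and persist, by contrast, do not transform the data at all; they only mark the RDD to be kept in memory (or on disk) after its first computation.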
1.2 Transformation operators for key-value data types

type | operator
input partition and output partition one-to-one | mapValues
for a single RDD | combineByKey, reduceByKey, partitionBy
two-RDD aggregation | cogroup
connection (join) | join, leftOuterJoin, rightOuterJoin
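With plain Scala collections again standing in for pair RDDs, the table's operators behave roughly like this (reduceByKey and cogroup are emulated with groupBy and filter; this sketches the semantics, not Spark's actual distributed implementation):

```scala
val pairs = Seq(("a", 1), ("b", 2), ("a", 3))
val other = Seq(("a", 10), ("c", 30))

// mapValues: transform only the values, keys unchanged (like rdd.mapValues(_ * 2))
val doubled = pairs.map { case (k, v) => (k, v * 2) }

// reduceByKey(_ + _): merge all values per key
val summed = pairs.groupBy(_._1).map { case (k, vs) => (k, vs.map(_._2).sum) }

// cogroup: for each key, collect the values of both datasets side by side
val keys = (pairs ++ other).map(_._1).distinct
val cogrouped = keys.map { k =>
  (k, (pairs.filter(_._1 == k).map(_._2), other.filter(_._1 == k).map(_._2)))
}.toMap

// join: inner join on key, one output pair per matching combination
val joined = for ((k1, v1) <- pairs; (k2, v2) <- other if k1 == k2) yield (k1, (v1, v2))
```

leftOuterJoin and rightOuterJoin differ from join only in keeping unmatched keys of the left (or right) side, with the missing side wrapped in an Option.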
1.3 Action operators

type | operator
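Action operators such as collect, count, reduce, and foreach (standard Spark actions, named here because the table rows are cut off in the source) are the calls that actually trigger computation and produce a value instead of another RDD. Their counterparts on plain Scala collections are eager anyway, which makes the semantics easy to show:

```scala
val data = Seq(1, 2, 3, 4, 5)

// count: number of elements (rdd.count() would return a Long)
val n = data.size

// collect: materialize the dataset as a local collection
val collected = data.toList

// reduce: fold all elements with a binary function
val total = data.reduce(_ + _)

// foreach: run a side effect per element
var sideEffect = 0
data.foreach(sideEffect += _)
```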
The program simply reads the data from the file and computes over it.

    package com.bill.www
    /**
     * Created by Bill on 2016/2/3.
     * Purpose: simple data calculation using Scala.
     * Source file: interface records (20 of them), each containing a timestamp and a floating-point value.
     * Execution: scala ReadFile.scala "E:\\spark\\data\\i_22_221000000073_l_20151016\\i_22_221000000073_l_2015
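A self-contained sketch of that kind of computation, parsing records of "timestamp,value" and aggregating the floating-point column; the record contents below are invented for illustration, and an in-memory Seq stands in for the lines read from the interface file:

```scala
// Example records, standing in for lines read from the source file:
// each line is "timestamp,value".
val records = Seq(
  "20151016120000,1.5",
  "20151016120100,2.5",
  "20151016120200,4.0")

// Parse the floating-point column, then aggregate it.
val values = records.map(_.split(",")(1).toDouble)
val total  = values.sum
val avg    = total / values.size
```

In the real program the Seq would be replaced by `scala.io.Source.fromFile(path).getLines()`.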
    // Load the data stored in LIBSVM format as a DataFrame
    val dataset = spark.read.format("libsvm").load("data/mllib/sample_libsvm_data.txt")
    // Split the data into training and test sets (30% held out for testing)
    val Array(trainingData, testData) = dataset.randomSplit(Array(0.7, 0.3), seed = 1234L)
    // Train a NaiveBayes model
    val model = new NaiveBayes().fit(trainingData)
    // Select example rows to display
    val predictions = model.transform(testData)
    Predict
One of the simplest examples that ships with Spark was mentioned earlier, as was the section on SparkContext; the rest of this section describes the transformations.

    object SparkPi {
      def main(args: Array[String]) {
        val conf = new SparkConf().setAppName("Spark Pi")
        val spark = new SparkContext(conf)
        val slices = if (args.length > 0) args(0).toInt else 2
        val n = math.min(100000L * slices
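SparkPi estimates pi by the Monte Carlo method: throw n random points into the square [-1, 1) x [-1, 1) and count how many land inside the unit circle; 4 * count / n approaches pi. The core computation works without Spark at all (Spark only parallelizes the loop across the slices), so it can be sketched in plain Scala:

```scala
import scala.util.Random

// Monte Carlo estimate of pi: fraction of random points inside the unit circle,
// scaled by the area ratio 4. A fixed seed keeps the sketch reproducible.
def estimatePi(n: Int, seed: Long = 42L): Double = {
  val rnd = new Random(seed)
  val inside = (1 to n).count { _ =>
    val x = rnd.nextDouble() * 2 - 1
    val y = rnd.nextDouble() * 2 - 1
    x * x + y * y < 1
  }
  4.0 * inside / n
}

val pi = estimatePi(100000)
```

In SparkPi the same count is computed with `spark.parallelize(1 until n, slices).map { ... }.reduce(_ + _)`, distributing the samples over the cluster.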