Lesson 99: Using Spark Streaming for multi-dimensional analysis of dynamic user behavior on a forum website
/* By Liaoliang, http://weibo.com/ilovepains — live instruction every night at 20:00, YY channel 68917580 */
/**
 * Lesson 99: Using Spark Streaming for multi-dimensional analysis of dynamic user behavior on a forum website.
 * The forum data is generated automatically by code; the generated data is sent to Kafka by a producer,
 * and a Spark Streaming program then pulls the online user-behavior information of the forum/website
 * from Kafka and performs multi-dimensional online analysis.
 * The data format is as follows:
 * date:      the date, formatted as yyyy-MM-dd
 * timestamp: the timestamp
 * userID:    the user ID
 * pageID:    the page ID
 * channelID: the ID of the forum board (channel)
 * action:    the action, either "click" or "register"
 */
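The format above can be sketched as a tiny record generator. This is not the lesson's actual generator class: the field order and "click"/"register" values follow the format comment, while the tab delimiter, ID ranges, and class name are assumptions for illustration.

```java
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.Random;

// Sketch of one simulated log record with the six fields described above,
// tab-separated. Ranges and delimiter are illustrative assumptions.
public class UserLogSimulator {
    private static final Random RANDOM = new Random();

    public static String oneRecord() {
        String date = new SimpleDateFormat("yyyy-MM-dd").format(new Date());
        long timestamp = System.currentTimeMillis();
        int userID = RANDOM.nextInt(10000);   // hypothetical user-ID range
        int pageID = RANDOM.nextInt(100);     // hypothetical page-ID range
        int channelID = RANDOM.nextInt(10);   // hypothetical board-ID range
        String action = RANDOM.nextBoolean() ? "click" : "register";
        return date + "\t" + timestamp + "\t" + userID + "\t"
                + pageID + "\t" + channelID + "\t" + action;
    }

    public static void main(String[] args) {
        System.out.println(oneRecord());
    }
}
```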
The simulated user-click data is generated as follows.
Key steps for generating the simulation data:
One: start a thread that simulates a producer generating user click-behavior data and sending it to Kafka.
Two: compute the page views (PV) of the different boards online.
Three: start the Hadoop, Spark, ZooKeeper, and Kafka clusters.
1. Start Hadoop
2. Start the Spark cluster
3. Start ZooKeeper
4. Package SparkStreamingDataManuallyProducerForKafka into a jar file and upload it to the virtual machine with WinSCP
5. Start the Kafka cluster
6. On Linux, run the SparkStreamingDataManuallyProducerForKafka jar to load the generated data into the Kafka cluster, and test the behavior of producers and consumers on Kafka
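Step one above asks for a thread that simulates the producer. Below is a minimal self-contained sketch: a BlockingQueue stands in for the Kafka producer so the example runs without a cluster; in the lesson's actual code the thread would instead send each record to the "UserLogs" topic via the Kafka producer API. The class name and the fixed sample field values are hypothetical.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// Sketch of the producer thread from step one. The queue is a stand-in
// for Kafka; each message is one tab-separated record in the lesson's format.
public class SimulatedProducer implements Runnable {
    private final BlockingQueue<String> sink;
    private final int count;

    public SimulatedProducer(BlockingQueue<String> sink, int count) {
        this.sink = sink;
        this.count = count;
    }

    @Override
    public void run() {
        for (int i = 0; i < count; i++) {
            // userID = i; pageID and channelID fixed for brevity.
            String record = "2016-01-01\t" + System.currentTimeMillis()
                    + "\t" + i + "\t1\t2\tclick";
            sink.offer(record);
        }
    }

    public static void main(String[] args) throws InterruptedException {
        BlockingQueue<String> queue = new LinkedBlockingQueue<>();
        Thread t = new Thread(new SimulatedProducer(queue, 5));
        t.start();
        t.join();
        System.out.println("produced " + queue.size() + " records");
    }
}
```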
First step: create the Kafka topic
kafka-topics.sh --create --zookeeper master:2181,worker1:2181,worker2:2181 --replication-factor 1 --partitions 1 --topic UserLogs
View the Kafka topic:
kafka-topics.sh --describe --zookeeper master:2181,worker1:2181,worker2:2181
Step two: solving the java.lang.NoClassDefFoundError that occurs when running SparkStreamingDataManuallyProducerForKafka
Run the SparkStreamingDataManuallyProducerForKafka jar on Linux.
When you run the application with java -jar SparkStreamingDataManuallyProducerForKafka.jar, the third-party jars are not found: with the -jar option the JVM ignores any external classpath and searches only the classes inside the jar itself.
Workaround: extend the bootstrap classpath (java -Xbootclasspath/a:...).
7. Consume the Kafka topic
Produce data on master:
java -Xbootclasspath/a:/usr/local/kafka_2.10-0.8.2.1/libs/kafka_2.10-0.8.2.1.jar:/usr/local/scala-2.10.4/lib/scala-library.jar:/usr/local/kafka_2.10-0.8.2.1/libs/log4j-1.2.16.jar:/usr/local/kafka_2.10-0.8.2.1/libs/metrics-core-2.2.0.jar:/usr/local/spark-1.6.1-bin-hadoop2.6/lib/spark-streaming_2.10-1.6.1.jar:/usr/local/kafka_2.10-0.8.2.1/libs/kafka-clients-0.8.2.1.jar:/usr/local/kafka_2.10-0.8.2.1/libs/slf4j-log4j12-1.6.1.jar:/usr/local/kafka_2.10-0.8.2.1/libs/slf4j-api-1.7.6.jar -jar SparkStreamingDataManuallyProducerForKafka.jar
Start consuming on worker1:
kafka-console-consumer.sh --zookeeper master:2181,worker1:2181,worker2:2181 --from-beginning --topic UserLogs
8. Package OnlineBBSUserLogs into a jar file and upload it to the virtual machine with WinSCP
9. To avoid mistyping the command, write a script that performs the data generation:
cat producerforkafka.sh
java -Xbootclasspath/a:/usr/local/kafka_2.10-0.8.2.1/libs/kafka_2.10-0.8.2.1.jar:/usr/local/scala-2.10.4/lib/scala-library.jar:/usr/local/kafka_2.10-0.8.2.1/libs/log4j-1.2.16.jar:/usr/local/kafka_2.10-0.8.2.1/libs/metrics-core-2.2.0.jar:/usr/local/spark-1.6.1-bin-hadoop2.6/lib/spark-streaming_2.10-1.6.1.jar:/usr/local/kafka_2.10-0.8.2.1/libs/kafka-clients-0.8.2.1.jar:/usr/local/kafka_2.10-0.8.2.1/libs/slf4j-log4j12-1.6.1.jar:/usr/local/kafka_2.10-0.8.2.1/libs/slf4j-api-1.7.6.jar -jar /usr/local/imf_testdata/SparkStreamingDataManuallyProducerForKafka.jar
10. OnlineBBSUserLogs successfully consumes the data and computes the statistics; the experiment succeeds.
Supplement:
Source code for OnlineBBSUserLogs
Knowledge points:
1. Create the Kafka direct stream with createDirectStream, which returns the lines as a JavaPairInputDStream.
   Source code: org.apache.spark.streaming.kafka.KafkaUtils.createDirectStream
2. After reading the values from the Kafka data stream, perform the related mapToPair and reduceByKey operations.
   Source code: mapToPair, reduceByKey, PairFunction, Function2
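The mapToPair/reduceByKey step in knowledge point 2 amounts to keying each record by its board (channelID) and summing the counts. A minimal plain-Java sketch of that semantics is below; the lesson's actual code performs the same two steps on a JavaPairInputDStream, and the class and method names here are hypothetical.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Plain-Java sketch of what the streaming mapToPair + reduceByKey pair does
// for per-board PV: map each "click" record to (channelID, 1), then sum by key.
public class ChannelPv {
    public static Map<String, Integer> countPv(List<String> records) {
        Map<String, Integer> pv = new HashMap<>();
        for (String record : records) {
            // Fields: date, timestamp, userID, pageID, channelID, action
            String[] f = record.split("\t");
            if (f.length == 6 && "click".equals(f[5])) {
                // The reduceByKey step: add up the mapped 1s per channel.
                pv.merge(f[4], 1, Integer::sum);
            }
        }
        return pv;
    }

    public static void main(String[] args) {
        List<String> logs = Arrays.asList(
                "2016-01-01\t1\t100\t1\t7\tclick",
                "2016-01-01\t2\t101\t1\t7\tclick",
                "2016-01-01\t3\t102\t2\t8\tregister");
        System.out.println(countPv(logs));   // {7=2}
    }
}
```

Only "click" actions count toward PV here; "register" events are filtered out, which mirrors the lesson's separation of click and register statistics.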
Appendix: source code that generates the simulation data:
Liaoliang: founder and chief expert of the DT Big Data Dream Factory.
Tel: 18610086859  QQ: 1740415547  Weibo: http://weibo.com/ilovepains/
Live instruction every night at 20:00, YY channel 68917580.
IMF Spark source-code customization class student:
Shanghai, De Zhihua, QQ: 1036179833, Tel: 18918561505