Lesson 99: Using Spark Streaming + Kafka for multi-dimensional analysis of dynamic user behavior on a forum website, and resolving java.lang.NoClassDefFoundError

Source: Internet
Author: User
Tags: log4j

Lesson 99: Using Spark Streaming for multi-dimensional analysis of dynamic user behavior on a forum website

/* Instructor: Liaoliang, http://weibo.com/ilovepains; live instruction every night at 20:00 on YY channel 68917580 */

/**
* Lesson 99: Using Spark Streaming for multi-dimensional analysis of dynamic user behavior on a forum website
* Code that automatically generates forum data. The generated data is sent to Kafka by a producer; the Spark Streaming
* program then pulls the online user behavior information for the forum or website from Kafka and runs multi-dimensional online analysis on it.
* The data format is as follows:
* Date: date, formatted as yyyy-MM-dd
* Timestamp: timestamp
* UserID: user ID
* PageID: page ID
* ChannelID: ID of the forum board (channel)
* Action: click or register */
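
The record layout described above can be sketched in plain Java. This is only an illustration, not the lesson's generator: the class name UserLogRecord, the tab separator, and the sample values ("spark", "click") are assumptions, since the source lists the six fields but not how they are delimited or which literal values are used.

```java
import java.time.LocalDate;

// Hypothetical helper illustrating the six-field record; not part of the lesson's source code.
class UserLogRecord {
    // Assumption: fields are tab-separated. The source does not state the delimiter.
    static final String SEPARATOR = "\t";

    /** Joins the six fields in the documented order: date, timestamp, userID, pageID, channelID, action. */
    static String format(LocalDate date, long timestamp, long userId,
                         long pageId, String channelId, String action) {
        return String.join(SEPARATOR,
                date.toString(),            // LocalDate.toString() yields yyyy-MM-dd
                Long.toString(timestamp),
                Long.toString(userId),
                Long.toString(pageId),
                channelId,
                action);
    }

    /** Splits a record back into its fields; -1 keeps trailing empty fields. */
    static String[] parse(String line) {
        return line.split(SEPARATOR, -1);
    }

    public static void main(String[] args) {
        // Sample values are made up for illustration.
        String line = format(LocalDate.of(2016, 5, 1), 1462060800000L, 42L, 7L, "spark", "click");
        System.out.println(line);
        System.out.println(parse(line).length + " fields");
    }
}
```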

The generated simulated user-click data looks like this:


One: Key step for generating the simulated data:
Start a thread that acts as the producer: it generates simulated user click-behavior data and sends it to Kafka
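
A minimal sketch of the thread-based generator follows. It is not the lesson's producer: to stay self-contained it writes to an in-memory BlockingQueue where the real program would send each record to the Kafka topic. The class name SimulatedProducer, the value ranges, the "channel-N" naming, the action literals, and the tab separator are all assumptions made for illustration.

```java
import java.time.LocalDate;
import java.util.Random;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// Hypothetical stand-in for the lesson's producer; the queue replaces the Kafka topic.
class SimulatedProducer {
    // Assumption: the two action values; the source only says "click" and "register".
    static final String[] ACTIONS = {"click", "register"};

    /** Starts a thread that generates `count` records into `sink` and returns the thread. */
    static Thread start(BlockingQueue<String> sink, int count, long seed) {
        Thread thread = new Thread(() -> {
            Random random = new Random(seed);
            for (int i = 0; i < count; i++) {
                String record = String.join("\t",
                        LocalDate.now().toString(),                 // date, yyyy-MM-dd
                        Long.toString(System.currentTimeMillis()),  // timestamp
                        Integer.toString(random.nextInt(1000)),     // user ID (made-up range)
                        Integer.toString(random.nextInt(100)),      // page ID (made-up range)
                        "channel-" + random.nextInt(5),             // channel ID (made-up naming)
                        ACTIONS[random.nextInt(ACTIONS.length)]);   // action
                sink.add(record); // a real producer would send the record to Kafka here
            }
        }, "simulated-producer");
        thread.start();
        return thread;
    }

    public static void main(String[] args) throws InterruptedException {
        BlockingQueue<String> sink = new LinkedBlockingQueue<>();
        Thread producer = start(sink, 5, 42L);
        producer.join(); // wait until all records have been generated
        sink.forEach(System.out::println);
    }
}
```

Running the generator on its own thread keeps data production independent of whatever consumes the records, which is the same separation the Kafka producer and the Spark Streaming job have in the lesson.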

Two: Compute the page views (PV) of the different forum boards online




Three: Start the Hadoop, Spark, ZooKeeper, and Kafka clusters


1. Start Hadoop





2. Start the Spark cluster







3. Start ZooKeeper





4. Package SparkStreamingDataManuallyProducerForKafka as a jar file and upload it from the local machine to the virtual machine using WinSCP




5. Start the Kafka cluster


6. On Linux, run the SparkStreamingDataManuallyProducerForKafka jar package to load the generated data into the Kafka cluster, then test producing and consuming on Kafka



Step one: Create the Kafka topic
kafka-topics.sh --create --zookeeper master:2181,worker1:2181,worker2:2181 --replication-factor 1 --partitions 1 --topic Userlogs
View the Kafka topic
kafka-topics.sh --describe --zookeeper master:2181,worker1:2181,worker2:2181





Step two: Run SparkStreamingDataManuallyProducerForKafka and resolve java.lang.NoClassDefFoundError
Run the SparkStreamingDataManuallyProducerForKafka jar package on Linux
When you run the application with java -jar SparkStreamingDataManuallyProducerForKafka.jar, the third-party jar packages are not found. When run with the -jar option, the JVM ignores any externally supplied classpath and uses only the jar itself (plus any Class-Path entries in its manifest) as the search scope for classes.
Workaround: extend the bootstrap class path by appending the required jars with -Xbootclasspath/a:
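
When chasing a NoClassDefFoundError like this, it can help to check which classes the JVM can actually see. The following is a small diagnostic sketch, not part of the lesson's code: the class name ClasspathCheck and its isLoadable helper are hypothetical, and it relies only on the standard Class.forName call and the java.class.path system property.

```java
// Hypothetical diagnostic helper; not part of the lesson's source code.
class ClasspathCheck {
    /** Returns true if the named class can be located by this class's class loader. */
    static boolean isLoadable(String className) {
        try {
            // initialize=false: locate the class without running its static initializers
            Class.forName(className, false, ClasspathCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            // NoClassDefFoundError is a LinkageError, so missing dependencies land here too
            return false;
        }
    }

    public static void main(String[] args) {
        // With -jar, this property reflects the jar itself rather than CLASSPATH or -cp
        System.out.println("java.class.path = " + System.getProperty("java.class.path"));
        for (String name : args) {
            System.out.println(name + " loadable: " + isLoadable(name));
        }
    }
}
```

Passing the class name reported in the NoClassDefFoundError message as an argument shows whether that class becomes visible once the extra jars are supplied.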


7. Produce to and consume from the Kafka topic
Produce data on master:
root@master:/usr/local/imf_testdata# java -Xbootclasspath/a:/usr/local/kafka_2.10-0.8.2.1/libs/kafka_2.10-0.8.2.1.jar:/usr/local/scala-2.10.4/lib/scala-library.jar:/usr/local/kafka_2.10-0.8.2.1/libs/log4j-1.2.16.jar:/usr/local/kafka_2.10-0.8.2.1/libs/metrics-core-2.2.0.jar:/usr/local/spark-1.6.1-bin-hadoop2.6/lib/spark-streaming_2.10-1.6.1.jar:/usr/local/kafka_2.10-0.8.2.1/libs/kafka-clients-0.8.2.1.jar:/usr/local/kafka_2.10-0.8.2.1/libs/slf4j-log4j12-1.6.1.jar:/usr/local/kafka_2.10-0.8.2.1/libs/slf4j-api-1.7.6.jar -jar SparkStreamingDataManuallyProducerForKafka.jar




Start consuming on worker1:
root@worker1:~# kafka-console-consumer.sh --zookeeper master:2181,worker1:2181,worker2:2181 --from-beginning --topic Userlogs


8. Package OnlineBBSUserLogs as a jar file and upload it from the local machine to the virtual machine using WinSCP


9. To avoid mistyping the command, write a script that runs the data generation
root@master:/usr/local/imf_testdata# cat producerforkafka.sh
java -Xbootclasspath/a:/usr/local/kafka_2.10-0.8.2.1/libs/kafka_2.10-0.8.2.1.jar:/usr/local/scala-2.10.4/lib/scala-library.jar:/usr/local/kafka_2.10-0.8.2.1/libs/log4j-1.2.16.jar:/usr/local/kafka_2.10-0.8.2.1/libs/metrics-core-2.2.0.jar:/usr/local/spark-1.6.1-bin-hadoop2.6/lib/spark-streaming_2.10-1.6.1.jar:/usr/local/kafka_2.10-0.8.2.1/libs/kafka-clients-0.8.2.1.jar:/usr/local/kafka_2.10-0.8.2.1/libs/slf4j-log4j12-1.6.1.jar:/usr/local/kafka_2.10-0.8.2.1/libs/slf4j-api-1.7.6.jar -jar /usr/local/imf_testdata/SparkStreamingDataManuallyProducerForKafka.jar
root@master:/usr/local/imf_testdata#





10. OnlineBBSUserLogs successfully consumes the data and computes the statistics; the experiment succeeds





Appendix:
Source code for OnlineBBSUserLogs

Knowledge points:
1. Use KafkaUtils.createDirectStream to create the Kafka direct stream; it returns the lines as a JavaPairInputDStream
Source code of org.apache.spark.streaming.kafka.KafkaUtils.createDirectStream



2. After reading the values from the Kafka data stream, apply the relevant mapToPair and reduceByKey operations
Source code for mapToPair, reduceByKey, PairFunction, and Function2
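
The idea behind the mapToPair and reduceByKey steps can be illustrated without Spark. The sketch below is a plain-Java stand-in, not the OnlineBBSUserLogs source and not the Spark API: it maps each click record to a (channel, 1) pair and sums the ones per channel to get PV. The class name ChannelPageViews, the tab separator, the field positions, and the "click" literal are assumptions.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Plain-Java stand-in for the mapToPair + reduceByKey pattern; hypothetical, not Spark code.
class ChannelPageViews {
    /** Counts click records per channel. Assumes six tab-separated fields with channel at index 4 and action at index 5. */
    static Map<String, Long> countByChannel(List<String> lines) {
        Map<String, Long> pageViews = new TreeMap<>();
        for (String line : lines) {
            String[] fields = line.split("\t", -1);
            if (fields.length != 6 || !"click".equals(fields[5])) {
                continue; // skip malformed records and non-click actions
            }
            // "mapToPair": emit (channel, 1); "reduceByKey": sum the values for equal keys
            pageViews.merge(fields[4], 1L, Long::sum);
        }
        return pageViews;
    }

    public static void main(String[] args) {
        // Sample records are made up for illustration.
        List<String> lines = Arrays.asList(
                "2016-05-01\t1462060800000\t1\t10\tchannel-1\tclick",
                "2016-05-01\t1462060800001\t2\t11\tchannel-1\tclick",
                "2016-05-01\t1462060800002\t3\t12\tchannel-1\tregister",
                "2016-05-01\t1462060800003\t4\t13\tchannel-2\tclick");
        System.out.println(countByChannel(lines));
    }
}
```

In a streaming job the same two steps run once per batch over the records that arrived in that interval, rather than over a fixed list.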



Appendix: source code for generating the simulated data:








Liaoliang: DT Big Data Dream Factory founder and chief expert.

Contact e-mail: [Email protected] Tel: 18610086859 qq:1740415547

Number: 18610086859 Weibo: http://weibo.com/ilovepains/
Live instruction every night at 20:00 on YY channel 68917580




Student of the IMF Spark source-code customization class:
Shanghai-De Zhihua qq:1036179833 mail:[email protected] 18918561505
