Note:
1. The sample data for Chapter 2 of this book is linked through a short URL, so readers in mainland China may not be able to download it. I have copied the data set to Baidu network disk. You can download it here:
http://pan.baidu.com/s/1pJvjHA7
Thanks to reader Mr. Qian for pointing out this problem.
2. P11: remember to set up the log4j.properties file and change the log level to WARN. Otherwise the output may not match the book, because it will contain many INFO messages.
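The change in item 2 might look like the following. This is a minimal sketch that assumes a Spark 1.x installation using log4j 1.x, with conf/log4j.properties created by copying conf/log4j.properties.template. The original note does not state the version or the file location, so treat both as assumptions.

```properties
# conf/log4j.properties (assumed location, copied from log4j.properties.template)
# The template sets the root logger to INFO; lowering it to WARN hides the INFO messages.
log4j.rootCategory=WARN, console
```

Restart spark-shell after editing the file so that the new log level takes effect.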
Errata
1. Chapter 2, P16, top of the page: in the sentence "The action of creating an RDD does not cause the cluster to perform distributed computation", the wording "action of creating an RDD" is wrong. It should be "operation of creating an RDD", because in Spark "action" refers specifically to an operation that triggers computation.
2. Chapter 2, P30, middle of the page: the results should be:
(1007, 0.2854529057466858)
(5645434, 0.09104268062279874)
(0, 0.6838772482597568)
(5746668, 0.8064147192926266)
(0, 0.03240818525033484)
(795, 0.7754423117834044)
(795, 0.5109496938298719)
(795, 0.7762059675300523)
(12843, 0.9563812499852178)
The sequence number and parentheses printed after each line in the book should not be there.
Advanced Analytics with Spark (Chinese edition): reader discussion