1. Copy files to HDFS:

hadoop@mhadoop:/usr/local/hadoop$ bin/hdfs dfs -mkdir /user
hadoop@mhadoop:/usr/local/hadoop$ bin/hdfs dfs -mkdir /user/hadoop
hadoop@mhadoop:/usr/local/hadoop$ bin/hdfs dfs -copyFromLocal /usr/local/spark/spark-1.3.1-bin-hadoop2.4/README.md /user/hadoop/
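To confirm the upload, the target directory can be listed (the exact size and timestamp will differ on your cluster):

hadoop@mhadoop:/usr/local/hadoop$ bin/hdfs dfs -ls /user/hadoop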
2. Run spark-shell
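The shell is started from the Spark installation directory used in step 1:

hadoop@mhadoop:/usr/local/spark/spark-1.3.1-bin-hadoop2.4$ bin/spark-shell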
3. Read the file and count occurrences of the word Spark

First confirm that the SparkContext is available:

scala> sc
res0: org.apache.spark.SparkContext = org.apache.spark.SparkContext@...

scala> val file = sc.textFile("hdfs://mhadoop:9000/user/hadoop/README.md")
file: org.apache.spark.rdd.RDD[String] = hdfs://mhadoop:9000/user/hadoop/README.md MapPartitionsRDD[1] at textFile at <console>:21
The file variable is a MapPartitionsRDD; next, filter for the lines that contain "Spark":

scala> val sparks = file.filter(line => line.contains("Spark"))
sparks: org.apache.spark.rdd.RDD[String] = MapPartitionsRDD[2] at filter at <console>:23
Count the matching lines:

scala> sparks.count

In another terminal, verify the result with Ubuntu's built-in wc command:

hadoop@mhadoop:/usr/local/spark/spark-1.3.1-bin-hadoop2.4$ grep Spark README.md | wc
     19     150     761
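Note that sparks.count returns the number of lines containing "Spark", not the number of times the word occurs. A per-word tally can be built with the usual flatMap/map/reduceByKey pattern; a minimal sketch reusing the file RDD from above (the counts variable name is assumed):

scala> val counts = file.flatMap(line => line.split(" ")).map(word => (word, 1)).reduceByKey(_ + _)
scala> counts.filter(_._1 == "Spark").collect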
4. Cache the RDD to see the performance improvement

scala> sparks.cache
res3: sparks.type = MapPartitionsRDD[2] at filter at <console>:23
Open the web console in a browser:

http://192.168.85.10:4040/stages/

After caching, the stage durations shown there drop from seconds (s) to milliseconds (ms), a significant improvement.
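Note that cache is lazy: nothing is stored in memory until the next action recomputes the RDD, so the speedup appears from the second count onward:

scala> sparks.count   // first count after cache: reads from HDFS and populates the cache
scala> sparks.count   // second count: served from memory; the Stages page now shows ms instead of s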