0 The Spark development environment is set up according to the following blogs:
http://blog.csdn.net/w13770269691/article/details/15505507
http://blog.csdn.net/qianlong4526888/article/details/21441131
1 Create a Scala development environment in Eclipse (Juno at least)
Just install the Scala IDE plugin: Help -> Install New Software -> Add URL: http://download.scala-ide.org/sdk/e38/scala29/stable/site
Refer to: http://dongxicheng.org/framework-on-yarn/spark-eclipse-ide/
2 Write WordCount in Eclipse with Scala
Create a Scala project and a WordCount class as follows:
package com.qiurc.test

import org.apache.spark._
import SparkContext._

object WordCount {
  def main(args: Array[String]) {
    if (args.length != 3) {
      println("Usage: com.qiurc.test.WordCount <master> <input> <output>")
      return
    }
    val sc = new SparkContext(args(0), "WordCount",
      System.getenv("SPARK_HOME"), Seq(System.getenv("SPARK_QIUTEST_JAR")))
    val textFile = sc.textFile(args(1))
    val result = textFile.flatMap(_.split(" "))
      .map(word => (word, 1)).reduceByKey(_ + _)
    result.saveAsTextFile(args(2))
  }
}
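Before wiring the job to a cluster, it can be worth a quick smoke test with a local master, so no jar export, run script, or HDFS is needed yet. A minimal sketch in a separate file (the class name and local input path a.txt are my own, not part of the original setup):

package com.qiurc.test

import org.apache.spark._
import SparkContext._

object WordCountLocal {
  def main(args: Array[String]) {
    // "local[2]" runs Spark inside this JVM with two worker threads
    val sc = new SparkContext("local[2]", "WordCountLocal")
    val counts = sc.textFile("a.txt")   // hypothetical local file
      .flatMap(_.split(" "))
      .map(word => (word, 1))
      .reduceByKey(_ + _)
    // Print instead of saving, since this is only a sanity check
    counts.collect().foreach(println)
    sc.stop()
  }
}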
3 Export it as a jar
Right-click the project and export it as spark_qiutest.jar.
Then put it into some directory, such as SPARK_HOME/qiutest.
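If you prefer the command line to Eclipse's export wizard, the same jar can be produced with sbt. A minimal build.sbt sketch, assuming the Spark 0.8.0-incubating artifacts for Scala 2.9.3 from Maven Central (the project name and version here are made up):

name := "spark-qiutest"

version := "0.1"

scalaVersion := "2.9.3"

// "provided" keeps spark-core out of the jar; the run script puts Spark on the classpath instead
libraryDependencies += "org.apache.spark" % "spark-core_2.9.3" % "0.8.0-incubating" % "provided"

Then sbt package writes the jar under target/scala-2.9.3/; copy or rename it to spark_qiutest.jar in SPARK_HOME/qiutest.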
4 Write a run script to run this jar
Copy run-example (in SPARK_HOME) and change it:
hadoop@debian-master:~/spark-0.8.0-incubating-bin-hadoop1$ cp run-example run-qiu-test
hadoop@debian-master:~/spark-0.8.0-incubating-bin-hadoop1$ vim run-qiu-test
__________________
SCALA_VERSION=2.9.3

# Figure out where the Scala framework is installed
FWDIR="$(cd `dirname $0`; pwd)"

# Export this as SPARK_HOME
export SPARK_HOME="$FWDIR"

# Load environment variables from conf/spark-env.sh, if it exists
if [ -e $FWDIR/conf/spark-env.sh ] ; then
  . $FWDIR/conf/spark-env.sh
fi

if [ -z "$1" ]; then
  echo "Usage: run-example <example-class> [<args>]" >&2
  exit 1
fi

# Figure out which JAR file our test class was packaged into. This includes a bit of a hack
# to avoid the -sources and -doc packages that are built by publish-local.
QIUTEST_DIR="$FWDIR"/qiutest
SPARK_QIUTEST_JAR=""
if [ -e "$QIUTEST_DIR"/spark_qiutest.jar ]; then
  export SPARK_QIUTEST_JAR=`ls "$QIUTEST_DIR"/spark_qiutest.jar`
fi
if [[ -z $SPARK_QIUTEST_JAR ]]; then
  echo "Failed to find Spark qiutest jar assembly in $FWDIR/qiutest" >&2
  echo "You need to build the Spark test jar assembly before running this program" >&2
  exit 1
fi

# Since the test JAR ideally shouldn't include spark-core (that dependency should be
# "provided"), also add our standard Spark classpath, built using compute-classpath.sh.
CLASSPATH=`$FWDIR/bin/compute-classpath.sh`
CLASSPATH="$SPARK_QIUTEST_JAR:$CLASSPATH"

# Find the java binary
if [ -n "${JAVA_HOME}" ]; then
  RUNNER="${JAVA_HOME}/bin/java"
else
  if [ `command -v java` ]; then
    RUNNER="java"
  else
    echo "JAVA_HOME is not set" >&2
    exit 1
  fi
fi

if [ "$SPARK_PRINT_LAUNCH_COMMAND" == "1" ]; then
  echo -n "Spark Command: "
  echo "$RUNNER" -cp "$CLASSPATH" "$@"
  echo "========================================"
  echo
fi

exec "$RUNNER" -cp "$CLASSPATH" "$@"
5 Run it on Spark with Hadoop HDFS
hadoop@debian-master:~/spark-0.8.0-incubating-bin-hadoop1$ ls
assembly  LICENSE  pyspark.cmd  spark-class
a.txt     logs     python       spark-class2.cmd
hadoop@debian-master:~/spark-0.8.0-incubating-bin-hadoop1$ cat a.txt
a
b
c
c
d
d
e
e
(Note: put a.txt into HDFS)
hadoop@debian-master:~/spark-0.8.0-incubating-bin-hadoop1$ hadoop fs -put a.txt ./
(Note: check a.txt in HDFS)
hadoop@debian-master:~/spark-0.8.0-incubating-bin-hadoop1$ hadoop fs -ls
Found 6 items
-rw-r--r--   2 hadoop supergroup   4215 2014-04-14 10:27 /user/hadoop/README.md
-rw-r--r--   2 hadoop supergroup        2014-04-14 15:58 /user/hadoop/a.txt
-rw-r--r--   2 hadoop supergroup      0 2013-05-29 17:17 /user/hadoop/dumpfile
-rw-r--r--   2 hadoop supergroup      0 2013-05-29 17:19 /user/hadoop/dumpfiles
drwxr-xr-x   - hadoop supergroup      0 2014-04-14 15:57 /user/hadoop/qiurc
drwxr-xr-x   - hadoop supergroup      0 2013-07-06 19:48 /user/hadoop/temp
(Note: create a dir named "qiurc" to store the output of WordCount in HDFS)
hadoop@debian-master:~/spark-0.8.0-incubating-bin-hadoop1$ hadoop fs -mkdir /user/hadoop/qiurc
hadoop@debian-master:~/spark-0.8.0-incubating-bin-hadoop1$ hadoop fs -ls
Found 5 items
-rw-r--r--   2 hadoop supergroup   4215 2014-04-14 10:27 /user/hadoop/README.md
-rw-r--r--   2 hadoop supergroup      0 2013-05-29 17:17 /user/hadoop/dumpfile
-rw-r--r--   2 hadoop supergroup      0 2013-05-29 17:19 /user/hadoop/dumpfiles
drwxr-xr-x   - hadoop supergroup      0 2014-04-14 15:32 /user/hadoop/qiurc
drwxr-xr-x   - hadoop supergroup      0 2013-07-06 19:48 /user/hadoop/temp
Now run our WordCount program, specifying the input and output locations. In my test, writing to HDFS only worked with absolute hdfs:// paths.
(Note: the prefix "hdfs://debian-master:9000/user/hadoop/" can't be forgotten)
hadoop@debian-master:~/spark-0.8.0-incubating-bin-hadoop1$ ./run-qiu-test com.qiurc.test.WordCount spark://debian-master:7077 hdfs://debian-master:9000/user/hadoop/a.txt hdfs://debian-master:9000/user/hadoop/qiurc
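Why the full prefix matters: textFile and saveAsTextFile hand their paths to Hadoop's FileSystem layer, and a path without a scheme is resolved against whatever default filesystem the process sees, which may not be your HDFS. A hedged illustration, reusing the paths above:

// A bare path depends on the default filesystem configuration
val maybeLocal = sc.textFile("a.txt")
// A fully qualified URI is unambiguous
val onHdfs = sc.textFile("hdfs://debian-master:9000/user/hadoop/a.txt")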
(Note: hadoop fs -get works here, too)
hadoop@debian-master:~/spark-0.8.0-incubating-bin-hadoop1$ hadoop fs -copyToLocal /user/hadoop/qiurc/ localfile
hadoop@debian-master:~/spark-0.8.0-incubating-bin-hadoop1$ ls localfile/
part-00000  part-00001  part-00002  _SUCCESS
(Note: let me show the result)
hadoop@debian-master:~/spark-0.8.0-incubating-bin-hadoop1$ cat localfile/part-00000
(,1)
(c,2)
hadoop@debian-master:~/spark-0.8.0-incubating-bin-hadoop1$ cat localfile/part-00001
(d,2)
(a,1)
hadoop@debian-master:~/spark-0.8.0-incubating-bin-hadoop1$ cat localfile/part-00002
(e,3)
(b,1)
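By the way, the (,1) pair in part-00000 is an empty token: String.split(" ") keeps interior empty strings, so a doubled or leading space in the input yields "" as a "word". A tiny illustration (the input string is made up):

"a  b".split(" ")   // Array("a", "", "b") -- the doubled space produces the empty token counted as (,1)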
Finished! ^_^