Introduction to Spark Basics, Cluster Setup and the Spark Shell
This session is driven mainly by the Spark slide deck, combined with hands-on practice to reinforce understanding of the concepts.
Spark Installation and Deployment
With the theory covered, we move on to the hands-on experiments:
Exercise 1: Complete a word count using the Spark Shell (local mode)
Run spark-shell to start the Spark Shell in local mode.
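If the shell in your environment defaults to a different master, local mode can also be requested explicitly at launch (a minimal sketch; local[*] simply uses all available local cores):

spark-shell --master local[*]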
Step 1: Load the data from a local file
scala> val rdd1 = sc.textFile("file:///tmp/wordcount.txt")
rdd1: org.apache.spark.rdd.RDD[String] = file:///tmp/wordcount.txt MapPartitionsRDD[3] at textFile at <console>:24
scala> rdd1.count
res1: Long = 3
Step 2: Use flatMap(_.split(" ")) to split each line into words
scala> val rdd2 = rdd1.flatMap(_.split(" "))
rdd2: org.apache.spark.rdd.RDD[String] = MapPartitionsRDD[4] at flatMap at <console>:26
scala> rdd2.count
res2: Long = 8
scala> rdd2.take
take   takeAsync   takeOrdered   takeSample
scala> rdd2.take(8)
res3: Array[String] = Array(hello, world, spark, world, hello, spark, hadoop, great)
Step 3: Use map to convert each word into a (key, value) pair
scala> val kvrdd1 = rdd2.map(x => (x, 1))
kvrdd1: org.apache.spark.rdd.RDD[(String, Int)] = MapPartitionsRDD[5] at map at <console>:28
scala> kvrdd1.count
res4: Long = 8
scala> kvrdd1.take(8)
res5: Array[(String, Int)] = Array((hello,1), (world,1), (spark,1), (world,1), (hello,1), (spark,1), (hadoop,1), (great,1))
Step 4: Apply reduceByKey to the (key, value) pairs
scala> val resultRdd1 = kvrdd1.reduceByKey(_ + _)
resultRdd1: org.apache.spark.rdd.RDD[(String, Int)] = ShuffledRDD[6] at reduceByKey at <console>:30
scala> resultRdd1.count
res6: Long = 5
scala> resultRdd1.take(5)
res7: Array[(String, Int)] = Array((hello,2), (world,2), (spark,2), (hadoop,1), (great,1))
Step 5: Save the results to a local file
scala> resultRdd1.saveAsTextFile("file:///tmp/output1")
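As a recap, the five steps of Exercise 1 can also be written as one chained transformation. This is only a sketch assuming the same input file as above; collect() is used instead of saving so the result prints directly, and in the interactive shell the multi-line form is easiest to enter via :paste.

sc.textFile("file:///tmp/wordcount.txt")   // load the local file
  .flatMap(_.split(" "))                   // split each line into words
  .map(word => (word, 1))                  // pair every word with a count of 1
  .reduceByKey(_ + _)                      // sum the counts per word
  .collect()                               // the result is small, so it is safe to return to the driver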
Exercise 2: Complete a word count using the Spark Shell (YARN client mode)
Run spark-shell --master yarn-client to start the Spark Shell in YARN client mode.
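Note that recent Spark releases (2.x and later) deprecate the yarn-client master string; assuming the same cluster configuration, the equivalent launch command is:

spark-shell --master yarn --deploy-mode client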
Step 1: Load the data from HDFS
scala> val rdd1 = sc.textFile("hdfs:///input/wordcount.txt")
rdd1: org.apache.spark.rdd.RDD[String] = hdfs:///input/wordcount.txt MapPartitionsRDD[1] at textFile at <console>:24
scala> rdd1.count
res0: Long = 260
scala> rdd1.take(100)
res1: Array[String] = Array(HDFS Users Guide, "", HDFS Users Guide, Purpose, Overview, Prerequisites, Web Interface, Shell Commands, DFSAdmin Command, Secondary NameNode, Checkpoint Node, Backup Node, Import Checkpoint, Balancer, Rack Awareness, Safemode, fsck, fetchdt, Recovery Mode, Upgrade and Rollback, DataNode Hot Swap Drive, File Permissions and Security, Scalability, Related Documentation, Purpose, "", This document is a starting point for users working with Hadoop Distributed File System (HDFS) either as a part of a Hadoop cluster or as a stand-alone general purpose distributed file system. While HDFS is designed to "just work" in many environments, a working knowledge of HDFS helps greatly with configuration improvements and diagnostics on a specific cluster., "", Overview, "", ...
Step 2: Use flatMap(_.split(" ")) to split each line into words
scala> val rdd2 = rdd1.flatMap(_.split(" "))
rdd2: org.apache.spark.rdd.RDD[String] = MapPartitionsRDD[2] at flatMap at <console>:26
scala> rdd2.count
res2: Long = 3687
scala> rdd2.take(100)
res3: Array[String] = Array(HDFS, Users, Guide, "", HDFS, Users, Guide, Purpose, Overview, Prerequisites, Web, Interface, Shell, Commands, DFSAdmin, Command, Secondary, NameNode, Checkpoint, Node, Backup, Node, Import, Checkpoint, Balancer, Rack, Awareness, Safemode, fsck, fetchdt, Recovery, Mode, Upgrade, and, Rollback, DataNode, Hot, Swap, Drive, File, Permissions, and, Security, Scalability, Related, Documentation, Purpose, "", This, document, is, a, starting, point, for, users, working, with, Hadoop, Distributed, File, System, (HDFS), either, as, a, part, of, a, Hadoop, cluster, or, as, a, stand-alone, general, purpose, distributed, file, system., While, HDFS, is, designed, to, "just, work", in, many, environments,, a, working, knowledge, of, HDFS, helps, greatly, with, configuratio...
Step 3: Use map to convert each word into a (key, value) pair
scala> val kvrdd1 = rdd2.map(x => (x, 1))
kvrdd1: org.apache.spark.rdd.RDD[(String, Int)] = MapPartitionsRDD[3] at map at <console>:28
scala> kvrdd1.count
res4: Long = 3687
scala> kvrdd1.take(100)
res5: Array[(String, Int)] = Array((hdfs,1), (users,1), (guide,1), ("",1), (hdfs,1), (users,1), (guide,1), (purpose,1), (overview,1), (prerequisites,1), (web,1), (interface,1), (shell,1), (commands,1), (dfsadmin,1), (command,1), (secondary,1), (namenode,1), (checkpoint,1), (node,1), (backup,1), (node,1), (import,1), (checkpoint,1), (balancer,1), (rack,1), (awareness,1), (safemode,1), (fsck,1), (fetchdt,1), (recovery,1), (mode,1), (upgrade,1), (and,1), (rollback,1), (datanode,1), (hot,1), (swap,1), (drive,1), (file,1), (permissions,1), (and,1), (security,1), (scalability,1), (related,1), (documentation,1), (purpose,1), ("",1), (this,1), (document,1), (is,1), (a,1), (starting,1), (point,1), (for,1), (users,1), (working,1), (with,1), (hadoop,1), (distributed,1), (file,1), (system,1), (HDF...
Step 4: Apply reduceByKey to the (key, value) pairs
scala> var resultRdd1 = kvrdd1.reduce
reduce   reduceByKey   reduceByKeyLocally
scala> var resultRdd1 = kvrdd1.reduceByKey
reduceByKey   reduceByKeyLocally
scala> var resultRdd1 = kvrdd1.reduceByKey(_ + _)
resultRdd1: org.apache.spark.rdd.RDD[(String, Int)] = ShuffledRDD[4] at reduceByKey at <console>:30
scala> resultRdd1.count
res6: Long = 1084
scala> resultRdd1.take(100)
res7: Array[(String, Int)] = Array((because,1), (-reconfig,2), (guide,4), (under-replicated,1), (blocks,5), (maintained,1), (responsibility,1), (filled,1), (order,5), ([key-value,1), (prematurely,1), (cluster:,1), (type,1), (behind,1), (however,,1), (competing,1), (been,2), (begins,1), (up-to-date,3), (permissions,3), (browse,1), (list:,1), (improved,1), (balancer,2), (fine.,1), (over,1), (dfs.hosts,,2), (any,7), (connect,1), (select,2), (version,7), (disks.,1), (file,33), (documentation,,1), (file.,7), (performs,2), (million,2), (ram,1), (are,27), ((data,1), (supported.,1), (consists,1), (existed,1), (brief,2), (overwrites,1), (safely,1), (guide:,1), (safemode,6), (only,1), (currently,1), (first-time,1), (dfs.namenode.name.dir,1), (thus,2), (salient,1), (query,1), (page).,1), (status,5...
Step 5: Save the results to an HDFS file
scala> resultRdd1.saveAsTextFile("hdfs:///output/wordcount1")
localhost:tmp jonsonli$ hadoop fs -ls /output/wordcount1
17/05/13 17:49:28 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Found 3 items
-rw-r--r--   1 jonsonli supergroup       0 2017-05-13 17:47 /output/wordcount1/_SUCCESS
-rw-r--r--   1 jonsonli supergroup    6562 2017-05-13 17:47 /output/wordcount1/part-00000
-rw-r--r--   1 jonsonli supergroup    6946 2017-05-13 17:47 /output/wordcount1/part-00001
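As an optional check, the saved output can be read back and the most frequent words inspected directly in the shell. This is only a sketch; it refers to the same resultRdd1 and output path used above.

scala> sc.textFile("hdfs:///output/wordcount1").count          // number of saved lines, matching the 1084 keys above
scala> resultRdd1.sortBy(_._2, ascending = false).take(10)     // the ten most frequent words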
"Original Hadoop&spark Hands-on 5" Spark Basics Starter, cluster build and Spark Shell