[Spark Asia Pacific Research Institute Series] The Path to Spark Practice, Chapter 1: Building a Spark Cluster (Step 4) (1)

Step 1: Test Spark through the spark-shell

 

Step 1: Start the Spark cluster; this was covered in detail in the third part. After the cluster starts, the web UI looks as follows:
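As a sketch of the commands involved, assuming Spark is installed under $SPARK_HOME, Hadoop under $HADOOP_HOME, and a standalone deployment as configured in the earlier parts:

    # On the master node: start HDFS first (we will upload README.md to it later)
    $HADOOP_HOME/sbin/start-dfs.sh
    # Start the Spark standalone master and all workers listed in conf/slaves
    $SPARK_HOME/sbin/start-all.sh

By default the master web UI is served at http://Master:8080 (substitute your master's hostname).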

Step 2: Start the spark-shell:
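A minimal sketch, assuming a Spark 1.x layout and a master named Master (both are placeholders for your setup):

    # Run from the Spark installation directory; --master attaches the shell to the cluster
    ./bin/spark-shell --master spark://Master:7077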

You can then see the running spark-shell application in the web console:

Step 3: Copy the "README.md" file from the Spark installation directory to HDFS.

Start a new command terminal on the master node and go to the Spark installation directory:

Copy the file to the root folder of HDFS:
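These two terminal steps could look like this, assuming $SPARK_HOME points at the installation directory and the hadoop command is on the PATH:

    cd $SPARK_HOME                # enter the Spark installation directory
    hadoop fs -put README.md /    # upload README.md to the root folder of HDFS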

At this point, we can look at the web console and see that the file has been uploaded to HDFS successfully:
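Besides the web console, the upload can also be verified from the command line:

    hadoop fs -ls /    # README.md should now appear in the HDFS root listing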

 

Step 4: Write code in the spark-shell to operate on the uploaded "README.md":

First, let's take a look at sc, the environment variable that the shell automatically creates for us:
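In the shell this simply means evaluating the name sc; the exact output differs between versions, but it looks roughly like this:

    scala> sc
    res0: org.apache.spark.SparkContext = org.apache.spark.SparkContext@...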

We can see that sc is a SparkContext instance, generated automatically when spark-shell starts. The SparkContext is what submits our code to the cluster or to the local runtime; whenever we write Spark code, a SparkContext instance is required, whether the code runs locally or on a cluster.

Next, read the "README.md" file:
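A sketch, assuming the HDFS NameNode runs on Master at port 9000 (adjust the URI to your configuration):

    scala> val file = sc.textFile("hdfs://Master:9000/README.md")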

The content we read is held in the file variable; file is in fact a MappedRDD. In Spark programming, everything is based on RDDs;

 

Next, we filter out all the lines containing the word "Spark" from the file we just read:
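A sketch of the filter; the variable name sparkLines is just illustrative. Note that contains("Spark") is case-sensitive and matches lines, not individual words:

    scala> val sparkLines = file.filter(line => line.contains("Spark"))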

This generates a FilteredRDD;

Next, let's count how many lines contain "Spark":
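count is an action, so this is the step that actually triggers a job on the cluster (the resN index depends on your session):

    scala> sparkLines.count
    res1: Long = 15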

From the execution result, we can see that "Spark" appears in a total of 15 lines.

Now view the spark-shell web console:

The console shows that a job was submitted and completed successfully. Click the job to view its execution details:

So how can we verify that the spark-shell count of 15 for "Spark" in the README.md file is correct? The method is very simple: we can use the grep and wc commands that come with Ubuntu to do the same count, as shown below:
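Since filter plus count above counts the lines containing "Spark", the matching check on the local copy of README.md in the installation directory would be, for example:

    grep Spark README.md | wc -l    # count lines containing "Spark"; prints 15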

The result is also 15, the same as the spark-shell count.
