Spark Getting Started Series -- 3. The Spark Programming Model (Part 2) -- IntelliJ IDEA Setup and Practice


"Note" this series of articles, as well as the use of the installation package/test data can be in the "big gift –spark Getting Started Combat series" get

1 Installing IntelliJ IDEA

IDEA is short for IntelliJ IDEA, an integrated development environment for Java. IntelliJ is widely regarded as one of the best Java development tools in the industry, with outstanding features such as intelligent code assistance, code auto-completion, refactoring, Java EE support, Ant, JUnit, and CVS integration, code inspection, and an innovative GUI designer. IDEA is a product of JetBrains, a company headquartered in Prague, the capital of the Czech Republic, whose developers are largely Eastern European programmers known for their rigor.
Each IDEA release is available in two editions, Community and Ultimate, as shown in the figure. The Community edition is completely free, while the Ultimate edition can be used for 30 days, after which a license fee is charged. Judging from a comparison after installation, downloading the Community edition is sufficient for our purposes.

1.1 Installing the Software

1.1.1 Downloading the IDEA installation file

Go to the JetBrains official website at http://www.jetbrains.com/idea/download/ and select the latest installation file. Since later exercises require developing Scala applications on Linux, choose the Linux version of IntelliJ IDEA 14, as shown below:

Note "ideaic-14.0.2.tar.gz (Community Edition) and ideaiu-14.0.2.tar.gz (official edition) installation files are available under the install directory for this series of supporting resources, and the two versions are not very different for Scala development.

1.1.2 Unzip and move the directory

Upload the downloaded installation file to the target machine, unzip the IntelliJ IDEA installation file with the following commands, and move it to the /app directory:

tar -zxf ideaIU-14.0.2.tar.gz
sudo mv idea-IU-139.659.2 /app/idea-IU

1.1.3 Configuring /etc/profile environment variables

Open the /etc/profile file with the following command:

sudo vi /etc/profile

Verify that the JDK environment variables are configured correctly (see Part 2 of this series, "Spark Compilation and Deployment", for the basic environment setup):

export JAVA_HOME=/usr/lib/java/jdk1.7.0_55
export PATH=$PATH:$JAVA_HOME

1.2 Configuring the Scala environment

1.2.1 Starting IntelliJ IDEA

There are two ways to start IntelliJ IDEA:

    • Go to the bin directory under the IntelliJ IDEA installation directory and double-click idea.sh;
    • In a command-line terminal, change to the $IDEA_HOME/bin directory and run ./idea.sh.

The initial IDEA startup screen is shown below. IDEA does not ship with the Scala plugin, so it has to be installed manually; the process is not complicated and is described next.

1.2.2 Download Scala Plugin

As shown, select the Configure -> Plugins option on the launch screen to open the plugin management dialog, which lists all installed plugins. Since the Scala plugin is not yet installed, click "Install JetBrains Plugins" to install it, as shown below:

Because there are many plugins available, you can locate the Scala plugin either through the search box or the alphabetical list. After selecting it, the plugin details appear on the right side of the dialog; click the green "Install plugin" button to install it, as shown below:

During installation a progress dialog appears, showing how far the plugin installation has proceeded, as shown below:

After the plugin is installed, select Create New Project on the launch screen; a "Scala" project type now appears in the dialog, and you are prompted to create either a plain Scala project or an SBT project, as shown below:

1.2.3 Setting the interface theme

IntelliJ IDEA 12 introduced the new Darcula user-interface theme, whose dark look has been popular with many developers; here is how to enable it. Select the File menu in the main window and choose the Settings submenu, as shown below:

In the dialog that pops up, select Appearance & Behavior -> Appearance, and choose Darcula under Theme, as shown below:

Save the theme and reopen the IDE to see the new look of the development tool. Pretty cool, isn't it?

2 Writing an Example with IDEA

2.1 Creating a Project

2.1.1 Setting project basic information

In the IDEA menu bar, select File -> New Project; the following dialog appears, where you choose to create a Scala project:

In the project's basic information, fill in the project name, location, Project SDK, and Scala SDK. Here the project name is class3; the installation of the Scala SDK is described in Part 2, "Spark Compilation and Deployment", under the Spark compilation and installation section:

2.1.2 Setting Modules

After creating the project you will see that there are no source files yet, only an src directory for source files and some project metadata. Open the project configuration dialog by double-clicking the src directory or clicking the Project Structure icon on the toolbar, as shown below:

In the Modules settings, right-click src and select "New Folder" to add the src -> main -> scala directory structure:

Still in the Modules settings, mark the main -> scala directory as a Sources root:

2.1.3 Configuring the Library

In the Libraries section, add the Scala SDK library; the scala-2.10.4 version is selected here.

Then add a Java library, in this case the $SPARK_HOME/lib/spark-assembly-1.1.0-hadoop2.2.0.jar file. After it has been added, the dialog looks like this:

2.2 Example 1: Run directly

"Spark programming Model (top) – Concept and Shell test" using Spark-shell for the search of Sogou logs, here we use idea to re-practice the number of Session query leaderboard, you can find that the use of professional development tools can be convenient and quick many.

2.2.1 Writing code

Create the class3 package under src -> main -> scala and add a SogouResult object file to the package with the following code:

package class3

import org.apache.spark.SparkContext._
import org.apache.spark.{SparkConf, SparkContext}

object SogouResult {
  def main(args: Array[String]) {
    if (args.length == 0) {
      System.err.println("Usage: SogouResult <file1> <file2>")
      System.exit(1)
    }

    val conf = new SparkConf().setAppName("SogouResult").setMaster("local")
    val sc = new SparkContext(conf)

    // Session query-count leaderboard
    val rdd1 = sc.textFile(args(0)).map(_.split("\t")).filter(_.length == 6)
    val rdd2 = rdd1.map(x => (x(1), 1)).reduceByKey(_ + _)
      .map(x => (x._2, x._1)).sortByKey(false).map(x => (x._2, x._1))
    rdd2.saveAsTextFile(args(1))

    sc.stop()
  }
}
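To make the leaderboard logic easier to follow, here is a minimal local sketch (not part of the original article) that runs the same transformations on a few in-memory, Sogou-style records instead of files on HDFS; the sample records, their exact field values, and the SogouResultLocalCheck object name are assumptions made purely for illustration:

package class3

import org.apache.spark.SparkContext._
import org.apache.spark.{SparkConf, SparkContext}

object SogouResultLocalCheck {
  def main(args: Array[String]) {
    val sc = new SparkContext(new SparkConf().setAppName("SogouResultLocalCheck").setMaster("local"))

    // Hypothetical records in the tab-separated, six-field layout that
    // filter(_.length == 6) above expects; field 1 is taken as the user/session id.
    val sample = Seq(
      "00:00:00\tuserA\tquery1\t1\t1\turl1",
      "00:00:01\tuserB\tquery2\t1\t1\turl2",
      "00:00:02\tuserA\tquery3\t2\t1\turl3")

    val rdd1 = sc.parallelize(sample).map(_.split("\t")).filter(_.length == 6)
    // Count records per session id, then swap the pair so sortByKey orders by count descending.
    val leaderboard = rdd1.map(x => (x(1), 1)).reduceByKey(_ + _)
      .map(x => (x._2, x._1)).sortByKey(false).map(x => (x._2, x._1))

    leaderboard.collect().foreach(println)   // prints (userA,2) then (userB,1)
    sc.stop()
  }
}

Running this in local mode prints one (session id, query count) pair per line, which matches the record format SogouResult writes to its HDFS output directory.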

2.2.2 Compiling code

The code needs to be compiled before it can run. Click the menu Build -> Make Project or press Ctrl+F9 to compile; the compilation result is reported in the Event Log, and any errors can be fixed according to the messages there.

2.2.3 Configuring the run environment

Before running SogouResult for the first time, click the menu Run -> Edit Configurations to open the Run/Debug Configurations dialog.

When running SogouResult you need to supply two parameters, the Sogou log file path and the output result path. Note that the HDFS path parameters must be full paths, otherwise the run will fail:

    • Sogou log file path: the Sogou query log file uploaded in the previous section, hdfs://hadoop1:9000/sogou/SogouQ1.txt
    • Output result path: hdfs://hadoop1:9000/class3/output2

2.2.4 Viewing the run results

Start the Spark cluster, then click the menu Run -> Run or press Shift+F10 to run SogouResult; the Run window shows the progress. If you want to observe the program's execution in detail, you can of course set breakpoints and run it in debug mode to step through the code.

Use the following commands to view the run results, which are consistent with those from the previous section:

hadoop fs -ls /class3/output2
hadoop fs -cat /class3/output2/part-00000 | less

2.3 Example 2: Package run

The previous example was run directly from IDEA; in this example the program is packaged with IDEA and then submitted for execution.

2.3.1 Writing code

Add a Join object file to the class3 package with the following code:

package class3

import org.apache.spark.SparkContext._
import org.apache.spark.{SparkConf, SparkContext}

object Join {
  def main(args: Array[String]) {
    if (args.length == 0) {
      System.err.println("Usage: Join <file1> <file2>")
      System.exit(1)
    }

    val conf = new SparkConf().setAppName("Join").setMaster("local")
    val sc = new SparkContext(conf)

    val format = new java.text.SimpleDateFormat("yyyy-MM-dd")
    case class Register(d: java.util.Date, uuid: String, cust_id: String, lat: Float, lng: Float)
    case class Click(d: java.util.Date, uuid: String, landing_page: Int)

    val reg = sc.textFile(args(0)).map(_.split("\t"))
      .map(r => (r(1), Register(format.parse(r(0)), r(1), r(2), r(3).toFloat, r(4).toFloat)))
    val clk = sc.textFile(args(1)).map(_.split("\t"))
      .map(c => (c(1), Click(format.parse(c(0)), c(1), c(2).trim.toInt)))

    reg.join(clk).take(2).foreach(println)

    sc.stop()
  }
}
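To clarify what reg.join(clk) returns, here is a minimal local sketch (not part of the original article): for every uuid key present in both RDDs, the join yields a (uuid, (Register, Click)) pair. The simplified case classes, the made-up uuid values, and the JoinShapeCheck object name are assumptions for illustration only:

package class3

import org.apache.spark.SparkContext._
import org.apache.spark.{SparkConf, SparkContext}

object JoinShapeCheck {
  // Simplified stand-ins for the Register and Click records used in Join above.
  case class Register(uuid: String, custId: String)
  case class Click(uuid: String, landingPage: Int)

  def main(args: Array[String]) {
    val sc = new SparkContext(new SparkConf().setAppName("JoinShapeCheck").setMaster("local"))

    // Both RDDs are keyed by uuid, mirroring the (r(1), Register(...)) and (c(1), Click(...)) pairs above.
    val reg = sc.parallelize(Seq(("u1", Register("u1", "cust1")), ("u2", Register("u2", "cust2"))))
    val clk = sc.parallelize(Seq(("u1", Click("u1", 5))))

    // Inner join on the key: only u1 exists in both RDDs, so exactly one pair is returned.
    reg.join(clk).collect().foreach(println)   // prints (u1,(Register(u1,cust1),Click(u1,5)))

    sc.stop()
  }
}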

2.3.2 Generating a packaged file

Step 1: configure the packaging information
In the Project Structure dialog, select "Artifacts", click the green "+" on the right, and choose "From modules with dependencies" to add a JAR package. The following dialog appears, where the main class is set to Join:

Step 2: fill in the JAR name and adjust the output
"Note": the "Output Layout" includes the Scala runtime class packages by default; since the runtime environment already provides them, remove these packages here and keep only the project's own output.

Step 3: output the package file
Click the menu Build -> Build Artifacts, and in the pop-up action selector choose the Build or Rebuild action.

Step 4: copy the package file to the Spark root directory

cd /home/hadoop/IdeaProjects/out/artifacts/class3
cp LearnSpark.jar /app/hadoop/spark-1.1.0/
ls /app/hadoop/spark-1.1.0/

2.3.3 Running and viewing the results

Run the Join class from the package using the following commands; the results are as follows:

cd /app/hadoop/spark-1.1.0
bin/spark-submit --master spark://hadoop1:7077 --class class3.Join --executor-memory 1g LearnSpark.jar hdfs://hadoop1:9000/class3/join/reg.tsv hdfs://hadoop1:9000/class3/join/clk.tsv

3 Problem Solving

3.1 appears "^^ ^is already defined as Object ^^ ^ error

After compiling SogouResult, the error "SogouResult is already defined as object SogouResult" appears.

This error is most likely not a problem with the program code itself, but with the Scala SDK version in use. The author hit this problem with scala-2.11.4 and resolved it by switching to scala-2.10.4; two settings need to be checked and changed to scala-2.10.4: Libraries and Global Libraries.
