Building a Spark Development Environment with IntelliJ IDEA

Source: Internet
Author: User

The installation and configuration of Spark are covered in the Spark quick-start guide (Spark installation and basic use), which also introduces submitting applications with spark-submit. Developing Spark applications in Vim is awkward, however, and an IDE is much handier. This article describes how to set up a Spark development environment with IntelliJ IDEA.

1. Installing IntelliJ IDEA

Since Spark is installed under Ubuntu, IDEA is installed under Ubuntu here as well. First download it from the official website. After downloading, extract it to the installation directory:

sudo tar -zxvf ideaIU-2016.1.tar.gz -C /usr/local/

I unpacked it into the /usr/local directory, then renamed the folder:

sudo mv ideaIU-2016.1 idea
Then change the owner and group of the files:

sudo chown -R hadoop:hadoop idea
Here hadoop is my username and group name. With that, IDEA is installed.

To start IDEA, go into the idea directory and execute bin/idea.sh:

bin/idea.sh
This starts IDEA. It is inconvenient, though; instead you can create a file named idea.desktop on the desktop with the following content:

[Desktop Entry]
Name=IdeaIU
Comment=rayn-idea-iu
Exec=/usr/local/idea/bin/idea.sh
Icon=/usr/local/idea/bin/idea.png
Terminal=false
Type=Application
Categories=Development;
This creates a desktop shortcut.
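The launcher file above can also be created from the shell; a small sketch, assuming IDEA was unpacked to /usr/local/idea as described:

```shell
# Sketch: write the launcher file instead of typing it by hand.
# Paths assume IDEA was unpacked to /usr/local/idea as above.
cat > idea.desktop <<'EOF'
[Desktop Entry]
Name=IdeaIU
Comment=rayn-idea-iu
Exec=/usr/local/idea/bin/idea.sh
Icon=/usr/local/idea/bin/idea.png
Terminal=false
Type=Application
Categories=Development;
EOF
chmod +x idea.desktop   # some desktops require launchers to be executable
```

Copying the file into ~/.local/share/applications/ additionally makes IDEA show up in the application menu.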

2. Installing and Configuring Maven

Maven is a project-management and build-automation tool. Every programmer has had the experience of adding jar packages to a project by hand in order to use some feature; with more frameworks come more jars to add, and Maven can pull in the jars we need automatically. First download Maven from the Maven website:

After downloading, the following file is in the downloads directory:

liu@binja:~/downloads$ ls
apache-maven-3.3.9-bin.tar.gz
Extract to the directory to be installed:

liu@binja:~/downloads$ sudo tar -zxvf apache-maven-3.3.9-bin.tar.gz -C /usr/local/
Similarly, rename the folder and change its owner:

liu@binja:/usr/local$ sudo mv apache-maven-3.3.9 maven
liu@binja:/usr/local$ sudo chown -R liu:liu maven
liu@binja:/usr/local$ ll maven
drwxr-xr-x  6 liu  liu   4096 Mar 28 20:24 ./
drwxr-xr-x 12 root root  4096 Mar 28 20:26 ../
drwxr-xr-x  2 liu  liu   4096 Mar 28 20:24 bin/
drwxr-xr-x  2 liu  liu   4096 Mar 28 20:24 boot/
drwxr-xr-x  3 liu  liu   4096 Nov 11 00:38 conf/
drwxr-xr-x  3 liu  liu   4096 Mar 28 20:24 lib/
-rw-r--r--  1 liu  liu  19335 Nov 11 00:44 LICENSE
-rw-r--r--  1 liu  liu    182 Nov 11 00:44 NOTICE
-rw-r--r--  1 liu  liu   2541 Nov 11 00:38 README.txt
Then add maven to the environment variable:

vim ~/.bashrc
Add the following at the end:

export PATH=$PATH:/usr/local/maven/bin

To make the change take effect:

liu@binja:/usr/local$ source ~/.bashrc
With that, Maven is installed.
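The PATH step above can also be scripted; a sketch that appends the line only if it is not already present, so re-running the setup does not duplicate it:

```shell
# Idempotent sketch: add the Maven PATH line to ~/.bashrc only once.
LINE='export PATH=$PATH:/usr/local/maven/bin'
touch ~/.bashrc
grep -qxF "$LINE" ~/.bashrc || echo "$LINE" >> ~/.bashrc
```

Running the snippet a second time leaves ~/.bashrc unchanged.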

3. Configuring IDEA with the Newly Installed Maven

IDEA ships with its own bundled Maven; here we configure it to use the Maven we just installed instead.

Select File -> Settings -> Build, Execution, Deployment -> Build Tools -> Maven in turn, as shown in the following figure:


In "Maven home directory" on the right, set the Maven installation directory (here /usr/local/maven). In "User settings file", set the Maven configuration file; I use the default here. In "Local repository", set the local package repository: check "Override" on the right to customize the repository directory, and the packages Maven downloads later will be stored there.

Click OK and Maven is configured. You can then create a Maven project.

4. Create Maven Project

Select File -> New -> Project in turn, and the following interface appears:


On the left you can select the project type; choose Maven here. On the right you choose whether to use a template: check "Create from archetype" at the top, then select a project template below; choose the Scala template here.

After clicking Next, fill in the GroupId and ArtifactId; the names can be whatever you like:


Then continue through Next, fill in the project name, and finish.

This creates the new project; its file structure is as follows:


pom.xml configures the project's dependencies. src is the directory holding the project sources; it contains two parallel directories, main and test: we write code under main and test code under test. Since we do no testing here, the test directory can be deleted. The right side shows the contents of the pom.xml file:


Check "Enable Auto-Import" in the upper-right corner, so that IDEA automatically downloads the dependencies the project needs. Also note the Scala version in the middle and select the version you have installed.

You can add the project's dependencies under the dependencies tag, as in the following illustration:


Each dependency lives under a dependency tag, which includes the groupId, artifactId, and version. If you don't know these details for a dependency package, you can query them here, and the query results carry this information. For example, querying for the Spark dependency gives results like the following:


Select the dependency you want to add, then choose the appropriate version number; the entry provides the information Maven needs, as well as the equivalents for other package-management tools, such as SBT:


These can be copied into the pom.xml file.
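For example, a Spark core dependency in pom.xml looks roughly like this (the artifact's Scala suffix and the version here are illustrative; match them to your installed Spark and Scala versions):

```xml
<dependencies>
  <dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-core_2.10</artifactId>
    <version>1.6.1</version>
  </dependency>
</dependencies>
```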

Maven automatically downloads the dependencies added in pom.xml, eliminating the hassle of adding them ourselves.

Then you can write code: under src/main/scala/com/liu create a new Scala class, select kind Object, fill in the class name, and you can start coding. As an example, here is a WordCount:

package com.liu

import org.apache.spark.{SparkContext, SparkConf}

/**
  * Created by Hadoop on 16-3-28.
  */
object Test {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf()
    val sc = new SparkContext(conf)
    val text = sc.textFile("file:///usr/local/spark/README.md")
    val result = text.flatMap(_.split(" ")).map((_, 1)).reduceByKey(_ + _).collect()
    result.foreach(println)
  }
}
The specific meaning of the code is not covered here. Once the code is written, you need to build a jar package and submit it to Spark.

The following steps generate the jar package. Select File -> Project Structure -> Artifacts in turn, as shown in the following figure:


Click the green plus sign in the middle and select JAR -> From modules with dependencies, as shown below:


Select the project's main class under Main Class, then click OK. The result looks like the following:


The Output Layout in the middle lists all the dependencies. Since we are submitting to Spark, the Spark and Hadoop dependencies are not needed here; delete them to save space, but do not delete the final compile output, or the result will not be a usable jar package. Click OK to finish the configuration.

After selecting Build -> Build Artifacts -> Build, the jar package is generated, with the result shown in the following figure:


The out folder in the image above, with a jar package inside it, shows that the build succeeded.

5. Submit Spark Application

After the jar package is generated, you can submit the application with spark-submit, using the following command:

spark-submit --class "com.liu.Test" ~/sparkdemo.jar
This submits the application. The results are as follows:


This indicates the run succeeded, listing the count for each word.
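As a rough sanity check, the same counting can be done with plain shell tools; a sketch on a tiny sample file (in practice, point it at /usr/local/spark/README.md):

```shell
# Split on spaces, then count identical words -- the shell analogue
# of flatMap(_.split(" ")).map((_, 1)).reduceByKey(_ + _).
printf 'hello spark\nhello idea\n' > sample.txt
tr -s ' ' '\n' < sample.txt | sort | uniq -c | sort -rn
```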

At this point, the IDEA development environment for Spark has been set up successfully.
