The installation and configuration of Spark itself is described in the Spark Quick Start Guide (Spark installation and basic use), which also covers submitting applications with spark-submit. However, developing Spark applications in Vim is inconvenient; an IDE is much handier. This article describes how to build a Spark development environment with IntelliJ IDEA.
1. Installing IntelliJ IDEA
Since Spark is installed in an Ubuntu environment, IDEA is installed on Ubuntu as well. First, download it from the official website. After downloading, unzip it to the directory where you want to install it:
sudo tar -zxvf ideaIU-2016.1.tar.gz -C /usr/local/
I unzipped it into the /usr/local directory, then renamed the folder:
sudo mv ideaIU-2016.1 idea
Then change the owner and group of the directory:
sudo chown -R hadoop:hadoop idea
Here hadoop is my user name and group name. With that, IDEA is installed.
To start IDEA, go to the idea directory and run the idea.sh script under bin:
bin/idea.sh
This starts IDEA. However, this is not very convenient; instead you can create a file named idea.desktop on the desktop with the following content:
[Desktop Entry]
Name=IdeaIU
Comment=rayn-idea-iu
Exec=/usr/local/idea/bin/idea.sh
Icon=/usr/local/idea/bin/idea.png
Terminal=false
Type=Application
Categories=Developer;
This creates a desktop shortcut.
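On some Ubuntu versions the shortcut must also be marked executable before it can be launched; if so, a command like the following should do it (assuming the file was created on the desktop):
chmod +x ~/Desktop/idea.desktop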
2. Installing and configuring Maven
Maven is a project management and build automation tool. Every programmer has had the experience of adding jar packages to a project in order to use some feature; the more frameworks you use, the more jars you have to add. Maven adds the required jar packages for us automatically. First, download Maven from the official website:
After downloading, the following file is in the downloads directory:
~/downloads$ ls
apache-maven-3.3.9-bin.tar.gz
Unzip it to the directory where you want to install it:
~/downloads$ sudo tar -zxvf apache-maven-3.3.9-bin.tar.gz -C /usr/local/
Similarly, rename the folder and change its owner:
/usr/local$ sudo mv apache-maven-3.3.9/ maven
/usr/local$ sudo chown -R liu:liu maven
/usr/local$ ll maven
total 52
drwxr-xr-x 6 liu  liu   4096 Mar 28 20:24 ./
drwxr-xr-x   root root  4096 Mar 28 20:26 ../
drwxr-xr-x 2 liu  liu   4096 Mar 28 20:24 bin/
drwxr-xr-x 2 liu  liu   4096 Mar 28 20:24 boot/
drwxr-xr-x 3 liu  liu   4096 Nov 11 00:38 conf/
drwxr-xr-x 3 liu  liu   4096 Mar 28 20:24 lib/
-rw-r--r-- 1 liu  liu  19335 Nov 11 00:44 LICENSE
-rw-r--r-- 1 liu  liu    182 Nov 11 00:44 NOTICE
-rw-r--r-- 1 liu  liu
Then add Maven to the PATH environment variable:
vim ~/.bashrc
At the end, add the following:
export PATH=$PATH:/usr/local/maven/bin
Make the changes effective:
/usr/local$ source ~/.bashrc
With that, Maven is installed.
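To confirm that Maven is now on the PATH, you can print its version; the exact output will vary with your setup:
mvn -version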
3. Configuring IDEA to use the newly installed Maven
IDEA comes with a bundled Maven; here we configure it to use the Maven we installed ourselves.
Select File -> Settings -> Build, Execution, Deployment -> Build Tools -> Maven, as shown:
In "Maven home directory" on the right, set Maven's installation directory, which for me is /usr/local/maven. In "User settings file", set the Maven configuration file; I use the default file here. In "Local repository", set the local package repository: check "Override" on the right to customize your own repository directory, which is where Maven will store the packages it downloads in the future.
Click OK and Maven is configured. You can then create a Maven project.
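For reference, the same repository location can also be set outside IDEA in Maven's settings file (conf/settings.xml under the Maven installation, or ~/.m2/settings.xml); a minimal sketch, where the path is just an example:
<settings>
  <localRepository>/usr/local/maven/repo</localRepository>
</settings>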
4. Creating a Maven project
Select File -> New -> New Project and the following interface appears:
On the left you can choose the project type, here Maven. On the right you can choose whether to use a template: tick "Create from archetype" at the top, then select a project template below; here choose the Scala template.
Click Next, then fill in the GroupId and ArtifactId; the names can be whatever you like:
Continue clicking Next, fill in the project name, and finish.
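If you prefer the command line, an equivalent project can be generated with Maven's archetype plugin; a sketch, assuming the commonly used Scala archetype and the GroupId/ArtifactId used later in this article:
mvn archetype:generate -DgroupId=com.liu -DartifactId=sparkdemo \
    -DarchetypeGroupId=org.scala-tools.archetypes \
    -DarchetypeArtifactId=scala-archetype-simple -DinteractiveMode=false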
The new project is created successfully; its file structure is as follows:
pom.xml configures the project's dependencies. src is the directory holding the project's code; below it are two directories with the same structure, main and test. We write our code under main and tests under test; if you are not using tests yet, you can delete the test directory. The content of pom.xml is shown on the right:
Tick "Enable Auto-Import" in the upper right corner so that IDEA automatically downloads the dependencies the project requires. Also check the Scala version in the middle and choose the version you actually use.
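In the pom.xml generated from the Scala archetype, the Scala version usually appears as a property near the top; a sketch, with the version number only illustrative:
<properties>
  <scala.version>2.10.6</scala.version>
</properties>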
Under the dependencies tag, you can add the project's dependencies:
Each dependency lives inside a dependency tag and includes the groupId, artifactId, and version. If you do not know these coordinates for a package, you can search for them online; the query results include this information. For example, querying for Spark gives the following result:
Select the dependency you want to add and pick the appropriate version; the page shows the information Maven needs, along with the equivalents for other package management tools, such as SBT:
This can be copied into the pom.xml file.
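As a concrete illustration, a Spark Core entry copied from such a query looks roughly like this (the Scala suffix and version are illustrative; use the ones matching your environment):
<dependency>
  <groupId>org.apache.spark</groupId>
  <artifactId>spark-core_2.10</artifactId>
  <version>1.6.1</version>
</dependency>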
Maven automatically downloads any dependency added to pom.xml, so we don't have to hunt down the jars ourselves, which eliminates the hassle.
After that, you can write code. Create a new Scala class under src/main/scala/com/liu, choose the Object kind, fill in the class name, and start coding. As an example, here is a word count:
package com.liu

/**
  * Created by hadoop on 16-3-28.
  */
import org.apache.spark.{SparkContext, SparkConf}

object Test {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf()
    val sc = new SparkContext(conf)
    val text = sc.textFile("file:///usr/local/spark/README.md")
    val result = text.flatMap(_.split(" ")).map((_, 1)).reduceByKey(_ + _).collect()
    result.foreach(println)
  }
}
The specific meaning of the code is not described here. Once the code is written, the jar package needs to be generated and submitted to Spark to run.
The following steps build the jar package. Select File -> Project Structure -> Artifacts, as shown:
Click the green plus sign in the middle and select Jar -> From modules with dependencies, as shown:
Select the project's main class in Main Class, then click OK. The result is as follows:
The Output Layout in the middle lists all the dependencies. Since we will submit the jar to Spark, the Spark and Hadoop dependencies are not needed and can be removed here to save space; but don't delete the final "compile output", or the jar package will not be generated. Click OK to complete the configuration.
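As an alternative to trimming the artifact by hand, Maven can be told not to bundle such dependencies at all by marking them as provided; a sketch for the Spark dependency:
<dependency>
  <groupId>org.apache.spark</groupId>
  <artifactId>spark-core_2.10</artifactId>
  <version>1.6.1</version>
  <scope>provided</scope>
</dependency>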
After choosing Build -> Build Artifacts -> Build, the jar package is generated, with results such as:
An extra out folder appears, with the jar package underneath it, indicating that the build succeeded.
5. Submitting the Spark application
Once you have generated the jar package, you can submit the application with spark-submit, using the following command:
spark-submit --class "com.liu.Test" ~/sparkdemo.jar
This submits the app. The results are as follows:
This indicates that the run succeeded, and the word count statistics are listed.
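If spark-submit is not on your PATH, it can be invoked from the Spark installation's bin directory, and the master can be given explicitly; a sketch, assuming Spark is installed under /usr/local/spark as in the code above:
/usr/local/spark/bin/spark-submit --master local[2] --class "com.liu.Test" ~/sparkdemo.jar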
At this point, the IDEA development environment for Spark has been built successfully.