The installation and configuration of Spark itself is covered in the Spark quick-start guide (installation and basic use), which also introduces submitting applications with spark-submit. Developing Spark applications in Vim, however, is awkward; an IDE is much handier. This article describes how to set up a Spark development environment with IntelliJ IDEA.
1. Installing IntelliJ IDEA
Because Spark is installed in an Ubuntu environment, IDEA is installed on Ubuntu here as well. First download it from the official website. After downloading, extract it to the directory where it is to be installed:
sudo tar -zxvf ideaIU-2016.1.tar.gz -C /usr/local/
I unpacked it into the /usr/local directory, then renamed the folder:
sudo mv ideaIU-2016.1 idea
Then change the owner and group of the files:
sudo chown -R hadoop:hadoop idea
Here hadoop is my username and group name. With that, IDEA is installed.
To start IDEA, go into the idea directory and run the idea.sh inside bin:
bin/idea.sh
This starts IDEA. It is inconvenient, though; instead you can create a new file idea.desktop on the desktop with the following content:
[Desktop Entry]
Name=IdeaIU
Comment=rayn-idea-iu
Exec=/usr/local/idea/bin/idea.sh
Icon=/usr/local/idea/bin/idea.png
Terminal=false
Type=Application
Categories=Development;
This creates a desktop shortcut.
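The step above can also be scripted. A minimal sketch that writes the same desktop entry and marks it executable (it assumes the /usr/local/idea paths used earlier, and writes to the current directory rather than the desktop):

```shell
# Write the desktop entry shown above (paths assume IDEA is in /usr/local/idea)
cat > idea.desktop <<'EOF'
[Desktop Entry]
Name=IdeaIU
Comment=rayn-idea-iu
Exec=/usr/local/idea/bin/idea.sh
Icon=/usr/local/idea/bin/idea.png
Terminal=false
Type=Application
Categories=Development;
EOF
# Desktop files must be executable to be trusted by most desktops
chmod +x idea.desktop
# Confirm the launcher type field is present
grep -c '^Type=Application$' idea.desktop   # prints 1
```

Move the resulting file to your desktop (or to ~/.local/share/applications/) to get the shortcut.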
2. Installing and configuring Maven
Maven is a project-management and build-automation tool. Every programmer has had the experience of adding jar packages to a project just to use one feature; the more frameworks you use, the more jars you must add. Maven can add the jar packages we need automatically. First download Maven from the Maven website:
After downloading, the following file is in the downloads directory:
liu@binja:~/downloads$ ls
apache-maven-3.3.9-bin.tar.gz
Extract it to the directory where it is to be installed:
liu@binja:~/downloads$ sudo tar -zxvf apache-maven-3.3.9-bin.tar.gz -C /usr/local/
Similarly, rename the folder and change the owner:
liu@binja:/usr/local$ sudo mv apache-maven-3.3.9 maven
liu@binja:/usr/local$ sudo chown -R liu:liu maven
liu@binja:/usr/local$ ll maven
drwxr-xr-x 6 liu  liu   4096 Mar 28 20:24 ./
drwxr-xr-x   root root  4096 Mar 28 20:26 ../
drwxr-xr-x 2 liu  liu   4096 Mar 28 20:24 bin/
drwxr-xr-x 2 liu  liu   4096 Mar 28 20:24 boot/
drwxr-xr-x 3 liu  liu   4096 Nov 11 00:38 conf/
drwxr-xr-x 3 liu  liu   4096 Mar 28 20:24 lib/
-rw-r--r-- 1 liu  liu  19335 Nov 11 00:44 LICENSE
-rw-r--r-- 1 liu  liu    182 Nov 11 00:44 NOTICE
-rw-r--r-- 1 liu  liu   2541 Nov 11 00:38 README.txt
Then add Maven to the environment variables by editing ~/.bashrc (no sudo needed for your own file):
vim ~/.bashrc
Add the following at the end:
export PATH=$PATH:/usr/local/maven/bin
To make the change take effect:
liu@binja:/usr/local$ source ~/.bashrc
With that, Maven is installed.
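The PATH change above can be checked directly in the shell. A small sketch, assuming the /usr/local/maven directory used here:

```shell
# Append Maven's bin directory to PATH (assumes Maven was unpacked to /usr/local/maven)
export PATH="$PATH:/usr/local/maven/bin"
# Verify the entry is now present in PATH
case ":$PATH:" in
  *":/usr/local/maven/bin:"*) echo "maven on PATH" ;;
  *) echo "maven missing from PATH" ;;
esac
```

If Maven is actually installed there, `mvn -version` should then print the Maven and JDK versions.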
3. Configuring IDEA with the newly installed Maven
IDEA ships with a bundled Maven; here we configure it to use the Maven we just installed.
Select File -> Settings -> Build, Execution, Deployment -> Build Tools -> Maven in turn, as shown in the following figure:
In "Maven home directory" on the right, set the Maven installation directory; mine is /usr/local/maven. In "User settings file", set the Maven configuration file; I use the default file here. In "Local repository", set the local package repository: tick "Override" on the right to choose your own repository directory, where Maven will later store the packages it downloads automatically.
Click OK and Maven is configured. You can then create a Maven project.
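The same local-repository choice can also be recorded in Maven's own settings file, outside IDEA. A hypothetical ~/.m2/settings.xml fragment (the repository path is an example, not from the original):

```xml
<settings>
  <!-- Example only: point Maven's local package cache at a custom directory -->
  <localRepository>/home/liu/.m2/repository</localRepository>
</settings>
```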
4. Creating a Maven project
Select File -> New -> Project in turn, and the following interface appears:
On the left you can select the project type; choose Maven here. On the right, choose whether to use a template: tick "Create from archetype" at the top and select a project template below; here choose the Scala template.
After clicking Next, fill in the GroupId and ArtifactId; the names can be anything:
Then keep clicking Next, fill in the project name, and finish.
The new project is now created; its file structure is as follows:
pom.xml is where our project's dependencies are configured. src is the directory holding the project code; it contains two parallel directories, main and test. We write code under main and test code under test. Since we do no testing here, the test directory can be deleted. The right side shows the contents of the pom.xml file:
Tick "Enable Auto-Import" in the upper-right corner so IDEA automatically downloads the dependencies the project requires. Also note the Scala version in the middle and choose your own version.
You can add the project's dependencies under the dependencies tag, as in the following illustration:
Each dependency sits inside a dependency tag, which includes a groupId, artifactId, and version. If you don't know these values for a package, you can look it up in a Maven repository search; the query results carry this information. For example, searching for the Spark dependency gives results like the following:
Select the dependency you want to add, then pick the appropriate version number after entering; the page shows the snippet Maven needs, as well as snippets for other package-management tools, such as SBT.
These can be copied into the pom.xml file.
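For example, the Spark core dependency looks like the following in pom.xml. The version 1.6.1 and the Scala 2.10 suffix are examples matching that era of Spark, and the provided scope is my choice, not from the original; match them to your own Spark and Scala versions:

```xml
<dependencies>
  <dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-core_2.10</artifactId>
    <version>1.6.1</version>
    <!-- provided: the Spark jars are supplied by the cluster at spark-submit time,
         so they need not be bundled into our own jar -->
    <scope>provided</scope>
  </dependency>
</dependencies>
```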
Maven automatically downloads the dependencies added in pom.xml, saving us the trouble of adding them ourselves.
Then you can write code. Under src/main/scala/com/liu create a new Scala class, select kind Object, fill in the class name, and you can start coding. As an example, here is a word count:
package com.liu

/**
  * Created by hadoop on 16-3-28.
  */
import org.apache.spark.{SparkConf, SparkContext}

object Test {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf()
    val sc = new SparkContext(conf)
    val text = sc.textFile("file:///usr/local/spark/README.md")
    // Split each line on spaces, count each word, and collect the results
    val result = text.flatMap(_.split(" ")).map((_, 1)).reduceByKey(_ + _).collect()
    result.foreach(println)
  }
}
This article does not explain the code in detail. Once the code is written, you need to generate a jar package and submit it to Spark.
The following steps generate the jar package. Select File -> Project Structure -> Artifacts in turn, as shown in the following figure:
Click the green plus sign in the middle and select JAR -> From modules with dependencies, as shown below:
Select the project's main class in Main Class, then click OK. The result is as follows:
The Output Layout in the middle lists all the dependencies. We are going to submit to Spark, so the Spark and Hadoop dependencies are not needed here; delete them to save space, but do not delete the final compile output, or the result will no longer be a jar package. Click OK to finish the configuration.
Select Build -> Build Artifacts -> Build to generate the jar package; the result is shown in the following figure:
There is an out folder in the image above, with a jar package beneath it, which shows the build succeeded.
5. Submitting the Spark application
After the jar package is generated, you can use spark-submit to submit the application with the following command:
spark-submit --class "com.liu.Test" ~/sparkdemo.jar
This submits the application. The result is as follows:
It shows the run succeeded, listing the count statistics for the words.
At this point, the Spark development environment in IDEA has been set up successfully.