Spark Source Learning: Reading the Spark Source Code with IntelliJ IDEA on Linux
This article addresses the following topics:

I. Preparing a Spark source-reading environment under Linux
This article walks through each configuration step under CentOS.
Here is a list of the components needed to build this environment: JDK installation (JDK 1.7), Scala installation, sbt installation, Git installation, IntelliJ IDEA installation.
Finally, with everything installed, we import the source code and begin studying it. The whole article is quite detailed and is suitable for beginners to follow.

II. Installing the JDK
1. CentOS installs a JDK automatically during system installation; you need to uninstall it first and then install JDK 1.7 yourself.
The uninstall steps are as follows:
[root@localhost ~]# rpm -qa | grep jdk
java-1.6.0-openjdk-1.6.0.0-1.50.1.11.5.el6_3.x86_64
java-1.7.0-openjdk-1.7.0.9-2.3.4.1.el6_3.x86_64
[root@localhost ~]# rpm -qa | grep gcj
java-1.4.2-gcj-compat-1.4.2.0-40jpp.115
libgcj-4.1.2-48.el5
[root@localhost ~]# yum -y remove java-1.6.0-openjdk-1.6.0.0-1.50.1.11.5.el6_3.x86_64
[root@localhost ~]# yum -y remove java-1.7.0-openjdk-1.7.0.9-2.3.4.1.el6_3.x86_64
[root@localhost ~]# yum -y remove java-1.4.2-gcj-compat-1.4.2.0-40jpp.115
[root@localhost ~]# yum -y remove libgcj-4.1.2-48.el5
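The four removal commands above can be collapsed into a short loop that removes whatever `rpm -qa` reports. The sketch below simulates the `rpm -qa` output with a fixed list so the filtering logic can be checked without root or yum; on a real CentOS system, replace the printf with `rpm -qa` and uncomment the yum loop (run as root).

```shell
# Simulated `rpm -qa` output; on a real system use: installed=$(rpm -qa)
installed=$(printf '%s\n' \
  'java-1.6.0-openjdk-1.6.0.0-1.50.1.11.5.el6_3.x86_64' \
  'java-1.7.0-openjdk-1.7.0.9-2.3.4.1.el6_3.x86_64' \
  'java-1.4.2-gcj-compat-1.4.2.0-40jpp.115' \
  'bash-4.1.2-15.el6_4.x86_64')

# Select only the JDK/GCJ packages, leaving everything else untouched
to_remove=$(echo "$installed" | grep -E 'jdk|gcj')
echo "$to_remove"

# Actual removal (requires root) -- uncomment on a real system:
# for pkg in $to_remove; do yum -y remove "$pkg"; done
```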
After these commands complete, running java -version should report that no Java can be found, which indicates the uninstall succeeded.
2. Download the JDK version we need, JDK 1.7, from the website.
Download link: JDK 1.7 download link
Because the package is architecture-specific, we need to check the current system architecture first.
Enter at the command line:
uname -a or more /proc/version
If there is an x86_64 after the kernel version, the system is 64-bit;
otherwise it is 32-bit. Then download the JDK matching your architecture. What I downloaded is jdk-7u45-linux-x64.tar.gz.
3. Install the JDK
Extract the source package.
Create a new java folder under the /usr/local directory from the terminal:
sudo mkdir /usr/local/java
Then copy the downloaded archive into the java folder. From the directory containing the JDK package:
cp jdk-7u45-linux-x64.tar.gz /usr/local/java
Then enter the java directory:
cd /usr/local/java
Extract the archive:
sudo tar xvf jdk-7u45-linux-x64.tar.gz
You can then delete the archive:
sudo rm jdk-7u45-linux-x64.tar.gz
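The copy-and-extract sequence above can be rehearsed end to end with a throwaway archive, which is useful for checking the tar flags before touching /usr/local. Everything in this sketch lives under a temp directory, and the jdk1.7.0_79 folder is just a stand-in name.

```shell
workdir=$(mktemp -d)

# Build a dummy "JDK" tarball standing in for jdk-7u45-linux-x64.tar.gz
mkdir -p "$workdir/jdk1.7.0_79/bin"
echo 'dummy java binary' > "$workdir/jdk1.7.0_79/bin/java"
tar -czf "$workdir/jdk-demo.tar.gz" -C "$workdir" jdk1.7.0_79

# Mirror the article's steps: mkdir, cp, cd, tar, rm
dest="$workdir/usr/local/java"        # stands in for /usr/local/java
mkdir -p "$dest"
cp "$workdir/jdk-demo.tar.gz" "$dest"
tar -xzf "$dest/jdk-demo.tar.gz" -C "$dest"
rm "$dest/jdk-demo.tar.gz"            # archive can be deleted after extraction

ls "$dest/jdk1.7.0_79/bin"
```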
Setting environment variables for the JDK
The global method is to modify /etc/profile, which holds environment variables shared by all users.
sudo gedit /etc/profile
(Open it with vi instead if there is no graphical interface.)
Add the following at the end (the jdk1.7.0_79 directory name below should match the version you actually extracted):
export JAVA_HOME=/usr/local/java/jdk1.7.0_79
export JRE_HOME=/usr/local/java/jdk1.7.0_79/jre
export CLASSPATH=.:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar:$JRE_HOME/lib:$CLASSPATH
export PATH=$JAVA_HOME/bin:$PATH
Keep in mind that while adding these lines you must not put spaces around the equal signs; otherwise "not a valid identifier" will appear when sourcing /etc/profile, because the shell does not treat the extra spaces as harmless and will misparse the assignment.
Then save, and run
source /etc/profile
to make the profile take effect.
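Two quick checks on the settings above: a shell assignment breaks if spaces surround the `=`, and after sourcing the profile, the JDK's bin directory should be the first PATH entry. This sketch sets the variables in the current shell only (no /etc/profile edit), so the directory does not even need to exist for the string checks to work.

```shell
# Valid assignment: no spaces around '='.
# Writing `JAVA_HOME = /usr/local/...` instead would make the shell try to
# run a command named JAVA_HOME and fail.
export JAVA_HOME=/usr/local/java/jdk1.7.0_79
export PATH=$JAVA_HOME/bin:$PATH

# After sourcing, the JDK's bin directory should lead the PATH
first_entry=$(echo "$PATH" | cut -d: -f1)
echo "$first_entry"
```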
Finally, enter java -version in the terminal to verify that the installation succeeded.
On success it displays:
java version "1.7.0_79"
Java(TM) SE Runtime Environment (build 1.7.0_79-b18)
Java HotSpot(TM) 64-Bit Server VM (build 24.45-b08, mixed mode)
The JDK is now installed.

III. Installing Scala
1. Download Scala
First download the installation package: Scala download link
I downloaded version 2.10.6.
2. Decompression
Unzip the downloaded package and move it to the target directory:
# tar -zxf scala-2.10.6.tgz
# sudo mv scala-2.10.6 /usr/local
3. Configure environment variables
# sudo gedit /etc/profile
Add at the end:
export SCALA_HOME=/usr/local/scala-2.10.6
export PATH=$PATH:$SCALA_HOME/bin
Make the configuration take effect immediately:
source /etc/profile
4. Test
Enter scala -version to test.
It displays: Scala code runner version 2.10.6 ... Installation successful.

IV. Installing sbt
1. Download the sbt installation package
Package address: sbt installation package address
2. Create a directory and extract the files into it
$ sudo mkdir /usr/local/sbt
$ sudo tar zxvf sbt-0.13.5.tgz -C /usr/local/
3. Create a script file to launch sbt
Pick a location for the launcher script, for example a new file named sbt under the /usr/local/sbt/ directory:
$ cd /usr/local/sbt/
$ vim sbt
Add the following to the sbt file:
SBT_OPTS="-Xms512M -Xmx1536M -Xss1M -XX:+CMSClassUnloadingEnabled -XX:MaxPermSize=256M"
java $SBT_OPTS -jar /usr/local/sbt/bin/sbt-launch.jar "$@"
Then press the ESC key, type :wq, and save and exit. Note that the path on the java line must point correctly to the sbt-launch.jar file inside the extracted sbt package.
Make the sbt script executable:
$ chmod u+x sbt
4. Configure the PATH environment variable so the sbt command is available in the console
$ vim ~/.bashrc
Add the following line at the end of the file, then save and exit (the path should match the directory holding the sbt script, /usr/local/sbt/ in this article):
export PATH=/usr/local/sbt/:$PATH
Make the configuration take effect immediately:
$ source ~/.bashrc
5. Test whether sbt installed successfully
Enter sbt sbt-version
The first time you run it, sbt downloads a number of packages before it can work properly, so make sure you are connected to the network. On success it prints something like:
[info] Set current project to sbt (in build file:/usr/local/sbt/)
[info] 0.13.5
At this point, sbt has been installed successfully.

V. Installing Git
Newer CentOS releases can install Git directly from the command line:
$ yum install git
This method is simple, but the repository version usually lags behind: for example, the latest Git in the CentOS repository is 1.7.1, while Git upstream has reached 2.x. To get the latest Git, you have to download an RPM package or build from source.
The steps are as follows:
1. Download Git's installation package
Download: Git installation package
2. Extract the source files
tar -zxvf git-latest.tar.gz
or
xz -d git-latest.tar.xz
tar -xvf git-latest.tar
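Which of the two extraction recipes applies depends only on the archive suffix. A small hypothetical helper can print the right command for either form (the function name and behavior are illustrative, not from the original article):

```shell
# Hypothetical helper: print the extraction command matching the suffix
pick_extract() {
  case "$1" in
    *.tar.gz) echo "tar -zxvf $1" ;;
    *.tar.xz) echo "xz -d $1 && tar -xvf ${1%.xz}" ;;
    *)        echo "unknown archive type: $1" ;;
  esac
}

pick_extract git-latest.tar.gz
pick_extract git-latest.tar.xz
```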
3. Compile and install
Enter Git's directory and run the following commands in turn:
$ autoconf
$ ./configure
$ make
$ make install
If you encounter "autoconf: command not found", install autoconf first:
$ yum -y install autoconf
If make fails because gcc is missing, install gcc via yum:
$ yum -y install gcc
If make fails again with a missing zlib.h, check whether zlib is present:
$ whereis zlib
If not, you need to install zlib (on CentOS the zlib.h header comes from the zlib-devel package); this is not detailed here. After installing it, rerun the commands above.
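Since autoconf, gcc, and zlib can each surface as a separate failure partway through the build, it can save time to check for the tools up front. A minimal sketch (the tool list is illustrative; the yum hint in the message assumes the package name matches the command name, which is true for these three):

```shell
# Report which build prerequisites are already on the PATH
report=$(for tool in autoconf gcc make; do
  if command -v "$tool" >/dev/null 2>&1; then
    echo "$tool: ok"
  else
    echo "$tool: MISSING (install with: yum -y install $tool)"
  fi
done)
echo "$report"
```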
4. Verify success
[root@sl git-1.7.3]# whereis git
git: /usr/local/bin/git
[root@sl git-1.7.3]# git --version
git version 1.7.3
So far, Git has been installed.

VI. Installing IntelliJ IDEA
1. Download the installation package
First download IntelliJ IDEA's installation package from the official website.
Download address: IDEA download link
2. Installation
Copy the downloaded package to the location where you want to install it.
Then decompress it; the decompression process is effectively the installation:
$ tar -xzf <idea-archive>.tar.gz
The installation is fairly simple, and this completes it.

VII. Importing the source code into IDEA
Importing the Spark project from GitHub
After opening IntelliJ IDEA, select VCS → Checkout from Version Control → Git from the menu bar, then fill in the address of the Spark project as the Git Repository URL and specify a local path:
https://github.com/apache/spark
Click Clone in the window to start cloning the project from GitHub; this takes roughly 3-10 minutes depending on your network speed.
Compiling Spark
When the clone finishes, IntelliJ IDEA will detect that the project has a pom.xml file and ask whether to open it. Choose to open pom.xml directly, and the IDE will automatically resolve the project's dependencies; how long this step takes also depends on your network and system environment.
After this step completes, manually edit the pom.xml file in the Spark root directory and find the line that specifies the Java version (java.version). Depending on your environment, if you are using JDK 1.7 you may need to change the value to 1.7 (the default is 1.6).
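For reference, the property in question looks roughly like this in pom.xml (the exact surrounding properties vary by Spark version, so treat this as an illustrative fragment):

```xml
<properties>
  <!-- Change 1.6 to 1.7 here if you built the environment with JDK 1.7 -->
  <java.version>1.7</java.version>
</properties>
```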
Then open a shell terminal, cd into the Spark project root directory you just imported, and execute
$ sbt/sbt assembly
This compiles Spark with all of the default options. If you want to pin the versions of related components, see "Building Spark" on the Spark website (http://spark.apache.org/docs/latest/building-spark.html) for all the common build options.
If you have followed the whole article to this point, you should be able to complete the setup of the entire environment.