Spark Source Learning: Reading the Spark Source with IDEA on Linux

Source: Internet
Author: User

This article mainly solves one problem:
1. Building a Linux experimental environment for Spark

I. Preparing the Spark source-reading environment

This article describes each configuration step on CentOS.

The components needed to build this environment are: the JDK (JDK 1.7), Scala, sbt, Git and IntelliJ IDEA.

Finally, with the environment installed, we import the Spark source code so that it can be studied. The whole article goes through each step in detail and is meant as a reference for beginners.

II. Installing the JDK

1. A CentOS installation may come with a JDK already installed. Uninstall it first, then install JDK 1.7 yourself.

The uninstall steps are as follows:

[root@localhost ~]# rpm -qa | grep jdk
java-1.6.0-openjdk-1.6.0.0-1.50.1.11.5.el6_3.x86_64
java-1.7.0-openjdk-1.7.0.9-2.3.4.1.el6_3.x86_64

[root@localhost ~]# rpm -qa | grep gcj
java-1.4.2-gcj-compat-1.4.2.0-40jpp.115
libgcj-4.1.2-48.el5

[root@localhost ~]# yum -y remove java java-1.6.0-openjdk-1.6.0.0-1.50.1.11.5.el6_3.x86_64
[root@localhost ~]# yum -y remove java java-1.7.0-openjdk-1.7.0.9-2.3.4.1.el6_3.x86_64
[root@localhost ~]# yum -y remove java java-1.4.2-gcj-compat-1.4.2.0-40jpp.115
[root@localhost ~]# yum -y remove libgcj-4.1.2-48.el5

When this finishes, running java -version should report that no java command can be found, which indicates that the uninstall succeeded.
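Instead of copying each package name by hand, the removal can be driven from the rpm -qa listing. The sketch below assumes, as in the listing above, that every preinstalled package to remove has jdk or gcj in its name; the helper name list_preinstalled_java is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: filter an `rpm -qa` listing (one package per line on
# stdin) down to the preinstalled JDK and GCJ packages.
list_preinstalled_java() {
  # "jdk" matches the OpenJDK packages; "gcj" matches java-*-gcj-compat and libgcj.
  grep -E 'jdk|gcj'
}

# Usage (review the list before removing anything):
#   rpm -qa | list_preinstalled_java
#   rpm -qa | list_preinstalled_java | xargs yum -y remove    # as root
```

The pattern is deliberately broad, so check the printed list before piping it to yum.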

2. Next, download the JDK version we need, JDK 1.7, from the official website.
Download link: JDK 1.7 download link

The right package depends on the system architecture, so first check it by entering either of the following at the command line:
uname -a
or
more /proc/version

If x86_64 appears after the kernel version, the system is 64-bit; otherwise it is 32-bit. Download the JDK package that matches your system. I downloaded jdk-7u45-linux-x64.tar.gz.
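The architecture check can also be scripted. A minimal sketch, assuming Oracle's JDK 7 naming in which 64-bit Linux archives end in linux-x64 and 32-bit ones in linux-i586; jdk_arch_suffix is a hypothetical helper that takes the output of uname -m, which prints only the machine name (for example x86_64):

```shell
#!/bin/sh
# Hypothetical helper: map `uname -m` output to the architecture suffix used
# in Oracle JDK 7 Linux archive names (assumed: x64 for 64-bit, i586 for 32-bit).
jdk_arch_suffix() {
  case "$1" in
    x86_64) echo x64 ;;
    i?86)   echo i586 ;;   # i386, i486, i586, i686
    *)      echo "unsupported architecture: $1" >&2; return 1 ;;
  esac
}

# Usage:
#   echo "jdk-7u45-linux-$(jdk_arch_suffix "$(uname -m)").tar.gz"
```

On a 64-bit system the usage line prints jdk-7u45-linux-x64.tar.gz, the package downloaded above.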

3. Install the JDK
Extract the archive
In a terminal, create a java folder under /usr/local:

sudo mkdir /usr/local/java

Then go to the directory containing the downloaded JDK archive and copy it into the java folder:

sudo cp jdk-7u45-linux-x64.tar.gz /usr/local/java

Then change into the java directory:

cd /usr/local/java

Extract the archive:

sudo tar xvf jdk-7u45-linux-x64.tar.gz

You can then delete the archive:

sudo rm jdk-7u45-linux-x64.tar.gz
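JAVA_HOME in the next step must point to the directory that tar created, and that name differs from the archive name. A sketch of the mapping, assuming the jdk-<major>u<update>-linux-<arch>.tar.gz naming of Oracle JDK 7 and 8 archives (jdk_dir_name is a hypothetical helper):

```shell
#!/bin/sh
# Hypothetical helper: map an Oracle JDK 7/8 archive name such as
# jdk-7u45-linux-x64.tar.gz to the directory it extracts to (jdk1.7.0_45).
jdk_dir_name() {
  base="${1##*/}"                      # drop any leading path
  case "$base" in
    jdk-[0-9]*u[0-9]*-linux-*) ;;
    *) echo "unrecognized JDK archive name: $1" >&2; return 1 ;;
  esac
  ver="${base#jdk-}"                   # 7u45-linux-x64.tar.gz
  ver="${ver%%-*}"                     # 7u45
  # major version before the "u", update number after it
  printf 'jdk1.%s.0_%s\n' "${ver%%u*}" "${ver#*u}"
}
```

The directory listing is the authoritative answer: ls /usr/local/java after extracting shows the actual name.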

Set the JDK environment variables
Here the variables are set globally by editing /etc/profile, whose settings apply to all users.

sudo gedit /etc/profile

If there is no graphical interface, open the file with vi instead.

Add the following at the end of the file, replacing jdk1.7.0_79 with the directory that tar actually created (jdk-7u45-linux-x64.tar.gz, for example, extracts to jdk1.7.0_45):

export JAVA_HOME=/usr/local/java/jdk1.7.0_79
export JRE_HOME=/usr/local/java/jdk1.7.0_79/jre
export CLASSPATH=.:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar:$JRE_HOME/lib:$CLASSPATH
export PATH=$JAVA_HOME/bin:$PATH

Keep in mind: when adding these lines, do not put a space on either side of the equals sign. Otherwise source /etc/profile reports "not a valid identifier", because the shell splits the line at the space and treats what follows as a separate variable name instead of as part of the value.
Then save the file.

source /etc/profile

This makes the profile changes take effect.
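If source /etc/profile does report "not a valid identifier", the offending lines can be found with grep. A minimal sketch (find_spaced_assignments is a hypothetical helper) that flags export lines with whitespace before or after the = sign:

```shell
#!/bin/sh
# Hypothetical helper: print, with line numbers, every line of file $1 that
# looks like "export NAME = value" or "export NAME= value".
find_spaced_assignments() {
  grep -nE '^[[:space:]]*export[[:space:]]+[A-Za-z_][A-Za-z0-9_]*([[:space:]]+=|=[[:space:]])' "$1"
}

# Usage:
#   find_spaced_assignments /etc/profile
```

No output means no such lines were found.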
Finally, run java -version in a terminal to verify the installation.
On success it displays:
java version "1.7.0_79"
Java(TM) SE Runtime Environment (build 1.7.0_79-b18)
Java HotSpot(TM) 64-Bit Server VM (build 24.45-b08, mixed mode)
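java -version writes this report to standard error, so a script that checks it has to redirect with 2>&1. A minimal sketch that extracts the quoted version string (java_version_of is a hypothetical helper name):

```shell
#!/bin/sh
# Hypothetical helper: read `java -version` output on stdin and print the
# version string between the quotes on the first "version" line.
java_version_of() {
  sed -n 's/^.*version "\([^"]*\)".*$/\1/p' | head -n 1
}

# Usage (java -version prints to stderr, hence 2>&1):
#   java -version 2>&1 | java_version_of
```

For the output shown above, the usage line prints 1.7.0_79.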
The JDK is now installed.

III. Installing Scala

1. Download Scala
First download the installation package from the Scala download link.
I downloaded version 2.10.6.

2. Extract
Extract the downloaded package and move it to the target directory:

# tar -zxf scala-2.10.6.tgz
# sudo mv scala-2.10.6 /usr/local

3. Configure the environment variables

# sudo gedit /etc/profile

Add the following at the end of the file:

export SCALA_HOME=/usr/local/scala-2.10.6
export PATH=$PATH:$SCALA_HOME/bin

Make the configuration take effect immediately:

source /etc/profile
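/etc/profile is edited more than once in this guide, and repeating a step appends the same lines again. A minimal sketch of an append-if-absent helper (append_once is a hypothetical name) that only adds a line when an identical line is not already in the file:

```shell
#!/bin/sh
# Hypothetical helper: append line $2 to file $1 unless the file already
# contains exactly that line (-x whole line, -F literal string).
append_once() {
  if ! grep -qxF -- "$2" "$1" 2>/dev/null; then
    printf '%s\n' "$2" >> "$1"
  fi
}

# Usage, from a root shell (sudo cannot run shell functions):
#   append_once /etc/profile 'export SCALA_HOME=/usr/local/scala-2.10.6'
#   append_once /etc/profile 'export PATH=$PATH:$SCALA_HOME/bin'
```

The single quotes matter: they make $PATH and $SCALA_HOME be written literally instead of being expanded when the helper runs.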

4. Test
Run scala -version to test.
If it displays Scala code runner version 2.10.6 ..., the installation succeeded.

IV. Installing SBT

1. Download the SBT installation package
Download address: SBT installation package address

2. Create a directory and extract the files into it

$ sudo mkdir /usr/local/sbt
$ sudo tar zxvf sbt-0.13.5.tgz -C /usr/local/

3. Create a script file to start SBT
Choose a location for the script that starts SBT, for example a new text file named sbt in the /usr/local/sbt/ directory:

$ cd /usr/local/sbt/
$ sudo vim sbt

Add the following to the sbt file:

SBT_OPTS="-Xms512M -Xmx1536M -Xss1M -XX:+CMSClassUnloadingEnabled -XX:MaxPermSize=256M"
java $SBT_OPTS -jar /usr/local/sbt/bin/sbt-launch.jar "$@"

Then press Esc and type :wq to save and exit. Note that the path in the script must point correctly to the sbt-launch.jar file in the extracted SBT package.

Make the sbt script executable:

$ sudo chmod u+x sbt
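Creating the script and making it executable can be combined into one step. The sketch below (write_sbt_launcher is a hypothetical helper) refuses to write the script unless the given sbt-launch.jar path exists, which guards against the path mistake warned about above; the JVM options are copied from the options line above.

```shell
#!/bin/sh
# Hypothetical helper: write an executable sbt launcher script into
# directory $1 that runs the sbt-launch.jar located at $2.
write_sbt_launcher() {
  if [ ! -f "$2" ]; then
    echo "sbt-launch.jar not found at $2" >&2
    return 1
  fi
  # Unquoted heredoc: $2 is expanded now; \$SBT_OPTS and \$@ are written literally.
  cat > "$1/sbt" <<EOF
#!/bin/sh
SBT_OPTS="-Xms512M -Xmx1536M -Xss1M -XX:+CMSClassUnloadingEnabled -XX:MaxPermSize=256M"
java \$SBT_OPTS -jar "$2" "\$@"
EOF
  chmod u+x "$1/sbt"
}

# Usage, from a root shell (the jar location is an assumption about the
# extracted layout; check it with ls /usr/local/sbt/bin first):
#   write_sbt_launcher /usr/local/sbt /usr/local/sbt/bin/sbt-launch.jar
```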

4. Configure the PATH environment variable so that the sbt command is available in the console
$ vim ~/.bashrc
Add the following line at the end of the file, then save and exit:

export PATH=/usr/local/sbt/:$PATH

Make the configuration take effect immediately:

$ source ~/.bashrc
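Each time ~/.bashrc is sourced, including by nested shells that inherit the already extended PATH, an unconditional export line prepends the directory again. A guarded variant that only adds the directory when it is not already on PATH (path_prepend is a hypothetical name):

```shell
#!/bin/sh
# Hypothetical helper: put directory $1 at the front of PATH unless it is
# already one of PATH's entries, so re-sourcing ~/.bashrc adds no duplicates.
path_prepend() {
  case ":$PATH:" in
    *":$1:"*) ;;                      # already present: leave PATH alone
    *) PATH="$1${PATH:+:$PATH}" ;;    # prepend; no trailing ":" if PATH was empty
  esac
  export PATH
}
```

In ~/.bashrc, the export line would then be replaced by this definition followed by a call such as path_prepend /usr/local/sbt.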

5. Test whether SBT is installed successfully
Run sbt sbt-version.
The first run downloads some packages before sbt can be used normally, so make sure you are connected to the network. A successful installation prints:
[info] Set current project to sbt (in build file:/opt/scala/sbt/)
[info] 0.13.5
At this point, SBT has been successfully installed.

V. Installing Git

On newer versions of CentOS, Git can be installed directly from the command line:

$ yum install git

This method is simple, but the repository versions are often out of date: for example, the latest Git in the CentOS repository is 1.7.1, while official Git has moved on to 2.x. To get the latest Git, you have to download an RPM package or build from source.

The steps for building from source are as follows:

1. Download the Git source package
Download address: Git installation package

2. Extract the source files

tar -zxvf git-latest.tar.gz
or
xz -d git-latest.tar.xz
tar -xvf git-latest.tar

3. Compile and install
Change into the Git source directory and run the following commands in order:

$ autoconf
$ ./configure
$ make
$ make install

If the autoconf command is not recognized, install it first:

$ yum -y install autoconf

If make fails because gcc is missing, install gcc via yum:

$ yum -y install gcc

If make then fails again, this time because zlib.h is missing, check whether zlib is present:

$ whereis zlib

If it is not, install zlib (not covered in detail here), then rerun the commands above.
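These prerequisite checks can be run up front, so that a missing autoconf, gcc or zlib header shows up before the build rather than one failed make at a time. A minimal sketch; missing_tools and have_zlib_header are hypothetical helpers, and the header check assumes zlib.h lives in /usr/include, the standard location. On CentOS that header is provided by the zlib-devel package.

```shell
#!/bin/sh
# Hypothetical helper: print each named command that is not on PATH.
# Returns 0 when all are present, 1 when any is missing.
missing_tools() {
  status=0
  for tool in "$@"; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      printf '%s\n' "$tool"
      status=1
    fi
  done
  return "$status"
}

# Hypothetical helper: succeed if zlib.h exists in /usr/include (assumed location).
have_zlib_header() {
  [ -f /usr/include/zlib.h ]
}

# Usage, as root, before running autoconf, ./configure and make:
#   pkgs="$(missing_tools autoconf gcc make)"
#   [ -z "$pkgs" ] || yum -y install $pkgs
#   have_zlib_header || yum -y install zlib-devel
```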

4. Verify the installation

[root@sl git-1.7.3]# whereis git
git: /usr/local/bin/git
[root@sl git-1.7.3]# git --version
git version 1.7.3

Git is now installed.
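Since the point of building from source is to get a newer Git than the repository's 1.7.1, it can be worth checking the installed version against a minimum. A sketch that uses awk to compare dotted version numbers field by field (version_ge is a hypothetical helper; it assumes purely numeric fields such as 2.1.0):

```shell
#!/bin/sh
# Hypothetical helper: succeed if dotted version $1 >= $2.
# Fields are compared numerically, left to right; missing fields count as 0.
version_ge() {
  awk -v a="$1" -v b="$2" 'BEGIN {
    na = split(a, x, "."); nb = split(b, y, ".")
    n = (na > nb) ? na : nb
    for (i = 1; i <= n; i++) {
      xi = (i <= na) ? x[i] + 0 : 0
      yi = (i <= nb) ? y[i] + 0 : 0
      if (xi > yi) exit 0
      if (xi < yi) exit 1
    }
    exit 0
  }'
}

# Usage:
#   version_ge "$(git --version | awk '{print $3}')" 2.0 && echo "Git 2.0 or newer"
```

For the git version 1.7.3 shown above, the usage check fails, because 1.7.3 is older than 2.0.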

VI. Installing IDEA

1. Download the installation package
First, download the IntelliJ IDEA installation package from the official website.
Download address: IDEA download link

2. Installation
Copy the downloaded package to the location where you want to install it, then extract it with tar -zxvf followed by the archive name. Extracting the archive is the whole installation; IDEA is then started by running bin/idea.sh in the extracted directory.
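The name of the extracted directory depends on the IDEA edition and version, so a script that starts IDEA has to locate bin/idea.sh first. A minimal sketch (find_idea_launcher is a hypothetical helper):

```shell
#!/bin/sh
# Hypothetical helper: print the path of the first bin/idea.sh found under
# directory $1 (where the IDEA archive was extracted); fail if there is none.
find_idea_launcher() {
  launcher="$(find "$1" -type f -path '*/bin/idea.sh' 2>/dev/null | head -n 1)"
  [ -n "$launcher" ] || return 1
  printf '%s\n' "$launcher"
}

# Usage, assuming the archive was extracted under /opt/idea (a hypothetical path):
#   sh "$(find_idea_launcher /opt/idea)"
```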

Installation is relatively simple, and it is now complete.

VII. Importing the source code into IDEA

Import the Spark project from GitHub
After opening IntelliJ IDEA, select VCS → Checkout from Version Control → Git in the menu bar, then enter the address of the Spark project in the Git Repository URL field and specify a local path.

https://github.com/apache/spark

Click Clone in the dialog to start cloning the project from GitHub; depending on your network speed this takes roughly 3 to 10 minutes.
Compiling Spark
When cloning finishes, IntelliJ IDEA detects that the project has a pom.xml file and asks whether to open it. Choose to open pom.xml; IDEA then resolves the project's dependencies automatically. How long this takes also depends on your network and system environment.
When this step is done, manually edit the pom.xml file in the Spark root directory and find the line that specifies the Java version (java.version). Set it to match your environment: if you are using JDK 1.7, you may need to change the value to 1.7 (the default is 1.6).
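The pom.xml edit can also be made non-interactively. A sketch with sed, assuming the version is set by a single java.version element as described above (set_pom_java_version is a hypothetical helper). It writes to a temporary file and then moves it into place, which avoids relying on sed -i, whose syntax differs between GNU and BSD sed.

```shell
#!/bin/sh
# Hypothetical helper: set the <java.version> element of pom file $1 to $2.
set_pom_java_version() {
  sed "s#<java.version>[^<]*</java.version>#<java.version>$2</java.version>#" "$1" > "$1.tmp" &&
    mv "$1.tmp" "$1"
}

# Usage, from the Spark root directory:
#   set_pom_java_version pom.xml 1.7
```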
Then open a shell terminal, change into the root directory of the Spark project you just imported, and run:

$ sbt/sbt assembly

This command compiles Spark with the default configuration. To specify the versions of related components, see Building Spark on the Spark website (http://spark.apache.org/docs/latest/building-spark.html) for all the common build options.
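The location of the build script has changed between Spark versions: older releases ship it as sbt/sbt, as used above, while newer releases ship it as build/sbt, and their Building Spark page uses build/sbt package rather than assembly. A sketch that picks whichever script exists in a checkout (spark_sbt_script is a hypothetical helper):

```shell
#!/bin/sh
# Hypothetical helper: print the sbt launcher script of the Spark checkout in
# directory $1, preferring build/sbt (newer layout) over sbt/sbt (older layout).
spark_sbt_script() {
  for candidate in "$1/build/sbt" "$1/sbt/sbt"; do
    if [ -x "$candidate" ]; then
      printf '%s\n' "$candidate"
      return 0
    fi
  done
  echo "no sbt launcher script found under $1" >&2
  return 1
}

# Usage, from the Spark root directory of an older checkout:
#   "$(spark_sbt_script .)" assembly
```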

If you have followed the whole article, you should now have the complete environment set up.
