Before installing Spark, we need to install the JDK and Scala on our system.
You can download both from their official websites:
JDK: http://www.oracle.com/technetwork/java/javase/downloads/jdk8-downloads-2133151.html
Scala: http://www.scala-lang.org/download/
After the downloads complete, you will have two compressed packages.
Installing the JDK
First we'll install the JDK. Create the installation directory:
sudo mkdir /usr/lib/jdk
Here we plan to install it into the /usr/lib/jdk directory.
Then switch to the directory containing the JDK package; for example, if we put the package in the ~/Desktop directory:
cd ~/Desktop
Run the extract command to unpack the package into the /usr/lib/jdk directory:
sudo tar -zxvf jdk-8u91-linux-x64.tar.gz -C /usr/lib/jdk
Note that this requires root privileges; otherwise there is no way to write to the /usr/lib/jdk directory.
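If you prefer to script the unpacking instead of calling tar by hand, Python's standard tarfile module does the same job. This is just a sketch of the equivalent of `tar -zxvf … -C …`; the archive name and destination are the ones from this tutorial, and writing under /usr/lib still requires root:

```python
import tarfile

def extract_tarball(archive, dest):
    """Extract a .tar.gz archive into dest, like `tar -zxvf archive -C dest`.

    Returns the list of member names that were unpacked.
    """
    with tarfile.open(archive, "r:gz") as tar:
        tar.extractall(path=dest)
        return [m.name for m in tar.getmembers()]

# Example (would need root for /usr/lib/jdk):
# extract_tarball("jdk-8u91-linux-x64.tar.gz", "/usr/lib/jdk")
```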
Then we need to configure the PATH so that the JDK commands can be executed from any directory:
sudo vim /etc/profile
This opens the configuration file. Some tutorials have you edit the ~/.bashrc file in your home directory instead; changes to .bashrc only affect the current user, while changes to /etc/profile take effect for all users after a restart.
At the end of the configuration file, add:
export JAVA_HOME=/usr/lib/jdk/jdk1.8.0_91
export JRE_HOME=${JAVA_HOME}/jre
export CLASSPATH=.:${JAVA_HOME}/lib:${JRE_HOME}/lib
export PATH=${JAVA_HOME}/bin:$PATH
Adjust the JAVA_HOME path to match the directory you actually unpacked to.
A note on vim: if you don't use vim, you can replace vim with gedit wherever it appears in the commands; the same applies below.
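The reason the PATH edit matters is that the shell resolves a bare command name like java by scanning the colon-separated directories in PATH, in order. A small illustration of that lookup using the standard shutil.which (the directory names here are made up for the example):

```python
import shutil

def resolve(command, search_path):
    """Return the full path the shell would execute for `command`,
    scanning the colon-separated directories in search_path.
    Returns None if the command is not found on that path."""
    return shutil.which(command, path=search_path)

# e.g. resolve("java", "/usr/lib/jdk/jdk1.8.0_91/bin:/usr/bin")
# returns the first matching executable, or None.
```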
source /etc/profile
This reloads the /etc/profile configuration in the current terminal. Then run:
java
If a long usage listing appears, congratulations, your JDK is installed successfully; otherwise there is most likely a problem with your environment configuration, so please check it carefully.

Install Scala
Next we need to install Scala. The process is similar to the JDK.
First create the installation directory:
sudo mkdir /usr/lib/scala
Then unpack the package:
sudo tar -zxvf scala-2.11.8.tgz -C /usr/lib/scala
Finally, open /etc/profile and add the configuration at the end:
export SCALA_HOME=/usr/lib/scala/scala-2.11.8
export PATH=${SCALA_HOME}/bin:$PATH
After saving and exiting, run:
source /etc/profile
Then execute scala; if the Scala REPL prompt appears, the installation was successful.
You can enter
:quit
to exit Scala.

Install Spark
With the above installed, we can now install today's protagonist, Spark. First download the package we need:
http://spark.apache.org/downloads.html
This is the download address. It is important to note that under "Choose a package type:" we select "Pre-built for Hadoop 2.6".
Then click the "Download Spark" link below to start the download.
When it finishes you will have the file spark-1.6.1-bin-hadoop2.6.tgz.
We also need to create an installation directory for Spark:
sudo mkdir /usr/lib/spark
Then unpack the file:
sudo tar -zxvf spark-1.6.1-bin-hadoop2.6.tgz -C /usr/lib/spark
Configure /etc/profile again, adding at the end:
export SPARK_HOME=/usr/lib/spark/spark-1.6.1-bin-hadoop2.6
export PATH=${SPARK_HOME}/bin:$PATH
Then run:
source /etc/profile
After that, execute:
pyspark
If the PySpark shell starts, the installation is complete, and you can enter Python code there to run operations.

Using PySpark in Python
Of course, we can't do all of our later development inside such an interpreter, so the next step is to let Python load the Spark libraries. We need to add pyspark to Python's module search path by editing /etc/profile once more and adding at the end:
export PYTHONPATH=/usr/lib/spark/spark-1.6.1-bin-hadoop2.6/python:$PYTHONPATH
This adds the Python libraries under the Spark directory to Python's search path.
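What the PYTHONPATH export actually does is prepend that directory to sys.path in every newly started interpreter. You can observe the effect from Python itself; a sketch using only the standard library, which launches a fresh interpreter with a given PYTHONPATH and reports the search path it sees:

```python
import os
import subprocess
import sys

def sys_path_with(pythonpath):
    """Start a fresh Python interpreter with PYTHONPATH set to
    `pythonpath` and return the sys.path entries it sees."""
    env = dict(os.environ, PYTHONPATH=pythonpath)
    out = subprocess.run(
        [sys.executable, "-c", "import sys; print('\\n'.join(sys.path))"],
        env=env, capture_output=True, text=True, check=True,
    )
    return out.stdout.splitlines()

# e.g. sys_path_with("/usr/lib/spark/spark-1.6.1-bin-hadoop2.6/python")
# includes that directory, so `import pyspark` can find the package.
```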
But because Python needs to call into the Java libraries, we also need a py4j folder under the /usr/lib/spark/spark-1.6.1-bin-hadoop2.6/python path. In the /usr/lib/spark/spark-1.6.1-bin-hadoop2.6/python/lib directory there is a py4j-0.9-src.zip archive; unzip it and move the resulting py4j folder into the /usr/lib/spark/spark-1.6.1-bin-hadoop2.6/python/ directory.
Of course, this operation also requires root privileges.
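Like the tar step earlier, the unzip step can be scripted with the standard zipfile module. A minimal sketch; the archive and destination names are the ones from the paragraph above, and writing there still needs root:

```python
import zipfile

def unzip_into(archive, dest):
    """Extract a .zip archive into dest, like `unzip archive -d dest`.

    Returns the list of member names in the archive.
    """
    with zipfile.ZipFile(archive) as zf:
        zf.extractall(path=dest)
        return zf.namelist()

# Example (would need root for the Spark directory):
# unzip_into("/usr/lib/spark/spark-1.6.1-bin-hadoop2.6/python/lib/py4j-0.9-src.zip",
#            "/usr/lib/spark/spark-1.6.1-bin-hadoop2.6/python")
```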
Now start python in any directory and enter:
import pyspark
If no error message appears, pyspark can be imported normally.
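If a script needs to check whether a module is importable without actually importing it (for example, to fail with a friendly message when the environment is not set up), the standard importlib.util.find_spec can do that. A sketch; "pyspark" is simply the name you would pass on a machine configured as above:

```python
import importlib.util

def can_import(module_name):
    """Return True if `module_name` is importable on the current sys.path,
    without importing (and thus executing) the module."""
    return importlib.util.find_spec(module_name) is not None

# e.g. can_import("pyspark") is True once PYTHONPATH is configured as above.
```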
This way, you can write .py files anywhere and import pyspark wherever you need it.

Importing PySpark in PyCharm
Of course, some users prefer writing Python in PyCharm, so here is how to use pyspark with PyCharm as well.
First, click the drop-down box in the top right corner and choose "Edit Configurations...".
In the dialog that pops up, click the edit button to the right of "Environment variables:".
Click the plus sign to add two new entries:
PYTHONPATH
SPARK_HOME
Their values are the same as the corresponding entries in /etc/profile.
Then test it with the following code:
import pyspark
conf = pyspark.SparkConf().setAppName("SparkDemo").setMaster("local")
sc = pyspark.SparkContext(conf=conf)
If the Spark startup log appears, PyCharm can also load pyspark normally.