Configuring the Spark Framework on Linux (Python)

Tags: pyspark, hadoop, mapreduce

Overview

Spark is a general-purpose parallel computing framework, open-sourced by UC Berkeley's AMP Lab, in the same class as Hadoop MapReduce. Spark has the advantages of Hadoop MapReduce, but unlike MapReduce, intermediate job output can be kept in memory, eliminating the need to read and write HDFS. This makes Spark better suited to algorithms that need iterative MapReduce passes, such as data mining and machine learning. Since Spark has a Python API, this article focuses on the Python side: how I configured Spark and what I learned along the way.
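To make the in-memory point concrete, here is a minimal PySpark sketch of my own (the local[2] master and the names are illustrative assumptions, not from the original guide): a cached RDD is hit by two actions, and the second pass reads from memory instead of recomputing.

from pyspark import SparkContext

sc = SparkContext("local[2]", "cache-demo")  # hypothetical local test context

data = sc.parallelize(range(1, 1000001))
data.cache()  # keep the RDD in memory after it is first computed

total = data.sum()    # first action computes the RDD and fills the cache
count = data.count()  # second action reads the cached partitions
print("sum=%d count=%d" % (total, count))

sc.stop()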

Configuration process

Step One:

Download the Scala archive: go to http://www.scala-lang.org/, click Download to download Scala, and extract it into the current directory.

Download the JDK: go to http://www.oracle.com/technetwork/java/javase/downloads/index.html and download the latest JDK. On a 64-bit system, download jdk-8u91-linux-x64.tar.gz (I downloaded version 8u91 on a 64-bit system); on a 32-bit system, download jdk-8u91-linux-i586.tar.gz. When the download finishes, extract it into the current directory.

Download the Spark archive: go to https://spark.apache.org/downloads.html, select the latest version (1.6.2 at the time of writing), and click Download.

Step Two:

1. Open a command-line window.

2. Execute the command sudo -i to switch to root.

3. Change to the directory containing the extracted files.

4. Move the extracted directories to /opt:

mv jdk1.8.0_91 /opt/jdk1.8.0_91

mv scala-2.11.8 /opt/scala-2.11.8

mv spark-1.6.2-bin-hadoop2.6 /opt/spark-hadoop

Step Three:

Configure environment variables by editing /etc/profile. Execute the following command:

sudo gedit /etc/profile

Add the following at the bottom of the file (note: adjust for your versions):

#Setting JDK environment variables
export JAVA_HOME=/opt/jdk1.8.0_91
export JRE_HOME=${JAVA_HOME}/jre
export CLASSPATH=.:${JAVA_HOME}/lib:${JRE_HOME}/lib
export PATH=${JAVA_HOME}/bin:${JRE_HOME}/bin:$PATH

#Setting Scala environment variables
export SCALA_HOME=/opt/scala-2.11.8
export PATH=${SCALA_HOME}/bin:$PATH

#Setting Spark environment variables
export SPARK_HOME=/opt/spark-hadoop/

#PYTHONPATH: add the pyspark module shipped with Spark to the Python search path
export PYTHONPATH=/opt/spark-hadoop/python

Save the file. Rebooting the computer makes /etc/profile take effect permanently; to apply it temporarily in the current session, open a command window and execute source /etc/profile there.
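As a quick sanity check (a sketch of my own, assuming the paths above), you can confirm from Python that the variables are visible in the current session:

import os

# print each variable set above; "<not set>" means /etc/profile has not taken effect
for var in ("JAVA_HOME", "SCALA_HOME", "SPARK_HOME", "PYTHONPATH"):
    print("%s=%s" % (var, os.environ.get(var, "<not set>")))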

Step Four:

Test the installation results.

Open a command window and switch to the Spark root directory.

Execute ./bin/spark-shell to open a Scala connection to Spark. If the shell starts without errors, the installation is correct.

Execute ./bin/pyspark to open a Python connection to Spark. Again, if the interpreter starts, the installation is correct.
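Inside the ./bin/pyspark shell a SparkContext named sc is already created, so a one-liner is enough to verify that jobs actually run (a minimal sketch of my own):

# inside ./bin/pyspark; sc is pre-created by the shell
rdd = sc.parallelize([1, 2, 3, 4])
print(rdd.map(lambda x: x * x).collect())  # expected output: [1, 4, 9, 16]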

    • Developing Spark applications in Python

      • PYTHONPATH was set up earlier, adding pyspark to the Python search path.

      • Open the Spark installation directory (/opt/spark-hadoop), unzip the py4j archive under the /opt/spark-hadoop/python/lib folder, and move it into the /opt/spark-hadoop/python directory.

Test it in PyCharm: if Spark's startup log output (printed in red) appears, the configuration is successful. A minimal test script is sketched below.
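For that test, a short word-count script like the following works (a sketch of my own; the file name standalone_test.py, the local master, and the app name are assumptions, not from the original guide):

# standalone_test.py -- run with: python standalone_test.py
from pyspark import SparkConf, SparkContext

conf = SparkConf().setMaster("local").setAppName("pythonpath-test")
sc = SparkContext(conf=conf)

# a tiny word count over an in-memory collection
lines = sc.parallelize(["spark keeps data in memory", "spark suits iterative jobs"])
counts = (lines.flatMap(lambda l: l.split())
               .map(lambda w: (w, 1))
               .reduceByKey(lambda a, b: a + b))
print(counts.collect())

sc.stop()

If importing pyspark fails here, recheck the PYTHONPATH setting and the py4j step above.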

Reference: http://www.open-open.com/lib/view/open1432192407317.html
