Building a Spark development environment on Ubuntu

Source: Internet
Author: User
Tags ssh server pyspark

I. Install Ubuntu as a dual-boot system alongside Windows 7

Tools/Materials

Windows 7 64-bit

Ubuntu 16.04 32-bit

UltraISO, latest version (used to write the image file to a USB flash drive)

An empty USB flash drive (if it holds files, back them up first)

  1. Allocate a block of disk space (100 GB) for installing Ubuntu 16.04

Windows 7 comes with a disk management tool: [Computer] -> [Manage] -> [Disk Management] -> select the disk to shrink -> [right-click] -> [Shrink Volume]. Only the shrink step is needed; do not go on to partition or format the freed space.

  2. Write the ISO image file to the USB flash drive

Download and install UltraISO. Because we only need to write the Ubuntu 16.04 image file to the USB flash drive, the optional "install virtual ISO drive (isodriver)" component does not need to be installed.

After installation, enter the registration code to activate the software. Run UltraISO, browse to the Ubuntu 16.04 image file in the local directory pane, and double-click the image file when it appears on the right.

Next, select "Start" - "Write to hard disk image ..." in the menu bar.

In the pop-up window, use the following settings: burning check: checked; write method: USB-ZIP+; easy start: write a new hard disk master boot record (MBR) - USB-ZIP+. When the settings are complete, click Write.

When the confirmation prompt appears, click Yes. Writing the image file takes a few minutes.

  3. Install Ubuntu 16.04

1) Restart the computer, press F12, and select "USB HDD" to boot from the USB flash drive.

2) After a moment, the Ubuntu 16.04 installation screen appears. Select "Chinese (Simplified)" in the language list, then click "Install Ubuntu" on the right.

3) The installer then asks about connecting to the network; either choice works. If you are connected, it offers to install third-party software and updates during installation. You can decline and update after the system is installed. When done, click Continue.

4) For the installation type, be sure to select "Other options" so that you can partition the disk yourself.

5) In the partition settings, select the free space, which is the 100 GB we freed earlier under Windows 7.

6) Next, create four partitions, each carved out of the "free space" entry:

First partition:

     Select the free space and click "+", then use the following settings: mount point: "/"; size: 22000 MB; new partition type: primary; new partition location: beginning of this space; use as: Ext4 journaling file system.

Second partition:

Select the remaining free space, click "+" again, and use the following settings: mount point: (not set); size: 2048 MB; new partition type: logical; new partition location: beginning of this space; use as: swap area.

Third partition:

Select the remaining free space, click "+" again, and use the following settings: mount point: /boot; size: 200 MB; new partition type: logical; new partition location: beginning of this space; use as: Ext4 journaling file system.

Fourth partition:

Select the remaining free space, click "+" again, and use the following settings: mount point: /home; size: all remaining space; new partition type: logical; new partition location: beginning of this space; use as: Ext4 journaling file system.

7) After the partitions are set up, find the "device for boot loader installation" setting below the partition list and select the partition where /boot is located, then click "Install Now".

8) Follow the prompts to continue. On the user settings screen, set the user name and password. The installer then proceeds; wait for it to finish.

9) After the installation is complete, restart the computer and boot into Windows 7. The user name and password set above are the ones you will use to log in to Ubuntu.

  4. Modify the system boot entries with EasyBCD

After installing EasyBCD, click "Add New Entry" and select Linux/BSD.

Set Type: GRUB; Name: any name you like; Drive: the /boot partition created earlier, which is marked as Linux. When the settings are complete, click "Add Entry".

Restart the computer. The boot menu now offers two entries to choose from, Windows 7 and Ubuntu.

II. Creating a hadoop user

1. First press Ctrl+Alt+T to open a terminal window, then enter the following command to create a new user:

$ sudo useradd -m hadoop -s /bin/bash

This command creates a user named hadoop that can log in and uses /bin/bash as its shell.

sudo command: The sudo command is used extensively in this article. sudo is Ubuntu's privilege management mechanism, through which administrators can authorize ordinary users to perform actions that require root privileges. When you use sudo, you are asked for the current user's password.

Password: When you type a password in a Linux terminal, the terminal displays nothing, not even an indication of how many characters you have typed. Windows, by contrast, typically shows a "*" for each password character.

Switching input method: Commands in the Ubuntu terminal are generally typed in English. To switch between Chinese and English input in Linux, press the Shift key or click the input method button in the top menu bar. The Sunpinyin Chinese input method that comes with Ubuntu is sufficient for most readers.

Copy and paste in the terminal: In the Ubuntu terminal window, the copy and paste shortcuts need an added Shift; paste, for example, is Ctrl+Shift+V.

2. Then set a password with the following command, entering it twice as prompted. For simplicity you can use hadoop as the password:

$ sudo passwd hadoop

3. Give the hadoop user administrator privileges to simplify deployment and spare newcomers some tricky permission issues:

$ sudo adduser hadoop sudo

4. Finally, log out of the current user (click the gear in the upper right corner of the screen and select Log Out) to return to the login screen. There, select the hadoop user you just created and log in.
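The result of steps 1 to 3 can be confirmed from a terminal. The following is a minimal sketch rather than part of the original procedure: user_in_group is a hypothetical helper name, while id, getent, and cut are standard commands.

```shell
# Hypothetical helper: succeeds if user $1 belongs to group $2.
user_in_group() {
    # id -nG prints the user's group names separated by spaces.
    for group_name in $(id -nG "$1" 2>/dev/null); do
        if [ "$group_name" = "$2" ]; then
            return 0
        fi
    done
    return 1
}

if user_in_group hadoop sudo; then
    echo "hadoop exists and is in the sudo group"
else
    echo "hadoop is missing or is not in the sudo group"
fi

# Show the login shell recorded for the account (last field of the passwd entry).
getent passwd hadoop | cut -d: -f7
```

If the setup succeeded, the last command should print /bin/bash, the shell passed to useradd.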

III. Update apt

After logging in as the hadoop user, first update apt. We will use apt to install software later, and some packages may fail to install if it has not been updated. Press Ctrl+Alt+T to open a terminal window and run the following command:

$ sudo apt-get update

If a "Hash Sum mismatch" message appears, changing the software source can resolve it; if it does not appear, no change is needed. Changing the source is also recommended if some packages later fail to download for network reasons. For the purposes of learning Hadoop, the installation of Hadoop itself is not affected even if the "Hash Sum mismatch" message appears.

Some configuration files will need to be edited later. I prefer vim (an enhanced version of vi with the same basic usage), so installing it is recommended. If you really cannot use vi/vim, substitute gedit wherever vim appears below so that you can edit in a graphical text editor; in that case, close the whole gedit program after each edit, otherwise it keeps the terminal occupied:

$ sudo apt-get install vim

If you are asked for confirmation while installing software, enter Y at the prompt.

IV. Install SSH and configure passwordless SSH login

Both cluster and single-node modes require SSH login (similar to remote login: you log on to a Linux host and run commands on it). Ubuntu has the SSH client installed by default; the SSH server must be installed as well:

$ sudo apt-get install openssh-server

After installation, you can log in to the local machine with the following command:

$ ssh localhost

On the first login SSH shows a confirmation prompt; enter yes. Then enter the password (hadoop) when prompted, and you are logged in to the local machine.

Logging in this way requires the password every time, so it is more convenient to configure passwordless SSH login.

First exit the SSH session and return to the original terminal window, then use ssh-keygen to generate a key and add it to the authorized keys:

$ exit                                     # Exit the ssh localhost session
$ cd ~/.ssh/                               # If this directory does not exist, run ssh localhost once first
$ ssh-keygen -t rsa                        # Press Enter at every prompt
$ cat ./id_rsa.pub >> ./authorized_keys    # Authorize the key

Meaning of ~: In a Linux system, ~ stands for the user's home folder, that is, the "/home/<user name>" directory. If your user name is hadoop, ~ stands for "/home/hadoop/". The text after # in a command is a comment; only the command before it needs to be entered.

Now the ssh localhost command logs in directly, without asking for a password.
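Whether the key-based login really works can be checked without risking a password prompt. The sketch below is an illustration, not part of the original steps: check_passwordless_ssh is a hypothetical helper name, and it relies on the standard OpenSSH client options BatchMode and ConnectTimeout.

```shell
# Hypothetical helper: report whether login to host $1 works without a password.
check_passwordless_ssh() {
    # BatchMode=yes makes ssh fail instead of asking for a password,
    # so success here means no password was needed.
    if ssh -o BatchMode=yes -o ConnectTimeout=5 "$1" true </dev/null 2>/dev/null; then
        echo "passwordless login to $1 works"
    else
        echo "passwordless login to $1 failed"
        return 1
    fi
}

check_passwordless_ssh localhost || echo "re-check ~/.ssh/authorized_keys and its permissions"
```

A common cause of failure is overly open permissions: the SSH server may ignore the authorized_keys file if it or the ~/.ssh directory is writable by other users.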

V. Installing the Java JDK

Start Firefox, download jdk-*-linux-x*.tar.gz, and unpack it to /opt/jdk1.8.0_*

Download page: http://www.oracle.com/technetwork/java/javase/downloads/index.html

Problem: At first, the file cannot be unpacked to /opt.

Reason: /opt is a system folder with protected permissions; operating on it requires sufficient privileges.

Solution: Open a terminal and enter the following command:

$ sudo chmod 777 /opt

Under Ubuntu, the commands for changing permissions are as follows:
chmod 600 name (only the owner has read and write permissions)
chmod 644 name (the owner has read and write permissions; group and other users have read permission only)
chmod 700 name (only the owner has read, write, and execute permissions)
chmod 666 name (everyone has read and write permissions)
chmod 777 name (everyone has read, write, and execute permissions)
To change permissions recursively for all files in a directory: change into the directory and run chmod -R 777 * (-R means recursive processing, * matches all files), or, without changing into the directory, run chmod -R 777 /home/abc/directoryname (where /home/abc/directoryname is the directory path).
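The octal modes listed above can be tried safely on scratch files. The following is a small sketch, assuming GNU stat (as shipped with Ubuntu) for the -c format option; the directory and file names are made up for the demonstration.

```shell
# Work in a throwaway directory so no real files are affected.
demo_dir="$(mktemp -d)"
touch "$demo_dir/private.txt" "$demo_dir/shared.txt"

chmod 600 "$demo_dir/private.txt"   # owner: read/write; nobody else
chmod 644 "$demo_dir/shared.txt"    # owner: read/write; group and others: read

# %a prints the octal mode, %A the symbolic form, %n the file name.
stat -c '%a %A %n' "$demo_dir/private.txt" "$demo_dir/shared.txt"

# -R applies the mode to a directory and everything below it.
mkdir -p "$demo_dir/tree/sub"
touch "$demo_dir/tree/sub/file.txt"
chmod -R 755 "$demo_dir/tree"
stat -c '%a %A %n' "$demo_dir/tree/sub/file.txt"
```

Each digit is the sum of read (4), write (2), and execute (1), given in the order owner, group, others; 644 therefore reads as rw- for the owner and r-- for everyone else.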

VI. Installing Hadoop 2

In Ubuntu, open the Firefox browser and download from the address below: hadoop-2.7.1

Hadoop 2 can be downloaded from http://mirror.bit.edu.cn/apache/hadoop/common/ or http://mirrors.cnnic.cn/apache/hadoop/common/. Download the latest stable version, that is, the file named in the form hadoop-2.x.y.tar.gz under "stable". This file is already compiled; the other file, whose name contains src, is the Hadoop source code and must be compiled before use.

Unpack the downloaded Hadoop file to /opt/hadoop-2.6.0. You can rename the folder to the shorter hadoop and change its ownership:

$ cd /opt/
$ sudo mv ./hadoop-2.6.0/ ./hadoop      # Rename the folder to hadoop
$ sudo chown -R hadoop ./hadoop         # Modify file permissions

Hadoop can be used as soon as it is unpacked. Enter the following commands to check that Hadoop is available; on success the Hadoop version information is displayed:

$ cd /opt/hadoop
$ ./bin/hadoop version

VII. Installing Scala

Download scala-2.11.6.tgz in the browser and unpack it to /opt/scala-2.11.6

Download page: http://www.scala-lang.org/

VIII. Installing Spark

Download spark-*-bin-hadoop2.6.tgz in the browser and unpack it to /opt/spark-*-bin-hadoop2.6

Download page: http://spark.apache.org/downloads.html
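The JDK, Hadoop, Scala, and Spark steps all follow the same pattern: download a .tar.gz or .tgz archive and unpack it under /opt. The sketch below shows that pattern with tar. The helper name unpack_to and the ~/Downloads location are assumptions, not taken from the original, and the demonstration runs on a dummy archive in a scratch directory.

```shell
# Hypothetical helper: unpack gzip-compressed tar archive $1 into directory $2.
unpack_to() {
    mkdir -p "$2" && tar -xzf "$1" -C "$2"
}

# Demonstration on a dummy archive, so nothing under /opt is touched.
scratch="$(mktemp -d)"
mkdir -p "$scratch/src/demo-1.0/bin"
echo "hello" > "$scratch/src/demo-1.0/bin/tool"
tar -czf "$scratch/demo-1.0.tgz" -C "$scratch/src" demo-1.0

unpack_to "$scratch/demo-1.0.tgz" "$scratch/opt"
ls "$scratch/opt/demo-1.0/bin"

# For the real archives the call would look like this (download location assumed):
#   unpack_to ~/Downloads/scala-2.11.6.tgz /opt
```

The -C option tells tar which directory to unpack into, so there is no need to change into /opt first.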

  

IX. Configuring environment variables

To edit /etc/profile, run the following command:

*@*:~$ sudo gedit /etc/profile

The file opens in the editor. Add the following at the end of the file:

#Setting JDK environment variables
export JAVA_HOME=/opt/jdk1.8.0_45
export JRE_HOME=${JAVA_HOME}/jre
export CLASSPATH=.:${JAVA_HOME}/lib:${JRE_HOME}/lib
export PATH=${JAVA_HOME}/bin:${JRE_HOME}/bin:$PATH

#Setting Scala environment variables
export SCALA_HOME=/opt/scala-2.11.6
export PATH=${SCALA_HOME}/bin:$PATH

#Setting Spark environment variables (use the directory Spark was actually unpacked to)
export SPARK_HOME=/opt/spark-hadoop/

#PYTHONPATH: add the pyspark module in Spark to the Python environment
export PYTHONPATH=/opt/spark-hadoop/python

Restart the computer to make the /etc/profile changes take effect permanently. To apply them temporarily, open a terminal and run source /etc/profile, which takes effect in the current window only.
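After running source /etc/profile, it is worth confirming that each variable is set and points at a directory that exists. A minimal sketch follows; check_var is a hypothetical helper name, and the variables checked are the conventional upper-case names JAVA_HOME, JRE_HOME, SCALA_HOME, SPARK_HOME, and PYTHONPATH.

```shell
# Hypothetical helper: report whether environment variable $1 is set
# and whether its value is an existing directory.
check_var() {
    value="$(printenv "$1" || true)"
    if [ -z "$value" ]; then
        echo "$1 is not set"
        return 1
    elif [ -d "$value" ]; then
        echo "$1=$value (directory exists)"
    else
        echo "$1=$value (directory missing)"
        return 1
    fi
}

for name in JAVA_HOME JRE_HOME SCALA_HOME SPARK_HOME PYTHONPATH; do
    check_var "$name" || true
done
```

A "directory missing" line usually means that the path in /etc/profile does not match the folder name the archive was actually unpacked to.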

X. Test the installation

Open a terminal and switch to the Spark root directory:

*@*:~$ cd /opt/spark-*-bin-hadoop2.6/
*@*:/opt/spark-*-bin-hadoop2.6$

Run ./bin/spark-shell to open the Scala shell connected to Spark:

*@*:~$ cd /opt/spark-*-bin-hadoop2.6/
*@*:/opt/spark-*-bin-hadoop2.6$ ./bin/spark-shell

If no error message appears during startup and the scala> prompt is displayed, the startup succeeded.

Run ./bin/pyspark to open the Python shell connected to Spark:

*@*:/opt/spark-*-bin-hadoop2.6$ ./bin/pyspark

If no error appears during startup and the Python prompt is displayed, the startup succeeded.

Spark can also be accessed through a browser, where its web page is displayed.

These tests show that Spark is available.
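Beyond checking that the shells start, a short job confirms that Spark can actually compute something. The sketch below assumes SPARK_HOME points at the unpacked Spark directory and uses the run-example script and the SparkPi example that ship with Spark's binary distributions; spark_smoke_test is a hypothetical helper name.

```shell
# Hypothetical helper: run the bundled SparkPi example as a smoke test.
spark_smoke_test() {
    runner="${SPARK_HOME:-}/bin/run-example"
    if [ ! -x "$runner" ]; then
        echo "run-example not found under SPARK_HOME='${SPARK_HOME:-}'"
        return 1
    fi
    # SparkPi estimates pi; the argument is the number of partitions to use.
    # Spark logs to stderr, so discard that and keep only the result line.
    "$runner" SparkPi 10 2>/dev/null | grep "Pi is roughly"
}

spark_smoke_test || echo "Spark smoke test did not pass"
```

On a working installation the function prints a single line beginning with "Pi is roughly"; the exact digits vary from run to run because the estimate is randomized.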
