How to Install Hadoop on CentOS 7
Hadoop is a distributed system infrastructure that lets users develop distributed programs without understanding the underlying distributed details.
Hadoop has two core components: HDFS and MapReduce. HDFS is responsible for storage, while MapReduce is responsible for computation.
The following describes how to install Hadoop:
Installing Hadoop is actually not much trouble. It mainly depends on the following prerequisites; once they are in place, getting Hadoop running by following the official configuration guide is straightforward.
1. A Java runtime environment. The Sun (Oracle) JDK is recommended.
2. Passwordless SSH public-key authentication.
With the above environment in place, all that remains is the Hadoop configuration itself. The configuration may differ between versions; refer to the official documentation for details.
Environment
Virtual machine: VMware 10.0.1 build-1379776
Operating system: 64-bit CentOS 7
Install the Java environment
Download the JDK from: http://www.oracle.com/technetwork/cn/java/javase/downloads/jdk8-downloads-2133151-zhs.html
Select the download package that matches your operating system. If your system supports rpm packages, download the rpm directly, or install straight from the rpm URL:
rpm -ivh http://download.oracle.com/otn-pub/java/jdk/8u20-b26/jdk-8u20-linux-x64.rpm
The JDK is updated continuously; to install the latest version, get the rpm URL of the newest package from the official website.
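Once the rpm has installed, a quick check confirms the JDK is available (the exact version string depends on the package you installed):

```shell
# Print the installed Java version; confirms the JDK is on the PATH
java -version
```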
Configure SSH public key password-free Authentication
By default, CentOS ships with openssh-server, openssh-clients, and rsync. If your system does not have them, look up how to install them first.
Create a Common Account
Create a hadoop account (the name is up to you) on all machines and set its password to hadoop.
useradd -d /home/hadoop -s /usr/bin/bash -g wheel hadoop
passwd hadoop
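When the account has to be created on many machines, it can be handy to set the password non-interactively. The --stdin option is specific to the passwd command shipped with RHEL/CentOS; run this as root:

```shell
# Set the hadoop user's password without an interactive prompt
# (--stdin is RHEL/CentOS-specific)
echo hadoop | passwd --stdin hadoop
```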
SSH Configuration
vi /etc/ssh/sshd_config
Find the following three configuration items and set them as shown below. If a line is commented out, remove the leading # so the setting takes effect.
RSAAuthentication yes
PubkeyAuthentication yes
# The default is to check both .ssh/authorized_keys and .ssh/authorized_keys2
# but this is overridden so installations will only check .ssh/authorized_keys
AuthorizedKeysFile .ssh/authorized_keys
.ssh/authorized_keys is the path where the authorized public keys are stored.
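After saving sshd_config, sshd has to pick up the new settings; on CentOS 7 this is done through systemd (run as root):

```shell
# Validate the configuration file first; prints nothing on success
sshd -t
# Restart sshd so the new settings take effect
systemctl restart sshd
```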
Generate the Key Pair
Log on with a hadoop account.
cd ~
ssh-keygen -t rsa -P ''
This generates ~/.ssh/id_rsa.pub. Save its contents as ~/.ssh/authorized_keys:
cp ~/.ssh/id_rsa.pub ~/.ssh/authorized_keys
Use the scp command to copy the .ssh directory to the other machines. This gives all machines the same key pair and the same authorized public key.
scp ~/.ssh/* hadoop@slave1:~/.ssh/
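With more than one slave, the copy is easy to script. The hostnames below are placeholders for your own machines, and each remote ~/.ssh directory is assumed to already exist:

```shell
# Copy the shared key material to every slave (hostnames are examples)
for host in slave1 slave2 slave3; do
  scp ~/.ssh/* hadoop@"$host":~/.ssh/
done
```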
Note that the permissions on ~/.ssh/id_rsa must be 600; access by any other user must be denied.
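The permissions can be tightened as follows; sshd refuses to use key files that are readable by other users:

```shell
# Only the owner may enter .ssh or read the key files
chmod 700 ~/.ssh
chmod 600 ~/.ssh/id_rsa ~/.ssh/authorized_keys
# Verify: prints 600 for the private key
stat -c '%a' ~/.ssh/id_rsa
```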
Hadoop Installation
Refer to the official configuration documentation, or one of the following articles:
Tutorial on standalone/pseudo-distributed installation and configuration of Hadoop 2.4.1 under Ubuntu 14.04
Install and configure Hadoop 2.2.0 on CentOS
Build a Hadoop environment on Ubuntu 13.04
Cluster configuration for Ubuntu 12.10 + Hadoop 1.2.1
Build a Hadoop environment on Ubuntu (standalone mode + pseudo-distributed mode)
Configuration of the Hadoop environment on Ubuntu
Detailed tutorial on setting up a standalone Hadoop environment
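As a rough orientation only (the release version, archive location, and JAVA_HOME below are placeholders; the Oracle JDK rpm typically symlinks the JDK at /usr/java/default, but verify the paths on your own system), the basic shape of an installation looks like this:

```shell
# Unpack a previously downloaded Hadoop release (version is a placeholder)
tar -xzf hadoop-2.4.1.tar.gz -C /usr/local
cd /usr/local/hadoop-2.4.1

# Tell Hadoop where the JDK lives (path assumes the Oracle rpm layout)
export JAVA_HOME=/usr/java/default
export PATH="$PATH:/usr/local/hadoop-2.4.1/bin"

# Sanity check: prints the Hadoop version if the setup is correct
hadoop version
```

From here, the configuration files under etc/hadoop (core-site.xml and friends) are edited as described in the official documentation for the mode you want.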