Hadoop Big Data Deployment
I. System Environment Configuration
1. Disable the firewall and SELinux
Disable Firewall:
systemctl stop firewalld
systemctl disable firewalld
Set SELinux to disabled:
# cat /etc/selinux/config
SELINUX=disabled
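This setting only takes effect after a reboot. A minimal way to switch SELinux off immediately on the running system and confirm the result (assuming it is currently enforcing):
# setenforce 0
# getenforce
Permissive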
2. Configure the NTP Time Server
# yum -y install ntpdate
# crontab -l
*/5 * * * * /usr/sbin/ntpdate 192.168.1.1 >/dev/null 2>&1
Replace 192.168.1.1 with the IP address of an available time server.
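If the cron entry has not been created yet, one non-interactive way to append it and run a first sync (a sketch; substitute your own time server IP):
# (crontab -l 2>/dev/null; echo "*/5 * * * * /usr/sbin/ntpdate 192.168.1.1 >/dev/null 2>&1") | crontab -
# /usr/sbin/ntpdate 192.168.1.1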
3. Modify System Restrictions
# cat /etc/security/limits.conf
* soft nproc 100000
* hard nproc 100000
* soft nofile 102400
* hard nofile 102400
hadoop soft nproc 100000
hadoop hard nproc 100000
hadoop soft nofile 102400
hadoop hard nofile 102400
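The new limits apply to new login sessions only. A quick check after logging in again as the hadoop user, using standard shell builtins:
$ ulimit -n    # open file limit, should show 102400
$ ulimit -u    # process limit, should show 100000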
4. Create a hadoop user
groupadd -g 1002 hadoop
useradd -u 1002 -g hadoop hadoop
5. Configure hosts
# cat /etc/hosts
127.0.0.1 localhost localhost.localdomain localhost4 localhost4.localdomain4
::1 localhost localhost.localdomain localhost6 localhost6.localdomain6
192.168.24.43 hadoop1
192.168.24.216 hadoop2
192.168.24.7 hadoop3
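The same hosts file must exist on every node. A minimal sketch for copying it from hadoop1 to the other two nodes, assuming root SSH access between the machines at this point:
# for h in hadoop2 hadoop3; do scp /etc/hosts root@$h:/etc/hosts; done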
6. Distribute the Public Key
# su - hadoop
$ ssh-keygen
$ ssh-copy-id hadoop@hadoop1
$ ssh-copy-id hadoop@hadoop2
$ ssh-copy-id hadoop@hadoop3
Repeat this on every node so that each node's public key exists on all nodes; passwordless SSH between the nodes is required.
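A quick loop to confirm that passwordless SSH works from the current node (run as the hadoop user):
$ for h in hadoop1 hadoop2 hadoop3; do ssh $h hostname; done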
7. Install JDK
# yum -y install jdk-8u171-linux-x64.rpm
# java -version
java version "1.8.0_171"
Java(TM) SE Runtime Environment (build 1.8.0_171-b11)
Java HotSpot(TM) 64-Bit Server VM (build 25.171-b11, mixed mode)
8. Install Scala
Scala is a multi-paradigm programming language designed to integrate features of object-oriented and functional programming. Scala runs on the Java Virtual Machine and is compatible with existing Java programs: Scala source code is compiled into Java bytecode, so it runs on the JVM and can call existing Java class libraries.
cd /app
tar -zxvf /home/Software/scala-2.11.12.tgz -C .
mv scala-2.11.12 scala
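A quick check that the unpacked distribution works (SCALA_HOME is added to the shell environment later, in the Hadoop section):
$ /app/scala/bin/scala -version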
9. Install snappy
Snappy is a compression/decompression library. It does not aim for maximum compression or compatibility with other compression libraries; instead, it aims for very high speed with a reasonable compression ratio. For example, compared with the fastest zlib mode, Snappy is about an order of magnitude faster for most inputs, but the compressed files it produces are 20% to 100% larger.
yum -y install automake autoconf libtool openssl openssl-devel gcc gcc-c++
tar -zxvf snappy-1.1.3.tar.gz
cd snappy-1.1.3
./autogen.sh
./configure
make && make install
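With the default configure prefix the library is installed under /usr/local/lib; a simple check that the shared objects are in place:
$ ls -l /usr/local/lib/libsnappy*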
10. Install lzo and lzop
LZO is a lossless compression library written in ANSI C. It provides very fast compression and decompression, and decompression requires no extra memory. Even when data is compressed slowly at a high compression ratio, it can still be decompressed very quickly. LZO is released under the GNU GPL.
LZO is well suited to real-time compression and decompression, that is, scenarios that care more about speed than about compression ratio.
Because LZO is written in ANSI C, the compressed data format is also designed to be cross-platform.
tar -xvf lzo-2.06.tar.gz
cd lzo-2.06
./configure --enable-shared
make && make install
Lzop is a program built on the lzo library; it can compress and decompress files directly from shell commands.
tar -xvf lzop-1.03.tar.gz
cd lzop-1.03
./configure
make && make install
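A short round-trip test of the lzop binary (test.log is just a hypothetical sample file; the options are standard lzop flags):
$ lzop test.log          # creates test.log.lzo, keeps the original file
$ lzop -t test.log.lzo   # verify the compressed file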
II. ZooKeeper Cluster
ZooKeeper has three installation modes: standalone mode, a single-node installation; pseudo-cluster mode, which starts multiple ZooKeeper instances on one host; and cluster mode, which requires an odd number of servers (at least three), with one ZooKeeper instance running on each server.
1. Unzip and install ZooKeeper
su - hadoop
mkdir /app
tar -zxvf zookeeper-3.4.10.tar.gz -C /app/
cd /app
sudo mv zookeeper-3.4.10 zookeeper
mkdir zookeeper/data zookeeper/logs
2. Modify the zoo.cfg file
$ vim /app/zookeeper/conf/zoo.cfg
tickTime=2000
initLimit=20
syncLimit=10
dataDir=/app/zookeeper/data
dataLogDir=/app/zookeeper/logs
clientPort=2181
server.1=hadoop1:2888:3888
server.2=hadoop2:2888:3888
server.3=hadoop3:2888:3888
initLimit: the maximum time allowed for a follower to connect and sync with the leader during initialization, measured in ticks; 20*2000 ms means 40 seconds.
syncLimit: the maximum time allowed for message exchange between the leader and a follower, here 10*2000 ms, that is, 20 seconds.
server.X=A:B:C, where X is a number identifying the server, A is the server's IP address or hostname, B is the port the server uses to exchange messages with the leader, and C is the port used for leader election.
3. Modify myid
In /app/zookeeper/data/, create a myid file containing the X from the corresponding server.X entry in the configuration file above.
$ cat /app/zookeeper/data/myid
1
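On the other two nodes the file must contain 2 and 3 respectively. A possible one-liner to set all three from one node, assuming ZooKeeper is unpacked at the same path everywhere and passwordless SSH is already configured:
$ for i in 1 2 3; do ssh hadoop$i "echo $i > /app/zookeeper/data/myid"; done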
4. Modify the log output path of zookeeper:
Modify /app/zookeeper/bin/zkEnv.sh and change ZOO_LOG_DIR to the path set in the configuration file, /app/zookeeper/logs:
if [ "x${ZOO_LOG_DIR}" = "x" ]then ZOO_LOG_DIR="/app/zookeeper/logs"fi
5. Start and debug zookeeper
Start:
$ zkServer.sh start
View status:
$ zkServer.sh status
$ zkServer.sh status
ZooKeeper JMX enabled by default
Using config: /app/zookeeper/bin/../conf/zoo.cfg
Mode: follower
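A further connectivity check can be done with the CLI client bundled with ZooKeeper, connecting to one of the servers configured above:
$ zkCli.sh -server hadoop1:2181
[zk: hadoop1:2181(CONNECTED) 0] ls /
[zookeeper]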
III. Hadoop HA Installation
Hadoop is divided into two major versions, 1.0 and 2.0; you can look up the specific differences yourself. This article uses Hadoop 2.0, whose ecosystem mainly includes the following core projects: HDFS, YARN, and MapReduce.
1. Unzip and install
sudo tar -zxvf hadoop-2.9.1.tar.gz -C /app/
$ pwd
/app/hadoop/etc/hadoop
$ ls
capacity-scheduler.xml      httpfs-env.sh            mapred-env.sh
configuration.xsl           httpfs-log4j.properties  mapred-queues.xml.template
container-executor.cfg      httpfs-signature.secret  mapred-site.xml
core-site.xml               httpfs-site.xml          mapred-site.xml.template
hadoop-env.cmd              kms-acls.xml             slaves
hadoop-env.sh               kms-env.sh               ssl-client.xml.example
hadoop-metrics2.properties  kms-log4j.properties     ssl-server.xml.example
hadoop-metrics.properties   kms-site.xml             yarn-env.cmd
hadoop-policy.xml           log4j.properties         yarn-env.sh
hdfs-site.xml               mapred-env.cmd           yarn-site.xml
2. Modify hadoop environment variables (hadoop-env.sh)
export HADOOP_HEAPSIZE=16196
export JAVA_HOME=/usr/java/jdk1.8.0_171-amd64
export JAVA_LIBRARY_PATH=/app/hadoop-2.9.1/lib/native
export HADOOP_OPTS="-Djava.library.path=/app/hadoop-2.9.1/lib/native"
Note: in a CentOS 6 environment the value assigned to HADOOP_OPTS must be enclosed in double quotation marks; otherwise the variable cannot be resolved when the services are started later.
3. Modify core-site.xml
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://myhadoop</value>
  </property>
  <property>
    <name>ha.zookeeper.quorum</name>
    <value>hadoop1:2181,hadoop2:2181,hadoop3:2181</value>
  </property>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/app/hadoop/tmp</value>
  </property>
  <property>
    <name>io.file.buffer.size</name>
    <value>131072</value>
  </property>
  <property>
    <name>io.compression.codecs</name>
    <value>org.apache.hadoop.io.compress.GzipCodec,org.apache.hadoop.io.compress.DefaultCodec,com.hadoop.compression.lzo.LzoCodec,com.hadoop.compression.lzo.LzopCodec,org.apache.hadoop.io.compress.BZip2Codec</value>
  </property>
  <property>
    <name>io.compression.codec.lzo.class</name>
    <value>com.hadoop.compression.lzo.LzoCodec</value>
  </property>
  <property>
    <name>hadoop.proxyuser.hadoop.hosts</name>
    <value>*</value>
  </property>
  <property>
    <name>hadoop.proxyuser.hadoop.groups</name>
    <value>*</value>
  </property>
</configuration>
4. Modify hdfs-site.xml
<configuration>
  <property>
    <name>dfs.nameservices</name>
    <value>myhadoop</value>
  </property>
  <property>
    <name>dfs.ha.namenodes.myhadoop</name>
    <value>nn1,nn2</value>
  </property>
  <property>
    <name>dfs.namenode.rpc-address.myhadoop.nn1</name>
    <value>hadoop1:8020</value>
  </property>
  <property>
    <name>dfs.namenode.rpc-address.myhadoop.nn2</name>
    <value>hadoop2:8020</value>
  </property>
  <property>
    <name>dfs.namenode.http-address.myhadoop.nn1</name>
    <value>hadoop1:50070</value>
  </property>
  <property>
    <name>dfs.namenode.http-address.myhadoop.nn2</name>
    <value>hadoop2:50070</value>
  </property>
  <property>
    <name>dfs.journalnode.edits.dir</name>
    <value>/app/hadoop/qjournal</value>
  </property>
  <property>
    <name>dfs.namenode.shared.edits.dir</name>
    <value>qjournal://hadoop1:8485;hadoop2:8485;hadoop3:8485/myhadoop</value>
  </property>
  <property>
    <name>dfs.client.failover.proxy.provider.myhadoop</name>
    <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
  </property>
  <property>
    <name>dfs.ha.fencing.methods</name>
    <value>sshfence</value>
  </property>
  <property>
    <name>dfs.ha.fencing.ssh.private-key-files</name>
    <value>/home/hadoop/.ssh/id_rsa</value>
  </property>
  <property>
    <name>dfs.ha.automatic-failover.enabled</name>
    <value>true</value>
  </property>
  <property>
    <name>dfs.namenode.name.dir</name>
    <value>file:/app/hadoop/dfs/name,file:/hadoop/dfs/name</value>
  </property>
  <property>
    <name>dfs.datanode.data.dir</name>
    <value>file:/app/hadoop/dfs/data</value>
  </property>
  <property>
    <name>dfs.datanode.handler.count</name>
    <value>100</value>
  </property>
  <property>
    <name>dfs.namenode.handler.count</name>
    <value>1024</value>
  </property>
  <property>
    <name>dfs.datanode.max.xcievers</name>
    <value>8096</value>
  </property>
</configuration>
5. Modify yarn-site.xml
<configuration>
  <property>
    <name>yarn.resourcemanager.ha.enabled</name>
    <value>true</value>
  </property>
  <property>
    <name>yarn.resourcemanager.cluster-id</name>
    <value>cluster1</value>
  </property>
  <property>
    <name>yarn.resourcemanager.ha.rm-ids</name>
    <value>rm1,rm2</value>
  </property>
  <property>
    <name>yarn.resourcemanager.hostname.rm1</name>
    <value>hadoop1</value>
  </property>
  <property>
    <name>yarn.resourcemanager.hostname.rm2</name>
    <value>hadoop2</value>
  </property>
  <property>
    <name>yarn.resourcemanager.webapp.address.rm1</name>
    <value>hadoop1:8088</value>
  </property>
  <property>
    <name>yarn.resourcemanager.webapp.address.rm2</name>
    <value>hadoop2:8088</value>
  </property>
  <property>
    <name>yarn.resourcemanager.zk-address</name>
    <value>hadoop1:2181,hadoop2:2181,hadoop3:2181</value>
  </property>
</configuration>
6. Modify mapred-site.xml
<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
  <property>
    <name>mapreduce.jobhistory.address</name>
    <value>hadoop1:10020</value>
  </property>
  <property>
    <name>mapreduce.jobhistory.webapp.address</name>
    <value>hadoop1:19888</value>
  </property>
  <property>
    <name>mapreduce.job.tracker</name>
    <value>hdfs://hadoop1:8021</value>
  </property>
  <property>
    <name>mapreduce.reduce.shuffle.parallelcopies</name>
    <value>50</value>
  </property>
  <property>
    <name>mapreduce.map.java.opts</name>
    <value>-Xmx4096M</value>
  </property>
  <property>
    <name>mapreduce.reduce.java.opts</name>
    <value>-Xmx8192M</value>
  </property>
  <property>
    <name>mapreduce.map.memory.mb</name>
    <value>4096</value>
  </property>
  <property>
    <name>mapreduce.reduce.memory.mb</name>
    <value>8192</value>
  </property>
  <property>
    <name>mapreduce.map.output.compress</name>
    <value>true</value>
  </property>
  <property>
    <name>mapred.child.env</name>
    <value>JAVA_LIBRARY_PATH=/app/hadoop-2.9.1/lib/native</value>
  </property>
  <property>
    <name>mapreduce.map.output.compress.codec</name>
    <value>com.hadoop.compression.lzo.LzoCodec</value>
  </property>
  <property>
    <name>mapreduce.task.io.sort.mb</name>
    <value>512</value>
  </property>
  <property>
    <name>mapreduce.task.io.sort.factor</name>
    <value>100</value>
  </property>
  <property>
    <name>mapred.reduce.tasks</name>
    <value>4</value>
  </property>
  <property>
    <name>mapred.map.tasks</name>
    <value>20</value>
  </property>
  <property>
    <name>mapred.child.java.opts</name>
    <value>-Xmx4096m</value>
  </property>
  <property>
    <name>mapreduce.reduce.shuffle.memory.limit.percent</name>
    <value>0.1</value>
  </property>
  <property>
    <name>mapred.job.shuffle.input.buffer.percent</name>
    <value>0.6</value>
  </property>
</configuration>
7. Modify yarn-env.sh and add environment variables
Set the YARN heap size by appending the following line to the end of the yarn-env.sh file:
YARN_HEAPSIZE=4000
Add environment variables:
$ tail .bash_profile
export JAVA_HOME=/usr/java/jdk1.8.0_171-amd64
export HADOOP_HOME=/app/hadoop
export ZOOKEEPER_HOME=/app/zookeeper
export LIBRARY_PATH=$HADOOP_HOME/lib/native
export SCALA_HOME=/app/scala
export PATH=$JAVA_HOME/bin:$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$ZOOKEEPER_HOME/bin:$SCALA_HOME/bin
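After editing, reload the profile and confirm that Hadoop and its native libraries (including the snappy and lzo codecs built earlier) are picked up; hadoop version and hadoop checknative are standard Hadoop commands:
$ source ~/.bash_profile
$ hadoop version
$ hadoop checknative -a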
8. Cluster startup and monitoring
Install psmisc; otherwise automatic failover fails:
yum -y install psmisc
Start the cluster:
# 1. Run zkServer.sh start on all ZooKeeper nodes
$ zkServer.sh start
# 1.1 Execute on one of the NameNodes to initialize the HA state in ZooKeeper.
#     This creates a znode under which the automatic failover system stores its data.
$ hdfs zkfc -formatZK
# 1.2 If you are setting up a fresh HDFS cluster, first run the format command on one of the NameNodes.
$ hdfs namenode -format
# 2. Start the HDFS services
$ start-dfs.sh
# 2.1 Note: if you manage your cluster services manually, you must start the zkfc daemon on each NameNode:
$ hadoop-daemon.sh --script hdfs start zkfc
# 3. Start the ResourceManager
$ start-yarn.sh
# 4. Start the standby ResourceManager
$ yarn-daemon.sh start resourcemanager
# Other commands:
# Start/stop a NameNode
$ hadoop-daemon.sh start/stop namenode
# Start/stop a DataNode
$ hadoop-daemon.sh start/stop datanode
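For the very first start of an HA cluster with QJM, the JournalNodes must be running before the NameNode is formatted, and the second NameNode is initialized from the first rather than formatted again. A possible order, as a sketch, assuming nn1 runs on hadoop1 and nn2 on hadoop2:
# On hadoop1, hadoop2 and hadoop3: start the JournalNodes
$ hadoop-daemon.sh start journalnode
# On hadoop1: format HDFS and the ZooKeeper failover state, then start the NameNode
$ hdfs namenode -format
$ hdfs zkfc -formatZK
$ hadoop-daemon.sh start namenode
# On hadoop2: copy the metadata from the active NameNode instead of formatting
$ hdfs namenode -bootstrapStandby
# On hadoop1: start the remaining services
$ start-dfs.sh
$ start-yarn.sh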
View status:
# View the processes running on each node
$ jps
# Check the state of all NameNodes
$ hdfs haadmin -getAllServiceState
# Check the state of nn1/nn2 individually
$ hdfs haadmin -getServiceState nn1
$ hdfs haadmin -getServiceState nn2
# View the active/standby state of the ResourceManager cluster
$ yarn rmadmin -getAllServiceState
hadoop1:8033 active
hadoop2:8033 standby
# View the state of each node in the ResourceManager cluster
$ yarn rmadmin -getServiceState rm1
active
$ yarn rmadmin -getServiceState rm2
standby
Hadoop Cluster Monitoring Port:
NameNode: http://namenode_host:50070
ResourceManager: http://resourcemanager_host:8088
MapReduce JobHistory Server: http://jobhistoryserver_host:19888
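A quick reachability check of the web interfaces from the command line (assuming hadoop1 currently hosts the active NameNode and ResourceManager, as configured above):
$ curl -s -o /dev/null -w "%{http_code}\n" http://hadoop1:50070/
$ curl -s -o /dev/null -w "%{http_code}\n" http://hadoop1:8088/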