1. Overview of the Hadoop ecosystem
Hadoop is a distributed system infrastructure developed by the Apache Foundation. It lets users develop distributed programs without knowing the low-level details of the distribution, harnessing the power of a cluster for high-speed computation and storage. It is reliable, efficient, and scalable.
The core of Hadoop consists of YARN, HDFS, and MapReduce, together with the Common module.
2. HDFS
HDFS originates from Google's GFS paper, published in October 2003; HDFS is a clone of GFS. It is the foundation of data storage management in Hadoop: a highly fault-tolerant system capable of detecting and responding to hardware failures.
HDFS simplifies the file consistency model with a write-once, read-many access pattern and streaming data access, providing high-throughput data access for applications with large data sets. Files are stored in blocks that are distributed across different physical machines in the cluster.
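As a minimal sketch (the paths are illustrative), the block-based, write-once model can be seen from the command line once a cluster is running:

# upload a file; HDFS splits it into blocks (128 MB by default in Hadoop 2.x)
hdfs dfs -mkdir -p /user/hadoop/input
hdfs dfs -put /var/log/messages /user/hadoop/input/
# show the blocks and the machines holding each replica
hdfs fsck /user/hadoop/input/messages -files -blocks -locations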
3. MapReduce
MapReduce originates from Google's MapReduce paper. It is used for computing over large amounts of data: the framework hides the details of distributed computation and abstracts the job into two phases, map and reduce.
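For example, the examples jar shipped with the distribution (the path below assumes the install layout used later in this article) runs the classic wordcount job:

hadoop jar /opt/hadoop/current/share/hadoop/mapreduce/hadoop-mapreduce-examples-*.jar \
    wordcount /user/hadoop/input /user/hadoop/output
hdfs dfs -cat /user/hadoop/output/part-r-00000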
4. HBase (distributed column-oriented database)
HBase originates from Google's BigTable paper. It is a column-oriented, dynamic-schema database for structured data built on top of HDFS, offering scalability, high reliability, and high performance.
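As a quick illustration (this assumes an HBase installation; the table and column-family names are made up), the HBase shell shows the column-family model:

hbase shell
create 'weblog', 'cf'                         # a table with one column family
put 'weblog', 'row1', 'cf:url', '/index.html'
put 'weblog', 'row1', 'cf:status', '200'      # columns can be added dynamically
get 'weblog', 'row1'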
5. ZooKeeper
ZooKeeper solves data-management problems in a distributed environment: unified naming, state synchronization, cluster management, configuration synchronization, and so on.
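For instance (assuming a ZooKeeper server on localhost; the znode path and data are made up), naming and configuration synchronization reduce to operations on znodes:

zkCli.sh -server localhost:2181
ls /
create /myapp "config-v1"      # publish a configuration under a well-known name
get /myapp
set /myapp "config-v2"         # clients watching /myapp are notified of the change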
6. Hive
Open-sourced by Facebook, Hive defines a SQL-like query language and translates those queries into MapReduce jobs executed on Hadoop.
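A minimal sketch (the table name and input path are hypothetical); the SELECT below is compiled into a MapReduce job:

hive -e "CREATE TABLE logs (line STRING);"
hive -e "LOAD DATA INPATH '/user/hadoop/input/messages' INTO TABLE logs;"
hive -e "SELECT COUNT(*) FROM logs;"    # runs as a MapReduce job on the cluster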
7. Flume
Log Collection Tool
8. YARN (distributed resource manager)
YARN is the next generation of MapReduce. It mainly addresses the poor scalability of the original Hadoop and its lack of support for multiple computing frameworks.
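As a sketch, once YARN is running the yarn CLI shows the ResourceManager's view of the cluster:

yarn node -list           # NodeManagers registered with the ResourceManager
yarn application -list    # applications currently accepted or running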
9. Spark
Spark provides a faster and more general-purpose data-processing platform. Compared with Hadoop MapReduce, Spark lets your program run in memory, which can make it significantly faster.
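For example (this assumes a Spark-on-YARN installation; $SPARK_HOME and the examples jar location vary by Spark version), the bundled SparkPi job can be submitted to the cluster:

spark-submit --master yarn --class org.apache.spark.examples.SparkPi \
    $SPARK_HOME/lib/spark-examples-*.jar 100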
10. Kafka
A distributed message queue, used primarily for handling active streaming data.
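As an illustration (assuming a single broker on localhost and a Kafka version from the 0.8/0.9 era; the topic name is made up):

kafka-topics.sh --create --zookeeper localhost:2181 --replication-factor 1 --partitions 1 --topic weblogs
kafka-console-producer.sh --broker-list localhost:9092 --topic weblogs     # type messages to publish
kafka-console-consumer.sh --zookeeper localhost:2181 --topic weblogs --from-beginning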
11. Hadoop Pseudo-distributed deployment
Currently there are three main free Hadoop distributions, all from foreign vendors:
1. The Apache (original) version
2. The CDH version, which the vast majority of domestic users choose
3. The HDP version
Here we choose the CDH version, hadoop-2.6.0-cdh5.8.2.tar.gz. The environment is CentOS 7.1, and the JDK must be 1.7.0_55 or later.
[root@hadoop1 ~]# useradd hadoop
My system comes with the following Java environment by default
[root@hadoop1 ~]# ll /usr/lib/jvm/
total 12
lrwxrwxrwx. 1 root root   26 Oct 27 22:48 java -> /etc/alternatives/java_sdk
lrwxrwxrwx. 1 root root   32 Oct 27 22:48 java-1.6.0 -> /etc/alternatives/java_sdk_1.6.0
drwxr-xr-x. 7 root root 4096 Oct 27 22:48 java-1.6.0-openjdk-1.6.0.34.x86_64
lrwxrwxrwx. 1 root root   34 Oct 27 22:48 java-1.6.0-openjdk.x86_64 -> java-1.6.0-openjdk-1.6.0.34.x86_64
lrwxrwxrwx. 1 root root   32 Oct 27 22:44 java-1.7.0 -> /etc/alternatives/java_sdk_1.7.0
lrwxrwxrwx. 1 root root   40 Oct 27 22:44 java-1.7.0-openjdk -> /etc/alternatives/java_sdk_1.7.0_openjdk
drwxr-xr-x. 8 root root 4096 Oct 27 22:44 java-1.7.0-openjdk-1.7.0.75-2.5.4.2.el7_0.x86_64
lrwxrwxrwx. 1 root root   32 Oct 27 22:44 java-1.8.0 -> /etc/alternatives/java_sdk_1.8.0
lrwxrwxrwx. 1 root root   40 Oct 27 22:44 java-1.8.0-openjdk -> /etc/alternatives/java_sdk_1.8.0_openjdk
drwxr-xr-x. 7 root root 4096 Oct 27 22:44 java-1.8.0-openjdk-1.8.0.31-2.b13.el7.x86_64
lrwxrwxrwx. 1 root root   34 Oct 27 22:48 java-openjdk -> /etc/alternatives/java_sdk_openjdk
lrwxrwxrwx. 1 root root   21 Oct 27 22:44 jre -> /etc/alternatives/jre
lrwxrwxrwx. 1 root root   27 Oct 27 22:44 jre-1.6.0 -> /etc/alternatives/jre_1.6.0
lrwxrwxrwx. 1 root root   38 Oct 27 22:44 jre-1.6.0-openjdk.x86_64 -> java-1.6.0-openjdk-1.6.0.34.x86_64/jre
lrwxrwxrwx. 1 root root   27 Oct 27 22:44 jre-1.7.0 -> /etc/alternatives/jre_1.7.0
lrwxrwxrwx. 1 root root   35 Oct 27 22:44 jre-1.7.0-openjdk -> /etc/alternatives/jre_1.7.0_openjdk
lrwxrwxrwx. 1 root root   52 Oct 27 22:44 jre-1.7.0-openjdk-1.7.0.75-2.5.4.2.el7_0.x86_64 -> java-1.7.0-openjdk-1.7.0.75-2.5.4.2.el7_0.x86_64/jre
lrwxrwxrwx. 1 root root   27 Oct 27 22:44 jre-1.8.0 -> /etc/alternatives/jre_1.8.0
lrwxrwxrwx. 1 root root   35 Oct 27 22:44 jre-1.8.0-openjdk -> /etc/alternatives/jre_1.8.0_openjdk
lrwxrwxrwx. 1 root root   48 Oct 27 22:44 jre-1.8.0-openjdk-1.8.0.31-2.b13.el7.x86_64 -> java-1.8.0-openjdk-1.8.0.31-2.b13.el7.x86_64/jre
lrwxrwxrwx. 1 root root   29 Oct 27 22:44 jre-openjdk -> /etc/alternatives/jre_openjdk
Add the following environment variables to /home/hadoop/.bashrc:
[root@hadoop1 ~]# cat /home/hadoop/.bashrc
export JAVA_HOME=/usr/lib/jvm/java-1.7.0-openjdk-1.7.0.75-2.5.4.2.el7_0.x86_64
export CLASSPATH=.:$JAVA_HOME/jre/lib/rt.jar:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar
export PATH=$PATH:$JAVA_HOME/bin
export HADOOP_PREFIX=/opt/hadoop/current
export HADOOP_MAPRED_HOME=${HADOOP_PREFIX}
export HADOOP_COMMON_HOME=${HADOOP_PREFIX}
export HADOOP_HDFS_HOME=${HADOOP_PREFIX}
export HADOOP_YARN_HOME=${HADOOP_PREFIX}
export HTTPFS_CATALINA_HOME=${HADOOP_PREFIX}/share/hadoop/httpfs/tomcat
export HADOOP_CONF_DIR=/etc/hadoop/conf
export YARN_CONF_DIR=/etc/hadoop/conf
export HTTPFS_CONFIG=/etc/hadoop/conf
export PATH=$PATH:$HADOOP_PREFIX/bin:$HADOOP_PREFIX/sbin
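To verify the environment (a quick sanity check; the exact version strings depend on your installation):

[hadoop@hadoop1 ~]$ source ~/.bashrc
[hadoop@hadoop1 ~]$ java -version
[hadoop@hadoop1 ~]$ hadoop version    # should report 2.6.0-cdh5.8.2 once Hadoop is in place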
We install Hadoop under the /opt/hadoop directory, create the soft link shown below, and place the configuration files under the /etc/hadoop/conf directory.
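A sketch of how this layout can be created (assuming the tarball has already been downloaded to /opt/hadoop):

[root@hadoop1 ~]# cd /opt/hadoop
[root@hadoop1 hadoop]# tar -zxf hadoop-2.6.0-cdh5.8.2.tar.gz
[root@hadoop1 hadoop]# ln -s hadoop-2.6.0-cdh5.8.2 current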
[root@hadoop1 hadoop]# ll current
lrwxrwxrwx 1 root root Oct 11:02 current -> hadoop-2.6.0-cdh5.8.2
Grant ownership as follows:
[root@hadoop1 hadoop]# chown -R hadoop.hadoop hadoop-2.6.0-cdh5.8.2
[root@hadoop1 hadoop]# chown -R hadoop.hadoop /etc/hadoop/conf
In the new CDH5 releases, the Hadoop service startup scripts are located under the $HADOOP_HOME/sbin directory. The services to start are the following:
NameNode
SecondaryNameNode
DataNode
ResourceManager
NodeManager
We use the hadoop user to manage and start the Hadoop services.
[root@hadoop1 etc]# cd /etc/hadoop/conf/
[root@hadoop1 conf]# vim core-site.xml
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://hadoop1</value>
  </property>
</configuration>
Format the NameNode:
[root@hadoop1 conf]# cd /opt/hadoop/current/bin
[root@hadoop1 bin]# hdfs namenode -format
Start the NameNode and DataNode services:
[root@hadoop1 bin]# cd /opt/hadoop/current/sbin/
[root@hadoop1 sbin]# ./hadoop-daemon.sh start namenode
[hadoop@hadoop1 sbin]$ ./hadoop-daemon.sh start datanode
View service startup status
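For example, jps lists the running Hadoop daemons (the PIDs below are illustrative):

[hadoop@hadoop1 ~]$ jps
2791 NameNode
2856 DataNode
2913 Jps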
After the NameNode has started, you can view its status through the web interface; the default port is 50070. Let's access it to test.
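For example (hadoop1 is the hostname configured in fs.defaultFS above):

[hadoop@hadoop1 ~]$ curl -s http://hadoop1:50070 | head

or open http://hadoop1:50070/ in a browser to see the NameNode overview page.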