Big Data Hadoop Quick Start


1. Overview of the Hadoop ecosystem

Hadoop is a distributed system infrastructure developed by the Apache Foundation. It allows users to develop distributed programs without knowing the underlying details of the distribution, leveraging the power of a cluster for high-speed computing and storage. It is reliable, efficient, and scalable.

The core of Hadoop consists of YARN, HDFS, MapReduce, and the Common module. The architecture is as follows:

[Figure: Hadoop core module architecture]


2. HDFS

HDFS originates from Google's GFS paper, published in October 2003; HDFS is an open-source clone of GFS. It is the foundation of data storage management in Hadoop: a highly fault-tolerant system capable of detecting and responding to hardware failures.

HDFS simplifies the file consistency model and provides streaming data access, giving applications with large datasets high-throughput access to their data. It offers a write-once, read-many mechanism; data is stored in blocks distributed across different physical machines in the cluster.
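A minimal sketch of the write-once, read-many model (assuming a running HDFS and the hadoop client on the PATH; the file and directory names here are made up):

hdfs dfs -mkdir -p /logs                     # create a directory in HDFS
hdfs dfs -put access.log /logs/              # write the file once
hdfs dfs -cat /logs/access.log               # read it back as many times as needed
hdfs fsck /logs/access.log -files -blocks    # show how the file is split into blocks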


3. MapReduce

MapReduce originates from Google's MapReduce paper and is used to compute over large amounts of data. It shields the details of the distributed computing framework and abstracts the computation into two parts, map and reduce.
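The division of labor can be sketched with an ordinary shell pipeline: a word count passing through the same map, shuffle, and reduce phases (input.txt is a made-up sample file, and this runs on one machine only; MapReduce distributes the same phases across the cluster):

cat input.txt |
  tr -s ' ' '\n' |   # map: emit one word per line
  sort |             # shuffle: bring identical keys together
  uniq -c            # reduce: count each group of identical words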


4. HBase (distributed column-store database)

HBase originates from Google's BigTable paper. It is a column-oriented, dynamic-schema database built on HDFS, providing scalable, highly reliable, high-performance distributed storage for structured data.
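A minimal sketch of the dynamic-schema, column-oriented model from the HBase shell (assuming an installed HBase; the table, row, and column names are made up, and the prompt is simplified):

hbase shell
> create 't_user', 'info'                      # a table with one column family
> put 't_user', 'row1', 'info:name', 'alice'   # columns are created on write (dynamic schema)
> get 't_user', 'row1'                         # read the row back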


5. ZooKeeper

ZooKeeper solves data management problems in distributed environments: unified naming, state synchronization, cluster management, configuration synchronization, and so on.
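A minimal sketch of configuration synchronization using the zkCli.sh client that ships with ZooKeeper (assuming a server on localhost:2181; the znode path and value are made up):

zkCli.sh -server localhost:2181
> create /myapp ""              # a parent znode for the application
> create /myapp/config "v1"     # store a configuration value
> get /myapp/config             # every client reads the same, consistent value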


6. Hive

Open-sourced by Facebook, Hive defines a SQL-like query language and transforms the SQL into MapReduce tasks executed on Hadoop.
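A minimal sketch of the SQL-to-MapReduce translation (assuming an installed Hive; the table name is made up):

hive -e "CREATE TABLE logs (line STRING);"
hive -e "SELECT count(*) FROM logs;"   # this query is compiled into a MapReduce job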


7. Flume

A log collection tool.
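A minimal sketch of an agent configuration (netcat source, memory channel, logger sink; the agent name a1 and the port are made up):

a1.sources = r1
a1.channels = c1
a1.sinks = k1
a1.sources.r1.type = netcat     # listen for lines of text
a1.sources.r1.bind = localhost
a1.sources.r1.port = 44444
a1.channels.c1.type = memory    # buffer events in memory
a1.sinks.k1.type = logger       # write events to the log
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1

Such an agent would be started with: flume-ng agent --conf conf --conf-file a1.conf --name a1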


8. YARN (distributed resource manager)

YARN is the next generation of MapReduce. It mainly solves the original Hadoop's poor scalability and its lack of support for multiple computing frameworks. The architecture is as follows:

[Figure: YARN architecture]


9. Spark

Spark provides a faster and more general-purpose data processing platform. Compared with Hadoop MapReduce, Spark lets your program run in memory.
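A quick way to try it (assuming an installed Spark; the SparkPi example jar ships with Spark, though its path varies between Spark versions):

spark-submit --class org.apache.spark.examples.SparkPi \
  --master local[2] \
  $SPARK_HOME/examples/jars/spark-examples*.jar 10   # estimate pi with an in-memory job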


10. Kafka

A distributed message queue, used primarily for handling active streaming data.
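A minimal sketch using the console tools that ship with Kafka (assuming a broker on localhost:9092 and ZooKeeper on localhost:2181; the topic name is made up, and the ZooKeeper-based consumer flag matches the older Kafka releases of this era):

kafka-topics.sh --create --zookeeper localhost:2181 --replication-factor 1 --partitions 1 --topic weblogs
kafka-console-producer.sh --broker-list localhost:9092 --topic weblogs
kafka-console-consumer.sh --zookeeper localhost:2181 --topic weblogs --from-beginning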


11. Hadoop Pseudo-distributed deployment

Currently there are three main free Hadoop distributions, all from foreign vendors:

1. The original Apache version

2. The CDH version (the one chosen by the vast majority of users in China)

3. The HDP version


Here we choose the CDH version, hadoop-2.6.0-cdh5.8.2.tar.gz. The environment is CentOS 7.1, and the JDK needs to be 1.7.0_55 or above.


[root@hadoop1 ~]# useradd hadoop


My system comes with the following Java environment by default

[root@hadoop1 ~]# ll /usr/lib/jvm/
total 12
lrwxrwxrwx. 1 root root   26 Oct 27 22:48 java -> /etc/alternatives/java_sdk
lrwxrwxrwx. 1 root root   32 Oct 27 22:48 java-1.6.0 -> /etc/alternatives/java_sdk_1.6.0
drwxr-xr-x. 7 root root 4096 Oct 27 22:48 java-1.6.0-openjdk-1.6.0.34.x86_64
lrwxrwxrwx. 1 root root   34 Oct 27 22:48 java-1.6.0-openjdk.x86_64 -> java-1.6.0-openjdk-1.6.0.34.x86_64
lrwxrwxrwx. 1 root root   32 Oct 27 22:44 java-1.7.0 -> /etc/alternatives/java_sdk_1.7.0
lrwxrwxrwx. 1 root root   40 Oct 27 22:44 java-1.7.0-openjdk -> /etc/alternatives/java_sdk_1.7.0_openjdk
drwxr-xr-x. 8 root root 4096 Oct 27 22:44 java-1.7.0-openjdk-1.7.0.75-2.5.4.2.el7_0.x86_64
lrwxrwxrwx. 1 root root   32 Oct 27 22:44 java-1.8.0 -> /etc/alternatives/java_sdk_1.8.0
lrwxrwxrwx. 1 root root   40 Oct 27 22:44 java-1.8.0-openjdk -> /etc/alternatives/java_sdk_1.8.0_openjdk
drwxr-xr-x. 7 root root 4096 Oct 27 22:44 java-1.8.0-openjdk-1.8.0.31-2.b13.el7.x86_64
lrwxrwxrwx. 1 root root   34 Oct 27 22:48 java-openjdk -> /etc/alternatives/java_sdk_openjdk
lrwxrwxrwx. 1 root root   21 Oct 27 22:44 jre -> /etc/alternatives/jre
lrwxrwxrwx. 1 root root   27 Oct 27 22:44 jre-1.6.0 -> /etc/alternatives/jre_1.6.0
lrwxrwxrwx. 1 root root   38 Oct 27 22:44 jre-1.6.0-openjdk.x86_64 -> java-1.6.0-openjdk-1.6.0.34.x86_64/jre
lrwxrwxrwx. 1 root root   27 Oct 27 22:44 jre-1.7.0 -> /etc/alternatives/jre_1.7.0
lrwxrwxrwx. 1 root root   35 Oct 27 22:44 jre-1.7.0-openjdk -> /etc/alternatives/jre_1.7.0_openjdk
lrwxrwxrwx. 1 root root   52 Oct 27 22:44 jre-1.7.0-openjdk-1.7.0.75-2.5.4.2.el7_0.x86_64 -> java-1.7.0-openjdk-1.7.0.75-2.5.4.2.el7_0.x86_64/jre
lrwxrwxrwx. 1 root root   27 Oct 27 22:44 jre-1.8.0 -> /etc/alternatives/jre_1.8.0
lrwxrwxrwx. 1 root root   35 Oct 27 22:44 jre-1.8.0-openjdk -> /etc/alternatives/jre_1.8.0_openjdk
lrwxrwxrwx. 1 root root   48 Oct 27 22:44 jre-1.8.0-openjdk-1.8.0.31-2.b13.el7.x86_64 -> java-1.8.0-openjdk-1.8.0.31-2.b13.el7.x86_64/jre
lrwxrwxrwx. 1 root root   29 Oct 27 22:44 jre-openjdk -> /etc/alternatives/jre_openjdk

Add the following environment variables to the hadoop user's .bashrc:

[root@hadoop1 ~]# cat /home/hadoop/.bashrc

export JAVA_HOME=/usr/lib/jvm/java-1.7.0-openjdk-1.7.0.75-2.5.4.2.el7_0.x86_64
export CLASSPATH=.:$JAVA_HOME/jre/lib/rt.jar:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar
export PATH=$PATH:$JAVA_HOME/bin
export HADOOP_PREFIX=/opt/hadoop/current
export HADOOP_MAPRED_HOME=${HADOOP_PREFIX}
export HADOOP_COMMON_HOME=${HADOOP_PREFIX}
export HADOOP_HDFS_HOME=${HADOOP_PREFIX}
export HADOOP_YARN_HOME=${HADOOP_PREFIX}
export HTTPFS_CATALINA_HOME=${HADOOP_PREFIX}/share/hadoop/httpfs/tomcat
export HADOOP_CONF_DIR=/etc/hadoop/conf
export YARN_CONF_DIR=/etc/hadoop/conf
export HTTPFS_CONFIG=/etc/hadoop/conf
export PATH=$PATH:$HADOOP_PREFIX/bin:$HADOOP_PREFIX/sbin


We install Hadoop under the /opt/hadoop directory and create the following soft link; the configuration files are placed under the /etc/hadoop/conf directory.

[root@hadoop1 hadoop]# ll current

lrwxrwxrwx 1 root root Oct 11:02 current -> hadoop-2.6.0-cdh5.8.2


Set ownership as follows:

[root@hadoop1 hadoop]# chown -R hadoop.hadoop hadoop-2.6.0-cdh5.8.2

[root@hadoop1 hadoop]# chown -R hadoop.hadoop /etc/hadoop/conf


In the new CDH5 releases, the Hadoop service startup scripts are located under the $HADOOP_PREFIX/sbin directory. The services to start are the following:

NameNode

SecondaryNameNode

DataNode

ResourceManager

NodeManager

Here the hadoop user is used to manage and start the Hadoop services.


[root@hadoop1 etc]# cd /etc/hadoop/conf/

[root@hadoop1 conf]# vim core-site.xml

<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://hadoop1</value>
  </property>
</configuration>

Format the NameNode:

[root@hadoop1 conf]# cd /opt/hadoop/current/bin
[root@hadoop1 bin]# hdfs namenode -format

Start the NameNode and DataNode services:

[root@hadoop1 bin]# cd /opt/hadoop/current/sbin/
[root@hadoop1 sbin]# ./hadoop-daemon.sh start namenode
[hadoop@hadoop1 sbin]$ ./hadoop-daemon.sh start datanode
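As a quick smoke test that HDFS is answering (the directory name is made up):

[hadoop@hadoop1 sbin]$ hdfs dfs -mkdir /test
[hadoop@hadoop1 sbin]$ hdfs dfs -ls /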


View service startup status

[Figure: service startup status]
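The same check can be done from the command line with jps, which ships with the JDK (the process IDs below are illustrative):

[hadoop@hadoop1 ~]$ jps
12345 NameNode
12456 DataNode
12567 Jps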


After the NameNode has started, you can view its status through the web interface; the default port is 50070. We access it to test:

[Figure: NameNode web UI status page (port 50070)]



