1. Overview of the Hadoop ecosystem
Hadoop is a distributed system infrastructure developed by the Apache Foundation. It lets users develop distributed programs without knowing the low-level details of the distribution, harnessing the power of a cluster for high-speed computation and storage. It is reliable, efficient, and scalable.
The core of Hadoop consists of YARN, HDFS, and MapReduce, together with the Common module.
2. HDFS
HDFS originates from Google's GFS paper, published in October 2003; HDFS is a clone of GFS. It is the foundation of data storage management in Hadoop: a highly fault-tolerant system capable of detecting and responding to hardware failures.
HDFS simplifies the file consistency model with a write-once, read-many access pattern and streaming data access, providing high-throughput data access for applications with large data sets. Files are stored in blocks that are distributed across different physical machines in the cluster.
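As a minimal sketch (the paths are illustrative), the block-based, write-once model can be seen from the command line once a cluster is running:

# upload a file; HDFS splits it into blocks (128 MB by default in Hadoop 2.x)
hdfs dfs -mkdir -p /user/hadoop/input
hdfs dfs -put /var/log/messages /user/hadoop/input/
# show the blocks and the machines holding each replica
hdfs fsck /user/hadoop/input/messages -files -blocks -locations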
3. MapReduce
MapReduce originates from Google's MapReduce paper. It is used for computing over large amounts of data: the framework hides the details of distributed computation and abstracts the job into two phases, map and reduce.
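For example, the examples jar shipped with the distribution (the path below assumes the install layout used later in this article) runs the classic wordcount job:

hadoop jar /opt/hadoop/current/share/hadoop/mapreduce/hadoop-mapreduce-examples-*.jar \
    wordcount /user/hadoop/input /user/hadoop/output
hdfs dfs -cat /user/hadoop/output/part-r-00000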
4. HBase (distributed column-oriented database)
HBase originates from Google's BigTable paper. It is a column-oriented, dynamic-schema database for structured data built on top of HDFS, offering scalability, high reliability, and high performance.
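As a quick illustration (this assumes an HBase installation; the table and column-family names are made up), the HBase shell shows the column-family model:

hbase shell
create 'weblog', 'cf'                         # a table with one column family
put 'weblog', 'row1', 'cf:url', '/index.html'
put 'weblog', 'row1', 'cf:status', '200'      # columns can be added dynamically
get 'weblog', 'row1'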
5. ZooKeeper
ZooKeeper solves data-management problems in a distributed environment: unified naming, state synchronization, cluster management, configuration synchronization, and so on.
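For instance (assuming a ZooKeeper server on localhost; the znode path and data are made up), naming and configuration synchronization reduce to operations on znodes:

zkCli.sh -server localhost:2181
ls /
create /myapp "config-v1"      # publish a configuration under a well-known name
get /myapp
set /myapp "config-v2"         # clients watching /myapp are notified of the change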
6. Hive
Open-sourced by Facebook, Hive defines a SQL-like query language and translates those queries into MapReduce jobs executed on Hadoop.
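A minimal sketch (the table name and input path are hypothetical); the SELECT below is compiled into a MapReduce job:

hive -e "CREATE TABLE logs (line STRING);"
hive -e "LOAD DATA INPATH '/user/hadoop/input/messages' INTO TABLE logs;"
hive -e "SELECT COUNT(*) FROM logs;"    # runs as a MapReduce job on the cluster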
7. Flume
Log Collection Tool
8. YARN (distributed resource manager)
YARN is the next generation of MapReduce. It mainly addresses the poor scalability of the original Hadoop and its lack of support for multiple computing frameworks.
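As a sketch, once YARN is running the yarn CLI shows the ResourceManager's view of the cluster:

yarn node -list           # NodeManagers registered with the ResourceManager
yarn application -list    # applications currently accepted or running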
9. Spark
Spark provides a faster and more general-purpose data-processing platform. Compared with Hadoop MapReduce, Spark lets your program run in memory, which can make it significantly faster.
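For example (this assumes a Spark-on-YARN installation; $SPARK_HOME and the examples jar location vary by Spark version), the bundled SparkPi job can be submitted to the cluster:

spark-submit --master yarn --class org.apache.spark.examples.SparkPi \
    $SPARK_HOME/lib/spark-examples-*.jar 100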
10. Kafka
A distributed message queue, used primarily for handling active streaming data.
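As an illustration (assuming a single broker on localhost and a Kafka version from the 0.8/0.9 era; the topic name is made up):

kafka-topics.sh --create --zookeeper localhost:2181 --replication-factor 1 --partitions 1 --topic weblogs
kafka-console-producer.sh --broker-list localhost:9092 --topic weblogs     # type messages to publish
kafka-console-consumer.sh --zookeeper localhost:2181 --topic weblogs --from-beginning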
11. Hadoop Pseudo-distributed deployment
Currently there are three main free Hadoop distributions, all from foreign vendors:
1. The Apache (original) version
2. The CDH version, which the vast majority of domestic users choose
3. The HDP version
Here we choose the CDH version, hadoop-2.6.0-cdh5.8.2.tar.gz. The environment is CentOS 7.1, and the JDK must be 1.7.0_55 or later.
[root@hadoop1 ~]# useradd hadoop
My system comes with the following Java environment by default
[root@hadoop1 ~]# ll /usr/lib/jvm/
total 12
lrwxrwxrwx. 1 root root   26 Oct 27 22:48 java -> /etc/alternatives/java_sdk
lrwxrwxrwx. 1 root root   32 Oct 27 22:48 java-1.6.0 -> /etc/alternatives/java_sdk_1.6.0
drwxr-xr-x. 7 root root 4096 Oct 27 22:48 java-1.6.0-openjdk-1.6.0.34.x86_64
lrwxrwxrwx. 1 root root   34 Oct 27 22:48 java-1.6.0-openjdk.x86_64 -> java-1.6.0-openjdk-1.6.0.34.x86_64
lrwxrwxrwx. 1 root root   32 Oct 27 22:44 java-1.7.0 -> /etc/alternatives/java_sdk_1.7.0
lrwxrwxrwx. 1 root root   40 Oct 27 22:44 java-1.7.0-openjdk -> /etc/alternatives/java_sdk_1.7.0_openjdk
drwxr-xr-x. 8 root root 4096 Oct 27 22:44 java-1.7.0-openjdk-1.7.0.75-2.5.4.2.el7_0.x86_64
lrwxrwxrwx. 1 root root   32 Oct 27 22:44 java-1.8.0 -> /etc/alternatives/java_sdk_1.8.0
lrwxrwxrwx. 1 root root   40 Oct 27 22:44 java-1.8.0-openjdk -> /etc/alternatives/java_sdk_1.8.0_openjdk
drwxr-xr-x. 7 root root 4096 Oct 27 22:44 java-1.8.0-openjdk-1.8.0.31-2.b13.el7.x86_64
lrwxrwxrwx. 1 root root   34 Oct 27 22:48 java-openjdk -> /etc/alternatives/java_sdk_openjdk
lrwxrwxrwx. 1 root root   21 Oct 27 22:44 jre -> /etc/alternatives/jre
lrwxrwxrwx. 1 root root   27 Oct 27 22:44 jre-1.6.0 -> /etc/alternatives/jre_1.6.0
lrwxrwxrwx. 1 root root   38 Oct 27 22:44 jre-1.6.0-openjdk.x86_64 -> java-1.6.0-openjdk-1.6.0.34.x86_64/jre
lrwxrwxrwx. 1 root root   27 Oct 27 22:44 jre-1.7.0 -> /etc/alternatives/jre_1.7.0
lrwxrwxrwx. 1 root root   35 Oct 27 22:44 jre-1.7.0-openjdk -> /etc/alternatives/jre_1.7.0_openjdk
lrwxrwxrwx. 1 root root   52 Oct 27 22:44 jre-1.7.0-openjdk-1.7.0.75-2.5.4.2.el7_0.x86_64 -> java-1.7.0-openjdk-1.7.0.75-2.5.4.2.el7_0.x86_64/jre
lrwxrwxrwx. 1 root root   27 Oct 27 22:44 jre-1.8.0 -> /etc/alternatives/jre_1.8.0
lrwxrwxrwx. 1 root root   35 Oct 27 22:44 jre-1.8.0-openjdk -> /etc/alternatives/jre_1.8.0_openjdk
lrwxrwxrwx. 1 root root   48 Oct 27 22:44 jre-1.8.0-openjdk-1.8.0.31-2.b13.el7.x86_64 -> java-1.8.0-openjdk-1.8.0.31-2.b13.el7.x86_64/jre
lrwxrwxrwx. 1 root root   29 Oct 27 22:44 jre-openjdk -> /etc/alternatives/jre_openjdk
Add the following environment variables to /home/hadoop/.bashrc:
[root@hadoop1 ~]# cat /home/hadoop/.bashrc
export JAVA_HOME=/usr/lib/jvm/java-1.7.0-openjdk-1.7.0.75-2.5.4.2.el7_0.x86_64
export CLASSPATH=.:$JAVA_HOME/jre/lib/rt.jar:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar
export PATH=$PATH:$JAVA_HOME/bin
export HADOOP_PREFIX=/opt/hadoop/current
export HADOOP_MAPRED_HOME=${HADOOP_PREFIX}
export HADOOP_COMMON_HOME=${HADOOP_PREFIX}
export HADOOP_HDFS_HOME=${HADOOP_PREFIX}
export HADOOP_YARN_HOME=${HADOOP_PREFIX}
export HTTPFS_CATALINA_HOME=${HADOOP_PREFIX}/share/hadoop/httpfs/tomcat
export HADOOP_CONF_DIR=/etc/hadoop/conf
export YARN_CONF_DIR=/etc/hadoop/conf
export HTTPFS_CONFIG=/etc/hadoop/conf
export PATH=$PATH:$HADOOP_PREFIX/bin:$HADOOP_PREFIX/sbin
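To verify the environment (a quick sanity check; the exact version strings depend on your installation):

[hadoop@hadoop1 ~]$ source ~/.bashrc
[hadoop@hadoop1 ~]$ java -version
[hadoop@hadoop1 ~]$ hadoop version    # should report 2.6.0-cdh5.8.2 once Hadoop is in place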
We install Hadoop under the /opt/hadoop directory, create the soft link shown below, and place the configuration files under the /etc/hadoop/conf directory.
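A sketch of how this layout can be created (assuming the tarball has already been downloaded to /opt/hadoop):

[root@hadoop1 ~]# cd /opt/hadoop
[root@hadoop1 hadoop]# tar -zxf hadoop-2.6.0-cdh5.8.2.tar.gz
[root@hadoop1 hadoop]# ln -s hadoop-2.6.0-cdh5.8.2 current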
[root@hadoop1 hadoop]# ll current
lrwxrwxrwx 1 root root Oct 11:02 current -> hadoop-2.6.0-cdh5.8.2
Grant ownership as follows:
[root@hadoop1 hadoop]# chown -R hadoop.hadoop hadoop-2.6.0-cdh5.8.2
[root@hadoop1 hadoop]# chown -R hadoop.hadoop /etc/hadoop/conf
In the new CDH5 releases, the Hadoop service startup scripts are located under the $HADOOP_HOME/sbin directory. The services to start are the following:
NameNode
SecondaryNameNode
DataNode
ResourceManager
NodeManager
We use the hadoop user to manage and start the Hadoop services.
[root@hadoop1 etc]# cd /etc/hadoop/conf/
[root@hadoop1 conf]# vim core-site.xml
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://hadoop1</value>
  </property>
</configuration>
Format the NameNode:
[root@hadoop1 conf]# cd /opt/hadoop/current/bin
[root@hadoop1 bin]# hdfs namenode -format
Start the NameNode and DataNode services:
[root@hadoop1 bin]# cd /opt/hadoop/current/sbin/
[root@hadoop1 sbin]# ./hadoop-daemon.sh start namenode
[hadoop@hadoop1 sbin]$ ./hadoop-daemon.sh start datanode
View service startup status
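For example, jps lists the running Hadoop daemons (the PIDs below are illustrative):

[hadoop@hadoop1 ~]$ jps
2791 NameNode
2856 DataNode
2913 Jps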
After the NameNode has started, you can view its status through the web interface; the default port is 50070. Let's access it to test.
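For example (hadoop1 is the hostname configured in fs.defaultFS above):

[hadoop@hadoop1 ~]$ curl -s http://hadoop1:50070 | head

or open http://hadoop1:50070/ in a browser to see the NameNode overview page.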