Get a little bit every day: Hadoop overview


First, the Origin of Hadoop

The idea behind Hadoop traces back to a big problem Google faced in search: how to search an enormous number of web pages as fast as possible. To solve it, Google relied on the inverted-index technique and applied the MapReduce idea to compute PageRank. Through continuous evolution, Google arrived at three key technologies and ideas: GFS, MapReduce, and BigTable. Google, however, did not open-source these technologies. Meanwhile, a developer (Doug Cutting) had built Lucene, a full-text search framework that provides the architecture of a full-text search engine, including the query engine and the indexing engine. Faced with big data, Lucene ran into the same difficulties Google had. This led Lucene's author to imitate Google's solutions in Nutch, a sub-project under the Lucene project. A few years later, Google published some details of GFS and MapReduce; the author implemented these ideas, and that work was formally contributed to the Apache Foundation as Hadoop, while Nutch remained a sub-project of Lucene.

Second, what problems does Hadoop solve?

As it has evolved, Hadoop has come to solve several problems:

1. Timely analysis and processing of massive data.

2. Deep analysis and mining of massive data.

3. Long-term preservation of data.

4. Enabling cloud computing.

5. Running on thousands of nodes, with ever-growing data volumes processed and ever-shorter sort times.

Third, the basic architecture of Hadoop.

3.1 The basic composition of the Hadoop framework.

HBase: a NoSQL database with key-value, column-oriented storage; it speeds up responses for data analysis and maximizes memory utilization.

HDFS: the Hadoop Distributed File System; maximizes disk utilization.

MapReduce: a programming model used primarily for data analysis; maximizes CPU utilization.

Pig: a converter from Pig Latin scripts to MapReduce jobs.

Hive: a converter from SQL (HiveQL) to MapReduce jobs.

ZooKeeper: coordination and communication between server nodes and processes.

Chukwa: data collection and integration.
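As a rough illustration of the MapReduce programming model listed above, here is a minimal, in-memory word-count sketch in Python. All function names are illustrative, and the real framework distributes each phase across many nodes; this only shows the map → shuffle → reduce flow:

```python
from collections import defaultdict

def map_phase(documents):
    """Map: emit a (word, 1) pair for every word in every document."""
    for doc in documents:
        for word in doc.split():
            yield (word, 1)

def shuffle(pairs):
    """Shuffle: group values by key, as the framework does between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    """Reduce: sum the counts collected for each word."""
    return {word: sum(counts) for word, counts in groups.items()}

docs = ["big data big analysis", "big data"]
result = reduce_phase(shuffle(map_phase(docs)))
print(result)  # {'big': 3, 'data': 2, 'analysis': 1}
```

The same three-phase structure underlies a real Hadoop job; only the scale and the distribution across nodes differ.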

3.2 Hadoop Framework cluster architecture

NameNode: the HDFS daemon that records how files are split into data blocks and which nodes store each block. It centrally manages memory and I/O. It is a single point of failure: if it goes down, the cluster becomes unavailable.

Secondary NameNode: a helper daemon that monitors the state of HDFS; each cluster has one. It communicates with the NameNode to save snapshots of HDFS metadata, which can be used to recover when the NameNode fails.

DataNode: runs on each slave server and is responsible for reading and writing HDFS data blocks on the local file system.

JobTracker: a daemon that handles user-submitted code, determines which files are involved in processing, then splits the work into tasks and assigns them to nodes. It monitors tasks and restarts failed ones. There is only one JobTracker per cluster, located on the master node.
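To make the NameNode's bookkeeping concrete, here is a minimal sketch of how a file is split into blocks and each block is mapped to DataNodes. The node names, the round-robin placement, and the helper functions are all assumptions for illustration; real HDFS uses rack-aware placement:

```python
import itertools

BLOCK_SIZE = 64 * 1024 * 1024  # classic HDFS default: 64 MB blocks
REPLICATION = 3                # default replication factor

def split_into_blocks(file_size, block_size=BLOCK_SIZE):
    """Return how many blocks a file of file_size bytes occupies."""
    return (file_size + block_size - 1) // block_size

def place_blocks(num_blocks, datanodes, replication=REPLICATION):
    """Simplified round-robin placement of block replicas on DataNodes."""
    mapping = {}
    nodes = itertools.cycle(datanodes)
    for block_id in range(num_blocks):
        mapping[block_id] = [next(nodes) for _ in range(replication)]
    return mapping

datanodes = ["dn1", "dn2", "dn3", "dn4"]
num_blocks = split_into_blocks(200 * 1024 * 1024)  # a 200 MB file -> 4 blocks
mapping = place_blocks(num_blocks, datanodes)
print(num_blocks)   # 4
print(mapping[0])   # ['dn1', 'dn2', 'dn3']
```

The `mapping` dictionary plays the role of the NameNode's block map: losing it means losing track of where every block lives, which is why the NameNode is a single point of failure.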

Fourth, Summary.

The advent of Hadoop solved big-data analysis and mining for us while greatly reducing cost: there is no need to buy an extremely powerful server, because any ordinary PC can be attached to a Hadoop cluster as a node and contribute to analyzing and mining big data. Hadoop also solves big-data storage, so we no longer need to worry about the disk I/O bottleneck that big data imposes.

You are welcome to discuss and exchange ideas: QQ: 747861092

QQ group: 163354117 (group name: codeforfuture)

