Introduction to the distributed open-source concurrency framework Hadoop

When the system is running on the live network, many error logs are not analyzed in time, so system problems are always discovered by users first, who then tell us to fix them, rather than our finding and solving the problems proactively. Therefore, I want to build a log analysis system in my spare time that analyzes the logs emitted by the application layer and the message-transmission module, so that problems can be located easily. In the multi-core CPU era, concurrent programming is the trend. To make better use of the 4-core CPUs in our live-network and test environments, we first need to study the distributed concurrency framework Hadoop.

What is Hadoop?

Hadoop is a distributed parallel computing framework under Apache that grew out of the Lucene project. Hadoop's core design consists of two parts: MapReduce and HDFS. MapReduce is a software architecture proposed by Google for the parallel computation of large-scale data sets (larger than 1 TB); its Map and Reduce concepts are borrowed from functional programming languages, along with features borrowed from vector programming languages. HDFS is short for Hadoop Distributed File System; it provides the underlying support for distributed computing and storage. Note: MapReduce (described in Google's MapReduce paper), GFS (Google File System), and BigTable are three core technologies of Google.

Hadoop MapReduce Introduction

Map and Reduce are processed separately: Map splits a job into multiple tasks for execution, and Reduce aggregates the results of those tasks to get the desired final result. For example, splitting a list into several pieces, submitting the pieces to a thread pool so that multiple threads compute partial values, and then merging the results returned by the tasks into one total is in effect a simple MapReduce-style application.
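The list-splitting example above can be sketched in plain Python. This is only a minimal illustration of the map/reduce idea with a thread pool, not Hadoop's actual API; the function name and chunk count are my own choices:

```python
from concurrent.futures import ThreadPoolExecutor

def parallel_sum(numbers, chunks=4):
    """Map: split the list into chunks and sum each chunk in its own thread.
    Reduce: merge the partial sums into one total."""
    size = max(1, len(numbers) // chunks)
    parts = [numbers[i:i + size] for i in range(0, len(numbers), size)]
    with ThreadPoolExecutor(max_workers=chunks) as pool:
        partial_sums = list(pool.map(sum, parts))  # "map" phase
    return sum(partial_sums)                       # "reduce" phase
```

For example, `parallel_sum(list(range(1, 101)))` returns 5050, the same as a sequential `sum`, but the partial sums are computed concurrently.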

The official Hadoop documentation introduces the three steps of Hadoop MapReduce: map (mainly breaks the work down into parallel tasks), combine (mainly improves the efficiency of reduce), and reduce (summarizes the processed results).
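Word count, the canonical MapReduce example, shows where the three steps fit. The sketch below is plain Python rather than Hadoop's API, and the function names are illustrative; the combine step pre-aggregates each mapper's output locally so less data reaches the reducer:

```python
from collections import Counter

def map_phase(line):
    # map: emit a (word, 1) pair for each word in the line
    return [(word, 1) for word in line.split()]

def combine_phase(pairs):
    # combine: locally pre-aggregate one mapper's pairs to shrink
    # the data that must be shuffled to the reducer
    counts = Counter()
    for word, n in pairs:
        counts[word] += n
    return list(counts.items())

def reduce_phase(all_pairs):
    # reduce: merge the combined counts from every mapper
    total = Counter()
    for word, n in all_pairs:
        total[word] += n
    return dict(total)

def word_count(lines):
    shuffled = []
    for line in lines:  # pretend each line goes to its own mapper
        shuffled.extend(combine_phase(map_phase(line)))
    return reduce_phase(shuffled)
```

For example, `word_count(["a b a", "b a"])` returns `{"a": 3, "b": 2}`.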

HDFS Introduction

HDFS consists of three components: the Client, the DataNodes, and the NameNode. The NameNode can be viewed as the manager of the distributed file system. It is mainly responsible for managing the file system namespace, cluster configuration information, and storage-block replication. The NameNode keeps the file system's metadata in memory; this mainly includes file information, the list of blocks that make up each file, and the DataNodes on which each block is stored. A DataNode is the basic unit of file storage: it stores blocks in its local file system, keeps each block's metadata, and periodically sends information about all of its existing blocks to the NameNode. The Client is the application that needs to access files in the distributed file system. Three operations are typically used to describe the interaction among them (see the earlier article Introduction to distributed computing open-source framework Hadoop for details).
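To make the metadata description concrete, here is a toy model of the mapping the NameNode keeps in memory: file → blocks → DataNode locations. The class and method names are my own illustration, not HDFS source code:

```python
class NameNodeMeta:
    """Toy model of the NameNode's in-memory metadata (illustrative only)."""

    def __init__(self):
        self.file_to_blocks = {}   # file path -> [block_id, ...] in order
        self.block_locations = {}  # block_id  -> {datanode, ...} holding a replica

    def add_file(self, path, block_ids):
        # record which blocks make up a file
        self.file_to_blocks[path] = list(block_ids)

    def report_block(self, datanode, block_id):
        # DataNodes periodically report the blocks they hold
        self.block_locations.setdefault(block_id, set()).add(datanode)

    def locate(self, path):
        # a Client asks which DataNodes hold each block of a file
        return [(b, sorted(self.block_locations.get(b, set())))
                for b in self.file_to_blocks.get(path, [])]
```

A Client reading a file would first call something like `locate()` on the NameNode, then fetch each block directly from one of the DataNodes returned; the file data itself never passes through the NameNode.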

Hadoop is very suitable for analyzing massive amounts of data. If the error logs in our system were at the GB level, users would long since have discovered the problems, so the analysis of such logs only borrows the idea of Hadoop; the concrete implementation should be based on ordinary concurrency. Finally, a copy of Doug Cutting's PPT on Hadoop was shared.


Read more:

1: Introduction to distributed computing open-source framework Hadoop

2: Hadoop cluster configuration and usage skills

3: Hadoop basic process and application development

4: distributed parallel programming with Hadoop

Original article address: Introduction to the distributed open-source concurrency framework Hadoop. Thanks to the original author for sharing.
