Introduction to the distributed open-source concurrency framework Hadoop

When the system is running on the live network, many error logs are not analyzed in time, so system problems are always discovered by users first, who then tell us to fix them, rather than our finding and solving the problems proactively. Therefore, I want to build a log analysis system in my spare time that analyzes the logs emitted by the application layer and the message-transmission module, so that problems can be located easily. In the multi-core CPU era, concurrent programming is the trend. To make better use of the 4-core CPUs in our live-network and test environments, we first need to study the distributed concurrency framework Hadoop.

What is Hadoop?

Hadoop is a distributed parallel computing framework under Apache that grew out of the Lucene project. Hadoop's core design consists of two parts: MapReduce and HDFS. MapReduce is a software architecture proposed by Google for the parallel computation of large-scale data sets (larger than 1 TB); its Map and Reduce concepts are borrowed from functional programming languages, along with features borrowed from vector programming languages. HDFS is short for Hadoop Distributed File System; it provides the underlying support for distributed computing and storage. Note: MapReduce (described in Google's MapReduce paper), GFS (Google File System), and BigTable are three core technologies of Google.

Hadoop MapReduce Introduction

Map and Reduce are processed separately: Map splits a job into multiple tasks for execution, and Reduce aggregates the results of those tasks to get the desired final result. For example, splitting a list into several pieces, submitting the pieces to a thread pool so that multiple threads compute partial values, and then merging the results returned by the tasks into one total is in effect a simple MapReduce-style application.
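The list-splitting example above can be sketched in plain Python. This is only a minimal illustration of the map/reduce idea with a thread pool, not Hadoop's actual API; the function name and chunk count are my own choices:

```python
from concurrent.futures import ThreadPoolExecutor

def parallel_sum(numbers, chunks=4):
    """Map: split the list into chunks and sum each chunk in its own thread.
    Reduce: merge the partial sums into one total."""
    size = max(1, len(numbers) // chunks)
    parts = [numbers[i:i + size] for i in range(0, len(numbers), size)]
    with ThreadPoolExecutor(max_workers=chunks) as pool:
        partial_sums = list(pool.map(sum, parts))  # "map" phase
    return sum(partial_sums)                       # "reduce" phase
```

For example, `parallel_sum(list(range(1, 101)))` returns 5050, the same as a sequential `sum`, but the partial sums are computed concurrently.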

The official Hadoop documentation introduces the three steps of Hadoop MapReduce: map (mainly breaks the work down into parallel tasks), combine (mainly improves the efficiency of reduce), and reduce (summarizes the processed results).
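Word count, the canonical MapReduce example, shows where the three steps fit. The sketch below is plain Python rather than Hadoop's API, and the function names are illustrative; the combine step pre-aggregates each mapper's output locally so less data reaches the reducer:

```python
from collections import Counter

def map_phase(line):
    # map: emit a (word, 1) pair for each word in the line
    return [(word, 1) for word in line.split()]

def combine_phase(pairs):
    # combine: locally pre-aggregate one mapper's pairs to shrink
    # the data that must be shuffled to the reducer
    counts = Counter()
    for word, n in pairs:
        counts[word] += n
    return list(counts.items())

def reduce_phase(all_pairs):
    # reduce: merge the combined counts from every mapper
    total = Counter()
    for word, n in all_pairs:
        total[word] += n
    return dict(total)

def word_count(lines):
    shuffled = []
    for line in lines:  # pretend each line goes to its own mapper
        shuffled.extend(combine_phase(map_phase(line)))
    return reduce_phase(shuffled)
```

For example, `word_count(["a b a", "b a"])` returns `{"a": 3, "b": 2}`.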

HDFS Introduction

HDFS consists of three components: the Client, the DataNodes, and the NameNode. The NameNode can be viewed as the manager of the distributed file system. It is mainly responsible for managing the file system namespace, cluster configuration information, and storage-block replication. The NameNode keeps the file system's metadata in memory; this mainly includes file information, the list of blocks that make up each file, and the DataNodes on which each block is stored. A DataNode is the basic unit of file storage: it stores blocks in its local file system, keeps each block's metadata, and periodically sends information about all of its existing blocks to the NameNode. The Client is the application that needs to access files in the distributed file system. Three operations are typically used to describe the interaction among them (see the earlier article Introduction to distributed computing open-source framework Hadoop for details).
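To make the metadata description concrete, here is a toy model of the mapping the NameNode keeps in memory: file → blocks → DataNode locations. The class and method names are my own illustration, not HDFS source code:

```python
class NameNodeMeta:
    """Toy model of the NameNode's in-memory metadata (illustrative only)."""

    def __init__(self):
        self.file_to_blocks = {}   # file path -> [block_id, ...] in order
        self.block_locations = {}  # block_id  -> {datanode, ...} holding a replica

    def add_file(self, path, block_ids):
        # record which blocks make up a file
        self.file_to_blocks[path] = list(block_ids)

    def report_block(self, datanode, block_id):
        # DataNodes periodically report the blocks they hold
        self.block_locations.setdefault(block_id, set()).add(datanode)

    def locate(self, path):
        # a Client asks which DataNodes hold each block of a file
        return [(b, sorted(self.block_locations.get(b, set())))
                for b in self.file_to_blocks.get(path, [])]
```

A Client reading a file would first call something like `locate()` on the NameNode, then fetch each block directly from one of the DataNodes returned; the file data itself never passes through the NameNode.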

Hadoop is very suitable for analyzing massive amounts of data. If the error logs in our system were at the GB level, users would long since have discovered the problems, so the analysis of such logs only borrows the idea of Hadoop; the concrete implementation should be based on ordinary concurrency. Finally, a copy of Doug Cutting's PPT on Hadoop was shared.


Read more:

1: Introduction to distributed computing open-source framework Hadoop

2: Hadoop cluster configuration and usage skills

3: Hadoop basic process and application development

4: distributed parallel programming with Hadoop

Original article address: Introduction to the distributed open-source concurrency framework Hadoop. Thanks to the original author for sharing.
