When the system runs on the live network, many error logs are not analyzed in time. As a result, problems are usually discovered by users first, and we are told to fix them after the fact instead of resolving them proactively. Building a log analysis system in our spare time would let us analyze the logs thrown by the application layer and the message transmission module, making problems easy to locate. In the multi-core CPU era, concurrent programming is the trend. To make better use of the 4-core CPUs in our live-network and test environments, we first need to study the distributed concurrency framework Hadoop.
What is Hadoop?
Hadoop is a distributed parallel computing framework under Apache, originally extracted from the Lucene project. Hadoop's core design consists of two parts: MapReduce and HDFS. MapReduce is a software architecture proposed by Google for parallel computation over large-scale datasets (larger than 1 TB); the concepts of Map and Reduce are borrowed from functional programming languages, along with features borrowed from vector programming languages. HDFS is short for Hadoop Distributed File System; it provides the underlying support for distributed computing and storage. Note: MapReduce (see Google's MapReduce paper), GFS (Google File System), and BigTable are three core technologies of Google.
Hadoop MapReduce Introduction
Map and reduce are separate phases: map splits a task into multiple subtasks for execution, and reduce aggregates the results of those subtasks into the desired final result. For example, splitting a list into several parts, submitting them to a thread pool so that multiple threads each compute a partial value, and then merging the results returned by the tasks into one total is already a simple MapReduce-style application.
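The thread-pool example above can be sketched in plain Java. This is a conceptual illustration, not the Hadoop API: the "map" step hands list slices to an `ExecutorService`, and the "reduce" step merges the partial sums (all names here are illustrative).

```java
import java.util.*;
import java.util.concurrent.*;
import java.util.stream.*;

public class MiniMapReduce {
    static long parallelSum(List<Integer> data, int workers) {
        ExecutorService pool = Executors.newFixedThreadPool(workers);
        int chunk = Math.max(1, (data.size() + workers - 1) / workers);
        List<Future<Long>> partials = new ArrayList<>();
        // "Map": each worker sums one slice of the list.
        for (int i = 0; i < data.size(); i += chunk) {
            List<Integer> slice = data.subList(i, Math.min(i + chunk, data.size()));
            partials.add(pool.submit(() -> slice.stream().mapToLong(Integer::longValue).sum()));
        }
        // "Reduce": merge the partial sums into one total.
        long total = 0;
        try {
            for (Future<Long> f : partials) total += f.get();
        } catch (InterruptedException | ExecutionException e) {
            throw new RuntimeException(e);
        } finally {
            pool.shutdown();
        }
        return total;
    }

    public static void main(String[] args) {
        List<Integer> nums = IntStream.rangeClosed(1, 100).boxed().collect(Collectors.toList());
        System.out.println(parallelSum(nums, 4)); // prints 5050
    }
}
```

The split-then-merge shape is the whole idea: the per-slice work is embarrassingly parallel, and only the final merge is sequential.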
The official Hadoop documentation introduces the three steps of Hadoop MapReduce: map (break the work into parallel tasks), combine (pre-aggregate locally, mainly to improve the efficiency of reduce), and reduce (summarize the processed results).
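The three steps can be illustrated with a single-process word count over error-log lines. This is a hypothetical sketch in plain Java, not Hadoop's `Mapper`/`Combiner`/`Reducer` classes: `map` emits (word, 1) pairs, `combine` pre-aggregates within one input split, and `reduce` merges the combined counts from all splits.

```java
import java.util.*;

public class WordCountSteps {
    // Map: break one line into (word, 1) pairs.
    static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String w : line.toLowerCase().split("\\s+"))
            if (!w.isEmpty()) pairs.add(Map.entry(w, 1));
        return pairs;
    }

    // Combine: pre-aggregate pairs from one split so reduce receives fewer records.
    static Map<String, Integer> combine(List<Map.Entry<String, Integer>> pairs) {
        Map<String, Integer> local = new HashMap<>();
        for (Map.Entry<String, Integer> p : pairs)
            local.merge(p.getKey(), p.getValue(), Integer::sum);
        return local;
    }

    // Reduce: merge the combined counts from every split into the final result.
    static Map<String, Integer> reduce(List<Map<String, Integer>> combined) {
        Map<String, Integer> total = new HashMap<>();
        for (Map<String, Integer> m : combined)
            m.forEach((k, v) -> total.merge(k, v, Integer::sum));
        return total;
    }

    public static void main(String[] args) {
        List<String> splits = List.of("error timeout error", "timeout ok");
        List<Map<String, Integer>> combined = new ArrayList<>();
        for (String s : splits) combined.add(combine(map(s)));
        System.out.println(reduce(combined)); // error=2, timeout=2, ok=1 (map order unspecified)
    }
}
```

In real Hadoop the combine step is optional, but it cuts the volume of intermediate data shipped to the reducers, which is exactly the efficiency gain the documentation describes.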
HDFS Introduction
HDFS consists of three components: the Client, the DataNodes, and the NameNode. The NameNode can be viewed as the manager of the distributed file system. It is mainly responsible for managing the file system namespace, cluster configuration information, and storage block replication. The NameNode stores the file system's metadata in memory, mainly: file information, the list of blocks that make up each file, and the DataNodes on which each block is stored. A DataNode is the basic unit of file storage: it stores blocks in the local file system, keeps each block's metadata, and periodically reports all of its existing blocks to the NameNode. The Client is the application that needs to access files in the distributed file system. The interaction between them can be described through three typical operations (see the article Introduction to the distributed computing open-source framework Hadoop).
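The metadata the NameNode holds in memory can be pictured as two maps: file path to block list, and block to replica locations. The toy class below is purely illustrative (the names `ToyNameNode`, `addBlock`, and `locate` are invented for this sketch and are not Hadoop classes or methods).

```java
import java.util.*;

public class ToyNameNode {
    // path -> ordered block ids making up the file
    private final Map<String, List<Long>> fileToBlocks = new HashMap<>();
    // block id -> datanodes holding a replica of that block
    private final Map<Long, Set<String>> blockToNodes = new HashMap<>();
    private long nextBlockId = 0;

    // On write, allocate a block id for the file and record its replica locations.
    long addBlock(String path, List<String> replicaNodes) {
        long id = nextBlockId++;
        fileToBlocks.computeIfAbsent(path, p -> new ArrayList<>()).add(id);
        blockToNodes.put(id, new HashSet<>(replicaNodes));
        return id;
    }

    // On read, tell the client which datanodes hold each block of the file.
    List<Set<String>> locate(String path) {
        List<Set<String>> locations = new ArrayList<>();
        for (long id : fileToBlocks.getOrDefault(path, List.of()))
            locations.add(blockToNodes.get(id));
        return locations;
    }

    public static void main(String[] args) {
        ToyNameNode nn = new ToyNameNode();
        nn.addBlock("/logs/app.log", List.of("dn1", "dn2", "dn3"));
        nn.addBlock("/logs/app.log", List.of("dn2", "dn3", "dn4"));
        System.out.println(nn.locate("/logs/app.log").size()); // prints 2
    }
}
```

The point of the sketch is the division of labor: the NameNode only answers "which blocks, on which nodes"; the actual block bytes flow directly between the Client and the DataNodes.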
Hadoop is well suited to analyzing massive data. If the error logs in our system were at the GB level, users would presumably have discovered the problems long ago. Therefore, analyzing such logs only borrows the idea behind Hadoop; the actual implementation should be based on concurrency. Finally, here is a PPT shared by Doug Cutting.
Read more:
1: Introduction to distributed computing open-source framework Hadoop
2: Hadoop cluster configuration and usage skills
3: Hadoop basic process and application development
4: Distributed parallel programming with Hadoop
Original article: Introduction to the distributed open-source concurrency framework Hadoop. Thanks to the original author for sharing.