HDFS in Hadoop

Source: Internet
Author: User
Tags: hadoop ecosystem

HDFS is one of the most commonly used components in big data and an indispensable part of the Hadoop ecosystem, so anyone getting started with Hadoop needs a working understanding of it.

First, as most of us know, HDFS (Hadoop Distributed File System) is the distributed file system of the Hadoop ecosystem. It stores the massive datasets that big data applications work with. HDFS came about because Google published its paper on the Google File System (GFS), just as the big data era was arriving.

Next, we introduce the major components (excluding HA mode):

① NameNode: stores the file system metadata, receives heartbeats from the DataNodes, and communicates with clients. A word on how that communication works: TCP is a common transport in Java programs, and on top of it Hadoop uses its own RPC (Remote Procedure Call) framework to carry the calls between nodes.

② DataNode: where the data is actually stored. Each block is usually kept as multiple replicas; three replicas is the usual choice and the default replication factor (dfs.replication).

③ SecondaryNameNode: remember that it is not a backup of the NameNode. Instead, it periodically merges the editlog into the fsimage (by default every hour, or after 1 million uncheckpointed transactions, whichever comes first) and then sends the merged fsimage back to the NameNode.
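The checkpoint rule in ③ boils down to "enough time has passed or enough edits have piled up". Below is a toy Python model of that rule and of the merge itself, not the SecondaryNameNode's actual code: the function names `should_checkpoint` and `merge` and the tuple format of the edit log are hypothetical, while the two defaults mirror `dfs.namenode.checkpoint.period` (3600 seconds) and `dfs.namenode.checkpoint.txns` (1,000,000) in hdfs-default.xml.

```python
# Toy model of a SecondaryNameNode checkpoint (hypothetical names, not Hadoop code).
# Defaults mirror dfs.namenode.checkpoint.period and dfs.namenode.checkpoint.txns.
CHECKPOINT_PERIOD_S = 3600      # one hour
CHECKPOINT_TXNS = 1_000_000     # one million uncheckpointed transactions


def should_checkpoint(seconds_since_last: float, uncheckpointed_txns: int,
                      period_s: float = CHECKPOINT_PERIOD_S,
                      txns: int = CHECKPOINT_TXNS) -> bool:
    """A checkpoint is due when either the time or the edit-count threshold is hit."""
    return seconds_since_last >= period_s or uncheckpointed_txns >= txns


def merge(fsimage: dict, editlog: list) -> dict:
    """Replay (op, path, value) edit-log entries onto a copy of the fsimage."""
    image = dict(fsimage)
    for op, path, value in editlog:
        if op == "create":
            image[path] = value
        elif op == "delete":
            image.pop(path, None)
    return image


if __name__ == "__main__":
    print(should_checkpoint(1800, 200_000))    # False: neither threshold reached
    print(should_checkpoint(3600, 10))         # True: an hour has passed
    print(should_checkpoint(60, 1_000_000))    # True: a million edits accumulated
    print(merge({"/a": 1}, [("create", "/b", 2), ("delete", "/a", None)]))  # {'/b': 2}
```

Keeping the two triggers independent means a busy cluster checkpoints early, by edit count, while a quiet one still checkpoints on schedule.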

HDFS write process (without further ado, let's get right to it)
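On the write path, the client sends each block through a pipeline of DataNodes (by default three, one per replica); each DataNode stores the data and forwards it to the next one. The sketch below is a toy, in-memory model of that idea, assuming that a failed DataNode is simply dropped from the pipeline and the block is finished on the surviving nodes. It ignores packets, acknowledgements, generation stamps, and the client's DataNode-replacement policy, and the names `DataNode`, `write_block`, and `missing_replicas` are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass, field

REPLICATION = 3  # mirrors the dfs.replication default


@dataclass
class DataNode:
    name: str
    alive: bool = True
    blocks: dict = field(default_factory=dict)  # block id -> data


def write_block(block_id: str, data: bytes, pipeline: list[DataNode]) -> list[str]:
    """Write one block down the pipeline, skipping DataNodes that have failed.

    Returns the names of the DataNodes that stored the block; raises if no
    DataNode in the pipeline is still alive.
    """
    stored = []
    for dn in pipeline:  # data flows node1 -> node2 -> node3
        if not dn.alive:
            continue  # a failed node is dropped; the write continues on the rest
        dn.blocks[block_id] = data
        stored.append(dn.name)
    if not stored:
        raise IOError(f"no live DataNode left in the pipeline for {block_id}")
    return stored


def missing_replicas(stored: list[str], replication: int = REPLICATION) -> int:
    """How many extra copies the NameNode would later have to schedule."""
    return max(0, replication - len(stored))


if __name__ == "__main__":
    nodes = [DataNode("node1"), DataNode("node2"), DataNode("node3", alive=False)]
    stored = write_block("blk1", b"hello", nodes)
    print(stored)                    # ['node1', 'node2']
    print(missing_replicas(stored))  # 1
```

The under-replicated block is not fixed by the client here; in this model, as in HDFS, restoring the third copy is left to the NameNode afterwards.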

Problem 1: If node3 dies during the write, the upload of blk1 continues on the remaining DataNodes in the pipeline. Once the NameNode has gone a certain number of heartbeat intervals without hearing from node3, it treats the node as dead and removes the metadata for DN3. If that metadata has not yet been removed when node3 restarts, the NameNode tells a neighboring DataNode to copy the missing replica to it; if node3 has already been replaced, its old replica is no longer used.

Problem 2: If a connection still cannot be established after multiple attempts, it is considered failed.

Problem 3: The operation is retried several times and, if it still does not succeed, the job fails.

Problem 4: If the job cannot be submitted after multiple attempts, the connection is considered to have failed.

Problem 5: The job fails.
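The failure handling in these problems rests on two mechanisms: the NameNode declaring a DataNode dead after its heartbeats stop, and callers giving up after a bounded number of retries. The sketch below models both under stated assumptions. The expiry formula, 2 × `dfs.namenode.heartbeat.recheck-interval` + 10 × `dfs.heartbeat.interval`, follows the NameNode's dead-node rule; with the defaults (300 s and 3 s) it gives 630 s, roughly 10.5 minutes. The helper names `heartbeat_expire_s`, `dead_nodes`, and `with_retries`, and the retry limit of 3, are hypothetical.

```python
from __future__ import annotations

from typing import Callable, TypeVar

T = TypeVar("T")

HEARTBEAT_INTERVAL_S = 3   # dfs.heartbeat.interval default
RECHECK_INTERVAL_S = 300   # dfs.namenode.heartbeat.recheck-interval default (300000 ms)


def heartbeat_expire_s(recheck_s: float = RECHECK_INTERVAL_S,
                       heartbeat_s: float = HEARTBEAT_INTERVAL_S) -> float:
    """Silence after which the NameNode treats a DataNode as dead."""
    return 2 * recheck_s + 10 * heartbeat_s


def dead_nodes(last_heartbeat: dict[str, float], now: float) -> list[str]:
    """Return the DataNodes whose last heartbeat is older than the expiry window."""
    limit = heartbeat_expire_s()
    return sorted(n for n, t in last_heartbeat.items() if now - t > limit)


def with_retries(action: Callable[[], T], max_attempts: int = 3) -> T:
    """Call action up to max_attempts times, then give up with the last error."""
    last_error = None
    for _ in range(max_attempts):
        try:
            return action()
        except ConnectionError as err:  # only connection failures are retried
            last_error = err
    raise RuntimeError(f"gave up after {max_attempts} attempts") from last_error


if __name__ == "__main__":
    print(heartbeat_expire_s())  # 630
    beats = {"node1": 1000.0, "node2": 1000.0, "node3": 300.0}
    print(dead_nodes(beats, now=1000.0))  # ['node3'] (700 s of silence > 630 s)

    calls = []

    def always_fails():
        calls.append(1)
        raise ConnectionError("DataNode unreachable")

    try:
        with_retries(always_fails)
    except RuntimeError as err:
        print(f"{err} ({len(calls)} calls)")  # gave up after 3 attempts (3 calls)
```

The generous expiry window is a deliberate trade-off: a DataNode that misses a few heartbeats during a brief network hiccup is not declared dead, which avoids triggering unnecessary re-replication.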

 
