Understanding cloud computing HDFS from another perspective

Source: Internet
Author: User

To learn about cloud computing, you must understand Hadoop. HDFS (the Hadoop Distributed File System) is its foundation. Below, I'll write down what I understand about HDFS.

Once upon a time there was a very special village, home to a very capable man known as "Eldest Brother". The villagers trusted him, so they packed all kinds of food, farm tools, and so on into large packages and gave them to "Eldest Brother" to store. When they needed something, they went to him to fetch it. We can simply think of "Eldest Brother" as a server, the villagers as clients, and a large package packed by a villager as a huge file.

At first, all was well. Later, people from other villages (more clients) found that they had too many things at home, so they came to "Eldest Brother" wanting to store things with him too. "Eldest Brother" was kind and turned no visitor away, but his home only had so much room. What should he do when it could no longer hold everything?

So "Eldest Brother" came up with a way to solve the problem: expand his home (expand the server's disk capacity).

However, "Eldest Brother" became more and more famous, and more and more people stored things with him. Expanding his home endlessly was not a real solution. "Eldest Brother" began to study hard and asked his friends for advice. In the end, a friend nicknamed "Siege Lion" (a playful nickname for an engineer) made the key observation: "Eldest Brother, the problem you are facing is big data, which some people also call massive data." "Is there a way to crack it?" "Expanding your home is not the way; you have to think of a foolproof plan." After thinking day and night, they finally came up with a wonderful solution. Let me explain it step by step.

The eldest brother hired other people. Let's call them "younger brothers". Their relationship is a master-slave one. Now, when villagers bring things to the eldest brother, he no longer stores them in his own home but in the homes of the younger brothers. As the villagers store more and more things, he just needs to hire more younger brothers.

That is the foolproof plan that "Eldest Brother" and "Siege Lion" came up with. They call it HDFS. To tell himself apart from the younger brothers, the eldest brother gave himself a nickname, "namenode", and gave the younger brothers nicknames too: datanode1, datanode2, datanode3, and so on. For the villagers (clients), things stay simple: I hand my things to you and don't have to worry about the details. But how does the eldest brother manage it all behind the scenes? Let's take a look at how HDFS makes this work.

Suppose a villager wants to store a large package with the eldest brother. How big is this package? Let's say 6 GB. Next we will discuss some of the problems this 6 GB package raises and how they are solved.

1. How to store such a large object?

The eldest brother splits such a big thing into 64 MB blocks and hands them out to the younger brothers (datanodes), while he himself (the namenode) keeps in his head (memory) a record of which block was given to which younger brother. When a villager comes to fetch things, the eldest brother "namenode" tells the villager which younger brother holds each of his blocks, and the villager then goes directly to those younger brothers.
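The splitting step can be sketched in a few lines of Python. This is only a toy illustration: the function names `split_into_blocks` and `assign_blocks` are made up for this example, and the round-robin assignment is an assumption of the sketch, not how the real namenode chooses datanodes.

```python
BLOCK_SIZE = 64 * 1024 * 1024  # 64 MB, the block size used in this article


def split_into_blocks(file_size, block_size=BLOCK_SIZE):
    """Return a list of (block_index, length_in_bytes) covering the file."""
    count = (file_size + block_size - 1) // block_size  # integer ceiling
    return [(i, min(block_size, file_size - i * block_size)) for i in range(count)]


def assign_blocks(blocks, datanodes):
    """Toy round-robin assignment (an assumption of this sketch).

    The returned dict plays the role of what the eldest brother keeps
    in his head: block index -> the younger brother holding it.
    """
    return {index: datanodes[index % len(datanodes)] for index, _ in blocks}


file_size = 6 * 1024 ** 3  # the villager's 6 GB package
blocks = split_into_blocks(file_size)
block_map = assign_blocks(blocks, ["datanode1", "datanode2", "datanode3"])

print(len(blocks))   # 96 blocks of 64 MB
print(block_map[4])  # datanode2
```

So the 6 GB package becomes 96 blocks, and fetching it means asking the namenode for the map and then visiting the datanodes directly.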

2. The eldest brother can't just keep everything in his head. What if he forgets?

When villagers create, modify, or delete things, the eldest brother does not rely on memory alone. He also keeps two books (files): one is the editlog and the other is the fsimage. Whether a villager stores something new, modifies his things, or deletes them, the eldest brother "namenode" writes a record in the editlog.

What if the editlog or fsimage is lost? For that, he found a butler called "secondarynamenode" to help him keep copies of the two files. Remember that although the butler's name also contains "namenode", he is only a butler, not an "Eldest Brother". If the eldest brother dies, the butler cannot take his place.
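How the two books work together can be sketched as follows. This is a minimal model under stated assumptions: the class `ToyNameNode` and its methods are hypothetical names for this example, the fsimage is treated as a saved snapshot of the file list, and the editlog as the list of changes made since that snapshot.

```python
class ToyNameNode:
    """Toy model of the two books; not the real HDFS on-disk format."""

    def __init__(self):
        self.fsimage = {}    # last saved snapshot: path -> list of block ids
        self.editlog = []    # changes recorded since that snapshot
        self.namespace = {}  # the live picture "in his head" (memory)

    @staticmethod
    def _apply(namespace, op, path, blocks):
        if op == "delete":
            namespace.pop(path, None)
        else:  # "create" or "modify"
            namespace[path] = list(blocks)

    def record(self, op, path, blocks=()):
        """Write the change in the editlog, then update memory."""
        self.editlog.append((op, path, tuple(blocks)))
        self._apply(self.namespace, op, path, blocks)

    def checkpoint(self):
        """Fold the editlog into a fresh fsimage and start an empty log."""
        image = dict(self.fsimage)
        for op, path, blocks in self.editlog:
            self._apply(image, op, path, blocks)
        self.fsimage, self.editlog = image, []

    def recover(self):
        """Rebuild memory after a restart: fsimage plus replayed editlog."""
        namespace = {path: list(b) for path, b in self.fsimage.items()}
        for op, path, blocks in self.editlog:
            self._apply(namespace, op, path, blocks)
        return namespace


nn = ToyNameNode()
nn.record("create", "/village/package", [0, 1, 2])
nn.checkpoint()
nn.record("delete", "/village/package")
nn.record("create", "/village/tools", [3])
print(nn.recover())  # {'/village/tools': [3]}
```

Even if the eldest brother forgets everything, reading the fsimage and replaying the editlog brings back exactly what he had in his head.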

3. What if a younger brother who holds data blocks goes down?

(1) How does the eldest brother know that a younger brother is down?

To make sure the younger brothers are working properly, each younger brother (datanode) has to report in to the eldest brother (namenode) periodically. This report is called a "heartbeat signal". If the eldest brother receives the signal on schedule, the younger brother is fine; if not, something is wrong with that younger brother.
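A heartbeat check like this can be sketched as below. The class `HeartbeatMonitor` is a hypothetical name, and the 30-second timeout is an illustrative value chosen for the example, not an HDFS default. Times are passed in explicitly so the example is easy to follow.

```python
class HeartbeatMonitor:
    """Toy version of the eldest brother's attendance sheet."""

    def __init__(self, timeout_seconds):
        self.timeout = timeout_seconds
        self.last_seen = {}  # datanode name -> time of its last heartbeat

    def heartbeat(self, datanode, now):
        """Called whenever a younger brother reports in."""
        self.last_seen[datanode] = now

    def dead_nodes(self, now):
        """Younger brothers whose last report is older than the timeout."""
        return sorted(
            name for name, seen in self.last_seen.items()
            if now - seen > self.timeout
        )


monitor = HeartbeatMonitor(timeout_seconds=30)  # illustrative timeout
monitor.heartbeat("datanode1", now=0)
monitor.heartbeat("datanode2", now=0)
monitor.heartbeat("datanode1", now=25)  # datanode2 stays silent
print(monitor.dead_nodes(now=40))       # ['datanode2']
```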

(2) What is the eldest brother's strategy for dealing with this?

The eldest brother has divided the large file into 64 MB pieces and handed them to the younger brothers. If a younger brother crashed, wouldn't his data blocks be lost, and wouldn't the villagers come and block the eldest brother's door? Therefore, before giving a block to one younger brother, the eldest brother first makes two extra copies, called "replicas", and stores them with other younger brothers. This redundant storage keeps the data safe.

(3) Where do the replicas go?

By default a data block has three copies, and they are not handed to just any younger brothers. Generally, the following rules are followed:

1) The first copy is given to any younger brother (datanode).

2) The second copy is given to a neighbor of the first younger brother (a datanode on the same rack).

3) The third copy is given to another younger brother farther away from the first two (a datanode on a different rack).

Note: The bandwidth between two servers in the same rack is greater than the bandwidth between two servers in different racks.
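The three rules above can be sketched as a small function. The name `place_replicas` and the `topology` dictionary are made up for this example, and the sketch assumes the first copy lands on a rack with at least two datanodes so that rule 2 can be satisfied. It only follows the rules as described in this article.

```python
import random


def place_replicas(topology, rng=random):
    """Pick three datanodes for one block.

    topology: dict mapping rack name -> list of datanode names.
    Returns [first, second, third] following the three rules above.
    """
    # Rule 1: any datanode (assumption: on a rack that has a neighbor).
    racks = [rack for rack, nodes in topology.items() if len(nodes) >= 2]
    if not racks or len(topology) < 2:
        raise ValueError("need two racks, one of them with two datanodes")
    first_rack = rng.choice(racks)
    first = rng.choice(topology[first_rack])

    # Rule 2: a neighbor of the first, on the same rack.
    second = rng.choice([n for n in topology[first_rack] if n != first])

    # Rule 3: a datanode on a different rack.
    other_racks = [r for r in topology if r != first_rack and topology[r]]
    if not other_racks:
        raise ValueError("no other rack with datanodes")
    third = rng.choice(topology[rng.choice(other_racks)])
    return [first, second, third]


topology = {
    "rack1": ["datanode1", "datanode2"],
    "rack2": ["datanode3", "datanode4"],
}
print(place_replicas(topology))
```

Two copies sit close together where bandwidth is plentiful, and the third survives even if a whole rack goes down.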

(4) How are the copies handed to the younger brothers?

Replication uses a pipeline. Specifically, when the client saves data, it first writes the data to a temporary file on its local disk. Once a full 64 MB block has accumulated, the namenode tells the client the address of a datanode, and the client writes the data to it. The first datanode receives the data in small portions and forwards each portion it has received to the second datanode, which in turn forwards it to the third. The copies flow down the line like a pipeline.
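The pipeline can be sketched like this. It is a single-process simulation, so nothing is actually sent over a network; the function names and the 4-byte packet size are assumptions chosen to keep the example small.

```python
def receive(packet, pipeline, storage):
    """The first datanode in `pipeline` stores the packet, then forwards it."""
    if not pipeline:
        return
    node, downstream = pipeline[0], pipeline[1:]
    storage.setdefault(node, bytearray()).extend(packet)
    receive(packet, downstream, storage)  # pass it on to the next datanode


def pipeline_write(block, pipeline, storage, packet_size=4):
    """The client sends the block to the first datanode only, in small portions."""
    for start in range(0, len(block), packet_size):
        receive(block[start:start + packet_size], pipeline, storage)


storage = {}
pipeline_write(b"villager-package", ["datanode1", "datanode2", "datanode3"], storage)
print(bytes(storage["datanode3"]))  # b'villager-package'
```

The client only ever talks to the first younger brother; the other two copies are made by the younger brothers passing the portions along.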

(5) What if a younger brother crashes?

If one of the younger brothers fails, the blocks he held are left with only two replicas. The eldest brother then has another copy made on one of the other younger brothers, so that the number of replicas is no fewer than three and everything is safe again.
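This repair step can be sketched as follows. The function `re_replicate` is a hypothetical name, and for simplicity the sketch ignores racks and just picks the first available live datanode.

```python
def re_replicate(block_locations, live_nodes, target=3):
    """Bring every block back up to `target` replicas.

    block_locations: dict mapping block id -> set of datanodes holding it.
    live_nodes: set of datanodes that are still sending heartbeats.
    """
    for holders in block_locations.values():
        holders &= live_nodes                      # forget replicas on dead nodes
        candidates = sorted(live_nodes - holders)  # live nodes without this block
        # A new copy can only be made if at least one replica survived.
        while holders and len(holders) < target and candidates:
            holders.add(candidates.pop(0))
    return block_locations


locations = {"blk_0": {"datanode1", "datanode2", "datanode3"}}
live = {"datanode1", "datanode2", "datanode4", "datanode5"}  # datanode3 is down
re_replicate(locations, live)
print(sorted(locations["blk_0"]))  # ['datanode1', 'datanode2', 'datanode4']
```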

 

Phew, the problem of massive data storage is finally solved. But having hired so many younger brothers, the eldest brother felt it would be a waste to have them only store data, so they decided to also help the villagers with some of their computing tasks. For this work the eldest brother took on another name, "jobtracker", the younger brothers got a new name, "tasktracker", and together they formed a new team called MapReduce.
