How Hadoop works

Source: Internet
Author: User

Hadoop is a software framework that enables distributed processing of large amounts of data. As the technology grows more and more popular, it is becoming a skill that every programmer should master.

First of all, let's talk about what Hadoop is. I believe most programmers are not unfamiliar with it: Hadoop is known for big-data processing, or more precisely, it is a distributed file storage and computing system. Take a familiar example: network disks, such as the currently popular Baidu network disk. Baidu gives every user 2 TB of storage space for free. That is 2048 GB, which is no small amount; most ordinary people's computers do not even have a 2 TB hard disk. So how does Baidu dare to allocate 2 TB of disk space to everyone? The answer lies in its Hadoop cluster servers. Where does Baidu get enough disk space to give every user 2 TB?
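Under the hood, Hadoop's distributed file system (HDFS) stores a file by splitting it into fixed-size blocks and replicating each block across several machines in the cluster. Here is a minimal sketch of that idea in Python; the byte-scale block size, node names, and round-robin placement are illustrative simplifications, not HDFS's actual placement policy:

```python
# Sketch of HDFS-style storage: a file is split into fixed-size blocks,
# and each block is replicated on several data nodes.
BLOCK_SIZE = 128     # illustrative; HDFS commonly uses 128 MB blocks
REPLICATION = 3      # HDFS replicates each block 3 times by default

def split_into_blocks(data: bytes, block_size: int = BLOCK_SIZE):
    """Split raw file bytes into fixed-size blocks."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]

def place_replicas(blocks, nodes, replication: int = REPLICATION):
    """Assign each block to `replication` distinct nodes (round-robin sketch)."""
    return {
        idx: [nodes[(idx + r) % len(nodes)] for r in range(replication)]
        for idx in range(len(blocks))
    }

data = b"x" * 300                 # a 300-byte stand-in for a file
blocks = split_into_blocks(data)  # -> blocks of 128, 128, 44 bytes
nodes = ["node1", "node2", "node3", "node4"]
placement = place_replicas(blocks, nodes)
print(len(blocks))       # 3
print(placement[0])      # ['node1', 'node2', 'node3']
```

Because every block lives on multiple machines, the cluster survives the failure of any single disk, which is what makes cheap commodity hardware usable at this scale.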

First, allocating 2 TB to each person does not mean racking a dedicated 2 TB hard disk with your name on it, waiting for you to upload files. Instead, many hard disks are installed on the server racks and shared: since most users never actually fill their 2 TB, the engineers use a Hadoop cluster to allocate storage dynamically. Each user may upload up to 2 TB of data, and when the cluster's disks start to run low, a few more servers are simply added to the Hadoop cluster.
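The "promise everyone 2 TB, add servers only when disks actually fill up" idea can be sketched as a toy capacity model. The class, capacities, and per-user figures below are made up for illustration:

```python
# Toy model of elastic cluster capacity: storage is oversubscribed because
# most users never fill their quota; servers are added only as needed.
class Cluster:
    def __init__(self, server_capacity_gb: int):
        self.server_capacity_gb = server_capacity_gb
        self.servers = 0
        self.used_gb = 0

    def store(self, size_gb: int) -> None:
        """Accept data; rack another server whenever capacity is exceeded."""
        self.used_gb += size_gb
        while self.used_gb > self.servers * self.server_capacity_gb:
            self.servers += 1   # add a server to the Hadoop cluster

cluster = Cluster(server_capacity_gb=10_000)   # each server holds 10 TB
# 1000 users are each promised 2 TB (2000 TB in total),
# but on average each one only stores 50 GB.
for _ in range(1000):
    cluster.store(50)
print(cluster.servers)   # 5 servers cover what 2000 TB of promises implied
```

The promised total (2000 TB) vastly exceeds the physical total (50 TB actually used), which is exactly why the free 2 TB offer is affordable.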

Now that each user is dynamically allocated 2 TB, everyone can upload data to their own network disk. That works fine, but if many people upload the same data, it wastes the server's hard disk resources, and disk space is still valuable. So how is the duplicate-upload problem solved? The clever engineers came up with another trick: when you upload a file, the server first runs a detection step to check whether it already holds that file. If it does, the server simply points your record at the existing copy instead of storing the data again. Sometimes a file of hundreds of megabytes seems to finish uploading in an instant. Don't celebrate too early: it is not that your Internet connection is fast, but that someone else happened to have already uploaded an identical copy to the server. The engineers gave this a nice name: "instant upload" (in Chinese, literally "second transfer").

Well, that's how Hadoop works in general.

