Why does Facebook choose open-source Hadoop for cloud computing?

Source: Internet
Author: User

Some time ago, Facebook released its new message system. Facebook's successful use of HBase also led to many other HBase case studies. The following comes from the Hadoop series of articles published by Facebook's Hadoop engineer Dhruba Borthakur. This article describes the reasons why Facebook chose Hadoop and HBase.

Dhruba Borthakur first summarizes the advantages of Hadoop and HBase. He believes that HBase is highly horizontally scalable: for a data storage scenario like Facebook's, capacity has to be resized regularly, and HBase makes resizing easy. It also supports high write throughput, which matters because Facebook has a huge amount of message data and a large volume of writes every day. In addition, HBase ensures strong consistency within a single data center. Facebook uses HBase to store message data, and the business needs consistent data storage (which is why Facebook did not use Cassandra). HBase also has good random read performance, and the business logic of the message system causes many random reads that miss the cache layer and go through to storage.

Because of the large amount of data, the system is distributed across many machines, where failures and routine upgrades are frequent. High availability and fault recovery are therefore extremely important. Fault isolation means that a failure on one node does not affect other nodes, and that a disk fault affects only a small amount of data. HBase also provides atomic read-modify-write operations: atomic increment and compare-and-set operations are very convenient for many business scenarios. Finally, it can retrieve data within a given range. For example, fetching someone's latest 100 messages is a common requirement in a message system.
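The atomic read-modify-write operations and range reads described above can be illustrated with a small in-memory model. This is only a sketch and does not use the HBase client API: `ToyRowStore`, `message_key`, `latest_messages`, the `msg:` key layout, and the `MAX_TS` bound are all hypothetical names chosen for illustration. It assumes row keys are kept in sorted order and that message keys embed a reversed timestamp so that newer messages sort first.

```python
import bisect
import threading


class ToyRowStore:
    """In-memory stand-in for a sorted key-value store (not the HBase API)."""

    def __init__(self):
        self._lock = threading.Lock()  # makes each read-modify-write atomic
        self._keys = []                # row keys, kept in sorted order
        self._rows = {}                # row key -> value

    def _put_unlocked(self, key, value):
        if key not in self._rows:
            bisect.insort(self._keys, key)
        self._rows[key] = value

    def put(self, key, value):
        with self._lock:
            self._put_unlocked(key, value)

    def get(self, key):
        with self._lock:
            return self._rows.get(key)

    def increment(self, key, amount=1):
        """Atomically add `amount` to a counter and return the new value."""
        with self._lock:
            new_value = self._rows.get(key, 0) + amount
            self._put_unlocked(key, new_value)
            return new_value

    def check_and_put(self, key, expected, value):
        """Write `value` only if the current value equals `expected`."""
        with self._lock:
            if self._rows.get(key) != expected:
                return False
            self._put_unlocked(key, value)
            return True

    def scan(self, start, stop, limit=None):
        """Return (key, value) pairs with start <= key < stop, in key order."""
        with self._lock:
            lo = bisect.bisect_left(self._keys, start)
            hi = bisect.bisect_left(self._keys, stop)
            keys = self._keys[lo:hi]
            if limit is not None:
                keys = keys[:limit]
            return [(k, self._rows[k]) for k in keys]


MAX_TS = 10**13 - 1  # assumed upper bound for millisecond timestamps


def message_key(user, ts_millis):
    # Reversed, zero-padded timestamp: newer messages get smaller keys.
    return f"msg:{user}:{MAX_TS - ts_millis:013d}"


def latest_messages(store, user, n=100):
    # ";" follows ":" in ASCII, so it is an exclusive upper bound for the prefix.
    rows = store.scan(f"msg:{user}:", f"msg:{user};", limit=n)
    return [value for _, value in rows]
```

With this layout, `latest_messages(store, "alice")` reads one contiguous key range and stops after 100 rows, while `store.increment("unread:alice")` and `store.check_and_put(...)` each update a value in a single atomic step, so concurrent writers cannot interleave between the read and the write.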

Of course, the following aspects of Hadoop and HBase are also worth mentioning. The first is tolerance of a network partition within a single data center, where a network problem inside the data center prevents nodes from communicating with each other. This can usually be avoided by configuring redundant network devices. The second is keeping services unaffected when an entire data center fails. This situation is rare. The last is real-time data exchange between data centers. This is not realistic; generally, a cache layer is used to provide real-time access to data that is not stored locally.
