Facebook's real-time messaging system: HBase stores 135 billion messages each month

Source: Internet
Author: User

You may have already seen this news elsewhere. Facebook has developed a new social inbox that integrates email, instant messaging, SMS text messages, and on-site Facebook messages. Altogether, they need to store 135 billion messages each month. Where do they store all of it? Facebook's Kannan Muthukkaruppan gives a surprising answer in "The Underlying Technology of Messages": HBase. HBase beat MySQL, Cassandra, and other options to become Facebook's choice.
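To put the monthly figure in perspective, it can be converted into an average write rate. The sketch below is only back-of-the-envelope arithmetic: it assumes a 30-day month and perfectly uniform traffic, neither of which the article states, so real peak rates would be higher.

```python
# Average message rate implied by 135 billion messages a month.
MESSAGES_PER_MONTH = 135_000_000_000
SECONDS_PER_MONTH = 30 * 24 * 60 * 60  # assumption: a 30-day month


def average_rate_per_second(messages: int, seconds: int) -> float:
    """Return the mean number of messages per second over the period."""
    return messages / seconds


avg_per_second = average_rate_per_second(MESSAGES_PER_MONTH, SECONDS_PER_MONTH)
print(f"{avg_per_second:,.0f} messages per second on average")
# prints "52,083 messages per second on average"
```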


Why is this choice surprising? Facebook created Cassandra, and built it for an inbox-type application, but they eventually found that Cassandra's consistency model was not well suited to the new real-time messaging system. Facebook also has an extensive MySQL infrastructure, but they found that performance suffered as data sets and indexes grew. They could also have built their own system, but in the end they chose HBase.

HBase is a horizontally scalable table store that supports very high rates of row-level updates over large amounts of data, which is exactly what a messaging system requires. HBase is also a column-based key-value store built on the Bigtable model. It is good at fetching rows by key and at scanning and filtering over a range of rows, which a messaging system also needs. However, it does not support complex queries. Those are usually handed over to analysis tools such as Hive, which Facebook created to handle its multi-petabyte data warehouse. Hive is built on Hadoop and its file system HDFS, which HBase also uses.
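The access pattern described above (row-level updates, lookups by key, and filtered scans over a key range) can be illustrated with a toy in-memory table. This is a minimal sketch and not the HBase API: the class `SortedTable`, its methods, and the `user#sequence` row-key format are all hypothetical names chosen for illustration.

```python
import bisect
from typing import Callable, Dict, Iterator, List, Optional, Tuple

Row = Dict[str, str]


class SortedTable:
    """Toy table that keeps rows ordered by row key (not the HBase API)."""

    def __init__(self) -> None:
        self._keys: List[str] = []        # row keys, kept sorted
        self._rows: Dict[str, Row] = {}   # row key -> {column: value}

    def put(self, key: str, column: str, value: str) -> None:
        """Row-level update: set a single column of a single row."""
        if key not in self._rows:
            bisect.insort(self._keys, key)
            self._rows[key] = {}
        self._rows[key][column] = value

    def get(self, key: str) -> Optional[Row]:
        """Key-based row access."""
        return self._rows.get(key)

    def scan(
        self,
        start: str,
        stop: str,
        keep: Callable[[Row], bool] = lambda row: True,
    ) -> Iterator[Tuple[str, Row]]:
        """Yield rows with start <= key < stop that pass the filter."""
        lo = bisect.bisect_left(self._keys, start)
        hi = bisect.bisect_left(self._keys, stop)
        for key in self._keys[lo:hi]:
            row = self._rows[key]
            if keep(row):
                yield key, row


table = SortedTable()
table.put("alice#0002", "body", "lunch?")
table.put("alice#0001", "body", "hello")
table.put("bob#0001", "body", "hi")
table.put("alice#0002", "read", "yes")  # update one column of an existing row

# "$" sorts immediately after "#", so this range covers every "alice#..." key.
alice_rows = [key for key, _ in table.scan("alice#", "alice$")]
unread = [key for key, _ in table.scan("alice#", "alice$", lambda r: "read" not in r)]
print(alice_rows, unread)
```

Because the keys are sorted, a scan only touches the contiguous slice for one user, which is why this style of store handles "all rows in a key range" well but leaves arbitrary queries to other tools.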

Facebook chose HBase because they monitored their applications and understood what they really needed: a system that could handle the following two data patterns:

1. A small set of recent data that changes frequently;

2. An ever-growing set of data that is rarely accessed.

This makes sense. You read what is currently in your inbox once and then rarely look at it again. The two patterns are so different that one might expect two different systems to be used, but HBase apparently handles both well. It is unclear how they handle general-purpose search, because that is not one of HBase's strengths, although HBase can be integrated with various search systems.

Key points of the Facebook system:

● HBase:

○ A simpler consistency model than Cassandra.

○ Very good scalability and performance for their data patterns.

○ Most of the features they need: automatic load balancing and failover, compression support, and multiple shards per server.

○ HDFS, the file system used by HBase, supports replication, end-to-end checksums, and automatic rebalancing.

○ Facebook's operations teams have extensive experience with HDFS, because Facebook is a heavy user of Hadoop, and Hadoop uses HDFS as its distributed file system.

● Haystack is used to store attachments.

● A custom application server was written from scratch to handle a large volume of inbound messages from many different sources.

● A user discovery service was built on top of ZooKeeper.

● Infrastructure services are accessed for the following functions: email account verification, friend relationships, privacy decisions, and delivery decisions (should a message be sent through chat or as a text message?).

● In keeping with their approach of small teams doing big things, 15 engineers released 20 new infrastructure services within one year.

● Facebook will not standardize on a single database platform. They will use different platforms for different tasks.

Facebook's selection of HBase is a strong endorsement and should greatly promote its adoption, especially given Facebook's extensive experience with HDFS, Hadoop, and Hive. This is the dream of any product: to partner with another very popular product and become part of its ecosystem. That is exactly what HBase has achieved. HBase covers many desirable properties at once: real-time, distributed, linearly scalable, robust, big data, open source, key-value, and column-oriented. Now that Facebook has adopted it, HBase is likely to become even more popular.

Original article title: Facebook's New Real-Time Messaging System: HBase To Store 135+ Billion Messages A Month

Further reading

HBase is a distributed, column-oriented open source database. The technology derives from the Google paper "Bigtable: A Distributed Storage System for Structured Data" by Chang et al. Just as Bigtable builds on the distributed data storage provided by the Google File System, HBase provides Bigtable-like capabilities on top of Hadoop. HBase is a subproject of the Apache Hadoop project. Unlike a typical relational database, HBase is suited to storing unstructured data, and it is column-based rather than row-based. HBase uses the same data model as Bigtable: users store data rows in tables, and each row has a sortable key and any number of columns. Tables are stored sparsely, so different rows can have different columns. HBase is mainly used when random, real-time read and write access to big data is required.
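The sparse, column-based data model described above can be sketched as a map from (row key, column) pairs to values, so a column that a row does not use takes up no space at all. This is an illustrative toy rather than HBase's actual storage format. The helper names `put`, `read_row`, and `read_column` and the sample row keys are assumptions made for the example.

```python
from typing import Dict, Tuple

Cell = Tuple[str, str]  # (row key, column name)

# Only cells that were actually written exist; there is no fixed schema.
cells: Dict[Cell, str] = {}


def put(row_key: str, column: str, value: str) -> None:
    """Write one cell."""
    cells[(row_key, column)] = value


def read_row(row_key: str) -> Dict[str, str]:
    """Collect every column stored for one row."""
    return {col: val for (key, col), val in sorted(cells.items()) if key == row_key}


def read_column(column: str) -> Dict[str, str]:
    """Collect one column across all rows that have it."""
    return {key: val for (key, col), val in sorted(cells.items()) if col == column}


put("msg-001", "subject", "Hi")
put("msg-001", "body", "Hello there")
put("msg-002", "body", "Lunch?")
put("msg-002", "attachment", "photo.jpg")  # a column msg-001 never defines

print(read_row("msg-002"))
print(read_column("subject"))
```

Here the two rows carry different columns, and reading the "subject" column returns only the single row that has one.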

 
