You may have seen this news already: Facebook has built a new social inbox that integrates email, instant messaging, SMS text messages, and on-site Facebook messages. Most importantly, it needs to store 135 billion messages each month. Where will all of that be stored? In "The Underlying Technology of Messages", Facebook's Kannan Muthukkaruppan gives a surprising answer: HBase. HBase beat MySQL, Cassandra, and other options to become Facebook's choice.
Why is this choice surprising? Facebook created Cassandra to build an inbox-type application, but in the end they found that Cassandra's eventual consistency model was a poor fit for their new real-time messaging system. Facebook also runs an extensive MySQL infrastructure, but they found that performance became unacceptable as data sets and indexes grew. They could also have built their own system, but in the end they chose HBase.
HBase is a horizontally scalable table store that supports very high rates of row-level updates over massive amounts of data, which is exactly what a messaging system needs. HBase is also a column-oriented key-value store built on the Bigtable model. It is good at fetching rows by key and at scanning and filtering ranges of rows, which a messaging system also needs. It does not support complex queries, however. Such queries are usually handed over to an analytics tool such as Hive, which Facebook created to manage its multi-petabyte data warehouse. Hive is built on HDFS, Hadoop's distributed file system, which HBase also uses.
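To make "key-based row access plus range scans with filters" concrete, here is a minimal in-memory sketch in Python. It is a simplified model and not the HBase client, which is a Java API. `ToyTable`, `put`, `get`, and `scan` are hypothetical names. The model keeps row keys in sorted byte order, as HBase does. It also treats a scan's start key as inclusive and its stop key as exclusive, which matches HBase's scan convention.

```python
import bisect
from typing import Callable, Dict, Iterator, List, Optional, Tuple

# Hypothetical in-memory model of an HBase-style table; not the HBase client API.
Row = Dict[str, bytes]  # column name ("family:qualifier") -> value


class ToyTable:
    """Rows are kept sorted by row key, as in HBase."""

    def __init__(self) -> None:
        self._keys: List[bytes] = []        # sorted row keys
        self._rows: Dict[bytes, Row] = {}   # row key -> columns

    def put(self, row_key: bytes, column: str, value: bytes) -> None:
        # Row-level update; a new key is inserted at its sorted position.
        if row_key not in self._rows:
            bisect.insort(self._keys, row_key)
            self._rows[row_key] = {}
        self._rows[row_key][column] = value

    def get(self, row_key: bytes) -> Optional[Row]:
        # Point lookup by row key; returns a copy so callers cannot mutate the table.
        row = self._rows.get(row_key)
        return dict(row) if row is not None else None

    def scan(self, start: bytes, stop: bytes,
             row_filter: Optional[Callable[[bytes, Row], bool]] = None,
             ) -> Iterator[Tuple[bytes, Row]]:
        # Range scan over [start, stop): start inclusive, stop exclusive.
        lo = bisect.bisect_left(self._keys, start)
        hi = bisect.bisect_left(self._keys, stop)
        for key in self._keys[lo:hi]:
            row = self._rows[key]
            if row_filter is None or row_filter(key, row):
                yield key, dict(row)


if __name__ == "__main__":
    t = ToyTable()
    t.put(b"user1#msg1", "m:body", b"hi")
    t.put(b"user2#msg1", "m:body", b"hello")
    t.put(b"user1#msg2", "m:body", b"lunch?")
    # All rows whose key starts with b"user1#" ('$' is the byte after '#').
    print([k for k, _ in t.scan(b"user1#", b"user1$")])
```

The scan avoids touching the whole table. Because rows are sorted, a key prefix maps to one contiguous range, which can be located with binary search. Complex queries have no comparable shortcut in this model.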
Facebook chose HBase because they monitored their usage and worked out what they really needed: a system that could handle the following two data patterns:
1. A short-lived set of temporal data that tends to be volatile;
2. An ever-growing set of data that is rarely accessed.
This makes sense. You read the messages currently in your inbox once, and after that you rarely look at them again. These two data patterns are so different that one might expect two separate systems to be used, but HBase apparently handles both well. It is unclear how they handle general search functionality, since that is not one of HBase's strengths, although HBase can integrate with various search systems.
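Row-key design is one way a single HBase table could serve both patterns. The sketch below uses a hypothetical key scheme and is not Facebook's actual schema. Each key is the user ID followed by a reversed, zero-padded timestamp, so a scan starting at a user's prefix returns the newest messages first. Reading the current inbox then touches only the first few rows for that user. The ever-growing archive sits further along in key order and is rarely read. The names `make_row_key`, `newest_first`, `timestamp_of`, and `MAX_TS` are illustrative assumptions.

```python
import bisect
from typing import List

# Hypothetical upper bound on timestamps (13 decimal digits of milliseconds).
MAX_TS = 10**13 - 1


def make_row_key(user_id: str, ts_ms: int) -> bytes:
    """Build '<user_id>#<reversed timestamp>' so newer messages sort first."""
    if not 0 <= ts_ms <= MAX_TS:
        raise ValueError("timestamp out of range")
    # Zero-padding keeps lexicographic byte order equal to numeric order.
    return f"{user_id}#{MAX_TS - ts_ms:013d}".encode()


def timestamp_of(row_key: bytes) -> int:
    """Recover the original timestamp from a row key."""
    return MAX_TS - int(row_key.rsplit(b"#", 1)[1])


def newest_first(sorted_keys: List[bytes], user_id: str, n: int) -> List[bytes]:
    """Prefix scan: return up to n keys for user_id, newest message first."""
    prefix = f"{user_id}#".encode()
    start = bisect.bisect_left(sorted_keys, prefix)
    result: List[bytes] = []
    for key in sorted_keys[start:]:
        if len(result) == n or not key.startswith(prefix):
            break
        result.append(key)
    return result


if __name__ == "__main__":
    keys = sorted(make_row_key(user, ts) for user, ts in
                  [("alice", 1000), ("alice", 3000), ("alice", 2000), ("bob", 5000)])
    print([timestamp_of(k) for k in newest_first(keys, "alice", 2)])  # [3000, 2000]
```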
Key points of the Facebook system:
● HBase:
○ A simpler consistency model than Cassandra.
○ Very good scalability and performance for their data patterns.
○ The richest feature set for their requirements: automatic load balancing and failover, compression support, multiple shards per server, and so on.
○ HDFS, the file system HBase uses, supports replication, end-to-end checksums, and automatic rebalancing.
○ Facebook's operations team has extensive experience with HDFS, because Facebook is a heavy user of Hadoop and Hadoop uses HDFS as its distributed file system.
● Haystack is used to store attachments.
● A custom application server was written from scratch to handle the massive inflow of messages from many different sources.
● A user discovery service was built on top of ZooKeeper.
● Infrastructure services are called for email account verification, friend relationships, privacy decisions, and delivery decisions (should a message be sent via chat or via SMS?).
● In keeping with their approach of small teams doing big things, 15 engineers are releasing 20 new infrastructure services within one year.
● Facebook will not standardize on a single database platform; they will use different platforms for different tasks.
Facebook's choice of HBase will greatly promote adoption of the system. Facebook's broad experience with HDFS, Hadoop, and Hive should not be underestimated as a factor in that choice. It is the dream of any product to partner with another very popular product and become part of its ecosystem, and that is exactly what HBase has achieved. HBase occupies an attractive spot in the storage landscape: real-time, distributed, linearly scalable, robust, big data, open source, key-value, and column-oriented. We should see HBase become even more popular, especially now that Facebook has endorsed it.
HBase is a distributed, column-oriented, open-source database. The technology comes from the Google paper "Bigtable: A Distributed Storage System for Structured Data" by Chang et al. Bigtable builds on the distributed data storage provided by the Google File System. In the same way, HBase provides Bigtable-like capabilities on top of Hadoop. HBase is a subproject of the Apache Hadoop project. Unlike a typical relational database, HBase is suited to storing unstructured data. Another difference is that HBase is column-based rather than row-based.
HBase uses the same data model as Bigtable. Users store rows of data in tables. Each row has a sortable row key and any number of columns. Tables are stored sparsely, so different rows can have different columns. HBase is mainly used when you need random, real-time read/write access to your big data.
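The Bigtable paper describes this data model as a sparse, sorted map from (row, column, timestamp) to value. The minimal Python sketch below models just that map. It assumes HBase-style `family:qualifier` column names and a hypothetical per-column limit on stored versions. `SparseVersionedTable` and its methods are illustrative names and are not part of any HBase client.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Tuple

# Hypothetical toy model of the (row, column, timestamp) -> value map.


class SparseVersionedTable:
    """Only written cells are stored, so rows may have entirely different columns."""

    def __init__(self, max_versions: int = 3) -> None:
        self.max_versions = max_versions
        # row -> column -> list of (timestamp, value), newest first
        self._cells: Dict[str, Dict[str, List[Tuple[int, bytes]]]] = defaultdict(dict)

    def put(self, row: str, column: str, ts: int, value: bytes) -> None:
        versions = self._cells[row].setdefault(column, [])
        versions.append((ts, value))
        versions.sort(key=lambda tv: tv[0], reverse=True)  # newest first
        del versions[self.max_versions:]                    # keep at most N versions

    def get(self, row: str, column: str, ts: Optional[int] = None) -> Optional[bytes]:
        # Newest version, or the newest one at or before `ts` when given.
        for version_ts, value in self._cells.get(row, {}).get(column, []):
            if ts is None or version_ts <= ts:
                return value
        return None

    def columns(self, row: str) -> List[str]:
        # Columns actually present in this row (empty cells take no space).
        return sorted(self._cells.get(row, {}))


if __name__ == "__main__":
    t = SparseVersionedTable(max_versions=2)
    t.put("row1", "info:name", 1, b"a")
    t.put("row1", "info:name", 3, b"c")
    t.put("row2", "stats:visits", 1, b"10")
    print(t.get("row1", "info:name"), t.columns("row2"))  # b'c' ['stats:visits']
```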
HBase Architecture