Big Data graph databases: the TAO database

Source: Internet
Author: User
Excerpted from "Big Data daily report: Architecture and algorithm", Chapter 14, Section 14.1.2, TAO graph database.



14.1.2 TAO graph database

Facebook is currently the best-known social networking site in the world. Viewed as a data abstraction, Facebook's social graph contains not only friend relationships but also relationships between people and entities and between entities themselves. Every user, page, image, application, location, and comment is an independent entity. If a user likes a page, a relationship is established between the user and the page; if a user checks in at a location, a relationship is established between the user and the location; and so on. If we treat each entity as a node and each relationship between entities as a directed edge, all of Facebook's data forms one giant entity graph with over a billion edges. Some relationships in the entity graph are bidirectional, such as friendship, while others are unidirectional, such as a user checking in at a location. Entities also carry their own attributes: for example, that a user graduated from Stanford University and was born in 1988 are attributes of that user entity. Figure 14-2 shows part of the Facebook entity graph.


Figure 14-2 Facebook entity graph (Fbid is the unique ID number within Facebook)

Facebook stores all entities, their attributes, and the links between entities in the TAO graph database, which serves the read and write requests issued when website pages are generated. TAO is a distributed graph database made up of thousands of servers spread across multiple data centers, and it offers only eventual consistency across data centers. To respond to application requests in real time, TAO's architecture gives up strong consistency in favor of high availability and low latency, and it is optimized for reads in particular, which keeps page generation efficient under extremely high load.

TAO wraps graph-related data access in an API for its clients, so that a client can access not only entities and their attributes but also the links between entities. For example, it provides the following link-list query interface for accessing relationship data:

(ID, aType) -> [a_new, ..., a_old]

Here ID is the unique identifier of an entity and aType is the link type (friendship and so on). The link list contains the IDs of the other entities linked to ID by a link of type aType, ordered by time from newest to oldest. For example, (I, COMMENT) lists all comments on I.
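
The link-list query can be illustrated with a small in-memory sketch. This is not TAO's actual client API: the class name AssocStore, the method names assoc_add and assoc_list, and the numeric IDs are all hypothetical, chosen only to show a lookup keyed by (ID, aType) that returns linked IDs newest first.

```python
from collections import defaultdict


class AssocStore:
    """Toy in-memory stand-in for link lists (hypothetical, not TAO's API)."""

    def __init__(self):
        # (id1, atype) -> list of (timestamp, id2)
        self._lists = defaultdict(list)

    def assoc_add(self, id1, atype, id2, time):
        """Record a directed link id1 --atype--> id2 created at `time`."""
        self._lists[(id1, atype)].append((time, id2))

    def assoc_list(self, id1, atype, limit=None):
        """Return the linked IDs for (id1, atype), newest first."""
        links = sorted(self._lists.get((id1, atype), []), reverse=True)
        ids = [id2 for _, id2 in links]
        return ids if limit is None else ids[:limit]


store = AssocStore()
store.assoc_add(632, "COMMENT", 771, time=100)  # older comment
store.assoc_add(632, "COMMENT", 772, time=200)  # newer comment
comments = store.assoc_list(632, "COMMENT")     # newest first
```

Keying the list by the pair (ID, aType) keeps each link type in its own list, so a query for comments never has to scan friendships or check-ins.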


1. Overall architecture of TAO

TAO is a near-real-time graph database that spans multiple data centers; its overall architecture is shown in Figure 14-3. TAO first groups several data centers that are physically close to one another into one region (partition), which yields multiple regions. The caches within each region together hold all entity and relationship data. The original data is stored in the database and cache of one master region, while the other (slave) regions hold copies of it. (This is an unusual design; it is worth pausing to think about the motivation behind it before reading on.)


The reasoning behind this architecture is as follows. The cache is a central part of TAO and plays a major role in answering user read requests quickly, and cached data must be held in memory. If memory were cheap and plentiful enough, the ideal would be for every data center to hold a complete copy of the data, so that reads could be answered quickly without time-consuming cross-data-center requests. However, the amount of data is very large (petabyte scale), and keeping a complete copy in every data center would cost too much. The next best option is to have the data centers of one region, taken together, hold one complete copy of the data. Because the data centers in a region are close to one another, communication between them is efficient, so this design strikes a balance between cost and efficiency.

Each region stores the complete set of entities and relationships. Within a region, TAO's storage architecture has three layers (see Figure 14-3). The bottom layer is the MySQL database layer. Because there is so much data, it is split into many data shards; each shard is stored in one logical database, and one server can host multiple shards. The second layer is the cache layer that corresponds to the underlying shards, called the leader cache. A leader cache caches the content of its logical database and handles read and write communication with that database. The top layer is the follower cache layer: multiple follower caches correspond to one leader cache and cache the leader's content. Designing the cache as two tiers reduces the coupling between caches and makes the whole system easier to scale: when the load grows, the system can be expanded simply by adding follower cache servers.
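
The relationship between shards, database servers, and leader caches can be sketched roughly as follows. The text does not say how an entity is assigned to a shard, so the modulo rule, the shard count, and the server names below are assumptions made only for illustration.

```python
from dataclasses import dataclass

NUM_SHARDS = 8  # hypothetical shard count


def shard_of(object_id: int) -> int:
    """Assumed mapping from an entity ID to its data shard."""
    return object_id % NUM_SHARDS


@dataclass
class RegionLayout:
    """One region: database servers at the bottom, leader caches above them."""
    db_servers: list
    leaders: list

    def route(self, object_id: int) -> dict:
        shard = shard_of(object_id)
        return {
            "shard": shard,
            # One server hosts several shards (here: every shard with the
            # same remainder modulo the number of servers).
            "db_server": self.db_servers[shard % len(self.db_servers)],
            # Each leader cache fronts the shards of its database.
            "leader": self.leaders[shard % len(self.leaders)],
        }


layout = RegionLayout(db_servers=["db-0", "db-1"],
                      leaders=["leader-0", "leader-1"])
route = layout.route(13)  # 13 % 8 = shard 5, and 5 % 2 = server index 1
```

With two database servers and eight shards, each server ends up holding four shards, which matches the statement that one server can store multiple shards.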


2. TAO read/write operations

A client program can only interact with the outermost follower cache layer and cannot communicate with a leader cache directly (see Figure 14-4). When a client has a data request, it connects to the nearest follower cache. If the request is a read and the follower cache holds the data, the data is returned directly. In Internet applications reads far outnumber writes, so the follower caches can absorb most of the website's load.

If a read misses in the follower cache (a cache miss), the follower forwards it to the corresponding leader cache. If the leader cache misses as well, the leader reads the data from the database and updates itself (marked A and D in Figure 14-4), then sends a message to the corresponding follower caches so that they load the new data from the leader.
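
The read path described above (follower cache, then leader cache, then database) can be sketched as a chain of read-through caches. This is a simplified model under stated assumptions: the class names are hypothetical, a plain dict stands in for MySQL, and the follower is filled when the read returns rather than by a separate message from the leader.

```python
class Database:
    """Stand-in for one logical database (a dict instead of MySQL)."""

    def __init__(self, rows):
        self.rows = dict(rows)
        self.reads = 0

    def read(self, key):
        self.reads += 1
        return self.rows.get(key)


class LeaderCache:
    """Leader (master) cache: the only cache layer that talks to the database."""

    def __init__(self, db):
        self.db = db
        self.cache = {}

    def read(self, key):
        if key not in self.cache:              # leader miss: go to the database
            self.cache[key] = self.db.read(key)
        return self.cache[key]


class FollowerCache:
    """Follower (slave) cache: the only layer clients talk to."""

    def __init__(self, leader):
        self.leader = leader
        self.cache = {}
        self.hits = 0
        self.misses = 0

    def read(self, key):
        if key in self.cache:                  # hit: answer directly
            self.hits += 1
            return self.cache[key]
        self.misses += 1                       # miss: forward to the leader
        value = self.leader.read(key)
        self.cache[key] = value
        return value


db = Database({"user:1": "Alice"})
follower = FollowerCache(LeaderCache(db))
first = follower.read("user:1")    # misses in both caches, reads the database
second = follower.read("user:1")   # served from the follower cache
```

Only the first read reaches the database; every later read of the same key stops at the follower, which is why the follower layer can absorb most of the load.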


Reads follow the logic above in every region, master or slave. Writes sent by clients, however, are handled differently in the master region and in slave regions. In the master region, when a follower cache receives a write request it passes the request to the corresponding leader cache, which writes it to the corresponding logical database. Once the database write succeeds, the leader cache sends a message to its follower caches telling them either to invalidate the old data or to reload it. In a slave region, when a follower cache receives a write request it also passes the request to the leader cache of its own region. This leader cache does not write to the local database directly; instead it forwards the request to the leader cache of the master region (marked C in Figure 14-4), which writes the data to the master database.

In other words, every write, whether it originates in the master region or in a slave region, is handed to the leader cache of the master region, which updates the master database. After the master database has been updated successfully, it sends a message to the slave region's database so that the data stays consistent. It also notifies the slave region's leader cache, which in turn notifies the follower caches in the slave region to update their contents (see Figure 14-4).
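
The routing rule for writes can be condensed into a few lines. The function below only lists the hops a write takes according to the description above; the function name, the region names, and the "component@region" labels are hypothetical.

```python
def write_path(region: str, master_region: str) -> list:
    """Return the hops a client write takes before it reaches a database."""
    # A client always enters through a follower cache, which hands the
    # write to the leader cache of the same region.
    hops = [f"follower@{region}", f"leader@{region}"]
    if region != master_region:
        # A slave region's leader never writes its local database; it
        # forwards the request to the master region's leader cache.
        hops.append(f"leader@{master_region}")
    # Only the master database is written on the request path.
    hops.append(f"db@{master_region}")
    return hops


from_master = write_path("us-west", master_region="us-west")
from_slave = write_path("eu", master_region="us-west")
```

Whatever region the write starts in, the last hop is always the master database, which is the point of the design: there is a single place where data is modified.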

A question to consider: when a read misses in the leader cache of a slave region, why is the data read from the local database rather than forwarded to the master region, as a write would be? The drawback of reading from the local database is obvious: the slave database may hold stale data at that moment, so the read may return inconsistent data. What is the purpose or benefit of doing so?

Answer: cache misses are far more frequent than writes (about 20 times as frequent at Facebook). Handling writes across regions therefore has little effect on overall efficiency, but if the many reads that miss the cache also crossed regions, read efficiency would drop sharply. TAO sacrifices data consistency here to keep read latency low.


3. TAO Data Consistency

To favor read efficiency, TAO sacrifices data consistency and adopts eventual consistency rather than strong consistency. When the master database notifies a slave database of a data change, it does so asynchronously rather than synchronously: the response to the client's request can be returned without waiting for the slave database to confirm that the update is complete. As a result there is a window during which the data in the master database and in a slave database differ, and clients in the slave region may read stale data during that window. After a short delay, however, the change is certain to be reflected in all slave databases, so the system is eventually consistent.

More specifically, in most cases TAO guarantees "read-what-you-write" consistency: the client that issued a write is guaranteed to read the updated value afterwards rather than stale data. This is necessary in many situations. For example, if a user removes a friend but that friend's posts still show up in the user's message stream, the result is unacceptable.

How does TAO achieve this? First, if the update takes place in the master region, the write procedure described earlier already guarantees "read-what-you-write" consistency. The tricky case is a write request issued by a client in a slave region. Here the request is forwarded from the follower cache to the leader cache of the slave region, which forwards it to the leader cache of the master region, which writes it to the master database. After the write succeeds, the slave region's leader cache notifies the follower caches of that region to update their cached values. All of these steps are performed synchronously. So although the slave region's database may not yet have received the update message from the master database, the caches at every level of the slave region have already been updated, and subsequent reads issued from that region are served the newly written content from those caches. This is how "read-what-you-write" consistency is ensured in a slave region.
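
The behavior described above can be checked with a small simulation. It is a sketch under simplifying assumptions, not TAO's implementation: the class and method names are hypothetical, the leader and follower caches of a region are collapsed into one dict, and asynchronous replication is modeled as a queue that is applied only when apply_replication is called.

```python
class Region:
    """Toy region: one database, one combined cache, one replication queue."""

    def __init__(self, name, master=None):
        self.name = name
        self.master = master if master is not None else self
        self.db = {}
        self.cache = {}               # leader and follower caches, collapsed
        self.replication_queue = []   # updates from the master, not yet applied
        self.slaves = []
        if master is not None:
            master.slaves.append(self)

    def write(self, key, value):
        master = self.master
        master.db[key] = value        # only the master database is written
        master.cache[key] = value
        for slave in master.slaves:   # asynchronous: queued, not applied
            slave.replication_queue.append((key, value))
        self.cache[key] = value       # writer's own caches updated synchronously

    def read(self, key):
        if key in self.cache:
            return self.cache[key]
        value = self.db.get(key)      # miss: local, possibly stale, database
        if value is not None:
            self.cache[key] = value
        return value

    def apply_replication(self):
        """Apply queued updates from the master to the local database and cache."""
        for key, value in self.replication_queue:
            self.db[key] = value
            self.cache[key] = value
        self.replication_queue.clear()


master = Region("master")
slave_a = Region("slave-a", master)
slave_b = Region("slave-b", master)

master.write("status", "old")
slave_a.apply_replication()
slave_b.apply_replication()

slave_a.write("status", "new")                 # write issued in a slave region
read_by_writer = slave_a.read("status")        # the writer's region sees it
stale_db_value = slave_a.db["status"]          # its database is still behind
read_elsewhere = slave_b.read("status")        # another region may lag
slave_b.apply_replication()
read_after_sync = slave_b.read("status")       # eventually consistent
```

The writer's region returns the new value even though its own database still holds the old one, while the other slave region catches up only after replication is applied. That is the combination of "read-what-you-write" for the writer and eventual consistency for everyone else.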

