Tao: A Big Data Graph Database



Excerpt from "Big Data Day: Architecture and Algorithms" Chapter 14


14.1.2 Tao Graph Database

Facebook is the world's best-known social networking site. Viewed from a data-abstraction perspective, Facebook's social graph contains not only friend relationships but also relationships between people and entities, and between entities themselves. Every user, every page, every picture, every app, every location, and every comment can be treated as a separate entity. A user liking a page establishes a relationship between the user and the page; a user checking in at a location establishes a relationship between the user and the location; and so on. If each entity is treated as a node in a graph and each relationship between entities as a directed edge, all of Facebook's data forms a giant entity graph with an enormous number of edges. Some relationships in the entity graph are bidirectional, such as friendship; others are one-way, such as a user checking in at a location. Entities also have attributes of their own: for example, a user may have graduated from Stanford University and been born in 1988; these are attributes of the user entity. Figure 14-2 shows a fragment of the Facebook entity graph.


Figure 14-2 Facebook entity graph fragment (fbid is an entity's unique ID number within Facebook)

Facebook stores all entities, their attributes, and entity-relationship data in the Tao graph database, and the read and write requests behind its web pages are served by Tao. Tao is a cross-data-center distributed graph database offering eventual consistency, consisting of thousands of servers spread across multiple data centers. To respond to application requests in real time, the system architecture emphasizes high availability and low latency, especially for read operations, which keeps page generation efficient even under extremely heavy load.

Tao provides clients with a data-access API for graph operations, so clients can query not only entities and their attributes but also the various relationships between entities. For example, the relationship-query interface returns a relationship list of the form:

(id, atype) -> [a_new, ..., a_old]

Here id is the unique identifier of an entity, atype is the relationship type (friendship, etc.), and the relationship list contains, in reverse chronological order, the IDs of the other entities that id points to under a relationship of type atype. For example, (i, COMMENT) lists all the comments on entity i.
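The query interface above can be sketched as a small in-memory store. This is only an illustration of the (id, atype) -> [a_new, ..., a_old] shape described in the text; the class and method names are assumptions, not Tao's real API.

```python
from collections import defaultdict

class AssocStore:
    """Toy in-memory sketch of Tao-style relationship lists."""

    def __init__(self):
        # (source id, relationship type) -> list of (timestamp, target id),
        # kept newest-first, matching the [a_new, ..., a_old] ordering.
        self._assocs = defaultdict(list)

    def assoc_add(self, src, atype, dst, timestamp):
        """Record a directed edge src --atype--> dst."""
        self._assocs[(src, atype)].append((timestamp, dst))
        self._assocs[(src, atype)].sort(reverse=True)  # newest first

    def assoc_get(self, src, atype):
        """Return target IDs in reverse chronological order."""
        return [dst for _, dst in self._assocs[(src, atype)]]

store = AssocStore()
store.assoc_add(1001, "COMMENT", 2001, timestamp=10)
store.assoc_add(1001, "COMMENT", 2002, timestamp=20)
print(store.assoc_get(1001, "COMMENT"))  # newest comment first: [2002, 2001]
```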


1. The Overall Architecture of Tao

Tao is a near-real-time graph database spanning multiple data centers; its overall architecture is shown in Figure 14-3. Tao groups several geographically close data centers into a region, forming multiple regions, and the caches within each region hold all of the entity and relationship data. The original data is stored centrally in the database and cache of one master region, while the other, slave regions store replicas of the data (this is a rather distinctive design; readers are encouraged to pause here and consider its rationale before reading on).


This architecture was chosen for the following reasons. The cache layer is a crucial part of Tao and is essential for responding quickly to read requests, and the cache must reside in memory. If memory were cheap and plentiful enough, each data center would ideally hold a complete copy of the data, so that read requests could be answered locally and time-consuming cross-data-center reads avoided. However, given the sheer volume of data to be stored (petabytes), keeping a full replica in every data center would be prohibitively expensive. As a compromise, several geographically close data centers together store one complete replica; because these data centers are near each other, communication between them remains efficient. This strikes a tradeoff between cost and efficiency.

Each region stores the complete entity and relationship data, and Tao's storage architecture within a region is divided into three layers (see Figure 14-3). At the bottom is the MySQL database layer: because the data is so large, it is divided into many shards, each shard is stored in one logical relational database, and a single server can host multiple shards. The second layer is a cache layer that corresponds one-to-one with the underlying shards, called the leader cache layer; each leader cache is responsible for caching the contents of its corresponding logical database and handles all read and write communication with that database. The top layer is the follower cache layer: multiple follower caches sit in front of each leader cache and cache the leader cache's contents. Tao uses this two-level cache structure to reduce coupling between caches, which makes the whole system easier to scale: when load increases, the system can be expanded simply by adding follower cache servers.
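The three-layer mapping within a region can be sketched as follows. The shard count, hash scheme, and server names here are illustrative assumptions; the point is only that each entity id maps to one shard, each shard to one logical database and one leader cache, with several follower caches in front.

```python
NUM_SHARDS = 8  # real deployments use many thousands of shards

def shard_of(fbid: int) -> int:
    # Each entity id maps to exactly one shard (toy hash for illustration).
    return fbid % NUM_SHARDS

# One logical MySQL database and one leader cache per shard; several
# follower caches sit in front of each leader cache.
topology = {
    shard: {
        "db": f"mysql-{shard}",
        "leader": f"leader-cache-{shard}",
        "followers": [f"follower-{shard}-{i}" for i in range(2)],
    }
    for shard in range(NUM_SHARDS)
}

fbid = 1234
s = shard_of(fbid)
print(topology[s]["leader"])  # the leader cache responsible for this entity
```

Because clients and follower caches only need the shard function to locate the right leader, adding follower cache servers (scaling the top layer) requires no change to the database layer, which is the decoupling the text describes.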


2. Read and Write Operations in Tao

Client programs interact only with the outermost follower caches and cannot communicate directly with a leader cache (see Figure 14-4). When a client issues a data request, it connects to the nearest follower cache. If the request is a read and the follower cache holds the data, the result is returned directly. Since reads vastly outnumber writes in Internet applications, the follower caches alone can absorb most of the site's load.

If a read misses in the follower cache (a cache miss), the request is forwarded to the corresponding leader cache; if the leader cache also misses, the leader cache reads the data from the database and updates itself (the locations marked A and D in Figure 14-4 show this logic). A message is then sent to the requesting follower cache telling it to load the new data from the leader cache.
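The read path just described can be sketched as a simple fall-through across the three layers. The dicts and function below are illustrative stand-ins for the follower cache, leader cache, and database, not Tao's actual implementation.

```python
follower_cache = {}
leader_cache = {}
database = {"user:42": {"name": "alice"}}

def read(key):
    # 1. Clients only talk to a follower cache.
    if key in follower_cache:
        return follower_cache[key]      # follower hit: most requests end here
    # 2. Follower miss: forward to the leader cache for this shard.
    if key not in leader_cache:
        # 3. Leader miss: read from the database and update the leader cache.
        leader_cache[key] = database[key]
    # 4. The follower loads the new data from the leader cache.
    follower_cache[key] = leader_cache[key]
    return follower_cache[key]

print(read("user:42"))   # first call walks all three layers
print(read("user:42"))   # second call is answered by the follower cache
```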


For read operations, all regions, master or slave, follow the logic above; for client writes, however, the master region and the slave regions behave differently. In the master region, when a follower cache receives a write request, it forwards the request to the corresponding leader cache, which writes the data to the corresponding logical database; after the database write succeeds, the leader cache sends a message telling the follower caches to invalidate or reload the old data. In a slave region, when a follower cache receives a write request, it likewise forwards the request to its region's leader cache, but that leader cache does not write to the local database directly; instead, it forwards the request to the leader cache of the master region (the position marked C in Figure 14-4 illustrates this), which writes it to the master database.

In other words, every write, whether it originates in the master region or a slave region, must go through the master region's leader cache to update the master database. After the master database is updated successfully, it notifies the slave databases of the change via messages to maintain data consistency; it also notifies the slave region's leader cache of the change, which in turn triggers that leader cache to tell the slave region's follower caches to update their cached contents (see the location marked B in Figure 14-4).
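The write path above can be condensed into a short sketch: regardless of where a write originates, it lands at the master region's leader, which updates the master database synchronously; the originating region's caches are updated as part of the flow, while slave databases catch up later. All structures and names here are illustrative assumptions.

```python
master_db = {}
slave_db = {}                        # refreshed asynchronously (not modeled)
caches = {"master": {}, "slave": {}}

def write(region, key, value):
    # Follower caches forward writes to their region's leader; a slave-region
    # leader forwards on to the master region's leader.
    master_leader_write(key, value, origin=region)

def master_leader_write(key, value, origin):
    master_db[key] = value           # 1. synchronous write to the master DB
    caches["master"][key] = value    # 2. master region's caches updated
    if origin == "slave":
        caches["slave"][key] = value # 3. originating slave region's caches
                                     #    are also updated in the same flow
    # 4. slave databases receive the change asynchronously, via messages

write("slave", "user:42:name", "alice")
print(master_db["user:42:name"])        # the write landed in the master DB
print(caches["slave"]["user:42:name"])  # and in the originating region's cache
```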

Think: when a read misses, why does a slave region's leader cache read from the local database rather than forwarding the request to the master region, as writes do? The drawback of reading locally is obvious: it can produce inconsistent data, since the local database may be out of date. So what is the purpose or benefit of doing it this way?

Answer: because read cache misses are far more frequent than writes (roughly a 20-fold difference at Facebook), routing writes across regions has little effect on overall efficiency, but if the many read misses also had to cross regions, read performance would suffer noticeably. Tao sacrifices some data consistency to guarantee low latency for reads.


3. Data Consistency in Tao

To prioritize read efficiency, Tao sacrifices data consistency, adopting eventual consistency rather than strong consistency. When the master database notifies the slave databases of a data change, the notification is asynchronous rather than synchronous: the client's request returns without waiting for confirmation that the slave databases have finished updating. As a result there is a window during which the master and slave databases differ, and a client may read stale data from a slave region; but after a short delay the change is guaranteed to be reflected in all databases, so the system is eventually consistent.

More specifically, in most cases Tao guarantees "read-your-writes" consistency: the client that issued a write must be able to read back the updated value rather than stale data. This matters in many scenarios; for example, if a user deletes a friend but can still see that friend's posts in their feed, that is not tolerable.

How does Tao achieve this? First, if the write occurs in the master region, the write flow described above already guarantees read-your-writes consistency; the tricky case is a write issued by a client in a slave region. In that case the follower cache forwards the request to the slave region's leader cache, which forwards the write to the master region's leader cache, which writes it to the master database. After the write succeeds, the slave region's leader cache notifies the slave region's follower caches to update their cached values, and all of these steps complete synchronously. Although the slave region's database may not yet have received the update from the master database at this point, every cache level in the slave region has already been updated, so subsequent reads from that region will find the newly written value in the caches. By this means Tao guarantees read-your-writes consistency in slave regions.
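A toy illustration of this guarantee: after a slave-region write, the slave region's caches are updated synchronously, so a follow-up read hits the cache and sees the new value even while the slave database is still stale. The structures and names are illustrative assumptions, not Tao's implementation.

```python
master_db = {"bio": "old"}
slave_db = {"bio": "old"}        # replicates from master_db asynchronously
slave_cache = {"bio": "old"}     # stands in for all cache levels in the region

def slave_write(key, value):
    master_db[key] = value       # synchronous: forwarded to the master region
    slave_cache[key] = value     # synchronous: originating region's caches
    # slave_db is NOT updated here; replication arrives later, asynchronously

def slave_read(key):
    if key in slave_cache:
        return slave_cache[key]  # cache hit: the writer sees its own update
    return slave_db[key]         # cache miss: may return stale data

slave_write("bio", "new")
print(slave_read("bio"))   # "new" -- read-your-writes holds via the caches
print(slave_db["bio"])     # "old" -- the slave database is still catching up
```

This is exactly the tradeoff the section describes: the databases are only eventually consistent, but the synchronously updated caches hide the replication lag from the client that performed the write.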

