Author: Liu Xuhui (Raymond). Please credit the source when reprinting.
Email: colorant at 163.com
Blog: http://blog.csdn.net/colorant/
More paper reading notes: http://blog.csdn.net/column/details/cloudpaper.html
=
Target question=
TAO's goal is to build a data store that can efficiently generate precise, customized content from massive volumes of interrelated data in a large-scale, distributed social networking service such as Facebook. Its workload is characterized by global scale, massive and constantly changing data, and highly concurrent queries.
=
Core idea=
The data model of Facebook's social networking service is built on associations between objects. The data consists mainly of objects and associations. Objects include users, photos, posts, comments and individual check-ins. Associations are the relationships between objects, such as the friendship between two users or which post a comment belongs to. Every object has an ID that serves as its unique identifier, and an association is identified by the IDs of the two objects it connects together with its type.
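The object and association model can be sketched as a pair of record types plus a toy in-memory store. This is only an illustration: the class names, the field names (`otype`, `atype`, `time`), the newest-first ordering and the `assoc_range` helper are assumptions of this sketch, not TAO's actual API.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class TaoObject:
    id: int                      # unique across all objects
    otype: str                   # e.g. "user", "post", "comment"
    data: Dict[str, str] = field(default_factory=dict)


@dataclass
class Association:
    id1: int                     # source object ID
    atype: str                   # e.g. "friend", "has_comment"
    id2: int                     # destination object ID
    time: int = 0                # assumed timestamp, used only for ordering


class GraphStore:
    """Toy in-memory store (hypothetical, for illustration only)."""

    def __init__(self) -> None:
        self.objects: Dict[int, TaoObject] = {}
        # (id1, atype) -> associations, kept newest first
        self.assocs: Dict[Tuple[int, str], List[Association]] = {}

    def add_object(self, obj: TaoObject) -> None:
        self.objects[obj.id] = obj

    def add_assoc(self, assoc: Association) -> None:
        lst = self.assocs.setdefault((assoc.id1, assoc.atype), [])
        lst.append(assoc)
        lst.sort(key=lambda a: a.time, reverse=True)

    def assoc_range(self, id1: int, atype: str, limit: int) -> List[int]:
        """Return up to `limit` destination IDs, newest association first."""
        return [a.id2 for a in self.assocs.get((id1, atype), [])[:limit]]


store = GraphStore()
store.add_object(TaoObject(2, "post", {"text": "hello"}))
store.add_object(TaoObject(3, "comment", {"text": "nice"}))
store.add_object(TaoObject(4, "comment", {"text": "agreed"}))
store.add_assoc(Association(2, "has_comment", 3, time=10))
store.add_assoc(Association(2, "has_comment", 4, time=20))
latest = store.assoc_range(2, "has_comment", limit=1)
```

A query such as "the latest comments on post 2" then becomes a lookup on the pair `(2, "has_comment")`, which is the kind of read that dominates this workload.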
In this model the data items are heavily interlinked, which makes it hard to partition and process them in any simple way, and the final presentation to each user varies constantly. As a result, much of the work cannot be done when data is updated and has to be done at query time, so the workload is read-dominated.
Facebook's original architecture relied on application code to talk to MySQL and memcached servers directly in order to manage and cache data. The problems were that memcached cannot exploit the structure of the object-association model, that individual clients cannot plan and manage the cache globally, and that keeping the cache consistent after data updates is expensive.
TAO still uses MySQL as the underlying database. The data is divided into multiple shards by ID, and each MySQL server manages several shards. In TAO's cache layer, multiple cache servers form one tier, and a single tier holds everything needed to serve any TAO request. The client picks the cache server to talk to using a similar sharding scheme. The cache server handles the read and write requests and interacts with the MySQL database.
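The mapping from an ID to a shard, and from a shard to a database server and a cache server, might look like the sketch below. The shard count, the server names and the use of a plain modulo are all assumptions chosen for illustration; the source only says that data is sharded by ID and that clients use a similar scheme to pick a cache server.

```python
NUM_SHARDS = 8                                              # hypothetical
DB_SERVERS = ["db0", "db1"]                                 # hypothetical
CACHE_SERVERS = ["cache0", "cache1", "cache2", "cache3"]    # hypothetical


def shard_of(object_id: int) -> int:
    """Map an ID to a logical shard (modulo is an assumption of this sketch)."""
    return object_id % NUM_SHARDS


def db_server_for(shard: int) -> str:
    """Each database server manages several shards."""
    return DB_SERVERS[shard % len(DB_SERVERS)]


def cache_server_for(shard: int) -> str:
    """Within one tier, each cache server is responsible for some shards."""
    return CACHE_SERVERS[shard % len(CACHE_SERVERS)]


shard = shard_of(42)
route = (shard, db_server_for(shard), cache_server_for(shard))
```

Because the number of logical shards is larger than the number of servers, shards can be moved between servers without changing how an ID maps to its shard.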
=
Implementation=
To improve concurrent processing capacity, TAO's cache layer is actually made up of two levels of tiers: one leader tier and multiple follower tiers. A client talks to the nearest follower tier, and the follower tier forwards write requests to the leader tier. Read requests are served mainly by the follower tier and are passed on to the leader tier only when the data is missing from the follower's cache. As a result, followers never interact with the database directly.
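The read and write paths through the two cache levels can be sketched as follows. The class and method names are hypothetical, the calls are synchronous for simplicity, and the step where the leader invalidates the other followers after a write is an assumption of this sketch about how follower caches are kept up to date.

```python
from typing import Dict, List, Optional


class Database:
    def __init__(self) -> None:
        self.rows: Dict[int, str] = {}
        self.reads = 0                      # counts reads that reach the DB

    def get(self, key: int) -> Optional[str]:
        self.reads += 1
        return self.rows.get(key)

    def put(self, key: int, value: str) -> None:
        self.rows[key] = value


class Leader:
    """Only the leader talks to the database."""

    def __init__(self, db: Database) -> None:
        self.db = db
        self.cache: Dict[int, str] = {}
        self.followers: List["Follower"] = []

    def read(self, key: int) -> Optional[str]:
        if key not in self.cache:
            value = self.db.get(key)
            if value is None:
                return None
            self.cache[key] = value
        return self.cache[key]

    def write(self, key: int, value: str, origin: "Follower") -> None:
        self.db.put(key, value)
        self.cache[key] = value
        for follower in self.followers:     # assumed invalidation step
            if follower is not origin:
                follower.invalidate(key)


class Follower:
    """Serves reads from its own cache; forwards misses and writes."""

    def __init__(self, leader: Leader) -> None:
        self.leader = leader
        self.cache: Dict[int, str] = {}
        leader.followers.append(self)

    def read(self, key: int) -> Optional[str]:
        if key not in self.cache:           # cache miss: ask the leader
            value = self.leader.read(key)
            if value is None:
                return None
            self.cache[key] = value
        return self.cache[key]

    def write(self, key: int, value: str) -> None:
        self.leader.write(key, value, origin=self)
        self.cache[key] = value

    def invalidate(self, key: int) -> None:
        self.cache.pop(key, None)


db = Database()
leader = Leader(db)
f1, f2 = Follower(leader), Follower(leader)
f1.write(1, "a")
first = f2.read(1)      # miss in f2, served from the leader's cache
f1.write(1, "b")        # leader invalidates f2's cached copy
second = f2.read(1)
```

In this sketch a read that misses in a follower is usually absorbed by the leader's cache, so only leader misses reach MySQL.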
To support a global deployment and reduce the impact of wide-area network latency, TAO's database and cache layers are further divided into master and slave regions. Each region has the two-level tier structure described above. All write operations must go through the leader of the master region and are then replicated asynchronously to the databases of the slave regions. Read operations are completed locally within the slave region. If the local database has not yet been updated, a read may return stale data.
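The effect of asynchronous replication on reads can be illustrated with a minimal sketch. The `Region` and `ReplicatedShard` classes and the explicit `replicate()` step are hypothetical stand-ins for the real replication pipeline; they only show why a slave region can briefly serve outdated data.

```python
from collections import deque
from typing import Deque, Dict, List, Optional, Tuple


class Region:
    def __init__(self, name: str) -> None:
        self.name = name
        self.db: Dict[str, str] = {}        # this region's copy of the data


class ReplicatedShard:
    def __init__(self, master: Region, slaves: List[Region]) -> None:
        self.master = master
        self.slaves = slaves
        # Updates committed on the master but not yet applied to the slaves.
        self.pending: Deque[Tuple[str, str]] = deque()

    def write(self, key: str, value: str) -> None:
        """All writes are applied in the master region first."""
        self.master.db[key] = value
        self.pending.append((key, value))

    def read(self, region: Region, key: str) -> Optional[str]:
        """Reads are served from the local region's copy."""
        return region.db.get(key)

    def replicate(self) -> None:
        """Apply pending updates to every slave, in commit order."""
        while self.pending:
            key, value = self.pending.popleft()
            for slave in self.slaves:
                slave.db[key] = value


master, slave = Region("master"), Region("slave")
shard = ReplicatedShard(master, [slave])
shard.write("k", "v1")
shard.replicate()
shard.write("k", "v2")
stale = shard.read(slave, "k")      # replication has not caught up yet
shard.replicate()
fresh = shard.read(slave, "k")
```

Between the second write and the second `replicate()` call, the slave region still returns the old value, which is the stale read described above.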
The master/slave division is made per shard, so different shards may have their masters in different regions. Because an update to an association may involve more than one shard, all masters tend to be placed in the same region in order to reduce communication overhead.
Note that each region must hold a complete copy of the data. Because the data volume is so large, a single region may itself consist of several data centers that are located close to one another.
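A small sketch shows why placing the masters together helps. The shard numbers and the region names (`us-west`, `eu`) are hypothetical; the function simply counts how many master regions one multi-shard update has to contact.

```python
from typing import Dict, Iterable, Set


def master_regions_touched(shard_master: Dict[int, str],
                           shards: Iterable[int]) -> Set[str]:
    """Return the set of master regions an update must contact."""
    return {shard_master[s] for s in shards}


# Hypothetical placements: masters spread out versus co-located.
spread = {0: "us-west", 1: "eu"}
colocated = {0: "us-west", 1: "us-west"}

# An association update touching shard 0 (source) and shard 1 (destination).
update_shards = [0, 1]
spread_regions = master_regions_touched(spread, update_shards)
colocated_regions = master_regions_touched(colocated, update_shards)
```

With the co-located placement the update involves a single master region, so no cross-region round trip is needed between the two shard masters.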
=
Related research and projects=
Because of the cache layer, the ACID guarantees provided by the RDBMS (MySQL here) are weakened to some extent. According to the CAP theorem, this is an unavoidable problem for large-scale distributed databases, which usually relax their consistency requirements. In TAO, however, consistency is not sacrificed purely to achieve availability and partition tolerance. A large part of the reason is to reduce latency, as discussed here: http://dbmsmusings.blogspot.com/2010/04/problems-with-cap-and-yahoos-little.html
Other globally distributed databases, such as Google's Megastore and Spanner, ensure read/write consistency through mechanisms such as Paxos, GPS and atomic clocks. As described above, TAO settles for eventual consistency, with the attendant problems such as reading stale data. The overall multi-layer architecture also feels somewhat pieced together, but in general these compromises were made to maximize the throughput of massive concurrent requests.