As Sina Weibo's engineers have mentioned on several public occasions, Weibo currently runs and maintains one of the world's largest Redis clusters, with a single business line using more than 10 TB of memory. This is the story of one of those businesses: the Weibo relationship service.
The Wind Rises
When Weibo first launched in 2009, the relationship service used the most traditional memcache + MySQL solution. MySQL was sharded into databases and tables by a hash of the UID, and the table structure was very simple, essentially just (fromuid, touid, addtime).
Business callers issue two types of queries:
- Query a user's following list: SELECT touid FROM table WHERE fromuid = ? ORDER BY addtime DESC
- Query a user's follower list: SELECT fromuid FROM table WHERE touid = ? ORDER BY addtime DESC
These two query patterns conflict with the UID-hash sharding design, so the data has to be stored redundantly: one copy sharded by fromuid, and another copy sharded by touid. The memcache key is fromuid.suffix, with different suffixes distinguishing the following list from the follower list; the cache value is a PHP-serialized array. Later, to optimize performance, the value was replaced with a hand-assembled byte array.
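A minimal sketch of that dual write, assuming modulo sharding, a hypothetical shard count of 16, and hypothetical table names; the article only says each relationship is stored twice, once keyed by fromuid and once by touid:

```python
SHARD_COUNT = 16  # hypothetical; the article does not state the MySQL shard count

def shard_for(uid: int) -> int:
    """Route a UID to a shard; the article only says 'hash by UID'."""
    return uid % SHARD_COUNT

def follow(cursors, uid: int, target_uid: int, addtime: int) -> None:
    """Dual-write so each query pattern stays within a single shard.

    `cursors` is a list of DB-API cursors, one per shard (hypothetical setup).
    """
    # Copy 1: sharded by fromuid, serves "whom does uid follow?"
    cursors[shard_for(uid)].execute(
        "INSERT INTO following (fromuid, touid, addtime) VALUES (%s, %s, %s)",
        (uid, target_uid, addtime))
    # Copy 2: sharded by touid, serves "who follows target_uid?"
    cursors[shard_for(target_uid)].execute(
        "INSERT INTO follower (fromuid, touid, addtime) VALUES (%s, %s, %s)",
        (uid, target_uid, addtime))
```

The redundancy is the price of keeping each ORDER BY addtime query on a single shard instead of fanning out across all of them.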
Surging Clouds
During Weibo's platform overhaul in 2011, the business raised a new requirement: a "judge the relationship between two users" step was added to the core interfaces, along with the new concept of "mutual follow". A pair of users can therefore be in one of four states: following, follower, mutual follow, or no relationship. To implement this efficiently, the platform introduced Redis to store relationships, using Redis hashes: the key is still uid.suffix, with different suffixes for the following list, the follower list, and the mutual-follow list; the value is a hash whose field is touid and whose value is addtime. The ORDER BY addtime behavior is implemented by sorting inside the service. Some big-V follower lists can be very long, so after negotiating with the product team, storage was capped at the "latest 5,000 followers".
(Figure: Redis structure for Weibo relationship storage)
The requirements map onto Redis commands as follows (a sketch in code follows this list):
- Query a user's following list: HGETALL uid.following, then sort
- Query a user's follower list: HGETALL uid.follower, then sort
- Query a user's mutual-follow list: HGETALL uid.bifollow, then sort
- Judge the relationship between two users: HGET uidA.following uidB && HGET uidB.following uidA
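A minimal sketch of these lookups with redis-py, assuming the uid.suffix key naming from above (the function names are hypothetical):

```python
import redis

r = redis.Redis()  # assumes one local Redis; in production the client shards by UID

def following_list(uid: int) -> list[int]:
    """HGETALL the following hash, then sort by addtime (newest first) in the service."""
    h = r.hgetall(f"{uid}.following")  # {touid: addtime}, both as bytes
    return [int(touid) for touid, addtime in
            sorted(h.items(), key=lambda kv: int(kv[1]), reverse=True)]

def relation(uida: int, uidb: int) -> str:
    """Classify the pair into one of the four states with two HGETs."""
    a_follows_b = r.hget(f"{uida}.following", uidb) is not None
    b_follows_a = r.hget(f"{uidb}.following", uida) is not None
    if a_follows_b and b_follows_a:
        return "mutual"
    if a_follows_b:
        return "following"
    if b_follows_a:
        return "follower"
    return "none"
```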
Later, a few more complex requirements were added: "the people both he and I follow", "who among the people I follow also follows him", and so on.
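For "the people both he and I follow", one straightforward approach (a sketch; the article does not say how the platform actually implemented it) is to intersect the field sets of the two following hashes:

```python
import redis

r = redis.Redis()

def common_following(uida: int, uidb: int) -> set[int]:
    """People both uida and uidb follow: intersect the two following hashes."""
    a = set(r.hkeys(f"{uida}.following"))
    b = set(r.hkeys(f"{uidb}.following"))
    return {int(u) for u in a & b}
```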
In its first few years with Redis, the platform stepped into quite a few pits. For example:
1. Operations tooling and processes had to be built from scratch, and operational maturity could not keep up with business growth: before we got around to the planned performance-tuning work, the file-descriptor count had already hit the default limit, and in the end we had to restart the Redis cluster during the early-morning traffic trough just to apply a new ulimit setting.
2. The platform started on Redis 2.0, and because the Redis code is simple enough, we began customizing it almost as soon as we brought it into Weibo: master-slave replication, disk-write rate limiting, and memory management were all customized. As a result, at one point there were more than five different Redis revisions running in Weibo production, which caused great trouble for operations, bug fixes, and upgrades. Things only slowly improved after Tanaka (@Fruit Dad) added a no-downtime upgrade capability to the internal Redis build.
3. One business used a non-default Redis DB, and migrating it off later took a great deal of effort.
4. Another business needed to flush its DB periodically to make room for the latest data. To avoid affecting online traffic during the flush phase, we made many changes on both the client and the server.
5. Every year before the holidays, the platform runs online failure drills and fault simulations (in 2013 we even built a disaster-tolerance load-testing system named Touchstone). Before the 2011 National Day holiday, we used iptables to drop all packets on the Redis port, and the client side waited a full 120 seconds before returning. So we stayed up late before the holiday adding timeout detection to the client, though it did not actually go live until after the holiday (a client-timeout sketch follows this list).
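A client-side timeout is the standard guard against this failure mode; with redis-py it is a one-line setting (a sketch with hypothetical host and values; the actual Weibo client was custom):

```python
import redis

# Without a socket timeout, a silently dropped connection (e.g. an iptables DROP)
# blocks the caller until the TCP stack gives up, which can take minutes.
r = redis.Redis(
    host="redis.example.internal",   # hypothetical host
    socket_connect_timeout=0.05,     # fail fast if the port is unreachable
    socket_timeout=0.2,              # bound every read/write at 200 ms
)
```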
Breaking the Cocoon
The biggest challenge for the Weibo relationship service has been the rapid growth in both data volume and traffic, which caused our Redis solution a series of troubles:
The first trouble: on large hashes, HGETALL has a high probability of showing up as a slow request. We tuned hash-max-zip-size (the threshold below which Redis uses its compact encoding for hashes), saving about a third of the memory, but the business's overall performance improved only marginally. In the end, we had to put a layer of memcache in front of Redis to absorb the HGETALL read problem.
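A sketch of the resulting read path, with memcache absorbing the expensive HGETALL and Redis behind it (the client wiring, key naming, and TTL are assumptions):

```python
import json
import redis
from pymemcache.client.base import Client as Memcache

r = redis.Redis()
mc = Memcache(("127.0.0.1", 11211))
TTL = 300  # hypothetical cache lifetime, in seconds

def following_list(uid: int) -> list[int]:
    """Read-through: serve the sorted list from memcache, fall back to HGETALL."""
    key = f"{uid}.following.sorted"
    cached = mc.get(key)
    if cached is not None:
        return json.loads(cached)
    h = r.hgetall(f"{uid}.following")  # the expensive call we want to avoid
    uids = [int(t) for t, _ in
            sorted(h.items(), key=lambda kv: int(kv[1]), reverse=True)]
    mc.set(key, json.dumps(uids), expire=TTL)
    return uids
```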
The second trouble was the new requirement "who among the people I follow also follows him". Because a user's follower list may be incomplete (only the latest 5,000 are kept), we could not compute the answer by intersecting a following list with a follower list; we could only fall back to the steps the requirement literally describes: take my following list, then check, for each person on it, whether they follow the target, with the client issuing the checks in parallel. Fortunately, single-relationship lookups in Redis are very fast.
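A sketch of that degraded plan, using a redis-py pipeline to batch the per-candidate membership checks into one round trip (in a real sharded deployment the checks would instead fan out to many ports in parallel):

```python
import redis

r = redis.Redis()

def my_following_who_follow(my_uid: int, target_uid: int) -> list[int]:
    """Who among the people I follow also follows target_uid?"""
    candidates = [int(u) for u in r.hkeys(f"{my_uid}.following")]
    pipe = r.pipeline(transaction=False)  # batch the HGETs in one round trip
    for uid in candidates:
        pipe.hget(f"{uid}.following", target_uid)
    results = pipe.execute()
    return [uid for uid, hit in zip(candidates, results) if hit is not None]
```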
The third trouble, and the biggest, is capacity growth. The original design hashed UIDs across 16 ports, with two ports deployed per 64 GB machine and one full set per business IDC. Later, only one port was deployed per machine. Then 128 GB machines had not yet made it into the company's procurement catalog while the 64 GB machines were about to OOM, so we had to expand from 16 ports to 64 ports, again with two ports per 64 GB machine. Later, again only one port per machine. Later, we upgraded to 128 GB machines. Then the 128 GB machines started to OOM! Now what?
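For context, a sketch of why such an expansion is painful: under simple modulo routing (an assumption; the article only says "hash by UID"), quadrupling the port count remaps three quarters of the keys:

```python
PORTS_OLD, PORTS_NEW = 16, 64  # the 16 -> 64 expansion from the article

def port_for(uid: int, n_ports: int) -> int:
    # Hypothetical modulo routing; any hash has a similar remapping cost.
    return uid % n_ports

sample = range(1_000_000)
moved = sum(port_for(u, PORTS_OLD) != port_for(u, PORTS_NEW) for u in sample)
print(moved / len(sample))  # 0.75: three quarters of the data must migrate
```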
Becoming the Butterfly
To solve the capacity problem at its root, we went looking for a fundamentally different solution. We had originally chosen Redis as storage because the data requested by the relationship-judging function has no concentrated hotspot and a pronounced long tail: a cache miss could hurt core-interface latency, and to guarantee an acceptable hit rate, the memory cost of a cache would not be much lower than that of full storage. But three years of Weibo's evolution changed the numbers behind those assumptions: as the user base grew, the absolute number of cold users kept climbing; Redis as storage must keep RDB and AOF enabled for data reliability, which leaves the business only about half of each machine's memory; and Redis's hash storage is memory-inefficient, especially compared with our heavily optimized internal rediscounter. Adding all of this up, the final direction was to demote Redis here from a storage role back to a cache role.
The relationship service's current business scenarios, described earlier, boil down to two categories: fetching a list, and judging, in bulk, whether elements are present in a set. Even in the Redis-as-storage era, list fetches relied on the memcache layer in front; under the cache scheme, lists are handled entirely by memcache. For batch membership tests, the Redis hash is still the best-fitting data structure, but two problems remain. First, on a cache miss, rebuilding the cache after fetching the data from the DB is slow: for a Weibo member following 3,000 people, setting the cache can occasionally take around 10 ms, which is fatal for single-threaded Redis, since that port can serve nothing else during those 10 ms. Second, the memory efficiency of the Redis hash is too low, so reaching the target hit rate would require too large a cache. So we reached once more for our "customize Redis" trick: replace the Redis hash with a fixed-length open-addressing hash array, which from Redis's point of view is just a byte array, so rebuilding the cache takes a single Redis SET. With a carefully chosen hash function and array fill factor, batch membership performance is on par with the native Redis hash.
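A minimal sketch of the idea, assuming 8-byte UID slots, linear probing, and 0 as the empty marker; the article gives none of these details:

```python
import struct

SLOT = 8   # bytes per slot: one 64-bit UID (assumption)
EMPTY = 0  # assumes 0 is never a real UID

def build_table(uids: list[int], fill: float = 0.5) -> bytes:
    """Pack UIDs into a fixed-length open-addressing array (linear probing)."""
    n = max(1, int(len(uids) / fill))
    slots = [EMPTY] * n
    for uid in uids:
        i = uid % n
        while slots[i] != EMPTY:  # linear probe to the next free slot
            i = (i + 1) % n
        slots[i] = uid
    return struct.pack(f"<{n}Q", *slots)

def contains(table: bytes, uid: int) -> bool:
    """Membership test by probing the byte array directly; no deserialization."""
    n = len(table) // SLOT
    i = uid % n
    while True:
        (slot,) = struct.unpack_from("<Q", table, i * SLOT)
        if slot == uid:
            return True
        if slot == EMPTY:
            return False
        i = (i + 1) % n

# Rebuilding the cache after a miss is now one SET of an opaque byte array, e.g.
#   r.set(f"{uid}.following.table", build_table(following_uids))
# and the customized Redis answers batch membership queries against it server-side.
```

The fill factor trades memory for probe length: a half-full array keeps probes short, and the whole structure is written in one cheap SET instead of thousands of HSET field insertions.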
By converting the relationship service's Redis from storage to cache, we cut its Redis memory footprint by an order of magnitude. We may have lost the title of "largest single-business Redis cluster", but we are more fulfilled, and happier, than before.