1. Problems with the MySQL + memcached architecture
MySQL is well suited to massive data storage, and loading hot data into memcached speeds up access. Many companies have used this architecture, but as business data volumes and traffic kept growing, we ran into a number of problems:
1. MySQL has to be sharded again and again, and memcached must be scaled out to keep pace; this expansion and maintenance work eats up a great deal of development time.
2. Keeping memcached consistent with the MySQL database is hard.
3. When the memcached hit rate drops or a node goes down, a flood of requests penetrates straight through to the database, which MySQL cannot sustain.
4. Synchronizing the cache across data centers is difficult.
NoSQL products are blossoming: how do we choose?
In recent years a great many NoSQL products have sprung up. How to use them correctly and get the most out of their strengths is a question that deserves careful study. The key is to understand each product's positioning and its trade-offs, so that in practice we lean on its strengths and avoid its weaknesses. Broadly, these NoSQL products are designed to solve the following kinds of problems:
1. Small data volumes with high-speed reads and writes. These products keep all data in memory to guarantee fast access, while also providing a way to persist the data to disk. This is in fact the primary use case for Redis.
2. Massive data storage with distributed-system support, data-consistency guarantees, and convenient addition and removal of cluster nodes. The most representative designs here are the ideas laid out in the Dynamo and Bigtable papers. The former is a fully decentralized design in which nodes spread cluster information via gossip and data is kept eventually consistent; the latter is a centralized design that relies on a distributed-lock-like service to ensure strong consistency. Writes go first to memory and a redo log, and periodic compactions then merge them to disk, turning random writes into sequential ones to improve write performance.
3. Schema-free storage, auto-sharding, and so on. For example, some popular document databases such as MongoDB are schema-free, store JSON-format data directly, and support auto-sharding.
Facing these different kinds of NoSQL products, we need to pick the one best suited to our business scenario.
Redis is best suited to scenarios where all data fits in memory. Although Redis also offers persistence, it is really more of a disk-backed feature than persistence in the traditional sense. You might then wonder: Redis looks like a souped-up memcached, so when should we use memcached, and when Redis?
If you simply compare Redis and memcached, most people arrive at the following view:
1. Redis supports not just simple key/value data but also list, set, zset (sorted set), hash, and other data structures.
2. Redis supports data backup, i.e. master-slave replication.
3. Redis supports persistence: in-memory data can be kept on disk and reloaded after a restart.
2. Common Redis data types
The most commonly used data types for Redis include the following:
String
Hash
List
Set
Sorted Set
Pub/sub
Transactions
Before describing each of these types, let's look at a diagram of how the different data types are represented in Redis's internal memory management:
Internally, Redis uses a redisObject structure to represent every key and value; the most important fields of redisObject are shown above:
type records the specific data type of a value object, while encoding records how that type is stored inside Redis.
For example, type=string means the value stored is an ordinary string, and its encoding can be raw or int. If it is int, Redis is internally storing and representing the string as a number, which of course requires that the string itself be numeric, such as "123" or "456".
The vm field deserves special mention: memory is actually allocated for it only when Redis's virtual memory feature is turned on, and that feature is off by default; it is described later. The figure above also shows that representing every key/value with a redisObject wastes some memory. This management overhead mainly exists to give the different Redis data types a unified management interface, and the author also provides several ways to help us save memory where possible, which we will discuss later.
3. Application and implementation of various data types
Let's analyze the usage and internal implementation of these seven types in turn:
String:
The string type is a simple key-value type; a value can be not only a string but also a number.
Common commands: SET, GET, DECR, INCR, MGET, and so on.
Scenario: string is the most commonly used data type, and ordinary key/value storage falls into this class. It can fully cover what memcached does today, with higher efficiency, and you also get Redis's periodic snapshots, operation log, and replication for free. Besides GET, SET, INCR, and DECR operations like memcached's, Redis also provides the following:
Get a string's length (STRLEN)
Append content to a string (APPEND)
Set and get a substring of a string (SETRANGE/GETRANGE)
Set and get a single bit of a string (SETBIT/GETBIT)
Set the contents of a batch of strings at once (MSET/MGET)
Implementation: internally Redis stores a string as, by default, a string referenced via redisObject. When it encounters operations such as INCR and DECR, it converts the value for numeric computation; at that point the redisObject's encoding field is int.
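The INCR behavior described above can be sketched in plain Python (illustration only, not a Redis client; real Redis keeps the value in a redisObject and switches its encoding between raw and int):

```python
# Minimal model of Redis string + INCR semantics: values are stored as
# strings, and INCR parses the string as an integer before incrementing.
store = {}

def set_(key, value):
    store[key] = str(value)

def incr(key, by=1):
    # Like Redis INCR: a missing key counts as 0; non-numeric strings fail.
    value = int(store.get(key, "0")) + by
    store[key] = str(value)
    return value

set_("pageviews", 41)
incr("pageviews")        # the stored value is now "42"
```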
Hash
Common commands: HGET, HSET, HGETALL, and so on.
Application scenario: with memcached, we often pack structured information (a user's nickname, age, gender, points, and so on) into a hash map, serialize it on the client, and store it as a string. Whenever one field needs to change, we typically have to fetch the whole value, deserialize it, modify the field, then serialize it and store it back. This not only adds overhead but also fits poorly with concurrent modification (for example, two concurrent operations both updating the points field). Redis's hash structure lets you modify just one attribute value, much like updating a single column in a database.
Let's take a simple example of a hash use case: suppose we want to store a user-information object with the following data. The user ID is the lookup key, and the stored value contains the user's name, age, birthday, and so on. With an ordinary key/value structure there are two main ways to store it:
The first is to use the user ID as the key and encapsulate the other information as a serialized object. Its drawbacks are the serialization/deserialization cost, the need to fetch the entire object to modify one field, and the concurrency protection the modification then requires, introducing complications such as CAS.
The second is to store as many key-value pairs as the user object has members, fetching each attribute value by a key of the form user ID + attribute name. This avoids the serialization cost and the concurrency problems, but the user ID is stored repeatedly, and with a large amount of such data the memory waste is still considerable.
Redis's hash solves this problem well: the value of a Redis hash is internally a hash map, and Redis provides an interface for accessing the members of that map directly, as the figure below shows.
In other words, the key is still the user ID, the value is a map whose keys are attribute names and whose values are attribute values. Data can then be modified and read directly through the key of the internal map (Redis calls these internal map keys "fields"): key (user ID) + field (attribute name) addresses the corresponding attribute data. There is no duplicated storage, and no serialization or concurrent-modification-control problems. It solves the problem very neatly.
Note, at the same time, that Redis provides an interface (HGETALL) that fetches all of the attribute data at once. If the internal map has many members, this traverses the whole map, which can be time-consuming; because of Redis's single-threaded model, other clients' requests get no response at all while it runs, so this needs extra attention.
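The serialized-blob-vs-hash trade-off can be sketched in plain Python (a model of the semantics, not a Redis client; the key names are hypothetical):

```python
import json

# Style 1: the whole user object serialized under one key, as with
# memcached. Changing one field means a full round trip through JSON.
kv = {"user:1": json.dumps({"name": "tom", "age": 30})}

def update_age_blob(uid, age):
    obj = json.loads(kv["user:%d" % uid])   # deserialize everything
    obj["age"] = age
    kv["user:%d" % uid] = json.dumps(obj)   # reserialize and write back

# Style 2: a hash, where HSET touches exactly one field.
hashes = {"user:1": {"name": "tom", "age": 30}}

def hset(key, field, value):
    hashes[key][field] = value

update_age_blob(1, 31)
hset("user:1", "age", 31)
```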
Implementation mode:
As noted above, the value of a Redis hash is internally a hash map, but there are actually two implementations. When the hash has relatively few members, Redis compacts it into something like a one-dimensional array to save memory rather than using a true hash-map structure; the encoding of the corresponding value's redisObject is then zipmap. When the number of members grows, it is automatically converted into a real hash map, at which point the encoding becomes ht.
List
Common commands: LPUSH, RPUSH, LPOP, RPOP, LRANGE, and so on.
Application Scenario:
Redis lists have many applications and are one of the most important Redis data structures. Twitter's follow list and fan list, for example, can both be implemented with the Redis list structure.
A list is a linked list, as anyone with a little data-structures background will recognize. With lists we can easily implement features such as "latest news" rankings. Another application of lists is message queues:
a producer can use the list's push operation to put tasks on the list, and worker threads then use the pop operation to take tasks off for execution. Redis also provides an API for operating on a section of a list, so you can directly query or remove the elements of a list segment.
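The push/pop queue pattern above can be sketched in plain Python (a model of the list semantics, not a Redis client):

```python
from collections import deque

# Producer pushes on the left (LPUSH), worker pops on the right (RPOP),
# giving first-in first-out order.
tasks = deque()

def lpush(q, item):
    q.appendleft(item)

def rpop(q):
    return q.pop() if q else None

lpush(tasks, "job1")
lpush(tasks, "job2")
first = rpop(tasks)   # "job1": the first job pushed is the first consumed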
Implementation mode:
A Redis list is implemented as a doubly linked list, which supports reverse lookup and traversal and makes operations more convenient, at the cost of some extra memory overhead. Many internal Redis mechanisms, including the send-buffer queue, also use this data structure.
Set
Common commands:
SADD, SPOP, SMEMBERS, SUNION, and so on.
Application Scenario:
A Redis set offers functionality much like a list's, with the special property that it deduplicates automatically. When you need to store a list of data without duplicates, a set is a good choice, and sets provide an important interface that lists do not: testing whether a member belongs to the set.
A set is a collection of distinct values. With Redis's set data structure you can store certain aggregate data; in a microblogging application, for example, all of a user's followees can go in one set and all of their fans in another. Redis also provides intersection, union, and difference operations on sets, which make features like mutual follows, shared interests, and second-degree friends very convenient to build. For all of these set operations you can also choose, via different commands, whether to return the result to the client or save it into a new set.
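The intersection idea maps directly onto Python's built-in sets (a sketch of what SINTER computes server-side; the user names are made up):

```python
# Each user's followees stored as a set; mutual follows are just the
# intersection, as SINTER alice bob would compute inside Redis.
followees = {
    "alice": {"bob", "carol", "dave"},
    "bob":   {"alice", "carol", "eve"},
}

common = followees["alice"] & followees["bob"]   # "friends in common"
everyone = followees["alice"] | followees["bob"] # SUNION equivalent
```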
Implementation mode:
A set is implemented internally as a HashMap whose values are always null; in other words, it uses hashing to deduplicate quickly, which is also why a set can offer a way to test whether a member is present.
Sorted Set
Common commands:
ZADD, ZRANGE, ZREM, ZCARD, etc.
Usage scenarios:
A Redis sorted set is used much like a set, except that where a set is unordered, a sorted set lets the user supply an extra priority parameter, the score, to order the members; insertion is ordered, i.e. automatically sorted. When you need an ordered, duplicate-free collection, choose the sorted set data structure. Twitter's public timeline, for instance, can be stored with publication time as the score, so it stays sorted by time automatically.
You can also use sorted sets to build a weighted queue: ordinary messages get a score of 1 and important messages a score of 2, and workers can then fetch tasks in descending score order, handling important tasks first.
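The weighted-queue idea can be sketched in plain Python (a model of sorted-set semantics; in Redis this would be ZADD on insert and ZREVRANGE + ZREM on consume):

```python
# member -> score, consumed highest score first, like a priority queue.
zset = {}

def zadd(member, score):
    zset[member] = score

def pop_highest():
    member = max(zset, key=zset.get)   # ZREVRANGE zset 0 0
    del zset[member]                   # ZREM zset <member>
    return member

zadd("routine report", 1)   # ordinary message
zadd("server down!", 2)     # important message
urgent = pop_highest()      # the important task is handled first
```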
Implementation mode:
Internally, a Redis sorted set uses a HashMap plus a skip list to keep the data stored and ordered: the HashMap maps members to their scores, while the skip list stores all the members, sorted by the scores held in the HashMap. The skip-list structure gives relatively high search efficiency and is comparatively simple to implement.
Pub/sub
Pub/Sub is, literally, publish and subscribe. In Redis you can publish messages to a key (a channel) and subscribe to messages on it; when a message is published on that channel, every client subscribed to it receives the message. The most obvious use of this feature is as a real-time messaging system, for things like ordinary instant chat and group chat.
Transactions
Who says NoSQL doesn't support transactions? Redis transactions do not provide strict ACID semantics: if the server goes down while a batch of commands submitted with EXEC is executing, part of the batch will have run and the rest will not. They do, however, provide basic batched execution: as long as the server does not fail, a series of commands is guaranteed to execute in order with no other client's commands interleaved. Redis also provides a WATCH facility: you can WATCH a key and then start a transaction, and if the watched value is modified in the meantime, the transaction will detect this and refuse to execute.
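The WATCH-style optimistic check can be sketched in plain Python (illustration only, not the Redis protocol): the queued commands are applied only if the watched value is unchanged since it was read.

```python
store = {"balance": 100}

def exec_if_unchanged(key, seen_value, commands):
    # Like EXEC after WATCH: abort (return None) if the watched key
    # changed since the client read it; otherwise run the queued commands.
    if store[key] != seen_value:
        return None
    for fn in commands:
        fn()
    return "OK"

seen = store["balance"]                       # WATCH balance, then read it
result = exec_if_unchanged(
    "balance", seen,
    [lambda: store.update(balance=seen - 10)] # queued: debit 10
)
```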
4. Redis real-world application scenarios
Redis differs from other database solutions in many ways: it uses memory as its primary store and the hard disk for persistence; its data model is distinctive; and it is single-threaded. Another big difference is that you can use Redis's features in your environment without having to switch to Redis wholesale.
Switching to Redis entirely is certainly viable, and many developers use Redis as their preferred database from the start. But if your environment is already built and your application is already running on it, replacing the database layer is obviously not easy. Moreover, for applications that need a large dataset, Redis is unsuitable, because its dataset cannot exceed the memory available to the system. So if you have a big-data application with a mostly-read access pattern, Redis is not the right choice.
What I like about Redis, though, is that you can fold it into an existing system and solve a lot of problems with it, such as tasks your existing database handles too slowly. You can use Redis for optimization, or to build new features for the application. In this article I want to explore how to add Redis to an existing environment and use its primitive commands to solve some common problems that are awkward in a traditional environment. In these examples, Redis is not the preferred, primary database.
1. Showing the latest list of items
The statement below is commonly used to display the latest items; as data accumulates, the query will no doubt grow slower and slower.
SELECT * FROM foo WHERE ... ORDER BY time DESC LIMIT 10
In web applications, queries like "list the latest replies" are very common, and they often bring scalability problems. It is frustrating: the items were created in exactly this order, yet we have to sort just to output them in that order.
A similar problem can be solved with Redis. For example, suppose one of our web applications wants to list the 20 most recent comments posted by users, with a "show all" link beside them that fetches more comments when clicked.
We assume every comment in the database has a unique, incrementing ID field.
We can paginate the home page and the comment pages using this Redis pattern: every time a new comment is published, we push its ID onto a Redis list:
LPUSH latest.comments <comment id>
We then trim the list to a fixed length, so Redis only keeps the latest 5,000 comment IDs:
LTRIM latest.comments 0 5000
Every time we need a range of the latest comments, we call a function (in pseudocode):
FUNCTION get_latest_comments(start, num_items):
    id_list = redis.lrange("latest.comments", start, start + num_items - 1)
    IF id_list.length < num_items
        id_list = SQL_DB("SELECT ... ORDER BY time DESC LIMIT ...")
    END
    RETURN id_list
END
What we are doing here is simple. The latest IDs live permanently in Redis as a cache that is updated continuously, but we cap them at 5,000, so our get-IDs function always asks Redis first and only needs to hit the database when the start/count parameters fall outside that range.
Our system never "refreshes" this cache the way a traditional cache is refreshed: the information in the Redis instance is always consistent. The SQL database (or whatever on-disk database you use) is hit only when the user asks for data "very far back"; the home page and the first comment pages never bother the on-disk database at all.
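The pseudocode above can be sketched as runnable Python (illustrative only: `cached_ids` stands in for the Redis list and `sql_fallback` for the database query; both are made-up stubs, not a real client):

```python
CACHE_LIMIT = 5000
# Stand-in for the Redis list: the newest 5,000 comment IDs, newest first.
cached_ids = list(range(10000, 10000 - CACHE_LIMIT, -1))

def sql_fallback(start, num_items):
    # Stand-in for: SELECT id FROM comments ORDER BY time DESC LIMIT ...
    return list(range(10000 - start, 10000 - start - num_items, -1))

def get_latest_comments(start, num_items):
    id_list = cached_ids[start:start + num_items]   # LRANGE equivalent
    if len(id_list) < num_items:                    # past the cached range
        id_list = sql_fallback(start, num_items)    # only then hit the DB
    return id_list
```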
2. Deletion and filtering
We can use LREM to delete a comment. If deletions are very rare, another option is to skip the missing entry and simply report that the comment no longer exists.
Sometimes you want to attach different filters to different lists. If the number of filters is limited, you can simply keep a separate Redis list per filter: each list holds only 5,000 items, and Redis can handle millions of items in very little memory.
3. Leaderboards and related problems
Another common requirement that databases whose data does not live in memory handle poorly is ordering by score with real-time updates, that is, rankings that may need to change nearly every second.
The typical example is a leaderboard for an online game, say a Facebook game, where, by score, you usually want to:
- list the top 100 highest-scoring players
- show a given user's current global rank
These operations are a piece of cake for Redis, even with millions of users producing millions of new scores every minute.
The pattern is this: each time we receive a new score, we run:
ZADD leaderboard <score> <username>
You might use a user ID instead of a username, depending on your design.
Getting the top 100 high scorers is trivial: ZREVRANGE leaderboard 0 99.
A user's global rank is just as simple: ZRANK leaderboard <username>.
4. Sorting by user votes and time
A common variant of ranked lists, as on Reddit or Hacker News, sorts by a score computed with a formula roughly like:
score = points / time^alpha
Users' votes dig a story up, and time buries it according to some exponent. Here is our pattern; the exact algorithm is of course up to you.
The pattern starts from the observation that only recent items can be candidates: for example, only the latest 1,000 stories can appear on the front page, so we can safely ignore all the rest, which makes things easy to implement.
Every time a new story is posted, we add its ID to a list with LPUSH + LTRIM, ensuring only the latest 1,000 items are kept.
A background task then fetches that list and continuously recomputes the final score of each of those 1,000 stories. The results populate a sorted set via ZADD, in the new order, and stale stories are cleared out. The key idea here is that the sorting work is done by a background task.
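The background re-ranking step can be sketched in plain Python (an illustrative model; the items, ALPHA, and the minimum-age floor are all assumptions, and in Redis the final line would be a loop of ZADD calls):

```python
import time

ALPHA = 1.5
now = time.time()

# Candidate stories: id, vote count, and post timestamp.
items = [
    {"id": 1, "points": 100, "posted": now - 3600},  # 1 hour old
    {"id": 2, "points": 80,  "posted": now - 60},    # 1 minute old
]

def score(item):
    # score = points / age^alpha, with a small floor so brand-new
    # items do not divide by ~zero.
    age_hours = max((now - item["posted"]) / 3600, 0.01)
    return item["points"] / age_hours ** ALPHA

# Rebuild the ranking, best first (ZADD into a fresh sorted set in Redis).
ranked = sorted(items, key=score, reverse=True)
```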
5. Processing expired items
Another common way to order items is by time. We use the Unix timestamp as the score.
The pattern is as follows:
- Each time a new item is added to our Redis database, we add it to a sorted set, along with its time attributes: current_time and time_to_live.
- Another background task queries the sorted set with ZRANGE ... WITHSCORES, pulling out the 10 oldest items; any whose Unix timestamps turn out to have expired are deleted from the database.
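The expiry sweep can be sketched in plain Python (a model only; the member names and TTL are made up, and in Redis the scan would be ZRANGE ... WITHSCORES followed by ZREM):

```python
import time

TTL = 3600                      # seconds an item stays alive
now = time.time()
# member -> insertion time (the sorted-set score).
zset = {"old_item": now - 7200, "fresh_item": now - 10}

def purge_expired(zset, now, ttl):
    # Scan the 10 oldest entries (lowest scores) and delete expired ones.
    oldest = sorted(zset.items(), key=lambda kv: kv[1])[:10]
    for member, added in oldest:
        if now - added > ttl:
            del zset[member]

purge_expired(zset, now, TTL)
```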
6. Counting
Redis is a good counter, thanks to INCRBY and similar commands.
I am sure you have tried, more than once, to add counters to your database for statistics or to show new information, only to give them up eventually because of write contention.
Well, with Redis you no longer need to worry. With atomic increments you can add all sorts of counts safely, reset them with GETSET, or let them expire.
For example, these operations:
INCR user:<id>
EXPIRE user:<id> 60
count a user's page views over windows where pauses never exceed 60 seconds; when the count reaches, say, 20, you can show a banner or anything else you like.
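The INCR + EXPIRE pairing can be sketched in plain Python (a model of the semantics, not a Redis client; the key name is made up, and `now` is injected so the behavior is deterministic):

```python
import time

# key -> (count, expiry time). EXPIRE makes the count vanish after ttl.
counters = {}

def incr_with_ttl(key, ttl=60, now=None):
    now = time.time() if now is None else now
    count, expires_at = counters.get(key, (0, now + ttl))
    if now >= expires_at:                 # window elapsed: start over
        count, expires_at = 0, now + ttl
    counters[key] = (count + 1, expires_at)
    return count + 1

t0 = 1000.0
first = incr_with_ttl("user:1:views", now=t0)        # 1
second = incr_with_ttl("user:1:views", now=t0 + 10)  # still in the window
reset = incr_with_ttl("user:1:views", now=t0 + 61)   # key expired, restart
```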
7. Unique visitors to an item in a given period
Another thing that is hard for other databases but easy with Redis is counting how many distinct users accessed a particular resource during a given period. For example, I might want to know how many distinct registered users, or distinct IP addresses, viewed a certain article.
Every time I get a new page view, I just do:
SADD page:day1:<page id> <user id>
Of course you would likely replace day1 with a Unix-time day bucket, such as time() - (time() % (3600*24)).
Want the number of distinct users? Just run SCARD page:day1:<page id>.
Need to test whether a specific user visited the page? SISMEMBER page:day1:<page id> <user id>.
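The per-day unique-visitor pattern can be sketched in plain Python (a model of the set semantics; page and user IDs are made up):

```python
# (page_id, day bucket) -> set of user ids who viewed that page that day.
visits = {}

def day_bucket(ts):
    # Midnight of the given Unix time: time() - (time() % (3600*24)).
    return int(ts - ts % (3600 * 24))

def record_view(page_id, user_id, ts):
    visits.setdefault((page_id, day_bucket(ts)), set()).add(user_id)

ts = 1_700_000_000
record_view("article:42", "u1", ts)
record_view("article:42", "u2", ts)
record_view("article:42", "u1", ts)   # duplicate view, absorbed by the set

bucket = visits[("article:42", day_bucket(ts))]
```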
8. Real-time analysis of what is happening, for statistics, spam prevention, and so on
We have only covered a few examples, but if you study the Redis command set and combine the commands, you can implement many kinds of real-time analytics that are both efficient and very labor-saving. With Redis commands, spam-filtering systems and other real-time tracking systems become much easier to build.
9. Pub/Sub
Redis Pub/Sub is very, very simple to use, stable, and fast. It supports pattern matching and subscribing to and unsubscribing from channels in real time.
10. Queues
You have probably noticed that Redis commands such as list push and list pop make it convenient to implement queues, but Redis can do more than that: it also has blocking variants of the list pop commands, which block when the list is empty.
Modern Internet applications make heavy use of message queues (messaging). Message queues are used not only for communication between components inside a system but also for interaction between systems and other services. They improve a system's scalability, flexibility, and user experience. A system not based on message queues runs only as fast as its slowest component (the short-board effect); built on message queues, components are decoupled, the system is no longer bound by its slowest part, and components can run asynchronously and finish their work sooner.
Besides that, when the server faces high concurrency, such as frequent log-file writes, you can use a message queue for asynchronous processing and thereby achieve high-performance concurrent operation.
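The blocking-pop idea can be sketched with Python's stdlib queue (an analogy only: `queue.Queue.get` blocks on an empty queue much as Redis's blocking list pop does; the task name is made up):

```python
import queue
import threading

tasks = queue.Queue()
results = []

def worker():
    item = tasks.get()            # blocks until a producer pushes an item
    results.append(item.upper())  # "process" the task

t = threading.Thread(target=worker)
t.start()                         # worker is now blocked, waiting
tasks.put("send-email")           # producer side: the push
t.join()                          # worker wakes, processes, and exits
```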
11. Caching
Redis's caching abilities deserve a whole article of their own, so I will only mention them briefly here. Redis can replace memcached, and it turns your cache from something that can only be stored and discarded into something that can also be updated in place, so you no longer need to regenerate the data every time it changes.
This section is based on: http://antirez.com/post/take-advantage-of-redis-adding-it-to-your-stack.html
5. Redis experience and use cases shared by three giants in different fields, at home and abroad
As performance requirements keep rising, NoSQL has gradually taken root in the system architectures of all kinds of companies. Here we share the Redis practices of the social giant Sina Weibo, the media giant Viacom, and the photo-sharing service Pinterest. First, the Sina Weibo Redis experience shared by @Kai Hope Cobain:
I. Sina Weibo: the largest Redis cluster in history
Tape is dead, disk is tape, flash is disk, RAM locality is king. - Jim Gray
Redis is not a substitute for the more mature memcached or MySQL, but a good architectural complement for large Internet applications. More and more applications are now restructuring their architectures around Redis. First, briefly, the actual state of our Redis platform:
2200+ billion commands/day (500 billion reads/day, 50 billion writes/day)
18 TB+ of memory
500+ servers across 6 IDCs, 2000+ instances
It should be one of the larger Redis deployments anywhere, at home or abroad; today I will talk about the Redis service platform mainly from the application point of view.
Redis use cases
1. Counting
Counting is described in more detail in another article on counter-scenario optimization, http://www.xdata.me/?p=262, so I will not repeat it here.
Predictably, many people think keeping all counts in memory is too expensive; let me use a chart to make my point:
In many conceivable scenarios a pure in-memory scheme looks very costly, but reality is often different:
Cost: an application with any real throughput requirement will request both DB and cache resources anyway, and many teams, worried about DB write performance, also push DB updates through an asynchronous queue; none of those three resources tends to be highly utilized. Work out the numbers and you will be surprised to find that the pure-memory scheme is actually leaner!
The KISS principle: it is very developer-friendly. I only need to build one connection pool, with no data-consistency maintenance to worry about and no asynchronous queue to maintain.
Cache-penetration risk: a DB backend will certainly not offer high throughput, and if a cache outage is not handled properly, the result is a tragedy.
Besides, most counting storage needs start out small anyway.
2. Reverse cache
Weibo frequently produces hot spots. Recently, for example, a hot short link was clicked and followed by tens of thousands of people within a short time, and there is often an accompanying need, during the redirect, to quickly determine the user's level, whether certain accounts are bound, gender, and the like, in order to show different content or information.
The usual memcached + MySQL solution supports high throughput as long as the requested IDs are legitimate. But when the IDs are uncontrolled and there are many junk-user requests, memcached misses constantly, requests penetrate en masse to the MySQL servers, connections are instantly exhausted, overall throughput drops, and response times slow down.
Here we can use Redis to hold a record for the full user population, for example a string key uid with an int type value, as a reverse cache: the user's level information is fetched quickly from Redis first, and only then do we go to the memcached + MySQL layer for the full information. As shown in the figure:
Of course this is not the most optimized scheme; using Redis as a Bloom filter, for example, might save even more memory.
3. Top lists
Product operations will always ask you for top lists: the most recent, hottest, most clicked, most active, and so on. For lists that are updated often and requested frequently, memcached + MySQL maintenance makes cache invalidation quite likely; given the small memory footprint involved, storing them in Redis works very well.
4. Last index
Users' recent access records are another good fit for a Redis list: LPUSH and LPOP age out old login records automatically, which is very developer-friendly.
5. Relation lists / message queues
I put these two uses last because both ran into some difficulties in practice, yet at a certain stage they did solve many of our problems, so they deserve a mention.
A message queue uses the list's LPUSH and LPOP interfaces for queue writes and consumption; thanks to Redis's good performance, this too solves most problems.
6. Fast transactions with Lua
Redis's Lua scripting extension actually opens up more scenarios for Redis: you can write several commands combined into a small non-blocking transaction or piece of update logic. For example, when pushing a message: 1. add an unread conversation for yourself; 2. add an unread message to your own DMs; 3. finally give the sender a receipt that the push completed. This whole layer of logic can be implemented on the Redis server side.
Note, however, that Redis records the full contents of a Lua script in the AOF and sends it to the slaves, which also adds a small amount of disk and network overhead.
7. Instead of memcached
Many tests and applications have confirmed that:
In performance, Redis does not lag much behind memcached, and the single-threaded model gives Redis strong extensibility.
In many scenarios, Redis's memory overhead for the same data is lower than memcached's slab allocation.
Redis's data-synchronization feature is, in effect, a powerful form of cache scaling.
Key points in using Redis
1. RDB/AOF backups!
95% of our online Redis deployment serves as back-end storage: we use it not merely as a cache but as a K-V store, completely replacing the back-end storage service (MySQL), so its data is very important. Data pollution, loss, or operator error would be very hard to recover from, which makes backups essential! To this end we use shared HDFS resources as our backup pool, so that the data a business needs can be restored at any time.
2. Small items & small instances!
Because of Redis's single-threaded model (strictly speaking not entirely single-threaded, but request processing can be considered single-threaded), batch operations on large data structures (list, sorted set, hash, set) mean that every other request waits. So when you use Redis's complex data structures, you must control the size of each individual key's structure.
In addition, the memory capacity of a single Redis instance should be strictly limited. When a single instance grows large, the immediate problems are long failure-recovery and slave-rebuild times; worse, rewriting the AOF and saving the RDB put very large and long-lasting pressure on the system and consume extra memory, quite possibly causing memory shortage and other serious performance-impacting online failures. On our 96 GB/128 GB memory servers we do not recommend single-instance capacities above 20-30 GB.
3. Be available!
What the industry discusses and uses most is Redis Sentinel:
http://www.huangz.me/en/latest/storage/redis_code_analysis/sentinel.html
http://qiita.com/wellflat/items/8935016fdee25d4866d9
It is about 2,000 lines of C implementing server status detection, automatic failover, and related features.
However, because real deployments are often more complex, or need to consider more angles, @Xu Qi Eryk and I built the Hypnos project for this purpose.
Hypnos is the god of sleep in mythology; the name's literal point is that we engineers should not have to deal with any failures during our rest time. :-)
Its working principle is shown below:
Talk is cheap, show me your code! I will write a separate blog post later going through the implementation of Hypnos carefully.
4. In Memory or Not?
We have noticed a pattern: when discussing back-end resource design, developers often, out of habit or a misunderstanding of a product's positioning, skip a real assessment of what users actually access. Perhaps only the most recent day of data is ever accessed, yet both the entire historical data set and the most recent day's request load get thrown at in-memory storage, which is very unreasonable.
So before deciding which data structure to use for storage, measure the cost first: how much data really needs to live in memory? How much of it is actually meaningful to users? This is critical to back-end resource design; 1 GB of data capacity and 1 TB of data capacity call for completely different design ideas.
Future Plans
1. Slave Sync Makeover
We are overhauling the master-slave data synchronization mechanism for all online instances. We borrowed the idea from MySQL replication, using RDB + AOF + position (pos) as the basis for data synchronization. Here is a brief explanation of why the official PSYNC does not meet our needs well:
Suppose master A has two slaves, B and C (A → B & C). If A goes down and must be restarted, or dies outright, we need to promote B to be the new master. If A, B, and C do not share RDB and AOF information, then C, on becoming a slave of B, will still clear its own data, because C only records its synchronization state with node A.
So we need a synchronization mechanism that can smoothly switch the A → B & C topology to B → C. PSYNC supports resuming from a breakpoint, but it cannot support a smooth handoff on master failure.
In fact, we already use such a synchronization mechanism in our custom Redis counting service, with very good results: it has relieved much of the operations burden. We still need to extend it to all Redis services, and if possible we will propose the related slave-sync improvements to official Redis.
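The failover problem described above can be modeled in a few lines. In the PSYNC protocol the article refers to, a slave records the run ID of the master it synced from; when B is promoted, C's recorded run ID no longer matches B's, so C must do a full resync even though its data is identical. This is a conceptual sketch of that bookkeeping, not actual Redis code:

```python
import uuid

class Node:
    def __init__(self):
        self.run_id = uuid.uuid4().hex  # each Redis server has its own run ID
        self.synced_from = None         # (master_run_id, offset) the slave recorded

def attach_slave(master, slave, offset=0):
    slave.synced_from = (master.run_id, offset)

def can_partial_resync(master, slave):
    # PSYNC only resumes from the offset if the slave last synced
    # from this exact master; otherwise a full resync is forced.
    return slave.synced_from is not None and slave.synced_from[0] == master.run_id

a, b, c = Node(), Node(), Node()
attach_slave(a, b, offset=1000)
attach_slave(a, c, offset=1000)
# A dies and B is promoted. C still records A's run ID, so a partial
# resync against B is refused: C clears its data and fully resyncs,
# which is exactly the behavior the custom RDB+AOF+pos scheme avoids.
```

Sharing the replication state (or the RDB/AOF lineage) across A, B, and C is what lets the custom scheme continue from the recorded position instead.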
2. Name-System or Proxy: Which Suits Redis Better?
Careful readers will have noticed that besides using DNS as the naming system, we also keep a record in ZooKeeper. Why not let users access just one system, either ZK or DNS?
The reason is simple: a naming system is a very important component, and DNS is a relatively mature one into which we have put a lot of improvement and trial and error, while a ZK-based implementation is still relatively complex and we do not yet have fine-grained control over it. We are still thinking about what kind of naming system would best fit our needs.
3. Back-end data storage
Making use of large memory is certainly an important direction for cost optimization; flash disks and distributed storage are in our future plans. (Original link: Largest Redis Clusters Ever)
Second, Pinterest: Redis maintains tens of billions of relationships
Pinterest has become one of the hottest success stories in Silicon Valley: in 2012 its PC-based traffic grew 1047%, its mobile traffic grew 1698%, and unique visits surged past 53.3 million in March. At Pinterest, the objects people follow number in the tens of billions, and every user-interface query about whether a board or user is followed contributes to unusually complex engineering problems. This is where Redis comes in handy. After several years of development, Pinterest has become a leader in media, social networking, and other areas, with a brilliant record:
It gets more referral traffic than Google+, YouTube, and LinkedIn combined
Together with Facebook and Twitter, it is one of the three most popular social networks
Users referred by Pinterest buy more than users referred by other sites (more details)
As you would expect, given its unique-visit numbers, Pinterest's sheer scale places very high demands on its IT infrastructure.
Optimizing the user experience with caching
Recently, Pinterest engineering manager Abhi Khune shared his company's user-experience requirements and Redis experience. Even seasoned application developers will not appreciate these features until they analyze the site's details, so here is the general usage scenario: first, a pre-check of each follow relationship; then the UI accurately displays the user's follower and following lists, with paging. Performing these operations efficiently requires a very high-performance architecture behind every click.
Unsurprisingly, Pinterest's software engineers and architects had already used MySQL and memcache, but the caching solution still hit its bottleneck, so to deliver a better user experience the cache had to be expanded. In practice, the engineering team found that caching only works if a user's sub-graph is already in the cache; since anyone using the system needs to be cached, that means caching the entire graph. Moreover, the most common answer to "does user A follow user B" is often no, yet a negative answer is treated as a cache miss and triggers a database query, so they needed a new way to extend the cache. Ultimately, the team decided to use Redis to store the entire graph and serve the numerous lists from it.
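The pattern described, keeping the whole follow graph in memory so that negative answers are as cheap as positive ones, might look like the following sketch. A plain Python dict of sets stands in for Redis sets (SADD / SISMEMBER), and the key naming is illustrative, not Pinterest's actual schema:

```python
# following["A"] stands in for a Redis set maintained with
# SADD following:A B and queried with SISMEMBER following:A B.
following = {}

def follow(user, target):
    following.setdefault(user, set()).add(target)

def is_following(user, target):
    # With the full graph in memory, "no" is answered directly from
    # the set instead of being treated as a cache miss that hits MySQL.
    return target in following.get(user, set())

follow("A", "B")
```

The key point is that an absent member is a definitive answer, not a missing cache entry, which is what eliminates the penetration to the database.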
Using Redis to store a large number of Pinterest lists
Pinterest uses Redis as the solution, pushing performance to in-memory database levels; it keeps several types of lists for each user:
List of users you follow
List of boards you follow
List of your fans
List of users who follow your boards
List of a user's boards that you do not follow
Followers and non-followers of each board
Redis stores all of the above lists for Pinterest's 70 million users, essentially the whole follow graph, sharded by user ID. Since the data in these lists can be queried by type, the profile data is stored and accessed in a system that looks more transactional. Pinterest currently caps a user's follows at 100,000. An initial estimate: if each user follows 25 boards, there are 1.75 billion user-board relationships. More importantly, these relationships grow every day as the system is used.
Pinterest's Redis architecture and operations
As one of Pinterest's founders has recounted, Pinterest started with Python and a customized Django to write the application, and kept that stack up through 18 million users and 410 TB of user data. Although the data is spread across multiple stores, the engineers shard by user ID into 8,192 virtual shards; each shard runs on top of one Redis DB, and one Redis instance serves multiple Redis DBs. To make full use of the CPU cores, multiple single-threaded Redis instances run on the same host.
Given that the entire dataset runs in memory, Redis persists incoming writes to Amazon EBS once per second. Scaling happens in two ways: first, to keep utilization at 50%, half of the Redis instances on a machine are migrated to a new machine via master-slave conversion; second, by adding nodes and shards. The whole Redis cluster uses a master-slave configuration in which the slaves serve as hot backups: once a master fails, a slave is promoted immediately and a new slave is added, with ZooKeeper coordinating the whole process. They also run BGSAVE to Amazon S3 every hour for more durable storage; these Redis operations happen at the back end, and Pinterest uses the data for MapReduce and analytics jobs. (See the original text for more details.)
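The virtual-shard scheme above (8,192 shards spread over multiple Redis DBs per instance) can be sketched as follows. The modulus-based hash and the per-instance DB count are assumptions for illustration; the article does not give Pinterest's exact mapping:

```python
SHARDS = 8192          # virtual shards, as described in the article
DBS_PER_INSTANCE = 16  # assumed: Redis's default number of DBs per process

def locate(user_id):
    """Map a user ID to (instance index, redis db index, virtual shard)."""
    shard = user_id % SHARDS               # stable virtual shard for this user
    instance = shard // DBS_PER_INSTANCE   # which Redis process holds it
    db = shard % DBS_PER_INSTANCE          # SELECT <db> within that process
    return instance, db, shard

# Because the shard count is fixed, scaling out moves whole virtual
# shards (entire Redis DBs) to new instances; no per-user rehashing.
```

Fixing the shard count up front is the design choice that makes the "extend nodes and shards" step cheap: the user-to-shard mapping never changes, only the shard-to-machine placement does.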
Third, Viacom: taking stock of Redis use cases across its systems
Viacom is one of the world's largest media groups and has run into one of the biggest data challenges of our time: how to handle ever-growing volumes of dynamic video content.
To see the trend behind this challenge: in 2010 the world's total data volume reached the zettabyte level, and in 2012 alone the internet generated an additional 2.8 ZB of data, most of it unstructured, including video and pictures.
Covering MVN (formerly MTV Networks), Paramount, and BET, Viacom is a veritable media giant supporting many popular sites, including The Daily Show, Tosh.0, South Park Studios, GameTrailers.com, and more. As a media company, it constantly updates the documents, pictures, and video clips on these sites. Long story short, let's dive into the Redis practices shared by Viacom senior architect Michael Venezia:
Background on Viacom's website architecture
For Viacom, with content spread across multiple sites, the first concern is scale; and for content to reach the right users quickly, they must also focus on the relationships between pieces of content. Yet even individual sites such as The Daily Show, Nickelodeon, Spike, or VH1 average tens of millions of page views per day, with peak traffic reaching 20-30 times the average. Combined with real-time requirements, dynamic scaling and speed became foundations of the architecture.
Beyond dynamic scaling, the service must also infer user preferences from the videos a user is browsing or from their geographic location. For example, a page might pair a single video clip with a local promotion, additional episodes in the series, or even related videos. To keep users on the site longer, they built a software engine that automatically assembles pages from detailed metadata and recommends additional content matching the user's current interests. Since interests can change at any moment, the data is broad in type and graph-like, and in practice involves a great many joins.
This approach also reduces the number of copies of large files such as videos. For example, the data store holds a single record for the South Park clip "Cartman Gets an Anal Probe", which may also appear on the German website. Although the video is the same, German users may search for it under a different title: a translated copy of the metadata feeds the search results and points to the same video. So while American users search for the real title, German viewers may use the translated title "Cartman und die Analsonde" on the German site.
This metadata overlays the underlying records or objects and can vary the content with the usage environment, restricting what is shown by geographic location or by the requesting device through different rule sets.
How Viacom implemented it
Although many organizations solve this problem with ORMs and traditional relational databases, Viacom took a radically different approach.
In essence, they could not afford direct database access at all. First, most of what they handle is streaming data, which they prefer to distribute geographically via Akamai. Second, a page's complexity may require tens of thousands of objects. With that much data clearly affecting performance, JSON was adopted in a data service; of course, caching those JSON objects then directly affects site performance, and the cache must be updated dynamically whenever content, or the relationships between content, change.
Viacom relies on object primitives and superclasses to solve this. Continuing with South Park as the example: a private "episode" class contains all the information related to the clip, and a "super object" helps locate the actual video object. The superclass idea is very useful for automatically building low-latency pages, as it helps map primitive objects into the cache and keep them there.
Why Viacom chose Redis
Whenever Viacom uploads a video clip, the system creates a private object associated with a superclass. On every modification, they must re-evaluate each change to the private object and update all composite objects, and the system must also invalidate the corresponding URL in Akamai. The combination of the existing architecture and the need for a more agile management approach drove Viacom to Redis.
Since Viacom is primarily PHP-based, the solution had to support PHP. They first chose memcached for object storage, but it did not support hashmaps well, and they needed a more efficient invalidation step for re-evaluation, in other words a better understanding of content dependencies. Essentially, they needed to track dependency changes during invalidation at all times. So they chose the combination of Redis and Predis to solve the problem.
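A dependency graph of this kind, tracking which composed objects must be invalidated when a primitive object changes, can be sketched with reverse edges. In the real system the edges would live in Redis sets; here a dict of sets stands in, and all key names are illustrative:

```python
# dependents["episode:42"] = composites built from that primitive.
# In Redis this could be one set per object, e.g.
# SADD deps:episode:42 page:home, walked at invalidation time.
dependents = {}
cache = {}

def depends_on(composite, primitive):
    dependents.setdefault(primitive, set()).add(composite)

def invalidate(primitive):
    """Drop every cached composite built (directly or transitively)
    from this primitive, and return the set of invalidated keys."""
    dirty = set()
    stack = [primitive]
    while stack:
        obj = stack.pop()
        for composite in dependents.get(obj, ()):
            if composite not in dirty:
                dirty.add(composite)
                stack.append(composite)  # composites may feed other composites
    for obj in dirty:
        cache.pop(obj, None)
    return dirty

cache["page:southpark"] = "<html>...</html>"
depends_on("page:southpark", "episode:cartman-probe")
```

The transitive walk is what memcached made awkward: without a structure like a set per object, there is no cheap way to follow dependency changes through to every affected composite.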
Their team first used Redis to build dependency graphs for the southparkstudios.com and thedailyshow.com websites; after great success there, they began looking at other scenarios where Redis fits.
Other usage scenarios for Redis
Obviously, once a team uses Redis to build a dependency graph, using it for object processing makes sense too, and that is the second use scenario the architecture team chose for Redis. Redis's replication and persistence features also won over Viacom's operations team, so after several development cycles Redis became the primary data and dependency store for their sites.
The next two use cases are buffering behavior-tracking data and view counts. In the revised architecture, Redis flushes the buffered tracking data to MySQL every few minutes, while view counts are stored and incremented in Redis. Redis is also used for popularity calculation, a scoring system based on access counts and access times: the more recently and more often a video has been accessed, the more popular it is. Running this calculation every 10-15 minutes over so much content is definitely not the strength of a traditional relational database like MySQL. Viacom's approach with Redis is simple: run a Lua batch job on the Redis instances that store the viewing data to compute all the score tables. The scores are copied to another Redis instance to serve product queries, and another backup is kept in MySQL for later analysis. This made the whole process 60 times faster.
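The article does not give Viacom's actual scoring formula; a recency-weighted score of the kind described could look like the sketch below, where the exponential decay and its half-life are invented for illustration (the real system computes scores in a Lua batch job over the Redis instances holding the viewing data):

```python
HALF_LIFE_HOURS = 24.0  # assumed decay: a view loses half its weight per day

def popularity(view_times, now_hours):
    """Score = sum over views, each exponentially decayed by its age,
    so recent accesses count more than old ones."""
    score = 0.0
    for t in view_times:  # t = hour at which that view happened
        age = now_hours - t
        score += 0.5 ** (age / HALF_LIFE_HOURS)
    return score

# Recency matters: 3 fresh views outscore 4 views from a day ago.
recent = popularity([100, 100, 100], now_hours=100)      # 3 views, age 0h
stale = popularity([76, 76, 76, 76], now_hours=100)      # 4 views, age 24h
```

Recomputing such a score over millions of items every 10-15 minutes is cheap when the per-item view data is already in memory next to the computation, which is the point of running it inside Redis rather than in MySQL.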
Viacom also uses Redis to store job information step by step: jobs are inserted into a list, and workers grab the task at the head of the queue with the BLPOP command. Meanwhile, sorted sets (zsets) are used to aggregate content from numerous social networks such as Twitter and Tumblr, and Viacom synchronizes multiple content management systems through the Brightcove video player.
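The queue pattern just described can be sketched in-process. A `collections.deque` stands in for the Redis list, the blocking behavior of BLPOP is omitted (this sketch returns None on an empty queue instead of waiting), and the job names are made up:

```python
from collections import deque

jobs = deque()  # stands in for a Redis list key such as "jobs"

def lpush(queue, item):
    queue.appendleft(item)  # LPUSH jobs <item>: producer side

def brpop(queue):
    # The real BRPOP blocks until an item arrives; this sketch just
    # returns None when empty. Popping the opposite end from LPUSH
    # gives workers FIFO order.
    return queue.pop() if queue else None

lpush(jobs, "transcode:clip-1")
lpush(jobs, "transcode:clip-2")
first = brpop(jobs)  # the oldest job comes out first
```

Pushing on one end and popping on the other is the standard Redis work-queue idiom; a single blocking pop also guarantees each job is delivered to exactly one worker.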
Across these use cases, almost every Redis command gets used: sets, lists, zsets, hashmaps, scripts, counters, and so on. Redis has become an integral part of Viacom's scalable architecture.