Excerpt from: https://www.cnblogs.com/xiaoxi/p/7007695.html
Redis has arguably pioneered a new way of storing data. Faced with a monotonous database, we used to worry about how to squeeze an elephant into the one refrigerator we had. With Redis, we instead use its flexible data structures and operations to build a different refrigerator for each elephant. I hope you like this metaphor.
I. Common Redis data types
The five most commonly used Redis data types are:
- String
- Hash
- List
- Set
- Sorted Set
Before describing each of these types, let's look at a diagram of how Redis represents them in its internal memory management:
Internally, Redis uses a redisObject structure to represent every key and value. As the diagram shows, the type field indicates the data type of the value object, and the encoding field indicates how that data type is stored inside Redis. For example, type=string means the value is an ordinary string, and its encoding can be raw or int. With int, Redis actually stores and represents the value as a number internally. This is possible when the string itself can be represented numerically, such as "123" or "456".
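To make the type/encoding split concrete, here is a hypothetical Python model of a redisObject. It is not Redis source code. Only the raw and int encodings described above are modeled, and the rule "a canonical decimal integer that fits in 64 bits" is a simplification of what Redis actually checks.

```python
import re
from dataclasses import dataclass
from typing import Union

# Canonical decimal integers only: no leading zeros, no "+" sign, no spaces.
_CANONICAL_INT = re.compile(r"0|-?[1-9][0-9]*")
_INT64_MIN, _INT64_MAX = -(2**63), 2**63 - 1


@dataclass
class RedisObject:
    type: str              # logical data type, e.g. "string"
    encoding: str          # how the value is stored, e.g. "raw" or "int"
    ptr: Union[int, str]   # the stored value itself


def create_string_object(value: str) -> RedisObject:
    """Simplified model: numeric-looking strings get the int encoding, others raw."""
    if _CANONICAL_INT.fullmatch(value):
        number = int(value)
        if _INT64_MIN <= number <= _INT64_MAX:
            return RedisObject("string", "int", number)
    return RedisObject("string", "raw", value)
```

For example, `create_string_object("123")` yields encoding `int` with payload `123`, while `"hello"` or `"0123"` stay `raw`.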
A special note about the vm field: memory for it is actually allocated only when the Redis virtual memory feature is turned on, and that feature is off by default. You may notice that representing every key/value with a redisObject wastes some memory. This memory-management overhead buys a unified management interface across Redis's different data types. The author of Redis also offers several ways to save as much memory as possible, which we'll discuss in detail later.
II. How each data type is used and implemented
Let's analyze how each of these five data types is used and how it is implemented internally:
1. String
String is a simple key-value type. The value can be not only a string but also a number.
Common commands: GET, SET, INCR, DECR, MGET, and so on.
Application scenarios: String is one of the most commonly used data types, and ordinary key/value storage falls into this category. It can fully replace the functionality Memcached offers today, more efficiently. It also benefits from Redis's periodic persistence, operation log, and replication features. Besides the GET, SET, INCR, DECR, and similar operations that Memcached also provides, Redis offers the following:
- Get the length of a string (STRLEN)
- Append content to a string (APPEND)
- Set and get a section of a string (SETRANGE, GETRANGE)
- Set and get a single bit of a string (SETBIT, GETBIT)
- Set the contents of many strings in bulk (MSET)
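The extra string operations listed above can be illustrated with a small in-memory Python model. It is a sketch of the commands' documented behavior on a byte string, not a Redis client. The class name is hypothetical, and edge cases such as size limits are ignored.

```python
class StringValue:
    """Hypothetical in-memory model of a few Redis string commands on a byte string."""

    def __init__(self, data: bytes = b"") -> None:
        self.data = bytearray(data)

    def append(self, suffix: bytes) -> int:
        """APPEND: add bytes to the end and return the new length."""
        self.data += suffix
        return len(self.data)

    def strlen(self) -> int:
        """STRLEN: length in bytes."""
        return len(self.data)

    def getrange(self, start: int, end: int) -> bytes:
        """GETRANGE: inclusive byte range; negative indices count from the end."""
        n = len(self.data)
        if start < 0:
            start = max(n + start, 0)
        if end < 0:
            end = max(n + end, 0)
        end = min(end, n - 1)
        if n == 0 or start > end:
            return b""
        return bytes(self.data[start:end + 1])

    def setbit(self, offset: int, value: int) -> int:
        """SETBIT: set one bit (bit 0 is the most significant bit of byte 0), return the old bit."""
        byte_index, bit_index = divmod(offset, 8)
        if byte_index >= len(self.data):
            self.data += bytes(byte_index + 1 - len(self.data))   # grow with zero bytes
        mask = 1 << (7 - bit_index)
        old = 1 if self.data[byte_index] & mask else 0
        if value:
            self.data[byte_index] |= mask
        else:
            self.data[byte_index] &= ~mask & 0xFF
        return old

    def getbit(self, offset: int) -> int:
        """GETBIT: read one bit; bits past the end read as 0."""
        byte_index, bit_index = divmod(offset, 8)
        if byte_index >= len(self.data):
            return 0
        return 1 if self.data[byte_index] & (1 << (7 - bit_index)) else 0
```

The bit operations are what make a string usable as a compact bitmap, for example one bit per user ID.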
Usage scenarios: general key-value caching, and ordinary counters such as the number of Weibo posts or the number of followers.
Implementation: By default, Redis stores a string internally as a string referenced by a redisObject. When an operation such as INCR or DECR is applied, the value is converted to a numeric type for the calculation, and the redisObject's encoding field becomes int.
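The conversion described above can be sketched as a toy Python store. It follows the article's description, so SET keeps text raw and INCR/DECR switch the value to an int encoding. The class and key names are hypothetical. The error text mirrors the message Redis returns for non-integer values, and 64-bit overflow checking is omitted.

```python
import re

_CANONICAL_INT = re.compile(r"0|-?[1-9][0-9]*")


class TinyStringStore:
    """Hypothetical model: each value is an (encoding, payload) pair."""

    def __init__(self) -> None:
        self._data = {}

    def set(self, key: str, value: str) -> None:
        # Following the article's description, SET keeps the text as a plain string.
        self._data[key] = ("raw", value)

    def incrby(self, key: str, delta: int) -> int:
        encoding, payload = self._data.get(key, ("int", 0))   # a missing key counts as 0
        if encoding == "raw":
            if not _CANONICAL_INT.fullmatch(payload):
                raise ValueError("ERR value is not an integer or out of range")
            payload = int(payload)                             # convert for arithmetic
        result = payload + delta
        self._data[key] = ("int", result)                      # now stored as a number
        return result

    def incr(self, key: str) -> int:
        return self.incrby(key, 1)

    def decr(self, key: str) -> int:
        return self.incrby(key, -1)

    def encoding(self, key: str) -> str:
        return self._data[key][0]
```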
2. Hash
Common commands: HGET, HSET, HGETALL, and so on.
Application Scenarios:
A simple example illustrates the hash use case. Suppose we want to store a user object containing the following information:
The user ID is the lookup key, and the stored user object contains the name, age, birthday, and other information. With an ordinary key/value structure, there are two main ways to store it:
The first method uses the user ID as the lookup key and serializes the other information into a single value. It adds the cost of serialization and deserialization, and fetches the entire object whenever one field must be modified. The modification also needs concurrency protection, which introduces complications such as CAS.
The second method stores each member of the user object as its own key-value pair, using the user ID plus the attribute name as the unique key for that attribute's value. This avoids the serialization and concurrency costs, but the user ID is stored repeatedly. With a large amount of such data, the wasted memory is considerable.
Redis hashes solve this problem well. The value of a Redis hash is stored internally as a HashMap, and Redis provides an interface for accessing the map's members directly, for example:
That is, the key is still the user ID, and the value is a map whose keys are the attribute names and whose values are the attribute values. Data can be modified and accessed directly through the internal map's keys, which Redis calls fields. A given attribute is therefore addressed by key (user ID) + field (attribute name). The ID is not stored repeatedly, and there are no serialization or concurrent-modification problems, so this solves the problem neatly.
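The three storage layouts above can be compared with plain Python dictionaries standing in for the Redis keyspace. This is only an illustration. The key names such as `user:1` and the field values are hypothetical, and no Redis client is involved.

```python
import json

# Method 1: one key per user, value is a serialized blob.
blob_store = {"user:1": json.dumps({"name": "Tom", "age": 20})}
# Changing one field means a read-modify-write of the whole object.
obj = json.loads(blob_store["user:1"])
obj["age"] = 21
blob_store["user:1"] = json.dumps(obj)

# Method 2: one key per attribute; the user ID repeats in every key.
flat_store = {"user:1:name": "Tom", "user:1:age": "20"}
flat_store["user:1:age"] = "21"

# Redis hash: one key per user, attributes stored as fields inside it.
hash_store = {"user:1": {"name": "Tom", "age": "20"}}
hash_store["user:1"]["age"] = "21"   # analogous to HSET user:1 age 21
```

Only the hash layout updates a single attribute in place without repeating the user ID in every key.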
Note also that Redis provides a command (HGETALL) that fetches all fields at once. If the internal map has many members, however, it must traverse the entire map. Because of Redis's single-threaded model, this can take a long time, and requests from other clients get no response at all in the meantime. This requires extra attention.
Usage scenario: storing objects of which only some fields change, such as user information.
Implementation method:
As mentioned above, the value behind a Redis hash is effectively a HashMap, but there are actually two implementations. When a hash has relatively few members, Redis saves memory by storing it compactly, much like a one-dimensional array, instead of using a real HashMap. The encoding of the corresponding redisObject is then zipmap. When the number of members grows, the hash is automatically converted into a real HashMap, and the encoding becomes ht. (Later Redis versions replaced zipmap with ziplist, and Redis 7.0 with listpack, but the idea is the same.)
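The compact-then-convert behavior can be sketched in Python. This is a hypothetical model, not Redis's zipmap code. The entry threshold is an assumed parameter, and real Redis makes the limit configurable and also considers value size.

```python
class SmallHash:
    """Hypothetical model of a hash that starts compact and converts to a real hash table."""

    def __init__(self, max_compact_entries: int = 64) -> None:
        self.max_compact_entries = max_compact_entries   # assumed conversion threshold
        self.encoding = "zipmap"
        self._pairs = []     # compact form: flat list of (field, value), searched linearly
        self._table = None   # dict once converted to the "ht" encoding

    def hset(self, field: str, value: str) -> None:
        if self._table is not None:
            self._table[field] = value
            return
        for i, (existing, _) in enumerate(self._pairs):
            if existing == field:            # overwrite in place, no growth
                self._pairs[i] = (field, value)
                return
        self._pairs.append((field, value))
        if len(self._pairs) > self.max_compact_entries:
            self._table = dict(self._pairs)  # convert once the compact form gets too big
            self._pairs = []
            self.encoding = "ht"

    def hget(self, field: str):
        if self._table is not None:
            return self._table.get(field)
        for existing, value in self._pairs:
            if existing == field:
                return value
        return None
```

The linear scan is cheap for a handful of fields and avoids per-entry hash-table overhead, which is the trade-off the compact encoding makes.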
3. List
Common commands: LPUSH, RPUSH, LPOP, RPOP, LRANGE, and so on.
Application Scenarios:
Lists have many applications and are one of Redis's most important data structures. For example, Twitter's following list and follower list can both be implemented with Redis lists.
Lists are linked lists, which anyone with some knowledge of data structures should understand. With lists we can easily implement features such as a ranking of the latest messages. Another application of lists is message queuing. You can use a push operation to place a task in the list, and a worker thread then uses a pop operation to take the task out. Redis also provides commands that operate on a section of a list, so you can directly query or delete the elements in a given range.
Implementation method:
Redis lists are implemented as doubly linked lists. These support reverse lookup and traversal and are convenient to operate on, at the cost of some extra memory overhead. Many internal parts of Redis, including the send buffer queue, also use this data structure. (Since Redis 3.2, lists are implemented as a quicklist, a doubly linked list of compact nodes.)
Each element of a Redis list is a string. Elements can be added or removed at the head or tail with push and pop operations, so a list can be used as either a stack or a queue.
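The stack/queue behavior can be shown with a minimal Python model built on `collections.deque`, where the left end plays the head and the right end plays the tail. The class is hypothetical and only mimics the command semantics. LRANGE's inclusive stop index and negative indices are modeled.

```python
from collections import deque
from typing import Deque, List, Optional


class RedisListModel:
    """Hypothetical in-memory model of a Redis list (head = left, tail = right)."""

    def __init__(self) -> None:
        self._items: Deque[str] = deque()

    def lpush(self, *values: str) -> int:
        for value in values:              # each value goes to the head in turn
            self._items.appendleft(value)
        return len(self._items)

    def rpush(self, *values: str) -> int:
        self._items.extend(values)
        return len(self._items)

    def lpop(self) -> Optional[str]:
        return self._items.popleft() if self._items else None

    def rpop(self) -> Optional[str]:
        return self._items.pop() if self._items else None

    def lrange(self, start: int, stop: int) -> List[str]:
        n = len(self._items)
        if start < 0:
            start = max(n + start, 0)
        if stop < 0:
            stop = n + stop
        stop = min(stop, n - 1)           # stop is inclusive, clamp to the last index
        if start > stop:
            return []
        return list(self._items)[start:stop + 1]
```

LPUSH plus LPOP behaves as a stack (last in, first out), while LPUSH plus RPOP behaves as a queue (first in, first out).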
Usage scenario: Message Queuing system
With lists you can build a queue system, and with sorted sets you can even build a prioritized queue system.
For example, using Redis as a log collector is really just a queue: multiple endpoints write log entries into Redis, and a single worker writes all the logs to disk.
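The log-collector pattern can be simulated in plain Python. A deque plays the role of the Redis list, under a hypothetical key. Several threads act as endpoints pushing with LPUSH semantics, and one worker pops from the tail with RPOP semantics and writes to an in-memory "disk". A lock stands in for the atomicity of Redis commands. With a real server the worker would more likely block on BRPOP, which is not modeled here.

```python
import io
import threading
from collections import deque

log_list = deque()        # stands in for a Redis list such as "logs" (hypothetical key)
lock = threading.Lock()   # Redis commands are atomic; a lock models that here


def lpush(line: str) -> None:
    with lock:
        log_list.appendleft(line)


def rpop():
    with lock:
        return log_list.pop() if log_list else None


def endpoint(name: str, count: int) -> None:
    """One log source pushing its entries in order."""
    for i in range(count):
        lpush(f"{name} event {i}")


def drain_to(sink) -> int:
    """The single worker: pop from the tail (oldest first) and append to 'disk'."""
    written = 0
    while (line := rpop()) is not None:
        sink.write(line + "\n")
        written += 1
    return written


producers = [threading.Thread(target=endpoint, args=(f"host{n}", 100)) for n in range(3)]
for t in producers:
    t.start()
for t in producers:
    t.join()
disk = io.StringIO()
total = drain_to(disk)
```

Because entries enter at the head and leave from the tail, each endpoint's lines reach the disk in the order they were produced.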
Fetching the latest N items
Keep a list of the IDs of the N most recently logged-in users. Older entries can be fetched from the database.
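import redis

r = redis.Redis()   # assumes a Redis server on localhost:6379
N = 10              # how many recent logins to keep (example value)
uid = 42            # ID of the user who just logged in (example value)
# Add the user who just logged in to the head of the list
r.lpush("login:last_login_times", uid)
# Keep only the first N entries
r.ltrim("login:last_login_times", 0, N - 1)
# Get the IDs of the N most recently logged-in users
last_login_list = r.lrange("login:last_login_times", 0, N - 1)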
Sina Weibo, for example:
We keep the latest Weibo post IDs in Redis as a resident cache that is always up to date. We cap it at 5,000 IDs, so our ID-fetching function always queries Redis. The database is accessed only when the start/count parameters fall outside that range.
Our system does not "refresh" the cache in the traditional way, and the information in the Redis instance is always consistent. The SQL database (or whatever on-disk database is used) is hit only when a user asks for "very far back" data. The home page and the first page of comments never touch the on-disk database.
4. Set
Common commands:
SADD, SPOP, SMEMBERS, SUNION, and so on.
Application Scenarios:
Externally, a Redis set offers list-like functionality, except that a set removes duplicates automatically. When you need to store a list of data without duplicates, a set is a good choice. Sets also provide an important operation that lists lack: checking whether a given member belongs to the set.
A set, as the name suggests, is a collection of distinct values. With the set data structure Redis provides, you can store aggregated data.
Example:
In a microblogging app, you can keep the set of all users a person follows and the set of all their followers. Redis also provides intersection, union, and difference operations on sets, which make features such as mutual follows, shared interests, and second-degree friends easy to implement. For each of these set operations, separate commands let you choose whether to return the result to the client or store it in a new set (for example, SINTER versus SINTERSTORE).
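The social features just described reduce to set algebra. Here they are sketched with Python's built-in sets, where each user's "following" set stands in for one Redis set. The user names and follow graph are hypothetical. The comments note the Redis command each operation corresponds to.

```python
# Hypothetical follow graph: user -> set of users they follow (one Redis set per user).
following = {
    "alice": {"bob", "carol", "dave"},
    "bob": {"carol", "erin"},
    "carol": {"alice", "frank"},
}


def common_follows(a: str, b: str) -> set:
    """Users both a and b follow."""
    return following.get(a, set()) & following.get(b, set())        # like SINTER


def second_degree(user: str) -> set:
    """Users followed by the people `user` follows, minus direct follows and the user."""
    direct = following.get(user, set())
    reach = set().union(*(following.get(f, set()) for f in direct))  # like SUNION
    return reach - direct - {user}                                    # like SDIFF
```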
A set is an unordered collection of strings implemented with a hash table. The concept is essentially the same as a set in mathematics: it supports intersection, union, difference, and so on, and its elements have no order.
Implementation method:
Internally, a set is a HashMap whose values are always null. Hashing lets it deduplicate quickly, and it is also why a set can efficiently check whether a member belongs to it.
Usage scenario: intersection, union, and difference (Set)
import redis

r = redis.Redis(decode_responses=True)   # assumes a Redis server on localhost:6379
# The book "table" stores book names
r.set("book:1:name", "The Ruby Programming Language")
r.set("book:2:name", "Ruby on Rails")
r.set("book:3:name", "Programming Erlang")
# The tag "table" uses sets, because sets are good at intersections and unions
r.sadd("tag:ruby", 1, 2)
r.sadd("tag:web", 2)
r.sadd("tag:erlang", 3)
# Books that belong to both ruby and web
inter_list = r.sinter("tag:web", "tag:ruby")
# Books that belong to ruby but not to web
diff_list = r.sdiff("tag:ruby", "tag:web")
# Books that belong to ruby or web (or both)
union_list = r.sunion("tag:ruby", "tag:web")
Deduplicating all data within a given period of time
This is the most fitting use of the Redis set: just keep adding items to it, and because it is a set, duplicates are removed automatically.
5. Sorted Set
Common commands:
ZADD, ZRANGE, ZREM, ZCARD, and so on.
Usage scenarios:
Sorted sets are used much like sets, except that a set is unordered. A sorted set orders its members by an extra priority parameter (score) supplied by the user, and keeps them sorted automatically as they are inserted. When you need an ordered collection without duplicates, choose a sorted set. For example, Twitter's public timeline can be stored with the publication time as the score, so it is sorted by time automatically.
Compared with a set, a sorted set adds a weight parameter, score, by which its elements are ordered. For example, in a sorted set storing a class's grades, the member can be the student ID and the score the student's test score, so the data is sorted naturally as it is inserted. A sorted set can also serve as a weighted queue. Ordinary messages get a score of 1 and important messages a score of 2, and the worker fetches tasks in descending score order so that important tasks are handled first.
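The weighted-queue idea can be sketched with a small Python model: a dict of member-to-score plus a list kept sorted by `(score, member)`. This is a hypothetical illustration, not a Redis client. `pop_highest` mimics what ZPOPMAX (available since Redis 5.0) does on a real sorted set.

```python
import bisect
from typing import Dict, List, Optional, Tuple


class WeightedQueue:
    """Hypothetical model of a sorted set used as a weighted task queue."""

    def __init__(self) -> None:
        self._scores: Dict[str, float] = {}
        self._ordered: List[Tuple[float, str]] = []   # (score, member), ascending

    def zadd(self, score: float, member: str) -> int:
        old = self._scores.get(member)
        if old is not None:
            self._ordered.remove((old, member))       # re-adding a member updates its score
        self._scores[member] = score
        bisect.insort(self._ordered, (score, member))
        return 0 if old is not None else 1

    def pop_highest(self) -> Optional[Tuple[str, float]]:
        """Take the task with the highest score, like ZPOPMAX."""
        if not self._ordered:
            return None
        score, member = self._ordered.pop()
        del self._scores[member]
        return member, score
```

With ordinary tasks at score 1 and important ones at score 2, the worker always receives the important tasks first.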
Implementation method:
Internally, a Redis sorted set uses a HashMap and a skip list to store and order its data. The HashMap maps members to scores, while the skip list holds all the members sorted by those scores. The skip list offers fairly efficient lookups and is relatively simple to implement.
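The HashMap-plus-skip-list design can be sketched in Python. This is a simplified teaching version, not a port of Redis's zskiplist. Redis's own skip list also keeps span widths for rank queries and a backward pointer, and supports updates and deletion, all of which this sketch omits. The level probability of 0.25 matches Redis's choice, and the maximum level of 16 is an arbitrary assumption.

```python
import random


class _Node:
    __slots__ = ("score", "member", "forward")

    def __init__(self, score, member, level):
        self.score = score
        self.member = member
        self.forward = [None] * level   # next node at each level


class SkipListZSet:
    """Simplified sorted set: dict for member -> score, skip list for order (insert-only)."""

    MAX_LEVEL = 16   # assumed cap on node height
    P = 0.25         # probability of promoting a node one level higher

    def __init__(self, seed=None):
        self._rng = random.Random(seed)
        self._head = _Node(float("-inf"), None, self.MAX_LEVEL)
        self._level = 1
        self.dict = {}

    def _random_level(self):
        level = 1
        while self._rng.random() < self.P and level < self.MAX_LEVEL:
            level += 1
        return level

    def add(self, score, member):
        if member in self.dict:
            raise ValueError("updates are not supported in this sketch")
        update = [self._head] * self.MAX_LEVEL   # last node before the insert point, per level
        node = self._head
        for i in range(self._level - 1, -1, -1):
            while (node.forward[i] is not None
                   and (node.forward[i].score, node.forward[i].member) < (score, member)):
                node = node.forward[i]
            update[i] = node
        level = self._random_level()
        self._level = max(self._level, level)    # new top levels start from the head
        new = _Node(score, member, level)
        for i in range(level):
            new.forward[i] = update[i].forward[i]
            update[i].forward[i] = new
        self.dict[member] = score                # O(1) member -> score lookup

    def range_by_score(self, lo, hi):
        """Members with lo <= score <= hi, in ascending order (like ZRANGEBYSCORE)."""
        node = self._head
        for i in range(self._level - 1, -1, -1):  # descend, skipping nodes below lo
            while node.forward[i] is not None and node.forward[i].score < lo:
                node = node.forward[i]
        node = node.forward[0]
        result = []
        while node is not None and node.score <= hi:
            result.append(node.member)
            node = node.forward[0]
        return result
```

The dict answers "what is this member's score?" in constant time, while the skip list answers range queries in expected logarithmic time. That split matches the division of labor described above.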
III. Real-world Redis application scenarios
1. Displaying the latest items
The following query is often used to display the latest items, and as the data grows it will inevitably get slower.
SELECT * FROM foo WHERE ... ORDER BY time DESC LIMIT 10
In web applications, queries such as "list the latest replies" are very common and often cause scalability problems. This is frustrating, because the items were created in exactly this order, yet they still have to be sorted before they can be output. Redis can solve this kind of problem. For example, suppose our web application lists the 20 latest comments posted by users, with a "show all" link next to them that reveals more comments when clicked. Assume every comment in the database has a unique, auto-incrementing ID field. We can serve both the home page and the paginated comments page with a simple Redis pattern. Every time a new comment is published, we add its ID to a Redis list:
LPUSH latest.comments <ID>
We trim the list to a fixed length, so Redis only needs to keep the latest 5,000 comments:
LTRIM latest.comments 0 4999
Each time we need a range of the latest comments, we call a function (in pseudocode):
FUNCTION get_latest_comments(start, num_items):
    id_list = redis.lrange("latest.comments", start, start + num_items - 1)
    IF id_list.length < num_items
        id_list = SQL_DB("SELECT ... ORDER BY time LIMIT ...")
    END
    RETURN id_list
END
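The pseudocode above can be turned into runnable Python under some assumptions. A plain list stands in for the Redis list `latest.comments`, and `fetch_from_db` is a hypothetical callback representing the SQL fallback. The 5,000-item cap comes from the text. This is a sketch of the caching logic only, with no real Redis or database involved.

```python
from typing import Callable, List, Tuple

CACHE_LIMIT = 5000   # keep only the latest 5,000 comment IDs, as in the text


def make_latest_comments(
    fetch_from_db: Callable[[int, int], List[int]],
) -> Tuple[Callable[[int], None], Callable[[int, int], List[int]]]:
    latest: List[int] = []   # stands in for the Redis list "latest.comments", newest first

    def publish(comment_id: int) -> None:
        latest.insert(0, comment_id)     # LPUSH latest.comments <ID>
        del latest[CACHE_LIMIT:]         # LTRIM latest.comments 0 4999

    def get_latest(start: int, num_items: int) -> List[int]:
        ids = latest[start:start + num_items]   # LRANGE with an inclusive stop index
        if len(ids) < num_items:                # request runs past the cache
            ids = fetch_from_db(start, num_items)
        return ids

    return publish, get_latest
```

Pages near the top are always served from the cache, and only requests that reach past the cached 5,000 IDs fall through to the database.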
What we do here is very simple. The latest IDs are kept in Redis as a resident cache that is always up to date, capped at 5,000 IDs, so our ID-fetching function always queries Redis. The database is accessed only when the start/count parameters fall outside that range.
Our system does not "refresh" the cache in the traditional way, and the information in the Redis instance is always consistent. The SQL database (or whatever on-disk database is used) is hit only when a user asks for "very far back" data. The home page and the first page of comments never touch the on-disk database.
2. Leaderboards: top-N operations
This requirement differs from the previous one. There, the latest N items are weighted by time; here, they are weighted by some other condition, such as the number of upvotes. This is where the sorted set comes in. Set the value you want to sort by as the sorted set's score and the actual data as the corresponding member, and each update needs only a single ZADD command.
Hot lists and leaderboard applications:
import redis

r = redis.Redis()   # assumes a Redis server on localhost:6379
N = 10              # number of top users to fetch (example value)
uid = 2             # ID of the user who just logged in (example value)
# Store users and their login counts in a sorted set (member -> score)
r.zadd("login:login_times", {"1": 5, "2": 1, "3": 2})
# When a user logs in, increment that user's login count by 1
ret = r.zincrby("login:login_times", 1, uid)
# Get the N users with the most logins, in descending order
ret = r.zrevrange("login:login_times", 0, N - 1)
Another very common requirement is a list of items sorted by score and updated in real time, with many updates arriving every second. This is hard to serve with good performance from databases that do not keep data in memory. A typical example is the leaderboard of an online game, such as a Facebook game, where you usually want to:
- List the top 100 high scorers
- Show a given user's current global rank
These operations are a piece of cake for Redis, even with millions of users and millions of new scores per minute. The pattern is this: each time we get a new score, we run:
ZADD leaderboard <score> <username>
You may replace username with a user ID, depending on your design. Getting the top 100 high scorers is simple:
ZREVRANGE leaderboard 0 99
A user's global rank is just as easy. Because the leaderboard is ordered from the highest score down, use ZREVRANK, since ZRANK would count from the lowest score:
ZREVRANK leaderboard <username>
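The three leaderboard operations can be mirrored by a small in-memory Python model. This is a hypothetical illustration of the semantics only. It re-sorts on every query, which is O(n log n), whereas Redis answers these from its skip list in roughly logarithmic time. Ties are ordered by member in reverse, as ZREVRANGE does.

```python
from typing import Dict, List, Optional


class Leaderboard:
    """Hypothetical in-memory model of ZADD / ZREVRANGE / ZREVRANK."""

    def __init__(self) -> None:
        self._scores: Dict[str, float] = {}

    def zadd(self, score: float, member: str) -> None:
        self._scores[member] = score   # a new score replaces the old one

    def _descending(self) -> List[str]:
        # Highest score first; ties ordered by member in reverse, like ZREVRANGE.
        ranked = sorted(self._scores.items(), key=lambda kv: (kv[1], kv[0]), reverse=True)
        return [member for member, _ in ranked]

    def zrevrange(self, start: int, stop: int) -> List[str]:
        return self._descending()[start:stop + 1]   # inclusive stop; non-negative indices only

    def zrevrank(self, member: str) -> Optional[int]:
        if member not in self._scores:
            return None
        return self._descending().index(member)    # 0 means the top of the board
```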
3. Deleting and filtering
We can use LREM to delete comments. If deletions are rare, another option is to simply skip the comment's entry when displaying it and report that the comment no longer exists. Sometimes you want to attach different filters to different lists. If the number of filters is limited, you can simply use a separate Redis list for each filter. After all, each list holds only 5,000 items, and Redis can handle millions of items with very little memory.
4. Sorting by user votes and time
A common variant of the leaderboard, used by sites like Reddit or Hacker News, sorts news items by a score computed with a formula similar to score = points / time^alpha. User votes dig an item up accordingly, while time buries it according to the exponent. Here is the pattern; the exact algorithm is up to you.
First, consider only the items that might be recent. For example, treat the 1,000 newest items as candidates for the first page and ignore the rest, which is easy to implement. Each time a new item is posted, we add its ID to a list and use LPUSH + LTRIM to keep only the latest 1,000 items. A background task reads this list and keeps computing the final score of each of those 1,000 items. It then fills a sorted set in the new order with ZADD and removes old items. The key idea is that the sorting work is done by the background task.
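The background task can be sketched in Python. The formula shape points / time^alpha comes from the text. The concrete constants are hypothetical: an exponent of 1.8, item age measured in hours, and a +2 offset so brand-new items do not divide by zero. The function only computes the order that would then be written into a sorted set with ZADD.

```python
from typing import Dict, List, Tuple

ALPHA = 1.8        # hypothetical "gravity" exponent
CANDIDATES = 1000  # only the newest 1,000 items are considered, as in the text


def rank_candidates(candidate_ids: List[str],
                    items: Dict[str, Tuple[int, float]],
                    now: float) -> List[str]:
    """Score each candidate and return IDs best-first.

    items maps id -> (points, posted_at as Unix seconds).
    """
    scored = []
    for item_id in candidate_ids[:CANDIDATES]:
        points, posted_at = items[item_id]
        age_hours = max(now - posted_at, 0) / 3600
        score = points / (age_hours + 2) ** ALPHA   # +2 avoids dividing by zero for new items
        scored.append((score, item_id))
    scored.sort(reverse=True)   # the order a ZADD-populated sorted set would give via ZREVRANGE
    return [item_id for _, item_id in scored]
```

A fresh item with moderate votes can outrank an older item with many votes, which is the intended "votes dig up, time buries" effect.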
5. Handling expired items
Another common use is indexing items by time, with Unix time as the score. The pattern is as follows:
- Each time a new item is added to our non-Redis database, we also add it to the sorted set. Its score is its expiration time: current_time + time_to_live.
- Another background task queries the sorted set with ZRANGE ... WITHSCORES and takes the 10 items that expire soonest. If their Unix time has already passed, it deletes those entries from the database.
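The expiry pattern can be sketched with a sorted Python list of `(expire_at, item_id)` pairs standing in for the sorted set. A dict stands in for the non-Redis database. The class name and batch size default are hypothetical, and times are passed in explicitly so the logic is easy to follow.

```python
import bisect
from typing import Dict, List, Tuple


class ExpiryIndex:
    """Hypothetical model: sorted (expire_at, item_id) pairs, like a sorted set scored by Unix time."""

    def __init__(self) -> None:
        self._by_time: List[Tuple[float, str]] = []

    def add(self, item_id: str, now: float, ttl: float) -> None:
        bisect.insort(self._by_time, (now + ttl, item_id))   # score = current_time + time_to_live

    def sweep(self, now: float, database: Dict[str, object], batch: int = 10) -> List[str]:
        """Background task: look at the earliest-expiring items and delete the expired ones."""
        removed = []
        for expire_at, item_id in self._by_time[:batch]:   # like ZRANGE key 0 9 WITHSCORES
            if expire_at > now:
                break                                      # sorted, so the rest are still alive
            database.pop(item_id, None)
            removed.append(item_id)
        del self._by_time[:len(removed)]                   # expired entries are exactly the prefix
        return removed
```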
6. Counting
Redis makes a good counter, thanks to INCRBY and similar commands. You have probably tried many times to add new counters to your database to collect statistics or display new information. In the end you had to abandon them because they were too write-intensive. With Redis that is no longer a concern. With atomic increments you can safely add all kinds of counters, reset them with GETSET, or let them expire. For example:
INCR user:<id>
EXPIRE user:<id> 60
This counts the page views a user has made recently with pauses of at most 60 seconds between pages. When the count reaches, say, 20, you can show a banner hint or anything else you want.
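The INCR + EXPIRE idea can be modeled in Python with an explicit clock. Each page view increments the counter and pushes its expiry 60 seconds into the future, so the count only resets after a pause longer than that. The class is hypothetical and only mimics the two commands; it does not talk to Redis.

```python
from typing import Dict, Tuple


class ExpiringCounters:
    """Hypothetical model of INCR user:<id> followed by EXPIRE user:<id> 60."""

    def __init__(self, ttl: float = 60.0) -> None:
        self.ttl = ttl
        self._counters: Dict[str, Tuple[int, float]] = {}   # key -> (count, expires_at)

    def page_view(self, user_id: str, now: float) -> int:
        key = f"user:{user_id}"
        count, expires_at = self._counters.get(key, (0, 0.0))
        if now >= expires_at:   # key expired (or never existed): INCR starts from 0
            count = 0
        count += 1              # INCR
        self._counters[key] = (count, now + self.ttl)   # EXPIRE resets the TTL on every view
        return count
```

A caller could show the banner once `page_view(...)` returns 20.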
7. Specific items within a specific time period
Another task that is hard for other databases but easy for Redis is counting how many distinct users visited a particular resource during a particular period. For example, I want to know how many specific registered users, or IP addresses, visited an article. Every time I get a new page view, I just need to do this:
SADD page:day1:<page_id> <user_id>
Of course, instead of day1 you may want to use the Unix time of the first second of the day, such as time() - (time() % (3600*24)). Want to know the number of distinct users? Just use:
SCARD page:day1:<page_id>
Need to test whether a particular user has visited this page?
SISMEMBER page:day1:<page_id> <user_id>
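The per-day visitor sets can be mimicked with Python sets keyed by the day's first second. The key format follows the text, with `day1` replaced by the day-start Unix time. The helper names are hypothetical, and days are UTC-aligned because the modulo is taken on raw Unix time.

```python
from collections import defaultdict
from typing import DefaultDict, Set

SECONDS_PER_DAY = 3600 * 24
visits: DefaultDict[str, Set[str]] = defaultdict(set)   # key -> set of visitor IDs


def day_start(unix_time: int) -> int:
    """First second of the (UTC) day containing unix_time."""
    return unix_time - (unix_time % SECONDS_PER_DAY)


def _key(page_id: str, unix_time: int) -> str:
    return f"page:{day_start(unix_time)}:{page_id}"


def record_view(page_id: str, user_id: str, unix_time: int) -> None:
    visits[_key(page_id, unix_time)].add(user_id)                    # like SADD


def unique_visitors(page_id: str, unix_time: int) -> int:
    return len(visits.get(_key(page_id, unix_time), set()))          # like SCARD


def has_visited(page_id: str, user_id: str, unix_time: int) -> bool:
    return user_id in visits.get(_key(page_id, unix_time), set())    # like SISMEMBER
```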
8. Finding the interval that contains a value (non-overlapping intervals) (Sorted Set)
For example, suppose there are two ranges, 10-20 and 30-40:
- A_start 10, A_end 20
- B_start 30, B_end 40
We store the boundaries of these two ranges in a Redis sorted set. Each boundary value becomes the score, and the range name plus start or end becomes the member:
redis 127.0.0.1:6379> zadd ranges 10 A_start
(integer) 1
redis 127.0.0.1:6379> zadd ranges 20 A_end
(integer) 1
redis 127.0.0.1:6379> zadd ranges 30 B_start
(integer) 1
redis 127.0.0.1:6379> zadd ranges 40 B_end
(integer) 1
After insertion, the sorted set keeps the boundaries ordered by position. Now, to find which range contains the value 15, we only need the following ZRANGEBYSCORE query:
redis 127.0.0.1:6379> zrangebyscore ranges (15 +inf LIMIT 0 1
1) "A_end"
This command finds the first member whose score is greater than 15. In Redis, +inf means positive infinity, and the parenthesis before 15 means >15 rather than >=15. The result is A_end. Because all values are ordered, we can conclude that 15 lies between A_start and A_end, that is, 15 is in range A. If the first member found had been a start boundary instead (say, B_start), the value would lie between two ranges rather than inside one. That's it.
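The same lookup can be reproduced in Python with `bisect`. A sorted list of `(score, member)` pairs stands in for the `ranges` sorted set, and `bisect_right` finds the first score strictly greater than the value, just as `(value +inf LIMIT 0 1` does. This is an illustration, not a Redis client.

```python
import bisect
from typing import List, Optional, Tuple

# Boundaries as (score, member) pairs, kept sorted like the "ranges" sorted set.
boundaries: List[Tuple[float, str]] = sorted([
    (10, "A_start"), (20, "A_end"), (30, "B_start"), (40, "B_end"),
])
scores = [score for score, _ in boundaries]


def find_range(value: float) -> Optional[str]:
    """Mimic ZRANGEBYSCORE ranges (value +inf LIMIT 0 1 and interpret the result."""
    i = bisect.bisect_right(scores, value)   # index of the first score strictly greater than value
    if i == len(boundaries):
        return None                          # nothing above: past the last range
    member = boundaries[i][1]
    if member.endswith("_end"):
        return member[: -len("_end")]        # the next boundary is an end: value is inside
    return None                              # the next boundary is a start: value is between ranges
```

For example, `find_range(15)` gives `"A"`, while `find_range(25)` gives `None` because the next boundary above 25 is `B_start`.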