1. Problems with the MySQL + Memcached architecture
MySQL is well suited to storing massive amounts of data, and Memcached is used to load hot data into a cache to speed up access. Many companies have used this architecture, but as business data volume and traffic keep growing, a number of problems appear:
1. MySQL has to be sharded again and again, and Memcached has to be scaled out along with it; the expansion and maintenance work takes up a lot of development time.
2. Data consistency between Memcached and the MySQL database.
3. When the Memcached hit rate is low or a node goes down, a large number of requests go straight through to the database, which MySQL cannot sustain.
4. Cache synchronization across data centers.
So many NoSQL products: how to choose
In recent years many kinds of NoSQL products have emerged in the industry. How to use them properly and make the most of their strengths is something we need to study and think about. The most important thing is to understand the positioning of each product and the trade-offs it makes, so that in practice we can play to its strengths and avoid its weaknesses. In general, NoSQL products are mainly used to solve the following problems:
1. Small amounts of data with high-speed read and write access. Products of this kind keep all data in memory to guarantee fast access, and also provide a way to persist data to disk. This is in fact the main application scenario for Redis.
2. Massive data storage with distributed system support, data consistency guarantees, and convenient addition and removal of cluster nodes. The most representative ideas here are those set out in the two papers on Dynamo and BigTable. The former is a fully decentralized design: nodes pass cluster information to each other by gossip, and data is eventually consistent. The latter is a centralized design that relies on a distributed-lock-like service to guarantee strong consistency; writes go to memory and a redo log first and are periodically compacted and merged to disk, which turns random writes into sequential writes and improves write performance.
3. Schema-free storage, auto-sharding, and so on. For example, common document databases such as MongoDB are schema-free, store JSON-format data directly, and support features such as auto-sharding.
Faced with these different types of NoSQL products, we need to choose the most appropriate one for our business scenario.
Redis is best suited to scenarios where all data fits in memory. Redis does provide persistence, but it is really more of a disk-backed (flush-to-disk) feature and differs considerably from persistence in the traditional sense. So you may wonder: Redis looks like an enhanced version of Memcached, so when should you use Memcached and when should you use Redis?
If you simply compare Redis with Memcached, most people will come up with the following points:
1. Redis supports not only simple k/v data but also data structures such as list, set, zset (sorted set), and hash.
2. Redis supports data backup, that is, backup in master-slave mode.
3. Redis supports data persistence: in-memory data can be saved to disk and loaded again after a restart.
2. Common Redis data types
The most commonly used data types for Redis are the following:
- String
- Hash
- List
- Set
- Sorted Set
- Pub/sub
- Transactions
Before describing these data types, let's look at how Redis represents them in its internal memory management.
First, Redis internally uses a redisObject to represent every key and value. The most important fields of a redisObject are type and encoding:
type indicates which data type a value object is,
encoding indicates how that data type is stored inside Redis.
For example, type=string means that the value is stored as an ordinary string, and the corresponding encoding can be raw or int. If it is int, Redis actually stores and represents the string as a number, provided of course that the string itself can be expressed as a number, such as "123" or "456".
The vm field needs a special note: memory is allocated for it only when Redis's virtual memory feature is turned on, and that feature is off by default; it is described later. Representing every key/value with a redisObject clearly costs some memory; this overhead mainly buys a unified management interface for Redis's different data types. The author of Redis also offers several ways to save as much memory as possible, which we will discuss in detail later.
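To make the type/encoding pair concrete, here is a minimal Python sketch of how a string value might be classified as raw or int. It is a simplified illustration only: the function name choose_encoding and the exact rule (canonical decimal form within the signed 64-bit range) are assumptions made for this example, and the real checks inside Redis are more detailed.

```python
INT64_MIN, INT64_MAX = -2**63, 2**63 - 1


def choose_encoding(value: str) -> str:
    """Return "int" when the string is a canonical 64-bit integer, else "raw"."""
    try:
        number = int(value)
    except ValueError:
        return "raw"
    # Require the canonical decimal form so "007" or " 42" stay plain strings.
    if str(number) == value and INT64_MIN <= number <= INT64_MAX:
        return "int"
    return "raw"


samples = ["123", "456", "hello", "007", "12.5"]
encodings = {text: choose_encoding(text) for text in samples}
```

Strings such as "123" and "456" are classified as int here, while anything that is not a plain integer stays raw.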
3. Applications and implementation of each data type
Let's start by analyzing how these 7 data types are used and how they are implemented internally:
String
The String data structure is a simple key-value type. The value need not be a string; it can also be a number.
Common commands: SET, GET, DECR, INCR, MGET, and so on.
Application scenarios: String is the most commonly used data type, and ordinary key/value storage falls into this category. It fully covers what Memcached currently does, and more efficiently. You also get Redis's scheduled persistence, operation log, and replication. Besides the operations it shares with Memcached, such as GET, SET, INCR, and DECR, Redis also provides the following:
- Get string length
- Append content to a string
- Set and get a section of a string
- Set and get a single bit of a string
- Set the contents of several strings in one batch
Implementation method: inside Redis a String is stored as a string by default and referenced by a redisObject. When operations such as INCR or DECR are applied, it is converted to a numeric type for the calculation, and the encoding field of the redisObject is then int.
Hash
Common commands: HGET, HSET, HGETALL, and so on.
Application scenarios: in Memcached, we often pack structured information into a hashmap, which the client serializes and stores as a string value, for example a user's nickname, age, gender, and points. When one of these items needs to be modified, we usually have to fetch the whole value, deserialize it, modify the item, then serialize it and store it back. This not only adds overhead but also does not suit scenarios with possible concurrent operations (for example, two concurrent operations that both need to modify the points). The Redis hash structure lets you modify just one attribute value, much as you would update a single column in a database.
Let's take a simple example to describe the application scenario for a hash. Suppose we want to store a user information object in which the user ID is the lookup key and the stored value is a user object containing name, age, birthday, and other information. With an ordinary key/value structure there are mainly two ways to store it:
The first way uses the user ID as the lookup key and packs the other information into a serialized object. Its disadvantages are the added cost of serialization/deserialization, the need to fetch the whole object to modify a single item, and the need to protect modifications against concurrency, which introduces complications such as CAS.
The second way stores as many key-value pairs as the user object has members, using user ID + attribute name as the unique key for each attribute value. This removes the serialization overhead and the concurrency problem, but the user ID is stored repeatedly; with a large amount of such data the wasted memory is considerable.
The hash provided by Redis solves this problem well. A Redis hash stores its value internally as a hashmap and provides an interface for accessing the members of that map directly.
That is, the key is still the user ID and the value is a map whose keys are the attribute names and whose values are the attribute values. Data can be read and modified directly through the key of the internal map (Redis calls this internal key a field); in other words, key (user ID) + field (attribute name) is enough to operate on the corresponding attribute. There is no need to store data redundantly, and the problems of serialization and concurrent-modification control do not arise.
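The contrast between the serialized-object approach and the hash approach can be sketched in Python. The FakeHashStore class is a hypothetical in-memory stand-in so the example runs without a server; its method names simply mirror the SET, GET, HSET, HGET, and HINCRBY commands, and the key names are made up for illustration.

```python
import json


class FakeHashStore:
    """In-memory stand-in for the few Redis commands used below (hypothetical)."""

    def __init__(self):
        self.strings = {}   # key -> serialized string value
        self.hashes = {}    # key -> {field: value}

    def set(self, key, value):
        self.strings[key] = value

    def get(self, key):
        return self.strings.get(key)

    def hset(self, key, field, value):
        self.hashes.setdefault(key, {})[field] = value

    def hget(self, key, field):
        return self.hashes.get(key, {}).get(field)

    def hincrby(self, key, field, amount=1):
        fields = self.hashes.setdefault(key, {})
        fields[field] = int(fields.get(field, 0)) + amount
        return fields[field]


def add_points_blob(store, user_id, delta):
    # Serialized object: fetch everything, change one item, write everything back.
    key = f"user:blob:{user_id}"
    user = json.loads(store.get(key))
    user["points"] += delta
    store.set(key, json.dumps(user))
    return user["points"]


def add_points_hash(store, user_id, delta):
    # Hash: touch only the one field (HINCRBY is a single command in Redis).
    return store.hincrby(f"user:hash:{user_id}", "points", delta)


store = FakeHashStore()
store.set("user:blob:1", json.dumps({"name": "Tom", "age": 28, "points": 10}))
store.hset("user:hash:1", "name", "Tom")
store.hset("user:hash:1", "points", 10)

blob_points = add_points_blob(store, 1, 5)
hash_points = add_points_hash(store, 1, 5)
```

Both calls end with 15 points, but the first one moves the whole object across the wire twice and is open to lost updates if two clients do it at the same time, while the second touches a single field.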
Note also that Redis provides an interface (HGETALL) that fetches all attribute data at once. If the internal map has many members, this means traversing the whole map, and because of Redis's single-threaded model the traversal can be time-consuming and leave other clients' requests unanswered in the meantime. This needs particular attention.
Implementation method:
As mentioned above, the value of a Redis hash is internally a hashmap. There are in fact two different implementations. When the hash has few members, Redis saves memory by storing it compactly in something like a one-dimensional array instead of a real hashmap structure; the encoding of the corresponding value redisObject is then zipmap. When the number of members grows, it is automatically converted into a real hashmap, and the encoding becomes ht.
List
Common commands: LPUSH, RPUSH, LPOP, RPOP, LRANGE, and so on.
Application scenarios:
The Redis list has many applications and is one of the most important Redis data structures. For example, Twitter's following list and followers list can be implemented with the Redis list structure.
Lists are linked lists, and anyone with some knowledge of data structures should understand how they work. With lists we can easily implement features such as a list of the latest messages. Another application of lists is message queuing:
a push operation stores a task in the list, and a worker thread then takes the task out with a pop operation and executes it. Redis also provides an API for operating on a segment of a list, so you can directly query or delete the elements in a given range.
Implementation method:
A Redis list is implemented as a doubly linked list, which supports reverse lookup and traversal and is convenient to operate on, at the cost of some extra memory overhead. Many internal parts of Redis, including the send buffer queue, also use this data structure.
Set
Common commands: SADD, SPOP, SMEMBERS, SUNION, and so on.
Application scenarios:
Externally a Redis set offers list-like functionality; the special thing is that a set deduplicates automatically. When you need to store a list of data and do not want duplicates, a set is a good choice. A set also provides an important interface for testing whether a member is in the set, which a list cannot provide.
A set is a collection of distinct values. With the set data structure provided by Redis you can store collection-like data; for example, in a microblog application you can keep everyone a user follows in one set and all of the user's followers in another. Redis also provides intersection, union, and difference operations on sets, which make it very convenient to implement features such as common follows, common interests, and second-degree friends. For all of these set operations you can choose, by using different commands, whether to return the result to the client or store it in a new set.
Implementation method:
Internally a set is a hashmap whose values are always null. Deduplication is done quickly by computing hashes, which is also why a set can test whether a member is in the collection.
Sorted Set
Common commands: ZADD, ZRANGE, ZREM, ZCARD, and so on.
Usage scenarios:
The usage scenarios for a Redis sorted set are similar to those for a set. The difference is that a set is not ordered, whereas a sorted set orders its members by an additional priority (score) parameter supplied by the user and keeps them sorted on insertion. When you need an ordered list without duplicates, choose the sorted set. For example, Twitter's public timeline can be stored with the publication time as the score, so that it is automatically sorted by time.
You can also use a sorted set as a weighted queue: for example, ordinary messages get a score of 1 and important messages a score of 2, and workers fetch tasks in descending score order so that important tasks are handled first.
Implementation method:
Internally a Redis sorted set uses a hashmap and a skip list to store the data and keep it ordered. The hashmap holds the mapping from member to score, and the skip list holds all members, sorted by the scores stored in the hashmap. The skip list gives fairly efficient lookups and is relatively simple to implement.
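The two-structure idea can be sketched in a few lines of Python. This is not how Redis implements it: a plain sorted list maintained with bisect stands in for the skip list, and the class name TinySortedSet and its methods are hypothetical, named after the commands they imitate.

```python
import bisect


class TinySortedSet:
    """Dict for member -> score, plus a sorted list standing in for the skip list."""

    def __init__(self):
        self.scores = {}    # member -> score (the "hashmap" part)
        self.ordered = []   # sorted (score, member) pairs (the "skip list" part)

    def zadd(self, member, score):
        old = self.scores.get(member)
        if old is not None:
            # Remove the stale entry so each member appears exactly once.
            self.ordered.pop(bisect.bisect_left(self.ordered, (old, member)))
        self.scores[member] = score
        bisect.insort(self.ordered, (score, member))

    def zscore(self, member):
        return self.scores.get(member)

    def zrevrange(self, start, stop):
        # Highest score first; stop is inclusive (non-negative indexes only here).
        return [m for _, m in self.ordered[::-1][start:stop + 1]]

    def zrevrank(self, member):
        score = self.scores.get(member)
        if score is None:
            return None
        index = bisect.bisect_left(self.ordered, (score, member))
        return len(self.ordered) - 1 - index


board = TinySortedSet()
board.zadd("alice", 120)
board.zadd("bob", 300)
board.zadd("carol", 210)
board.zadd("alice", 250)            # updating a score moves the member
top_two = board.zrevrange(0, 1)
carol_rank = board.zrevrank("carol")
```

The dict answers "what is this member's score?" directly, and the ordered structure answers range and rank queries, which is the division of labour described above.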
Pub/Sub
Pub/Sub literally means publish and subscribe. In Redis you can publish messages to and subscribe to a given key (channel); when a message is published on it, every client subscribed to it receives that message. The most obvious use of this feature is as a real-time messaging system, for example ordinary instant chat and group chat.
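The publish/subscribe flow can be illustrated with a small in-process model in Python. The TinyPubSub class and the channel name are hypothetical; the sketch only shows the fan-out behaviour, not networking or the Redis protocol.

```python
from collections import defaultdict


class TinyPubSub:
    """In-process model of channels and subscribers (illustration only)."""

    def __init__(self):
        self.subscribers = defaultdict(list)   # channel -> list of callbacks

    def subscribe(self, channel, callback):
        self.subscribers[channel].append(callback)

    def unsubscribe(self, channel, callback):
        self.subscribers[channel].remove(callback)

    def publish(self, channel, message):
        receivers = list(self.subscribers[channel])
        for callback in receivers:
            callback(message)
        return len(receivers)   # how many subscribers received the message


hub = TinyPubSub()
alice_inbox, bob_inbox = [], []
hub.subscribe("chat:room1", alice_inbox.append)
hub.subscribe("chat:room1", bob_inbox.append)

delivered_first = hub.publish("chat:room1", "hi")
hub.unsubscribe("chat:room1", bob_inbox.append)
delivered_second = hub.publish("chat:room1", "bye")
```

Every subscriber of a channel gets each message published while it is subscribed, which is exactly what a group chat needs.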
Transactions
Who says NoSQL does not support transactions? Redis transactions are not strict ACID transactions (for example, if the server goes down while a batch of commands submitted with EXEC is executing, some commands will have run and the rest will not). Still, they provide basic packaged execution of commands: as long as the server has no problem, a series of commands is guaranteed to execute together in sequence, with no other client's commands inserted in between. Redis also provides WATCH: you can watch a key and then execute a transaction, and if the watched value is modified in the meantime, the transaction detects this and refuses to execute.
4. Real-world Redis application scenarios
Redis differs from other database solutions in many ways: it uses memory as its primary storage and uses the hard disk only for persistence; its data model is quite unique; and it is single-threaded. Another big difference is that you can use the capabilities of Redis in your development environment without having to switch to Redis.
Switching to Redis is certainly an option, and many developers make Redis their database of choice from the start. But if your development environment is already set up and your application is already running on it, replacing the database is clearly not easy. In addition, Redis is not suitable for applications that need very large datasets, because its dataset cannot exceed the memory available to the system. So if you have a big-data application with a mostly-read access pattern, Redis is not the right choice.
However, one thing I like about Redis is that you can fit it into your existing system and use it to solve many problems, such as tasks your existing database is slow at. You can optimize with Redis or build new features for your application. In this article I want to explore how to add Redis to an existing environment and use its primitive commands to solve some common problems encountered in traditional environments. In none of these cases is Redis the primary database.
1. Displaying a list of the latest items
The following statement is often used to show the latest items, and as the data grows the query will undoubtedly get slower and slower.
SELECT * FROM foo WHERE ... ORDER BY time DESC LIMIT ...
In web applications, queries such as "list the latest replies" are very common, and they often cause scalability problems. This is frustrating: the items were created in exactly this order, yet we have to sort them to output them in this order.
A problem like this can be solved with Redis. Suppose one of our web applications wants to list the latest 20 comments posted by users. Next to the latest comments there is a "show all" link that leads to more comments.
We assume that each comment in the database has a unique, incrementing ID field.
We can build both the home page box and the paginated comment pages with the following Redis pattern: each time a new comment is published, we add its ID to a Redis list:
LPUSH latest.comments <ID>
We then trim the list to a fixed length, so that Redis only keeps the latest 5,000 comments:
LTRIM latest.comments 0 4999
Each time we need a range of the latest comments, we call a function like this (in pseudocode):
FUNCTION get_latest_comments(start, num_items):
    id_list = redis.lrange("latest.comments", start, start + num_items - 1)
    IF id_list.length < num_items
        id_list = SQL_DB("SELECT ... ORDER BY time LIMIT ...")
    END
    RETURN id_list
END
What we do here is very simple. The list of latest IDs in Redis acts as a resident cache that is always kept up to date. Because we limit it to 5,000 IDs, our function always asks Redis first and needs to access the database only when the start/count parameters fall outside this range.
Our system does not "refresh" the cache in the traditional way; the information in the Redis instance is always consistent. The SQL database (or another kind of on-disk database) is hit only when a user asks for "very old" data, and the home page and the first comment pages never bother the on-disk database.
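Here is a runnable Python sketch of the whole pattern. FakeListStore is a hypothetical in-memory stand-in that imitates LPUSH, LTRIM, and LRANGE for non-negative indexes, and fetch_from_db is an invented placeholder for the SQL fallback; neither comes from a real client library.

```python
class FakeListStore:
    """In-memory stand-in for LPUSH / LTRIM / LRANGE (hypothetical)."""

    def __init__(self):
        self.lists = {}

    def lpush(self, key, value):
        self.lists.setdefault(key, []).insert(0, value)

    def ltrim(self, key, start, stop):
        self.lists[key] = self.lists.get(key, [])[start:stop + 1]

    def lrange(self, key, start, stop):
        return self.lists.get(key, [])[start:stop + 1]


MAX_CACHED = 5000


def publish_comment(store, comment_id):
    store.lpush("latest.comments", comment_id)
    store.ltrim("latest.comments", 0, MAX_CACHED - 1)   # keep only the newest IDs


def get_latest_comments(store, start, num_items, fetch_from_db):
    id_list = store.lrange("latest.comments", start, start + num_items - 1)
    if len(id_list) < num_items:
        # The range reaches past the cached window: fall back to the database.
        id_list = fetch_from_db(start, num_items)
    return id_list


def fetch_from_db(start, num_items):
    # Placeholder for the SQL query: newest-first IDs out of 6,000 comments.
    newest = 6000 - start
    return list(range(newest, max(newest - num_items, 0), -1))


store = FakeListStore()
for comment_id in range(1, 6001):
    publish_comment(store, comment_id)

first_page = get_latest_comments(store, 0, 20, fetch_from_db)
deep_page = get_latest_comments(store, 4990, 20, fetch_from_db)
```

The first page is served entirely from the list, while the deep page crosses the 5,000-item boundary and is answered by the fallback.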
2. Deleting and filtering
We can use LREM to delete comments. If deletions are rare, another option is simply to skip the entry when rendering and report that the comment no longer exists.
Sometimes you want to attach different filters to different lists. If the number of filters is limited, you can simply use a separate Redis list for each filter. After all, each list holds only 5,000 items, and Redis can handle millions of items with very little memory.
3. Leaderboards
Another common requirement is a list ordered by score that is updated in real time, with new scores arriving almost every second. Databases that do not keep their data in memory handle this poorly.
A typical example is the leaderboard of an online game, such as a Facebook game, where based on the scores you usually want to:
- List the top 100 high-score players
- Show a user's current global rank
These operations are a piece of cake for Redis, even if you have millions of users and millions of new scores per minute.
The pattern is this: each time we receive a new score, we run:
ZADD leaderboard <score> <username>
You may replace username with a user ID, depending on your design.
Getting the top 100 high-score users is simple: ZREVRANGE leaderboard 0 99.
A user's global rank is just as easy. Because higher scores rank first, use the reverse rank: ZREVRANK leaderboard <username>.
4. Sorting by user votes and time
A common variant of the leaderboard pattern is the one used by Reddit or Hacker News, where news items are ordered by a score computed with a formula like this:
score = points / time^alpha
User votes push a news item up accordingly, while time buries it following a certain exponent. Here is our pattern; the exact algorithm is of course up to you.
The pattern starts from the observation that only the most recent items, say the latest 1,000 news items, are candidates for the front page, so we can simply ignore everything else. This is easy to implement.
Each time a new news item is posted, we add its ID to a list and use LPUSH + LTRIM to make sure that only the latest 1,000 items are kept.
A background task fetches this list and continuously computes the final score of each of the 1,000 news items. The results are written with ZADD into a sorted set in the new order, and old news is removed. The key idea is that the sorting work is done by the background task.
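The scoring step of such a background task might look like the following Python sketch. The value alpha=1.5, the one-hour minimum age (used to avoid dividing by a number close to zero), and the sample items are all assumptions chosen for illustration; as noted above, the actual algorithm is up to you.

```python
def rerank(candidates, now, alpha=1.5):
    """Order candidates by score = points / age_hours ** alpha, best first.

    candidates is a list of (item_id, points, posted_at_unix) tuples.
    """
    scored = []
    for item_id, points, posted_at in candidates:
        # Clamp the age to at least one hour (an assumption of this sketch).
        age_hours = max((now - posted_at) / 3600.0, 1.0)
        scored.append((points / age_hours ** alpha, item_id))
    scored.sort(reverse=True)
    return [item_id for _, item_id in scored]


now = 1_000_000
candidates = [
    ("old-popular", 100, now - 48 * 3600),   # many votes, two days old
    ("new-modest", 10, now - 2 * 3600),      # few votes, two hours old
    ("brand-new", 1, now - 60),              # one vote, one minute old
]
front_page = rerank(candidates, now)
```

In the pattern above, the task would then write each item and its score into a sorted set with ZADD so that readers never have to sort anything themselves.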
5. Handling expired items
Another common way to sort items is by time. We use the Unix time as the score.
The pattern is as follows:
- Each time a new item is added to our non-Redis database, we also add it to a sorted set. As its score we use the time at which the item should expire, that is, current_time + time_to_live.
- A background task queries the sorted set with ZRANGE ... WITHSCORES and takes the first 10 items. If an item's Unix time is already in the past, the task deletes that entry from the database.
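The two steps can be sketched in Python as follows. A sorted list of (expire_at, item_id) pairs stands in for the sorted set and a plain dict stands in for the non-Redis database; the function names and the sample data are hypothetical.

```python
import bisect


def add_item(expiry_index, item_id, now, ttl):
    # The score is the Unix time at which the item should expire.
    bisect.insort(expiry_index, (now + ttl, item_id))


def purge_expired(expiry_index, database, now, batch=10):
    """Check the entries due soonest and delete those already past due."""
    removed = []
    for expire_at, item_id in expiry_index[:batch]:
        if expire_at > now:
            break   # entries are sorted, so nothing later has expired either
        database.pop(item_id, None)
        removed.append(item_id)
    del expiry_index[:len(removed)]
    return removed


database = {"a": "session a", "b": "session b", "c": "session c"}
expiry_index = []
add_item(expiry_index, "a", now=1000, ttl=60)    # expires at 1060
add_item(expiry_index, "b", now=1000, ttl=600)   # expires at 1600
add_item(expiry_index, "c", now=1000, ttl=30)    # expires at 1030

removed = purge_expired(expiry_index, database, now=1100)
```

Because the index is ordered by expiry time, the background task only ever needs to look at the head of it.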
6. Counting
Redis makes a good counter, thanks to INCRBY and similar commands.
I am sure you have tried many times to add new counters to your database for statistics or to display new information, only to give them up in the end because they made the database too write-intensive.
With Redis you no longer need to worry. With atomic increments you can safely add all kinds of counters, reset them with GETSET, or let them expire.
For example:
INCR user:<id>
EXPIRE user:<id> 60
This gives you the number of page views a user has made recently with pauses of no more than 60 seconds between them. When the count reaches, say, 20, you can show a banner or anything else you like.
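The behaviour of the INCR + EXPIRE pair can be modelled in Python. FakeCounterStore is a hypothetical in-memory stand-in with an injectable clock so the example is deterministic; it imitates only the two commands used here.

```python
class FakeCounterStore:
    """In-memory stand-in for INCR / EXPIRE with an injectable clock (hypothetical)."""

    def __init__(self, clock):
        self.clock = clock
        self.values = {}
        self.deadlines = {}

    def _drop_if_expired(self, key):
        deadline = self.deadlines.get(key)
        if deadline is not None and self.clock() >= deadline:
            self.values.pop(key, None)
            self.deadlines.pop(key, None)

    def incr(self, key):
        self._drop_if_expired(key)
        self.values[key] = self.values.get(key, 0) + 1
        return self.values[key]

    def expire(self, key, seconds):
        if key in self.values:
            self.deadlines[key] = self.clock() + seconds


def record_page_view(store, user_id, window=60):
    key = f"user:{user_id}"
    count = store.incr(key)
    store.expire(key, window)   # every view pushes the deadline out again
    return count


now = [0]
store = FakeCounterStore(clock=lambda: now[0])
counts = []
for t in (0, 10, 20):
    now[0] = t
    counts.append(record_page_view(store, 7))

now[0] = 100                     # 80 seconds of silence: the key has expired
count_after_pause = record_page_view(store, 7)
```

The counter keeps growing while views arrive less than 60 seconds apart and starts again from 1 after a longer pause.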
7. Unique items in a given time period
Another thing that is hard for other databases but easy for Redis is counting how many distinct users visited a given resource during a given period. For example, I may want to know how many distinct registered users or IP addresses have visited an article.
Each time I get a new page view, I just do this:
SADD page:day1:<page_id> <user_id>
Of course, you may want to replace day1 with a Unix time such as time() - (time() % (3600*24)).
Want to know the number of unique users? Just use SCARD page:day1:<page_id>.
Need to test whether a particular user has visited this page? Use SISMEMBER page:day1:<page_id> <user_id>.
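The day-bucket key and the three set operations can be sketched in Python. A dict of Python sets stands in for the Redis sets, and the helper names, page ID, and user IDs are made up for this example.

```python
SECONDS_PER_DAY = 3600 * 24


def day_bucket(timestamp):
    # Round a Unix timestamp down to the start of its (UTC) day.
    return int(timestamp) - (int(timestamp) % SECONDS_PER_DAY)


def page_key(page_id, timestamp):
    return f"page:{day_bucket(timestamp)}:{page_id}"


visits = {}   # key -> set of user IDs, standing in for Redis sets


def record_visit(page_id, user_id, timestamp):
    visits.setdefault(page_key(page_id, timestamp), set()).add(user_id)   # like SADD


def unique_visitors(page_id, timestamp):
    return len(visits.get(page_key(page_id, timestamp), set()))           # like SCARD


def has_visited(page_id, user_id, timestamp):
    return user_id in visits.get(page_key(page_id, timestamp), set())     # like SISMEMBER


day_one = 19000 * SECONDS_PER_DAY + 3600   # one hour into some day
day_two = day_one + SECONDS_PER_DAY
record_visit("article-9", "user-42", day_one)
record_visit("article-9", "user-42", day_one + 500)   # repeat visit, counted once
record_visit("article-9", "user-7", day_one + 900)
record_visit("article-9", "user-42", day_two)
```

Repeat visits on the same day land in the same set and are counted once, and each new day gets its own key.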
8. Real-time analysis of what is happening, for statistics, spam prevention, and so on
We have covered only a few examples, but if you study the Redis command set and combine the commands, you can build many kinds of real-time analysis that are effective and take very little effort. With Redis's primitive commands it becomes easier to implement spam filtering systems or other real-time tracking systems.
9. Pub/Sub
Redis Pub/Sub is very, very simple, stable, and fast. It supports pattern matching and lets clients subscribe to and unsubscribe from channels in real time.
10. Queues
You have probably noticed that Redis commands such as list push and list pop make queue operations easy, but Redis can do more than that: for example, it has a variant of list pop that blocks the caller when the list is empty.
Message queues are used extensively in modern Internet applications, not only for communication between components within a system but also for interaction between the system and other services. Using message queues can improve a system's scalability, flexibility, and user experience. In a system that is not built on message queues, overall speed depends on the slowest component (the "shortest plank" effect). Message queues decouple the components so that the system is no longer constrained by its slowest part, and the components can run asynchronously and finish their work faster.
In addition, when a server is under high concurrency, for example when it writes log files very frequently, message queues can be used to process the work asynchronously and so achieve high-performance concurrent operation.
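As an in-process analogy for the producer/worker pattern, the Python sketch below uses the standard library's queue.Queue in place of a Redis list. Its blocking get() plays the role of the blocking list pop, and put() plays the role of the push; the task names and the None sentinel are assumptions of this example.

```python
import queue
import threading

tasks = queue.Queue()    # stands in for a Redis list used as a queue
results = []


def worker():
    while True:
        task = tasks.get()          # blocks while the queue is empty
        if task is None:            # sentinel: tell the worker to stop
            break
        results.append(task.upper())   # pretend this is the slow part


thread = threading.Thread(target=worker)
thread.start()

for event in ["login", "purchase", "logout"]:
    tasks.put(event)                # the producer returns immediately

tasks.put(None)
thread.join()
```

The producer never waits for the slow step, which is the decoupling described above; with Redis the same shape works across processes and machines.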
11. Caching
The caching side of Redis deserves an article of its own, so I will only touch on it briefly here. Redis can replace Memcached and turn your cache from something that can only store data into something that can also update it, so you no longer need to regenerate the data every time.