Introduction to the application scenarios of the Redis database

Source: Internet
Author: User

I. Problems with the MySQL + memcached architecture

MySQL is well suited to storing massive amounts of data, and loading hot data into memcached speeds up access. Many companies have used this architecture, but as business data volume and traffic keep growing, we have run into many problems:

1) MySQL needs constant table splitting, and memcached has to keep expanding along with it; expansion and maintenance take up a great deal of development time.

2) Data consistency between memcached and MySQL.

3) When the memcached hit rate is low or a node goes down, a large volume of requests goes straight through to the DB, and MySQL cannot cope.

4) Cache synchronization across data centers.

With so many NoSQL products, how to choose

In recent years a great many NoSQL products have sprung up in the industry. How to use them correctly and play to their strengths is something we need to study and think about in depth. The most important thing is to understand how each product is positioned and what tradeoffs it makes, so that in practice we use its strengths and avoid its weaknesses. In general, these NoSQL products are mainly used to solve the following problems:

1) Small data volumes with high-speed read and write access. Products of this kind keep all data in memory to guarantee fast access, while also providing a way to persist data to disk. This is in fact the main application scenario for Redis.

2) Massive data storage, with support for distributed systems, data consistency guarantees, and convenient addition and removal of cluster nodes.

3) The most representative ideas here are those set out in the Dynamo and Bigtable papers. The former is a fully gossip-based design: nodes pass cluster information to one another by gossip, and data is guaranteed to be eventually consistent. The latter is a centralized design that ensures strong consistency through something like a distributed lock service; writes go first to memory and a redo log and are then periodically compacted and merged to disk, turning random writes into sequential writes to improve write performance.

4) Schema-free storage, auto-sharding, and so on. For example, some common document databases such as MongoDB are schema-free, store JSON-format data directly, and support features such as auto-sharding.

Faced with these different types of NoSQL products, we need to choose the most suitable one for our business scenario.

Redis is best suited to scenarios where all data is in memory. Although Redis also provides persistence, it is really more of a disk-backed function and differs considerably from persistence in the traditional sense. So you may well ask: Redis looks like an enhanced memcached, so when should we use memcached and when Redis?

If you simply compare Redis and memcached, most people will arrive at the following points:

1) Redis supports not only simple k/v data but also data structures such as list, set, zset, and hash.

2) Redis supports data backup, that is, backup in master-slave mode.

3) Redis supports data persistence: it can save in-memory data to disk and load it again after a restart.

II. Common data types of Redis

The most commonly used Redis data types and supporting features are:

String
Hash
List
Set
Sorted Set
Pub/sub
Transactions

Before describing these data types, let's first look at how Redis represents them in its internal memory management:

First, Redis internally uses a redisObject to represent every key and value. Its most important fields are type and encoding:

type is the specific data type of a value object, and encoding is the way that data type is stored inside Redis.

For example, type=string means the value is stored as an ordinary string, and the corresponding encoding can be raw or int. If it is int, Redis internally stores and represents the string as a numeric type, provided of course that the string itself can be expressed numerically, such as "123" or "456".
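To make the raw/int distinction concrete, here is a minimal sketch in Python. It is not Redis source code: guess_string_encoding is a hypothetical helper, and the rule it applies (a canonical decimal integer that fits in a signed 64-bit value) is our simplifying assumption about when a numeric representation is possible.

```python
# Illustrative only: guess_string_encoding is a hypothetical helper, not a Redis API.
INT64_MIN = -(2 ** 63)
INT64_MAX = 2 ** 63 - 1


def guess_string_encoding(value: str) -> str:
    """Return "int" if value looks like a canonical 64-bit integer, else "raw"."""
    try:
        number = int(value)
    except ValueError:
        return "raw"
    # Require the canonical decimal form so the original text can be rebuilt
    # from the number alone (this rules out "007", "+5", " 12", "1_000").
    if str(number) != value:
        return "raw"
    # Assumption: the numeric form must fit in a signed 64-bit integer.
    if not INT64_MIN <= number <= INT64_MAX:
        return "raw"
    return "int"
```

Under these assumptions "123" and "456" are classified as int, while "abc" or "007" stay raw because the original text could not be recovered from the number.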

The vm field needs special mention: memory is actually allocated for it only when the Redis virtual memory feature is turned on, and that feature is off by default. Representing every key/value with a redisObject does cost some memory; that overhead mainly buys a unified management interface across the different Redis data types. The author of Redis also provides several ways to help us save as much memory as possible.

III. Redis data types: applications and implementation

Let's first analyze the use and internal implementation of these 5 data types:

String

The string data structure is a simple key-value type; the value can be not only a string but also a number.

Common commands: SET, GET, DECR, INCR, MGET, and so on.

Application scenario: string is the most commonly used data type, and ordinary key/value storage falls into this category. It can fully cover what memcached currently does, with higher efficiency, and you also get the scheduled persistence, operation log, and replication that Redis provides.

In addition to the GET, SET, INCR, and DECR operations it shares with memcached, Redis also provides the following operations:

Get the length of a string

Append content to a string

Set and get a range of a string

Set and get a single bit of a string

Set the contents of a series of strings in one batch

Implementation: inside Redis a string is stored as a string by default and referenced by a redisObject. When an operation such as INCR or DECR is applied, the value is converted to a number for the calculation, and the encoding field of the redisObject is then int.

Hash

Common commands: HGET, HSET, HGETALL, and so on.

Application scenario: with memcached we often pack structured information into a HashMap that the client serializes and stores as a single string value, for example a user's nickname, age, gender, and points. When one item needs to change, we usually have to fetch the whole value, deserialize it, modify that one item, and then serialize and store it back. This not only adds overhead but is also unsuitable where concurrent operations are possible (for example, two concurrent operations both modifying the points). The Redis hash structure lets you modify just one property value, much as you would update a single column in a database.

Let's use a simple example to illustrate where a hash fits. Suppose we want to store a user information object as follows:
The user ID is the lookup key, and the stored value is a user object containing the name, age, birthday, and so on. With an ordinary key/value structure there are two main ways to store it:

The first way uses the user ID as the lookup key and packs the other information into a serialized object. The drawbacks are the added cost of serialization/deserialization, the need to retrieve the entire object when one item is modified, and the need to protect modifications against concurrency, which introduces complications such as CAS.

The second way stores each member of the user object as its own key-value pair, using the user ID plus the attribute name as the unique key for that attribute's value. This avoids the serialization cost and the concurrency problem, but the user ID is stored repeatedly, and with a large amount of such data the wasted memory is considerable.

The Redis hash solves this problem well. A Redis hash stores a HashMap as its value and provides an interface for accessing the members of that map directly:

In other words, the key is still the user ID and the value is a map whose keys are the attribute names and whose values are the attribute values. Data can then be modified and read directly through the key of the internal map (Redis calls this internal key a field): key (user ID) + field (attribute name) is enough to operate on the corresponding attribute. There is no duplicated data, and no serialization or concurrent-modification control problem. This solves the problem very well.

Note also that Redis provides an interface (HGETALL) that fetches all of the fields at once. If the internal map has many members, this means traversing the entire map, and because of the single-threaded model of Redis such a traversal can be time-consuming, during which requests from other clients get no response at all. This needs particular care.
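The difference between the serialized-object approach and the hash approach can be sketched in Python. This is only an in-memory illustration under our own assumptions: a plain dict stands in for the key space, and set_field_serialized, hset, and hget are hypothetical helpers that loosely imitate the behaviour described above, not calls to a Redis client.

```python
import json

# A plain dict stands in for the key space (assumption: no Redis server involved).
store = {}


def set_field_serialized(key, field, value):
    """Approach 1: the whole object lives in one serialized string."""
    obj = json.loads(store.get(key, "{}"))  # fetch and deserialize everything
    obj[field] = value
    store[key] = json.dumps(obj)            # serialize and write everything back


def hset(key, field, value):
    """Hash approach: update one field in place, loosely like HSET."""
    store.setdefault(key, {})[field] = value


def hget(key, field):
    """Read one field without touching the others, loosely like HGET."""
    return store.get(key, {}).get(field)


set_field_serialized("user:1:blob", "name", "alice")
set_field_serialized("user:1:blob", "points", 10)

hset("user:1", "name", "alice")
hset("user:1", "points", 10)
hset("user:1", "points", 11)  # only this one field is rewritten
```

With the serialized value, every change to points rewrites the name as well; with the hash layout the update touches a single field, which is the property the text relies on.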

Implementation:

As mentioned above, the value of a Redis hash is internally a HashMap, but there are actually two different implementations. When the hash has few members, Redis saves memory by storing them compactly in a structure similar to a one-dimensional array rather than a real HashMap; the encoding of the redisObject for the value is then zipmap. When the number of members grows, it is automatically converted to a real HashMap, and the encoding becomes ht.

List

Common commands: LPUSH, RPUSH, LPOP, RPOP, LRANGE, and so on.

Application scenario: the Redis list has many applications and is one of the most important data structures in Redis. For example, Twitter-style following lists and follower lists can be implemented with the Redis list structure.

A list is a linked list; anyone with a little knowledge of data structures should understand its structure. With lists we can easily implement features such as a latest-news ranking. Another application of lists is the message queue: a PUSH operation places a task in the list, and a worker thread then takes the task out with a POP operation and executes it. Redis also provides an API for operating on a range of a list, so you can directly query or remove the elements in a range.

Implementation: the Redis list is implemented as a doubly linked list, which supports reverse lookup and traversal and is more convenient to operate on, at the cost of some extra memory overhead. Many parts of the Redis implementation itself, including the send buffer queues, use this data structure as well.

Set

Common commands: SADD, SPOP, SMEMBERS, SUNION, and so on.

Application scenario: a Redis set offers functionality similar to a list, with the special property that it removes duplicates automatically. When you need to store a list of data and do not want duplicates, set is a good choice. Set also provides an important interface for checking whether a member is in the set, which list does not offer.

A set is a collection of distinct values. With the set data structure provided by Redis you can store collection-like data. For example, in a microblog application you can keep all the accounts a user follows in one set and all of that user's followers in another. Redis also provides intersection, union, and difference operations on sets, which make it easy to implement features such as common follows, common interests, and second-degree friends. For all of these set operations you can choose, through different commands, whether to return the result to the client or store it in a new set.

Implementation: internally a set is a HashMap whose values are always null. Duplicates are eliminated quickly by computing hashes, which is also why set can check whether a member is in the set.
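The set operations described above map directly onto ordinary set algebra. The sketch below uses the built-in Python set as a stand-in for a Redis set; the user names and the helper names are made up for illustration, and the comments name the Redis command each helper loosely corresponds to.

```python
# Built-in Python sets stand in for Redis sets; the data is invented for illustration.
following = {
    "alice": {"carol", "dave", "erin"},
    "bob": {"dave", "erin", "frank"},
}


def common_following(a, b):
    """Accounts both users follow (the idea behind SINTER)."""
    return following[a] & following[b]


def all_following(a, b):
    """Accounts followed by either user (the idea behind SUNION)."""
    return following[a] | following[b]


def only_first_follows(a, b):
    """Accounts the first user follows but the second does not (the idea behind SDIFF)."""
    return following[a] - following[b]


def is_following(user, account):
    """Membership test (the idea behind SISMEMBER)."""
    return account in following[user]
```

Because membership is decided by hashing, adding "dave" to a set a second time changes nothing, which is the automatic de-duplication the text mentions.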

Sorted Set

Common commands: ZADD, ZRANGE, ZREM, ZCARD, and so on.

Application scenario: a Redis sorted set is used much like a set, except that a set is not ordered, whereas a sorted set orders its members by an additional priority (score) parameter supplied by the user and keeps them ordered on insertion, that is, sorted automatically. When you need a list that is ordered and free of duplicates, choose the sorted set. For example, a Twitter-style public timeline can be stored with the publication time as the score, so that it is automatically sorted by time when retrieved.

You can also use a sorted set as a weighted queue: for example, give ordinary messages a score of 1 and important messages a score of 2, and let worker threads fetch tasks in descending score order so that important tasks are handled first.

Implementation: internally a Redis sorted set uses a HashMap and a skip list to store the data and keep it ordered. The HashMap holds the mapping from member to score, and the skip list holds all the members, sorted by the scores stored in the HashMap. The skip list gives high lookup efficiency and is relatively simple to implement.
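The two-structure idea (a map from member to score, plus an ordered structure over the members) can be sketched as follows. Note the substitution: for brevity this sketch uses a sorted Python list maintained with bisect in place of a skip list, so it illustrates the division of labour, not the actual Redis implementation. TinySortedSet and its method names are hypothetical.

```python
import bisect


class TinySortedSet:
    """Member -> score dict plus an ordered list (a stand-in for the skip list)."""

    def __init__(self):
        self._score_of = {}  # member -> score, for O(1) score lookup
        self._ordered = []   # sorted list of (score, member) pairs

    def zadd(self, member, score):
        old = self._score_of.get(member)
        if old is not None:
            # Remove the stale entry so each member appears exactly once.
            index = bisect.bisect_left(self._ordered, (old, member))
            del self._ordered[index]
        self._score_of[member] = score
        bisect.insort(self._ordered, (score, member))

    def zrange(self, start, stop):
        """Members by ascending score; stop is inclusive, indexes non-negative."""
        return [m for _, m in self._ordered[start:stop + 1]]

    def zrevrange(self, start, stop):
        """Members by descending score; stop is inclusive, indexes non-negative."""
        return [m for _, m in self._ordered[::-1][start:stop + 1]]

    def zrevrank(self, member):
        """0-based rank counted from the highest score, or None if absent."""
        score = self._score_of.get(member)
        if score is None:
            return None
        index = bisect.bisect_left(self._ordered, (score, member))
        return len(self._ordered) - 1 - index
```

The dict answers "what is this member's score" without scanning, while the ordered structure answers range and rank queries; a skip list plays the second role in Redis and avoids the linear-time insertion cost that a plain list has.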

Pub/sub

Pub/Sub literally means publish and subscribe. In Redis you can publish messages to, and subscribe to messages on, a given key (channel); when a message is published on it, all clients subscribed to it receive the message. The most obvious use of this feature is a real-time messaging system, such as ordinary instant chat or group chat.

Transactions

Who says NoSQL does not support transactions? Redis transactions do not provide strict ACID guarantees (for example, if the server goes down while EXEC is running a batch of commands, some commands will have executed and the rest will not), but they do provide basic packaged execution: as long as the server has no problems, a series of commands is guaranteed to execute in sequence with no commands from other clients interleaved.

Redis also provides a WATCH feature: you can WATCH a key and then run a transaction, and if the watched value is modified in the meantime, the transaction detects this and refuses to execute.
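The WATCH behaviour is a form of optimistic locking (check-and-set). The sketch below imitates the idea in memory with per-key version counters; TinyStore, exec_if_unchanged, and add_points are hypothetical names of our own, and the real mechanism inside Redis may differ in detail.

```python
class TinyStore:
    """In-memory stand-in that imitates the WATCH then EXEC check-and-set idea."""

    def __init__(self):
        self._data = {}
        self._version = {}  # key -> number of writes seen so far

    def get(self, key):
        return self._data.get(key)

    def set(self, key, value):
        self._data[key] = value
        self._version[key] = self._version.get(key, 0) + 1

    def watch(self, key):
        """Remember the key's current version, like WATCH key."""
        return (key, self._version.get(key, 0))

    def exec_if_unchanged(self, token, writes):
        """Apply all writes only if the watched key was not modified meanwhile."""
        key, seen_version = token
        if self._version.get(key, 0) != seen_version:
            return False  # someone else wrote the key: refuse to execute
        for k, v in writes.items():
            self.set(k, v)
        return True


def add_points(store, key, delta, retries=5):
    """Typical client loop: watch, read, compute, try to commit, retry on conflict."""
    for _ in range(retries):
        token = store.watch(key)
        current = store.get(key) or 0
        if store.exec_if_unchanged(token, {key: current + delta}):
            return True
    return False
```

The retry loop is the usual companion to this pattern: a refused transaction is not an error, it simply means the client should re-read the value and try again.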

IV. Practical application scenarios for Redis

Redis differs from other database solutions in many ways: it uses memory as primary storage and the hard disk only for persistence; its data model is quite distinctive; and it is single-threaded. Another big difference is that you can use Redis features in your existing environment without having to switch over to Redis.

Switching to Redis is of course also an option, and many developers use Redis as their preferred database from the outset. But if your environment is already built and your application is already running on it, replacing the database framework is obviously not easy. In addition, Redis is not appropriate for applications that need a large dataset, because its dataset cannot exceed the memory available to the system. So if you have a big-data application whose access pattern is mainly reads, Redis is not the right choice.

What I like about Redis, though, is that you can incorporate it into your existing system to solve many problems, such as tasks your existing database handles slowly. You can use Redis to optimize them or to build new features for the application. In this article I want to explore how to add Redis to an existing environment and use its primitives to solve some common problems encountered in traditional environments. In these examples, Redis is not the primary database.

1. Showing the latest items

The following statement is commonly used to display the latest items; as the data grows, the query will undoubtedly become slower and slower.

SELECT * FROM foo WHERE ... ORDER BY time DESC LIMIT 10

In web applications, queries such as "list the latest replies" are common and often cause scalability problems. This is frustrating, because the items were created in exactly this order, yet they have to be sorted to be output in this order.

Redis can solve this kind of problem. Suppose one of our web applications wants to list the latest 20 comments posted by users, with a "show all" link next to the latest comments that leads to more.

We assume that each comment in the database has a unique, incrementing ID field.

We can paginate the home page and the comment pages with the following Redis pattern. Each time a new comment is posted, we add its ID to a Redis list:

LPUSH latest.comments <id>

We then trim the list to a fixed length, so Redis keeps only the latest 5,000 comments (the LTRIM range is inclusive, so indexes 0 to 4999):

LTRIM latest.comments 0 4999

Each time we need a range of the latest comments, we call a function like this (in pseudocode):

FUNCTION get_latest_comments(start, num_items):
    id_list = redis.lrange("latest.comments", start, start + num_items - 1)
    IF id_list.length < num_items
        id_list = sql_db("SELECT ... ORDER BY time LIMIT ...")
    END
    RETURN id_list
END

What we do here is very simple. Redis holds a resident cache of the latest IDs that is updated all the time. Because we capped it at 5,000 IDs, our get-IDs function always asks Redis first and needs to access the database only when the start/count parameters fall outside that range.

Our system never "refreshes" the cache in the traditional way; the information in the Redis instance is always consistent. The SQL database (or other on-disk database) is hit only when a user asks for data "far back" in the list, so the home page and the first comment pages never trouble the on-disk database.
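The whole pattern (push the new ID, cap the list, fall back to the database only for requests beyond the cached range) can be sketched in Python without a server. Everything here is an assumption made for illustration: a bounded deque stands in for the Redis list, and LatestComments and the fetch_from_db callback are hypothetical names.

```python
from collections import deque


class LatestComments:
    """In-memory sketch of the capped "latest IDs" list with a database fallback."""

    def __init__(self, max_cached, fetch_from_db):
        # A bounded deque drops the oldest entry on overflow, which mirrors
        # pushing to the head of the list and then trimming the tail.
        self._ids = deque(maxlen=max_cached)
        # Hypothetical callback: fetch_from_db(start, num_items) -> list of IDs.
        self._fetch_from_db = fetch_from_db

    def publish(self, comment_id):
        self._ids.appendleft(comment_id)  # newest first

    def get_latest(self, start, num_items):
        ids = list(self._ids)[start:start + num_items]
        if len(ids) < num_items:
            # The request reaches past the cached window: ask the database.
            ids = self._fetch_from_db(start, num_items)
        return ids
```

As in the pseudocode, a request that stays inside the cached window never calls the database callback, so the cost of the slow query is paid only for deep pagination.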

2. Deleting and filtering

We can use LREM to delete comments. If deletions are rare, another option is to skip the deleted comment when rendering its entry and report that the comment no longer exists.

Sometimes you want to attach different filters to different lists. If the number of filters is limited, you can simply use a separate Redis list for each filter. After all, there are only 5,000 items per list, and Redis can handle millions of items using very little memory.

3. Leaderboards and related problems

Another common requirement that databases whose data is not held in memory handle poorly is keeping a list ordered by score while updating it in real time, sometimes almost every second.

Typical examples are the leaderboards of online games, such as Facebook games. Based on scores you usually want to:

- List the top 100 highest-scoring players

- List a given user's current global rank

These operations are a piece of cake for Redis, even if you have millions of users and millions of new scores per minute.

The pattern is this: each time we get a new score, we run:

ZADD leaderboard <score> <username>

You may use a user ID instead of a username, depending on your design.

Getting the top 100 highest-scoring users is simple:

ZREVRANGE leaderboard 0 99

Getting a user's global rank is similar. Because the leaderboard is read from the highest score down, use ZREVRANK, which returns 0 for the top player:

ZREVRANK leaderboard <username>

4. Sorting by user votes and time

A common variant of the leaderboard, as used by Reddit or Hacker News, sorts items by a score computed with a formula similar to the following:

score = points / time^alpha

So user votes push a news item up accordingly, while time buries it at a rate set by the exponent. Here is our pattern; the exact algorithm is of course up to you.

The pattern is this: start by considering only items that may be among the latest, for example the 1,000 most recent news items as candidates, and ignore everything else. This is easy to implement.

Each time a new news item is posted, we add its ID to a list, using LPUSH + LTRIM to ensure that only the latest 1,000 items are kept.

A background task fetches this list and continually computes the final score for each of the 1,000 news items. The results are written with ZADD into a sorted set in the new order, and old news is cleared out. The key idea is that the sorting work is done by the background task.
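The scoring step that the background task performs can be sketched as below. The formula is the one given above; everything else is an assumption for illustration: time is measured as age in hours, alpha defaults to an arbitrary 1.5, and rank_score and rerank are hypothetical helper names. In the pattern described, the resulting order would be written to a sorted set with ZADD.

```python
def rank_score(points, age_hours, alpha=1.5):
    """score = points / time^alpha, with time taken as the item's age in hours.

    alpha=1.5 is an arbitrary illustrative value, not one given in the text.
    """
    if age_hours <= 0:
        raise ValueError("age_hours must be positive")
    return points / (age_hours ** alpha)


def rerank(candidates, now_hours, alpha=1.5):
    """Order candidate items best first.

    candidates is an iterable of (item_id, points, posted_at_hours) tuples.
    """
    scored = [
        (rank_score(points, now_hours - posted_at, alpha), item_id)
        for item_id, points, posted_at in candidates
    ]
    scored.sort(reverse=True)
    return [item_id for _, item_id in scored]
```

A larger alpha makes age dominate, so old items sink faster even with many votes; a smaller alpha lets heavily voted items stay near the top longer.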

5. Handling expiring items

Another common kind of sorting is by time. We use Unix time as the score.

The pattern is as follows:

- Each time a new item is added to our database, we also add it to a sorted set. As the score we use the time at which the item should expire, that is, current_time + time_to_live.

- Another background task queries the sorted set with ZRANGE ... WITHSCORES and takes the 10 items with the lowest scores, which are the ones due to expire soonest. If a score is a Unix time that has already passed, the item is deleted from the database.
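The expiry pattern can be sketched in memory as follows. Note the substitution: a binary heap from the Python standard library stands in for the sorted set, since both give cheap access to the lowest score. ExpiryIndex and its methods are hypothetical names, and the clock is passed in explicitly so the example is deterministic.

```python
import heapq


class ExpiryIndex:
    """Items ordered by expiry time; a heap stands in for the sorted set."""

    def __init__(self):
        self._heap = []  # (expires_at, item_id) pairs, smallest expiry first

    def add(self, item_id, now, time_to_live):
        # The score is the moment the item should expire: now + time_to_live.
        heapq.heappush(self._heap, (now + time_to_live, item_id))

    def pop_expired(self, now, batch=10):
        """Return up to `batch` items whose expiry time has already passed."""
        expired = []
        while self._heap and len(expired) < batch and self._heap[0][0] <= now:
            expired.append(heapq.heappop(self._heap)[1])
        return expired
```

A background task would call pop_expired periodically and delete the returned items from the main database; items that are not yet due are never examined beyond the first one.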

6. Counting

Redis is a good counter, thanks to INCRBY and similar commands.

I am sure you have tried many times to add new counters to your database, to gather statistics or to show new information, only to give up in the end because the workload would be too write-intensive.

Well, with Redis you no longer need to worry. With atomic increments you can safely add all kinds of counters, reset them with GETSET, or let them expire.

For example, this operation:

INCR user:<id>
EXPIRE user:<id> 60

This counts the page views a user has made recently with pauses of no more than 60 seconds between them; when the count reaches, say, 20, you can display a banner or anything else you want to show.
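The INCR plus EXPIRE combination behaves like a counter that resets after a quiet period. The sketch below imitates that behaviour in memory; ExpiringCounter is a hypothetical class, and the current time is passed in as a parameter (an assumption made so the example is deterministic) instead of being read from the system clock.

```python
class ExpiringCounter:
    """Counter whose keys vanish after ttl_seconds without a new hit."""

    def __init__(self, ttl_seconds=60):
        self._ttl = ttl_seconds
        self._state = {}  # key -> (count, expires_at)

    def hit(self, key, now):
        count, expires_at = self._state.get(key, (0, 0))
        if now >= expires_at:
            count = 0  # the key has expired: start again as if it never existed
        count += 1     # the increment step
        # Refresh the time to live on every hit, like calling EXPIRE each time.
        self._state[key] = (count, now + self._ttl)
        return count
```

A caller could check the returned count against a threshold such as 20 to decide when to show the banner mentioned above.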

7. Unique items within a specific period of time

Another task that is hard for other databases but easy for Redis is counting how many distinct users accessed a particular resource during a specific period of time. For example, I may want to know how many distinct registered users or IP addresses accessed a given article.

I only need to do this each time I get a new page view:

SADD page:day1:<page_id> <user_id>

Of course, you may want to replace day1 with a Unix time, such as time() - (time() % (3600*24)).

Want to know the number of distinct users? Just use SCARD page:day1:<page_id>.

Need to test whether a particular user accessed this page? Use SISMEMBER page:day1:<page_id> <user_id>.
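The day-bucket arithmetic and the three set operations can be sketched together. This is an in-memory illustration only: Python sets stand in for Redis sets, and day_bucket, record_view, unique_visitors, and has_visited are hypothetical helper names. The bucket is the Unix time rounded down to the start of the UTC day, as in the expression above.

```python
SECONDS_PER_DAY = 3600 * 24

# key -> set of user IDs; stands in for one Redis set per page per day.
visitors = {}


def day_bucket(unix_time):
    """Round a Unix timestamp down to the start of its (UTC) day."""
    return unix_time - (unix_time % SECONDS_PER_DAY)


def record_view(page_id, user_id, unix_time):
    """Add the user to the page's set for that day (the idea behind SADD)."""
    key = f"page:{day_bucket(unix_time)}:{page_id}"
    visitors.setdefault(key, set()).add(user_id)
    return key


def unique_visitors(key):
    """Number of distinct users (the idea behind SCARD)."""
    return len(visitors.get(key, ()))


def has_visited(key, user_id):
    """Whether this user is in the set (the idea behind SISMEMBER)."""
    return user_id in visitors.get(key, ())
```

Repeated views by the same user on the same day land in the same set and are counted once, which is exactly why a set fits this problem.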

8. Real-time analysis of what is happening, for statistics, spam prevention, and so on

We have covered only a few examples, but if you study the Redis command set and combine the commands, you can build many real-time analytics that are efficient and take very little effort. With Redis commands it becomes easier to implement spam filtering systems or other real-time tracking systems.

9. Pub/Sub

Redis pub/sub is very simple, stable, and fast. It supports pattern matching and lets clients subscribe to and unsubscribe from channels in real time.

10. Queues

You have probably noticed that Redis commands such as list push and list pop make it easy to implement queues, but they can do more than that: Redis also has variants of the list pop commands that block when the list is empty.

Modern Internet applications make heavy use of message queues. Message queues are used not only for communication between components within a system but also for interaction between the system and other services. Using message queues can improve a system's scalability, flexibility, and user experience. In a system not based on message queues, overall speed is determined by the slowest component (the weakest-link effect). With message queues, the components can be decoupled so that the system is no longer bound by its slowest component, and components can run asynchronously and finish their work faster.

In addition, when a server is under highly concurrent load, such as frequent writes to log files, you can use a message queue for asynchronous processing and so achieve high-performance concurrent operation.
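The producer/worker decoupling and the blocking pop can be illustrated with the Python standard library alone. This is an analogy, not Redis code: queue.Queue stands in for the list, its blocking get() plays the role of a blocking pop, and run_worker, process_all, and the None sentinel are our own illustrative choices.

```python
import queue
import threading


def run_worker(tasks, results):
    """Consume tasks until a None sentinel arrives."""
    while True:
        task = tasks.get()  # blocks while the queue is empty, like a blocking pop
        if task is None:
            break
        results.append(task * 2)  # placeholder for the real work


def process_all(items):
    """Push items onto the queue and let a background worker handle them."""
    tasks = queue.Queue()
    results = []
    worker = threading.Thread(target=run_worker, args=(tasks, results))
    worker.start()
    for item in items:
        tasks.put(item)  # the producer returns immediately after each push
    tasks.put(None)      # tell the worker there is nothing more to do
    worker.join()
    return results
```

The producer never waits for the work to finish; it only waits for the push, which is the decoupling the section describes.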

11. Caching

Caching with Redis deserves an article of its own, so I will only touch on it briefly. Redis can replace memcached, turning your cache from one that can only store data into one whose data can also be updated, so you no longer need to regenerate the data every time.
