Using Redis as a Time Series Database: Why and How



Since its emergence, Redis has been used to some extent for storing and analyzing time series data. Redis was originally implemented as a logging buffer, and as its feature set has grown it has gained five explicit and three implicit structures or types, offering multiple ways to analyze data in Redis. This article introduces the most flexible method of using Redis for time series analysis.

About competition and transactions

In Redis, each individual command is atomic, but multiple commands executed in sequence are not necessarily atomic, and a race condition between clients can lead to incorrect behavior. To cope with this limitation, we will use two methods, "transactional pipelines" and "Lua scripts", to avoid data races.

When using Redis with the Python client, we call the Redis connection's .pipeline() method to create a "transactional pipeline" (with other clients, this is also called a "transaction" or a "MULTI/EXEC transaction"). The method can be called with no arguments, or with a Boolean value of True. The pipeline object collects all of the commands called on it until the .execute() method is invoked. When .execute() is called, the client sends the MULTI command to Redis, then all of the collected commands, and finally the EXEC command. When Redis executes this group of commands, it will not be interrupted by any other command, which guarantees atomic execution.
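As a concrete sketch of this pattern (the counter keys and the transfer operation here are hypothetical illustrations, not from the article):

```python
def transfer_points(conn, source, dest, amount):
    # conn is a redis-py connection; pipeline(True) asks for a MULTI/EXEC
    # transaction (True is also the default).
    pipe = conn.pipeline(True)
    pipe.decrby(source, amount)   # queued client-side, not yet sent
    pipe.incrby(dest, amount)     # queued client-side, not yet sent
    # .execute() sends MULTI, the two queued commands, then EXEC; Redis
    # runs the group without interleaving other clients' commands.
    return pipe.execute()
```

The return value of .execute() is a list with one result per queued command, in order.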

There is another option for executing a series of commands atomically in Redis: server-side Lua scripting. Simply put, Lua scripts behave much like stored procedures in relational databases, except that they are limited to the Lua language and a dedicated Redis API for Lua. Like transactions, a Lua script will not be interrupted while it is running[1], although an unhandled error will cause the script to terminate early. Syntactically, we call the Redis connection object's .register_script() method to load a Lua script. The object returned by that method can be called like a function to invoke the script inside Redis, without calling other methods on the Redis connection; it combines the SCRIPT LOAD and EVALSHA commands to load and execute the script.
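A minimal sketch of registering and calling a script with redis-py (the key name and the bounded-increment logic are illustrative assumptions, not from the article):

```python
def make_bounded_incr(conn):
    # register_script hashes the script body; the returned callable invokes
    # it via EVALSHA, re-loading the source if the server does not have it.
    lua = """
    local value = tonumber(redis.call('GET', KEYS[1]) or '0')
    if value + tonumber(ARGV[1]) <= tonumber(ARGV[2]) then
        return redis.call('INCRBY', KEYS[1], ARGV[1])
    end
    return value
    """
    return conn.register_script(lua)

# Usage: the callable takes keys and args, mirroring the script itself.
# bounded_incr = make_bounded_incr(conn)
# bounded_incr(keys=['counter'], args=[5, 100])
```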

Use Cases

When talking about using Redis as a time series database, the first question to ask is: "What is a time series database used for?" The use cases for time series databases are data-related, in particular when your data is structured as a series of events, samples of one or more values, or metrics that change over time. Examples of such applications include (but are not limited to):

  • Stock transaction price and transaction volume
  • Order totals and delivery addresses for an online retailer
  • Operations performed by players in video games
  • Data collected from embedded sensors in IoT Devices

We will discuss this in more depth later, but the basic idea is that a time series database records data with a timestamp whenever something happens or a measurement is taken. Once you have collected information about events, you can analyze those events. You can analyze in real time as the data is collected, or run more complex queries after the fact.

Advanced Analysis Using Sorted Sets and Hashes

The most flexible way to save and analyze time series data in Redis combines two different structures: the Sorted Set and the Hash.

In Redis, the sorted set structure combines the capabilities of a hash table and a sorted tree (internally Redis uses a skip list, but you can ignore that detail for now). Simply put, each item in a sorted set is the combination of a string "member" and a double-precision "score". The member acts as a key in the hash, while the score acts as the sort value in the tree. With this combination, you can access members and scores directly by member value, and you can also access members and scores sorted by score in multiple ways[2].
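This dual hash-plus-ordering behavior can be modeled in a few lines of plain Python (a toy sketch for intuition only; Redis's actual implementation is a hash table plus a skip list in C):

```python
import bisect

class TinyZSet:
    """Toy model of a Redis sorted set: a dict gives O(1) member -> score
    lookups (the "hash" half), and a sorted list of (score, member) pairs
    gives score-ordered traversal (the "tree" half)."""
    def __init__(self):
        self._scores = {}   # member -> score
        self._sorted = []   # (score, member), kept sorted by score

    def zadd(self, member, score):
        if member in self._scores:
            # Updating a member means re-positioning it in score order.
            self._sorted.remove((self._scores[member], member))
        self._scores[member] = score
        bisect.insort(self._sorted, (score, member))

    def zscore(self, member):
        return self._scores.get(member)   # direct access by member

    def zrangebyscore(self, low, high):
        # Members whose scores fall in [low, high], in score order.
        return [m for s, m in self._sorted if low <= s <= high]
```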

Saving Events

Using one or more sorted sets combined with hashes to store time series data is currently one of the most common use cases for Redis. It is a building block underlying many different applications, including social networks like Twitter, news sites similar to Reddit and Hacker News, and an almost-complete Redis-based object mapper.

In this article's example, we will record events generated by various user actions on a website. All events share four attributes, plus a varying number of other attributes depending on the event type. The known attributes are id, timestamp, type, and user. To store each event, we use a Redis hash whose key is derived from the event id. There are many possible sources for event ids; here we will generate our ids with a counter in Redis. On a 64-bit platform with a 64-bit Redis instance, we can create up to 2^63-1 events, subject to available memory.

When we are ready to record an event, we save its data as a hash and insert a member/score pair into a sorted set, where the member is the event id and the score is the event's timestamp. The code for recording an event is as follows:

def record_event(conn, event):
    id = conn.incr('event:id')
    event['id'] = id
    event_key = 'event:{id}'.format(id=id)

    pipe = conn.pipeline(True)
    pipe.hmset(event_key, event)
    pipe.zadd('events', {id: event['timestamp']})
    pipe.execute()

In this record_event() function, we receive an event, get a new id from Redis, assign it to the event, and generate the key under which the event will be stored. The key is the string "event" joined with the new id, separated by a colon[3]. We then create a transactional pipeline, queue the commands to store all of the event's data, and queue the insertion of the id/timestamp pair into the sorted set. After the transaction pipeline has executed, the event is recorded and saved in Redis.

Event Analysis

From here, we can analyze the time series in several ways. We can scan the newest or oldest event ids with ZRANGE[4] and fetch the events themselves for later analysis. By using ZRANGEBYSCORE together with the LIMIT argument, we can immediately fetch the 10 or even 100 events before or after a given timestamp. We can also count how many events occurred in a specific time period with ZCOUNT, or implement our own analyses with Lua scripts. The following example uses a Lua script to count the number of each different event type in a given time range.

import json

def count_types(conn, start, end):
    counts = count_types_lua(keys=['events'], args=[start, end])
    return json.loads(counts)

# conn is an existing Redis connection
count_types_lua = conn.register_script('''
local counts = {}
local ids = redis.call('zrangebyscore', KEYS[1], ARGV[1], ARGV[2])
for i, id in ipairs(ids) do
    local type = redis.call('hget', 'event:' .. id, 'type')
    counts[type] = (counts[type] or 0) + 1
end

return cjson.encode(counts)
''')

The count_types() function defined here passes its arguments to the wrapped Lua script and decodes the JSON-encoded mapping from event type to count. The Lua script first creates a result table (the counts variable), then reads the list of event ids in the time range with ZRANGEBYSCORE. With those ids in hand, the script reads the type attribute of each event one at a time, incrementing the counts table as it goes, and finally returns the JSON-encoded mapping.
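The ZRANGE-based fetching mentioned above can be sketched as follows (the helper name is ours, not the article's; it reuses the 'events' / 'event:{id}' key scheme established earlier):

```python
def fetch_recent_events(conn, count=10):
    # Newest ids first: ZREVRANGE walks the 'events' sorted set from the
    # highest score (most recent timestamp) downward.
    ids = conn.zrevrange('events', 0, count - 1)
    pipe = conn.pipeline(True)
    for event_id in ids:
        # Event bodies live in hashes keyed 'event:{id}', as above.
        pipe.hgetall('event:{id}'.format(id=event_id))
    return pipe.execute()
```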

Performance and Data Modeling

As the code shows, this method works for counting the number of each event type within a given time range, but it requires reading the type attribute of every event in that range. For time ranges containing hundreds or thousands of events, such an analysis is reasonably fast. But what if a time range contains tens of thousands, or even millions, of events? The answer is simple: Redis will block while computing the result.

One way to deal with the performance problems caused by long-running scripts in event stream analysis is to consider, in advance, the queries that will need to be executed. Specifically, if you know that you will need the count of each event type over a period of time, you can keep an additional sorted set for each event type, each storing only the id/timestamp pairs of events of that type. When you need the count of each type, you can perform a series of ZCOUNT calls, or calls to an equivalent method[5], and return the results. Let's look at a modified record_event() function that also writes to a per-type sorted set.

def record_event_by_type(conn, event):
    id = conn.incr('event:id')
    event['id'] = id
    event_key = 'event:{id}'.format(id=id)
    type_key = 'events:{type}'.format(type=event['type'])

    ref = {id: event['timestamp']}
    pipe = conn.pipeline(True)
    pipe.hmset(event_key, event)
    pipe.zadd('events', ref)
    pipe.zadd(type_key, ref)
    pipe.execute()

The new record_event_by_type() function is the same as the old record_event() function in many respects, but adds some operations. In the new function we also compute a type_key, the location of the sorted set that indexes this event by its type. After queueing the addition of the id/timestamp pair to the events sorted set, we also queue its addition to the type_key sorted set, and then execute the inserts as before.

Now, to count how many "visit" events occurred between two points in time, we only need to call ZCOUNT with the key for that event type and the start and end timestamps.

def count_type(conn, type, start, end):
    type_key = 'events:{type}'.format(type=type)
    return conn.zcount(type_key, start, end)

If we know all possible event types in advance, we can call the count_type() function above for each type and build the same table that count_types() produced. If we cannot know all event types in advance, or may encounter new event types in the future, we can add each type to a Set structure and later use that set to discover all event types. Below is the further-modified event recording function.

def record_event_types(conn, event):
    id = conn.incr('event:id')
    event['id'] = id
    event_key = 'event:{id}'.format(id=id)
    type_key = 'events:{type}'.format(type=event['type'])

    ref = {id: event['timestamp']}
    pipe = conn.pipeline(True)
    pipe.hmset(event_key, event)
    pipe.zadd('events', ref)
    pipe.zadd(type_key, ref)
    pipe.sadd('event:types', event['type'])
    pipe.execute()

If a time range contains a large number of events, the new count_types_fast() function will run faster than the old count_types() function, mainly because ZCOUNT is faster than fetching each event's type from its hash.
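The count_types_fast() function referred to here is not shown in the text; a sketch consistent with its description (assuming a client configured to return strings, e.g. decode_responses=True) might look like:

```python
def count_types_fast(conn, start, end):
    # Discover every event type from the 'event:types' set, then do one
    # ZCOUNT per type instead of reading every event's hash.
    counts = {}
    for event_type in conn.smembers('event:types'):
        type_key = 'events:{type}'.format(type=event_type)
        counts[event_type] = conn.zcount(type_key, start, end)
    return counts
```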

Using Redis as Data Storage

Although Redis's built-in analysis tools, its commands and Lua scripting, are flexible and perform well, some kinds of time series analysis benefit from specialized computational methods, libraries, or tools. In those cases, storing the data in Redis still makes a great deal of sense, because Redis provides very fast access to the data.

For example, 10 years of trade volume data for a single stock, sampled at one-minute intervals, amounts to roughly 1.2 million data points, which fits in Redis easily. But if you wanted to run some complex function over that data via a Lua script inside Redis, you would need to port or debug your existing optimized libraries so they could do the same work in Redis. If you instead use Redis only for data storage, you can fetch the data for a time range and feed it to your existing optimized kernels to compute moving average prices and price volatility.
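As a sketch of that workflow (the key layout and the pure-Python averaging function are illustrative assumptions; the latter stands in for an optimized numeric kernel):

```python
def moving_average(samples, window):
    # Plain simple-moving-average over a list of floats; this is the part
    # you would normally delegate to an optimized library.
    out = []
    for i in range(window - 1, len(samples)):
        out.append(sum(samples[i - window + 1:i + 1]) / window)
    return out

def price_averages(conn, key, start, end, window):
    # Pull the raw per-minute samples for the time range out of Redis,
    # then hand them to the local kernel.
    samples = [float(s) for s in conn.zrangebyscore(key, start, end)]
    return moving_average(samples, window)
```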

So why not use a relational database instead? The reason is speed. Redis keeps all of its data in RAM and in optimized data structures (as in the sorted set example). This combination of in-memory storage and optimized data structures is not only faster than SSD-backed databases, but commonly one to two orders of magnitude faster than general-purpose in-memory key-value stores or systems that store serialized data in memory.

Conclusion and follow-up

When using Redis for time series analysis, or for any kind of analysis, it makes sense to record certain common attributes and values shared among different events in a common location, so that you can search for events with those common attributes and values. We did this with a sorted set per event type, and also mentioned using a set. Although this article focuses on applications of sorted sets, Redis has many more structures, and there are many other options for using Redis in analysis work. Besides sorted sets and hashes, other structures commonly used in analysis include (but are not limited to): bitmaps, array-indexed byte strings, HyperLogLogs, lists, sets, and the soon-to-be-released geo-indexed sorted set commands[6].

When using Redis, you will from time to time rethink how to fit data structures to more specific data access patterns. The storage format you choose determines both your storage capability and the kinds of queries you can run; this is almost always the case. It is important to understand this because, unlike traditional and more familiar relational databases, the queries and operations available in Redis are constrained by the type used to store the data.

After reading these examples of analyzing time series data, you can go further by reading about finding related data through index construction in chapter 7 of Redis in Action, which you can find in the eBooks section of RedisLabs.com. Chapter 8 of Redis in Action offers an almost-complete implementation of a Twitter-like social network, including followers, lists, timelines, and a streaming server, which is a good starting point for understanding how Redis can be used to store time series events and respond to queries about them.

1. A read-only script can be interrupted if you have enabled the lua-time-limit configuration option and the script's execution time exceeds the configured limit.

2. When scores are equal, items are sorted alphabetically by member.

3. In this article we generally use a colon as the delimiter between names, namespaces, and data in Redis keys, but you may choose any delimiter you like. Other Redis users use a period (.) or semicolon (;), among others. Picking a character that does not usually appear in your keys or data is good practice.

4. ZRANGE and ZREVRANGE retrieve elements from a sorted set by sorted position: ZRANGE places the lowest score at index 0, while ZREVRANGE places the highest score at index 0.

5. The ZCOUNT command counts the values in a score range of a sorted set, but it does so by incrementally traversing the entire range from one endpoint. For ranges containing many items, this can be expensive. As an alternative, you can use ZRANGEBYSCORE and ZREVRANGEBYSCORE to find the first and last members of the range, then use ZRANK on both members to find their indexes in the sorted set. Subtracting the two indexes (and adding 1) yields the same result at greatly reduced cost, even though this approach requires more round trips to Redis.
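This alternative can be sketched as follows (the function name is ours; it uses the standard redis-py signatures for these commands):

```python
def zcount_via_rank(conn, key, start, end):
    # Find the first member at the low end and the last member at the
    # high end of the score range, one element each.
    first = conn.zrangebyscore(key, start, end, start=0, num=1)
    last = conn.zrevrangebyscore(key, end, start, start=0, num=1)
    if not first or not last:
        return 0
    # Their ranks bracket the range; the difference (plus 1) is the count.
    return conn.zrank(key, last[0]) - conn.zrank(key, first[0]) + 1
```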

6. The Z*LEX commands introduced in Redis 2.8.9 use sorted sets to provide limited prefix searching over sorted sets. Similarly, the not-yet-released Redis 3.2 will provide limited geographic search and indexing with the GEO* commands.


Original English article: Using Redis As a Time Series Database: Why and How
