Using Redis to achieve statistical sorting in dynamic time periods


Problem description

Some data needs to be ranked by a statistical value accumulated over a dynamic (rolling) time period. For example: recalculate, every hour, a ranking of posts by the number of likes each received in the last 24 hours. The rest of this article uses this example.

Solution Ideas

My first reaction to this problem was to go straight to MySQL: a table records every change to the statistic for every item, e.g. one row noting when each post was liked, and the ranking comes directly from SELECT + WHERE + ORDER BY + LIMIT. Simple and crude, but inefficient and hard to scale: once the data volume grows, the query becomes very slow.
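
As a minimal sketch of this naive approach (using the standard-library sqlite3 in place of MySQL; the table and column names are made up for illustration):

```python
import sqlite3

# In-memory SQLite stands in for MySQL here; schema and data are illustrative.
conn = sqlite3.connect(':memory:')
conn.execute('CREATE TABLE post_likes (post_id INTEGER, liked_at INTEGER)')

# One row per like event: (post_id, unix timestamp of the like).
events = [(1, 100), (1, 200), (2, 150), (2, 160), (2, 170), (3, 50)]
conn.executemany('INSERT INTO post_likes VALUES (?, ?)', events)

# Rank posts by likes received since the window cutoff.
cutoff = 90
rows = conn.execute(
    'SELECT post_id, COUNT(*) AS likes '
    'FROM post_likes WHERE liked_at >= ? '
    'GROUP BY post_id ORDER BY likes DESC LIMIT 10',
    (cutoff,)
).fetchall()
print(rows)  # post 2 has 3 likes in the window, post 1 has 2; post 3 aged out
```

Every query re-scans the event rows in the window, which is exactly why this degrades as the table grows.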

So, to improve efficiency, how about swapping MySQL for Redis? Move the change records into Redis zsets: create one zset per post, where each member represents one like event (for instance the liking user's ID), with the like time as its score. ZCOUNT then gives each post's total likes within the active window. Redis queries are fast, so this looks workable, but when the post count and like count grow, the number of zset members explodes. How do we handle expired data? And decrements when a like is withdrawn? Recording every user's click on every post just to rank posts — is that plan reasonable?

Thinking further, we actually only care about each post's like count, not each individual like event. So we can use Redis to record how many likes each post received at each point in time, and periodically delete expired data: for example, record each post's like count per hour, and summing the hourly counts gives the post's total likes over the last 24 hours. The concrete scheme is developed below.
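
The idea can be modeled in a few lines of plain Python (dicts standing in for the Redis structures; names and numbers are illustrative):

```python
from collections import defaultdict

# Toy model of the per-hour bucket scheme: one counter hash per hour,
# keyed by post id.
buckets = defaultdict(lambda: defaultdict(int))  # hour -> {post_id: likes}

def like(post_id, hour):
    buckets[hour][post_id] += 1  # the HINCRBY equivalent

def likes_last_24h(post_id, now_hour):
    # Sum the 24 most recent hourly buckets; older buckets can simply be
    # dropped, with no per-event bookkeeping.
    return sum(buckets[h][post_id] for h in range(now_hour - 23, now_hour + 1))

like(42, 0); like(42, 5); like(42, 5); like(42, 30)
print(likes_last_24h(42, 6))   # the hour-0 like plus two hour-5 likes
print(likes_last_24h(42, 30))  # only the hour-30 like; the rest aged out
```

The storage cost is now bounded by posts x hours in the window, not by the total number of like events.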

Data

Two Redis data structures are needed, one storing the process and one storing the results. Because part of the data must expire periodically, the process records the history of statistic changes at each time node, while the results record the overall ranking across all currently valid process slots.

For example, if the current time is 9:00, we need the total likes from 9:00 yesterday to 9:00 now, and no longer need yesterday's 8:00 data. So the results must add today's 9:00 data and subtract yesterday's 8:00 data. As time moves on, the data from yesterday 9:00 onward drops out slot by slot, so the data at every time node within the valid window must be kept. The role of the results is obvious: record the totals and support sorting.
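
Walking through that 9:00 example with datetime arithmetic (the concrete dates are made up for illustration):

```python
from datetime import datetime, timedelta

# At 09:00 today, the valid 24-hour window is [yesterday 09:00, today 09:00].
now = datetime(2024, 5, 2, 9, 0)
window_start = now - timedelta(hours=24)
assert window_start == datetime(2024, 5, 1, 9, 0)

# The slot that began yesterday at 08:00 has just fallen out of the window,
# so its counts must be subtracted from the results zset.
expired_slot = datetime(2024, 5, 1, 8, 0)
assert expired_slot < window_start

# Today's 09:00 slot has just opened; its counts are added as likes arrive.
newest_slot = datetime(2024, 5, 2, 9, 0)
assert newest_slot >= window_start
```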

The process uses hash structures to hold, for every time node in the validity window, the statistic values of all posts. For example, create 24 keys, last_1, last_2, last_3, ... (since we count over the last 24 hours); each hash stores the posts liked between that node and the next as fields, with their like counts as values. Why a hash rather than a list or a set? Because the periodic update needs to read the current value and add or subtract one, and the hash's HINCRBY command does both atomically and can also be combined in a transaction. Imagine a list instead: concurrent likes on the same post could corrupt the data.
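
A deterministic interleaving shows why an atomic HINCRBY matters; here a plain dict models the Redis hash and the schedule is contrived for illustration:

```python
# Read-then-write update: two "clients" both read before either writes back.
store = {'post:1': 0}

a = store['post:1']          # client A reads 0
b = store['post:1']          # client B reads 0
store['post:1'] = a + 1      # A writes 1
store['post:1'] = b + 1      # B overwrites with 1 -- one like is lost
assert store['post:1'] == 1  # should be 2

# HINCRBY performs the read and write as one server-side operation, so the
# same two likes always yield 2:
def hincrby(h, field, amount=1):
    h[field] = h.get(field, 0) + amount  # executed atomically by Redis
    return h[field]

store2 = {}
hincrby(store2, 'post:1')
hincrby(store2, 'post:1')
assert store2['post:1'] == 2
```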

The results use a zset holding every post's running total, which is exactly the sort result: every post as a member, with its total statistic as the score. A zset records scores, sorts by score, and also supports fetching the data in pages.
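
A toy model of the results zset and paged ZREVRANGE reads (post ids and scores are made up; a dict stands in for Redis):

```python
scores = {'post:1': 17, 'post:2': 42, 'post:3': 5, 'post:4': 23}

def zrevrange(zset, start, stop, withscores=False):
    # Redis keeps the zset ordered by score; ZREVRANGE walks it from the
    # highest score down, and the stop index is inclusive.
    ranked = sorted(zset.items(), key=lambda kv: kv[1], reverse=True)
    page = ranked[start:stop + 1]
    return page if withscores else [member for member, _ in page]

print(zrevrange(scores, 0, 1, withscores=True))  # top two: post:2, post:4
print(zrevrange(scores, 2, 3))                   # next page: post:1, post:3
```

(Real Redis additionally breaks score ties by member lexicographic order; there are no ties here.)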

Implementation

Given the process and results structures above, there are two key flows to implement: updating the statistics, and periodically refreshing the sorted results.

Statistics are recorded per time node; an update only ever writes the most recent node, but expired nodes also have to be dealt with. For when to trigger expiry I considered two approaches: run a dedicated timer to clean up, or piggyback cleanup on existing requests. A timer is intuitive and precise, and if the project already has a scheduler component, by all means use it. I chose the second way: when a statistics update needs to open a new time node, it also cleans out the expired ones.
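
The lazy-cleanup decision in the update path can be sketched as a pure function (the function name and the concrete timestamps are illustrative):

```python
from datetime import datetime, timedelta

def needs_rollover(last_slot_ts, interval_hours, now):
    # If the recorded latest slot is older than one interval, a new slot
    # must be opened and expired slots purged along the way.
    return datetime.fromtimestamp(last_slot_ts) + timedelta(hours=interval_hours) <= now

now = datetime(2024, 5, 2, 9, 30)
fresh = datetime(2024, 5, 2, 9, 0).timestamp()   # slot opened at 09:00
stale = datetime(2024, 5, 2, 7, 0).timestamp()   # slot opened at 07:00

assert not needs_rollover(fresh, 1, now)  # still inside the current hour's slot
assert needs_rollover(stale, 1, now)      # 08:00 has passed: roll over and clean up
```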

The sort results live in a single zset; just remember to promptly remove members whose score drops to 0.

from datetime import datetime, timedelta
import redis


TIME_SLOT = 'slot_{name}_{timestamp}'  # hash per time node: change values from this node to the next
STATS_RESULTS = 'stats_{name}'         # zset: the periodically computed, sorted statistics
LAST_SLOT = 'last_slot_{name}'         # records the latest time node


redis_client = redis.Redis(host='127.0.0.1', port=6380)


class DyStats(object):
    def __init__(self, stats_name, period, interval):
        """
        :param stats_name: name of the statistic, must be unique
        :param period: validity period of the statistic, e.g. 7 days or 24 hours
        :param interval: recalculation interval, e.g. every hour or every day
        """
        assert interval in [1, 2, 3, 4, 6, 8, 12] or (interval >= 24 and interval % 24 == 0)
        assert period >= interval and period % interval == 0

        self.stats_name = stats_name
        self.period = period      # in hours
        self.interval = interval  # in hours

    def incr_stats(self, target_id, amount=1):
        """
        Increase the statistic by ``amount``.
        """
        last_slot = self._last_slot

        redis_client.hincrby(TIME_SLOT.format(name=self.stats_name, timestamp=last_slot),
                             target_id, amount)
        redis_client.zincrby(STATS_RESULTS.format(name=self.stats_name), amount, target_id)

    def get_stats_list(self, offset, limit, withscores=False):
        """
        Get the ranking results.
        """
        if withscores:
            return [(int(i), s) for i, s in
                    redis_client.zrevrange(STATS_RESULTS.format(name=self.stats_name),
                                           offset, offset + limit - 1, withscores=True)]
        else:
            return [int(i) for i in
                    redis_client.zrevrange(STATS_RESULTS.format(name=self.stats_name),
                                           offset, offset + limit - 1)]

    def remove_all_expired_slots(self):
        """
        Delete all expired time nodes.
        """
        slot_keys = redis_client.keys(TIME_SLOT[:11].format(name=self.stats_name) + '_*')
        now = datetime.now()

        for key in slot_keys:
            key = int(key.decode()[(6 + len(self.stats_name)):])
            if key < (datetime(year=now.year, month=now.month, day=now.day, hour=now.hour) -
                      timedelta(hours=self.period + self.interval)).timestamp():
                self.remove_expired_slot(key)

    def remove_expired_slot(self, timestamp):
        """
        Remove one expired time node.
        """
        # record all the statistics in the node to be deleted
        slot_values = redis_client.hgetall(TIME_SLOT.format(name=self.stats_name, timestamp=timestamp))
        # delete the expired node
        deleted = redis_client.delete(TIME_SLOT.format(name=self.stats_name, timestamp=timestamp))

        # subtract the expired values from the results
        if deleted:
            for key in slot_values.keys():
                value = redis_client.zincrby(STATS_RESULTS.format(name=self.stats_name),
                                             -int(slot_values[key].decode()), key.decode())
                # Delete members whose score reaches 0. A concurrent incr may slip in
                # here and leave value > 0, so the statistics can drift slightly.
                # ZREMRANGEBYSCORE looks like a fix, but it would have to traverse every
                # member of the zset, which is too expensive. Since the ranking does not
                # demand strict accuracy, this is tolerable.
                if value <= 0:
                    redis_client.zrem(STATS_RESULTS.format(name=self.stats_name), key.decode())

    @property
    def _last_slot(self):
        """
        The latest slot.
        """
        last_slot = redis_client.get(LAST_SLOT.format(name=self.stats_name))
        last_slot = int(last_slot.decode()) if last_slot is not None else None
        if last_slot is None:
            last_slot = self._set_first_slot()

        # as soon as last_slot expires, delete all expired slots
        if datetime.fromtimestamp(last_slot) + timedelta(hours=self.interval) <= datetime.now():
            self.remove_all_expired_slots()

        # advance to the latest time slot
        while datetime.fromtimestamp(last_slot) + timedelta(hours=self.interval) <= datetime.now():
            last_slot = (datetime.fromtimestamp(last_slot) + timedelta(hours=self.interval)).timestamp()
            redis_client.set(LAST_SLOT.format(name=self.stats_name), int(last_slot))

        return int(last_slot)

    def _set_first_slot(self):
        """
        Set the timestamp of the initial slot.
        """
        now = datetime.now()
        first_slot = datetime(year=now.year, month=now.month, day=now.day, hour=now.hour).timestamp()

        redis_client.set(LAST_SLOT.format(name=self.stats_name), int(first_slot))
        return int(first_slot)
Summary

I am not especially familiar with Redis (I have recently been reading Redis in Action), and I feel this solution does not fully exploit Redis's features, so there are probably many points to optimize. For example, in the remove_expired_slot() method, how should consistency with concurrent statistic updates be handled while a node is being deleted? Could ZREMRANGEBYSCORE or a transaction solve it? Is there a simpler structure for this kind of problem? Just a small note from my own Redis learning process!

