Redis Data Structure: HyperLogLog

Source: Internet
Author: User

Goal: record the number of unique IP addresses that visit a website each day.

Set implementation:

Store the IP address of each visitor in a set. Because every member of a set is distinct, duplicate addresses are discarded automatically,
and calling the SCARD command on the set then returns the number of unique IP addresses.
For example, a program can use the following code to record the IP address of each website visitor on August 15, 2014:
ip = get_visitor_ip()
SADD '2014.8.15::unique::ip' ip
The following command then returns the number of unique IP addresses for that day:
SCARD '2014.8.15::unique::ip'
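
The set-based approach can be sketched without a Redis server. The snippet below is only an illustration: the sadd and scard helpers are in-memory stand-ins written for this example (not a Redis client API), and the key name is hypothetical.

```python
# In-memory stand-in for Redis sets: key -> set of members.
# sadd/scard are illustrative helpers, not a real client library.
store = {}


def sadd(key, member):
    """Mimic SADD: return 1 if the member is new, 0 if it was already present."""
    members = store.setdefault(key, set())
    if member in members:
        return 0
    members.add(member)
    return 1


def scard(key):
    """Mimic SCARD: return the number of members stored under the key."""
    return len(store.get(key, ()))


key = "unique-ips:2014-08-15"  # hypothetical key name
for ip in ["192.168.0.1", "127.0.0.1", "192.168.0.1"]:  # one repeat visitor
    sadd(key, ip)

print(scard(key))  # 2: the repeated address is stored only once
```

Every distinct address is kept in memory, which is what makes the count exact and also what makes the approach expensive at scale.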

Problems with the set implementation

Storing an IPv4 address as a string takes up to 15 bytes (format: 'xxx.xxx.xxx.xxx', for example
'192.189.128.186').
The following table shows the amount of memory consumed when using a set to record different numbers of unique IP addresses:
Unique IP addresses    Per day    Per month    Per year
1 million              15 MB      450 MB       5.4 GB
10 million             150 MB     4.5 GB       54 GB
100 million            1.5 GB     45 GB        540 GB
As the number of IP addresses recorded in the set grows, so does the memory consumed.
Storing IPv6 addresses would require even more memory.
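
The figures in the table follow from simple multiplication. The sketch below reproduces them under the assumptions the table appears to use: 15 bytes per address, 30 days per month, 12 months per year, and decimal units (1 MB = 1,000,000 bytes). The function name is made up for this example, and the calculation ignores the per-element overhead of a real Redis set, so actual usage would be higher.

```python
BYTES_PER_IP = 15      # worst-case IPv4 string length, from the text
DAYS_PER_MONTH = 30    # assumption implied by the table
MONTHS_PER_YEAR = 12   # assumption implied by the table


def set_memory_bytes(unique_ips_per_day):
    """Return (per_day, per_month, per_year) payload size in bytes."""
    per_day = unique_ips_per_day * BYTES_PER_IP
    per_month = per_day * DAYS_PER_MONTH
    per_year = per_month * MONTHS_PER_YEAR
    return per_day, per_month, per_year


for n in (1_000_000, 10_000_000, 100_000_000):
    day, month, year = set_memory_bytes(n)
    print(f"{n:>11,} IPs/day: {day / 1e6:,.0f} MB per day, "
          f"{month / 1e9:,.2f} GB per month, {year / 1e9:,.1f} GB per year")
```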


To better solve problems such as counting unique IP addresses,
Redis 2.8.9 added the HyperLogLog structure.

Introduction to HyperLogLog

HyperLogLog accepts any number of elements as input and gives an estimate of the cardinality of those elements:
• Cardinality: the number of distinct elements in a collection. For example, the cardinality of {'apple', 'banana', 'cherry', 'banana', 'apple'} is 3.
• Estimate: the cardinality reported by the algorithm is not exact. It may be slightly higher or lower than the true cardinality, but the error stays within a reasonable range.
The advantage of HyperLogLog is that the space needed to compute the cardinality is fixed and very small, no matter how many input elements there are or how large they are.
In Redis, each HyperLogLog key needs only 12 KB of memory and can estimate the cardinality of close to 2^64 distinct elements. This is in stark contrast to a set, which consumes more memory the more elements it holds.
However, HyperLogLog only computes the cardinality from the input elements; it does not store the elements themselves.
Unlike a set, a HyperLogLog therefore cannot return the elements that were added to it.
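
To make the fixed-memory property concrete, here is a minimal toy HyperLogLog. It is a sketch of the general algorithm, not Redis's implementation: the class name, the use of SHA-1 as the hash, and the choice of 1,024 registers are all assumptions made for this example (Redis uses 16,384 six-bit registers, which is where the 12 KB figure comes from).

```python
import hashlib
import math


class ToyHyperLogLog:
    """Illustrative HyperLogLog; not Redis's actual implementation."""

    def __init__(self, p=10):
        self.p = p                      # number of hash bits used as register index
        self.m = 1 << p                 # number of registers (1024 when p = 10)
        self.registers = [0] * self.m   # fixed size, whatever the input size

    def add(self, item):
        """Add one element; return 1 if a register changed, else 0."""
        digest = hashlib.sha1(item.encode("utf-8")).digest()
        h = int.from_bytes(digest[:8], "big")          # 64-bit hash value
        index = h >> (64 - self.p)                     # first p bits pick a register
        rest = h & ((1 << (64 - self.p)) - 1)          # remaining 64 - p bits
        # Rank = position of the leftmost 1-bit in the remaining bits.
        rank = (64 - self.p) - rest.bit_length() + 1
        if rank > self.registers[index]:
            self.registers[index] = rank
            return 1
        return 0

    def count(self):
        """Return the cardinality estimate."""
        m = self.m
        alpha = 0.7213 / (1 + 1.079 / m)               # bias constant for m >= 128
        raw = alpha * m * m / sum(2.0 ** -r for r in self.registers)
        zeros = self.registers.count(0)
        if raw <= 2.5 * m and zeros:
            return round(m * math.log(m / zeros))      # small-range correction
        return round(raw)


hll = ToyHyperLogLog()
for i in range(100_000):
    hll.add(f"10.{(i >> 16) & 255}.{(i >> 8) & 255}.{i & 255}")

print(hll.count())          # an estimate near 100000, not an exact count
print(len(hll.registers))   # 1024: memory did not grow with the input
```

Only the registers are kept, so the original addresses cannot be recovered from the structure, which matches the limitation described above.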

Adding elements to a HyperLogLog
PFADD key element [element ...]
Adds any number of elements to the specified HyperLogLog.
The command may modify the HyperLogLog so that it reflects the new cardinality estimate.
If the command changes the HyperLogLog (at least one of its internal registers is updated), it returns 1; otherwise, it returns 0.
The complexity of the command is O(N), where N is the number of elements added.

Returning the cardinality estimate of a HyperLogLog
PFCOUNT key [key ...]
When given a single HyperLogLog, the command returns its cardinality estimate.
When given multiple HyperLogLogs, the command first computes their union to obtain a merged
HyperLogLog, and then returns the cardinality estimate of the merged HyperLogLog as its result (the merged
HyperLogLog is temporary: it is not stored and is discarded after use).
When the command acts on a single HyperLogLog, the complexity is O(1), with a very low average constant time.
When the command acts on N HyperLogLogs, the complexity is O(N), and the constant time is
much larger.

 

Example of PFADD and PFCOUNT
redis> PFADD unique::ip::counter '192.168.0.1'
(integer) 1
redis> PFADD unique::ip::counter '127.0.0.1'
(integer) 1
redis> PFADD unique::ip::counter '255.255.255.255'
(integer) 1
redis> PFCOUNT unique::ip::counter
(integer) 3

Merging multiple HyperLogLogs
PFMERGE destkey sourcekey [sourcekey ...]
Merges multiple HyperLogLogs into a single HyperLogLog. The cardinality estimate of the merged HyperLogLog is computed from
the union of all the given HyperLogLogs.
The complexity of the command is O(N), where N is the number of HyperLogLogs being merged, but the constant factor of this command is relatively high.

Example of PFMERGE usage
redis> PFADD str1 "apple" "banana" "cherry"
(integer) 1
redis> PFCOUNT str1
(integer) 3
redis> PFADD str2 "apple" "cherry" "durian" "mango"
(integer) 1
redis> PFCOUNT str2
(integer) 4
redis> PFMERGE str1&2 str1 str2
OK
redis> PFCOUNT str1&2
(integer) 5
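
In the general HyperLogLog algorithm, merging amounts to taking the register-wise maximum of the inputs. The sketch below illustrates that idea with a deliberately tiny register array; the function names, the SHA-1 hash, and the 16-register size are assumptions for this example and do not reflect Redis's internal layout.

```python
import hashlib

P = 4          # index bits: a toy value giving 16 registers
M = 1 << P


def build_registers(items):
    """Build a toy HyperLogLog register array from an iterable of strings."""
    regs = [0] * M
    for item in items:
        digest = hashlib.sha1(item.encode("utf-8")).digest()
        h = int.from_bytes(digest[:8], "big")        # 64-bit hash value
        index = h >> (64 - P)                        # first P bits pick a register
        rest = h & ((1 << (64 - P)) - 1)             # remaining 64 - P bits
        rank = (64 - P) - rest.bit_length() + 1      # position of leftmost 1-bit
        regs[index] = max(regs[index], rank)
    return regs


def merge(*register_arrays):
    """Register-wise maximum, which represents the union of the inputs."""
    return [max(column) for column in zip(*register_arrays)]


str1 = {"apple", "banana", "cherry"}
str2 = {"apple", "cherry", "durian", "mango"}

merged = merge(build_registers(str1), build_registers(str2))

# Merging two sketches gives the same registers as sketching the union directly.
print(merged == build_registers(str1 | str2))  # True
```

Because the maximum is taken register by register, elements present in both inputs (here "apple" and "cherry") are not counted twice.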

 

Counting unique IP addresses with HyperLogLog

The following table lists the memory consumed when using HyperLogLog to record different numbers of unique IP addresses, with the yearly cost of the set approach shown for comparison:
Unique IP addresses    Per day    Per month    Per year    Per year (using a set)
1 million              12 KB      360 KB       4.32 MB     5.4 GB
10 million             12 KB      360 KB       4.32 MB     54 GB
100 million            12 KB      360 KB       4.32 MB     540 GB
As you can see, to count the same number of unique IP addresses, HyperLogLog requires far less memory than a set.
