Redis Cardinality Statistics: HyperLogLog, Small Memory, Big Uses

Source: Internet
Author: User
Tags ip number


We all know the most commonly used Redis data structures: strings, hashes, lists, sets, and sorted sets. Redis later added several more. One of them is HyperLogLog; another is GEO (geographic location), which was added in version 3.2.



Here is a brief introduction to the HyperLogLog structure.



First, what it is for: this structure can keep all kinds of counts while using very little memory, such as the number of registered IPs, the number of distinct IPs visiting per day, real-time page UV (PV can simply be kept in a string counter), and the number of online users.



Notice that every use case here is "the number of XXX". The defining characteristic of this data structure is that it can estimate the count you want fairly accurately, but it does not keep the individual items that were counted. For example, when counting daily visiting IPs, you can get the total number of IPs, but you cannot find out which IPs they were.



There is a trade-off, of course. Everything mentioned above can also be counted with a set, which gives you both the count and the full list of members. But take the IPs of a large website as an example: with 1 million IPs a day and a rough estimate of 15 bytes per IP, 1 million IPs take 15 MB, and 10 million take 150 MB.



Now look at HyperLogLog. In Redis each HyperLogLog key takes up at most 12 KB, and in theory it can count close to 2^64 distinct values, no matter what the values are. At 12 KB the benefit of this data structure is obvious, and it is also the reason it cannot tell you the details. It is a cardinality estimation algorithm: it uses a small, fixed amount of memory to estimate the number of unique elements in a set. The estimate is not exact; it is an approximation with a standard error of 0.81%.
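The 12 KB and 0.81% figures are consistent with each other. The sketch below assumes the parameters commonly documented for Redis's dense HyperLogLog encoding (2^14 registers of 6 bits each) and the usual HyperLogLog error formula 1.04 / sqrt(m); the function names are made up for illustration.

```python
import math

# Assumed parameters of the Redis dense encoding (not stated in this article).
REGISTERS = 2 ** 14          # m = 16384 registers
BITS_PER_REGISTER = 6


def dense_size_bytes(registers: int = REGISTERS,
                     bits: int = BITS_PER_REGISTER) -> int:
    """Memory needed to store all registers, in bytes."""
    return registers * bits // 8


def standard_error(registers: int = REGISTERS) -> float:
    """Typical HyperLogLog standard error: 1.04 / sqrt(m)."""
    return 1.04 / math.sqrt(registers)


print(dense_size_bytes())                 # 12288 bytes, i.e. 12 KB
print(f"{standard_error() * 100:.4f}")    # error in percent, roughly 0.81
```

Because the number of registers is fixed, the size does not grow with the number of elements added.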



No matter how many values are added, as long as they stay within the allowed range, a HyperLogLog structure occupies only 12 KB of memory.



For example, suppose we record daily IPs and 100 million distinct IPs visit every day. With sets, one day takes 1.5 GB of memory, and keeping one month of records takes 45 GB. With HyperLogLog it is 12 KB a day, or 360 KB a month. If we do not need the individual IPs, we can keep these records in memory for a year, or never delete them at all. If the details are needed, we can store the full IP access records some other way. With one HyperLogLog stored per day, we can also compute the deduplicated number of IPs per month (via merge), per year, and so on.
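The arithmetic behind these numbers can be checked with a few lines. This is only a back-of-the-envelope sketch: it uses the article's rough 15 bytes per IP and 12 KB per HyperLogLog key, decimal units, and ignores whatever extra per-element overhead a real Redis set carries. The helper names are hypothetical.

```python
BYTES_PER_IP = 15        # rough per-IP estimate used in the article
HLL_KB_PER_KEY = 12      # one HyperLogLog key per day


def set_bytes(unique_ips: int, days: int = 1) -> int:
    """Approximate memory for storing every IP in a set, one set per day."""
    return unique_ips * BYTES_PER_IP * days


def hll_kb(days: int = 1) -> int:
    """Approximate memory for one HyperLogLog key per day, in KB."""
    return HLL_KB_PER_KEY * days


daily_ips = 100_000_000
print(set_bytes(daily_ips) / 1e9)        # 1.5 (GB per day)
print(set_bytes(daily_ips, 30) / 1e9)    # 45.0 (GB per month)
print(hll_kb(30))                        # 360 (KB per month)
```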



Here is an introduction to the HyperLogLog commands. They resemble the set commands, but there are only a few of them and none can return the list of elements. Note that this data structure requires Redis 2.8.9 or later.

PFADD



After this command runs, the internal structure of the HyperLogLog is updated and the result is reported back: if the HyperLogLog's internal cardinality estimate changed, the command returns 1; otherwise (the elements were already counted) it returns 0.
The command also has a peculiar form: it can be called with only a key and no values, which simply creates an empty key without adding anything.
In that case, if the key already exists nothing happens and 0 is returned; if it does not exist, it is created and 1 is returned.



The time complexity of this command is O(1) for each element added, so feel free to use it.
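To make the return value more concrete, here is a toy model of the add step. It is a sketch of the general HyperLogLog idea, not Redis's implementation: the class name `ToyHLL` is hypothetical, SHA-1 stands in for whatever hash Redis uses internally, and only the "did any register change?" logic is modeled.

```python
import hashlib


class ToyHLL:
    """Toy HyperLogLog registers; illustrative only."""

    def __init__(self, p: int = 14):
        self.p = p                           # index bits, 2**p registers
        self.width = 64 - p                  # remaining bits used for the rank
        self.registers = [0] * (1 << p)

    @staticmethod
    def _hash64(item: str) -> int:
        # SHA-1 truncated to 64 bits is an assumption made for this sketch.
        return int.from_bytes(hashlib.sha1(item.encode()).digest()[:8], "big")

    def add(self, *items: str) -> int:
        """Return 1 if at least one register changed, else 0."""
        changed = 0
        for item in items:
            h = self._hash64(item)
            index = h >> self.width                      # top p bits pick a register
            rest = h & ((1 << self.width) - 1)           # remaining bits
            rank = self.width - rest.bit_length() + 1    # position of first 1-bit
            if rank > self.registers[index]:
                self.registers[index] = rank
                changed = 1
        return changed


hll = ToyHLL()
print(hll.add("1.1.1.1", "2.2.2.2", "3.3.3.3"))  # 1: empty registers get updated
print(hll.add("2.2.2.2"))                        # 0: same hash, nothing changes
```

Each added element touches exactly one register, which is why the cost per element is constant.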



command example:


redis> pfadd ip:20160929 "1.1.1.1" "2.2.2.2" "3.3.3.3"
(integer) 1
redis> pfadd ip:20160929 "2.2.2.2" "4.4.4.4" "5.5.5.5"  # only the new elements are counted
(integer) 1
redis> pfcount ip:20160929  # estimated number of unique elements
(integer) 5
redis> pfadd ip:20160929 "2.2.2.2"  # already present, so the estimate does not change
(integer) 0


As you can see, with a small number of elements the estimate is quite accurate.

PFCOUNT



We already used this command in the example above; here is a proper introduction.



When called with a single key, the command returns the estimated cardinality of that key. If the key does not exist, it returns 0.
When called with multiple keys, it returns the estimated cardinality of the union of those keys, as if the keys had been merged first and the command then run on the merged result.
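The multi-key behaviour is a union, not a sum. The sketch below models it with exact Python sets as a stand-in for HyperLogLog keys (a real HyperLogLog only approximates these numbers); `pfcount_model` is a hypothetical helper name.

```python
def pfcount_model(*keys: set) -> int:
    """Exact-set model of PFCOUNT: size of the union of all given keys."""
    return len(set().union(*keys))


day_28 = {"1.1.1.1", "4.4.4.4", "5.5.5.5"}
day_29 = {"1.1.1.1", "2.2.2.2", "3.3.3.3"}

print(pfcount_model(day_29))          # 3
print(pfcount_model(day_28, day_29))  # 5, because "1.1.1.1" is counted once
```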



When called with a single key, this command has a time complexity of O(1) with a very low average constant time. When called with N keys, the time complexity is O(N), and the constant time is much higher.



command example:


redis> pfadd  ip:20160929  "1.1.1.1"  "2.2.2.2"  "3.3.3.3"
(integer) 1
redis> pfcount  ip:20160929
(integer) 3
redis> pfadd  ip:20160928  "1.1.1.1"  "4.4.4.4"  "5.5.5.5"
(integer) 1
redis> pfcount  ip:20160928  ip:20160929
(integer) 5

PFMERGE


Merges multiple HyperLogLogs into a single HyperLogLog. This is easy to understand: the estimated cardinality of the merged HyperLogLog approximates the cardinality of the union of the sets observed by all the source HyperLogLogs.



The first argument of this command is the destination key, and the remaining arguments are the HyperLogLogs to merge. If the destination key does not exist when the command runs, it is created first and then the merge is performed.



The time complexity of this command is O(N), where N is the number of HyperLogLogs to merge. However, the constant time of this command is relatively high.
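In the general HyperLogLog algorithm, merging means taking the register-wise maximum of the sources. The sketch below shows that step on tiny four-register lists; it is an illustration of the technique, not Redis's code, and `merge_registers` is a hypothetical name.

```python
def merge_registers(*sources: list) -> list:
    """Register-wise maximum of equally sized register arrays."""
    return [max(column) for column in zip(*sources)]


a = [0, 3, 1, 0]   # toy registers; a real structure has thousands of them
b = [2, 1, 1, 0]

print(merge_registers(a, b))   # [2, 3, 1, 0]
```

Every register of every source has to be visited, which fits the description above: linear in the number of HyperLogLogs, with a noticeable constant cost per source.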



command example:


redis> PFADD  ip:20160929  "1.1.1.1"  "2.2.2.2"  "3.3.3.3"
(integer) 1
redis> PFADD  ip:20160928  "1.1.1.1"  "4.4.4.4"  "5.5.5.5"
(integer) 1
redis> PFMERGE ip:201609   ip:20160928   ip:20160929
OK
redis> PFCOUNT  ip:201609
(integer) 5