Implementing a time-for-space cache replacement algorithm in Python

Source: Internet
Author: User
A cache is a store that allows fast data exchange. In hardware, it sits in front of main memory and exchanges data with the CPU first, which is why it is so fast. More generally, caching means temporarily keeping some data somewhere, either in memory or on disk. This article describes how to implement a cache in Python that trades time for space.

This algorithm is a by-product of crawling a website with Scrapy. CPU time is not tight during a crawl (the request rate has to stay low anyway, or the crawler gets blocked), so it is a rare opportunity to spend spare CPU time on freeing up memory.

Algorithm principle:

The data to be cached is expanded into its binary representation, and the resulting bits are mapped to positions in the cache structure. To check whether an item has already been cached, look up the corresponding positions: if every bit matches, the item is cached.

# The binary data is stored as a binary tree.
# The figure below shows the items 0, 1, 2, and 3 as two-bit paths (the two trees are independent):

      0         1
     / \       / \
    0   1     0   1

Cache operations therefore become binary tree operations. To add or look up an item, follow the path given by its bits through the tree.
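The idea described above can be sketched as a small fixed-width binary trie. This is only an illustration of the principle, not the author's ByteCache: the class name `BitTrieCache`, the `width` parameter, and the restriction to non-negative integers are all assumptions made for this example.

```python
class BitTrieCache:
    """Hypothetical binary-trie membership cache for non-negative integers."""

    def __init__(self, width=8):
        self.width = width          # number of bits stored per item (assumed fixed)
        self.root = [None, None]    # child slots for bit 0 and bit 1

    def _bits(self, value):
        if value < 0 or value >= (1 << self.width):
            raise ValueError("value does not fit in %d bits" % self.width)
        # Most significant bit first, so 2 with width=2 gives the path 1, 0.
        return [(value >> i) & 1 for i in range(self.width - 1, -1, -1)]

    def add(self, value):
        node = self.root
        for bit in self._bits(value):
            if node[bit] is None:
                node[bit] = [None, None]   # create the missing branch
            node = node[bit]

    def __contains__(self, value):
        node = self.root
        for bit in self._bits(value):
            node = node[bit]
            if node is None:               # path breaks off: not cached
                return False
        return True                        # full path exists: cached


cache = BitTrieCache(width=2)
for n in (0, 1, 2):
    cache.add(n)

print(2 in cache)  # True
print(3 in cache)  # False
```

Because every item has the same width, reaching the end of a path is enough to know the item was added; no separate end marker is needed.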

Key algorithm code:

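# Return the bit of data at the given position (0 or 1).
def _read_bit(self, data, position):
    return (data >> position) & 0x1

# Return data with value OR-ed into the given position.
def _write_bit(self, data, position, value):
    return data | (value << position)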
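To make the two helpers easier to try out, here is a standalone sketch of the same bit operations as plain functions (the names `read_bit`, `write_bit`, and `flags` are chosen for this example only). It also shows one consequence of the OR-based write: it can set a bit but never clear it, which is fine for a cache that only ever adds items.

```python
def read_bit(data, position):
    # Shift the wanted bit down to position 0 and mask everything else off.
    return (data >> position) & 0x1


def write_bit(data, position, value):
    # OR the value into place; existing 1 bits are never cleared.
    return data | (value << position)


flags = 0
flags = write_bit(flags, 0, 1)
flags = write_bit(flags, 3, 1)

print(bin(flags))                              # 0b1001
print([read_bit(flags, i) for i in range(4)])  # [1, 0, 0, 1]

# Writing 0 to a bit that is already set leaves it set.
print(read_bit(write_bit(flags, 3, 0), 3))     # 1
```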

What is the actual effect?

Compared with Python's built-in set, the test results are as follows:

Please select test mode: 4
Please enter test times: 1000
============================================================
TEST RESULT::
============================================================
                set()               bytecache
items           1000                1000
add(s)          0.0                 0.0209999084473
read(s)         0.0                 0.0149998664856
hits            1000                1000
missed          0                   0
size            32992               56
add(s/item)     0.0                 2.09999084473e-05
read(s/item)    0.0                 2.09999084473e-05
============================================================
size (set / bytecache): 589.142857143
add time (bytecache / set): N/A
read time (bytecache / set): N/A
============================================================
...test fixed length & int data end...
============================================================
TEST RESULT::
============================================================
                set()               bytecache
items           1000                1000
add(s)          0.00100016593933    6.1740000248
read(s)         0.0                 7.21300005913
hits            999                 999
missed          0                   0
size            32992               56
add(s/item)     1.00016593933e-06   0.0061740000248
read(s/item)    0.0                 0.0061740000248
============================================================
size (set / bytecache): 589.142857143
add time (bytecache / set): 6172.97568534
read time (bytecache / set): N/A
============================================================
...test mutative length & string data end...
============================================================
TEST RESULT::
============================================================
                set()               bytecache
items           1000                1000
add(s)          0.0                 0.513999938965
read(s)         0.0                 0.421000003815
hits            999                 999
missed          0                   0
size            32992               56
add(s/item)     0.0                 0.000513999938965
read(s/item)    0.0                 0.000513999938965
============================================================
size (set / bytecache): 589.142857143
add time (bytecache / set): N/A
read time (bytecache / set): N/A
============================================================
...test Fixed length(64) & string data end...

The tests show that memory consumption is well controlled: ByteCache is reported at 56 bytes in every run. The set is not very large either, at 32992 bytes for 1000 items, but that is still about 589 times the size of ByteCache.
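The article does not show how the sizes were measured. One common tool is `sys.getsizeof`, which reports only the object's own footprint and not the objects it refers to. The sketch below assumes that kind of measurement and shows the difference between a shallow and a recursive size; the helper `deep_size` is a name invented for this example.

```python
import sys


def deep_size(obj, seen=None):
    """Roughly sum sys.getsizeof over an object and the containers it holds."""
    seen = set() if seen is None else seen
    if id(obj) in seen:            # count shared objects only once
        return 0
    seen.add(id(obj))
    size = sys.getsizeof(obj)
    if isinstance(obj, dict):
        size += sum(deep_size(k, seen) + deep_size(v, seen)
                    for k, v in obj.items())
    elif isinstance(obj, (list, tuple, set, frozenset)):
        size += sum(deep_size(item, seen) for item in obj)
    return size


outer_small = [[0]]            # one inner list with a single element
outer_big = [[0] * 1000]       # one inner list with 1000 elements

shallow_small = sys.getsizeof(outer_small)
shallow_big = sys.getsizeof(outer_big)

# The shallow size sees only the outer one-element list in both cases.
print(shallow_small, shallow_big)
# The recursive size also counts the inner list.
print(deep_size(outer_small), deep_size(outer_big))
```

When comparing a nested structure against a flat `set`, a recursive measurement of both sides gives a fairer picture than the shallow one.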

The biggest problem with ByteCache, however, is that caching large, random data takes an astonishing amount of time. The following results are for caching strings of random length:

Please select test mode: 2
Please enter test times: 2000
============================================================
TEST RESULT::
============================================================
                set()               bytecache
items           2000                2000
add(s)          0.00400018692017    31.3759999275
read(s)         0.0                 44.251999855
hits            1999                1999
missed          0                   0
size            131296              56
add(s/item)     2.00009346008e-06   0.0156879999638
read(s/item)    0.0                 0.0156879999638
============================================================
size (set / bytecache): 2344.57142857
add time (bytecache / set): 7843.63344856
read time (bytecache / set): N/A
============================================================
...test mutative length & string data end...

Adding 2000 items takes about 31 s and looking them up takes about 44 s, while the set needs close to 0 s for both. The reported per-item average for ByteCache is about 16 ms.
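The derived figures in the output can be checked from the raw numbers. The short calculation below uses only values copied from the test output above; the variable names are chosen for this example.

```python
items = 2000
set_add_s = 0.00400018692017     # add(s), set()
cache_add_s = 31.3759999275      # add(s), bytecache
set_size = 131296                # size, set()
cache_size = 56                  # size, bytecache

per_item_s = cache_add_s / items        # average add time per item
add_ratio = cache_add_s / set_add_s     # add time (bytecache / set)
size_ratio = set_size / cache_size      # size (set / bytecache)

print(round(per_item_s * 1000, 2))  # 15.69  (milliseconds per item)
print(round(add_ratio))             # 7844
print(round(size_ratio, 2))         # 2344.57
```

In other words, ByteCache is reported as roughly 2300 times smaller here, at the cost of adds that are roughly 7800 times slower.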

However, as mentioned at the beginning, time is not critical in Scrapy, so this cost is acceptable there. Moreover, what Scrapy generally needs to cache is hash data, and an important property of such data is its fixed length, which suits this algorithm well: in the fixed-length (64) test, the average is only about 0.5 ms per item. In return, a considerable amount of memory can be freed when the cache holds a large number of items.
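One way to obtain fixed-length keys from variable-length strings is to hash them before caching. The sketch below is an assumption about how that could look, not the article's code: `fixed_length_key` is a hypothetical helper, the URLs are placeholders, and a plain `set` stands in for the bit-level cache. Truncating the digest to 64 bits keeps keys short but raises the chance of collisions compared with the full digest.

```python
import hashlib


def fixed_length_key(text, bits=64):
    """Map a string of any length to an integer of at most `bits` bits."""
    digest = hashlib.sha1(text.encode("utf-8")).digest()
    return int.from_bytes(digest[: bits // 8], "big")


seen = set()   # stand-in for the space-saving cache
urls = [
    "https://example.com/a",
    "https://example.com/a?page=2",
    "https://example.com/a",          # duplicate of the first URL
]

for url in urls:
    key = fixed_length_key(url)
    if key in seen:
        print("skip ", url)
    else:
        seen.add(key)
        print("fetch", url)
```

Every key now has the same bit width regardless of how long the original string was, which is the case the timings above favour.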

If there is a better cache algorithm that can take the speed to a new level, I would very much like to see it...

Summary:

1. The goal of this method is to trade time for space. Do not use it where time is critical.

2. It is well suited to caching large numbers of small, fixed-length items.

3. In contrast to point 2, it is not recommended for large amounts of variable-length data, especially when the items themselves are large.

That concludes this look at a Python implementation of a time-for-space cache replacement algorithm. I hope it helps!
