Memcache storage big data problems

Last Update:2014-07-09 Source: Internet

Author: User



Original author: huangguisu

Memcached limits the size of a single item to 1 MB. If the data exceeds 1 MB, both set and get return false, and the attempt itself causes performance problems.

We used to cache the data of our ranking table. The ranking table accounts for 30% of all our SQL select queries and is updated only once an hour, so caching it was a must. To make the cache easy to clear, we put all users' data under a single key, and Memcached::set was not compressing the data at the time. No problem was found during the trial on the middleware server. After going live, however, the server load average climbed to 7.9 with just 490 users online. When we removed the cache, it dropped to 0.59 at once.

Therefore, memcache is not suitable for caching big data. If the size of the data exceeds 1 MB, consider compressing it in the client or splitting it across multiple keys. Loading and unpacking large data into memory takes a long time, which reduces server performance.
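Splitting a large value across multiple keys can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the author's code: a plain dict stands in for the memcached client, and the helper names `set_large` and `get_large`, the `:count` and `:<index>` key suffixes, and the chunk size are all hypothetical.

```python
CHUNK_SIZE = 1000 * 1000  # assumed payload per key, kept below the 1 MB item limit

def set_large(cache, key, data, chunk_size=CHUNK_SIZE):
    """Store bytes under key:0, key:1, ... plus a key:count entry."""
    chunks = [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]
    for index, chunk in enumerate(chunks):
        cache[f"{key}:{index}"] = chunk
    cache[f"{key}:count"] = len(chunks)  # written last, after all chunks exist
    return len(chunks)

def get_large(cache, key):
    """Reassemble the value, or return None if any piece is missing."""
    count = cache.get(f"{key}:count")
    if count is None:
        return None
    parts = []
    for index in range(count):
        part = cache.get(f"{key}:{index}")
        if part is None:  # a chunk expired or was evicted
            return None
        parts.append(part)
    return b"".join(parts)

cache = {}  # stand-in for a memcached client
payload = bytes(range(256)) * 10000  # about 2.5 MB of sample data
pieces = set_large(cache, "ranking", payload)
restored = get_large(cache, "ranking")
print(pieces, restored == payload)
```

With a real cache, any chunk can expire or be evicted independently, so the reader has to treat a missing piece as a cache miss, which is what `get_large` does here.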

Memcached supports stored objects of up to 1 MB. This limit is determined by its memory allocation mechanism.

By default, memcached uses a slab allocator to allocate and manage memory. Before this mechanism existed, memory was allocated by simply calling malloc and free for every record. That approach causes memory fragmentation and increases the burden on the operating system's memory manager; in the worst case, the operating system ends up slower than the memcached process itself. The slab allocator was created to solve this problem. Its basic principle is to cut the allocated memory into blocks of predetermined lengths, which avoids the memory fragmentation problem.

Today we tested the data size accepted by Memcached::set again on the slave node. Possibly because the memcached extension for PHP there is the latest version, the data is compressed by default on set. The test code:

$ac = new Memcached();
$ac->addServer('192.168.0.200', 12000); // assumed server address; the original snippet omits this step
$data = str_repeat('a', 1024 * 1024); // 1 MB of data
$r = $ac->set('key', $data, 9999);
// or
$data = str_repeat('a', 1024 * 1024 * 100); // 100 MB of data
$r = $ac->set('key', $data, 9999);

Both the 1 MB data and the 100 MB data were set successfully. Later I found that Memcached::set compresses data by default. The test data is a single repeated character, so the compression ratio is as high as about 1000:1, and the 100 MB of data is actually only about 100 KB after compression.

When I disabled compression:

$ac->setOption(Memcached::OPT_COMPRESSION, 0); // do not compress the stored data
$data = str_repeat('a', 1024 * 1024); // 1 MB of data
$r = $ac->set('key', $data, 9999); // setting the 1 MB data fails

That is to say, the memcached server cannot store an item larger than 1 MB. However, if the client compresses the data and the compressed result is smaller than 1 MB, it can be stored successfully.
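The effect of client-side compression on the 1 MB limit can be checked with a short Python sketch. It assumes zlib as the compressor purely for illustration (the PHP extension's actual compressor depends on its configuration), ignores per-item overhead such as the key and headers, and `fits_after_compression` is a hypothetical helper name.

```python
import os
import zlib

ITEM_LIMIT = 1024 * 1024  # the 1 MB item limit discussed above

def compressed_size(data):
    """Size in bytes of the data after zlib compression."""
    return len(zlib.compress(data))

def fits_after_compression(data):
    """True if the compressed payload is below the item limit."""
    return compressed_size(data) < ITEM_LIMIT

repeated = b"a" * (4 * ITEM_LIMIT)        # highly repetitive, like str_repeat('a', ...)
random_data = os.urandom(4 * ITEM_LIMIT)  # incompressible data of the same size

print(compressed_size(repeated), fits_after_compression(repeated))
print(compressed_size(random_data), fits_after_compression(random_data))
```

The repeated string shrinks to a tiny fraction of its size, while the random data stays above the limit. A test with `str_repeat` therefore says little about how real data will behave.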

Knowledge about memcached:

1. Basic settings of memcached
1) Start the memcache server.
# /usr/local/bin/memcached -d -m 10 -u root -l 192.168.0.200 -p 12000 -c 256 -P /tmp/memcached.pid

-d starts memcached as a daemon,
-m is the amount of memory allocated to memcache, in MB; here it is 10 MB,
-u is the user that runs memcache; here it is root,
-l is the server IP address to listen on, which matters when the server has several addresses; here I specify 192.168.0.200,
-p is the port memcache listens on; I set 12000 here, and a port above 1024 is preferable,
-c is the maximum number of concurrent connections; the default is 1024, and I set 256 here, which should be chosen according to the load of your server,
-P is the PID file in which memcache saves its process ID; here it is /tmp/memcached.pid.

2) To end the memcache process, run:

# kill `cat /tmp/memcached.pid`

A hash algorithm maps a binary value of arbitrary length to a smaller binary value of fixed length. This smaller binary value is called a hash value. A hash value is a practically unique and extremely compact numeric representation of a piece of data. If you hash a piece of plain text and then change even a single letter of it, the new hash will be a different value. Finding two different inputs that hash to the same value is computationally infeasible.
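These properties are easy to observe with a short Python sketch. SHA-1 from the standard library is used only as an example hash function; nothing here implies that memcached or its clients use it, and `digest` is a hypothetical helper name.

```python
import hashlib

def digest(text):
    """Hex-encoded SHA-1 hash of a string."""
    return hashlib.sha1(text.encode("utf-8")).hexdigest()

short = digest("memcached")
long_text = digest("memcached " * 1000)  # much longer input
changed = digest("memcacheD")            # one letter differs

print(len(short), len(long_text))  # the two digests have the same length
print(short == changed)            # the digests differ
```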

2. Which business scenarios is memcached suited to?

1) If a site contains dynamic web pages with heavy traffic, the database load will be very high. Because most database requests are read operations, memcached can significantly reduce the database load.

2) If the load on the database server is low but CPU usage is very high, computed results (computed objects) and rendered page templates (rendered templates) can be cached.

3) Use memcached to cache session data and temporary data, reducing write operations on the database.

4) Cache small files that are accessed frequently.

5) Cache the results of web "services" (in the generic sense, not the Web Services promoted by IBM; translator's note) or RSS feeds.

3. Which business scenarios is memcached not suited to?

1) The cached object is larger than 1 MB.

Memcached is not designed for handling large media or for streaming huge blobs.

2) The key is longer than 250 characters.

3) The virtual host does not allow the memcached service to run.

If the application is hosted on a low-end virtual private server, note that virtualization technologies such as VMware and Xen are not well suited to running memcached. Memcached needs to take over and control large blocks of memory. If the memory managed by memcached is swapped out by the OS or the hypervisor, the performance of memcached drops sharply.

4) The application runs in an insecure environment.

Memcached provides no security policy of its own; it can be accessed with nothing more than telnet. If the application runs on a shared system, security issues need to be considered.

5) What the business really needs is persistent data, or a database.

4. You cannot traverse all items in memcached.

This operation is relatively slow and blocks other operations (slow here is relative to other memcached commands). All memcached non-debug commands, such as add, set, get, and flush, take only constant time to execute, no matter how much data memcached stores. The time taken by any command that traverses all items, by contrast, grows with the amount of data in memcached. While other commands wait for the traversal to finish, they cannot run, so blocking occurs.

5. The maximum length of keys accepted by memcached is 250 characters.

The maximum length of keys accepted by memcached is 250 characters. Note that 250 is a limit inside the memcached server. If the memcached client supports a "key prefix" or similar feature, the prefix counts toward this limit (the key sent to the server is prefix + original key), so the original key must be correspondingly shorter than 250 characters. We recommend using shorter keys to save memory and bandwidth.
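A client can guard against over-long keys before they reach the server. The following Python sketch is one possible approach, not a feature of any particular client: `safe_key` is a hypothetical helper that measures the key in UTF-8 bytes and falls back to a SHA-1 digest when prefix + key exceeds the limit or contains whitespace or control characters, which memcached keys must not contain. It assumes the prefix itself is short and well formed.

```python
import hashlib

MAX_KEY_LENGTH = 250  # server-side limit described above

def safe_key(prefix, key):
    """Return prefix + key, or prefix + digest when the full key is not usable."""
    full = (prefix + key).encode("utf-8")
    has_bad_byte = any(b <= 32 or b == 127 for b in full)  # whitespace or control
    if len(full) <= MAX_KEY_LENGTH and not has_bad_byte:
        return full.decode("utf-8")
    # Fixed-length fallback: 40 hex characters after the prefix.
    return prefix + hashlib.sha1(full).hexdigest()

print(safe_key("app:", "user42"))
print(safe_key("app:", "x" * 300))
print(safe_key("app:", "user 42"))
```

Hashing keeps the key short, at the cost of losing the readable original key.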

6. The size of a single item is limited to 1 MB.

This is because of the memory allocator's algorithm.

Detailed answer:

1) Memcached's memory storage engine uses slabs to manage memory. Memory is first divided into slabs of equal size, and each slab is then divided into chunks of equal size; chunks in different slabs have different sizes. The chunk size starts from a minimum value and grows by a factor until the maximum possible value is reached. Assume that the minimum value is 400 bytes, the maximum value is 1 MB, and the factor is 1.20. The chunk size of each slab is then as follows:

slab1 - 400 bytes; slab2 - 480 bytes; slab3 - 576 bytes; ...

The larger the chunks in a slab, the larger the gap between it and the previous slab. Therefore, the larger the maximum value, the lower the memory utilization. Memcached must pre-allocate memory for each slab, so if a small factor and a large maximum value are set, memcached needs more memory.
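The growth of chunk sizes in this example can be reproduced with a short Python sketch. It is a simplification under stated assumptions: the real allocator also aligns chunk sizes and accounts for item headers, and `chunk_sizes` and `chunk_for` are hypothetical helper names. Exact fractions are used so that the factor 1.20 does not pick up floating-point error.

```python
from bisect import bisect_left
from fractions import Fraction

def chunk_sizes(min_size, max_size, factor):
    """Chunk size of each slab class, growing by factor up to max_size."""
    sizes = []
    size = Fraction(min_size)
    while size < max_size:
        sizes.append(int(size))  # drop the fractional byte
        size *= factor
    sizes.append(max_size)       # the last class holds the largest items
    return sizes

def chunk_for(sizes, item_size):
    """Smallest chunk that can hold the item, or None if it is too large."""
    index = bisect_left(sizes, item_size)
    return sizes[index] if index < len(sizes) else None

sizes = chunk_sizes(400, 1024 * 1024, Fraction(6, 5))  # factor 1.20
print(sizes[:3])                 # the first three classes from the example above
chunk = chunk_for(sizes, 500)
print(chunk, chunk - 500)        # chosen chunk and the bytes left unused in it
```

An item always occupies a whole chunk, so the difference between the item size and the chunk size is wasted. The gaps between adjacent classes widen as the sizes grow, and so does the potential waste.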

2) Do not try to store very large data in memcached, such as a huge web page. Loading and unpacking big data into memory takes a long time, which degrades system performance. If you really need to store data larger than 1 MB, you can change the value of slabs.c:POWER_BLOCK and recompile memcached, or use the inefficient malloc/free. Alternatively, a database, MogileFS, or another solution can be used in place of memcached.

7. How does the memcached memory allocator work? Why not use malloc/free? Why use slabs?

In fact, this is a compile-time option. The internal slab allocator is used by default, and you really should use the built-in slab allocator. In its earliest versions, memcached used only malloc/free to manage memory. However, this approach does not work well with the OS's memory management. Repeated malloc/free calls cause memory fragmentation, and the OS ends up spending a lot of time searching for contiguous memory blocks to satisfy malloc requests instead of running the memcached process. The slab allocator was created to solve this problem. Memory is allocated and divided into chunks, which are reused again and again. Because memory is divided into slabs with different chunk sizes, some memory is wasted whenever an item does not exactly fit the chunk of the slab chosen to store it.

8. What are the restrictions on the expiration time of items in memcached?

The expiration time of an item can be at most 30 days. Memcached interprets the expiration time passed in (a time interval) as a point in time, and once that point is reached, memcached marks the item as invalid. This is a simple but obscure mechanism.
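The interpretation of the expiration value can be sketched in Python. The sketch follows the behavior described in the memcached protocol documentation, where a value larger than 30 days (in seconds) is treated as an absolute Unix timestamp rather than an interval. It is an illustration, not the server's code, and `expiry_timestamp` is a hypothetical name.

```python
THIRTY_DAYS = 60 * 60 * 24 * 30  # 30 days in seconds

def expiry_timestamp(exptime, now):
    """Translate an expiration value into an absolute Unix time (0 = never)."""
    if exptime == 0:
        return 0                  # the item never expires
    if exptime > THIRTY_DAYS:
        return exptime            # treated as an absolute Unix timestamp
    return now + exptime          # relative interval in seconds

now = 1_000_000_000               # sample current time, chosen for the example
print(expiry_timestamp(9999, now))
print(expiry_timestamp(THIRTY_DAYS + 1, now))
```

The pitfall is that an interval only slightly longer than 30 days is read as a timestamp far in the past, so the item expires immediately.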

9. What is the binary protocol? Do you need to pay attention to it?

The binary protocol attempts to provide a more efficient and reliable protocol between client and server, reducing the CPU time spent on protocol handling. According to Facebook's tests, parsing the ASCII protocol is the most CPU-intensive part of memcached.

10. Is memcached atomic?

Every single command sent to memcached is completely atomic. If you send a set command and a get command for the same data at the same time, they will not interfere with each other; they are serialized and run one after the other. Even in multi-threaded mode, all commands are atomic. A sequence of commands, however, is not atomic. Suppose an item is first fetched with the get command, modified, and then set back to memcached. The system does not guarantee that the item was not changed by another process in the meantime (a process here is not necessarily an operating-system process). Memcached 1.2.5 and later versions provide the gets and cas commands to solve this problem. If you use the gets command to query the item for a key, memcached returns a unique identifier for the current value of the item. If the client modifies the item and wants to write it back to memcached, it sends that unique identifier to memcached with the cas command. If the unique identifier of the item stored in memcached is the same as the one you provide, the write succeeds. If another process changed the item in the meantime, the unique identifier stored in memcached will have changed, and the write fails.
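The gets/cas cycle can be sketched in Python with an in-memory stand-in. `TinyStore` is a hypothetical class that only mimics the semantics described above (a unique identifier that changes on every write); it is not a memcached client and does not speak the protocol.

```python
class TinyStore:
    """In-memory stand-in that mimics gets/cas semantics."""

    def __init__(self):
        self._data = {}       # key -> (value, unique identifier)
        self._next_id = 0

    def set(self, key, value):
        self._next_id += 1    # every write gets a fresh identifier
        self._data[key] = (value, self._next_id)

    def gets(self, key):
        """Return (value, unique identifier), or (None, None) if absent."""
        return self._data.get(key, (None, None))

    def cas(self, key, value, unique_id):
        """Write only if the item is unchanged since the matching gets."""
        current = self._data.get(key)
        if current is None or current[1] != unique_id:
            return False
        self.set(key, value)
        return True

store = TinyStore()
store.set("counter", 1)
value, token = store.gets("counter")
store.set("counter", 10)                       # another client writes in between
first_try = store.cas("counter", value + 1, token)
value, token = store.gets("counter")           # re-read, then retry
second_try = store.cas("counter", value + 1, token)
print(first_try, second_try, store.gets("counter")[0])
```

The first cas is rejected because the item changed after the gets. The usual client pattern is to read again and retry, as the last three lines do.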

Learn more about the memory allocation mechanism of memcached:

http://cjjwzs.javaeye.com/blog/762453

This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only. This website makes no representation or warranty of any kind, either expressed or implied, as to the accuracy, completeness ownership or reliability of the article or any translations thereof. If you have any concerns or complaints relating to the article, please send an email, providing a detailed description of the concern or complaint, to info-contact@alibabacloud.com. A staff member will contact you within 5 working days. Once verified, infringing content will be removed immediately.
