Memcache Storage Big Data issues

Source: Internet
Author: User
Tags: cas, memcached, php memcached, virtual private server

The problem of storing big data in Memcache (huangguisu)

Memcached limits a single item to 1MB of data. If the data exceeds 1MB, both set and get return false, and such oversized values also cause performance problems.
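
For example, a minimal sketch (assuming the PHP memcached extension and a hypothetical server on 127.0.0.1:11211) of how an oversized, uncompressed set fails and how to see why:

<?php
// Sketch: a value above the 1MB item limit is rejected when compression is off.
$mc = new Memcached();
$mc->addServer('127.0.0.1', 11211);
$mc->setOption(Memcached::OPT_COMPRESSION, false);

$payload = str_repeat('A', 2 * 1024 * 1024);   // 2MB, above the 1MB item limit

if (!$mc->set('big:key', $payload, 3600)) {
    // getResultMessage() reports why the write failed, e.g. the item
    // being too large for the server to store.
    error_log('memcached set failed: ' . $mc->getResultMessage());
}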

We previously cached our leaderboard data, because the leaderboard accounted for 30% of all our SQL SELECT queries and was only updated hourly, so caching it made sense. To make clearing the cache convenient, we put all users' data under a single key, and at that time Memcached::set did not compress the data. During testing we did not notice any problem, but once it went live, with only about 490 users online, the server load average climbed to 7.9. After we removed this cache, it dropped to 0.59.

So memcached is not suitable for caching large values: keys whose data exceeds 1MB.

The largest object memcached supports storing is 1MB. This value is determined by its memory allocation mechanism.

By default, memcached uses a mechanism called the Slab Allocator to allocate and manage memory. Before this mechanism existed, memory was allocated simply by calling malloc and free for every record. However, that approach leads to memory fragmentation and increases the burden on the operating system's memory manager; in the worst case, the OS ends up slower than the memcached process itself. The Slab Allocator was created to solve this problem. Its basic principle is to divide the allocated memory into chunks of predetermined, specific lengths, which completely resolves the memory fragmentation problem.

Today (2012-03-16) we tested Memcached::set with different data sizes again. Since we are using the latest version of the PHP memcached extension, data is compressed by default when it is set. Setting the data:

$ac = new Memcached();
$data = str_repeat('A', 1024 * 1024);          // 1MB of data
$ac->set('key', $data, 9999);
// or
$data = str_repeat('A', 1024 * 1024 * 100);    // 100MB of data
$ac->set('key', $data, 9999);

Both the 1MB and the 100MB data could be set successfully. I later realized that memcached compresses data by default when setting it. Because this is a repeated string, the compression ratio is as high as roughly 1000:1, so the 100MB of data is only about 100KB after compression.
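
As a rough illustration, a sketch using PHP's zlib functions (not necessarily the exact compressor the extension uses, so the ratio will differ somewhat) shows why a repeated string shrinks so dramatically:

<?php
// Sketch: how well a 1MB run of the same character compresses with zlib.
$data = str_repeat('A', 1024 * 1024);
$compressed = gzcompress($data);
printf("original: %d bytes, compressed: %d bytes\n", strlen($data), strlen($compressed));
// The compressed size is on the order of a kilobyte, i.e. roughly a 1000:1
// ratio, which is why even "100MB" of repeated data fits under the 1MB limit.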

$ac->setOption(Memcached::OPT_COMPRESSION, 0);  // store data without compression
$data = str_repeat('A', 1024 * 1024);           // 1MB of data
$ac->set('key', $data, 9999);                   // setting 1MB of uncompressed data fails

This means that the memcached server itself cannot store an item larger than 1MB; larger values only appear to work because the client compresses them to under 1MB before sending.

Related memcached knowledge:

1. Basic memcached setup
1) Start the memcached server

# /usr/local/bin/memcached -d -m 10 -u root -l 192.168.0.200 -p 12000 -c 256 -P /tmp/memcached.pid

The -d option starts memcached as a daemon;
-m is the amount of memory allocated to memcached, in megabytes (10MB here);
-u is the user memcached runs as (root here);
-l is the server IP address to listen on; if the machine has multiple addresses, specify one, here 192.168.0.200;
-p is the port memcached listens on (12000 here; preferably a port above 1024);
-c is the maximum number of concurrent connections, 1024 by default (256 here; set it according to your server's load);
-P is the file in which to save the memcached PID, here /tmp/memcached.pid.

2) To stop the memcached process, run:

# kill `cat /tmp/memcached.pid`

A hashing algorithm maps a binary value of arbitrary length to a small, fixed-length binary value, called a hash value. A hash value is a unique and extremely compact numeric representation of a piece of data. If you hash a piece of plaintext and then change even a single letter of it, the subsequent hash produces a different value. It is computationally infeasible to find two different inputs that hash to the same value.
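
A quick sketch of this property using PHP's built-in md5() (any hash function illustrates the same point):

<?php
// Sketch: changing a single character of the input yields a completely
// different hash value.
echo md5('memcached stores items up to 1MB'), "\n";
echo md5('memcached stores items up to 2MB'), "\n";
// The two 32-character digests share no obvious relationship, and finding
// two different inputs with the same digest is computationally infeasible
// for a cryptographic hash.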

2. What business scenarios is memcached suited to?

1) If the site contains a large number of dynamic web pages, the database load will be high. Since most database requests are reads, memcached can significantly reduce the database load.

2) If the database server's load is relatively low but CPU usage is high, you can cache computed results (computed objects) and rendered page templates (rendered templates).

3) Use memcached to cache session data and temporary data to reduce write operations to the database (see the configuration sketch after this list).

4) Cache small files that are accessed frequently.

5) Cache the results of web services (not the Web Services promoted by IBM; translator's note) or RSS feeds.
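
For scenario 3, a minimal configuration sketch (assuming the PHP memcached extension's session handler and a hypothetical server at 127.0.0.1:11211; the same settings can go in php.ini) looks like this:

<?php
// Sketch: store PHP sessions in memcached instead of on local disk,
// which keeps session writes out of the database/filesystem.
ini_set('session.save_handler', 'memcached');
ini_set('session.save_path', '127.0.0.1:11211');
session_start();
$_SESSION['user_id'] = 42;   // now written to memcached, not to disk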

3. What business scenarios is memcached not suited to?

1) The cached objects are larger than 1MB.

Memcached itself was not designed to cache huge multimedia files (large media) or huge binary blobs (streaming huge blobs).

2) Keys longer than 250 characters.

3) Virtual hosting environments that do not allow running a memcached service.

If the application is hosted on a low-end virtual private server, virtualization technologies such as VMware and Xen are not suitable for running memcached. Memcached needs to take over and control large chunks of memory; if the memory managed by memcached is swapped out by the OS or hypervisor, memcached's performance will suffer.

4) Applications running in an insecure environment.

Memcached provides no security mechanism of its own; it can be accessed simply via telnet. If the application runs on a shared system, you need to pay attention to security issues.

5) The business itself needs persistent data, or what it really needs is a database.

4. You cannot traverse all the items in memcached

This operation is relatively slow and blocks other operations (it is slower than memcached's other commands). All of memcached's non-debug commands, such as add, set, get and flush, run in constant time regardless of how much data is stored in memcached. The time needed to run a command that traverses all items, however, grows with the amount of data in memcached, and other commands are blocked because they must wait for the traversal command to finish.

5. The maximum length of a key that memcached can accept is 250 characters

The maximum length of a key that memcached accepts is 250 characters. Note that 250 is an internal limit of the memcached server. If the memcached client you use supports "key prefixes" or similar features, the total key length (prefix + original key) can exceed 250 characters on the client side. Shorter keys are recommended, as they save memory and bandwidth.
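
A common client-side workaround (a sketch, not something memcached does for you) is to hash long, descriptive keys down to a fixed length:

<?php
// Sketch: keep descriptive cache keys under the 250-character server limit
// by hashing them; md5() gives a fixed 32-character hexadecimal key.
function cache_key(string $descriptive_key): string {
    return 'app:' . md5($descriptive_key);   // the "app:" prefix is arbitrary
}

$mc = new Memcached();
$mc->addServer('127.0.0.1', 11211);
$mc->set(cache_key('leaderboard:2012-03-16:region=eu:page=1'), 'value', 3600);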

6. The size of a single item is limited to 1MB

This is because of the memory allocator algorithm.

In more detail:

1) memcached's memory storage engine uses slabs to manage memory. Memory is divided into slabs with chunks of unequal size (memory is first divided into slabs of equal size, then each slab is divided into chunks of equal size; chunk sizes differ between slabs). Chunk sizes start from a minimum value and grow by a fixed factor until the maximum possible value is reached. Suppose the minimum is 400B, the maximum is 1MB, and the factor is 1.20; the chunk sizes of the slabs are then:

slab1 - 400B; slab2 - 480B; slab3 - 576B; ... The larger a slab's chunks, the larger the gap between it and the previous slab. Therefore, the larger the maximum value, the lower the memory utilization. Memcached must pre-allocate memory for each slab, so if you set a smaller factor and a larger maximum value, memcached will need more memory.
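
A small sketch that reproduces the chunk-size progression described above (using the hypothetical 400-byte minimum and 1.20 growth factor; real memcached also aligns chunk sizes, so the exact numbers differ slightly):

<?php
// Sketch: chunk sizes of successive slab classes, growing by a fixed factor.
$chunk  = 400;          // minimum chunk size in bytes
$factor = 1.20;         // growth factor
$max    = 1024 * 1024;  // maximum item size (1MB)

$class = 1;
while ($chunk <= $max) {
    printf("slab%d - %dB\n", $class, $chunk);
    $chunk = (int) ceil($chunk * $factor);
    $class++;
}
// Output begins: slab1 - 400B, slab2 - 480B, slab3 - 576B, ...
// Each step leaves a larger gap than the previous one, which is where
// memory utilization is lost for items that do not fit a chunk exactly.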

2) Do not try to store very large data in memcached, for example by putting huge web pages into it. Loading and unpacking such big data into memory takes a long time, and system performance suffers. If you really need to store more than 1MB of data, you can change the value of POWER_BLOCK in slabs.c and recompile memcached, or use the inefficient malloc/free allocator instead. You can also consider a database, MogileFS, or another system as an alternative to memcached.

7. How does the memcached memory allocator work? Why not just use malloc/free? Why use slabs?

This is actually a compile-time option. The internal slab allocator is used by default, and the built-in slab allocator is indeed what you should use. In the earliest versions, memcached used only malloc/free to manage memory. However, this approach does not cooperate well with the OS's memory management: repeated malloc/free calls cause memory fragmentation, and the OS ends up spending a lot of time searching for contiguous memory blocks to satisfy malloc requests instead of running the memcached process. The slab allocator was created to solve this problem. Memory is allocated once and divided into chunks, which are then reused. Because memory is divided into slabs of different sizes, memory is wasted if an item's size does not match the chunk size of the slab chosen to hold it.

8. What restrictions are there on item expiration times in memcached?

An item's expiration can be at most 30 days into the future when given as a time span. Memcached interprets the incoming expiration time (a time span) as a point in time; once that point is reached, memcached marks the item as expired. This is a simple but somewhat obscure mechanism.
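
A short sketch of how the expiration argument is interpreted (the 30-day cutoff, 2,592,000 seconds, is standard memcached behavior; the keys are illustrative):

<?php
// Sketch: expiration values up to 30 days are treated as relative offsets;
// anything larger is treated as an absolute Unix timestamp.
$mc = new Memcached();
$mc->addServer('127.0.0.1', 11211);

$mc->set('short:lived', 'v', 3600);                 // expires in one hour
$mc->set('long:lived',  'v', time() + 45 * 86400);  // > 30 days: pass a timestamp
// Passing a bare 45 * 86400 would NOT mean "45 days from now": it would be
// read as a Unix timestamp in early 1970 and the item would expire at once.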

9. What is the binary protocol, and does it deserve attention?

The binary protocol aims to provide a more efficient and reliable protocol for both ends, reducing the CPU time spent on protocol handling on the client/server side. According to Facebook's testing, parsing the ASCII protocol is the single largest consumer of CPU time in memcached.
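
With the PHP memcached extension, switching to the binary protocol is a single client option (a sketch; whether it helps depends on your workload, and the server understands both protocols):

<?php
// Sketch: enable the binary protocol on the client side.
$mc = new Memcached();
$mc->setOption(Memcached::OPT_BINARY_PROTOCOL, true);
$mc->addServer('127.0.0.1', 11211);
$mc->set('key', 'value', 3600);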

10. Is memcached atomic?

All individual commands sent to memcached are completely atomic. If you issue a set and a get on the same data at the same time, they do not affect each other; they are serialized and executed one after the other. Even in multithreaded mode, every command is atomic. However, a sequence of commands is not atomic. If you first fetch an item with get, modify it, and then set it back into memcached, the system does not guarantee that the item has not been touched by another process in the meantime (process here does not necessarily mean an operating-system process). Memcached 1.2.5 and later provide the gets and cas commands, which solve this problem. If you fetch an item with gets, memcached returns a unique identifier for the item's current value. If the client has modified this item and wants to write it back, it sends that unique identifier to memcached along with the cas command. If the item's unique identifier in memcached still matches the one you supplied, the write succeeds; if another process has changed the item in the meantime, the identifier stored in memcached will have changed, and the write fails.
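
A sketch of this gets/cas pattern with the PHP memcached extension (using the 3.x API, where Memcached::GET_EXTENDED returns the CAS token; in 2.x the token comes back through a by-reference parameter):

<?php
// Sketch: optimistic update of a cached counter using CAS.
$mc = new Memcached();
$mc->addServer('127.0.0.1', 11211);
$mc->add('counter', 0, 3600);   // make sure the key exists (no-op if it does)

do {
    $res = $mc->get('counter', null, Memcached::GET_EXTENDED);
    $ok  = $mc->cas($res['cas'], 'counter', $res['value'] + 1, 3600);
    // cas() fails if another client changed 'counter' between our get()
    // and cas(); the loop then re-reads the current value and retries.
} while (!$ok);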


Learn more about Memcached's memory allocation mechanism:

http://cjjwzs.javaeye.com/blog/762453
