Mixi Case Study
Mixi has used memcached since the early days of the service. As site traffic grew rapidly, simply adding slaves to the database was no longer enough, so memcached was introduced. We also verified memcached's scalability and confirmed that its speed and stability met our needs. Today, memcached is a very important part of the Mixi service.
Figure 1 Current system components
Server configuration and count
Mixi uses many servers: database servers, application servers, image servers, reverse proxy servers, and so on. Nearly 200 servers run memcached alone. A typical memcached server is configured as follows:
- CPU: Intel Pentium 4 2.8GHz
- Memory: 4GB
- HDD: 146GB SCSI
- Operating system: Linux (x86_64)
These servers were previously used as database servers and the like. As CPU performance rises and memory prices fall, we actively replace database servers, application servers, and others with more powerful machines that carry more memory. This holds down the sharp growth in the total number of servers Mixi uses and reduces management costs. Because memcached consumes almost no CPU, the servers retired in this way are reused as memcached servers.
memcached process
Only one memcached process is started per memcached server. The memory allocated to memcached is 3GB and the startup parameters are as follows:
/usr/bin/memcached -p 11211 -u nobody -m 3000 -c 30720
Because we use an x86_64 operating system, more than 2GB of memory can be allocated to a single process. On a 32-bit operating system, each process can use at most 2GB of memory. We also considered starting several processes of up to 2GB each, but the number of TCP connections on one server would multiply and management would become complex, so Mixi standardized on a 64-bit operating system.
Although each server has 4GB of memory, only 3GB is allocated, because allocating more than this can cause swapping. As Maesaka explained in the 2nd installment of this series when describing memcached's memory storage, the "slab allocator", the amount of memory specified at startup is the amount memcached uses to store data. It does not include the memory occupied by the slab allocator itself or the management space set up to store the data. Note, therefore, that the memory actually used by the memcached process is larger than the specified capacity.
Most of the data Mixi stores in memcached is small, so the process ends up much larger than the specified capacity. We therefore varied the memory allocation repeatedly and confirmed that 3GB does not trigger swapping. That is the value currently in use.
How memcached is used, and the client
Mixi's service currently uses about 200 memcached servers as a single pool. Each server has a capacity of 3GB, so together they form a huge in-memory database of nearly 600GB. The client library is Cache::Memcached::Fast, which has been mentioned many times in this series. The distribution algorithm is, of course, the consistent hashing algorithm introduced in the 4th installment.
- Cache::Memcached::Fast - search.cpan.org
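To make the reference to consistent hashing concrete, here is a minimal sketch in Python. It is not the implementation inside Cache::Memcached::Fast. The class name HashRing, the use of MD5, and the figure of 100 virtual nodes per server are all assumptions chosen for illustration.

```python
import bisect
import hashlib


class HashRing:
    """Minimal consistent-hash ring with virtual nodes (illustrative only)."""

    def __init__(self, servers, vnodes=100):
        self._ring = []  # sorted list of (point, server) pairs
        for server in servers:
            for i in range(vnodes):
                self._ring.append((self._hash(f"{server}#{i}"), server))
        self._ring.sort()
        self._points = [point for point, _ in self._ring]

    @staticmethod
    def _hash(value):
        # MD5 is used only to spread points evenly, not for security.
        return int(hashlib.md5(value.encode("utf-8")).hexdigest(), 16)

    def server_for(self, key):
        # First point clockwise from the key's hash, wrapping at the end.
        idx = bisect.bisect_right(self._points, self._hash(key)) % len(self._ring)
        return self._ring[idx][1]


# Hypothetical addresses; removing one server only moves that server's keys.
servers = [f"10.0.0.{n}:11211" for n in range(1, 5)]
full = HashRing(servers)
reduced = HashRing(servers[:-1])
keys = [f"user:{n}" for n in range(1000)]
moved = [k for k in keys if full.server_for(k) != reduced.server_for(k)]
```

In this sketch, every key in moved was previously stored on the removed server, which is the property that makes consistent hashing attractive for a pool of this size: losing one server leaves the placement of all other keys untouched.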
How memcached is used at the application layer is decided and implemented by the engineers who develop each application. However, to avoid reinventing the wheel, and to keep the same mistakes with Cache::Memcached::Fast from being repeated, we provide a wrapper module for Cache::Memcached::Fast and use that.
Keeping connections alive with Cache::Memcached::Fast
With Cache::Memcached, the connection to memcached (a file handle) is stored in a class variable inside the Cache::Memcached package. In environments such as mod_perl and FastCGI, package variables are not reinitialized on every request as they are under CGI, but persist for the life of the process. As a result, the connection to memcached is not closed, which reduces the overhead of establishing TCP connections and also prevents TCP port exhaustion caused by repeatedly connecting and disconnecting in a short time.
Cache::Memcached::Fast does not have this behavior, however, so the Cache::Memcached::Fast object must be kept in a class variable outside the module to ensure a persistent connection.
package Gihyo::Memcached;

use strict;
use warnings;
use Cache::Memcached::Fast;

my @server_list = qw/192.168.1.1:11211 192.168.1.1:11211/;
my $fast;  ## holds the shared client object

sub new {
    my $self = bless {}, shift;
    # Create the client only once per process, then reuse it.
    if ( !$fast ) {
        $fast = Cache::Memcached::Fast->new({ servers => \@server_list });
    }
    $self->{_fast} = $fast;
    return $self;
}

sub get {
    my $self = shift;
    $self->{_fast}->get(@_);
}
In the example above, the Cache::Memcached::Fast object is saved in the class variable $fast.
Handling shared data, and rehashing
Cached data and configuration information shared by all users, such as the news on Mixi's home page, appear on many pages and are accessed extremely often. Under these conditions, access easily concentrates on a single memcached server. The concentration itself is not a problem, but if the server on which access is concentrated fails and memcached can no longer be reached, the result is a serious problem.
As mentioned in the 4th installment of the series, Cache::Memcached has a rehash feature: when the server that holds the data cannot be reached, the hash value is calculated again and another server is used.
Cache::Memcached::Fast does not have this feature. Instead, when connecting to a server fails, it can stop trying to connect to that server for a short period.
my $fast = Cache::Memcached::Fast->new({
    max_failures    => 3,
    failure_timeout => 1
});
If connecting to a memcached server fails max_failures or more times within failure_timeout seconds, no further connection to that server is attempted for failure_timeout seconds. Our setting is 3 or more failures within 1 second.
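The interplay of the two settings can be illustrated with a small Python sketch. This is an assumption about the behavior as described here, not the code inside Cache::Memcached::Fast. The class name FailureGate and its methods are hypothetical.

```python
import time


class FailureGate:
    """Tracks connection failures for one server (illustrative sketch)."""

    def __init__(self, max_failures=3, failure_timeout=1.0, clock=time.monotonic):
        self.max_failures = max_failures
        self.failure_timeout = failure_timeout
        self._clock = clock
        self._failures = []        # timestamps of recent failures
        self._blocked_until = 0.0  # connections are refused before this time

    def record_failure(self):
        now = self._clock()
        # Keep only the failures that fall inside the sliding window.
        self._failures = [t for t in self._failures
                          if now - t < self.failure_timeout]
        self._failures.append(now)
        if len(self._failures) >= self.max_failures:
            # Too many failures in the window: skip this server for a while.
            self._blocked_until = now + self.failure_timeout
            self._failures.clear()

    def may_connect(self):
        return self._clock() >= self._blocked_until
```

A client would call may_connect() before each attempt and record_failure() whenever an attempt fails. The clock parameter exists only so the behavior can be exercised without waiting in real time.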
In addition, Mixi defines a naming convention for the keys of cached data shared by all users. Data whose key matches the convention is automatically saved to multiple memcached servers, and only one of those servers is chosen when the data is read. With a library like this in place, the failure of a single memcached server no longer has any wider impact.
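The idea of replicating keys that match a naming convention can be sketched as follows. Mixi's actual convention and library are not described in this article, so the "shared:" prefix, the replica count of 3, the CRC32-based placement, and the class name ReplicatingCache are all hypothetical. Plain dictionaries stand in for memcached servers.

```python
import random
import zlib


class ReplicatingCache:
    """Writes keys with a marker prefix to several servers, reads from one."""

    def __init__(self, servers, shared_prefix="shared:", replicas=3):
        self.servers = servers              # list of dict-like stores
        self.shared_prefix = shared_prefix  # hypothetical naming convention
        self.replicas = min(replicas, len(servers))

    def _targets(self, key):
        # Choose a starting server from the key, then take its neighbours.
        start = zlib.crc32(key.encode("utf-8")) % len(self.servers)
        count = self.replicas if key.startswith(self.shared_prefix) else 1
        return [self.servers[(start + i) % len(self.servers)]
                for i in range(count)]

    def set(self, key, value):
        for server in self._targets(key):
            server[key] = value

    def get(self, key):
        # Pick one replica at random so reads spread across the copies.
        return random.choice(self._targets(key)).get(key)


stores = [dict() for _ in range(5)]
cache = ReplicatingCache(stores)
cache.set("shared:top_news", "headline")  # stored on three servers
cache.set("user:42:profile", "alice")     # stored on one server
```

A production version would also need to skip replicas that are currently unreachable when reading. This sketch leaves that out to keep the placement logic visible.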
memcached operational experience
Having covered memcached's internals and client libraries, we now turn to some other operational experience.
Starting memcached with daemontools
Normally memcached runs quite stably, but with 1.2.5, the latest version that Mixi now uses, the memcached process has died several times. The architecture guarantees that the service is unaffected even if several memcached servers fail, but a server whose memcached process has died works normally again as soon as memcached is restarted. We therefore adopted the approach of monitoring the memcached process and restarting it automatically, and for this we use daemontools.
daemontools is a suite of UNIX service management tools developed by djb, the author of qmail. Its supervise program handles starting services and restarting them when they stop.
Installing daemontools is not covered here. Mixi uses the following run script to start memcached.
#!/bin/sh

# Load PORT, USER, CACHESIZE, MAXCONN and OPTIONS if the file exists.
if [ -f /etc/sysconfig/memcached ];then
        . /etc/sysconfig/memcached
fi

exec 2>&1
exec /usr/bin/memcached -p $PORT -u $USER -m $CACHESIZE -c $MAXCONN $OPTIONS
Monitoring
Mixi uses the open-source monitoring software Nagios to monitor memcached.
Plugins are easy to develop for Nagios, so memcached's get, add, and other operations could be monitored in detail. However, Mixi only uses the stats command to confirm that memcached is running.
define command {
command_name                   check_memcached
command_line                   $USER1$/check_tcp -H $HOSTADDRESS$ -p 11211 -t 5 -E -s 'stats\r\nquit\r\n' -e 'uptime' -M crit
}
In addition, Mixi turns the output of the stats command into graphs with RRDtool for performance monitoring, and produces a daily memory usage report that is shared with developers by email.
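As a rough illustration of working with stats output, the Python sketch below parses the text reply of the stats command, which consists of "STAT name value" lines terminated by "END". It is not the tooling Mixi uses. The function names and the sample numbers are made up for the example.

```python
def parse_stats(raw):
    """Parse the text reply to memcached's `stats` command into a dict."""
    stats = {}
    for line in raw.splitlines():
        if line == "END":
            break
        parts = line.split(" ", 2)
        if len(parts) == 3 and parts[0] == "STAT":
            stats[parts[1]] = parts[2]
    return stats


def hit_ratio(stats):
    """Fraction of get requests served from cache, or None if there were none."""
    hits = int(stats.get("get_hits", 0))
    misses = int(stats.get("get_misses", 0))
    total = hits + misses
    return hits / total if total else None


# Sample reply with invented values, in the wire format of the stats command.
sample = (
    "STAT uptime 3600\r\n"
    "STAT bytes 1048576\r\n"
    "STAT limit_maxbytes 3145728000\r\n"
    "STAT get_hits 75\r\n"
    "STAT get_misses 25\r\n"
    "END\r\n"
)
stats = parse_stats(sample)
ratio = hit_ratio(stats)
```

Values such as bytes and limit_maxbytes parsed this way are the kind of figures that could be fed to a graphing tool or summarized in a daily memory usage report.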
Performance of memcached
As described throughout the series, memcached performs extremely well. Let's look at an actual case at Mixi. The graphs shown here are from the memcached server on which access is most concentrated.
Figure 2 Number of requests
Figure 3 Traffic
Figure 4 Number of TCP connections
From top to bottom, the graphs show the number of requests, the traffic, and the number of TCP connections. Requests peak at 15,000 qps and traffic reaches 400Mbps, at which point the number of connections exceeds 10,000. The server has no special hardware. It is the ordinary memcached server described at the beginning. CPU utilization at that time is as follows:
Figure 5 CPU utilization
As you can see, there is still idle capacity. memcached's performance is therefore very high, and web application developers can safely use it as a place to store temporary or cached data.
Compatible applications
memcached's implementation and protocol are very simple, so there are many implementations compatible with memcached. Some of the more powerful ones can write memcached's in-memory data to disk, providing persistence and redundancy. As the 3rd installment described, memcached's storage layer is to become pluggable and will gradually support these features.
Here are a few applications that are compatible with memcached.
- repcached: A patch that adds replication to memcached.
- Flared: Stores data in QDBM. Also implements asynchronous replication and failover.
- memcachedb: Stores data in Berkeley DB. Also implements a message queue.
- Tokyo Tyrant: Stores data in Tokyo Cabinet. It is compatible with the memcached protocol and can also be accessed over HTTP.
Tokyo Tyrant case study
Of the compatible applications above, Mixi uses Tokyo Tyrant. Tokyo Tyrant is the network interface to the Tokyo Cabinet DBM developed by Hirabayashi. It has its own protocol, but it also speaks a memcached-compatible protocol and can exchange data over HTTP. Although Tokyo Cabinet writes data to disk, it is very fast.
Mixi uses Tokyo Tyrant not as a cache server but as a DBMS that stores key-value pairs. It is mainly used as the database that stores each user's last access time. This data is involved in almost every Mixi service and must be updated every time a user accesses a page, so the load is quite high. Handling this in MySQL is heavy, and storing the data in memcached alone risks losing it, so Tokyo Tyrant was introduced. One advantage is that no new client had to be developed: Cache::Memcached::Fast can be used as is. For more about Tokyo Tyrant, see the company's development blog.
- Mixi Engineers' Blog - Building a high-load-tolerant DB with Tokyo Tyrant
- Mixi Engineers' Blog - New features of Tokyo (Cabinet|Tyrant)
Summary
This brings the "memcached Comprehensive Analysis" series to a close. We have covered memcached's basics, internal structure, distribution algorithm, and applications. We would be delighted if reading it has made you interested in memcached. For more on Mixi's systems and applications, see the company's development blog. Thank you for reading.
memcached Comprehensive Analysis: memcached applications and compatible programs