Redis Occasional Connection Failure Case Analysis


Authors

Zhang Yanjun: Senior DBA at the Ctrip Technical Support Center, with a strong interest in database architecture and troubleshooting.
Shou Shang: Senior DBA at the Ctrip Technical Support Center, mainly responsible for the operation and maintenance of Ctrip's Redis and database systems, with hands-on experience in automated operations, process design, and monitoring. He enjoys digging deep into problems to improve the team's operational efficiency.

"Problem description"

In our production environment, one Redis instance occasionally reports connection failures. The time of the errors and the client IPs show no obvious pattern, and after a while the errors recover on their own.
A typical client-side error message looks like this:

Unable to Connect redis server:
   at CRedis.Third.Redis.RedisNativeClient.CreateConnectionError()
   at CRedis.Third.Redis.RedisNativeClient.SendExpectData(Byte[][] cmdWithBinaryArgs)
   at CRedis.Client.Entities.RedisServer.<>c__DisplayClassd`1.

From the error message, the failure is caused by the client being unable to connect to Redis. The Redis version is 2.8.19; although a bit old, it has been stable in day-to-day operation.
In the online environment only this cluster reports occasional errors. One distinctive feature of this cluster is the large number of client servers: several hundred machines.

"Problem Analysis"

From the error message, the client failed to connect to the server. Common causes include the following:

    • One common cause is client-side port exhaustion. We checked the network connections: at the time of the problem the number of TCP connections was far below the port-exhaustion range, so this is not the root cause of the Redis connection failures.
    • Another common scenario is a slow query on the server blocking the Redis service. We captured statements running longer than 10 milliseconds on the Redis server and caught no slow statements around the failure.

The monitoring deployed on the server side shows a sudden spike in the number of connections, from about 3,500 to about 4,100, as shown in the connection-count graph.

At the same time, the server reports that redis-server is dropping connection requests: 345539 - 344683 = 856 SYNs dropped within one second:

Sat Apr  7 10:41:40 CST 2018
    1699 outgoing packets dropped
    92 dropped because of missing route
    344683 SYNs to LISTEN sockets dropped
    344683 times the listen queue of a socket overflowed
Sat Apr  7 10:41:41 CST 2018
    1699 outgoing packets dropped
    92 dropped because of missing route
    345539 SYNs to LISTEN sockets dropped
    345539 times the listen queue of a socket overflowed

The cause of the client errors is now basically clear: connections were being created too quickly, the server's backlog (accept) queue overflowed, and the server reset the connections.

"About the backlog overflow"

In high-concurrency, short-connection services this is a very common type of TCP failure. A normal TCP connection is established as follows:

1. The client sends a SYN to the server.

2. The server replies to the client with SYN,ACK.

3. The client replies with an ACK.

The three-way handshake is now complete and the client can begin sending data to the server, but at this point the server may not yet have accepted the connection, as illustrated in the handshake diagram.

In the TCP protocol as implemented by BSD-derived kernels, establishing a connection on the server side involves two queues: a SYN queue and an accept queue. The former is called the half-open (semi-connected) queue; a connection is added to it when the server receives a SYN from the client. (A common network attack is to keep sending SYNs without ever sending the final ACK, blowing up the server's half-open queue and causing denial of service.) The latter is the fully-established connection queue: after the server has returned SYN,ACK and then receives the client's ACK (at which point the client considers the connection established and may start sending PSH packets), the server moves the connection from the SYN queue to the accept queue, provided the accept queue is not full. If the accept queue is full at that moment, the server's behaviour depends on configuration. If tcp_abort_on_overflow is 0 (the default), the server simply drops the PSH packet sent by the client; the client then enters its retransmission logic, and after a while the server resends SYN,ACK and the handshake resumes from step 2. If tcp_abort_on_overflow is 1, the server sends a reset (RST) as soon as it finds the accept queue full.
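To make the role of the accept queue concrete, below is a minimal sketch of a TCP server in C (an illustration only, not taken from Redis; the port and backlog values are just examples). The backlog argument of listen() sizes the accept queue for that socket, and on Linux the kernel additionally caps it at net.core.somaxconn. If accept() is not called quickly enough, fully-established connections pile up in this queue until it overflows, which is exactly what the "times the listen queue of a socket overflowed" counter above reports.

/* Minimal sketch of a TCP server, illustrating the listen() backlog
 * (the accept queue). Port 6379 and the backlog value are examples only. */
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

int main(void) {
    int fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0) { perror("socket"); return 1; }

    int on = 1;
    setsockopt(fd, SOL_SOCKET, SO_REUSEADDR, &on, sizeof(on));

    struct sockaddr_in addr;
    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_ANY);
    addr.sin_port = htons(6379);
    if (bind(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0) { perror("bind"); return 1; }

    /* 511 is the default tcp-backlog in redis.conf; the kernel caps the
     * effective value at net.core.somaxconn. Completed connections wait
     * here until accept() picks them up. */
    if (listen(fd, 511) < 0) { perror("listen"); return 1; }

    for (;;) {
        int cfd = accept(fd, NULL, NULL);   /* drains the accept queue */
        if (cfd < 0) { perror("accept"); continue; }
        /* If this loop stalls (for example because a single-threaded event
         * loop is busy elsewhere), new connections accumulate in the accept
         * queue and eventually overflow, as happened to redis-server here. */
        close(cfd);
    }
}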

Searching the capture in Wireshark, we found more than 2,000 connection requests arriving at the Redis server within a single second. We tried raising the TCP backlog from 511 to 2048, but the problem was not resolved; this kind of tuning does not address the root cause.

"Network packet Analysis"

We used Wireshark to pinpoint the exact time and cause of the network congestion. Since we already had an accurate timestamp for the error, we first used editcap to split the very large capture file into 30-second slices, and then used Wireshark's I/O graph (at a 100 ms interval) to locate the precise moment of congestion:

As the I/O graph shows, there is a visible stall in the TCP packet flow.

Analyzing the packets immediately before and after the stall, we see the following:

Time                  Source         Dest      Description
12:01:54.6536050      redis-server   clients   TCP: Flags=...AP...
12:01:54.6538580      redis-server   clients   TCP: Flags=...AP...
12:01:54.6539770      redis-server   clients   TCP: Flags=...AP...
12:01:54.6720580      redis-server   clients   TCP: Flags=...A..S.
12:01:54.6727200      redis-server   clients   TCP: Flags=...A....
12:01:54.6808480      redis-server   clients   TCP: Flags=...AP...
12:01:54.6910840      redis-server   clients   TCP: Flags=...A..S.
12:01:54.6911950      redis-server   clients   TCP: Flags=...A....
...
12:01:56.1181350      redis-server   clients   TCP: Flags=...AP...

At 12:01:54.6808480 the Redis server sends a PSH packet to a client, i.e. the reply to a query. The packets that follow are all connection-handling traffic: SYN,ACK packets, ACK acknowledgements, and RST resets. The next PSH packet is not sent until 12:01:56.1181350, an interval of 1.437287 seconds. In other words, during those 1.437287 seconds the Redis server answered no queries at all; it did nothing but establish or refuse connections.

The logic behind the client errors is now clear: redis-server stalled for about 1.43 seconds; the clients' connection pools filled up and they frantically opened new connections; the server's accept queue overflowed and connections were refused outright, so the clients reported errors. We first suspected that a client had sent some special command, so we needed to confirm the last few commands the server received before it stalled. We found the last packets before redis-server went quiet, installed a Redis dissector plugin for Wireshark, and saw that the last few commands were all simple GETs with small keys and values - nothing that should take 1.43 seconds. There was no slow log on the server side either, and the investigation hit a dead end again.

"Further analysis"

To find out what the Redis server was doing during those 1.43 seconds, we used pstack to capture its call stacks. pstack is essentially a gdb attach, and capturing at high frequency affects Redis throughput, so we simply sampled blindly every 0.5 seconds. When redis-server stalled, we caught the following stacks (useless frames filtered out):

Thu 11:29:18 CST 2018
Thread 1 (Thread 0x7ff2db6de720 (LWP 8378)):
#0  0x000000000048cec4 in ?? ()
#1  0x00000000004914a4 in je_arena_ralloc ()
#2  0x00000000004836a1 in je_realloc ()
#3  0x0000000000422cc5 in zrealloc ()
#4  0x00000000004213d7 in sdsRemoveFreeSpace ()
#5  0x000000000041ef3c in clientsCronResizeQueryBuffer ()
#6  0x00000000004205de in clientsCron ()
#7  0x0000000000420784 in serverCron ()
#8  0x0000000000418542 in aeProcessEvents ()
#9  0x000000000041873b in aeMain ()
#10 0x0000000000420fce in main ()

Thu 11:29:19 CST 2018
Thread 1 (Thread 0x7ff2db6de720 (LWP 8378)):
#0  0x0000003729ee5407 in madvise () from /lib64/libc.so.6
#1  0x0000000000493a4e in je_pages_purge ()
#2  0x000000000048cf70 in ?? ()
#3  0x00000000004914a4 in je_arena_ralloc ()
#4  0x00000000004836a1 in je_realloc ()
#5  0x0000000000422cc5 in zrealloc ()
#6  0x00000000004213d7 in sdsRemoveFreeSpace ()
#7  0x000000000041ef3c in clientsCronResizeQueryBuffer ()
#8  0x00000000004205de in clientsCron ()
#9  0x0000000000420784 in serverCron ()
#10 0x0000000000418542 in aeProcessEvents ()
#11 0x000000000041873b in aeMain ()
#12 0x0000000000420fce in main ()

Thu 11:29:19 CST 2018
Thread 1 (Thread 0x7ff2db6de720 (LWP 8378)):
#0  0x000000000048108c in je_malloc_usable_size ()
#1  0x0000000000422be6 in zmalloc ()
#2  0x00000000004220bc in sdsnewlen ()
#3  0x000000000042c409 in createStringObject ()
#4  0x000000000042918e in processMultibulkBuffer ()
#5  0x0000000000429662 in processInputBuffer ()
#6  0x0000000000429762 in readQueryFromClient ()
#7  0x000000000041847c in aeProcessEvents ()
#8  0x000000000041873b in aeMain ()
#9  0x0000000000420fce in main ()

Thu 11:29:20 CST 2018
Thread 1 (Thread 0x7ff2db6de720 (LWP 8378)):
#0  0x000000372a60e7cd in write () from /lib64/libpthread.so.0
#1  0x0000000000428833 in sendReplyToClient ()
#2  0x0000000000418435 in aeProcessEvents ()
#3  0x000000000041873b in aeMain ()
#4  0x0000000000420fce in main ()

After repeated sampling, the suspicious function clientsCronResizeQueryBuffer shows up in the stacks. It is called from serverCron(), i.e. from redis-server's internal periodic task rather than from a user command, which explains why no slow query was logged while the server was stuck.

Let's look at the Redis source to see exactly what redis-server is doing here:

clientsCron (server.c):

#define CLIENTS_CRON_MIN_ITERATIONS 5
void clientsCron(void) {
    /* Make sure to process at least numclients/server.hz of clients
     * per call. Since this function is called server.hz times per second
     * we are sure that in the worst case we process all the clients in 1
     * second. */
    int numclients = listLength(server.clients);
    int iterations = numclients/server.hz;
    mstime_t now = mstime();

    /* Process at least a few clients while we are at it, even if we need
     * to process less than CLIENTS_CRON_MIN_ITERATIONS to meet our contract
     * of processing each client once per second. */
    if (iterations < CLIENTS_CRON_MIN_ITERATIONS)
        iterations = (numclients < CLIENTS_CRON_MIN_ITERATIONS) ?
                     numclients : CLIENTS_CRON_MIN_ITERATIONS;

    while(listLength(server.clients) && iterations--) {
        client *c;
        listNode *head;

        /* Rotate the list, take the current head, process.
         * This way if the client must be removed from the list it's the
         * first element and we don't incur into O(N) computation. */
        listRotate(server.clients);
        head = listFirst(server.clients);
        c = listNodeValue(head);

        /* The following functions do different service checks on the client.
         * The protocol is that they return non-zero if the client was
         * terminated. */
        if (clientsCronHandleTimeout(c,now)) continue;
        if (clientsCronResizeQueryBuffer(c)) continue;
    }
}

clientsCron() first looks at the number of currently connected clients, which controls how many connections it examines (and possibly cleans) per call: iterations = numclients / server.hz. On our production servers a single instance has fewer than 5,000 connections, which means about 50 connections are cleaned per call (a small sketch of this arithmetic follows).
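For readers who want to check the arithmetic, here is a small standalone sketch of the throttling logic (an illustration, not Redis source); the per-call count depends entirely on how many clients are connected and on the configured hz value, which defaults to 10 in Redis:

/* Illustration only: how clientsCron() limits the number of clients
 * it examines on each call. The client counts and hz values below are
 * examples, not measurements from the production cluster. */
#include <stdio.h>

#define CLIENTS_CRON_MIN_ITERATIONS 5

static int clients_per_cron_call(int numclients, int hz) {
    /* serverCron() runs hz times per second, so examining numclients/hz
     * clients per call covers every client roughly once per second. */
    int iterations = numclients / hz;
    if (iterations < CLIENTS_CRON_MIN_ITERATIONS)
        iterations = (numclients < CLIENTS_CRON_MIN_ITERATIONS)
                   ? numclients : CLIENTS_CRON_MIN_ITERATIONS;
    return iterations;
}

int main(void) {
    printf("5000 clients, hz=10  -> %d per call\n", clients_per_cron_call(5000, 10));
    printf("5000 clients, hz=100 -> %d per call\n", clients_per_cron_call(5000, 100));
    printf("  20 clients, hz=10  -> %d per call\n", clients_per_cron_call(20, 10));
    return 0;
}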

clientsCronResizeQueryBuffer (server.c):

/* The client query buffer is an sds.c string that can end with a lot of
 * free space not used, this function reclaims space if needed.
 *
 * The function always returns 0 as it never terminates the client. */
int clientsCronResizeQueryBuffer(client *c) {
    size_t querybuf_size = sdsAllocSize(c->querybuf);
    time_t idletime = server.unixtime - c->lastinteraction;

    /* Resize the query buffer only in the following two cases:
     * 1) The query buffer is > BIG_ARG (defined in server.h as
     *    #define PROTO_MBULK_BIG_ARG (1024*32)) and is more than twice
     *    the peak size the client has used recently.
     * 2) The client has been idle for more than 2s and the buffer is
     *    bigger than 1k. */
    if (((querybuf_size > PROTO_MBULK_BIG_ARG) &&
         (querybuf_size/(c->querybuf_peak+1)) > 2) ||
         (querybuf_size > 1024 && idletime > 2))
    {
        /* Only resize the query buffer if it is actually wasting space. */
        if (sdsavail(c->querybuf) > 1024) {
            c->querybuf = sdsRemoveFreeSpace(c->querybuf);
        }
    }

    /* Reset the peak again to capture the peak memory usage in the next
     * cycle. */
    c->querybuf_peak = 0;
    return 0;
}

If a client's query buffer meets the condition, it is resized (shrunk) on the spot. Two kinds of connection qualify: those whose buffer is genuinely large (more than twice the peak the client has used recently), and those that have been idle for more than 2 seconds; in both cases the buffer's free space must also exceed 1 KB. So the reason redis-server stalled is that, in a single pass, about 50 connections happened to be either large or idle with more than 1 KB of free space, and the loop resized them all at once. Because Redis is single-threaded, this blocked all other clients. The direction of a fix is therefore clear: make the resize less frequent, or make each resize faster.

Now that the problem is narrowed down to the query buffer, let's look at where it gets modified:

createClient (networking.c):

redisClient *createClient(int fd) {
    redisClient *c = zmalloc(sizeof(redisClient));

    /* passing -1 as fd it is possible to create a non connected client.
     * This is useful since all the Redis commands needs to be executed
     * in the context of a client. When commands are executed in other
     * contexts (for instance a Lua script) we need a non connected client. */
    if (fd != -1) {
        anetNonBlock(NULL,fd);
        anetEnableTcpNoDelay(NULL,fd);
        if (server.tcpkeepalive)
            anetKeepAlive(NULL,fd,server.tcpkeepalive);
        if (aeCreateFileEvent(server.el,fd,AE_READABLE,
            readQueryFromClient, c) == AE_ERR)
        {
            close(fd);
            zfree(c);
            return NULL;
        }
    }

    selectDb(c,0);
    c->id = server.next_client_id++;
    c->fd = fd;
    c->name = NULL;
    c->bufpos = 0;
    c->querybuf = sdsempty();      /* the query buffer starts out empty (size 0) */
    ...

readQueryFromClient (networking.c):

void readQueryFromClient(aeEventLoop *el, int fd, void *privdata, int mask) {
    redisClient *c = (redisClient*) privdata;
    int nread, readlen;
    size_t qblen;
    REDIS_NOTUSED(el);
    REDIS_NOTUSED(mask);

    server.current_client = c;
    readlen = REDIS_IOBUF_LEN;
    /* If this is a multi bulk request, and we are processing a bulk reply
     * that is large enough, try to maximize the probability that the query
     * buffer contains exactly the SDS string representing the object, even
     * at the risk of requiring more read(2) calls. This way the function
     * processMultibulkBuffer() can avoid copying buffers to create the
     * Redis Object representing the argument. */
    if (c->reqtype == REDIS_REQ_MULTIBULK && c->multibulklen && c->bulklen != -1
        && c->bulklen >= REDIS_MBULK_BIG_ARG)
    {
        int remaining = (unsigned)(c->bulklen+2)-sdslen(c->querybuf);

        if (remaining < readlen) readlen = remaining;
    }

    qblen = sdslen(c->querybuf);
    if (c->querybuf_peak < qblen) c->querybuf_peak = qblen;
    c->querybuf = sdsMakeRoomFor(c->querybuf, readlen);   /* the buffer is grown here */
    ...

This shows that after a client's first command is read, c->querybuf is allocated to at least 1024*32 bytes. Looking back at the resize/cleanup logic, the problem is obvious: every connection's query buffer grows to at least 1024*32, but the cleanup threshold is only 1024 bytes of free space. That means every connection that has been idle for more than 2 seconds gets its buffer shrunk, only to be re-allocated back to 1024*32 on the next request, which is simply unnecessary. In a frequently accessed cluster this means memory is constantly reclaimed and re-allocated, so we tried changing the cleanup condition as follows to avoid most of the pointless resize operations:

if (((querybuf_size > REDIS_MBULK_BIG_ARG) &&
     (querybuf_size/(c->querybuf_peak+1)) > 2) ||
     (querybuf_size > 1024*32 && idletime > 2))
{
    /* Only resize the query buffer if it is actually wasting space. */
    if (sdsavail(c->querybuf) > 1024*32) {
        c->querybuf = sdsRemoveFreeSpace(c->querybuf);
    }
}

The side effect of this change is extra memory overhead: for an instance with 5,000 connections, 5000 * 1024 * 32 = about 160 MB, which is perfectly acceptable on servers with hundreds of GB of memory.

"Problem Recurrence"

After deploying a redis-server built from the modified source, the problem still reproduced: clients still reported the same type of error, and at the moment of the errors the server's memory usage still showed jitter. The stacks captured at that time were as follows:

Thu June 21:56:54 CST 2018
#3  0x0000003729ee893d in clone () from /lib64/libc.so.6
Thread 1 (Thread 0x7f2dc108d720 (LWP 27851)):
#0  0x0000003729ee5400 in madvise () from /lib64/libc.so.6
#1  0x0000000000493a1e in je_pages_purge ()
#2  0x000000000048cf40 in arena_purge ()
#3  0x00000000004a7dad in je_tcache_bin_flush_large ()
#4  0x00000000004a85e9 in je_tcache_event_hard ()
#5  0x000000000042c0b5 in decrRefCount ()
#6  0x000000000042744d in resetClient ()
#7  0x000000000042963b in processInputBuffer ()
#8  0x0000000000429762 in readQueryFromClient ()
#9  0x000000000041847c in aeProcessEvents ()
#10 0x000000000041873b in aeMain ()
#11 0x0000000000420fce in main ()

Thu June 21:56:54 CST 2018
Thread 1 (Thread 0x7f2dc108d720 (LWP 27851)):
#0  0x0000003729ee5400 in madvise () from /lib64/libc.so.6
#1  0x0000000000493a1e in je_pages_purge ()
#2  0x000000000048cf40 in arena_purge ()
#3  0x00000000004a7dad in je_tcache_bin_flush_large ()
#4  0x00000000004a85e9 in je_tcache_event_hard ()
#5  0x000000000042c0b5 in decrRefCount ()
#6  0x000000000042744d in resetClient ()
#7  0x000000000042963b in processInputBuffer ()
#8  0x0000000000429762 in readQueryFromClient ()
#9  0x000000000041847c in aeProcessEvents ()
#10 0x000000000041873b in aeMain ()
#11 0x0000000000420fce in main ()

Clearly the frequent query-buffer resizes have been optimized away, yet the client errors still occur; we were stuck once more. Could some other factor make even query buffer resizing slow? We went over the pstack output again, and this time jemalloc caught our attention. Recall Redis's memory management: to avoid the heavy fragmentation that libc malloc leaves behind when memory is not released, Redis uses jemalloc as its memory allocator by default. The je_pages_purge() in the stacks above is Redis asking jemalloc to reclaim dirty pages. Let's look at what jemalloc does there:

arena_purge (arena.c):

static void
arena_purge(arena_t *arena, bool all)
{
    arena_chunk_t *chunk;
    size_t npurgatory;

    if (config_debug) {
        size_t ndirty = 0;

        arena_chunk_dirty_iter(&arena->chunks_dirty, NULL,
            chunks_dirty_iter_cb, (void *)&ndirty);
        assert(ndirty == arena->ndirty);
    }
    assert(arena->ndirty > arena->npurgatory || all);
    assert((arena->nactive >> opt_lg_dirty_mult) < (arena->ndirty -
        arena->npurgatory) || all);

    if (config_stats)
        arena->stats.npurge++;

    npurgatory = arena_compute_npurgatory(arena, all);
    arena->npurgatory += npurgatory;

    while (npurgatory > 0) {
        size_t npurgeable, npurged, nunpurged;

        /* Get next chunk with dirty pages. */
        chunk = arena_chunk_dirty_first(&arena->chunks_dirty);
        if (chunk == NULL) {
            arena->npurgatory -= npurgatory;
            return;
        }
        npurgeable = chunk->ndirty;
        assert(npurgeable != 0);

        if (npurgeable > npurgatory && chunk->nruns_adjac == 0) {
            arena->npurgatory += npurgeable - npurgatory;
            npurgatory = npurgeable;
        }

        arena->npurgatory -= npurgeable;
        npurgatory -= npurgeable;
        npurged = arena_chunk_purge(arena, chunk, all);
        nunpurged = npurgeable - npurged;
        arena->npurgatory += nunpurged;
        npurgatory += nunpurged;
    }
}
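As the stacks show, this purge path bottoms out in madvise(): je_pages_purge() hands dirty pages back to the kernel with madvise(MADV_DONTNEED). The sketch below (independent of jemalloc; the region size is arbitrary) shows that mechanism in isolation. Every such call is a system call that drops the physical pages backing the range, and the next access to them takes a page fault, which is why purging many dirty runs inside the event loop is expensive.

/* Minimal sketch (not jemalloc code): returning dirty pages to the kernel
 * with madvise(MADV_DONTNEED), the same primitive je_pages_purge() uses. */
#define _GNU_SOURCE
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>

int main(void) {
    size_t len = 32 * 1024 * 1024;                 /* 32 MB example region */
    char *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED) { perror("mmap"); return 1; }

    memset(p, 1, len);                             /* dirty every page */

    /* Drop the physical pages; the mapping stays valid but the memory is
     * zero-filled again on the next touch (one page fault per page). */
    if (madvise(p, len, MADV_DONTNEED) != 0) { perror("madvise"); return 1; }

    printf("first byte after purge: %d\n", p[0]);  /* prints 0 */
    munmap(p, len);
    return 0;
}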

On every purge, jemalloc examines all the chunks that might need cleaning and counts the cleanup work, which is a luxury for a latency-sensitive system, so we considered upgrading jemalloc to optimize purge performance. Redis 4.0 had been released with large performance improvements and the ability to reclaim memory on demand via a command, and we were already preparing to roll it out. The jemalloc shipped with Redis 4.0 is version 4.1, and compared with the older jemalloc used by Redis 2.8, its arena_purge() is heavily optimized: the counter calls are removed, much of the judgment logic is simplified, an arena_stash_dirty() method merges the earlier computation and decision steps, and a purge_runs_sentinel is added. Dirty blocks are now kept in an LRU per arena instead of in the arena's tree of chunks containing dirty runs, which drastically reduces the volume of dirty blocks to purge, and memory blocks are no longer moved during reclamation. The code is as follows:

arena_purge (arena.c):

static void
arena_purge(arena_t *arena, bool all)
{
    chunk_hooks_t chunk_hooks = chunk_hooks_get(arena);
    size_t npurge, npurgeable, npurged;
    arena_runs_dirty_link_t purge_runs_sentinel;
    extent_node_t purge_chunks_sentinel;

    arena->purging = true;

    /*
     * Calls to arena_dirty_count() are disabled even for debug builds
     * because overhead grows nonlinearly as memory usage increases.
     */
    if (false && config_debug) {
        size_t ndirty = arena_dirty_count(arena);

        assert(ndirty == arena->ndirty);
    }
    assert((arena->nactive >> arena->lg_dirty_mult) < arena->ndirty || all);

    if (config_stats)
        arena->stats.npurge++;

    npurge = arena_compute_npurge(arena, all);
    qr_new(&purge_runs_sentinel, rd_link);
    extent_node_dirty_linkage_init(&purge_chunks_sentinel);

    npurgeable = arena_stash_dirty(arena, &chunk_hooks, all, npurge,
        &purge_runs_sentinel, &purge_chunks_sentinel);
    assert(npurgeable >= npurge);
    npurged = arena_purge_stashed(arena, &chunk_hooks, &purge_runs_sentinel,
        &purge_chunks_sentinel);
    assert(npurged == npurgeable);
    arena_unstash_purged(arena, &chunk_hooks, &purge_runs_sentinel,
        &purge_chunks_sentinel);

    arena->purging = false;
}
"Problem Solving"

We actually had several options: replace jemalloc with Google's tcmalloc, upgrade only the jemalloc version, and so on. Based on the analysis above we decided to upgrade jemalloc, and in practice that meant upgrading Redis itself. After we upgraded Redis to 4.0.9, the tricky online client connection timeout problem was resolved.
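As a footnote to the command-based memory reclamation mentioned above: on Redis 4.0+ built with jemalloc, MEMORY PURGE asks the allocator to release dirty pages on demand instead of waiting for a background purge. The sketch below uses the hiredis C client to issue it; the client library choice and the host/port are assumptions for illustration, not part of the original troubleshooting.

/* Sketch: asking a Redis >= 4.0 instance (jemalloc build) to release
 * allocator pages on demand with MEMORY PURGE, via the hiredis library.
 * Host and port are placeholders. Build with: cc demo.c -lhiredis */
#include <stdio.h>
#include <hiredis/hiredis.h>

int main(void) {
    redisContext *ctx = redisConnect("127.0.0.1", 6379);
    if (ctx == NULL || ctx->err) {
        fprintf(stderr, "connect failed: %s\n", ctx ? ctx->errstr : "out of memory");
        return 1;
    }

    /* On non-jemalloc builds the server replies with an error instead of +OK. */
    redisReply *reply = redisCommand(ctx, "MEMORY PURGE");
    if (reply == NULL) {
        fprintf(stderr, "command failed: %s\n", ctx->errstr);
        redisFree(ctx);
        return 1;
    }
    printf("MEMORY PURGE -> %s\n",
           reply->type == REDIS_REPLY_STATUS ? reply->str : "unexpected reply");

    freeReplyObject(reply);
    redisFree(ctx);
    return 0;
}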

"Summary of Issues"

Redis is widely used in production because it supports high concurrency, responds quickly, and is easy to operate, but its tight response-time requirements bring operations staff a variety of problems, and connection timeouts are among the most typical. In this case we went from discovering the problem (client connection timeouts), through capturing client and server network packets and memory stacks, to locating the cause, getting misled by a few red herrings along the way, and finally resolved it by upgrading jemalloc (by upgrading Redis). The part most worth summarizing and borrowing from is the overall analysis approach.
