Redis Cluster Implementation (I): Cluster Architecture and Initialization

Source: Internet
Author: User
Tags failover redis cluster

Redis is a highly available, high-performance, scalable key-value store. It is memory-based and supports persistence. Compared with the earlier key-value store Memcached, it supports many more value types and can persist data, which makes up for Memcached's lack of persistence. Before 3.0, however, Redis did not support clustering, which is one reason it could not be deployed at large scale. Since 3.0 added cluster support, Redis has been replacing Memcached in many deployments. In this article I look at how Redis implements clustering at the source-code level.

The source code I am reading is redis-3.0; you can download it from the link below.

http://download.redis.io/releases/redis-3.0.0-rc1.tar.gz

A Redis cluster does not use an architecture like the NameNode and DataNode of HDFS. Instead, it shards the data with an improved variant of consistent hashing (a fixed set of hash slots) and spreads the data evenly across the master nodes. Each master node has a corresponding slave node that replicates it, so that the slave can be elected as the new master when the master goes down. The overall architecture is as follows:


The hashing scheme maps every key to one of 16384 slots, numbered 0 to 16383, and each master serves part of that range (the details are, of course, not quite so simple). A client talks directly to the master that holds its data. How does the client know which master that is? In fact, it does not. The client sends its request to some master; if that master finds that the data is not on this node, it tells the client the address of the master that holds the data, and the client then accesses the correct master.
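The key-to-slot mapping can be sketched as follows. This is a minimal illustration, assuming the CRC16 (XMODEM variant) modulo 16384 scheme described in the Redis Cluster specification; the function names crc16_xmodem and key_hash_slot are made up for this example, the CRC is computed bit by bit rather than with the lookup table a production implementation would likely use, and hash tags ({...}) are ignored.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define CLUSTER_SLOTS 16384 /* same value as REDIS_CLUSTER_SLOTS */

/* Bitwise CRC16, XMODEM variant: polynomial 0x1021, initial value 0,
 * no bit reflection. Slower than a table but easy to check by hand. */
static uint16_t crc16_xmodem(const char *buf, size_t len) {
    uint16_t crc = 0;
    for (size_t i = 0; i < len; i++) {
        /* Fold the next byte into the high half of the register. */
        crc ^= (uint16_t)((unsigned char)buf[i] << 8);
        for (int bit = 0; bit < 8; bit++) {
            if (crc & 0x8000)
                crc = (uint16_t)((crc << 1) ^ 0x1021);
            else
                crc = (uint16_t)(crc << 1);
        }
    }
    return crc;
}

/* Map a key to a slot in the range 0..16383.
 * Assumption: the whole key is hashed (no hash-tag handling). */
static unsigned int key_hash_slot(const char *key, size_t len) {
    assert(key != NULL);
    /* 16384 is a power of two, so the mask equals "mod 16384". */
    return crc16_xmodem(key, len) & (CLUSTER_SLOTS - 1);
}
```

Calling key_hash_slot("123456789", 9) gives 12739, because the CRC16/XMODEM check value for that string is 0x31C3, which is already below 16384.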

So how is a Redis cluster set up? Do dozens or hundreds of machines connect to one another automatically at the same time? Of course not. A single machine can be regarded as a cluster with only one node. A client then logs in to that master and runs cluster meet <ip> <port> to add the machine at the given IP address to the cluster, which now has two machines. Adding machines one at a time in this way builds up a large-scale Redis cluster.

First, let's look at the cluster-related data structures, which are the foundation of the cluster implementation.

/* Node state structure */
struct clusterNode {
    /* Time the node was created */
    mstime_t ctime;
    /* Node ID, randomly generated, 40 characters long;
     * each character is a hexadecimal digit */
    char name[REDIS_CLUSTER_NAMELEN];
    /* Node state flags, for example whether the node is a master or a slave */
    int flags;
    /* Current configuration epoch of the node */
    uint64_t configEpoch;
    /* Bitmap of the slots stored on this node. REDIS_CLUSTER_SLOTS is the
     * number of slots in a Redis cluster, the 16384 mentioned above.
     * A bit set to 1 means the slot is stored on this node, 0 means it is not. */
    unsigned char slots[REDIS_CLUSTER_SLOTS/8];
    /* Number of slots stored on this node */
    int numslots;
    /* If this node is a master, the number of its slaves */
    int numslaves;
    /* Array of pointers, one per slave node */
    struct clusterNode **slaves;
    /* If this node is a slave, pointer to its master */
    struct clusterNode *slaveof;
    /* Time the last PING packet was sent */
    mstime_t ping_sent;
    /* Timestamp of the last PONG packet received */
    mstime_t pong_received;
    /* Time the node was last set to the FAIL state */
    mstime_t fail_time;
    /* Time of the last vote for a node */
    mstime_t voted_time;
    /* Time the replication offset was last received from this node */
    mstime_t repl_offset_time;
    /* Replication offset of this node */
    long long repl_offset;
    /* IP address of the node */
    char ip[REDIS_IP_STR_LEN];
    /* Port number of the node */
    int port;
    /* Connection-related information */
    clusterLink *link;
    /* Linked list recording the failure reports that other nodes
     * have made about this node */
    list *fail_reports;
};
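The slot bitmap in the node structure above packs 16384 one-bit flags into 2048 bytes. A minimal sketch of how such a bitmap can be read and updated is shown below; struct node_slots and the slot_* functions are hypothetical stand-ins written for this example, not the helpers used in the Redis source.

```c
#include <assert.h>
#include <string.h>

#define CLUSTER_SLOTS 16384

/* Hypothetical, cut-down stand-in for the slot fields of clusterNode. */
struct node_slots {
    unsigned char slots[CLUSTER_SLOTS / 8]; /* one bit per slot */
    int numslots;                           /* number of bits set */
};

/* Start with no slots assigned. */
static void slots_init(struct node_slots *n) {
    memset(n->slots, 0, sizeof(n->slots));
    n->numslots = 0;
}

/* Return 1 if the node holds the slot, 0 otherwise. */
static int slot_test(const struct node_slots *n, int slot) {
    assert(slot >= 0 && slot < CLUSTER_SLOTS);
    /* Byte index is slot / 8, bit index inside the byte is slot % 8. */
    return (n->slots[slot >> 3] >> (slot & 7)) & 1;
}

/* Assign a slot. Returns 1 if newly set, 0 if it was already set. */
static int slot_add(struct node_slots *n, int slot) {
    if (slot_test(n, slot)) return 0;
    n->slots[slot >> 3] |= (unsigned char)(1 << (slot & 7));
    n->numslots++;
    return 1;
}

/* Remove a slot. Returns 1 if it was set, 0 if it was not. */
static int slot_del(struct node_slots *n, int slot) {
    if (!slot_test(n, slot)) return 0;
    n->slots[slot >> 3] &= (unsigned char)~(1 << (slot & 7));
    n->numslots--;
    return 1;
}
```

Keeping numslots next to the bitmap avoids counting bits every time the number of slots is needed, which matches the pairing of slots and numslots in the structure.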

Next is the structure that records a connection between nodes in the cluster.

/* clusterLink contains all the information required to communicate
 * with another node */
typedef struct clusterLink {
    /* Time the connection was created */
    mstime_t ctime;
    /* TCP socket descriptor */
    int fd;
    /* Output buffer holding messages waiting to be sent to the other node */
    sds sndbuf;
    /* Input buffer holding messages received from the other node */
    sds rcvbuf;
    /* The node associated with this connection, or NULL if there is none */
    struct clusterNode *node;
} clusterLink;

Then there is the structure that records the state of the cluster. Each node holds one such structure, representing the state of the cluster as that node currently sees it.

typedef struct clusterState {
    /* Pointer to the current node */
    clusterNode *myself;
    /* Current configuration epoch of the cluster, used to implement failover */
    uint64_t currentEpoch;
    /* Current state of the cluster: online or offline */
    int state;
    /* Number of nodes in the cluster that handle at least one slot */
    int size;
    /* Dictionary of cluster nodes (including myself).
     * The key is the node name, the value is the clusterNode structure. */
    dict *nodes;
    /* Node blacklist, used by the CLUSTER FORGET command to prevent
     * a forgotten node from being added to the cluster again */
    dict *nodes_black_list;
    /* Slots being migrated from this node to a target node, with the target.
     * migrating_slots_to[i] = NULL means slot i is not being migrated.
     * migrating_slots_to[i] = clusterNode_A means slot i is being migrated
     * from this node to node A. */
    clusterNode *migrating_slots_to[REDIS_CLUSTER_SLOTS];
    /* Slots being imported from a source node to this node, with the source.
     * importing_slots_from[i] = NULL means slot i is not being imported.
     * importing_slots_from[i] = clusterNode_A means slot i is being imported
     * from node A. */
    clusterNode *importing_slots_from[REDIS_CLUSTER_SLOTS];
    /* Node responsible for each slot.
     * For example, slots[i] = clusterNode_A means slot i is handled by node A. */
    clusterNode *slots[REDIS_CLUSTER_SLOTS];
    /* Skip list with the slot number as score and the key as member,
     * which keeps keys ordered by slot. It makes range operations over
     * slots convenient. The operations are defined in db.c. */
    zskiplist *slots_to_keys;

    /* The following fields are used for the failover election. */
    /* Time of the previous or the next election */
    mstime_t failover_auth_time;
    /* Number of votes received by this node */
    int failover_auth_count;
    /* 1 means this node has already sent a vote request to the other nodes */
    int failover_auth_sent;
    int failover_auth_rank;
    uint64_t failover_auth_epoch;

    /* Manual failover state shared by masters and slaves */
    /* Time limit for the manual failover */
    mstime_t mf_end;
    /* Manual failover state of the master */
    clusterNode *mf_slave;
    /* Manual failover state of the slave */
    long long mf_master_offset;
    /* Flag: a non-zero value means the manual failover can start
     * requesting votes from the masters */
    int mf_can_start;

    /* The following fields are used by masters to record election state. */
    /* Epoch of the last vote cast by this node */
    uint64_t lastVoteEpoch;
    /* Flags recording what to do before entering the next event loop */
    int todo_before_sleep;
    /* Number of messages sent over the cluster connections */
    long long stats_bus_messages_sent;
    /* Number of messages received over the cluster connections */
    long long stats_bus_messages_received;
} clusterState;
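The slot-to-node table is what lets a master decide whether to serve a request or redirect the client, as described in the architecture overview. The following is a minimal sketch of that decision; struct mini_node, struct mini_state and route_slot are hypothetical names invented here, and the reply strings only imitate the style of Redis error replies rather than reproducing the server's code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

#define CLUSTER_SLOTS 16384

/* Hypothetical minimal node record; the real clusterNode has many more fields. */
struct mini_node {
    const char *ip;
    int port;
};

/* Hypothetical cut-down clusterState: who am I, and who owns each slot. */
struct mini_state {
    struct mini_node *myself;
    struct mini_node *slots[CLUSTER_SLOTS];
};

/* Decide what to do with a request for the given slot.
 * Returns 1 if this node serves the slot (buf is set to "").
 * Returns 0 otherwise and writes a redirect or error reply into buf. */
static int route_slot(const struct mini_state *st, int slot,
                      char *buf, size_t buflen) {
    assert(slot >= 0 && slot < CLUSTER_SLOTS);
    assert(buf != NULL && buflen > 0);

    const struct mini_node *owner = st->slots[slot];
    if (owner == NULL) {
        /* Nobody is responsible for this slot. */
        snprintf(buf, buflen, "-CLUSTERDOWN Hash slot not served");
        return 0;
    }
    if (owner == st->myself) {
        buf[0] = '\0';
        return 1;
    }
    /* Another master owns the slot: tell the client where to go. */
    snprintf(buf, buflen, "-MOVED %d %s:%d", slot, owner->ip, owner->port);
    return 0;
}
```

With node A as myself owning slot 100 and node B at 127.0.0.1:7001 owning slot 200, a request for slot 200 produces the reply "-MOVED 200 127.0.0.1:7001", and the client is expected to retry against that address.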

With the basic structures introduced, let's look at the code that implements the cluster.

First comes the initialization code for a single Redis node. This is the first step in building a cluster: starting a single-node service.

In the Redis source code, the main function in redis.c is the entry point of redis-server. Because we only care about the cluster implementation, I have omitted the parts that have little to do with the cluster.

int main(int argc, char **argv) {
    ...
    /* The user passed arguments, so we need to check whether
     * a configuration file was specified */
    if (argc >= 2) {
        ...
        /* Read the configuration file */
        loadServerConfig(configfile,options);
    }

Execution then enters the loadServerConfig function, which parses the configuration text. The parsing code contains the following branch: if the configuration contains the cluster-enabled item, server.cluster_enabled is set from its value, and a value of 1 indicates that the cluster function is turned on.

else if (!strcasecmp(argv[0],"cluster-enabled") && argc == 2) {
    if ((server.cluster_enabled = yesnotoi(argv[1])) == -1) {
        err = "argument must be 'yes' or 'no'"; goto loaderr;
    }
}

Back in the main function, after the configuration file has been read, the initServer() function is executed. Inside initServer there is the following code:

/* If the server is running in cluster mode, initialize the cluster */
if (server.cluster_enabled) clusterInit();

Next we go into the clusterInit function and look at the code that initializes a standalone cluster node.

/* Initialize the cluster */
void clusterInit(void) {
    int saveconf = 0;

    /* Initialize the configuration. server.cluster is the clusterState
     * structure; each node holds one. */
    server.cluster = zmalloc(sizeof(clusterState));
    /* Pointer to this node itself */
    server.cluster->myself = NULL;
    /* The initial configuration epoch is 0 */
    server.cluster->currentEpoch = 0;
    /* The initial state is FAIL */
    server.cluster->state = REDIS_CLUSTER_FAIL;
    /* The cluster size starts at 1 */
    server.cluster->size = 1;
    server.cluster->todo_before_sleep = 0;
    /* Create the hash table that maps names to nodes */
    server.cluster->nodes = dictCreate(&clusterNodesDictType,NULL);
    /* Node blacklist */
    server.cluster->nodes_black_list =
        dictCreate(&clusterNodesBlackListDictType,NULL);
    /* Initialize the election-related variables */
    server.cluster->failover_auth_time = 0;
    server.cluster->failover_auth_count = 0;
    server.cluster->failover_auth_rank = 0;
    server.cluster->failover_auth_epoch = 0;
    server.cluster->lastVoteEpoch = 0;
    server.cluster->stats_bus_messages_sent = 0;
    server.cluster->stats_bus_messages_received = 0;
    /* Initialize the slots */
    memset(server.cluster->slots,0, sizeof(server.cluster->slots));
    /* Clear the migrating and importing slot arrays */
    clusterCloseAllSlots();

    /* Lock the cluster configuration file to ensure that each node
     * uses its own file */
    if (clusterLockConfig(server.cluster_configfile) == REDIS_ERR)
        exit(1);

    /* Load the cluster configuration file of this node */
    if (clusterLoadConfig(server.cluster_configfile) == REDIS_ERR) {
        /* No cluster configuration file was found, so add this node
         * to the cluster as a new master */
        myself = server.cluster->myself =
            createClusterNode(NULL,REDIS_NODE_MYSELF|REDIS_NODE_MASTER);
        redisLog(REDIS_NOTICE,"No cluster configuration found, I'm %.40s",
            myself->name);
        clusterAddNode(myself);
        saveconf = 1;
    }
    /* Save the nodes.conf file */
    if (saveconf) clusterSaveConfigOrDie(1);

    /* Listen on a TCP port */
    server.cfd_count = 0;
    if (server.port > (65535-REDIS_CLUSTER_PORT_INCR)) {
        redisLog(REDIS_WARNING, "Redis port number too high. "
                   "Cluster communication port is 10,000 port "
                   "numbers higher than your Redis port. "
                   "Your Redis port number must be "
                   "lower than 55535.");
        exit(1);
    }
    /* Listen on the cluster port of this node */
    if (listenToPort(server.port+REDIS_CLUSTER_PORT_INCR,
        server.cfd,&server.cfd_count) == REDIS_ERR)
    {
        exit(1);
    } else {
        int j;

        for (j = 0; j < server.cfd_count; j++) {
            /* Register the handler for incoming connection events */
            if (aeCreateFileEvent(server.el, server.cfd[j], AE_READABLE,
                clusterAcceptHandler, NULL) == AE_ERR)
                    redisPanic("Unrecoverable error creating Redis Cluster "
                                "file event.");
        }
    }

    /* The slots -> keys map is an ordered set implemented as a skip list:
     * the score is the slot number and the member is the key */
    server.cluster->slots_to_keys = zslCreate();
    resetManualFailover();
}

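The port check in clusterInit follows from the cluster port being the client port plus a fixed increment. A minimal sketch of that arithmetic is given below, assuming the increment is 10000, which is what the 55535 limit in the log message implies; CLUSTER_PORT_INCR and cluster_bus_port are names made up for this example.

```c
#include <assert.h>

/* Assumed value of REDIS_CLUSTER_PORT_INCR (65535 - 55535 = 10000). */
#define CLUSTER_PORT_INCR 10000

/* Return the port used for node-to-node traffic, or -1 if the client
 * port is invalid or leaves no room below 65535 for the cluster port. */
static int cluster_bus_port(int client_port) {
    if (client_port <= 0 || client_port > 65535 - CLUSTER_PORT_INCR)
        return -1;
    int bus_port = client_port + CLUSTER_PORT_INCR;
    assert(bus_port <= 65535); /* guaranteed by the check above */
    return bus_port;
}
```

For the default client port 6379 this yields 16379. The comparison uses the same "greater than" test as the excerpt, so a client port of exactly 55535 is still accepted.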
At this point, the cluster-related initialization is complete. In later articles I will write about adding and removing nodes and about failure recovery. Questions are welcome!





