Cassandra Node Management

Last Update: 2018-12-05 Source: Internet

Author: User

Tags: cassandra

(1) Bootstrap a node

Adding a new node is called "bootstrapping".
To bootstrap a node, turn on AutoBootstrap in the configuration file storage-conf.xml and start the node.
If you explicitly specify an InitialToken in storage-conf.xml, the new node bootstraps to that position on the ring. Otherwise, it picks a Token that gives it half of the data from the node with the most disk space used, provided no other node is already bootstrapping into that node's range.
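
The automatic choice described above can be pictured with a simplified model. The sketch below is not Cassandra's implementation: it assumes integer tokens on a ring of size 2**127, treats "half of the data" as the midpoint of the range, and uses a hypothetical function name and made-up load figures.

```python
RING_SIZE = 2 ** 127  # assumed token space; adjust for your partitioner


def pick_bootstrap_token(tokens_by_node, load_by_node, busy_nodes=()):
    """Return a token that splits the most loaded node's range in half.

    tokens_by_node: node name -> token (int)
    load_by_node:   node name -> disk space used (any comparable unit)
    busy_nodes:     nodes that already have another node bootstrapping
                    into their range; these are skipped
    """
    candidates = [n for n in tokens_by_node if n not in busy_nodes]
    if not candidates:
        raise ValueError("no node is available as a bootstrap source")
    source = max(candidates, key=lambda n: load_by_node[n])

    # A node owns the range (predecessor token, own token].
    ring = sorted(tokens_by_node.values())
    own = tokens_by_node[source]
    pred = ring[ring.index(own) - 1]  # index -1 wraps to the last token

    width = (own - pred) % RING_SIZE  # modulo handles wrap-around past zero
    if width == 0:                    # a single node owns the whole ring
        width = RING_SIZE
    return (pred + width // 2) % RING_SIZE


# Hypothetical two-node ring: "b" uses the most disk, so its range is split.
tokens = {"a": 0, "b": 2 ** 126}
loads = {"a": 10, "b": 50}
print(pick_bootstrap_token(tokens, loads) == 2 ** 125)  # True
```

Passing `busy_nodes=("b",)` makes the sketch fall back to the next most loaded node, which mirrors the rule that a range with a bootstrap already in progress is not chosen again.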

Important:

1. Before bootstrapping another node, wait long enough for every node in the cluster to learn about the bootstrapping node through Gossip. It is safe once the new node logs "Bootstrapping", about two minutes after it starts (90 seconds to make sure it has accurate load information from the other nodes, plus 30 seconds for the other nodes to start sending it writes for its new range).
2. Related to point 1: with automatic Token selection, you can bootstrap only N nodes at a time, where N is the number of nodes in the current cluster. If you need to more than double the size of the cluster, wait for the first N nodes to finish, so that the cluster has 2N nodes, before bootstrapping more. For example, to add seven nodes to a five-node cluster, bootstrap five nodes and let them finish before starting the other two.
3. As a safety measure, Cassandra does not automatically delete data from a source node after that data has been assigned to the new node. Once you are satisfied that the new node is up and working properly, run the cleanup operation of NodeProbe on the source nodes (the neighboring nodes that shared the same Token range). If you skip this step, the old data still counts toward the load of those nodes, which throws off Token selection in later bootstraps.
4. When a new node bootstraps, the existing nodes must divide the key range before replication starts. This may take a while, so be patient.
5. During bootstrap, a node drops the Thrift port and is not accessible from NodeProbe.
6. Bootstrap can take several hours when a large amount of data is involved. See Streaming for how to monitor progress.
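
The batching rule in point 2 is simple arithmetic, sketched below. The function name is hypothetical, and the sketch assumes the rule applies again to each later batch (that is, a batch may be as large as the cluster is once the previous batch has finished).

```python
def bootstrap_batches(current_size, nodes_to_add):
    """Split new nodes into batches no larger than the cluster at that time."""
    if current_size < 1 or nodes_to_add < 0:
        raise ValueError("need at least one existing node and a non-negative count")
    batches = []
    size = current_size
    remaining = nodes_to_add
    while remaining > 0:
        batch = min(size, remaining)  # at most N nodes at a time
        batches.append(batch)
        size += batch                 # the cluster grows once the batch finishes
        remaining -= batch
    return batches


# The example from point 2: a five-node cluster gaining seven nodes.
print(bootstrap_batches(5, 7))  # [5, 2]
```

Each batch should finish completely before the next one starts.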

If your EndpointSnitch is configured correctly, Cassandra is smart enough to stream data from the nearest source node. Therefore, the new node does not need to be in the same data center as the primary replica of its range (the node whose data is split with the new node), as long as another replica is in the same data center as the new node.
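
As a toy illustration of that preference, the sketch below treats "nearest" as "in the same data center". This is an assumption made for the example; the function name and node names are hypothetical, and a real snitch may judge proximity differently.

```python
def pick_stream_source(replicas, new_node_dc):
    """Choose which replica should stream a range to the new node.

    replicas:    list of (node, data_center) pairs that hold the range,
                 with the primary replica first
    new_node_dc: data center of the bootstrapping node
    """
    if not replicas:
        raise ValueError("the range has no replicas to stream from")
    for node, dc in replicas:
        if dc == new_node_dc:
            return node        # a replica in the new node's data center
    return replicas[0][0]      # otherwise fall back to the primary replica


replicas = [("primary", "dc1"), ("copy", "dc2")]
print(pick_stream_source(replicas, "dc2"))  # copy
```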

Bootstrap progress can be monitored with the streams parameter of NodeProbe.

During bootstrap, NodeProbe may report that the new node is neither receiving nor sending any streams. This happens because the sending node first makes a local copy of the data it will send to the new node. That step shows up on the sending node as "AntiCompacting... antiCompacted" log messages.

(2) Join a node
To add a node, you only need to select an appropriate InitialToken and add the node to the existing cluster.
If you explicitly specify an InitialToken in storage-conf.xml, the new node bootstraps to that position on the ring. Otherwise, it picks a Token that gives it half of the data from the node with the most disk space used.
The appropriate data then streams automatically from the existing nodes to the new node.

(3) Delete a node
You can use the decommission operation of NodeProbe to take a live node out of the cluster, or the removetoken operation of NodeProbe to remove a dead node. Either operation assigns the Token range of the removed node to other nodes and replicates the appropriate data there. With decommission, the data streams from the retiring node to the other nodes. With removetoken, the data streams from the remaining replicas.
No data is deleted automatically from the decommissioned node. Therefore, if you want to put the retired node back into service at a different Token on the ring, you should delete its old data manually.
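
The choice between the two operations can be written as a small decision helper. This is only a sketch: the function name, host names, and the (host, operation, arguments) tuple shape are hypothetical, and it does not build or run a real command line.

```python
def removal_step(node, token, is_alive, any_live_node):
    """Return (target_host, operation, args) for taking `node` off the ring.

    node:          the node to remove
    token:         that node's token, as a string
    is_alive:      True if the node is still up
    any_live_node: a live node to contact when `node` is dead
    """
    if is_alive:
        # A live node retires itself and streams its own data away.
        return (node, "decommission", [])
    # A dead node is removed by token; data streams from remaining replicas.
    return (any_live_node, "removetoken", [token])


print(removal_step("node3", "1234", True, "node1"))
# ('node3', 'decommission', [])
print(removal_step("node3", "1234", False, "node1"))
# ('node1', 'removetoken', ['1234'])
```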

(4) Move a node
NodeProbe move: moves the target node to a given Token. In essence, a move is a more convenient way to decommission a node and then bootstrap it.
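
To see what a move does to the ring, the sketch below models it as "leave, then rejoin at the new token" and computes each node's share of the ring before and after. It assumes integer tokens on a ring of size 2**127 with ownership proportional to range width; all names are hypothetical.

```python
from fractions import Fraction

RING_SIZE = 2 ** 127  # assumed token space


def ownership(tokens_by_node):
    """Return node -> fraction of the ring owned, for distinct tokens."""
    ordered = sorted(tokens_by_node.items(), key=lambda kv: kv[1])
    shares = {}
    for i, (node, token) in enumerate(ordered):
        pred = ordered[i - 1][1]  # index -1 wraps to the last node
        # A node owns (pred, token]; a lone node owns the whole ring.
        width = (token - pred) % RING_SIZE or RING_SIZE
        shares[node] = Fraction(width, RING_SIZE)
    return shares


def move(tokens_by_node, node, new_token):
    """Model a move as decommission followed by bootstrap at new_token."""
    ring = dict(tokens_by_node)
    del ring[node]          # "decommission": the node leaves the ring
    ring[node] = new_token  # "bootstrap": it rejoins at the given token
    return ring


ring = {"a": 0, "b": 2 ** 125, "c": 2 ** 126}
print(ownership(ring))                          # a: 1/2, b: 1/4, c: 1/4
print(ownership(move(ring, "b", 3 * 2 ** 125)))  # a: 1/4, c: 1/2, b: 1/4
```

In this example, moving "b" shifts the oversized half of the ring from "a" to "c", which shows why the target token has to be chosen with the neighbors' ranges in mind.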

(5) Restore a node

If a node goes down and then comes back up, the ordinary repair mechanisms can deal with any inconsistent data. Remember, however, that if the node is down for longer than the configured gcgraceseconds (default: 10 days), it may have permanently missed delete operations. Unless your application performs no deletes, you should wipe the node's data directory, bootstrap it again, and run removetoken to remove its old entry from the ring.
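
The rule above reduces to a comparison against the grace period. The sketch below uses the 10-day default from the text; the function name and the returned labels are hypothetical.

```python
DEFAULT_GC_GRACE_SECONDS = 10 * 24 * 60 * 60  # 10 days, the default named above


def recovery_action(downtime_seconds,
                    gc_grace_seconds=DEFAULT_GC_GRACE_SECONDS,
                    app_deletes=True):
    """Decide how to bring a node back after an outage.

    Returns "repair" when the ordinary repair mechanisms are enough, or
    "wipe-and-rebootstrap" when deletes may have been missed for good.
    """
    if downtime_seconds <= gc_grace_seconds or not app_deletes:
        return "repair"
    return "wipe-and-rebootstrap"


print(recovery_action(3 * 86400))   # repair
print(recovery_action(12 * 86400))  # wipe-and-rebootstrap
```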

If a node goes down completely, you have two options:

1. (Recommended) Bring up a replacement node with a new IP address, and set autobootstrap to true in the configuration file storage-conf.xml. This setting lets the replacement node find a suitable position in the Cassandra cluster automatically. Then start the node. While it bootstraps, the replacement node does not accept any read requests until the bootstrap is complete. After the bootstrap has finished, run the removetoken operation of nodetool once, supplying the token of the dead node. In addition, run the cleanup operation of nodetool once on every other node in the cluster.

   You can get the token of the dead node by running the nodetool ring command on any live node. The one exception is when the other nodes have also been restarted in the meantime, for example after an outage. In that case, you can obtain the token of the dead node from the system tables of the live nodes.

2. (Alternative) Bring up a replacement node with the same IP address and Token as the dead node, and run the repair operation of nodetool. Until the repair completes, clients that read only from this node may get stale or incomplete data. A higher read consistency level avoids this problem.

The reason you run the nodetool cleanup operation on every other node is to remove the Hinted Handoff writes that those nodes stored for the dead node.
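
The recommended approach can be laid out as an ordered checklist. The sketch below only produces the list of steps and does not run anything. The function name, host names, and the "start-with-autobootstrap" label are hypothetical, and running removetoken against the replacement node is an arbitrary choice made for the example.

```python
def replacement_plan(replacement, dead_token, other_nodes):
    """Return ordered (host, operation, args) steps for replacing a dead node.

    replacement: the new node, started with a new IP address
    dead_token:  token of the dead node, as a string
    other_nodes: every other node in the cluster
    """
    steps = [
        # Manual step: start the node with autobootstrap enabled and
        # wait until bootstrap has finished.
        (replacement, "start-with-autobootstrap", []),
        # Run once, supplying the dead node's token.
        (replacement, "removetoken", [dead_token]),
    ]
    # Clear data stored for the dead node on every other node.
    steps += [(node, "cleanup", []) for node in other_nodes]
    return steps


for step in replacement_plan("new1", "1234", ["node1", "node2"]):
    print(step)
```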

(6) Load balancing
NodeProbe loadbalance: this is also essentially a more convenient way to decommission a node and then bootstrap it. Instead of being told where to move on the ring, the target node chooses its own position and bootstraps there.

Move and load balancing operations can be monitored through the streams parameter of NodeProbe.

Reference: http://wiki.apache.org/cassandra/Operations

This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only. This website makes no representation or warranty of any kind, either expressed or implied, as to the accuracy, completeness ownership or reliability of the article or any translations thereof. If you have any concerns or complaints relating to the article, please send an email, providing a detailed description of the concern or complaint, to info-contact@alibabacloud.com. A staff member will contact you within 5 working days. Once verified, infringing content will be removed immediately.
