Implementation steps for Cassandra cluster data initialization

Source: Internet
Author: User

When a Cassandra cluster is about to go into service, it usually has to be loaded with initial data: for example, all the blog posts of a blogging site, all the web page information of a data analysis site, or all the product information of an e-commerce site. This initial data is often very large, and importing it directly through the Thrift API (Cassandra's client interface) is not practical. Facebook used the BinaryMemTable approach to import large amounts of data into Cassandra.

We installed Hadoop and Cassandra on the cluster. Assume that the data to be initialized can be exported to a flat file (a txt file) and then uploaded to HDFS. Each machine is both a Cassandra node and a Hadoop slave, and each slave runs one reduce task.

I see two ways to import a large amount of data into the cluster.

Use BinaryMemTable

1. Run the MapReduce job

In the Mapper, partition the imported data by key.
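The partitioning step above can be sketched as follows. This is a minimal illustration, not the original author's code: the function names and the choice of MD5 as a stable hash are assumptions. Hadoop's default HashPartitioner does much the same thing with the key's hashCode modulo the number of reduce tasks.

```python
import hashlib


def partition_for_key(key: str, num_reducers: int) -> int:
    """Return the reducer index for a row key (stable across processes)."""
    # MD5 is used only as a stable hash here; it is an assumption of this sketch.
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % num_reducers


def partition_records(records, num_reducers):
    """Distribute (key, value) records into one bucket per reducer."""
    buckets = [[] for _ in range(num_reducers)]
    for key, value in records:
        buckets[partition_for_key(key, num_reducers)].append((key, value))
    return buckets
```

All records with the same key land in the same bucket, so a single reducer sees every value for a given row.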

In the Reducer, perform the following operations in the configure phase:

1. Initialize Cassandra's messaging service and Gossip service.

2. Create the Cassandra file directory.

3. Disable Cassandra's compaction.

4. Wait for one ring delay period so that Gossip has time to learn the cluster layout.

In the reduce phase, perform the following operations:

1. Create the ColumnFamily for each key.

2. Create a RowMutation message.

3. Send the message to every node in the cluster that should hold the data.
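Deciding which nodes should receive a mutation comes down to a lookup on the token ring. The sketch below is an assumption-laden illustration: it models a simple ring with one token per node and picks replicas by walking the ring clockwise, which is how a basic rack-unaware placement behaves. The function name and the ring representation are hypothetical and do not correspond to Cassandra's API.

```python
import bisect


def replicas_for_token(token, ring, replication_factor):
    """Return the node addresses that should receive data for `token`.

    ring: list of (token, node_address) pairs sorted by token, one per node.
    A node owns the range ending at its own token, so the first replica is
    the first node whose token is >= the data token (wrapping at the end).
    """
    tokens = [t for t, _ in ring]
    start = bisect.bisect_left(tokens, token) % len(ring)
    count = min(replication_factor, len(ring))
    # Walk the ring clockwise to collect the remaining replicas.
    return [ring[(start + i) % len(ring)][1] for i in range(count)]
```

For a ring `[(10, "a"), (20, "b"), (30, "c")]` and a replication factor of 2, a token of 15 maps to nodes "b" and "c", and a token of 35 wraps around to "a" and "b".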

In the Reducer, perform the following operations in the close phase:

1. Wait until all messages in the messaging service have been sent.

2. Shut down Cassandra's messaging service and Gossip service.

2. Start the Cassandra cluster

After Cassandra has started, manually trigger a compaction to merge the large number of SSTable files generated earlier.

Generate the SSTable files yourself

1. Start the Cassandra cluster

After the cluster is started, make sure that the ring of the entire cluster has been established.

2. Run the MapReduce job

In the Mapper, perform the following operations in the configure phase:

1. Connect to a randomly chosen Cassandra node.

2. Obtain the token map of the Cassandra cluster.

In the Mapper, perform the following operations in the map phase:

1. Partition the data by the address of the node that owns each data key.
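A partitioner driven by the token map could look roughly like this. It is a sketch under assumptions: the token map is taken to be a plain dictionary of node address to token, each node has exactly one token, and the class and method names are hypothetical. Because each slave runs one reduce task, the reducer index is simply the node's position in token order.

```python
import bisect


class NodePartitioner:
    """Routes a data token to the reducer assigned to the owning node."""

    def __init__(self, token_map):
        # token_map: {node_address: token}, as fetched in the configure phase.
        pairs = sorted((token, node) for node, token in token_map.items())
        self.tokens = [token for token, _ in pairs]
        self.nodes = [node for _, node in pairs]

    def partition(self, token):
        """Reducer index: position of the owning node in token order."""
        return bisect.bisect_left(self.tokens, token) % len(self.tokens)

    def owner(self, token):
        """Address of the node whose range contains `token`."""
        return self.nodes[self.partition(token)]
```

With this layout, the output of reducer i is exactly the data destined for the i-th node on the ring, which makes the later copy step a one-to-one transfer.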

Secondary sort

1. Treat the data that shares a node address and key as one group.

2. Sort the data within a group in ascending key order.
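The effect of the secondary sort can be imitated in a few lines. This sketch assumes records of the form (node address, key, column) and interprets the steps above as: order everything by node address and then key, and hand each (node address, key) group to one reduce call. The function name is hypothetical. Depending on the partitioner, the order Cassandra expects may follow the key's token rather than the raw key; the sketch sorts by raw key, as the text describes.

```python
from itertools import groupby
from operator import itemgetter


def secondary_sort(records):
    """Yield ((node_address, key), [columns]) in ascending (node, key) order.

    records: iterable of (node_address, key, column) tuples.
    """
    # sorted() is stable, so columns of one row keep their input order.
    ordered = sorted(records, key=itemgetter(0, 1))
    for group_key, group in groupby(ordered, key=itemgetter(0, 1)):
        yield group_key, [column for _, _, column in group]
```

Each reducer then receives its rows one group at a time with keys already ascending, which is what an append-only SSTable writer needs.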

In the Reducer, perform the following operations in the configure phase:

1. Create an SSTableWriter instance for each ColumnFamily.

In the reduce phase, perform the following operations:

1. Create the ColumnFamily for each key.

2. Call the SSTableWriter.append() method to write the data to the corresponding SSTable file.

In the Reducer, perform the following operations in the close phase:

1. Call the SSTableWriter.closeAndOpenReader() method for each ColumnFamily.

2. Copy the generated SSTable files with scp to Cassandra's data directory.

3. Restart the Cassandra cluster.
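The configure, reduce, and close phases of this second approach fit together roughly as shown below. The sketch does not use Cassandra's classes: InMemorySSTableWriter is a hypothetical stand-in that only records rows and rejects out-of-order keys, to show why the secondary sort matters. The class and method names are illustrative assumptions.

```python
class InMemorySSTableWriter:
    """Hypothetical stand-in for an SSTable writer; it only checks ordering."""

    def __init__(self, column_family):
        self.column_family = column_family
        self.rows = []
        self.closed = False

    def append(self, key, columns):
        if self.closed:
            raise ValueError("writer is closed")
        if self.rows and key <= self.rows[-1][0]:
            raise ValueError("keys must be appended in ascending order")
        self.rows.append((key, columns))

    def close(self):
        self.closed = True
        return self.rows


class SSTableReducer:
    """Mirrors the three reducer phases described above."""

    def configure(self, column_families):
        # One writer per ColumnFamily.
        self.writers = {cf: InMemorySSTableWriter(cf) for cf in column_families}

    def reduce(self, column_family, key, columns):
        # Rows must arrive in ascending key order (see the secondary sort).
        self.writers[column_family].append(key, list(columns))

    def close(self):
        # Finish every writer; copying the files and restarting come afterwards.
        return {cf: writer.close() for cf, writer in self.writers.items()}
```

A reducer fed unsorted keys fails on the first out-of-order append, which is the failure the secondary sort is meant to prevent.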

Original article title: Conception of Cassandra cluster data initialization Solution

Link: http://www.cnblogs.com/gpcuster/archive/2010/07/03/1770452.html
