How to Install and Deploy the Cassandra Distributed NoSQL Database

Source: Internet
Author: User

Apache Cassandra is an open-source distributed key-value storage system. It was originally developed by Facebook to store very large amounts of data. Cassandra is well suited to real-time transaction processing and to serving structured data. Its data model is a four- or five-dimensional model based on column families. Drawing on the data structures and features of Amazon Dynamo and Google BigTable, it stores data in Memtables and SSTables. Before data is written, Cassandra first records the write in a log (the CommitLog) and then writes the data to the Memtable of the corresponding column family. A Memtable is an in-memory structure that keeps data sorted by key; when certain conditions are met, the Memtable's data is flushed to disk in a batch and stored as an SSTable. This article describes how to install and configure Cassandra.
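The write path just described (log first, then Memtable, then a batch flush to a key-sorted SSTable) can be sketched as a toy in Python. This is a minimal illustration under simplifying assumptions, not Cassandra's implementation: the class ToyColumnFamilyStore, its memtable_limit flush threshold (counted in keys), and the in-memory lists standing in for the commit log and SSTable files are all hypothetical.

```python
import bisect


class ToyColumnFamilyStore:
    """Toy model of the write path: CommitLog -> Memtable -> SSTable.

    Hypothetical illustration only; real Cassandra persists the commit log and
    SSTables as files and uses its own flush conditions.
    """

    def __init__(self, memtable_limit=3):
        self.memtable_limit = memtable_limit  # flush once this many keys are buffered
        self.commit_log = []  # append-only (key, value) records, standing in for the CommitLog
        self.memtable = {}    # in-memory buffer for the column family
        self.sstables = []    # immutable lists of (key, value), each sorted by key

    def write(self, key, value):
        self.commit_log.append((key, value))  # 1. record the write in the log first
        self.memtable[key] = value            # 2. then apply it to the Memtable
        if len(self.memtable) >= self.memtable_limit:
            self.flush()                      # 3. flush when the condition is met

    def flush(self):
        # Write the whole Memtable out in one batch, sorted by key.
        self.sstables.append(sorted(self.memtable.items()))
        self.memtable = {}
        # Every logged write is now in an SSTable, so this toy log can be truncated.
        self.commit_log.clear()

    def read(self, key):
        # Newest data first: the Memtable, then SSTables from newest to oldest.
        if key in self.memtable:
            return self.memtable[key]
        for table in reversed(self.sstables):
            keys = [k for k, _ in table]
            i = bisect.bisect_left(keys, key)  # binary search works because the table is sorted
            if i < len(keys) and keys[i] == key:
                return table[i][1]
        return None


store = ToyColumnFamilyStore(memtable_limit=2)
store.write("b", 1)
store.write("a", 2)     # second key reaches the limit and triggers a flush
store.write("a", 3)     # newer value lands in the fresh Memtable
print(store.sstables)   # [[('a', 2), ('b', 1)]]
print(store.read("a"))  # 3
```

Because each SSTable is sorted by key, a lookup can binary-search it; reads check the Memtable first and then the SSTables from newest to oldest, so the most recent value wins.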

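Note: this article assumes that a JDK is already installed. The configuration files used below (storage-conf.xml, log4j.properties) and the cassandra-cli syntax come from Cassandra 0.6.x; Cassandra 2.1.5, the version downloaded in step 1, is configured through conf/cassandra.yaml and conf/logback.xml instead.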

I. Installing and configuring a Cassandra node

1. Download Cassandra

Fetch the release with wget from an Apache mirror; the file path on the mirror is /cassandra/2.1.5/apache-cassandra-2.1.5-bin.tar.gz

2. Decompress the file

# tar -zxvf apache-cassandra-2.1.5-bin.tar.gz

# mv apache-cassandra-2.1.5 cassandra

3. Cassandra directory layout

bin: scripts for operating Cassandra

conf: configuration files

interface: Cassandra's Thrift interface definition file, which can be used to generate client code for various programming languages

javadoc: API documentation generated from the source code

lib: JAR packages that Cassandra needs at runtime

4. Configure the data storage directories for the Cassandra node

Edit the configuration file storage-conf.xml in the conf directory:

# cd conf

<CommitLogDirectory>/data/db/lib/cassandra/commitlog</CommitLogDirectory>

<DataFileDirectory>/data/db/lib/cassandra/data</DataFileDirectory>
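If you prefer to script this change, a sketch using Python's standard xml.etree.ElementTree module could look like the following. The helper set_storage_dirs is hypothetical; it assumes the file parses as plain XML, searches the whole tree for the two elements rather than assuming where they sit, and writes each path with no surrounding whitespace. Comments are kept via TreeBuilder(insert_comments=True), which needs Python 3.8 or later; an XML declaration, if present, is not preserved.

```python
import xml.etree.ElementTree as ET


def set_storage_dirs(xml_text, commitlog_dir, data_dir):
    """Return xml_text with <CommitLogDirectory> and every <DataFileDirectory>
    set to the given paths (hypothetical helper, not part of Cassandra)."""
    # insert_comments=True keeps the explanatory <!-- --> comments in the output.
    parser = ET.XMLParser(target=ET.TreeBuilder(insert_comments=True))
    root = ET.fromstring(xml_text, parser=parser)
    targets = {"CommitLogDirectory": commitlog_dir, "DataFileDirectory": data_dir}
    for tag, path in targets.items():
        elements = list(root.iter(tag))  # search the whole tree, wherever the tag sits
        if not elements:
            raise KeyError(f"<{tag}> not found in the configuration")
        for element in elements:
            element.text = path  # exact path, no surrounding whitespace
    return ET.tostring(root, encoding="unicode")
```

For example, passing the contents of storage-conf.xml together with /data/db/lib/cassandra/commitlog and /data/db/lib/cassandra/data returns the updated XML text, which you would then write back to the file.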


5. Modify the log configuration file log4j.properties

Default log path:

log4j.appender.R.File=/var/log/cassandra/system.log

Log path after modification:

log4j.appender.R.File=/data/db/log/cassandra/system.log
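The same edit can be scripted. The hypothetical helper set_property below rewrites a single key=value line in .properties text. It is a simplified sketch that only understands plain key=value lines and leaves comment lines alone, so continuation lines, ':' separators and escapes (all legal in Java .properties files) are not handled.

```python
def set_property(properties_text, key, value):
    """Return properties_text with key set to value, appending the key if absent.

    Simplified sketch: only plain key=value lines are recognised.
    """
    lines, replaced = [], False
    for line in properties_text.splitlines():
        stripped = line.strip()
        is_comment = stripped.startswith(("#", "!"))
        if stripped and not is_comment and "=" in stripped:
            current_key = stripped.split("=", 1)[0].strip()
            if current_key == key:
                lines.append(f"{key}={value}")  # replace the existing assignment
                replaced = True
                continue
        lines.append(line)  # comments, blank lines and other keys are kept as-is
    if not replaced:
        lines.append(f"{key}={value}")
    return "\n".join(lines) + "\n"
```

Calling set_property(text, "log4j.appender.R.File", "/data/db/log/cassandra/system.log") on the contents of log4j.properties produces the modified line shown above.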

6. Create the directories for data and logs

# mkdir -p /data/db/lib/cassandra

# mkdir -p /data/db/log/cassandra
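A Python equivalent of these two commands, sketched with the standard os module, can also check that the directories are writable by the current user, which matters if Cassandra will not run as root. The function prepare_dirs is hypothetical.

```python
import os


def prepare_dirs(paths):
    """Create each directory (like mkdir -p) and check that it is writable.

    Returns the paths that are NOT writable, so an empty list means ready.
    """
    not_writable = []
    for path in paths:
        os.makedirs(path, exist_ok=True)  # no error if the directory already exists
        if not os.access(path, os.W_OK):
            not_writable.append(path)
    return not_writable
```

For the layout used in this article, prepare_dirs(["/data/db/lib/cassandra", "/data/db/log/cassandra"]) returns an empty list when both directories exist and are writable.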

7. Once configuration is complete, start Cassandra

# bin/cassandra

INFO 09:29:12,888 Starting up server gossip

INFO 09:29:12,992 Binding thrift service to localhost/ 9160

When these two lines appear in the output, Cassandra has started successfully.
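To check for a successful start from a script instead of by eye, you could scan the log for the two messages shown above. This sketch assumes those exact message texts, which may differ between Cassandra versions; cassandra_started and STARTUP_MARKERS are hypothetical names.

```python
# Message texts copied from the startup output above; other versions may log different text.
STARTUP_MARKERS = ("Starting up server gossip", "Binding thrift service")


def cassandra_started(log_lines):
    """Return True once every startup marker has appeared in log_lines (any iterable of str)."""
    seen = set()
    for line in log_lines:
        for marker in STARTUP_MARKERS:
            if marker in line:
                seen.add(marker)
        if len(seen) == len(STARTUP_MARKERS):
            return True  # stop early once both messages have been seen
    return False
```

Since a file object iterates line by line, you can pass an open handle on the log file configured in step 5 (/data/db/log/cassandra/system.log) directly to cassandra_started.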

8. Connect to Cassandra, then insert and retrieve data

# bin/cassandra-cli --host localhost --port 9160

cassandra>

cassandra> set Keyspace1.Standard2['studenta']['age'] = '18'

Value inserted

cassandra> get Keyspace1.Standard2['studenta']

=> (column=age, value=18, timestamp=1272357045192000)

Returned 1 results
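The set and get commands above address a value by keyspace, column family, row key and column name, and each stored cell carries a timestamp (1272357045192000 above is in microseconds since the Unix epoch). The toy Python model below illustrates that addressing and the last-write-wins rule based on timestamps. It is a hypothetical sketch (ToyKeyspace is not a Cassandra API) that ignores super columns, replication and persistence.

```python
import time


class ToyKeyspace:
    """Hypothetical model of one keyspace: column family -> row key -> column -> (value, timestamp)."""

    def __init__(self):
        self.column_families = {}

    def set(self, column_family, row_key, column, value, timestamp=None):
        if timestamp is None:
            # Microseconds since the Unix epoch, the same unit as the timestamp in the example.
            timestamp = time.time_ns() // 1000
        row = self.column_families.setdefault(column_family, {}).setdefault(row_key, {})
        current = row.get(column)
        # Last write wins by timestamp; on a tie this toy simply keeps the later call.
        if current is None or timestamp >= current[1]:
            row[column] = (value, timestamp)

    def get(self, column_family, row_key):
        """Return the row as (column, value, timestamp) tuples sorted by column name."""
        row = self.column_families.get(column_family, {}).get(row_key, {})
        return [(column, value, ts) for column, (value, ts) in sorted(row.items())]


keyspace1 = ToyKeyspace()
keyspace1.set("Standard2", "studenta", "age", "18", timestamp=1272357045192000)
print(keyspace1.get("Standard2", "studenta"))  # [('age', '18', 1272357045192000)]
```

A write carrying an older timestamp than the stored cell is ignored, which is why every cell keeps its timestamp alongside the value.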

9. Stop the Cassandra service

Find the Cassandra process ID, then kill it (16250 is the PID reported by ps in this example):

# ps -ef | grep cassandra

# kill -9 16250
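Rather than reading the ps output by eye, a script can pick out the PID. The hypothetical helper below parses ps -ef output (columns UID, PID, PPID, C, STIME, TTY, TIME, CMD) and assumes the Cassandra JVM's command line contains its main class name CassandraDaemon; check this against your own ps output before relying on it.

```python
def find_cassandra_pids(ps_output, marker="CassandraDaemon"):
    """Return the PIDs from `ps -ef` output whose command contains marker.

    The grep process itself is skipped. The marker is an assumption about the
    JVM command line, not something guaranteed for every version.
    """
    pids = []
    for line in ps_output.splitlines():
        fields = line.split(None, 7)  # UID PID PPID C STIME TTY TIME CMD
        if len(fields) < 8 or not fields[1].isdigit():
            continue  # header line or malformed line
        command = fields[7]
        if marker in command and not command.startswith("grep"):
            pids.append(int(fields[1]))
    return pids
```

Feed it the stdout of subprocess.run(["ps", "-ef"], capture_output=True, text=True); the PID it returns is the number passed to kill (16250 in the example above).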

II. Supplement

Descriptions of the settings in the Cassandra configuration file storage-conf.xml

# storage-conf.xml

<!-- Name of the cluster, displayed while the cluster is running -->

<ClusterName>Test Cluster</ClusterName>

<!-- Whether a node automatically joins the cluster (bootstraps) when it starts. The default value is false. -->

<AutoBootstrap>false</AutoBootstrap>

<!-- Seed nodes of the cluster -->

<Seeds><Seed></Seed></Seeds>

<!-- Address used for communication between nodes -->

<ListenAddress>localhost</ListenAddress>

<!-- Address on which Cassandra listens for Thrift clients. For a cluster, set it to an address that listens for all clients. The default value is localhost. -->

<ThriftAddress>localhost</ThriftAddress>

<!-- Client connection port -->

<ThriftPort>9160</ThriftPort>

<!-- FlushDataBufferSizeInMB and FlushIndexBufferSizeInMB: sizes of the data and index buffers used when a Memtable is flushed to disk as an SSTable (32 MB and 8 MB by default). Both values are in MB; they set buffer sizes, not the conditions that trigger a flush. -->

<FlushDataBufferSizeInMB>32</FlushDataBufferSizeInMB>

<FlushIndexBufferSizeInMB>8</FlushIndexBufferSizeInMB>

<!-- How the commit log is synced to disk. Default value: periodic. In periodic mode CommitLogSyncPeriodInMS applies; in batch mode CommitLogSyncBatchWindowInMS applies. -->

<CommitLogSync>periodic</CommitLogSync>

<!-- In periodic mode the commit log is synced to disk every 10 seconds (10000 ms) by default. -->

<CommitLogSyncPeriodInMS>10000</CommitLogSyncPeriodInMS>

<!-- <CommitLogSyncBatchWindowInMS>1</CommitLogSyncBatchWindowInMS> -->
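The difference between the two CommitLogSync modes can be illustrated with a small simulation: in periodic mode a write is acknowledged at once and becomes durable at the next periodic sync, while in batch mode the acknowledgement waits for the sync that closes the batch window. This is a simplified model, not Cassandra's code; simulate_commitlog_sync and its parameters are hypothetical, with defaults taken from the values above.

```python
def simulate_commitlog_sync(write_times_ms, mode, period_ms=10000, batch_window_ms=1):
    """Return (ack_time, durable_time) in ms for each write (write_times_ms must be sorted).

    periodic: acknowledge immediately; a sync runs every period_ms, so a write
              becomes durable at the next sync tick after it arrives.
    batch:    hold the acknowledgement until the sync that ends the batch window
              opened by the first unsynced write.
    """
    results = []
    if mode == "periodic":
        for t in write_times_ms:
            next_sync = (t // period_ms + 1) * period_ms
            results.append((t, next_sync))
    elif mode == "batch":
        sync_at = None
        for t in write_times_ms:
            if sync_at is None or t > sync_at:
                sync_at = t + batch_window_ms  # this write opens a new batch window
            results.append((sync_at, sync_at))  # ack only once the data is synced
    else:
        raise ValueError("mode must be 'periodic' or 'batch'")
    return results
```

The trade-off shows in the output: periodic mode acknowledges immediately, but a crash can lose up to one period of acknowledged writes, while batch mode never acknowledges an unsynced write at the cost of added latency.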

