Distributed Key-value Storage System: Introduction to Cassandra

Source: Internet
Author: User
Tags cassandra pprint
Apache Cassandra is an open-source distributed key-value storage system. It was initially developed by Facebook to store extremely large data sets. Cassandra is not a relational database; it is a hybrid non-relational database, similar to Google's Bigtable. This article introduces Cassandra from four aspects: Cassandra's data model, installing and configuring Cassandra, using Cassandra to store data from common programming languages, and building a Cassandra cluster.


Cassandra's Data Storage Structure

Cassandra's data model is a four-dimensional or five-dimensional model based on column families. Its storage design draws on the data structures and features of Amazon Dynamo and Google's Bigtable, and uses memtables and SSTables. When data is written to Cassandra, the write is first recorded in the commit log (CommitLog) and then written to the memtable of the corresponding column family. A memtable is an in-memory structure that keeps data sorted by key. When certain conditions are met, the memtable data is flushed to disk in batch and stored as an SSTable.
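The write path described above can be sketched in a few lines of Python. This is a toy model, not Cassandra's implementation: the class name, the `flush_threshold` parameter, and the use of plain lists and dicts are all assumptions made for illustration, and the only flush condition modeled is the number of keys in the memtable.

```python
import bisect


class ColumnFamilyStore:
    """Toy write path: commit log -> memtable -> SSTable (hypothetical names)."""

    def __init__(self, flush_threshold=3):
        self.commit_log = []   # append-only record of every write
        self.memtable = {}     # in-memory key -> value for this column family
        self.sstables = []     # immutable, key-sorted lists standing in for disk files
        self.flush_threshold = flush_threshold

    def write(self, key, value):
        self.commit_log.append((key, value))  # 1. record the write in the log
        self.memtable[key] = value            # 2. then update the memtable
        if len(self.memtable) >= self.flush_threshold:
            self.flush()                      # 3. flush once the condition is met

    def flush(self):
        # Write the whole memtable out in key order as one immutable SSTable.
        self.sstables.append(sorted(self.memtable.items()))
        self.memtable = {}

    def read(self, key):
        if key in self.memtable:
            return self.memtable[key]
        for table in reversed(self.sstables):  # newest SSTable first
            keys = [k for k, _ in table]
            i = bisect.bisect_left(keys, key)  # binary search in the sorted keys
            if i < len(keys) and keys[i] == key:
                return table[i][1]
        return None
```

Because each SSTable is sorted by key, a lookup inside it can use binary search, which is one reason the data is kept ordered before it reaches the disk.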

Figure 1. Cassandra Data Model diagram:

The basic concepts of Cassandra's data model:

  1. Cluster: the Cassandra node instances; a cluster can contain multiple keyspaces
  2. Keyspace: the container used to store column families, equivalent to a schema or database in a relational database
  3. ColumnFamily: the container used to store columns, similar to the concept of a table in a relational database
  4. SuperColumn: a special column whose value can contain multiple columns
  5. Column: the most basic unit of Cassandra, composed of a name, a value, and a timestamp
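One way to picture these concepts is as nested maps. The sketch below models them with plain Python dictionaries; the super column family name `Super1`, the super column `contact`, and the e-mail value are hypothetical and only illustrate the extra level that a super column adds.

```python
# Each column is a (name, value, timestamp) triple, modeled here as a dict.
def make_column(name, value, timestamp):
    return {"name": name, "value": value, "timestamp": timestamp}


cluster = {
    "Keyspace1": {                      # keyspace
        "Standard2": {                  # column family
            "studentA": {               # row key
                "age": make_column("age", "18", 1272357045192000),
            },
        },
        "Super1": {                     # super column family (hypothetical)
            "studentA": {               # row key
                "contact": {            # super column: its value holds columns
                    "email": make_column("email", "a@example.com", 1272357045192000),
                },
            },
        },
    },
}

# Four dimensions: keyspace -> column family -> key -> column
age = cluster["Keyspace1"]["Standard2"]["studentA"]["age"]["value"]

# Five dimensions: keyspace -> column family -> key -> super column -> column
email = cluster["Keyspace1"]["Super1"]["studentA"]["contact"]["email"]["value"]
```

The number of lookups needed to reach a column value is what the "four-dimensional or five-dimensional" description refers to.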

The following is an example of Data Model Analysis:

Figure 2. Data Model instance analysis

 


Cassandra node installation and configuration

Obtain Cassandra
 # wget http://labs.renren.com/apache-mirror/cassandra/0.6.0/apache-cassandra-0.6.0-rc1-bin.tar.gz
 # tar -zxvf apache-cassandra-0.6.0-rc1-bin.tar.gz
 # mv apache-cassandra-0.6.0-rc1 cassandra
 # ls cassandra
Cassandra directory description
 bin        Scripts related to Cassandra operations
 conf       Configuration files
 interface  Cassandra's Thrift interface definition file, which can be used to generate interface code for various programming languages
 javadoc    Javadoc of the source code
 lib        JAR packages required by Cassandra at run time
Prepare the data storage directory and log directory of the Cassandra node

Modify the configuration file storage-conf.xml:

Default content
 <CommitLogDirectory>/var/lib/cassandra/commitlog</CommitLogDirectory>
 <DataFileDirectories>
 <DataFileDirectory>/var/lib/cassandra/data</DataFileDirectory>
 </DataFileDirectories>
Configured content
 <CommitLogDirectory>/data3/db/lib/cassandra/commitlog</CommitLogDirectory>
 <DataFileDirectories>
 <DataFileDirectory>/data3/db/lib/cassandra/data</DataFileDirectory>
 </DataFileDirectories>
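If the same change has to be applied on several nodes, it can be scripted. The following is a minimal sketch using Python's standard `xml.etree.ElementTree`; the `relocate` helper is hypothetical, and the `<Storage>` root element wrapping the fragment is an assumption about the surrounding file. It simply points every data directory at the same path, which is enough for the single-directory configuration shown above.

```python
import xml.etree.ElementTree as ET

# Default fragment from above, wrapped in an assumed <Storage> root element.
DEFAULT_FRAGMENT = """<Storage>
  <CommitLogDirectory>/var/lib/cassandra/commitlog</CommitLogDirectory>
  <DataFileDirectories>
    <DataFileDirectory>/var/lib/cassandra/data</DataFileDirectory>
  </DataFileDirectories>
</Storage>"""


def relocate(xml_text, base_dir):
    """Return the XML with commit log and data directories moved under base_dir."""
    root = ET.fromstring(xml_text)
    root.find("CommitLogDirectory").text = base_dir + "/commitlog"
    for node in root.iter("DataFileDirectory"):
        node.text = base_dir + "/data"
    return ET.tostring(root, encoding="unicode")


print(relocate(DEFAULT_FRAGMENT, "/data3/db/lib/cassandra"))
```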

Modify the log configuration file log4j.properties:

log4j.properties configuration
 # Log path
 # log4j.appender.R.File=/var/log/cassandra/system.log
 # The configured log path:
 log4j.appender.R.File=/data3/db/log/cassandra/system.log

Create the directories for storing data and logs

 # mkdir -p /data3/db/lib/cassandra
 # mkdir -p /data3/db/log/cassandra
After the configuration is complete, start Cassandra
 # bin/cassandra

Display information

 INFO 09:29:12,888 Starting up server gossip
 INFO 09:29:12,992 Binding thrift service to localhost/127.0.0.1:9160

When you see these two lines of output, Cassandra has started successfully.
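Besides reading the log, a script can confirm that the Thrift service is accepting connections. The helper below is a hypothetical sketch that only checks whether a TCP connection to the port succeeds; it does not speak the Thrift protocol, so it cannot tell Cassandra apart from any other service listening on that port.

```python
import socket


def thrift_port_open(host="127.0.0.1", port=9160, timeout=2.0):
    """Return True if a TCP connection to host:port can be established."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Connection refused, timed out, or host unreachable.
        return False
```

Calling `thrift_port_open()` with no arguments checks the default address and port shown in the startup output above.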

Connect to Cassandra and add and read data

The bin directory of Cassandra comes with a command-line connection tool, cassandra-cli, which can be used to connect to Cassandra and to add and read data.

Connect to Cassandra and add and read data
 # bin/cassandra-cli --host localhost --port 9160
 Connected to: "Test Cluster" on localhost/9160
 Welcome to cassandra CLI.
 Type 'help' or '?' for help. Type 'quit' or 'exit' to quit.
 cassandra>
 cassandra> set Keyspace1.Standard2['studentA']['age'] = '18'
 Value inserted
 cassandra> get Keyspace1.Standard2['studentA']
 => (column=age, value=18, timestamp=1272357045192000)
 Returned 1 results
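The timestamp returned with each column is what Cassandra uses to decide which version of a column is current when the same column is written more than once: the version with the higher timestamp wins. A minimal sketch of that rule follows; the `reconcile` function is hypothetical, and keeping the existing version on a timestamp tie is a simplification made for this example.

```python
def reconcile(existing, incoming):
    """Return the column version with the higher timestamp (last write wins)."""
    if existing is None or incoming["timestamp"] > existing["timestamp"]:
        return incoming
    return existing  # simplification: ties keep the existing version


first = {"name": "age", "value": "18", "timestamp": 1272357045192000}
later = {"name": "age", "value": "19", "timestamp": 1272357045193000}

current = reconcile(None, first)
current = reconcile(current, later)   # newer timestamp replaces the old value
current = reconcile(current, first)   # a stale write arriving late is ignored
```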
Stop the Cassandra service: find the PID of Cassandra (16328 in this example) and kill the process
 # ps -ef | grep cassandra
 # kill 16328
Overview of the Cassandra configuration file storage-conf.xml

Listing 1. storage-conf.xml node configuration description
 <!-- Name of the cluster displayed for this node -->
 <ClusterName>Test Cluster</ClusterName>
 <!-- Whether the node is automatically added to the cluster when it is started. The default value is false -->
 <AutoBootstrap>false</AutoBootstrap>
 <!-- Seed nodes of the cluster -->
 <Seeds>
 <Seed>127.0.0.1</Seed>
 </Seeds>
 <!-- Listening address for communication between nodes -->
 <ListenAddress>localhost</ListenAddress>
 <!-- Listening address for Thrift-based Cassandra clients. In a cluster it is set to 0.0.0.0, which means listening on all interfaces. The default value is localhost -->
 <ThriftAddress>localhost</ThriftAddress>
 <!-- Client connection port -->
 <ThriftPort>9160</ThriftPort>
 <!-- Buffer sizes, in MB, used when flushing memtables to disk: FlushDataBufferSizeInMB for data (32 by default) and FlushIndexBufferSizeInMB for the index (8 by default) -->
 <FlushDataBufferSizeInMB>32</FlushDataBufferSizeInMB>
 <FlushIndexBufferSizeInMB>8</FlushIndexBufferSizeInMB>
 <!-- Commit log synchronization mode. Default value: periodic, configured with CommitLogSyncPeriodInMS. When set to batch, the corresponding setting is CommitLogSyncBatchWindowInMS -->
 <CommitLogSync>periodic</CommitLogSync>
 <!-- The commit log is synchronized every 10 seconds by default -->
 <CommitLogSyncPeriodInMS>10000</CommitLogSyncPeriodInMS>
 <!-- <CommitLogSyncBatchWindowInMS>1</CommitLogSyncBatchWindowInMS> -->
 


Using Cassandra to store data from common programming languages

Using Cassandra usually requires Thrift, a third-party tool, to generate the Cassandra client library files. You can download Thrift from http://incubator.apache.org/thrift and learn how to use it there. The following sections show how Cassandra is used from five common programming languages: Java, PHP, Python, C#, and Ruby.

Use Cassandra in Java

Add libthrift-r917130.jar and apache-cassandra-0.6.0-rc1.jar to the build path in Eclipse.

Database connection: use the open method of TTransport from libthrift-r917130.jar to establish a connection with the Cassandra server (IP: 192.168.10.2, port: 9160).

Database operation: use Cassandra.Client to create a client instance. Call the insert method of the client instance to write data, and call the get method to read data.

Close the database connection: use the close method of TTransport to disconnect from the Cassandra server.

Listing 2. Java connects to Cassandra and writes and reads data
 package com.test.cassandra;

 import java.io.UnsupportedEncodingException;

 import org.apache.thrift.transport.TTransport;
 import org.apache.thrift.transport.TSocket;
 import org.apache.thrift.protocol.TProtocol;
 import org.apache.thrift.protocol.TBinaryProtocol;
 import org.apache.thrift.TException;
 import org.apache.cassandra.thrift.Cassandra;
 import org.apache.cassandra.thrift.Column;
 import org.apache.cassandra.thrift.ColumnOrSuperColumn;
 import org.apache.cassandra.thrift.ColumnPath;
 import org.apache.cassandra.thrift.ConsistencyLevel;
 import org.apache.cassandra.thrift.InvalidRequestException;
 import org.apache.cassandra.thrift.NotFoundException;
 import org.apache.cassandra.thrift.TimedOutException;
 import org.apache.cassandra.thrift.UnavailableException;

 /**
  * Connect the Java client to Cassandra and perform read/write operations
  * @author Jimmy
  * @date 2010-04-10
  */
 public class JCassandraClient {
     public static void main(String[] args) throws InvalidRequestException,
             NotFoundException, UnavailableException, TimedOutException,
             TException, UnsupportedEncodingException {

         // Create the database connection
         TTransport tr = new TSocket("192.168.10.2", 9160);
         TProtocol proto = new TBinaryProtocol(tr);
         Cassandra.Client client = new Cassandra.Client(proto);
         tr.open();

         String keyspace = "Keyspace1";
         String cf = "Standard2";
         String key = "studentA";

         // Insert data
         long timestamp = System.currentTimeMillis();
         ColumnPath path = new ColumnPath(cf);
         path.setColumn("age".getBytes("UTF-8"));
         client.insert(keyspace, key, path, "18".getBytes("UTF-8"),
                 timestamp, ConsistencyLevel.ONE);

         path.setColumn("height".getBytes("UTF-8"));
         client.insert(keyspace, key, path, "172".getBytes("UTF-8"),
                 timestamp, ConsistencyLevel.ONE);

         // Read data
         path.setColumn("height".getBytes("UTF-8"));
         ColumnOrSuperColumn cc = client.get(keyspace, key, path, ConsistencyLevel.ONE);
         Column c = cc.getColumn();
         String v = new String(c.value, "UTF-8");

         // Close the database connection
         tr.close();
     }
 }
Use Cassandra in PHP

To use Cassandra in PHP code, first use Thrift to generate the required PHP files: run thrift --gen php interface/cassandra.thrift. The generated PHP files provide the functions required to connect to Cassandra and to read and write data.

Listing 3. PHP connects to Cassandra and writes and reads data
 <?php
 $GLOBALS['THRIFT_ROOT'] = '/usr/share/php/Thrift';
 require_once $GLOBALS['THRIFT_ROOT'].'/packages/Cassandra.php';
 require_once $GLOBALS['THRIFT_ROOT'].'/packages/cassandra/cassandra_types.php';
 require_once $GLOBALS['THRIFT_ROOT'].'/transport/TSocket.php';
 require_once $GLOBALS['THRIFT_ROOT'].'/protocol/TBinaryProtocol.php';
 require_once $GLOBALS['THRIFT_ROOT'].'/transport/TFramedTransport.php';
 require_once $GLOBALS['THRIFT_ROOT'].'/transport/TBufferedTransport.php';

 try {
     // Create a Cassandra connection
     $socket = new TSocket('192.168.10.2', 9160);
     $transport = new TBufferedTransport($socket, 1024, 1024);
     $protocol = new TBinaryProtocolAccelerated($transport);
     $client = new CassandraClient($protocol);
     $transport->open();

     $keyspace = 'Keyspace1';
     $keyUser = "studentA";

     $columnPath = new cassandra_ColumnPath();
     $columnPath->column_family = 'Standard1';
     $columnPath->super_column = null;
     $columnPath->column = 'age';
     $consistency_level = cassandra_ConsistencyLevel::ZERO;
     $timestamp = time();
     $value = "18";

     // Write data
     $client->insert($keyspace, $keyUser, $columnPath, $value, $timestamp, $consistency_level);

     $columnParent = new cassandra_ColumnParent();
     $columnParent->column_family = "Standard1";
     $columnParent->super_column = null;

     // An empty start and finish select all columns of the row
     $sliceRange = new cassandra_SliceRange();
     $sliceRange->start = "";
     $sliceRange->finish = "";
     $predicate = new cassandra_SlicePredicate();
     $predicate->slice_range = $sliceRange;
     $consistency_level = cassandra_ConsistencyLevel::ONE;
     $keyUser = "studentA";

     // Query data
     $result = $client->get_slice($keyspace, $keyUser, $columnParent, $predicate, $consistency_level);

     // Close the connection
     $transport->close();
 } catch (TException $tx) {
     // Report Thrift errors instead of silently ignoring them
     print 'TException: '.$tx->getMessage();
 }
 ?>
Use Cassandra in Python

To use Cassandra in Python, use Thrift to generate the Python client library: run thrift --gen py interface/cassandra.thrift, and then import the generated library in your Python code. The generated Python library provides the methods required to establish a connection with Cassandra and to read and write data.

Listing 4. Python connects to Cassandra and writes and reads data
 from thrift import Thrift
 from thrift.transport import TTransport
 from thrift.transport import TSocket
 from thrift.protocol.TBinaryProtocol import TBinaryProtocolAccelerated
 from cassandra import Cassandra
 from cassandra.ttypes import *
 import time
 import pprint

 def main():
     socket = TSocket.TSocket("192.168.10.2", 9160)
     transport = TTransport.TBufferedTransport(socket)
     # The class is imported directly above, so it is used without a module prefix
     protocol = TBinaryProtocolAccelerated(transport)
     client = Cassandra.Client(protocol)
     pp = pprint.PrettyPrinter(indent=2)

     keyspace = "Keyspace1"
     column_path = ColumnPath(column_family="Standard1", column="age")
     key = "studentA"
     value = "18"
     # The timestamp is a 64-bit integer in the Thrift interface
     timestamp = int(time.time())

     try:
         # Open the database connection
         transport.open()
         # Write data
         client.insert(keyspace, key, column_path, value, timestamp, ConsistencyLevel.ZERO)
         # Query data
         column_parent = ColumnParent(column_family="Standard1")
         slice_range = SliceRange(start="", finish="")
         predicate = SlicePredicate(slice_range=slice_range)
         result = client.get_slice(keyspace, key, column_parent, predicate, ConsistencyLevel.ONE)
         pp.pprint(result)
     except Thrift.TException as tx:
         print('Thrift: %s' % tx.message)
     finally:
         # Close the connection
         transport.close()

 if __name__ == '__main__':
     main()
Use Cassandra in C#

To use Cassandra in C#, use thrift.exe to generate a dynamic link library: run ./thrift.exe --gen csharp interface/cassandra.thrift to generate the required DLL file. The generated DLL provides the classes and methods required to establish a connection with Cassandra and to read and write data. You can reference the generated DLL in your programming environment.

Listing 5. C# connects to Cassandra and writes and reads data
 namespace CshareCassandra
 {
     using System;
     using System.Collections.Generic;
     using System.Diagnostics;
     using Apache.Cassandra;
     using Thrift.Protocol;
     using Thrift.Transport;

     class CassandraClient
     {
         static void Main(string[] args)
         {
             // Create the database connection
             TTransport transport = new TSocket("192.168.10.2", 9160);
             TProtocol protocol = new TBinaryProtocol(transport);
             Cassandra.Client client = new Cassandra.Client(protocol);
             transport.Open();

             System.Text.Encoding utf8Encoding = System.Text.Encoding.UTF8;
             // DateTime.Now.Millisecond is only the 0-999 millisecond component,
             // so Ticks is used here to get a timestamp that keeps increasing
             long timestamp = DateTime.UtcNow.Ticks;
             ColumnPath nameColumnPath = new ColumnPath()
             {
                 Column_family = "Standard1",
                 Column = utf8Encoding.GetBytes("age")
             };

             // Write data
             client.insert("Keyspace1", "studentA", nameColumnPath,
                 utf8Encoding.GetBytes("18"), timestamp, ConsistencyLevel.ONE);

             // Read data
             ColumnOrSuperColumn returnedColumn = client.get("Keyspace1", "studentA",
                 nameColumnPath, ConsistencyLevel.ONE);
             Console.WriteLine("Keyspace1/Standard1: name: {0}, value: {1}",
                 utf8Encoding.GetString(returnedColumn.Column.Name),
                 utf8Encoding.GetString(returnedColumn.Column.Value));

             // Close the connection
             transport.Close();
         }
     }
 }
Use Cassandra in Ruby

To use Cassandra in Ruby, first install the gem. Installation command: gem install cassandra

After the installation is complete, open Ruby's irb and start to use Cassandra.

Listing 6. Ruby connects to Cassandra and writes and reads data
 > require 'rubygems'
 > require 'cassandra'
 # Create a database connection
 > cdb = Cassandra.new('Keyspace1', "192.168.10.1:9160", :retries => 3)
 # Write data
 > cdb.insert(:Standard1, 'studentA', {'age' => '18'})
 # Read data
 > cdb.get(:Standard1, 'studentA')
 # Close the connection
 > cdb.disconnect
 


Build a Cassandra cluster environment

A Cassandra cluster has no central node; all nodes are peers with the same role. The nodes use the gossip protocol to maintain the cluster state among themselves.
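The idea behind gossip can be illustrated with a small simulation. This is a deliberately simplified sketch, not Cassandra's gossip implementation: the node names, the `(version, status)` tuples, and the fixed list of exchanging pairs are all assumptions for illustration, whereas a real gossip protocol picks peers at random and carries richer state.

```python
def merge_views(local, remote):
    """Merge two node-state maps, keeping the entry with the higher version."""
    merged = dict(local)
    for node, (version, status) in remote.items():
        if node not in merged or version > merged[node][0]:
            merged[node] = (version, status)
    return merged


def gossip_round(views, pairs):
    """Let each pair of nodes exchange state; both ends keep the merged view."""
    for a, b in pairs:
        merged = merge_views(views[a], views[b])
        views[a] = dict(merged)
        views[b] = dict(merged)
    return views


# Each node starts out knowing only about itself.
views = {
    "A": {"A": (1, "UP")},
    "B": {"B": (1, "UP")},
    "C": {"C": (2, "UP")},
}
gossip_round(views, [("A", "B"), ("B", "C")])  # A has not heard about C yet
gossip_round(views, [("A", "C")])              # now every node knows all three
```

No node plays a special role in this exchange, which matches the peer-to-peer structure described above: state spreads through repeated pairwise exchanges rather than through a coordinator.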

The following are two servers with Linux installed, with the Cassandra environment already set up and the required ports opened:
 Server name   IP address
 ServiceA      192.168.10.3
 ServiceB      192.168.10.2
Configure the storage-conf.xml file of ServiceA and ServiceB

ServiceA configuration
 <Seeds>
 <Seed>192.168.10.3</Seed>
 </Seeds>
 <ListenAddress>192.168.10.3</ListenAddress>
 <ThriftAddress>0.0.0.0</ThriftAddress>
ServiceB configuration
 <Seeds>
 <Seed>192.168.10.3</Seed>
 <Seed>192.168.10.2</Seed>
 </Seeds>
 <ListenAddress>192.168.10.2</ListenAddress>
 <ThriftAddress>0.0.0.0</ThriftAddress>

After the configuration is complete, start the Cassandra service on ServiceA and ServiceB respectively.

Check whether ServiceA and ServiceB have formed a cluster by using the nodetool command that comes with Cassandra.

 bin/nodetool --host 192.168.10.2 ring
If the cluster was built successfully, information similar to the following is returned:
 Address       Status     Load          Range                                      Ring
                                        106218876142754404016344802054916108445
 192.168.10.2  Up         2.55 KB       31730917190839729088079827277059909532     |<--|
 192.168.10.3  Up         3.26 KB       106218876142754404016344802054916108445    |-->|
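The Range column lists each node's token. A node is responsible for the tokens between the previous node's token (exclusive) and its own token (inclusive), and the ring wraps around past the largest token. The sketch below applies that rule to the two tokens from the output above; the `owner` function is hypothetical, and how a row key is turned into a token by the partitioner is not modeled here.

```python
import bisect

# (token, node address) pairs from the ring output above, sorted by token.
RING = [
    (31730917190839729088079827277059909532, "192.168.10.2"),
    (106218876142754404016344802054916108445, "192.168.10.3"),
]


def owner(token, ring=RING):
    """Return the node whose range (previous token, own token] contains token."""
    tokens = [t for t, _ in ring]
    i = bisect.bisect_left(tokens, token)  # first node token >= the given token
    return ring[i % len(ring)][1]          # wrap around past the largest token
```

For example, `owner(0)` falls in the wrapped-around range and returns 192.168.10.2, while any token just above 192.168.10.2's own token belongs to 192.168.10.3.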
Use the Cassandra command-line tool for cluster testing

To connect to ServiceA from ServiceB, run the following command:

 cassandra-cli -host 192.168.10.3 -port 9160
Cluster test 1

Write cluster data:

 ServiceA connects to ServiceA:
 # set Keyspace1.Standard2['studentAA']['A2A'] = 'a2A'
 ServiceB connects to ServiceA:
 # set Keyspace1.Standard2['studentBA']['B2A'] = 'b2A'
 ServiceA connects to ServiceB:
 # set Keyspace1.Standard2['studentAB']['A2B'] = 'a2B'

Obtain cluster data:

 ServiceA connects to ServiceA:
 # get Keyspace1.Standard2['studentAA'], get Keyspace1.Standard2['studentBA'], get Keyspace1.Standard2['studentAB']
 ServiceB connects to ServiceA:
 # get Keyspace1.Standard2['studentAA'], get Keyspace1.Standard2['studentBA'], get Keyspace1.Standard2['studentAB']
 ServiceA connects to ServiceB:
 # get Keyspace1.Standard2['studentAA'], get Keyspace1.Standard2['studentBA'], get Keyspace1.Standard2['studentAB']

Listing 8. Cluster test 2

Stop the Cassandra service on ServiceA, then connect from ServiceA to ServiceB and write data

 # set Keyspace1.Standard2['studentAR']['A2R'] = 'a2R'

Start ServiceA again, connect to ServiceA itself, and read the data that was just written through ServiceB

 # bin/cassandra-cli -host 192.168.10.3 -port 9160
 # get Keyspace1.Standard2['studentAR']
 


Summary

This article described Cassandra's data model, node installation and configuration, the use of Cassandra from common programming languages, and the construction and testing of a Cassandra cluster. Cassandra is a high-performance, decentralized, peer-to-peer non-relational database that supports distributed read and write operations. While the system is running, you can add or remove fields (columns) at will. This makes it a good fit for SNS applications.
