Apache Cassandra is an open-source distributed key-value storage system. It was initially developed by Facebook to store extremely large amounts of data. Cassandra is not a relational database; it is a hybrid non-relational database, similar to Google's Bigtable. This article introduces Cassandra from the following four aspects: Cassandra's data model, installing and configuring Cassandra, using Cassandra to store data from common programming languages, and building a Cassandra cluster.
Cassandra's Data Storage Structure
Cassandra's data model is a four-dimensional or five-dimensional model based on the column family. Drawing on the data structures and features of Amazon's Dynamo and Google's Bigtable, it stores data using Memtables and SSTables. Before data is written to Cassandra, the write is first recorded in the commit log (CommitLog), and then the data is written to the Memtable of the corresponding column family. A Memtable is an in-memory structure that keeps data sorted by key. When certain conditions are met, the Memtable's data is flushed to disk in a batch and stored as an SSTable.
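The write path described above can be sketched in a few lines of Python. This is only a toy model under stated assumptions, not Cassandra's implementation: the class name `ColumnFamilyStore`, the `flush_threshold` parameter, and the in-memory lists standing in for on-disk files are all hypothetical, and the real flush conditions are controlled by Cassandra's own configuration.

```python
class ColumnFamilyStore:
    """Toy write path: commit log first, then Memtable, then flush to an SSTable."""

    def __init__(self, flush_threshold=3):
        self.commit_log = []   # append-only record of every write
        self.memtable = {}     # in-memory map: row key -> {column name: value}
        self.sstables = []     # key-sorted snapshots standing in for files on disk
        # Hypothetical flush condition: number of rows held in the Memtable.
        self.flush_threshold = flush_threshold

    def write(self, key, column, value):
        # 1. Record the write in the commit log before anything else.
        self.commit_log.append((key, column, value))
        # 2. Apply the write to the Memtable of this column family.
        self.memtable.setdefault(key, {})[column] = value
        # 3. Flush once the (hypothetical) condition is met.
        if len(self.memtable) >= self.flush_threshold:
            self.flush()

    def flush(self):
        # Write the rows out sorted by key, then start a fresh Memtable.
        rows = sorted(self.memtable.items(), key=lambda item: item[0])
        self.sstables.append([(k, dict(cols)) for k, cols in rows])
        self.memtable = {}

    def read(self, key, column):
        # Check the Memtable first, then the SSTables from newest to oldest.
        if key in self.memtable and column in self.memtable[key]:
            return self.memtable[key][column]
        for sstable in reversed(self.sstables):
            for row_key, columns in sstable:
                if row_key == key and column in columns:
                    return columns[column]
        return None
```

Reads in this sketch look at the Memtable before the flushed snapshots, so the most recent write for a key is the one returned.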
Figure 1. Cassandra Data Model diagram:
The basic concepts of Cassandra's data model:
1. Cluster: the Cassandra node instances; a cluster can contain multiple keyspaces.
2. Keyspace: the container for column families, equivalent to a schema or database in a relational database.
3. ColumnFamily: the container for columns, similar to a table in a relational database.
4. SuperColumn: a special column whose value can contain multiple columns.
5. Column: the most basic unit of Cassandra data, composed of a name, a value, and a timestamp.
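One way to picture these concepts is as nested maps. The sketch below is illustrative only: `make_column` is a hypothetical helper, and the `Super1` column family with its `address` and `city` entries is made-up sample data. The standard path has four levels (keyspace, column family, row key, column); a super column adds a fifth.

```python
import time


def make_column(name, value, timestamp=None):
    """Build a column: the basic unit, made of a name, a value, and a timestamp."""
    if timestamp is None:
        timestamp = int(time.time() * 1_000_000)  # microseconds, an assumption
    return {"name": name, "value": value, "timestamp": timestamp}


# cluster -> keyspace -> column family -> row key -> column
cluster = {
    "Keyspace1": {
        "Standard2": {                      # ordinary column family
            "studentA": {                   # row key
                "age": make_column("age", "18"),
            },
        },
        "Super1": {                         # super column family (sample name)
            "studentA": {                   # row key
                "address": {                # super column: holds several columns
                    "city": make_column("city", "Beijing"),
                },
            },
        },
    },
}

# Four-dimensional lookup: keyspace, column family, key, column.
age = cluster["Keyspace1"]["Standard2"]["studentA"]["age"]["value"]

# Five-dimensional lookup: the super column adds one more level.
city = cluster["Keyspace1"]["Super1"]["studentA"]["address"]["city"]["value"]
```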
The following is an example of Data Model Analysis:
Figure 2. Data Model instance analysis
Cassandra node installation and configuration
Obtain Cassandra
# wget http://labs.renren.com/apache-mirror/cassandra/0.6.0/apache-cassandra-0.6.0-rc1-bin.tar.gz
# tar -zxvf apache-cassandra-0.6.0-rc1-bin.tar.gz
# mv apache-cassandra-0.6.0-rc1 cassandra
# ls cassandra
Cassandra directory description
Directory | Description
bin | Scripts for operating Cassandra
conf | Configuration files
interface | Cassandra's Thrift interface definition file, which can be used to generate interface code for various programming languages
javadoc | Javadoc of the source code
lib | JAR packages required by Cassandra at runtime
Prepare the data storage directory and log directory of the Cassandra Node
Modify the configuration file storage-conf.xml:
Default content
<CommitLogDirectory>/var/lib/cassandra/commitlog</CommitLogDirectory>
<DataFileDirectories>
    <DataFileDirectory>/var/lib/cassandra/data</DataFileDirectory>
</DataFileDirectories>
Configured content
<CommitLogDirectory>/data3/db/lib/cassandra/commitlog</CommitLogDirectory>
<DataFileDirectories>
    <DataFileDirectory>/data3/db/lib/cassandra/data</DataFileDirectory>
</DataFileDirectories>
Modify the logging configuration file log4j.properties:
log4j.properties configuration
# Log path
# log4j.appender.R.File=/var/log/cassandra/system.log
# The configured log path:
log4j.appender.R.File=/data3/db/log/cassandra/system.log
Create the directories for storing data and logs
# mkdir -p /data3/db/lib/cassandra
# mkdir -p /data3/db/log/cassandra
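If you prefer to derive the directories from the configuration instead of typing them by hand, a small script can read them out of storage-conf.xml. The following is a hedged sketch: the function names and the `prefix` parameter (used here only to try the script in a sandbox directory) are hypothetical, and the `<Storage>` wrapper element in the sample is an assumption about the file's root element.

```python
import os
import tempfile
import xml.etree.ElementTree as ET

SAMPLE_CONF = """
<Storage>
  <CommitLogDirectory>/data3/db/lib/cassandra/commitlog</CommitLogDirectory>
  <DataFileDirectories>
    <DataFileDirectory>/data3/db/lib/cassandra/data</DataFileDirectory>
  </DataFileDirectories>
</Storage>
"""


def storage_directories(conf_xml):
    """Return the commit log and data directories named in the configuration."""
    root = ET.fromstring(conf_xml)
    paths = []
    for tag in ("CommitLogDirectory", "DataFileDirectory"):
        for element in root.iter(tag):
            if element.text and element.text.strip():
                paths.append(element.text.strip())
    return paths


def prepare_directories(conf_xml, prefix=""):
    """Create every configured directory (like mkdir -p) and return the paths.

    `prefix` is a hypothetical convenience for dry runs: when set, the
    directories are created underneath it instead of at the filesystem root.
    """
    created = []
    for path in storage_directories(conf_xml):
        target = os.path.join(prefix, path.lstrip("/")) if prefix else path
        os.makedirs(target, exist_ok=True)
        created.append(target)
    return created


if __name__ == "__main__":
    # Try it out in a throwaway sandbox rather than touching /data3.
    with tempfile.TemporaryDirectory() as sandbox:
        for directory in prepare_directories(SAMPLE_CONF, prefix=sandbox):
            print("created", directory)
```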
After the configuration is complete, start Cassandra
# bin/cassandra
Displayed information
INFO 09:29:12,888 Starting up server gossip
INFO 09:29:12,992 Binding thrift service to localhost/127.0.0.1:9160
When these two lines appear in the output, Cassandra has started successfully.
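Besides watching the log output, you can confirm from a script that something is accepting connections on the Thrift port. The sketch below uses only Python's standard `socket` module; the function name `thrift_port_open` is hypothetical, and the check proves only that a TCP connection succeeds, not that the Thrift service behind it is healthy.

```python
import socket


def thrift_port_open(host="localhost", port=9160, timeout=2.0):
    """Return True if a TCP connection to host:port can be established."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Connection refused, timed out, or host unreachable.
        return False


if __name__ == "__main__":
    state = "open" if thrift_port_open() else "not reachable"
    print("Thrift port 9160 on localhost is", state)
```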
Connect to Cassandra and add and obtain data
The bin directory of Cassandra includes the command-line connection tool cassandra-cli, which can be used to connect to Cassandra and to add and read data.
Connect to Cassandra and add and read data
# bin/cassandra-cli --host localhost --port 9160
Connected to: "Test Cluster" on localhost/9160
Welcome to cassandra CLI.
Type 'help' or '?' for help. Type 'quit' or 'exit' to quit.
cassandra>
cassandra> set Keyspace1.Standard2['studentA']['age'] = '18'
Value inserted
cassandra> get Keyspace1.Standard2['studentA']
=> (column=age, value=18, timestamp=1272357045192000)
Returned 1 results
Stop the Cassandra service
Find the PID of the Cassandra process (16328 in this example) and kill it:
# ps -ef | grep cassandra
# kill 16328
Overview of the settings in the Cassandra configuration file storage-conf.xml
Listing 1. storage-conf.xml node configuration
<!-- Name of the cluster, displayed when a node joins the cluster -->
<ClusterName>Test Cluster</ClusterName>
<!-- Whether the node is automatically added to the cluster when it starts. Default: false -->
<AutoBootstrap>false</AutoBootstrap>
<!-- Seed nodes of the cluster -->
<Seeds>
    <Seed>127.0.0.1</Seed>
</Seeds>
<!-- Address on which this node listens for communication with other nodes -->
<ListenAddress>localhost</ListenAddress>
<!-- Address on which the Thrift-based Cassandra client interface listens. In a cluster, set it to 0.0.0.0 to listen on all interfaces. Default: localhost -->
<ThriftAddress>localhost</ThriftAddress>
<!-- Port for client connections -->
<ThriftPort>9160</ThriftPort>
<!-- Buffer sizes used when flushing Memtables to disk: FlushDataBufferSizeInMB for the data (32 MB by default) and FlushIndexBufferSizeInMB for the index (8 MB by default) -->
<FlushDataBufferSizeInMB>32</FlushDataBufferSizeInMB>
<FlushIndexBufferSizeInMB>8</FlushIndexBufferSizeInMB>
<!-- Commit log sync mode. Default: periodic, which is tuned with CommitLogSyncPeriodInMS. When set to batch, the corresponding setting is CommitLogSyncBatchWindowInMS -->
<CommitLogSync>periodic</CommitLogSync>
<!-- In periodic mode, the commit log is synced every 10 seconds by default -->
<CommitLogSyncPeriodInMS>10000</CommitLogSyncPeriodInMS>
<!-- <CommitLogSyncBatchWindowInMS>1</CommitLogSyncBatchWindowInMS> -->
Using Cassandra to store data from common programming languages
To use Cassandra, you usually need the third-party tool Thrift to generate the Cassandra client library files for your language. You can download Thrift from http://incubator.apache.org/thrift and learn how to use it there. The following sections show how Cassandra is used from five common programming languages: Java, PHP, Python, C#, and Ruby.
Using Cassandra from Java
Add libthrift-r917130.jar and apache-cassandra-0.6.0-rc1.jar to the build path in Eclipse.
Database connection: use the open method of TTransport from libthrift-r917130.jar to establish a connection with the Cassandra server (IP: 192.168.10.2, port: 9160).
Database operations: create a client instance with Cassandra.Client, then call its insert method to write data and its get method to read data.
Closing the connection: use TTransport's close method to disconnect from the Cassandra server.
Listing 2. Java connects to Cassandra and writes and reads data
package com.test.cassandra;

import java.io.UnsupportedEncodingException;
import org.apache.thrift.transport.TTransport;
import org.apache.thrift.transport.TSocket;
import org.apache.thrift.protocol.TProtocol;
import org.apache.thrift.protocol.TBinaryProtocol;
import org.apache.thrift.TException;
import org.apache.cassandra.thrift.Cassandra;
import org.apache.cassandra.thrift.Column;
import org.apache.cassandra.thrift.ColumnOrSuperColumn;
import org.apache.cassandra.thrift.ColumnPath;
import org.apache.cassandra.thrift.ConsistencyLevel;
import org.apache.cassandra.thrift.InvalidRequestException;
import org.apache.cassandra.thrift.NotFoundException;
import org.apache.cassandra.thrift.TimedOutException;
import org.apache.cassandra.thrift.UnavailableException;

/**
 * Connect a Java client to Cassandra and perform read/write operations
 * @author jimmy
 * @date 2010-04-10
 */
public class JCassandraClient {
    public static void main(String[] args) throws InvalidRequestException,
            NotFoundException, UnavailableException, TimedOutException,
            TException, UnsupportedEncodingException {
        // create a database connection
        TTransport tr = new TSocket("192.168.10.2", 9160);
        TProtocol proto = new TBinaryProtocol(tr);
        Cassandra.Client client = new Cassandra.Client(proto);
        tr.open();

        String keyspace = "Keyspace1";
        String cf = "Standard2";
        String key = "studentA";

        // insert data
        long timestamp = System.currentTimeMillis();
        ColumnPath path = new ColumnPath(cf);
        path.setColumn("age".getBytes("UTF-8"));
        client.insert(keyspace, key, path, "18".getBytes("UTF-8"), timestamp, ConsistencyLevel.ONE);
        path.setColumn("height".getBytes("UTF-8"));
        client.insert(keyspace, key, path, "172".getBytes("UTF-8"), timestamp, ConsistencyLevel.ONE);

        // read data
        path.setColumn("height".getBytes("UTF-8"));
        ColumnOrSuperColumn cc = client.get(keyspace, key, path, ConsistencyLevel.ONE);
        Column c = cc.getColumn();
        String v = new String(c.value, "UTF-8");

        // close the database connection
        tr.close();
    }
}
Using Cassandra from PHP
To use Cassandra in PHP code, you first use Thrift to generate the required PHP files: thrift --gen php interface/cassandra.thrift. The generated PHP files provide the functions needed to connect to Cassandra and to read and write data.
Listing 3. PHP connects to Cassandra and writes and reads data
<?php
$GLOBALS['THRIFT_ROOT'] = '/usr/share/php/Thrift';
require_once $GLOBALS['THRIFT_ROOT'].'/packages/cassandra/Cassandra.php';
require_once $GLOBALS['THRIFT_ROOT'].'/packages/cassandra/cassandra_types.php';
require_once $GLOBALS['THRIFT_ROOT'].'/transport/TSocket.php';
require_once $GLOBALS['THRIFT_ROOT'].'/protocol/TBinaryProtocol.php';
require_once $GLOBALS['THRIFT_ROOT'].'/transport/TFramedTransport.php';
require_once $GLOBALS['THRIFT_ROOT'].'/transport/TBufferedTransport.php';

try {
    // create a Cassandra connection
    $socket = new TSocket('192.168.10.2', 9160);
    $transport = new TBufferedTransport($socket, 1024, 1024);
    $protocol = new TBinaryProtocolAccelerated($transport);
    $client = new CassandraClient($protocol);
    $transport->open();

    $keyspace = 'Keyspace1';
    $keyUser = "studentA";
    $columnPath = new cassandra_ColumnPath();
    $columnPath->column_family = 'Standard1';
    $columnPath->super_column = null;
    $columnPath->column = 'age';
    $consistency_level = cassandra_ConsistencyLevel::ZERO;
    $timestamp = time();
    $value = "18";

    // write data
    $client->insert($keyspace, $keyUser, $columnPath, $value, $timestamp, $consistency_level);

    $columnParent = new cassandra_ColumnParent();
    $columnParent->column_family = "Standard1";
    $columnParent->super_column = null;
    $sliceRange = new cassandra_SliceRange();
    $sliceRange->start = "";
    $sliceRange->finish = "";
    $predicate = new cassandra_SlicePredicate();
    $predicate->slice_range = $sliceRange;
    $consistency_level = cassandra_ConsistencyLevel::ONE;
    $keyUser = "studentA";

    // query data
    $result = $client->get_slice($keyspace, $keyUser, $columnParent, $predicate, $consistency_level);

    // close the connection
    $transport->close();
} catch (TException $tx) {
    // report the error instead of silently ignoring it
    print 'TException: '.$tx->getMessage()."\n";
}
?>
Using Cassandra from Python
To use Cassandra from Python, you first use Thrift to generate the Python library: thrift --gen py interface/cassandra.thrift. Then import the generated library into your Python code. The generated library provides the methods needed to establish a connection with Cassandra and to read and write data.
Listing 4. Python connects to Cassandra and writes and reads data
from thrift import Thrift
from thrift.transport import TTransport
from thrift.transport import TSocket
from thrift.protocol import TBinaryProtocol
from cassandra import Cassandra
from cassandra.ttypes import *
import time
import pprint

def main():
    socket = TSocket.TSocket("192.168.10.2", 9160)
    transport = TTransport.TBufferedTransport(socket)
    protocol = TBinaryProtocol.TBinaryProtocolAccelerated(transport)
    client = Cassandra.Client(protocol)
    pp = pprint.PrettyPrinter(indent=2)
    keyspace = "Keyspace1"
    column_path = ColumnPath(column_family="Standard1", column="age")
    key = "studentA"
    value = "18"
    # the Thrift interface expects an integer timestamp
    timestamp = int(time.time())
    try:
        # open the database connection
        transport.open()
        # write data
        client.insert(keyspace, key, column_path, value, timestamp, ConsistencyLevel.ZERO)
        # query data
        column_parent = ColumnParent(column_family="Standard1")
        slice_range = SliceRange(start="", finish="")
        predicate = SlicePredicate(slice_range=slice_range)
        result = client.get_slice(keyspace, key, column_parent, predicate, ConsistencyLevel.ONE)
        pp.pprint(result)
    except Thrift.TException as tx:
        print('Thrift: %s' % tx.message)
    finally:
        # close the connection
        transport.close()

if __name__ == '__main__':
    main()
Using Cassandra from C#
To use Cassandra from C#, thrift.exe is required to generate the code for a dynamic link library: thrift.exe --gen csharp interface/cassandra.thrift. The resulting DLL provides the classes and methods needed to establish a connection with Cassandra and to read and write data. Reference the DLL in your development environment to use it.
Listing 5. C# connects to Cassandra and writes and reads data
namespace CshareCassandra
{
    using System;
    using System.Collections.Generic;
    using System.Diagnostics;
    using Apache.Cassandra;
    using Thrift.Protocol;
    using Thrift.Transport;

    class CassandraClient
    {
        static void Main(string[] args)
        {
            // create a database connection
            TTransport transport = new TSocket("192.168.10.2", 9160);
            TProtocol protocol = new TBinaryProtocol(transport);
            Cassandra.Client client = new Cassandra.Client(protocol);
            transport.Open();

            System.Text.Encoding utf8Encoding = System.Text.Encoding.UTF8;
            // milliseconds since the Unix epoch; DateTime.Now.Millisecond
            // would only return the 0-999 millisecond component
            long timeStamp = (long)(DateTime.UtcNow - new DateTime(1970, 1, 1)).TotalMilliseconds;
            ColumnPath nameColumnPath = new ColumnPath()
            {
                Column_family = "Standard1",
                Column = utf8Encoding.GetBytes("age")
            };

            // write data
            client.insert("Keyspace1", "studentA", nameColumnPath,
                utf8Encoding.GetBytes("18"), timeStamp, ConsistencyLevel.ONE);

            // read data
            ColumnOrSuperColumn returnedColumn = client.get("Keyspace1", "studentA",
                nameColumnPath, ConsistencyLevel.ONE);
            Console.WriteLine("Keyspace1/Standard1: age: {0}, value: {1}",
                utf8Encoding.GetString(returnedColumn.Column.Name),
                utf8Encoding.GetString(returnedColumn.Column.Value));

            // close the connection
            transport.Close();
        }
    }
}
Using Cassandra from Ruby
To use Cassandra from Ruby, install the gem first. Installation command: gem install cassandra
After the installation is complete, open Ruby's IRB and start using Cassandra.
Listing 6. Ruby connects to Cassandra and writes and reads data
> require 'rubygems'
> require 'cassandra'
# create a database connection
> cdb = Cassandra.new('Keyspace1', "192.168.10.1:9160", :retries => 3)
# write data
> cdb.insert(:Standard1, 'studentA', {'age' => '18'})
# read data
> cdb.get(:Standard1, 'studentA')
# close the connection
> cdb.disconnect
Build a Cassandra Cluster Environment
A Cassandra cluster has no central node; all nodes are peers with equal status. The nodes use the gossip protocol to maintain the cluster state among themselves.
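To give a feel for how peers can converge on a shared view without a central node, here is a deliberately simplified gossip sketch. It is not Cassandra's actual protocol: the `Node` class, the `(version, status)` tuples, and the rule that each node contacts one random peer per round are all assumptions made for illustration.

```python
import random


class Node:
    """A peer that tracks what it believes about every endpoint in the cluster."""

    def __init__(self, name):
        self.name = name
        # endpoint name -> (version, status); a node starts out knowing only itself
        self.view = {name: (1, "Up")}

    def merge(self, other_view):
        # Keep whichever entry carries the higher version number.
        for endpoint, (version, status) in other_view.items():
            known = self.view.get(endpoint)
            if known is None or known[0] < version:
                self.view[endpoint] = (version, status)


def gossip_round(nodes, rng):
    """Each node exchanges state with one randomly chosen peer."""
    for node in nodes:
        peers = [n for n in nodes if n is not node]
        peer = rng.choice(peers)
        # Two-way exchange: both sides end up with the newer entries.
        peer.merge(node.view)
        node.merge(peer.view)


if __name__ == "__main__":
    cluster = [Node("192.168.10.2"), Node("192.168.10.3")]
    gossip_round(cluster, random.Random())
    for member in cluster:
        print(member.name, "knows about", sorted(member.view))
```

Because every exchange keeps only the entry with the higher version, stale information is overwritten no matter which peer delivers the update.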
The following example uses two Linux servers on which the Cassandra environment has already been set up and the required ports opened:
Server name | Port | IP address
ServiceA | | 192.168.10.3
ServiceB | | 192.168.10.2
Configure the storage-conf.xml file on ServiceA and ServiceB
ServiceA configuration
<Seeds>
    <Seed>192.168.10.3</Seed>
</Seeds>
<ListenAddress>192.168.10.3</ListenAddress>
<ThriftAddress>0.0.0.0</ThriftAddress>
ServiceB configuration
<Seeds>
    <Seed>192.168.10.3</Seed>
    <Seed>192.168.10.2</Seed>
</Seeds>
<ListenAddress>192.168.10.2</ListenAddress>
<ThriftAddress>0.0.0.0</ThriftAddress>
After the configuration is complete, start the Cassandra service on ServiceA and on ServiceB.
Check whether ServiceA and ServiceB have formed a cluster by using the nodetool client command that ships with Cassandra:
bin/nodetool --host 192.168.10.2 ring
If the cluster was formed successfully, output similar to the following is returned:
Address       Status  Load     Range                                     Ring
                               106218876142754404016344802054916108445
192.168.10.2  Up      2.55 KB  31730917190839729088079827277059909532    |<--|
192.168.10.3  Up      3.26 KB  106218876142754404016344802054916108445   |-->|
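The Range column can be read as a token ring: each node is responsible for the tokens from just after the previous node's token up to and including its own, wrapping around at the end. The sketch below illustrates that lookup with the two tokens from the output above. It rests on assumptions not stated in this article: that keys are mapped to tokens with an MD5-based hash (as a random partitioner would do), and that only the primary owner matters (replication is ignored). The function names are hypothetical.

```python
import bisect
import hashlib

# (token, address) pairs taken from the nodetool ring output above
RING = [
    (31730917190839729088079827277059909532, "192.168.10.2"),
    (106218876142754404016344802054916108445, "192.168.10.3"),
]


def token_for_key(key):
    """Assumed hashing scheme: absolute value of the MD5 digest as a signed integer."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return abs(int.from_bytes(digest, byteorder="big", signed=True))


def owner_of_token(token, ring=RING):
    """Return the first node whose token is >= the given token, wrapping around."""
    ordered = sorted(ring)
    tokens = [t for t, _ in ordered]
    index = bisect.bisect_left(tokens, token)
    return ordered[index % len(ordered)][1]


def owner_of_key(key, ring=RING):
    return owner_of_token(token_for_key(key), ring)


if __name__ == "__main__":
    for row_key in ("studentA", "studentAR"):
        print(row_key, "->", owner_of_key(row_key))
```

A token larger than the highest token on the ring wraps around to the node with the lowest token, which is why the ring in the output is drawn as a closed loop.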
Use Cassandra command line tool for cluster Testing
To connect to servicea from serviceb, run the following command:
cassandra-cli -host 192.168.10.3 -port 9160
Listing 7. Cluster test 1
Write data to the cluster
ServiceA connects to ServiceA:
# set Keyspace1.Standard2['studentaa']['a2a'] = 'a2a'
ServiceB connects to ServiceA:
# set Keyspace1.Standard2['studentba']['b2a'] = 'b2a'
ServiceA connects to ServiceB:
# set Keyspace1.Standard2['studentab']['a2b'] = 'a2b'
Read data from the cluster
ServiceA connects to ServiceA:
# get Keyspace1.Standard2['studentaa'], get Keyspace1.Standard2['studentba'], get Keyspace1.Standard2['studentab']
ServiceB connects to ServiceA:
# get Keyspace1.Standard2['studentaa'], get Keyspace1.Standard2['studentba'], get Keyspace1.Standard2['studentab']
ServiceA connects to ServiceB:
# get Keyspace1.Standard2['studentaa'], get Keyspace1.Standard2['studentba'], get Keyspace1.Standard2['studentab']
Listing 8. Cluster test 2
Stop the Cassandra service on ServiceA, connect from ServiceA to ServiceB, and write data:
# set Keyspace1.Standard2['studentAR']['A2R'] = 'a2R'
Start ServiceA again and connect to ServiceA itself to read the data that was just written through ServiceB:
# bin/cassandra-cli -host 192.168.10.3 -port 9160
# get Keyspace1.Standard2['studentAR']
Summary
The preceding sections described Cassandra's data model, node installation and configuration, the use of Cassandra from common programming languages, and the construction and testing of a Cassandra cluster. Cassandra is a high-performance, decentralized peer-to-peer non-relational database that supports distributed reads and writes. Fields (columns) can be added or removed at will while the system is running, which makes Cassandra a good fit for SNS applications.
Distributed Key-value Storage System: Cassandra entry