Cassandra Study Notes 1

Source: Internet
Author: User
Tags: cassandra

Recently, I tried to build a cloud storage platform. After comparing several options, I decided to use Cassandra as the underlying database. These notes record what I learned about Cassandra along the way.

Cassandra is a hybrid non-relational database that is distributed, column-oriented, and highly scalable. It is not a single database server but a distributed network service made up of many database nodes. A write to Cassandra is copied to other nodes, and a read is likewise routed to a node to be served. The Cassandra cluster has no central node; every node has the same status.
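As a rough picture of how a write can be copied to several equal peers without any central node, here is a minimal sketch in Python. It is a toy model and not Cassandra's actual code: the Ring class, the node names, and the use of MD5 as the hash are assumptions made for illustration. Each node and each key is hashed onto a ring, and a key is stored on the first few nodes found clockwise from its position.

```python
import bisect
import hashlib


def token(value: str) -> int:
    """Map a string to a position on the ring (MD5 is only an illustrative choice)."""
    return int(hashlib.md5(value.encode("utf-8")).hexdigest(), 16)


class Ring:
    """Toy ring of equal peers; every node can compute the same placement."""

    def __init__(self, nodes, replication_factor=3):
        if not nodes:
            raise ValueError("at least one node is required")
        self.replication_factor = replication_factor
        # Sort the nodes by their position on the ring.
        self.tokens = sorted((token(name), name) for name in nodes)

    def replicas(self, key: str):
        """Return the nodes that store `key`, walking clockwise from its token."""
        positions = [position for position, _ in self.tokens]
        start = bisect.bisect_right(positions, token(key)) % len(self.tokens)
        count = min(self.replication_factor, len(self.tokens))
        return [self.tokens[(start + i) % len(self.tokens)][1] for i in range(count)]


if __name__ == "__main__":
    # Hypothetical node names; any node could run this lookup itself.
    ring = Ring(["node-a", "node-b", "node-c", "node-d"], replication_factor=3)
    print(ring.replicas("student:1"))
```

Because placement depends only on the hash values, no coordinator has to be asked where a key lives, which is the property the paragraph above describes.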

Cassandra's architecture is fully peer-to-peer and based on a DHT (distributed hash table). Compared with traditional sharding-based database clusters, Cassandra can add or remove nodes almost seamlessly, which makes it well suited to scenarios where the number of nodes changes quickly. Data is written to multiple nodes to ensure reliability. Cassandra is also flexible in how it trades off consistency, availability, and partition tolerance (CAP): when reading, you can require that all replicas agree (high consistency), read from a single replica (high availability), or require that a majority of replicas agree (a compromise). This lets Cassandra be used in scenarios with node failures, network failures, and multiple data centers.
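The three choices above correspond to the consistency levels ONE, QUORUM, and ALL. The arithmetic behind them can be sketched as follows. This is a simplified illustration: the function names are made up for this note, and only the three levels mentioned above are handled.

```python
def required_replicas(level: str, replication_factor: int) -> int:
    """Number of replicas that must answer for the given consistency level."""
    level = level.upper()
    if level == "ONE":
        return 1
    if level == "QUORUM":
        return replication_factor // 2 + 1  # strict majority
    if level == "ALL":
        return replication_factor
    raise ValueError("unsupported level: " + level)


def tolerated_failures(level: str, replication_factor: int) -> int:
    """How many replicas may be down while the request can still succeed."""
    return replication_factor - required_replicas(level, replication_factor)


def read_overlaps_write(read_level: str, write_level: str, replication_factor: int) -> bool:
    """True when every read set must share at least one replica with every write set."""
    needed = required_replicas(read_level, replication_factor)
    needed += required_replicas(write_level, replication_factor)
    return needed > replication_factor


if __name__ == "__main__":
    for level in ("ONE", "QUORUM", "ALL"):
        print(level, required_replicas(level, 3), tolerated_failures(level, 3))
    # prints: ONE 1 2, then QUORUM 2 1, then ALL 3 0
```

With three replicas, reading and writing at QUORUM overlap in at least one replica, while reading and writing at ONE do not, which is why ONE favors availability and QUORUM is the compromise.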

First, we will introduce how to start and configure Cassandra.

1. Download Cassandra

The latest version, Cassandra 1.0.8, can be obtained from http://cassandra.apache.org/ as a tar archive. Cassandra is written in Java, so it runs on any platform that has Java. I installed Cassandra on Windows, but the steps are similar on Linux.

2. Configure Environment Variables

Both Java and Cassandra need environment variables. Configuring the Java variables is widely documented elsewhere, so only Cassandra's environment variable is covered here.

Right-click My Computer -> Properties -> Advanced -> Environment Variables, and create a new entry under System variables. Enter CASSANDRA_HOME as the name and the Cassandra directory as the value. For example, my Cassandra is extracted to the root of drive D, so the value is D:\apache-cassandra-1.0.8. After configuration, open a command prompt and run
echo %CASSANDRA_HOME% to check whether the configuration succeeded.
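The same check can be scripted. The following is a minimal sketch, assuming the variable is named CASSANDRA_HOME and that the installation contains the bin, conf, and lib folders used later in these notes; the function name check_cassandra_home is made up for illustration.

```python
import os
from pathlib import Path


def check_cassandra_home(env=None):
    """Return a list of problems found; an empty list means the setup looks fine."""
    env = os.environ if env is None else env
    home = env.get("CASSANDRA_HOME")
    if not home:
        return ["CASSANDRA_HOME is not set"]
    root = Path(home)
    if not root.is_dir():
        return ["CASSANDRA_HOME is not a directory: " + home]
    # Folders these notes rely on: scripts, configuration files, jar files.
    return [
        "missing folder: " + sub
        for sub in ("bin", "conf", "lib")
        if not (root / sub).is_dir()
    ]


if __name__ == "__main__":
    problems = check_cassandra_home()
    print("OK" if not problems else "\n".join(problems))
```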

3. Modify the configuration files

The Cassandra configuration files are located in the conf folder of the Cassandra directory. The two we need to edit are cassandra.yaml and log4j-server.properties. If conf contains an XML file instead of cassandra.yaml, you have an old release; download the latest version of Cassandra.

In log4j-server.properties, set the log4j.appender.R.File property, which is the path to the log file. I set it to D:/apache-cassandra-1.0.8/My/log. Note that this is the path of a file, not of a directory.

cassandra.yaml has several settings to configure.

cluster_name: the name of the cluster. You can keep the default; I changed it to firstcluster. Every node in a cluster must use the same value for this property so that the nodes can find each other.

initial_token: blank by default. It can be set to 0 or another number, but leave it blank for now.

data_file_directories: path of the directory (not a file) that stores the database data files. I set it to D:/apache-cassandra-1.0.8/My/data.

commitlog_directory: directory that stores the commit log.

saved_caches_directory: directory that stores the saved caches.

There are also some properties that do not need to be configured yet. They will be covered later when setting up a cluster.

In addition, it is recommended to use forward slashes in all of the paths above, because a backslash is parsed as a special (escape) character.
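To illustrate the forward-slash recommendation, here is a small sketch that derives the three storage paths from one base directory and prints them in the shape used by cassandra.yaml. The helper names and the subfolder names commitlog and saved_caches are assumptions chosen for this example; only the data path comes from the notes above.

```python
from pathlib import PureWindowsPath


def storage_settings(base_dir: str) -> dict:
    """Build the three storage paths, always written with forward slashes."""
    root = PureWindowsPath(base_dir)
    return {
        "data_file_directories": (root / "data").as_posix(),
        # The two folder names below are hypothetical examples.
        "commitlog_directory": (root / "commitlog").as_posix(),
        "saved_caches_directory": (root / "saved_caches").as_posix(),
    }


def render_fragment(base_dir: str) -> str:
    """Render the settings as text; data_file_directories is a list in cassandra.yaml."""
    settings = storage_settings(base_dir)
    return "\n".join([
        "data_file_directories:",
        "    - " + settings["data_file_directories"],
        "commitlog_directory: " + settings["commitlog_directory"],
        "saved_caches_directory: " + settings["saved_caches_directory"],
    ])


if __name__ == "__main__":
    # Backslashes in the input are converted, so none reach the output.
    print(render_fragment(r"D:\apache-cassandra-1.0.8\My"))
```

PureWindowsPath is used so that the conversion behaves the same way on Linux as on Windows.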

4. Start Cassandra

Open a command prompt, change to the bin folder in the Cassandra directory, and run cassandra.bat (on Linux, run cassandra). Cassandra then starts running. Do not close this command window, or Cassandra will shut down as well.

A message saying that mx4j cannot be found can be ignored. Alternatively, download mx4j-tools.jar and copy it into the lib folder under the Cassandra directory.

Errors that cannot be ignored are usually caused by incorrect environment variables or an incorrect configuration file. Follow the error messages to resolve them.
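Before moving on, it can help to confirm that something is actually listening on the port the client will use. The sketch below assumes port 9160, the port used with cassandra-cli in these notes; the function name is_listening is made up. It only tests that a TCP connection can be opened and says nothing about whether the service behind the port is healthy.

```python
import socket


def is_listening(host: str = "localhost", port: int = 9160, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Connection refused, timed out, or the host name could not be resolved.
        return False


if __name__ == "__main__":
    if is_listening():
        print("Something is listening on localhost:9160")
    else:
        print("Nothing is listening on localhost:9160; check the Cassandra window for errors")
```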

5. Test Cassandra

Open a command prompt, change to the bin folder in the Cassandra directory, and run cassandra-cli.bat. First try to connect to the Cassandra service on the local machine:

connect localhost/9160; (the trailing semicolon is required)

If the connection succeeds, a message reports that you are connected to firstcluster. Then create a keyspace:

create keyspace demo;

This step may fail. The usual error message is that cassandra.yaml cannot be located. If this message appears, copy cassandra.yaml from the conf folder to the bin folder.

After the keyspace is created, switch to the demo keyspace:

use demo;

Create a column family:

create column family student;

Insert a record into it:

set student[utf8('1')][utf8('id')] = utf8('123');

set student[utf8('1')][utf8('name')] = utf8('fykhlp');

Query records:

get student[utf8('1')]; or list student;

6. Cassandra Data Model

1. Keyspace: equivalent to a database in a relational database. The cluster mentioned above sits one level higher and is roughly equivalent to a service name.

2. Column family: equivalent to a table in a relational database. Conceptually it has only two parts, a key and the columns stored under that key, but one key can hold many columns, as in the student column family. In the insert statements above, 1 is the key in the student column family, id is a column name, and 123 is its value.

3. Column: the most basic storage unit in Cassandra, consisting of a name, a value, and a timestamp. The name is the name of the property you inserted, such as age or height, and the value is the value of that property (the two can also be used the other way around). Roughly speaking, all the columns under one key correspond to a record in a relational database, and each column (age, height) corresponds to a field of that record.

4. Super column: a special column whose value can itself contain multiple columns.
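The model above can be pictured as nested maps: a keyspace holds column families, a column family maps each key to its columns, and each column carries a name, a value, and a timestamp. The following is a minimal in-memory sketch of that idea, not Cassandra's storage code; the ColumnFamily class and its methods are made up to mirror the set and get commands used earlier. When the same column is written twice, the sketch keeps the value with the newer timestamp.

```python
import time


class ColumnFamily:
    """Toy column family: key -> {column name -> (value, timestamp)}."""

    def __init__(self, name: str):
        self.name = name
        self.rows = {}

    def set(self, key, column, value, timestamp=None):
        """Store a column; an older timestamp never overwrites a newer one."""
        ts = time.time_ns() // 1000 if timestamp is None else timestamp
        columns = self.rows.setdefault(key, {})
        current = columns.get(column)
        if current is None or ts >= current[1]:
            columns[column] = (value, ts)

    def get(self, key) -> dict:
        """Return the columns of one key as {name: value}, without timestamps."""
        return {name: value for name, (value, _ts) in self.rows.get(key, {}).items()}


if __name__ == "__main__":
    # A keyspace is just a named collection of column families.
    demo = {"student": ColumnFamily("student")}
    student = demo["student"]
    student.set("1", "id", "123")
    student.set("1", "name", "fykhlp")
    print(student.get("1"))  # {'id': '123', 'name': 'fykhlp'}
```

A super column would add one more level of nesting: the value stored under a column name would itself be a map of columns.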
