Cassandra Data Model

Source: Internet
Author: User
Tags cassandra

Cassandra is an open-source distributed database that combines the key/value model of Dynamo with the column-oriented features of Bigtable.

Cassandra has the following features:

1. Flexible schema: fields can be added or removed easily, without designing the schema in advance as a relational database requires.

2. Range queries: you can query a range of keys.

3. High availability and scalability: the failure of a single node does not affect cluster service (there is no single point of failure, SPOF), and the cluster can be expanded linearly.

We can think of Cassandra's data model as a four-dimensional or five-dimensional hash.

Column

A column is the smallest unit of data in Cassandra. It is a triple consisting of a name, a value, and a timestamp.

A column is displayed in JSON format as follows:

 
    { // This is a column
        name: "Jing Han's world",
        value: "gpcuster@gmail.com",
        timestamp: 123456789
    }

For simplicity, we can ignore the timestamp and think of a column as a name/value pair.

Note that both the name and the value are of type byte[] and are not limited in length.
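The triple described above can be sketched in Python as follows. This is only an illustration of the structure, not Cassandra's own API: the Column class and the make_column helper are hypothetical names, and the microsecond unit for the default timestamp is an assumption.

```python
import time
from typing import NamedTuple, Optional


class Column(NamedTuple):
    """Hypothetical model of a column: a (name, value, timestamp) triple."""
    name: bytes       # names are raw bytes
    value: bytes      # values are raw bytes as well
    timestamp: int    # supplied by the client


def make_column(name: str, value: str, timestamp: Optional[int] = None) -> Column:
    """Encode name and value as bytes and attach a timestamp."""
    if timestamp is None:
        # Assumption: use the current time in microseconds when none is given.
        timestamp = int(time.time() * 1_000_000)
    return Column(name.encode("utf-8"), value.encode("utf-8"), timestamp)


email = make_column("email", "user@example.com", 123456789)
print(email.name, email.value, email.timestamp)
```

Storing the name and value as bytes mirrors the byte[] types mentioned above; how those bytes are interpreted is left to the client.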

Supercolumn

We can think of a supercolumn as an array of columns: it contains a name and a series of columns.

A supercolumn in JSON format looks like this:

 
    { // This is a supercolumn
        name: "Jing Han's world",
        // Contains a series of columns
        value: {
            street: {name: "street", value: "1234 X Street", timestamp: 123456789},
            city: {name: "city", value: "San Francisco", timestamp: 123456789},
            zip: {name: "zip", value: "94107", timestamp: 123456789}
        }
    }

 

Both columns and supercolumns are a combination of a name and a value. The biggest difference is that a column's value is a "string", while a supercolumn's value is a map of columns.

Note that the supercolumn itself does not contain a timestamp.
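As a rough sketch of this difference, the Python below models a supercolumn as a name plus a map of columns, with no timestamp of its own. The class names and the build_super_column helper are hypothetical and exist only for illustration.

```python
from typing import Dict, NamedTuple


class Column(NamedTuple):
    """Hypothetical column: name, value, timestamp."""
    name: bytes
    value: bytes
    timestamp: int


class SuperColumn(NamedTuple):
    """Hypothetical supercolumn: a name and a map of columns, no timestamp."""
    name: bytes
    value: Dict[bytes, Column]


def build_super_column(name: str, fields: Dict[str, str], timestamp: int) -> SuperColumn:
    """Wrap each field in a Column and key the map by the column name."""
    columns = {}
    for field_name, field_value in fields.items():
        key = field_name.encode("utf-8")
        columns[key] = Column(key, field_value.encode("utf-8"), timestamp)
    return SuperColumn(name.encode("utf-8"), columns)


address = build_super_column(
    "homeAddress",
    {"street": "1234 X Street", "city": "San Francisco", "zip": "94107"},
    123456789,
)
print(address.value[b"city"].value)
```

Each inner column still carries its own timestamp, while the supercolumn that wraps them has none.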

Columnfamily

A columnfamily is a structure that contains many rows. You can think of it as a table in an RDBMS.

Each row contains the key provided by the client and a series of columns associated with the key.

Let's look at the structure:

    UserProfile = { // This is a columnfamily
        phatduckk: { // This is a row key in the columnfamily
            // These are the columns associated with the key
            username: "gpcuster",
            email: "gpcuster@gmail.com",
            phone: "6666"
        }, // The first row ends
        ieure: { // This is another row key in the columnfamily
            // These are the columns associated with this key
            username: "pengguo",
            email: "pengguo@live.com",
            phone: "888",
            age: "66"
        }
    }

A columnfamily can be of type standard or super.

The example we just saw is a standard columnfamily. A standard columnfamily contains a series of columns (not supercolumns).

A super columnfamily contains a series of supercolumns in each row. Nesting stops there: a super columnfamily cannot in turn contain standard columnfamilies.

This is a simple example:

 
    AddressBook = { // This is a super columnfamily
        phatduckk: { // Key
            friend1: {street: "8th Street", zip: "90210", city: "Beverley Hills", state: "CA"},
            John: {street: "Howard Street", zip: "94404", city: "FC", state: "CA"},
            Kim: {street: "X Street", zip: "87876", city: "Bils", state: "VA"},
            Todd: {street: "Jerry Street", zip: "54556", city: "Cartoon", state: "CO"},
            Bob: {street: "Q Blvd", zip: "24252", city: "Nowhere", state: "MN"},
            ...
        }, // Row ends
        ieure: { // Key
            Joey: {street: "A Ave", zip: "55485", city: "Hell", state: "NV"},
            William: {street: "Armpit Dr", zip: "93301", city: "Bakersfield", state: "CA"}
        }
    }
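One way to see the difference between the two columnfamily types is to count the levels of nesting. The sketch below uses plain Python dictionaries as a stand-in for the structures above. The data is abbreviated, and the depth helper is a hypothetical function written for this illustration.

```python
# Standard columnfamily: row key -> column name -> value
user_profile = {
    "phatduckk": {"username": "gpcuster", "phone": "6666"},
    "ieure": {"username": "pengguo", "phone": "888", "age": "66"},
}

# Super columnfamily: row key -> supercolumn name -> column name -> value
address_book = {
    "phatduckk": {
        "John": {"street": "Howard Street", "zip": "94404"},
        "Kim": {"street": "X Street", "zip": "87876"},
    },
}


def depth(mapping):
    """Return how many levels of dictionaries are nested in mapping."""
    if not isinstance(mapping, dict) or not mapping:
        return 0
    return 1 + max(depth(child) for child in mapping.values())


print(depth(user_profile))   # levels in the standard columnfamily
print(depth(address_book))   # levels in the super columnfamily

# Rows in the same columnfamily need not share the same columns.
print(set(user_profile["ieure"]) - set(user_profile["phatduckk"]))
```

The super columnfamily has exactly one more level than the standard one, and the last line shows the flexible schema at work: one row has an age column that the other lacks.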

Keyspace

The keyspace is the outermost layer of our data. Every columnfamily belongs to a specific keyspace. In general, an application has only one keyspace.
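With the keyspace in place, the "four-dimensional or five-dimensional hash" view of the data model can be sketched as nested Python dictionaries. The lookup function and the sample data are hypothetical and only illustrate the addressing scheme; this is not how a Cassandra client is actually called.

```python
def lookup(store, keyspace, column_family, key, *path):
    """Walk keyspace -> columnfamily -> row key -> (supercolumn ->) column."""
    node = store[keyspace][column_family][key]
    for name in path:
        node = node[name]
    return node


store = {
    "Keyspace1": {
        # Standard columnfamily: four dimensions down to a value
        "UserProfile": {"phatduckk": {"username": "gpcuster"}},
        # Super columnfamily: five dimensions down to a value
        "AddressBook": {"phatduckk": {"John": {"city": "FC"}}},
    }
}

print(lookup(store, "Keyspace1", "UserProfile", "phatduckk", "username"))
print(lookup(store, "Keyspace1", "AddressBook", "phatduckk", "John", "city"))
```

A value in a standard columnfamily is reached with four names, and a value in a super columnfamily needs a fifth, the supercolumn name.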

Simple Test

After starting Cassandra, open the command-line client and run the following commands:

    cassandra> set Keyspace1.Standard1['jsmith']['first'] = 'John'
    Value inserted.
    cassandra> set Keyspace1.Standard1['jsmith']['last'] = 'Smith'
    Value inserted.
    cassandra> set Keyspace1.Standard1['jsmith']['age'] = '42'
    Value inserted.

At this point, Cassandra holds three columns in the row with the key jsmith.

In each command, Keyspace1 is the keyspace, Standard1 is the columnfamily, jsmith is the row key, the second bracketed name (first, last, or age) is the column name, and the quoted string after the equals sign is the column value.

Next, we query the row:

    cassandra> get Keyspace1.Standard1['jsmith']
    (column=age, value=42; timestamp=1249930062801)
    (column=first, value=John; timestamp=1249930053103)
    (column=last, value=Smith; timestamp=1249930058345)
    Returned 3 rows.

In this way, we can query the inserted data.
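To make the session above more concrete, here is a small in-memory imitation of the set and get commands in Python. TinyStore is a hypothetical class written for this sketch. It is not a Cassandra client and it ignores distribution, persistence, and everything else a real cluster does.

```python
import time


class TinyStore:
    """Hypothetical in-memory imitation of the CLI's set/get commands."""

    def __init__(self):
        self._data = {}

    def set(self, keyspace, column_family, key, column, value):
        """Store one column (value plus a millisecond timestamp) in a row."""
        families = self._data.setdefault(keyspace, {})
        rows = families.setdefault(column_family, {})
        row = rows.setdefault(key, {})
        row[column] = (value, int(time.time() * 1000))

    def get(self, keyspace, column_family, key):
        """Return the row's columns as (name, value, timestamp), sorted by name."""
        row = self._data[keyspace][column_family][key]
        return [(name, value, ts) for name, (value, ts) in sorted(row.items())]


store = TinyStore()
store.set("Keyspace1", "Standard1", "jsmith", "first", "John")
store.set("Keyspace1", "Standard1", "jsmith", "last", "Smith")
store.set("Keyspace1", "Standard1", "jsmith", "age", "42")

for name, value, ts in store.get("Keyspace1", "Standard1", "jsmith"):
    print(f"(column={name}, value={value}; timestamp={ts})")
```

The three set calls all land in the same row, and get returns the columns ordered by name (age, first, last) rather than in insertion order, just as in the session above.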

Sort

Keep in mind that Cassandra sorts data when it is written.

All columns in a row are sorted by their names. The sort type is specified in the storage-conf.xml file.

Currently, Cassandra provides the following sort types: BytesType, UTF8Type, LexicalUUIDType, TimeUUIDType, AsciiType, and LongType.

Assume that your raw data is as follows:

    {name: 123, value: "Hello there"},
    {name: 832416, value: "kjjkbcjkcbbd"},
    {name: 3, value: "101010101010"},
    {name: 976, value: "kjjkbcjkcbbd"}

When we specify LongType as the sort type in the storage-conf.xml file:

    <!--
        ColumnFamily defined in storage-conf.xml
    -->
    <ColumnFamily CompareWith="LongType" Name="CF_NAME_HERE"/>

The sorted data is as follows:

    {name: 3, value: "101010101010"},
    {name: 123, value: "Hello there"},
    {name: 976, value: "kjjkbcjkcbbd"},
    {name: 832416, value: "kjjkbcjkcbbd"}

If the sort type is UTF8Type:

    <!--
        ColumnFamily defined in storage-conf.xml
    -->
    <ColumnFamily CompareWith="UTF8Type" Name="CF_NAME_HERE"/>

The sorted data is as follows:

    {name: 123, value: "Hello there"},
    {name: 3, value: "101010101010"},
    {name: 832416, value: "kjjkbcjkcbbd"},
    {name: 976, value: "kjjkbcjkcbbd"}

As you can see, different sort types produce completely different results.
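The two orderings can be reproduced with a few lines of Python. This is only an approximation of the comparators' behavior for these four names: comparing as integers stands in for the long-integer sort type, and comparing the UTF-8 encoded text stands in for the UTF-8 string sort type.

```python
names = [123, 832416, 3, 976]

# Numeric comparison: the names are ordered by their integer value.
as_long = sorted(names)

# String comparison: the names are ordered character by character,
# so "123" comes before "3" because "1" sorts before "3".
as_utf8 = sorted(names, key=lambda n: str(n).encode("utf-8"))

print(as_long)  # [3, 123, 976, 832416]
print(as_utf8)  # [123, 3, 832416, 976]
```

The same four names end up in a different order simply because the comparison rule changed, which is why the sort type should match the kind of column names the application writes.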

Supercolumns add one more dimension to sort, so we can also specify CompareSubcolumnsWith to control how the columns inside each supercolumn are sorted.

Assume that our raw data is as follows:

    { // First supercolumn from a row
        name: "workaddress",
        // And the columns within it
        value: {
            street: {name: "street", value: "1234 X Street"},
            city: {name: "city", value: "San Francisco"},
            zip: {name: "zip", value: "94107"}
        }
    },
    { // Another supercolumn from the same row
        name: "homeaddress",
        // And the columns within it
        value: {
            street: {name: "street", value: "1234 X Street"},
            city: {name: "city", value: "San Francisco"},
            zip: {name: "zip", value: "94107"}
        }
    }

If we set both CompareSubcolumnsWith and CompareWith to UTF8Type, the sorted result is:

    { // This supercolumn comes first because "homeaddress" sorts before "workaddress" as a UTF-8 string
        name: "homeaddress",
        // The columns within this supercolumn are sorted by their names too
        value: {
            city: {name: "city", value: "San Francisco"},
            street: {name: "street", value: "1234 X Street"},
            zip: {name: "zip", value: "94107"}
        }
    },
    {
        name: "workaddress",
        // The columns within this supercolumn are sorted by their names too
        value: {
            city: {name: "city", value: "San Francisco"},
            street: {name: "street", value: "1234 X Street"},
            zip: {name: "zip", value: "94107"}
        }
    }
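The two sorting dimensions can be sketched in Python as a nested sort: the outer sort orders the supercolumns by name and the inner sort orders the columns inside each supercolumn. The sort_row function is a hypothetical helper, and plain string comparison is used here as a stand-in for a UTF-8 string comparator.

```python
row = {
    "workaddress": {"street": "1234 X Street", "city": "San Francisco", "zip": "94107"},
    "homeaddress": {"street": "1234 X Street", "city": "San Francisco", "zip": "94107"},
}


def sort_row(row):
    """Sort supercolumns by name, then the columns inside each one by name."""
    return {
        super_name: dict(sorted(columns.items()))        # inner dimension
        for super_name, columns in sorted(row.items())   # outer dimension
    }


sorted_row = sort_row(row)
for super_name, columns in sorted_row.items():
    print(super_name, list(columns))
```

Python dictionaries keep insertion order, so iterating over sorted_row visits the supercolumns and their columns in sorted order.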

Additionally, Cassandra's sorting is pluggable: we can implement our own sort type by writing a class based on org.apache.cassandra.db.marshal.IType.
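A real custom sort type would be written in Java against Cassandra's own classes. Purely to illustrate the idea of a pluggable comparator, the Python sketch below defines a hypothetical case-insensitive comparison over byte-string names and plugs it into a sort. None of these names come from Cassandra.

```python
from functools import cmp_to_key


def compare_case_insensitive(a: bytes, b: bytes) -> int:
    """Return a negative, zero, or positive number, ignoring letter case."""
    left = a.decode("utf-8").lower()
    right = b.decode("utf-8").lower()
    return (left > right) - (left < right)


names = [b"Zip", b"city", b"street"]

# Plain byte order puts the uppercase "Z" before the lowercase letters.
plain = sorted(names)

# The custom comparator treats "Zip" as "zip", so it moves to the end.
custom = sorted(names, key=cmp_to_key(compare_case_insensitive))

print(plain)   # [b'Zip', b'city', b'street']
print(custom)  # [b'city', b'street', b'Zip']
```

The comparator receives the raw bytes of two column names and decides their order, which is the same contract a custom sort type has to fulfil.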

References

WTF is a supercolumn? An intro to the Cassandra Data Model

Datamodel

 

For more articles about Cassandra, see: http://www.cnblogs.com/gpcuster/tag/Cassandra/
