Cassandra is an open-source distributed database that combines the key/value model of Dynamo with the column-oriented features of Bigtable.
Cassandra has the following features:
1. Flexible schema: fields can be added or removed at any time, without having to design the schema in advance as in a relational database.
2. Range queries: you can query a range of keys.
3. High availability and scalability: there is no single point of failure (SPOF), so the loss of one node does not interrupt cluster service, and the cluster can be scaled out linearly.
Cassandra's data model can be thought of as a four- or five-dimensional hash: a value is located by following a sequence of four or five keys.
Column
A column is the smallest unit of data in Cassandra. It is a triple (3-tuple) made up of a name, a value and a timestamp.
In JSON notation, a column looks like this:
{ // This is a column
name: "Jing Han's world",
value: "gpcuster@gmail.com",
timestamp: 123456789
}
For simplicity, we can ignore the timestamp and think of a column as a name/value pair.
Note that both the name and the value are of type byte[] and are not limited in length.
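As a rough mental model, and not Cassandra's actual Java classes, a column can be pictured as an immutable triple. The sketch below uses a hypothetical `Column` type in which the name and value are raw `bytes`, mirroring the byte[] type described above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Column:
    """Hypothetical model of a Cassandra column: a (name, value, timestamp) triple."""
    name: bytes      # raw bytes, like byte[] in Cassandra
    value: bytes     # raw bytes; how to interpret them is up to the client
    timestamp: int   # supplied with every write


col = Column(b"Jing Han's world", b"gpcuster@gmail.com", 123456789)

# Ignoring the timestamp, a column is just a name/value pair.
name, value = col.name, col.value
print(name.decode("utf-8"), "->", value.decode("utf-8"))
```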
Supercolumn
A supercolumn can be thought of as a collection of columns: it has a name and a set of associated columns.
In JSON notation, a supercolumn looks like this:
{ // This is a supercolumn
name: "Jing Han's world",
// Contains a series of columns
value: {
street: {name: "street", value: "1234 X Street", timestamp: 123456789},
city: {name: "city", value: "San Francisco", timestamp: 123456789},
zip: {name: "zip", value: "94107", timestamp: 123456789}
}
}
Both columns and supercolumns are name/value pairs. The key difference is that a column's value is a "string" (a byte array), whereas a supercolumn's value is a map of columns.
Note that a supercolumn has no timestamp of its own; only the columns inside it carry timestamps.
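Continuing the same informal model, a supercolumn is a name plus a map from column names to columns, with no timestamp field of its own. `Column` and `SuperColumn` here are hypothetical illustration types, and plain `str` is used instead of `bytes` for readability.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass(frozen=True)
class Column:
    """Hypothetical column model: name, value, timestamp."""
    name: str
    value: str
    timestamp: int


@dataclass
class SuperColumn:
    """Hypothetical supercolumn model: a name plus a map of columns.

    There is deliberately no timestamp field; only the inner columns have one.
    """
    name: str
    columns: Dict[str, Column] = field(default_factory=dict)

    def add(self, col: Column) -> None:
        # Columns inside a supercolumn are keyed by their own names.
        self.columns[col.name] = col


sc = SuperColumn("Jing Han's world")
for n, v in [("street", "1234 X Street"), ("city", "San Francisco"), ("zip", "94107")]:
    sc.add(Column(n, v, 123456789))

print(sc.columns["city"].value)
```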
Columnfamily
A columnfamily is a structure that contains many rows; you can think of it as a table in an RDBMS.
Each row consists of a client-supplied key and a set of columns associated with that key.
Let's look at the structure:
UserProfile = { // This is a columnfamily
phatduckk: { // This is a key in the columnfamily
// These are the columns for this key
username: "gpcuster",
email: "gpcuster@gmail.com",
phone: "6666"
}, // End of the first row
ieure: { // This is another key in the columnfamily
// These are the columns for the other key
username: "pengguo",
email: "pengguo@live.com",
phone: "888",
age: "66"
}
}
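The standard columnfamily above maps naturally onto a two-level dictionary: row key, then column name, then value. This sketch is a hypothetical in-memory picture (timestamps omitted), not Cassandra's storage format. It also illustrates the flexible schema: rows in the same columnfamily need not share the same columns.

```python
from typing import Dict, Optional

# Hypothetical model of a standard columnfamily:
# row key -> (column name -> column value). Timestamps are omitted for brevity.
StandardCF = Dict[str, Dict[str, str]]

user_profile: StandardCF = {
    "phatduckk": {
        "username": "gpcuster",
        "email": "gpcuster@gmail.com",
        "phone": "6666",
    },
    "ieure": {
        "username": "pengguo",
        "email": "pengguo@live.com",
        "phone": "888",
        "age": "66",  # only this row has an "age" column
    },
}


def get_column(cf: StandardCF, key: str, column: str) -> Optional[str]:
    """Return a column value, or None if the row or the column does not exist."""
    return cf.get(key, {}).get(column)


print(get_column(user_profile, "ieure", "age"))      # 66
print(get_column(user_profile, "phatduckk", "age"))  # None
```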
A columnfamily can be of type Standard or Super.
The example above is a standard columnfamily: its rows contain columns (not supercolumns).
A super columnfamily contains supercolumns instead. This does not make it a container of standard columnfamilies: each of its rows holds supercolumns, and each supercolumn holds columns.
This is a simple example:
AddressBook = { // This is a super columnfamily
phatduckk: { // Key
friend1: {street: "8th Street", zip: "90210", city: "Beverley Hills", state: "CA"},
John: {street: "Howard Street", zip: "94404", city: "FC", state: "CA"},
Kim: {street: "X Street", zip: "87876", city: "Bils", state: "VA"},
Todd: {street: "Jerry Street", zip: "54556", city: "Cartoon", state: "CO"},
Bob: {street: "Q Blvd", zip: "24252", city: "Nowhere", state: "MN"},
...
}, // End of row
ieure: { // Key
Joey: {street: "A Ave", zip: "55485", city: "Hell", state: "NV"},
William: {street: "Armpit Dr", zip: "93301", city: "Bakersfield", state: "CA"}
}
}
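In a super columnfamily, reading a single value takes one more key than in a standard one: row key, then supercolumn name, then column name. The sketch below is a hypothetical nested-dictionary model that copies only a few entries from the example above.

```python
from typing import Dict, Optional

# Hypothetical model of a super columnfamily:
# row key -> supercolumn name -> column name -> value.
SuperCF = Dict[str, Dict[str, Dict[str, str]]]

address_book: SuperCF = {
    "phatduckk": {
        "friend1": {"street": "8th Street", "zip": "90210", "city": "Beverley Hills", "state": "CA"},
        "John": {"street": "Howard Street", "zip": "94404", "city": "FC", "state": "CA"},
    },
    "ieure": {
        "Joey": {"street": "A Ave", "zip": "55485", "city": "Hell", "state": "NV"},
        "William": {"street": "Armpit Dr", "zip": "93301", "city": "Bakersfield", "state": "CA"},
    },
}


def get_subcolumn(cf: SuperCF, key: str, super_column: str, column: str) -> Optional[str]:
    """Follow row key -> supercolumn -> column; return None if any level is missing."""
    return cf.get(key, {}).get(super_column, {}).get(column)


print(get_subcolumn(address_book, "ieure", "William", "city"))  # Bakersfield
```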
Keyspace
The keyspace is the outermost grouping of data: every columnfamily belongs to a keyspace. Typically, an application uses only one keyspace.
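With keyspaces in place, the "four- or five-dimensional hash" from the introduction becomes concrete. A standard columnfamily value is reached through four keys (keyspace, columnfamily, row key, column), and a super columnfamily value through five (with the supercolumn inserted before the column). The keyspace name `MyApp` and the nested-dict layout are hypothetical and only for illustration.

```python
from typing import Any, Dict, Optional, Sequence

# Hypothetical in-memory picture of the whole data model as nested maps.
data: Dict[str, Any] = {
    "MyApp": {                                   # keyspace (hypothetical name)
        "UserProfile": {                         # standard columnfamily
            "phatduckk": {"email": "gpcuster@gmail.com"},
        },
        "AddressBook": {                         # super columnfamily
            "phatduckk": {"John": {"city": "FC"}},
        },
    },
}


def lookup(path: Sequence[str]) -> Optional[Any]:
    """Walk the nested maps one key at a time; return None if any key is missing."""
    node: Any = data
    for part in path:
        if not isinstance(node, dict) or part not in node:
            return None
        node = node[part]
    return node


# Standard columnfamily: 4 keys (keyspace, columnfamily, row key, column).
print(lookup(["MyApp", "UserProfile", "phatduckk", "email"]))
# Super columnfamily: 5 keys (keyspace, columnfamily, row key, supercolumn, column).
print(lookup(["MyApp", "AddressBook", "phatduckk", "John", "city"]))
```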
Simple Test
Once Cassandra is running, start the command-line client and run the following commands:
cassandra> set Keyspace1.Standard1['jsmith']['first'] = 'John'
Value inserted.
cassandra> set Keyspace1.Standard1['jsmith']['last'] = 'Smith'
Value inserted.
cassandra> set Keyspace1.Standard1['jsmith']['age'] = '42'
Value inserted.
Cassandra now holds three columns of data, all in the row whose key is jsmith.
In each command, Keyspace1 is the keyspace, Standard1 is the columnfamily, 'jsmith' is the row key, 'first', 'last' and 'age' are the column names, and the string on the right-hand side is the column value.
Next, query the row:
cassandra> get Keyspace1.Standard1['jsmith']
(column=age, value=42; timestamp=1249930062801)
(column=first, value=John; timestamp=1249930053103)
(column=last, value=Smith; timestamp=1249930058345)
Returned 3 rows.
In this way, we can query the inserted data.
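To make the set/get round trip concrete, here is a hypothetical in-memory stand-in for Keyspace1.Standard1; it is not the CLI's implementation. It assumes millisecond timestamps, which is what the 13-digit values in the output above look like, and returns columns ordered by name, matching the age/first/last order shown (Python's string ordering agrees with byte order for these ASCII names).

```python
import time
from typing import Dict, List, Tuple

# Hypothetical stand-in for Keyspace1.Standard1:
# row key -> column name -> (value, timestamp).
standard1: Dict[str, Dict[str, Tuple[str, int]]] = {}


def set_column(key: str, column: str, value: str) -> None:
    """Roughly what `set Keyspace1.Standard1[key][column] = value` does."""
    ts = int(time.time() * 1000)  # assumed: milliseconds since the epoch
    standard1.setdefault(key, {})[column] = (value, ts)


def get_row(key: str) -> List[Tuple[str, str, int]]:
    """Roughly what `get Keyspace1.Standard1[key]` shows: columns ordered by name."""
    row = standard1.get(key, {})
    return [(name, v, ts) for name, (v, ts) in sorted(row.items())]


set_column("jsmith", "first", "John")
set_column("jsmith", "last", "Smith")
set_column("jsmith", "age", "42")

for name, value, ts in get_row("jsmith"):
    print(f"(column={name}, value={value}; timestamp={ts})")
```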
Sort
Note that Cassandra sorts data when it is written, not when it is queried.
All columns within a row are sorted by name, and the sort type is specified per columnfamily in the storage-conf.xml file.
Currently, Cassandra provides the following sort types: BytesType, UTF8Type, LexicalUUIDType, TimeUUIDType, AsciiType and LongType.
Assume that your raw data is as follows:
{name: 123, value: "Hello there"},
{name: 832416, value: "kjjkbcjkcbbd"},
{name: 3, value: "101010101010"},
{name: 976, value: "kjjkbcjkcbbd"}
If we specify LongType as the sort type in storage-conf.xml:
<!--
ColumnFamily defined in storage-conf.xml
-->
<ColumnFamily CompareWith="LongType" Name="CF_NAME_HERE"/>
The sorted data is as follows:
{name: 3, value: "101010101010"},
{name: 123, value: "Hello there"},
{name: 976, value: "kjjkbcjkcbbd"},
{name: 832416, value: "kjjkbcjkcbbd"}
If the sort type is UTF8Type instead:
<!--
ColumnFamily defined in storage-conf.xml
-->
<ColumnFamily CompareWith="UTF8Type" Name="CF_NAME_HERE"/>
The sorted data is as follows:
{name: 123, value: "Hello there"},
{name: 3, value: "101010101010"},
{name: 832416, value: "kjjkbcjkcbbd"},
{name: 976, value: "kjjkbcjkcbbd"}
As you can see, different sort types produce completely different orderings: LongType compares the names as numbers, while UTF8Type compares them as strings, so "3" sorts after "123".
Supercolumns add a second sorting dimension, so a super columnfamily can also specify CompareSubcolumnsWith, which controls the order of the columns inside each supercolumn.
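The two orderings above can be reproduced with ordinary sort keys. This is only an approximation under a stated assumption: the names are held as text, LongType is modeled as "compare as integers", and UTF8Type as "compare the UTF-8 bytes lexicographically". Real LongType names are stored as 8-byte longs, which this sketch does not model.

```python
from typing import List

# Column names from the example, held as text for this approximation.
names: List[str] = ["123", "832416", "3", "976"]


def sort_as_long(ns: List[str]) -> List[str]:
    """Approximate LongType: compare names as integers."""
    return sorted(ns, key=int)


def sort_as_utf8(ns: List[str]) -> List[str]:
    """Approximate UTF8Type: compare the UTF-8 encoded bytes lexicographically."""
    return sorted(ns, key=lambda n: n.encode("utf-8"))


print(sort_as_long(names))  # ['3', '123', '976', '832416']
print(sort_as_utf8(names))  # ['123', '3', '832416', '976']
```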
Assume that our raw data is as follows:
{ // First supercolumn from a row
name: "workaddress",
// And the columns within it
value: {
street: {name: "street", value: "1234 X Street"},
city: {name: "city", value: "San Francisco"},
zip: {name: "zip", value: "94107"}
}
},
{ // Another supercolumn from the same row
name: "homeaddress",
// And the columns within it
value: {
street: {name: "street", value: "1234 X Street"},
city: {name: "city", value: "San Francisco"},
zip: {name: "zip", value: "94107"}
}
}
If both CompareWith and CompareSubcolumnsWith are set to UTF8Type, the sorted result is:
{
// This supercolumn comes first because, as UTF8 strings,
// "homeaddress" sorts before "workaddress"
{
name: "homeaddress",
// The columns within this supercolumn are sorted by name too
value: {
city: {name: "city", value: "San Francisco"},
street: {name: "street", value: "1234 X Street"},
zip: {name: "zip", value: "94107"}
}
},
{
name: "workaddress",
// The columns within this supercolumn are sorted by name too
value: {
city: {name: "city", value: "San Francisco"},
street: {name: "street", value: "1234 X Street"},
zip: {name: "zip", value: "94107"}
}
}
}
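The two sorting dimensions are independent: one comparator orders the supercolumns in a row (CompareWith) and another orders the columns inside each supercolumn (CompareSubcolumnsWith). The sketch below models a super columnfamily row as a hypothetical nested dict and approximates UTF8Type with a byte-wise sort key; it relies on Python dicts preserving insertion order (Python 3.7+).

```python
from typing import Callable, Dict

Row = Dict[str, Dict[str, str]]  # supercolumn name -> column name -> value

row: Row = {
    "workaddress": {"street": "1234 X Street", "city": "San Francisco", "zip": "94107"},
    "homeaddress": {"street": "1234 X Street", "city": "San Francisco", "zip": "94107"},
}


def utf8_key(name: str) -> bytes:
    """Approximate UTF8Type ordering by comparing UTF-8 bytes."""
    return name.encode("utf-8")


def sort_super_row(r: Row,
                   compare_with: Callable[[str], bytes],
                   compare_subcolumns_with: Callable[[str], bytes]) -> Row:
    """Order supercolumns with one key function and their columns with another."""
    return {
        sc: {c: r[sc][c] for c in sorted(r[sc], key=compare_subcolumns_with)}
        for sc in sorted(r, key=compare_with)
    }


for sc_name, cols in sort_super_row(row, utf8_key, utf8_key).items():
    print(sc_name, list(cols))
```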
Cassandra also lets you implement your own sort type: write a class that extends org.apache.cassandra.db.marshal.AbstractType and reference it by its fully qualified class name in CompareWith.
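A custom sort type is essentially a comparator over raw byte arrays. The Python sketch below only mimics that idea with `functools.cmp_to_key`; it is not Cassandra's Java API, and `case_insensitive_compare` is a hypothetical example comparator. It shows how plain byte order puts uppercase names first, while the custom comparator ignores case.

```python
from functools import cmp_to_key
from typing import List


def case_insensitive_compare(a: bytes, b: bytes) -> int:
    """Hypothetical custom comparator: order UTF-8 names ignoring case.

    Returns a negative number, zero or a positive number, the usual
    comparator contract.
    """
    ka, kb = a.decode("utf-8").lower(), b.decode("utf-8").lower()
    if ka != kb:
        return -1 if ka < kb else 1
    # Tie-break on the raw bytes so distinct names never compare as equal.
    return (a > b) - (a < b)


names: List[bytes] = [b"Zip", b"city", b"Street"]

print(sorted(names))  # plain byte order: uppercase letters sort before lowercase
print(sorted(names, key=cmp_to_key(case_insensitive_compare)))
```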
References
WTF is a SuperColumn? An Intro to the Cassandra Data Model
DataModel
More articles about Cassandra: http://www.cnblogs.com/gpcuster/tag/Cassandra/