Cassandra Data Model
I. Basic Concepts
Cluster: a logical Cassandra instance made up of one or more nodes. A cluster can contain multiple keyspaces.
Keyspace: a namespace for column families, typically one keyspace per application.
Column family: a container of columns. Each column has a name, a value, and a timestamp. Data in a column family is addressed by row key.
Super column: a column whose value is itself a set of subcolumns.
Column: a (name, value, timestamp) triple.
II. Column
A column is the smallest unit of data that can be written. It is a triple (name, value, timestamp).
The Thrift interface defines it as follows:
struct Column {
  1: binary name,
  2: binary value,
  3: i64 timestamp,
}
JSON format:
{
  "name": "emailAddress",
  "value": "foo@bar.com",
  "timestamp": 123456789
}
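To make the triple concrete in Java, the language Cassandra is written in, the Thrift struct can be mirrored as a record. This is a minimal sketch. The record `ColumnRecord`, its `of` factory, and its `toJson` helper are hypothetical names. They are not Cassandra's generated Thrift classes, and `toJson` performs no escaping.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical Java mirror of the Thrift Column struct: a (name, value, timestamp) triple.
// Thrift's binary type is modeled as byte[]; this is NOT Cassandra's generated class.
record ColumnRecord(byte[] name, byte[] value, long timestamp) {

    // Convenience factory for UTF-8 text names and values (an assumption for readability).
    static ColumnRecord of(String name, String value, long timestamp) {
        return new ColumnRecord(name.getBytes(StandardCharsets.UTF_8),
                                value.getBytes(StandardCharsets.UTF_8),
                                timestamp);
    }

    // Renders the column in the JSON shape shown above (no escaping; illustration only).
    String toJson() {
        return String.format("{ \"name\": \"%s\", \"value\": \"%s\", \"timestamp\": %d }",
                new String(name, StandardCharsets.UTF_8),
                new String(value, StandardCharsets.UTF_8),
                timestamp);
    }
}

class ColumnDemo {
    public static void main(String[] args) {
        ColumnRecord c = ColumnRecord.of("emailAddress", "foo@bar.com", 123456789L);
        // Prints: { "name": "emailAddress", "value": "foo@bar.com", "timestamp": 123456789 }
        System.out.println(c.toJson());
    }
}
```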
III. Column Families
A column family (CF) is a container of columns, roughly analogous to a table in a relational database. Column families are defined in the storage-conf.xml file, so adding or changing a CF requires restarting the Cassandra instance. In memory, a ColumnFamily holds its columns in a ConcurrentSkipListMap<columnName, IColumn>, which keeps the columns sorted by name. The sort order of columns within a CF can be, for example, ASCII, UTF-8, Long, or UUID.
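The sorted-map behaviour can be sketched with `java.util.concurrent.ConcurrentSkipListMap` and a pluggable `Comparator`. The class `SimpleColumnFamily` is hypothetical and stores only a value and a timestamp per column. Java's `String` ordering stands in for an ASCII comparator and `Long` ordering for a Long comparator. The demo shows why the choice of comparator matters: the names 9 and 10 sort differently as text than as numbers.

```java
import java.util.Comparator;
import java.util.List;
import java.util.concurrent.ConcurrentSkipListMap;

// Hypothetical, simplified column family: column name -> (value, timestamp),
// kept sorted by a pluggable comparator, loosely modeled on
// ConcurrentSkipListMap<columnName, IColumn>.
class SimpleColumnFamily<N> {
    record Cell(String value, long timestamp) {}

    private final ConcurrentSkipListMap<N, Cell> columns;

    SimpleColumnFamily(Comparator<N> comparator) {
        this.columns = new ConcurrentSkipListMap<>(comparator);
    }

    void put(N columnName, String value, long timestamp) {
        columns.put(columnName, new Cell(value, timestamp));
    }

    // Names are returned in comparator order, not insertion order.
    List<N> columnNames() {
        return List.copyOf(columns.keySet());
    }
}

class ColumnFamilyDemo {
    public static void main(String[] args) {
        // String ordering as a stand-in for an ASCII comparator.
        SimpleColumnFamily<String> asText = new SimpleColumnFamily<>(Comparator.naturalOrder());
        asText.put("9", "nine", 1L);
        asText.put("10", "ten", 1L);
        System.out.println(asText.columnNames()); // [10, 9]

        // Long ordering as a stand-in for a Long comparator.
        SimpleColumnFamily<Long> asLong = new SimpleColumnFamily<>(Comparator.naturalOrder());
        asLong.put(9L, "nine", 1L);
        asLong.put(10L, "ten", 1L);
        System.out.println(asLong.columnNames()); // [9, 10]
    }
}
```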
IV. Rows
In Cassandra, each CF is stored in its own files, and the data in those files is sorted by row key to facilitate compaction. The row key alone determines which machine stores the data, so a single row key can be associated with several CFs. In memory this is expressed by the RowMutation structure, (key, Map<cfName, ColumnFamily> modifications). However, the CFs attached to one key need not be logically related. When a key has data in multiple CFs, each Thrift query returns only one CF. In memory, a row is represented as Row(key, cfName).
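The following is a simplified sketch of the RowMutation idea: one key carrying modifications to several CFs. `SimpleRowMutation` and its `targetNode` method are hypothetical. In particular, the hash-modulo placement is a toy stand-in for Cassandra's partitioner. It is used only to show that placement depends on the key and not on the CF.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical analogue of RowMutation: (key, Map<cfName, ColumnFamily> modifications).
// Each "column family" is reduced to a sorted map of column name -> value.
class SimpleRowMutation {
    final String key;
    final Map<String, TreeMap<String, String>> modifications = new LinkedHashMap<>();

    SimpleRowMutation(String key) {
        this.key = key;
    }

    // Records one column write in the named CF for this row key.
    SimpleRowMutation add(String cfName, String column, String value) {
        modifications.computeIfAbsent(cfName, cf -> new TreeMap<>()).put(column, value);
        return this;
    }

    // Toy placement: only the row key is hashed, so every CF in this mutation
    // goes to the same node. Cassandra itself uses a partitioner and token ring.
    String targetNode(List<String> nodes) {
        return nodes.get(Math.floorMod(key.hashCode(), nodes.size()));
    }
}

class RowMutationDemo {
    public static void main(String[] args) {
        SimpleRowMutation m = new SimpleRowMutation("mccv")
                .add("Users", "emailAddress", "foo@bar.com")
                .add("Stats", "visits", "243");
        System.out.println(m.modifications.keySet()); // [Users, Stats]
        System.out.println("mccv -> " + m.targetNode(List.of("node-a", "node-b", "node-c")));
    }
}
```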
The JSON representation of the key -> column family -> column hierarchy is as follows:
{
  "mccv": {
    "Users": {
      "emailAddress": {"name": "emailAddress", "value": "foo@bar.com"},
      "webSite": {"name": "webSite", "value": "http://bar.com"}
    },
    "Stats": {
      "visits": {"name": "visits", "value": "243"}
    }
  },
  "user2": {
    "Users": {
      "emailAddress": {"name": "emailAddress", "value": "user2@bar.com"},
      "twitter": {"name": "twitter", "value": "user2"}
    }
  }
}
mccv and user2 are row keys. The key mccv is associated with two CFs, Users and Stats, but this does not mean that the data in the two CFs is related.
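The JSON above maps naturally onto nested sorted maps. In this hypothetical `KeyspaceModel`, each read names a single CF, loosely echoing the one-CF-per-query behaviour described earlier. It is an illustration only, not the Thrift API.

```java
import java.util.Map;
import java.util.SortedMap;
import java.util.TreeMap;

// Hypothetical in-memory model of the JSON above:
// row key -> column family name -> column name -> value (sorted at every level).
class KeyspaceModel {
    private final Map<String, Map<String, SortedMap<String, String>>> rows = new TreeMap<>();

    void put(String rowKey, String cf, String column, String value) {
        rows.computeIfAbsent(rowKey, k -> new TreeMap<>())
            .computeIfAbsent(cf, c -> new TreeMap<>())
            .put(column, value);
    }

    // Every read names exactly one CF; an unknown key or CF yields an empty map.
    SortedMap<String, String> readColumnFamily(String rowKey, String cf) {
        Map<String, SortedMap<String, String>> cfs = rows.get(rowKey);
        if (cfs == null || !cfs.containsKey(cf)) {
            return new TreeMap<>();
        }
        return cfs.get(cf);
    }

    String getColumn(String rowKey, String cf, String column) {
        return readColumnFamily(rowKey, cf).get(column);
    }
}

class KeyspaceDemo {
    public static void main(String[] args) {
        KeyspaceModel ks = new KeyspaceModel();
        ks.put("mccv", "Users", "emailAddress", "foo@bar.com");
        ks.put("mccv", "Users", "webSite", "http://bar.com");
        ks.put("mccv", "Stats", "visits", "243");
        ks.put("user2", "Users", "emailAddress", "user2@bar.com");
        ks.put("user2", "Users", "twitter", "user2");

        System.out.println(ks.readColumnFamily("mccv", "Users")); // {emailAddress=foo@bar.com, webSite=http://bar.com}
        System.out.println(ks.getColumn("mccv", "Stats", "visits")); // 243
    }
}
```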
V. Super Columns
A super column is effectively a column that contains multiple subcolumns. Its in-memory structure is SuperColumn(byte[] name, ConcurrentSkipListMap<byte[], IColumn> columns): a name plus a sorted mapping from subcolumn names to columns. JSON format:
{
  "mccv": {
    "Tags": {
      "cassandra": {
        "incubator": {"incubator": "http://incubator.apache.org/cassandra/"},
        "jira": {"jira": "http://issues.apache.org/jira/browse/CASSANDRA"}
      },
      "thrift": {
        "jira": {"jira": "http://issues.apache.org/jira/browse/THRIFT"}
      }
    }
  }
}
mccv is the row key, Tags is the column family, and cassandra and thrift are super column names. Beneath each super column are ordinary columns. Reaching one of these columns takes one more step than reaching a regular column: column family -> super column name -> column name -> column.
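The sketch below mirrors the SuperColumn structure with String names and shows the extra lookup step. `SimpleSuperColumn` and `SimpleSuperColumnFamily` are hypothetical simplifications, not Cassandra's own classes. Each subcolumn is reduced to its value, and only a single row's CF is modelled.

```java
import java.util.List;
import java.util.concurrent.ConcurrentSkipListMap;

// Hypothetical mirror of SuperColumn(byte[] name, ConcurrentSkipListMap<byte[], IColumn> columns),
// simplified to String names, with each subcolumn reduced to its value.
record SimpleSuperColumn(String name, ConcurrentSkipListMap<String, String> columns) {
    SimpleSuperColumn(String name) {
        this(name, new ConcurrentSkipListMap<>());
    }
}

// One row's super column family: super column name -> SimpleSuperColumn, sorted by name.
class SimpleSuperColumnFamily {
    private final ConcurrentSkipListMap<String, SimpleSuperColumn> superColumns = new ConcurrentSkipListMap<>();

    void put(String superColumnName, String columnName, String value) {
        superColumns.computeIfAbsent(superColumnName, SimpleSuperColumn::new)
                    .columns().put(columnName, value);
    }

    // One extra hop compared with a standard CF: super column name first, then column name.
    String get(String superColumnName, String columnName) {
        SimpleSuperColumn sc = superColumns.get(superColumnName);
        return sc == null ? null : sc.columns().get(columnName);
    }

    List<String> superColumnNames() {
        return List.copyOf(superColumns.keySet());
    }
}

class SuperColumnDemo {
    public static void main(String[] args) {
        // Row key "mccv", column family "Tags", as in the JSON example above.
        SimpleSuperColumnFamily tags = new SimpleSuperColumnFamily();
        tags.put("cassandra", "incubator", "http://incubator.apache.org/cassandra/");
        tags.put("cassandra", "jira", "http://issues.apache.org/jira/browse/CASSANDRA");
        tags.put("thrift", "jira", "http://issues.apache.org/jira/browse/THRIFT");

        System.out.println(tags.superColumnNames());    // [cassandra, thrift]
        System.out.println(tags.get("thrift", "jira")); // http://issues.apache.org/jira/browse/THRIFT
    }
}
```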