Cassandra data model (based on CQL, which addresses the column-count limit and limited flexibility of fat columns) (version 1.1 and later)
This article mainly describes Cassandra's programming model and data structures. Cassandra has gone through several releases, so the Chinese-language material on the Internet is somewhat dated, and even representative articles such as the eBay best-practices series are out of date. I therefore collected the material myself and wrote this article on the Cassandra model with the help of the official blog.

A note on terminology

Because the technical terms conflict, note that a table in BigTable corresponds to a column family in Cassandra, while the column family concept in BigTable is similar to the super column of Cassandra's early implementation (a feature that has since been deprecated in Cassandra).

Cassandra

First, a brief introduction. Cassandra is an open-source Apache project and a NoSQL database. Broadly speaking, it is a column-family database with a distributed, decentralized architecture. Its early implementation drew mainly on Google's BigTable paper and Amazon's Dynamo paper, and both papers are a good way to understand Cassandra's design ideas. For more detail, see the wiki and the official documentation.

Early implementation and best practices

The early implementation was mentioned above. The current version differs from it significantly: from data model design to practical use, Cassandra has essentially gone its own way. One of the biggest changes is the change to database operations in Cassandra 1.1. The early implementation used the BigTable data model, that is, the concepts of columns, column families, and primary keys. The eBay best-practices articles explain the use of this model in detail. I do not intend to reproduce them here, only to describe the concepts briefly.
The best-practices article has two parts. The first part mainly describes how to design a data model in a column-family database, focusing on storing relationships and related data redundantly in order to reduce distributed reads. It sums up the data structure of a column-family database in an interesting formula:

Map<RowKey, SortedMap<ColumnKey, ColumnValue>>

The second part covers implementation details specific to Cassandra. For example, Cassandra implemented something similar to the BigTable column family under a different name, the super column, which groups similar columns so that they can be fetched together.
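The nested-map shape of a column-family store (row keys mapping to column keys that are kept in sorted order) may be easier to see in a few lines of code. The following is only a minimal in-memory sketch, not how Cassandra is implemented; the class name `ColumnFamily`, its methods, and the sample keys are all hypothetical.

```python
from bisect import insort


class ColumnFamily:
    """Toy model: row key -> columns kept sorted by column key (illustration only)."""

    def __init__(self):
        # row key -> (sorted list of column keys, dict of column key -> value)
        self._rows = {}

    def put(self, row_key, column_key, value):
        keys, values = self._rows.setdefault(row_key, ([], {}))
        if column_key not in values:
            insort(keys, column_key)  # keep column keys in sorted order
        values[column_key] = value

    def get_row(self, row_key):
        """Return all (column key, value) pairs of a row, ordered by column key."""
        keys, values = self._rows.get(row_key, ([], {}))
        return [(k, values[k]) for k in keys]

    def get_slice(self, row_key, start, end):
        """Return the columns whose key lies in [start, end], still ordered."""
        return [(k, v) for k, v in self.get_row(row_key) if start <= k <= end]


cf = ColumnFamily()
cf.put("row_a", "c2", "second")
cf.put("row_a", "c1", "first")
cf.put("row_b", "c1", "other row")
```

Because the columns within a row stay sorted, a range of column keys can be read from a single row in order, which is the property the fat-column design relies on.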
The article also recommends using fat columns (queried by column key) rather than thin columns (queried by primary key). A fat column layout stores relationships as a large number of columns in a single row. For example, a user table users has the columns user_k, p1, and p2. The first column, user_k, is the primary key; the second column, p1, is product 1 purchased by the user; the third column, p2, is product 2 purchased by the user; and so on up to pn, where n can grow very large. Note: this is a simplified explanation, and the original articles discuss many application scenarios beyond this one.

Cassandra stores data with the primary key as the partition key and the column name as the column key. Take the following table as an example (the images are from "Schema in Cassandra 1.1"):

CREATE TABLE timeline (
    user_id varchar,
    tweet_id uuid,
    author varchar,
    body varchar,
    PRIMARY KEY (user_id, tweet_id)
);
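Since the original figures are not available here, the mapping from logical rows of the timeline table to one wide storage row per user_id can be approximated in code. This is a hedged sketch of the idea only: the function `to_storage_rows` is hypothetical, the sample data is made up, and tweet_id is shown as a short integer for brevity where the real column is a uuid.

```python
def to_storage_rows(cql_rows, partition_key, clustering_key):
    """Fold logical rows into one wide row per partition key.

    Each remaining column is stored under a composite column key
    (clustering value, column name), sorted within the wide row.
    """
    storage = {}
    for row in cql_rows:
        wide_row = storage.setdefault(row[partition_key], {})
        for name, value in row.items():
            if name in (partition_key, clustering_key):
                continue  # key columns are folded into the row key / column key
            wide_row[(row[clustering_key], name)] = value
    # Sort each wide row by its composite column key.
    return {pk: dict(sorted(cols.items())) for pk, cols in storage.items()}


# Made-up sample data in the logical (relational-looking) form.
timeline = [
    {"user_id": "alice", "tweet_id": 1787, "author": "carol", "body": "second tweet"},
    {"user_id": "alice", "tweet_id": 1765, "author": "bob", "body": "first tweet"},
    {"user_id": "dave", "tweet_id": 1742, "author": "alice", "body": "hello"},
]

physical = to_storage_rows(timeline, "user_id", "tweet_id")
```

In this sketch the two logical rows for "alice" become a single storage row whose column keys are pairs such as (1787, "author"), so the clustering value is part of the column name rather than a separate piece of data.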
The first figure (not reproduced here) shows how a traditional database stores this table.
The second figure (not reproduced here) shows how Cassandra stores it. Note that {1787, author} is the name of a column key, not just data.

As Cassandra grew, the data model that had originally followed BigTable exactly ran into problems. One is that the number of columns cannot grow without bound: although the designed limit is large, it is still not enough for a big-data distributed database. Another is that super columns offer limited flexibility. As a result, the developers began to go their own way. With the release and development of CQL (Cassandra's own SQL-like language), they first eliminated the super column concept (the column family in BigTable). In CQL there are no super columns, and the level directly above the column is the table.

Since the super column of the original model no longer exists, how is the original fat column scenario handled? CQL has the concepts of a primary key and a second-level primary key (in CQL terminology, the partition key and the clustering column). The primary key is the original primary key, and this concept has not changed. To replace super column aggregation, CQL adds the value of the second-level primary key to the column name to form the column key name. In other words, information that used to be stored in the data dictionary (please allow me to borrow this relational database term) is now stored in the data table itself. This reduces access to the data dictionary and the overhead of maintaining its data structures, and spreads the load across the data tables.

References:
Schema in Cassandra 1.1
Cassandra Wiki
Cassandra Data Modeling Best Practices, Part 1
Cassandra Data Modeling Best Practices, Part 2