This article goes beyond the well-known details and explores the less obvious aspects of Cassandra. You will examine the Cassandra data model, storage schema design, architecture, and potential surprises associated with Cassandra.
In the database history article "What Goes Around Comes Around", Michael Stonebraker describes in detail how storage techniques have evolved over time. Before arriving at the relational model, developers tried other models, such as hierarchical and directed graph models. It is noteworthy that the SQL-based relational model, which is still the de facto standard, has prevailed for about 30 years. Given the short history and rapid pace of computer science, this is a remarkable achievement. The relational model was so well established that for many years solution architects had an easy choice when selecting a data store for an application. Their choice was always a relational database.
Developments such as growing user bases, mobile devices, users' extended online presence, cloud computing, and multi-core systems have led to increasingly large-scale systems. High-tech companies such as Google and Amazon were among the first to run into these problems of scale. They soon discovered that relational databases were not adequate to support large-scale systems.
To work around these challenges, Google and Amazon came up with two alternative solutions, Bigtable and Dynamo, which relax the guarantees provided by the relational data model to achieve higher scalability. Eric Brewer's "CAP theorem" later formalized these observations. It states that consistency, availability, and partition tolerance are trade-offs in scalable systems: it is impossible to build a system that provides all three properties at once. Soon after, based on the earlier work by Google and Amazon and the understanding gained about scalable systems, a new class of storage systems was proposed. These systems were named "NoSQL" systems. The name initially meant "do not use SQL if you want to scale" and was later redefined as "not only SQL", meaning that there are other solutions in addition to SQL-based ones.
There are many NoSQL systems, and each relaxes or alters some aspect of the relational model. It is worth noting that no NoSQL solution works for all scenarios. Each solution outperforms and outscales the relational model only for a subset of use cases. My earlier article, "Finding the right data solution for your application in the data storage haystack", discusses how to match application requirements to NoSQL solutions.
Apache Cassandra is one of the earliest and most widely used NoSQL solutions. This article takes a detailed look at Cassandra and points out details and tricky points that are not obvious when you first use it.
Cassandra is a NoSQL column family implementation that supports the Bigtable data model and uses the architectural features introduced by Amazon Dynamo. Some of Cassandra's strengths are: high scalability and high availability with no single point of failure; a NoSQL column family implementation; very high write throughput and good read throughput; an SQL-like query language (since 0.8) and search support through secondary indexes; tunable consistency and support for replication; and a flexible schema.
These strengths make it easy to recommend Cassandra, but it is important for developers to delve into its details and tricky points to understand its intricacies.
Cassandra stores data based on the column family data model, as shown in Figure 1.
Figure 1. Cassandra Data Model
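As a rough illustration of the column family model, the following Python sketch treats a column family as a map from row keys to named columns, where each column carries a value and a write timestamp. The class and method names (Column, ColumnFamily, insert, get_row) are hypothetical and chosen only for illustration; this is not Cassandra's API or storage format, and it leaves out keyspaces, super columns, and replication. It assumes that a write with a newer timestamp replaces an older one and simply ignores ties.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Column:
    """A single column: a name, a value, and the timestamp of the write."""
    name: str
    value: str
    timestamp: int


@dataclass
class ColumnFamily:
    """Hypothetical in-memory stand-in for a column family.

    Maps row key -> {column name -> Column}. Rows are not required to
    share the same set of columns.
    """
    name: str
    rows: Dict[str, Dict[str, Column]] = field(default_factory=dict)

    def insert(self, row_key: str, column_name: str, value: str, timestamp: int) -> None:
        """Write a column; a write only takes effect if it is newer (simplified rule)."""
        row = self.rows.setdefault(row_key, {})
        current = row.get(column_name)
        if current is None or timestamp > current.timestamp:
            row[column_name] = Column(column_name, value, timestamp)

    def get_row(self, row_key: str) -> List[Column]:
        """Return the columns of a row, ordered by column name."""
        row = self.rows.get(row_key, {})
        return [row[name] for name in sorted(row)]


# Usage: two rows in the same column family with different columns.
users = ColumnFamily("Users")
users.insert("alice", "email", "alice@example.com", timestamp=1)
users.insert("alice", "city", "Colombo", timestamp=1)
users.insert("bob", "email", "bob@example.com", timestamp=1)
users.insert("alice", "email", "alice@new.example.com", timestamp=2)  # newer write replaces the old value
```

Unlike rows in a relational table, the rows "alice" and "bob" here hold different sets of columns, which is the sense in which the schema is flexible.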