What Should DBAs Consider When Selecting NoSQL Databases?

[51CTO external headlines] We once discussed "What role does NoSQL play in our work?" We also raised 101 questions to ask when choosing a NoSQL database, and we even held an online seminar analyzing what SQL, NoSQL, or a combination of the two can contribute to the scalability of web applications.

Now let's change direction and consider which specific factors should drive our choice, and which systems suit each of those factors.

What are the options?

First, let's survey the various data models. The descriptions below draw on Emil Eifrem's blog and on published descriptions of NoSQL databases.

Document Database

Lineage: inspired by Lotus Notes.

Data Model: collections of documents, which in turn contain collections of key-value pairs.

Examples: CouchDB, MongoDB

Advantages: natural data modeling, programmer friendliness, rapid development, a good fit for the web, and easy CRUD (create, read, update, delete) operations.
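The document model can be illustrated with a minimal in-memory sketch in Python. This is not the CouchDB or MongoDB API; the `DocumentCollection` class and its methods are hypothetical and exist only to show how schema-free documents (nested key-value pairs) map onto create, read, update, and delete operations.

```python
import copy
import uuid


class DocumentCollection:
    """Hypothetical in-memory collection: document ID -> document (a dict)."""

    def __init__(self):
        self._docs = {}

    def create(self, doc):
        # Documents are stored as-is; no schema is enforced.
        doc_id = str(uuid.uuid4())
        self._docs[doc_id] = copy.deepcopy(doc)
        return doc_id

    def read(self, doc_id):
        # Return a copy so callers cannot mutate stored state directly.
        doc = self._docs.get(doc_id)
        return copy.deepcopy(doc) if doc is not None else None

    def update(self, doc_id, changes):
        # Merge top-level fields; new fields can be added at any time.
        self._docs[doc_id].update(copy.deepcopy(changes))

    def delete(self, doc_id):
        self._docs.pop(doc_id, None)

    def find(self, **criteria):
        # Naive full scan matching top-level fields by equality.
        return [
            copy.deepcopy(d)
            for d in self._docs.values()
            if all(d.get(k) == v for k, v in criteria.items())
        ]


users = DocumentCollection()
uid = users.create({"name": "Ada", "address": {"city": "London"}})
users.update(uid, {"tags": ["admin"]})
print(users.read(uid))
# {'name': 'Ada', 'address': {'city': 'London'}, 'tags': ['admin']}
```

Because no schema is enforced, `update` adds the `tags` field to one document without touching any others.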

Graph Database

Lineage: Euler and graph theory.

Data Model: nodes and relationships, both of which can hold key-value pairs.

Examples: AllegroGraph, InfoGrid, Neo4j

Advantages: handling rich, complex graph problems; fast.
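A rough sketch of the graph model, assuming a simple property graph held in Python dictionaries: nodes and relationships both carry key-value properties, and a breadth-first traversal answers a friends-of-friends style question by following relationships rather than joining tables. The `Graph` class and `within_hops` method are hypothetical, not the API of Neo4j or any other product.

```python
from collections import deque


class Graph:
    """Hypothetical property graph: nodes and relationships both carry properties."""

    def __init__(self):
        self.nodes = {}  # node id -> properties
        self.edges = {}  # node id -> list of (relationship type, target id, properties)

    def add_node(self, node_id, **props):
        self.nodes[node_id] = props
        self.edges.setdefault(node_id, [])

    def relate(self, src, rel_type, dst, **props):
        self.edges[src].append((rel_type, dst, props))

    def within_hops(self, start, rel_type, max_depth):
        """Breadth-first traversal: nodes reachable from start in 1..max_depth hops."""
        seen = {start}
        frontier = deque([(start, 0)])
        found = []
        while frontier:
            node, depth = frontier.popleft()
            if depth == max_depth:
                continue  # do not expand beyond the requested depth
            for rtype, nxt, _props in self.edges.get(node, []):
                if rtype == rel_type and nxt not in seen:
                    seen.add(nxt)
                    found.append(nxt)
                    frontier.append((nxt, depth + 1))
        return found


g = Graph()
for name in ["alice", "bob", "carol", "dave"]:
    g.add_node(name)
g.relate("alice", "KNOWS", "bob", since=2009)
g.relate("bob", "KNOWS", "carol", since=2011)
g.relate("carol", "KNOWS", "dave", since=2015)
print(g.within_hops("alice", "KNOWS", 2))  # ['bob', 'carol']
```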

Relational Database

Lineage: E. F. Codd's relational model, introduced in his paper "A Relational Model of Data for Large Shared Data Banks".

Data Model: relations (tables).

Examples: VoltDB, Clustrix, MySQL

Advantages: high-performance, scalable online transaction processing (OLTP); SQL access; materialized views; situations where transactions matter; and programmer-friendly transactions.

Object-Oriented Database

Lineage: graph database research.

Data Model: objects.

Examples: Objectivity, Gemstone

Advantages: complex object models, fast key-value access, key-function access, and graph database features.

Key-Value Store

Lineage: the Amazon Dynamo paper and distributed hash tables.

Data Model: a global collection of key-value pairs.

Examples: Membase, Riak

Advantages: handles large data sizes well, good at processing a constant stream of small reads and writes, fast, and programmer friendly.

BigTable Clones

Lineage: the Google BigTable paper.

Data Model: column families, that is, a tabular model in which each row can, at least in theory, have its own configuration of columns.

Examples: HBase, Hypertable, Cassandra

Advantages: handles large data sizes well, good at massive write loads, high availability, support for multiple data centers, and support for MapReduce.
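The column-family model can be sketched as nested maps: row key, then column family, then column qualifier. In this hypothetical `ColumnFamilyTable`, families are fixed when the table is created but qualifiers are free-form, so each row can carry a different set of columns, and rows are scanned in row-key order. It only mimics the shape of the data model; it is not HBase, Hypertable, or Cassandra code.

```python
from bisect import bisect_left


class ColumnFamilyTable:
    """Hypothetical BigTable-style table: row key -> column family -> qualifier -> value.

    Column families are declared up front; qualifiers inside them are free-form,
    so each row can have its own set of columns (sparse rows).
    """

    def __init__(self, families):
        self.families = set(families)
        self.rows = {}

    def put(self, row_key, family, qualifier, value):
        if family not in self.families:
            raise KeyError(f"unknown column family: {family}")
        row = self.rows.setdefault(row_key, {})
        row.setdefault(family, {})[qualifier] = value

    def get(self, row_key, family, qualifier):
        return self.rows.get(row_key, {}).get(family, {}).get(qualifier)

    def scan(self, start, stop):
        """Return (row_key, row) pairs with start <= row_key < stop, in key order."""
        keys = sorted(self.rows)  # a real system keeps rows sorted on disk
        i = bisect_left(keys, start)
        result = []
        while i < len(keys) and keys[i] < stop:
            result.append((keys[i], self.rows[keys[i]]))
            i += 1
        return result


t = ColumnFamilyTable(["info", "clicks"])
t.put("user#001", "info", "name", "Ada")
t.put("user#002", "info", "name", "Linus")
t.put("user#002", "clicks", "2011-03-01", 17)  # only this row has a 'clicks' column
print([k for k, _ in t.scan("user#000", "user#999")])  # ['user#001', 'user#002']
```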

Data Structure Services

Lineage: unknown.

Examples: Redis

Data Model: operations executed directly on data structures such as dictionaries, lists, sets, and string values.

Advantages: unconventional uses you may never have thought of using a database for.

Grid Database

Lineage: data grid and tuple space research.

Data Model: space-based architecture.

Examples: GigaSpaces, Coherence

Advantages: high performance and scalable transaction processing.

Which solution should we choose for our applications?

The key is to rethink how our applications could work under different data models and products: use the right data model for the actual task, and the right product for the actual problem.

To see which data models might help your application, refer to the article "What role does NoSQL play in our work?" In it, I tried to gather many unconventional use cases that draw on the different qualities and features developers have relied on when building systems.

Match your application's requirements against those use cases. From there you can work backward to the products worth including in your architecture. Whether the answer turns out to be NoSQL or SQL no longer matters.

Focus on the data model, the product's features, and your own situation. Products combine such different feature sets that it is almost impossible to recommend one on the basis of its data model alone.

Which option is best depends on how you prioritize your functional and feature requirements; once those priorities are clear, you can make the best choice.

If our application needs...

Complex transactions: if you cannot afford to lose data, or you want a simple transaction programming model, consider a relational or grid database.

For example, an inventory system may require full ACID guarantees (atomicity, consistency, isolation, and durability). A customer who buys a product and is later told it is out of stock will be unhappy: what the customer wants is not a compensating transaction, but the item itself.

If scalability is the priority, either NoSQL or SQL can work. Look for systems that support scale-out, partitioning, live addition and removal of machines, load balancing, automatic sharding and rebalancing, and fault tolerance.
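Automatic sharding with live addition of machines is often built on consistent hashing, the technique described in the Amazon Dynamo paper. The sketch below is one minimal Python version; `HashRing`, the virtual-node count, and the node names are hypothetical. Adding a fourth machine moves only the keys the new machine now owns, instead of reshuffling everything.

```python
import bisect
import hashlib


def _hash(key):
    # Stable 64-bit ring position (Python's built-in hash() is salted per process).
    return int.from_bytes(hashlib.md5(key.encode()).digest()[:8], "big")


class HashRing:
    """Hypothetical consistent-hash ring with virtual nodes."""

    def __init__(self, nodes=(), vnodes=100):
        self.vnodes = vnodes
        self._ring = []  # sorted list of (position, node)
        for node in nodes:
            self.add(node)

    def add(self, node):
        for i in range(self.vnodes):
            bisect.insort(self._ring, (_hash(f"{node}#{i}"), node))

    def remove(self, node):
        self._ring = [(pos, n) for pos, n in self._ring if n != node]

    def node_for(self, key):
        # The first virtual node at or after the key's position owns the key,
        # wrapping around to the start of the ring.
        idx = bisect.bisect(self._ring, (_hash(key),)) % len(self._ring)
        return self._ring[idx][1]


ring = HashRing(["db1", "db2", "db3"])
keys = [f"user:{i}" for i in range(10_000)]
before = {k: ring.node_for(k) for k in keys}
ring.add("db4")  # add a machine while the cluster is live
moved = sum(before[k] != ring.node_for(k) for k in keys)
print(f"{moved / len(keys):.0%} of keys moved")  # roughly a quarter, not all of them
```

The 100 virtual nodes per machine are an arbitrary choice that evens out each machine's share of keys; a real system would also have to copy the moved data to its new owner.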

If you must always be able to write to the database, you need high availability. In that case, consider BigTable clones, which offer eventual consistency.

If you need to handle many small, continuous reads and writes on data that may be volatile, consider document databases, key-value stores, or databases that offer fast in-memory access. Using SSDs as the storage medium is also worth considering.

For social network operations, first consider a graph database, and second a database such as Riak that supports relationships (links) between records. For small data sets, an in-memory relational database with simple SQL joins may suffice. Redis set and list operations can also help.

If our application needs...

When you need to support a wide variety of access patterns and data types, consider a document database; such databases are generally both flexible and fast.

For powerful offline reporting over large datasets, look at Hadoop first, then at other products that support MapReduce. However, supporting MapReduce is not the same as being good at it: MapReduce support alone does not give a product Hadoop's processing capability.
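To make "support for MapReduce" concrete, here is a single-process sketch of the programming model in plain Python: a map phase emits key-value pairs, a shuffle groups them by key, and a reduce phase combines each group. It is not Hadoop's API; a real framework distributes each phase across a cluster and handles failures, which is exactly where products differ.

```python
from collections import defaultdict
from itertools import chain


def map_phase(record):
    # Emit (word, 1) for every word in one input record.
    return [(word.lower(), 1) for word in record.split()]


def shuffle(pairs):
    # Group intermediate values by key, as the framework does between phases.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    return key, sum(values)


def map_reduce(records):
    intermediate = chain.from_iterable(map_phase(r) for r in records)
    return dict(reduce_phase(k, vs) for k, vs in shuffle(intermediate).items())


logs = ["GET /home", "GET /cart", "POST /cart"]
print(map_reduce(logs))
# {'get': 2, '/home': 1, '/cart': 2, 'post': 1}
```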

If the business spans multiple data centers, consider BigTable clones and other products with a distributed option; they can cope with the long latencies caused by geographic distance and are partition tolerant.

To build CRUD applications, prefer a document database; such products make it easy to access complex data without joins. If you need built-in search, Riak is recommended.

For operations on data structures such as lists, sets, queues, and publish/subscribe, Redis is the best choice. It is also useful for distributed locking, capped logs, and much more.
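Two of the structures mentioned above can be sketched in plain Python to show the idea: a capped log that keeps only the newest entries, and an in-process publish/subscribe bus. Both classes are hypothetical stand-ins, not a Redis client; in Redis itself the same patterns are usually built from commands such as `LPUSH` with `LTRIM`, and `PUBLISH`/`SUBSCRIBE`.

```python
from collections import defaultdict, deque


class CappedLog:
    """Keeps only the newest `capacity` entries, similar in spirit to a trimmed Redis list."""

    def __init__(self, capacity):
        self._entries = deque(maxlen=capacity)  # oldest entries fall off automatically

    def append(self, entry):
        self._entries.append(entry)

    def latest(self):
        return list(self._entries)


class PubSub:
    """Hypothetical in-process publish/subscribe: channel name -> subscriber callbacks."""

    def __init__(self):
        self._subscribers = defaultdict(list)

    def subscribe(self, channel, callback):
        self._subscribers[channel].append(callback)

    def publish(self, channel, message):
        for callback in self._subscribers[channel]:
            callback(message)
        return len(self._subscribers[channel])  # number of receivers


log = CappedLog(capacity=3)
bus = PubSub()
bus.subscribe("events", log.append)  # every published event lands in the capped log
for i in range(5):
    bus.publish("events", f"event-{i}")
print(log.latest())  # ['event-2', 'event-3', 'event-4']
```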

If you want data delivered to programmers in friendly forms such as JSON, HTTP, REST, and JavaScript, look first at document databases and then at key-value databases.

If our application needs...

For transactions combined with materialized views that feed real-time data, VoltDB is a good choice. It is particularly good at data rollups and time windowing.

If you need enterprise-level support and service level agreements (SLAs), look for a product that deliberately caters to that market. Membase is one example.

Do you need to log continuous streams of data that may not need any consistency guarantees at all? BigTable clones are a good answer, because they generally run on distributed file systems that can handle a large volume of writes.

If operations must be as simple as possible, look for a hosted or platform-as-a-service (PaaS) solution; such services exist precisely to take that work off your hands.

Selling to enterprise customers? Consider a relational database, because those customers are used to relational technology.

If you need to build relationships between objects dynamically, and the objects themselves have dynamic properties, a graph database can help a great deal. Such products often do not require a schema, so models can be built incrementally through programming.

To support large media files, look at storage services such as S3. NoSQL systems tend not to handle large binary objects (BLOBs) well, although MongoDB does offer a file storage service.

If our application needs...

Need to bulk upload large amounts of data quickly and efficiently? Look for a product that specifically supports that scenario. Most products do not, because they lack bulk operations.

For an easier upgrade path, use a system with a fluid schema, such as a document or key-value database. These support optional fields, adding fields, and deleting fields without requiring a full schema migration framework.
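A sketch of what a fluid schema buys during an upgrade, assuming documents are plain Python dictionaries: records written by older and newer versions of an application coexist, readers tolerate missing fields, and a document is migrated lazily when it is next written. The field names and the `to_view`/`upgrade_on_write` helpers are hypothetical.

```python
# Documents written by different application versions coexist in one collection;
# no migration step rewrites the old ones up front.
stored_docs = [
    {"id": 1, "name": "Ada", "phone": "555-0100"},             # v1: has a phone field
    {"id": 2, "name": "Linus", "email": "linus@example.org"},  # v2: added email, dropped phone
    {"id": 3, "name": "Grace"},                                # minimal document
]


def to_view(doc):
    """Hypothetical read-side adapter that tolerates optional, added, and removed fields."""
    return {
        "id": doc["id"],
        "name": doc["name"],
        "email": doc.get("email"),  # optional field added in a later version
        "phone": doc.get("phone"),  # field no longer written by new versions
    }


def upgrade_on_write(doc):
    """Lazily migrate one document when it is next saved, instead of all at once."""
    doc = dict(doc)  # leave the stored original untouched
    doc.setdefault("email", None)
    doc.pop("phone", None)
    return doc


views = [to_view(d) for d in stored_docs]
print(views[2])  # {'id': 3, 'name': 'Grace', 'email': None, 'phone': None}
print(upgrade_on_write(stored_docs[0]))
# {'id': 1, 'name': 'Ada', 'email': None}
```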

To implement integrity constraints, choose a database that supports SQL DDL, or implement the constraints in stored procedures or in application code.
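The SQL DDL route can be sketched with Python's built-in `sqlite3` module. The schema is hypothetical; the point is that `NOT NULL`, `UNIQUE`, `REFERENCES`, and `CHECK` clauses make the database itself reject bad rows. SQLite has no stored procedures, so this covers only the DDL option, and SQLite enforces foreign keys only after `PRAGMA foreign_keys = ON`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FOREIGN KEY only when enabled
conn.executescript(
    """
    CREATE TABLE customers (
        id    INTEGER PRIMARY KEY,
        email TEXT NOT NULL UNIQUE
    );
    CREATE TABLE orders (
        id          INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL REFERENCES customers(id),
        amount      NUMERIC NOT NULL CHECK (amount > 0)
    );
    """
)
with conn:
    conn.execute("INSERT INTO customers (id, email) VALUES (1, 'ada@example.org')")

INSERT_ORDER = "INSERT INTO orders (customer_id, amount) VALUES (?, ?)"


def try_insert(params):
    """Attempt one insert; the database rejects rows that break a declared rule."""
    try:
        with conn:
            conn.execute(INSERT_ORDER, params)
        return "ok"
    except sqlite3.IntegrityError as exc:
        return f"rejected: {exc}"


print(try_insert((1, 10)))   # ok
print(try_insert((99, 10)))  # prints a "rejected: ..." message (no customer 99)
print(try_insert((1, -5)))   # prints a "rejected: ..." message (amount must be positive)
```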

When queries require very deep joins, choose a graph database, because such products support very fast navigation between entities.

To move behavior close to the data, so that the data does not have to travel over the network, look at stored procedures of one kind or another. These are available in relational, grid, document, and even key-value databases.

If our application needs...

Key-value stores are good at caching and storing BLOB data. A cache can hold fragments of web pages, or complex objects that are expensive to assemble with joins in a relational database, which reduces latency and the cost of rebuilding them.
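A cache-aside pattern over a key-value store can be sketched as follows. A Python dictionary stands in for the store, and `TTLCache`, `render_sidebar`, and the 60-second expiry are hypothetical. The expensive work (rendering a page fragment, or assembling an object from relational joins) runs once, and later requests within the expiry window are served from the cache.

```python
import time


class TTLCache:
    """Hypothetical cache-aside layer: key -> (expiry time, value)."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock  # injectable so expiry can be tested without sleeping
        self._items = {}

    def get_or_compute(self, key, compute):
        now = self.clock()
        hit = self._items.get(key)
        if hit is not None and hit[0] > now:
            return hit[1]  # fresh cached value: skip the expensive work
        value = compute()  # e.g. render a page fragment or run a costly join
        self._items[key] = (now + self.ttl, value)
        return value


calls = []


def render_sidebar():
    calls.append(1)  # count how often the expensive path actually runs
    return b"<div>sidebar</div>"  # a BLOB such as rendered HTML


cache = TTLCache(ttl_seconds=60)
cache.get_or_compute("sidebar", render_sidebar)
cache.get_or_compute("sidebar", render_sidebar)
print(len(calls))  # 1: the second request was served from the cache
```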

If you need a proven track record (not corrupting data, and generally just working), choose an established product. When you hit scaling (or other) problems, apply one of the common workarounds: scaling up, tuning, distributed caching, partitioning (sharding), or denormalization.

Fluid data types mean data that is not naturally tabular: it may need a variable number of columns, have a complex structure, or vary by user (or by some other factor). For such data, consider document, key-value, and BigTable clone databases, all of which are very flexible in the data types they handle.

Sometimes other business units need to run quick relational queries, and you do not want to reimplement everything for occasional use. Any database that supports SQL can serve such queries. As for running in the cloud and automatically taking full advantage of cloud features, that remains a wish for now.

If our application needs...

To look up data by different keys, you need secondary indexes. Look at relational databases and at Cassandra's new secondary index support.
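A secondary index is essentially a second map from an attribute value to the primary keys that have it, kept in step with every write. The hypothetical `IndexedStore` below sketches that bookkeeping in Python, including removing stale index entries when a record is overwritten; it does not reflect how Cassandra or any relational database implements indexes internally.

```python
from collections import defaultdict


class IndexedStore:
    """Hypothetical key-value store that maintains secondary indexes on chosen fields."""

    def __init__(self, indexed_fields):
        self.primary = {}  # primary key -> record
        # field name -> attribute value -> set of primary keys
        self.indexes = {f: defaultdict(set) for f in indexed_fields}

    def put(self, key, record):
        old = self.primary.get(key)
        if old is not None:
            self._unindex(key, old)  # keep indexes consistent on overwrite
        self.primary[key] = record
        for field, index in self.indexes.items():
            if field in record:
                index[record[field]].add(key)

    def _unindex(self, key, record):
        for field, index in self.indexes.items():
            if field in record:
                index[record[field]].discard(key)

    def find_by(self, field, value):
        # Look up by a non-primary attribute without scanning every record.
        return sorted(self.indexes[field].get(value, set()))


store = IndexedStore(["city"])
store.put("u1", {"name": "Ada", "city": "London"})
store.put("u2", {"name": "Alan", "city": "London"})
store.put("u3", {"name": "Grace", "city": "New York"})
store.put("u2", {"name": "Alan", "city": "Manchester"})  # moves u2 to a new index entry
print(store.find_by("city", "London"))  # ['u1']
```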

If you are creating an ever-growing data set (truly astronomical amounts of data) that is rarely accessed, a BigTable clone is the best choice, because it spreads the data across a distributed file system.

Need to integrate with other services? Check whether the database provides some form of write-behind synchronization, so that you can capture database changes and feed them into other systems to keep them consistent.
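The capture-and-forward idea can be sketched in Python with a queue of change records. `ChangeCapturingStore`, `drain`, and the dictionary standing in for a downstream search service are all hypothetical. Writes return immediately, and a separate step later replays the captured changes, in order, into the other system.

```python
import queue


class ChangeCapturingStore:
    """Hypothetical store that records every write in a change queue for downstream systems."""

    def __init__(self):
        self.data = {}
        self.changes = queue.Queue()
        self._seq = 0

    def write(self, key, value):
        self.data[key] = value
        self._seq += 1
        # The write completes now; syncing to other services happens later.
        self.changes.put((self._seq, "put", key, value))

    def delete(self, key):
        self.data.pop(key, None)
        self._seq += 1
        self.changes.put((self._seq, "delete", key, None))


def drain(store, apply_change):
    """Write-behind sync: replay captured changes, in order, into another system."""
    applied = 0
    while True:
        try:
            change = store.changes.get_nowait()
        except queue.Empty:
            return applied
        apply_change(change)
        applied += 1


primary = ChangeCapturingStore()
search_index = {}  # stands in for a downstream service such as a search engine


def apply_to_index(change):
    _seq, op, key, value = change
    if op == "put":
        search_index[key] = value
    else:
        search_index.pop(key, None)


primary.write("p1", "red shoes")
primary.write("p2", "blue hat")
primary.delete("p1")
print(drain(primary, apply_to_index), search_index)  # 3 {'p2': 'blue hat'}
```

In practice the drain step would run on a schedule or in a background worker, and the change records would need to be durable so that a crash does not lose pending updates.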

If you need fault tolerance, check how durable writes are in the face of power failures, network partitions, and other failure scenarios.

If you want to take a technology in a direction nobody else seems to be exploring, but you see great potential in it, consider building it yourself. Sometimes that is what it takes to achieve something great.

If you are working on a mobile platform, look at CouchDB and Mobile Couchbase.

Which solution is better?

A 25% performance improvement is probably not reason enough to move to NoSQL.

Whether a benchmark is relevant depends on the use case. Does it match your situation?

If you are a startup that needs to launch a product as quickly as possible, use whatever you already know; both SQL and NoSQL will work.

Performance may be similar on a single machine, but what happens when you need N machines?

Nothing is perfect. If you browse the Amazon forums, you will find complaints that EBS is slow or that instances stop responding. For GAE, the complaints are that the datastore is slow, among other problems. Every product in use has problems; the question is whether you can live with the problems of the system you chose.
