When driving big data projects, enterprises often face a critical decision: which database solution should they use? In the end, the choice usually comes down to two options, SQL and NoSQL. SQL has an impressive track record and a huge installed base, but NoSQL is making considerable gains and has many supporters of its own. In today's debate, we hear from experts in both camps.
John Dix, editor of the Network World website, organized the debate and invited a number of experts, among them Ryan Betts, CTO of VoltDB, and Bob Wiederhold, CEO of Couchbase. Ryan Betts argues that SQL has earned a secure place in large companies, and that big data is just one more workload SQL needs to support. Bob Wiederhold argues that NoSQL is a very viable alternative and is in fact a great fit for big data in many respects, especially scalability.
Point One: SQL has passed the test of time and still thrives - VoltDB CTO Ryan Betts
Structured Query Language (SQL) has proven its strength for decades, and it continues to attract support and investment from major big data vendors and organizations, including Google, Facebook, Cloudera and Apache.
Although NoSQL, the rising star, has made some waves, SQL still holds a significant share of the market and continues to gain investment and adoption in the big data world.
Once a technology such as SQL has become dominant, people tend to forget its core competitive advantages. The main reason SQL wins is that it offers the following unique combination of strengths:
1. SQL enables richer interaction with data, allowing users to ask a wide range of questions against a single database design. This is the key to SQL's success: data that cannot be interacted with is essentially useless. New interactions over the lifetime of a database bring new perspectives, new questions and more meaningful future interactions.
2. SQL is standardized, which lets users apply their expertise across a variety of systems and supports third-party add-ons and tools.
3. SQL scales, is feature-rich, and has been proven against a variety of challenges, including fast write-oriented transactions and scan-intensive deep analytics.
4. SQL works independently of how data is represented and stored. Some SQL systems also support JSON and other structured object formats, with better performance and more features than NoSQL implementations.
The term "NoSQL" is not precisely defined, but for this discussion I use the definition offered by Dr. Rick Cattell: NoSQL "refers to systems that provide simple operations such as key/value storage or simple records and indexes, and that are designed to provide horizontal scalability for these simple operations."
It is clear that many of the new databases on the market differ greatly from one another. Understanding their respective features, and the conveniences and limitations their underlying mechanisms impose on users, is the key to a successful deployment. The core features of NoSQL systems make them better suited to specific problems. For example, a graph database is a better fit when data is organized by relationships rather than by traditional rows or documents, while a specialized text search system is a better fit when results must be searched in real time as users type.
Here I summarize the differences and major advantages of SQL systems compared with simple key/value stores, and even with JSON object stores that innovate in storage format and scalability.
* SQL brings interactivity. SQL is a declarative query language. Users state what they want (for example, show where the customers who spent the most in each of the past five years came from), and the database builds the relevant algorithm internally and returns the requested results. In contrast, NoSQL programming innovations such as MapReduce are procedural query techniques. MapReduce requires users not only to know what result they want, but also to spell out how that result is to be computed.
This may sound like a rather dull technical distinction, but it matters for two reasons. First, declarative SQL queries are easier to build with graphical tools and point-and-click report generators. This low barrier to entry lets analysts, operators, managers and other users without programming skills benefit from the database's core capabilities. Second, the database engine uses its internal knowledge to choose efficient algorithms, abstracting the execution process away from the user. Even if the physical layout or the database indexes change, the optimizer can still carry out the same query correctly. In procedural systems, by contrast, programmers must revisit the existing procedures and reprogram them. That is costly and creates opportunities for unexpected errors.
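The contrast between declarative and procedural querying can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `purchases` table, the sample rows, and the function names are hypothetical, SQLite stands in for any SQL engine, and the map/shuffle/reduce steps are hand-written rather than run on a real MapReduce framework.

```python
import sqlite3
from collections import defaultdict

# Hypothetical sample data: (customer, city, year, amount).
PURCHASES = [
    ("alice", "Boston", 2012, 120.0),
    ("bob", "Denver", 2012, 80.0),
    ("alice", "Boston", 2013, 60.0),
    ("carol", "Austin", 2013, 200.0),
    ("bob", "Denver", 2013, 150.0),
]


def build_db():
    """Load the sample rows into an in-memory SQLite database."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE purchases (customer TEXT, city TEXT, year INTEGER, amount REAL)"
    )
    conn.executemany("INSERT INTO purchases VALUES (?, ?, ?, ?)", PURCHASES)
    return conn


def totals_declarative(conn):
    """Declarative: state the result wanted; the engine chooses how to compute it."""
    rows = conn.execute(
        "SELECT city, SUM(amount) FROM purchases GROUP BY city ORDER BY city"
    )
    return dict(rows.fetchall())


def totals_procedural(records):
    """Procedural: the programmer spells out each step of the computation."""
    # Map: emit a (city, amount) pair for every record.
    mapped = [(city, amount) for _customer, city, _year, amount in records]
    # Shuffle: group the emitted values by key.
    groups = defaultdict(list)
    for city, amount in mapped:
        groups[city].append(amount)
    # Reduce: sum the values of each group.
    return {city: sum(amounts) for city, amounts in groups.items()}


conn = build_db()
before = totals_declarative(conn)
# Change the physical design; the query text itself does not change.
conn.execute("CREATE INDEX idx_purchases_city ON purchases (city)")
after = totals_declarative(conn)
procedural = totals_procedural(PURCHASES)
```

Adding the index changes how the engine may execute the query, but `totals_declarative` is untouched and returns the same totals, whereas any change to how records are stored would have to be reflected by hand in `totals_procedural`.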
The market clearly understands this essential difference. As early as 2010, Google announced a SQL implementation to complement MapReduce, driven by the real needs of its internal users. More recently, Facebook released its own SQL implementation, Presto, which is designed to query its petabyte-scale HDFS clusters. According to Facebook: "As our data warehouse has grown to petabyte scale and our business needs have evolved, it became clear that we needed an interactive system optimized for low query latency." Beyond that, Cloudera is building its own SQL implementation, Impala, on top of HDFS. All of these developments follow the path set by Hive, a long-established and widely used SQL layer for Hadoop.
* SQL is standardized. While vendors sometimes add their own adjustments and extensions to their SQL interfaces, the core of SQL remains highly standardized, and additional specifications such as ODBC and JDBC provide broadly available, stable interfaces to SQL-based systems. The resulting ecosystem of management and operational tools helps you design, monitor, inspect, explore and develop applications on top of SQL systems.
SQL users and programmers can therefore reuse the API and user interface knowledge they have accumulated across a variety of backend systems, which reduces application development time. Standardization also makes possible declarative third-party extract, transform, and load (ETL) tools that help enterprises move data between different databases and systems in a structured manner.
* SQL scales. Some people mistakenly assume that SQL can only scale by sacrificing performance, which is simply wrong. As mentioned above, Facebook has created a SQL interface to query petabytes of data. SQL is also very fast at running ACID transactions. SQL provides an abstraction over data storage and retrieval that lets users work in a uniform way regardless of task type and data size, and this same abstraction lets SQL run efficiently across clustered, replicated data stores. Using SQL as an interface is independent of how a cloud, scale-out or HA system is built, and nothing inherent in SQL limits fault tolerance, high availability or replication. In fact, all modern SQL systems now support horizontal scalability, replication and fault tolerance in cloud architectures.
* SQL supports JSON. A few years ago, many SQL systems began adding XML document support. Today, as JSON has become one of the mainstream data interchange formats, SQL vendors are actively supporting JSON as well. Given today's agile development processes and the uptime requirements of Internet-facing infrastructure, support for structured data types has become essential. Oracle 12c, PostgreSQL 9.2, VoltDB and other databases now support JSON, often with performance benchmarks that surpass the "native" JSON NoSQL stores.
SQL will continue to win market share and to attract more investment and adoption. NoSQL databases that offer a proprietary query language or simple key-value semantics, without deeper technical differentiation, are in a difficult position to challenge the market leaders. Modern SQL systems support rich query semantics, build and nurture their user bases, broaden their ecosystem integrations, and deepen adoption within the enterprise, all while matching or exceeding the scalability of those alternatives.
Point Two: NoSQL is better suited for big data applications - Couchbase CEO Bob Wiederhold
More and more companies are starting to see NoSQL as a viable alternative to relational databases, especially for big data applications, where many enterprise users have realized that operating at scale is better achieved on clusters of standard, commodity servers. In addition, a schema-less data model is often a better fit for the variety of data being captured and processed today.
When discussing big data in the NoSQL world, we focus primarily on reads and writes in operational databases, that is, the interactive work people do in day-to-day online transactions (for example, using big data to guide an online flight booking). An operational database differs from an analytical database, which typically processes large volumes of data to extract the insights it contains (for example, using big data to analyze how many passengers are booked on a flight on a particular day).
For big data in an operational database, the design focus is not on analytics. An operational database often has to hold a huge data set for a very large number of users, giving them continuous data access and real-time transactions. The sheer scale at which such databases must operate on and manage big data explains why NoSQL's characteristics matter and why it plays a central role in big data applications.
* NoSQL is the key to achieving scalability
The technology industry reaches an inflection point every time there is a fundamental shift in hardware. In the database world, the shift from scale-up to scale-out architectures is a major factor driving NoSQL's rapid growth. Relational databases, including the products of giants such as Oracle and IBM, were designed to scale up. That is, they use a centralized, share-everything architecture that can only be scaled by adding larger, more expensive hardware.
In contrast, NoSQL databases were designed from the start to be distributed, scale-out technologies. They use a set of distributed nodes (which together form a cluster) to provide elasticity, letting users add more nodes whenever needed to cope with a growing workload.
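One way to picture the scale-out idea is a toy key/value cluster that assigns each key to a node by hashing it. This is a minimal sketch, not how Couchbase or any particular product works: the `Cluster` class, the node names, and the modulo placement rule are all assumptions made for illustration, and real systems typically use techniques such as consistent hashing to limit how much data moves when a node is added.

```python
import hashlib


def node_for(key, nodes):
    """Pick the node that owns `key` by hashing it (simple modulo placement)."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return nodes[int.from_bytes(digest[:8], "big") % len(nodes)]


class Cluster:
    """Toy key/value cluster; each node is just an in-memory dict."""

    def __init__(self, node_names):
        self.nodes = list(node_names)
        self.storage = {name: {} for name in self.nodes}

    def put(self, key, value):
        self.storage[node_for(key, self.nodes)][key] = value

    def get(self, key):
        return self.storage[node_for(key, self.nodes)].get(key)

    def add_node(self, name):
        """Scale out: add a node, then re-place every key under the new layout."""
        items = [
            (key, value)
            for shard in self.storage.values()
            for key, value in shard.items()
        ]
        self.nodes.append(name)
        self.storage = {node: {} for node in self.nodes}
        for key, value in items:
            self.put(key, value)


cluster = Cluster(["node-a", "node-b"])
for i in range(100):
    cluster.put(f"user:{i}", {"id": i})

# Workload grows: add a third node and keep serving the same keys.
cluster.add_node("node-c")
```

After `add_node`, every key is still reachable through `get`, and the data is now spread over three nodes instead of two.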
Distributed scale-out systems usually cost less than scale-up systems. The latter rely on large, complex, fault-tolerant servers, which are expensive to design, build and support. The licensing costs of commercial relational databases cannot be overlooked either, because they are priced with a single server in mind. NoSQL databases, on the other hand, are usually open source, priced to run on a cluster of servers, and relatively inexpensive.
* NoSQL is the key to achieving flexibility
The relational and NoSQL data models are completely different. The relational model splits data into multiple related tables made up of rows and columns, and tables reference one another through foreign keys that are themselves stored in columns.
When a user queries a set of data, the required information must be collected from multiple tables (often hundreds in today's enterprise applications) and combined before it can be delivered to the application. Similarly, when writing data, the write must be coordinated across multiple tables. Relational databases cope well with capturing and storing information when data volumes are relatively small and data arrives relatively slowly. Today's applications, however, often need to read and write massive amounts of data in near real time, which is beyond what a relational database can comfortably handle.
NoSQL databases take a completely different approach. At their core, NoSQL databases are really "NoREL", that is, non-relational: they do not depend on tables and the relationships between tables to store and organize information. For example, a document-oriented NoSQL database takes the data to be stored and aggregates it into a document in JSON format. Each JSON document can be treated as an object by the application. A single JSON document can hold the data for a row that would otherwise span 25 relational database tables, organized as one document/object.
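The difference between the two models can be sketched as follows. This is an illustrative example only: the three tables, the `customer::1` key format, and the plain dict standing in for a document store are assumptions, and a real schema would involve many more tables.

```python
import json
import sqlite3

# Relational side: one customer's data is split across several tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE addresses (customer_id INTEGER REFERENCES customers(id), city TEXT);
    CREATE TABLE orders (customer_id INTEGER REFERENCES customers(id), item TEXT, qty INTEGER);
    INSERT INTO customers VALUES (1, 'Alice');
    INSERT INTO addresses VALUES (1, 'Boston');
    INSERT INTO orders VALUES (1, 'keyboard', 1);
    INSERT INTO orders VALUES (1, 'cable', 3);
""")


def load_customer_relational(conn, customer_id):
    """Collect one customer's data by visiting each table via its foreign key."""
    name = conn.execute(
        "SELECT name FROM customers WHERE id = ?", (customer_id,)
    ).fetchone()[0]
    cities = [
        row[0]
        for row in conn.execute(
            "SELECT city FROM addresses WHERE customer_id = ? ORDER BY city",
            (customer_id,),
        )
    ]
    orders = [
        {"item": item, "qty": qty}
        for item, qty in conn.execute(
            "SELECT item, qty FROM orders WHERE customer_id = ? ORDER BY rowid",
            (customer_id,),
        )
    ]
    return {"id": customer_id, "name": name, "addresses": cities, "orders": orders}


# Document side: the same data aggregated into a single JSON document.
# A plain dict stands in for the document store in this sketch.
document_store = {}
customer = load_customer_relational(conn, 1)
document_store["customer::1"] = json.dumps(customer)


def load_customer_document(store, customer_id):
    """One key lookup returns the whole aggregated object."""
    return json.loads(store[f"customer::{customer_id}"])


from_document = load_customer_document(document_store, 1)
```

The relational read issues one query per table, while the document read is a single lookup that returns the whole object.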
Aggregating information this way can lead to duplicated data, but since storage is no longer a primary cost, the resulting data model flexibility, the ease of distributing the documents efficiently, and the improved read and write performance make it an easy trade-off for web applications.
* NoSQL is the key to supporting big data applications
Today it is easier than ever to capture and access data through third parties, including social media sites. Personal user information, geolocation data, user-generated content, machine log data and sensor data are just a few examples of this wave, and the list of data sources keeps growing. At the same time, companies increasingly rely on big data to drive their critical business applications. As a result, companies are turning to NoSQL, which is the only way to cope with these emerging data types.
Developers need a more flexible database that easily accommodates new data types and is not disrupted when third-party data providers change the structure of their content. Most of the new data is unstructured or semi-structured, so developers need a database that can store it efficiently. Unfortunately, the strictly defined, schema-based design of relational databases makes it hard to take in new data types quickly, and it is a poor fit for unstructured and semi-structured data. The data models NoSQL offers map better to these needs.
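The schema point can be illustrated with a small sketch. The event records, field names, and the list standing in for a document store are hypothetical; SQLite is used only as a convenient fixed-schema example, and a relational design could of course be adapted with a schema migration such as `ALTER TABLE ... ADD COLUMN`.

```python
import json
import sqlite3

# Hypothetical incoming records; the second one carries fields the schema never planned for.
events = [
    {"username": "alice", "kind": "login", "device": "phone"},
    {"username": "bob", "kind": "checkin", "lat": 42.36, "lon": -71.06},
]

# Fixed schema: columns are declared up front.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (username TEXT, kind TEXT, device TEXT)")


def insert_relational(conn, event):
    """Insert an event using its keys as column names (keys are trusted in this sketch)."""
    columns = ", ".join(event)
    placeholders = ", ".join("?" for _ in event)
    conn.execute(
        f"INSERT INTO events ({columns}) VALUES ({placeholders})",
        list(event.values()),
    )


rejected = []
for event in events:
    try:
        insert_relational(conn, event)
    except sqlite3.OperationalError as exc:
        # The table has no 'lat' or 'lon' column, so the insert fails until the schema changes.
        rejected.append(str(exc))

# Schema-less documents: each record is stored as-is, whatever fields it has.
documents = [json.dumps(event) for event in events]
with_location = [doc for doc in map(json.loads, documents) if "lat" in doc]
```

The fixed-schema table accepts the first event and rejects the second, while the document list stores both and can still be filtered on the new field.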
Overall, as web and mobile applications grow in popularity, as consumer behavior shifts online, and as new types of data emerge, businesses across industries are looking for a database technology that provides scalability and flexibility in managing and accessing data. In this context, NoSQL technology is the only solution that can effectively meet these requirements.
SQL/NoSQL: two camps debate which is better suited to big data