One of the key decisions facing enterprises that run big data projects is which database to use: SQL or NoSQL? SQL has an impressive track record and a huge installed base, while NoSQL is gaining considerable revenue and has many supporters. Let's look at the views of two experts on this issue.
Experts
· VoltDB's chief technology officer, Ryan Betts, says that SQL has already won widespread deployment at large companies, and that big data is just another area it can support.
· NoSQL is a viable option and, in many ways, it's the best choice for big data, especially when it comes to scalability, says Bob Wiederhold, chief executive of Couchbase Corporation.
SQL has stood the test of time and is still booming
Structured Query Language (SQL), a time-tested winner, has dominated for decades, and current big data companies and organizations such as Google, Facebook, Cloudera and Apache are actively investing in SQL.
Once a technology becomes as dominant as SQL, it is easy to forget why it is superior. The unique advantages of SQL include:
1. SQL enables richer interaction with data and allows a broad set of questions to be asked against a single database design. This is a key feature because data that cannot be interacted with is largely useless, and richer interaction leads to new insights, new questions, and more meaningful future interactions.
2. SQL is standardized, which lets users apply their knowledge across systems and supports third-party add-ons and tools.
3. SQL scales, and is versatile and proven, solving problems that range from fast write-oriented transactions to scan-intensive deep analytics.
4. SQL is orthogonal to data representation and storage. Some SQL systems support JSON and other structured object formats with better performance and more features than NoSQL implementations.
Although NoSQL has had some impact, SQL still dominates the market and has won a lot of investment and extensive deployment in the big data space.
The term NoSQL is vague. For this discussion, I borrow Rick Cattell's definition of NoSQL: systems that provide simple operations (such as key/value storage) or simple records and indexes, and that focus on horizontal scalability for those simple operations.
Obviously, the many new databases available today are not all alike, and understanding the rationale behind each one, along with its potential problems, is the key to success. The main features of each NoSQL system make it better suited to specific problems. For example, a graph database is more appropriate when data is organized by relationships, and a dedicated text search system is better suited to situations where real-time search is required.
Here, let's look at the main advantages and differentiators of SQL systems:
* SQL enables interactivity. SQL is a declarative query language. Users state what they want (for example, show the locations of the top customers in March over the past five years), and the database internally builds an algorithm and extracts the requested results. In contrast, MapReduce, the programming innovation associated with NoSQL, is a procedural query technique. MapReduce requires users not only to say what they want, but also to state how to produce the answer.
This sounds like a boring technical difference, but it is critical for two reasons. First, declarative SQL queries are easier to build with graphical tools and point-and-click report builders. This allows analysts, operators, managers, and other employees without software programming skills to query the database. Second, the database engine can use internal information to select the most efficient algorithm. If the physical layout or indexing of the database changes, the best algorithm is still computed. In a procedural system, programmers need to revisit and reprogram the original algorithm, which is a costly and error-prone process.
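The declarative versus procedural distinction can be made concrete with a small sketch. It uses Python's built-in sqlite3 module as a stand-in for a SQL system, and hand-written map, shuffle, and reduce steps as a stand-in for MapReduce. The orders table, its columns, and the sample rows are hypothetical and do not come from either expert.

```python
import sqlite3
from collections import defaultdict

# Hypothetical sample data: (customer, region, amount).
ORDERS = [
    ("acme", "EMEA", 120.0),
    ("acme", "EMEA", 80.0),
    ("globex", "APAC", 150.0),
    ("initech", "AMER", 40.0),
]


def top_customers_sql(rows, limit=2):
    """Declarative: state the result wanted; the engine picks the plan."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (customer TEXT, region TEXT, amount REAL)")
    conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", rows)
    result = conn.execute(
        "SELECT customer, SUM(amount) AS total FROM orders "
        "GROUP BY customer ORDER BY total DESC LIMIT ?",
        (limit,),
    ).fetchall()
    conn.close()
    return result


def top_customers_procedural(rows, limit=2):
    """Procedural: spell out the map, shuffle, and reduce steps by hand."""
    # Map: emit (key, value) pairs.
    mapped = [(customer, amount) for customer, _region, amount in rows]
    # Shuffle: group values by key.
    groups = defaultdict(list)
    for customer, amount in mapped:
        groups[customer].append(amount)
    # Reduce: sum each group, then sort and truncate.
    totals = [(customer, sum(amounts)) for customer, amounts in groups.items()]
    totals.sort(key=lambda pair: pair[1], reverse=True)
    return totals[:limit]
```

Both functions return the same ranking, but only the procedural version has to be rewritten if the grouping or sorting strategy needs to change.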
The market understands this critical difference. In 2010, Google announced a SQL implementation to complement MapReduce, driven primarily by internal user demand. More recently, Facebook released Presto, a SQL implementation for querying its petabyte-scale HDFS clusters. According to Facebook, as its warehouse grew to petabytes and its needs changed, it became clear that it needed an interactive system providing low-latency queries. In addition, Cloudera is building Impala, another SQL implementation on top of HDFS.
* SQL is standardized. Although vendors sometimes add their own dialect to the SQL interface, the core of SQL is standardized, and additional specifications (such as ODBC and JDBC) provide broadly available, stable interfaces to SQL stores. This enables an ecosystem of management and operational tools for designing, monitoring, inspecting, exploring, and building applications on top of SQL systems.
SQL users and programmers can reuse their API and UI knowledge across multiple back-end systems, reducing application development time. Standardization also allows declarative third-party extract, transform, load (ETL) tools that enable organizations to move data between databases and across systems.
* SQL scales. It is entirely wrong to think that SQL must be sacrificed to gain scalability. As mentioned earlier, Facebook created a SQL interface to query petabytes of data. SQL can also run extremely fast ACID transactions very efficiently. SQL's abstraction of data storage and indexing allows consistent use across a wide range of problems and dataset sizes, and allows SQL to run efficiently across clusters of replicated data stores. Using SQL as an interface is independent of building a cloud, scale-out, or high-availability (HA) system: nothing in SQL prevents or restricts fault tolerance, high availability, or replication. In fact, all modern SQL systems support cloud-friendly horizontal scalability, replication, and fault tolerance.
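As one illustration of the ACID transactions mentioned above, here is a minimal sketch using Python's built-in sqlite3 module. The accounts table and the transfer function are hypothetical, and an in-memory SQLite database is only a stand-in for the clustered SQL systems the paragraph describes.

```python
import sqlite3


def make_bank():
    """Create a hypothetical in-memory accounts table with two rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE accounts (name TEXT PRIMARY KEY, "
        "balance INTEGER NOT NULL CHECK (balance >= 0))"
    )
    conn.executemany(
        "INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 50)]
    )
    conn.commit()
    return conn


def transfer(conn, src, dst, amount):
    """Move funds atomically: both updates apply, or neither does."""
    try:
        with conn:  # commits on success, rolls back if an exception is raised
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE name = ?",
                (amount, dst),
            )
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE name = ?",
                (amount, src),
            )
        return True
    except sqlite3.IntegrityError:
        # The CHECK constraint rejected an overdraft; the credit was undone too.
        return False


def balances(conn):
    """Return the current balances as a {name: balance} dictionary."""
    return dict(conn.execute("SELECT name, balance FROM accounts ORDER BY name"))
```

The credit is deliberately issued before the debit so that a failed debit shows the earlier statement being rolled back as well.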
* SQL supports JSON. A few years ago, many SQL systems added XML document support. Now, as JSON becomes a popular format for data interchange, SQL vendors are adding support for JSON types. Given today's agile programming processes and the uptime requirements of web-facing infrastructure, there are good arguments for supporting structured data types. Oracle 12c, PostgreSQL 9.2, VoltDB, and others support JSON, often with better performance than "native" JSON stores.
SQL will continue to gain market share and will continue to see new investment and deployments. Many NoSQL databases provide proprietary query languages or simple key-value semantics without deeper technical differentiation. Modern SQL systems provide scalability while supporting richer query semantics, along with a large installed user base, broad ecosystem integration, and deep enterprise deployment.
NoSQL is better suited to big data applications
NoSQL is increasingly considered a viable alternative to relational databases, especially for big data applications. In addition, the schemaless data model is generally better suited to the variety and types of data that are now captured and processed.
When we talk about big data in the NoSQL field, we mean reading from and writing to operational databases. Do not confuse operational databases with analytical databases, which typically examine large amounts of data and draw insights from it.
Although big data in an operational database may not seem analytical at first glance, operational databases typically store large datasets for very large numbers of users who need frequent access to the data to perform transactions in real time. The scale at which these databases operate points to the key features of NoSQL and explains why NoSQL is key for big data applications.
NoSQL is the key to scalability
Every time the technology industry undergoes a fundamental shift in hardware development, there is an inflection point. In the database field, the shift from scaling up (vertical) to scaling out (horizontal) has driven the development of NoSQL. Relational databases, including those from Oracle and IBM, scale vertically. That is, they are a centralized, shared-everything technology that can only be scaled by adding more expensive hardware.
NoSQL databases are distributed, scale-out technologies. They use a distributed set of nodes, called a cluster, to provide highly elastic scaling that lets users add nodes to handle load dynamically.
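The scale-out idea can be sketched with a toy cluster that spreads keys across nodes by hashing. This is a simplification under stated assumptions: the ToyCluster class is hypothetical, each "node" is just an in-process dictionary, and real NoSQL systems typically use consistent hashing or partition maps so that adding a node moves only a fraction of the data.

```python
import hashlib


class ToyCluster:
    """Hypothetical key-value cluster that shards keys across nodes."""

    def __init__(self, node_count):
        # Each node is modeled as a plain dictionary.
        self.nodes = [{} for _ in range(node_count)]

    def _node_for(self, key):
        # Stable hash so the same key always maps to the same node.
        digest = hashlib.sha256(key.encode("utf-8")).hexdigest()
        return int(digest, 16) % len(self.nodes)

    def put(self, key, value):
        self.nodes[self._node_for(key)][key] = value

    def get(self, key):
        return self.nodes[self._node_for(key)].get(key)

    def add_node(self):
        """Scale out by one node and redistribute the existing keys."""
        items = [pair for node in self.nodes for pair in node.items()]
        self.nodes = [{} for _ in range(len(self.nodes) + 1)]
        for key, value in items:
            self.put(key, value)
```

Callers keep using put and get unchanged after add_node, which is the property that makes adding capacity to a cluster transparent to the application.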
Distributed horizontal scaling is usually cheaper than the vertical approach. Licensing costs for commercial relational databases can be daunting because they are priced per server. NoSQL databases, on the other hand, are usually open source, priced for running on a cluster of servers, and relatively cheap.
NoSQL is the key to flexibility
Relational and NoSQL data models differ greatly. The relational model takes data and splits it across many interrelated tables that reference each other through foreign keys.
When a user needs to run a query on a dataset, the required information has to be collected from multiple tables (often hundreds in today's enterprise applications), combined, and then handed to the application. Similarly, when data is written, the write has to be coordinated and executed across multiple tables. Relational databases can usually capture and store information when data volumes are relatively low and data flows into the database at a slow rate. However, today's applications often require fast writing (and reading) of massive amounts of data.
NoSQL databases take a very different approach. At their core, NoSQL databases are really "NoREL", or non-relational, which means that they do not rely on tables and the relationships between tables to store and organize information. For example, a document-oriented NoSQL database takes the data you want to store and aggregates it into documents in JSON format. Each JSON document can be treated as an object by your application. A single JSON document might take data that spans 25 tables in a relational database and aggregate it into one document.
Aggregating information this way can result in duplicated data, but since storage is no longer cost prohibitive, the flexibility of the data model, the ease of distributing the resulting documents, and the improved read and write performance make this a good trade-off.
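The contrast between the two models can be sketched as follows, assuming a hypothetical two-table schema (users and orders) in SQLite and the same information aggregated into one JSON document. The table names, fields, and values are illustrative only.

```python
import json
import sqlite3


def build_relational():
    """Normalized form: data split across tables linked by a foreign key."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
    conn.execute(
        "CREATE TABLE orders (id INTEGER PRIMARY KEY, "
        "user_id INTEGER REFERENCES users(id), item TEXT)"
    )
    conn.execute("INSERT INTO users VALUES (1, 'Ada')")
    conn.executemany(
        "INSERT INTO orders VALUES (?, ?, ?)",
        [(10, 1, "book"), (11, 1, "lamp")],
    )
    return conn


def read_relational(conn, user_id):
    """Reading one user's data requires a join across both tables."""
    rows = conn.execute(
        "SELECT u.name, o.item FROM users u "
        "JOIN orders o ON o.user_id = u.id WHERE u.id = ? ORDER BY o.id",
        (user_id,),
    ).fetchall()
    return {"name": rows[0][0], "orders": [item for _name, item in rows]}


# Document form: the same information aggregated into one JSON object.
USER_DOCUMENT = json.dumps({"name": "Ada", "orders": ["book", "lamp"]})


def read_document(raw):
    """Reading the document is a single lookup followed by a parse."""
    return json.loads(raw)
```

With two tables the join is trivial; the argument in the text is that the same pattern becomes expensive when the data for one object is spread over dozens of tables.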
NoSQL is the key to big data applications
Data is becoming easier to capture and to access through third parties, including social media sites. This data includes personal user information, geographic data, user-generated content, machine-logged data, and sensor-generated data. Organizations are also relying on big data to drive their mission-critical applications. At the same time, companies are moving to NoSQL databases, which are well suited to these new types of data.
Developers want a flexible database that can easily adapt to new data types and is not disrupted when third-party data providers change the structure of their content. Most new data is unstructured or semi-structured, so developers need a database that can store it effectively. However, relational databases take a rigidly defined, schema-based approach that makes it hard to incorporate new data types quickly and is poorly suited to unstructured and semi-structured data.
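A minimal sketch of the schema-flexibility argument, assuming a hypothetical events table with a fixed schema in SQLite and a plain Python list standing in for a document store. The field names and values are invented for illustration, and the sketch shows the friction described above rather than a hard limit of any particular product.

```python
import sqlite3


def insert_fixed_schema(conn, record):
    """Insert into a table whose columns were fixed at design time."""
    # Column names come from trusted keys in this sketch; never build SQL
    # from untrusted input this way.
    columns = ", ".join(record)
    placeholders = ", ".join("?" for _ in record)
    try:
        conn.execute(
            f"INSERT INTO events ({columns}) VALUES ({placeholders})",
            tuple(record.values()),
        )
        return True
    except sqlite3.OperationalError:
        # Unknown column: the schema must be migrated before this record fits.
        return False


def insert_document(store, record):
    """A document store accepts whatever fields the record carries."""
    store.append(dict(record))
    return True


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (username TEXT, action TEXT)")
documents = []

# A third-party feed starts sending an extra "geo" field.
old_shape = {"username": "ada", "action": "login"}
new_shape = {"username": "ada", "action": "login", "geo": "52.52,13.40"}
```

Passing new_shape to insert_fixed_schema fails until the table is altered, while insert_document stores it as-is.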
Overall, businesses need scalable and flexible database technology to manage and access data as web and mobile applications grow and as new trends, changes in online consumer behavior, and new data types emerge. NoSQL technology is the only viable solution that meets these needs effectively.