Big Data Technology vs. Database All-in-One Machines (Reposted)

Source: Internet
Author: User

Http://blog.sina.com.cn/s/blog_7ca5799101013dtb.html

At present, big data and database all-in-one machines are both very hot topics, yet quite a few people do not understand the essential difference between the two. Here is a comparison between big data technologies (such as Hadoop, chiefly MapReduce, and NoSQL) and database all-in-one machines (the next generation of mainstream relational databases):

Hardware architecture

In essence, the hardware architectures of the two are basically the same: both use a distributed, parallel model on clusters of x86 servers to handle large-scale data and computation. However, database all-in-one vendors package the hardware as a product and tune it systematically as a whole, and they also have their own distinctive techniques, such as InfiniBand and Flash Cache in Oracle Exadata and FPGAs in IBM Netezza.

Software system

The key difference between big data technology and database all-in-one machines lies in the software system.

The core of a database is its SQL system. This means not only SQL parsing but, more importantly, a complete and very large technical stack that includes the SQL optimization engine, indexes, locks, transactions, logging, security, and management. It is mature and productized.
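To make the "SQL optimization engine" and "index" parts of this stack concrete, here is a small sketch using Python's built-in sqlite3 module, with SQLite standing in for a full relational database (the `orders` table and `idx_orders_customer` index are hypothetical). The same query is planned twice: before an index exists the optimizer can only scan the table, and once the index is created it chooses the index by itself, without the query text changing.

```python
import sqlite3


def query_plan(conn: sqlite3.Connection, sql: str) -> list[str]:
    """Return the human-readable 'detail' column of EXPLAIN QUERY PLAN."""
    return [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, customer TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO orders (customer, amount) VALUES (?, ?)",
    [(f"c{i % 100}", float(i)) for i in range(1000)],
)

sql = "SELECT SUM(amount) FROM orders WHERE customer = 'c42'"
before = query_plan(conn, sql)  # no secondary index yet: the plan is a full table scan
conn.execute("CREATE INDEX idx_orders_customer ON orders (customer)")
after = query_plan(conn, sql)   # the optimizer now picks the index on its own
print(before)
print(after)
```

The exact wording of the plan differs between SQLite versions, but the switch from a scan to an index search is the point: the user declares what to compute and the database decides how.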

Big data's MapReduce provides a distributed programming framework for processing massive data, and users must write their own computing logic. MapReduce reads and writes data in batches rather than randomly. The other big data system, NoSQL, is mostly just distributed storage for massive data plus an index-based fast-read mechanism, exposed to users mainly through programming APIs (there are also SQL-like languages, but in essence they are not a complete SQL system).
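The division of labor described here, where the framework handles distribution and the user supplies the computing logic, can be illustrated with a single-process Python simulation of the classic word-count job. This is not Hadoop's API: `map_phase`, `shuffle`, `reduce_phase`, and `run_job` are hypothetical names, and a real framework would run the map and reduce calls on many nodes and move the intermediate pairs over the network.

```python
from collections import defaultdict
from typing import Iterable, Iterator


def map_phase(line: str) -> Iterator[tuple[str, int]]:
    # User-written map logic: emit (word, 1) for every word in one input record.
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[tuple[str, int]]) -> dict[str, list[int]]:
    # Framework step: group all intermediate values by key.
    groups: dict[str, list[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: list[int]) -> tuple[str, int]:
    # User-written reduce logic: combine all values for one key.
    return key, sum(values)


def run_job(records: list[str]) -> dict[str, int]:
    # Batch processing: every record is read in sequence; nothing is looked up by key.
    intermediate = (pair for record in records for pair in map_phase(record))
    return dict(reduce_phase(k, v) for k, v in shuffle(intermediate).items())


counts = run_job(["big data big cluster", "data warehouse"])
print(counts)  # {'big': 2, 'data': 2, 'cluster': 1, 'warehouse': 1}
```

Only `map_phase` and `reduce_phase` contain job-specific logic; the rest is what the framework provides. The job also reads every record sequentially, which is the batch (rather than random-access) pattern described above.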

Because of the complexity of the SQL system and the tightly coupled nature of its processing logic, database all-in-one machines still fall far short of big data technology in scalability, even though they have greatly relieved the vertical-scaling bottleneck of traditional relational databases. A single MapReduce or NoSQL cluster can often scale to thousands of nodes; even if a database all-in-one machine were scaled to that size in hardware, its software could not make meaningful use of it!

Characteristics

These fundamental differences in the software systems give the two different characteristics:

Database all-in-one machines are usually suited to storing complex relational data models (such as an enterprise's core business data), although they are restricted to the relational model based on two-dimensional tables. They are suited to workloads with strong consistency and transactional requirements and to complex BI computation.
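The "strong consistency and transactional requirements" point can be illustrated with an atomic transfer between two accounts. This sketch again uses Python's sqlite3 module as a stand-in for a relational database; the `accounts` table and the `transfer` function are hypothetical. The credit is deliberately applied before the debit, so when the debit violates the `CHECK` constraint the rollback has to undo a change that was already made.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts ("
    " name TEXT PRIMARY KEY,"
    " balance INTEGER NOT NULL CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 50)])
conn.commit()


def transfer(conn: sqlite3.Connection, src: str, dst: str, amount: int) -> bool:
    """Move `amount` from src to dst as one transaction; return True on success."""
    try:
        with conn:  # commits if the block succeeds, rolls back on any exception
            conn.execute("UPDATE accounts SET balance = balance + ? WHERE name = ?", (amount, dst))
            conn.execute("UPDATE accounts SET balance = balance - ? WHERE name = ?", (amount, src))
        return True
    except sqlite3.IntegrityError:  # raised when the CHECK constraint fails
        return False


ok1 = transfer(conn, "alice", "bob", 30)   # succeeds: alice 70, bob 80
ok2 = transfer(conn, "bob", "alice", 500)  # fails: bob would go negative
balances = dict(conn.execute("SELECT name, balance FROM accounts"))
print(ok1, ok2, balances)  # True False {'alice': 70, 'bob': 80}
```

After the failed transfer alice's balance is still 70, not 570: the partial credit was rolled back together with the rejected debit.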

Big data technology is better suited to storing simpler data models and need not be constrained by a schema, so the types of data it can store and manage are richer. It suits computation with weak consistency and transactional requirements (mainly NoSQL query operations) and batch, distributed, parallel computing (MapReduce) over ultra-large-scale data.

It is important to note that NoSQL databases are more efficient at queries and inserts than relational databases because they shed the cumbersome constraints of the SQL system, and big data technology has more processing capacity than database all-in-one machines mainly because its clusters can scale out much further.

The Essence

In essence, MapReduce is an important innovation in distributed computing over massive data, but its advantage lies mainly in large-scale batch problems that lend themselves to parallel processing; for some operations, such as complex joins, it is not necessarily advantageous.
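Why complex joins are awkward in MapReduce can be seen in a reduce-side join, one common way of expressing a join in this model. In the sketch below (plain Python; the function names and data are hypothetical), every record from both inputs is tagged with its source and regrouped by the join key, which on a real cluster means shuffling both datasets across the network. A multi-way join on different keys typically needs several such jobs chained together, whereas a relational optimizer chooses join order and indexes by itself.

```python
from collections import defaultdict
from typing import Any, Iterator

Record = dict[str, Any]


def map_tagged(source: str, records: list[Record], key_field: str) -> Iterator[tuple[Any, tuple[str, Record]]]:
    # Map: emit (join key, (source tag, record)) so the reducer can tell the inputs apart.
    for rec in records:
        yield rec[key_field], (source, rec)


def reduce_join(tagged: list[tuple[str, Record]]) -> Iterator[Record]:
    # Reduce: for one key, pair every "users" record with every "orders" record (inner join).
    left = [r for s, r in tagged if s == "users"]
    right = [r for s, r in tagged if s == "orders"]
    for u in left:
        for o in right:
            yield {**u, **o}


def reduce_side_join(users: list[Record], orders: list[Record]) -> list[Record]:
    # Shuffle: BOTH datasets are regrouped by the join key before any join happens.
    groups: dict[Any, list[tuple[str, Record]]] = defaultdict(list)
    for key, value in map_tagged("users", users, "user_id"):
        groups[key].append(value)
    for key, value in map_tagged("orders", orders, "user_id"):
        groups[key].append(value)
    return [row for tagged in groups.values() for row in reduce_join(tagged)]


users = [{"user_id": 1, "name": "alice"}, {"user_id": 2, "name": "bob"}]
orders = [{"user_id": 1, "order_id": 10}, {"user_id": 1, "order_id": 11}, {"user_id": 3, "order_id": 12}]
joined = reduce_side_join(users, orders)
print(joined)
```

Users without orders and orders without users are dropped, as in an SQL inner join, but only after all of their records have been moved and grouped.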

NoSQL, in essence, can be seen as a simplification of the traditional relational database: its design keeps essentially only the primary-index feature of a relational database, adds distributed storage on top, and drops the parts of the SQL system that certain specific problems do not need, thereby gaining efficiency, scalability, and flexibility.
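The idea of keeping only primary-key access and adding distributed storage on top can be sketched as a toy hash-partitioned key-value store. This is an illustration under stated assumptions, not any particular NoSQL product: `PartitionedKVStore` is hypothetical, each "node" is an in-memory dict, and `zlib.crc32` is an arbitrary choice of stable hash for picking the partition.

```python
import zlib
from typing import Any, Optional


class PartitionedKVStore:
    """Toy key-value store: hash-partitioned, primary-key access only."""

    def __init__(self, num_nodes: int) -> None:
        # Each "node" is an in-memory dict standing in for a storage server.
        self.nodes: list[dict[str, Any]] = [{} for _ in range(num_nodes)]

    def _node_for(self, key: str) -> dict[str, Any]:
        # Stable hash (unlike Python's salted hash()), so a key always maps to the same node.
        return self.nodes[zlib.crc32(key.encode("utf-8")) % len(self.nodes)]

    def put(self, key: str, value: Any) -> None:
        # No schema check, no transaction, no secondary index: just store the value.
        self._node_for(key)[key] = value

    def get(self, key: str) -> Optional[Any]:
        # One hash plus one dictionary lookup on a single node.
        return self._node_for(key).get(key)


store = PartitionedKVStore(num_nodes=4)
store.put("user:1", {"name": "alice", "tags": ["vip"]})
store.put("user:2", {"name": "bob", "age": 30})  # different fields: no fixed schema
print(store.get("user:1"))  # {'name': 'alice', 'tags': ['vip']}
```

A lookup by key touches exactly one partition, and values need no common schema. A request the store was not designed for, such as finding every user older than 25, would require scanning all partitions; that is the kind of capability the SQL system provides and NoSQL deliberately leaves out.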

Clearly, in practice there are many problems (especially big data problems) for which many of the designs in relational databases are unnecessary; this is the fundamental reason NoSQL emerged.

Relationships and collaboration

The conclusion, therefore, is that big data technology and database all-in-one technology are complementary rather than substitutes for each other. They are designed for different application scenarios and complement and collaborate with each other. Specifically:

Big data technology can:

1. Handle the massive, simply modeled, diverse unstructured and semi-structured data in an enterprise (such as social data, various logs, and even images and videos), with the processing results used directly;

2. Feed those processing results into the enterprise data warehouse as new input, so that big data technology serves as a new ETL mechanism for big data sources;

3. Store or compute large volumes of data that are not suited to SQL operations.
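Items 1 and 2 together describe a pipeline: big data processing turns raw, semi-structured data into structured results, which are then loaded into the warehouse. A minimal single-process sketch of that flow follows. The log format, the `extract_transform` function, the `user_events` table, and the use of an in-memory SQLite database as the "warehouse" are all hypothetical; in practice the extract-and-transform step would run as a distributed job over far more data.

```python
import json
import sqlite3
from collections import Counter

# Hypothetical semi-structured log lines: fields vary and some lines are malformed.
raw_logs = [
    '{"user": "alice", "event": "click", "page": "/home"}',
    '{"user": "bob", "event": "view"}',
    "not json at all",
    '{"user": "alice", "event": "click", "page": "/cart"}',
]


def extract_transform(lines: list[str]) -> list[tuple[str, str, int]]:
    """Parse log lines, skip malformed ones, and aggregate to (user, event, count) rows."""
    counts: Counter = Counter()
    for line in lines:
        try:
            rec = json.loads(line)
        except json.JSONDecodeError:
            continue  # tolerate dirty input instead of failing the whole batch
        if not isinstance(rec, dict):
            continue
        counts[(rec.get("user"), rec.get("event"))] += 1
    return [(user, event, n) for (user, event), n in counts.items()]


# Load: the structured result becomes new input for the relational warehouse.
warehouse = sqlite3.connect(":memory:")
warehouse.execute("CREATE TABLE user_events (user TEXT, event TEXT, n INTEGER)")
with warehouse:
    warehouse.executemany("INSERT INTO user_events VALUES (?, ?, ?)", extract_transform(raw_logs))

rows = warehouse.execute("SELECT user, event, n FROM user_events ORDER BY user, event").fetchall()
print(rows)  # [('alice', 'click', 2), ('bob', 'view', 1)]
```

The schema-free parsing happens on the big data side, and only the compact, tabular result reaches the warehouse, where SQL can take over.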

Database all-in-one technology, meanwhile, should remain the mainstream technology for enterprise data warehouses, at least for a long time to come, storing and computing the enterprise's most important and valuable core business data.

Some misunderstandings

Some people think that although big data technology in its original open-source form does not meet the requirements of a primary enterprise data warehouse platform, it should be able to after further development and supplementation.

There is nothing wrong with this view in principle. In practice, however, supplementing open-source big data technologies means adding back exactly what their original design stripped out of the relational database system. If you do that:

1. The amount of development work is hard to estimate;

2. It is difficult for an ordinary enterprise to master the theory, products, and systems behind this work the way a professional database vendor does;

3. From a purely technical perspective, anything can be developed! But if your enterprise does this, is it prepared to develop yet another commercial relational database?

Obviously, this runs counter to the original design intent of big data technology!
