Transferred from: http://blog.jobbole.com/86269/
Although hierarchical databases are still widely used on mainframes today, relational databases (RDBMS, queried with SQL) have captured the database market and served it well. The money we deposit does not end up in someone else's account, the seat we book on a plane is ours alone, we are not blamed for things we did not do, and so on. A relational database owes this data integrity to its adherence to the ACID principles: atomicity, consistency, isolation, and durability. Relational database technology dates back to the 1970s.
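To make the "money does not go to someone else's account" guarantee concrete, here is a minimal sketch of an atomic transfer using Python's built-in sqlite3 module. The accounts table, the names alice and bob, and the helper functions are assumptions made up for this illustration; they are not from the original article. The credit is deliberately applied before the debit so that a failed debit shows the rollback undoing work that had already succeeded.

```python
import sqlite3

def make_bank():
    """Create an in-memory database with two hypothetical accounts."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE accounts ("
        "name TEXT PRIMARY KEY, "
        "balance INTEGER NOT NULL CHECK (balance >= 0))"
    )
    conn.executemany(
        "INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 50)]
    )
    conn.commit()
    return conn

def transfer(conn, src, dst, amount):
    """Move amount from src to dst; either both updates apply or neither."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE name = ?",
                (amount, dst),
            )
            # Violates the CHECK constraint if src cannot cover the amount.
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE name = ?",
                (amount, src),
            )
        return True
    except sqlite3.IntegrityError:
        return False

def balances(conn):
    """Return {name: balance} for every account."""
    return dict(conn.execute("SELECT name, balance FROM accounts"))

conn = make_bank()
transfer(conn, "alice", "bob", 30)    # succeeds
transfer(conn, "alice", "bob", 500)   # fails, and the credit to bob is undone
print(balances(conn))
```

After the failed second transfer, the balances are the same as after the first one: the credit that had already been written to bob's row is rolled back together with the rejected debit.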
So what has changed? Web technology started this revolution. Today, many people shop on Amazon, but relational databases were not designed to handle the volume of transactions per second that a site like Amazon generates. The main limiting factor is the mechanism of the relational database itself.
NoSQL databases offer a different mechanism, but that mechanism weakens the ACID guarantees. Some NoSQL vendors have made great strides in addressing this issue, and their approach is called eventual consistency. As for NewSQL: why not use modern programming languages and techniques to build a relational database without these drawbacks? That is how many NewSQL vendors got started. Other NewSQL companies have built enhanced MySQL solutions.
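As a rough illustration of what eventual consistency means, the sketch below models two replicas that accept writes independently and later exchange their data. The Replica class, the integer timestamps, and the last-write-wins rule are all assumptions chosen for this example; real NoSQL products use their own, more elaborate replication and conflict-resolution schemes.

```python
from dataclasses import dataclass, field

@dataclass
class Replica:
    """One hypothetical node storing key -> (timestamp, value)."""
    store: dict = field(default_factory=dict)

    def write(self, key, value, ts):
        # Accept the write locally, without coordinating with other replicas.
        # Keep it only if it is newer than what this replica already holds.
        current = self.store.get(key)
        if current is None or ts > current[0]:
            self.store[key] = (ts, value)

    def read(self, key):
        entry = self.store.get(key)
        return None if entry is None else entry[1]

    def sync_from(self, other):
        # Background exchange: pull every entry, newest timestamp wins.
        for key, (ts, value) in other.store.items():
            self.write(key, value, ts)

a, b = Replica(), Replica()
a.write("score:player1", 10, ts=1)
b.write("score:player1", 25, ts=2)

# Before the replicas exchange data, a reader can see a stale value on a.
print(a.read("score:player1"), b.read("score:player1"))  # prints: 10 25

a.sync_from(b)
b.sync_from(a)

# Once updates have propagated, both replicas agree.
print(a.read("score:player1"), b.read("score:player1"))  # prints: 25 25
```

The trade-off is visible in the first print: writes never wait for the other node, which helps throughput, but a read in that window may return old data.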
Hadoop is a completely different species. It is really a file system rather than a database, and its roots lie in Internet search engines. Although Hadoop's companion projects (HBase, MapReduce, Hive, Pig, ZooKeeper) let it act as a very powerful database, Hadoop at its core is still a fault-tolerant, scalable, and inexpensive distributed file system. Today it is best known for batch processing in data analysis.
Now, let's start with an example. Imagine we are a video game company that has been in business for 10 years. We recently launched our hottest game yet and shipped it to retailers around the world. Until now, we have been happy to keep our customer information in a SQL Server database. However, as players start playing online, the database cannot keep up with the rate of data updates, and players experience lag. The user base is growing rapidly, and the large sums we have spent on more hardware and software have not helped. The last thing we want is to lose customers. Where do we go from here?
We decide to split our online user base and run our online games on both a NoSQL and a NewSQL system, with the goal of finding the best solution. The IT department chooses Couchbase for NoSQL (document-oriented, similar to MongoDB) and VoltDB for NewSQL.
Couchbase is open source, has an integrated caching mechanism, and automatically distributes data across multiple nodes. VoltDB is a relational database that adheres to the ACID principles, is fault tolerant, scales out, and has a shared-nothing, in-memory architecture. In the end, both systems work. I will not dwell on the complexities of each setup, since this is only an example; a real comparison of these technologies would require testing, benchmarking, and in-depth analysis.
Now that the online operation runs smoothly, we want to analyze our data to find the markets we should develop. Which country is the best place to sell our products? To answer that, we need to combine the user data from the SQL Server data warehouse with the data from the online game databases, and then run analysis reports. This is where Hadoop comes in. We build a Hadoop system and combine the data from these two sources in it. Finally, we use the open source R language together with Hadoop's MapReduce module to generate the analysis reports.
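The kind of analysis described here can be sketched as two MapReduce-style stages: first join the two sources on the user, then total play time per country. The code below is a single-process Python imitation of that flow, not Hadoop or R code, and the field names (user_id, country, minutes) and sample records are hypothetical. On a real cluster each stage would run as a distributed job over files in the Hadoop file system.

```python
from collections import defaultdict

def map_phase(customers, sessions):
    """Emit (user_id, tagged value) pairs from both data sources."""
    for c in customers:          # rows from the customer data warehouse
        yield c["user_id"], ("customer", c["country"])
    for s in sessions:           # rows from the online game database
        yield s["user_id"], ("session", s["minutes"])

def shuffle(pairs):
    """Group values by key, as the framework does between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def join_reducer(user_id, values):
    """Stage 1: join one user's country with that user's total play time."""
    countries = [v for tag, v in values if tag == "customer"]
    minutes = sum(v for tag, v in values if tag == "session")
    if countries:                # skip sessions with no known customer
        yield countries[0], minutes

def sum_reducer(country, values):
    """Stage 2: total play time for one country."""
    yield country, sum(values)

def minutes_by_country(customers, sessions):
    joined = []
    for user_id, values in shuffle(map_phase(customers, sessions)).items():
        joined.extend(join_reducer(user_id, values))
    totals = {}
    for country, values in shuffle(joined).items():
        totals.update(sum_reducer(country, values))
    return totals

customers = [
    {"user_id": 1, "country": "DE"},
    {"user_id": 2, "country": "JP"},
    {"user_id": 3, "country": "DE"},
]
sessions = [
    {"user_id": 1, "minutes": 120},
    {"user_id": 2, "minutes": 300},
    {"user_id": 1, "minutes": 90},
    {"user_id": 3, "minutes": 30},
    {"user_id": 4, "minutes": 500},  # no matching customer record
]
print(minutes_by_country(customers, sessions))  # prints: {'DE': 240, 'JP': 300}
```

Total play time is only one possible metric for "best market"; the same two-stage shape works for revenue or player counts by swapping what the second reducer aggregates.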
Example: Hadoop vs. NoSQL vs. SQL vs. NewSQL