References:
(1) "OLTP Through the Looking Glass, and What We Found There". SIGMOD 2008
(2) "The End of an Architectural Era". VLDB 2007
More and more programmers are moving into mobile app development, while those who really work on low-level systems are still few. There is not much material in China on database system development, so I am writing down my own understanding of current database system development in this blog to share with everyone, in the hope that we can learn from each other.
Recent development and classification of database systems
As operating system development has stabilized (mobile OSes aside), more and more research has focused on database systems. Few people talk about building yet another operating system; most build applications on top of existing ones. Over the past 10 years, however, database development has boomed and a wide variety of products has appeared, such as document stores (for example MongoDB), column-store databases (for example Vertica), and various NewSQL databases (for example VoltDB). All of this is driven by the continuing rapid growth of data volumes, which traditional databases cannot process fast enough.
People tend to develop databases for specific application types to meet specific data processing needs. Developing a database system on top of an operating system is much like developing a mobile app, and the field is flourishing. Because big data is still a hot topic, databases, which provide the underlying data management services, will remain one of the faster-growing areas of computing for some time to come.
Many people confuse database systems with other concepts; in fact, databases form a large family of systems. The products on the market today can be divided into many categories:
1. Relational database management systems (relational DBMS), such as: Oracle, SQL Server, MySQL, PostgreSQL
2. Key-value stores, for example: Redis, Memcached, DynamoDB
3. Document stores, such as: MongoDB, CouchDB, Couchbase
4. Big data storage systems, such as: Cassandra, HBase, Google's Bigtable
5. Hadoop-based data analysis systems, for example: Hive, Spark, Impala (categories 4 and 5 overlap somewhat)
6. Text search systems, for example: Solr, Elasticsearch
In addition to the common types above, there are many smaller branches, such as graph databases and object databases. They are not the focus of this discussion. This article mainly discusses the first category, the traditional relational database system (RDBMS).
Different types of databases serve different needs, and they have both similarities and differences.
The most obvious differences between the first category, traditional relational database systems, and the other types are: (a) support for the full set of SQL statements; (b) support for the ACID properties of transactions.
Categories 2 and 3 have neither feature (a) nor (b), and categories 4 and 5 mostly do not support them either. Even where other categories support (a) or (b), that support differs considerably from what an RDBMS provides. For (a), the other categories of databases support only a subset of SQL rather than the entire SQL standard, or only an older standard such as SQL-92.
For (b), the ACID properties are not supported for all transactions, and some systems support them only at the row level. So-called eventual consistency is largely a marketing term; in practice it means no consistency guarantee.
This is not to say that other types of databases are bad. We are simply entering a period of database diversification: different databases have their own characteristics and areas of expertise, and they cannot be lumped together. Take consistency as an example. A bank's business needs strong consistency to guarantee that money going in and out is accounted for correctly, while applications such as Weibo can give up some consistency in exchange for high system throughput. Users do not care much whether they see a friend's latest microblog status immediately (for example, a delay of less than 2 seconds is acceptable).
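The bank example can be made concrete with a minimal sketch using Python's built-in sqlite3 module. The table name `accounts`, the account names, and the helper functions `open_bank`, `balance` and `transfer` are hypothetical and chosen only for illustration. The sketch shows the atomicity part of ACID: either both updates of a transfer are committed, or neither is.

```python
import sqlite3


def open_bank():
    """Create an in-memory database with two hypothetical accounts."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE accounts (name TEXT PRIMARY KEY, balance INTEGER NOT NULL)"
    )
    conn.executemany(
        "INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 100)]
    )
    conn.commit()
    return conn


def balance(conn, name):
    """Return the current balance of one account."""
    row = conn.execute(
        "SELECT balance FROM accounts WHERE name = ?", (name,)
    ).fetchone()
    return row[0]


def transfer(conn, src, dst, amount):
    """Move amount from src to dst as one transaction; return True on commit."""
    try:
        with conn:  # commits if the block succeeds, rolls back if it raises
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE name = ?",
                (amount, src),
            )
            if balance(conn, src) < 0:
                # Abort: the debit above is undone by the rollback.
                raise ValueError("insufficient funds")
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE name = ?",
                (amount, dst),
            )
        return True
    except ValueError:
        return False
```

A transfer that would overdraw the source account leaves both balances exactly as they were, which is the guarantee a bank relies on and which an eventually consistent store does not give.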
Traditional relational database systems can be broadly divided into two categories: OLTP (Online Transaction Processing) and OLAP (Online Analytical Processing). OLTP handles concurrency, multithreading and transaction management, while OLAP analyzes large volumes of data and is a part of BI (Business Intelligence). Most relational database systems of the first category include both OLTP and OLAP functions and count as general-purpose databases. The rest of this article focuses on OLTP databases.
Performance analysis and bottleneck of traditional relational database
In recent years there have been many analyses of traditional database performance.
The one I personally rate most highly is a paper from a joint study by HP and MIT, "OLTP Through the Looking Glass, and What We Found There." Simply put, they dissect and analyze a contemporary database. They conclude that a traditional relational database spends only about 10% of its time on useful data processing. The remaining 90% is spent on ancillary tasks: buffer management, latching, locking, logging, B-tree key operations, and so on.
Their chart breaks down performance by database component when running the TPC-C benchmark: the left side shows the percentage of instructions, and the right side shows the percentage of CPU cycles (that is, CPU run time). The white portion is the useful data processing; the rest consists of components that a traditional database cannot do without but that consume a lot of resources. As the chart shows, buffer management, locking, latching and logging are the major overheads of a traditional relational database.
The performance flaws of traditional databases were never put on the agenda before, mainly because data volumes used to be small. With the growth of the Internet over the last 10 years, and especially the explosion of mobile applications over the last 5 years, data volumes have grown explosively as well.
Today, whoever can process big data can tap its business value, and whoever can tap that value makes money.
The competition among many technology companies is a competition in data processing capability.
That is why so many NoSQL and NewSQL databases have sprung up in the last 10 years. NoSQL developed earlier and includes many well-known systems, such as Google's Bigtable, Amazon's DynamoDB, Apache HBase, Cassandra, and so on. NewSQL systems appeared about 5 or 6 years later than NoSQL. Popular ones today include VoltDB, NuoDB, Clustrix and so on.
What they have in common is that they address the performance problems of big data processing. NewSQL systems are additionally designed to address NoSQL's lack of support for the standard SQL language and for the ACID properties of transactions.
In other words, NewSQL is more general-purpose than NoSQL and more compatible with traditional databases.
Many people ask why the popular databases on the market perform so poorly. Why were they designed this way? Did everyone get it wrong? The answer is simple: traditional databases were developed very early, with the earliest dating back to the 1970s and 1980s, at least 30 years ago. The architecture and design of a database system are determined by the hardware and the theory available at the time.
Hardware has developed very rapidly since then: disk and RAM capacity and price, CPU performance, and multi-core technology have all advanced by leaps compared with 30 years ago. Although the growth of semiconductor technology has slowed in the past two years, Moore's law still holds. In addition, database applications were very simple 30 years ago; after so many years of development, real data processing needs have diversified, and traditional databases have kept adding features, growing ever larger.
Architecture of the new OLTP database
To remove the performance bottlenecks of the traditional database, researchers at MIT redesigned the database from scratch based on the current level of hardware, rather than making small changes to existing traditional databases.
Contemporary new databases also pay more attention to distributed scale-out, whereas traditional databases keep improving the processing capacity of a single machine. Ordinary users cannot afford, as large enterprises can, to buy expensive mainframes and database software.
If you also want to back up your data and achieve high availability, you need to buy and run at least one more copy.
New OLTP database solutions:
Goal of the database system change | New OLTP database technology
Remove logging overhead | Use a new logging scheme
Remove locking, latching and related overhead | Data partitioning + single-threaded execution
Remove buffer manager overhead | Use memory instead of disk reads and writes
According to the researchers' results, after removing these major overheads, the transaction throughput of an OLTP relational database increases by at least 20 times.
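The combination of data partitioning, single-threaded execution and in-memory storage can be sketched as follows. This is only a minimal illustration under my own assumptions, not the design of any particular product: the class names `Partition` and `PartitionedStore` and the `increment` transaction are hypothetical, and only single-partition transactions are handled. Each partition keeps its data in an ordinary in-memory dictionary and is owned by exactly one worker thread, so transactions on a partition run one after another and no locks or latches are taken on the data.

```python
import queue
import threading
from concurrent.futures import Future


class Partition:
    """One in-memory data partition owned by exactly one worker thread."""

    def __init__(self):
        self.data = {}              # in-memory store, no buffer manager
        self.inbox = queue.Queue()  # transactions waiting to run
        self.worker = threading.Thread(target=self._run, daemon=True)
        self.worker.start()

    def _run(self):
        # Transactions run strictly one at a time. Only this thread touches
        # self.data, so no locks or latches are needed on it.
        while True:
            item = self.inbox.get()
            if item is None:        # shutdown signal
                break
            txn, future = item
            try:
                future.set_result(txn(self.data))
            except Exception as exc:
                future.set_exception(exc)

    def submit(self, txn):
        """Queue a transaction and return a Future for its result."""
        future = Future()
        self.inbox.put((txn, future))
        return future

    def close(self):
        self.inbox.put(None)
        self.worker.join()


class PartitionedStore:
    """Routes each single-partition transaction to the owner of its key."""

    def __init__(self, num_partitions=4):
        self.partitions = [Partition() for _ in range(num_partitions)]

    def execute(self, key, txn):
        part = self.partitions[hash(key) % len(self.partitions)]
        return part.submit(lambda data: txn(data, key))

    def close(self):
        for part in self.partitions:
            part.close()


def increment(data, key):
    """Example transaction: a read-modify-write on one key."""
    data[key] = data.get(key, 0) + 1
    return data[key]
```

Many client threads can call `store.execute(key, increment)` concurrently; the read-modify-write stays correct without any lock because all work on a given key is serialized on one partition thread. Transactions that span several partitions need extra coordination, which this sketch leaves out.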
Traditional databases are in decline, while new OLTP databases are booming.