More and more programmers are moving into mobile app development, and few still work on underlying systems software. Seeing that little material on database system development is available domestically, I am writing up what I know about the current state of the field as a blog series to share, in the hope that we can learn from each other.
Recent development and classification of database systems
As operating systems have stabilized (mobile OSes aside), research attention has shifted toward database systems. Few people talk about writing a new operating system from scratch; most build applications on top of an existing one. Over the past 10 years, by contrast, databases have gone through an explosive growth phase, with a burst of new products: document databases (such as MongoDB), column-store databases (such as Vertica), and various NewSQL databases (such as VoltDB). The driver behind all of this is the rapid growth in data volume, which traditional databases cannot process with adequate performance. People now tend to build databases for specific application types to meet specific data processing needs, so developing database systems on top of the operating system has boomed in much the same way as mobile app development. Because big data will remain a hot topic, databases, which provide the underlying data management service, will continue to be one of the fastest-moving areas of computing in the coming years.
Many people confuse database systems with related concepts. In fact, "database" covers a broad family of systems, and the products currently on the market fall into several categories:
1. Relational database management systems (relational DBMS), for example: Oracle, SQL Server, MySQL, PostgreSQL
2. Key-value stores, for example: Redis, Memcached, DynamoDB
3. Document stores, for example: MongoDB, CouchDB, Couchbase
4. Big data storage systems, for example: Cassandra, HBase, Google's Bigtable
5. Hadoop-based data analysis systems, for example: Hive, Spark, Impala (categories 4 and 5 overlap somewhat)
6. Text search systems, for example: Solr, Elasticsearch
Beyond these common types there are many smaller branches, such as graph databases and object databases, which are not the focus here. This article mainly discusses the first category, the traditional relational database system (RDBMS).
Different types of databases suit different needs, and they have both similarities and differences. What most clearly sets the first category, the traditional relational database system, apart from the others is that it (a) supports the full SQL language and (b) supports the ACID properties of transactions. Categories 2 and 3 have neither (a) nor (b), and categories 4 and 5 mostly lack them as well. Even where another category supports (a) or (b), what it offers is very different from what an RDBMS provides. For (a), other categories support only a subset of SQL rather than the whole standard, or only an older standard such as SQL-92. For (b), they do not provide ACID across arbitrary transactions, typically only at the level of a single row; terms like "eventual consistency" are largely marketing language and in practice mean no real consistency guarantee.
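To make property (b) concrete, here is a minimal sketch of transactional atomicity using Python's built-in sqlite3 module. The accounts table, the names alice and bob, and the helper functions are all hypothetical, chosen only for illustration; SQLite stands in for an RDBMS here and is not one of the products listed above.

```python
import sqlite3

def make_bank():
    """Create a hypothetical in-memory bank with two accounts."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE accounts ("
        " id TEXT PRIMARY KEY,"
        " balance INTEGER NOT NULL CHECK (balance >= 0))"
    )
    conn.executemany(
        "INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 50)]
    )
    conn.commit()
    return conn

def transfer(conn, src, dst, amount):
    """Move `amount` from src to dst as one transaction; True if committed."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            # Credit first, so a failed debit leaves something to undo.
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE id = ?",
                (amount, dst),
            )
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE id = ?",
                (amount, src),
            )
    except sqlite3.IntegrityError:
        # The CHECK constraint rejected an overdraft; the credit was rolled back.
        return False
    return True

def balances(conn):
    """Return {account id: balance} for all accounts."""
    return dict(conn.execute("SELECT id, balance FROM accounts ORDER BY id"))
```

If the debit fails, the credit that already ran inside the same transaction is undone, so the money is never created out of nothing. That all-or-nothing behavior across several rows is what the single-row guarantees of many non-relational systems do not give you.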
This is not to say that the other types of database are bad. We have entered an era of many kinds of databases, each with its own characteristics and strengths, and they should not be lumped together. Take consistency: a bank's business needs strong consistency to guarantee that money going in and out is accounted for, whereas a microblog application can give up some consistency in exchange for higher system throughput. Users do not much care whether they see a friend's latest post immediately (for example, a delay of less than 2 seconds is acceptable).
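The microblog trade-off can be pictured with a toy model. This is only a sketch: Primary and LaggingReplica are hypothetical classes, not the API of any real system, and real replication is asynchronous and far more involved.

```python
class Primary:
    """Hypothetical primary node: accepts writes and keeps an ordered log."""

    def __init__(self):
        self.data = {}
        self.log = []  # ordered (key, value) writes, shipped to replicas later

    def write(self, key, value):
        self.data[key] = value
        self.log.append((key, value))


class LaggingReplica:
    """Hypothetical replica that applies the primary's log only on sync()."""

    def __init__(self, primary):
        self.primary = primary
        self.data = {}
        self.applied = 0  # number of log entries applied so far

    def read(self, key):
        # May return stale data if sync() has not caught up yet.
        return self.data.get(key)

    def sync(self):
        for key, value in self.primary.log[self.applied:]:
            self.data[key] = value
        self.applied = len(self.primary.log)
```

A reader hitting the replica right after a write sees the old state, and sees the new post only once sync() has run. For a timeline that short window of staleness is harmless; for an account balance it is not.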
Traditional relational database systems can be broadly divided into two categories by application: OLTP (Online Transaction Processing) and OLAP (Online Analytical Processing). OLTP handles concurrent transactions, with the multithreading and concurrency management that entails; OLAP analyzes large volumes of data and is part of BI (Business Intelligence). Most category-1 relational database systems include both OLTP and OLAP functionality and are general-purpose databases. The rest of this article focuses on OLTP databases.
Performance analysis and bottlenecks of traditional relational databases
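The two workload shapes can be contrasted in a few lines. This is a rough sketch using Python's sqlite3 module; the orders table and both helper functions are hypothetical, and a real OLAP workload would scan far more data than this.

```python
import sqlite3

def make_shop():
    """Create a hypothetical in-memory orders table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE orders ("
        " id INTEGER PRIMARY KEY,"
        " customer TEXT, region TEXT, amount INTEGER)"
    )
    return conn

def place_order(conn, customer, region, amount):
    """OLTP-style work: a short transaction that touches a single row."""
    with conn:  # commit on success, roll back on error
        cur = conn.execute(
            "INSERT INTO orders (customer, region, amount) VALUES (?, ?, ?)",
            (customer, region, amount),
        )
    return cur.lastrowid

def revenue_by_region(conn):
    """OLAP-style work: a read-only aggregate over the whole table."""
    return conn.execute(
        "SELECT region, SUM(amount) FROM orders GROUP BY region ORDER BY region"
    ).fetchall()
```

An OLTP system runs huge numbers of calls like place_order concurrently, so its cost is dominated by transaction management. An OLAP system runs comparatively few queries like revenue_by_region, each scanning a large amount of data.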
Traditional database performance has been analyzed extensively in recent years. The analysis I find most valuable is a paper from HP and MIT (Massachusetts Institute of Technology), "OLTP Through the Looking Glass, and What We Found There". In short, the authors profiled a contemporary database and concluded that a traditional relational database spends only about 10% of its time on useful data processing; the remaining 90% goes to ancillary work: buffer management, latching, locking, logging, B-tree key handling, and so on.
The figure in the paper breaks down the cost of each database component while running the TPC-C benchmark, with the percentage of instructions on the left and the percentage of CPU cycles (that is, CPU execution time) on the right. The white portion is the genuinely useful data processing; the rest comes from components that a traditional database cannot do without but that consume a lot of resources. As the figure shows, buffer management, locking, latching, and logging are the major overheads of a traditional relational database.
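A back-of-the-envelope calculation shows why this breakdown matters. The sketch below uses only the rough 10% figure quoted above, not the paper's exact measurements, and the function name max_speedup is made up for illustration.

```python
def max_speedup(useful_fraction, overhead_removed=1.0):
    """Upper bound on speedup when a share of the overhead time is eliminated.

    useful_fraction:  share of run time spent on useful work (about 0.10 here).
    overhead_removed: share of the remaining overhead that is eliminated,
                      from 0.0 (none) to 1.0 (all of it).
    """
    overhead = 1.0 - useful_fraction
    remaining = useful_fraction + overhead * (1.0 - overhead_removed)
    return 1.0 / remaining
```

Under these assumptions, max_speedup(0.10) is 10: a system that eliminated all buffer management, locking, latching, and logging costs could be at most about ten times faster on the same hardware. Removing only half of the overhead, max_speedup(0.10, 0.5), gives a bound of a little under 2x, which is why newer designs try to remove whole components rather than tune them.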
The performance shortcomings of traditional databases were never high on the agenda, mainly because data volumes used to be small. With the growth of the Internet over the last 10 years, and especially the explosion of mobile applications over the last 5, data volumes have grown explosively too. Today, whoever can process big data and extract business value from it can make money; competition among many technology companies is a competition in data processing capability. That is why so many NoSQL and NewSQL databases have sprung up in the last 10 years. NoSQL came first, with many well-known systems such as Google's Bigtable, Amazon's DynamoDB, Apache HBase, and Cassandra. NewSQL systems appeared about 5 or 6 years after NoSQL; popular ones today include VoltDB, NuoDB, and Clustrix. What the two have in common is that they tackle the performance problems of big data processing. The difference is that NewSQL systems were designed to address what NoSQL lacks: support for the standard SQL language and for transactions with ACID properties. In other words, NewSQL is more general-purpose than NoSQL and more compatible with traditional databases.
Many people will ask: if the databases that dominate the market perform so poorly, why were they designed this way? Did everyone get it wrong? The answer is simple. Traditional databases were developed very early; the earliest date back to the 1970s and 1980s, at least 30 years ago. The architecture and design of a database system are determined by the hardware and the theory available at the time. Hardware has since advanced very rapidly: disk and RAM capacity and price, CPU performance, and multi-core technology have all leapt ahead compared with 30 years ago. Although progress in semiconductor technology has slowed in the past couple of years, hardware is still improving along the lines of Moore's Law. The other reason is that database applications were very simple 30 years ago. After so many years, data processing needs have steadily diversified, and traditional databases have kept adding features, growing ever larger.
Resources:
(1) "OLTP Through the Looking Glass, and What We Found There",
(2) "The End of an Architectural Era", VLDB