This article is translated from Coding-geek article: How does a relational the database work.
Original link: http://coding-geek.com/how-databases-work/#Buffer-replacement_strategies
This article translates the following chapters:
First, preface
When it comes to relational databases, I don't think there's anything missing, and it can be said that relational data is everywhere. There are different relational databases: a light-weight, useful sqlite to a powerful data warehouse.
However, this is just a short article that describes how relational databases work. You can Google "relational database working principle", the article introduces how the database works less. And, even if there are, these articles are very brief; If you search for the current most popular technologies (Big Data, NoSQL, and JavaScript), you can find a lot of articles that explain how it works in depth.
is the relational database too old, too outdated technology? In addition to college classrooms and research papers, no one is interested in talking about its principles.
As a programmer, I hate to use things I don't understand. And if a database has been used for more than 40 years, there must be a reason. Over the years, I spent hundreds of hours trying to figure out the black boxes under the database I used every day. The relational database is very interesting because it is based on some very useful and extensible ideas. If you are interested in the database, but have no time to delve into the entire database domain. Then this article will be very suitable for you.
The title of this article is already clear, and this is not an article that describes how to use a database. So you should already know how to write simple even table queries and base CRUD database operations on SQL statements. Otherwise, you may not be able to understand this article. This is the basic knowledge you need to master in advance, and other knowledge will be explained in this article.
This article is a long, highly technical article, which contains many algorithms and data structures. It will take you more time to read. Some concepts are difficult to understand, and you can skip them to understand the gist of the article.
To facilitate your mastery, the article is divided into three parts:
- The overall description of which components are contained in the database.
- Overview of query optimizations.
- Introduction to Transaction and cache handling.
Second, back to basic-return to the basic concept
A long time ago (before the birth of the Milky Way ...) haha), the developer must have a clear idea of the number of CPU operation instructions that the code needs to use. They perspicacious the algorithms and data structures used in the code because they can't waste a tiny amount of CPU and memory resources. The computer is too slow, the resources are very valuable.
In this chapter, I will arouse your school memory of some basic concepts, and understanding these concepts is not necessary to understand the database. I'll also describe what a database index is.
A O (1) VS O (n^2)
Nowadays, many programmers do not close the bar time complexity, and ... They are always right!
But when you're dealing with big data (not just thousands of data) or you're fighting to optimize performance to the millisecond level. Understanding the concept of time negative clutter is critical. Imagine that the database handles both of these scenarios at the same time. I won't waste much of your time, just introduce the basic concepts. This will help us understand the cost Based optimization concept later.
Two The Concept-Basic concepts
Time complexity is used to measure the time it takes for an algorithm to process a specified amount of data. In order to describe the complexity of time, computer scientists use mathematical notation to capitalize the O representation. This symbol is usually used in conjunction with a function that describes the number of CPU instructions that an algorithm needs to execute to handle a specified number of input data.
For example, when I say "an algorithm is O (some_function ())", it means that for a given amount of data, the number of instruction strips that the algorithm needs to execute is some_function (number of data).
What matters is not the number of data or the number of instructions, but how the number of instructions will increase as the volume of data increases. Time complexity cannot be used to count the exact number of instructions that an algorithm needs to perform, but it is a good way to evaluate.
In this picture, you can see the complex evolution of different times. The amount of data I zoom in exponentially, in other words, the data will grow from 1 to 1 billion quickly. Let's take a look at how the number of instructions the different time complexity algorithms need to execute varies:
1. (1) Always keep constant (otherwise, it is not called constant time complexity).
2. O (log (n)) is always at a low growth rate, even if the amount of data reaches 1 billion levels.
3. The worst is O (n^2). The number of operation instructions expands rapidly with the increase of data volume.
4. The other two kinds of time complexity also grew rapidly.
Three examples– An example
The differences between O (1) and O (N2) are negligible when the amount of data is small. For example, suppose our algorithm handles 2000 of data:
1. The O (1) algorithm requires the execution of 1 CPU instructions.
2. O (log (n)) algorithm requires the execution of 7 instructions.
3. The O (n) algorithm requires the execution of 2000 instructions.
4. O (N*log (n)) algorithm requires the execution of 14,000 instructions.
5. O (n^2) algorithm requires the execution of 4 million instructions.
The difference between O (1) and O (N2) looks great and is 4 million worse. The actual execution time is 2 milliseconds, which is equivalent to blinking the eyes. Indeed, modern processing has been able to process millions of instructions per second. This is why performance optimization is not a center of gravity in many IT projects.
But listen to me, understanding these concepts is still very useful when dealing with very large data. If the data to be processed becomes 1 million (not a large amount of data for the database):
- The O (1) algorithm requires the execution of 1 CPU instructions.
- The O (log (n)) algorithm requires the execution of 14 instructions.
- The O (n) algorithm requires the execution of 1 million instructions.
- O (N*log (n)) algorithm requires the execution of 14 million instructions.
- O (n^2) algorithm requires the execution of 1 trillion instructions.
I can't calculate exactly, but I dare say that an O (N2) algorithm is executed, and you can go for a cup of coffee (or two cups). If you add another 0 to the amount of data, you can go to sleep.
(iv) Going deeper– a little deeper
Throw an opinion:
- The time complexity of querying an element in a good hash table is O (1).
- The time complexity of querying an element in an excellent two-fork balance tree is O (log (n)).
- The time complexity of querying an element in a data group is O (n).
- The time complexity of the best sorting algorithm is O (N*log (n)).
- The time complexity of the poor sorting algorithm is O (N2).
Note: Below we will look at these algorithms and data structures.
The time complexity of an algorithm is also divided into several scenarios:
- Time complexity in a common scenario.
- The time complexity of the optimal scenario.
- The time complexity of the worst scenario.
The time complexity of an algorithm is usually the time complexity of the worst scenario.
I'm only talking about the time complexity of the algorithm, and actually the complexity of the algorithm includes these:
- The memory size consumed by the algorithm.
- Disk I/O consumed by the algorithm.
Of course, there are some time complexities worse than O (n^2):
- O (n^4): Damn, I'm going to mention algorithms with such complexity.
- O (3^n): Worse, we'll see an algorithm in the next article (and it's actually used in many databases).
- Fibonacci N: You'll never figure out the results, even with a small amount of data.
- O (n^n): If you write such an algorithm, you should think about whether you are fit to stay in the IT field.
Note: I did not give the symbol o the next exact definition, I just introduced his thoughts. If you want to know what it really means, you can do it on Wikipedia.
How relational databases work-time complexity (translated from coding-geek articles)