[Original] Underlying implementation principles of databases from an I/O perspective

Last Update:2018-12-05 Source: Internet

Author: User


Crazy Brother says that a scientific attitude is always rigorous and tireless. I used to think that was just hot air said in a moment of excitement. Still, facts are facts and cannot be made up.
I recently studied some problems in Hibernate and found that, besides the cache mechanism, other points are worth pondering too. Hibernate strictly constrains the mapping between Java wrapper classes and utility classes on one side and the underlying data types of the database on the other. Have you ever wondered why this mapping is necessary? You might say it is simply part of the Hibernate 3.0 specification. But why was it specified that way? Let's leave that question aside for now. The point here is only that we should actively ask why.
After learning various databases and all kinds of SQL queries, how much do you really know about the database? How is it implemented underneath? Yesterday I had a long discussion with Crazy Brother about an I/O issue, and on the way home I started thinking about how databases are implemented. Here is my summary of those two days. I hope you will not laugh, and that we can exchange ideas and improve together:
1. First, we need to understand how the database stores data:
In Oracle, data is stored in tables, and tables are stored in tablespaces, which appear on disk as .dbf data files. We are often surprised at how fast Oracle and many other databases answer queries. When we read a file ourselves, even with a seemingly impressive cache mechanism, finding a record by traversing the data file means scanning the whole file. So let us look at real life to find a better way.
2. An example from daily life:
If we want to find a house or a bar in a city as large as Changsha, we need an address. Otherwise we wander around aimlessly, which amounts to visiting every location in the city, and that is obviously inefficient. An address may not seem to have much to do with a data file, but the two must be related. Otherwise, where would these impressive databases get their theoretical support?
3. Solution:
Once addresses come into the picture, you may think of reading the whole file record by record, loading it into memory, and then looking records up by address. This works, but it fails on a sudden power outage or when memory is insufficient. A safer and more cost-effective method is to query and modify the file directly by address.
4. File offset:
A file does not have addresses as such, but it has something similar to a memory address: the file offset.
A definition found on Baidu: the first byte at the beginning of the file has offset 0, and the offset increases by 1 for every following byte. So we store the records in the file in a fixed format and order. When we read the file later, we only need to look up the file offset in the corresponding "address book" and then perform the read or write I/O operation directly at that offset, which makes reading and modifying the data file fast.
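The offset arithmetic described above can be sketched in a few lines. This is only an illustration under stated assumptions: the class name OffsetDemo, the helper methods offsetOf and readAt, the 4-byte record size, and the temporary file are all made up for this example and do not come from the original article.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class OffsetDemo {
    // Byte offset of the record at position `index` when every record
    // is exactly `recordSize` bytes long (hypothetical helper).
    static long offsetOf(int index, int recordSize) {
        return (long) index * recordSize;
    }

    // Reads up to `length` bytes starting at `offset`, without reading
    // the bytes that come before it (hypothetical helper).
    static String readAt(Path file, long offset, int length) throws IOException {
        try (FileChannel channel = FileChannel.open(file, StandardOpenOption.READ)) {
            ByteBuffer buffer = ByteBuffer.allocate(length);
            channel.read(buffer, offset);
            return new String(buffer.array(), 0, buffer.position(), StandardCharsets.US_ASCII);
        }
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("offset-demo", ".dat");
        try {
            // Three 4-byte records stored back to back: offsets 0, 4 and 8.
            Files.write(file, "AAAABBBBCCCC".getBytes(StandardCharsets.US_ASCII));
            // Jump straight to the third record (index 2).
            System.out.println(readAt(file, offsetOf(2, 4), 4)); // prints CCCC
        } finally {
            Files.deleteIfExists(file);
        }
    }
}
```

Because every record has the same size, the position of any record is a single multiplication, so no earlier record has to be read.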

Next, we use the RandomAccessFile class from the java.io package to simulate such a scenario.
First, an introduction to the class. RandomAccessFile supports reading and writing at arbitrary positions in a file, and it is generally used to process fixed-length data. What is fixed-length data? It means that every record occupies the same number of bytes in the agreed format, so the I/O code can compute where each record starts and read it directly. Resumable transfers and multi-threaded downloads over the network are classic applications of the same idea. But enough theory; on to the code:
public class Employee {
    // Fixed length of the name field in bytes, like a column width in a database
    public static final int LEN = 8;

    private String name;
    private int age;

    public Employee(String name, int age) {
        // Note: getBytes() uses the platform default charset and substring()
        // counts characters, so the truncation is exact only for single-byte characters.
        if (name.getBytes().length >= LEN) {
            this.name = name.substring(0, LEN);
        } else {
            while (name.getBytes().length < LEN) {
                name += "\u0000"; // pad with NUL characters until the name occupies 8 bytes
            }
            this.name = name;
        }
        this.age = age;
    }

    public String getName() {
        return name;
    }

    public int getAge() {
        return age;
    }
}
Next, create several objects and store them:
import java.io.File;
import java.io.RandomAccessFile;

public class TestRandomAccessFile {
    public static void main(String[] args) throws Exception {

        Employee e1 = new Employee("Zhang San", 23);
        Employee e2 = new Employee("zhangsan", 35);
        Employee e3 = new Employee("Li zhangsan", 67);

        // Each record is 12 bytes: an 8-byte name followed by a 4-byte int
        RandomAccessFile rf = new RandomAccessFile(new File("C:/employee.txt"), "rw");

        rf.write(e1.getName().getBytes());
        rf.writeInt(e1.getAge());
        rf.write(e2.getName().getBytes());
        rf.writeInt(e2.getAge());
        rf.write(e3.getName().getBytes());
        rf.writeInt(e3.getAge());

        rf.close();

        RandomAccessFile raf = new RandomAccessFile(new File("C:/employee.txt"), "r");
        raf.skipBytes(12); // skip the first record; the second starts at offset 12

        byte[] buf = new byte[8];
        int i = raf.read(buf);
        System.out.println("i = " + i);
        System.out.println(new String(buf).trim() + raf.readInt());

        raf.seek(24); // jump directly to the third record at offset 24
        i = raf.read(buf);
        System.out.println("i = " + i);
        System.out.println(new String(buf).trim() + raf.readInt());

        raf.close();
    }
}
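The result is as follows:
i = 8
zhangsan35
i = 8
Li zhang67
The third name is cut to its first 8 bytes by the Employee constructor, which is why only "Li zhang" is stored.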
Reading data at a specified position, as shown here, is also the basis for in-place modification and fast retrieval, and it resembles how a database index works. In practice, a database keeps the file offsets in an index structure, typically a B-tree, so that lookups stay fast. For a more complete treatment, refer to classic texts such as MS-SQL references and Database System Principles.
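To make the index idea concrete, here is a minimal sketch that keeps the offset of every record in a sorted map and uses it for lookup and in-place update. It rests on several assumptions: the class IndexedRecordFile, its method names, the sample names, and the temporary file are invented for illustration, names are assumed to be ASCII and at most 8 bytes, and java.util.TreeMap (a red-black tree held in memory) stands in for the on-disk B-tree a real database would use.

```java
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.TreeMap;

public class IndexedRecordFile {
    static final int NAME_LEN = 8; // fixed width of the name field in bytes

    // In-memory index: name -> byte offset where its record starts.
    private final TreeMap<String, Long> index = new TreeMap<>();
    private final RandomAccessFile file;

    IndexedRecordFile(File path) throws IOException {
        this.file = new RandomAccessFile(path, "rw");
    }

    // Appends a record at the end of the file and remembers its offset.
    void insert(String name, int age) throws IOException {
        long offset = file.length();
        byte[] raw = name.getBytes(StandardCharsets.US_ASCII);
        file.seek(offset);
        file.write(Arrays.copyOf(raw, NAME_LEN)); // zero-pads (or truncates) to NAME_LEN
        file.writeInt(age);
        index.put(name, offset);
    }

    // Looks up the offset in the index, then reads only that record's age.
    Integer findAge(String name) throws IOException {
        Long offset = index.get(name);
        if (offset == null) {
            return null;
        }
        file.seek(offset + NAME_LEN);
        return file.readInt();
    }

    // Overwrites the age in place; no other record is read or moved.
    boolean updateAge(String name, int newAge) throws IOException {
        Long offset = index.get(name);
        if (offset == null) {
            return false;
        }
        file.seek(offset + NAME_LEN);
        file.writeInt(newAge);
        return true;
    }

    void close() throws IOException {
        file.close();
    }

    public static void main(String[] args) throws IOException {
        File tmp = File.createTempFile("employees", ".dat");
        tmp.deleteOnExit();
        IndexedRecordFile db = new IndexedRecordFile(tmp);
        db.insert("alice", 23);
        db.insert("bob", 35);
        db.updateAge("bob", 36);
        System.out.println(db.findAge("bob"));   // prints 36
        System.out.println(db.findAge("carol")); // prints null
        db.close();
    }
}
```

The fixed record length is what makes updateAge safe: the new value occupies exactly the bytes of the old one, so nothing after it has to shift.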
At this point, I believe you have a clear answer to the question raised at the beginning of this article: why type mapping and database field lengths are so strictly constrained. Fixed types and lengths are what make it possible to locate a record by offset. I will not ramble on any further. I hope you enjoy your studies.

This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only. This website makes no representation or warranty of any kind, either expressed or implied, as to the accuracy, completeness ownership or reliability of the article or any translations thereof. If you have any concerns or complaints relating to the article, please send an email, providing a detailed description of the concern or complaint, to info-contact@alibabacloud.com. A staff member will contact you within 5 working days. Once verified, infringing content will be removed immediately.
