Storage System Implementation: Analysis of Storage Mechanisms and Principles

Source: Internet
Author: User

This article records some thoughts and insights on storage. Because I have implemented some of these things myself, the impressions run deeper; at the technical level, I still find that implementing something yourself is different from reading other people's code.

Imagine a library, where the books are the data we want to store. How should we shelve them? The simplest way is to drop them in with no rules at all, but then every search is exhausting, because we have to compare book by book (a bit like a full table scan in a database). With luck, you find the book after a few comparisons; in the worst case, you search the entire library before finding it. In reality, library books are not placed at random: they are sorted by category onto different bookshelves to make searching easier. The same holds for data storage. We cannot simply append each piece of data to a file with no organization, because searching becomes very difficult once the data grows large.

The implementation of the whole storage system is very simple. There are no database-style queries on arbitrary fields, only queries by ID and deletes by ID. My intention is not to reinvent the wheel, but to better understand the basic principles behind this kind of storage.

System architecture diagram
First, look at the structural diagram of the whole system to understand its layering.
The first layer is the entry point, which covers insertion, update, query, and deletion. The core is file management, including file allocation and reading/writing. All file operations here go through the RandomAccessFile class. The benefit of this class is that you can work with specific offsets in the file. The storage files are divided into index files and data files, which correspond to the index and the data in the storage layer, similar to how a database allocates its files. The index is kept in a separate file mainly to speed up queries. One advantage is that we can exploit the ordering of the index with various search algorithms; here we use the "skip table" (skip list) algorithm. In addition, because the index file is usually much smaller than the data file, a search needs far less I/O, which improves performance.
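To make the point about offsets concrete, here is a minimal sketch of offset-based access with RandomAccessFile. The class and method names (OffsetDemo, append, readAt, roundTrip) are made up for illustration and are not taken from the author's code.

```java
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, for illustration only.
final class OffsetDemo {
    /** Appends a payload at the end of the file and returns the offset where it starts. */
    static long append(RandomAccessFile file, byte[] payload) throws IOException {
        long offset = file.length(); // current end of file
        file.seek(offset);           // move the file pointer there
        file.write(payload);
        return offset;
    }

    /** Reads exactly `length` bytes starting at the given offset. */
    static byte[] readAt(RandomAccessFile file, long offset, int length) throws IOException {
        byte[] buffer = new byte[length];
        file.seek(offset);
        file.readFully(buffer);
        return buffer;
    }

    /** Writes two payloads to a temporary file and reads the second one back by offset. */
    static String roundTrip() {
        try {
            File tmp = File.createTempFile("offset-demo", ".dat");
            tmp.deleteOnExit();
            try (RandomAccessFile file = new RandomAccessFile(tmp, "rw")) {
                append(file, "first".getBytes(StandardCharsets.UTF_8));
                byte[] second = "second".getBytes(StandardCharsets.UTF_8);
                long offset = append(file, second);
                byte[] back = readAt(file, offset, second.length);
                return offset + ":" + new String(back, StandardCharsets.UTF_8);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Because the caller keeps the returned offset, a later read can jump straight to the data without scanning the file, which is the property the index relies on.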

Data format
In the system architecture diagram above, we can see that storage on the third layer is divided into three files: the management file (manager), the index file, and the data file.
Glossary
Management file (manager): the unified manager of the storage space. Its file format is described in detail below.
Index file: easy to understand; anyone who has used a database knows the concept of an index.
Data file: the storage blocks that hold the actual data.
Management file (manager)
The glossary above roughly describes what the manager file is responsible for. To understand those responsibilities fully, you need some knowledge of how storage space is allocated in the system. In words: if I want to store a piece of data, the simplest way is to use RandomAccessFile to write directly to the file row by row, with each row being one record. Each row also needs an identifier that makes it unique, a bit like a primary key in a database. Deletion is then logical rather than physical: physical deletion would require shifting everything after the deleted row, whereas here you only need to mark the row as available. That brings us back to the question: how do we know whether a given row is available or occupied? This is where space management comes in. It is like buying a ticket: if I buy the ticket for seat A, seat A is in use; if I return the ticket (which corresponds to a deletion), it can be sold to someone else. In short, the manager is a management file for the entire space and stores the usage status of that space. With that in mind, look at the manager format in the previous figure:
Each record in the file consists of five parts: a unique object identifier, the usage status, the file type, the start offset, and the end offset, 22 bytes in total. These fields are easy to understand. The file type will be described below. The start and end offsets act as coordinates: they tell you where the space is, and you simply go there. Space is allocated in fixed-length units. Simply put, the file is divided up front into cells of several sizes, as shown in the upper right corner of the figure: 256 bytes, 512 bytes, 4 K, and 1024 K. The advantage of this allocation is that it is easy to manage and avoids fragmentation (periodic file merging is performed to deal with the fragmentation that remains). The disadvantage is that a piece of data may not end up stored contiguously.
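A minimal sketch of such a manager record follows. The article gives only the five fields and the 22-byte total; the individual field widths (4-byte identifier, 1-byte status, 1-byte type, two 8-byte offsets) are an assumption that happens to add up to 22. The names ManagerRecord and blockSizeFor are hypothetical.

```java
import java.nio.ByteBuffer;

// Hypothetical layout: the article states the fields and the 22-byte total,
// but the individual field widths here are assumed.
final class ManagerRecord {
    static final int SIZE = 22; // 4 (id) + 1 (status) + 1 (type) + 8 (start) + 8 (end)
    static final int[] BLOCK_SIZES = {256, 512, 4 * 1024, 1024 * 1024}; // sizes named in the article

    final int id;
    final byte status;
    final byte fileType;
    final long start;
    final long end;

    ManagerRecord(int id, byte status, byte fileType, long start, long end) {
        this.id = id;
        this.status = status;
        this.fileType = fileType;
        this.start = start;
        this.end = end;
    }

    /** Serializes the record into exactly SIZE bytes. */
    byte[] toBytes() {
        return ByteBuffer.allocate(SIZE)
                .putInt(id).put(status).put(fileType).putLong(start).putLong(end)
                .array();
    }

    /** Parses a record previously produced by toBytes(). */
    static ManagerRecord fromBytes(byte[] raw) {
        ByteBuffer buf = ByteBuffer.wrap(raw);
        return new ManagerRecord(buf.getInt(), buf.get(), buf.get(), buf.getLong(), buf.getLong());
    }

    /** Returns the smallest pre-defined block size that can hold `needed` bytes, or -1 if none can. */
    static int blockSizeFor(int needed) {
        for (int size : BLOCK_SIZES) {
            if (needed <= size) {
                return size;
            }
        }
        return -1;
    }
}
```

For example, under these assumptions a 300-byte request maps to the 512-byte cell, and a request larger than 1024 K has no single cell that fits and would have to span several.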
Index file
The index file (strictly speaking, a primary key file) should be easy to understand if you have used indexes in a database. If not, think of the table of contents of a dictionary. Take a look at the format of the index file:
As the figure shows, each index entry has a fixed size and consists of three parts: the index ID, the start offset in the data file, and the index usage status, 13 bytes in total. The start offset in the data file is a marker for locating the data: a lookup first finds the index entry, then uses its start offset to locate the data itself. Why does the index carry a usage status of its own? Because index allocation does not go through the manager (in theory it should), which keeps things simple here. Also, strictly speaking, the index here is a primary key, because the index ID can only be a single int value. This differs from the format of a real index.
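Because every entry has the same size, the position of entry number n can be computed directly. The sketch below assumes a 4-byte int ID (stated in the article), an 8-byte offset, and a 1-byte status (both assumed, chosen so the total is 13). The class name, method names, and status values are hypothetical.

```java
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.io.UncheckedIOException;

// Hypothetical fixed-size index entries: int ID, then assumed long offset and byte status.
final class IndexFileDemo {
    static final int RECORD_SIZE = 13; // 4 (id) + 8 (data offset) + 1 (status)
    static final byte FREE = 0;        // assumed status values
    static final byte USED = 1;

    /** Writes one index entry into the given slot. */
    static void write(RandomAccessFile index, long slot, int id, long dataOffset, byte status)
            throws IOException {
        index.seek(slot * RECORD_SIZE); // fixed size makes the position a simple multiplication
        index.writeInt(id);
        index.writeLong(dataOffset);
        index.writeByte(status);
    }

    /** Reads the ID stored in the given slot. */
    static int idAt(RandomAccessFile index, long slot) throws IOException {
        index.seek(slot * RECORD_SIZE);
        return index.readInt();
    }

    /** Reads the data-file offset stored in the given slot. */
    static long dataOffsetAt(RandomAccessFile index, long slot) throws IOException {
        index.seek(slot * RECORD_SIZE + 4); // skip the 4-byte ID
        return index.readLong();
    }

    /** Writes three entries to a temporary file and reads the middle one back. */
    static String demo() {
        try {
            File tmp = File.createTempFile("index-demo", ".idx");
            tmp.deleteOnExit();
            try (RandomAccessFile index = new RandomAccessFile(tmp, "rw")) {
                write(index, 0, 10, 0L, USED);
                write(index, 1, 20, 512L, USED);
                write(index, 2, 30, 1024L, USED);
                return index.length() + ":" + idAt(index, 1) + ":" + dataOffsetAt(index, 1);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```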

Data file
The data file has the most complex storage format, shown in the following figure.
A data block is composed of a fixed 29-byte header plus the data. In other words, 29 bytes of every allocated block are not available for the data itself. Since every block has a fixed size, a piece of data does not always fit in a single block. A simple example: data that is too large for one block is allocated two blocks, call them A and B, and physically these two blocks are likely to be non-contiguous. To read the data back, I find the first part in A, then the rest in B, and splice the two together to get the complete data. This can be done with chained storage, or with another method such as keeping all the block addresses of a complete piece of data in one place.
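The chained approach can be sketched in memory as follows. This is only an illustration under assumptions: the 29-byte header size comes from the article, but its layout does not, so the header is modeled only as lost capacity plus a `next` link; all blocks use one size for simplicity, and the names ChainedBlocks, store, and load are hypothetical.

```java
import java.io.ByteArrayOutputStream;
import java.util.Arrays;
import java.util.List;

// In-memory illustration of chained storage; not the author's on-disk format.
final class ChainedBlocks {
    static final int HEADER = 29; // header size from the article; its fields are not modeled

    static final class Block {
        final byte[] payload;
        int next = -1; // index of the next block in the chain, -1 marks the end

        Block(byte[] payload) {
            this.payload = payload;
        }
    }

    /** Splits data across blocks of `blockSize`, appends them to `blocks`, returns the first block's index. */
    static int store(List<Block> blocks, byte[] data, int blockSize) {
        int capacity = blockSize - HEADER; // usable bytes per block
        int first = -1;
        int prev = -1;
        for (int pos = 0; pos < data.length; pos += capacity) {
            int len = Math.min(capacity, data.length - pos);
            blocks.add(new Block(Arrays.copyOfRange(data, pos, pos + len)));
            int idx = blocks.size() - 1;
            if (prev >= 0) {
                blocks.get(prev).next = idx; // link the previous block to this one
            } else {
                first = idx;
            }
            prev = idx;
        }
        return first;
    }

    /** Follows the chain from `first` and splices the payloads back together. */
    static byte[] load(List<Block> blocks, int first) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (int i = first; i >= 0; i = blocks.get(i).next) {
            byte[] part = blocks.get(i).payload;
            out.write(part, 0, part.length);
        }
        return out.toByteArray();
    }
}
```

Reading only follows the `next` links, so it does not matter where the blocks of one piece of data physically sit relative to each other.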
Specific operations
The sections above describe the storage formats. Next come the operations on them: storing data, deleting data, and updating data.
Insert data
First, look at the whole insert process in a figure:
From the process above, we can see that an insert takes four steps:
Format data: Data generally has a certain format, so a specific format has to be defined here, much like a table in a database. The data can then be formatted according to that convention. The final result is a byte[], but the formatting process takes some care. For a string, for example, you must know its length and which segment of the byte[] array it occupies. Fixed-length types are easier: an int takes 4 bytes and a long takes 8 bytes.
Apply for storage space: Once the data is formatted, its size is known, so suitable space can be allocated for that size.
Write the data file: With the space in hand, just fill in the byte[].
Write the index file: Strictly speaking, this is a primary key rather than an index, and it is fixed to the int type, which is fairly restrictive.
Delete data
Here, I can only delete by ID (the ID in the index file). Deleting by other columns is not currently supported.
Search the index: The search is simplified to a lookup by ID only; searching by other fields would be a different matter. The search algorithm itself is covered under Select data below.
Find the data: Once the index entry (primary key) is found, it gives the address of the data, and those spaces are then marked as available (for details, see the manager file format).
Reclaim the index: This is where the usage status in the index file comes into play.
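The delete steps above can be sketched against a file of fixed-size index entries. This assumes the same hypothetical 13-byte layout (4-byte ID, 8-byte offset, 1-byte status) and made-up status values; it uses a plain linear scan to find the ID so that the focus stays on the in-place status flip, and it leaves freeing the data space to the caller.

```java
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.io.UncheckedIOException;

// Hypothetical sketch of a logical delete: only one status byte changes, nothing is shifted.
final class LogicalDelete {
    static final int RECORD_SIZE = 13;   // total from the article
    static final int STATUS_OFFSET = 12; // assumed: 4-byte ID and 8-byte offset come first
    static final byte FREE = 0;          // assumed status values
    static final byte USED = 1;

    /** Marks the entry with the given ID as free; returns its data offset, or -1 if not found. */
    static long delete(RandomAccessFile index, int id) throws IOException {
        long slots = index.length() / RECORD_SIZE;
        for (long slot = 0; slot < slots; slot++) {
            long base = slot * RECORD_SIZE;
            index.seek(base);
            int entryId = index.readInt();
            long dataOffset = index.readLong();
            byte status = index.readByte();
            if (status == USED && entryId == id) {
                index.seek(base + STATUS_OFFSET);
                index.writeByte(FREE); // reclaim the index entry in place
                return dataOffset;     // the caller would mark this data space as available
            }
        }
        return -1;
    }

    /** Builds a three-entry index in a temporary file and deletes ID 20 twice. */
    static String demo() {
        try {
            File tmp = File.createTempFile("delete-demo", ".idx");
            tmp.deleteOnExit();
            try (RandomAccessFile index = new RandomAccessFile(tmp, "rw")) {
                for (int i = 0; i < 3; i++) {
                    index.writeInt(10 * (i + 1));
                    index.writeLong(512L * i);
                    index.writeByte(USED);
                }
                long before = index.length();
                long freed = delete(index, 20);
                long again = delete(index, 20); // already free, so not found
                return before + ":" + index.length() + ":" + freed + ":" + again;
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

The file length is the same before and after the delete, which is the point of marking space as available instead of physically removing it.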
Update data
My update policy is to delete the record and then insert a new one. This is simple and convenient, because an in-place update would also have to deal with reusing space. In addition, the formatted data is not broken down into database-style fields here, so data cannot be managed field by field.
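For the format-data step that insert (and therefore update) relies on, here is a minimal sketch of turning a record into a byte[]. The record shape (an int, a long, and a string) is invented for illustration; the article only says that an int takes 4 bytes, a long takes 8, and a string needs its length recorded.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Hypothetical record format: int id, long created, then a length-prefixed UTF-8 string.
final class RecordFormat {
    /** Encodes the three values into one byte[]; the string is preceded by its byte length. */
    static byte[] encode(int id, long created, String name) {
        byte[] text = name.getBytes(StandardCharsets.UTF_8);
        return ByteBuffer.allocate(4 + 8 + 4 + text.length)
                .putInt(id)           // fixed 4 bytes
                .putLong(created)     // fixed 8 bytes
                .putInt(text.length)  // length prefix so the reader knows where the string ends
                .put(text)
                .array();
    }

    /** Decodes only the string field from an encoded record. */
    static String decodeName(byte[] raw) {
        ByteBuffer buf = ByteBuffer.wrap(raw);
        buf.position(4 + 8);          // skip the fixed-length fields
        int length = buf.getInt();
        byte[] text = new byte[length];
        buf.get(text);
        return new String(text, StandardCharsets.UTF_8);
    }
}
```

The length of the resulting array is what the apply-for-space step would use to choose a block size.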
Select data
As with deleting data, selection looks up data by ID. The index file is ordered, so we can apply search algorithms that exploit this property. The search algorithm I use here is the "skip table" (skip list) algorithm, whose principle is essentially to vary the step size. For more information, see index search for skip tables.
Summary
The storage I wrote above is about as simple as it gets, and it is helpful for learning the principles of storage: for example, why an index and pre-allocated space are good ideas. Many storage systems, such as memcached, allocate space this way. Trying to implement it yourself is also a way to deepen your understanding.
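The step-size idea mentioned under Select data can be illustrated with a jump search over sorted IDs. This is a deliberately simpler stand-in, not a skip list and not necessarily what the author implemented: it takes large steps through the ordered IDs, then scans linearly inside the one block that can contain the target. In the real system the IDs would be read from the index file rather than from an array.

```java
// Jump search over sorted IDs: a simple stand-in for the "vary the step size" idea.
final class JumpSearch {
    /** Returns the position of `target` in the ascending array `ids`, or -1 if absent. */
    static int find(int[] ids, int target) {
        int n = ids.length;
        if (n == 0) {
            return -1;
        }
        int step = Math.max(1, (int) Math.sqrt(n)); // large stride for the first pass
        int blockStart = 0;
        // Jump ahead while the last element of the current block is still too small.
        while (blockStart + step < n && ids[blockStart + step - 1] < target) {
            blockStart += step;
        }
        // Fall back to step size 1 inside the block that may hold the target.
        int blockEnd = Math.min(blockStart + step, n);
        for (int i = blockStart; i < blockEnd; i++) {
            if (ids[i] == target) {
                return i;
            }
        }
        return -1;
    }
}
```

With 9 sorted IDs the stride is 3, so a lookup touches at most a handful of entries instead of all 9; the saving grows with the size of the index.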
