Data storage experience

Source: Internet
Author: User

Many people adopt certain practices simply out of habit, such as using a database. But what is the actual purpose of using an off-the-shelf database?

The first answer that comes to mind is probably querying. Queries are certainly important, but they are not necessarily the core of every problem. One thing we can observe is that many small projects have very simple queries. If data persistence were not a concern, they could process everything in memory with the simplest search methods, such as a for loop.
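As a minimal sketch of that point, the "query" of a small project can be nothing more than a linear scan over records held in memory. The `users` list and the `find_user` helper are hypothetical names chosen for illustration.

```python
# Hypothetical in-memory "table": a plain list of records.
users = [
    {"id": 1, "name": "alice"},
    {"id": 2, "name": "bob"},
    {"id": 3, "name": "carol"},
]

def find_user(records, name):
    """Linear scan: return the first record whose name matches, else None."""
    for record in records:
        if record["name"] == name:
            return record
    return None

found = find_user(users, "bob")      # the record with id 2
missing = find_user(users, "dave")   # None, no such user
```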

This is also true for many larger projects: even when the data volume is considerable and there are many types of "entities", the relationships between them are not complex, and only a very limited set of database features is used. So there must be another reason for using database products. Many people may not be able to put it into words right away, but it is fairly obvious: to store small pieces of data in a way that is easy to access.

To achieve this goal, the first middleware at hand is the file system, but most of the time the file system is ruled out. This is largely because the file system has three problems:

The first is the block mechanism of the file system. Most operating systems default to a file system block size of about 4 KB, and the minimum unit is generally 512 bytes. As you can imagine, if you store every item such as a "user name" as its own file, each file occupies at least one block, which wastes a great deal of space. The file system does not even guarantee that empty files take up no disk space.
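The waste can be estimated with simple arithmetic. The sketch below assumes a 4096-byte block, at least one block per file, and a made-up workload of 1,000 user names of 12 bytes each. It ignores metadata and any small-file optimizations a particular file system may apply, so treat the numbers as a rough illustration only.

```python
def blocks_needed(size_bytes, block_size=4096):
    """Blocks occupied by one file's data, assuming at least one block per file."""
    return max(1, -(-size_bytes // block_size))  # ceiling division

def wasted_bytes(sizes, block_size=4096):
    """Allocated space minus actual payload, with one file per value."""
    allocated = sum(blocks_needed(s, block_size) * block_size for s in sizes)
    return allocated - sum(sizes)

# Assumed workload: 1,000 user names of 12 bytes each, one file per name.
sizes = [12] * 1000
payload = sum(sizes)           # 12,000 bytes of real data
waste = wasted_bytes(sizes)    # 4,084,000 bytes of slack space
```

Under these assumptions, more than 99% of the allocated space holds no data.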

The second problem is that not all file systems guarantee that file names are sorted according to some rule; this depends fundamentally on the file system's internal indexing method. Most file systems support only a predefined indexing method. Even if that method happens to be ordered, there is no guarantee that it will not change when the file system is upgraded. Nor is the operating system obligated to promise that it will never switch to a different file system.
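Python's `os.listdir`, for example, is documented as returning entries in arbitrary order, so any ordering has to be imposed by the application. A minimal sketch, using a temporary directory and hypothetical file names:

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as root:
    # Create the files in a deliberately unsorted order.
    for name in ["user_003", "user_001", "user_002"]:
        with open(os.path.join(root, name), "w") as f:
            f.write("x")

    raw = os.listdir(root)   # order depends on the file system
    ordered = sorted(raw)    # ordering imposed by the application
```

Sorting on every listing is affordable for a handful of files, but it is a cost the application pays because the file system makes no promise of its own.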

The third problem is that even if our business does not depend on sorted data, we can locate a piece of data in a file system only by its file name, and there is no reliable guarantee about the file name format. For example, the length limit and the set of invalid characters may hold only for one particular file system, or for that file system only for a period of time.

Of these three problems, only the first is an economic consideration. The others come down to this: apart from file management itself, the file system does not provide any stable contract to support the management of business data. It is precisely because the file system cannot be relied on to honor the conventions people expect that we need other solutions.

Let's look at what data storage needs to provide. In terms of economy, it should be compact: however small the data items are, they should basically be packed one after another. In terms of ease of use, data should be easy to locate: an item should be found within a few low-cost operations. In terms of many general business needs, ordering should be supported: if you find 2, you can immediately find 1 and 3.
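The "easy to locate" and "ordered" requirements can be sketched in memory with a sorted key list and binary search. This is only an illustration under simplifying assumptions: the `SortedStore` class and its methods are hypothetical, nothing is persisted, and on-disk compactness is not addressed.

```python
import bisect

class SortedStore:
    """Keys kept in a sorted list; values in a parallel list."""

    def __init__(self):
        self._keys = []
        self._values = []

    def put(self, key, value):
        i = bisect.bisect_left(self._keys, key)
        if i < len(self._keys) and self._keys[i] == key:
            self._values[i] = value          # overwrite an existing key
        else:
            self._keys.insert(i, key)        # keep the keys sorted
            self._values.insert(i, value)

    def get(self, key):
        """Binary search: O(log n) comparisons to locate a key."""
        i = bisect.bisect_left(self._keys, key)
        if i < len(self._keys) and self._keys[i] == key:
            return self._values[i]
        return None

    def neighbors(self, key):
        """Return (previous_key, next_key) around key, None at either end."""
        lo = bisect.bisect_left(self._keys, key)
        hi = bisect.bisect_right(self._keys, key)
        prev_key = self._keys[lo - 1] if lo > 0 else None
        next_key = self._keys[hi] if hi < len(self._keys) else None
        return prev_key, next_key

store = SortedStore()
for k in [3, 1, 2]:
    store.put(k, f"value-{k}")

value = store.get(2)            # "value-2"
around = store.neighbors(2)     # (1, 3): finding 2 leads straight to 1 and 3
```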

These are the most basic requirements, and a feature such as Join is essentially a pseudo-requirement. It exists because some database products impose a predefined data storage model (or schema), and the feature is a remedy for the new problems that model creates. For example, Join exists because we split a set of related data across two different relations (tables).
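A small illustration of that argument, with made-up data: once related data is split across two "tables", something has to reassemble it, whereas data stored together under one key needs no such step. The table layouts and the `join_emails` helper are hypothetical, and the sketch makes no claim about which layout suits a given workload.

```python
# Relational layout: related data split across two hypothetical "tables".
users = [{"id": 1, "name": "alice"}, {"id": 2, "name": "bob"}]
emails = [
    {"user_id": 1, "email": "alice@example.com"},
    {"user_id": 1, "email": "a@example.org"},
    {"user_id": 2, "email": "bob@example.com"},
]

def join_emails(users, emails):
    """Reassemble each user's emails: the work a Join performs."""
    by_user = {}
    for row in emails:
        by_user.setdefault(row["user_id"], []).append(row["email"])
    return {u["name"]: by_user.get(u["id"], []) for u in users}

joined = join_emails(users, emails)

# Alternative layout: the related data stored together under one key.
nested = {
    "alice": ["alice@example.com", "a@example.org"],
    "bob": ["bob@example.com"],
}
```

Here `joined` and `nested` hold the same information; the join step exists only because of how the first layout was chosen.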

Meeting the above requirements, along with others such as consistency, obviously comes at a cost. Taking into account the cost of implementing this ourselves, if the task at hand closely matches at least one existing product, we can naturally just use that product.

Why do the work for nothing? Once you have grasped these basic facts and considered your own needs, you can arrive at the most accurate answer to questions such as whether to build your own, acquire one, or use an existing product.

----------

A notice, with no guarantees:

I have recently implemented a very basic data store, which makes different choices in some respects from most widely used products. For example, some of its block-level management departs from the conventional methods described in textbooks. As another example, most database products, whether SQL or NoSQL, adopt a client/server (C/S) structure; in my design, each process accesses the files through a library.

Along the way I have gained some experience, and I want to write some articles to record it.

These different practices (including the choices I made among the different schemes in textbooks) are not absolutely correct, but they involve many considerations: for example, space reclamation, the fragmentation and utilization problems caused by variable-length data, the problems arising from data deletion and update, and how to make effective use of various hardware expansion methods to improve efficiency.

To be honest, none of this is profound scholarship, but it does take a bit of skill.
