2 TB of Free Storage: How Does Baidu Do It?

Source: Internet
Author: User
Keywords: technical discussion, network disk, Baidu Cloud Manager

[Overview] Suppose I want to give each user 1 GB of network storage space, and the server has a 1000 GB hard drive that can all be used for user data. If each user can be allocated at most 1 GB, how many users can the server support?

A while ago, while using Baidu Netdisk, I suddenly noticed something. Huh? Baidu Netdisk was offering 2 TB of free space!

Most of us have used a network drive at some point. In an age when everything is moving to the cloud, it is a genuinely useful tool, but for us free users, poor to the bone, the tiny amount of space was its one sad flaw. In the early days everyone scrambled for space and jumped through all kinds of hoops (completing so-called "tasks"), yet after all that effort you could only expand to about 5 GB. Now, though, you can casually get 2 TB.

So how is this sudden 2 TB of space achieved?

Here is how it works.

Suppose I want to provide each user with 1 GB of network storage.

If the server has a 1000 GB hard drive, all of which can be used to store user data, and each user is allocated at most 1 GB, how many users can it serve?

You would probably say 1000 / 1 = 1000 users.

But if you allocate space this way, you will find that users rarely fill their 1 GB. Usage varies, but the average user uploads only about 50 MB of files. So if you hand a 1000 GB drive to 1,000 people, only 50 MB × 1000 = 50 GB of it is actually used, and the remaining 950 GB is essentially wasted.
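The arithmetic above can be checked with a short Python sketch. The function `naive_allocation` and its parameter names are illustrative assumptions, not part of any real system; like the text, it uses decimal units (1 GB = 1000 MB).

```python
def naive_allocation(disk_gb: int, quota_gb: int, avg_upload_mb: int):
    """Fully reserve each user's quota; return (users, used_gb, idle_gb)."""
    users = disk_gb // quota_gb                 # one full quota reserved per user
    used_gb = users * avg_upload_mb / 1000      # what users actually upload
    idle_gb = disk_gb - used_gb                 # reserved but never written
    return users, used_gb, idle_gb


users, used_gb, idle_gb = naive_allocation(disk_gb=1000, quota_gb=1, avg_upload_mb=50)
print(f"{users} users, {used_gb:.0f} GB used, {idle_gb:.0f} GB idle")
# prints: 1000 users, 50 GB used, 950 GB idle
```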

So how do you fix this?

You could allocate that 1000 GB to 20,000 users instead. Each user's upload cap is still 1 GB, but since each user uploads only about 50 MB on average, 20,000 × 50 MB = 1000 GB, and the valuable storage space on the server is fully used. But if you hand it out to all 20,000 people and uploads suddenly spike, users would discover that the 1 GB you promised them is fake, because part of it has been given to others. So instead of 20,000 you allocate space to only 19,000 people, leaving some room for emergencies.
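A minimal sketch of this overcommit calculation, planning around average usage instead of the quota. `overcommit_users` is a hypothetical helper, and the 50 GB headroom is an assumption: the text only says "some room", and 50 GB is simply the value that reproduces the 19,000-user figure.

```python
def overcommit_users(disk_gb: int, avg_upload_mb: int, headroom_gb: int = 0) -> int:
    """Users a disk can host when planned around average usage rather than the quota.

    headroom_gb is space deliberately left free for sudden upload spikes
    (the 50 GB used below is an assumed value, not from the source).
    """
    usable_mb = (disk_gb - headroom_gb) * 1000  # decimal units, as in the text
    return usable_mb // avg_upload_mb


print(overcommit_users(1000, 50))                  # no safety margin: 20000
print(overcommit_users(1000, 50, headroom_gb=50))  # assumed 50 GB kept free: 19000
```

Note that the advertised 1 GB quota does not appear in the calculation at all: once space is overcommitted, capacity depends only on what users actually store.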

The number of users you can serve has suddenly grown nineteenfold. Great! Is there a way to use the space even more effectively?

Now suppose I have 1,000 servers, each with 1000 GB of space. If every server must keep 50 GB free in case users suddenly upload large amounts of data and fill it, then 1000 × 50 GB = 50,000 GB sits empty across those 1,000 servers. What a pity. So engineers invented the storage cluster: a user's data can be spread across multiple servers, yet from the user's point of view it is still one contiguous 1 GB space. There is then no need to reserve emergency room on each server; you can fill one server completely and simply put further data on the next one. This ensures maximum use of server space. And if the administrator suddenly finds users uploading data like crazy (which, with a large user base, is very unlikely) so that there is not enough space, it does not matter: just add a few more hard disks or servers.
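The pooling idea can be sketched as a toy allocator in which a piece of data may be split across servers and no per-server reserve is kept. `Server` and `StoragePool` are hypothetical names chosen for illustration; a real distributed file system also handles replication, failures, and metadata, which this sketch ignores.

```python
from dataclasses import dataclass, field


@dataclass
class Server:
    name: str
    capacity_mb: int
    used_mb: int = 0

    @property
    def free_mb(self) -> int:
        return self.capacity_mb - self.used_mb


@dataclass
class StoragePool:
    """Hypothetical pool: one user's data may be split across several servers."""
    servers: list = field(default_factory=list)

    def free_mb(self) -> int:
        return sum(s.free_mb for s in self.servers)

    def store(self, size_mb: int) -> list:
        """Fill servers in order, spilling onto the next one; return (server, MB) pieces."""
        if size_mb > self.free_mb():
            raise RuntimeError("pool is full: add disks or servers")
        pieces, remaining = [], size_mb
        for server in self.servers:
            if remaining == 0:
                break
            take = min(remaining, server.free_mb)
            if take > 0:
                server.used_mb += take
                pieces.append((server.name, take))
                remaining -= take
        return pieces

    def add_server(self, server: Server) -> None:
        self.servers.append(server)  # capacity grows without moving stored data


pool = StoragePool([Server("a", 1000), Server("b", 1000)])
print(pool.store(700))  # [('a', 700)]
print(pool.store(500))  # [('a', 300), ('b', 200)]
```

Because the only limit is the pool's total free space, server "a" can be filled to the last megabyte, and running low is handled by `add_server` rather than by a reserve on every machine.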

Now we are making good use of server space and can allocate a given amount of space to the largest possible number of users. But is there an even better way?

One day the administrator notices that although each user stores only 50 MB on average, that 50 MB is not uploaded all at once; it accumulates slowly over one to two years of use. In other words, a newly registered user uploads nothing, or only very small files, at first. If I allocate 50 MB to each user up front, then even though they will fill it over the next two years, much of that space is wasted in the meantime. So the clever engineers said: since we have distributed, clustered storage and one user's data can be spread across multiple servers, let's give each newly registered user 0 MB of space at first and then give them exactly as much storage as they actually use. That guarantees full disk utilization. The front end, however, still shows the user 1 GB.
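This on-demand approach is commonly called thin provisioning. Below is a minimal sketch, assuming a hypothetical `ThinAccount` class: the quota is only a number shown to the user, and physical space is counted only when data is actually uploaded.

```python
class ThinAccount:
    """Hypothetical thin-provisioned account: a fixed advertised quota,
    but disk space is consumed only when data is actually uploaded."""

    def __init__(self, quota_mb: int = 1000):
        self.quota_mb = quota_mb  # what the front end shows (1 GB here)
        self.used_mb = 0          # physical space actually consumed

    def upload(self, size_mb: int) -> None:
        if self.used_mb + size_mb > self.quota_mb:
            raise ValueError("quota exceeded")
        self.used_mb += size_mb   # space is taken from the shared pool only now

    def shown_free_mb(self) -> int:
        return self.quota_mb - self.used_mb


accounts = [ThinAccount() for _ in range(1000)]
accounts[0].upload(30)  # only one new user has uploaded anything so far
advertised_mb = sum(a.quota_mb for a in accounts)
physical_mb = sum(a.used_mb for a in accounts)
print(advertised_mb, physical_mb)  # 1000000 30
```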

This engineer's idea meant that, in the early days of the network drive, a single 1000 GB server could serve roughly 1,000,000 registered users. As more people sign up, I also earn more money and can keep adding servers to provide their storage later on. And because some of those servers are bought a year or more later, my purchasing costs come down too.

So... is this the end?

For an email provider, the utilization would already be high enough. But a network drive is different.

Clever engineers noticed that, unlike mailboxes, where most content and attachments are written by the users themselves and differ from one another, much of what people upload to a network drive is duplicated.

For example, Zhang San downloads "TxxxO HxT" today and uploads it to his network drive, and three days later Li Si downloads the same "TxxxO HxT" and uploads it to his. As your user base grows, you find that 1,000 people have uploaded 1,000 identical copies of the same file to your valuable server space. So the engineers came up with a solution: since it is the same file, store only one copy, while each user's front end still displays a copy of their own. When a user deletes the file, it is not actually deleted; it only appears deleted on the front end, and the back end keeps it for the other users who still have it. Only when every user who had this file has deleted it is it really deleted.
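A minimal sketch of this deduplication with reference counting, assuming for simplicity that a file's bytes serve directly as its identity. `DedupStore` and its methods are hypothetical names; the physical copy exists exactly as long as its reference count is above zero.

```python
from collections import defaultdict


class DedupStore:
    """Hypothetical store: one physical copy per distinct file, plus a count of its owners."""

    def __init__(self):
        self.refcount = {}              # file content -> users still holding it
        self.owned = defaultdict(set)   # user -> files shown in that user's drive

    def upload(self, user: str, content: bytes) -> None:
        if content in self.owned[user]:
            return                       # this user already has it
        self.owned[user].add(content)
        # The first uploader creates the single physical copy; later ones add a reference.
        self.refcount[content] = self.refcount.get(content, 0) + 1

    def delete(self, user: str, content: bytes) -> None:
        self.owned[user].remove(content)  # disappears from this user's view only
        self.refcount[content] -= 1
        if self.refcount[content] == 0:   # last owner gone: really delete it
            del self.refcount[content]

    def physical_copies(self) -> int:
        return len(self.refcount)


movie = b"identical file bytes"
store = DedupStore()
store.upload("zhangsan", movie)
store.upload("lisi", movie)
print(store.physical_copies())  # 1
store.delete("zhangsan", movie)
print(store.physical_copies())  # 1 (Li Si still has it)
store.delete("lisi", movie)
print(store.physical_copies())  # 0
```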

As more data is stored and more users register, more and more of the uploaded data is duplicated, and storing each duplicate only once becomes increasingly effective. Eventually the duplicate-heavy uploads work out to an average of only about 1 MB of real storage per user, so your limited space can serve more than 50 times as many users.
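Using the figures from the text (decimal units, 1 GB = 1000 MB), the number of users one 1000 GB disk can serve scales inversely with the real storage each user consumes, which is where the factor of 50 comes from:

```latex
\frac{1000\ \text{GB}}{50\ \text{MB/user}} = 20{,}000\ \text{users},\qquad
\frac{1000\ \text{GB}}{1\ \text{MB/user}} = 1{,}000{,}000\ \text{users},\qquad
\frac{1{,}000{,}000}{20{,}000} = 50
```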

But as usage grows, you discover another pattern:

Zhang San uploads "TxxxO HxT N0124" and Li Si uploads "TH n124". They are the same file with different names. Can I recognize that they are one file, store it once, and show it under a different file name to each user? Yes: use an algorithm that identifies identical content, such as the MD5 value. If two files have the same MD5 value and the same file size, I treat them as the same file, store just one copy, and give each user their own file name.
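A minimal sketch of identifying files by content rather than by name, keyed on the (MD5, size) pair the text describes. `ContentAddressedDrive` is a hypothetical class; `hashlib.md5` is Python's standard MD5 implementation.

```python
import hashlib


class ContentAddressedDrive:
    """Hypothetical drive: content is identified by (MD5, size); names are per user."""

    def __init__(self):
        self.blobs = {}   # (md5 hex, size) -> file bytes, stored once
        self.names = {}   # (user, file name) -> (md5 hex, size)

    @staticmethod
    def content_key(data: bytes) -> tuple:
        return hashlib.md5(data).hexdigest(), len(data)

    def upload(self, user: str, name: str, data: bytes) -> tuple:
        key = self.content_key(data)
        self.blobs.setdefault(key, data)  # keep only the first copy of identical content
        self.names[(user, name)] = key    # each user keeps their own file name
        return key

    def download(self, user: str, name: str) -> bytes:
        return self.blobs[self.names[(user, name)]]


drive = ContentAddressedDrive()
data = b"identical file bytes"
drive.upload("zhangsan", "TxxxO HxT N0124", data)
drive.upload("lisi", "TH n124", data)
print(len(drive.blobs))                           # 1
print(drive.download("lisi", "TH n124") == data)  # True
```

One design caveat: MD5 collisions can be constructed deliberately, and colliding inputs can have the same length, so the size check does not rule them out. A service worried about adversarial uploads would more likely key on a stronger hash such as SHA-256 (`hashlib.sha256`).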

One day you find that computing the MD5 value of every file puts a heavy load on the CPU, and that an identical file still has to be uploaded in full, wasting bandwidth, before its duplication can be detected. Can this be improved?

Clever engineers wrote a small piece of software, or a small plug-in, grandly named the "upload control", which moves the MD5 calculation onto the user's own computer. Once the user's computer determines that the data to be uploaded is identical to data already stored on the server, the file is not uploaded at all; the server simply records that the user has successfully uploaded it under file name XX. The whole process finishes almost instantly, so it was given a cool name: "instant upload".
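The client-side check can be sketched as follows, with a hypothetical `UploadServer` running in the same process instead of over a network. The client hashes the file and sends the bytes only if the server does not already hold that (MD5, size) key. In this sketch the server still hashes genuinely new files as a safety check; the savings come from duplicates, which are neither transferred nor hashed by the server. A real service would need further safeguards, since otherwise a client that knows only a file's hash could claim to own the file.

```python
import hashlib


def content_key(data: bytes) -> tuple:
    return hashlib.md5(data).hexdigest(), len(data)


class UploadServer:
    """Hypothetical server side of "instant upload"."""

    def __init__(self):
        self.blobs = {}   # (md5, size) -> bytes
        self.names = {}   # (user, file name) -> (md5, size)

    def has(self, key: tuple) -> bool:
        return key in self.blobs

    def link(self, user: str, name: str, key: tuple) -> None:
        self.names[(user, name)] = key     # record the file as uploaded under this name

    def receive(self, user: str, name: str, key: tuple, data: bytes) -> None:
        if content_key(data) != key:       # do not trust the client's hash for new data
            raise ValueError("hash mismatch")
        self.blobs[key] = data
        self.link(user, name, key)


def client_upload(server: UploadServer, user: str, name: str, data: bytes) -> int:
    """Hash on the client; send the bytes only if the server lacks them. Returns bytes sent."""
    key = content_key(data)                # hashing cost moves to the user's machine
    if server.has(key):
        server.link(user, name, key)       # "instant upload": nothing transferred
        return 0
    server.receive(user, name, key, data)
    return len(data)


server = UploadServer()
data = b"x" * 1000
print(client_upload(server, "zhangsan", "a.mkv", data))  # 1000
print(client_upload(server, "lisi", "b.mkv", data))      # 0
```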

After all these steps, a server that at first could provide space for only 1,000 users can now, with the client still showing 1 GB per user, provide space for almost 1,000,000 users.

So one day, in a good mood, you announce: every user's storage space is being raised to 1 TB. On average each user still uploads only 50 MB, and only a very few users ever exceed the original 1 GB, so you will find the extra cost is almost negligible.
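Why the bigger quota costs so little can be sketched by counting the disk actually consumed: each user stores what they upload, capped by the quota, so raising the quota only adds cost for users whose uploads exceed the old cap. The upload figures below are a hypothetical population chosen for illustration, not measured data.

```python
def physical_mb(uploads_mb: list, quota_mb: int) -> int:
    """Disk actually consumed: each user stores what they upload, capped by the quota."""
    return sum(min(u, quota_mb) for u in uploads_mb)


# Hypothetical population: 999 typical users at 50 MB, one heavy user with 3000 MB to store.
uploads = [50] * 999 + [3000]
old = physical_mb(uploads, quota_mb=1000)        # 1 GB quota
new = physical_mb(uploads, quota_mb=1_000_000)   # 1 TB quota
print(old, new, new - old)  # 50950 52950 2000
```

All of the extra 2000 MB comes from the single heavy user; the 999 typical users cost exactly the same under either quota.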

The hard-working engineers are still busy finding ways to use the servers' disk space even more efficiently.
