I. Conventional Image Storage Policies
In general, an image collection at or below the GB level can be organized with plain folders.
For example, the folder hierarchy could be year/industry attribute/month/date/user attribute; a short sketch follows the list of principles below.
There are a few important principles (limits) to keep in mind:
1. The number of files in a single folder should not exceed roughly 2,000; beyond that, file lookup slows down noticeably. You can see the effect by running ls on a Linux directory that contains too many files.
2. The folder hierarchy should not be too deep; otherwise path resolution on the server also becomes slow.
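A minimal sketch of this kind of layout in Python, assuming hypothetical industry/user attribute values and a helper that also enforces the 2,000-file limit from principle 1:

import os
from datetime import date

MAX_FILES_PER_FOLDER = 2000  # principle 1: keep folders small enough for fast lookup

def build_image_dir(root, industry, user_attr, day=None):
    """Build a year/industry/month/date/user folder path (attribute names are illustrative)."""
    day = day or date.today()
    path = os.path.join(root, str(day.year), industry,
                        f"{day.month:02d}", f"{day.day:02d}", user_attr)
    os.makedirs(path, exist_ok=True)
    # stop adding files once the folder gets too crowded
    if len(os.listdir(path)) >= MAX_FILES_PER_FOLDER:
        raise RuntimeError(f"{path} already holds {MAX_FILES_PER_FOLDER}+ files; roll over to a new subfolder")
    return path

# usage: compute the folder for an image uploaded today by a "vip" user in the "retail" industry
target_dir = build_image_dir("/data/images", "retail", "vip")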
II. Massive Image Storage Policies
1. Core difficulties
(1) Massive volume means the number of images runs into the hundreds of millions; indexing alone is a serious concern, let alone database storage.
(2) The total image size is measured in terabytes, which a single node certainly cannot hold.
(3) Image access is very scattered and follows a long-tail pattern; there are no so-called hot spots.
2. Solution
(1) Storage Solution
Store the many small files across a distributed cluster, and record each file's location by hashing (usually hash the key first, then derive the storage location from the hash). The location is then used directly as the file name.
Common hash calculation: hash(key) % N = rough physical location (node)
Common distributed storage systems: HDFS, TFS, ...
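A minimal sketch of the hash-then-place idea in Python, assuming a cluster of N nodes; the node list, MD5 choice, and naming scheme are illustrative, not the API of any particular system:

import hashlib

STORAGE_NODES = ["node-0", "node-1", "node-2", "node-3"]  # assumed cluster of N nodes

def locate(key: str):
    """hash(key) % N gives the rough physical location; the location is baked into the file name."""
    digest = int(hashlib.md5(key.encode()).hexdigest(), 16)
    node_index = digest % len(STORAGE_NODES)
    # encoding the location in the name means reads need no separate lookup table
    file_name = f"{node_index}_{digest:x}.jpg"
    return STORAGE_NODES[node_index], file_name

node, name = locate("user42/avatar")
print(node, name)  # e.g. node-2 2_1f3a...jpg

Note that a plain hash(key) % N placement reshuffles most files when N changes; real systems mitigate this with consistent hashing or a name server, which is beyond this sketch.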
(2) Serving solution (assuming we use Apache)
Hard approach:
Let Apache resolve the file's physical storage location directly from the file name and read back the file stream.
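A minimal sketch of this hard approach in Python, assuming the file name encodes the node index as in the placement sketch above (mount points are hypothetical):

import os

STORAGE_ROOTS = {0: "/mnt/node-0", 1: "/mnt/node-1", 2: "/mnt/node-2", 3: "/mnt/node-3"}

def resolve(file_name: str) -> str:
    """Decode the physical path straight from a name such as '2_1f3a....jpg'."""
    node_index, _rest = file_name.split("_", 1)
    return os.path.join(STORAGE_ROOTS[int(node_index)], file_name)

def read_image(file_name: str) -> bytes:
    """Open the resolved path and return the raw byte stream, as the web server would."""
    with open(resolve(file_name), "rb") as f:
        return f.read()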
Soft approach:
Use Apache's rewrite mechanism (mod_rewrite) to map the request URL to the stored file and read it.
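As a sketch of the soft approach: mod_rewrite can delegate URL-to-path mapping to an external program declared with RewriteMap and the prg: map type, which reads one key per line on stdin and prints the mapped value on stdout. The Python script below is a hypothetical mapping program that reuses the hash scheme from the storage sketch:

#!/usr/bin/env python3
# Hypothetical RewriteMap "prg:" helper: maps an image key to its stored path.
import hashlib
import sys

STORAGE_ROOTS = ["/mnt/node-0", "/mnt/node-1", "/mnt/node-2", "/mnt/node-3"]

def map_key(key: str) -> str:
    digest = int(hashlib.md5(key.encode()).hexdigest(), 16)
    node_index = digest % len(STORAGE_ROOTS)
    return f"{STORAGE_ROOTS[node_index]}/{node_index}_{digest:x}.jpg"

# mod_rewrite keeps this process running: one key per input line, one mapped path per output line
for line in sys.stdin:
    print(map_key(line.strip()), flush=True)

On the Apache side, a RewriteMap directive would register this script and a RewriteRule would reference the map; the exact rules depend on the site's URL scheme.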