MongoDB GridFS: An Overview of Its Best Use Cases
Author: chszs. Please credit the source when reprinting. Blog homepage: http://blog.csdn.net/chszs
GridFS is a simple file system abstraction built on top of the MongoDB database. If you are familiar with Amazon S3, GridFS will feel similar. Why does a NoSQL database like MongoDB provide such a file layer abstraction?
I. Reasons for using GridFS
The reasons are as follows:
1) Storing user-generated file content
Most Web applications allow users to upload files. With a relational database, these user-generated files are usually stored on the file system, separate from the database, rather than in the database itself. This raises some problems. How do you copy a file to every server that needs it? After an object is deleted, how do you delete all of its copies? How do you keep the files secure and recover them after a disaster? GridFS solves these problems well. Your database backup also backs up your files. In addition, MongoDB's own replication copies your files to every replica in the MongoDB cluster. Deleting a file is as simple as deleting an object in the database.
2) Accessing portions of file content
When a file is uploaded to GridFS, it is split into 256 KB chunks that are stored separately. Therefore, when you need to read a certain byte range of a file, only the chunks covering that range have to be loaded into memory, not the entire file. This is useful when you need to read or edit part of a large media file.
3) Storing files larger than 16 MB in MongoDB
MongoDB limits a single document to 16 MB. Therefore, if your file exceeds 16 MB, you should use GridFS.
4) Overcoming file system restrictions
If you need to store a very large number of files, you have to consider the restrictions of the file system, because file systems limit the number of files in a directory. With GridFS you do not need to worry about this problem. Combining GridFS with MongoDB sharding also lets your files be distributed across multiple servers without increasing operational complexity.
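The range access described in point 2 can be sketched as simple arithmetic. This is only an illustration, assuming the 262144-byte chunk size that appears in this article's examples; the helper name chunks_for_range is hypothetical and is not part of any MongoDB driver.

```python
CHUNK_SIZE = 262144  # 256 KB, the chunkSize shown in this article's examples


def chunks_for_range(start, end, chunk_size=CHUNK_SIZE):
    """Return the chunk numbers that hold bytes start..end-1 of a file."""
    if start < 0 or end <= start:
        raise ValueError("expected 0 <= start < end")
    first = start // chunk_size       # chunk containing the first byte
    last = (end - 1) // chunk_size    # chunk containing the last byte
    return list(range(first, last + 1))


# Reading 100 bytes that straddle the first chunk boundary touches two chunks.
print(chunks_for_range(262100, 262200))
```

Only the chunks returned by such a calculation need to be fetched, which is why a partial read of a large file stays cheap.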
II. A closer look at GridFS
GridFS uses two collections to store data.
> show collections;
fs.chunks
fs.files
system.indexes
>
The fs.files collection contains the metadata of each file, while the fs.chunks collection stores the actual file content, split into 256 KB chunks. If you have a sharded cluster, the chunks are distributed across multiple servers, which may provide better performance than a file system.
> db.fs.files.findOne();
{
    "_id" : ObjectId("530cf1bf96038f5cb6df5f39"),
    "filename" : "./conn.log",
    "chunkSize" : 262144,
    "uploadDate" : ISODate("2014-02-25T19:40:47.321Z"),
    "md5" : "6515e95f8bb161f6435b130a0e587ccd",
    "length" : 1644981
}
>
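The length and chunkSize fields in the metadata document above are enough to work out how the file is laid out in fs.chunks. A minimal sketch, using the values from that output; the function name chunk_layout is made up for this example.

```python
def chunk_layout(length, chunk_size):
    """Return (number of chunks, size of the last chunk) for a file."""
    count = (length + chunk_size - 1) // chunk_size  # ceiling division
    last = length - (count - 1) * chunk_size if count else 0
    return count, last


# Values copied from the fs.files document shown above.
count, last_size = chunk_layout(1644981, 262144)
print(count, last_size)  # 7 72117
```

So the sample file occupies six full chunks plus one final partial chunk.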
MongoDB also creates a compound index on files_id and n (the chunk number) so that the chunks of a file can be located quickly and in order.
> db.fs.chunks.getIndexes();
[
    {
        "v" : 1,
        "key" : {
            "_id" : 1
        },
        "ns" : "files.fs.chunks",
        "name" : "_id_"
    },
    {
        "v" : 1,
        "key" : {
            "files_id" : 1,
            "n" : 1
        },
        "ns" : "files.fs.chunks",
        "name" : "files_id_1_n_1"
    }
]
>
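The (files_id, n) key reflects how a file is read back: select the chunks belonging to one file, then walk them in order of n. The sketch below imitates that lookup in memory with plain Python dictionaries. It does not talk to MongoDB and does not model the index itself; the function names and the "file-a"/"file-b" identifiers are invented for illustration.

```python
import random


def split_into_chunks(files_id, data, chunk_size):
    """Cut data into chunk documents shaped like those in fs.chunks."""
    return [
        {"files_id": files_id, "n": n, "data": data[offset:offset + chunk_size]}
        for n, offset in enumerate(range(0, len(data), chunk_size))
    ]


def reassemble(chunks, files_id):
    """Filter by files_id, order by n, and join the chunk payloads."""
    own = [c for c in chunks if c["files_id"] == files_id]
    own.sort(key=lambda c: c["n"])
    return b"".join(c["data"] for c in own)


payload = bytes(range(256)) * 4
stored = split_into_chunks("file-a", payload, 100)
stored += split_into_chunks("file-b", b"other", 100)
random.shuffle(stored)  # storage order is irrelevant once chunks are keyed by (files_id, n)
print(reassemble(stored, "file-a") == payload)
```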
III. GridFS in practice
MongoDB ships with a command-line tool named mongofiles that lets you try out GridFS. To use GridFS from application code, see the documentation for your driver.
Put
# mongofiles -h -u -p --db files put /conn.log
connected to: 127.0.0.1
added file: { _id: ObjectId('530cf1009710ca8fd47d7d5d'), filename: "./conn.log", chunkSize: 262144, uploadDate: new Date(1393357057021), md5: "6515e95f8bb161f6435b130a0e587ccd", length: 1644981 }
done!
Get
# mongofiles -h -u -p --db files get /conn.log
connected to: 127.0.0.1
done write to: ./conn.log
List
# mongofiles -h -u -p list
connected to: 127.0.0.1
/conn.log 1644981
Delete
[root@ip-10-198-25-43 tmp]# mongofiles -h -u -p --db files delete /conn.log
connected to: 127.0.0.1
done!
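The same put, get, and delete cycle can be driven from a driver. The following is a rough sketch assuming PyMongo and its bundled gridfs module; it is one possible arrangement, not the article's own code. The helper names open_gridfs and round_trip are hypothetical, and the connection string in the usage note is an assumption about your environment.

```python
def open_gridfs(uri, db_name):
    """Connect with PyMongo and return a gridfs.GridFS handle.

    Needs the pymongo package and a reachable MongoDB server.
    """
    import gridfs  # shipped as part of the pymongo distribution
    from pymongo import MongoClient

    return gridfs.GridFS(MongoClient(uri)[db_name])


def round_trip(fs, payload, filename):
    """Put one file, read it back, then delete it; return the bytes read."""
    file_id = fs.put(payload, filename=filename)
    try:
        return fs.get(file_id).read()
    finally:
        fs.delete(file_id)  # removes the fs.files entry and its chunks
```

With a server running locally, a call such as round_trip(open_gridfs("mongodb://127.0.0.1:27017", "files"), b"hello", "conn.log") should return the bytes that were stored.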
IV. GridFS modules
If you want to serve the GridFS files stored in MongoDB directly through a Web server or a file system, you can use the following GridFS plug-ins:
1) GridFS-Fuse: exposes GridFS files directly as a file system.
2) GridFS-Nginx: lets Nginx serve GridFS files directly.
V. Limitations of GridFS
GridFS is not perfect either. It also has some limitations:
1) Working Set
Storing GridFS files alongside regular database content can significantly churn MongoDB's in-memory working set. If you do not want GridFS files to affect your working set, you can store them on separate MongoDB servers.
2) Performance
Serving files from GridFS is slower than serving them from the local file system or directly from a Web server. The performance cost is the price you pay for easier management.
3) Atomic update
GridFS does not provide an atomic update method for files. To meet this requirement, you must maintain multiple versions of the file and select the correct version.
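One way to apply the multiple-versions approach is to upload each revision as a new file under the same filename and let readers pick the newest one by uploadDate. The sketch below shows only that selection step on plain dictionaries shaped like fs.files documents; the _id values and the helper name latest_version are hypothetical. PyMongo's gridfs module offers a comparable lookup through get_last_version.

```python
from datetime import datetime


def latest_version(file_docs, filename):
    """Return the newest fs.files-style document for filename, or None."""
    candidates = [d for d in file_docs if d["filename"] == filename]
    if not candidates:
        return None
    return max(candidates, key=lambda d: d["uploadDate"])


docs = [
    {"_id": "v1", "filename": "conn.log", "uploadDate": datetime(2014, 2, 25, 19, 40)},
    {"_id": "v2", "filename": "conn.log", "uploadDate": datetime(2014, 2, 26, 8, 15)},
    {"_id": "x1", "filename": "other.log", "uploadDate": datetime(2014, 2, 27, 9, 0)},
]
print(latest_version(docs, "conn.log")["_id"])  # v2
```

Older versions can be deleted once no reader still depends on them.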