MongoDB GridFS Best Practices Overview
Author: chszs. Please credit the source when reprinting. Blog home: http://blog.csdn.net/chszs
GridFS is a simple file-storage abstraction built on top of a MongoDB database. If you are familiar with Amazon S3, GridFS is similar. Why would a NoSQL database like MongoDB provide such a file-layer abstraction?
I. Why use GridFS
The reasons are as follows:
1) Storing user-generated content. Most web apps let users upload files. With a relational database, these user-generated files are usually kept on the file system, separate from the database. This raises several problems. How do you copy a file to every server that needs it? When a file is deleted, how do you delete all of its copies? How do you keep the files secure and plan for disaster recovery? GridFS solves these problems: your database backups also back up your files, and thanks to MongoDB's built-in replication, every member of the replica set holds a copy of your files. Deleting a file is as simple as deleting a document in the database.
2) Accessing part of a file. When a file is uploaded to GridFS, it is split into 256KB chunks that are stored separately (newer MongoDB versions default to 255KB). When you need to read a range of bytes, only the chunks covering that range have to be loaded into memory, not the entire file. This is useful when reading or editing large media files.
3) Storing files larger than 16MB. MongoDB limits a single BSON document to 16MB, so if your files exceed 16MB, you should use GridFS.
4) Overcoming file system limitations. If you need to store a large number of files, you have to consider the limits of the file system itself, such as how many files a single directory can hold. With GridFS, you no longer need to worry about this. Together, GridFS and MongoDB sharding let your files be distributed across multiple servers without adding operational complexity.
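Points 2) and 3) come down to simple arithmetic. The following is a minimal sketch in plain JavaScript with no database connection, assuming the 256KB (262144-byte) chunk size used throughout this article and MongoDB's 16MB maximum BSON document size. The helpers `chunksForRange` and `needsGridFS` are hypothetical names for illustration, not driver APIs.

```javascript
// Sketch only: plain JavaScript arithmetic, no MongoDB connection.
// Assumption: the 256KB chunk size shown in this article (262144 bytes).
const CHUNK_SIZE = 262144;
// MongoDB's maximum BSON document size: 16MB.
const MAX_BSON_DOCUMENT_SIZE = 16 * 1024 * 1024;

// Hypothetical helper: which chunks (the `n` field in fs.chunks) hold
// bytes [start, end) of a file, and where the range starts and ends
// inside the first and last of those chunks.
function chunksForRange(start, end, chunkSize = CHUNK_SIZE) {
  if (start < 0 || end <= start) {
    throw new RangeError("expected 0 <= start < end");
  }
  const firstChunk = Math.floor(start / chunkSize);
  const lastChunk = Math.floor((end - 1) / chunkSize);
  return {
    firstChunk,
    lastChunk,
    startOffset: start - firstChunk * chunkSize, // inclusive, within firstChunk
    endOffset: end - lastChunk * chunkSize, // exclusive, within lastChunk
  };
}

// Hypothetical helper: true when the raw bytes alone exceed the document
// limit. A real document also stores field names and other fields, so
// files slightly under 16MB can still be too large to embed.
function needsGridFS(fileSize) {
  return fileSize > MAX_BSON_DOCUMENT_SIZE;
}

// Reading bytes 300000..599999 touches only chunks 1 and 2.
console.log(chunksForRange(300000, 600000));
// firstChunk 1, lastChunk 2, startOffset 37856, endOffset 75712
console.log(needsGridFS(20 * 1024 * 1024)); // true
console.log(needsGridFS(1644981)); // false
```

Only two 256KB chunks are needed for that 300000-byte range, regardless of how large the whole file is.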
II. GridFS in depth
GridFS uses two collections to store data:
```
> show collections;
fs.chunks
fs.files
system.indexes
```
The fs.files collection holds each file's metadata, while the fs.chunks collection stores the file's actual content, split into 256KB chunks. If the chunks collection is sharded, the chunks are distributed across multiple servers, which may perform better than a file system.
```
> db.fs.files.findOne();
{
    "_id" : ObjectId("530cf1bf96038f5cb6df5f39"),
    "filename" : "./conn.log",
    "chunkSize" : 262144,
    "uploadDate" : ISODate("2014-02-25T19:40:47.321Z"),
    "md5" : "6515E95F8BB161F6435B130A0E587CCD",
    "length" : 1644981
}
```
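To make the split between the two collections concrete, here is an in-memory sketch of how one file maps onto a single fs.files document plus numbered fs.chunks documents. It is an illustration, not driver code: `toGridFSDocuments` is a hypothetical helper, the string `"demo-id"` stands in for a real ObjectId, and nothing is written to MongoDB. The `md5` field mirrors the output above; newer drivers may no longer compute it.

```javascript
// Sketch: how one file maps onto one fs.files document plus N fs.chunks
// documents. In-memory only; a real driver inserts these into MongoDB.
const crypto = require("crypto");

// Hypothetical helper, not a driver API.
function toGridFSDocuments(fileId, filename, data, chunkSize = 262144) {
  const chunks = [];
  // Chunk n holds bytes [n * chunkSize, (n + 1) * chunkSize); the last
  // chunk is simply shorter.
  for (let n = 0, offset = 0; offset < data.length; n++, offset += chunkSize) {
    chunks.push({ files_id: fileId, n, data: data.subarray(offset, offset + chunkSize) });
  }
  const fileDoc = {
    _id: fileId,
    filename,
    chunkSize,
    uploadDate: new Date(),
    md5: crypto.createHash("md5").update(data).digest("hex"),
    length: data.length,
  };
  return { fileDoc, chunks };
}

// A zero-filled buffer the same size as conn.log above (1644981 bytes).
const { fileDoc, chunks } = toGridFSDocuments("demo-id", "./conn.log", Buffer.alloc(1644981));
console.log(fileDoc.length, chunks.length, chunks[chunks.length - 1].data.length);
// 1644981 7 72117
```

With a 262144-byte chunk size, the 1644981-byte conn.log needs six full chunks and a seventh chunk of 72117 bytes.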
MongoDB also creates a compound index on files_id and the chunk number n, so that a file's chunks can be found quickly:
```
> db.fs.chunks.getIndexes();
[
    {
        "v" : 1,
        "key" : {
            "_id" : 1
        },
        "ns" : "files.fs.chunks",
        "name" : "_id_"
    },
    {
        "v" : 1,
        "key" : {
            "files_id" : 1,
            "n" : 1
        },
        "ns" : "files.fs.chunks",
        "name" : "files_id_1_n_1"
    }
]
```
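Reading a file back amounts to "find the chunks with this files_id, ordered by n", which in the shell is `db.fs.chunks.find({ files_id: id }).sort({ n: 1 })`; the `files_id_1_n_1` index serves both the filter and the sort. The sketch below performs the same reassembly on plain in-memory objects. `reassemble` is a hypothetical helper, and string ids are used for simplicity (real ObjectIds would be compared with `.equals()`).

```javascript
// Sketch: reassemble a GridFS file from its chunk documents, in memory.
// Mirrors db.fs.chunks.find({ files_id: id }).sort({ n: 1 }).
function reassemble(fileDoc, chunkDocs) {
  const ordered = chunkDocs
    .filter((c) => c.files_id === fileDoc._id) // the files_id filter
    .sort((a, b) => a.n - b.n); // the sort on n
  // Chunk numbers must run 0, 1, 2, ... with no gaps or duplicates.
  ordered.forEach((c, i) => {
    if (c.n !== i) throw new Error(`missing or duplicate chunk n=${i}`);
  });
  const data = Buffer.concat(ordered.map((c) => c.data));
  if (data.length !== fileDoc.length) throw new Error("length mismatch");
  return data;
}

// Chunks arrive out of order and mixed with a chunk of another file.
const file = { _id: "a", chunkSize: 3, length: 5 };
const chunks = [
  { files_id: "a", n: 1, data: Buffer.from("lo") },
  { files_id: "b", n: 0, data: Buffer.from("zz") },
  { files_id: "a", n: 0, data: Buffer.from("hel") },
];
console.log(reassemble(file, chunks).toString()); // hello
```

Checking the chunk sequence and the total length against fs.files catches a missing or truncated chunk instead of silently returning a corrupted file.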
III. GridFS examples
MongoDB ships with a built-in tool, mongofiles, that you can use to try out GridFS in practice. To use GridFS from application code, see the documentation for your driver.
put:
```
# mongofiles -h -u -p --db files put ./conn.log
connected to: 127.0.0.1
added file: { _id: ObjectId('530cf1009710ca8fd47d7d5d'), filename: "./conn.log", chunkSize: 262144, uploadDate: new Date(1393357057021), md5: "6515E95F8BB161F6435B130A0E587CCD", length: 1644981 }
done!
```
get:
```
# mongofiles -h -u -p --db files get ./conn.log
connected to: 127.0.0.1
done write to: ./conn.log
```
list:
```
# mongofiles -h -u -p list
connected to: 127.0.0.1
./conn.log	1644981
```
delete:
```
# mongofiles -h -u -p --db files delete ./conn.log
connected to: 127.0.0.1
done!
```
IV. GridFS modules
If you want to serve files stored in GridFS directly from a web server or through the file system, you can use the following GridFS plug-ins:
1) gridfs-fuse: exposes GridFS files directly through the file system
2) gridfs-nginx: lets Nginx serve GridFS files directly
V. Limitations of GridFS
GridFS is not perfect; it has some limitations:
1) Working set. Serving GridFS files alongside your database content can significantly churn MongoDB's in-memory working set. If you don't want GridFS files to disturb your working set, store them on a separate MongoDB server.
2) Performance. Serving files from GridFS is slower than serving them natively from a web server or the local file system. The performance cost buys you the management advantages described above.
3) Atomic updates. GridFS provides no way to update a file atomically. If you need this, keep multiple versions of the file and select the correct version when reading.
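Limitation 3) can be handled by never modifying a stored file: each new version is uploaded as a new file with the same filename, and the reader picks one. The sketch below does that selection over fs.files-style objects in memory. `selectRevision` is a hypothetical helper; its numbering follows the convention the official drivers use for `openDownloadStreamByName` (0 = original, 1 = first revision, -1 = most recent, -2 = the one before it), but this is an illustration, not driver code.

```javascript
// Sketch: pick one version among several fs.files documents that share
// a filename, ordered by uploadDate. In-memory only; hypothetical helper.
// revision >= 0 counts from the oldest (0 = original upload);
// revision < 0 counts back from the newest (-1 = most recent).
function selectRevision(fileDocs, filename, revision = -1) {
  const versions = fileDocs
    .filter((f) => f.filename === filename)
    .sort((a, b) => a.uploadDate - b.uploadDate); // oldest first
  const index = revision >= 0 ? revision : versions.length + revision;
  if (index < 0 || index >= versions.length) {
    throw new Error(`revision ${revision} of ${filename} not found`);
  }
  return versions[index];
}

const files = [
  { _id: 2, filename: "report.pdf", uploadDate: new Date("2014-02-26T10:00:00Z") },
  { _id: 1, filename: "report.pdf", uploadDate: new Date("2014-02-25T10:00:00Z") },
  { _id: 3, filename: "notes.txt", uploadDate: new Date("2014-02-27T10:00:00Z") },
];
console.log(selectRevision(files, "report.pdf")._id); // 2 (most recent)
console.log(selectRevision(files, "report.pdf", 0)._id); // 1 (original)
```

Because readers always resolve a version at read time, a writer can upload a new version without touching the one currently being served; old versions can be deleted later once no reader needs them.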
http://blog.csdn.net/chszs/article/details/20123327