GridFS Introduction
GridFS is a built-in feature of MongoDB that can be used to store a large number of small files.
Using GridFS
MongoDB provides a command-line tool, mongofiles, for working with GridFS.
List all files:
mongofiles list
Upload a file:
mongofiles put xxx.txt
Download a file:
mongofiles get xxx.txt
Find files:
Find all files whose names contain "xxx":
mongofiles search xxx
Find all files whose names start with the prefix "xxx":
mongofiles list xxx
Parameter description:
-d Specifies the database; the default is test (the GridFS collections inside it default to the fs prefix). Example: mongofiles list -d testgridfs
-u, -p Specify the user name and password
-h Specifies the host
--port Specifies the host port
-c Specifies the name of the collection; the default is fs
-t Specifies the MIME type of the file; ignored by default
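As a rough illustration of how these options combine on one command line, the following Python sketch builds an argument list for mongofiles. The helper name mongofiles_argv and the host, port, and database values are assumptions made for this example; only the option letters and the list command come from the text above.

```python
import shlex


def mongofiles_argv(command, *args, host=None, port=None, db=None,
                    user=None, password=None, mime_type=None):
    """Build an argument list for the mongofiles tool (hypothetical helper)."""
    argv = ["mongofiles"]
    if host is not None:
        argv += ["-h", host]
    if port is not None:
        argv += ["--port", str(port)]
    if db is not None:
        argv += ["-d", db]
    if user is not None:
        argv += ["-u", user]
    if password is not None:
        argv += ["-p", password]
    if mime_type is not None:
        argv += ["-t", mime_type]
    # Options come first, then the command and its arguments.
    argv.append(command)
    argv.extend(args)
    return argv


# List files with the prefix "xxx" in the testgridfs database.
argv = mongofiles_argv("list", "xxx", host="localhost", port=27017, db="testgridfs")
print(shlex.join(argv))
```

The resulting list could be passed to subprocess.run(argv, check=True) on a machine where mongofiles is installed; the sketch only prints the command so that it runs without a MongoDB server.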
How GridFS Works
By default, GridFS uses two collections in the database, fs.chunks and fs.files, to store files.
The fs.files collection holds the file information, and fs.chunks holds the file data.
A record in the fs.files collection, describing one file, looks like this:
{
"_id": ObjectId("4f4608844f9b855c6c35e298"), // unique ID; can be a user-defined type
"filename": "CPU.txt", // file name
"length": 778, // file length in bytes
"chunkSize": 262144, // chunk size in bytes
"uploadDate": ISODate("2012-02-23T09:36:04.593Z"), // upload time
"md5": "e2c789b036cfb3b848ae39a24e795ca6", // MD5 value of the file
"contentType": "text/plain", // MIME type of the file
"metadata": null // other file information; by default there is no "metadata" key, and users can set it to any BSON object
}
The corresponding chunk in fs.chunks looks like this:
{
"_id": ObjectId("4f4608844f9b855c6c35e299"), // chunk ID
"files_id": ObjectId("4f4608844f9b855c6c35e298"), // ID of the file; refers to the record in fs.files, like a foreign key
"n": 0, // sequence number of this chunk within the file; a file larger than chunkSize is split into multiple chunks
"data": BinData(0, "QGV...") // binary file data; the actual content is omitted here
}
The default chunk size is 256 KB (262144 bytes).
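To make the chunk arithmetic concrete, here is a small Python sketch that computes how many fs.chunks records a file of a given length needs. The function name chunk_count is an assumption made for this example; the 262144-byte default is the chunkSize value from the sample record.

```python
DEFAULT_CHUNK_SIZE = 256 * 1024  # 262144 bytes, as in the sample fs.files record


def chunk_count(length, chunk_size=DEFAULT_CHUNK_SIZE):
    """Number of chunks needed for a file of `length` bytes (ceiling division)."""
    return (length + chunk_size - 1) // chunk_size


print(chunk_count(778))          # the 778-byte CPU.txt from the sample record
print(chunk_count(1024 * 1024))  # a 1 MiB file
```

With these inputs the sample CPU.txt fits in a single chunk, while a 1 MiB file needs four.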
When a file is stored in GridFS, it is split into multiple chunks if it is larger than chunkSize. The chunks are saved to fs.chunks first, and the file information is written to fs.files last.
When a file is read, a matching record is first found in fs.files according to the query conditions, and its "_id" value is obtained. All chunks in fs.chunks whose "files_id" equals that "_id" are then found and sorted by "n", and the contents of their "data" fields are read in order and reassembled into the original file.
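The write and read steps described above can be sketched in plain Python. This is only an in-memory simulation, not driver code: the lists fs_files and fs_chunks stand in for the two collections, integer ids replace ObjectId, and the put and get helpers are names assumed for this example.

```python
import hashlib
import itertools

fs_files = []    # stands in for the fs.files collection
fs_chunks = []   # stands in for the fs.chunks collection
_ids = itertools.count(1)  # simple integer ids instead of ObjectId


def put(filename, data, chunk_size=262144):
    """Store `data`: chunks first, file information last."""
    file_id = next(_ids)
    # 1. Split the data and save each piece to fs.chunks.
    for n, start in enumerate(range(0, len(data), chunk_size)):
        fs_chunks.append({"_id": next(_ids), "files_id": file_id, "n": n,
                          "data": data[start:start + chunk_size]})
    # 2. Only then write the file information to fs.files.
    fs_files.append({"_id": file_id, "filename": filename, "length": len(data),
                     "chunkSize": chunk_size,
                     "md5": hashlib.md5(data).hexdigest()})
    return file_id


def get(filename):
    """Read a file back by name and return its bytes."""
    # 1. Find the matching fs.files record and take its _id.
    file_doc = next(f for f in fs_files if f["filename"] == filename)
    # 2. Collect the chunks whose files_id equals that _id, sorted by n.
    chunks = sorted((c for c in fs_chunks if c["files_id"] == file_doc["_id"]),
                    key=lambda c: c["n"])
    # 3. Concatenate the data fields to restore the original file.
    return b"".join(c["data"] for c in chunks)


original = bytes(range(256)) * 10  # 2560 bytes of sample data
put("sample.bin", original, chunk_size=1024)
restored = get("sample.bin")
print(len(fs_chunks), restored == original)
```

A small chunk_size of 1024 is used here only so that the sample data spans several chunks.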
Notes
1. GridFS does not automatically deduplicate files with the same MD5. If files with identical MD5 values should be stored only once in GridFS, the user has to handle that. The MD5 value is computed by the client.
2. Because GridFS saves the file data to fs.chunks first and writes the file information to fs.files last, a failed upload can leave garbage data in fs.chunks. This garbage data can be cleaned up periodically.
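Both notes can be illustrated with a short Python sketch. It works on plain lists of dictionaries shaped like the fs.files and fs.chunks records shown earlier, with integer ids instead of ObjectId. The helper names find_by_md5 and orphan_chunk_ids and the sample data are assumptions made for this example, not part of any MongoDB API.

```python
import hashlib


def find_by_md5(fs_files, data):
    """Return the existing fs.files record with the same MD5, or None."""
    digest = hashlib.md5(data).hexdigest()  # computed on the client side
    return next((f for f in fs_files if f["md5"] == digest), None)


def orphan_chunk_ids(fs_files, fs_chunks):
    """Return the _id of every chunk whose files_id has no fs.files record."""
    known = {f["_id"] for f in fs_files}
    return [c["_id"] for c in fs_chunks if c["files_id"] not in known]


# Sample data: file 1 was stored completely. The upload of file 2 failed
# after its first chunk was written, so it has no fs.files record.
fs_files = [{"_id": 1, "filename": "a.txt",
             "md5": hashlib.md5(b"hello").hexdigest()}]
fs_chunks = [
    {"_id": 10, "files_id": 1, "n": 0, "data": b"hello"},
    {"_id": 11, "files_id": 2, "n": 0, "data": b"partial"},
]

print(find_by_md5(fs_files, b"hello") is not None)  # content already stored
print(find_by_md5(fs_files, b"other"))              # no record with this MD5
print(orphan_chunk_ids(fs_files, fs_chunks))        # chunks safe to clean up
```

In a real cleanup job an upload that is still in progress also has chunks without an fs.files record, so it is probably safer to remove only orphaned chunks that are older than some threshold.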