MongoDB documents are stored in BSON format, which supports binary data types, so binary data can be stored directly in a MongoDB document. However, each document is limited in size (16 MB), so this does not work for large files such as videos. For those, MongoDB provides a specification for handling large files: GridFS.
GridFS Implementation Principles
By default, GridFS stores a file in two collections, fs.files and fs.chunks. The fs.files collection holds the file's information (metadata), and fs.chunks holds the file's data. A record in the fs.files collection looks like this:
{
  "_id": ObjectId("4f4608844f9b855c6c35e298"), // unique ID; can be a user-defined type
  "filename": "CPU.txt", // file name
  "length": 778, // file length in bytes
  "chunkSize": 262144, // chunk size in bytes
  "uploadDate": ISODate("2012-02-23T09:36:04.593Z"), // upload time
  "md5": "e2c789b036cfb3b848ae39a24e795ca6", // MD5 value of the file
  "contentType": "text/plain", // MIME type of the file
  "metadata": null // other file information; this key is absent by default, and the user can set it to any BSON object
}
The corresponding chunk (data block) in fs.chunks looks like this:
{
  "_id": ObjectId("4f4608844f9b855c6c35e299"), // chunk ID
  "files_id": ObjectId("4f4608844f9b855c6c35e298"), // file ID; refers to a record in fs.files, like a foreign key into that collection
  "n": 0, // chunk sequence number; a file larger than chunkSize is split into multiple chunks
  "data": BinData(0, "QGV ...") // binary data of the file; the actual content is omitted here
}
The default chunk size is 256 KB (262144 bytes; newer MongoDB drivers default to 255 KB). When a file is stored in GridFS and is larger than chunkSize, it is split into multiple chunks. The chunks are saved to fs.chunks first, and the file information is saved to fs.files last.
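The write path described above can be sketched in plain Node.js. This is only an in-memory illustration, not the driver's implementation: the arrays `fsChunks` and `fsFiles` stand in for the two collections, and `storeFile` is a hypothetical helper introduced for this example.

```javascript
// In-memory sketch of a GridFS-style write. Plain arrays stand in for the
// fs.chunks and fs.files collections (an assumption; no MongoDB driver is used).
const crypto = require('crypto');

const fsChunks = []; // stand-in for the fs.chunks collection
const fsFiles = [];  // stand-in for the fs.files collection

// Hypothetical helper: split `data` (a Buffer) into chunks, then record the file.
function storeFile(fileId, filename, data, chunkSize = 262144) {
  // Step 1: save the chunks first, numbered from n = 0.
  for (let offset = 0, n = 0; offset < data.length; offset += chunkSize, n++) {
    fsChunks.push({
      files_id: fileId,
      n: n,
      data: data.subarray(offset, offset + chunkSize), // last chunk may be shorter
    });
  }
  // Step 2: save the file information last.
  const fileDoc = {
    _id: fileId,
    filename: filename,
    length: data.length,
    chunkSize: chunkSize,
    uploadDate: new Date(),
    md5: crypto.createHash('md5').update(data).digest('hex'), // computed by the client
  };
  fsFiles.push(fileDoc);
  return fileDoc;
}

// A 778-byte file fits in one chunk; a 600000-byte file needs three.
storeFile('small', 'CPU.txt', Buffer.alloc(778, 1));
storeFile('big', 'video.bin', Buffer.alloc(600000, 2));
console.log(fsChunks.filter((c) => c.files_id === 'small').length); // prints 1
console.log(fsChunks.filter((c) => c.files_id === 'big').length);   // prints 3
```

Because the file record is written only after all chunks, a reader that starts from fs.files never sees a file whose chunks are still being uploaded.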
When reading a file, GridFS finds a matching record in fs.files according to the query conditions and takes its "_id" value. It then finds all chunks in fs.chunks whose files_id equals that _id, sorts them by "n", and reads the data field of each chunk in order to restore the original file.
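The read path can be sketched the same way. Again this is an in-memory illustration under stated assumptions: the arrays stand in for the collections, the string IDs and tiny chunk size are made up for readability, and `readFile` is a hypothetical helper, not a driver call.

```javascript
// In-memory sketch of a GridFS-style read. Arrays stand in for the
// fs.files and fs.chunks collections (an assumption; no MongoDB driver is used).
const fsFiles = [{ _id: 'f1', filename: 'CPU.txt', length: 11, chunkSize: 4 }];
const fsChunks = [
  // Deliberately out of order, and mixed with a chunk from another file.
  { files_id: 'f1', n: 2, data: Buffer.from('rld') },
  { files_id: 'f1', n: 0, data: Buffer.from('hell') },
  { files_id: 'f2', n: 0, data: Buffer.from('zzzz') },
  { files_id: 'f1', n: 1, data: Buffer.from('o wo') },
];

// Hypothetical helper: look up the file record, then reassemble its chunks.
function readFile(filename) {
  // Step 1: find the matching record in fs.files and take its _id.
  const fileDoc = fsFiles.find((f) => f.filename === filename);
  if (!fileDoc) return null;
  // Step 2: select the chunks whose files_id equals that _id, sorted by n.
  const chunks = fsChunks
    .filter((c) => c.files_id === fileDoc._id)
    .sort((a, b) => a.n - b.n);
  // Step 3: concatenate the data fields in order to restore the file.
  return Buffer.concat(chunks.map((c) => c.data), fileDoc.length);
}

console.log(readFile('CPU.txt').toString()); // prints "hello world"
```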
Notes:
1. GridFS does not deduplicate files that have the same MD5 value. If you want only one copy of such files stored in GridFS, the application has to handle this itself. The MD5 value is computed by the client.
2. When uploading a file, GridFS saves the file data to fs.chunks first and saves the file information to fs.files last. If an upload fails partway through, garbage data may therefore be left in fs.chunks. This garbage data can be cleaned up periodically.
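Both notes can be illustrated with a small in-memory sketch. The arrays stand in for the collections, and `storeOnce` and `removeOrphanChunks` are hypothetical helpers written for this example; a real application would run equivalent queries through its MongoDB driver.

```javascript
// In-memory sketch of client-side deduplication and orphan-chunk cleanup.
// Arrays stand in for fs.files and fs.chunks (an assumption; no driver is used).
const crypto = require('crypto');

const fsFiles = [];
const fsChunks = [];

// Note 1: the client computes the MD5 and checks for an existing file itself.
function storeOnce(fileId, data) {
  const md5 = crypto.createHash('md5').update(data).digest('hex');
  const existing = fsFiles.find((f) => f.md5 === md5);
  if (existing) return existing._id; // same content already stored; reuse it
  fsChunks.push({ files_id: fileId, n: 0, data: data }); // one chunk, for brevity
  fsFiles.push({ _id: fileId, length: data.length, md5: md5 });
  return fileId;
}

// Note 2: chunks whose files_id has no record in fs.files are garbage.
function removeOrphanChunks() {
  const knownIds = new Set(fsFiles.map((f) => f._id));
  let removed = 0;
  for (let i = fsChunks.length - 1; i >= 0; i--) {
    if (!knownIds.has(fsChunks[i].files_id)) {
      fsChunks.splice(i, 1);
      removed++;
    }
  }
  return removed;
}

const first = storeOnce('a', Buffer.from('same bytes'));
const second = storeOnce('b', Buffer.from('same bytes'));
console.log(first, second); // prints "a a": the second upload reuses the first file

// Simulate a failed upload: a chunk was written but no fs.files record followed.
fsChunks.push({ files_id: 'failed-upload', n: 0, data: Buffer.from('x') });
console.log(removeOrphanChunks()); // prints 1
```

In a live system, a cleanup job like this should probably skip recently written chunks, since an upload that is still in progress also has chunks without a file record yet.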
MongoDB (Part 8): GridFS Storage