1. GridFS Introduction
GridFS is a MongoDB facility for persisting files in the database, and it supports distributed applications (distributed file storage and reading). It is MongoDB's solution for storing binary data and is typically used to handle large files. Data (documents) stored in MongoDB's BSON format are limited to 16 MB each, but in real systems uploaded pictures or files are often larger than that. In those cases GridFS can be used to manage the files.
Strictly speaking, GridFS is not a feature of the MongoDB server. It is a specification for storing large files in MongoDB, and all officially supported drivers implement it. The specification defines how large files are handled in the database; the language drivers do the work, and applications store and retrieve large files through the driver API.
2. GridFS Usage Scenarios
(1) If your file system limits the number of files that can be stored in one directory, you can use GridFS to store as many files as you need.
(2) When you want to access part of a large file without loading the entire file into memory, you can store the file in GridFS and read only the sections you need.
(3) When you want your files and metadata to be automatically synchronized and deployed across multiple systems and facilities, you can use GridFS to implement distributed file storage.
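Scenario (2) works because GridFS stores a file as fixed-size chunks (see section 3), so a byte range maps to a small set of chunk numbers. The following is a minimal sketch of that mapping, not driver code: ChunkRange and its methods are hypothetical names, and the chunk size is a parameter (255 KB is the default in current MongoDB versions; older versions used 256 KB).

```java
// Sketch only: ChunkRange is a hypothetical helper, not part of any MongoDB driver.
final class ChunkRange {
    // 255 KB is the default chunk size in current MongoDB versions.
    static final int DEFAULT_CHUNK_SIZE = 255 * 1024;

    /** Sequence number (the "n" field) of the chunk that contains the given byte offset. */
    static long chunkIndex(long offset, int chunkSize) {
        return offset / chunkSize;
    }

    /** Returns {firstChunk, lastChunk} covering the bytes [offset, offset + length). */
    static long[] chunksFor(long offset, long length, int chunkSize) {
        if (offset < 0 || length <= 0 || chunkSize <= 0) {
            throw new IllegalArgumentException("offset >= 0, length > 0 and chunkSize > 0 are required");
        }
        long first = chunkIndex(offset, chunkSize);
        long last = chunkIndex(offset + length - 1, chunkSize);
        return new long[] {first, last};
    }

    public static void main(String[] args) {
        // Which chunks hold 4 KB starting at byte 1,000,000?
        long[] range = chunksFor(1_000_000L, 4_096L, DEFAULT_CHUNK_SIZE);
        System.out.println("chunks " + range[0] + " to " + range[1]);
    }
}
```

Only the chunks in the returned range need to be fetched, which is why a partial read does not require the whole file in memory.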
3. GridFS Storage Principle
GridFS uses two collections to store files. The chunks collection stores the binary data that makes up a file's contents; the files collection stores the file's metadata.
GridFS places the two collections in a common bucket and prefixes each collection name with the bucket name. By default MongoDB's GridFS uses a bucket named fs, so the two collections are named fs.files and fs.chunks.
You can choose a different bucket name, or even define multiple buckets in one database, but every collection name must stay within MongoDB's namespace length limit.
A MongoDB namespace consists of the database name and the collection name separated by "." (e.g. <database>.<collection>), and its maximum length is 120 bytes (in MongoDB versions before 4.4).
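As a rough illustration of the naming rule, the sketch below builds the two namespaces for a bucket and checks them against the 120-byte limit quoted above. BucketNames and its methods are hypothetical helpers, not part of any MongoDB driver.

```java
import java.nio.charset.StandardCharsets;

// Sketch only: BucketNames is a hypothetical helper for illustrating GridFS naming.
final class BucketNames {
    // Namespace limit quoted in the text.
    static final int MAX_NAMESPACE_BYTES = 120;

    /** Namespace of the metadata collection, e.g. "test.fs.files". */
    static String filesNamespace(String database, String bucket) {
        return database + "." + bucket + ".files";
    }

    /** Namespace of the content collection, e.g. "test.fs.chunks". */
    static String chunksNamespace(String database, String bucket) {
        return database + "." + bucket + ".chunks";
    }

    /** True if the namespace, measured in UTF-8 bytes, is within the limit. */
    static boolean fits(String namespace) {
        return namespace.getBytes(StandardCharsets.UTF_8).length <= MAX_NAMESPACE_BYTES;
    }

    public static void main(String[] args) {
        String ns = chunksNamespace("test", "fs");
        System.out.println(ns + " fits: " + fits(ns));
    }
}
```

The chunks namespace is the longer of the two, so checking it is enough when validating a custom bucket name.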
When a file is stored in GridFS, it is split into chunks of chunkSize bytes (255 KB by default in current versions; older versions used 256 KB); a file smaller than chunkSize fits in a single chunk. Each chunk is stored as its own document in the fs.chunks collection, and the file's information is stored in a single document in the fs.files collection. The "files_id" field of each document in fs.chunks holds the "_id" of the corresponding document in fs.files.
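The splitting step can be sketched in plain Java without a database. ChunkSplitter and Chunk are hypothetical names used only for illustration; a real driver performs this internally and also writes the files_id on every chunk document.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Sketch only: simulates how a file's bytes are divided into numbered chunks.
final class ChunkSplitter {
    /** Stand-in for one fs.chunks document: the sequence number "n" and the chunk bytes. */
    static final class Chunk {
        final int n;
        final byte[] data;

        Chunk(int n, byte[] data) {
            this.n = n;
            this.data = data;
        }
    }

    /** Splits content into chunks of at most chunkSize bytes, numbered from 0. */
    static List<Chunk> split(byte[] content, int chunkSize) {
        if (chunkSize <= 0) {
            throw new IllegalArgumentException("chunkSize must be positive");
        }
        List<Chunk> chunks = new ArrayList<>();
        int n = 0;
        for (int offset = 0; offset < content.length; offset += chunkSize) {
            int end = Math.min(offset + chunkSize, content.length);
            chunks.add(new Chunk(n++, Arrays.copyOfRange(content, offset, end)));
        }
        return chunks;
    }

    public static void main(String[] args) {
        // A 10-byte "file" with a 4-byte chunk size, chosen small for readability.
        List<Chunk> chunks = split(new byte[10], 4);
        for (Chunk c : chunks) {
            System.out.println("n=" + c.n + " length=" + c.data.length);
        }
    }
}
```

Every chunk except possibly the last has exactly chunkSize bytes; the last one holds the remainder.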
When reading a file, GridFS first finds the matching document in the files collection according to the query criteria and takes its "_id" field, then queries the chunks collection for all documents whose "files_id" equals that "_id". Finally, it reads the "data" field of each chunk in the order given by the "n" field to restore the file.
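The read path described above can be sketched as a filter, a sort and a concatenation. ChunkJoiner and its Chunk class are hypothetical stand-ins for documents in the chunks collection; the in-memory list replaces the actual query.

```java
import java.io.ByteArrayOutputStream;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Sketch only: rebuilds a file from simulated chunk documents.
final class ChunkJoiner {
    /** Stand-in for one chunk document with its files_id, n and data fields. */
    static final class Chunk {
        final Object filesId;
        final int n;
        final byte[] data;

        Chunk(Object filesId, int n, byte[] data) {
            this.filesId = filesId;
            this.n = n;
            this.data = data;
        }
    }

    /** Selects the chunks whose files_id equals fileId and joins their data in "n" order. */
    static byte[] join(List<Chunk> allChunks, Object fileId) {
        List<Chunk> mine = new ArrayList<>();
        for (Chunk c : allChunks) {
            if (c.filesId.equals(fileId)) {
                mine.add(c);
            }
        }
        mine.sort(Comparator.comparingInt(c -> c.n));

        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (Chunk c : mine) {
            out.write(c.data, 0, c.data.length);
        }
        return out.toByteArray();
    }
}
```

Sorting by n matters because the order in which chunk documents come back from a query is not guaranteed to match the order of the file's bytes.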
4. Storage Structure
The fs.files collection stores file metadata as JSON-like (BSON) documents. Each file stored in GridFS produces one document in the fs.files collection.
A document in the fs.files collection contains the fields "_id", "filename", "length", "chunkSize", "uploadDate" and "md5", plus the optional fields "contentType", "aliases" and "metadata".
The fs.chunks collection stores the binary contents of files, also as JSON-like (BSON) documents. When a file is stored, GridFS splits its contents into blocks of chunkSize bytes (255 KB by default in current versions) and stores each block as one document in the fs.chunks collection. One stored file corresponds to one or more chunk documents.
A document in the fs.chunks collection contains the fields "_id", "files_id", "n" and "data".
To speed up retrieval, GridFS indexes both collections. The fs.files collection has a compound index on the "filename" and "uploadDate" fields. The fs.chunks collection has a unique compound index on the "files_id" and "n" fields.
5. Precautions
(1) GridFS does not deduplicate files with the same MD5 value. Two put commands for the same file produce two separate copies in GridFS, which wastes storage. If files with identical MD5 values should be stored only once, you need to add that handling yourself through the API.
(2) MongoDB does not release disk space that it has already allocated. Even if you delete a collection in a database, MongoDB does not free the disk space. Likewise, if you store files in GridFS and later delete unwanted ones, the space is still not returned. Disk usage can therefore keep growing without being reclaimed. (This describes the MMAPv1 storage engine used by older MongoDB versions.)
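For precaution (1), one possible way to extend the processing is to look up the content's MD5 digest before storing. The sketch below uses an in-memory map in place of a lookup in GridFS; DedupStore, its methods and the generated ids are hypothetical, and a real implementation would query the store and call the driver instead.

```java
import java.math.BigInteger;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.HashMap;
import java.util.Map;

// Sketch only: an in-memory stand-in for "store each distinct content once".
final class DedupStore {
    private final Map<String, String> idByMd5 = new HashMap<>();
    private int nextId = 1;

    /** Lower-case hexadecimal MD5 digest of the content, always 32 characters. */
    static String md5Hex(byte[] content) {
        try {
            byte[] digest = MessageDigest.getInstance("MD5").digest(content);
            return String.format("%032x", new BigInteger(1, digest));
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 is not available", e);
        }
    }

    /** Returns the id of an existing copy with the same MD5, or stores a new one. */
    String put(byte[] content) {
        String md5 = md5Hex(content);
        String existing = idByMd5.get(md5);
        if (existing != null) {
            return existing; // same digest already stored: reuse it
        }
        String id = "file-" + nextId++;
        idByMd5.put(md5, id);
        return id;
    }

    /** Number of distinct contents stored so far. */
    int storedCount() {
        return idByMd5.size();
    }
}
```

Two files with the same MD5 digest are treated as identical here; whether that is acceptable depends on the application.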
How can disk space be freed?
(1) Repair the database by running the db.repairDatabase() command or the db.runCommand({ repairDatabase: 1 }) command in the mongo shell. (This command is slow to execute, and it was removed in MongoDB 4.2, so it applies to older versions only.)
When reclaiming disk space with the repair method, note that the free space remaining on the disk must be at least the size of the current data set plus 2 GB, or the repair cannot complete. If you plan to store a large number of files in GridFS, design a disk reclamation scheme in advance.
(2) Use the dump & restore method: first delete the data that needs to be purged from the MongoDB database, then back up the database with mongodump. After the backup completes, drop the MongoDB database and use the mongorestore tool to restore the backup data into it.
When there is not enough free disk space to run db.repairDatabase(), you can reclaim disk space with dump & restore instead. If MongoDB runs as a replica set, the dump & restore method can reclaim disk space while the service stays available to clients, without affecting normal use of MongoDB.
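The free-space rule for the repair method in (1) is simple arithmetic and can be checked before starting. The sketch below assumes the data set size is already known; RepairSpaceCheck and its methods are hypothetical names, and the 10 GB figure in main is only an example value.

```java
import java.io.File;

// Sketch only: checks the "data set size plus 2 GB" rule before a repair.
final class RepairSpaceCheck {
    static final long TWO_GB = 2L * 1024 * 1024 * 1024;

    /** Free space the repair needs: the current data set size plus 2 GB. */
    static long requiredFreeBytes(long dataSetBytes) {
        return dataSetBytes + TWO_GB;
    }

    /** True if the available free space is enough to attempt the repair. */
    static boolean canRepair(long dataSetBytes, long freeBytes) {
        return freeBytes >= requiredFreeBytes(dataSetBytes);
    }

    public static void main(String[] args) {
        long dataSetBytes = 10L * 1024 * 1024 * 1024;      // example: a 10 GB data set
        long freeBytes = new File(".").getUsableSpace();   // free space on the current volume
        System.out.println("repair possible: " + canRepair(dataSetBytes, freeBytes));
    }
}
```

If the check fails, the dump & restore method in (2) is the fallback.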
6. Code Examples
The code is based on Spring Boot and implements the basic GridFS operations.
(1) The application.properties configuration is as follows:
spring.data.mongodb.uri=mongodb://localhost:27017/test
(2) Spring Boot application entry point
package com.ws;

import org.springframework.boot.SpringApplication;
import org.springframework.boot.autoconfigure.SpringBootApplication;

@SpringBootApplication
public class Application {
    public static void main(String[] args) {
        SpringApplication.run(Application.class, args);
    }
}
(3) Domain class, which defines the response returned to the caller (the stored file's name)
package com.ws;

public class Response {
    private String name;

    public Response(String name) {
        this.name = name;
    }

    public String getName() {
        return name;
    }

    public void setName(String name) {
        this.name = name;
    }
}
(4) Controller layer, which defines the save, get and delete interfaces
package com.ws;

import com.mongodb.BasicDBObject;
import com.mongodb.DBObject;
import com.mongodb.gridfs.GridFSDBFile;
import org.apache.commons.io.IOUtils;
import org.apache.log4j.Logger;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.data.mongodb.core.query.Criteria;
import org.springframework.data.mongodb.core.query.Query;
import org.springframework.data.mongodb.gridfs.GridFsTemplate;
import org.springframework.http.MediaType;
import org.springframework.web.bind.annotation.RequestMapping;
import org.springframework.web.bind.annotation.RequestMethod;
import org.springframework.web.bind.annotation.RequestParam;
import org.springframework.web.bind.annotation.RestController;
import org.springframework.web.multipart.MultipartFile;

import java.io.IOException;
import java.io.InputStream;
import java.util.Date;
import java.util.List;
import java.util.UUID;

// Note: this uses the GridFSDBFile-based API of older Spring Data MongoDB releases.
@RestController
@RequestMapping("/api")
public class GridFsApi {
    private static Logger logger = Logger.getLogger(GridFsApi.class);

    @Autowired
    private GridFsTemplate gridFsTemplate;

    // Stores the uploaded file in GridFS and returns the generated file name.
    @RequestMapping(value = "/save", method = RequestMethod.POST, produces = MediaType.APPLICATION_JSON_VALUE)
    public Response save(@RequestParam(value = "file", required = true) MultipartFile file) {
        logger.info("Saving file :");
        DBObject metaData = new BasicDBObject();
        metaData.put("createdDate", new Date());

        // Use a random UUID as the file name so that uploads never collide.
        String fileName = UUID.randomUUID().toString();
        logger.info("File Name: " + fileName);

        // try-with-resources closes the upload stream once the file is stored.
        try (InputStream inputStream = file.getInputStream()) {
            gridFsTemplate.store(inputStream, fileName, "image", metaData);
            logger.info("File saved: " + fileName);
        } catch (IOException e) {
            logger.error("IOException: " + e);
            throw new RuntimeException("System Exception while handling request");
        }
        logger.info("File return: " + fileName);
        return new Response(fileName);
    }

    // Looks up a file by name and returns its bytes.
    @RequestMapping(value = "/get", method = RequestMethod.GET, produces = MediaType.IMAGE_JPEG_VALUE)
    public byte[] get(@RequestParam(value = "fileName", required = true) String fileName) throws IOException {
        logger.info("Getting file: " + fileName);
        List<GridFSDBFile> result = gridFsTemplate
                .find(new Query().addCriteria(Criteria.where("filename").is(fileName)));
        if (result == null || result.size() == 0) {
            logger.info("File not found: " + fileName);
            throw new RuntimeException("No file with name: " + fileName);
        }
        logger.info("File found: " + fileName);
        return IOUtils.toByteArray(result.get(0).getInputStream());
    }

    // Deletes every stored file with the given name.
    @RequestMapping(value = "/delete", method = RequestMethod.DELETE)
    public void delete(@RequestParam(value = "fileName", required = true) String fileName) {
        logger.info("Deleting file: " + fileName);
        gridFsTemplate.delete(new Query().addCriteria(Criteria.where("filename").is(fileName)));
        logger.info("File deleted: " + fileName);
    }
}