Storing files in MongoDB's GridFS


1. A detailed look at MongoDB's GridFS

GridFS introduction

GridFS is a file-storage feature built into MongoDB and its drivers. It can be used to store a large number of files, including files larger than the BSON document size limit.

http://www.mongodb.org/display/DOCS/GridFS

http://www.mongodb.org/display/DOCS/GridFS+Specification

Using GridFS

MongoDB ships a command-line tool, mongofiles, in its bin directory for working with GridFS.

List all files:

mongofiles list

Upload a file:

mongofiles put xxx.txt

Download a file:

mongofiles get xxx.txt

Find files:

mongofiles search xxx    // finds all files whose name contains "xxx"

mongofiles list xxx      // finds all files whose name starts with "xxx"

Parameter description:

-d       Specifies the database; the default is test. Example: mongofiles list -d testgridfs

-u, -p   Specify the user name and password

-h       Specifies the host

--port   Specifies the host port

-c       Specifies the collection name; the default is fs

-t       Specifies the MIME type of the file; omitted by default

Viewing and managing GridFS with MongoVUE

MongoVUE: http://www.mongovue.com/

Mongovue is a freeware, but features are limited beyond 15 days. The restrictions can be lifted by removing the following registry key:

[hkey_current_user\software\classes\clsid\{b1159e65-821c3-21c5-ce21-34a484d54444}\4ff78130]

It's all right to delete the value under this item.

Uploading and downloading files with the Java driver

Download: https://github.com/mongodb/mongo-java-driver/downloads

The official documentation does not appear to be up to date, but the API reference is enough to work out how to use the driver.

http://api.mongodb.org/java/2.7.2/

The following code is based on mongo-2.7.3.jar

Installing and using the nginx-gridfs module

Project home: https://github.com/mdirolf/nginx-gridfs

With nginx-gridfs, files stored in GridFS can be served directly over HTTP.

1. Installation

Install the dependency packages: zlib, pcre, openssl

On Ubuntu, the commands are probably:

sudo apt-get install zlib1g-dev    // sudo apt-get install zlib-dev does not seem to work

sudo apt-get install libpcre3 libpcre3-dev

sudo apt-get install openssl libssl-dev

Install git (steps omitted).

Download the nginx-gridfs code with git:

git clone git://github.com/mdirolf/nginx-gridfs.git

cd nginx-gridfs

git submodule init

git submodule update

Download nginx and build it with the module:

wget http://nginx.org/download/nginx-1.0.12.tar.gz

tar zxvf nginx-1.0.12.tar.gz

cd nginx-1.0.12

./configure --add-module=<path to nginx-gridfs>

make

sudo make install

If a compilation error occurs, add the --with-cc-opt=-Wno-error parameter to configure.

2. Configuring nginx

Add the following to the server block of the configuration:

location /pics/ {
    gridfs pics
           field=filename
           type=string;
    mongo 127.0.0.1:27017;
}

The configuration above means:

The database is pics, and files are looked up by filename, which is of type string.

Currently, files can only be accessed by _id or by filename.
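As a rough illustration of what this configuration means, the sketch below shows how a request path could be turned into a lookup on fs.files. It is a simplified model written for this article, not the module's actual code: the function name gridfs_lookup and its parameters are hypothetical, and only the string and int types are modelled (the objectid type is left out because it needs a BSON library).

```python
def gridfs_lookup(uri, location="/pics/", field="filename", value_type="string"):
    """Build the query a GridFS lookup would use for a request path.

    Hypothetical helper: models the 'field' and 'type' settings of the
    location block above, nothing more.
    """
    if not uri.startswith(location):
        raise ValueError("URI is outside the configured location")

    # Everything after the location prefix is treated as the lookup value.
    raw = uri[len(location):]

    if value_type == "string":
        value = raw
    elif value_type == "int":
        value = int(raw)
    else:
        raise ValueError("type not modelled in this sketch: " + value_type)

    return {field: value}


print(gridfs_lookup("/pics/001.jpg"))
```

With the configuration shown above, a request for /pics/001.jpg would correspond to searching the pics database for a file whose filename is "001.jpg".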

Start nginx: /usr/local/nginx/sbin/nginx

Use MongoVUE to upload an image, 001.jpg, to the pics database.

Open: http://localhost/pics/001.jpg

If everything works, the image is displayed.

3. Limitations of nginx-gridfs

It does not implement HTTP Range support, so resumable downloads and segmented downloads are not available.
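To show what Range support would involve, here is a minimal sketch of the arithmetic that maps an inclusive byte range onto GridFS chunk numbers. The function name chunks_for_range is hypothetical, and the 256 KB chunk size is GridFS's default; this is not code from nginx-gridfs.

```python
def chunks_for_range(start, end, chunk_size=256 * 1024):
    """Map an inclusive byte range to the chunks that contain it.

    Returns (first_n, last_n, offset_in_first, offset_in_last), where the
    offsets are byte positions inside the first and last chunk.
    """
    if start < 0 or end < start:
        raise ValueError("invalid byte range")

    first_n = start // chunk_size   # chunk holding the first requested byte
    last_n = end // chunk_size      # chunk holding the last requested byte
    return first_n, last_n, start % chunk_size, end % chunk_size


# A range that straddles the boundary between chunk 0 and chunk 1.
print(chunks_for_range(262000, 262200))
```

A server that supported Range would only need to read chunks first_n through last_n and trim the first and last of them, rather than streaming the whole file.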

How GridFS works

In the database, GridFS uses the fs.files and fs.chunks collections by default to store files.

The fs.files collection stores the information about each file, and fs.chunks stores the file data.

A record in the fs.files collection, describing one file, looks like this:

{
    "_id": ObjectId("4f4608844f9b855c6c35e298"),         // unique ID; may be a user-defined type
    "filename": "CPU.txt",                               // file name
    "length": 778,                                       // file length in bytes
    "chunkSize": 262144,                                 // chunk size in bytes
    "uploadDate": ISODate("2012-02-23T09:36:04.593Z"),   // upload time
    "md5": "e2c789b036cfb3b848ae39a24e795ca6",           // MD5 value of the file
    "contentType": "text/plain",                         // MIME type of the file
    "metadata": null                                     // other information about the file; this key is absent by default, and users can set it to any BSON object
}

The corresponding chunk in fs.chunks looks like this:

{
    "_id": ObjectId("4f4608844f9b855c6c35e299"),        // chunk ID
    "files_id": ObjectId("4f4608844f9b855c6c35e298"),   // ID of the file; matches the _id of the document in fs.files, like a foreign key
    "n": 0,                                             // index of this chunk within the file; a file larger than chunkSize is split into multiple chunks
    "data": BinData(0, "QGV...")                        // binary data of the file; the content is omitted here
}

The default chunk size is 256 KB.

public static final int DEFAULT_CHUNKSIZE = 256 * 1024;

So when a file is stored in GridFS, it is split into multiple chunks if it is larger than chunkSize. The chunks are saved to fs.chunks first, and then the file information is saved to fs.files.

To read a file, find the matching record in fs.files according to the query conditions and take its "_id". Then find all chunks in fs.chunks whose "files_id" equals that "_id", sort them by "n", and read the "data" field of each chunk in order to reassemble the original file.
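The write and read paths described above can be sketched in a few lines. This is an in-memory model that uses plain Python lists in place of the fs.files and fs.chunks collections; the function names put_file and get_file are hypothetical and are not part of any MongoDB driver.

```python
import hashlib

CHUNK_SIZE = 256 * 1024  # default chunk size mentioned in the text


def put_file(files, chunks, file_id, filename, data, chunk_size=CHUNK_SIZE):
    """Store data the way the text describes: chunks first, file record last."""
    for n, offset in enumerate(range(0, len(data), chunk_size)):
        chunks.append({
            "files_id": file_id,                        # points back at the file record
            "n": n,                                     # position of this chunk in the file
            "data": data[offset:offset + chunk_size],
        })
    files.append({
        "_id": file_id,
        "filename": filename,
        "length": len(data),
        "chunkSize": chunk_size,
        "md5": hashlib.md5(data).hexdigest(),
    })


def get_file(files, chunks, filename):
    """Find the file record, collect its chunks, sort by n, and join the data."""
    record = next(f for f in files if f["filename"] == filename)
    parts = sorted(
        (c for c in chunks if c["files_id"] == record["_id"]),
        key=lambda c: c["n"],
    )
    return b"".join(c["data"] for c in parts)


files, chunks = [], []
payload = b"x" * 700000
put_file(files, chunks, 1, "a.bin", payload)
print(len(chunks), get_file(files, chunks, "a.bin") == payload)
```

A 700,000-byte payload is larger than two 256 KB chunks but smaller than three, so it is stored as three chunks numbered 0, 1 and 2.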

Customizing the GridFS hash function

In theory, any hash function can produce the same hash value for files with different content. For MD5, which GridFS uses by default, files that have the same length and the same MD5 value but different content are already known to exist.

If you want to use a different hash algorithm, you can start with the driver. Because GridFS is really just two ordinary collections in MongoDB, you can modify the driver yourself and swap in another hash algorithm.

The current Java driver is fairly simple, so the change is easy to implement.

Be aware, however, that doing so does not conform to the GridFS specification.
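As an illustration of swapping the hash algorithm, the sketch below computes a digest incrementally over a file's chunks using Python's hashlib. The function name digest_chunks is hypothetical, and choosing SHA-256 is only an example. Storing the result in the file record would be a non-standard extension, as noted above.

```python
import hashlib


def digest_chunks(chunk_iter, algorithm="sha256"):
    """Hash a file chunk by chunk, without loading it all into memory."""
    h = hashlib.new(algorithm)
    for chunk in chunk_iter:
        h.update(chunk)
    return h.hexdigest()


# Hashing the chunks in order gives the same digest as hashing the whole file.
print(digest_chunks([b"ab", b"c"]) == hashlib.sha256(b"abc").hexdigest())
```

Because the digest is updated one chunk at a time, the same loop that writes chunks to fs.chunks could also feed the hash.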

Precautions

1. GridFS does not automatically deduplicate files that have the same MD5. If you want only one copy of such files stored in GridFS, the application has to handle that itself. The MD5 value is calculated by the client.

2. Because GridFS saves the file data to fs.chunks first and saves the file information to fs.files last, a failed upload can leave garbage data in fs.chunks. This garbage data can be cleaned up periodically.
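Both precautions can be sketched against the same in-memory model of the two collections, with lists of dictionaries standing in for fs.files and fs.chunks. The function names find_duplicate and orphaned_chunk_ids are hypothetical; a real application would run equivalent queries through its driver.

```python
import hashlib


def find_duplicate(files, data):
    """Return the _id of a stored file with the same MD5 and length, or None."""
    md5 = hashlib.md5(data).hexdigest()
    for record in files:
        if record["md5"] == md5 and record["length"] == len(data):
            return record["_id"]
    return None


def orphaned_chunk_ids(files, chunks):
    """Return the _ids of chunks whose files_id has no record in files."""
    known = {record["_id"] for record in files}
    return [c["_id"] for c in chunks if c["files_id"] not in known]


files = [{"_id": 1, "md5": hashlib.md5(b"hello").hexdigest(), "length": 5}]
chunks = [
    {"_id": 10, "files_id": 1, "n": 0},
    {"_id": 11, "files_id": 2, "n": 0},  # left behind by a failed upload
]
print(find_duplicate(files, b"hello"), orphaned_chunk_ids(files, chunks))
```

Two caveats apply. Matching MD5 and length does not strictly prove that two files are identical, so a byte-by-byte comparison is needed for certainty. A cleanup job should also skip chunks that belong to uploads still in progress, since their fs.files record has not been written yet.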

2. GridFS principles

GridFS is a built-in MongoDB feature that provides a set of file-manipulation APIs for storing files in MongoDB. Its basic principle is to keep files in two collections: one holds the file index and the other holds the file content. The content is split into blocks of a fixed size, and each block is stored in its own document. Besides storing the file itself, this approach also stores additional properties of the file, such as its MD5 value and file name.
