MongoDB GridFS + Nginx File Storage Scenarios - Pictures

Source: Internet
Author: User
Tags: mongodb, collection, openssl, unique id, git clone, robomongo



http://www.cnblogs.com/wintersun/p/4622205.html



When developing all kinds of application systems, we often run into the problem of file storage. The common options are the disk file system and traditional DBMS file-stream storage. Today we look at a storage scheme based on the NoSQL database MongoDB, using CentOS 6.5, MongoDB 2.6.3, and Nginx 1.4.7 as the example environment. Familiarity with common Linux commands is assumed.
Let's first take a look at MongoDB's internal storage structure:





    1. MongoDB partitions its datastore by namespace: each collection is a namespace, and each index is also a namespace.
    2. Data in the same namespace is split across multiple extents, which are connected in a doubly linked list.
    3. Within each extent the individual records are stored, and these are also connected by a doubly linked list.
    4. Each stored record includes not only the data itself but also some padding space, so that the record does not have to be moved when an update makes it larger.
    5. Indexes are implemented as B-tree structures.


Next, let's look at the structure of GridFS.






Inside the database, GridFS uses the collections fs.files and fs.chunks by default to store files.



The fs.files collection stores file metadata, while fs.chunks stores the file data.



A record in the fs.files collection looks like this:


{
  "_id": ObjectId("4f4608844f9b855c6c35e298"), // the unique id; can be of a user-defined type
  "filename": "CPU.txt", // file name
  "length": 778, // file length
  "chunkSize": 262144, // chunk size
  "uploadDate": ISODate("2012-02-23T09:36:04.593Z"), // upload time
  "md5": "e2c789b036cfb3b848ae39a24e795ca6", // md5 hash of the file
  "contentType": "text/plain", // the MIME type of the file
  "meta": null // other file information; there is no "meta" key by default, and the user can define it as any BSON object
}


The corresponding chunk in fs.chunks looks like this:


{
  "_id": ObjectId("4f4608844f9b855c6c35e299"), // the chunk's id
  "files_id": ObjectId("4f4608844f9b855c6c35e298"), // the file's id, matching the document in fs.files; effectively a foreign key into the fs.files collection
  "n": 0, // the chunk's sequence number; if the file is larger than chunkSize it is split into multiple chunks
  "data": BinData(0, "QGV...") // the binary data of the file; contents omitted here
}


When a file is written into GridFS, if it is larger than chunkSize it is split into multiple chunks; the chunks are saved into fs.chunks first, and finally the file's metadata is written into fs.files.
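The write path just described can be sketched with a few lines of Python (an illustrative in-memory model, not the real driver: the lists `fs_files` and `fs_chunks` stand in for the two collections, and `save_to_gridfs` is a made-up helper name):

```python
import hashlib
import math

CHUNK_SIZE = 262144  # 256 KB, the default chunk size in the MongoDB 2.x era

def save_to_gridfs(fs_files, fs_chunks, file_id, filename, data, chunk_size=CHUNK_SIZE):
    """Split data into chunks, write the chunks first, then the metadata record."""
    n_chunks = math.ceil(len(data) / chunk_size)
    for n in range(n_chunks):
        fs_chunks.append({
            "files_id": file_id,  # back-reference to the fs.files record
            "n": n,               # chunk sequence number
            "data": data[n * chunk_size:(n + 1) * chunk_size],
        })
    fs_files.append({
        "_id": file_id,
        "filename": filename,
        "length": len(data),
        "chunkSize": chunk_size,
        "md5": hashlib.md5(data).hexdigest(),
    })

fs_files, fs_chunks = [], []
save_to_gridfs(fs_files, fs_chunks, 1, "big.bin", b"x" * 600000)
print(len(fs_chunks))  # 600000 bytes at 262144 per chunk -> 3 chunks
```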



When reading a file, the matching record is found in fs.files according to the query conditions and its "_id" value is taken; then all chunks in fs.chunks whose "files_id" equals that "_id" are fetched and sorted by "n"; finally the "data" of each chunk is read in sequence and concatenated to restore the original file.
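The read path can be sketched the same way (again an in-memory model; `read_from_gridfs` is a hypothetical helper, and the two chunks are deliberately stored out of order to show why sorting by "n" matters):

```python
def read_from_gridfs(fs_files, fs_chunks, filename):
    """Find the file record, gather its chunks by files_id, sort by n, concatenate."""
    file_doc = next(f for f in fs_files if f["filename"] == filename)
    chunks = sorted(
        (c for c in fs_chunks if c["files_id"] == file_doc["_id"]),
        key=lambda c: c["n"],
    )
    data = b"".join(c["data"] for c in chunks)
    assert len(data) == file_doc["length"]  # sanity check against the metadata
    return data

fs_files = [{"_id": 1, "filename": "CPU.txt", "length": 7, "chunkSize": 4}]
fs_chunks = [
    {"files_id": 1, "n": 1, "data": b"EFG"},   # second chunk, stored first
    {"files_id": 1, "n": 0, "data": b"ABCD"},  # first chunk
]
print(read_from_gridfs(fs_files, fs_chunks, "CPU.txt"))  # b'ABCDEFG'
```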


Installation and Configuration


1. Installing MongoDB



Add the MongoDB repository. If you are not familiar with vim, please look up a vim tutorial first.


vim /etc/yum.repos.d/mongodb.repo


For a 64-bit system:


[mongodb]
name=MongoDB Repository
baseurl=http://downloads-distro.mongodb.org/repo/redhat/os/x86_64/
gpgcheck=0
enabled=1


For a 32-bit system:


[mongodb]
name=MongoDB Repository
baseurl=http://downloads-distro.mongodb.org/repo/redhat/os/i686/
gpgcheck=0
enabled=1


Then install (you will be prompted y/n):


yum install mongo-10gen mongo-10gen-server


Start:

service mongod start

View status:

service mongod status

Stop:

service mongod stop


For MongoDB 3.0 and later, please refer to the official website for installation instructions.



2. Installing Nginx and Nginx-gridfs



Install the dependent libraries and tools:


# yum -y install pcre-devel openssl-devel zlib-devel

# yum -y install gcc gcc-c++


Download the nginx-gridfs source code:


# git clone https://github.com/mdirolf/nginx-gridfs.git

# cd nginx-gridfs

# git checkout v0.8

# git submodule init

# git submodule update


Download the Nginx source code, then compile and install. (Newer Nginx versions are not well supported by nginx-gridfs.)


# wget http://nginx.org/download/nginx-1.4.7.tar.gz

# tar zxvf nginx-1.4.7.tar.gz

# cd nginx-1.4.7

# ./configure --with-openssl=/usr/include/openssl --add-module=../nginx-gridfs/

# make -j8 && make install


Note that the --add-module argument in the configure command must point to the path of the nginx-gridfs source directory.



3. Configure Nginx-gridfs



vim /usr/local/nginx/conf/nginx.conf



Add location nodes inside the server node:



location /img/ {
    gridfs testdb
        field=filename
        type=string;
    mongo 192.168.0.159:27017;
}



location /files/ {
    gridfs testdb
        field=_id
        type=objectid;
    mongo 192.168.0.159:27017;
}



Here our MongoDB service runs at IP 192.168.0.159.
If field is not specified, the default is MongoDB's auto-generated _id, and the default type is int.



Configuration parameter Description:



gridfs: keyword identifying the plugin to Nginx
testdb: database name
[root_collection]: selects the collection prefix; e.g. root_collection=blog makes mongod look up blog.files and blog.chunks; default is fs
[field]: query field; make sure MongoDB has this field name; supports _id and filename; can be omitted, default is _id
[type]: data type of the field; supports objectid, int, and string; can be omitted, default is int
[user]: username, can be omitted
[pass]: password, can be omitted
mongo: MongoDB URL
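As an example of the optional parameters, a location serving files from a custom root collection might look like the following (the collection name `blog` and path `/blog-files/` are hypothetical; the syntax mirrors the nginx-gridfs README):

```nginx
location /blog-files/ {
    # reads blog.files / blog.chunks instead of the default fs.*
    gridfs testdb
        root_collection=blog
        field=filename
        type=string;
    mongo 192.168.0.159:27017;
}
```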


Start the Nginx service:

# /usr/local/nginx/sbin/nginx


You may see the error:
nginx: [emerg] bind() to 0.0.0.0:80 failed (98: Address already in use)



You can use the following command to kill the program occupying port 80:


sudo fuser -k 80/tcp


Simple test


Upload a file with the native command-line tool:


mongofiles put 937910.jpg --local ~/937910_100.jpg --host 192.168.0.159 --port 27017 --db testdb --type jpg


937910.jpg is a picture file we downloaded in advance. Note that we did not specify a collection, so the default fs is used.



Install the Robomongo management tool from http://www.robomongo.org/ to view the file you just uploaded.






Finally, visit the URL in the browser; if you can see the picture, everything is OK:



http://192.168.0.159/img/937910.jpg



For the .NET environment, install MongoDB CSharpDriver 1.10.0 from NuGet:
Install-Package mongocsharpdriver -Version 1.10.0
We use the following code fragment:


int nFileLen = fileUploadModel.FileBytes.Length;

MongoGridFSSettings fsSetting = new MongoGridFSSettings() { Root = CollectionName };
MongoGridFS fs = new MongoGridFS(mongoServer, MongoDatabaseName, fsSetting);

// The upload time must be set manually when calling the Write, WriteByte, WriteLine functions
// Additional information is attached via Metadata
MongoGridFSCreateOptions option = new MongoGridFSCreateOptions();
option.Id = ObjectId.GenerateNewId();
var currentDate = DateTime.Now;
option.UploadDate = currentDate;
option.Aliases = alias;
BsonDocument doc = new BsonDocument();
// Store additional document information
if (fileUploadModel.DocExtraInfo != null && fileUploadModel.DocExtraInfo.Count > 0)
{
    foreach (var obj in fileUploadModel.DocExtraInfo)
    {
        if (!doc.Elements.Any(p => p.Name == obj.Key))
        {
            doc.Add(obj.Key, obj.Value);
        }
    }
}
option.Metadata = doc;

// Create the file and store the data
using (MongoGridFSStream gfs = fs.Create(fileUploadModel.FileName, option))
{
    gfs.Write(fileUploadModel.FileBytes, 0, nFileLen);
    gfs.Close();
}
log.ErrorFormat("Attachment ID: {0} File name: {1} uploaded successfully", alias, fileUploadModel.FileName);
return option.Id.ToString();


Note that nginx-gridfs currently does not support an _id of type GUID. For details about ObjectId, refer to the official website, for example:






Another significant advantage of MongoDB-generated ObjectIds is that they can be generated either by the MongoDB server itself or by the client driver.
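The reason client drivers can generate ObjectIds safely lies in the id's layout: in the classic scheme, 12 bytes made of a 4-byte timestamp, a 3-byte machine id, a 2-byte process id, and a 3-byte counter, so ids created on different machines at the same moment still do not collide. A rough Python sketch of that layout (the machine id is random here rather than a hostname hash, and `generate_object_id` is an illustrative name, not a driver API):

```python
import os
import struct
import time

MACHINE_ID = os.urandom(3)                         # stand-in for a hash of the hostname
_counter = [int.from_bytes(os.urandom(3), "big")]  # random starting point, as real drivers use

def generate_object_id():
    """Build a 12-byte id: timestamp | machine | pid | counter, hex-encoded."""
    _counter[0] = (_counter[0] + 1) % 0x1000000
    oid = struct.pack(">I", int(time.time()))       # 4-byte seconds since the epoch
    oid += MACHINE_ID                               # 3-byte machine identifier
    oid += struct.pack(">H", os.getpid() & 0xFFFF)  # 2-byte process id
    oid += _counter[0].to_bytes(3, "big")           # 3-byte increment counter
    return oid.hex()

print(generate_object_id())  # 24 hex characters; roughly sortable by creation time
```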


When to use Gridfs


From the official 2.6.10 release manual:


For documents in a MongoDB collection, you should always use GridFS for storing files larger than 16 MB. In some situations, storing large files may be more efficient in a MongoDB database than on a system-level filesystem.

If your filesystem limits the number of files in a directory, you can use GridFS to store as many files as needed.

When you want to keep your files and metadata automatically synced and deployed across a number of systems and facilities: when using geographically distributed replica sets, MongoDB can distribute files and their metadata automatically to a number of mongod instances and facilities.

When you want to access information from portions of large files without having to load whole files into memory: you can use GridFS to recall sections of files without reading the entire file into memory.

Do not use GridFS if you need to update the content of the entire file atomically. As an alternative you can store multiple versions of each file and specify the current version of the file in the metadata. You can update the metadata field that indicates "latest" status in an atomic update after uploading the new version of the file, and later remove previous versions if needed.

Furthermore, if your files are all smaller than the 16 MB BSON document size limit, consider storing each file manually within a single document. Use the BinData data type to store the binary data. See your driver's documentation for details on using BinData.
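The manual's rule of thumb reduces to a single comparison against the 16 MB BSON document limit; a trivial sketch (`pick_storage` is a made-up name):

```python
BSON_MAX = 16 * 1024 * 1024  # the 16 MB BSON document size limit

def pick_storage(file_size):
    """Follow the manual's guideline: GridFS above the BSON limit, BinData below."""
    return "gridfs" if file_size > BSON_MAX else "single-document BinData"

print(pick_storage(778))                # a small text file fits in one document
print(pick_storage(100 * 1024 * 1024))  # a 100 MB video needs GridFS
```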

Database Master-Slave synchronization


Schematic diagram






This is the synchronization flow when MongoDB runs in replica set mode:


    • The red arrows indicate that writes go to the primary and are then asynchronously synchronized to the multiple secondaries
    • The blue arrows indicate that reads can come from either the primary or a secondary
    • Heartbeats are maintained between the primary and every secondary to determine the status of the replica set
Data sharding mechanism




    • MongoDB shards data according to a specified shard key; the data is divided by range into chunks, and each chunk has a size limit
    • Multiple shard nodes store these chunks, each node holding a portion of them
    • Each shard node is itself a replica set, which guarantees the safety of the data
    • When a chunk exceeds its maximum size limit, it splits into two smaller chunks
    • When chunks are unevenly distributed across the shard nodes, a chunk migration is triggered
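The split step in the list above can be modeled in a few lines (a toy model: real chunks are split by data size in bytes, not document count, and the real balancer picks split points from key statistics):

```python
def split_if_needed(chunk, max_docs=4):
    """Split a range-based chunk at its median key once it holds too many documents."""
    if len(chunk["keys"]) <= max_docs:
        return [chunk]
    keys = sorted(chunk["keys"])
    mid = len(keys) // 2
    lo, hi = chunk["range"]
    return [
        {"range": (lo, keys[mid]), "keys": keys[:mid]},  # lower half of the key range
        {"range": (keys[mid], hi), "keys": keys[mid:]},  # upper half
    ]

chunk = {"range": (0, 100), "keys": [3, 15, 27, 42, 57, 88]}
print([c["range"] for c in split_if_needed(chunk)])  # [(0, 42), (42, 100)]
```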
Server roles in sharding





The figure above shows the standard sharding setup; below are the node roles specific to sharding:


    • Clients access the routing node (mongos) for data reads and writes
    • The config server holds two mappings: one from key ranges to chunks, and one from chunks to the shard nodes that store them
    • The routing node obtains this information from the config server and uses it to find the shard node that actually holds the data for the requested operation
    • On writes, the routing node also determines whether the current chunk has exceeded its limit; if so, the chunk is split in two
    • For queries and updates by shard key, the routing node locates the specific chunk and operates on it directly
    • For queries and updates not by shard key, mongos sends the request to all subordinate nodes and then merges the returned results
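The routing logic in the last two bullets can be sketched as follows (a toy in-memory model; `mongos_query` and the dict-based shards are illustrative, not a driver API):

```python
def mongos_query(shards, predicate, shard_key_range=None):
    """Route like mongos: target shards by key range if given, else scatter-gather."""
    if shard_key_range is not None:
        lo, hi = shard_key_range
        # only shards whose chunk range overlaps the queried key range are targeted
        targets = [s for s in shards if s["range"][0] < hi and s["range"][1] > lo]
    else:
        targets = shards  # no shard key: fan out to every shard and merge
    results = []
    for s in targets:
        results.extend(d for d in s["docs"] if predicate(d))
    return results

shards = [
    {"range": (0, 50),   "docs": [{"k": 10, "v": "a"}, {"k": 40, "v": "b"}]},
    {"range": (50, 100), "docs": [{"k": 60, "v": "a"}]},
]
# a query by shard key hits one shard; a query by value fans out to both
print(len(mongos_query(shards, lambda d: True, shard_key_range=(0, 50))))  # 2
print(len(mongos_query(shards, lambda d: d["v"] == "a")))                  # 2
```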
Some other tips on MongoDB:
    • Do not use the 32-bit version

The 32-bit version of MongoDB is not recommended because it can only handle about 2 GB of data. Remember the first limitation? Here is MongoDB's own description of it.

    • Learn about the official restrictions

To my surprise, very few people look up the limitations of the tools they are about to use. Fortunately, the MongoDB developers have published a blog post listing all of MongoDB's limitations, so you can learn them in advance and avoid embarrassment during use.

    • Master-slave replication does not ensure high availability

Although it is not recommended, MongoDB offers another replication strategy: master-slave replication. It solves the 12-node limit problem, but creates a new one: if you need to change the cluster's master node, you have to do it by hand. Surprised? Look at this link.

    • Data replication through a replica set works great, but there are limitations.

MongoDB's replica set strategy for data replication is great: easy to configure and genuinely useful. But if your cluster has more than 12 nodes, you will run into a problem: replica sets in MongoDB are limited to 12 nodes. Here is the description of the issue; you can follow the ticket to see whether it has been resolved.

Conclusion


GridFS is best suited for storing large files, especially video, audio, and large images over 16 MB. Small files can also be stored, but they cost two queries (metadata plus file content) [tip #18, Tips and Tricks for MongoDB Developers]. Rather than modifying the contents of a stored file, update the file's metadata (such as a version field), or upload a new version and delete the old one. Storing a large number of files requires multiple data nodes, replication, data sharding, and so on. Note that serving image files through nginx-gridfs this way means the browser does not cache them. Judging from Internet picture-storage cases, where images are mostly jpg, png, and thumbnail files, a distributed file system (DFS) would be a better solution.


Resources:


GridFS official documentation
Building MongoDB Applications with Binary Files Using GridFS




