MongoDB GridFS + Nginx File Storage Scenarios - Pictures

Source: Internet
Author: User
Tags: mongodb, collection, openssl, unique id, git clone, robomongo



http://www.cnblogs.com/wintersun/p/4622205.html



When developing all kinds of application systems, we often run into the problem of file storage. The common options are the disk file system and traditional DBMS file-stream storage. Today we look at a storage scheme based on the NoSQL database MongoDB, using CentOS 6.5, MongoDB 2.6.3, and Nginx 1.4.7 as the example environment. Familiarity with common Linux commands is assumed.
Let's first take a look at MongoDB's internal storage structure:





    1. MongoDB partitions its datastore by namespace: each collection is a namespace, and each index is also a namespace.
    2. Data in the same namespace is split across multiple extents, which are connected in a doubly linked list.
    3. Within each extent the individual records are stored, and these are also connected by a doubly linked list.
    4. Each stored record includes not only the data itself but also some padding space, so that the record does not have to be moved when an update makes it larger.
    5. Indexes are implemented as B-tree structures.


Next, let's look at the structure of GridFS.






Inside the database, GridFS uses the collections fs.files and fs.chunks by default to store files.



The fs.files collection stores file metadata, while fs.chunks stores the file data.



A record in the fs.files collection looks like this:


{
  "_id": ObjectId("4f4608844f9b855c6c35e298"), // the unique id; can be of a user-defined type
  "filename": "CPU.txt", // file name
  "length": 778, // file length
  "chunkSize": 262144, // chunk size
  "uploadDate": ISODate("2012-02-23T09:36:04.593Z"), // upload time
  "md5": "e2c789b036cfb3b848ae39a24e795ca6", // md5 hash of the file
  "contentType": "text/plain", // the MIME type of the file
  "meta": null // other file information; there is no "meta" key by default, and the user can define it as any BSON object
}


The corresponding chunk in fs.chunks looks like this:


{
  "_id": ObjectId("4f4608844f9b855c6c35e299"), // the chunk's id
  "files_id": ObjectId("4f4608844f9b855c6c35e298"), // the file's id, matching the document in fs.files; effectively a foreign key into the fs.files collection
  "n": 0, // the chunk's sequence number; if the file is larger than chunkSize it is split into multiple chunks
  "data": BinData(0, "QGV...") // the binary data of the file; contents omitted here
}


When a file is written into GridFS, if it is larger than chunkSize it is split into multiple chunks; the chunks are saved into fs.chunks first, and finally the file's metadata is written into fs.files.
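The write path just described can be sketched with a few lines of Python (an illustrative in-memory model, not the real driver: the lists `fs_files` and `fs_chunks` stand in for the two collections, and `save_to_gridfs` is a made-up helper name):

```python
import hashlib
import math

CHUNK_SIZE = 262144  # 256 KB, the default chunk size in the MongoDB 2.x era

def save_to_gridfs(fs_files, fs_chunks, file_id, filename, data, chunk_size=CHUNK_SIZE):
    """Split data into chunks, write the chunks first, then the metadata record."""
    n_chunks = math.ceil(len(data) / chunk_size)
    for n in range(n_chunks):
        fs_chunks.append({
            "files_id": file_id,  # back-reference to the fs.files record
            "n": n,               # chunk sequence number
            "data": data[n * chunk_size:(n + 1) * chunk_size],
        })
    fs_files.append({
        "_id": file_id,
        "filename": filename,
        "length": len(data),
        "chunkSize": chunk_size,
        "md5": hashlib.md5(data).hexdigest(),
    })

fs_files, fs_chunks = [], []
save_to_gridfs(fs_files, fs_chunks, 1, "big.bin", b"x" * 600000)
print(len(fs_chunks))  # 600000 bytes at 262144 per chunk -> 3 chunks
```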



When reading a file, the matching record is found in fs.files according to the query conditions and its "_id" value is taken; then all chunks in fs.chunks whose "files_id" equals that "_id" are fetched and sorted by "n"; finally the "data" of each chunk is read in sequence and concatenated to restore the original file.
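The read path can be sketched the same way (again an in-memory model; `read_from_gridfs` is a hypothetical helper, and the two chunks are deliberately stored out of order to show why sorting by "n" matters):

```python
def read_from_gridfs(fs_files, fs_chunks, filename):
    """Find the file record, gather its chunks by files_id, sort by n, concatenate."""
    file_doc = next(f for f in fs_files if f["filename"] == filename)
    chunks = sorted(
        (c for c in fs_chunks if c["files_id"] == file_doc["_id"]),
        key=lambda c: c["n"],
    )
    data = b"".join(c["data"] for c in chunks)
    assert len(data) == file_doc["length"]  # sanity check against the metadata
    return data

fs_files = [{"_id": 1, "filename": "CPU.txt", "length": 7, "chunkSize": 4}]
fs_chunks = [
    {"files_id": 1, "n": 1, "data": b"EFG"},   # second chunk, stored first
    {"files_id": 1, "n": 0, "data": b"ABCD"},  # first chunk
]
print(read_from_gridfs(fs_files, fs_chunks, "CPU.txt"))  # b'ABCDEFG'
```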


Installation and Configuration


1. Installing MongoDB



Add the MongoDB repository. If you are not familiar with vim, please look up a vim tutorial first.


vim /etc/yum.repos.d/mongodb.repo


For a 64-bit system:


[mongodb]
name=MongoDB Repository
baseurl=http://downloads-distro.mongodb.org/repo/redhat/os/x86_64/
gpgcheck=0
enabled=1


For a 32-bit system:


[mongodb]
name=MongoDB Repository
baseurl=http://downloads-distro.mongodb.org/repo/redhat/os/i686/
gpgcheck=0
enabled=1


Then install (you will be prompted y/n):


yum install mongo-10gen mongo-10gen-server


Start:

service mongod start

View status:

service mongod status

Stop:

service mongod stop


For MongoDB 3.0 and later, please refer to the official website for installation instructions.



2. Installing Nginx and Nginx-gridfs



Install the dependent libraries and tools:


# yum -y install pcre-devel openssl-devel zlib-devel

# yum -y install gcc gcc-c++


Download the nginx-gridfs source code:


# git clone https://github.com/mdirolf/nginx-gridfs.git

# cd nginx-gridfs

# git checkout v0.8

# git submodule init

# git submodule update


Download the Nginx source code, then compile and install. (Newer Nginx versions are not well supported by nginx-gridfs.)


# wget http://nginx.org/download/nginx-1.4.7.tar.gz

# tar zxvf nginx-1.4.7.tar.gz

# cd nginx-1.4.7

# ./configure --with-openssl=/usr/include/openssl --add-module=../nginx-gridfs/

# make -j8 && make install


Note that the --add-module argument in the configure command must point to the path of the nginx-gridfs source directory.



3. Configure Nginx-gridfs



vim /usr/local/nginx/conf/nginx.conf



Add location nodes inside the server node:



location /img/ {
    gridfs testdb
        field=filename
        type=string;
    mongo 192.168.0.159:27017;
}



location /files/ {
    gridfs testdb
        field=_id
        type=objectid;
    mongo 192.168.0.159:27017;
}



Here our MongoDB service runs at IP 192.168.0.159.
If field is not specified, the default is MongoDB's auto-generated _id, and the default type is int.



Configuration parameter Description:



gridfs: keyword identifying the plugin to Nginx
testdb: database name
[root_collection]: selects the collection prefix; e.g. root_collection=blog makes mongod look up blog.files and blog.chunks; default is fs
[field]: query field; make sure MongoDB has this field name; supports _id and filename; can be omitted, default is _id
[type]: data type of the field; supports objectid, int, and string; can be omitted, default is int
[user]: username, can be omitted
[pass]: password, can be omitted
mongo: MongoDB URL
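As an example of the optional parameters, a location serving files from a custom root collection might look like the following (the collection name `blog` and path `/blog-files/` are hypothetical; the syntax mirrors the nginx-gridfs README):

```nginx
location /blog-files/ {
    # reads blog.files / blog.chunks instead of the default fs.*
    gridfs testdb
        root_collection=blog
        field=filename
        type=string;
    mongo 192.168.0.159:27017;
}
```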


Start the Nginx service:

# /usr/local/nginx/sbin/nginx


You may see the error:
nginx: [emerg] bind() to 0.0.0.0:80 failed (98: Address already in use)



You can use the following command to kill the program occupying port 80:


sudo fuser -k 80/tcp


Simple test


Upload a file with the native command-line tool:


mongofiles put 937910.jpg --local ~/937910_100.jpg --host 192.168.0.159 --port 27017 --db testdb --type jpg


937910.jpg is a picture file we downloaded in advance. Note that we did not specify a collection, so the default fs is used.



Install the Robomongo management tool from http://www.robomongo.org/ to view the file you just uploaded.






Finally, visit the URL in the browser; if you can see the picture, everything is OK:



http://192.168.0.159/img/937910.jpg



For the .NET environment, install MongoDB CSharpDriver 1.10.0 from NuGet:
Install-Package mongocsharpdriver -Version 1.10.0
We use the following code fragment:


int nFileLen = fileUploadModel.FileBytes.Length;

MongoGridFSSettings fsSetting = new MongoGridFSSettings() { Root = CollectionName };
MongoGridFS fs = new MongoGridFS(mongoServer, MongoDatabaseName, fsSetting);

// The upload time must be set manually when calling the Write, WriteByte, WriteLine functions
// Additional information is attached via Metadata
MongoGridFSCreateOptions option = new MongoGridFSCreateOptions();
option.Id = ObjectId.GenerateNewId();
var currentDate = DateTime.Now;
option.UploadDate = currentDate;
option.Aliases = alias;
BsonDocument doc = new BsonDocument();
// Store additional document information
if (fileUploadModel.DocExtraInfo != null && fileUploadModel.DocExtraInfo.Count > 0)
{
    foreach (var obj in fileUploadModel.DocExtraInfo)
    {
        if (!doc.Elements.Any(p => p.Name == obj.Key))
        {
            doc.Add(obj.Key, obj.Value);
        }
    }
}
option.Metadata = doc;

// Create the file and store the data
using (MongoGridFSStream gfs = fs.Create(fileUploadModel.FileName, option))
{
    gfs.Write(fileUploadModel.FileBytes, 0, nFileLen);
    gfs.Close();
}
log.ErrorFormat("Attachment ID: {0} File name: {1} uploaded successfully", alias, fileUploadModel.FileName);
return option.Id.ToString();


Note that nginx-gridfs currently does not support an _id of type GUID. For details about ObjectId, refer to the official website, for example:






Another significant advantage of MongoDB-generated ObjectIds is that they can be generated either by the MongoDB server itself or by the client driver.
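The reason client drivers can generate ObjectIds safely lies in the id's layout: in the classic scheme, 12 bytes made of a 4-byte timestamp, a 3-byte machine id, a 2-byte process id, and a 3-byte counter, so ids created on different machines at the same moment still do not collide. A rough Python sketch of that layout (the machine id is random here rather than a hostname hash, and `generate_object_id` is an illustrative name, not a driver API):

```python
import os
import struct
import time

MACHINE_ID = os.urandom(3)                         # stand-in for a hash of the hostname
_counter = [int.from_bytes(os.urandom(3), "big")]  # random starting point, as real drivers use

def generate_object_id():
    """Build a 12-byte id: timestamp | machine | pid | counter, hex-encoded."""
    _counter[0] = (_counter[0] + 1) % 0x1000000
    oid = struct.pack(">I", int(time.time()))       # 4-byte seconds since the epoch
    oid += MACHINE_ID                               # 3-byte machine identifier
    oid += struct.pack(">H", os.getpid() & 0xFFFF)  # 2-byte process id
    oid += _counter[0].to_bytes(3, "big")           # 3-byte increment counter
    return oid.hex()

print(generate_object_id())  # 24 hex characters; roughly sortable by creation time
```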


When to use Gridfs


From the official 2.6.10 release manual:


For documents in a MongoDB collection, you should always use GridFS for storing files larger than 16 MB. In some situations, storing large files may be more efficient in a MongoDB database than on a system-level filesystem.

If your filesystem limits the number of files in a directory, you can use GridFS to store as many files as needed.

When you want to keep your files and metadata automatically synced and deployed across a number of systems and facilities: when using geographically distributed replica sets, MongoDB can distribute files and their metadata automatically to a number of mongod instances and facilities.

When you want to access information from portions of large files without having to load whole files into memory: you can use GridFS to recall sections of files without reading the entire file into memory.

Do not use GridFS if you need to update the content of the entire file atomically. As an alternative you can store multiple versions of each file and specify the current version of the file in the metadata. You can update the metadata field that indicates "latest" status in an atomic update after uploading the new version of the file, and later remove previous versions if needed.

Furthermore, if your files are all smaller than the 16 MB BSON document size limit, consider storing each file manually within a single document. Use the BinData data type to store the binary data. See your driver's documentation for details on using BinData.
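The manual's rule of thumb reduces to a single comparison against the 16 MB BSON document limit; a trivial sketch (`pick_storage` is a made-up name):

```python
BSON_MAX = 16 * 1024 * 1024  # the 16 MB BSON document size limit

def pick_storage(file_size):
    """Follow the manual's guideline: GridFS above the BSON limit, BinData below."""
    return "gridfs" if file_size > BSON_MAX else "single-document BinData"

print(pick_storage(778))                # a small text file fits in one document
print(pick_storage(100 * 1024 * 1024))  # a 100 MB video needs GridFS
```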

Database Master-Slave synchronization


Schematic diagram






This is the synchronization flow when MongoDB runs in replica set mode:


    • The red arrows indicate that writes go to the primary and are then asynchronously synchronized to the multiple secondaries
    • The blue arrows indicate that reads can come from either the primary or a secondary
    • Heartbeats are maintained between the primary and every secondary to determine the status of the replica set
Data sharding mechanism




    • MongoDB shards data according to a specified shard key; the data is divided by range into chunks, and each chunk has a size limit
    • Multiple shard nodes store these chunks, each node holding a portion of them
    • Each shard node is itself a replica set, which guarantees the safety of the data
    • When a chunk exceeds its maximum size limit, it splits into two smaller chunks
    • When chunks are unevenly distributed across the shard nodes, a chunk migration is triggered
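The split step in the list above can be modeled in a few lines (a toy model: real chunks are split by data size in bytes, not document count, and the real balancer picks split points from key statistics):

```python
def split_if_needed(chunk, max_docs=4):
    """Split a range-based chunk at its median key once it holds too many documents."""
    if len(chunk["keys"]) <= max_docs:
        return [chunk]
    keys = sorted(chunk["keys"])
    mid = len(keys) // 2
    lo, hi = chunk["range"]
    return [
        {"range": (lo, keys[mid]), "keys": keys[:mid]},  # lower half of the key range
        {"range": (keys[mid], hi), "keys": keys[mid:]},  # upper half
    ]

chunk = {"range": (0, 100), "keys": [3, 15, 27, 42, 57, 88]}
print([c["range"] for c in split_if_needed(chunk)])  # [(0, 42), (42, 100)]
```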
Server roles in sharding





The figure above shows the standard sharding setup; below are the node roles specific to sharding:


    • Clients access the routing node (mongos) for data reads and writes
    • The config server holds two mappings: one from key ranges to chunks, and one from chunks to the shard nodes that store them
    • The routing node obtains this information from the config server and uses it to find the shard node that actually holds the data for the requested operation
    • On writes, the routing node also determines whether the current chunk has exceeded its limit; if so, the chunk is split in two
    • For queries and updates by shard key, the routing node locates the specific chunk and operates on it directly
    • For queries and updates not by shard key, mongos sends the request to all subordinate nodes and then merges the returned results
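The routing logic in the last two bullets can be sketched as follows (a toy in-memory model; `mongos_query` and the dict-based shards are illustrative, not a driver API):

```python
def mongos_query(shards, predicate, shard_key_range=None):
    """Route like mongos: target shards by key range if given, else scatter-gather."""
    if shard_key_range is not None:
        lo, hi = shard_key_range
        # only shards whose chunk range overlaps the queried key range are targeted
        targets = [s for s in shards if s["range"][0] < hi and s["range"][1] > lo]
    else:
        targets = shards  # no shard key: fan out to every shard and merge
    results = []
    for s in targets:
        results.extend(d for d in s["docs"] if predicate(d))
    return results

shards = [
    {"range": (0, 50),   "docs": [{"k": 10, "v": "a"}, {"k": 40, "v": "b"}]},
    {"range": (50, 100), "docs": [{"k": 60, "v": "a"}]},
]
# a query by shard key hits one shard; a query by value fans out to both
print(len(mongos_query(shards, lambda d: True, shard_key_range=(0, 50))))  # 2
print(len(mongos_query(shards, lambda d: d["v"] == "a")))                  # 2
```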
Some other tips on MongoDB:
    • Do not use the 32-bit version

The 32-bit version of MongoDB is not recommended because it can only handle about 2 GB of data. Remember the first limitation? Here is MongoDB's own description of it.

    • Learn about the official restrictions

To my surprise, very few people look up the limitations of the tools they are about to use. Fortunately, the MongoDB developers have published a blog post listing all of MongoDB's limitations, so you can learn them in advance and avoid embarrassment during use.

    • Master-slave replication does not ensure high availability

Although it is not recommended, MongoDB offers another replication strategy: master-slave replication. It solves the 12-node limit problem, but creates a new one: if you need to change the cluster's master node, you have to do it by hand. Surprised? Look at this link.

    • Data replication through a replica set works great, but there are limitations.

MongoDB's replica set strategy for data replication is great: easy to configure and genuinely useful. But if your cluster has more than 12 nodes, you will run into a problem: replica sets in MongoDB are limited to 12 nodes. Here is the description of the issue; you can follow the ticket to see whether it has been resolved.

Conclusion


GridFS is best suited for storing large files, especially video, audio, and large images over 16 MB. Small files can also be stored, but they cost two queries (metadata plus file content) [tip #18, Tips and Tricks for MongoDB Developers]. Rather than modifying the contents of a stored file, update the file's metadata (such as a version field), or upload a new version and delete the old one. Storing a large number of files requires multiple data nodes, replication, data sharding, and so on. Note that serving image files through nginx-gridfs this way means the browser does not cache them. Judging from Internet picture-storage cases, where images are mostly jpg, png, and thumbnail files, a distributed file system (DFS) would be a better solution.


Resources:


GridFS official documentation
Building MongoDB Applications with Binary Files Using GridFS




