Implementing Distributed File Storage in MongoDB with GridFS

Source: Internet
Author: User

GridFS is a mechanism for storing large binary files in MongoDB. There are several reasons to store files with GridFS:

It simplifies your architecture. If you already use MongoDB, GridFS means you do not need a separate file storage system.

GridFS directly reuses the replication or sharding you have already set up, so failure recovery and scaling for file storage are easy.

GridFS avoids some problems that file systems have with user-uploaded content. For example, it has no trouble holding a very large number of files in the same "directory".

GridFS does not cause disk fragmentation, because MongoDB allocates its data files in large blocks of up to 2 GB.

Use cases: GridFS is worth considering if your system has any of the following characteristics:

1) A large number of uploaded images or files (user uploads, system file releases, etc.).

2) The number of files is growing rapidly, to the point where lookups on a single host's file system may become a bottleneck, or the data may outgrow a single host's disks.

3) You need file backup (achievable without GridFS using third-party tools, but less convenient), plus failover and recovery of file access.

4) File indexing: besides the file itself, you need to associate more metadata with it, such as the author, publication time, tags, and other custom attributes.

5) Following from 4), files have no clear-cut classification, so organizing them into operating-system folders would be messy or impossible.

6) The system is web-based and images are served according to URL routing rules (an ordinary file system can do this too).

7) File sizes vary widely, and files may be migrated or deleted.
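
Use cases 4) and 5) come down to attaching searchable attributes to each file. In GridFS these usually live in the metadata field of the fs.files document (pymongo's GridFSBucket.upload_from_stream accepts a metadata argument for this). The offline sketch below only models that document shape and a MongoDB-style dotted-path match such as {"metadata.tags": "banner"}; the helper names and the author and tags fields are hypothetical, and a real query would run on the server with db.fs.files.find(...).

```python
from datetime import datetime, timezone


def files_doc(filename, length, **metadata):
    """Build a hypothetical fs.files-style document carrying custom metadata."""
    return {
        "filename": filename,
        "length": length,
        "uploadDate": datetime.now(timezone.utc),
        "metadata": metadata,  # custom attributes (author, tags, ...) go here
    }


def lookup(doc, dotted_path):
    """Follow a dotted path such as 'metadata.tags'; return None if any part is missing."""
    value = doc
    for key in dotted_path.split("."):
        if not isinstance(value, dict) or key not in value:
            return None
        value = value[key]
    return value


def matches(doc, dotted_path, wanted):
    """Rough model of MongoDB equality: an array field matches if it contains `wanted`.

    Whole-array equality and other query operators are not modeled.
    """
    value = lookup(doc, dotted_path)
    if isinstance(value, list):
        return wanted in value
    return value == wanted


catalog = [
    files_doc("logo.png", 2048, author="alice", tags=["banner", "brand"]),
    files_doc("cat.jpg", 51200, author="bob", tags=["pets"]),
]
# Equivalent in spirit to db.fs.files.find({"metadata.tags": "banner"})
banners = [d["filename"] for d in catalog if matches(d, "metadata.tags", "banner")]
```

Keeping custom fields under metadata, rather than at the top level of the files document, keeps them clear of the fields GridFS itself manages.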

There are two ways to store files in MongoDB with GridFS:

1. The mongofiles command-line tool
2. Programming against a client-side driver

1. The mongofiles Command-Line Tool

The mongofiles command-line tool inserts file data into a MongoDB database, for example:

mongofiles --host 127.0.0.1:27017 -d mydb put filename

This inserts a file into the database mydb. put uploads a file to MongoDB; get and delete download and delete a file, respectively.

To view the list of files stored in GridFS, run db.fs.files.find() in the mongo shell.

MongoDB ships with the mongofiles tool for working with GridFS. The basic operations are as follows:
List all files:
mongofiles list
Upload a file:
mongofiles put xxx.txt
Download a file:
mongofiles get xxx.txt
Find files:
mongofiles search xxx // finds all files whose names contain "xxx"
mongofiles list xxx // finds all files whose names begin with "xxx"
Parameter description:
-d specifies the database, for example: mongofiles list -d testGridfs
-u and -p specify the user name and password
-h specifies the host
--port specifies the host port
-c specifies the collection name; the default is fs
-t specifies the MIME type of the file; it is omitted by default
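
When mongofiles is driven from a script, building the argument list explicitly avoids shell-quoting mistakes. The helper below is a hypothetical sketch: it only uses the long forms of the options listed above (--host, --port, --db, --username, --password, --type) and the five commands shown, and it assumes mongofiles is on the PATH and a server is reachable when run_mongofiles is actually called.

```python
import shlex
import subprocess

COMMANDS = {"list", "search", "put", "get", "delete"}


def mongofiles_argv(command, name=None, *, host="127.0.0.1", port=27017,
                    db=None, username=None, password=None, mime_type=None):
    """Assemble an argument list for mongofiles (nothing is executed here)."""
    if command not in COMMANDS:
        raise ValueError(f"unsupported mongofiles command: {command!r}")
    argv = ["mongofiles", "--host", host, "--port", str(port)]
    if db:
        argv += ["--db", db]              # long form of -d
    if username:
        argv += ["--username", username]  # long form of -u
    if password:
        argv += ["--password", password]  # long form of -p
    if mime_type:
        argv += ["--type", mime_type]     # long form of -t, relevant for put
    argv.append(command)
    if name is not None:
        argv.append(name)                 # file name, or the prefix/substring for list/search
    return argv


def run_mongofiles(*args, **kwargs):
    """Run mongofiles and return its stdout (needs the tool installed and a live server)."""
    result = subprocess.run(mongofiles_argv(*args, **kwargs),
                            check=True, capture_output=True, text=True)
    return result.stdout


# The upload from the first example, as an argument list:
upload = mongofiles_argv("put", "xxx.txt", db="mydb")
```

shlex.join(upload) prints the equivalent command line, which is handy for logging before calling run_mongofiles.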

2. Using APIs to Access Files

I generally use Python with pymongo to work with MongoDB collections.
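
With a driver, the chunking is done for you: pymongo's gridfs package (for example gridfs.GridFSBucket with upload_from_stream and open_download_stream_by_name) splits each file into chunk documents in fs.chunks and writes one descriptor document into fs.files. The offline sketch below imitates that layout to make the two collections easier to picture. It assumes the 255 KiB default chunk size of current drivers and uses a plain string as the file id, where a real driver would normally use an ObjectId; it is an illustration, not pymongo's implementation.

```python
DEFAULT_CHUNK_SIZE = 255 * 1024  # 261120 bytes


def split_into_gridfs_docs(file_id, filename, data, chunk_size=DEFAULT_CHUNK_SIZE):
    """Return (files_doc, chunk_docs) laid out like GridFS's fs.files and fs.chunks."""
    chunk_docs = [
        # n is the zero-based sequence number of the chunk within the file
        {"files_id": file_id, "n": n, "data": data[offset:offset + chunk_size]}
        for n, offset in enumerate(range(0, len(data), chunk_size))
    ]
    files_doc = {
        "_id": file_id,
        "filename": filename,
        "length": len(data),
        "chunkSize": chunk_size,
    }
    return files_doc, chunk_docs


def reassemble(files_doc, chunk_docs):
    """Concatenate this file's chunks in order of n and check the total length."""
    ordered = sorted(
        (c for c in chunk_docs if c["files_id"] == files_doc["_id"]),
        key=lambda c: c["n"],
    )
    data = b"".join(c["data"] for c in ordered)
    if len(data) != files_doc["length"]:
        raise ValueError("chunks are missing or truncated")
    return data


payload = bytes(range(256)) * 2500  # 640,000 bytes of sample data
files_doc, chunks = split_into_gridfs_docs("file-1", "xxx.txt", payload)
```

Only the last chunk is shorter than chunkSize, which is why a reader can recover the file by sorting on n and concatenating.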

In many cases, we can also have nginx read files directly from GridFS.

Install nginx-gridfs
wget https://download.github.com/mdirolf-nginx-gridfs-v0.8-0-gb5f8113.tar.gz
tar -zxvf mdirolf-nginx-gridfs-v0.8-0-gb5f8113.tar.gz
mv mdirolf-nginx-gridfs-v0.8-0-gb5f8113 mdirolf-nginx-gridfs-v0.8
wget https://download.github.com/mongodb-mongo-c-driver-v0.3-0-g74cc0b8.tar.gz
tar -zxvf mongodb-mongo-c-driver-v0.3-0-g74cc0b8.tar.gz
mv mongodb-mongo-c-driver-v0.3-0-g74cc0b8/* mdirolf-nginx-gridfs-v0.8/mongo-c-driver
rm -rf mongodb-mongo-c-driver-v0.3-0-g74cc0b8
Install nginx, compiling it together with the nginx-gridfs directory
wget
tar -zxvf nginx-1.0.1.tar.gz
cd nginx-1.0.1
./configure --prefix=/usr/local/nginx --with-openssl=/usr/include/openssl --with-http_stub_status_module --add-module=/home/cdh/Downloads/mdirolf-nginx-gridfs
make -j8
sudo make install -j8
Configure nginx-gridfs
location /pics/ {
    gridfs pics
           field=filename
           type=string;
    mongo 127.0.0.1:27017;
}
gridfs: the keyword nginx uses to recognize the module
pics: the database name
[root_collection]: selects the collection prefix; for example, with root_collection=blog, the module reads blog.files and blog.chunks. The default is fs.
[field]: the field to query by; make sure it exists in MongoDB. _id and filename are supported. Optional; the default is _id.
[type]: the data type of the field; objectid, int, and string are supported. Optional; the default is int.
[user]: user name, optional
[pass]: password, optional
mongo: the address of the MongoDB server (host:port)
Images can then be downloaded directly through nginx.
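
To make the field and type options concrete, here is a small Python model of the lookup that a request such as /pics/logo.png implies under the configuration above. It is a hypothetical illustration of what the options mean, not the module's C code: the function name is made up, ObjectIds are written in MongoDB Extended JSON form ({"$oid": ...}) to avoid depending on the bson package, and URL-decoding is deliberately left out.

```python
import re

OBJECTID_HEX = re.compile(r"[0-9a-fA-F]{24}")


def gridfs_query_for(path, location="/pics/", field="filename", type_="string"):
    """Map a request path under `location` to the fs.files filter it implies (hypothetical model)."""
    if not path.startswith(location):
        return None                       # not handled by this location block
    raw = path[len(location):]
    if not raw:
        raise ValueError("no file key in path")
    if type_ == "string":
        value = raw
    elif type_ == "int":
        value = int(raw)                  # raises ValueError for non-numeric keys
    elif type_ == "objectid":
        if not OBJECTID_HEX.fullmatch(raw):
            raise ValueError(f"not a 24-hex-digit ObjectId: {raw!r}")
        value = {"$oid": raw.lower()}
    else:
        raise ValueError(f"unsupported type: {type_!r}")
    return {field: value}
```

With field=filename and type=string, the URL path after /pics/ is used as the file name as-is; switching to field=_id means clients must know each file's id instead.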

GridFS pays off when there are a great many files; if you do not need to store massive numbers of files, it is not worth going to this much trouble.

At that scale, you can spread the load across servers with sharding and master/slave replication.

mkdir -p /data/shard/s0
mkdir -p /data/shard/s1
mkdir -p /data/shard/log
mkdir -p /data/shard/config
# Start the shard servers
./mongod --shardsvr --port 20000 --dbpath /data/shard/s0 --fork --logpath /data/shard/log/s0.log --directoryperdb
./mongod --shardsvr --port 20001 --dbpath /data/shard/s1 --fork --logpath /data/shard/log/s1.log --directoryperdb
# Start the config server
./mongod --configsvr --port 30000 --dbpath /data/shard/config --fork --logpath /data/shard/log/config.log --directoryperdb
# Start the mongos router
mongos --port 40000 --configdb localhost:30000 --fork --logpath /data/shard/log/route.log --chunkSize 1
# Add the shards
./mongo admin --port 40000 # connect to mongos
db.runCommand({addshard: "localhost:20000"}) // add shard 1
db.runCommand({addshard: "localhost:20001"}) // add shard 2
db.runCommand({enablesharding: "test"}) // enable sharding for the test database
db.runCommand({shardcollection: "test.users", key: {_id: 1}}) // shard test.users with _id as the shard key
// Verify sharding
use test
// Insert 500,000 documents
for (var i = 1; i <= 500000; i++) db.users.insert({age: i, name: "zx", addr: "beijing", country: "china"})
db.users.stats() // shows how the users collection is spread across the shards
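
The users example shards on _id. For GridFS itself, MongoDB's documentation recommends sharding the chunks collection on { files_id: 1, n: 1 } (or { files_id: 1 }), which keeps the pieces of a file together. The plain-Python sketch below models how a router like mongos finds the owning shard under range-based sharding: each chunk range is half-open, [lower, upper), and compound keys compare field by field. The split points, shard names, and integer files_id values are hypothetical (real files_id values are usually ObjectIds), and in a real cluster the balancer splits and moves chunk ranges on its own.

```python
from bisect import bisect_right

# Hypothetical split points on the compound shard key (files_id, n).
# They define three ranges: [min, (100, 0)), [(100, 0), (200, 0)), [(200, 0), max).
SPLIT_POINTS = [(100, 0), (200, 0)]
# Hypothetical owner of each range, in the same order.
CHUNK_OWNERS = ["shard0000", "shard0001", "shard0000"]


def owning_shard(files_id, n, split_points=SPLIT_POINTS, owners=CHUNK_OWNERS):
    """Return the shard whose key range contains (files_id, n)."""
    # bisect_right counts the split points <= key, which is exactly the index
    # of the half-open range [lower, upper) that contains the key.
    return owners[bisect_right(split_points, (files_id, n))]


# Every chunk document of file 150 maps to one shard here,
# because all split points sit at n == 0.
placement = {owning_shard(150, n) for n in range(10)}
```

Because n is the second key field, a file's chunks are contiguous in key order, so they only straddle two shards if a split point happens to fall inside that file.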
