In developing application servers of all kinds, we often face the problem of file storage. Common options include the plain disk file system and traditional file-stream storage in a DBMS. Today we look at a storage scheme based on the NoSQL database MongoDB, using CentOS 6.5, MongoDB 2.6.3, and Nginx 1.4.7 as the example environment; familiarity with common Linux commands is assumed.
Let's first take a look at MongoDB's internal file structure:
- MongoDB partitions its datastore by namespace; each collection is a namespace, and each index is also a namespace
- Data within a namespace is divided into multiple extents, which are connected in a doubly linked list
- Within each extent, the individual records are stored, also connected in a doubly linked list
- Each stored record includes not only the document data but also some extra padding, which allows a document that grows after an update to stay in place
- Indexes are implemented as B-tree structures
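The padding behavior in the fourth point can be illustrated with a small sketch. This is plain Python and a deliberate simplification of what the storage engine actually does; the class and the fixed padding factor are invented for illustration only:

```python
# Simplified model of a record slot with padding: an update only
# relocates the record when the new body no longer fits in the slot.
class RecordSlot:
    PADDING_FACTOR = 2.0  # illustrative; MongoDB tunes padding per collection

    def __init__(self, body: bytes):
        self.capacity = int(len(body) * self.PADDING_FACTOR)
        self.body = body
        self.moved = False

    def update(self, new_body: bytes):
        if len(new_body) <= self.capacity:
            self.body = new_body          # grows in place, no relocation
        else:
            self.capacity = int(len(new_body) * self.PADDING_FACTOR)
            self.body = new_body
            self.moved = True             # record had to be relocated

slot = RecordSlot(b"x" * 100)  # capacity 200 bytes
slot.update(b"y" * 150)        # fits within the padding
print(slot.moved)              # False
slot.update(b"z" * 500)        # exceeds the slot
print(slot.moved)              # True
```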
Next, the structure of GridFS.
Inside the database, GridFS uses two collections by default, fs.files and fs.chunks, to store files.
The fs.files collection stores file metadata, while fs.chunks stores the file data.
A record in the fs.files collection, i.e. the information for one file, looks like this:
{
  "_id": ObjectId("4f4608844f9b855c6c35e298"), // unique id; can be a user-defined type
  "filename": "CPU.txt", // file name
  "length": 778, // file length
  "chunkSize": 262144, // chunk size
  "uploadDate": ISODate("2012-02-23T09:36:04.593Z"), // upload time
  "md5": "e2c789b036cfb3b848ae39a24e795ca6", // md5 hash of the file
  "contentType": "text/plain", // MIME type of the file
  "meta": null // other file information; absent by default, may be any user-defined BSON object
}
The corresponding chunks in fs.chunks look like this:
{
  "_id": ObjectId("4f4608844f9b855c6c35e299"), // the chunk's id
  "files_id": ObjectId("4f4608844f9b855c6c35e298"), // id of the file; matches _id in fs.files, effectively a foreign key into that collection
  "n": 0, // chunk index; a file larger than chunkSize is split into multiple chunks
  "data": BinData(0, "QGV...") // the chunk's binary data; the actual content is omitted here
}
When a file is written into GridFS, if it is larger than chunkSize it is split into multiple chunks; the chunks are saved to fs.chunks, and finally the file metadata is written to fs.files.
When reading a file, GridFS first queries fs.files by the given conditions, finds a matching record and obtains its "_id", then looks up all chunks in fs.chunks whose "files_id" equals that "_id", sorts them by "n", and finally reads the "data" field of each chunk in order to reassemble the original file.
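The write and read paths just described can be sketched in a few lines of Python. This is pure standard library with no MongoDB server involved; the plain lists and dictionaries below merely mimic the fs.files / fs.chunks documents shown above, and the integer file id stands in for an ObjectId:

```python
import hashlib

CHUNK_SIZE = 262144  # 256 KB, GridFS's default chunkSize

def put_file(filename: str, data: bytes, files: list, chunks: list) -> int:
    """Split data into chunks, store the chunks, then store the metadata."""
    file_id = len(files)  # stand-in for an ObjectId
    n_chunks = max(1, -(-len(data) // CHUNK_SIZE))  # ceiling division
    for n in range(n_chunks):
        chunks.append({"files_id": file_id, "n": n,
                       "data": data[n * CHUNK_SIZE:(n + 1) * CHUNK_SIZE]})
    files.append({"_id": file_id, "filename": filename, "length": len(data),
                  "chunkSize": CHUNK_SIZE, "md5": hashlib.md5(data).hexdigest()})
    return file_id

def get_file(file_id: int, files: list, chunks: list) -> bytes:
    """Collect the file's chunks by files_id, sort by n, and reassemble."""
    parts = sorted((c for c in chunks if c["files_id"] == file_id),
                   key=lambda c: c["n"])
    return b"".join(c["data"] for c in parts)

files, chunks = [], []
payload = b"A" * 300000                    # larger than one chunk
fid = put_file("CPU.txt", payload, files, chunks)
print(len(chunks))                         # 2 (262144 bytes + 37856 bytes)
assert get_file(fid, files, chunks) == payload
```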
Installation and Configuration
1. Installing MongoDB
Add the MongoDB repository (if you are not familiar with vim, please look up a vim reference first):
vim /etc/yum.repos.d/mongodb.repo
For a 64-bit system:
[mongodb]
name=MongoDB Repository
baseurl=http://downloads-distro.mongodb.org/repo/redhat/os/x86_64/
gpgcheck=0
enabled=1
For a 32-bit system:
[mongodb]
name=MongoDB Repository
baseurl=http://downloads-distro.mongodb.org/repo/redhat/os/i686/
gpgcheck=0
enabled=1
Then install the packages; yum will prompt y/n:
yum install mongo-10gen mongo-10gen-server
Start:
service mongod start
View status:
service mongod status
Stop:
service mongod stop
For versions 3.0 and above, please refer to the official website for more information.
2. Installing Nginx and Nginx-gridfs
Install the dependent libraries and tools:
# yum -y install pcre-devel openssl-devel zlib-devel
# yum -y install gcc gcc-c++
Download the nginx-gridfs source code:
# git clone https://github.com/mdirolf/nginx-gridfs.git
# cd nginx-gridfs
# git checkout v0.8
# git submodule init
# git submodule update
Download the Nginx source code, then compile and install. (Newer Nginx versions are not well supported.)
# wget http://nginx.org/download/nginx-1.4.7.tar.gz
# tar zxvf nginx-1.4.7.tar.gz
# cd nginx-1.4.7
# ./configure --with-openssl=/usr/include/openssl --add-module=../nginx-gridfs/
# make -j8 && make install
Note: the path given to --add-module must correspond to where you cloned nginx-gridfs.
3. Configure Nginx-gridfs
vim /usr/local/nginx/conf/nginx.conf
Add location nodes inside the server node:
location /img/ {
    gridfs testdb
        field=filename
        type=string;
    mongo 192.168.0.159:27017;
}
location /files/ {
    gridfs testdb
        field=_id
        type=objectid;
    mongo 192.168.0.159:27017;
}
Here our MongoDB service is at IP 192.168.0.159.
If field is not specified, the default is MongoDB's auto-generated id, and type defaults to int.
Configuration parameter description:
- gridfs: the keyword that identifies the plugin to Nginx
- testdb: the database name
- [root_collection]: selects the collection prefix; e.g. root_collection=blog makes mongod look up the blog.files and blog.chunks collections; the default is fs
- [field]: the query field; make sure MongoDB has this field name; _id and filename are supported; may be omitted, the default is _id
- [type]: the data type of the field; objectid, int, and string are supported; may be omitted, the default is int
- [user]: username, may be omitted
- [pass]: password, may be omitted
- mongo: the MongoDB server address
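Putting the optional parameters together, a location block that uses root_collection and authentication might look like the following. This is only a sketch following the parameter list above; the collection prefix, path, and credentials are made up for illustration:

```nginx
# Sketch: serve files from blog.files / blog.chunks with authentication
location /blogimg/ {
    gridfs testdb
        root_collection=blog
        field=filename
        type=string
        user=reader
        pass=secret;
    mongo 192.168.0.159:27017;
}
```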
Start the Nginx service:
# /usr/local/nginx/sbin/nginx
You may see:
nginx: [emerg] bind() to 0.0.0.0:80 failed (98: Address already in use)
In that case you can kill whatever program is occupying port 80 with:
sudo fuser -k 80/tcp
A simple test
Upload a file with the native command-line tool:
mongofiles put 937910.jpg --local ~/937910_100.jpg --host 192.168.0.159 --port 27017 --db testdb --type jpg
937910.jpg is a picture file we downloaded in advance; note that we did not specify a collection, so the default fs is used.
Install the Robomongo management tool from http://www.robomongo.org/ to view the file you just uploaded.
Finally, visit the URL in a browser; if you can see the picture, everything is OK:
http://192.168.0.159/img/937910.jpg
For the .NET environment, install MongoDB CSharpDriver 1.10.0 from NuGet:
Install-Package mongocsharpdriver -Version 1.10.0
We use the following code fragment:
int nFileLen = fileUploadModel.FileBytes.Length;
MongoGridFSSettings fsSetting = new MongoGridFSSettings() { Root = CollectionName };
MongoGridFS fs = new MongoGridFS(mongoServer, MongoDatabaseName, fsSetting);
// You need to set the upload time manually when calling the Write, WriteByte, or WriteLine methods
// Additional information can be attached via Metadata
MongoGridFSCreateOptions option = new MongoGridFSCreateOptions();
option.Id = ObjectId.GenerateNewId();
var currentDate = DateTime.Now;
option.UploadDate = currentDate;
option.Aliases = alias;
BsonDocument doc = new BsonDocument();
// Store extra document information
if (fileUploadModel.DocExtraInfo != null && fileUploadModel.DocExtraInfo.Count > 0)
{
    foreach (var obj in fileUploadModel.DocExtraInfo)
    {
        if (!doc.Elements.Any(p => p.Name == obj.Key))
        {
            doc.Add(obj.Key, obj.Value);
        }
    }
}
option.Metadata = doc;
// Create the file and write the data
using (MongoGridFSStream gfs = fs.Create(fileUploadModel.FileName, option))
{
    gfs.Write(fileUploadModel.FileBytes, 0, nFileLen);
    gfs.Close();
}
log.ErrorFormat("Attachment ID: {0} File name: {1} uploaded successfully", alias, fileUploadModel.FileName);
return option.Id.ToString();
Note that nginx-gridfs currently does not support GUID as the _id type; for details on ObjectId, see the official website.
Another big advantage of MongoDB-generated ObjectIds is that they can be generated either by the MongoDB server itself or by the client driver.
When to use Gridfs
From the official 2.6.10 release manual:
In MongoDB, use GridFS for storing files larger than 16 MB. In some situations, storing large files may be more efficient in a MongoDB database than on a system-level filesystem.
- If your filesystem limits the number of files in a directory, you can use GridFS to store as many files as needed.
- When you want to keep your files and metadata automatically synced and deployed across a number of systems and facilities: when using geographically distributed replica sets, MongoDB can distribute files and their metadata automatically to a number of mongod instances and facilities.
- When you want to access information from portions of large files without having to load whole files into memory: you can use GridFS to recall sections of files without reading the entire file into memory.
Do not use GridFS if you need to update the content of the entire file atomically. As an alternative, you can store multiple versions of each file and specify the current version in the metadata. After uploading the new version of the file, you can atomically update the metadata field that indicates its "latest" status, and later remove previous versions if needed.
Furthermore, if your files are all smaller than the 16 MB BSON document size limit, consider storing each file manually within a single document, using the BinData data type to store the binary data. See your driver's documentation for details on using BinData.
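The versioning workaround described above (upload a new version, then flip a "latest" marker in the metadata in a single update) can be sketched like this. The metadata field names here are our own invention, not part of GridFS, and plain Python lists stand in for the fs.files collection:

```python
def upload_new_version(files: list, filename: str, new_id: int) -> None:
    """Add a new fs.files-style record for filename and mark it as latest.

    Flipping the 'latest' flag stands in for the single atomic update
    the manual describes; old versions remain until cleaned up.
    """
    for f in files:
        if f["filename"] == filename:
            f["metadata"]["latest"] = False
    files.append({"_id": new_id, "filename": filename,
                  "metadata": {"latest": True, "version": new_id}})

def latest_version(files: list, filename: str) -> dict:
    """Return the record currently marked as the latest version."""
    return next(f for f in files
                if f["filename"] == filename and f["metadata"]["latest"])

files = []
upload_new_version(files, "CPU.txt", 1)
upload_new_version(files, "CPU.txt", 2)
print(latest_version(files, "CPU.txt")["_id"])  # 2
```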
Database master-slave synchronization
Schematic diagram
The diagram illustrates the synchronization process of MongoDB in replica-set mode:
- The red arrows indicate that writes go to the primary and are then asynchronously synced to multiple secondaries
- The blue arrows indicate that reads can be served by either the primary or a secondary
- Heartbeats are maintained between the primary and each secondary to determine the status of the replica set
Data sharding mechanism
- MongoDB sharding is performed according to a specified shard key; the data is divided by range into chunks, and each chunk has a size limit
- Multiple shard nodes hold these chunks, each node holding a portion of them
- Each shard node is itself a replica set, which guarantees the safety of the data
- When a chunk exceeds its maximum size limit, it splits into two smaller chunks
- A chunk migration is triggered when chunks are unevenly distributed across the shard nodes
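The chunk-split behavior in the fourth point can be sketched as follows. This is pure Python: chunks are modeled as shard-key ranges holding a list of keys, and for simplicity the "size" limit is a document count rather than bytes:

```python
MAX_DOCS = 4  # stand-in for the chunk size limit

def insert_key(chunks: list, key: int) -> None:
    """Route key to the chunk owning its range; split when the chunk grows too big."""
    for i, c in enumerate(chunks):
        if c["min"] <= key < c["max"]:
            c["keys"].append(key)
            if len(c["keys"]) > MAX_DOCS:          # exceeded the limit
                c["keys"].sort()
                mid = c["keys"][len(c["keys"]) // 2]
                lo = {"min": c["min"], "max": mid,
                      "keys": [k for k in c["keys"] if k < mid]}
                hi = {"min": mid, "max": c["max"],
                      "keys": [k for k in c["keys"] if k >= mid]}
                chunks[i:i + 1] = [lo, hi]         # replaced by two smaller chunks
            return
    raise KeyError(key)

chunks = [{"min": 0, "max": 100, "keys": []}]
for k in [5, 40, 60, 85, 20]:                      # the fifth insert triggers a split
    insert_key(chunks, k)
print(len(chunks))                                 # 2
```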
Server roles in a sharded cluster
The above is standard for sharding; here are the node roles specific to a sharded cluster:
- Clients access the routing node, mongos, for data reads and writes
- The config servers hold two mappings: one from key ranges to chunks, and one from chunks to the shard nodes where they live
- The routing node obtains this information from the config servers and uses it to find the shard node that actually holds the data for the requested operation
- On writes, the routing node also determines whether the current chunk has exceeded its bounds; if so, the chunk is split in two
- For queries and updates by shard key, the routing node locates the specific chunk and then does the related work
- For queries and updates not by shard key, mongos sends the request to all shard nodes and then merges the returned results
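The two config-server mappings and the routing decisions above can be sketched in plain Python. The chunk and shard names are invented for illustration; the last function shows the scatter-gather path used when the query does not include the shard key:

```python
# Config-server state: key range -> chunk, and chunk -> shard.
range_to_chunk = [((0, 50), "chunk_a"), ((50, 100), "chunk_b")]
chunk_to_shard = {"chunk_a": "shard1", "chunk_b": "shard2"}
shard_data = {"shard1": {7: "doc7"}, "shard2": {63: "doc63"}}

def query_by_shard_key(key: int):
    """Targeted query: route to the single shard owning the key's chunk."""
    for (lo, hi), chunk in range_to_chunk:
        if lo <= key < hi:
            return shard_data[chunk_to_shard[chunk]].get(key)
    return None  # key falls outside every known range

def query_scatter_gather(predicate):
    """Non-shard-key query: ask every shard, then merge the results."""
    merged = []
    for docs in shard_data.values():
        merged.extend(v for v in docs.values() if predicate(v))
    return merged

print(query_by_shard_key(63))                        # doc63, served by shard2
print(sorted(query_scatter_gather(lambda d: True)))  # ['doc63', 'doc7']
```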
Some other tips on MongoDB:
- Do not use the 32-bit version
The 32-bit version of MongoDB is not recommended, because it can only handle about 2 GB of data. Remember the first limitation? Here is MongoDB's description of this limitation.
- Learn the official limitations
To my surprise, very few people look up the limitations of the tools they are about to use. Fortunately, the MongoDB developers have published a blog post listing all of MongoDB's limitations, so you can learn them in advance and avoid embarrassment during use.
- Master-slave replication does not ensure high availability
Although it is not recommended, MongoDB offers another replication strategy: master-slave replication. It avoids the 12-node limit, but creates a new problem: if you need to change the cluster's master node, you have to do it by hand. Surprised? Look at this link.
- Data replication through a replica set works great, but has its limits
MongoDB's replica-set strategy for data replication is great: easy to configure and it really works well. But if your cluster has more than 12 nodes, you will run into a problem: replica sets in MongoDB are limited to 12 nodes. Here is the description of the problem; you can follow the issue to see whether it has been resolved.
Conclusion
GridFS is best suited to large-file storage, especially video, audio, and large images over 16 MB in size. Small files can also be stored, but they cost two queries (metadata and file content) [tip #18, Tips and Tricks for MongoDB Developers]. Do not modify the content of a stored file; instead, update file metadata such as the version, or upload a new version of the file and delete the old one. Storing large numbers of files requires multiple data nodes, replication, data sharding, and so on. Note that when image files are served via Nginx this way, the browser does not cache them, so avoid this setup for that use case. Judging from Internet photo-storage cases, where pictures are mostly jpg, png, and thumbnail files, a distributed file system (DFS) would be a better solution.
Resources:
GridFS official documentation
Building MongoDB Applications with Binary Files Using GridFS
Petter Liu
Source: http://www.cnblogs.com/wintersun/
This article is copyrighted by the author and cnblogs. Reprinting is welcome, but this paragraph must be retained without the author's consent, and a link to the original must be given in an obvious position on the article page; otherwise the right to pursue legal responsibility is reserved.
The article is also published on my independent blog, the Petter Liu blog.
MongoDB GridFS + Nginx file storage scheme