How to Standardize Your Schema in Schema-Free MongoDB




http://www.mongoing.com/archives/2282



There are two main tools for keeping a MongoDB schema under control:



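Variety: shows which fields appear in a collection, with their types and how often they occur.



Document Validation: defines rules that the documents in a collection must satisfy. With the error action, non-conforming inserts and updates are rejected; with the warn action, they are allowed but written to the log.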


Body


As we all know, MongoDB is a schema-free document database.



What are the benefits of MongoDB's document model? Here are a few:


    • JSON documents: developers can store JSON data directly in MongoDB, which is very developer-friendly;
    • High read and write performance: in a relational database, joins, subqueries, and similar operations often cause a lot of random I/O. In MongoDB, a well-designed data model can satisfy many of these requirements through embedding and denormalization, which reduces random I/O;
    • Schema free: MongoDB's data model is flexible, there is no need to worry about online DDL, and documents in the same collection can have different structures.


This article does not go into how to design and model a MongoDB schema. For that topic, we recommend TJ's talk "MongoDB Advanced Schema Design" from the OSChina annual conference, as well as the "Retail Reference Architecture" series, Parts 1 to 4.



Instead, we focus on how to check and enforce the schema, both after the initial modeling and after the service has gone live.


Variety


Variety is a very useful open-source tool for detecting the types and distribution of the fields in a MongoDB collection.



As the first sentence of its GitHub README puts it: "Meet Variety, a Schema Analyzer for MongoDB."



Variety detects the field types and their distribution in a MongoDB collection and produces a report. This lets us inspect the existing structure and field types at a glance and identify hidden problems in the data model.


Let's walk through an example.


First, insert a few documents into a users collection:


db.users.insert({name: "Tom", bio: "A nice guy.", pets: ["monkey", "fish"], someWeirdLegacyKey: "I like Ike!"});
db.users.insert({name: "Dick", bio: "I swordfight.", birthday: new Date("1974/03/14")});
db.users.insert({name: "Harry", pets: "egret", birthday: new Date("1984/03/14")});
db.users.insert({name: "Geneviève", bio: "Ça va?"});
db.users.insert({name: "Jim", someBinData: new BinData(2,"1234")});


Now run Variety against the collection and look at the results:


$ mongo test --eval "var collection = 'users'" variety.js

+------------------------------------------------------------------+
| key                | types              | occurrences | percents |
| ------------------ | ------------------ | ----------- | -------- |
| _id                | ObjectId           |           5 |    100.0 |
| name               | String             |           5 |    100.0 |
| bio                | String             |           3 |     60.0 |
| birthday           | String             |           2 |     40.0 |
| pets               | Array(1),String(1) |           2 |     40.0 |
| someBinData        | BinData-old        |           1 |     20.0 |
| someWeirdLegacyKey | String             |           1 |     20.0 |
+------------------------------------------------------------------+


Here test is the database name and users is the collection name. For the 5 documents we inserted, Variety reports the following:



All documents contain _id and name; 60% contain bio; 40% contain birthday and pets, and pets holds two types of data (an array in one document, a string in the other); 20% contain someBinData and someWeirdLegacyKey.
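To make the occurrences and percents columns concrete, here is a minimal sketch in plain JavaScript of the kind of counting Variety performs. It is not Variety's implementation: the function names analyzeSchema and typeName and the type labels they produce are hypothetical, and the sketch works on an in-memory array of plain objects instead of a live collection (so there is no _id or BinData field).

```javascript
// Hypothetical sketch of Variety-style analysis over in-memory documents.
// Not Variety's actual code: the type labels are our own.
function typeName(value) {
  if (value === null) return "null";
  if (Array.isArray(value)) return "Array";
  if (value instanceof Date) return "Date";
  const t = typeof value;
  return t.charAt(0).toUpperCase() + t.slice(1); // "String", "Number", "Object", ...
}

function analyzeSchema(docs) {
  const stats = new Map(); // key -> { types: { typeName: count }, occurrences }
  for (const doc of docs) {
    for (const [key, value] of Object.entries(doc)) {
      if (!stats.has(key)) stats.set(key, { types: {}, occurrences: 0 });
      const entry = stats.get(key);
      const t = typeName(value);
      entry.types[t] = (entry.types[t] || 0) + 1;
      entry.occurrences += 1;
    }
  }
  // percents = share of the analyzed documents that contain the key
  return [...stats.entries()]
    .map(([key, s]) => ({
      key,
      types: s.types,
      occurrences: s.occurrences,
      percents: (100 * s.occurrences) / docs.length,
    }))
    .sort((a, b) => b.occurrences - a.occurrences);
}

const users = [
  { name: "Tom", bio: "A nice guy.", pets: ["monkey", "fish"], someWeirdLegacyKey: "I like Ike!" },
  { name: "Dick", bio: "I swordfight.", birthday: new Date("1974/03/14") },
  { name: "Harry", pets: "egret", birthday: new Date("1984/03/14") },
  { name: "Geneviève", bio: "Ça va?" },
  { name: "Jim" },
];

for (const row of analyzeSchema(users)) console.log(row);
```

Counting per type, rather than keeping only the first type seen, is what exposes a field like pets that is an array in one document and a string in another.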



In production, however, collections can be very large. Scanning every document in a collection of 1 billion documents would take a long time, so if we only want to analyze 1,000 documents, we can use the limit parameter:


$ mongo test --eval "var collection = 'users', limit = 1000" variety.js

+----------------------------------------------------+
| key         | types       | occurrences | percents |
| ----------- | ----------- | ----------- | -------- |
| _id         | ObjectId    |        1000 |    100.0 |
| name        | String      |        1000 |    100.0 |
| someBinData | BinData-old |        1000 |    100.0 |
+----------------------------------------------------+


Because MongoDB supports embedding, many join-like queries become unnecessary, and denormalization reduces random I/O, so our documents are quite likely to contain nested objects. When nesting runs many levels deep, the extra keys clutter the report; we can limit how deep Variety descends with maxDepth. For example:


db.users.insert({name:"Walter", someNestedObject:{a:{b:{c:{d:{e:1}}}}}});

$ mongo test --eval "var collection = 'users'" variety.js

+----------------------------------------------------------------+
| key                        | types    | occurrences | percents |
| -------------------------- | -------- | ----------- | -------- |
| _id                        | ObjectId |           1 |    100.0 |
| name                       | String   |           1 |    100.0 |
| someNestedObject           | Object   |           1 |    100.0 |
| someNestedObject.a         | Object   |           1 |    100.0 |
| someNestedObject.a.b       | Object   |           1 |    100.0 |
| someNestedObject.a.b.c     | Object   |           1 |    100.0 |
| someNestedObject.a.b.c.d   | Object   |           1 |    100.0 |
| someNestedObject.a.b.c.d.e | Number   |           1 |    100.0 |
+----------------------------------------------------------------+

$ mongo test --eval "var collection = 'users', maxDepth = 3" variety.js

+----------------------------------------------------------+
| key                  | types    | occurrences | percents |
| -------------------- | -------- | ----------- | -------- |
| _id                  | ObjectId |           1 |    100.0 |
| name                 | String   |           1 |    100.0 |
| someNestedObject     | Object   |           1 |    100.0 |
| someNestedObject.a   | Object   |           1 |    100.0 |
| someNestedObject.a.b | Object   |           1 |    100.0 |
+----------------------------------------------------------+
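The effect of maxDepth can be illustrated with a small recursive key collector. This is a hypothetical sketch, not Variety's code. It assumes that a key's depth is the number of dot-separated segments in its path, which is consistent with the two outputs above: with maxDepth = 3, the deepest reported key is someNestedObject.a.b.

```javascript
// Hypothetical sketch: collect dotted key paths, stopping at maxDepth.
// Depth = number of path segments, so "someNestedObject.a.b" has depth 3.
function isPlainObject(value) {
  return value !== null && typeof value === "object" &&
    !Array.isArray(value) && !(value instanceof Date);
}

function collectKeys(doc, maxDepth = Infinity, prefix = "", depth = 1, out = []) {
  for (const [key, value] of Object.entries(doc)) {
    const path = prefix ? `${prefix}.${key}` : key;
    out.push(path);
    // Descend into a nested object only while we are still above the depth limit.
    if (isPlainObject(value) && depth < maxDepth) {
      collectKeys(value, maxDepth, path, depth + 1, out);
    }
  }
  return out;
}

const walter = { name: "Walter", someNestedObject: { a: { b: { c: { d: { e: 1 } } } } } };
console.log(collectKeys(walter));    // all 7 paths, down to someNestedObject.a.b.c.d.e
console.log(collectKeys(walter, 3)); // stops at someNestedObject.a.b
```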


We can also restrict the analysis to documents that match a query, for example only those whose caredAbout field is true:


$ mongo test --eval "var collection = 'users', query = {'caredAbout':true}" variety.js


Or sort the documents before they are analyzed:


$ mongo test --eval "var collection = 'users', sort = { updated_at : -1 }" variety.js
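Conceptually, query, sort, and limit narrow down which documents are analyzed before any counting happens, much like a find(query).sort(sort).limit(limit) chain. The sketch below mimics that selection step in memory. selectDocuments is a hypothetical helper, and it deliberately supports only simple equality matches and a single sort field, far less than the real MongoDB query language.

```javascript
// Hypothetical in-memory stand-in for find(query).sort(sort).limit(limit).
// Supports only equality matches and one sort key (1 = ascending, -1 = descending).
function selectDocuments(docs, { query = {}, sort = null, limit = Infinity } = {}) {
  // filter() returns a new array, so sorting it below does not mutate `docs`.
  const result = docs.filter((doc) =>
    Object.entries(query).every(([field, expected]) => doc[field] === expected)
  );
  if (sort) {
    const [[field, direction]] = Object.entries(sort);
    result.sort((a, b) => {
      if (a[field] === b[field]) return 0;
      return (a[field] < b[field] ? -1 : 1) * direction;
    });
  }
  return result.slice(0, limit);
}

const docs = [
  { name: "a", caredAbout: true, updated_at: 1 },
  { name: "b", caredAbout: false, updated_at: 3 },
  { name: "c", caredAbout: true, updated_at: 2 },
];

// Only cared-about documents, newest first, at most one of them.
console.log(selectDocuments(docs, { query: { caredAbout: true }, sort: { updated_at: -1 }, limit: 1 }));
```

Applying the query before the limit matters: limit = 1000 then means "1,000 matching documents", not "the matching subset of the first 1,000".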


We can also specify the format of the analysis results:


$ mongo test --quiet --eval "var collection = 'users', outputFormat='json'" variety.js


In production we generally do not run the analysis on the primary. Instead, we can run it on a hidden secondary with priority 0, in which case we need to set slaveOk:


$ mongo secondary.replicaset.member:31337/somedb --eval "var collection = 'users', slaveOk = true" variety.js


We can also persist the analysis results in MongoDB:


$ mongo test --quiet --eval "var collection = 'users', persistResults=true" variety.js


and specify where they are stored:


    • resultsDatabase: the database in which the analysis results are stored
    • resultsCollection: the collection in which the analysis results are stored
    • resultsUser: the user for the instance that stores the analysis results
    • resultsPass: the password for that user

$ mongo test --quiet --eval "var collection = 'users', persistResults=true, resultsDatabase='db.example.com/variety'" variety.js


Why use Variety?


Although MongoDB is schema free, in most cases we want each field to have a consistent type.



Inconsistent field types can lead to data errors. Imagine a field whose type is not uniform without our knowing it: business queries may well miss data or return inaccurate results.



Moreover, in production the application is constantly iterating: requirements grow and fields change. Without a standardized check after each release, some data in the database may end up inconsistent; for example, some documents have a field and others do not. Variety can help us find these problems too.
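The query problem is easy to reproduce in miniature. In the sketch below, a hypothetical age field is stored as a number in one document and as a string in another. A strict equality filter misses one of them, just as a MongoDB equality match such as {age: 30} does not match the string "30". The small findMixedTypeFields helper, also hypothetical, flags such fields; the data is made up for illustration.

```javascript
// Hypothetical data: "age" was written as a number by one app version
// and as a string by another.
const people = [
  { name: "Ann", age: 30 },
  { name: "Bob", age: "30" },
  { name: "Cid", age: 41 },
];

// Strict equality does not convert "30" to 30, so Bob is silently missing.
const thirty = people.filter((p) => p.age === 30);
console.log(thirty.map((p) => p.name));

// Flag every top-level field whose values have more than one type.
function findMixedTypeFields(docs) {
  const types = new Map(); // field -> Set of observed types
  for (const doc of docs) {
    for (const [field, value] of Object.entries(doc)) {
      if (!types.has(field)) types.set(field, new Set());
      types.get(field).add(Array.isArray(value) ? "array" : typeof value);
    }
  }
  return [...types].filter(([, seen]) => seen.size > 1).map(([field]) => field);
}

console.log(findMixedTypeFields(people)); // [ 'age' ]
```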


Document Validation


MongoDB 3.2 introduced many powerful features, and Document Validation deserves special mention. In my view, it is MongoDB's official way of saying "schema free, but some rules may still be needed", though that is purely my own speculation.



A brief introduction to Document Validation:



Document Validation lets us impose restrictions on a schema-free MongoDB collection. This does not turn MongoDB into a relational database; personally, I feel it actually highlights MongoDB's schema-free nature: stay schema free where flexibility is needed, and apply rules where they are needed.



Suppose we want to create a new collection, contacts, whose documents must satisfy at least one of the following rules:



The phone field is a string, the email field ends with "@mongodb.com", or the status field is "Unknown" or "Incomplete".


// A document is accepted if it matches at least one condition ($or).
db.createCollection( "contacts",
   { validator: { $or:
      [
         { phone: { $type: "string" } },
         { email: { $regex: /@mongodb\.com$/ } },
         { status: { $in: [ "Unknown", "Incomplete" ] } }
      ]
   }
} )
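To see which documents this $or validator accepts, here is a plain JavaScript sketch of the same rule. isValidContact is a hypothetical helper that mirrors the three conditions; it is not how the server evaluates validators. It also shows why the dot in the regex should be escaped: in /@mongodb.com$/ the unescaped . matches any character, so an address such as amanda@mongodbXcom would pass as well.

```javascript
// Hypothetical mirror of the contacts validator: a document passes
// if ANY of the three conditions holds ($or semantics).
const EMAIL_RE = /@mongodb\.com$/; // "\." matches a literal dot only

function isValidContact(doc) {
  return (
    typeof doc.phone === "string" ||
    (typeof doc.email === "string" && EMAIL_RE.test(doc.email)) ||
    ["Unknown", "Incomplete"].includes(doc.status)
  );
}

console.log(isValidContact({ name: "Amanda", status: "Updated" })); // false
console.log(isValidContact({ phone: "555-0100" }));                // true
console.log(isValidContact({ email: "amanda@mongodb.com" }));      // true
console.log(isValidContact({ email: "amanda@mongodbXcom" }));      // false
console.log(/@mongodb.com$/.test("amanda@mongodbXcom"));           // true: unescaped dot
```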


For an existing collection, we can add the same validator with the collMod command:


db.runCommand( {
   collMod: "contacts",
   validator: { $or: [
      { phone: { $type: "string" } },
      { email: { $regex: /@mongodb\.com$/ } },
      { status: { $in: [ "Unknown", "Incomplete" ] } }
   ] },
   validationLevel: "moderate"
} )


Note the additional validationLevel parameter, which controls how strictly MongoDB applies the validation rules to existing documents:


    • strict (the default): the rules are applied to all inserts and all updates;


    • moderate: the rules are applied to inserts and to updates of existing documents that already satisfy them; updates to existing documents that do not satisfy the rules are not checked.
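The difference between the levels comes down to which writes get checked, which can be sketched as a small decision function. mustValidate and its arguments are hypothetical names; the rules encode MongoDB's documented behavior for strict and moderate, plus off, which disables validation entirely.

```javascript
// Hypothetical sketch of which writes are checked under each validationLevel.
// op: "insert" or "update"; existingIsValid: whether the document being
// updated already satisfied the validator (ignored for inserts).
function mustValidate(level, op, existingIsValid) {
  if (level === "off") return false; // validation disabled
  if (level === "strict") return true; // every insert and update is checked
  if (level === "moderate") return op === "insert" || existingIsValid === true;
  throw new Error(`unknown validationLevel: ${level}`);
}

console.log(mustValidate("strict", "update", false));   // true
console.log(mustValidate("moderate", "insert"));        // true
console.log(mustValidate("moderate", "update", true));  // true
console.log(mustValidate("moderate", "update", false)); // false: legacy invalid document
```

This is why moderate suits the collMod case: documents written before the validator existed can still be updated without first being fixed.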



There is also the validationAction parameter, which specifies how MongoDB handles inserts and updates that violate the validation rules:


    • error (the default): MongoDB rejects any insert or update that violates the rules.
    • warn: MongoDB records the violation in its log but allows the insert or update. The log entry looks like this:

2015-10-15T11:20:44.260-0400 W STORAGE [conn3] Document would fail validation collection: example.contacts doc: { _id: ObjectId('561fc44c067a5d85b96274e4'), name: "Amanda", status: "Updated" }
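The two actions differ only in what happens after a document fails validation. Below is a hypothetical in-memory sketch of that branch. writeDocument, the array standing in for a collection, the simplified validator that only checks phone, and the log text are all illustrations; the log string merely imitates the entry above and is not produced by MongoDB.

```javascript
// Hypothetical sketch of validationAction handling for an insert.
// "error" rejects the write; "warn" logs it and lets it through.
function writeDocument(collection, doc, { validate, action = "error", log = [] }) {
  if (!validate(doc)) {
    if (action === "error") {
      throw new Error("Document failed validation");
    }
    log.push(`Document would fail validation doc: ${JSON.stringify(doc)}`);
  }
  collection.push(doc);
  return collection.length;
}

const isValid = (doc) => typeof doc.phone === "string"; // simplified validator
const contacts = [];
const log = [];

writeDocument(contacts, { name: "Amanda", status: "Updated" }, { validate: isValid, action: "warn", log });
console.log(contacts.length, log.length); // 1 1

try {
  writeDocument(contacts, { name: "Bob" }, { validate: isValid, action: "error", log });
} catch (e) {
  console.log(e.message); // Document failed validation
}
console.log(contacts.length); // still 1: the rejected document was not stored
```

A common rollout pattern is to start with warn, review the log for violations, clean up the data, and only then switch to error.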




Limitations of validation
    • Validation cannot be set on collections in the admin, local, and config databases;
    • Validation cannot be set on system.* collections.

