MongoDB Report Instance Scheme selection




Background introduction


In our production environment we use replica sets. To distribute the load across the database servers, we split the databases so that they run on different replica sets.


We run our applications on MongoDB replica sets. From time to time there are reporting requirements, generally to analyze user behavior or to compute custom business metrics, and there are also search engine requirements: SOLR pulls incremental data from oplog.rs to index new product information.


These report queries and search engine queries must, as far as possible, not affect normal business operation, so they cannot run directly against the production database. After discussion between development and operations, we planned at the start of the project to isolate reporting tasks so they would not interfere with production tasks.


The trouble with mixing production and reporting


The working set is the subset of the entire database that MongoDB reads and writes over any given time interval. Active users in the production environment manipulate document data, and the operating system keeps those documents in physical memory.


Note: do not let your working set grow larger than memory! You can use MongoDB's monitoring service, Cloud Manager, to monitor your instances. If this does happen you need sharding, so capacity planning is important and deserves treatment as a topic of its own. Even if the database is hundreds or thousands of times the size of available memory, MongoDB can still run efficiently if you plan the architecture and optimize the indexes up front. Data outside the working set stays consistent with disk; when users go idle, the documents they manipulated fall out of use, and the memory they consumed is reclaimed to serve new active users' memory requests.


Report applications query large amounts of data, typically without revisiting the same data, and each report may touch an entirely different data set. This means memory must continually be handed over to new document reads. If you run the reporting app and the production app on the same instance, the reporting app competes with production for memory, continually evicting the active users' data, which your production application must then keep reloading (getMore). Database server performance then fluctuates.


Report applications also issue large numbers of count, aggregate, MapReduce, and other aggregation operations. These operations are not cheap for MongoDB, so separating them from production tasks is good practice.
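
To make the workload concrete, here is a minimal sketch of the kind of aggregation a report might run. This is an illustrative example, not code from the article: it assumes pymongo 3.x, and the host, database, collection, and field names are invented.

from pymongo import MongoClient

client = MongoClient('xucy.local', 27017)

# Hypothetical daily report: completed orders per user, busiest users first.
pipeline = [
    {'$match': {'status': 'completed'}},
    {'$group': {'_id': '$user_id', 'orders': {'$sum': 1}}},
    {'$sort': {'orders': -1}},
]
for row in client.my_application.orders.aggregate(pipeline):
    print(row)

Even a modest pipeline like this scans far more documents than it returns, which is exactly why it competes with production for the working set.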


Use a dedicated report instance in the replica set


A MongoDB replica set provides online durability by replicating data to all nodes in the set, and gives clients seamless failover. It contains one primary node that accepts writes; the remaining nodes are read replicas. When conditions require it, an election determines which node becomes primary. A replica set should contain an odd number of members to help elections complete quickly.


It is impossible to judge whether an unreachable machine is down or merely cut off by a network partition. So if the majority of nodes in a replica set go offline (say, 2 of 3 members), even a remaining healthy primary demotes itself to a read-only secondary. Without this rule, a network partition could leave multiple machines each declaring themselves primary, and multiple primaries lead to horrible data inconsistency.


Therefore a replica set contains at least 3 members, which provides tolerance for the failure of one machine.


MongoDB's official documentation recommends restricting report queries to dedicated nodes. Reports basically perform no writes; they compute statistics over eventually consistent data. If the extracted data lags by seconds or minutes, that is no problem for a daily report. But if your count statistics are missing some operations, the report data will be inaccurate.


You can build a dedicated report node in a MongoDB replica set environment in two ways: hidden replica set members (hidden members), or read preferences (read preference) combined with tag sets. The first method is simpler, the second more flexible.


The following architecture diagram shows dedicated nodes serving the reporting requirement:

[Figure clip_image002: replica set architecture with a dedicated reporting node (http://s3.51cto.com/wyfs02/M00/89/F0/wKioL1gikuyRDjsbAACI5-Tai8g364.jpg)]



Hidden Member Scheme


Reference: https://docs.mongodb.com/manual/tutorial/configure-a-hidden-replica-set-member/


Hidden members are part of the replica set, but they cannot become primary and are invisible to client applications. Hidden members can vote in elections.


A hidden member of a replica set is configured with priority: 0 to prevent it from being elected primary, and hidden: true to prevent clients from routing read operations to it through the replica set connection, even when they specify a read preference of secondary.


To read data from a hidden member you must connect to it directly and specify slaveOk; you cannot reach it through the MongoReplicaSetClient class.


Configuring a hidden member


You can use the mongo shell to hide a member of a replica set:

$ mongo admin -uxucy -p
PRIMARY> conf = rs.config()
{
    "_id" : "test",
    "version" : 1,
    "members" : [
        { "_id" : 0, "host" : "xucy.local:27017" },
        { "_id" : 1, "host" : "xucy.local:28017" },
        { "_id" : 2, "host" : "xucy.local:29017" }
    ]
}
PRIMARY> conf.members[1].priority = 0
PRIMARY> conf.members[1].hidden = true
PRIMARY> conf.version += 1
PRIMARY> rs.reconfig(conf)


xucy.local:28017 is now hidden. It continues to replicate operations and vote in elections as usual, but clients connected to the replica set will not read from it, even if xucy.local:29017 goes offline.


Example report application connection code, Ruby version:

require 'mongo'
reporting = Mongo::MongoClient.new("xucy.local", 28017, slave_ok: true)
reporting['my_application']['users'].aggregate(...)


Limitations


Using hidden members is one of the simplest ways to configure instances for dedicated workloads such as reporting and search engine access; however, there are some limitations that need explaining.


Hidden members cannot serve reads in an emergency


With 2 normal members and one hidden member, a replica set's tolerance for write failure is equivalent to a regular 3-member set. However, if you lose two nodes, your production application cannot gracefully degrade to read-only mode, because the hidden member does not accept reads from replica set clients. If you like the simplicity of hidden-member access and budget is no object, the lavish option is a 5-member replica set (with one hidden member).


Replica set wrapper code cannot be reused


Many teams build application-specific wrapper code around the replica set connection methods provided by the MongoDB driver, adding their own bookkeeping for the replica set connection. Because the report instance requires a standalone direct connection, that wrapper cannot be reused, as the sketch below shows.
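
For comparison with the Ruby example above, a minimal Python sketch of such a standalone connection, assuming pymongo 3.x (host and names follow the earlier examples; the read preference is stated explicitly to document intent):

from pymongo import MongoClient

# Direct (standalone) connection: no replicaSet parameter, so the driver
# talks to this one host instead of discovering the set. Set-aware wrapper
# code built around a replica set client cannot be reused here.
reporting = MongoClient('xucy.local', 28017,
                        readPreference='secondaryPreferred')
doc = reporting.my_application.users.find_one()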



Tagged Member Scheme


Reference: https://docs.mongodb.com/manual/tutorial/configure-replica-set-tag-sets/


Tagged members are the more complex but more flexible method: route report queries to a dedicated node using tags and read preferences.


Set a member to priority: 0 to prevent it from being elected, but do not hide it; instead assign it the tag use: reporting:

PRIMARY> conf = rs.config()
{
    "_id" : "test",
    "version" : 1,
    "members" : [
        { "_id" : 0, "host" : "xucy.local:27017" },
        { "_id" : 1, "host" : "xucy.local:28017" },
        { "_id" : 2, "host" : "xucy.local:29017" }
    ]
}
PRIMARY> conf.members[1].priority = 0
PRIMARY> conf.members[1].tags = { "use": "reporting" }
PRIMARY> conf.version += 1
PRIMARY> rs.reconfig(conf)


With this configuration, xucy.local:28017 can never become primary. However, if the other two machines become unreachable, your application can still send read requests to the report server; it continues to run, and your reporting is not suspended during such an event.


Example report application connection code, Python version:

from pymongo import MongoReplicaSetClient
from pymongo.read_preferences import ReadPreference

rep_set = MongoReplicaSetClient(
    'xucy.local:27017,xucy.local:28017,xucy.local:29017',
    replicaSet='test',
    read_preference=ReadPreference.SECONDARY,
    tag_sets=[{'use': 'reporting'}]
)
rep_set.my_application.users.aggregate(...)


For report applications, take care that the report load does not end up on the only remaining ordinary secondary, or on the primary while it is available, because that mixes reporting back in with production.


The configuration above sends report queries only to secondary members tagged use: reporting; if no such member is available, we should simply stop rather than fall through. In practice, if you find no matching member, throw an exception and handle it in your own code. Also monitor the state, for example with a check such as reporting_system.ok(), and branch when an exception is detected, as in the sketch below.
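
A minimal sketch of such a guard, reusing the rep_set client from the Python example above. reporting_ok() is a hypothetical stand-in for the reporting_system.ok() check mentioned, not an API from any library:

from pymongo.errors import AutoReconnect, ConnectionFailure

def reporting_ok(rep_set):
    # rep_set was created with read_preference=SECONDARY and
    # tag_sets=[{'use': 'reporting'}], so this probe can only be
    # answered by a tagged reporting member.
    try:
        rep_set.my_application.users.find_one()
        return True
    except (AutoReconnect, ConnectionFailure):
        return False

if not reporting_ok(rep_set):
    raise RuntimeError('no use:reporting member available; halting report run')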


Benefits and considerations


Using tags and read preferences provides some flexibility relative to hidden members.


Easy to add report instances


Because the routing lives in your connection code rather than being pinned to a dedicated host, you can add more report instances by simply adding members and tagging them, like this:

PRIMARY> rs.add({ _id: 3, host: "xucy.local:30017", priority: 0, tags: { "use": "reporting" } })


Your original code picks up the new report instance automatically, and the replica set keeps running without triggering an election or disconnecting clients.


Report instances can be repurposed or removed


When you want to hand a report instance currently in use over to another application, the reporting tag can be moved or removed as needed. A reconfiguration like this triggers an election and reconnects all clients, which is acceptable. Note: the same approach also works in reverse; you can increase the number of ordinary production instances and spread production reads across the replica members.


Some drivers require manual synchronization


Check your driver's documentation. For example, the Ruby driver (as of 1.9.2) does not refresh its view of the replica set unless the client is explicitly initialized with refresh_mode: :sync.



Using SOLR to generate a full-text index


The simple, approachable replica set configuration is one of the reasons I like MongoDB. Developing and deploying report instances is straightforward, whether you use hidden members or tagged members. We also use report instances in production for SOLR to generate full-text indexes.


SOLR is a stand-alone enterprise search application server that provides a web-service-like API. Users submit XML files in a prescribed format to the search server over HTTP to generate indexes, or issue lookup requests via HTTP GET and receive results in XML.


Choosing a way to read from the report instance


The initial plan was to use mongo-connector to integrate MongoDB with SOLR for incremental indexing (http://ultrasql.blog.51cto.com/9591438/1696083/).


mongo-connector synchronizes MongoDB data to other system components; for example, it can sync data to SOLR, Elasticsearch, or another MongoDB cluster. It is built on MongoDB's replica set replication: it tails and analyzes the oplog to achieve synchronization. Installation, configuration, and startup are covered in the official documentation.
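
For reference, an invocation typically looks something like the line below (flags as documented in the mongo-connector README; verify against the docs for your version, and the hosts here are placeholders):

mongo-connector -m xucy.local:27017 -t http://localhost:8983/solr -d solr_doc_manager

Here -m points at a replica set member whose oplog is tailed, -t is the target SOLR URL, and -d selects the SOLR document manager.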


Because it is a single-process implementation, its throughput was not high enough for us at the time; it would be better if it were changed to take advantage of multiple cores.


MongoDB Tailable Cursors


MongoDB has a feature called tailable cursors, which resembles the tail -f command: you run a query against a capped collection and, when the initial results are exhausted, you keep the cursor open instead of closing it and continue reading newly appended documents from it.


On a capped collection with heavy writes, tailable cursors can be used where no index is usable. For example, MongoDB replication itself uses a tailable cursor to tail the primary's oplog.


Consider the following behaviors of tailable cursors:

    • Tailable cursors do not use indexes and return documents in natural order.

    • Because tailable cursors do not use an index, the initial scan of the query is expensive; but once the cursor is initialized, subsequently fetching newly added documents is very fast.

    • A tailable cursor becomes dead (invalid) if it encounters one of the following conditions:

      • The query has no matching results.

      • The cursor returns the document at the end of the collection, and then the application deletes the document.

A dead (zombie) cursor has an ID of 0.


DBQuery.Option.awaitData


When used with a tailable cursor, this option makes the server block for a short period when it reaches the end of the data, instead of returning immediately, and then return any newly arrived documents.


Example of tailing the oplog:

use local
var cursor = db.oplog.rs.find(
    { "op": "u", "ns": "mydb.product" },
    { "ts": 1, "o2._id": 1 }
).addOption(DBQuery.Option.tailable).addOption(DBQuery.Option.awaitData);

while (cursor.hasNext()) {
    var doc = cursor.next();
    printjson(doc);
}


Cursor Methods for version 2.6:

cursor.addOption()

https://docs.mongodb.com/v2.6/reference/method/cursor.addOption/


Cursor Methods for version 3.2:

cursor.tailable()

https://docs.mongodb.com/manual/reference/method/cursor.tailable/


Based on this feature, we developed a Java application. It first does a full synchronization of the data into SOLR to generate the index and records the sync time. A tailable cursor then reads oplog.rs, comparing entries against the last recorded sync time; when it sees a new change, a separate process asynchronously fetches the latest data referenced by the log entry and updates the corresponding SOLR document.
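
The Java application itself is not shown in this article, but the tailing loop at its core looks roughly like the following Python sketch, assuming pymongo 3.x. The op/ns filter is borrowed from the shell example above; durable checkpointing of last_ts and the SOLR update are left as comments:

import time
from pymongo import MongoClient, CursorType

client = MongoClient('xucy.local', 28017)
oplog = client.local['oplog.rs']

last_ts = None  # in practice, load the last synced timestamp from durable storage

while True:
    query = {'op': 'u', 'ns': 'mydb.product'}
    if last_ts is not None:
        query['ts'] = {'$gt': last_ts}
    # TAILABLE_AWAIT keeps the cursor open and blocks briefly for new data,
    # like DBQuery.Option.tailable plus awaitData in the shell.
    cursor = oplog.find(query, cursor_type=CursorType.TAILABLE_AWAIT)
    for doc in cursor:
        last_ts = doc['ts']
        # fetch the changed document by doc['o2']['_id'] and update SOLR here
    time.sleep(1)  # the cursor died (e.g., nothing matched yet); retry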

[Figure clip_image004: full synchronization plus oplog tailing pipeline into SOLR (http://s3.51cto.com/wyfs02/M00/89/F0/wKioL1giku3RtFrtAACpmMHkl7k565.jpg)]


This article is from the SQL Server Deep Dive blog; please keep this source: http://ultrasql.blog.51cto.com/9591438/1870982
