MongoDB: What Is It? Where Does It Come From?

Source: Internet
Author: User
Tags: benchmark, connection pooling, failover, mongodb, sharding

Intended readers: beginners who do not yet have an overall picture of MongoDB. Here is a brief introduction:


II. Introduction to MongoDB

MongoDB is a high-performance, open-source, schema-free, document-oriented database and one of the most popular NoSQL databases today. In many scenarios it can replace a traditional relational database or a key/value store. Mongo is written in C++. Its official website is http://www.mongodb.org/, where readers can find more detailed information.


Aside: What is NoSQL?

NoSQL, short for "Not Only SQL", refers to non-relational databases. This next generation of databases mainly addresses a few key points: being non-relational, distributed, open source, and horizontally scalable. The original target was large-scale web applications. The movement began in early 2009 and is characterized by schema freedom, support for simple replication, simple APIs, eventual consistency (not ACID), and large data volumes. Key-value stores are the most widely used kind of NoSQL database; other kinds include document stores, column stores, graph databases, XML databases, and so on.


Characteristics:

High performance, easy to deploy, easy to use, and convenient for storing data. The main features are:

Collection-oriented storage, which makes it easy to store object-type data.

Schema-free.

Supports dynamic queries.

Supports full indexing, including on embedded objects.

Supports queries.

Supports replication and recovery.

Uses efficient binary data storage, including for large objects such as video.

Handles sharding automatically to support cloud-scale scalability.

Drivers are available for Python, PHP, Ruby, Java, C, C#, JavaScript, Perl, and C++, and the community also provides drivers for Erlang and the .NET platform.

The file storage format is BSON (an extension of JSON).

Can be accessed over the network.
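BSON is a binary encoding of JSON-like documents. Below is a minimal sketch of its layout, following the public BSON specification but handling only three element types (0x02 string, 0x03 embedded document, 0x10 int32): every document starts with a little-endian int32 giving its total length and ends with a NUL byte, and each element is a type byte, a NUL-terminated key, and the value. The function name `encode_document` is hypothetical; real drivers ship complete encoders.

```python
import struct


def _cstring(s: str) -> bytes:
    # BSON "cstring": UTF-8 bytes followed by a single NUL terminator
    return s.encode("utf-8") + b"\x00"


def encode_document(doc: dict) -> bytes:
    """Encode a dict whose values are only str, 32-bit int, or nested dict."""
    body = b""
    for key, value in doc.items():
        if isinstance(value, bool):
            raise TypeError("bool is not handled in this sketch")
        if isinstance(value, str):
            data = value.encode("utf-8") + b"\x00"
            # 0x02 = string: int32 byte length (including the NUL), then the bytes
            body += b"\x02" + _cstring(key) + struct.pack("<i", len(data)) + data
        elif isinstance(value, int):
            # 0x10 = int32, little-endian (struct.error if out of range)
            body += b"\x10" + _cstring(key) + struct.pack("<i", value)
        elif isinstance(value, dict):
            # 0x03 = embedded document, encoded recursively
            body += b"\x03" + _cstring(key) + encode_document(value)
        else:
            raise TypeError(f"unsupported type: {type(value).__name__}")
    # total length = 4-byte length prefix + elements + trailing NUL
    return struct.pack("<i", len(body) + 5) + body + b"\x00"
```

For example, `encode_document({"hello": "world"})` produces 22 bytes. The length prefixes let a reader skip over whole strings or embedded documents without scanning them.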

Capabilities:


Collection-oriented storage: suitable for storing objects and data in JSON form.

Dynamic queries: Mongo supports rich query expressions. Queries use a JSON-style syntax, which makes it easy to query objects and arrays embedded in a document.

Full index support: indexes can cover embedded objects and arrays in a document. Mongo's query optimizer analyzes the query expression and generates an efficient query plan.

Query monitoring: Mongo includes a monitoring tool for analyzing the performance of database operations.

Replication and automatic failover: Mongo supports data replication between servers, both in master-slave mode and as mutual replication between servers. The main goals of replication are redundancy and automatic failover.

Efficient traditional storage: supports binary data and large objects (such as photos or images).

Auto-sharding for cloud-scale scalability: automatic sharding supports horizontally scaled database clusters, and additional machines can be added dynamically.
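To make the "JSON-style query" idea concrete, here is a toy, in-memory matcher for a small subset of MongoDB-style filters: plain equality, dotted paths into embedded documents such as `"author.name"`, array fields that match when any element matches, and the `$gt`, `$lt`, and `$in` operators. It only models the query semantics, not how the server evaluates queries (there are no indexes or query planner here), and all helper names are hypothetical.

```python
from typing import Any

_MISSING = object()  # sentinel for "field not present"


def get_path(doc: dict, path: str) -> Any:
    """Follow a dotted path such as "author.name" through embedded documents."""
    current: Any = doc
    for part in path.split("."):
        if isinstance(current, dict) and part in current:
            current = current[part]
        else:
            return _MISSING
    return current


def _is_operator_doc(cond: Any) -> bool:
    return isinstance(cond, dict) and bool(cond) and all(k.startswith("$") for k in cond)


def _compare(value: Any, cond: Any) -> bool:
    """Test one non-array value against a literal or an operator document."""
    if not _is_operator_doc(cond):
        return value == cond
    try:
        for op, arg in cond.items():
            if op == "$gt":
                ok = value > arg
            elif op == "$lt":
                ok = value < arg
            elif op == "$in":
                ok = value in arg
            else:
                raise ValueError(f"operator not supported in this sketch: {op}")
            if not ok:
                return False
    except TypeError:
        return False  # simplification: values of incomparable types never match
    return True


def matches(doc: dict, query: dict) -> bool:
    """True if every condition in `query` holds for `doc` (conditions are ANDed)."""
    for path, cond in query.items():
        value = get_path(doc, path)
        if value is _MISSING:
            return False  # simplification: missing fields never match here
        if isinstance(value, list):
            # An array matches if it equals a literal outright,
            # or if any one of its elements satisfies the condition.
            whole = not _is_operator_doc(cond) and value == cond
            if not (whole or any(_compare(item, cond) for item in value)):
                return False
        elif not _compare(value, cond):
            return False
    return True
```

A filter such as `{"author.age": {"$lt": 28}, "tags": {"$in": ["python", "go"]}}` reaches into an embedded object and an array in one expression, which is the convenience the feature list describes.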

Suitable use cases:


Website data: Mongo is well suited to real-time inserts, updates, and queries, and it offers the replication and high scalability that real-time website data storage requires.

Caching: Because of its high performance, Mongo is also suitable as a caching layer in an information infrastructure. After a system restart, a persistent cache layer built on Mongo can keep the underlying data sources from being overloaded.

Large volumes of low-value data: Storing such data in a traditional relational database can be expensive, so programmers have often fallen back on plain files instead.

Highly scalable scenarios: Mongo is well suited to databases made up of dozens or hundreds of servers, and built-in support for a MapReduce engine is already on Mongo's roadmap.

Storage of objects and JSON data: Mongo's BSON data format is well suited to storing and querying data in document form.



In China, MongoDB is far more popular than other NoSQL databases among small and medium-sized companies.

If your business grows rapidly and your data volume becomes large, read the following article first:

http://news.cnblogs.com/n/121155/


Why not use MongoDB?


1) By default, MongoDB uses an unsafe write mode in order to win benchmarks


Unless you call getLastError(), MongoDB returns from a write without confirming that the write has completed. This introduces at least two problems:


In a concurrent environment (connection pooling, etc.), a read issued after a write has "finished" may not see that write, because MongoDB provides no barrier that tells you when the write has actually been committed.

An unknown number of save operations can be silently discarded because they are still queued somewhere, for example in the TCP buffer. When the connection to the database drops, these pending writes are lost.

10gen CTO reply: This has nothing to do with benchmarks. It is an API design decision that leaves the choice to the user, because there are many ways to write data.
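The difference between the two write modes can be modelled with a toy, single-connection server whose writes sit in a queue until they are applied. Everything here (`ToyServer`, `insert`, the `acknowledged` flag) is hypothetical; in the system described, the choice was made by calling getLastError() after a write, and current drivers express it as a write concern (w=0 for unacknowledged, w=1 or higher for acknowledged).

```python
from collections import deque


class ToyServer:
    """Hypothetical server: received writes wait in a queue until applied."""

    def __init__(self) -> None:
        self.pending = deque()  # writes in flight, e.g. still in a TCP buffer
        self.data = {}          # writes the server has actually applied

    def receive(self, key, value) -> None:
        self.pending.append((key, value))

    def apply_next(self) -> None:
        key, value = self.pending.popleft()
        self.data[key] = value

    def find(self, key):
        return self.data.get(key)

    def drop_connection(self) -> int:
        """Simulate a dropped connection or crash: queued writes are lost."""
        lost = len(self.pending)
        self.pending.clear()
        return lost


def insert(server: ToyServer, key, value, acknowledged: bool) -> bool:
    server.receive(key, value)
    if not acknowledged:
        return False  # fire-and-forget: we never learn whether it was applied
    # Acknowledged: wait until the queue, which ends with our write, is applied.
    while server.pending:
        server.apply_next()
    return True
```

Running it reproduces both failure modes listed above: a read right after a fire-and-forget write misses the data, and a dropped connection silently discards whatever was still queued.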


2) MongoDB can lose data in alarming ways


Here are the data problems we have experienced:


Data was lost for unknown reasons.

Recovering a corrupted database failed (this was before there was a transaction log).

Replication between the master and slave nodes had gaps, so the slave nodes were missing data that the master had. Yes, there is no checksum, and yes, the replication status still showed the slave nodes as up to date.

Replication sometimes stopped without any error. You have to monitor your replication status!

10gen CTO reply: 1) We have never had a data-loss bug that we did not fix right away. Can you give us the numbers of the issues you are referring to? At the very least we need to understand what happened, and if it is our problem, we will fix it immediately. 2) Isn't it normal that recovering data from a corrupted database is hard? That is why it is better to keep a master-slave setup as a backup. 3) Please give me the issue number; we have never received such a bug report. If it exists, it is indeed serious. 4) Admittedly, there may be no notification when this error condition occurs. However, you can monitor whether a write has been replicated by passing w=2 to getLastError.
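The w=2 option mentioned in the reply means a write is reported as successful only once it has reached the primary and at least one secondary. Below is a toy model of that rule, assuming each healthy secondary copies one oplog entry per "tick" and using `max_ticks` as a hypothetical stand-in for a write timeout; the class and its methods are illustrative, not a driver API. It also shows how w=2 surfaces a secondary that has silently stopped replicating.

```python
class ToyReplicaSet:
    """Hypothetical primary plus secondaries; each healthy secondary copies one op per tick."""

    def __init__(self, n_secondaries: int, stalled=()) -> None:
        self.oplog = []                    # ops written on the primary
        self.copied = [0] * n_secondaries  # how many ops each secondary has copied
        self.stalled = set(stalled)        # secondaries that silently stopped replicating

    def tick(self) -> None:
        for i in range(len(self.copied)):
            if i not in self.stalled and self.copied[i] < len(self.oplog):
                self.copied[i] += 1

    def nodes_with(self, index: int) -> int:
        # +1 for the primary, which always has its own writes
        return 1 + sum(1 for n in self.copied if n > index)

    def write(self, op, w: int = 1, max_ticks: int = 10) -> bool:
        """Append op on the primary; report success once w nodes hold it."""
        self.oplog.append(op)
        index = len(self.oplog) - 1
        for _ in range(max_ticks):
            if self.nodes_with(index) >= w:
                return True
            self.tick()
        return self.nodes_with(index) >= w  # False models a write-concern timeout
```

As with a real write-concern timeout, a `False` result does not undo the write on the primary; it only tells the client that replication was not confirmed.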


3) MongoDB takes a global write lock for every write operation


Under a heavy write load, this will kill you. If you run a blog, you may not care, because your read and write volume is low.


10gen CTO reply: Locking has always been a problem, but it will be better in 2.0 and better still in 2.2.
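A toy illustration of why a single process-wide write lock hurts write-heavy workloads: in this sketch, writers to different collections still queue behind the same lock, so at most one write is ever in progress. The class and helper are hypothetical and use Python threads only to show the serialization; they say nothing about how MongoDB's lock is actually implemented.

```python
import threading
import time


class GlobalLockStore:
    """Toy store where every write, to any collection, takes one global lock."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._active = 0                 # writers currently inside the lock
        self.max_concurrent_writers = 0  # highest value _active ever reached
        self.collections = {}            # collection name -> list of documents

    def insert(self, collection: str, doc) -> None:
        with self._lock:  # the same lock regardless of collection
            self._active += 1
            self.max_concurrent_writers = max(self.max_concurrent_writers, self._active)
            time.sleep(0.001)  # pretend the write takes a moment
            self.collections.setdefault(collection, []).append(doc)
            self._active -= 1


def run_writers(store: GlobalLockStore, n_threads: int, writes_each: int) -> None:
    """Start n_threads writers, each writing to its own collection."""
    def worker(i: int) -> None:
        for j in range(writes_each):
            store.insert(f"coll{i}", {"n": j})

    threads = [threading.Thread(target=worker, args=(i,)) for i in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
```

Even with eight writers touching eight different collections, `max_concurrent_writers` stays at 1; a finer-grained design would key the lock per database or per collection.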


4) MongoDB sharding stops working under heavy load


Adding a shard under heavy load is a nightmare. Mongo either migrates chunks so fast that the migration traffic eats up the bandwidth like a DoS attack, or it refuses to migrate any more chunks at all. This makes life very hard for high-traffic sites with heavy write loads.


10gen CTO reply: If the system is already overloaded, moving data will of course be difficult. I always say this very clearly: do not add shards when the system is already at its performance limit. That does not work.
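Range-based sharding divides the shard-key space into chunks, and a balancer moves chunks onto a newly added shard. The hypothetical sketch below models only that idea: `route` finds a key's chunk by binary search over the split points, and `balance_round` moves at most `max_moves` chunks per round. The knob makes the trade-off in the complaint visible: moving many chunks at once means a burst of migration traffic, while moving few leaves the new shard underused for longer. MongoDB's real balancer policy is more involved than this.

```python
import bisect


class ToyCluster:
    """Hypothetical range-sharded cluster with fixed chunk boundaries."""

    def __init__(self, split_points, first_shard: str) -> None:
        # chunk i covers [split_points[i-1], split_points[i]); the two ends are open
        self.split_points = sorted(split_points)
        self.shards = [first_shard]
        # initially every chunk lives on the first shard
        self.owner = [first_shard] * (len(self.split_points) + 1)

    def route(self, shard_key: int) -> str:
        """Binary-search for the chunk holding shard_key and return its shard."""
        return self.owner[bisect.bisect_right(self.split_points, shard_key)]

    def add_shard(self, name: str) -> None:
        self.shards.append(name)

    def balance_round(self, max_moves: int) -> int:
        """Move up to max_moves chunks from the fullest to the emptiest shard."""
        moved = 0
        while moved < max_moves:
            counts = {s: self.owner.count(s) for s in self.shards}
            fullest = max(counts, key=counts.get)
            emptiest = min(counts, key=counts.get)
            if counts[fullest] - counts[emptiest] <= 1:
                break  # already balanced to within one chunk
            self.owner[self.owner.index(fullest)] = emptiest  # "migrate" one chunk
            moved += 1
        return moved
```

With six chunks and `max_moves=1`, balancing onto a second shard takes three rounds; with a large `max_moves`, all three migrations happen in a single burst.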


5) Mongos is unreliable


The mongod/config server/mongos architecture is actually reasonable and clever. Unfortunately, mongos is complete garbage. Under load it crashes regularly, anywhere from every few hours to every few days. Process-restart supervision does not always help, because sometimes mongos throws an assertion that kills a critical thread while the process itself keeps running. Double fail.


Worst of all, the only workaround we found was to put HAProxy (a load balancer) in front of a pool of mongos instances and run a job that slowly cycles through them, periodically killing each one so that a fresh instance can be started. I'm not joking.


10gen CTO reply: We have not seen anything like this. Can you give us more details?


6) MongoDB once even deleted an entire database


In MongoDB 1.6, the replication setup would sometimes treat the wrong node (often an empty one) as the node with the most up-to-date data. The data on the other nodes was then wiped out (I'm talking about 700GB of good data), because the empty node's data was synchronized back onto the nodes that held the data. A database should never do this. In that situation it should throw an error and let the DBA choose a sensible course of action, or force the correct configuration to be used, instead of deleting all the data. (That was a really bad day.)


They fixed the problem in 1.8, thank God.


10gen CTO reply: We can't find any record of this, nor the corresponding code. Can you give us more information?


7) Releasing things that should never have been released


Some notoriously embarrassing bugs in stable releases caused data problems, and we were told about them only after the fact, even though we had bought 10gen's hugely expensive platinum support. Their response was to send us a hot patch built from their internal RC code and have us run it on our data.


10gen CTO reply: Regarding platinum support, every issue we handle is public, and every fix is made public. Without the specifics, this kind of thing is hard to discuss; we respond differently depending on the situation. We want our users' problems to be solved as quickly as possible.


8) Replication is lackluster on busy servers


Replicas often DoS the master, or replication is so slow that it takes an enormous amount of time and nearly exhausts the oplog (even a 50GB oplog).


We had a busy, large dataset that we simply could not replicate because of how dynamic it was. It was a painful month or two of crossing our fingers (a good-luck gesture) before we moved it to a different database system.


10gen CTO reply: That sounds like the server was overloaded, as I mentioned before.
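The oplog is a fixed-size (capped) collection, so a secondary that falls further behind than the oplog covers cannot catch up incrementally and needs a full resync; this is how even a 50GB oplog can be "exhausted" on a busy server. A hypothetical sketch of that rule using a bounded deque (the class and method names are illustrative only):

```python
from collections import deque


class ToyOplog:
    """Hypothetical fixed-size oplog: once full, the oldest entries are overwritten."""

    def __init__(self, capacity: int) -> None:
        self.entries = deque(maxlen=capacity)  # (sequence number, op) pairs
        self.next_seq = 0

    def append(self, op) -> None:
        self.entries.append((self.next_seq, op))
        self.next_seq += 1

    def read_from(self, seq: int):
        """Return ops starting at seq, or None if seq has already rolled off the log."""
        if seq == self.next_seq:
            return []  # the secondary is fully caught up
        if not self.entries or seq < self.entries[0][0]:
            return None  # too stale: a full resync would be required
        start = seq - self.entries[0][0]
        return [op for _, op in list(self.entries)[start:]]
```

A secondary that last applied entry 2 can still resume from 3, but one that stopped at 0 finds its next entry already overwritten and must copy the whole dataset again, which on a busy server only adds more load.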


But the biggest problem is this:


You might say that my problems are in the past: they have fixed all of them, or will fix them in the next release, or problem X can be mitigated by doing Y, and so on.


Unfortunately, that is beside the point.


The real problem is that so many of these problems existed in the first place. Database developers should hold themselves to a higher standard than the average programmer. In other words, your priorities should look like this:


Don't lose data; be completely deterministic about your data.

Use practices that keep the system available.

Scale across multiple nodes.

Minimize latency at the 99th and 95th percentiles.

Maximize raw requests per second per resource.

10gen's order seems to be #5 first, the rest whatever, and #1 not even in the top 3.


10gen CTO reply: This is plainly not true. Look at the code we have committed and the fixes we have made. We have never hidden a bug in a release. If we cared that much about benchmark performance, we would have spent our energy on the locking problems so that multithreading would be faster.


MongoDB is new, and there is still a lot to polish. If you would like to meet with us, you are welcome to.


These failures, and the priorities they imply, point to a fundamental problem in the company's culture that will keep producing issues in every release: a lack of the design discipline that a database system requires.


Please consider these warnings carefully.


