How Flowdock migrated its data from Cassandra to MongoDB

Source: Internet
Author: User
Tags: cassandra, flowdock

Flowdock is a web-based team communication tool. Software developers should use it for team communication instead of tools such as Campfire, Skype chats, or IRC, because it supports their real workflows better.

Last week we switched the database behind Flowdock, migrating our data from Cassandra to another NoSQL database, MongoDB. Since our technology choices have attracted some interest, I will explain the reasons for the decision publicly.

Some of our customers will remember the image below:

To a certain extent, we ran into stability problems with Cassandra. All nodes would get stuck in an endless loop of running garbage collection (GC) and attempting to compact data files, which occasionally paralyzed the cluster. Apart from restarting the cluster and regularly compacting the nodes by hand, which stabilized things for a while, we had no remedy. Other people have reported similar issues. Over the past few weeks, our Cassandra nodes consistently consumed all the resources allocated to them, which made Flowdock slow.

Because of our bleeding-edge choice of database (James's note: I do not agree with this; it may simply be a last resort for some startups.), this is not the first time we have run into such problems. When upgrading from Cassandra 0.4 to 0.5, we had to shut down the entire cluster just to flush all data to disk (even though we had already flushed manually, as the documentation instructs). The operation cost us several minutes of lost discussions and left the indexes we maintained by hand seriously inconsistent, so we had to rebuild them completely. I think we finally left the office at around four o'clock in the morning that night.

The NoSQL community has changed a lot since we first chose Cassandra, and so has MongoDB. Its newly added auto-sharding and replica sets make it a strong alternative to Cassandra, so we decided to give MongoDB a try.

It took me about a day to write a script that migrates the data from Cassandra to MongoDB. Within about a week we could run Flowdock on MongoDB. Internal testing then lasted several weeks before we deployed MongoDB to production.

We have now completed the switch, and these are the advantages we have found so far:

1. Smart (multikey) indexes. Maintaining indexes by hand is tedious; MongoDB maintains the indexes we need automatically. For example, our messages carry tags, as in a document of the following form:

    {content: "Write a blog post about #mongodb.",
     workspace: 'myflow',
     tags: ["mongodb", "todo", "@Otto"]}

   To find only your own to-dos, the Flowdock backend then needs just the following query:

    db.messages.find({
      workspace: 'myflow',
      tags: {$all: ["todo", "@Otto"]}
    })

2. Queries. However simple the data model is, sooner or later you need to run a query that you did not plan for in advance. In MongoDB you can write complex ad-hoc queries directly in the console, much as you would with an SQL database. MongoDB answers such a query by scanning the data, which is faster and more convenient than processing millions of records by hand on the client.

3. Map-Reduce. This is a powerful tool for analysis. MongoDB's Map-Reduce support is not very polished, but at least it is easy to use.

4. GridFS. It makes storing files very easy, and its storage capacity grows as our MongoDB cluster expands.
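
To make the tag query in item 1 more concrete, the following is a minimal plain-JavaScript sketch of the matching rule that `$all` applies to an array field. It does not talk to MongoDB; the in-memory `messages` array, the second and third sample messages, and the helper name `matchesAll` are assumptions made for illustration only.

```javascript
// In-memory stand-in for the `messages` collection (sample data is made up,
// except for the first document, which follows the article's example).
const messages = [
  { content: "Write a blog post about #mongodb.", workspace: "myflow", tags: ["mongodb", "todo", "@Otto"] },
  { content: "Review the migration script", workspace: "myflow", tags: ["todo", "@Ville"] },
  { content: "Write release notes", workspace: "otherflow", tags: ["todo", "@Otto"] },
];

// Hypothetical helper that mimics {tags: {$all: required}}:
// every required value must be present in the array field.
// The sketch assumes `required` is a non-empty list.
function matchesAll(values, required) {
  return required.every((value) => values.includes(value));
}

// Same selection as the query in item 1: one workspace, all listed tags.
const ottosTodos = messages.filter(
  (message) =>
    message.workspace === "myflow" &&
    matchesAll(message.tags, ["todo", "@Otto"])
);

console.log(ottosTodos.map((message) => message.content));
```

In MongoDB itself, the multikey index on `tags` spares the server this kind of per-document check; the sketch only shows which documents the query is meant to select.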

We also ran into some minor limitations:

1. We found a JSON parsing bug, but we fixed it within 10 minutes.

2. BSON document keys cannot contain dots. Usually this is not a problem, but we had to work around it during the data migration.

3. Documents have a size limit of 4 MB. This is not a problem for our data model. However, because MongoDB has excellent support for atomic in-place updates, a document that is updated repeatedly can keep growing, so you need to make sure it never exceeds the 4 MB limit.

4. Adding new nodes is not as easy as it is in Cassandra. However, Cassandra has its own problems with load-balancing new nodes.
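
The article does not say how the dots in keys (item 2) were handled during the migration. One possible approach, sketched here purely as an assumption, is to rewrite every key before inserting the document. The helper name `sanitizeKeys`, the underscore replacement, and the sample row are all hypothetical.

```javascript
// Hypothetical helper: returns a copy of `value` in which every "." in an
// object key is replaced, leaving values (including strings with dots) untouched.
function sanitizeKeys(value, replacement = "_") {
  if (Array.isArray(value)) {
    return value.map((item) => sanitizeKeys(item, replacement));
  }
  // Only recurse into plain objects; dates, null and primitives pass through.
  if (Object.prototype.toString.call(value) === "[object Object]") {
    const result = {};
    for (const [key, child] of Object.entries(value)) {
      const safeKey = key.split(".").join(replacement);
      result[safeKey] = sanitizeKeys(child, replacement);
    }
    return result;
  }
  return value;
}

// Made-up source row with dotted keys at two levels.
const row = {
  "user.name": "Otto",
  meta: { "client.version": "1.0" },
  tags: ["a.b"],
};

const doc = sanitizeKeys(row);
console.log(JSON.stringify(doc));
```

A replacement like this can make two different keys collide (for example `a.b` and `a_b`), so a real migration would need to check for that or pick a replacement string that cannot occur in the source data.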

So far MongoDB has been running smoothly, and the workload of our developers and database administrators has dropped considerably.
