Our product needs full-text search. The back end stores its data mainly in MySQL and MongoDB, and the content that has to be searchable lives in MongoDB.
MongoDB has built-in text indexes, but they do not support Chinese. Every tool has its specialty: MongoDB is a data store, so full-text search is better left to a dedicated full-text search engine.
The candidates were Solr, ElasticSearch, Xapian, Sphinx, and Xunsearch. Our data volume is large enough that a single machine is starting to struggle, and we have already begun planning to shard MongoDB, so a search engine with distribution built in would be the best fit. After weighing the options, we decided on ElasticSearch.
At present the back-end program writes data directly to MongoDB. I do not want to modify the program code so that every insert and delete in MongoDB is repeated in ElasticSearch. I would rather have changes in MongoDB sync to ElasticSearch automatically, so that we can start using ElasticSearch as soon as possible.
The first solution I found was to synchronize the data with an ElasticSearch river, and I located a MongoDB river plugin on GitHub: elasticsearch-river-mongodb. Then I read the post "Deprecating Rivers" on the official ElasticSearch blog: rivers are deprecated from version 1.5 onward and, to give users time to migrate, will be kept only until version 2.0.
So I had to look for another solution, and I found one online: mongo-connector. It is a tool written in Python by developers at MongoDB. It currently supports syncing MongoDB data to Solr, ElasticSearch, and MongoDB, and users can extend it themselves. Judging from the disclaimer in the README, I got the feeling that this is something of a side project, but I decided to give it a try anyway.
Here is the deployment process:
MongoDB must run as a replica set. Skip this step if it already does:
Set the replica set name when starting mongod: mongod --replSet myDevReplSet
Initialize the replica set in the mongo shell: rs.initiate()
Install ElasticSearch. Skip this step if it is already installed.
Install mongo-connector:
Install pip first:
yum install python-setuptools && easy_install pip
Install mongo-connector with pip:
pip install mongo-connector
Run mongo-connector:
mongo-connector -m 127.0.0.1:27017 -t 127.0.0.1:9200 -d elastic_doc_manager
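One way to confirm that documents are arriving is to compare document counts on both sides. The sketch below assumes that mongo-connector maps each MongoDB database to an ElasticSearch index and each collection to a type, and that the legacy mongo shell and curl are installed. The helper names es_count_url and compare_counts, and the names mydb and articles, are placeholders invented for illustration.

```shell
#!/bin/sh
# Build the ElasticSearch _count URL for one MongoDB collection.
# Assumption: database -> index, collection -> type.
# $1 = ElasticSearch host:port, $2 = database name, $3 = collection name
es_count_url() {
    printf 'http://%s/%s/%s/_count\n' "$1" "$2" "$3"
}

# Print the document count on each side so they can be compared by eye.
# $1 = MongoDB host:port, $2 = ElasticSearch host:port,
# $3 = database name, $4 = collection name
compare_counts() {
    mongo_host=$1
    es_host=$2
    db=$3
    coll=$4
    echo "MongoDB:"
    mongo --quiet --host "$mongo_host" \
        --eval "db.getSiblingDB('$db').getCollection('$coll').count()"
    echo "ElasticSearch:"
    curl -s "$(es_count_url "$es_host" "$db" "$coll")"
    echo
}

# Usage (placeholder database and collection names):
#   compare_counts 127.0.0.1:27017 127.0.0.1:9200 mydb articles
```

The two numbers may differ briefly while mongo-connector is still catching up, so a mismatch right after a large write is not necessarily a failure.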
Now data added to MongoDB is synchronized to ElasticSearch right away. During the trial, mongo-connector quit twice. One of those outages went unnoticed for too long, and I had to resynchronize. It still feels a little unreliable, so I may have to write a watchdog program to keep mongo-connector running in the background.
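A supervising program of the kind described above could start out as a plain shell loop that restarts the command whenever it exits. This is only a minimal sketch: the function name supervise and its two leading arguments (maximum number of starts, delay in seconds between them) are made up for illustration and are not mongo-connector options.

```shell
#!/bin/sh
# Hypothetical supervisor: run a command and start it again each time it exits.
# $1 = maximum number of starts, $2 = seconds to wait before the next start,
# remaining arguments = the command to run.
supervise() {
    max_starts=$1
    delay=$2
    shift 2
    attempt=0
    while [ "$attempt" -lt "$max_starts" ]; do
        attempt=$((attempt + 1))
        echo "starting (attempt $attempt): $*"
        status=0
        "$@" || status=$?
        echo "exited with status $status"
        sleep "$delay"
    done
    echo "giving up after $attempt attempts"
    return 1
}

# Usage, with the same command line as above:
#   supervise 1000 5 mongo-connector -m 127.0.0.1:27017 -t 127.0.0.1:9200 -d elastic_doc_manager
```

The loop restarts on any exit, clean or not, because mongo-connector is meant to run continuously. A system-level process supervisor would serve the same purpose and would also survive a reboot.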