I plan to write several posts about my experience using Elasticsearch.
To begin, here is a side-by-side comparison of Elasticsearch and Sphinx. A direct comparison is a good way to expose the strengths and weaknesses of each. I moved from the Sphinx camp to the Elasticsearch camp. Both are mature open source search engines, and each has its pros and cons, so this article may also give readers who are torn between the two some basis for choosing.
• Importing MySQL data to build an index
Elasticsearch: github-scharron/elasticsearch-river-mysql
Sphinx: Native support for indexing MySQL tables
According to the official Elasticsearch documentation, data is inserted through a RESTful interface, which amounts to incremental updating. When the data set is very large, traversing the full table to rebuild an index can be very time-consuming. The elasticsearch-river-mysql project is also not very reliable; its developers even marked it as deprecated on git at one point (it no longer is). In the end I wrote my own import tool.
For importing MySQL data to build an index, Sphinx beats Elasticsearch in ease of use, reliability, and speed.
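As a rough illustration of what a hand-written importer has to do, the sketch below turns rows already fetched from MySQL into the newline-delimited JSON body that the Elasticsearch `_bulk` endpoint accepts. It is not the author's tool: the function names, the `articles` index, and the `id` column are hypothetical, and fetching rows and sending the HTTP request are left out.

```python
import json


def build_bulk_body(index_name, rows, id_field="id"):
    """Build an NDJSON body for the _bulk endpoint from a list of row dicts.

    index_name and id_field are illustrative assumptions, not names
    taken from the article.
    """
    lines = []
    for row in rows:
        # Action line: index this document, reusing the MySQL primary key as _id.
        action = {"index": {"_index": index_name, "_id": str(row[id_field])}}
        lines.append(json.dumps(action))
        # Source line: the document itself.
        lines.append(json.dumps(row, ensure_ascii=False))
    if not lines:
        return ""
    # The bulk body must end with a newline.
    return "\n".join(lines) + "\n"


def chunked(rows, size):
    """Yield successive slices of rows so a huge table is sent in batches."""
    for start in range(0, len(rows), size):
        yield rows[start:start + size]
```

Sending the rows in batches rather than one request per row is what keeps a full-table rebuild tolerable; the batch size is something to tune for the data at hand.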
• Incremental update support
Elasticsearch is better than Sphinx. Elasticsearch treats incremental updates as its preferred CRUD approach, whereas Sphinx's auxiliary-table approach is not only inelegant but also complicates your other systems, making mistakes likely when individual records change frequently.
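To make the per-record update concrete, here is a minimal sketch that describes a partial update of one document as an HTTP method, path, and JSON body. The helper name and the `articles` index are hypothetical, and the path assumes the typeless `/{index}/_update/{id}` form used by recent Elasticsearch versions; older versions include a type segment in the path.

```python
import json


def build_update_request(index_name, doc_id, changed_fields, upsert=False):
    """Return (method, path, body) for a partial update of one document.

    Assumes the typeless update path of recent Elasticsearch versions.
    """
    payload = {"doc": changed_fields}
    if upsert:
        # Create the document from `doc` if it does not exist yet.
        payload["doc_as_upsert"] = True
    path = f"/{index_name}/_update/{doc_id}"
    return "POST", path, json.dumps(payload, ensure_ascii=False)
```

Only the changed fields travel over the wire, so changing a single record does not require touching any auxiliary table or rebuilding anything.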
• Visualization and auxiliary tools
Elasticsearch: Kibana, Logstash, Marvel
Sphinx: Sphinx Tools
Kibana is the graphical interface provided by Elasticsearch. Its basic abilities are: 1) read an index; 2) write queries against an index to retrieve specific data; 3) generate charts from that data; 4) combine several charts into a report. Kibana is very powerful, and these basic functions are already enough to customize freely and meet a variety of complex requirements. It is commonly used together with companion tools, most often Marvel (performance and status monitoring) and Logstash (data collection), both very useful.
Sphinx Tools is still limited to performance monitoring and is still in beta, so it is left far behind by Elasticsearch's Kibana-plus-family bundle.
• Search algorithm support
Elasticsearch's underlying search functionality is based on Lucene, while Sphinx uses its own engine. Elasticsearch's query DSL, however, supports more complex query logic, which is beyond what Sphinx offers.
For custom rankers, Elasticsearch's function score query is much more powerful than Sphinx's expression ranker. Back then, to make Sphinx support a custom ranker I had to modify its source code; later I found that the same feature is easy to implement in Elasticsearch.
Overall, Elasticsearch is slightly better than Sphinx.
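For readers who have not seen a function score query, the sketch below builds one as a plain Python dict. It is only an example of the query DSL, not the author's ranker: the `title` and `popularity` field names are hypothetical, and the choice of `field_value_factor` with a `log1p` modifier is just one of several scoring functions the DSL offers.

```python
def build_function_score_query(text, text_field="title", boost_field="popularity"):
    """Build a search body that adds a numeric-field bonus to the text score.

    text_field and boost_field are illustrative assumptions.
    """
    return {
        "query": {
            "function_score": {
                # Ordinary full-text relevance.
                "query": {"match": {text_field: text}},
                "functions": [
                    {
                        # Bonus derived from a numeric field, damped by log1p.
                        "field_value_factor": {
                            "field": boost_field,
                            "modifier": "log1p",
                            "missing": 0,
                        }
                    }
                ],
                # Add the bonus to the text score instead of multiplying.
                "boost_mode": "sum",
            }
        }
    }
```

The whole ranking rule lives in the request body, which is why no change to the engine's source is needed.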
• Scale-out and high availability
Elasticsearch is designed for clustering from the ground up. If an index has no replica in place, cluster health shows yellow; only once replicas exist does it show green. Nodes take one of three roles: client node, data node, or master node. With a reasonable configuration, the whole cluster keeps running normally even if any one machine (or even several) fails. Elasticsearch also supports adding machines dynamically, among other features, which I will not go into here.
Sphinx also has the concept of master searchd and slave searchd and can be distributed, but achieving high availability with it is quite complex.
Elasticsearch is better than Sphinx. Sphinx's weakness is not that it cannot do this, but that it does not do it well.
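The yellow and green states mentioned above follow a simple rule, which can be sketched as a toy function over a list of shard copies. This is a model of the rule only, not code that talks to a cluster; the dict layout is hypothetical, and the red state (a primary shard unassigned) is added from the Elasticsearch documentation rather than from the text above.

```python
def cluster_health(shards):
    """Classify health from shard copies.

    Each shard is a dict with boolean keys 'primary' and 'assigned'
    (an illustrative layout, not an Elasticsearch response format).
    """
    # Any unassigned primary means some data is unavailable.
    if any(s["primary"] and not s["assigned"] for s in shards):
        return "red"
    # All primaries are fine, but at least one replica is missing.
    if any(not s["assigned"] for s in shards):
        return "yellow"
    # Every primary and every replica is assigned.
    return "green"
```

A single-machine setup with replicas configured stays yellow under this rule, because a replica is never placed on the same node as its primary.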
• Resource Usage
Sphinx is better than Elasticsearch.
It has to be said that Java is inferior to C++ in this respect. CPU usage is acceptable and the gap is small, but the difference in memory footprint is like night and day.
• Search Speed
Search speed depends mainly on how the cluster is configured: the more machines, the faster the search.
• NLP support
Only Chinese NLP is discussed here.
Elasticsearch: github-medcl/elasticsearch-analysis-ik
Sphinx: There used to be a project called Coreseek, but it is no longer maintained.
In fact, third-party plug-ins can be developed for both, and integrating a domestic toolkit such as LTP or ICTCLAS is not difficult.
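As an example of how a Chinese analyzer is wired in on the Elasticsearch side, the sketch below builds an index-creation body that assigns the IK plug-in's analyzers to text fields. It assumes the plug-in is installed and provides the `ik_max_word` and `ik_smart` analyzers, and it uses the typeless mapping layout of recent Elasticsearch versions; the helper name and the field names are hypothetical.

```python
def build_ik_index_body(text_fields):
    """Build an index-creation body that analyzes the given fields with IK.

    Assumes the elasticsearch-analysis-ik plug-in is installed.
    """
    properties = {
        name: {
            "type": "text",
            # Fine-grained segmentation at index time.
            "analyzer": "ik_max_word",
            # Coarser segmentation at query time.
            "search_analyzer": "ik_smart",
        }
        for name in text_fields
    }
    return {"mappings": {"properties": properties}}
```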
• Summary
Sphinx and Elasticsearch are both very good programs that have been tested in real-world use. So why did I move from Sphinx to Elasticsearch? The main reasons are:
1) I need a custom ranker, and Elasticsearch's function score query met exactly that need.
2) The business keeps growing, and Elasticsearch has stronger scale-out capability and high availability.
3) Kibana can be used to produce good-looking reports.
Looking back now, Elasticsearch has strong momentum: versions iterate at rocket speed and the community is very active. I do not think the choice was wrong.