Open Source Search Engine Evaluation: Lucene, Sphinx, Elasticsearch
Open source search engine programs fall into 3 major categories:
The Lucene family, developed in Java, including Solr and Elasticsearch
Sphinx, developed in C++, simple and high performance
Xapian, developed in C++
The name "search engine program" is not really appropriate; strictly speaking, it should be called an indexing program. Early on it was mainly used for Chinese full-text search, but as the Internet has matured and websites have grown in size, the indexing program has come to play a bigger role in optimizing website architecture:
Replace the MySQL database's built-in indexes
Let MySQL run no complex SQL and only take on persistent data storage
Eliminate join queries and subqueries, improving the database's ability to handle concurrency
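The division of labor described above can be sketched roughly as follows. This is only an illustration under stated assumptions: `sqlite3` stands in for MySQL so the example runs anywhere, and the tiny in-memory inverted index (`inverted`, `search_ids`, `fetch_posts`) is hypothetical, standing in for Sphinx or Lucene. The point is that the index answers the search and the database only loads rows by primary key.

```python
import sqlite3
from collections import defaultdict

# sqlite3 stands in for MySQL here (assumption, so the sketch runs anywhere).
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE posts (id INTEGER PRIMARY KEY, title TEXT, body TEXT)")
db.executemany(
    "INSERT INTO posts (id, title, body) VALUES (?, ?, ?)",
    [
        (1, "Sphinx setup", "build a static index with sphinx"),
        (2, "Solr notes", "solr exposes lucene over http"),
        (3, "Index merge", "merge a delta index into the main sphinx index"),
    ],
)

# Hypothetical in-memory inverted index: term -> set of document ids.
inverted = defaultdict(set)
for doc_id, title, body in db.execute("SELECT id, title, body FROM posts"):
    for term in (title + " " + body).lower().split():
        inverted[term].add(doc_id)


def search_ids(query):
    """Return sorted ids of documents containing every query term."""
    terms = query.lower().split()
    if not terms:
        return []
    hits = set.intersection(*(inverted.get(t, set()) for t in terms))
    return sorted(hits)


def fetch_posts(ids):
    """Load rows by primary key only; the database does no searching."""
    if not ids:
        return []
    placeholders = ",".join("?" for _ in ids)
    rows = db.execute(
        f"SELECT id, title FROM posts WHERE id IN ({placeholders}) ORDER BY id",
        ids,
    )
    return rows.fetchall()
```

A page would call `fetch_posts(search_ids("sphinx index"))`: the only SQL the database ever sees is a primary-key lookup, with no join, subquery, or full-text condition.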
Usage status
Lucene has a distinguished pedigree and many thriving descendants, and its sibling project Hadoop is flourishing, so it is the best known. Sphinx, because it is simple and reliable, with clean code structure and excellent performance, is the most widely used on large websites in China. Xapian has too few users to be worth choosing.
When selecting a technology, follow the direction most people are taking rather than being unconventional.
Search performance
Elasticsearch is said to take about 200 ms per search.
For Solr I have no data at hand; it should be slower than Sphinx.
Sphinx's average search time is 20 ms. It is this fast because Sphinx indexes can basically be regarded as static.
The client API can only update the attributes of documents that are already indexed; it cannot add new documents.
New documents can only be added through build/merge, and the disk I/O overhead is very large. From this point of view, Sphinx is not suitable for sites whose content is updated frequently, and not suitable for real-time indexing. But the reality is that the major UGC sites in China basically all use Sphinx, such as Sina Weibo, Sohu Weibo, go, Discuz, etc.
This is a huge challenge for programmers: it can only be met by building multilevel indexes, or by using a hybrid Sphinx + Solr scheme.
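One common reading of "multilevel indexes" is a large main index that is rebuilt rarely plus a small delta index that is rebuilt often, with searches consulting both. The sketch below only illustrates that idea in plain Python; `TwoLevelIndex` and `build_index` are hypothetical names, not Sphinx's API, and a real deployment would drive this through the indexer's build and merge steps.

```python
from collections import defaultdict


def build_index(docs):
    """Build a static inverted index (term -> set of ids) from {id: text}."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return dict(index)


class TwoLevelIndex:
    """Hypothetical main + delta scheme; not Sphinx's actual API."""

    def __init__(self, docs):
        self.main = build_index(docs)  # large, rebuilt rarely
        self.pending = {}              # documents added since the last merge
        self.delta = {}                # small, rebuilt on every add

    def add(self, doc_id, text):
        # Rebuilding only the small delta keeps the cost of an update low.
        self.pending[doc_id] = text
        self.delta = build_index(self.pending)

    def search(self, term):
        term = term.lower()
        return sorted(self.main.get(term, set()) | self.delta.get(term, set()))

    def merge(self):
        # The expensive step, run off-peak: fold the delta into the main index.
        for term, ids in self.delta.items():
            self.main.setdefault(term, set()).update(ids)
        self.pending, self.delta = {}, {}
```

New documents become searchable as soon as the small delta is rebuilt, while the heavy disk I/O of touching the main index is deferred to `merge()`.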
Lucene family
Lucene is purely an indexing code library. To use it, you have to write a simple server program yourself (accept keywords, query through Lucene, return results) and then deploy it in an application server (Tomcat/Resin). In general, this server program speaks HTTP or XML-RPC; talking raw TCP directly would be too tedious.
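The shape of such a server program (accept keywords, query the index, return results) might look like the following minimal sketch. It uses Python's standard `http.server` rather than a Java application server, and `DOCS`, `query_index`, `handle_path`, the `/search?q=` route, and the JSON response format are all assumptions made for illustration; a real version would call Lucene where `query_index` is.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.parse import parse_qs, urlparse

# Hypothetical stand-in for the real index lookup (e.g. a Lucene query).
DOCS = {1: "lucene in action", 2: "sphinx search basics", 3: "lucene and solr"}


def query_index(keyword):
    """Return ids of documents whose text contains the keyword."""
    keyword = keyword.lower()
    return [doc_id for doc_id, text in sorted(DOCS.items()) if keyword in text]


def handle_path(path):
    """Map a request path such as /search?q=lucene to (status, JSON body)."""
    parsed = urlparse(path)
    keywords = parse_qs(parsed.query).get("q", [])
    if parsed.path != "/search" or not keywords:
        return 400, json.dumps({"error": "expected /search?q=<keyword>"})
    return 200, json.dumps({"q": keywords[0], "ids": query_index(keywords[0])})


class SearchHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        status, body = handle_path(self.path)
        payload = body.encode("utf-8")
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(payload)))
        self.end_headers()
        self.wfile.write(payload)


def run(port=8080):
    """Serve until interrupted; call this explicitly to start the server."""
    HTTPServer(("127.0.0.1", port), SearchHandler).serve_forever()
```

Calling `run()` starts the server; the request handling is kept in the plain function `handle_path` so it can be exercised without opening a socket.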
Some public-spirited people have already written the web program described above for you, so you only need to configure and deploy it. This is Solr. Solr's external interface is HTTP, and it also supports distributed indexing.
Elasticsearch is a newer project that has recently become very popular. It is in fact also a wrapper around Lucene, with the following characteristics:
RESTful interface
Distributed by design, including distributed search, distributed indexing, zero configuration, automatic sharding, and automatic index load balancing
Optimized for real-time search: the index is kept in memory and synced to disk at regular intervals
Web-based graphical management tools included
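The "index in memory, sync to disk at intervals" idea from the list above can be illustrated with a small sketch. This is not how Elasticsearch is actually implemented; `MemoryIndex`, its JSON file format, and the `sync_interval` and `clock` parameters are hypothetical, chosen only to show why new documents can be searched before anything is written to disk.

```python
import json
import os
import tempfile
import time


class MemoryIndex:
    """Hypothetical in-memory index that is flushed to disk periodically."""

    def __init__(self, path, sync_interval=1.0, clock=time.monotonic):
        self.path = path
        self.sync_interval = sync_interval  # seconds between disk syncs
        self.clock = clock
        self.terms = {}                     # term -> list of doc ids
        self.last_sync = clock()

    def add(self, doc_id, text):
        # New documents are searchable immediately, before any disk write.
        for term in set(text.lower().split()):
            self.terms.setdefault(term, []).append(doc_id)
        if self.clock() - self.last_sync >= self.sync_interval:
            self.sync()

    def search(self, term):
        return list(self.terms.get(term.lower(), []))

    def sync(self):
        # Write to a temp file, then rename, so readers never see a partial file.
        directory = os.path.dirname(os.path.abspath(self.path))
        fd, tmp = tempfile.mkstemp(dir=directory)
        with os.fdopen(fd, "w") as f:
            json.dump(self.terms, f)
        os.replace(tmp, self.path)
        self.last_sync = self.clock()
```

The trade-off in this sketch is the usual one: anything added since the last `sync()` is lost if the process dies, in exchange for updates that never wait on disk I/O.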
Elasticsearch is positioned against hosted services such as Amazon CloudSearch, and its keywords are:
Distributed
Realtime
Highly Available
These are features for the giants. Even a site with millions of daily unique visitors has an index of only a few dozen GB, so ordinary players do not need them.
But from the point of view of cutting-edge technology, if you have more than 3 index servers, you can try deploying Elasticsearch. Its performance is somewhat lacking for now, but hardware and time will take care of that.