Open Source Search Engine Evaluation: Lucene, Sphinx, Elasticsearch

Source: Internet
Author: User
Tags: solr


Open source search engine programs fall into 3 major categories:
    • The Lucene system, written in Java, including Solr and Elasticsearch

    • Sphinx, written in C++, simple and high performance

    • Xapian, written in C++

The name "search engine program" is not quite accurate; strictly speaking, these should be called indexing programs. Early on they were mainly used for Chinese full-text search, but as the Internet has grown and websites have become larger, indexing programs have taken on a bigger role in optimizing website architecture: replacing MySQL's built-in indexes.

    1. Take SQL querying away from MySQL, so that it only handles persistent data storage

    2. Eliminate join queries and subqueries, improving the database's capacity for concurrent processing
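
The two points above can be sketched as follows. This is only an illustration under stated assumptions: an in-memory SQLite database stands in for MySQL, a small Python dictionary stands in for the index program, and the `posts` table and the helper names `search_ids` and `fetch_posts` are made up for the example. The idea is that the index answers "which rows match", and the database is asked only for rows by primary key.

```python
import sqlite3
from collections import defaultdict

# In-memory SQLite stands in for MySQL here; it is used only as a row store.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE posts (id INTEGER PRIMARY KEY, title TEXT, body TEXT)")
db.executemany(
    "INSERT INTO posts (id, title, body) VALUES (?, ?, ?)",
    [
        (1, "Sphinx setup", "build a static index with sphinx"),
        (2, "Solr notes", "solr exposes lucene over http"),
        (3, "Index design", "sphinx and lucene both build an inverted index"),
    ],
)

# Toy inverted index standing in for the external index program:
# token -> set of primary keys.
inverted = defaultdict(set)
for post_id, body in db.execute("SELECT id, body FROM posts"):
    for token in body.lower().split():
        inverted[token].add(post_id)


def search_ids(query):
    """Return sorted ids of posts that contain every token of the query."""
    tokens = query.lower().split()
    if not tokens:
        return []
    hits = set.intersection(*(inverted.get(t, set()) for t in tokens))
    return sorted(hits)


def fetch_posts(ids):
    """Fetch rows by primary key only: no LIKE, no JOIN, no subquery."""
    if not ids:
        return []
    placeholders = ",".join("?" for _ in ids)
    rows = db.execute(
        f"SELECT id, title FROM posts WHERE id IN ({placeholders}) ORDER BY id",
        ids,
    )
    return rows.fetchall()


ids = search_ids("sphinx index")
print(ids, fetch_posts(ids))
```

In a real deployment the dictionary would be replaced by a query against Sphinx, Solr or Elasticsearch, but the division of labor stays the same: the index does the matching, and the database only serves primary-key lookups.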

Usage status

Lucene comes from a distinguished lineage and has many thriving descendants, and its sibling project Hadoop is flourishing, so it is the best known. Sphinx, being simple and reliable with well-structured code and excellent performance, is the most widely used on large websites in China. Xapian has too few users to be worth considering.

When selecting technology, go in the direction most people are going rather than choosing something unconventional.

Search performance
    • Elasticsearch is said to take about 200 ms per search.

    • Solr: I have no data at hand, but it should be slower than Sphinx.

    • Sphinx: average search time of 20 ms. It is this fast because Sphinx indexes are essentially static.

      The client API can only update attributes of documents that are already indexed; it cannot add new documents.

      New documents can be added only through build/merge, which has a very high disk I/O cost. From this point of view, Sphinx is not suited to sites whose content is updated frequently, nor to real-time indexing. In reality, however, the major UGC sites in China basically all use Sphinx, such as Sina Weibo, Sohu Weibo, go, Discuz, etc.

      This is a huge challenge for programmers: it can be met only by building multilevel indexes or by using a hybrid Sphinx + Solr scheme.
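
One common form of multilevel indexing is a "main + delta" scheme: a large main index that is rebuilt rarely, plus a small delta index that holds only recent documents and is cheap to rebuild. The sketch below illustrates the idea in plain Python with a toy inverted index. It does not use the Sphinx API, the class and method names are made up for the example, and it ignores updates and deletions of existing documents.

```python
from collections import defaultdict


def build_index(docs):
    """Build a static inverted index: token -> set of document ids."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for token in text.lower().split():
            index[token].add(doc_id)
    return dict(index)


class TwoLevelIndex:
    """Large 'main' index rebuilt rarely, small 'delta' index rebuilt often."""

    def __init__(self, docs):
        self.main_docs = dict(docs)
        self.delta_docs = {}
        self.main = build_index(self.main_docs)  # expensive, done rarely
        self.delta = {}

    def add(self, doc_id, text):
        # New documents only touch the small delta, so the rebuild is cheap.
        self.delta_docs[doc_id] = text
        self.delta = build_index(self.delta_docs)

    def search(self, token):
        # Query both levels and union the results.
        token = token.lower()
        return sorted(self.main.get(token, set()) | self.delta.get(token, set()))

    def merge(self):
        # Periodic job (for example nightly): fold the delta into main, then reset it.
        self.main_docs.update(self.delta_docs)
        self.main = build_index(self.main_docs)
        self.delta_docs, self.delta = {}, {}


idx = TwoLevelIndex({1: "sphinx is fast", 2: "solr is flexible"})
idx.add(3, "sphinx delta index")
print(idx.search("sphinx"))
idx.merge()
print(idx.search("sphinx"))
```

The trade-off is that every search has to consult two indexes, in exchange for never rebuilding the large one just to add a handful of new documents.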

Lucene system
    • Lucene is purely an indexing library. To use it, you have to write a simple server program yourself (accept keywords, query through Lucene, return results) and then deploy it in an application server (Tomcat/Resin). In general, this server program speaks HTTP or XML-RPC; using raw TCP directly would be too tedious.

    • Solr: some public-spirited developers have already written the web program described above for you, so you only need to configure and deploy it. That is Solr. Solr's external interface is HTTP, and it also supports distributed indexing.

    • Elasticsearch is a new project that has recently become very popular. It is in fact also a wrapper around Lucene, and has the following characteristics:

      1. RESTful interface

      2. Designed for distribution: distributed search, distributed indexing, zero configuration, automatic sharding, automatic index load balancing

      3. Optimized for real-time search: the index is kept in memory and synced to the hard disk on a regular basis

      4. Includes a web-based graphical management tool
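
The "simple server program" from the Lucene bullet above (accept keywords, query the index, return results over HTTP) can be sketched as follows. This is an assumption-heavy illustration rather than a Lucene or Solr deployment: it uses Python's standard `http.server` module, a toy in-memory document set in place of a real Lucene index, and a made-up `/search?q=...` URL.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.parse import parse_qs, urlparse

# Toy in-memory documents standing in for a real Lucene index (assumption).
DOCS = {
    1: "lucene is an indexing library",
    2: "solr wraps lucene in an http server",
    3: "sphinx is written in c++",
}


def search(keyword):
    """Return the documents whose text contains the keyword as a token."""
    keyword = keyword.lower()
    return [
        {"id": doc_id, "text": text}
        for doc_id, text in sorted(DOCS.items())
        if keyword in text.lower().split()
    ]


class SearchHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        # Step 1: accept a keyword, e.g. GET /search?q=lucene
        params = parse_qs(urlparse(self.path).query)
        keyword = params.get("q", [""])[0]
        # Step 2: query the index.
        body = json.dumps({"query": keyword, "hits": search(keyword)}).encode("utf-8")
        # Step 3: return the results as JSON over HTTP.
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the sketch quiet


def make_server(port=0):
    """Create the server; port 0 asks the OS for any free port."""
    return HTTPServer(("127.0.0.1", port), SearchHandler)
```

Calling `make_server(8080).serve_forever()` would start the service locally, after which a request to `/search?q=lucene` returns the matching documents as JSON. Solr and Elasticsearch ship this layer ready-made, which is exactly the work they save you.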

Elasticsearch is aimed at the same space as Amazon CloudSearch, and its keywords are:

    • Distributed

    • Realtime

    • Highly Available

These features are meant for giants. A site with millions of daily unique visitors has an index of only a few dozen GB, so ordinary players do not need them.

But from the standpoint of trying cutting-edge technology, if you have more than 3 index servers, you can try deploying Elasticsearch. Its performance currently falls a little short, but hardware and time will take care of that.

