Search engine options: Elasticsearch and SOLR

Last Update: 2016-04-29 Source: Internet

Author: User


Search Engine Selection Research Document

Introduction to Elasticsearch

Elasticsearch is a real-time, distributed search and analytics engine. It lets you explore large volumes of data at a speed that was previously hard to achieve.

It can be used for full-text search, structured search, and analytics, and you can combine all three.

Elasticsearch is built on Apache Lucene™, a full-text search library that is arguably the most advanced and efficient full-featured open source search library available today.

But Lucene is only a library. To take full advantage of it, you have to work in Java and integrate Lucene into your own program. It also takes a lot of learning to understand how it works, because Lucene is genuinely complex.

Elasticsearch uses Lucene as its internal engine, but for full-text search you only need its uniform API, without having to understand how Lucene works behind it.

Of course, Elasticsearch is much more than a wrapper around Lucene. Besides full-text search, it also provides the following:

A distributed real-time document store in which every field is indexed and searchable.
A distributed search engine with real-time analytics.
The ability to scale to hundreds of servers and process petabytes of structured or unstructured data.
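The first point in the list above, that every field is indexed so it can be searched, can be pictured with a toy per-field inverted index. This is only a minimal sketch to show the idea: the function names, the sample documents, and the whitespace tokenizer are assumptions made for illustration, and it is not how Lucene or Elasticsearch implement indexing internally.

```python
from collections import defaultdict


def build_field_index(docs):
    """Map (field, token) -> set of ids of the documents containing it."""
    index = defaultdict(set)
    for doc_id, doc in docs.items():
        for field, value in doc.items():
            # Naive analysis (an assumption): lowercase, split on whitespace.
            for token in str(value).lower().split():
                index[(field, token)].add(doc_id)
    return index


def search(index, field, word):
    """Return the sorted ids of documents whose field contains the word."""
    return sorted(index.get((field, word.lower()), set()))


# Hypothetical sample documents, keyed by document id.
docs = {
    1: {"title": "Elasticsearch basics", "body": "distributed search engine"},
    2: {"title": "SOLR basics", "body": "enterprise search platform"},
}
index = build_field_index(docs)
print(search(index, "body", "search"))
print(search(index, "title", "solr"))
```

Because each entry is keyed by field as well as token, a query can target one field without scanning the documents themselves.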

All of this functionality is packaged into a standalone server, which your application can talk to through ES's RESTful API, using a client library or any programming language you like.
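As a rough illustration of talking to the RESTful API from a programming language, the sketch below builds two HTTP requests with Python's standard library without sending them. The host and port, the index name `articles`, and the helper names are assumptions for the example. The `_doc` path follows the convention of recent Elasticsearch versions, and older releases used a type name in that position.

```python
import json
from urllib.request import Request

# Assumption: a node listening locally on the default HTTP port.
BASE_URL = "http://localhost:9200"


def index_request(index, doc_id, document):
    """Build (but do not send) a request that stores one JSON document."""
    return Request(
        url=f"{BASE_URL}/{index}/_doc/{doc_id}",
        data=json.dumps(document).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


def search_request(index, field, text):
    """Build (but do not send) a full-text match query on one field."""
    body = {"query": {"match": {field: text}}}
    return Request(
        url=f"{BASE_URL}/{index}/_search",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# "articles" and "title" are hypothetical names used only for this example.
req = search_request("articles", "title", "search engine")
print(req.get_method(), req.full_url)
print(req.data.decode("utf-8"))
```

Passing either request to `urllib.request.urlopen` would send it, provided a server is actually running at the assumed address.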

Elasticsearch is very easy to get started with. It ships with sensible default values, so beginners do not have to work through complicated theory before they can use it.

It works out of the box, and you can become productive with very little learning.

The more you learn, the more of Elasticsearch's advanced features you can use. The entire engine is flexibly configurable, so you can tailor Elasticsearch to your own needs.

Use cases:

Wikipedia uses Elasticsearch for full-text search with keyword highlighting, and for search suggestions such as search-as-you-type and did-you-mean.
The Guardian uses Elasticsearch to process visitor logs so that its editors get real-time feedback on how the public responds to different articles.
StackOverflow combines full-text search with geolocation queries and uses more-like-this to find related questions.
GitHub uses Elasticsearch to search more than 130 billion lines of code.
Goldman Sachs uses it to index 5 TB of data every day, and many investment banks use it to analyze stock market movements.

Elasticsearch is not just for large companies, however. It has also helped many startups, such as Datadog and Klout, to expand their capabilities.

Advantages and disadvantages of Elasticsearch

Advantages

Elasticsearch is distributed out of the box. No other components are needed, and replication happens in real time, which is called "Push Replication".
Elasticsearch fully supports Apache Lucene's near real-time search.
Multi-tenancy needs no special configuration, whereas SOLR requires more advanced settings for it.
Elasticsearch uses the Gateway concept, which makes full backups simpler.
The nodes form a peer-to-peer network, and when some nodes fail, other nodes are automatically assigned to take over their work.
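The last point above, where surviving nodes take over from failed ones, can be sketched with a toy model. This is an illustration under stated assumptions and not Elasticsearch's actual allocation algorithm: it ignores replicas, only knows about nodes that already hold a shard, and the function, node, and shard names are all hypothetical.

```python
def reassign_shards(assignment, failed_node):
    """Move shards held by a failed node to the least loaded survivors.

    assignment maps shard name -> node name. A new mapping is returned
    and the input is left unchanged.
    """
    survivors = sorted({n for n in assignment.values() if n != failed_node})
    if not survivors:
        raise RuntimeError("no surviving node can take over")

    # Count how many shards each surviving node already holds.
    load = {node: 0 for node in survivors}
    for node in assignment.values():
        if node != failed_node:
            load[node] += 1

    result = dict(assignment)
    for shard in sorted(assignment):
        if assignment[shard] == failed_node:
            # Pick the least loaded survivor, breaking ties by name.
            target = min(survivors, key=lambda n: (load[n], n))
            result[shard] = target
            load[target] += 1
    return result


cluster = {
    "shard0": "node-a",
    "shard1": "node-b",
    "shard2": "node-c",
    "shard3": "node-c",
}
print(reassign_shards(cluster, "node-c"))
```

In this toy run the two shards of the failed node are split between the two survivors, so no single node absorbs the whole load.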

Disadvantages

There is only one developer (this is no longer accurate: the Elasticsearch GitHub organization now has more than one, and a fairly active group of maintainers)
Not automatic enough (this criticism predates the newer index warmup API and may no longer apply)

About SOLR

SOLR (pronounced "solar") is an open source enterprise search platform from the Apache Lucene project. Its main features include full-text search, hit highlighting, faceted search, dynamic clustering, database integration, and rich document handling (such as Word and PDF). SOLR is highly scalable and provides distributed search and index replication. SOLR is one of the most popular enterprise search engines, and SOLR 4 added NoSQL support.

SOLR is a standalone full-text search server written in Java that runs in a servlet container such as Apache Tomcat or Jetty. SOLR uses the Lucene Java search library at its core for full-text indexing and searching, and exposes REST-like HTTP/XML and JSON APIs. SOLR's powerful external configuration lets it be adapted to many types of applications without any Java coding. SOLR also has a plug-in architecture to support more advanced customization.
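To make the REST-like HTTP API concrete, the sketch below builds the URL of a query against SOLR's `select` request handler and asks for a JSON response. The base URL, the core name `articles`, and the helper name are assumptions for the example, and nothing is sent over the network.

```python
from urllib.parse import urlencode


def solr_select_url(core, query, rows=10, base="http://localhost:8983/solr"):
    """Build the URL of a query against a core's select handler.

    Assumptions: SOLR is reachable at the given base URL and the core
    exists. The "wt" parameter asks for a JSON response.
    """
    params = {"q": query, "rows": rows, "wt": "json"}
    return f"{base}/{core}/select?{urlencode(params)}"


# "articles" is a hypothetical core name used only for this example.
print(solr_select_url("articles", "title:lucene"))
```

Any HTTP client, or a browser, can fetch a URL built this way, which is what makes the API usable without writing Java.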

Since the Apache Lucene and Apache SOLR projects merged in 2010, both have been developed by the same Apache Software Foundation team. When referring to the technology or the product, Lucene/SOLR and SOLR/Lucene mean the same thing.

Advantages and disadvantages of SOLR

Advantages

SOLR has a larger, more mature community of users, developers, and contributors.
Supports indexing content in many formats, such as HTML, PDF, Microsoft Office formats, and plain-text formats such as JSON, XML, and CSV.
SOLR is more mature and stable.
Searching is faster when no indexing is taking place at the same time.

Disadvantages

While an index is being built, search efficiency drops, so searching against a real-time index performs poorly.

Comparison of Elasticsearch and SOLR

SOLR is faster when you only search existing data.

When data is indexed in real time, SOLR suffers from I/O blocking and its query performance is poor. Here Elasticsearch has a clear advantage.

As the amount of data increases, SOLR's search efficiency drops, while Elasticsearch's does not change significantly.

In summary, SOLR's architecture is not suitable for real-time search applications.

Actual production environment test

After the search engine was switched from SOLR to Elasticsearch, the average query speed increased by a factor of 50.

A comparative summary of Elasticsearch and SOLR

Both are simple to install;
SOLR uses ZooKeeper for distributed management, while Elasticsearch has distributed coordination built in;
SOLR supports more data formats, while Elasticsearch only supports JSON;
SOLR officially provides more features, while Elasticsearch focuses on core functions and leaves advanced features to third-party plug-ins;
SOLR performs better than Elasticsearch in traditional search applications, but is significantly less efficient than Elasticsearch in real-time search applications.
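Because Elasticsearch only accepts JSON, data in other formats has to be converted before indexing. As a minimal sketch, the function below turns CSV text into the newline-delimited JSON body used by the bulk API, with one action line followed by one document line per row. The index name and sample data are hypothetical, and since CSV carries no type information every value stays a string.

```python
import csv
import io
import json


def csv_to_bulk(csv_text, index):
    """Convert CSV text into a newline-delimited JSON bulk request body."""
    lines = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        lines.append(json.dumps({"index": {"_index": index}}))  # action line
        lines.append(json.dumps(row))                            # document line
    if not lines:
        return ""
    # The bulk format expects the body to end with a newline.
    return "\n".join(lines) + "\n"


# Hypothetical sample data: the first CSV line names the fields.
sample = "id,title\n1,Lucene in brief\n2,SOLR in brief\n"
print(csv_to_bulk(sample, "articles"), end="")
```

The resulting text would be sent as the body of a request to the `_bulk` endpoint, which this sketch does not do.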

SOLR is a powerful solution for traditional search applications, but Elasticsearch is more suitable for emerging real-time search applications.

Other Lucene-based open source search engine solutions

Direct use of Lucene

Description: Lucene is a Java search library. It is not a complete solution in itself and requires additional development work.

Pros: A proven solution with many success stories. It is a top-level Apache project that continues to progress rapidly, with a large and active development community and a huge number of developers. Because it is only a library, it leaves plenty of room for customization and optimization: simple customization meets most common needs, and with optimization it can support searches over a billion or more documents.

Cons: Additional development work is required. Scaling, distribution, reliability, and so on all have to be implemented yourself. It is not real-time, because there is a time lag between indexing a document and being able to search it, and the scalability of the current "near real time" search solution still needs improvement.
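The time lag between indexing and searchability can be pictured with a small toy model in which newly added documents sit in a buffer until a refresh makes them visible. This is only a sketch of the idea: the class and method names are invented for illustration and do not correspond to Lucene's API.

```python
class NearRealTimeIndex:
    """Toy index where documents become searchable only after a refresh."""

    def __init__(self):
        self._buffer = []      # written, but not yet visible to searches
        self._searchable = []  # visible to searches

    def add(self, text):
        """Accept a document. It is not searchable yet."""
        self._buffer.append(text)

    def refresh(self):
        """Make every buffered document visible to searches."""
        self._searchable.extend(self._buffer)
        self._buffer.clear()

    def search(self, word):
        """Return the visible documents that contain the word."""
        word = word.lower()
        return [t for t in self._searchable if word in t.lower().split()]


idx = NearRealTimeIndex()
idx.add("real-time search with Lucene")
print(idx.search("lucene"))  # nothing yet: the document is still buffered
idx.refresh()
print(idx.search("lucene"))  # the document is now visible
```

The shorter the interval between refreshes, the closer the behavior gets to real time, at the cost of doing the refresh work more often.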

Katta

Description: A Lucene-based search solution that is distributed, scalable, fault-tolerant, and near real-time.

Pros: Works out of the box and can be used together with Hadoop for distribution. It has scaling and fault-tolerance mechanisms.

Cons: It is only a search solution, so building the index still has to be implemented separately. Its search features cover only the most basic requirements. It has fewer success stories and the project is less mature. Because it has to support distribution, customization for some complex query requirements is more difficult.

Hadoop Contrib/index

Description: A distributed indexing scheme based on Map/Reduce that can be used in conjunction with Katta.

Pros: Distributed indexing and scalability.

Cons: It is only an index-building scheme and does not include a search implementation. It works in batch mode, so its support for real-time search is poor.
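The Map/Reduce approach to index building can be sketched in a few lines: the map step emits a (term, document id) pair for every distinct term, and the reduce step groups the pairs into posting lists. This toy version runs in a single process and does not use the Hadoop contrib/index API. The function names and sample documents are assumptions for illustration.

```python
from collections import defaultdict


def map_phase(doc_id, text):
    """Emit one (term, doc_id) pair per distinct term in a document."""
    return [(term, doc_id) for term in sorted(set(text.lower().split()))]


def reduce_phase(pairs):
    """Group the emitted pairs into sorted posting lists, one per term."""
    postings = defaultdict(list)
    for term, doc_id in sorted(pairs):
        postings[term].append(doc_id)
    return dict(postings)


# Hypothetical input: document id -> text.
docs = {1: "distributed index build", 2: "batch index"}
pairs = [pair for doc_id, text in docs.items() for pair in map_phase(doc_id, text)]
print(reduce_phase(pairs))
```

Since the whole input is processed before any posting list is complete, the sketch also shows why a batch scheme like this is a poor fit for real-time search.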

LinkedIn's Open Source solutions

Description: A series of Lucene-based solutions, including the near real-time search system Zoie, the faceted search implementation Bobo, the machine learning algorithm library Decomposer, the summary store Krati, and the database schema wrapper Sensei.

Pros: Proven solutions that are distributed and scalable and offer a rich set of features.

Cons: Too tightly coupled to LinkedIn's own needs, and less customizable.

Lucandra

Description: Based on Lucene, with the index stored in a Cassandra database.

Pros: The same as the advantages of Cassandra.

Cons: The same as the disadvantages of Cassandra. In addition, this is only a demo and has not been extensively validated.

Hbasene

Description: Based on Lucene, with the index stored in an HBase database.

Pros: The same as the advantages of HBase.

Cons: The same as the disadvantages of HBase. In addition, in this implementation each Lucene term is stored as a row, while the posting list for that term is stored as columns. As the posting list of a single term grows, query speed suffers greatly.

Reprint: http://blog.csdn.net/jameshadoop/article/details/44905643


This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only. This website makes no representation or warranty of any kind, either expressed or implied, as to the accuracy, completeness ownership or reliability of the article or any translations thereof. If you have any concerns or complaints relating to the article, please send an email, providing a detailed description of the concern or complaint, to info-contact@alibabacloud.com. A staff member will contact you within 5 working days. Once verified, infringing content will be removed immediately.
