The Disadvantages of Lucene/Solr

Source: Internet
Author: User
1) HTTP requests are cached, so newly indexed data is sometimes not visible right away (cache lag). With cache tuning this is not a real problem.


2) The admin console's support for Chinese and for complex query syntax is not very friendly. This is not a problem if you extend it yourself.


3) Swap core: when a single node hosts multiple cores and the index behind the core being swapped is large, memory usage doubles during the switch, and the switch can even time out. This is not a problem as long as you do not swap back and forth often.


4) Index building and index searching usually run together, so during a full rebuild disk usage peaks at 3 times the index size: one copy for the original index, one for the new index, and one for the optimize step. Of course, separating build from search solves this, and that is the conventional approach anyway.


5) Running build and search together also means that their parameters cannot be tuned separately. In particular, reserving disk, memory, and other resources to speed up the build affects search. Of course, separating build from search gets rid of this.

6) Distributed queries have some performance problems when a merge is needed. Of course, you can partition the data so that the merge is avoided.


7) Scoring factors can be adjusted, but new scoring factors and extensions to the scoring formula cannot be plugged in directly from the Solr configuration. However, you can extend Lucene's code or parameterize a SpanQuery, add a new query type, and plug it into Solr, which is a bit more work. In addition, the community provides BM25, PageRank, and other ranking patches; once you understand Lucene well enough, you can use them directly.
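To make the merge cost in point 6 concrete, here is a minimal Python sketch of what a coordinating node has to do when results come back from several shards. It is not Solr's implementation; the function name `merge_top_k` and the (score, doc id) tuples are assumptions made for illustration.

```python
import heapq
from itertools import islice


def merge_top_k(shard_results, k):
    """Merge per-shard hit lists into one global top-k list.

    Each shard list is assumed to be sorted by descending score already.
    """
    # heapq.merge expects ascending input, so order by the negated score.
    merged = heapq.merge(*shard_results, key=lambda hit: -hit[0])
    return list(islice(merged, k))


# Hypothetical hits from three shards: (score, doc id)
shards = [
    [(0.9, "a1"), (0.4, "a2")],
    [(0.8, "b1"), (0.7, "b2")],
    [(0.95, "c1")],
]
top3 = merge_top_k(shards, 3)
```

Every shard has to return its own top k before the coordinator can pick the global top k, which is where the extra work comes from. If the data is partitioned so that a query touches only one shard, this step disappears.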


8) Solr's control over full and incremental builds of a distributed index is too coarse and not friendly enough. Running a full build on a specified node at any time, or an incremental build under specified conditions, is not smooth enough, although Solr does provide a way to implement custom extensions. This is not a big problem.


9) Because Solr builds and searches together, data and business logic are in practice bound together rather than fully isolated. With 100 cores, managing and maintaining the data sources becomes very resource-intensive. Bringing in Hadoop or another NoSQL store directly is currently the most popular way to decouple data from business logic, and there are many open-source distributed Lucene solutions.


10) A/B tests that share the same index directory but use different ranking or different word segmentation: Solr cannot support this directly.


11) A/B tests that each use an independent index directory, with different ranking or different word segmentation: Solr cannot support this directly either.


12) One core backed by multiple subdirectories, where a query can search either a specified subdirectory or all of them, and an update can target the index of one subdirectory or of all of them: Solr cannot support this directly either, yet these features are needed at large data volumes.


13) Neither Solr nor Lucene currently supports fast partial ("local") updates, meaning a quick update of a single field in a document. Today you must pass in the complete document and add it again. If the document's unchanged fields come from multiple sources, this wastes I/O and compute, particularly when the update volume is high. Of course, you can handle this by keeping the newer data in a separate in-memory index while leaving the larger base index untouched.
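The workaround mentioned above, a small in-memory layer on top of an untouched base, can be sketched in Python as follows. This is a simplified illustration using plain dictionaries rather than a real inverted index; the class name `OverlayStore` and its methods are hypothetical.

```python
class OverlayStore:
    """A large read-only base plus a small in-memory delta of field updates."""

    def __init__(self, base):
        self.base = base   # large, rebuilt rarely
        self.delta = {}    # small, holds only the changed fields

    def update_field(self, doc_id, field, value):
        # Record just the changed field; the base document is not rewritten.
        self.delta.setdefault(doc_id, {})[field] = value

    def get(self, doc_id):
        # Read the base document, then let newer delta values win.
        doc = dict(self.base.get(doc_id, {}))
        doc.update(self.delta.get(doc_id, {}))
        return doc


base_index = {"d1": {"title": "phone", "price": 10}}
store = OverlayStore(base_index)
store.update_field("d1", "price", 12)
current = store.get("d1")
```

The cost of this design is that every read has to consult two places, and the delta eventually has to be folded back into the base.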


14) Solr does not support filtering on third-party conditions. For example, a batch of docs retrieved from the inverted index may need to be filtered against field values held in an external source. The main difficulty is that the third-party information is too dynamic to be written directly into the index.
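One way an application can work around this is to post-filter the candidate docs itself. The Python sketch below assumes the external source can be read as a simple mapping from doc id to value; `filter_with_external` and the stock-level example are hypothetical and not part of Solr.

```python
def filter_with_external(candidate_ids, external_values, predicate):
    """Keep candidates whose externally held value satisfies the predicate.

    Docs that are missing from the external source are dropped.
    """
    return [
        doc_id
        for doc_id in candidate_ids
        if doc_id in external_values and predicate(external_values[doc_id])
    ]


# Doc ids as returned by the search, in ranked order
candidates = ["d3", "d1", "d7", "d2"]
# Highly dynamic data kept outside the index, e.g. current stock levels
stock = {"d1": 0, "d2": 5, "d3": 2}
in_stock = filter_with_external(candidates, stock, lambda qty: qty > 0)
```

Because filtering happens after retrieval, the application may need to over-fetch candidates to fill a page of results.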


15) For Chinese word segmentation, many third-party packages can be plugged in, but the query parser sometimes needs to be extended. Overall there are pros and cons: the advantage is ease of integration; the disadvantage is that their dictionaries and algorithms are not fully compatible with Lucene, so extending and refining them is not so easy.


16) In ranking, there is no ready-made support for de-duplication or for time dynamics. De-duplication matters when the top few results share exactly the same value in one field, or in several fields, so that the leading results appear "clustered" around those fields. For some applications that is not the best outcome.
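An application can break up such clusters itself after the results come back. The following Python sketch caps how many hits with the same field value may appear before the rest are pushed down; the function `spread_by_field` and the `shop` field are assumptions chosen for illustration.

```python
from collections import defaultdict


def spread_by_field(hits, field, max_per_value):
    """Move hits beyond the per-value cap to the end, keeping relative order."""
    seen = defaultdict(int)
    kept, deferred = [], []
    for hit in hits:
        value = hit[field]
        if seen[value] < max_per_value:
            seen[value] += 1
            kept.append(hit)
        else:
            deferred.append(hit)
    return kept + deferred


# Ranked hits where the first three come from the same shop
hits = [
    {"id": 1, "shop": "A"},
    {"id": 2, "shop": "A"},
    {"id": 3, "shop": "A"},
    {"id": 4, "shop": "B"},
]
spread = spread_by_field(hits, "shop", 1)
spread_ids = [hit["id"] for hit in spread]
```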


For time dynamics there is likewise no direct support; the effect can only be achieved indirectly by sorting on time. Arguably this is not something Lucene or Solr should have to care about, since it stems from the particular needs of the application.
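If plain sorting by time is too blunt, an application could blend relevance and freshness on its own side. The Python sketch below uses an exponential half-life decay; this is one common choice picked for illustration, not something the original text prescribes, and `decayed_score` is a hypothetical helper.

```python
def decayed_score(score, age_seconds, half_life_seconds):
    """Halve the relevance score for every half-life of document age."""
    return score * 0.5 ** (age_seconds / half_life_seconds)


# (doc id, relevance score, age in seconds)
docs = [("old", 1.0, 7200), ("new", 0.6, 0)]
ranked = sorted(
    docs,
    key=lambda d: decayed_score(d[1], d[2], half_life_seconds=3600),
    reverse=True,
)
```

With a one-hour half-life, the two-hour-old document keeps only a quarter of its score, so the fresher document ranks first despite its lower relevance.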


17) There is no common analysis tool for the logs that Solr and Lucene output, covering things like high-frequency terms and query aggregation. You have to build this yourself.
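A home-grown analysis can start very small. The Python sketch below counts the most frequent query strings in a log; the log line layout shown is a simplified assumption, so the regular expression would need adjusting to the real log format in use.

```python
import re
from collections import Counter

# Matches a q=... parameter but not fq=..., since \b needs a word boundary.
QUERY_PARAM = re.compile(r"\bq=([^&}\s]+)")


def top_queries(log_lines, n):
    """Return the n most frequent q= values as (query, count) pairs."""
    counts = Counter()
    for line in log_lines:
        match = QUERY_PARAM.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts.most_common(n)


# Hypothetical, simplified log lines
log_lines = [
    "params={q=ipod&rows=10}",
    "params={q=ipod&fq=cat:a}",
    "params={q=solr}",
    "no query on this line",
]
frequent = top_queries(log_lines, 2)
```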


18) For recommendation support, log information cannot be linked in directly. Recommendations are mostly computed offline, imported into the inverted index, and then associated at query time.


19) When memory exceeds 30 GB and the single-node index is large, full GC (FGC) and memory management in the JVM become very difficult. Tuning requires careful testing.


20) Lucene rarely programs to interfaces, whereas Solr has many interfaces and plug-in extension points, which is what makes Solr flexible.


21) For a vertical search platform that must support N different applications, with different schemas, data sources, update frequencies, query logic, access patterns, performance targets, and machine configurations, plus vertical and horizontal scaling, Solr is not up to the job, although SolrCloud does embody a lot of valuable design experience.


22) Solr cannot directly support flow control and NC either. Access requests cannot be throttled by time window or by count. The same goes for vertical expansion of the index (adding index replicas to serve more requests) and horizontal expansion (adding index partitions to hold more data and to balance performance against space pressure).
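Request throttling of the kind described in point 22 is usually added in front of the search service. As a rough illustration, here is a token bucket in Python; it is a generic technique rather than anything Solr provides, and the class `TokenBucket` takes the current time as an argument so the behaviour is easy to follow.

```python
class TokenBucket:
    """Allow up to `capacity` requests in a burst, refilled at `rate_per_sec`."""

    def __init__(self, rate_per_sec, capacity, now=0.0):
        self.rate = rate_per_sec
        self.capacity = capacity
        self.tokens = float(capacity)
        self.last = now

    def allow(self, now):
        # Refill according to elapsed time, never beyond the capacity.
        elapsed = now - self.last
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False


bucket = TokenBucket(rate_per_sec=1.0, capacity=2)
# Three requests at t=0 and one more a second later
decisions = [bucket.allow(0.0), bucket.allow(0.0), bucket.allow(0.0), bucket.allow(1.0)]
```

The third request at t=0 is rejected because the burst capacity of two is used up; one second later a single token has been refilled.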


23) Solr's own fault tolerance is not strong enough. For example, it lacks checks for unreasonable schema changes and rollback of configuration errors; some solrconfig parameters cannot be obtained dynamically and must be configured in advance; it cannot reload automatically after an OOM; and it cannot discard requests when the request volume is high.


24) Advanced applications based on bitwise operations are not flexible enough, for example Boolean storage and faceting, byte[] storage and faceting, grouping, and so on; the support is still not friendly enough.


25) Query parsing has no predictive capability: it cannot reorder query clauses or automatically narrow conditions. Of course, in general there is no need for such complex optimization.


26) Some of the more unusual query requirements are not particularly efficient, for example querying for documents in which a field is not empty. Of course, you can index a default value in place of the empty field, then query for the default value and filter it out.
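The default-value trick in point 26 can be illustrated with a toy inverted index in Python. The sentinel `__EMPTY__` and the helper names are assumptions; the point is only that "field is not empty" becomes a single term lookup plus an exclusion.

```python
from collections import defaultdict

EMPTY = "__EMPTY__"  # hypothetical sentinel; must never occur as a real value


def build_postings(docs, field):
    """Map each field value to the set of doc ids that carry it."""
    postings = defaultdict(set)
    for doc_id, doc in docs.items():
        # Missing or empty values are indexed under the sentinel instead.
        value = doc.get(field) or EMPTY
        postings[value].add(doc_id)
    return postings


def non_empty_ids(docs, postings):
    """All docs minus those indexed under the sentinel."""
    return set(docs) - postings.get(EMPTY, set())


docs = {1: {"brand": "acme"}, 2: {"brand": ""}, 3: {}, 4: {"brand": "zeta"}}
postings = build_postings(docs, "brand")
result = non_empty_ids(docs, postings)
```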


27) Range fields whose values are mostly unique are not optimized, so their term data bloats. The most common cases are update time, upload time, and the like, which account for a very large proportion of terms.
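One mitigation an application might apply, at the price of precision, is to index such timestamps at a coarser granularity. This is an assumption for illustration and not something the original text proposes; the Python sketch below simply shows how much the number of unique terms can shrink.

```python
def round_down(ts_seconds, granularity_seconds):
    """Truncate a Unix timestamp to the start of its granularity bucket."""
    return ts_seconds - ts_seconds % granularity_seconds


# One document per second for an hour, starting on an hour boundary
start = 3_600_000
timestamps = [start + i for i in range(3600)]

unique_raw = len(set(timestamps))
unique_hourly = len({round_down(t, 3600) for t in timestamps})
```

Here 3600 distinct second-level terms collapse into a single hourly term. Whether the lost precision is acceptable depends on the range queries the application actually needs.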


28) A multi-valued field is in essence several fields with the same name, not a single field. When a field has many values, each one has to be stored this way. Also, arrays of long, int, short, float, or double cannot be stored directly as a typed value; they must all be converted to character storage. Both space and time efficiency are somewhat low.


29) Some words occur with particularly high frequency, producing very long posting lists, and neither Solr nor Lucene intervenes. The problem is left to the application. In practice, on a single Solr node, performance is very poor when a query hits more than 1,000,000 (100w) documents, sorts on multiple fields, and misses the cache.


30) Solr/Lucene is very good at applications with tens of millions of documents; billion-scale applications need to be treated with caution.