Cloudera Search: Easy full-text search for Hadoop

Source: Internet
Author: User
Tags: solr

Cloudera Search was launched recently. As someone who has long used Lucene/Solr for information retrieval, I do not see it as a new technology, but at the application level it is undoubtedly exciting news for the industry. With Cloudera Search providing a complete solution, anyone can now run full-text searches over data stored in Hadoop as easily as using Google or Baidu.


The core components of Cloudera Search are Hadoop and Solr, the latter built on Lucene. Hadoop itself began as a subproject of Lucene. Now we are pleased to see these two technologies join hands again, giving more users a powerful tool for extracting information and value from the massive data stored in Hadoop. We can expect more enterprise applications, both internal and external-facing, to be deployed around Hadoop.


The following is a brief overview of the Cloudera Search components.



With Tika, Cloudera Search supports a large number of widely used file formats. It also supports many other data formats common in Hadoop applications, such as Avro, SequenceFile, and log files.


The data to be indexed for full-text retrieval can come from HDFS, for example log files, or from Hive or HBase tables (HBase support, through integration with the NGData Lily project, is still in progress). Alternatively, you can use Flume to collect data from external sources and write it directly to the index through a new Flume sink. Flume can also pre-process the data to be indexed, for example by converting it or by extracting and creating metadata.
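
The kind of pre-processing described here (conversion, extraction, and creation of metadata) can be illustrated with a minimal Python sketch. This is not Flume code: the log format, the field names, and the function to_index_document are all assumptions made purely for illustration.

```python
import re
from datetime import datetime, timezone

# Hypothetical log format: "2013-06-05 12:30:01 ERROR disk full on node7"
LOG_PATTERN = re.compile(
    r"^(?P<date>\d{4}-\d{2}-\d{2}) (?P<time>\d{2}:\d{2}:\d{2}) "
    r"(?P<level>[A-Z]+) (?P<message>.*)$"
)

def to_index_document(line, source):
    """Turn one raw log line into a flat dict of fields ready for indexing.

    Returns None when the line does not match the assumed format.
    """
    match = LOG_PATTERN.match(line.strip())
    if match is None:
        return None
    # Conversion: normalize the timestamp to ISO 8601 (UTC is an assumption).
    stamp = datetime.strptime(
        match["date"] + " " + match["time"], "%Y-%m-%d %H:%M:%S"
    ).replace(tzinfo=timezone.utc)
    return {
        "timestamp": stamp.strftime("%Y-%m-%dT%H:%M:%SZ"),
        "level": match["level"].lower(),  # extraction of a structured field
        "text": match["message"],         # the field to search as full text
        "source": source,                 # metadata created during ingestion
    }
```

Lines that do not fit the assumed format return None, so a caller can skip them or route them elsewhere instead of indexing malformed documents.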


The created indexes are stored in HDFS, which gives search the benefits of scalability, redundancy, and fault tolerance.


In addition, we can run MapReduce jobs to build Solr indexes over the data we want to search.
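
To give a feel for why indexing fits the MapReduce model, here is a toy, in-memory Python sketch that builds an inverted index in a map step and a reduce step. It does not use Hadoop or Solr, and the function names are hypothetical; it only illustrates the general idea, not how Cloudera Search implements it.

```python
from collections import defaultdict

def map_phase(doc_id, text):
    """Emit (term, doc_id) pairs for every distinct word in one document."""
    for term in set(text.lower().split()):
        yield term, doc_id

def reduce_phase(pairs):
    """Group the emitted pairs by term into sorted posting lists."""
    postings = defaultdict(set)
    for term, doc_id in pairs:
        postings[term].add(doc_id)
    return {term: sorted(ids) for term, ids in postings.items()}

def build_index(documents):
    """Run the map step over all documents, then reduce into an inverted index."""
    pairs = []
    for doc_id, text in documents.items():
        pairs.extend(map_phase(doc_id, text))
    return reduce_phase(pairs)
```

Because each document is mapped independently, the map step can be spread across many machines, and the reduce step only needs to see the pairs that share a term.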


In most cases, ZooKeeper is used to coordinate the distribution of data across nodes (see http://wiki.apache.org/solr/SolrCloud) and to provide automatic failover when errors occur, improving reliability.
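
The idea of automatic failover can be shown with a toy Python model: given a view of which replicas are alive, a request is routed to the first live replica of each shard. This is only a sketch of the concept. It does not use the ZooKeeper or SolrCloud APIs, and the cluster_state structure and the node names are assumptions.

```python
def pick_replicas(cluster_state):
    """For each shard, return the first replica that is marked live.

    cluster_state maps a shard name to a list of (replica_name, is_live)
    pairs in order of preference. A shard with no live replica maps to None.
    """
    chosen = {}
    for shard, replicas in cluster_state.items():
        chosen[shard] = next((name for name, live in replicas if live), None)
    return chosen

# Hypothetical cluster: node1 is down, so shard1 falls back to node2.
state = {
    "shard1": [("node1", False), ("node2", True)],
    "shard2": [("node3", True), ("node4", True)],
}
routing = pick_replicas(state)
```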


For installation and deployment, Cloudera Manager simplifies the tedious work and provides functions for managing and monitoring the search service.


On the user side, search users can run searches through the Hue search interface, a command-line tool, or the Solr GUI.
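
Behind these interfaces, a search is ultimately a query sent to Solr. As a rough sketch, the Python snippet below builds a URL for Solr's standard /select request handler using the common q, rows, and wt parameters. The host, port, collection name, and field name are assumptions, and no request is actually sent.

```python
from urllib.parse import urlencode

def build_select_url(base_url, collection, query, rows=10):
    """Build a URL for Solr's /select handler with a JSON response format."""
    params = urlencode({"q": query, "rows": rows, "wt": "json"})
    return f"{base_url.rstrip('/')}/{collection}/select?{params}"

# Hypothetical host, collection ("logs"), and field ("text").
url = build_select_url("http://localhost:8983/solr", "logs", "text:error", rows=5)
print(url)
```

The resulting URL could be opened in a browser or fetched with any HTTP client against a running Solr instance.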


Cloudera Search is currently still in the testing phase. In terms of performance, a single server can support hundreds of millions of documents and indexes approaching a terabyte in size, with search results usually returned within 1 to 2 seconds. We look forward to the early release of a stable version.

http://training.cloudera.com/elearning/SearchOverview/

http://www.cloudera.com/content/support/en/documentation/cloudera-search/cloudera-search-documentation-v1-latest.html

http://wiki.apache.org/solr/

http://wiki.apache.org/solr/SolrCloud



Original article: "Cloudera Search: Hadoop full-text Search is easy to implement". Thanks to the original author for sharing.
