Full-Text Search Engine Solr Series: Solr Core Concepts and Configuration Files

Source: Internet
Author: User
Tags relational database table solr

Document

A document is the basic unit that Solr indexes and searches. It resembles a record in a relational database table and contains one or more fields, each with a name and a text value. A field can be stored as well as indexed, in which case its value can be returned in search results. A document should usually contain an ID field that uniquely identifies it. For example:

<doc>
    <field name="id">company123</field>
    <field name="companycity">Atlanta</field>
    <field name="companystate">Georgia</field>
    <field name="companyname">Code Monkeys R Us, LLC</field>
    <field name="companydescription">we write lots of code</field>
    <field name="lastmodified">2013-06-01T15:26:37Z</field>
</doc>

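To index a document like this, Solr's XML update format wraps one or more <doc> elements in an <add> element. The following is a minimal sketch, assuming the message is posted to the core's update handler (typically /update) and that the schema defines the fields used. Only two of the fields from the example are repeated here to keep it short.

```xml
<!-- Update message: adds (or replaces, if the id already exists) one document -->
<add>
    <doc>
        <field name="id">company123</field>
        <field name="companyname">Code Monkeys R Us, LLC</field>
    </doc>
</add>
```

A separate <commit/> message, or an auto-commit setting, is then needed before the document becomes visible to searches.
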
Schema

The schema in Solr is similar to a table definition in a relational database. It lives in the conf directory as the text file schema.xml, and it must be in place before documents are added to the index. The schema file contains three main parts: fields (field), field types (fieldType), and the unique key (uniqueKey).

    • Field type (fieldType): defines the type of the fields that are added to the index, such as int, string, or date.
    • Field (field): the name of the field as it is added to the index.
    • Unique key (uniqueKey): the field that uniquely identifies a document; it is used when updating and deleting documents.

For example:

<schema name="example" version="1.5">
    <field name="id" type="string" indexed="true" stored="true" required="true" multiValued="false" />
    <field name="title" type="text_general" indexed="true" stored="true" multiValued="true"/>
    <uniqueKey>id</uniqueKey>
    <fieldType name="string" class="solr.StrField" sortMissingLast="true" />
    <fieldType name="text_general" class="solr.TextField" positionIncrementGap="100">
        <analyzer type="index">
            <tokenizer class="solr.StandardTokenizerFactory"/>
            <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" />
            <!-- in this example, we will only use synonyms at query time
            <filter class="solr.SynonymFilterFactory" synonyms="index_synonyms.txt" ignoreCase="true" expand="false"/>
            -->
            <filter class="solr.LowerCaseFilterFactory"/>
        </analyzer>
        <analyzer type="query">
            <tokenizer class="solr.StandardTokenizerFactory"/>
            <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" />
            <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
            <filter class="solr.LowerCaseFilterFactory"/>
        </analyzer>
    </fieldType>
</schema>

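Because the schema declares id as the unique key, update and delete requests refer to documents by that field. As a rough illustration, a delete message in Solr's XML update format might look like the sketch below. The value company123 is taken from the document example earlier, and the message is assumed to be sent to the update handler.

```xml
<!-- Delete the document whose uniqueKey field (id) has the value company123 -->
<delete>
    <id>company123</id>
</delete>
```

Adding a document whose id already exists replaces the existing document, which is how updates rely on the unique key.
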
Field

In Solr, a field is the basic unit that makes up a document, and it corresponds to a column in a database table. A field definition is metadata that includes the field's name, its type, and how its value is handled. For example:

<field name="name" type="text_general" indexed="true" stored="true"/>
    • indexed: indexed="true" means that Solr processes the field and adds it to the index; only indexed fields can be searched.
    • stored: stored="true" means that the field value is kept in the index as a copy of the original content and can be returned by the search component. Storing long text in the index is not recommended, for performance reasons.

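The two attributes can be combined independently. The sketch below shows three combinations; companyname and companydescription come from the document example earlier, while logourl is a hypothetical field name used only for illustration.

```xml
<!-- Searchable and returned in results -->
<field name="companyname" type="text_general" indexed="true" stored="true"/>

<!-- Searchable, but the original text is not kept for retrieval (suits long text) -->
<field name="companydescription" type="text_general" indexed="true" stored="false"/>

<!-- Returned in results, but cannot be searched on (hypothetical field) -->
<field name="logourl" type="string" indexed="false" stored="true"/>
```
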
Field Type

Every field in Solr has a corresponding field type, such as float, long, double, date, or text. Solr provides a rich set of field types, and you can also define custom types suited to your data, for example:

<!-- IK tokenizer -->
<fieldType name="text_cn_stopword" class="solr.TextField">
    <analyzer type="index">
        <tokenizer class="org.wltea.analyzer.lucene.IKAnalyzerSolrFactory" useSmart="false"/>
    </analyzer>
    <analyzer type="query">
        <tokenizer class="org.wltea.analyzer.lucene.IKAnalyzerSolrFactory" useSmart="true"/>
    </analyzer>
</fieldType>
<!-- IK tokenizer -->

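For comparison, built-in types are declared the same way and then referenced from field definitions through the type attribute. The sketch below assumes a Solr 4.x schema (matching version="1.5" above), where numeric and date types use the Trie classes; newer Solr releases use point-based classes such as solr.DatePointField instead. The field name content_cn is hypothetical.

```xml
<!-- Built-in numeric and date types (Solr 4.x class names) -->
<fieldType name="long" class="solr.TrieLongField" precisionStep="0" positionIncrementGap="0"/>
<fieldType name="date" class="solr.TrieDateField" precisionStep="0" positionIncrementGap="0"/>

<!-- Fields pick a type by name -->
<field name="lastmodified" type="date" indexed="true" stored="true"/>
<!-- Hypothetical field that uses the custom IK-based type defined above -->
<field name="content_cn" type="text_cn_stopword" indexed="true" stored="true"/>
```
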
Solrconfig

If the schema defines Solr's data model, then solrconfig.xml is Solr's runtime configuration. It defines how Solr handles requests such as indexing, highlighting, and searching, and it also specifies the cache policy. Its most commonly used elements include:

    • Specify the index data path
      <!-- Used to specify an alternate directory to hold all index data
           other than the default ./data under the Solr home.
           If replication is in use, this should match the replication configuration. -->
      <dataDir>${solr.data.dir:./solr/data}</dataDir>
    • Cache parameters
      <filterCache
        class="solr.FastLRUCache"
        size="512"
        initialSize="512"
        autowarmCount="0"/>

      <!-- queryResultCache caches results of searches - ordered lists of
           document ids (DocList) based on a query, a sort, and the range
           of documents requested. -->
      <queryResultCache
        class="solr.LRUCache"
        size="512"
        initialSize="512"
        autowarmCount="0"/>

      <!-- documentCache caches Lucene Document objects (the stored fields for each document).
           Since Lucene internal document ids are transient, this cache will not be autowarmed. -->
      <documentCache
        class="solr.LRUCache"
        size="512"
        initialSize="512"
        autowarmCount="0"/>
    • Request handler
      A request handler receives an HTTP request, processes it (for example, by running the search), and returns the response. For example, a query request handler:
      <!-- A request handler that returns indented JSON by default -->
      <requestHandler name="/query" class="solr.SearchHandler">
          <lst name="defaults">
              <str name="echoParams">explicit</str>
              <str name="wt">json</str>
              <str name="indent">true</str>
              <str name="df">text</str>
          </lst>
      </requestHandler>
Each request handler includes a set of configurable search parameters, such as wt, indent, and df.
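
Besides defaults, a request handler can also declare appends and invariants parameter sections. The following is a sketch only: the handler name /companysearch and the chosen parameter values are hypothetical, and the field names are borrowed from the document example earlier.

```xml
<!-- Hypothetical handler name; the three parameter sections are standard Solr -->
<requestHandler name="/companysearch" class="solr.SearchHandler">
    <!-- defaults: used when the request does not supply the parameter -->
    <lst name="defaults">
        <str name="wt">json</str>
        <int name="rows">10</int>
        <str name="df">companyname</str>
    </lst>
    <!-- appends: added to whatever the request supplies -->
    <lst name="appends">
        <str name="fq">companystate:Georgia</str>
    </lst>
    <!-- invariants: always enforced; the request cannot override them -->
    <lst name="invariants">
        <str name="echoParams">none</str>
    </lst>
</requestHandler>
```

With a configuration like this, a request can still override wt, rows, or df, but every query is additionally filtered to companystate:Georgia and echoParams stays fixed.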
