Apache Solr: First Experience, Part 3

Source: Internet
Author: User
Tags apache solr solr

Over the past two days I learned the basic usage of Solr. Next I want to look at Solr itself in more detail: its directory structure and its configuration files.

At this stage of learning, the most important folder is the example folder, which contains most of what we want to study.

Let's take a look at the structure of this folder.

Most of the folder names are self-explanatory. Here we only introduce two of them: multicore and solr.

The multicore folder is only needed when running multiple Solr instances (cores), so we have no use for it at the moment. The solr folder is the built-in solr.home, and it is the focus of this introduction.

Go to this folder and we can see the following structure:

The bin folder is where additional processing scripts go if we need any. We don't use it for now, so we will skip it.

The conf folder holds Solr's configuration files, which are the important part here.

The data folder is the index directory.

Because I have already run Solr, there is also a data.bak directory here, which is a backup I made earlier.

Let's take a look at the configuration file in the conf Folder:

The file marked as a copy is my own backup. The most important files are schema.xml and solrconfig.xml. Of the remaining files, stopwords.txt holds the stop words: when indexing or querying encounters one of these words, it is ignored automatically. We do not need to discuss the other files for the time being.
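To make the stop-word behaviour concrete, here is a minimal Python sketch. It is not Solr's implementation; it only assumes that stopwords.txt lists one word per line with `#` starting a comment, and that matching ignores case (in Solr that depends on how the filter is configured). The function names are made up for illustration.

```python
def load_stopwords(text):
    """Parse stop-word file content: one word per line, '#' lines are comments."""
    words = set()
    for line in text.splitlines():
        line = line.strip()
        if line and not line.startswith("#"):
            words.add(line.lower())
    return words


def remove_stopwords(tokens, stopwords):
    """Drop every token that appears in the stop-word set (case-insensitive here)."""
    return [token for token in tokens if token.lower() not in stopwords]


# Hypothetical file content, not the actual stopwords.txt shipped with Solr.
STOPWORDS = load_stopwords("# sample list\na\nthe\n\nof\n")
FILTERED = remove_stopwords(["The", "history", "of", "Solr"], STOPWORDS)
```

The same filtering is applied to both the indexed text and the query text, which is why a stop word can never be matched by a search.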

Open solrconfig.xml and we can see the <dataDir> element. By default it looks like this:

<dataDir>${solr.data.dir:./solr/data}</dataDir>

By default, Solr stores the index in a data directory under the solr directory, relative to the current working directory. This is why, even with solr.home pointing at this folder, a solr/data folder is created inside Tomcat's bin folder when Solr is started from there: the current directory when Tomcat starts is bin.

Of course, this is not an error in Solr's configuration. For the built-in Jetty server it is correct, because Jetty is started from the example folder, so ./solr/data resolves to the data folder inside solr.

We don't need to change much in this file for now. I will leave it here and explain it in more detail later.
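The effect of the working directory can be sketched in Python. This is only an illustration of the `${name:default}` placeholder and of relative path resolution, not Solr's own code; the install paths are hypothetical, and `posixpath` is used so the result does not depend on the operating system running the sketch.

```python
import posixpath
import re

# Matches ${name} or ${name:default}, the placeholder form seen in solrconfig.xml.
_PLACEHOLDER = re.compile(r"\$\{([^}:]+)(?::([^}]*))?\}")


def substitute(value, props):
    """Replace each placeholder with its property value, or with its default."""
    def repl(match):
        name, default = match.group(1), match.group(2)
        if name in props:
            return props[name]
        if default is None:
            raise KeyError(f"no value and no default for property {name!r}")
        return default
    return _PLACEHOLDER.sub(repl, value)


def resolve_data_dir(value, props, cwd):
    """Resolve the substituted path against the process's working directory."""
    return posixpath.normpath(posixpath.join(cwd, substitute(value, props)))


DATA_DIR = "${solr.data.dir:./solr/data}"

# Hypothetical install locations, for illustration only.
under_tomcat = resolve_data_dir(DATA_DIR, {}, "/opt/tomcat/bin")
under_jetty = resolve_data_dir(DATA_DIR, {}, "/opt/solr/example")
explicit = resolve_data_dir(DATA_DIR, {"solr.data.dir": "/var/solr/data"}, "/opt/tomcat/bin")
```

Under these assumptions, setting the solr.data.dir property to an absolute path (or editing <dataDir> directly) keeps the index in one place regardless of where the servlet container is started.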

Next, let's look at the key file, schema.xml. This file describes our index fields: every field that can be indexed is declared here.

The file contains a lot of comments, and most of them are easy to follow. Its configuration is closely tied to the Chinese word segmentation we will need to integrate later, so it is worth reading through.

First comes the <types> element, which contains many <fieldType> definitions with many attributes. Here is a rough introduction:

A fieldType defines a field type for the index. It has several attributes, including name and class:

name is the name of the type, and class is the Solr class that implements it. Two other attributes may be hard to understand from the English comments: sortMissingLast and sortMissingFirst. They control where documents that have no value for the sort field are placed in sorted results. When sortMissingLast is true, such documents are placed at the end; when sortMissingFirst is true, they are placed at the beginning, in both cases regardless of the sort direction. When both are false (the default), documents without a value come first in an ascending sort and last in a descending sort.
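The placement rules can be written down as a small Python sketch. It models documents as dictionaries and is not how Solr or Lucene sort internally; the function and parameter names are invented for the illustration.

```python
def sort_docs(docs, field, descending=False,
              sort_missing_last=False, sort_missing_first=False):
    """Sort docs by field, placing docs without the field per the two flags."""
    present = [d for d in docs if d.get(field) is not None]
    missing = [d for d in docs if d.get(field) is None]
    present.sort(key=lambda d: d[field], reverse=descending)
    if sort_missing_last:
        return present + missing
    if sort_missing_first:
        return missing + present
    # Neither flag set: missing values first when ascending, last when descending.
    return present + missing if descending else missing + present


def ids(docs):
    """Helper for reading results: list the ids in order."""
    return [d["id"] for d in docs]


# Hypothetical documents; "b" has no price.
DOCS = [{"id": "a", "price": 3}, {"id": "b"}, {"id": "c", "price": 1}]
```

For example, `ids(sort_docs(DOCS, "price", sort_missing_last=True))` keeps "b" at the end whether the sort is ascending or descending.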

Note: further down, some fieldType definitions contain an <analyzer> element. This is used to configure the tokenizer (word segmenter). We will discuss it later.

Next is the <fields> element and the <field> elements inside it. The fields are, of course, the content we index.

Each field has the attributes indexed and stored, which indicate whether the field is indexed and whether it is stored, and a multiValued attribute, which indicates whether the field may hold multiple values.

In the example data, features has several values, and the features field in the configuration file is declared accordingly:

<field name="features" type="text" indexed="true" stored="true" multiValued="true"/>  

This declares that the field holds multiple values. If multiValued is not set here, an error is reported when a document with several values for the field is submitted.

The type attribute refers to one of the fieldType definitions configured earlier.

Another element, <dynamicField>, defines a dynamically matched field:
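The check behind that error can be sketched as follows. This is a simplified Python model, not Solr's validation code: the schema dictionary and the exception message are made up, and only the multiValued rule is checked.

```python
# A toy schema: field name -> whether multiple values are allowed.
SCHEMA = {
    "id": {"multiValued": False},
    "features": {"multiValued": True},
}


def validate(doc, schema):
    """Raise ValueError if a single-valued field receives more than one value."""
    for name, values in doc.items():
        if name not in schema:
            raise ValueError(f"unknown field {name!r}")
        if len(values) > 1 and not schema[name]["multiValued"]:
            raise ValueError(f"multiple values for non-multiValued field {name!r}")


# Accepted: features is declared multiValued.
validate({"id": ["1"], "features": ["fast", "quiet"]}, SCHEMA)
```

With `"features": {"multiValued": False}` in the toy schema, the same document would be rejected.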

<dynamicField name="*_i"  type="int"    indexed="true"  stored="true"/>  

Its name contains a wildcard, which means it matches every field whose name ends with _i. If we specify

<dynamicField name="*"/>  

Then we can match all the fields.
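A minimal Python sketch of this lookup is shown below. It assumes that explicitly declared fields win over dynamic ones and that longer patterns are tried before the catch-all `*`; that ordering is an assumption of the sketch, and the "ignored" type given to `*` is a placeholder, since the snippet above does not show one.

```python
from fnmatch import fnmatchcase

# Explicit fields and dynamic-field patterns, each mapped to a type name.
FIELDS = {"id": "string", "features": "text"}
DYNAMIC_FIELDS = [("*", "ignored"), ("*_i", "int")]  # "ignored" is a placeholder type


def resolve_type(name, fields, dynamic_fields):
    """Return the type for a field name, or None if nothing matches."""
    if name in fields:
        return fields[name]
    # Try the most specific (longest) pattern first so "*" is the last resort.
    for pattern, type_name in sorted(dynamic_fields, key=lambda p: -len(p[0])):
        if fnmatchcase(name, pattern):
            return type_name
    return None
```

Here `resolve_type("count_i", FIELDS, DYNAMIC_FIELDS)` picks the int type, while an arbitrary name such as "whatever" only matches the catch-all pattern.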

Next, we can see:

<uniqueKey>id</uniqueKey>  

 

<defaultSearchField>text</defaultSearchField>  

 

<solrQueryParser defaultOperator="OR"/>  
<copyField source="cat" dest="text"/>


These elements are explained by the English comments in the file. The most important one is the copyField at the bottom, which copies a field: the value of the source field is copied into the dest field (here, text), so that several fields can be searched through a single one. Note that the destination of a copyField usually needs to be a multi-valued field, that is, a field with multiValued set to true, because it may receive values from more than one source.
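The copy step can be pictured with a short Python sketch. It is a simplified model rather than Solr's implementation: documents are dictionaries of value lists, and the second rule (name to text) is added here only to show why the destination ends up with several values.

```python
def apply_copy_fields(doc, copy_fields):
    """Return a copy of doc with each source's values appended to its dest."""
    result = {name: list(values) for name, values in doc.items()}
    for source, dest in copy_fields:
        # Read from the original document, so one rule never feeds another.
        values = doc.get(source, [])
        if values:
            result.setdefault(dest, []).extend(values)
    return result


# Hypothetical document and rules; only cat -> text appears in the snippet above.
DOC = {"id": ["1"], "cat": ["electronics", "hard drive"], "name": ["Maxtor"]}
RULES = [("cat", "text"), ("name", "text")]
INDEXED = apply_copy_fields(DOC, RULES)
```

After the copy, the text field of this document holds three values, which is why the destination is declared multi-valued, and a query against text alone covers both cat and name.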

 

 
