Document
A document is the basic unit of indexing and searching in Solr. It resembles a record (row) in a relational database table and contains one or more fields, each with a name and a value. A field can be indexed so that it is searchable. It can also be stored so that its original value is returned in search results. A document should normally contain an ID field whose value uniquely identifies it. For example:
```xml
<doc>
  <field name="id">company123</field>
  <field name="companycity">Atlanta</field>
  <field name="companystate">Georgia</field>
  <field name="companyname">Code Monkeys R Us, LLC</field>
  <field name="companydescription">we write lots of code</field>
  <field name="lastmodified">2013-06-01T15:26:37Z</field>
</doc>
```
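To send a document like this to Solr, the XML update format wraps one or more <doc> elements in an <add> element. The sketch below makes a few assumptions:

- The message is POSTed to the core's /update handler. The core name and host are not given in this article.
- It uses only a subset of the fields above.

The change becomes visible to searches only after a commit, for example a separate <commit/> message or the commit=true request parameter.

```xml
<!-- Sketch: XML update message, assumed to be POSTed to the core's /update handler -->
<add>
  <doc>
    <field name="id">company123</field>
    <field name="companycity">Atlanta</field>
    <field name="companyname">Code Monkeys R Us, LLC</field>
    <field name="lastmodified">2013-06-01T15:26:37Z</field>
  </doc>
</add>
```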
Schema
The schema in Solr plays the role of the table structure in a relational database. It lives in the conf directory as the schema.xml file, and documents added to the index must conform to it. The schema file contains three main parts: fields, field types (fieldType), and the unique key (uniqueKey).
- Field type (fieldType): defines the data type of the fields added to the index and how their values are processed, for example int, string, or date.
- Field: declares the name of a field as it is added to the index, together with its type and options.
- Unique key (uniqueKey): names the field whose value uniquely identifies each document. Solr uses it when updating and deleting documents.
For example:
```xml
<schema name="example" version="1.5">
  <field name="id" type="string" indexed="true" stored="true" required="true" multiValued="false" />
  <field name="title" type="text_general" indexed="true" stored="true" multiValued="true" />
  <uniqueKey>id</uniqueKey>
  <fieldType name="string" class="solr.StrField" sortMissingLast="true" />
  <fieldType name="text_general" class="solr.TextField" positionIncrementGap="100">
    <analyzer type="index">
      <tokenizer class="solr.StandardTokenizerFactory" />
      <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" />
      <!-- in this example, we will only use synonyms at query time
      <filter class="solr.SynonymFilterFactory" synonyms="index_synonyms.txt" ignoreCase="true" expand="false" />
      -->
      <filter class="solr.LowerCaseFilterFactory" />
    </analyzer>
    <analyzer type="query">
      <tokenizer class="solr.StandardTokenizerFactory" />
      <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" />
      <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true" />
      <filter class="solr.LowerCaseFilterFactory" />
    </analyzer>
  </fieldType>
</schema>
```
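Because id is declared as the uniqueKey, Solr uses it to identify documents across updates. Adding a document whose id already exists replaces the earlier version rather than creating a duplicate, and a delete can target the key directly. The sketch below uses the XML update format and, as before, assumes the message is POSTed to the core's /update handler and followed by a commit.

```xml
<!-- Delete the document whose uniqueKey (id) value is company123 -->
<delete>
  <id>company123</id>
</delete>
```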
Field
In Solr, a field is the basic building block of a document and corresponds to a column in a database table. A field definition is metadata that specifies the field's name, its type, and how its values are handled. For example:
<field name="name" type="text_general" indexed="true" stored="true"/>
- Indexed: indexed="true" means Solr processes the field and adds it to the index. Only indexed fields can be searched.
- Stored: stored="true" means a copy of the field's original value is kept in the index, so search components can return it with the results. Storing long text this way is not recommended because of the performance cost.
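The two flags are independent, so a schema can combine them as needed. The sketch below uses hypothetical field names (body and logo_url are not part of this article's schema) and assumes the string and text_general types from the schema example above.

```xml
<!-- Hypothetical fields showing the indexed/stored combinations -->
<!-- Searchable, but the original text is not returned with results -->
<field name="body" type="text_general" indexed="true" stored="false" />
<!-- Returned with results, but not searchable -->
<field name="logo_url" type="string" indexed="false" stored="true" />
<!-- Both searchable and returned -->
<field name="name" type="text_general" indexed="true" stored="true" />
```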
Field Type
Every field in Solr has a field type, such as float, long, double, date, or text. Solr provides a rich set of built-in field types, and you can also define custom types that suit your own data. For example, the following type uses the IK analyzer for Chinese word segmentation:
```xml
<!-- IK tokenizer -->
<fieldType name="text_cn_stopword" class="solr.TextField">
  <analyzer type="index">
    <tokenizer class="org.wltea.analyzer.lucene.IKAnalyzerSolrFactory" useSmart="false" />
  </analyzer>
  <analyzer type="query">
    <tokenizer class="org.wltea.analyzer.lucene.IKAnalyzerSolrFactory" useSmart="true" />
  </analyzer>
</fieldType>
<!-- IK tokenizer -->
```
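A custom type takes effect only once a field refers to it by name. The sketch below assumes the IK analyzer jar is on Solr's classpath and reuses the companydescription field from the earlier document as a hypothetical example.

```xml
<!-- Hypothetical field: Chinese text in it is segmented by the IK tokenizer type above -->
<field name="companydescription" type="text_cn_stopword" indexed="true" stored="true" />
```

In the type definition, the two analyzers use different IK settings:

- The index analyzer sets useSmart="false", which gives finer-grained segmentation and more terms.
- The query analyzer sets useSmart="true", which gives coarser, smarter segmentation.

This is a common way to favor recall, though the best setting depends on your data.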
Solrconfig
If the schema defines Solr's data model, then solrconfig.xml is Solr's configuration. It defines how Solr handles requests such as indexing, highlighting, and searching, and it also specifies caching policies. Its main elements include:
- Specify the index data path
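```xml
<!-- Used to specify an alternate directory to hold all index data
     other than the default ./data under the Solr home.
     If replication is in use, this should match the replication configuration. -->
<dataDir>${solr.data.dir:./solr/data}</dataDir>
```
The ${solr.data.dir:./solr/data} syntax reads the solr.data.dir system property and falls back to ./solr/data when that property is not set.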
- Cache configuration
```xml
<!-- filterCache caches unordered sets of documents (DocSets) that match a query,
     such as the results of filter queries (fq). -->
<filterCache class="solr.FastLRUCache"
             size="512"
             initialSize="512"
             autowarmCount="0" />

<!-- queryResultCache caches results of searches - ordered lists of
     document ids (DocList) based on a query, a sort, and the range
     of documents requested. -->
<queryResultCache class="solr.LRUCache"
                  size="512"
                  initialSize="512"
                  autowarmCount="0" />

<!-- documentCache caches Lucene Document objects (the stored fields
     for each document). Since Lucene internal document ids are
     transient, this cache will not be autowarmed. -->
<documentCache class="solr.LRUCache"
               size="512"
               initialSize="512"
               autowarmCount="0" />
```
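In solrconfig.xml these cache declarations live inside the <query> section. Also note that solr.LRUCache and solr.FastLRUCache were deprecated during Solr 8.x and removed in Solr 9 in favor of solr.CaffeineCache. On a newer release, a roughly equivalent configuration might look like the sketch below. Verify the attributes against the reference guide for your version.

```xml
<query>
  <!-- Same sizes as above, using the Caffeine-based cache implementation -->
  <filterCache class="solr.CaffeineCache" size="512" initialSize="512" autowarmCount="0" />
  <queryResultCache class="solr.CaffeineCache" size="512" initialSize="512" autowarmCount="0" />
  <documentCache class="solr.CaffeineCache" size="512" initialSize="512" autowarmCount="0" />
</query>
```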
- Request handlers
A request handler receives an HTTP request, processes it (for example, by running a search), and returns the response. For example, a handler for query requests:
```xml
<!-- A request handler that returns indented JSON by default -->
<requestHandler name="/query" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="echoParams">explicit</str>
    <str name="wt">json</str>
    <str name="indent">true</str>
    <str name="df">text</str>
  </lst>
</requestHandler>
```
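Each request handler accepts a set of configurable search parameters, such as:

- wt: the response format.
- indent: whether to pretty-print the response.
- df: the default field searched when the query does not name one.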
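Values under defaults are used only when the request does not supply that parameter itself, so a client can still override them per request. The sketch below shows a hypothetical handler; the name /companies and the field names are illustrative. It searches companyname by default and returns 10 rows. A request such as /companies?q=monkeys&rows=5 would override rows and return at most 5 documents.

```xml
<!-- Hypothetical handler; rows and fl are standard search parameters -->
<requestHandler name="/companies" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="wt">json</str>
    <str name="df">companyname</str>
    <int name="rows">10</int>
    <str name="fl">id,companyname,companycity</str>
  </lst>
</requestHandler>
```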
Full-text search engine Solr series: Solr core concepts and configuration files