Solr 6.0 Study Notes (3): schema.xml Configuration

Source: Internet
Author: User

<?xml version="1.0" encoding="UTF-8"?>
<!-- ... (omitted) ... -->
<!--
  This is the Solr schema file. It should be named schema.xml and placed in
  the conf directory of the Solr home (by default, ./solr/conf/schema.xml).
  For information on how to customize this file, see:
  http://wiki.apache.org/solr/SchemaXml

  PERFORMANCE NOTE: many of the options in this file are not needed in
  practical applications. To improve performance, you can:
  - set stored="false" for all fields that are only searched and never returned;
  - set indexed="false" for all fields that are only returned and never searched;
  - remove all unneeded copyField statements;
  - for the best index size and search performance, set indexed="false" for all
    general text fields, use copyField to copy them into the catch-all field
    name="text", and search against that field;
  - run the JVM in server mode, and raise the log level so that not every
    request is logged.
-->
<schema name="example" version="1.5">
  <!-- ... (omitted) ... -->
  <fields>
    <!--
      Field attributes:
      name: required; the name of the field.
      type: required; a field type defined in <types>.
      indexed: true if the field should be indexed (for searching or sorting).
      stored: true if the field content should be retrievable.
      docValues: true if this field should have doc values. Doc values are
        useful for faceting, grouping, sorting, and function queries. They are
        not required, and they can make the index larger and slower to build,
        but they make the index faster to load, more NRT-friendly, and more
        memory-efficient. There are limitations: only StrField, UUIDField, and
        all Trie*Fields are currently supported, and depending on the field
        type, the field may have to be single-valued, be required, or have a
        default value.
      multiValued: true if this field may contain multiple values per document.
      termVectors: [false] set to true to store the term vector for the field.
        When using MoreLikeThis, fields used for similarity should be stored
        for best performance.
      termPositions: store position information with the term vector; this
        increases storage costs.
      termOffsets: store offset information with the term vector; this
        increases storage costs.
      required: the field must have a value; otherwise an error is thrown.
      default: a value to use when a document is added without a value for
        this field, so that the field is never empty.
    -->
    <!--
      Field names should consist of alphanumeric or underscore characters only
      and must not start with a digit. Names with both leading and trailing
      underscores (such as _version_) are reserved.
    -->
    <field name="id" type="string" indexed="true" stored="true" required="true" multiValued="false"/>
    <field name="title" type="text_general" indexed="true" stored="true" multiValued="true"/>
    <field name="description" type="text_general" indexed="true" stored="true"/>
    <field name="author" type="text_general" indexed="true" stored="true"/>
    <field name="keywords" type="text_general" indexed="true" stored="true"/>
    <field name="category" type="text_general" indexed="true" stored="true"/>
    <field name="url" type="text_general" indexed="true" stored="true"/>
    <field name="last_modified" type="date" indexed="true" stored="true"/>
    <!--
      Note: to save space, this field is not indexed by default, because
      copyField copies it into the field named text. It is used for returning
      and highlighting content. Search against the text field instead.
    -->
    <field name="content" type="text_general" indexed="false" stored="true" multiValued="true"/>
    <!-- Catch-all field containing the other searchable fields (implemented via copyField) -->
    <field name="text" type="text_general" indexed="true" stored="false" multiValued="true"/>
    <!-- Reserved field; removing it causes an error -->
    <field name="_version_" type="long" indexed="true" stored="true"/>
  </fields>

  <!--
    Unique identifier of the document, comparable to a primary key. Unless the
    field is marked required="false", its value must not be empty.
  -->
  <uniqueKey>id</uniqueKey>

  <!-- Copy the fields to be searched into the catch-all field -->
  <copyField source="title" dest="text"/>
  <copyField source="author" dest="text"/>
  <copyField source="description" dest="text"/>
  <copyField source="keywords" dest="text"/>
  <copyField source="content" dest="text"/>
  <copyField source="url" dest="text"/>

  <types>
    <!-- Field type definitions -->
    <fieldType name="string" class="solr.StrField" sortMissingLast="true"/>
    <fieldType name="boolean" class="solr.BoolField" sortMissingLast="true"/>
    <fieldType name="int" class="solr.TrieIntField" precisionStep="0" positionIncrementGap="0"/>
    <fieldType name="float" class="solr.TrieFloatField" precisionStep="0" positionIncrementGap="0"/>
    <fieldType name="long" class="solr.TrieLongField" precisionStep="0" positionIncrementGap="0"/>
    <fieldType name="double" class="solr.TrieDoubleField" precisionStep="0" positionIncrementGap="0"/>
    <fieldType name="date" class="solr.TrieDateField" precisionStep="0" positionIncrementGap="0"/>
    <!-- ... (omitted) ... -->
    <!-- Thai -->
    <fieldType name="text_th" class="solr.TextField" positionIncrementGap="100">
      <analyzer>
        <tokenizer class="solr.StandardTokenizerFactory"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.ThaiWordFilterFactory"/>
        <filter class="solr.StopFilterFactory" ignoreCase="true" words="lang/stopwords_th.txt"/>
      </analyzer>
    </fieldType>
    <!-- Turkish -->
    <fieldType name="text_tr" class="solr.TextField" positionIncrementGap="100">
      <analyzer>
        <tokenizer class="solr.StandardTokenizerFactory"/>
        <filter class="solr.TurkishLowerCaseFilterFactory"/>
        <filter class="solr.StopFilterFactory" ignoreCase="false" words="lang/stopwords_tr.txt"/>
        <filter class="solr.SnowballPorterFilterFactory" language="Turkish"/>
      </analyzer>
    </fieldType>
    <!-- Chinese: has to be configured by ourselves; the mmseg4j integration is configured here -->
  </types>

  <!--
    Relevance ranking relies on the document similarity score. A custom
    Similarity or SimilarityFactory can be specified here, but the default
    settings are already suitable for most applications. See:
    http://wiki.apache.org/solr/SchemaXml#Similarity
  -->
  <!--
  <similarity class="com.example.solr.CustomSimilarityFactory">
    <str name="paramkey">param value</str>
  </similarity>
  -->
</schema>
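
The performance note at the top of the schema can be made concrete with a few field definitions. This is only a sketch: the field names `body_search`, `thumbnail_url`, and `category_facet` are hypothetical and do not appear in the schema above; the attributes are the ones described in the field attribute comment.

```xml
<!-- Hypothetical fields, for illustration only -->

<!-- Searched but never returned: indexed, not stored -->
<field name="body_search" type="text_general" indexed="true" stored="false"/>

<!-- Returned but never searched: stored, not indexed -->
<field name="thumbnail_url" type="string" indexed="false" stored="true"/>

<!-- Used for faceting and sorting: doc values on a StrField -->
<field name="category_facet" type="string" indexed="true" stored="false" docValues="true"/>
```

Whether a given field can drop `stored` or `indexed` depends on how the application actually uses it, so each field should be checked against real queries before changing it.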


The configuration above is based on: http://my.oschina.net/HuifengWang/blog/307471

################################## A problem encountered when using Solr: start ############################

When Solr runs the query "q=city:new York", it matches and returns all documents that contain New York.

With a Chinese query such as "q=city:Chengdu" (written in Chinese characters), however, Solr matches every document that contains either of the word's individual characters, because the text is split into single characters. What we actually want is an exact match. To find only the documents that contain the whole word "Chengdu", wrap the term in quotation marks:

q=city:"Chengdu"

################################## A problem encountered when using Solr: end ############################
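
If the goal is to match the entire field value rather than a phrase inside it, one alternative to quoting is a separate string field, since StrField values are not tokenized. The sketch below assumes a `city` field like the one used in the queries above; `city_exact` is a hypothetical name introduced only for this example.

```xml
<!-- Hypothetical fields: "city" is analyzed for full-text search,
     "city_exact" keeps the raw, untokenized value -->
<field name="city" type="text_general" indexed="true" stored="true"/>
<field name="city_exact" type="string" indexed="true" stored="false"/>

<!-- Copy the raw value of city into the untokenized field -->
<copyField source="city" dest="city_exact"/>
```

A query such as q=city_exact:Chengdu would then match only documents whose city value is exactly that string. This is stricter than a quoted phrase, which also matches documents where the phrase is just one part of a longer value.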

Some other important configurations

Word segmentation:

<!-- IK analyzer start -->
	<fieldType name="text_ik" class="solr.TextField">
		<analyzer type="index">
			<!--
				IKTokenizerFactory: extends TokenizerFactory
				useSmart: whether to enable smart segmentation
			-->
			<tokenizer class="org.wltea.analyzer.lucene.IKTokenizerFactory" useSmart="false"/>
			<!--
				StopFilterFactory: stop-word filter; removes the words listed in stopwords.txt
			-->
			<filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
		</analyzer>
		<analyzer type="query">
			<tokenizer class="org.wltea.analyzer.lucene.IKTokenizerFactory" useSmart="true"/>
			<filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
		</analyzer>
	</fieldType>
<!-- IK analyzer end -->
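
A field type has no effect until a field refers to it. As a minimal sketch, assuming the IK Analyzer jar is already on Solr's classpath, a field could use the type like this; the field name `address` is hypothetical.

```xml
<!-- Hypothetical field that uses the text_ik type defined above -->
<field name="address" type="text_ik" indexed="true" stored="true"/>
```

Documents indexed before the field type was changed have to be reindexed for the new analysis to apply to them.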

Synonym configuration:

<fieldType name="text_general" class="solr.TextField" positionIncrementGap="100">
	<analyzer type="index">
		<tokenizer class="solr.StandardTokenizerFactory"/>
		<filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
		<!-- in this example, we will only use synonyms at query time
		<filter class="solr.SynonymFilterFactory" synonyms="index_synonyms.txt" ignoreCase="true" expand="false"/>
		-->
		<filter class="solr.LowerCaseFilterFactory"/>
	</analyzer>
	<analyzer type="query">
		<tokenizer class="solr.StandardTokenizerFactory"/>
		<filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
		<filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
		<filter class="solr.LowerCaseFilterFactory"/>
	</analyzer>
</fieldType>

Scoring (similarity) configuration (customization is supported):

<similarity class="com.example.solr.CustomSimilarityFactory"/>
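
A custom class is not the only option: a similarity factory that ships with Solr can be declared in the same place. The sketch below assumes `solr.BM25SimilarityFactory` and its `k1` and `b` parameters; the values are the usual BM25 defaults and are illustrative, not tuned recommendations.

```xml
<!-- Sketch: a built-in factory instead of a custom class.
     k1 and b are shown with the usual BM25 default values. -->
<similarity class="solr.BM25SimilarityFactory">
  <float name="k1">1.2</float>
  <float name="b">0.75</float>
</similarity>
```

Changing the similarity changes how scores are computed, so ranking should be checked against a set of representative queries afterwards.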

Default query field configuration (in solrconfig.xml):

The /query handler searches the field name="text" by default. The same setting applies to /select and to any other query handler that you define.

<requestHandler name="/query" class="solr.SearchHandler">
     <lst name="defaults">
       <str name="echoParams">explicit</str>
       <str name="wt">json</str>
       <str name="indent">true</str>
       <!-- default query field -->
       <str name="df">text</str>
     </lst>
  </requestHandler>
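
The same pattern can give another handler a different default field. A minimal sketch follows; the handler name `/titleQuery` is hypothetical, and `title` is the field defined in the schema above.

```xml
<!-- Hypothetical handler whose unqualified query terms search the title field -->
<requestHandler name="/titleQuery" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="echoParams">explicit</str>
    <str name="wt">json</str>
    <!-- default query field for this handler -->
    <str name="df">title</str>
  </lst>
</requestHandler>
```

Because `df` sits in the `defaults` list, a request can still override it by passing its own `df` parameter.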

########################### Modifying schema.xml ###################################

After Solr has been deployed to Tomcat, a change to schema.xml (for example, adding a field) is not picked up by hot deployment. Tomcat has to be restarted for the change to take effect.



