IK Analyzer source code explained (5): IK configuration and its use in Solr


Using IK in Solr is simple.

1. Download the latest IK 2012 Chinese analyzer.

2. Extract IK Analyzer 2012FF_hf1.zip to obtain the IK Analyzer 2012FF_hf1 directory.

Copy IKAnalyzer.cfg.xml, IKAnalyzer2012FF_u1.jar, and stopword.dic from that directory

into TOMCAT_HOME/webapps/solr/WEB-INF/classes (create the classes folder if it does not exist).

3. Edit schema.xml in /solr_home/collection1/conf/ and add the following inside <types></types>:

<fieldType name="text_ik" class="solr.TextField">
  <analyzer type="index" isMaxWordLength="false" class="org.wltea.analyzer.lucene.IKAnalyzer"/>
  <analyzer type="query" isMaxWordLength="true" class="org.wltea.analyzer.lucene.IKAnalyzer"/>
</fieldType>

Also modify the field definition so that the field references text_ik. The field is then analyzed with the IK analyzer.

<field name="name" type="text_ik" indexed="true" stored="true"/>

The following explains IK's own configuration file, IKAnalyzer.cfg.xml, which is structured as follows:

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE properties SYSTEM "http://java.sun.com/dtd/properties.dtd">
<properties>
  <comment>IK Analyzer extended configuration</comment>
  <!-- users can configure their own extended dictionaries here -->
  <entry key="ext_dict">ext/ext.dic;</entry>
  <!-- users can configure their own extended stop word dictionaries here -->
  <entry key="ext_stopwords">stopword.dic;</entry>
</properties>
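Because the file uses the standard Java properties DTD, it can be read with java.util.Properties. The following is a minimal sketch of that idea, not IK's actual loading code; the class name IkConfigReader and the inline XML string are assumptions made for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Properties;

// Hypothetical helper (not part of IK): reads a properties-DTD XML file.
final class IkConfigReader {

    // Parses the XML into a Properties object; I/O errors are rethrown unchecked.
    static Properties read(InputStream in) {
        Properties props = new Properties();
        try {
            props.loadFromXML(in);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return props;
    }

    public static void main(String[] args) {
        // Inline copy of the configuration shown above, used instead of a real file.
        String xml = "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n"
            + "<!DOCTYPE properties SYSTEM \"http://java.sun.com/dtd/properties.dtd\">\n"
            + "<properties>\n"
            + "<comment>IK Analyzer extended configuration</comment>\n"
            + "<entry key=\"ext_dict\">ext/ext.dic;</entry>\n"
            + "<entry key=\"ext_stopwords\">stopword.dic;</entry>\n"
            + "</properties>";
        Properties props = read(
            new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        System.out.println(props.getProperty("ext_dict"));      // prints ext/ext.dic;
        System.out.println(props.getProperty("ext_stopwords")); // prints stopword.dic;
    }
}
```

Note that each value keeps its trailing semicolon; splitting the value into individual paths is a separate step.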

The structure above is straightforward. The ext_dict property configures extension dictionaries. You can list any number of them, separated by semicolons. For example, additional dictionaries can be added like this:

<entry key="ext_dict">ext/ext.dic;ext/net.dic;ext/Encyclopedia title.dic;ext/commonly used vocabulary.dic;ext/commonly used computer technology thesaurus.dic;ext/commonly used names.dic;ext/supermarket commodity name origin and pharmacy commodity name origin.dic;ext/idiom.dic;ext/Power Glossary.dic;ext/Vocabulary for electronic commerce.dic;ext/animal words.dic;ext/Environmental vocabulary.dic;ext/World of Warcraft.dic;ext/photography.dic;ext/four-word idiom Daquan.dic;ext/Taobao Special vocabulary.dic;ext/Network Popular new words.dic;ext/Metallurgical Glossary.dic;ext/Medical Glossary.dic;ext/Dictionary of Plant Encyclopedia.dic;</entry>

The list can be extended this way, but at load time IK iterates over the listed files and loads all of them into a single dictionary.
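The traversal described above can be sketched roughly as follows. This is an illustration under stated assumptions, not IK's source: the class ExtDictLoader and its methods are hypothetical, a HashSet stands in for IK's real in-memory dictionary structure, and in-memory strings stand in for the .dic files.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch: merge several extension dictionaries into one word set.
final class ExtDictLoader {

    // Splits the ext_dict value on ';', trimming entries and skipping empty ones.
    static List<String> splitPaths(String value) {
        List<String> paths = new ArrayList<>();
        if (value == null) {
            return paths;
        }
        for (String part : value.split(";")) {
            String path = part.trim();
            if (!path.isEmpty()) {
                paths.add(path);
            }
        }
        return paths;
    }

    // Adds every non-blank line (one word per line) of a source to the shared set.
    static void loadInto(Set<String> dict, Reader source) {
        try (BufferedReader reader = new BufferedReader(source)) {
            String line;
            while ((line = reader.readLine()) != null) {
                String word = line.trim();
                if (!word.isEmpty()) {
                    dict.add(word);
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Made-up file contents keyed by path; real code would open each file instead.
        Map<String, String> files = new LinkedHashMap<>();
        files.put("ext/ext.dic", "solr\nlucene\n");
        files.put("ext/net.dic", "lucene\ntomcat\n");

        Set<String> dict = new HashSet<>();
        for (String path : splitPaths("ext/ext.dic;ext/net.dic;")) {
            loadInto(dict, new StringReader(files.get(path)));
        }
        System.out.println(dict.size()); // prints 3: the duplicate word is stored once
    }
}
```

However many files are listed, lookups go against one merged collection, so a word that appears in several dictionaries is stored only once.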

The ext_stopwords property is where you add your own stop word dictionaries. The one I added in the configuration file is the HIT (Harbin Institute of Technology) stop word list, and it works well.
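To illustrate what a stop word dictionary does to the analyzer's output, here is a minimal sketch. The class StopWordSketch, the token list, and the stop word set are all made up for illustration (English words stand in for Chinese terms), and the code does not reproduce how IK applies stop words internally.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch: drop tokens that appear in a stop word set.
final class StopWordSketch {

    // Returns the tokens that are not stop words, preserving their order.
    static List<String> removeStopWords(List<String> tokens, Set<String> stopWords) {
        List<String> kept = new ArrayList<>();
        for (String token : tokens) {
            if (!stopWords.contains(token)) {
                kept.add(token);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        // Stand-in for the contents of stopword.dic.
        Set<String> stopWords = new HashSet<>(Arrays.asList("the", "and"));
        // Stand-in for the tokens produced by segmentation.
        List<String> tokens = Arrays.asList("the", "solr", "and", "ik");
        System.out.println(removeStopWords(tokens, stopWords)); // prints [solr, ik]
    }
}
```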

Source Address: http://download.csdn.net/detail/a925907195/8240641
