Coreseek-3.2.13: Compatible with Sphinx-0.9.9 Configuration

Source: Internet
Author: User


Coreseek-3.2.13 is compatible with Sphinx-0.9.9 configuration: an existing Sphinx-0.9.9 configuration file can be used directly without modification.

However, to search Chinese text well, you need the configuration parameters that Coreseek adds for Chinese word segmentation.

The core Chinese word segmentation settings are shown below. Read them carefully and apply them to your own configuration:

source data_source_name_a
{
    #......
}

index index_name_a
{
    # The following settings are standard Sphinx configuration; use them as-is, no changes needed
    #......
    source = data_source_name_a    # refers to the source defined above
    path = var/data/data_source_name_a
    docinfo = extern
    mlock = 0
    morphology = none
    min_word_len = 1
    html_strip = 0
    #......
    # The settings above are standard Sphinx configuration

    # The following are the core Chinese word segmentation settings
    #stopwords = /path/to/stopwords.txt    (location of the stopwords file)
    charset_dictpath = /usr/local/mmseg3/etc/
    charset_type = zh_cn.utf-8
    #charset_table = ......
    ngram_len = 0
    # The settings above are the core Chinese word segmentation settings
}

mmseg.ini configuration: in the mmseg configuration file you can set the segmentation rules for English letters and digits (for example, whether china2008 is kept as one token or split into china and 2008). For details, see mmseg.ini configuration.


Notes on the core Chinese word segmentation settings:


charset_dictpath = /usr/local/mmseg3/etc/

Specifies the directory that holds the dictionary files; this directory must contain the uni.lib dictionary file;

For how to build a uni.lib dictionary file, see: mmseg dictionary construction. Note: after changing or modifying the dictionary, you must rebuild the index and restart searchd for the change to take effect.

On BSD/Linux, the dictionary is installed under /usr/local/mmseg3/etc by default, so use /usr/local/mmseg3/etc/;

On Windows, use the actual path of the dictionary directory; it must end with /, for example F:\coreseek-3.2.13-win32\etc/

If "Unigram dictionary load Error" or a "Segmentation fault" occurs during testing, the dictionary path is set incorrectly.
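The two path rules above (the value ends with / and the directory contains uni.lib) can be checked before running the indexer. The following is a minimal Python sketch; check_charset_dictpath is a hypothetical helper, not part of Coreseek, and it only tests those two rules, not whether uni.lib is a valid dictionary.

```python
import os


def check_charset_dictpath(path: str) -> list[str]:
    """Return problems with a charset_dictpath value (an empty list means it looks OK).

    Hypothetical helper: it checks only the two rules from the article,
    a trailing "/" and the presence of uni.lib in the directory.
    """
    problems = []
    if not path.endswith("/"):
        problems.append("charset_dictpath must end with '/'")
    # os.path.join works whether or not the path already ends with a separator.
    if not os.path.isfile(os.path.join(path, "uni.lib")):
        problems.append("uni.lib not found in " + path)
    return problems


if __name__ == "__main__":
    # Default BSD/Linux location; on Windows pass e.g. r"F:\coreseek-3.2.13-win32\etc/"
    for message in check_charset_dictpath("/usr/local/mmseg3/etc/"):
        print(message)
```

An empty result does not guarantee that the dictionary loads; it only rules out the two most common path mistakes described above.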


charset_type = zh_cn.utf-8

Enables Chinese word segmentation. Without this setting, Chinese word segmentation is not active and Sphinx's other processing modes are used instead.

Once Chinese word segmentation is enabled, the data read from the source must be UTF-8 encoded; otherwise it cannot be processed correctly;

For an XML source, the output must be UTF-8 encoded;
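As an illustration of emitting UTF-8 XML from data stored in another encoding, here is a minimal Python sketch. It assumes, beyond what the text states, an xmlpipe2 source with a single full-text field named content and raw text arriving as GBK bytes; xmlpipe2_document is a hypothetical helper.

```python
from xml.sax.saxutils import escape


def xmlpipe2_document(docs: dict[int, bytes], source_encoding: str = "gbk") -> bytes:
    """Build a minimal xmlpipe2 stream whose bytes are UTF-8 encoded.

    Assumptions: one full-text field named "content"; input text is bytes
    in source_encoding (GBK by default).
    """
    parts = [
        '<?xml version="1.0" encoding="utf-8"?>',
        "<sphinx:docset>",
        "<sphinx:schema>",
        '<sphinx:field name="content"/>',
        "</sphinx:schema>",
    ]
    for doc_id, raw in docs.items():
        text = raw.decode(source_encoding)  # GBK bytes -> Python str
        parts.append(f'<sphinx:document id="{doc_id}">')
        parts.append(f"<content>{escape(text)}</content>")  # escape &, <, >
        parts.append("</sphinx:document>")
    parts.append("</sphinx:docset>")
    # Encode the whole stream as UTF-8 so it matches the XML declaration.
    return "\n".join(parts).encode("utf-8")


if __name__ == "__main__":
    print(xmlpipe2_document({1: "中文分词".encode("gbk")}).decode("utf-8"))
```

The key point is that the declared encoding and the actual bytes are both UTF-8; the decode step is where non-UTF-8 source data gets converted.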

For MySQL, set the character set of the data returned to the indexer to UTF-8:

From MySQL 4.1 onward, you can set the output character set to UTF-8 with SET NAMES utf8, even if the raw data is stored as GBK;

For versions earlier than MySQL 4.1, please contact us directly for help converting GBK or Latin1 output to UTF-8;
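To show where SET NAMES utf8 goes in a Sphinx configuration, here is a small Python sketch that renders a MySQL source section with the statement in sql_query_pre, the Sphinx directive for queries run before the main sql_query. The helper mysql_source_section, the section name and the example query (including the table documents) are hypothetical placeholders, and the connection settings are omitted.

```python
def mysql_source_section(name: str, sql_query: str) -> str:
    """Render a sphinx.conf source section that switches the MySQL connection to UTF-8.

    Sketch only: connection settings (sql_host, sql_user, sql_pass, sql_db)
    are left as a comment, and name/sql_query are caller-supplied placeholders.
    """
    lines = [
        f"source {name}",
        "{",
        "    type = mysql",
        "    # ... sql_host / sql_user / sql_pass / sql_db ...",
        # Runs before sql_query, so rows come back as UTF-8 (MySQL 4.1+).
        "    sql_query_pre = SET NAMES utf8",
        f"    sql_query = {sql_query}",
        "}",
    ]
    return "\n".join(lines)


if __name__ == "__main__":
    print(mysql_source_section("data_source_name_a",
                               "SELECT id, title, content FROM documents"))
```

Placing SET NAMES in sql_query_pre means it is issued on the same connection the indexer uses for sql_query, which is what makes the conversion apply to the indexed rows.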


#stopwords = ......


Specifies the stopwords file; the words in this file do not take part in searching. The file is a plain UTF-8 text file with one word per line;
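A stopwords file in the format described above (plain UTF-8 text, one word per line) can be generated with a few lines of Python. This is a minimal sketch; write_stopwords and read_stopwords are hypothetical helpers, and the file path you then give to the stopwords setting is up to you.

```python
from pathlib import Path


def write_stopwords(path: str, words: list[str]) -> None:
    """Write a stopwords file as plain UTF-8 text, one word per line."""
    # Trim whitespace, drop empty entries and duplicates, keep the original order.
    unique = list(dict.fromkeys(w.strip() for w in words if w.strip()))
    # write_bytes avoids platform newline translation, so every line ends with "\n".
    Path(path).write_bytes(("\n".join(unique) + "\n").encode("utf-8"))


def read_stopwords(path: str) -> list[str]:
    """Read a stopwords file back, skipping blank lines."""
    text = Path(path).read_text(encoding="utf-8")
    return [line.strip() for line in text.splitlines() if line.strip()]
```

For example, write_stopwords("stopwords.txt", ["的", "了", " 是 ", "的"]) writes three lines (的, 了, 是), and read_stopwords returns them in that order.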



#charset_table = ......
ngram_len = 0

Disables the original single-character (unigram) segmentation mode so that it does not interfere with Chinese word segmentation;

Comment out the charset_table setting!

ngram_len must be set to 0!
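The rules in this article (charset_type = zh_cn.utf-8, charset_dictpath present, charset_table commented out, ngram_len = 0) are easy to check mechanically. Below is a minimal Python sketch of such a check; check_segmentation_settings is a hypothetical helper that understands only simple "key = value" lines and treats lines starting with # as commented out, so it is not a full sphinx.conf parser.

```python
def check_segmentation_settings(index_body: str) -> list[str]:
    """Return problems found in the body of an index { ... } section (empty list = OK).

    Hypothetical helper: parses only simple "key = value" lines, skips
    lines starting with '#', and checks the rules stated in this article.
    """
    settings = {}
    for raw in index_body.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # blank or commented-out line
        key, sep, value = line.partition("=")
        if sep:
            # Keep the value, dropping any trailing "# comment".
            settings[key.strip()] = value.split("#", 1)[0].strip()

    problems = []
    if settings.get("charset_type") != "zh_cn.utf-8":
        problems.append("charset_type must be zh_cn.utf-8")
    if "charset_dictpath" not in settings:
        problems.append("charset_dictpath is missing")
    if "charset_table" in settings:
        problems.append("charset_table must be commented out")
    if settings.get("ngram_len") != "0":
        problems.append("ngram_len must be set to 0")
    return problems


if __name__ == "__main__":
    body = """
        charset_dictpath = /usr/local/mmseg3/etc/
        charset_type = zh_cn.utf-8
        #charset_table = ......
        ngram_len = 0
    """
    print(check_segmentation_settings(body))  # []
```

Lines without "=" (such as the index header or braces) are ignored, so the whole index section can be passed in as well as just its body.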
