Installing and Using Sphinx on Windows [with Chinese full-text search]


Address: http://www.fuchaoqun.com/2008/11/sphinx-on-windows-xp/

 

A while ago, I tried out Sphinx, a full-text search system that can be conveniently called from various languages (PHP/Python/Ruby/etc.). Most of the material on the Internet covers installing and using it on Linux. For a production environment it should of course be deployed on *nix, but for learning and testing, Windows is more convenient.

This article describes a convenient way to install and configure Sphinx on Windows with support for Chinese full-text search. Part of the configuration applies equally to Linux.

I. About Sphinx

Sphinx is a full-text search engine released under GPLv2. Commercial licensing is also available (for example, when embedding Sphinx into other programs); to obtain a commercial license, contact the author (sphinxsearch.com).

In general, Sphinx is a standalone search engine designed to give other applications fast, space-efficient full-text search with highly relevant results. Sphinx integrates easily with SQL databases and scripting languages.

It currently has built-in support for MySQL and PostgreSQL data sources, and it can also read XML data in a specific format from standard input. By modifying the source code you can add new data sources yourself (for example, native support for other kinds of DBMS).

The search API is available for PHP, Python, Perl, Ruby, and Java, and Sphinx can also be used as a MySQL storage engine. The API is very simple and can be ported to a new language within a few hours.

Sphinx features:

    • Fast indexing (peak performance reaches 10 MB/s on contemporary CPUs);
    • Fast searching (on 2-4 GB of text data, the average response time per query is under 0.1 seconds);
    • Large-scale data handling (known to handle over … GB of text data, and … M documents on a single-CPU system);
    • High-quality relevance ranking: a composite ranking method based on phrase proximity and statistics (BM25);
    • Distributed search;
    • Excerpt (snippet) generation from documents;
    • Can be used as a MySQL storage engine to provide search;
    • Multiple query modes, such as boolean, phrase, and word proximity;
    • Multiple full-text fields per document (up to 32);
    • Multiple additional attributes per document (such as group IDs and timestamps);
    • Stopword support;
    • Support for single-byte encodings and UTF-8;
    • Native MySQL support (both MyISAM and InnoDB);
    • Native PostgreSQL support.
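The BM25 half of the ranking feature above can be illustrated with the textbook Okapi BM25 term weight. The following is a minimal PHP sketch of that standard formula, using the commonly quoted defaults k1 = 1.2 and b = 0.75 (assumed values, not taken from Sphinx). It is not Sphinx's actual ranker, which also factors in phrase proximity, and the function name bm25_term_weight is hypothetical.

```php
<?php
// Textbook Okapi BM25 weight of one query term in one document.
// Sphinx combines a statistical score like this with phrase proximity;
// this sketch covers only the BM25 part, not Sphinx's own ranker.
function bm25_term_weight(
    int $tf,           // occurrences of the term in this document
    int $docLen,       // length of this document, in words
    float $avgDocLen,  // average document length in the collection
    int $totalDocs,    // N: number of documents in the collection
    int $docsWithTerm, // n: number of documents containing the term
    float $k1 = 1.2,   // term-frequency saturation (assumed default)
    float $b = 0.75    // document-length normalisation (assumed default)
): float {
    // Inverse document frequency: rarer terms weigh more.
    $idf = log(($totalDocs - $docsWithTerm + 0.5) / ($docsWithTerm + 0.5));
    // Saturating term frequency, normalised by document length.
    $tfPart = ($tf * ($k1 + 1))
            / ($tf + $k1 * (1 - $b + $b * $docLen / $avgDocLen));
    return $idf * $tfPart;
}

// A term occurring once in an average-length document, found in 1 of 4 docs.
printf("%.4f\n", bm25_term_weight(1, 10, 10.0, 4, 1)); // prints 0.8473
```

Repeating the term in a document raises its weight only with diminishing returns (k1), and longer-than-average documents are penalised (b), which is why BM25 is a common statistical baseline for relevance.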

A Chinese translation of the manual is available here; many thanks to the translators for their hard work.

II. Installing Sphinx on Windows

1. Go to http://www.sphinxsearch.com/downloads.html and find the latest Windows build. In this example, "Win32 release binaries with MySQL support" is downloaded and extracted to D:\sphinx;

2. Under D:\sphinx\, create a data directory for the index files and a log directory for the log files, then copy D:\sphinx\sphinx.conf.in to D:\sphinx\bin\sphinx.conf (note the changed file name);

3. Edit D:\sphinx\bin\sphinx.conf. The changes I made are listed below:

 
type = mysql # data source type; MySQL here
sql_host = localhost # database server
sql_user = root # database user name
sql_pass = # database password (empty here)
sql_db = test # database
sql_port = 3306 # database port

sql_query_pre = SET NAMES utf8 # uncomment this line if your database is UTF-8 encoded
index test1
{
# index file path; the last component (test1) is the prefix of the index file names
path = D:/sphinx/data/test1
# character encoding
charset_type = utf-8
# UTF-8 character table
charset_table = 0..9, A..Z->a..z, _, a..z, U+410..U+42F->U+430..U+44F, U+430..U+44F
# simple segmentation: only 0 and 1 are supported; set it to 1 to search Chinese
ngram_len = 1
# characters to split into single-character tokens; uncomment this line to search Chinese
ngram_chars = U+3000..U+2FA1F
}
 
# index test1stemmed : test1
#{
# path = @CONFDIR@/data/test1stemmed
# morphology = stem_en
#}

 

# Comment out the following block if you have no distributed index

# index dist1
#{
# 'distributed' index type must be specified
# type = distributed

# local index to be searched
# there can be many local indexes configured
# local = test1
# local = test1stemmed

# remote agent
# multiple remote agents may be specified
# syntax is 'hostname:port:index1,[index2[,...]]'
# agent = localhost:3313:remote1
# agent = localhost:3314:remote2,remote3

# remote agent connection timeout, milliseconds
# optional, default is 1000 ms, ie. 1 sec
# agent_connect_timeout = 1000

# remote agent query timeout, milliseconds
# optional, default is 3000 ms, ie. 3 sec
# agent_query_timeout = 3000
#}
 
# searchd service settings to modify
searchd
{
# log file
log = D:/sphinx/log/searchd.log
# PID file: stores the searchd process ID
pid_file = D:/sphinx/log/searchd.pid

# pay attention to this setting when starting searchd on Windows (left commented out here)
# seamless_rotate = 1
}
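The ngram_len = 1 and ngram_chars settings above tell the indexer to treat every character in the U+3000..U+2FA1F range as a token of its own, which is what makes single Chinese characters searchable without a dictionary-based word segmenter. The PHP sketch below approximates that behaviour for illustration only: the function unigram_tokens is hypothetical, it recognises only ASCII letters, digits and underscore as word characters (the Cyrillic part of charset_table is ignored), and it is not Sphinx's tokenizer.

```php
<?php
// Rough illustration of ngram_len = 1 with ngram_chars = U+3000..U+2FA1F:
// each character in that range becomes a separate token, while other text
// is split into lower-cased words. An approximation, not Sphinx's code.
function unigram_tokens(string $text): array
{
    $tokens = [];
    $word = '';
    // Split the UTF-8 string into individual characters.
    foreach (preg_split('//u', $text, -1, PREG_SPLIT_NO_EMPTY) as $ch) {
        if (preg_match('/^[\x{3000}-\x{2FA1F}]$/u', $ch)) {
            // CJK range: flush any pending word, then emit the character alone.
            if ($word !== '') { $tokens[] = $word; $word = ''; }
            $tokens[] = $ch;
        } elseif (preg_match('/^[0-9A-Za-z_]$/', $ch)) {
            // ASCII word character, lower-cased like the A..Z->a..z mapping.
            $word .= strtolower($ch);
        } else {
            // Spaces and punctuation end the current word.
            if ($word !== '') { $tokens[] = $word; $word = ''; }
        }
    }
    if ($word !== '') { $tokens[] = $word; }
    return $tokens;
}

print_r(unigram_tokens('Test 中文 search')); // test, 中, 文, search
```

Because the query text goes through the same splitting, a query for a two-character word is matched character by character, which trades some precision for not needing a Chinese dictionary.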

4. Import the test data

C:\Program Files\MySQL\MySQL Server 5.0\bin> mysql -uroot test < D:/sphinx/example.sql

5. Create the index

D:\sphinx\bin> indexer.exe --all
Sphinx 0.9.8-release (r1533)
Copyright (c) 2001-2008, Andrew Aksyonoff

using config file './sphinx.conf'...
indexing index 'test1'...
collected 4 docs, 0.0 MB
sorted 0.0 Mhits, 100.0% done
total 4 docs, 193 bytes
total 0.101 sec, 1916.30 bytes/sec, 39.72 docs/sec

D:\sphinx\bin>

6. Search for 'test'

D:\sphinx\bin> search.exe test
Sphinx 0.9.8-release (r1533)
Copyright (c) 2001-2008, Andrew Aksyonoff

using config file './sphinx.conf'...
index 'test1': query 'test ': returned 3 matches of 3 total in 0.000 sec

displaying matches:
1. document=1, weight=2, group_id=1, date_added=Wed Nov 26 14:58:59 2008
id=1
group_id=1
group_id2=5
date_added=14:58:59
title=test one
content=this is my test document number one. also checking search phrases.
2. document=2, weight=2, group_id=1, date_added=Wed Nov 26 14:58:59 2008
id=2
group_id=1
group_id2=6
date_added=14:58:59
title=test two
content=this is my test document number two
3. document=4, weight=1, group_id=2, date_added=Wed Nov 26 14:58:59 2008
id=4
group_id=2
group_id2=8
date_added=14:58:59
title=doc number four
content=this is to test groups

words:
1. 'test': 3 documents, 5 hits
D:\sphinx\bin>

All three documents containing 'test' are returned.

7. Test Chinese search

Modify the documents table in the test database:

UPDATE `test`.`documents` SET `title` = 'test 中文', `content` = 'this is my test document number two, it should be found' WHERE `documents`.`id` = 2;

Rebuild the index:

D:\sphinx\bin> indexer.exe --all

Try searching for '中文' ("Chinese"):

D:\sphinx\bin> search.exe 中文
Sphinx 0.9.8-release (r1533)
Copyright (c) 2001-2008, Andrew Aksyonoff

using config file './sphinx.conf'...
index 'test1': query '中文': returned 0 matches of 0 total in 0.000 sec

words:
D:\sphinx\bin>

Nothing is found. This is because the Windows command line uses GBK encoding while the indexed data is UTF-8, so the query cannot match. Instead, we can query from a program: create a file foo.php under D:\sphinx\api, and be sure to save it with UTF-8 encoding.

<?php
// foo.php: must be saved as UTF-8 so the query is sent as UTF-8 bytes
require 'sphinxapi.php';            // PHP client shipped in D:\sphinx\api
$s = new SphinxClient();
$s->SetServer('localhost', 3312);   // searchd listens on port 3312 here
$result = $s->Query('中文');         // searches all indexes by default
if ($result === false) {
    echo 'Query failed: ', $s->GetLastError(), "\n";
} else {
    var_dump($result);
}
?>
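The GBK/UTF-8 mismatch explained above can be made concrete by comparing the bytes of the same two characters in each encoding. In this small PHP sketch the GBK bytes are hard-coded (D6 D0 for 中, CE C4 for 文) so that it needs neither the iconv nor the mbstring extension; the file itself is assumed to be saved as UTF-8, like foo.php.

```php
<?php
// The same word, 中文, as two different byte sequences.
$utf8 = '中文';               // as stored in the UTF-8 database and index
$gbk  = hex2bin('d6d0cec4');  // as typed in a GBK (code page 936) console

echo bin2hex($utf8), "\n";    // e4b8ade69687 (3 bytes per character)
echo bin2hex($gbk), "\n";     // d6d0cec4 (2 bytes per character)
var_dump($utf8 === $gbk);     // bool(false): the console query cannot match
```

Since Sphinx compares bytes as UTF-8 text, the GBK query from the console never lines up with the indexed tokens, whereas a UTF-8 PHP file sends exactly the bytes that were indexed.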

Start the Sphinx searchd service:

D:\sphinx\bin> searchd.exe
Sphinx 0.9.8-release (r1533)
Copyright (c) 2001-2008, Andrew Aksyonoff

WARNING: forcing --console mode on Windows
using config file './sphinx.conf'...
creating server socket on 0.0.0.0:3312
accepting connections

Run the PHP query:

php D:/sphinx/api/foo.php

The results should now be returned. The rest is a matter of reading the manual and exploring the advanced configuration.
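Instead of var_dump(), the array returned by Query() can be printed in a layout similar to search.exe. This PHP sketch assumes the result structure of the PHP sphinxapi.php of this era ('total_found', 'time', 'matches' keyed by document ID with 'weight' and 'attrs', and 'words' with 'docs' and 'hits'); compare it with your own var_dump() output before relying on it. The function format_sphinx_result and the sample array are hypothetical, hand-built for illustration.

```php
<?php
// Print a Query() result roughly the way search.exe does.
// The array layout is an assumption; verify it with var_dump().
function format_sphinx_result(array $result): string
{
    $out = sprintf("returned %d matches in %.3f sec\n",
        $result['total_found'] ?? 0, $result['time'] ?? 0);
    $n = 0;
    // 'matches' is missing entirely when nothing was found.
    foreach ($result['matches'] ?? [] as $docId => $match) {
        $out .= sprintf("%d. document=%d, weight=%d", ++$n, $docId, $match['weight']);
        foreach ($match['attrs'] ?? [] as $name => $value) {
            $out .= ", $name=$value";
        }
        $out .= "\n";
    }
    // Per-keyword statistics.
    foreach ($result['words'] ?? [] as $word => $stat) {
        $out .= sprintf("'%s': %d documents, %d hits\n", $word, $stat['docs'], $stat['hits']);
    }
    return $out;
}

// Hand-built sample shaped like a Query() result, for illustration only.
$sample = [
    'total_found' => '1',
    'time'        => '0.000',
    'matches'     => [2 => ['weight' => 1, 'attrs' => ['group_id' => 1]]],
    'words'       => ['中文' => ['docs' => '1', 'hits' => '1']],
];
echo format_sphinx_result($sample);
```

In foo.php, var_dump($result) could then be replaced with echo format_sphinx_result($result); inside the branch where $result is not false.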

 
