Install and use Sphinx in Windows

Source: Internet
Author: User

A while ago, I tried out Sphinx, a full-text retrieval system that can be conveniently called from various languages (PHP, Python, Ruby, etc.). Most of the information on the Internet covers installing and using it on Linux. Of course, a production environment should be deployed on *nix, but for learning and testing it is easier to use Windows.

This article aims to give Sphinx users a convenient way to install and configure Sphinx on Windows with support for Chinese full-text retrieval. Part of the configuration also applies to Linux.

I. About Sphinx

Sphinx is a full-text search engine released under GPLv2. For commercial licensing (for example, to embed it in other programs), contact the author (sphinxsearch.com).

Generally, Sphinx is an independent search engine designed to provide other applications with full-text search functions featuring high speed, low space usage, and high result relevance. Sphinx can be easily integrated with SQL databases and scripting languages.

Currently, the system has built-in support for MySQL and PostgreSQL data sources. It also supports reading XML data in a specific format from standard input. By modifying the source code, you can add new data sources on your own (for example, native support for other types of DBMS).

The search API supports PHP, Python, Perl, Ruby, and Java, and Sphinx can also be used as a MySQL storage engine. The search API is very lightweight and can be ported to a new language within a few hours.

Sphinx features:

• High-speed indexing (peak performance of up to 10 MB/s on contemporary CPUs);
• High-performance search (on 2-4 GB of text data, the average response time per query is less than 0.1 seconds);
• Massive data processing (it is known to handle more than 100 GB of text data, and 100 million documents on a single-CPU system);
• Excellent relevance ranking: a compound ranking method based on phrase proximity and statistics (BM25);
• Supports distributed search;
• Generates document excerpts;
• Can be used as a MySQL storage engine to provide search services;
• Supports multiple search modes such as Boolean, phrase, and word proximity;
• Documents support multiple full-text search fields (up to 32);
• Documents support multiple additional attributes (such as group information and timestamps);
• Stop word support;
• Supports single-byte encodings and UTF-8;
• Native MySQL support (both MyISAM and InnoDB);
• Native PostgreSQL support.

II. Install Sphinx on Windows

1. Find the latest Windows version at http://www.sphinxsearch.com/downloads.html. Here I am using the Win32 release binaries with MySQL support; download the package and unzip it into the D:/sphinx directory;

2. Create a new data directory under D:/sphinx/ to store the index files and a log directory to store the log files, then copy D:/sphinx/sphinx.conf.in to D:/sphinx/bin/sphinx.conf (note the changed file name);
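Step 2 can also be scripted. The following is only a rough Python sketch of the same directory setup, not part of the Sphinx distribution; the function name prepare_sphinx_dirs is made up for illustration, and it assumes the template file sits directly under the install root as described above.

```python
import shutil
from pathlib import Path


def prepare_sphinx_dirs(root):
    """Create data/, log/ and bin/ under root and copy the config template.

    Hypothetical helper: copies <root>/sphinx.conf.in to
    <root>/bin/sphinx.conf unless the target already exists.
    """
    root = Path(root)
    for name in ("data", "log", "bin"):
        (root / name).mkdir(parents=True, exist_ok=True)

    template = root / "sphinx.conf.in"
    target = root / "bin" / "sphinx.conf"
    if template.exists() and not target.exists():
        shutil.copyfile(template, target)
    return target


# Example call for the layout used in this article:
# prepare_sphinx_dirs("D:/sphinx")
```

An existing bin/sphinx.conf is left untouched, so rerunning the helper does not overwrite edits made in step 3.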

3. Modify D:/sphinx/bin/sphinx.conf. Here I list the main changes:

type = mysql # data source type; here it is MySQL
sql_host = localhost # database server
sql_user = root # database username
sql_pass = '' # database password
sql_db = test # database name
sql_port = 3306 # database port

sql_query_pre = SET NAMES utf8 # uncomment this line if your database is UTF-8 encoded

index test1
{
# index directory
path = D:/sphinx/data/
# encoding
charset_type = utf-8
# UTF-8 character table
charset_table = 0..9, A..Z->a..z, _, a..z, U+410..U+42F->U+430..U+44F, U+430..U+44F
# simple word segmentation: only 0 and 1 are supported; to search Chinese text, specify 1
ngram_len = 1
# characters to be segmented; to search Chinese text, uncomment this line
ngram_chars = U+3000..U+2FA1F
}

# search service settings that need to be modified
searchd
{
# log file
log = D:/sphinx/log/searchd.log

# PID file: name of the file that stores the searchd process ID
pid_file = D:/sphinx/log/searchd.pid

# be sure to comment this out when running the searchd service on Windows
# seamless_rotate = 1
}
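To make the effect of ngram_len = 1 together with ngram_chars more concrete, here is a rough Python sketch of unigram splitting. It is only an illustration under simplifying assumptions, not Sphinx's actual tokenizer: characters inside the U+3000..U+2FA1F range each become their own token, while other runs of letters, digits, and underscores stay together as lowercased words. The function name unigram_tokens is invented for this example.

```python
NGRAM_LO, NGRAM_HI = 0x3000, 0x2FA1F  # same range as ngram_chars in the config


def unigram_tokens(text):
    """Split text into tokens, one token per character in the n-gram range."""
    tokens = []
    word = []  # characters of the word currently being collected

    def flush():
        if word:
            tokens.append("".join(word))
            word.clear()

    for ch in text:
        if NGRAM_LO <= ord(ch) <= NGRAM_HI:
            flush()
            tokens.append(ch)        # each CJK character is indexed on its own
        elif ch.isalnum() or ch == "_":
            word.append(ch.lower())  # ordinary words are kept whole
        else:
            flush()                  # anything else acts as a separator
    flush()
    return tokens


print(unigram_tokens("test 中文 doc"))
```

Because every Chinese character becomes a separate keyword, a multi-character query is matched as a sequence of single characters. This needs no dictionary, which is why the article calls it simple word segmentation.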

4. Import test data

The SQL file is D:/sphinx/example.sql

C:/Program Files/MySQL Server 5.0/bin> mysql -uroot test < D:/sphinx/example.sql

5. Create an index

D:/sphinx/bin> indexer.exe test1 (Note: test1 is the index test1 defined in sphinx.conf)
Sphinx 0.9.8-release (r1533)
Copyright (c) 2001-2008, Andrew Aksyonoff

using config file './sphinx.conf'...
indexing index 'test1'...
collected 4 docs, 0.0 MB
sorted 0.0 Mhits, 100.0% done
total 4 docs, 193 bytes
total 0.101 sec, 1916.30 bytes/sec, 39.72 docs/sec

D:/sphinx/bin>

6. Search for 'test'.

D:/sphinx/bin> search.exe test

The result is as follows:

using config file './sphinx.conf'...
index 'test1': query 'test': returned 3 matches of 3 total in 0.000 sec

displaying matches:
1. document=1, weight=2, group_id=1, date_added=Wed Nov 26 14:58:59 2008
id=1
group_id=1
group_id2=5
date_added=14:58:59
title=test one
content=This is my test document number one. Also checking search phrases.
2. document=2, weight=2, group_id=1, date_added=Wed Nov 26 14:58:59 2008
id=2
group_id=1
group_id2=6
date_added=14:58:59
title=test two
content=This is my test document number two
3. document=4, weight=1, group_id=2, date_added=Wed Nov 26 14:58:59 2008
id=4
group_id=2
group_id2=8
date_added=14:58:59
title=Doc number four
content=This is to test groups

words:
1. 'test': 3 documents, 5 hits
D:/sphinx/bin>

7. Test Chinese search

Modify the documents table in the test database so that one row contains Chinese text (rendered in English here):

UPDATE `test`.`documents` SET `title` = 'test Chinese', `content` = 'This is my test document number two, you should find' WHERE `documents`.`id` = 2;

Re-indexing:

D:/sphinx/bin> indexer.exe test1

Try searching for 'Chinese':

D:/sphinx/bin> search.exe Chinese
Sphinx 0.9.8-release (r1533)
Copyright (c) 2001-2008, Andrew Aksyonoff

using config file './sphinx.conf'...
index 'test1': query 'Chinese': returned 0 matches of 0 total in 0.000 sec

Words:
D:/sphinx/bin>

Nothing is found because the Windows command line uses GBK encoding, so the query bytes do not match the UTF-8 index. Instead, we can query from a program: create a file foo.php under D:/sphinx/api, and make sure it is saved in UTF-8 encoding.

<?php
// Client library shipped in the api directory of the Sphinx package
require 'sphinxapi.php';
$s = new SphinxClient();
// Host and port on which searchd is listening
$s->SetServer('localhost', 9312);
// Run the query and dump the raw result array
$result = $s->Query('Chinese');
var_dump($result);
?>

Start the Sphinx searchd service

D:/sphinx/bin> searchd.exe
Sphinx 0.9.8-release (r1533)
Copyright (c) 2001-2008, Andrew Aksyonoff

WARNING: forcing --console mode on Windows
using config file './sphinx.conf'...
creating server socket on 0.0.0.0:9312
accepting connections

Execute the PHP query:

Access http://www.test.com/sphinx/api/foo.php (a self-configured virtual host)
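The empty console result in the Chinese search test comes down to bytes: the same characters have different byte sequences in GBK and in UTF-8, so a GBK-encoded query cannot match keywords stored from UTF-8 data. The short Python sketch below only illustrates that mismatch. It uses the two-character word 中文 ("Chinese") as an assumed query term and does not talk to Sphinx at all.

```python
def query_bytes(term, encoding):
    """Return the raw bytes a client would send for term in the given encoding."""
    return term.encode(encoding)


term = "中文"  # assumed example query, meaning "Chinese"
gbk = query_bytes(term, "gbk")      # what a GBK console would produce
utf8 = query_bytes(term, "utf-8")   # what a UTF-8 indexed database contains

print("GBK bytes:  ", gbk.hex())
print("UTF-8 bytes:", utf8.hex())
print("identical:  ", gbk == utf8)
```

Saving foo.php as UTF-8 ensures that the query string reaches searchd in the same encoding as the indexed text.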

Use Coreseek for Word Segmentation
1. Download it from http://www.coreseek.cn/news/5/89/

2. Install software packages on which the system depends

The basic components of the system need the following software packages:

- ActivePython 2.5 (http://www.activestate.com/Products/activepython)
- MySQL-python 1.2.2 (http://sourceforge.net/project/showfiles.php?group_id=22307) (verification can be disabled)

After these two components are installed, the system can run, but you need to modify the configuration file manually.

The configuration interface requires these additional software packages:

- GTK-dev 2.12.9 (http://sourceforge.net/project/showfiles.php?group_id=98754) (verification can be disabled)
- PyCairo 1.4.12 (http://ftp.acc.umu.se/pub/GNOME/binaries/win32/pycairo/1.4)
- PyGObject 2.14.1 (http://ftp.acc.umu.se/pub/GNOME/binaries/win32/pygobject/2.14)
- PyGTK 2.12.1 (http://ftp.acc.umu.se/pub/GNOME/binaries/win32/pygtk/2.12)

If you download the full version, you can find all the files mentioned above in the preq subdirectory.
Install all the software packages mentioned above (Note: Python and GTK must be installed first).

Note: It must be ActivePython. The official Python distribution lacks the Win32 extension support that the system requires, so the system will not work with it.

Note: After completing this step, you must restart your computer.

3. Decompress CSFT to your desired directory.

4. The CSFT configuration file is roughly the same as that of Sphinx (for details about the configuration, see sphinx + MySQL (1) and (2))

5. Create a dictionary file

/bin/mmseg -u /data/unigram.txt # the dictionary location is flexible; you can specify your own directory

· Rename the generated file to uni.lib.

6. Import the sample.sql database

7. Create the indexes: indexer.exe --all (for details, see sphinx + MySQL (1))

From here the steps split into two branches, A and B:

---------------------------------------------------------------

A:

8. Install SphinxSE for MySQL

http://www.sphinxsearch.com/downloads/mysql-5.0.45-sphinxse-0.9.8-win32.zip

After downloading, decompress the package and overwrite the files in the MySQL directory. (Note that the MySQL version must match.)

Log in to MySQL and run SHOW ENGINES. Check whether SPHINX appears in the list of engines.

9. Create a storage engine table
CREATE TABLE `sphinx` (
`id` int(11) NOT NULL,
`weight` int(11) NOT NULL,
`query` varchar(255) NOT NULL,
`group_id` int(11) NOT NULL,
KEY `query` (`query`)
) ENGINE=SPHINX CONNECTION='sphinx://localhost:3312/test1';

Unlike an ordinary MySQL table, ENGINE=SPHINX CONNECTION='sphinx://localhost:3312/test1' indicates that the table uses the SphinxSE engine. The connection string to Sphinx is 'sphinx://localhost:3312/test1', where test1 is the index name.
According to the official Sphinx documentation, this table must have at least three columns. Their names do not matter, but their types must be, in order, integer, integer, and varchar, holding the document ID, the match weight, and the query respectively, and the query column must be indexed. The table can also have additional columns, which can only be of integer or timestamp type. These columns are bound to attributes in the Sphinx result set, so their names must be the same as the attribute names defined in sphinx.conf; otherwise the returned value is NULL.
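The column rules above can be captured in a small generator. This is a hedged Python sketch, not an official tool: the function name sphinxse_table_ddl, its parameters, and the defaults (localhost, port 3312, VARCHAR(255)) are assumptions taken from the example in this article. It emits the three mandatory columns in order, appends extra attribute columns only if they are INTEGER or TIMESTAMP, and indexes the query column.

```python
ALLOWED_ATTR_TYPES = {"INTEGER", "TIMESTAMP"}


def sphinxse_table_ddl(table, index, attrs=(), host="localhost", port=3312):
    """Build a CREATE TABLE statement for a SphinxSE search table.

    attrs is a sequence of (name, type) pairs; each name is expected to
    match an attribute defined for the index in sphinx.conf.
    """
    columns = [
        "`id` INTEGER NOT NULL",          # document ID
        "`weight` INTEGER NOT NULL",      # match weight
        "`query` VARCHAR(255) NOT NULL",  # full-text query
    ]
    for name, sql_type in attrs:
        if sql_type.upper() not in ALLOWED_ATTR_TYPES:
            raise ValueError(f"attribute {name!r}: unsupported type {sql_type!r}")
        columns.append(f"`{name}` {sql_type.upper()} NOT NULL")
    columns.append("KEY `query` (`query`)")

    body = ",\n  ".join(columns)
    return (
        f"CREATE TABLE `{table}` (\n  {body}\n) "
        f"ENGINE=SPHINX CONNECTION='sphinx://{host}:{port}/{index}';"
    )


print(sphinxse_table_ddl("sphinx", "test1", [("group_id", "integer")]))
```

Rejecting other column types up front surfaces a mistake when the statement is generated, before it ever reaches MySQL.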

10. SQL usage of the MySQL SphinxSE full-text retrieval storage engine

After installing the SphinxSE storage engine, you must first create a special ENGINE=SPHINX search table, as shown below:

CREATE TABLE articlefulltext (
id INTEGER NOT NULL,
weight INTEGER NOT NULL,
query VARCHAR(3072) NOT NULL,
...
INDEX (query)
) ENGINE=SPHINX CONNECTION="sphinx://localhost:3312/test";

· The table and column names can be anything, but the first three columns must be of the INT, INT, and VARCHAR types. You can add more columns; their types must be INT or TIMESTAMP, and their names must match attributes in the Sphinx configuration file so that more information about the search results is returned.

· After creating the table, you can run a full-text search in MySQL with the following SQL statement:

· SELECT * FROM articlefulltext WHERE query = 'full-text retrieval condition';

· The query returns the full-text search results, including the document ID and weight. If the articlefulltext table defines more columns, other information about the hits is returned as well.

· With an SQL join, integrated retrieval is easy, for example:

· SELECT article.id, title
FROM article, articlefulltext
WHERE articlefulltext.id = article.id AND query = 'blog'
AND publishtime > '2007-03-01' AND refercount > 0
ORDER BY weight * 0.5 + refercount * 0.5;

· The preceding SQL statement retrieves the articles that contain the keyword 'blog', were published after March 1, 2007, and have been referenced at least once, and sorts them by a score that combines the full-text search weight and the reference count.

· As you can see, embedding the full-text search system into the relational database MySQL as a storage engine makes it easy to provide integrated search. Although there are many functional limitations, it is a smart and convenient approach.
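The ORDER BY expression in the join example is a simple weighted sum. The Python sketch below reproduces that arithmetic on a few made-up rows to show how the final order comes about. The row values are hypothetical and the helper names are invented for illustration.

```python
def combined_score(weight, refercount):
    """Mirror the ORDER BY expression: weight * 0.5 + refercount * 0.5."""
    return weight * 0.5 + refercount * 0.5


def order_like_sql(rows):
    """Sort (id, weight, refercount) rows ascending by the combined score,
    which is what ORDER BY does when no DESC keyword is given."""
    return sorted(rows, key=lambda row: combined_score(row[1], row[2]))


# Hypothetical joined rows: (article id, full-text weight, reference count)
rows = [(1, 2, 10), (2, 2, 1), (4, 1, 3)]
for article_id, weight, refercount in order_like_sql(rows):
    print(article_id, combined_score(weight, refercount))
```

Note that the statement as written sorts in ascending order, so the lowest combined score comes first. Appending DESC to the ORDER BY clause would list the highest-scoring articles first.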

11. Install searchd as a Windows service.

searchd --install --config "csft.conf"

12. Start the service: net start searchd (or another service name)

---------------------------------------------------------------

B:

Configure the sphinx.conf file to support Chinese encoding.

charset_type = zh_cn.utf-8
charset_dictpath = D:/csft3.1/bin # directory that holds the word segmentation dictionary (lib) files
min_infix_len = 0

This article is from the CSDN blog. When reproducing it, please indicate the source: http://blog.csdn.net/siren0203/archive/2010/05/07/5564082.aspx
