Installing and configuring Sphinx

Source: Internet
Author: User
Tags: mysql, command line

What is Sphinx? If you don't know yet, look it up with a search engine...
Here is a brief excerpt from the official wiki:

Sphinx is an SQL-based full-text search engine that can be combined with MySQL and PostgreSQL. It provides more specialized search functions than the database itself, which makes it easier for applications to implement full-text retrieval. Sphinx provides search API bindings for scripting languages such as PHP, Python, Perl, and Ruby, and it also ships a storage engine plug-in for MySQL.

Sphinx features:

High-speed indexing (nearly 10 MB/s on modern CPUs);
High-speed search (the average query on 2-4 GB of text takes less than 0.1 seconds);
High scalability (up to 100 GB of text and 100 M documents on a single CPU);
Good relevance ranking;
Distributed search;
Document excerpt (summary) generation;
Searching from within MySQL through the plug-in storage engine;
Boolean, phrase, and word-proximity queries;
Multiple full-text fields per document (up to 32 by default);
Multiple attributes per document;
Stop words;
Single-byte encodings and UTF-8.

First download the latest stable version of the source code: http://www.sphinxsearch.com/downloads/sphinx-0.9.8.tar.gz
According to the official documentation, Sphinx works best together with MySQL, so also download the MySQL 5.1.x source code: http://dev.mysql.com

Assuming the downloaded sources are saved in /tmp, first build a MySQL that supports the Sphinx engine:

tar xzvf mysql-5.1.26-rc.tar.gz
tar xzvf sphinx-0.9.8.tar.gz

Copy the Sphinx storage engine for MySQL (SphinxSE) into the MySQL source tree:

cp -R ./sphinx-0.9.8/mysqlse ./mysql-5.1.26-rc/storage/sphinx

Regenerate MySQL's configure script:

cd mysql-5.1.26-rc
sh BUILD/autorun.sh

When configuring the build, add the Sphinx plugin option. The other options are the same as for a normal MySQL build. For example, mine:

./configure --prefix=/usr/local/mysql --with-extra-charsets=all --with-big-tables --with-plugins=sphinx
make
make install

I won't go into the other details of compiling MySQL. After the installation, open the MySQL command line and run SHOW ENGINES to check whether an engine named SPHINX is listed. If it is, SphinxSE (the MySQL Sphinx storage engine) is installed correctly.
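The SHOW ENGINES check can also be scripted. The following is only a minimal sketch: it assumes the listing was captured in the tab-separated form printed by `mysql -B -e "SHOW ENGINES"`, and both the helper name `has_sphinx_engine` and the sample listing are made up for illustration.

```python
def has_sphinx_engine(show_engines_output: str) -> bool:
    """Return True if a SHOW ENGINES listing contains a usable SPHINX engine."""
    for line in show_engines_output.splitlines():
        columns = line.split("\t")
        if len(columns) < 2:
            continue
        engine, support = columns[0].strip(), columns[1].strip()
        # A usable engine has YES or DEFAULT in the Support column.
        if engine.upper() == "SPHINX" and support.upper() in ("YES", "DEFAULT"):
            return True
    return False


# Hypothetical sample listing (not real output from the build described here).
SAMPLE_OUTPUT = (
    "Engine\tSupport\tComment\n"
    "MyISAM\tDEFAULT\tDefault engine\n"
    "SPHINX\tYES\tSphinx storage engine\n"
)

print(has_sphinx_engine(SAMPLE_OUTPUT))
```

If the function returns False for your own listing, the plugin was most likely not compiled in and the configure step is worth rechecking.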

Next, compile Sphinx. Go to the Sphinx source directory and run:

ldconfig /usr/local/mysql/lib/mysql
ldconfig /usr/local/mysql/include/mysql
./configure --prefix=/usr/local/sphinx --with-mysql=/usr/local/mysql
make
make install

So far we have forged the weapon; now let's see how to use it. For lack of time I haven't studied it in depth yet, so I'll use phpwind as the test subject.

The structure of the PW threads table is:

CREATE TABLE IF NOT EXISTS `pw_threads` (
  `tid` mediumint(8) unsigned NOT NULL auto_increment,
  `fid` smallint(6) unsigned NOT NULL default '0',
  `icon` tinyint(2) NOT NULL default '0',
  `titlefont` char(15) NOT NULL default '',
  `author` char(15) NOT NULL default '',
  `authorid` mediumint(8) unsigned NOT NULL default '0',
  `subject` char(100) NOT NULL default '',
  `toolinfo` char(16) NOT NULL default '',
  `toolfield` varchar(21) NOT NULL default '',
  `ifcheck` tinyint(1) NOT NULL default '0',
  `type` tinyint(2) NOT NULL default '0',
  `postdate` int(10) unsigned NOT NULL default '0',
  `lastpost` int(10) unsigned NOT NULL default '0',
  `lastposter` char(15) NOT NULL default '',
  `hits` int(10) unsigned NOT NULL default '0',
  `replies` int(10) unsigned NOT NULL default '0',
  `topped` smallint(6) NOT NULL default '0',
  `locked` tinyint(1) NOT NULL default '0',
  `digest` tinyint(1) NOT NULL default '0',
  `special` tinyint(1) NOT NULL default '0',
  `state` tinyint(1) NOT NULL default '0',
  `ifupload` tinyint(1) NOT NULL default '0',
  `ifmail` tinyint(1) NOT NULL default '0',
  `ifmark` smallint(6) NOT NULL default '0',
  `ifshield` tinyint(1) NOT NULL default '0',
  `anonymous` tinyint(1) NOT NULL default '0',
  `dig` int(10) NOT NULL default '0',
  `fight` int(10) NOT NULL default '0',
  `ptable` tinyint(3) NOT NULL default '0',
  `ifmagic` tinyint(1) NOT NULL default '0',
  `ifhide` tinyint(1) NOT NULL default '0',
  `inspect` varchar(30) NOT NULL default '',
  PRIMARY KEY (`tid`),
  KEY `authorid` (`authorid`),
  KEY `postdate` (`postdate`),
  KEY `digest` (`digest`),
  KEY `type` (`fid`, `type`, `ifcheck`),
  KEY `special` (`special`),
  KEY `lastpost` (`fid`, `ifcheck`, `topped`, `lastpost`)
) ENGINE=MyISAM;

In actual use, we need to search the threads table by title (subject), forum (fid), and posting time (postdate), and then sort the results by the primary key (tid) and the view count (hits). The following describes how to use Sphinx to meet this requirement.

Configure Sphinx first:
sphinx.conf is Sphinx's configuration file. Both indexing and searching are driven by this file. To do full-text search, you must first set up sphinx.conf to tell Sphinx which fields need to be indexed and which fields will be used in WHERE, ORDER BY, and GROUP BY.
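As a rough illustration of that split: Sphinx takes the first column returned by sql_query as the document ID, treats columns declared with an sql_attr_* directive as attributes (the ones usable for filtering, sorting, and grouping), and indexes the remaining columns as full-text fields. The helper `classify_columns` below is hypothetical and only mimics that rule on the columns named in this article.

```python
def classify_columns(query_columns, attribute_columns):
    """Split the columns of a sql_query into document ID, attributes, and full-text fields."""
    doc_id = query_columns[0]   # the first column is used as the document ID
    rest = query_columns[1:]
    attributes = [c for c in rest if c in attribute_columns]
    fields = [c for c in rest if c not in attribute_columns]
    return {"id": doc_id, "attributes": attributes, "fields": fields}


# Columns of the threads query used in this article; attribute set as declared via sql_attr_*.
layout = classify_columns(
    ["tid", "fid", "subject", "hits", "postdate"],
    {"fid", "hits", "postdate"},
)
print(layout)
```

Under these assumptions only subject ends up as a full-text field, which matches the goal of searching the title while filtering and sorting on the other columns.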

cd /usr/local/sphinx/etc
vi sphinx.conf.dist

Open sphinx.conf.dist and you will see that the configuration file has this format:

source source_name_1 {
    ...
}
index index_name_1 {
    source = source_name_1
    ...
}
source source_name_2 {
    ...
}
index index_name_2 {
    source = source_name_2
    ...
}
indexer {
    ...
}
searchd {
    ...
}
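To get a quick overview of which blocks a configuration file defines, the layout above can be scanned with a few lines of Python. This is only a sketch under simplifying assumptions: it recognizes just the four block types shown above, treats # as a comment marker, and is not Sphinx's own parser. The function name `list_sections` and the sample text are made up.

```python
import re

# Block keywords seen in the layout above; the name (if any) follows the keyword.
HEADER = re.compile(r"^(source|index|indexer|searchd)\b\s*([^{]*)", re.IGNORECASE)


def list_sections(conf_text):
    """Return (kind, name) pairs for the top-level blocks of a sphinx.conf-style text."""
    sections = []
    depth = 0
    for raw in conf_text.splitlines():
        line = raw.split("#", 1)[0].strip()   # drop comments and surrounding whitespace
        if not line:
            continue
        if depth == 0:
            match = HEADER.match(line)
            if match:
                sections.append((match.group(1).lower(), match.group(2).strip()))
        # Track nesting so that "source = ..." inside an index block is not a header.
        depth += line.count("{") - line.count("}")
    return sections


SAMPLE = """
source src1 {
    type = mysql
}
index idx1 {
    source = src1
}
indexer
{
    mem_limit = 32M
}
searchd {
    port = 3312
}
"""

print(list_sections(SAMPLE))
```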

Now let's write our own sphinx.conf:

source pysche
{
    type = mysql
    sql_host = localhost
    sql_user = sjcn
    sql_pass = leipang
    sql_db = sjcn
    sql_port = 3306

    sql_query_pre = SET NAMES utf8
    sql_query = SELECT tid, fid, subject, hits, postdate \
        FROM pw_threads

    sql_attr_uint = tid
    sql_attr_uint = fid
    sql_attr_uint = hits
    sql_attr_timestamp = postdate

    sql_ranged_throttle = 0
}

index ptest
{
    source = pysche
    path = /usr/local/sphinx/var/data/ptest
    docinfo = extern
    mlock = 0
    morphology = none
    min_word_len = 1
    charset_type = utf-8
    charset_table = U+ff10..U+ff19->0..9, 0..9, U+ff41..U+ff5a->a..z, U+ff21..U+ff3a->a..z, \
        A..Z->a..z, a..z, U+0149, U+017f, U+0138, U+00df, U+00FF, U+00c0..U+00d6->U+00e0..U+00f6, \
        U+00e0..U+00f6, U+00d8..U+00de->U+00f8..U+00fe, U+00f8..U+00fe, U+0100->U+0101, U+0101, \
        U+0102->U+0103, U+0103, U+0104->U+0105, U+0105, U+0106->U+0107, U+0107, U+0108->U+0109, \
        U+0109, U+010a->U+010b, U+010b, U+010c->U+010d, U+010d, U+010e->U+010f, U+010f, \
        U+0110->U+0111, U+0111, U+0112->U+0113, U+0113, U+0114->U+0115, U+0115, \
        U+0116->U+0117, U+0117, U+0118->U+0119, U+0119, U+011a->U+011b, U+011b, U+011c->U+011d, \
        U+011d, U+011e->U+011f, U+011f, U+0130->U+0131, U+0131, U+0132->U+0133, U+0133, \
        U+0134->U+0135, U+0135, U+0136->U+0137, U+0137, U+0139->U+013a, U+013a, U+013b->U+013c, \
        U+013c, U+013d->U+013e, U+013e, U+013f->U+0140, U+0140, U+0141->U+0142, U+0142, \
        U+0143->U+0144, U+0144, U+0145->U+0146, U+0146, U+0147->U+0148, U+0148, U+014a->U+014b, \
        U+014b, U+014c->U+014d, U+014d, U+014e->U+014f, U+014f, U+0150->U+0151, U+0151, \
        U+0152->U+0153, U+0153, U+0154->U+0155, U+0155, U+0156->U+0157, U+0157, U+0158->U+0159, \
        U+0159, U+015a->U+015b, U+015b, U+015c->U+015d, U+015d, U+015e->U+015f, U+015f, \
        U+0160->U+0161, U+0161, U+0162->U+0163, U+0163, U+0164->U+0165, U+0165, U+0166->U+0167, \
        U+0167, U+0168->U+0169, U+0169, U+016a->U+016b, U+016b, U+016c->U+016d, U+016d, \
        U+016e->U+016f, U+016f, U+0170->U+0171, U+0171, U+0172->U+0173, U+0173, U+0174->U+0175, \
        U+0175, U+0176->U+0177, U+0177, U+0178->U+00FF, U+00FF, U+0179->U+017a, U+017a, \
        U+017b->U+017c, U+017c, U+017d->U+017e, U+017e, U+0410..U+042f->U+0430..U+044f, \
        U+0430..U+044f, U+05d0..U+05ea, U+0531..U+0556->U+0561..U+0586, U+0561..U+0587, \
        U+0621..U+063a, U+01b9, U+01bf, U+0640..U+064a, U+0660..U+0669, U+066e, U+066f, \
        U+0671..U+06d3, U+06f0..U+06ff, U+0904..U+0939, U+0958..U+095f, U+0960..U+0963, \
        U+0966..U+096f, U+097b..U+097f, U+0985..U+09b9, U+09ce, U+09dc..U+09e3, U+09e6..U+09ef, \
        U+0a05..U+0a39, U+0a59..U+0a5e, U+0a66..U+0a6f, U+0a85..U+0ab9, U+0ae0..U+0ae3, \
        U+0ae6..U+0aef, U+0b05..U+0b39, U+0b5c..U+0b61, U+0b66..U+0b6f, U+0b71, U+0b85..U+0bb9, \
        U+0be6..U+0bf2, U+0c05..U+0c39, U+0c66..U+0c6f, U+0c85..U+0cb9, U+0cde..U+0ce3, \
        U+0ce6..U+0cef, U+0d05..U+0d39, U+0d60, U+0d61, U+0d66..U+0d6f, U+0d85..U+0dc6, \
        U+1900..U+1938, U+1946..U+194f, U+a800..U+a805, U+a807..U+a822, U+0386->U+03b1, \
        U+03ac->U+03b1, U+0388->U+03b5, U+03ad->U+03b5, U+0389->U+03b7, U+03ae->U+03b7, \
        U+038a->U+03b9, U+0390->U+03b9, U+03aa->U+03b9, U+03af->U+03b9, U+03ca->U+03b9, \
        U+038c->U+03bf, U+03cc->U+03bf, U+038e->U+03c5, U+03ab->U+03c5, U+03b0->U+03c5, \
        U+03cb->U+03c5, U+03cd->U+03c5, U+038f->U+03c9, U+03ce->U+03c9, U+03c2->U+03c3, \
        U+0391..U+03a1->U+03b1..U+03c1, U+03a3..U+03a9->U+03c3..U+03c9, U+03b1..U+03c1, \
        U+03c3..U+03c9, U+0e01..U+0e2e, U+0e30..U+0e3a, U+0e40..U+0e45, U+0e47, U+0e50..U+0e59, \
        U+a000..U+a48f, U+4e00..U+9fbf, U+3400..U+4dbf, U+20000..U+2a6df, U+f900..U+faff, \
        U+2f800..U+2fa1f, U+2e80..U+2eff, U+2f00..U+2fdf, U+3100..U+312f, U+31a0..U+31bf, \
        U+3040..U+309f, U+30a0..U+30ff, U+31f0..U+31ff, U+ac00..U+d7af, U+1100..U+11ff, \
        U+3130..U+318f, U+a000..U+a48f, U+a490..U+a4cf
    html_strip = 0
    min_prefix_len = 0
    min_infix_len = 1
    ngram_len = 1
    ngram_chars = U+4e00..U+9fbf, U+3400..U+4dbf, U+20000..U+2a6df, U+f900..U+faff, \
        U+2f800..U+2fa1f, U+2e80..U+2eff, U+2f00..U+2fdf, U+3100..U+312f, U+31a0..U+31bf, \
        U+3040..U+309f, U+30a0..U+30ff, U+31f0..U+31ff, U+ac00..U+d7af, U+1100..U+11ff, \
        U+3130..U+318f, U+a000..U+a48f, U+a490..U+a4cf
}

indexer
{
    mem_limit = 32M
}

searchd
{
    port = 3312
    log = /usr/local/sphinx/var/log/searchd.log
    query_log = /usr/local/sphinx/var/log/query.log
    read_timeout = 5
    max_children = 30
    pid_file = /usr/local/sphinx/var/log/searchd.pid
    max_matches = 1000
    seamless_rotate = 1
    preopen_indexes = 0
    unlink_old = 1
}

Download the file I wrote:

sphinx.conf (4.3 KiB, 413 hits)

Note that ngram_chars in the configuration file must be spelled correctly. The official Chinese wiki writes it as ngrams_chars (with an extra s), and I wasted a lot of time on this at first. After saving sphinx.conf, build the index as follows:
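A misspelling like this can be caught before indexing by comparing each option name against a list of known names. The sketch below is an assumption-laden illustration, not a Sphinx feature: `KNOWN_INDEX_OPTIONS` is a partial list containing only the index options used in this article, and `suggest_option` is a made-up helper built on Python's standard difflib module.

```python
import difflib

# Partial list: only the index-level options that appear in this article's configuration.
KNOWN_INDEX_OPTIONS = [
    "source", "path", "docinfo", "mlock", "morphology", "min_word_len",
    "charset_type", "charset_table", "html_strip", "min_prefix_len",
    "min_infix_len", "ngram_len", "ngram_chars",
]


def suggest_option(name, known=KNOWN_INDEX_OPTIONS):
    """Return None if the option is known, else the closest known spelling ('' if none)."""
    if name in known:
        return None
    close = difflib.get_close_matches(name, known, n=1, cutoff=0.8)
    return close[0] if close else ""


print(suggest_option("ngrams_chars"))
```

An option missing from the list is not necessarily wrong, so a suggestion here is only a hint to double-check the spelling against the reference manual.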

cd /usr/local/sphinx/bin
./indexer --all

While it runs, the indexer prints some status messages. Read them if you understand them; if not, never mind (although if something goes wrong, you will have to understand them...).

What comes next once the index is built? Test the index first, for example:

cd /usr/local/sphinx/bin
./search sales

Here "sales" is the keyword I searched for. Sphinx then prints the following:

Sphinx 0.9.8-release (r1371)
Copyright (c) 2001-2008, Andrew Aksyonoff

using config file '/usr/local/sphinx/etc/sphinx.conf'...
index 'ptest': query 'sales ': returned 11 matches of 11 total in 0.000 sec

displaying matches:
1. document=8, weight=2, fid=5, hits=1, postdate=Wed Aug 13 14:34:35 2008
2. document=9, weight=2, fid=5, hits=1, postdate=Wed Aug 13 14:34:36 2008
3. document=10, weight=2, fid=5, hits=1, postdate=Wed Aug 13 14:34:37 2008
4. document=11, weight=2, fid=5, hits=1, postdate=Wed Aug 13 14:34:38 2008
5. document=12, weight=2, fid=5, hits=1, postdate=Wed Aug 13 14:34:39 2008
6. document=13, weight=2, fid=5, hits=1, postdate=Wed Aug 13 14:34:40 2008
7. document=14, weight=2, fid=5, hits=1, postdate=Wed Aug 13 14:34:41 2008
8. document=15, weight=2, fid=5, hits=1, postdate=Wed Aug 13 14:35:29 2008
9. document=16, weight=2, fid=5, hits=1, postdate=Wed Aug 13 14:35:30 2008
10. document=17, weight=2, fid=5, hits=1, postdate=Wed Aug 13 14:35:31 2008
11. document=18, weight=2, fid=5, hits=2, postdate=Wed Aug 13 14:35:32 2008

words:
1. 'shop': 11 documents, 143 hits
2. 'sell': 11 documents, 143 hits

Here, document corresponds to the tid of the PW threads table. The output shows that 11 results were found in total, and Sphinx also returns the values of the attributes we configured.
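If you want to work with these results in a script, the match lines can be turned into dictionaries. The following is a small sketch that assumes the line layout shown in the output above (rank, then document and further key=value pairs separated by commas). `parse_matches` is a hypothetical helper, and all values other than the rank and document ID are kept as strings.

```python
import re

# Matches lines such as "1. document=8, weight=2, ..." (spacing and case are tolerated).
MATCH_LINE = re.compile(r"^\s*(\d+)\.\s*document\s*=\s*(\d+)\s*,\s*(.*)$", re.IGNORECASE)


def parse_matches(output):
    """Parse the match lines printed by the search tool into a list of dicts."""
    matches = []
    for line in output.splitlines():
        m = MATCH_LINE.match(line)
        if not m:
            continue   # skip banner, summary, and word-statistics lines
        row = {"rank": int(m.group(1)), "document": int(m.group(2))}
        for pair in m.group(3).split(","):
            key, _, value = pair.partition("=")
            row[key.strip().lower()] = value.strip()
        matches.append(row)
    return matches


SAMPLE = """\
1. document=8, weight=2, fid=5, hits=1, postdate=Wed Aug 13 14:34:35 2008
11. document=18, weight=2, fid=5, hits=2, postdate=Wed Aug 13 14:35:32 2008
"""

rows = parse_matches(SAMPLE)
print([(r["document"], r["hits"]) for r in rows])
```

Because document is the tid, the parsed IDs could then be used to fetch the matching rows from pw_threads.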

OK. For my goal of building a full-text index on the phpwind threads table, the Sphinx tooling is now ready and I have tried it out briefly. Next comes how to use Sphinx from PHP code; I'll write up those steps in a couple of days.

I installed Sphinx two days ago and indexed one phpwind threads table as a test. Today I looked into phpwind's data tables a little more and put together a rough Sphinx configuration file suited to phpwind.

sphinx.zip (1.8 KiB, 563 hits)

I looked at phpwind's search script, search.php, and found that its "search post scope" option offers three choices:

  • Thread title
  • Thread title and content
  • Reply title and reply content

Now look at phpwind's data tables. To match phpwind's search, we need to build Sphinx indexes for the pw_threads, pw_tmsgs, and pw_posts tables. Since I don't have a large amount of phpwind forum data, I can't run realistic load and speed tests. I only ran some simple tests on the command line, but Sphinx should be configured roughly this way.
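One way to picture how the three search scopes could map onto those indexes is a simple lookup table. Everything in this sketch is an assumption: the index names are invented, and the mapping (titles in pw_threads, thread content in pw_tmsgs, replies in pw_posts) is my reading of the table names rather than something taken from phpwind's code.

```python
# Hypothetical index names, one per indexed table.
SCOPE_TO_INDEXES = {
    "thread_title": ["threads"],                        # pw_threads
    "thread_title_and_content": ["threads", "tmsgs"],   # pw_threads + pw_tmsgs
    "reply_title_and_content": ["posts"],               # pw_posts
}


def indexes_for_scope(scope):
    """Return a comma-separated list of index names to query for a search scope."""
    try:
        return ",".join(SCOPE_TO_INDEXES[scope])
    except KeyError:
        raise ValueError("unknown search scope: %r" % scope)


print(indexes_for_scope("thread_title_and_content"))
```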

For lack of time, this is as far as my research goes for now. If a later project uses phpwind, I'll continue and use Sphinx to rework phpwind's search.
