Sphinx Full-Text Search PHP Tutorial

I wrote this article about half a year ago but never published it, so I am sharing it now. It may contain mistakes or inaccuracies, and the tone is casual in places; please bear with me.

The examples use the following email table:


-- The varchar lengths were garbled in the source; 255 is an assumed placeholder.
CREATE TABLE email (
  emailid mediumint(8) unsigned NOT NULL auto_increment COMMENT 'message ID',
  fromid int(10) unsigned NOT NULL default '0' COMMENT 'sender ID',
  toid int(10) unsigned NOT NULL default '0' COMMENT 'recipient ID',
  content text NOT NULL COMMENT 'message content',
  subject varchar(255) NOT NULL COMMENT 'message subject',
  sendtime int(10) NOT NULL COMMENT 'send time',
  attachment varchar(255) NOT NULL COMMENT 'attachment IDs, comma-separated',
  PRIMARY KEY (emailid)
) ENGINE=MyISAM;

Start searchd from a console first. PHP can connect to Sphinx only while searchd is running (make sure the index has already been built):

D:\coreseek\bin\searchd -c d:\coreseek\bin\sphinx.conf

The coreseek/api directory provides the PHP interface file sphinxapi.php, which contains the SphinxClient class.

Include this file in PHP and create a new instance:

// Adjust the path to wherever sphinxapi.php lives
require 'sphinxapi.php';

$sphinx = new SphinxClient();
// Sphinx host name and port
$sphinx->SetServer('localhost', 9312);
// Return the matches as a plain PHP array
$sphinx->SetArrayResult(true);
// Result offsets; the parameters are: start position, number of results to return, maximum number of matches
$sphinx->SetLimits(0, 20, 1000);
// Maximum search time, in milliseconds
$sphinx->SetMaxQueryTime(10);
// Run a simple search. It queries every field; to query specific fields, read on.
// $index is the index name from the configuration file. Separate multiple indexes with commas ('email,diary'), or use '*' for all indexes.
$index = 'email';
$result = $sphinx->Query('search keyword', $index);
echo '<pre>';
print_r($result);
echo '</pre>';

$result is an array, in which:

total is the number of matched documents

matches holds the matched documents, each with id, weight, and attrs

words lists the terms the search keywords were split into

You may wonder why the result contains no message content. Unlike MySQL, Sphinx does not return the row data, because it never stored the complete rows; it only stores the tokenized index.

Look at the matches array. The id in matches is the first column of the SELECT statement in sql_query in the configuration file. Our configuration file has:

sql_query = SELECT emailid, fromid, toid, subject, content, sendtime, attachment FROM email

So the id in matches is emailid.

weight is the match weight. By default, results with a higher weight are returned first. For how weights are computed, see the official documentation.

attrs holds the attributes declared with sql_attr_* in the configuration file; their use is covered later.

After all that, the search result is still not the email data we want, because Sphinx does not store the real rows. To get the actual emails, query the MySQL email table with the ids from matches. Overall this is still much faster than MySQL LIKE, provided the table holds several hundred thousand rows or more; below that, using Sphinx will only make things slower.
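A minimal sketch of that second step, assuming the result shape produced with SetArrayResult(true) and the email table above. The helper name buildEmailQuery and the sample values are hypothetical; ORDER BY FIELD() is just one way to keep the rows in the order Sphinx ranked them.

```php
<?php
// Sample result in the shape returned with SetArrayResult(true); the values are made up.
$result = [
    'total'   => 2,
    'matches' => [
        ['id' => 42, 'weight' => 2, 'attrs' => ['fromid' => 1, 'toid' => 5, 'sendtime' => 1300000000]],
        ['id' => 7,  'weight' => 1, 'attrs' => ['fromid' => 2, 'toid' => 5, 'sendtime' => 1300000100]],
    ],
];

// Hypothetical helper: turn the matches into a SELECT that keeps Sphinx's ranking order.
function buildEmailQuery(array $result): ?string
{
    if (empty($result['matches'])) {
        return null; // nothing matched, so there is nothing to fetch
    }
    // Cast every id to int so the list is safe to embed in SQL.
    $ids  = array_map('intval', array_column($result['matches'], 'id'));
    $list = implode(',', $ids);

    return "SELECT * FROM email WHERE emailid IN ($list) ORDER BY FIELD(emailid, $list)";
}

$sql = buildEmailQuery($result);
```

The resulting string can be passed to whichever MySQL client the application already uses.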

Next, some Sphinx equivalents of MySQL-style conditions.

// emailid range
$sphinx->SetIDRange($min, $max);

// Attribute filtering. The filtered attributes must be declared with sql_attr_* in the configuration file.
// We defined these earlier:
//     sql_attr_uint      = fromid
//     sql_attr_uint      = toid
//     sql_attr_timestamp = sendtime
// If you change these attributes, rebuild the index afterwards for the change to take effect.
// Filters accumulate, so the calls below are alternatives, not a sequence.

// Match specific values
$sphinx->SetFilter('fromid', array(1, 2));            // fromid can only be 1 or 2
// For the opposite condition, pass true as the third parameter ($exclude)
$sphinx->SetFilter('fromid', array(1, 2), true);      // fromid cannot be 1 or 2
// Match a range of values
$sphinx->SetFilterRange('toid', 5, 200);              // toid is between 5 and 200
// For the opposite condition, pass true as the fourth parameter ($exclude)
$sphinx->SetFilterRange('toid', 5, 200, true);        // toid is outside 5-200
// Run the search
$result = $sphinx->Query('keywords', '*');
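To make the include/exclude behaviour concrete, here is a plain-PHP illustration of which documents a value filter keeps. It only mimics the documented semantics on an in-memory array; applyValueFilter and the sample rows are hypothetical and have nothing to do with how Sphinx implements filtering.

```php
<?php
// Keep the documents whose attribute is in $values, or, with $exclude = true,
// the documents whose attribute is NOT in $values.
function applyValueFilter(array $docs, string $attr, array $values, bool $exclude = false): array
{
    return array_values(array_filter($docs, function (array $doc) use ($attr, $values, $exclude) {
        $hit = in_array($doc[$attr], $values, true);
        return $exclude ? !$hit : $hit;
    }));
}

$docs = [
    ['id' => 1, 'fromid' => 1],
    ['id' => 2, 'fromid' => 2],
    ['id' => 3, 'fromid' => 9],
];

$included = applyValueFilter($docs, 'fromid', [1, 2]);        // like SetFilter('fromid', array(1, 2))
$excluded = applyValueFilter($docs, 'fromid', [1, 2], true);  // like SetFilter('fromid', array(1, 2), true)
```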

Sort mode
Search results can be sorted using the following modes:

SPH_SORT_RELEVANCE mode, sorts by relevance in descending order (best matches first)

SPH_SORT_ATTR_DESC mode, sorts by an attribute in descending order (larger attribute values come first)

SPH_SORT_ATTR_ASC mode, sorts by an attribute in ascending order (smaller attribute values come first)

SPH_SORT_TIME_SEGMENTS mode, sorts by time segment (last hour/day/week/month) in descending order, and then by relevance

SPH_SORT_EXTENDED mode, sorts by a SQL-like combination of columns in ascending or descending order

SPH_SORT_EXPR mode, sorts by an arithmetic expression

// Sort by attribute
// Sort by fromid in descending order. Note that calling SetSortMode again overwrites the previous sort.
$sphinx->SetSortMode(SPH_SORT_ATTR_DESC, 'fromid');
// To sort by several fields, use SPH_SORT_EXTENDED.
// @id is a Sphinx built-in keyword; here it refers to emailid (think about why).
$sphinx->SetSortMode(SPH_SORT_EXTENDED, 'fromid ASC, toid DESC, @id DESC');
// Run the search
$result = $sphinx->Query('keywords', '*');

See the sorting modes section of the official documentation for more information.

Matching mode
The following matching modes are available:

SPH_MATCH_ALL, matches all query terms (default mode);

SPH_MATCH_ANY, matches any one of the query terms;

SPH_MATCH_PHRASE, treats the whole query as a phrase that must match completely and in order;

SPH_MATCH_BOOLEAN, treats the query as a Boolean expression;

SPH_MATCH_EXTENDED, treats the query as an expression in the Coreseek/Sphinx internal query language. Starting with Coreseek 3/Sphinx 0.9.9, this option is superseded by SPH_MATCH_EXTENDED2, which provides more functionality and better performance. It is retained for compatibility with legacy code, so that old application code keeps working even after Sphinx and its components, including the API, are upgraded;

SPH_MATCH_EXTENDED2, matches the query using the second version of the extended match mode;

SPH_MATCH_FULLSCAN, forces the query to run in full scan mode. Note that in this mode all query terms are ignored: filters, filter ranges, and grouping still work, but no text matching takes place.

Our main concern is the SPH_MATCH_EXTENDED2 extended match mode, which allows some MySQL-like conditional expressions.

// Set the extended match mode
$sphinx->SetMatchMode(SPH_MATCH_EXTENDED2);
// Use conditions in the query. Fields start with @. Search for messages whose content contains "test" and whose toid equals 1:
$result = $sphinx->Query('@content (test) & @toid = 1', '*');
// Build more complex conditions with parentheses and & (AND), | (OR), - (NOT, like !=)
$result = $sphinx->Query('(@content (test) & @subject = uh) | (@fromid -(100))', '*');
// For more syntax, see the matching modes section of the official documentation
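When the search text comes from user input, characters such as - or | would be read as operators. The sketch below assembles an AND-ed field query and escapes those characters first. escapeTerm and buildFieldQuery are hypothetical helpers written for this illustration, and the list of escaped characters is an assumption about what the extended syntax treats as special.

```php
<?php
// Hypothetical helper: backslash-escape characters assumed to be query operators.
function escapeTerm(string $term): string
{
    $map = [];
    foreach (str_split('\\()|-!@~"&/^$=') as $char) {
        $map[$char] = '\\' . $char;
    }
    // strtr() replaces in a single pass, so inserted backslashes are not escaped again.
    return strtr($term, $map);
}

// Hypothetical helper: build "@field (text) & @field (text)" from a field => text map.
function buildFieldQuery(array $fields): string
{
    $parts = [];
    foreach ($fields as $field => $text) {
        $parts[] = '@' . $field . ' (' . escapeTerm($text) . ')';
    }
    return implode(' & ', $parts);
}

$query = buildFieldQuery(['content' => 'test', 'subject' => 'a-b']);
```

The string in $query can then be passed as the first argument of Query().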

One thing worth noting about extended match mode is which fields can be searched. A column that is declared as an attribute is not included in the full-text fields by default, so it can be used only with SetFilter() or SetFilterRange().

We declared fromid, toid, and sendtime as attributes earlier. What if we also want to use them as conditions in extended match mode?

Just select the column a second time in sql_query:

sql_query = SELECT emailid, fromid, fromid, toid, toid, subject, content, sendtime, sendtime, attachment FROM email

Remember to rebuild the index after changing the configuration.

More conditional tricks
These are just tips and are not recommended in a production environment; the end of the article explains why.

<, <=, >, >=
Sphinx has no such comparison operators by default.

What if I want the messages sent after a certain date? Simulate it with the SetFilterRange() method:

// Greater than or equal to a timestamp $time
$sphinx->SetFilterRange('sendtime', $time, 10000000000);     // the largest timestamp is ten 9s, so one more than that can never be exceeded
// Greater than a timestamp $time
$sphinx->SetFilterRange('sendtime', $time + 1, 10000000000);
// Less than or equal to a timestamp $time
$sphinx->SetFilterRange('sendtime', -1, $time);              // the smallest timestamp is 0, so use one less
// Less than a timestamp $time
$sphinx->SetFilterRange('sendtime', -1, $time - 1);
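The four cases can be folded into one small helper that returns the bounds for a given operator. This is only a sketch: rangeFor is a hypothetical name, the sentinels -1 and 10000000000 are the ones used in this article, and a 64-bit PHP build is assumed so that the upper sentinel stays an integer.

```php
<?php
// Hypothetical helper: map a comparison operator to [min, max] bounds for SetFilterRange().
function rangeFor(string $op, int $time): array
{
    $lowest  = -1;           // one below the smallest timestamp (0)
    $highest = 10000000000;  // one above ten 9s; assumes 64-bit integers

    switch ($op) {
        case '>=':
            return [$time, $highest];
        case '>':
            return [$time + 1, $highest];
        case '<=':
            return [$lowest, $time];
        case '<':
            return [$lowest, $time - 1];
        default:
            throw new InvalidArgumentException("Unsupported operator: $op");
    }
}

// Bounds for "sendtime > 1300000000"
list($min, $max) = rangeFor('>', 1300000000);
```

With a connected client, the bounds would be used as $sphinx->SetFilterRange('sendtime', $min, $max).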

IS NOT NULL
How do you search for empty fields? Say I want the messages with no attachments; someone might try @attachment ('')? In fact, that searches for two single quotes... Sphinx search strings are not quoted.

Sphinx currently provides no such feature, but you can work around it in the MySQL statement:

sql_query = SELECT emailid, fromid, toid, subject, content, sendtime, attachment != '' AS attachisnotnull FROM email

This returns a new column, attachisnotnull. When attachisnotnull is 1, the attachment is not empty. As with the other filtered columns, it has to be declared with sql_attr_* before it can be used in SetFilter().

Remember to rebuild the index after changing the configuration.

FIND_IN_SET()
To search for messages containing a given attachment, MySQL users are used to a simple FIND_IN_SET() call. In Sphinx, the column must be declared as a multi-value attribute (MVA) with sql_attr_multi in the configuration:

sql_attr_multi = uint attachment from field   # attachment can hold attachment IDs separated by commas, spaces, semicolons, or any other separator Sphinx recognizes

Remember to rebuild the index after changing the configuration. PHP can then use SetFilter():

// Search for messages containing attachment ID 1 or 2. In MySQL this is roughly FIND_IN_SET('1', attachment) OR FIND_IN_SET('2', attachment)
$sphinx->SetFilter('attachment', array(1, 2));
// SetFilterRange can also search for messages with an attachment ID in the range 50-100
$sphinx->SetFilterRange('attachment', 50, 100);
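For a multi-value attribute, a filter matches as soon as any one of the stored values satisfies it. The plain-PHP sketch below reproduces that rule on a comma-separated string, which is how the attachment column is stored in MySQL. The function names are hypothetical and the code only illustrates the semantics; it does not talk to Sphinx.

```php
<?php
// Parse a comma-separated ID string such as "3, 17,52" into a list of integers.
function parseIdList(string $csv): array
{
    $parts = array_filter(array_map('trim', explode(',', $csv)), 'strlen');
    return array_values(array_map('intval', $parts));
}

// True when at least one stored ID is in $wanted (value filter on a multi-value attribute).
function hasAnyId(string $csv, array $wanted): bool
{
    return count(array_intersect(parseIdList($csv), $wanted)) > 0;
}

// True when at least one stored ID lies in [$min, $max] (range filter on a multi-value attribute).
function hasIdInRange(string $csv, int $min, int $max): bool
{
    foreach (parseIdList($csv) as $id) {
        if ($id >= $min && $id <= $max) {
            return true;
        }
    }
    return false;
}

$attachment = '3, 17,52';
$matchesOneOrTwo = hasAnyId($attachment, [1, 2]);
$matchesRange    = hasIdInRange($attachment, 50, 100);
```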

If you want a free, easy-to-use, fast full-text search engine, Sphinx is undoubtedly the best choice, but do not forget what Sphinx is for: full-text search. Don't bother with all those messy conditions. If you want Sphinx search to be as flexible as MySQL and to handle complex multi-condition searches entirely on its own, such as an advanced email search, I suggest you spend the time optimizing your PHP or MySQL code instead, because that approach may well make your search slower.

The best approach is to search the content with Sphinx in the simplest way possible, then pass the returned ids to a MySQL query.
