A while ago I tried out Sphinx, a full-text search system that can easily be called from many languages (PHP, Python, Ruby, etc.). Most of the material online covers installation on Linux. A production deployment will of course run on *nix, but for learning and testing a Windows environment is more convenient. This article describes a simple way to install and configure Sphinx on Windows with support for Chinese full-text indexing. The configuration part applies equally to Linux.

I. About Sphinx

Sphinx is a full-text search engine released under the GPLv2. Commercial licensing (for example, embedding it in another program) requires contacting the author (sphinxsearch.com). In short, Sphinx is a standalone search engine designed to give other applications fast, space-efficient full-text search with highly relevant results. It integrates easily with SQL databases and scripting languages. The current version supports MySQL and PostgreSQL as data sources, and can also read XML in a specific format from standard input. By modifying the source code, users can add data sources of their own (for example, native support for other DBMSs). The search API is available for PHP, Python, Perl, Ruby, and Java, and Sphinx can also be used as a MySQL storage engine. The API is very simple and can be ported to a new language in a few hours.
Sphinx features:

- High-speed indexing (peak throughput of up to 10 MB/sec on a modern CPU)
- High-performance search (average query time under 0.1 sec on 2-4 GB of text)
- Handles large data sets (known to handle more than 100 GB of text, and up to 100M documents on a single-CPU system)
- Good relevance ranking, combining phrase proximity ranking with statistical (BM25) ranking
- Distributed search
- Document excerpt (snippet) generation
- Can serve searches as a MySQL storage engine
- Boolean, phrase, word proximity, and other query types
- Up to 32 full-text fields per document
- Multiple additional attributes per document (for example, group IDs and timestamps)
- Stopwords
- Single-byte encodings and UTF-8
- Native MySQL support (both MyISAM and InnoDB)
- Native PostgreSQL support

A Chinese translation of the manual is also available, thanks to the translator's hard work.

II. Installing Sphinx on Windows

1. Download the latest Windows build from http://www.sphinxsearch.com/downloads.html. I used the "Win32 release binaries with MySQL support". After downloading, unzip it into D:\sphinx.

2. Under D:\sphinx\, create a data directory for the index files and a log directory for the log files. Copy D:\sphinx\sphinx.conf.in to D:\sphinx\bin\sphinx.conf (note the change of file name).

3. Edit the configuration file.
Open D:\sphinx\bin\sphinx.conf and change the settings listed below. In the source section:

    type     = mysql                 # data source type; MySQL here
    sql_host = localhost             # database server
    sql_user = root                  # database user name
    sql_pass =                       # database password
    sql_db   = test                  # database name
    sql_port = 3306                  # database port
    sql_query_pre = SET NAMES utf8   # uncomment this line if your database uses UTF-8

The index section:

    index test1
    {
        # index file path
        path = D:/sphinx/data/
        # encoding
        charset_type = utf-8
        # character table for UTF-8
        charset_table = 0..9, A..Z->a..z, _, a..z, U+410..U+42F->U+430..U+44F, U+430..U+44F
        # simple segmentation; only 0 and 1 are supported; set it to 1 to search Chinese
        ngram_len = 1
        # characters to be split into n-grams; uncomment this line to search Chinese
        ngram_chars = U+3000..U+2FA1F
    }

    #index test1stemmed : test1
    #{
    #    path       = @CONFDIR@/data/test1stemmed
    #    morphology = stem_en
    #}

    # if you have no distributed index, comment out the following
    #index dist1
    #{
    #    # 'distributed' index type must be specified
    #    type = distributed
    #
    #    # local index to be searched
    #    # there can be many local indexes configured
    #    local = test1
    #    local = test1stemmed
    #
    #    # remote agent
    #    # multiple remote agents may be specified
    #    # syntax is 'hostname:port:index1,[index2[,...]]'
    #    agent = localhost:3313:remote1
    #    agent = localhost:3314:remote2,remote3
    #
    #    # remote agent connection timeout, milliseconds
    #    # optional, default is 1000 ms, ie. 1 sec
    #    agent_connect_timeout = 1000
    #
    #    # remote agent query timeout, milliseconds
    #    # optional, default is 3000 ms, ie. 3 sec
    #    agent_query_timeout = 3000
    #}

The searchd section, which configures the search service:

    searchd
    {
        # log file
        log = d:/sphinx/log/searchd.log
        # PID file, stores the searchd process ID
        pid_file = d:/sphinx/log/searchd.pid
        # to run searchd on Windows, be sure to comment this out
        # seamless_rotate = 1
    }
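Chinese has no spaces between words, which is why ngram_len = 1 and ngram_chars matter. The following is a rough Python sketch of how the charset_table and ngram settings above split text into index terms. It is an approximation written for illustration, not Sphinx's own tokenizer: it only models case folding via charset_table, treats every character outside the table as a separator, and emits each character in the ngram_chars range as a standalone term. The names build_charset and tokenize are hypothetical.

```python
def build_charset():
    """Approximate the sample charset_table: map each indexable code point
    to its folded form (hypothetical helper, not Sphinx's parser)."""
    table = {}
    for c in range(ord("0"), ord("9") + 1):
        table[c] = c                      # 0..9 kept as-is
    for c in range(ord("A"), ord("Z") + 1):
        table[c] = c + 32                 # A..Z->a..z
    table[ord("_")] = ord("_")            # _ kept as-is
    for c in range(ord("a"), ord("z") + 1):
        table[c] = c                      # a..z kept as-is
    for c in range(0x410, 0x430):
        table[c] = c + 0x20               # U+410..U+42F->U+430..U+44F (Cyrillic)
    for c in range(0x430, 0x450):
        table[c] = c                      # U+430..U+44F kept as-is
    return table


CHARSET = build_charset()
NGRAM_LO, NGRAM_HI = 0x3000, 0x2FA1F      # ngram_chars = U+3000..U+2FA1F


def tokenize(text):
    """Split text into terms the way this sketch models ngram_len = 1."""
    tokens, word = [], []

    def flush():
        # emit the pending word, if any
        if word:
            tokens.append("".join(word))
            word.clear()

    for ch in text:
        cp = ord(ch)
        if NGRAM_LO <= cp <= NGRAM_HI:
            flush()
            tokens.append(ch)             # each CJK character is its own term
        elif cp in CHARSET:
            word.append(chr(CHARSET[cp])) # fold case and extend the word
        else:
            flush()                       # anything else acts as a separator
    flush()
    return tokens


print(tokenize("Test 中文 Sphinx"))  # ['test', '中', '文', 'sphinx']
```

In this model the query 中文 is split the same way into 中 and 文, so it can match documents containing both characters without any word-segmentation dictionary, at the cost of more loose matches than real word segmentation would give.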
4. Import the test data into the MySQL test database:

    C:\Program Files\MySQL\MySQL Server 5.0\bin>mysql -uroot test

5. Build the index:

    D:\sphinx\bin>indexer.exe --all
    Sphinx 0.9.8-release (r1533)
    Copyright (c) 2001-2008, Andrew Aksyonoff

    using config file './sphinx.conf'...
    indexing index 'test1'...
    collected 4 docs, 0.0 MB
    sorted 0.0 Mhits, 100.0% done
    total 4 docs, 193 bytes
    total 0.101 sec, 1916.30 bytes/sec, 39.72 docs/sec

    D:\sphinx\bin>

6. Search for 'test':

    D:\sphinx\bin>search.exe test
    Sphinx 0.9.8-release (r1533)
    Copyright (c) 2001-2008, Andrew Aksyonoff

    using config file './sphinx.conf'...
    index 'test1': query 'test ': returned 3 matches of 3 total in 0.000 sec

    displaying matches:
    1. document=1, weight=2, group_id=1, date_added=Wed Nov 26 14:58:59
            id=1
            group_id=1
            group_id2=5
            date_added=2008-11-26 14:58:59
            title=test one
            content=this is my test document number one. also checking search within phrases.
    2. document=2, weight=2, group_id=1, date_added=Wed Nov 26 14:58:59
            id=2
            group_id=1
            group_id2=6
            date_added=2008-11-26 14:58:59
            title=test two
            content=this is my test document number two
    3. document=4, weight=1, group_id=2, date_added=Wed Nov 26 14:58:59
            id=4
            group_id=2
            group_id2=8
            date_added=2008-11-26 14:58:59
            title=doc number four
            content=this is to test groups

    words:
    1. 'test': 3 documents, 5 hits

    D:\sphinx\bin>

The results are there.

7. Test Chinese search. Modify the documents table in the test database so that row 2 contains Chinese text (English glosses of the Chinese values are shown here):

    UPDATE `test`.`documents` SET `title` = 'Test Chinese', `content` = 'This is my test document number, it should be found.' WHERE `documents`.`id` = 2;

Rebuild the index:

    D:\sphinx\bin>indexer.exe --all

Then try searching for 中文 ("Chinese"):

    D:\sphinx\bin>search.exe 中文
    Sphinx 0.9.8-release (r1533)
    Copyright (c) 2001-2008, Andrew Aksyonoff

    using config file './sphinx.conf'...
    index 'test1': query '中文 ': returned 0 matches of 0 total in 0.000 sec

    words:

    D:\sphinx\bin>

Nothing seems to be found. This is because the Windows command line passes the query in GBK encoding, which does not match the UTF-8 index, so of course the search fails.
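The failure comes down to bytes. A console on Simplified Chinese Windows uses code page 936 (GBK) by default, so search.exe receives GBK bytes, while the index was built from UTF-8 text (charset_type = utf-8). This short Python sketch shows that the same two characters produce entirely different byte sequences in the two encodings, and that GBK bytes are not even valid UTF-8:

```python
query = "中文"  # the Chinese word for "Chinese"

gbk_bytes = query.encode("gbk")     # what a code page 936 console passes along
utf8_bytes = query.encode("utf-8")  # what the UTF-8 index was built from

print(gbk_bytes.hex())   # d6d0cec4
print(utf8_bytes.hex())  # e4b8ade69687

# Decoding the GBK bytes as UTF-8 fails; errors="replace" substitutes U+FFFD
garbled = gbk_bytes.decode("utf-8", errors="replace")
print(garbled == query)  # False
```

Since none of the GBK bytes line up with the UTF-8 terms in the index, no document can match. Keeping the query in a UTF-8 source file, as the next step does, avoids the mismatch.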
Let's try it from a program instead. Create a file foo.php under D:\sphinx\api, and be sure to save it as UTF-8:

    <?php
    // sphinxapi.php ships in the same api directory
    require dirname(__FILE__) . '/sphinxapi.php';

    $s = new SphinxClient();
    $s->SetServer('localhost', 3312);
    // 中文 means "Chinese"; because this file is UTF-8, the query is sent as UTF-8
    $result = $s->Query('中文');
    var_dump($result);
    ?>

Start the Sphinx searchd service:

    D:\sphinx\bin>searchd.exe
    Sphinx 0.9.8-release (r1533)
    Copyright (c) 2001-2008, Andrew Aksyonoff

    WARNING: forcing --console mode on Windows
    using config file './sphinx.conf'...
    creating server socket on 0.0.0.0:3312
    accepting connections

Run the PHP query:

    php d:/sphinx/api/foo.php

This time the results come out. The rest is a matter of reading the manual and gradually working out the more advanced configuration.