[Full-text retrieval] Using PHP to call the Lucene package for full-text retrieval
--------------------------------------------
Source: http://www.chinaunix.net  Author: z33  Published: 17:43:53
/* When reposting, please keep the following information */
Author: Zhang Jie
URL: http://spaces.msn.com/members/newbdez33/
http://www.phpboom.com/
For work, I needed to use PHP to run full-text searches over a large number of web pages.
Lucene is currently the most popular library for full-text search.
It is a subproject of Apache Jakarta and provides a simple, practical API.
With this API you can index and search any data that can be reduced to plain text (including databases).
Since PHP can call external Java classes, I first wrote a Java class
that implements two methods on top of the Lucene API:
* public String createIndex(String indexDir_path, String dataDir_path)
* public String searchword(String ss, String index_path)
createIndex builds the index.
It takes two parameters, indexDir_path (the directory where the index files are written) and dataDir_path (the directory containing the files to be indexed), and returns the list of indexed files.
The other method, searchword, searches the index for the keywords passed in ss; index_path is the directory of the index files. It returns all matching files.
The source code is very simple; for details, see txtfileindexer.java at http://newbdez33.googlepages.com/txtfileindexer.java
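The original TxtFileIndexer.java needs the Lucene jar on the class path. To make the contract of the two methods concrete, here is a minimal sketch of a dependency-free stand-in with the same signatures. It is not Lucene and not the author's code. The following are all assumptions: the class name TxtFileIndexerStub, the index file name index.ser, the newline-separated return format, the AND semantics of the search, the .html-only filter and the crude ASCII tokenizer. It also needs Java 8 or later, unlike the 1.4.2 SDK used in the article.

```java
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical, dependency-free stand-in for TxtFileIndexer. It does NOT use Lucene:
// the "index" is a plain HashMap (term -> file paths) serialized into indexDir_path.
// To load it through the bridge, make it public and put it in package testlucene.
class TxtFileIndexerStub {
    // Assumed name of the serialized index file inside the index directory.
    private static final String INDEX_FILE = "index.ser";

    // Crude tokenizer: lower-case and split on non-word characters. ASCII only, so
    // Chinese text is not handled; Lucene's analyzers do this properly.
    private static String[] tokenize(String text) {
        return text.toLowerCase(Locale.ROOT).split("\\W+");
    }

    // Index every .html file under dataDir_path; return the indexed paths, one per line.
    public String createIndex(String indexDir_path, String dataDir_path) {
        HashMap<String, TreeSet<String>> index = new HashMap<>();
        StringBuilder indexed = new StringBuilder();
        try (Stream<Path> walk = Files.walk(Paths.get(dataDir_path))) {
            List<Path> files = walk.filter(Files::isRegularFile)
                    .filter(p -> p.getFileName().toString().toLowerCase(Locale.ROOT).endsWith(".html"))
                    .sorted()
                    .collect(Collectors.toList());
            for (Path p : files) {
                String text = new String(Files.readAllBytes(p), StandardCharsets.UTF_8);
                for (String term : tokenize(text)) {
                    if (!term.isEmpty()) {
                        index.computeIfAbsent(term, k -> new TreeSet<>()).add(p.toString());
                    }
                }
                indexed.append(p).append('\n');
            }
            Files.createDirectories(Paths.get(indexDir_path));
            try (ObjectOutputStream out = new ObjectOutputStream(
                    Files.newOutputStream(Paths.get(indexDir_path, INDEX_FILE)))) {
                out.writeObject(index);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return indexed.toString();
    }

    // Return the files that contain every term in ss (AND semantics), one per line.
    @SuppressWarnings("unchecked")
    public String searchword(String ss, String index_path) {
        Map<String, TreeSet<String>> index;
        try (ObjectInputStream in = new ObjectInputStream(
                Files.newInputStream(Paths.get(index_path, INDEX_FILE)))) {
            index = (Map<String, TreeSet<String>>) in.readObject();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
        TreeSet<String> hits = null;
        for (String term : tokenize(ss)) {
            if (term.isEmpty()) continue;
            Set<String> postings = index.getOrDefault(term, new TreeSet<>());
            if (hits == null) hits = new TreeSet<>(postings);
            else hits.retainAll(postings); // keep only files containing all terms so far
        }
        StringBuilder result = new StringBuilder();
        if (hits != null) {
            for (String f : hits) result.append(f).append('\n');
        }
        return result.toString();
    }
}
```

Because the signatures match, a stand-in like this could be used to check the PHP/Java Bridge wiring before Lucene is on the class path. Real Lucene adds proper analysis, relevance ranking and an efficient on-disk index format, so it is not a replacement.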
The PHP program then calls these two methods, which in turn call Lucene, to perform full-text retrieval.
The PHP calls look like this:
First, create an instance of the TxtFileIndexer class:
$tf = new Java('testlucene.TxtFileIndexer');
Its methods can then be called like those of any ordinary PHP object. First, build the index:
$data_path = "F:/test/php_lucene/htdocs/data/manual"; // directory containing the content to index
$index_path = "F:/test/php_lucene/htdocs/data/search"; // directory where the generated index files are stored
$s = $tf->createIndex($index_path, $data_path); // call the Java method
print $s; // print the returned list of indexed files
Then run a search:
$index_path = "F:/test/php_lucene/htdocs/data/search"; // directory where the generated index files are stored
$s = $tf->searchword("here is keyword for search", $index_path);
print $s; // print the matching files
Also pay attention to the Java class path, which can be set from PHP before the class is instantiated:
java_require("F:/test/php_lucene/htdocs/lib/"); // example only: I put both my class and Lucene in this directory
Next, the environment configuration.
First, you need a Java SDK. I used version 1.4.2; other versions should also work.
PHP 5 (PHP 4 should also work).
PHP 5's built-in Java extension was not fully working, and calling Java through it had been very slow, so I used the PHP/Java Bridge project instead.
1. Download JavaBridge
URL: http://sourceforge.net/projects/php-java-bridge/
The current version is php-java-bridge_3.0.8_j2ee.zip:
http://prdownloads.sourceforge.net/php-java-bridge/php-java-bridge_3.0.8_j2ee.zip?download
After unpacking it, copy
JavaBridge\WEB-INF\cgi\java-x86-windows.dll
JavaBridge\WEB-INF\lib\JavaBridge.jar
to the c:\php\ext directory, and rename java-x86-windows.dll to php_java.dll.
2. Modify php.ini (example):
extension=php_java.dll
4. Find some files to index.
The paths of the index and data directories can be changed in test.php.
Line 37 of TxtFileIndexer.java restricts indexing to HTML files only; change it if you need other file types.
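Line 37 of TxtFileIndexer.java is not shown here, so this is only a guess at how an HTML-only check could be widened. The sketch below uses a java.io.FileFilter that accepts a configurable list of extensions. The class name ExtensionFilter is hypothetical. Only plain-text formats make sense with this approach, since the text is read and indexed as-is.

```java
import java.io.File;
import java.io.FileFilter;
import java.util.Locale;

// Hypothetical replacement for an html-only check: accepts directories (so a
// recursive walk can descend into them) and files ending with any listed extension.
class ExtensionFilter implements FileFilter {
    private final String[] extensions;

    ExtensionFilter(String... extensions) {
        // Copy and lower-case once so the comparison below is case-insensitive.
        this.extensions = extensions.clone();
        for (int i = 0; i < this.extensions.length; i++) {
            this.extensions[i] = this.extensions[i].toLowerCase(Locale.ROOT);
        }
    }

    @Override
    public boolean accept(File f) {
        if (f.isDirectory()) {
            return true;
        }
        String name = f.getName().toLowerCase(Locale.ROOT);
        for (String ext : extensions) {
            if (name.endsWith(ext)) {
                return true;
            }
        }
        return false;
    }
}
```

A directory listing would then use something like new File(dataDir_path).listFiles(new ExtensionFilter(".html", ".htm", ".txt")), where File.listFiles(FileFilter) is standard Java.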
Since JavaBridge also supports Linux and FreeBSD, the same setup can be built as a
Linux or FreeBSD + Apache2 + PHP4 + Lucene + JavaBridge
environment.
This article may be updated at any time. For the latest version, see "Using PHP to call the Lucene package for full-text search":
http://newbdez33.googlepages.com/php_?e=