Use PHP to call the Lucene package for full-text retrieval
For work, I needed PHP to run full-text searches over a large number of files on a website.
Lucene is currently the most popular full-text search engine library.
It is a sub-project of Apache Jakarta and provides a simple, practical API.
With this API you can run full-text searches over any text-based data, including databases.
Because PHP itself supports calling external Java classes, I first wrote a class in Java.
This class implements two methods by calling the Lucene API:
public String createIndex(String indexDir_path, String dataDir_path)
public String searchword(String ss, String index_path)
createIndex is the index creation method.
It takes two parameters, indexDir_path (the directory of the index files) and dataDir_path (the directory of the files to be indexed), and returns the list of indexed files.
The other method, searchword, searches the index for the keyword passed in as ss; index_path is the directory of the index files. It returns all the files that were found.
The source code is very simple; see TxtFileIndexer.java.
The PHP program calls these two methods, and through them Lucene, to perform the full-text search.
The PHP side looks like this.
First, create an instance of the TxtFileIndexer class we wrote:
$tf = new Java("TestLucene.TxtFileIndexer");
Then call its methods just as you would call the methods of an ordinary PHP class. First, create the index:
$data_path = "F:/test/php_lucene/htdocs/data/manual"; // directory of the content to be indexed
$index_path = "F:/test/php_lucene/htdocs/data/search"; // directory where the generated index files are stored
$s = $tf->createIndex($index_path, $data_path); // call the Java class method
print $s; // print the returned result
Then run a search:
$index_path = "F:/test/php_lucene/htdocs/data/search"; // directory where the generated index files are stored
$s = $tf->searchword("here is keyword for search", $index_path);
print $s;
In addition, pay attention to the Java class path, which can be set from PHP:
java_require("F:/test/php_lucene/htdocs/lib/"); // This is an example. I put both my class and Lucene under this directory.
That is all there is to it.
PHP source code: test.php
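If the new Java(...) call fails, a common cause is that the class or the Lucene jar is not on the class path. The following is a minimal diagnostic sketch, not part of the original sources: ClassPathCheck and isLoadable are hypothetical names, and the class name TestLucene.TxtFileIndexer is taken from the PHP call above. It only tells you what a standalone JVM can see, so run it with the same class path entries you give to PHP.

```java
// Hypothetical helper: checks whether a class can be loaded by name.
class ClassPathCheck {

    // Returns true if the named class can be loaded from the current class path.
    static boolean isLoadable(String className) {
        try {
            Class.forName(className);
            return true;
        } catch (ClassNotFoundException e) {
            return false; // the class itself is missing
        } catch (LinkageError e) {
            return false; // the class was found, but something it depends on is missing or failed to initialize
        }
    }

    public static void main(String[] args) {
        // Default to the class used in this article; pass another name as the first argument.
        String name = args.length > 0 ? args[0] : "TestLucene.TxtFileIndexer";
        System.out.println(name + (isLoadable(name) ? " found" : " NOT found"));
        System.out.println("java.class.path = " + System.getProperty("java.class.path"));
    }
}
```

Passing org.apache.lucene.index.IndexWriter as the argument checks in the same way whether the Lucene jar itself is visible.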
Next, the environment configuration.
First, you need a Java SDK. I use version 1.4.2; other versions should be fine.
PHP 5. PHP 4 should also work.
I could not get the Java extension bundled with PHP 5 to work properly, and calling Java through it used to be very inefficient, so I used the PHP/Java Bridge project instead.
1. Download JavaBridge
URL: http://sourceforge.net/projects/php-java-bridge/
The current version is php-java-bridge_3.0.8_j2ee.zip
After unpacking, copy
JavaBridge\WEB-INF\cgi\java-x86-windows.dll
JavaBridge\WEB-INF\lib\JavaBridge.jar
to the c:\php\ext directory, and rename
java-x86-windows.dll to php_java.dll
2. Modify php.ini (example)
extension = php_java.dll
[java]
java.class.path = "C:\php\ext\JavaBridge.jar;F:\test\php_javase\htdocs"
java.java_home = "C:\j2sdk1.4.2_10"
java.library.path = "c:\php\ext;F:\test\php_javase\htdocs"
3. Restart Apache.
4. Find some files to index and try it out.
You can change the paths of the index files and the data files in test.php.
Line 37 of TxtFileIndexer.java restricts indexing to HTML files; you can change this if necessary.
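The source of TxtFileIndexer.java is not reproduced here, so the following is only a sketch of how such a restriction could be written and widened, not the code on line 37. ExtensionFilter is a hypothetical name. It avoids generics so that it should also compile with the 1.4.2 SDK mentioned above.

```java
import java.io.File;
import java.io.FilenameFilter;

// Hypothetical filter: accepts only file names that end in one of the given extensions.
class ExtensionFilter implements FilenameFilter {
    private final String[] extensions;

    // extensions are given without the leading dot, e.g. {"html", "txt"}
    ExtensionFilter(String[] extensions) {
        this.extensions = extensions;
    }

    public boolean accept(File dir, String name) {
        String lower = name.toLowerCase();
        for (int i = 0; i < extensions.length; i++) {
            // Compare case-insensitively and require the dot, so a file named "html" is rejected.
            if (lower.endsWith("." + extensions[i].toLowerCase())) {
                return true;
            }
        }
        return false;
    }
}
```

With a filter like this, new File(dataDir_path).listFiles(new ExtensionFilter(new String[] {"html", "txt"})) would return only the HTML and text files in the data directory, and adding a file type means adding one entry to the array.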
Since JavaBridge also supports Linux and FreeBSD, the same setup can be built in a Linux or FreeBSD/apache2/php4/lucene/JavaBridge environment.