Use PHP to call the Lucene package for full-text retrieval

Due to work requirements, I needed to use PHP to run full-text searches over a large number of files on a website.
Lucene is currently the most popular library for building full-text search engines.
It is a sub-project of Apache Jakarta and provides a simple, practical API.
With this API you can add full-text retrieval to any text-based data, including data stored in databases.

Because PHP can call external Java classes, the first step is to write a class in Java.
This class implements two methods on top of the Lucene API:

public String createIndex(String indexDir_path, String dataDir_path)
public String searchword(String ss, String index_path)
createIndex builds the index.
It takes two parameters, indexDir_path (the directory where the index files are stored) and dataDir_path (the directory containing the files to be indexed), and returns the list of indexed files.
searchword searches the index for the keyword passed in ss; index_path is the directory of the index files. It returns all matching files.

The source code is very simple; see TxtFileIndexer.java.
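TxtFileIndexer.java itself is not reproduced here. As a rough stand-in that shows only the contract of the two methods (directories in, a list of file names out), the sketch below uses plain Java file I/O and a toy postings file instead of the Lucene API. The class name TxtFileIndexerSketch, the file name postings.txt and the tab-separated format are assumptions made for illustration. It matches single whole words only and does none of the analysis or ranking that Lucene provides.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical stand-in for TxtFileIndexer: same method signatures, no Lucene.
public class TxtFileIndexerSketch {

    // Reads every regular file in dataDirPath, writes one "term<TAB>file" line
    // per distinct word into indexDirPath/postings.txt (assumed format),
    // and returns the indexed file names, one per line.
    public String createIndex(String indexDirPath, String dataDirPath) throws IOException {
        Path indexDir = Files.createDirectories(Paths.get(indexDirPath));
        List<String> postings = new ArrayList<>();
        List<String> indexed = new ArrayList<>();
        try (DirectoryStream<Path> files = Files.newDirectoryStream(Paths.get(dataDirPath))) {
            for (Path file : files) {
                if (!Files.isRegularFile(file)) {
                    continue; // subdirectories are skipped in this sketch
                }
                String text = new String(Files.readAllBytes(file), StandardCharsets.UTF_8);
                Set<String> terms = new TreeSet<>();
                for (String term : text.toLowerCase().split("\\W+")) {
                    if (!term.isEmpty()) {
                        terms.add(term);
                    }
                }
                String name = file.getFileName().toString();
                for (String term : terms) {
                    postings.add(term + "\t" + name);
                }
                indexed.add(name);
            }
        }
        Files.write(indexDir.resolve("postings.txt"), postings, StandardCharsets.UTF_8);
        Collections.sort(indexed);
        return String.join("\n", indexed);
    }

    // Looks up a single word in the postings file under indexPath and
    // returns the names of the files that contain it, one per line.
    public String searchword(String ss, String indexPath) throws IOException {
        String wanted = ss.toLowerCase();
        Set<String> hits = new TreeSet<>();
        List<String> lines =
                Files.readAllLines(Paths.get(indexPath, "postings.txt"), StandardCharsets.UTF_8);
        for (String line : lines) {
            String[] parts = line.split("\t", 2);
            if (parts.length == 2 && parts[0].equals(wanted)) {
                hits.add(parts[1]);
            }
        }
        return String.join("\n", hits);
    }
}
```

Both methods return a plain String, which keeps the PHP side trivial: the result can be printed directly without converting Java collections.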

The PHP program calls these two methods to use Lucene for full-text retrieval.
The PHP side works as follows.
First, create an instance of the TxtFileIndexer class we wrote:

$tf = new Java("TestLucene.TxtFileIndexer");

Then call its methods as you would those of any PHP object. First, create the index:

$data_path = "F:/test/php_lucene/htdocs/data/manual"; // directory of the content to be indexed
$index_path = "F:/test/php_lucene/htdocs/data/search"; // directory for storing the generated index files
$s = $tf->createIndex($index_path, $data_path); // call the Java class method
print $s; // print the returned result

Next, run a search:

$index_path = "F:/test/php_lucene/htdocs/data/search"; // directory where the generated index files are stored
$s = $tf->searchword("here is keyword for search", $index_path);
print $s;

Also pay attention to the Java class path, which can be set from PHP:

java_require("F:/test/php_lucene/htdocs/lib/"); // This is an example. Both my class and Lucene are placed in this directory.

Simple, isn't it?

PHP source code: test.php

Next, the environment configuration.
First, you need a Java SDK. I use version 1.4.2; other versions should also work.
PHP 5 is used here; PHP 4 should also work.

The Java extension in PHP 5 is not well tuned, and calling Java through it is very inefficient, so the PHP/Java Bridge project is used instead.

1. Download JavaBridge.
After unpacking, copy the files to the c:\php\ext directory and rename java-x86-windows.dll to php_java.dll.

2. Modify php.ini (example):
extension=php_java.dll

java.class.path = "C:\php\ext\JavaBridge.jar;F:\test\php_javase\htdocs"
java.java_home = "C:\j2sdk1.4.2_10"
java.library.path = "c:\php\ext;F:\test\php_javase\htdocs"

3. Restart Apache.

4. Find some files to index.
You can change the index path and the data path in test.php.
Line 37 of TxtFileIndexer.java restricts indexing to HTML files only; modify it if necessary.
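The exact code on that line is not shown here. One way such a restriction could be written, and widened to other file types, is a java.io.FileFilter that checks the file extension. The class name ExtensionFilter is hypothetical and the check looks at the file name only; treat it as a sketch rather than the code in TxtFileIndexer.java.

```java
import java.io.File;
import java.io.FileFilter;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

// Hypothetical helper: accepts files whose extension is in the allowed set.
public class ExtensionFilter implements FileFilter {
    private final Set<String> extensions = new HashSet<>();

    // Extensions are given without the dot, e.g. "html", "txt".
    public ExtensionFilter(String... allowed) {
        for (String ext : allowed) {
            extensions.add(ext.toLowerCase(Locale.ROOT));
        }
    }

    @Override
    public boolean accept(File file) {
        String name = file.getName();
        int dot = name.lastIndexOf('.');
        if (dot < 0) {
            return false; // no extension at all
        }
        String ext = name.substring(dot + 1).toLowerCase(Locale.ROOT);
        return extensions.contains(ext);
    }
}
```

With this in place, new File(dataDir).listFiles(new ExtensionFilter("html")) would return only HTML files, and adding "txt" to the constructor call would also index plain-text files.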

Since JavaBridge also supports Linux and FreeBSD, the same setup should work on Linux or FreeBSD with Apache 2, PHP 4, Lucene and JavaBridge.
