Solr is built on the Lucene search engine and is open source under the Apache Software License. According to the Lucene site, Solr is "an open source enterprise search server based on the Lucene Java search library, with XML/HTTP and JSON APIs, hit highlighting, faceted search, caching, replication, and a web administration interface."
It is worth noting that high-traffic web sites such as Netflix, Digg, CNET News.com, and CNET Reviews use Solr to power their search features. A long list of public sites driven by Solr can be found on the Solr wiki (see References).
Learn how to use Solr and PHP to create a small application for searching an auto parts database. Although the sample database contains only a few records, it could just as easily contain millions. All source code used in this article is available from the download section.
Install Solr
To use Solr with PHP, you must install Solr, design the index, prepare the data for Solr to index, load the index, and write the PHP code that runs queries and displays the results. Most of the work required to create a searchable index can be done from the command line. Of course, Solr's PHP programming interface can also change the contents of the index.
Solr is implemented in Java technology. To run Solr and its management tools, you must install Java v1.5 (the Java 5 SDK). Several vendors provide Java v1.5 SDKs, for example Sun Microsystems, IBM, and BEA Systems, and any of these implementations can run Solr. Just choose the Java package for your operating system and follow the instructions to complete the installation.
In many cases, installing Java v1.5 is as simple as running a self-extracting archive and accepting the license agreement. The scripts in the archive do most of the hard work in a few seconds. Other operating systems, such as Debian, provide the Java 5 SDK in the APT repositories. For example, on Debian or Ubuntu, you can install the Java v1.5 software with sudo apt-get install sun-java5-jdk. Conveniently, APT also downloads all the dependencies required to use the Java 5 SDK automatically.
If the Java software is already installed and the Java executable is in your PATH, run java -version to determine the Java version.
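The version check can also be scripted. The following is a minimal Python sketch, not part of the original article: the function names parse_java_version and meets_minimum are hypothetical, and the parsing assumes a banner of the form java version "1.5.0_13", as in the output shown for Leopard in this article.

```python
import re


def parse_java_version(output):
    """Return (major, minor) parsed from `java -version` output, or None.

    Assumes the banner contains: version "<major>[.<minor>...]".
    """
    match = re.search(r'version "(\d+)(?:\.(\d+))?', output)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2) or 0)


def meets_minimum(output, minimum=(1, 5)):
    """True if the reported version is at least `minimum` (Java v1.5 here)."""
    version = parse_java_version(output)
    return version is not None and version >= minimum
```

Note that java -version writes its banner to standard error, so a script that runs the command needs to capture that stream rather than standard output.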
Here, let's use the Mac OS X v10.5 Leopard operating system as the basis for the demonstration. Apple's Leopard ships with Java v1.5, and it can also run PHP applications after a small change to the default Apache configuration. Running java -version in a Leopard terminal window produces the following output.
Listing 1. Running java -version in a Leopard terminal window
$ which java
/usr/bin/java
$ java -version
java version "1.5.0_13"
Java(TM) 2 Runtime Environment, Standard Edition (build 1.5.0_13-b05-237)
Java HotSpot(TM) Client VM (build 1.5.0_13-119, mixed mode, sharing)
Note: Leopard lets you switch between Java v1.4 and v1.5 in the /Applications/Utilities/Java Preferences application. If your Leopard installation reports v1.4, open Java Preferences and change the settings as shown in Figure 1.
Figure 1. Java Preferences application in Leopard
To install Solr, visit apache.org, click Resources > Download, choose a convenient project mirror, and navigate through the folders to the tarball (.tgz file) of Solr v1.2. The download is named something like apache-solr-1.2.0.tgz. Unpack the tarball with the following commands.
Listing 2. Unpacking the tarball
$ tar xzf apache-solr-1.2.0.tgz
$ ls -F apache-solr-1.2.0
CHANGES.txt   NOTICE.txt   dist/      lib/
KEYS.txt      README.txt   docs/      src/
LICENSE.txt   build.xml    example/
In the newly created directory, the dist folder contains the Solr code bundled as a Java archive (JAR). The example/exampledocs subdirectory contains sample data, typically formatted as XML, that is ready for Solr to index.
The example directory contains a complete sample Solr application. To run it, just launch the Java engine with the application archive start.jar.
Listing 3. Starting the Java engine
$ java -jar start.jar
2007-11-10 15:00:16.672::INFO: Logging to STDERR via org.mortbay.log.StdErrLog
2007-11-10 15:00:16.866::INFO: jetty-6.1.3
...
INFO: SolrUpdateServlet.init() done
2007-11-10 15:00:18.694::INFO: Started SocketConnector @ 0.0.0.0:8983
The application is now available on port 8983. Start a browser and go to http://localhost:8983/solr/admin/. This is the interface used to manage Solr. (To stop the Solr server, press Ctrl+C.)
However, the Solr index does not yet contain any data to manage or query.
Load data into Solr
Solr is flexible, supporting a variety of data types and rules for creating effective indexes. If the standard components are not enough, you can customize Solr further by writing new Java classes.
Given a set of data types and rules, you create a Solr schema to describe the data and control how the index is built. You then export your data to match the schema and load it into Solr. Solr builds the index dynamically, updating it immediately whenever a record is created, modified, or deleted.
You can find the default Solr schema in the Solr source code repository at apache.org. For reference, the following snippet comes from the default schema.
Listing 3. Default Solr schema snippet
<schema name="example" version="1.1">
  ...
  <fields>
    <field name="id" type="string" indexed="true" stored="true" required="true" />
    <field name="name" type="text" indexed="true" stored="true"/>
    <field name="nameSort" type="string" indexed="true" stored="false"/>
    <field name="cat" type="text" indexed="true" stored="true" multiValued="true"/>
    ...
  </fields>
  <uniqueKey>id</uniqueKey>
  ...
  <copyField source="name" dest="nameSort"/>
  ...
</schema>
You do not need to understand most of the schema, but a few aspects are worth noting:
- The field id is a string (type="string") and should be indexed (indexed="true"). It is also a required field (required="true"): every record loaded into Solr under this schema must provide a value for it. The <uniqueKey>id</uniqueKey> element further declares that the id field must be unique. (Solr does not require an ID field to be unique; this is simply a rule set by the default schema.) The attribute stored="true" indicates that the id field should be retrievable, that is, returned with search results. Why would you ever set stored to false? You can use non-retrievable fields to sort results in different ways. For example, nameSort is a copy of the name field (made by the copyField directive on the last line), but it behaves differently. Note that nameSort is a string, while name is text. The default schema treats the two types slightly differently.
- The field cat is multiValued, so you can give it more than one value. For example, if your application manages content, you might assign several topics to an article. You could use the cat field (or a similar custom field) to capture all of them.
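To make these attributes concrete, here is a minimal Python sketch that reads the schema snippet above and reports how each field is configured. It is an illustration only: the function name summarize_schema is hypothetical, the embedded schema is the snippet with its elided parts ("...") left out, and treating a missing attribute as false is a simplification (in a real schema, defaults can come from the field type).

```python
import xml.etree.ElementTree as ET

# The default-schema snippet from this article, without the elided parts.
SCHEMA = """<schema name="example" version="1.1">
  <fields>
    <field name="id" type="string" indexed="true" stored="true" required="true"/>
    <field name="name" type="text" indexed="true" stored="true"/>
    <field name="nameSort" type="string" indexed="true" stored="false"/>
    <field name="cat" type="text" indexed="true" stored="true" multiValued="true"/>
  </fields>
  <uniqueKey>id</uniqueKey>
  <copyField source="name" dest="nameSort"/>
</schema>"""

FLAGS = ("indexed", "stored", "required", "multiValued")


def summarize_schema(schema_xml):
    """Return (fields, unique_key, copies) extracted from a schema string."""
    root = ET.fromstring(schema_xml)
    fields = {}
    for field in root.iter("field"):
        info = {"type": field.get("type")}
        for flag in FLAGS:
            # Simplification: an absent attribute is treated as false.
            info[flag] = field.get(flag) == "true"
        fields[field.get("name")] = info
    unique_key = root.findtext("uniqueKey")
    copies = [(c.get("source"), c.get("dest")) for c in root.iter("copyField")]
    return fields, unique_key, copies


fields, unique_key, copies = summarize_schema(SCHEMA)
for name, info in fields.items():
    print(name, info)
print("unique key:", unique_key, "| copy fields:", copies)
```

Running it shows at a glance that nameSort is indexed but not stored, and that cat is the only multi-valued field in the snippet.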
Listing 4 shows the example/exampledocs/ipod_other.xml file, which represents two entries in the iPod accessories category.
Listing 4. Data formatted for the default Solr schema
<add>
  <doc>
    <field name="id">F8V7067-APL-KIT</field>
    <field name="name">Belkin Mobile Power Cord for iPod w/ Dock</field>
    <field name="manu">Belkin</field>
    <field name="cat">electronics</field>
    <field name="cat">connector</field>
    <field name="features">car power adapter, white</field>
    <field name="weight">4</field>
    <field name="price">19.95</field>
    <field name="popularity">1</field>
    <field name="inStock">false</field>
  </doc>
  <doc>
    <field name="id">IW-02</field>
    <field name="name">iPod &amp; iPod Mini USB 2.0 Cable</field>
    <field name="manu">Belkin</field>
    <field name="cat">electronics</field>
    <field name="cat">connector</field>
    <field name="features">car power adapter for iPod, white</field>
    <field name="weight">2</field>
    <field name="price">11.50</field>
    <field name="popularity">1</field>
    <field name="inStock">false</field>
  </doc>
</add>
The add element is a Solr command that adds the enclosed records to the index. Each record is captured in a doc element, which uses a set of field elements to specify the field values. The fields weight, price, inStock, manu, features, and popularity are all other fields defined in the default Solr schema. The features field has the same attributes as cat but a different meaning: it lists the features of a product, of which there may be many.
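The add/doc/field structure is simple enough to generate programmatically. The following Python sketch builds such a message from plain dictionaries. It is an illustration under stated assumptions: build_add_message is a hypothetical helper, lists are mapped to repeated field elements (one per value, as cat appears in Listing 4), and the XML library takes care of escaping characters such as the ampersand.

```python
import xml.etree.ElementTree as ET


def _as_text(value):
    """Render a Python value the way the sample data writes it."""
    if isinstance(value, bool):
        return "true" if value else "false"
    return str(value)


def build_add_message(records):
    """Build an <add> message with one <doc> per record.

    A list or tuple value produces one <field> element per item,
    which is how multi-valued fields are expressed.
    """
    add = ET.Element("add")
    for record in records:
        doc = ET.SubElement(add, "doc")
        for name, value in record.items():
            values = value if isinstance(value, (list, tuple)) else [value]
            for item in values:
                field = ET.SubElement(doc, "field", name=name)
                field.text = _as_text(item)
    return ET.tostring(add, encoding="unicode")


message = build_add_message([
    {
        "id": "IW-02",
        "name": "iPod & iPod Mini USB 2.0 Cable",
        "cat": ["electronics", "connector"],
        "inStock": False,
    }
])
print(message)
```

Building the message with an XML library rather than string concatenation avoids malformed documents when a value contains markup characters.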
Search for Auto Parts
In this example, a collection of auto parts is indexed. Each part has several fields, and Table 1 shows a sample of the most important ones. The first column lists the field name, the second gives a brief description, the third lists the logical type, and the fourth shows the field used to represent the data in the index (as defined in the schema in Listing 5).
Table 1. Fields of an auto parts record
Name | Description | Type | Solr field
Part number (unique, mandatory) | ID number | String | partno
Name | Brief description | String | name
Model (required, multi-valued) | Model, such as "Camaro" | String | model
Model year (multi-valued) | Model year, such as 2001 | String | year
Price | Unit price | Floating point | price
In stock | In stock? | Boolean | inStock
Features | Features of the part | String | features
Timestamp | Record of activity | String | timestamp
Weight | Shipping weight | Floating point | weight
Listing 5 shows the portion of the Solr schema used for the auto parts index. Most of it is based on the default Solr schema. The specific fields, with their names and attributes, replace those found in the fields element of the default schema snippet shown earlier.
Listing 5. Schema for the auto parts index
<?xml version="1.0" encoding="utf-8" ?>
<schema name="autoparts" version="1.0">
  ...
  <fields>
    <field name="partno" type="string" indexed="true" stored="true" required="true" />
    <field name="name" type="text" indexed="true" stored="true" required="true" />
    <field name="model" type="text_ws" indexed="true" stored="true" multiValued="true" required="true" />
    <field name="year" type="text_ws" indexed="true" stored="true" multiValued="true" omitNorms="true" />
    <field name="price" type="sfloat" indexed="true" stored="true" required="true" />
    <field name="inStock" type="boolean" indexed="true" stored="true" default="false" />
    <field name="features" type="text" indexed="true" stored="true" multiValued="true" />
    <field name="timestamp" type="date" indexed="true" stored="true" default="NOW" multiValued="false" />
    <field name="weight" type="sfloat" indexed="true" stored="true" />
  </fields>
  <uniqueKey>partno</uniqueKey>
  <defaultSearchField>name</defaultSearchField>
</schema>
Given the fields above, you need to export the auto parts database, format it as shown in Listing 6, and upload it to Solr.
Listing 6. Auto parts database formatted for indexing
<add>
  <doc>
    <field name="partno">1</field>
    <field name="name">Spark plug</field>
    <field name="model">Boxster</field>
    <field name="model">924</field>
    <field name="year">1999</field>
    <field name="year">2000</field>
    <field name="price">25.00</field>
    <field name="inStock">true</field>
  </doc>
  <doc>
    <field name="partno">2</field>
    <field name="name">Windshield</field>
    <field name="model">911</field>
    <field name="year">1991</field>
    <field name="year">1999</field>
    <field name="price">15.00</field>
    <field name="inStock">false</field>
  </doc>
</add>
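Before uploading an export, it can help to check the records against the rules in the auto parts schema. The Python sketch below is one possible pre-flight check, not something Solr or the original article provides: validate_parts is a hypothetical helper, and the REQUIRED and MULTI_VALUED tuples are copied by hand from the required="true" and multiValued="true" attributes in Listing 5.

```python
# Field rules copied by hand from the auto parts schema (Listing 5).
REQUIRED = ("partno", "name", "model", "price")
MULTI_VALUED = ("model", "year", "features")

# The two records from Listing 6, as Python dictionaries.
PARTS = [
    {"partno": "1", "name": "Spark plug", "model": ["Boxster", "924"],
     "year": ["1999", "2000"], "price": "25.00", "inStock": True},
    {"partno": "2", "name": "Windshield", "model": ["911"],
     "year": ["1991", "1999"], "price": "15.00", "inStock": False},
]


def validate_parts(records):
    """Return a list of human-readable problems; empty means no issues found."""
    problems = []
    seen = set()
    for index, record in enumerate(records):
        for name in REQUIRED:
            if record.get(name) in (None, "", []):
                problems.append(f"record {index}: missing required field {name!r}")
        for name, value in record.items():
            if isinstance(value, (list, tuple)) and name not in MULTI_VALUED:
                problems.append(f"record {index}: field {name!r} is not multi-valued")
        partno = record.get("partno")
        if partno is not None:
            if partno in seen:
                # Not fatal: a repeated unique key replaces the earlier record.
                problems.append(f"record {index}: duplicate partno {partno!r}")
            seen.add(partno)
    return problems


print(validate_parts(PARTS))
```

A duplicate part number is reported only as a warning-style message, because loading the same unique key twice replaces the earlier record rather than failing.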
Let's install the new schema and load the data into Solr. First, press Ctrl+C to stop the Solr daemon (if it is still running). Make a backup copy of the existing Solr schema in example/solr/conf/schema.xml. Next, create a text file containing the schema from Listing 5, save it as /tmp/schema.xml, and copy it to example/solr/conf/schema.xml. Create another file for the data shown in Listing 6. Now you can restart Solr and use the posting utility provided with the example.
Listing 7. Starting Solr with the new schema
$ cd apache-solr-1.2.0/example
$ cp solr/conf/schema.xml solr/conf/default_schema.xml
$ chmod a-w solr/conf/default_schema.xml
$ vi /tmp/schema.xml
...
$ cp /tmp/schema.xml solr/conf/schema.xml
$ vi /tmp/parts.xml
...
$ java -jar start.jar
...
2007-11-11 16:56:48.279::INFO: Started SocketConnector @ 0.0.0.0:8983
$ java -jar exampledocs/post.jar /tmp/parts.xml
SimplePostTool: version 1.2
SimplePostTool: WARNING: Make sure your XML documents are encoded in UTF-8, other encodings are not currently supported
SimplePostTool: POSTing files to http://localhost:8983/solr/update...
SimplePostTool: POSTing file parts.xml
SimplePostTool: COMMITting Solr index changes...
Success! To verify that the index exists and contains two documents, point your browser to http://localhost:8983/solr/admin/ again. You should see "(autoparts)" at the top of the page. If you do, click in the query box in the middle of the page and type partno:1 OR partno:2.
The result should be similar to the following:
3 on 10 0 partno:1 OR partno:2 2.2
true Boxster 924 Spark plug 1 25.0 2007-11-11T21:58:45.899Z 1999 2000
false 911 Windshield 2 15.0 2007-11-11T21:58:45.953Z 1991 1999
Try other queries. The Lucene wiki describes the query syntax of Lucene, the search engine inside Solr (see References).
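Queries typed into the admin page can also be sent over HTTP from a program. The following Python sketch shows the two halves of that exchange: building a query URL and pulling records out of an XML response. Several details are assumptions rather than facts from this article: the /solr/select path on the example server, the helper names build_query_url and parse_docs, and the shape of SAMPLE_RESPONSE, which is a hand-written illustration of a response rather than captured output.

```python
import urllib.parse
import xml.etree.ElementTree as ET

# Assumed query endpoint of the example server started earlier.
SELECT_URL = "http://localhost:8983/solr/select"

# Hand-written illustration of an XML response (an assumption, not real output).
SAMPLE_RESPONSE = """<response>
  <result name="response" numFound="1" start="0">
    <doc>
      <bool name="inStock">true</bool>
      <arr name="model"><str>Boxster</str><str>924</str></arr>
      <str name="name">Spark plug</str>
      <str name="partno">1</str>
    </doc>
  </result>
</response>"""


def build_query_url(query, start=0, rows=10, base=SELECT_URL):
    """Return a URL with the query and paging parameters safely encoded."""
    params = urllib.parse.urlencode({"q": query, "start": start, "rows": rows})
    return f"{base}?{params}"


def parse_docs(response_xml):
    """Turn each <doc> into a dict; <arr> children become lists of strings."""
    root = ET.fromstring(response_xml)
    docs = []
    for doc in root.iter("doc"):
        record = {}
        for child in doc:
            if child.tag == "arr":
                record[child.get("name")] = [item.text for item in child]
            else:
                record[child.get("name")] = child.text
        docs.append(record)
    return docs


print(build_query_url("partno:1 OR partno:2"))
print(parse_docs(SAMPLE_RESPONSE))
```

Encoding the query with urlencode matters because Lucene query syntax uses characters such as colons and spaces that are not safe to place in a URL unescaped.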
You should also try editing the data and loading it again. Because the partno field is declared unique, uploading the same part number again simply replaces the old index record with the new one. Besides the add command, you can also use commit, optimize, and delete. The last of these can delete a specific record by ID or delete many records by query.
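Like add, these commands are small XML messages sent to the update URL that the posting utility reports in Listing 7. The Python sketch below builds them and wraps one in an HTTP POST request without sending it. The helper names are hypothetical, and the text/xml content type is an assumption about what the server expects.

```python
import urllib.request
import xml.etree.ElementTree as ET

# The update URL printed by the posting utility in Listing 7.
UPDATE_URL = "http://localhost:8983/solr/update"

COMMIT = "<commit/>"      # make pending changes visible to searches
OPTIMIZE = "<optimize/>"  # ask the server to optimize the index


def delete_by_id(doc_id):
    """Message that deletes the single record whose unique key is doc_id."""
    delete = ET.Element("delete")
    ET.SubElement(delete, "id").text = str(doc_id)
    return ET.tostring(delete, encoding="unicode")


def delete_by_query(query):
    """Message that deletes every record matching a query."""
    delete = ET.Element("delete")
    ET.SubElement(delete, "query").text = query
    return ET.tostring(delete, encoding="unicode")


def update_request(message, url=UPDATE_URL):
    """Wrap a message in a POST request (built here, not sent)."""
    return urllib.request.Request(
        url,
        data=message.encode("utf-8"),
        headers={"Content-Type": "text/xml; charset=utf-8"},  # assumed content type
        method="POST",
    )


print(delete_by_id(1))
print(delete_by_query("model:911"))
```

With the example server running, passing update_request(delete_by_id(1)) and then update_request(COMMIT) to urllib.request.urlopen would be one way to remove part number 1 and make the change visible.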