Solr is built on the Lucene search engine and is open source under the Apache Software License. According to the Lucene site, Solr is "an open source enterprise search server based on the Lucene Java search library, with XML/HTTP and JSON APIs, hit highlighting, faceted search, caching, replication, and a web administration interface."
Notably, high-traffic web sites such as Netflix, Digg, CNET News.com, and CNET Reviews use Solr to power their search. A longer list of public sites powered by Solr can be found on the Solr wiki (see references).
This article shows how to use Solr and PHP to build a small application for searching an auto parts database. Although the sample database contains only a few records, it could just as easily contain millions. All source code used in this article is available from the download section.
To use Solr together with PHP, you must install Solr, design the index, prepare the data for Solr to index, load the index, and write the PHP code that runs queries and displays the results. Most of the work needed to create a searchable index can be done from the command line, although the PHP programming interface to Solr can also modify the contents of the index.
Solr is implemented in Java. To run Solr and its administration tools, you must install Java v1.5 (the Java 5 SDK). Several vendors provide Java v1.5 SDKs, for example Sun Microsystems, IBM, and BEA Systems, and any of these implementations can run Solr. Select the Java package for your operating system and follow its instructions to complete the installation.
In many cases, installing Java v1.5 is as simple as running a self-extracting archive and accepting the license agreement; the scripts in the archive do most of the hard work in a few seconds. Other operating systems, such as Debian, provide the Java 5 SDK in their APT repositories. For example, on Debian or Ubuntu, you can run sudo apt-get install sun-java5-jdk to install the Java v1.5 software. Conveniently, APT also downloads any dependencies that the Java 5 SDK requires.
If Java is already installed and the java executable is on your PATH, run java -version to determine the installed version.
Here, let's use Mac OS X v10.5 Leopard as the basis for the demonstration. Leopard ships with Java v1.5 and can also run PHP applications with only minor changes to the default Apache configuration. Running java -version in a Leopard terminal window produces the output shown in Listing 1.
Listing 1. Running java -version in a Leopard terminal window
$ which java
$ java -version
java version "1.5.0_13"
Java(TM) 2 Runtime Environment, Standard Edition (build 1.5.0_13-b05-237)
Java HotSpot(TM) Client VM (build 1.5.0_13-119, mixed mode, sharing)
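The version check can also be scripted. The following is a minimal Python sketch, not part of the original article, that parses the text printed by java -version and tests whether it is at least v1.5. The function names are hypothetical; the sample string is copied from Listing 1.

```python
import re

def parse_java_version(output):
    """Return (major, minor) parsed from the text printed by `java -version`."""
    match = re.search(r'version "(\d+)(?:\.(\d+))?', output)
    if match is None:
        raise ValueError("no version string found")
    # The minor part is optional; treat a missing minor number as 0.
    return int(match.group(1)), int(match.group(2) or 0)

def is_java_5_or_newer(output):
    # Java 5 reports itself as 1.5, so compare (major, minor) tuples.
    return parse_java_version(output) >= (1, 5)

# Sample text taken from Listing 1.
sample = ('java version "1.5.0_13"\n'
          'Java(TM) 2 Runtime Environment, Standard Edition (build 1.5.0_13-b05-237)')
print(is_java_5_or_newer(sample))
```

Note that java -version writes to standard error rather than standard output, so a script that runs the command itself would need to capture stderr.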
Note: Leopard lets you switch between Java v1.4 and v1.5 in the /Applications/Utilities/Java Preferences application. If your Leopard installation reports v1.4, open Java Preferences and change the settings as shown in Figure 1.
Figure 1. Java Preferences application in Leopard
To install Solr, visit apache.org, click Resources > Download, choose a convenient mirror of the project, and browse the folders shown to find the tarball (.tgz file) of Solr v1.2. The download transfers a file with a name similar to apache-solr-1.2.0.tgz. Unpack the tarball with the following commands.
Listing 2. Unpacking the tarball
$ tar xzf apache-solr-1.2.0.tgz
$ ls -F apache-solr-1.2.0
CHANGES.txt NOTICE.txt dist/ lib/
KEYS.txt README.txt docs/ src/
LICENSE.txt build.xml example/
In the newly created directory, the dist folder contains the Solr code bundled as Java archive (JAR) files. The subdirectory example/exampledocs contains sample data, typically XML, formatted and ready for Solr to index.
The example directory contains a complete sample Solr application. To run it, start the Java engine with the application archive start.jar.
Listing 3. Starting the Java engine
$ java -jar start.jar
2007-11-10 15:00:16.672::INFO: Logging to STDERR via org.mortbay.log.StdErrLog
2007-11-10 15:00:16.866::INFO: jetty-6.1.3
INFO: SolrUpdateServlet.init() done
2007-11-10 15:00:18.694::INFO: Started SocketConnector @ 0.0.0.0:8983
The application is now available on port 8983. Open a browser and go to http://localhost:8983/solr/admin/. This is the interface used to administer Solr. (To stop the Solr server, press Ctrl+C in the terminal.)
At this point, however, the Solr index contains no data to manage or query.
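Before loading data, it can be handy to confirm from a script that the server is accepting connections. The following is a small Python sketch under the assumption that Solr runs on localhost at the default port 8983 shown above. Both function names are hypothetical, and the check only confirms that something is listening on the port, not that it is Solr.

```python
import socket

def solr_is_listening(host="localhost", port=8983, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Connection refused, timed out, or host not resolvable.
        return False

def admin_url(host="localhost", port=8983):
    """Build the URL of the Solr administration page."""
    return "http://%s:%d/solr/admin/" % (host, port)

print(admin_url())
print(solr_is_listening(timeout=0.5))
```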
Load data into Solr
Solr is flexible and provides a variety of data types and rules for building effective indexes. If the standard components are not enough, you can customize Solr further by writing new Java classes.
Given a set of data types and rules, you create a Solr schema that describes your data and controls how the index is built. You then export your data in a form that matches the schema and load it into Solr. Solr builds the index dynamically and updates it whenever a record is created, modified, or deleted.
You can find the default Solr schema in the Solr source repository at apache.org. For reference, a snippet of the default schema follows.
Listing 3. Snippet of the default Solr schema
<schema name="example" version="1.1">
<field name="id" type="string" indexed="true" stored="true" required="true" />
<field name="name" type="text" indexed="true" stored="true"/>
<field name="nameSort" type="string" indexed="true" stored="false"/>
<field name="cat" type="text" indexed="true" stored="true" multiValued="true"/>
<copyField source="name" dest="nameSort"/>
Most of the schema is self-explanatory, but a few points are worth noting:
- As the snippet shows, the field id is a string (type="string") and should be indexed (indexed="true"). It is also a required field (required="true"): under this schema, every record loaded into Solr must supply a value for it. The id field must also be unique. (Solr itself does not require a unique ID field; this is simply a rule established by the default index schema.) The attribute stored="true" means the value of the id field is kept in the index so that it can be returned in search results.
- Why would you set stored to false? Fields that are not stored can still be used to sort results in different ways. For example, nameSort is a copy of the name field (made by the copyField command in the last line) but behaves differently. Note that nameSort is a string, whereas name is text. The default index schema processes the two types slightly differently.
- The field cat is multiValued, so you can define multiple values for it. For example, if an application manages content, you can assign multiple topics to an article and use the cat field (or a similar field of your own) to capture all of them.
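The attributes discussed above can be read straight out of the schema. The following Python sketch is an illustration only: it embeds a well-formed copy of the default schema snippet (with a closing schema tag added so that it parses) and reports the type, stored, multiValued, and required settings of each field. The function name summarize_fields is hypothetical.

```python
import xml.etree.ElementTree as ET

# Well-formed copy of the default schema snippet; the closing tag is added here.
SCHEMA_SNIPPET = """
<schema name="example" version="1.1">
  <field name="id" type="string" indexed="true" stored="true" required="true"/>
  <field name="name" type="text" indexed="true" stored="true"/>
  <field name="nameSort" type="string" indexed="true" stored="false"/>
  <field name="cat" type="text" indexed="true" stored="true" multiValued="true"/>
  <copyField source="name" dest="nameSort"/>
</schema>
"""

def summarize_fields(schema_xml):
    """Map each field name to its type and boolean attributes."""
    root = ET.fromstring(schema_xml)
    summary = {}
    for field in root.iter("field"):
        summary[field.get("name")] = {
            "type": field.get("type"),
            # A missing attribute is treated as false in this sketch.
            "stored": field.get("stored") == "true",
            "multiValued": field.get("multiValued") == "true",
            "required": field.get("required") == "true",
        }
    return summary

fields = summarize_fields(SCHEMA_SNIPPET)
for name, attrs in fields.items():
    print(name, attrs)
```

Because root.iter("field") searches at any depth, the same function should also work on a schema that nests its field elements inside a wrapping element.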
Listing 4 shows the example/exampledocs/ipod_other.xml file, which describes two entries in the iPod accessories category.
Listing 4. Data formatted for the default Solr index schema
<field name="name">Belkin Mobile Power Cord for iPod w/ Dock</field>
<field name="features">car power adapter, white</field>
<field name="name">iPod &amp; iPod Mini USB 2.0 Cable</field>
<field name="features">car power adapter for iPod, white</field>
The add element is a Solr command that adds the enclosed records to the index. Each record is captured in a doc element, which uses a series of field elements to specify field values. The popularity field and all other fields are defined in the default Solr index schema. The features field has the same attributes as cat but a different meaning: it lists the features of a product, however many there are.
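When such a message is generated by a program rather than by hand, special characters such as & must be escaped. The following Python sketch shows one way to build an add message from plain dictionaries; it is an illustration under the assumption that each record maps field names to a single value or a list of values. The function name build_add_message is hypothetical.

```python
import xml.etree.ElementTree as ET

def build_add_message(records):
    """Build an <add> message from dicts of field name -> value or list of values."""
    add = ET.Element("add")
    for record in records:
        doc = ET.SubElement(add, "doc")
        for name, value in record.items():
            # A multi-valued field becomes one <field> element per value.
            values = value if isinstance(value, (list, tuple)) else [value]
            for item in values:
                field = ET.SubElement(doc, "field", name=name)
                field.text = str(item)
    # ElementTree escapes characters such as & when serializing.
    return ET.tostring(add, encoding="unicode")

message = build_add_message([
    {"name": "iPod & iPod Mini USB 2.0 Cable",
     "features": "car power adapter for iPod, white"},
])
print(message)
```

Letting the XML library serialize the values avoids the malformed output that string concatenation can produce when a value contains &, <, or >.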
Search for Auto Parts
This example indexes a collection of auto parts. Each part has a number of fields; Table 1 shows a sample of the most important ones. The first column lists the field name, the second gives a brief description, the third lists the logical type, and the fourth shows the index type used to represent the data (as defined in the schema in Listing 5).
Table 1. Fields of an auto parts record
|Field|Description|Logical type|Index type|
|partno|Part number (unique, mandatory)||string|
|model|Model, such as "Camaro" (required, multi-value)||text_ws|
|year|Model year, such as 2001 (multi-value)||text_ws|
|features|Functions of parts||text|
Listing 5 shows the portion of the Solr schema used for the auto parts index. It is largely based on the default Solr schema. The specific fields used, with their names and attributes, replace those found in the fields element of the default schema (see the default schema snippet above).
Listing 5. Schema for the auto parts index
<?xml version="1.0" encoding="utf-8" ?>
<schema name="autoparts" version="1.0">
  <field name="partno" type="string" indexed="true"
         stored="true" required="true" />
  <field name="name" type="text" indexed="true"
         stored="true" required="true" />
  <field name="model" type="text_ws" indexed="true" stored="true"
         multiValued="true" required="true" />
  <field name="year" type="text_ws" indexed="true" stored="true"
         multiValued="true" omitNorms="true" />
  <field name="price" type="sfloat" indexed="true"
         stored="true" required="true" />
  <field name="inStock" type="boolean" indexed="true"
         stored="true" default="false" />
  <field name="features" type="text" indexed="true"
         stored="true" multiValued="true" />
  <field name="timestamp" type="date" indexed="true"
         stored="true" default="NOW" multiValued="false" />
  <field name="weight" type="sfloat" indexed="true" stored="true" />
Given these fields, the auto parts database must be exported and formatted as shown in Listing 6 before it is uploaded to Solr.
Listing 6. Auto parts database formatted for indexing
<field name="name">Spark plug</field>
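Because the schema in Listing 5 marks several fields as required, it can help to check each exported record before formatting it for upload. The following Python sketch is an illustration only. The list of required fields is read off Listing 5, the spark plug values are taken from the query result shown later in this article, and the "Wiper blade" record is a hypothetical example of incomplete data.

```python
# Fields marked required="true" in Listing 5.
REQUIRED_FIELDS = ("partno", "name", "model", "price")

def missing_required(record):
    """Return the required field names that are absent or empty in record."""
    return [name for name in REQUIRED_FIELDS
            if record.get(name) in (None, "", [])]

spark_plug = {"partno": 1, "name": "Spark plug", "model": ["Boxster", "924"],
              "year": [1999, 2000], "price": 25.0, "inStock": True}
incomplete = {"partno": 3, "name": "Wiper blade"}  # hypothetical record

print(missing_required(spark_plug))   # []
print(missing_required(incomplete))   # ['model', 'price']
```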
Let's install the new index schema and load the data into Solr. First, press Ctrl+C to stop the Solr daemon if it is still running. Back up the existing Solr schema in example/solr/conf/schema.xml. Next, create a text file containing the schema from Listing 5, save it as /tmp/schema.xml, and copy it to example/solr/conf/schema.xml. Create another file, /tmp/parts.xml, for the data shown in Listing 6. You can then restart Solr and use the posting utility provided with the example.
Listing 7. Starting Solr with the new schema
$ cd apache-solr-1.2.0/example
$ cp solr/conf/schema.xml solr/conf/default_schema.xml
$ chmod a-w solr/conf/default_schema.xml
$ vi /tmp/schema.xml
$ cp /tmp/schema.xml solr/conf/schema.xml
$ vi /tmp/parts.xml
$ java -jar start.jar
2007-11-11 16:56:48.279::INFO: Started SocketConnector @ 0.0.0.0:8983
# In a second terminal window, from the same example directory:
$ java -jar exampledocs/post.jar /tmp/parts.xml
SimplePostTool: version 1.2
SimplePostTool: WARNING: Make sure your XML documents are encoded in UTF-8,
other encodings are not currently supported
SimplePostTool: POSTing files to http://localhost:8983/solr/update...
SimplePostTool: POSTing file parts.xml
SimplePostTool: COMMITting Solr index changes...
Success! To verify that the index exists and contains two documents, point your browser to http://localhost:8983/solr/admin/ again. You should see "(autoparts)" at the top of the page. If so, click in the query box in the middle of the page and type partno:1 OR partno:2.
The result should be similar to the following:
3 on 10 0 partno:1 OR partno:2 2.2
true Boxster 924 Spark plug 1 25.0 2007-11-11T21:58:45.899Z 1999 2000
false 911 Windshield 2 15.0 2007-11-11T21:58:45.953Z 1991 1999
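The same query can be issued without the administration page by requesting a URL. The following Python sketch builds such a URL; it assumes the standard Solr select handler at /solr/select on localhost:8983, a path that does not appear in the text above. The parameter names q, start, rows, and indent correspond to the values echoed at the top of the result. The function name select_url is hypothetical.

```python
from urllib.parse import urlencode

def select_url(query, start=0, rows=10, host="localhost", port=8983):
    """Build a query URL for the (assumed) /solr/select handler."""
    # urlencode escapes the colon and turns spaces into plus signs.
    params = urlencode({"q": query, "start": start, "rows": rows, "indent": "on"})
    return "http://%s:%d/solr/select?%s" % (host, port, params)

print(select_url("partno:1 OR partno:2"))
```

Encoding the parameters with urlencode rather than by hand keeps queries that contain spaces, colons, or quotation marks intact.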
Try other queries. The Lucene wiki describes the query syntax of Lucene, the search engine inside Solr (see references).
You should also try editing and reloading the data. Because the partno field is declared unique, uploading the same part number again simply replaces the old index record with the new one. Besides add, you can also use delete, which can remove a specific record by ID or remove multiple records that match a query.
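The two forms of delete can be sketched as follows. This Python example is an illustration only. It builds the XML messages and wraps one in an HTTP POST request aimed at the update URL that appears in the post.jar output, but it does not send anything. The function names and the model:911 query are hypothetical.

```python
import urllib.request
import xml.etree.ElementTree as ET

UPDATE_URL = "http://localhost:8983/solr/update"  # URL shown in the post.jar output

def delete_by_id(record_id):
    """Build a message that deletes the record with the given unique ID."""
    delete = ET.Element("delete")
    ET.SubElement(delete, "id").text = str(record_id)
    return ET.tostring(delete, encoding="unicode")

def delete_by_query(query):
    """Build a message that deletes every record matching the query."""
    delete = ET.Element("delete")
    ET.SubElement(delete, "query").text = query
    return ET.tostring(delete, encoding="unicode")

def update_request(message, url=UPDATE_URL):
    """Wrap an XML update message in a POST request (not sent here)."""
    return urllib.request.Request(
        url,
        data=message.encode("utf-8"),
        headers={"Content-Type": "text/xml; charset=utf-8"},
    )

print(delete_by_id(1))
print(delete_by_query("model:911"))
```

To apply a deletion, the request would be passed to urllib.request.urlopen and followed by a commit message, much as the posting utility commits the index after posting files.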