Full-Text Search Engine Solr Series: Introductory Article


Solr is an open source enterprise search platform that uses the Lucene search library at its core to provide full-text indexing and search. It exposes REST-like HTTP/XML and JSON APIs. If you're new to Solr, get started with me! This tutorial uses Solr 4.8 as the test environment, which requires JDK 1.7 or later.

Get ready

This article assumes that you have at least junior-level Java experience, so it does not describe how to set up the Java environment. Download and unzip Solr, then start it from the example directory, which contains the start.jar file:

java -jar start.jar

Open http://localhost:8983/solr/ in a browser; what you see is the Solr admin interface.

Index data

Right after the service starts, the index does not contain any data. You can add (update) documents to Solr, or delete them, with the post tool. The exampledocs directory includes some sample files; from that directory, run:

java -jar post.jar solr.xml monitor.xml

The command above adds two documents to Solr. Open the two files to see what is inside; the content of solr.xml is:

<add>
<doc>
  <field name="id">SOLR1000</field>
  <field name="name">Solr, the Enterprise Search Server</field>
  <field name="manu">Apache Software Foundation</field>
  <field name="cat">software</field>
  <field name="cat">search</field>
  <field name="features">Advanced Full-Text Search Capabilities using Lucene</field>
  <field name="features">Optimized for High Volume Web Traffic</field>
  <field name="features">Standards Based Open Interfaces - XML and HTTP</field>
  <field name="features">Comprehensive HTML Administration Interfaces</field>
  <field name="features">Scalability - Efficient Replication to other Solr Search Servers</field>
  <field name="features">Flexible and Adaptable with XML configuration and Schema</field>
  <field name="features">Good unicode support: h&#xE9;llo (hello with an accent over the e)</field>
  <field name="price">0</field>
  <field name="popularity">10</field>
  <field name="inStock">true</field>
  <field name="incubationdate_dt">2006-01-17T00:00:00.000Z</field>
</doc>
</add>
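An add message in the same shape as solr.xml can also be generated from a program instead of being written by hand. The following is a minimal Python sketch using only the standard library; the function name `build_add_xml` and the `DEMO1` document are made up for illustration and are not part of Solr.

```python
import xml.etree.ElementTree as ET


def build_add_xml(docs):
    """Build an <add> message from a list of dicts (hypothetical helper).

    A list value produces one <field> element per item, which is how
    multi-valued fields such as "cat" appear in solr.xml.
    """
    add = ET.Element("add")
    for doc in docs:
        doc_el = ET.SubElement(add, "doc")
        for name, value in doc.items():
            values = value if isinstance(value, list) else [value]
            for v in values:
                field = ET.SubElement(doc_el, "field", name=name)
                field.text = str(v)
    # encoding="unicode" returns a str; special characters are escaped for us
    return ET.tostring(add, encoding="unicode")


xml_message = build_add_xml([{
    "id": "DEMO1",  # illustrative id, not one of the sample documents
    "name": "A hypothetical demo document",
    "cat": ["software", "search"],
}])
print(xml_message)
```

The resulting string could be saved to a file and posted with post.jar in the same way as the sample files.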

Adding documents to the index makes them searchable: the documents are the data source that searches run against. You can now search for the keyword "solr" through the admin interface.

Click the Execute Query button and the query results are displayed on the right side of the page; the result is the solr.xml document you just imported, shown in JSON format. Solr supports a rich query syntax. For example, to search for the keyword "search" in the name field, use the syntax name:search. If you search for name:xxx, however, nothing is returned, because no document contains that content.

Data import

There are several other ways to import data into Solr:

    • Import data from a database using DIH (DataImportHandler)
    • Import CSV files, which also makes it easy to import Excel data
    • Import JSON-formatted documents
    • Import binary documents such as Word and PDF files
    • Write a custom import programmatically
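
As one illustration of a programmatic import, the sketch below converts CSV text into a JSON array of documents. It assumes the array would then be sent to Solr's update handler with the Content-Type application/json; the sending step is left out, and the helper name and the sample rows are invented for this example.

```python
import csv
import io
import json


def csv_to_solr_json(csv_text):
    """Convert CSV text (first row = field names) into a JSON array of docs.

    Hypothetical helper: every value stays a string here, on the assumption
    that the schema decides how each field is interpreted.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    return json.dumps([dict(row) for row in reader])


# Illustrative data, not taken from the exampledocs directory
sample_csv = "id,name,price\nDEMO1,First demo item,10\nDEMO2,Second demo item,20\n"
payload = csv_to_solr_json(sample_csv)
print(payload)
```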

Update data

What happens if the same file, solr.xml, is imported more than once? Solr uniquely identifies a document by its id field. If the id of an imported document already exists in Solr, the existing document is automatically replaced by the newly imported one. You can try this yourself and watch how three values in the admin interface change before and after the replacement: Num Docs, Max Doc, and Deleted Docs.

    • numDocs: the number of documents currently in the index. It may be larger than the number of XML files, because one XML file may contain more than one <doc> element.
    • maxDoc: may be larger than numDocs. For example, posting the same file again increases maxDoc.
    • deletedDocs: posting a file again replaces the old document and increases deletedDocs by 1. This is only a logical deletion; the old document is not really removed from the index.
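
The relationship between the three counters can be pictured with a toy model. This is only an illustration of the bookkeeping described above, not how Solr or Lucene is implemented; the class `ToyIndex` is invented for this sketch.

```python
class ToyIndex:
    """Toy model of numDocs / maxDoc / deletedDocs (not Solr's real code)."""

    def __init__(self):
        self.slots = []  # each entry is [doc_id, is_deleted]

    def add(self, doc_id):
        # Re-adding an existing id only marks the old copy as deleted
        for slot in self.slots:
            if slot[0] == doc_id and not slot[1]:
                slot[1] = True
        self.slots.append([doc_id, False])

    @property
    def max_doc(self):
        return len(self.slots)

    @property
    def deleted_docs(self):
        return sum(1 for _, deleted in self.slots if deleted)

    @property
    def num_docs(self):
        return self.max_doc - self.deleted_docs


index = ToyIndex()
index.add("SOLR1000")
index.add("SOLR1000")  # the same id posted a second time
print(index.num_docs, index.max_doc, index.deleted_docs)  # prints: 1 2 1
```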

Delete data

You can delete a specific document by id, or delete all documents that match a query:

java -Ddata=args -jar post.jar "<delete><id>SOLR1000</id></delete>"
java -Ddata=args -jar post.jar "<delete><query>name:DDR</query></delete>"

The solr.xml document is now removed from the index, and searching for "solr" again no longer returns it. Like a database, Solr has a notion of transactions: when the delete command above runs, the change is committed automatically and the document disappears from the index immediately. You can also set commit to false and commit manually:

java -Ddata=args -Dcommit=false -jar post.jar "<delete><id>3007WFP</id></delete>"

After running the command above, the document is not actually deleted yet, and you can still find it in search results. Finally, run:

java -jar post.jar -

This commits the change, and the document is deleted for good. Now re-import the files you just deleted into Solr so that we can continue.
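
The delete messages used above are plain XML, so they can also be built in code. A minimal sketch in Python follows; the two helper names are hypothetical, and the main point is that building the XML with a library escapes characters such as & and < in the id or query.

```python
import xml.etree.ElementTree as ET


def delete_by_id(doc_id):
    """Return a <delete><id>...</id></delete> message (hypothetical helper)."""
    root = ET.Element("delete")
    ET.SubElement(root, "id").text = doc_id
    return ET.tostring(root, encoding="unicode")


def delete_by_query(query):
    """Return a <delete><query>...</query></delete> message (hypothetical helper)."""
    root = ET.Element("delete")
    ET.SubElement(root, "query").text = query
    return ET.tostring(root, encoding="unicode")


print(delete_by_id("SOLR1000"))     # <delete><id>SOLR1000</id></delete>
print(delete_by_query("name:DDR"))  # <delete><query>name:DDR</query></delete>
```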

Delete all data:

http://localhost:8983/solr/collection1/update?stream.body=<delete><query>*:*</query></delete>&commit=true

Delete specified data:

http://localhost:8983/solr/collection1/update?stream.body=<delete><query>title:abc</query></delete>&commit=true

Delete with multiple conditions:

http://localhost:8983/solr/collection1/update?stream.body=<delete><query>title:abc AND name:zhang</query></delete>&commit=true
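
The URLs above contain characters such as <, > and spaces, which a browser encodes for you but a program has to encode itself. The following sketch shows one way to do that in Python; the function name is hypothetical, and the base URL is the one used in this article.

```python
from urllib.parse import parse_qs, urlencode, urlsplit
from xml.sax.saxutils import escape

UPDATE_URL = "http://localhost:8983/solr/collection1/update"


def delete_by_query_url(query, base=UPDATE_URL):
    """Build a URL-encoded delete-by-query request URL (hypothetical helper)."""
    # escape() protects XML special characters inside the query text
    body = "<delete><query>%s</query></delete>" % escape(query)
    return base + "?" + urlencode({"stream.body": body, "commit": "true"})


url = delete_by_query_url("title:abc AND name:zhang")
print(url)

# Decoding the query string again recovers the original delete message
decoded = parse_qs(urlsplit(url).query)
print(decoded["stream.body"][0])
```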

Querying data

Data is queried with HTTP GET requests. The search keywords are specified with the q parameter, and a number of optional parameters control what is returned. For example, the fl parameter specifies which fields to return: with fl=name, the returned data includes only the content of the name field.

http://localhost:8983/solr/collection1/select?q=solr&fl=name&wt=json&indent=true
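
A query URL like this one can be assembled from its parameters in code. The sketch below is only an illustration; `select_url` is a made-up helper, and the parameters appear in a different order than in the URL above, which should not matter for an HTTP query string.

```python
from urllib.parse import urlencode

SELECT_URL = "http://localhost:8983/solr/collection1/select"


def select_url(q, fields=None, base=SELECT_URL):
    """Build a select URL with optional fl restriction (hypothetical helper)."""
    params = {"q": q, "wt": "json", "indent": "true"}
    if fields:
        params["fl"] = ",".join(fields)  # fl takes a comma-separated field list
    return base + "?" + urlencode(params)


print(select_url("solr", fields=["name"]))
# http://localhost:8983/solr/collection1/select?q=solr&wt=json&indent=true&fl=name

# A field query such as name:search has its colon percent-encoded
print(select_url("name:search"))
```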
    • Sort

      Solr provides sorting through the sort parameter, which supports ascending, descending, and multi-field ordering:

      • q=video&sort=price desc
      • q=video&sort=price asc
      • q=video&sort=inStock asc, price desc

      By default, Solr sorts results by score in descending order. The score is a value calculated for each matching record based on how relevant it is to the query.
    • Highlight

      Web search often highlights the matching keywords to make the results stand out. Solr supports this well; you only need to specify these parameters:

      • hl=true # enable highlighting
      • hl.fl=name # specify the fields to highlight

http://localhost:8983/solr/collection1/select?q=Search&wt=json&indent=true&hl=true&hl.fl=features

"highlighting":{
  "SOLR1000":{
    "features":["Advanced Full-Text <em>Search</em> Capabilities using Lucene"]
  }
}
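
The sort and highlight parameters can be combined in one request, and the highlighting section of the JSON response can be read back in code. The sketch below assumes the response shape shown above, wrapped in an outer JSON object; both helper names are hypothetical, and no request is actually sent.

```python
import json
from urllib.parse import urlencode

SELECT_URL = "http://localhost:8983/solr/collection1/select"


def highlight_query_url(q, hl_fields, sort=None, base=SELECT_URL):
    """Build a select URL with highlighting and optional sorting."""
    params = {"q": q, "wt": "json", "hl": "true", "hl.fl": ",".join(hl_fields)}
    if sort:
        params["sort"] = sort  # e.g. "price desc" or "inStock asc, price desc"
    return base + "?" + urlencode(params)


def snippets(response_text, field):
    """Return {document id: [highlighted snippets]} for one field."""
    highlighting = json.loads(response_text).get("highlighting", {})
    return {
        doc_id: fields[field]
        for doc_id, fields in highlighting.items()
        if field in fields
    }


# Sample response text based on the fragment shown in this article
sample_response = (
    '{"highlighting": {"SOLR1000": {"features": '
    '["Advanced Full-Text <em>Search</em> Capabilities using Lucene"]}}}'
)

print(highlight_query_url("Search", ["features"], sort="price desc"))
print(snippets(sample_response, "features"))
```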

Text analysis

Text fields are indexed by splitting the text into words and applying various transformations such as lowercasing, plural handling, and stemming. How each field is indexed is defined in the schema.xml file.
By default, a search for "power-shot" does not match "PowerShot". To make it match, edit schema.xml (in the solr/example/solr/collection1/conf directory) and change the features and text fields to the "text_en_splitting" type:

<field name="features" type="text_en_splitting" indexed="true" stored="true" multiValued="true"/>
...
<field name="text" type="text_en_splitting" indexed="true" stored="false" multiValued="true"/>

After making the change, restart Solr and re-import the documents:

java -jar post.jar *.xml

Now the following searches match:

    • power-shot -> PowerShot
    • features:recharging -> Rechargeable
    • 1 gigabyte -> 1GB
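
Why "power-shot" and "PowerShot" can end up matching can be pictured with a toy tokenizer. The sketch below is only a rough approximation of the word-splitting idea; the real analysis chain is whatever schema.xml configures for the field type, and it does more than this (for example stemming). The function name is invented for illustration.

```python
import re


def toy_split_tokens(text):
    """Toy approximation of splitting analysis (not Solr's implementation).

    Splits each word on non-alphanumeric characters and on lower-to-upper
    case changes, lowercases the parts, and also adds the parts joined.
    """
    tokens = set()
    for word in text.split():
        parts = [p.lower() for p in re.findall(r"[A-Z]?[a-z]+|[A-Z]+|[0-9]+", word)]
        tokens.update(parts)
        if len(parts) > 1:
            tokens.add("".join(parts))
    return tokens


indexed = toy_split_tokens("PowerShot")
searched = toy_split_tokens("Power-shot")
print(sorted(indexed))    # ['power', 'powershot', 'shot']
print(sorted(searched))   # ['power', 'powershot', 'shot']
print(bool(indexed & searched))  # True: the two forms share tokens
```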

As an introductory article, this one does not introduce many concepts. It covers installation and startup, adding and updating documents, and gives you a first hands-on feel for Solr. The next article introduces the fundamentals of full-text retrieval.
