Solr Learning (I): Getting Started

Source: Internet
Author: User
Tags: solr

Solr is an open source enterprise search platform built on the Lucene search library. It provides full-text indexing and search, and exposes REST-like HTTP/XML and JSON APIs. If you're new to Solr, get started with me! This tutorial uses Solr 4.8 as the test environment, which requires JDK 1.7 or later.

Get ready

This article assumes at least junior-level Java experience, so it does not cover setting up the Java environment. Download and unzip Solr, go to the example directory (the one that contains start.jar), and start the server:

java -jar start.jar

Open http://localhost:8983/solr/ in a browser. The page you see is the Solr admin interface.

Index data

After the service starts, the index does not contain any data yet. You can add (update) and delete documents in Solr with the post tool, post.jar. The exampledocs directory includes some sample files. From that directory, run:

java -jar post.jar solr.xml monitor.xml

The command above adds two documents to Solr. Open the two files to see what is inside. The content of solr.xml is:

<add>
<doc>
  <field name="id">SOLR1000</field>
  <field name="name">Solr, the Enterprise Search Server</field>
  <field name="manu">Apache Software Foundation</field>
  <field name="cat">software</field>
  <field name="cat">search</field>
  <field name="features">Advanced Full-Text Search Capabilities using Lucene</field>
  <field name="features">Optimized for High Volume Web Traffic</field>
  <field name="features">Standards Based Open Interfaces - XML and HTTP</field>
  <field name="features">Comprehensive HTML Administration Interfaces</field>
  <field name="features">Scalability - Efficient Replication to other Solr Search Servers</field>
  <field name="features">Flexible and Adaptable with XML Configuration and Schema</field>
  <field name="features">Good unicode support: h&#xE9;llo (hello with an accent over the e)</field>
  <field name="price">0</field>
  <field name="popularity">10</field>
  <field name="inStock">true</field>
  <field name="incubationdate_dt">2006-01-17T00:00:00.000Z</field>
</doc>
</add>
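
The structure of this file (one <add> element, one <doc> per document, one <field> per value, with multi-valued fields such as cat and features simply repeated) can also be generated in code. The following is a minimal Python sketch that is not part of the original tutorial: the helper name build_add_xml is hypothetical, and it only builds the XML string without sending anything to Solr.

```python
import xml.etree.ElementTree as ET


def build_add_xml(docs):
    """Build an <add> message from a list of dicts (hypothetical helper).

    A list or tuple value produces one <field> element per item, which is
    how multi-valued fields appear in solr.xml. Values are converted with
    str(), so pass booleans as the strings "true" / "false".
    """
    add = ET.Element("add")
    for doc in docs:
        doc_el = ET.SubElement(add, "doc")
        for name, value in doc.items():
            values = value if isinstance(value, (list, tuple)) else [value]
            for item in values:
                field = ET.SubElement(doc_el, "field", name=name)
                field.text = str(item)  # ElementTree escapes &, < and > on output
    return ET.tostring(add, encoding="unicode")


xml_message = build_add_xml([{"id": "SOLR1000", "cat": ["software", "search"]}])
print(xml_message)
```

Saving the returned string to a file and posting it with post.jar should then behave like posting one of the sample files.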

The documents added to the index are the data that searches run against. You can now search for the keyword "solr" through the admin interface, with the following steps:

When you click the Execute Query button at the bottom of the page, the query results appear on the right in JSON format. They contain the solr.xml document you just imported. Solr supports a rich query syntax. For example, to search for the keyword "search" in the name field, use the syntax name:search. If you search for name:xxx, nothing is returned, because no document contains that content.

Data import

There are several other ways to import data into Solr:

    • Import data from a database using DIH (DataImportHandler)
    • Import CSV files, which also makes it easy to import Excel data
    • Import JSON-formatted documents
    • Import binary documents such as Word and PDF files
    • Write a custom import programmatically
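
As an illustration of the JSON option, the sketch below prepares (but does not send) a JSON update request using only the Python standard library. It rests on assumptions that are not in the original article: the collection1 core from the example setup, and an /update handler that accepts a JSON array of documents when the Content-Type is application/json. The helper name json_update_request is hypothetical. Check the behavior against your Solr version before relying on it.

```python
import json
import urllib.request

# Assumed URL: the example core "collection1" on the default port.
SOLR_UPDATE_URL = "http://localhost:8983/solr/collection1/update?commit=true"


def json_update_request(docs, url=SOLR_UPDATE_URL):
    """Prepare a POST request carrying docs as a JSON array (hypothetical helper)."""
    body = json.dumps(docs).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = json_update_request(
    [{"id": "SOLR1000", "name": "Solr, the Enterprise Search Server"}]
)
print(req.get_method(), req.full_url)
# Sending requires a running Solr instance:
# urllib.request.urlopen(req)
```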
Update data

What happens if the same document, solr.xml, is imported more than once? Solr uniquely identifies a document by its id field. If the id of an imported document already exists in Solr, the existing document is automatically replaced by the newly imported one. Try it yourself and watch how these values in the admin interface change before and after the replacement: Num Docs, Max Doc, and Deleted Docs.

    • Num Docs: the number of documents currently in the index. This may be larger than the number of XML files, because one XML file may contain multiple <doc> tags.
    • Max Doc: may be larger than Num Docs. For example, posting the same file repeatedly increases Max Doc.
    • Deleted Docs: posting a file again replaces the old document and increases Deleted Docs by 1. This is only a logical deletion; the old document is not actually removed from the index.
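
The relationship between the three counters can be illustrated with a toy in-memory model. This is only a sketch of the bookkeeping described above, not how Solr or Lucene is implemented. The class name ToyIndex and its attributes are hypothetical, and the model assumes that Max Doc equals the live documents plus the logically deleted ones.

```python
class ToyIndex:
    """Toy model of replace-by-id bookkeeping (not Solr's real implementation)."""

    def __init__(self):
        self.live = {}          # id -> current version of the document
        self.deleted_docs = 0   # old versions that are only logically deleted

    def add(self, doc):
        # Re-adding an existing id replaces the document; the old version
        # stays behind as a logically deleted entry.
        if doc["id"] in self.live:
            self.deleted_docs += 1
        self.live[doc["id"]] = doc

    @property
    def num_docs(self):
        return len(self.live)

    @property
    def max_doc(self):
        # Assumption of this toy model: live plus logically deleted documents.
        return self.num_docs + self.deleted_docs


index = ToyIndex()
index.add({"id": "SOLR1000", "name": "first import"})
index.add({"id": "SOLR1000", "name": "second import"})
print(index.num_docs, index.max_doc, index.deleted_docs)  # prints: 1 2 1
```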
Delete data

Delete a specific document by id, or delete all documents that match a query:

java -Ddata=args -jar post.jar "<delete><id>SOLR1000</id></delete>"
java -Ddata=args -jar post.jar "<delete><query>name:DDR</query></delete>"

At this point the solr.xml document has been removed from the index, and searching for "solr" again no longer returns it. Like a database, Solr has the concept of a transaction: by default the transaction is committed automatically when the delete command runs, so the document is removed from the index right away. You can also set commit to false and commit the transaction manually.

java -Ddata=args -Dcommit=false -jar post.jar "<delete><id>3007WFP</id></delete>"

After running the command above, the document is not actually deleted yet, and you can still find it in search results. To finish, run:

java -jar post.jar -

This commits the transaction, and the document is now really deleted. Re-import the files you just deleted into Solr so that we can continue.
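
The delete messages passed to post.jar are small XML documents, so any id or query containing &, < or > has to be XML-escaped. The following minimal Python sketch builds the two message shapes used above. The helper names delete_by_id and delete_by_query are hypothetical, and the code only produces strings; it does not talk to Solr.

```python
from xml.sax.saxutils import escape


def delete_by_id(doc_id):
    """Return a delete-by-id message for one document (hypothetical helper)."""
    return "<delete><id>{}</id></delete>".format(escape(str(doc_id)))


def delete_by_query(query):
    """Return a delete-by-query message (hypothetical helper)."""
    return "<delete><query>{}</query></delete>".format(escape(query))


print(delete_by_id("SOLR1000"))
print(delete_by_query("name:DDR"))
```

Each string can be passed as the quoted argument of the java -Ddata=args -jar post.jar command.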

Delete all data:

http://localhost:8983/solr/collection1/update?stream.body=<delete><query>*:*</query></delete>&commit=true

Delete documents that match a condition:

http://localhost:8983/solr/collection1/update?stream.body=<delete><query>title:abc</query></delete>&commit=true

Delete documents that match multiple conditions (the AND operator must be uppercase):

http://localhost:8983/solr/collection1/update?stream.body=<delete><query>title:abc AND name:zhang</query></delete>&commit=true
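
Browsers usually tolerate raw <, > and spaces in the address bar, but other HTTP clients may not, so it is safer to percent-encode the stream.body value. The sketch below, which is not part of the original article, builds such a URL with the Python standard library. The helper name delete_url is hypothetical and the base URL assumes the collection1 core used above.

```python
from urllib.parse import parse_qs, urlencode, urlsplit
from xml.sax.saxutils import escape

# Assumed URL: the example core "collection1" on the default port.
UPDATE_URL = "http://localhost:8983/solr/collection1/update"


def delete_url(query, commit=True, base=UPDATE_URL):
    """Build a percent-encoded delete-by-query URL (hypothetical helper)."""
    body = "<delete><query>{}</query></delete>".format(escape(query))
    params = {"stream.body": body, "commit": "true" if commit else "false"}
    return base + "?" + urlencode(params)


url = delete_url("title:abc AND name:zhang")
print(url)
# Decoding the query string gives back the original message.
print(parse_qs(urlsplit(url).query)["stream.body"][0])
```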
Querying data

Data is queried with an HTTP GET request. The search keyword is specified with the q parameter, and many optional parameters control what is returned. For example, fl specifies which fields to return: with fl=name, the returned data includes only the name field.

http://localhost:8983/solr/collection1/select?q=solr&fl=name&wt=json&indent=true
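
A query URL like this can be assembled from its parameters instead of being typed by hand. The following minimal Python sketch does that; the helper name select_url is hypothetical, the base URL assumes the collection1 core, and the code only builds the URL without sending the request.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

# Assumed URL: the example core "collection1" on the default port.
SELECT_URL = "http://localhost:8983/solr/collection1/select"


def select_url(q, fl=None, wt="json", indent=True):
    """Build a select URL from q and optional fl (hypothetical helper)."""
    params = {"q": q, "wt": wt, "indent": "true" if indent else "false"}
    if fl:
        # fl takes a comma-separated list of field names.
        params["fl"] = ",".join(fl)
    return SELECT_URL + "?" + urlencode(params)


url = select_url("name:search", fl=["id", "name"])
print(url)
print(parse_qs(urlsplit(url).query))
```

Passing the result to urllib.request.urlopen would run the query against a running Solr instance.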
    • Sort

      Solr provides sorting through the sort parameter, which supports ascending, descending, and multi-field ordering:

      • q=video&sort=price desc
      • q=video&sort=price asc
      • q=video&sort=inStock asc, price desc
        By default, Solr sorts results by score, a value computed for each matching document based on its relevance to the query.
    • Highlight

      Web search pages often highlight the matching keywords to make the results stand out. Solr supports this well. You only need to specify these parameters:

      • hl=true (enables highlighting)
      • hl.fl=name (specifies the fields to highlight)
http://localhost:8983/solr/collection1/select?q=Search&wt=json&indent=true&hl=true&hl.fl=features
The returned content contains:
"highlighting": {
  "SOLR1000": {
    "features": ["Advanced Full-Text <em>Search</em> Capabilities using Lucene"]
  }
}
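
The sort and highlighting parameters, and the shape of the highlighting section in the response, can be handled with a few lines of code. The Python sketch below is illustrative only: the helper names search_params and snippets are hypothetical, and the sample response is the fragment shown above wrapped in braces so that it parses as JSON.

```python
import json


def search_params(q, sort=None, highlight_fields=None):
    """Build query parameters with optional sort and highlighting (hypothetical helper)."""
    params = {"q": q, "wt": "json", "indent": "true"}
    if sort:
        # sort is a list of (field, direction) pairs, e.g. [("price", "desc")]
        params["sort"] = ", ".join(
            "{} {}".format(field, direction) for field, direction in sort
        )
    if highlight_fields:
        params["hl"] = "true"
        params["hl.fl"] = ",".join(highlight_fields)
    return params


def snippets(response, field):
    """Map each document id to its highlighted snippets for one field."""
    highlighting = response.get("highlighting", {})
    return {doc_id: fields.get(field, []) for doc_id, fields in highlighting.items()}


sample = json.loads(
    '{"highlighting": {"SOLR1000": {"features": '
    '["Advanced Full-Text <em>Search</em> Capabilities using Lucene"]}}}'
)
print(search_params("video", sort=[("inStock", "asc"), ("price", "desc")]))
print(snippets(sample, "features"))
```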
Text analysis

Text fields are indexed by splitting the text into words and applying various transformations, such as lowercasing, removing plurals, and stemming. How each field is indexed is defined in the schema.xml file.
By default, a search for "power-shot" does not match "PowerShot". To make it match, edit schema.xml (in the solr/example/solr/collection1/conf directory) and change the features and text fields to the "text_en_splitting" type.

<field name="features" type="text_en_splitting" indexed="true" stored="true" multiValued="true"/>
...
<field name="text" type="text_en_splitting" indexed="true" stored="false" multiValued="true"/>

After making the change, restart Solr and re-import the documents:

java -jar post.jar *.xml

Now the following searches match:

    • power-shot -> PowerShot
    • features:recharging -> Rechargeable
    • 1 gigabyte -> 1GB
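
To give a feel for why the first of these matches works, here is a toy Python sketch of the general idea: split a word on hyphens and on case changes, lowercase the parts, and join them again. This is a simplified illustration and not the actual analysis chain behind text_en_splitting; it does not cover stemming or synonyms, and the function names split_word and catenated are hypothetical.

```python
import re

# Matches a capitalized or lowercase word part, a run of capitals, or digits.
PART = re.compile(r"[A-Z]?[a-z]+|[A-Z]+(?![a-z])|\d+")


def split_word(word):
    """Split on punctuation and case changes, then lowercase each part."""
    return [part.lower() for part in PART.findall(word)]


def catenated(word):
    """Join the lowercased parts back into a single term."""
    return "".join(split_word(word))


print(split_word("power-shot"), catenated("power-shot"))
print(split_word("PowerShot"), catenated("PowerShot"))
```

Both inputs end up with the same parts and the same joined term, which is why a query for one can find a document containing the other once the field is analyzed this way.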
Summary

As an introductory article, this one does not introduce many concepts. It walks from installation and startup through updating and querying documents to give you a first, hands-on feel for Solr. The next article introduces the fundamentals of full-text search.
