Solr is built on the Lucene search library and provides full-text indexing and search as an open source enterprise search platform, exposing REST-like HTTP/XML and JSON APIs. If you're new to Solr, get started with me! This tutorial uses Solr 4.8 as the test environment, which requires JDK 1.7 or later.
Get ready
This article assumes at least junior-level Java experience, so it does not describe how to set up the Java environment. Download and unzip Solr, then start it from the example directory, which contains the start.jar file:
    java -jar start.jar
Then open http://localhost:8983/solr/ in a browser; what you see is the Solr admin interface.
Index data
After the service starts, the admin interface does not show any data yet. You can add (update) and delete documents in Solr with the post tool, post.jar. The exampledocs directory includes some sample files; from that directory, run the command:
    java -jar post.jar solr.xml monitor.xml
The command above adds two documents to Solr. Open the two files to see what is inside; the content of solr.xml is:
    <add>
    <doc>
      <field name="id">SOLR1000</field>
      <field name="name">Solr, the Enterprise Search Server</field>
      <field name="manu">Apache Software Foundation</field>
      <field name="cat">software</field>
      <field name="cat">search</field>
      <field name="features">Advanced Full-Text Search Capabilities using Lucene</field>
      <field name="features">Optimized for High Volume Web Traffic</field>
      <field name="features">Standards Based Open Interfaces - XML and HTTP</field>
      <field name="features">Comprehensive HTML Administration Interfaces</field>
      <field name="features">Scalability - Efficient Replication to other Solr Search Servers</field>
      <field name="features">Flexible and Adaptable with XML Configuration and Schema</field>
      <field name="features">Good unicode support: h&#xE9;llo (hello with an accent over the e)</field>
      <field name="price">0</field>
      <field name="popularity">10</field>
      <field name="inStock">true</field>
      <field name="incubationdate_dt">2006-01-17T00:00:00.000Z</field>
    </doc>
    </add>
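If you would rather generate such an `<add>` message than write it by hand, a minimal sketch could look like the following. The helper name `build_add_xml` and the sample document are made up for illustration; only the `<add>/<doc>/<field>` layout comes from solr.xml.

```python
import xml.etree.ElementTree as ET


def build_add_xml(doc):
    """Build a Solr <add> message from a dict of field name -> value(s)."""
    add = ET.Element("add")
    doc_el = ET.SubElement(add, "doc")
    for name, value in doc.items():
        # A list value becomes one <field> element per entry (multi-valued field).
        values = value if isinstance(value, list) else [value]
        for item in values:
            field = ET.SubElement(doc_el, "field", name=name)
            field.text = str(item)
    return ET.tostring(add, encoding="unicode")


# Hypothetical document for illustration; the field names follow solr.xml.
example_doc = {
    "id": "DEMO1",
    "name": "A demo document",
    "cat": ["software", "search"],
    "price": 0,
}
print(build_add_xml(example_doc))
```

ElementTree escapes special characters in field values, so text containing `&` or `<` stays well-formed.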
Adding documents to the index gives Solr the data it searches. You can now search for the keyword "solr" through the query page of the admin interface.
When you click the Execute Query button at the bottom of the page, the query results appear on the right in JSON format; they are the contents of the solr.xml file you just imported. Solr supports a rich query syntax. For example, to search for the keyword "search" in the name field, use the syntax name:search. If you search for name:xxx instead, nothing is returned because no document contains that content.
Data import
There are also several ways to import data into Solr:
- You can import data from a database using DIH (DataImportHandler)
- CSV file import is supported, so Excel data can also be imported easily
- JSON-formatted documents are supported
- Binary documents such as Word and PDF files
- You can also import data programmatically with custom code
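As a rough sketch of the JSON and programmatic options, the snippet below prepares an update request with the Python standard library. It assumes that the example server's default collection1 core accepts a JSON array of documents on its update endpoint when the content type is application/json; the helper name and the sample document are hypothetical. The request is only built here, not sent.

```python
import json
import urllib.request

# Assumed endpoint: the default collection1 core of the example server.
UPDATE_URL = "http://localhost:8983/solr/collection1/update?commit=true"


def build_json_update(docs, url=UPDATE_URL):
    """Return a POST request carrying the documents as a JSON array."""
    body = json.dumps(docs).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_json_update([{"id": "DEMO2", "name": "Imported from JSON"}])
print(request.get_method(), request.full_url)
# To actually send it to a running server: urllib.request.urlopen(request)
```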
Update data
What happens if the same document, solr.xml, is imported more than once? Solr uniquely identifies a document by its id field. If the id of an imported document already exists in Solr, the existing document is automatically replaced by the newly imported one. You can try it yourself and watch how these statistics in the admin interface change before and after the replacement: Num Docs, Max Doc, and Deleted Docs.
- numDocs: the number of documents currently in the index. It may be larger than the number of XML files, because one XML file can contain multiple <doc> tags.
- maxDoc: may be larger than numDocs. For example, posting the same file again increases maxDoc.
- deletedDocs: posting a file again replaces the old document, and deletedDocs increases by 1. This is only a logical deletion; the old document is not actually removed from the index.
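The relationship between the three counters can be illustrated with a toy model. This is only a sketch of the bookkeeping described above, not how Lucene actually stores documents; the class and attribute names are invented for the example.

```python
class ToyIndex:
    """Toy model of the counters; not how Lucene stores documents."""

    def __init__(self):
        self.slots = []   # every document ever written: [doc_id, is_live]
        self.live = {}    # doc_id -> position of its live slot

    def add(self, doc_id):
        # Re-adding an existing ID only marks the old copy as deleted.
        if doc_id in self.live:
            self.slots[self.live[doc_id]][1] = False
        self.slots.append([doc_id, True])
        self.live[doc_id] = len(self.slots) - 1

    @property
    def num_docs(self):
        return len(self.live)

    @property
    def max_doc(self):
        return len(self.slots)

    @property
    def deleted_docs(self):
        return self.max_doc - self.num_docs


index = ToyIndex()
index.add("SOLR1000")
index.add("SOLR1000")  # post the same document again
print(index.num_docs, index.max_doc, index.deleted_docs)  # prints: 1 2 1
```

In this model deletedDocs is simply maxDoc minus numDocs, which matches what you should observe in the admin interface after posting the same file twice.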
Delete data
Delete a specific document by ID, or delete every document that matches a query:
    java -Ddata=args -jar post.jar "<delete><id>SOLR1000</id></delete>"
    java -Ddata=args -jar post.jar "<delete><query>name:DDR</query></delete>"
At this point the solr.xml document has been removed from the index, and searching for "solr" again no longer returns it. Like a database, Solr has a notion of transactions: by default the transaction is committed automatically when the delete command runs, so the document is removed from the index immediately. You can also set commit to false and commit the transaction manually.
    java -Ddata=args -Dcommit=false -jar post.jar "<delete><id>3007WFP</id></delete>"
After the above command runs, the document is not actually deleted yet, and searches still return it. Only once the transaction is committed is the document completely deleted. Now re-import the files you just deleted into Solr and continue with our study.
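The same two-step flow can be sketched over HTTP. The snippet below is an illustration under assumptions: it targets the default collection1 update endpoint and uses an explicit `<commit/>` update message as one way to commit, which is not necessarily the command the original text used. The helper name is hypothetical, and the requests are only built, not sent.

```python
import urllib.parse
import urllib.request

# Assumed endpoint: the default collection1 core of the example server.
UPDATE_URL = "http://localhost:8983/solr/collection1/update"


def build_update(xml_message, commit):
    """Return a POST request for an XML update message."""
    query = urllib.parse.urlencode({"commit": "true" if commit else "false"})
    return urllib.request.Request(
        UPDATE_URL + "?" + query,
        data=xml_message.encode("utf-8"),
        headers={"Content-Type": "text/xml"},
        method="POST",
    )


# Step 1: delete without committing; searches still return the document.
pending_delete = build_update("<delete><id>3007WFP</id></delete>", commit=False)
# Step 2: an explicit <commit/> message makes the deletion visible.
commit_now = build_update("<commit/>", commit=False)

for request in (pending_delete, commit_now):
    print(request.full_url, request.data.decode("utf-8"))
# To actually send them: urllib.request.urlopen(request)
```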
Delete all data:
    http://localhost:8983/solr/collection1/update?stream.body=<delete><query>*:*</query></delete>&commit=true
Delete specified data:
    http://localhost:8983/solr/collection1/update?stream.body=<delete><query>title:abc</query></delete>&commit=true
Delete by multiple conditions (the AND operator must be uppercase):
    http://localhost:8983/solr/collection1/update?stream.body=<delete><query>title:abc AND name:zhang</query></delete>&commit=true
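Browsers usually tolerate the raw `<`, `>`, and spaces in these URLs, but when you build them in code it is safer to percent-encode the stream.body value. A minimal sketch, assuming the same collection1 endpoint (the helper name is made up):

```python
from urllib.parse import urlencode, urlsplit, parse_qs
from xml.sax.saxutils import escape

# Assumed endpoint: the default collection1 core of the example server.
UPDATE_URL = "http://localhost:8983/solr/collection1/update"


def delete_by_query_url(query):
    """Build a stream.body delete URL with the XML safely percent-encoded."""
    # escape() protects XML special characters inside the query text.
    body = "<delete><query>{}</query></delete>".format(escape(query))
    return UPDATE_URL + "?" + urlencode({"stream.body": body, "commit": "true"})


url = delete_by_query_url("title:abc AND name:zhang")
print(url)
# Decoding the query string gives back the original XML message.
print(parse_qs(urlsplit(url).query)["stream.body"][0])
```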
Querying data
Data is queried with an HTTP GET request. The search keyword is specified with the q parameter, and many optional parameters control what is returned. For example, fl specifies the fields to return: with fl=name, the returned data includes only the contents of the name field.
    http://localhost:8983/solr/collection1/select?q=solr&fl=name&wt=json&indent=true
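A query URL like this one can also be assembled in code. The following is a small sketch; the helper name `select_url` is hypothetical and the endpoint is the one used in this article.

```python
from urllib.parse import urlencode

# Assumed endpoint: the default collection1 core of the example server.
SELECT_URL = "http://localhost:8983/solr/collection1/select"


def select_url(q, fields=None):
    """Build a select URL; fields, if given, becomes the fl parameter."""
    params = {"q": q, "wt": "json", "indent": "true"}
    if fields:
        params["fl"] = ",".join(fields)
    return SELECT_URL + "?" + urlencode(params)


print(select_url("solr", fields=["name"]))
print(select_url("name:search", fields=["id", "name"]))
```

urlencode percent-encodes characters such as the colon in name:search, so fielded queries survive the trip unchanged.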
- Sort
Solr provides sorting through the sort parameter, which supports ascending order, descending order, and sorting on multiple fields:
- q=video&sort=price desc
- q=video&sort=price asc
- q=video&sort=inStock asc, price desc
By default, Solr sorts results by score, a value computed for each matching record based on its relevance.
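When the sort clauses come from code, a small helper can validate and join them. This is only a sketch with an invented helper name; it produces the same "field direction" clauses as the examples above.

```python
from urllib.parse import urlencode


def sort_param(*clauses):
    """Join (field, direction) pairs into a value for the sort parameter."""
    parts = []
    for field, direction in clauses:
        if direction not in ("asc", "desc"):
            raise ValueError("direction must be 'asc' or 'desc'")
        parts.append("{} {}".format(field, direction))
    return ",".join(parts)


params = {"q": "video", "sort": sort_param(("inStock", "asc"), ("price", "desc"))}
print(urlencode(params))
```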
- Highlight
Web search pages often highlight the matched keywords in the results. Solr supports this well; you only need to specify these parameters:
- hl=true # enable highlighting
- hl.fl=name # specify the field to highlight
    http://localhost:8983/solr/collection1/select?q=Search&wt=json&indent=true&hl=true&hl.fl=features
The returned content contains:
    "highlighting": {
      "SOLR1000": {
        "features": [
          "Advanced Full-Text <em>Search</em> Capabilities using Lucene"]}}
Text analysis
Text fields are indexed by splitting the text into words and applying various transformations such as lowercasing, plural removal, and stemming. The fields of the index are defined in the schema.xml file.
By default, a search for "power-shot" does not match "PowerShot". You can make it match by editing the schema.xml file (in the solr/example/solr/collection1/conf directory) and changing the features and text fields to the "text_en_splitting" type:
    <field name="features" type="text_en_splitting" indexed="true" stored="true" multiValued="true"/>
    ...
    <field name="text" type="text_en_splitting" indexed="true" stored="false" multiValued="true"/>
After making the change, restart Solr and re-import the documents:
    java -jar post.jar *.xml
Now the following searches match:
- power-shot -> PowerShot
- features:recharging -> Rechargeable
- 1 gigabyte -> 1GB
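To get a feel for why the first pair matches, here is a toy tokenizer that splits on punctuation and on lower-to-upper case changes and then lowercases. It is a rough approximation written for this article, not the actual analysis chain configured in schema.xml; the real field type does more, which is why this sketch cannot reproduce the other two matches.

```python
import re


def toy_tokens(text):
    """Split on punctuation and lower-to-upper case changes, then lowercase."""
    # Insert a space at each lower-to-upper transition: "PowerShot" -> "Power Shot".
    spaced = re.sub(r"(?<=[a-z])(?=[A-Z])", " ", text)
    return [token.lower() for token in re.findall(r"[A-Za-z0-9]+", spaced)]


print(toy_tokens("power-shot"))  # ['power', 'shot']
print(toy_tokens("PowerShot"))   # ['power', 'shot']
```

Because both spellings end up as the same tokens, a query for one can find a document indexed with the other.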
Summary
As an introductory article, this one does not introduce many concepts. It walks from installation and startup through updating documents to give a first, hands-on impression of Solr. The next article introduces the fundamentals of full-text retrieval.
Solr Learning (1): Getting Started