Apache Solr Beginner's Tutorial (introductory tour)
Preface: This article covers the basics of Solr from start to finish. Please read it through; it should give you a clear and comprehensive understanding of how to use Solr.
In this Apache Solr tutorial for beginners, we will discuss how to install the latest version of Apache Solr and show you how to configure it. We will also show you how to index a sample data file using Solr. Apache Solr supports indexing from a variety of sources, including databases, PDF files, XML files, CSV files and more. In this example, we will look at how to index data from a CSV file.
Our preferred environment for this example is Windows. Before starting the Solr installation, make sure that the JDK is installed and that JAVA_HOME is set correctly.
1. Why Choose Apache Solr
Apache Solr is a powerful search server that exposes a REST-style API. Solr is built on Lucene, which supports powerful matching capabilities such as phrases, wildcards, joins and grouping across many data types. It is optimized for high traffic using Apache ZooKeeper. Apache Solr offers a wide range of features; some of the most important are listed below.
- Advanced Full-Text search functionality.
- Open-standards interfaces based on XML, JSON and HTTP.
- Highly scalable and fault tolerant.
- Both schema and schemaless configurations are supported.
- Faceted search and filtering.
- Support for English, German, Chinese, Japanese, French and many other major languages.
- Rich document analysis.
2. Installing Apache Solr
To begin with, let's download the latest version of Apache Solr from the following location:
http://lucene.apache.org/solr/downloads.html
At the time of this writing, the stable version available is 5.0.0. Apache Solr changed considerably between 4.x and 5.0.0, so if you have a different version of Solr, you need to download a 5.x version to follow this example.
Once the Solr zip file is downloaded, unzip it to a folder. The extracted folder looks like the following.
Figure: Solr folders
The bin folder contains the scripts to start and stop the server. The example folder contains several sample files. We will use one of these to illustrate how Solr indexes data. The server folder contains the logs folder, where all Solr logs are written. It is useful to check these logs for errors during indexing. The solr folder under the server folder holds the different collections or cores. The configuration and data for each core or collection are stored in its own folder.
Apache Solr comes with a built-in Jetty server. But before we start the Solr instance, we have to verify that JAVA_HOME is set on the machine.
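The JAVA_HOME check can also be scripted. The following is a minimal sketch, not part of Solr itself: the function name find_java is made up for this example, and it only assumes that a valid JAVA_HOME contains a bin folder with a java (or java.exe) executable.

```python
import os
from pathlib import Path


def find_java(env):
    """Return the java executable found under JAVA_HOME, or None."""
    java_home = env.get("JAVA_HOME")
    if not java_home:
        return None
    bin_dir = Path(java_home) / "bin"
    # On Windows the executable is java.exe; elsewhere it is plain java.
    for name in ("java.exe", "java"):
        candidate = bin_dir / name
        if candidate.is_file():
            return candidate
    return None


java = find_java(os.environ)
if java is None:
    print("JAVA_HOME is missing or does not point at a JDK/JRE")
else:
    print("Found Java at", java)
```

Passing the environment in as a dictionary keeps the function easy to try out with different values without touching the real environment.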
We can start the server with a command-line script. Let's go to Solr's bin directory and enter the following command at the command prompt:
solr start
This will start the Solr server on the default port, 8983.
Now we can open the following URL in the browser and verify that our Solr instance is running. The details of Solr's admin tool are beyond the scope of this example.
http://localhost:8983/solr/
Figure: Solr admin console
3. Configuring Apache Solr
In this section, we will show you how to configure a core/collection for a Solr instance and how to define its fields. Apache Solr comes with an option called schemaless mode. This option lets users build a working schema without manually editing the schema file. However, in this example we will configure the schema by hand in order to understand the internals of Solr.
3.1 Building the Core
When the Solr server is started in standalone mode, the configuration is called a core; when it is started in SolrCloud mode, the configuration is called a collection. In this example, we will discuss the standalone server and the core. We will set SolrCloud aside for now and discuss it at a later time.
First, we need to create a core for indexing the data. The Solr create command has the following options:
- -c <name> - Name of the core or collection to create (required).
- -d <confdir> - The configuration directory, useful in SolrCloud mode.
- -n <configName> - The configuration name. This defaults to the same name as the core or collection.
- -p <port> - Port of the local Solr instance to send the create command to; by default the script tries to detect the port by looking for running Solr instances.
- -s <shards> - Number of shards to split a collection into; the default is 1.
- -rf <replicas> - Number of copies of each document in the collection. The default is 1.
In this example, we will use the -c parameter for the core name and the -d parameter for the configuration directory. For all other parameters we use the default settings.
Now navigate to the solr-5.0.0\bin folder in the command window and issue the following command.
solr create -c jcg -d basic_configs
We can see the output below in the command window.
Creating new core 'jcg' using command:
http://localhost:8983/solr/admin/cores?action=CREATE&name=jcg&instanceDir=jcg
{
  "responseHeader":{
    "status":0,
    "QTime":663},
  "core":"jcg"}
Now we navigate to the following URL, where the jcg core appears in the core selector. You can also see the statistics of the core.
http://localhost:8983/solr
Figure: Solr jcg core
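The create command prints both the core admin URL it calls and the JSON it gets back. As a rough sketch, assuming the response always has the shape shown in the output above, the URL can be rebuilt and the response checked in a few lines of Python. The function names here are made up for the example, and this is not a replacement for the create script, which may do more than issue this one request.

```python
import json
from urllib.parse import urlencode


def core_create_url(name, base="http://localhost:8983/solr"):
    """Rebuild the core admin CREATE URL reported by the create command."""
    params = {"action": "CREATE", "name": name, "instanceDir": name}
    return f"{base}/admin/cores?{urlencode(params)}"


def create_succeeded(response_text):
    """Treat a status of 0 in responseHeader as success."""
    body = json.loads(response_text)
    return body["responseHeader"]["status"] == 0


# Sample text copied from the command output shown earlier.
sample = '{"responseHeader": {"status": 0, "QTime": 663}, "core": "jcg"}'
print(core_create_url("jcg"))
print(create_succeeded(sample))
```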
3.2 Modifying the schema.xml file
We need to modify the schema.xml file under the folder server\solr\jcg\conf so that it includes our fields. For indexing we will use "books.csv", one of the sample files shipped with the Solr installation. The file is located in the folder solr-5.0.0\example\exampledocs.
Now we navigate to the folder server\solr. You will see that a folder named jcg has been created. Its subfolders conf and data hold the core's configuration and indexed data respectively.
Now edit the \server\solr\jcg\conf\schema.xml file, locate the uniqueKey element and add the following field definitions after it.
schema.xml
<uniqueKey>id</uniqueKey>
<!-- Fields added for books.csv load-->
<field name="cat" type="text_general" indexed="true" stored="true"/>
<field name="name" type="text_general" indexed="true" stored="true"/>
<field name="price" type="tdouble" indexed="true" stored="true"/>
<field name="inStock" type="boolean" indexed="true" stored="true"/>
<field name="author" type="text_general" indexed="true" stored="true"/>
We have set the attribute indexed to true. This specifies that the field is indexed, so records can be found by searching on it. Setting the value to false means the field is only stored and cannot be queried.
Also note the second attribute, stored, which we have set to true. This specifies that the field is stored and can be returned in the output. Setting it to false means the field is only indexed and cannot be retrieved in the output.
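To make the two attributes concrete, here is a small sketch that reads field definitions and reports which fields can be searched and which can be returned. The wrapping fields element and the notes field are hypothetical and exist only so the fragment parses and shows a stored="false" case; they are not part of the schema used in this tutorial.

```python
import xml.etree.ElementTree as ET

# Hypothetical fragment: "notes" is invented to show an indexed-only field.
SCHEMA_FRAGMENT = """
<fields>
  <field name="cat" type="text_general" indexed="true" stored="true"/>
  <field name="price" type="tdouble" indexed="true" stored="true"/>
  <field name="notes" type="text_general" indexed="true" stored="false"/>
</fields>
"""


def field_capabilities(xml_text):
    """Map each field name to a (searchable, returnable) pair of booleans."""
    result = {}
    for field in ET.fromstring(xml_text).iter("field"):
        searchable = field.get("indexed") == "true"
        returnable = field.get("stored") == "true"
        result[field.get("name")] = (searchable, returnable)
    return result


for name, (searchable, returnable) in field_capabilities(SCHEMA_FRAGMENT).items():
    print(f"{name}: searchable={searchable}, returnable={returnable}")
```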
Here we have assigned types to the fields present in the "books.csv" file. The first field in the CSV file, "id", is handled automatically by the uniqueKey element of the schema.xml file. Notice that we have skipped the fields series_t, sequence_i and genre_s without adding any entry for them. Yet when we run the indexing, these fields are indexed without any problem. If you wonder how that happens, take a closer look at the dynamicField section of the schema.xml file.
schema.xml
<dynamicField name="*_i" type="int" indexed="true" stored="true"/>
<dynamicField name="*_is" type="ints" indexed="true" stored="true"/>
<dynamicField name="*_s" type="string" indexed="true" stored="true" />
<dynamicField name="*_ss" type="strings" indexed="true" stored="true"/>
<dynamicField name="*_l" type="long" indexed="true" stored="true"/>
<dynamicField name="*_ls" type="longs" indexed="true" stored="true"/>
<dynamicField name="*_t" type="text_general" indexed="true" stored="true"/>
<dynamicField name="*_txt" type="text_general" indexed="true" stored="true"/>
<dynamicField name="*_b" type="boolean" indexed="true" stored="true"/>
<dynamicField name="*_bs" type="booleans" indexed="true" stored="true"/>
<dynamicField name="*_f" type="float" indexed="true" stored="true"/>
<dynamicField name="*_fs" type="floats" indexed="true" stored="true"/>
<dynamicField name="*_d" type="double" indexed="true" stored="true"/>
<dynamicField name="*_ds" type="doubles" indexed="true" stored="true"/>
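The way these patterns pick up the three undeclared CSV fields can be imitated with simple wildcard matching. This is only an illustration of the idea, not Solr's actual lookup code. Preferring the longest pattern when several match is an assumption of this sketch; with the suffix patterns listed above, at most one pattern matches any given name anyway.

```python
from fnmatch import fnmatchcase

# Pattern-to-type pairs copied from the dynamicField entries above.
DYNAMIC_FIELDS = {
    "*_i": "int", "*_is": "ints",
    "*_s": "string", "*_ss": "strings",
    "*_l": "long", "*_ls": "longs",
    "*_t": "text_general", "*_txt": "text_general",
    "*_b": "boolean", "*_bs": "booleans",
    "*_f": "float", "*_fs": "floats",
    "*_d": "double", "*_ds": "doubles",
}


def resolve_dynamic_type(field_name, patterns=DYNAMIC_FIELDS):
    """Return the type of the dynamic field pattern matching field_name, or None."""
    matches = [p for p in patterns if fnmatchcase(field_name, p)]
    if not matches:
        return None
    # Assumption of this sketch: the longest matching pattern wins.
    return patterns[max(matches, key=len)]


for name in ("series_t", "sequence_i", "genre_s"):
    print(name, "->", resolve_dynamic_type(name))
```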
Since we have modified the configuration, we must stop and restart the server. To do so, we issue the following command from the bin directory at the command line.
solr stop -all
The server is now stopped. To start it again, issue the following command from the bin directory.
solr start
4. Indexing the data
Apache Solr comes with a standalone Java program called SimplePostTool. This program is packaged as a JAR and can be found in the installation folder example\exampledocs.
Now we navigate to the example\exampledocs folder at the command prompt and type the following command. You will see the list of options for using the tool.
java -jar post.jar -h
The general usage format is as follows:
Usage: java [SystemProperties] -jar post.jar [-h|-] [<file|folder|url|arg>
[<file|folder|url|arg>...]]
As we said earlier, we will index the data in the "books.csv" file. We navigate to solr-5.0.0\example\exampledocs at the command prompt and issue the following command.
java -Dtype=text/csv -Durl=http://localhost:8983/solr/jcg/update -jar post.jar books.csv
The SystemProperties used here are:
- -Dtype - the type of the data file.
- -Durl - the URL of the jcg core.
The file "books.csv" will now be indexed, and the command prompt will display the following output.
SimplePostTool version 5.0.0
Posting files to [base] url http://localhost:8983/solr/jcg/update using content-type text/csv...
POSTing file books.csv to [base]
1 files indexed.
COMMITting Solr index changes to http://localhost:8983/solr/jcg/update...
Time spent: 0:00:00.647
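What the tool does is, in essence, an HTTP POST of the CSV data to the core's update URL. The sketch below only prepares such a request with Python's standard library and does not send it. The function name and the two-line CSV payload are invented for the example, and adding commit=true to the URL is one way to ask for a commit, which may differ from how post.jar itself commits.

```python
from urllib.request import Request


def build_csv_update_request(csv_bytes, core="jcg", base="http://localhost:8983/solr"):
    """Prepare (but do not send) a POST of CSV data to the core's update URL."""
    # commit=true asks Solr to commit the change as part of this request.
    url = f"{base}/{core}/update?commit=true"
    return Request(url, data=csv_bytes, headers={"Content-Type": "text/csv"}, method="POST")


# Hypothetical payload; the real books.csv has more columns and rows.
request = build_csv_update_request(b"id,name\nbook-1,Sample title\n")
print(request.get_method(), request.full_url)
# To send it to a running server: urllib.request.urlopen(request)
```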
Now we navigate to the following URL and select the core jcg.
http://localhost:8983/solr
Figure: Solr jcg core data
Take a closer look at the Statistics section; the Num Docs parameter shows the number of rows indexed.
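The same count can be read over HTTP by running a match-all query that returns no rows. The sketch below builds such a URL and reads numFound from a JSON response. The sample response is hand-written for illustration, including its count of 10, and the function names are made up for this example.

```python
import json
from urllib.parse import urlencode


def count_url(core="jcg", base="http://localhost:8983/solr"):
    """URL of a match-all query that returns only the hit count."""
    params = {"q": "*:*", "rows": 0, "wt": "json"}
    return f"{base}/{core}/select?{urlencode(params)}"


def num_found(response_text):
    """Extract the number of matching documents from a select response."""
    return json.loads(response_text)["response"]["numFound"]


# Hand-written sample response, not captured from a real server.
sample_response = '{"responseHeader": {"status": 0}, "response": {"numFound": 10, "start": 0, "docs": []}}'
print(count_url())
print(num_found(sample_response))
```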
5. Accessing the indexed documents
Apache Solr provides a REST-based API for accessing the data, along with various parameters for retrieving it. We will show you some scenario-based queries.
5.1 Search by name
We will retrieve the details of a book by its name. To do so, we use the following syntax. The parameter "q" in the URL carries the query.
Open the following URL in a browser.
http://localhost:8983/solr/jcg/select?q=name:"A Clash of Kings"
The output will be as shown below.
Figure: Solr search by name
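When such a query is issued from code rather than typed into a browser, the quotes, colon and spaces have to be URL-encoded. A minimal sketch using only the standard library follows; the function name is made up, and the wt=json parameter is added here to ask for JSON output.

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def phrase_query_url(field, phrase, core="jcg", base="http://localhost:8983/solr"):
    """Build a select URL that searches one field for an exact phrase."""
    # Escape backslashes and double quotes so the phrase stays one quoted unit.
    escaped = phrase.replace("\\", "\\\\").replace('"', '\\"')
    params = {"q": f'{field}:"{escaped}"', "wt": "json"}
    return f"{base}/{core}/select?{urlencode(params)}"


url = phrase_query_url("name", "A Clash of Kings")
print(url)
# Decoding the query string gives back the readable form of the query.
print(parse_qs(urlsplit(url).query)["q"][0])
```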
5.2 Search by starting letter
Now we will show how to search for a record when we know only the starting letter or word and do not remember the full title. We can use the following query to retrieve the results.
http://localhost:8983/solr/jcg/select?q=name:"A"
The output will list all the books starting with the letter A.
Figure: Solr search by starting letter
5.3 Search using wildcards
Solr supports wildcard search. The following query shows how to retrieve all books whose name contains the word "of".
http://localhost:8983/solr/jcg/select?q=name:"*of"
Figure: Solr wildcard search
5.4 Search using a condition
Solr supports conditional search. The "fq" parameter lets us set a condition on our query. The following query shows how to find books priced at $6 or less. Note that the TO keyword in the range must be written in upper case.
http://localhost:8983/solr/jcg/select?q=*&fq=price:[0 TO 6]
The output will list only the books priced at $6 or less.
Figure: Solr search with a condition
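A filter like the one above can be assembled programmatically. The sketch below builds the inclusive range filter and shows that fq may appear more than once in a URL. The helper names are made up, the explicit match-all form *:* is used for q, and the second filter on inStock is only an illustration based on the field defined earlier.

```python
from urllib.parse import urlencode


def price_filter(low, high):
    """Inclusive range filter; the TO keyword must be upper case."""
    return f"price:[{low} TO {high}]"


def filtered_select_url(filters, core="jcg", base="http://localhost:8983/solr"):
    """Build a match-all select URL restricted by the given filter queries."""
    # A list of pairs lets the fq parameter appear once per filter.
    params = [("q", "*:*")] + [("fq", f) for f in filters] + [("wt", "json")]
    return f"{base}/{core}/select?{urlencode(params)}"


url = filtered_select_url([price_filter(0, 6), "inStock:true"])
print(url)
```

Keeping each condition in its own fq parameter keeps the conditions separate from the main query and easy to add or remove one at a time.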
6. Solr Client APIs
Various client APIs are available for connecting to the Solr server. Some of the widely used ones are listed below.
- SolRuby - to connect from Ruby
- SolPHP - to connect from PHP
- PySolr - to connect from Python
- SolPerl - to connect from Perl
- SolrJ - to connect from Java
- SolrSharp - to connect from C#
In addition, Solr provides a REST-based API that JavaScript can use directly.
PS: Because the article is long, some passages were translated with translation software and then revised by hand; this should not affect the content.
Reference: https://examples.javacodegeeks.com/enterprise-java/apache-solr/apache-solr-tutorial-beginners/