Solr in Action, Chapter 1 (Sections 1.4 and 1.5)

1.4 Feature overview

Finally, let's take a quick look at Solr's main features, organized into the following categories:

· User Experience

· Data Modeling

· New Features of Solr 4

Providing a good search experience for your users is a theme that runs throughout this book, so let's start with user experience and look at how Solr helps keep your users happy.

1.4.1 User experience features

Solr provides a number of important features that help you build an easy-to-use, intuitive, and powerful search solution. Note, however, that Solr exposes only a RESTful HTTP API and does not provide UI components or frameworks for building a search interface. You will need to roll up your sleeves and write your own search UI, which can take full advantage of the user experience features listed below:

· Paging and sorting

· Faceting (category search)

· Autocomplete (query suggestions)

· Spell checking

· Hit highlighting

· Geospatial search

Paging and sorting

Rather than returning every document that matches a query, Solr is optimized for paged requests: when the first page of results is requested, only the top N documents are returned. If users do not find what they are looking for on the first page, you can request subsequent pages with simple request parameters. Paging helps with two key outcomes: 1) results come back faster, because each request returns only a small subset of the full result set; and 2) you can track how many queries lead to requests for additional pages, which may indicate a problem with your relevance scoring. You will learn more about paging and sorting in chapter 7.
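As a rough illustration of the paging parameters described above, the following Python sketch turns a page number into Solr's standard `start`, `rows`, and `sort` request parameters. The core name `realestate`, the field names, and the local URL are assumptions made up for this example; they are not taken from the book.

```python
from urllib.parse import urlencode


def page_params(query, page, page_size=10, sort=None):
    """Build Solr request parameters for one page of results (page is 1-based)."""
    if page < 1:
        raise ValueError("page must be >= 1")
    params = {
        "q": query,
        "start": (page - 1) * page_size,  # offset of the first document to return
        "rows": page_size,                # number of documents per page
    }
    if sort:
        params["sort"] = sort             # for example "price asc"
    return params


# Hypothetical core and fields: request the second page of ten listings.
url = "http://localhost:8983/solr/realestate/select?" + urlencode(
    page_params("city:denver", page=2, sort="price asc")
)
```

Logging the `start` value of each request is one simple way to measure how often users go past the first page.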

 

Faceting (category search)

Faceting groups search results into categories based on their attributes, giving users a tool for progressively refining their search criteria and browsing the results. In our real estate search example (figure 1.1), the results of a basic keyword search are organized into three facets: home features, home type, and listing type. Faceting is one of Solr's most popular and powerful features, and we discuss it in detail in chapter 8.
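To make this more concrete, here is a small Python sketch of a facet request and of reading facet counts from a JSON response. `facet=true` and `facet.field` are standard Solr parameters, and by default Solr's JSON output lists facet counts as a flat list of alternating values and counts. The field names and the sample counts are invented for illustration.

```python
from urllib.parse import urlencode


def facet_params(query, facet_fields, rows=10):
    """Build request parameters that ask Solr for facet counts on several fields."""
    params = [("q", query), ("rows", rows), ("facet", "true")]
    # facet.field may be repeated, so a list of pairs is used instead of a dict.
    params += [("facet.field", field) for field in facet_fields]
    return params


def facet_counts(flat_list):
    """Convert Solr's flat [value, count, value, count, ...] list into a dict."""
    return dict(zip(flat_list[::2], flat_list[1::2]))


# Hypothetical field names for the real estate example.
query_string = urlencode(facet_params("kitchen", ["home_type", "listing_type"]))

# Hand-written sample of one facet_fields entry, not real Solr output.
home_types = facet_counts(["single_family", 42, "condo", 17, "townhouse", 5])
```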

 

Autocomplete

Most users expect your search application to return the right results even when the information they enter is incomplete. Autocomplete fills in query terms based on the content of the documents in the index: after the user types just a few characters, Solr returns a list of suggested query terms that match them. This greatly reduces the chance of users entering incorrect query terms, especially now that many users search from mobile devices with small keyboards.

Autocomplete gives users examples of query terms they could use. Returning to our real estate example, when a user types "hig", Solr's autocomplete feature can suggest terms such as Highlands Neighborhood or Highlands Ranch. We cover autocomplete in detail in chapter 10.

 

Spell checking

In this age of mobile devices and people on the go, spell checking is particularly important. When users enter a misspelled query term, they still expect the search engine to handle the small mistake gracefully and return the correct results. Solr supports two basic spell-checking modes:

Auto-correct: Solr can correct a misspelled term automatically, based on whether the term exists in the index.

"Did you mean ...": Solr can suggest a better query based on what the user entered. For example, when a user enters "hilands", Solr can ask, "Did you mean highlands?"

Spell checking was greatly improved in Solr 4, and we discuss it in detail in chapter 10.

 

Hit highlighting

When you search documents that contain a large amount of text, Solr's hit highlighting feature marks the parts of each document that matched the query. This is especially useful for long documents, because users can quickly find the matching passages within the text. We discuss hit highlighting in detail in chapter 9.
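The sketch below shows one way a client might use highlighting, assuming a JSON response. `hl=true` and `hl.fl` are standard Solr parameters, and the response carries a `highlighting` section keyed by document ID. The `description` field, the document IDs, and the sample response are hypothetical and written by hand.

```python
# Request parameters that turn on highlighting for one (hypothetical) field.
HL_PARAMS = {"q": "description:fireplace", "hl": "true", "hl.fl": "description"}


def attach_snippets(response, field, id_field="id"):
    """Pair each returned document ID with its highlighted snippets for one field."""
    highlighting = response.get("highlighting", {})
    paired = []
    for doc in response["response"]["docs"]:
        doc_id = doc[id_field]
        # A document without highlights for the field yields an empty list.
        snippets = highlighting.get(doc_id, {}).get(field, [])
        paired.append((doc_id, snippets))
    return paired


# Hand-written sample shaped like a Solr JSON response, not real output.
sample_response = {
    "response": {"docs": [{"id": "house-1"}, {"id": "house-2"}]},
    "highlighting": {
        "house-1": {"description": ["Cozy den with a <em>fireplace</em>"]},
        "house-2": {},
    },
}
```

The search UI would then render the snippets next to each result instead of the full document text.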

 

Geospatial search

Geospatial search is a standout feature of Solr 4. Solr 4 can index latitude and longitude values and sort documents by geographic distance: it can find matching documents and order them by their distance from a given point (a specific latitude and longitude). In our real estate example, matching listings can be displayed on an interactive map, and as the user zooms and pans the map, the results are sorted by distance from the center of the map.

Another exciting addition in Solr 4 is the ability to draw geometric shapes, such as polygons, on a map and to query locations based on how those shapes intersect. This is useful when you want to restrict a home search to the boundaries of a specific neighborhood. We discuss Solr's geospatial query features in chapter 14.
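As a minimal sketch of distance-based filtering and sorting, the following Python function builds the request parameters for Solr's `geofilt` filter and `geodist()` sort function. The field name `location` and the coordinates are assumptions for this example, and the field would need to be indexed as a spatial type in your schema.

```python
def nearby_params(lat, lon, radius_km, field="location", query="*:*"):
    """Build parameters that keep documents within radius_km of a point,
    sorted from nearest to farthest."""
    return {
        "q": query,
        "fq": "{!geofilt}",        # filter by distance from the point
        "sfield": field,           # spatial field to measure against
        "pt": f"{lat},{lon}",      # center point as "lat,lon"
        "d": radius_km,            # radius in kilometers
        "sort": "geodist() asc",   # nearest results first
    }


# Hypothetical map center: listings within 5 km of a point in Denver.
params = nearby_params(39.74, -104.99, 5)
```

When the user pans the map, the client would call the function again with the new center point.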

 

 

1.4.2 Data modeling features

As we discussed in section 1.1, Solr is optimized for specific types of data. In this section, we list the key features you are likely to use when modeling your data for search, including:

· Field collapsing and grouping

· Flexible query support

· Joins

· Document clustering

· Importing rich document formats such as PDF and Word

· Importing data from relational databases

· Multilingual support

 

·Field collapsing and grouping

Although Solr requires documents to be as flat and denormalized as possible, it still lets you treat multiple documents as a group based on a common property. Field grouping, also known as field collapsing, lets you return groups of related documents in the results instead of individual documents.

A classic example of field grouping is a mailing list: all emails that match the query and belong to the same thread can be collapsed into one group, starting with the message that began the conversation, and returned to the user together. You will learn more about field grouping in chapter 2.

 

·Powerful and flexible query support

Solr provides a number of powerful query features, including:

· Conditional logic using AND, OR, and NOT

· Wildcard matching

· Range queries on dates and numbers

· Phrase queries with slop, which allow some distance between terms

· Fuzzy string matching

· Regular expression matching

· Function queries

If some of these terms are unfamiliar, don't worry; we discuss them in chapter 7.
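For orientation, the list above can be mapped to query strings roughly as follows. The syntax is that of Solr's standard query parser, while the field names (`features`, `home_type`, `neighborhood`, `price`, `description`, `sqft`) are hypothetical and chosen to match the real estate example.

```python
# One illustrative query string per feature in the list above.
EXAMPLES = {
    "boolean":   "features:pool AND NOT home_type:condo",
    "wildcard":  "neighborhood:high*",
    "range":     "price:[200000 TO 400000]",     # inclusive numeric range
    "proximity": 'description:"lake view"~3',    # terms at most 3 positions apart
    "fuzzy":     "neighborhood:hilands~1",       # within one edit of the term
    "regex":     "neighborhood:/high.*s/",
    "function":  "{!func}div(price,sqft)",       # score by price per square foot
}


def describe(examples):
    """Format the examples as aligned 'feature: query' lines for display."""
    width = max(len(name) for name in examples)
    return [f"{name.ljust(width)}: {query}" for name, query in examples.items()]
```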

·Joins

In SQL, you use a JOIN to combine data from two or more tables through a common attribute known as a foreign key. In Solr, joins behave more like SQL subqueries: you do not create new documents by combining data from different documents. For example, with Solr's join feature you can return child documents whose parent documents match the query. Joins are useful when, for instance, you need to retrieve all the comments on a tweet or a Weibo post, where each comment is a child document of the original post. We discuss joins in detail in chapter 14.
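The comment example could be expressed with Solr's `{!join}` query parser roughly as shown below. The inner query selects the parent posts, their `from` field values are collected, and the documents whose `to` field holds one of those values are returned. The field names `id`, `parent_id`, and `text` are assumptions for this sketch.

```python
def join_query(from_field, to_field, inner_query):
    """Build a join query: run inner_query, collect the from_field values of
    its matches, and return documents whose to_field holds one of them."""
    return f"{{!join from={from_field} to={to_field}}}{inner_query}"


# Return comments (which store their post's id in parent_id) for every
# post whose text mentions "solr".
comments_of_matching_posts = join_query("id", "parent_id", "text:solr")
```

Note that only the comment documents come back; no fields from the parent post are merged into them, which is why the behavior resembles a subquery more than a SQL JOIN.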

 

·Document clustering

Document clustering lets you group similar documents based on the terms that describe each document.

This helps you avoid returning many near-identical documents in the results. For example, if your search engine powers a news application that pulls articles from multiple RSS feeds, you will often receive many reports about the same story at the same time. Returning all of these similar reports to the user is clearly not ideal. Instead, you can use clustering to place the similar reports into one group and return a single representative report to the user. We discuss clustering in detail in chapter 17.

 

·Importing rich document formats such as PDF and Word

In some cases, you may need to make existing documents in common formats, such as PDF and Microsoft Word, searchable. This is easy to do with Solr, because Solr integrates directly with the Apache Tika project, which supports almost all popular document formats. Importing rich documents is covered in chapter 12.

·Importing data from relational databases

If the data you want to search is stored in a traditional relational database, you can configure Solr to create documents from the results of SQL queries. In chapter 12, we discuss how to use Solr's Data Import Handler (DIH).

·Multilingual support

Solr and Lucene have a long history of supporting multiple languages. Solr has built-in automatic language detection and provides language-specific text analysis solutions. We will learn more about this later in the book.

1.4.3 New features in Solr 4

Before we end this chapter, let's look at the new features that Solr 4 brings. Overall, version 4 is a major milestone for the Apache Solr community: it addresses many of the pain points and inconveniences that users ran into over years of working with Solr. We have selected a few highlights here, but the new features of Solr 4 are covered throughout the chapters of this book.

· Near real-time search

· Atomic updates with optimistic concurrency

· Real-time get

· Durable writes using a transaction log

· Easy sharding and replication using ZooKeeper

 

·Near real-time search

Solr's near real-time (NRT) search feature lets applications find newly added documents within seconds of their being indexed. With NRT, Solr can handle scenarios where content changes quickly, such as breaking news or social networks. We discuss NRT in detail in chapter 13.

·Atomic updates with optimistic concurrency

Atomic updates let a client application add, update, or delete fields of an existing document without sending the entire document to Solr. For example, if the price of a home changes in the real estate example from section 1.2, you can send an atomic update to Solr that changes only the price field, without resending the whole listing.

You may wonder what happens when two different clients try to update the same document at the same time. In that case, Solr uses optimistic concurrency to prevent conflicting updates. In short, Solr uses a special field named _version_ to make document updates safe. When two users attempt to update the same document concurrently, the one who submits last will be working from a stale version, so that update request fails. The details of atomic updates and optimistic concurrency are discussed in chapter 12.
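A minimal sketch of what such an update could look like as a JSON request body is shown below. The `set` modifier and the `_version_` field are part of Solr's atomic update support; the document ID, the `price` field, and the version number are made up for this example. When a `_version_` greater than 1 is supplied, Solr applies the update only if it matches the stored version, and otherwise rejects the request with a version conflict.

```python
import json


def price_update(doc_id, new_price, expected_version=None):
    """Build a JSON body for an atomic update that changes only the price."""
    doc = {"id": doc_id, "price": {"set": new_price}}
    if expected_version is not None:
        # Optimistic concurrency: apply the update only if the stored
        # _version_ of the document still equals this value.
        doc["_version_"] = expected_version
    return json.dumps([doc])


# Hypothetical listing ID and version number, for illustration only.
body = price_update("house-1", 255000, expected_version=1447054372085923840)
```

A client that receives a conflict would typically re-read the document, reapply its change, and retry with the new version.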

·Real-time get

At the beginning of this chapter, we noted that Solr is also a NoSQL technology. Real-time get fits the typical NoSQL approach: it lets you retrieve the latest version of a document by its unique identifier, regardless of whether that version has been committed to the index. This is similar to how a key-value store such as Cassandra lets you fetch the latest data for a given key.

Before Solr 4, a document had to be committed to the Lucene index before it could be retrieved. With real-time get in Solr 4, retrieving a document by its unique identifier is safely decoupled from committing it to the Lucene index. This is useful when you need to update a document after it has been indexed but before it has been committed: you do not have to issue a commit just to read it back. As we will see later in the book, commits are expensive and can affect query performance.

·Durable writes using a transaction log

When a document is sent to Solr for indexing, it is written to a transaction log so that it is not lost if the server fails. Solr's transaction log sits between the client application sending a document and the document being committed to the Lucene index. It also plays a part in real-time get, because a document can be retrieved by its unique identifier whether or not it has been committed to the Lucene index.

The transaction log lets Solr separate the durability of an update from its visibility. A document can therefore be safely stored on disk without yet being visible in search results. Your application has flexible control over when new documents are committed to the index and become searchable, and you do not need to worry about losing them if the server crashes before the commit. In chapter 5, we discuss durable writes and commit strategies.
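The separation between durability and visibility can be pictured with a deliberately simplified toy model in Python. This is not how Solr is implemented; the class and method names are invented, and real transaction logs are written to disk. It only mirrors the behavior described above: an uncommitted document can be fetched by ID but does not appear in search results until a commit.

```python
class ToyIndex:
    """Toy model of pending (logged) updates versus committed, searchable ones."""

    def __init__(self):
        self.pending = {}    # stands in for the transaction log
        self.committed = {}  # stands in for the searchable index

    def add(self, doc):
        """Record an update; it is retrievable by ID but not yet searchable."""
        self.pending[doc["id"]] = doc

    def realtime_get(self, doc_id):
        """Return the latest version of a document, committed or not."""
        if doc_id in self.pending:
            return self.pending[doc_id]
        return self.committed.get(doc_id)

    def search(self, predicate):
        """Search sees only committed documents."""
        return [doc for doc in self.committed.values() if predicate(doc)]

    def commit(self):
        """Make all pending updates visible to search."""
        self.committed.update(self.pending)
        self.pending.clear()
```

For example, after `add({"id": "house-1", "price": 250000})`, a call to `realtime_get("house-1")` returns the document, while `search(...)` finds it only after `commit()`.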

·Easy sharding and replication using ZooKeeper

If you have never used Solr before, you may not appreciate how cumbersome it was to scale earlier versions of Solr horizontally. With SolrCloud, horizontal scaling becomes simple and automated, because Solr uses Apache ZooKeeper to distribute configuration and to manage shard leaders and replicas. The Apache website describes ZooKeeper as "a centralized service for maintaining configuration information, naming, providing distributed synchronization, and providing group services."

In Solr, ZooKeeper is responsible for assigning shard leaders and replicas and for tracking which servers are available to respond to requests. SolrCloud comes bundled with ZooKeeper, so you can start SolrCloud without additional configuration. We discuss SolrCloud in detail in chapter 16.

1.5 Conclusion

We hope you now have a good sense of the typical use cases for Solr and the types of data it supports. As you learned in section 1.1, Solr is optimized for data with four characteristics: it is text-centric, read far more often than it is written, document-oriented, and flexible in its schema. We also learned that a search engine like Solr is not a general-purpose data storage and processing solution; it is designed to handle keyword searches, rank the results, and help users browse the results to find the information that best meets their needs. Using a fictitious real estate search application, we saw how Solr builds on Lucene, adding index configuration and management along with web services based on HTTP, XML, and JSON. Through sharding and replication, Solr 4 can scale to handle large data volumes and high query concurrency, and it has no single point of failure.

We also discussed some key benefits of choosing Solr from the perspective of different roles in a company, and saw how Solr answers the questions that software architects, system administrators, and even the CEO might raise. Finally, we gave a brief overview of Solr's main features, with pointers to the chapters where you can learn more about the ones that interest you.

We hope you are excited to continue learning about Solr. Now it is time to download the latest version of Solr and run it on your local machine, which is what we will do together in the next chapter.
