[Open search product evaluation] Easily customize your own search engine within seven days

Source: Internet
Author: User


[Background]

I believe many people have needed to add a search function to a website or app. The usual route is to wrestle with Lucene for a long time: if you want Chinese word segmentation (for example, the segmenter from the Chinese Academy of Sciences), you have to rewrite the tokenizer, build the indexes yourself, and update the data in real time. If the data volume is large, you also need to work out how to split the data, set up the environment, and handle other issues. I first came into contact with opensearch at the beginning of 2014. At that time I had to build heap search (searching for heaps, searching for posts, searching for posts within a heap, and searching for members within a heap), and it took less than one week to go from getting started to using opensearch proficiently. I feel this fits the pace of Internet startups well: it is simple and convenient, and it let me implement my own search interface quickly.

 

[Usage process]

Take building heap search as an example (if you do not know what a heap is, think of it as something similar to Baidu Tieba): searching for "Zhao Wei" should return the heaps related to Zhao Wei. Let me briefly explain how to quickly build a search interface using opensearch.

1. Register an opensearch account and follow the on-screen instructions.

2. Create an application and define the index structure. For example, each field can be given the following attributes:

Searchable: the fields to be indexed, such as the name, Pinyin name, and tags.
Aggregatable: not used here.
Filterable: for example, the checkin_type field indicates that some heaps are private and some are not, so checkin_type needs to be filterable. You can then write a filter statement that selects which matching search results are retained.
Displayable: the fields that the search interface returns to the client.
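To make the field attributes in step 2 concrete, here is a minimal in-memory sketch. It does not use the opensearch API; the `Field` class, the `SCHEMA` list, the `search` helper, and the meaning of the `checkin_type` values (0 for public, 1 for private) are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Field:
    name: str
    searchable: bool = False   # indexed for text matching
    filterable: bool = False   # may appear in filter conditions
    displayable: bool = False  # returned to the client


# Hypothetical schema for the heap application.
SCHEMA = [
    Field("id", displayable=True),
    Field("name", searchable=True, displayable=True),
    Field("pinyin_name", searchable=True),
    Field("tags", searchable=True, displayable=True),
    Field("checkin_type", filterable=True),
]


def search(docs, query, filters=None):
    """Match `query` against searchable fields, apply filters on
    filterable fields, and return only the displayable fields."""
    filters = filters or {}
    filterable = {f.name for f in SCHEMA if f.filterable}
    unknown = set(filters) - filterable
    if unknown:
        raise ValueError(f"fields are not filterable: {sorted(unknown)}")
    searchable = [f.name for f in SCHEMA if f.searchable]
    displayable = [f.name for f in SCHEMA if f.displayable]
    q = query.lower()
    hits = []
    for doc in docs:
        matched = any(q in str(doc.get(n, "")).lower() for n in searchable)
        kept = all(doc.get(k) == v for k, v in filters.items())
        if matched and kept:
            hits.append({n: doc[n] for n in displayable if n in doc})
    return hits
```

Calling `search(docs, "zhao wei", {"checkin_type": 0})` would keep only the public heaps that match, and each hit would carry only `id`, `name`, and `tags`.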

 

3. Import data. opensearch provides three data import methods that you can choose from based on your application's needs: importing from MySQL, importing from HDFS, and pushing data through the SDKs or the HTTP API. Importing from MySQL uses a graphical interface; all you need to do is match the fields in MySQL to the index structure you just created. The SDKs and the HTTP API are the more flexible option. For specific practices, refer to the help documentation. [Note: MySQL and HDFS imports are supported only on the intranet.]
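The field matching in step 3 can be pictured as a simple column-to-field mapping. The sketch below is only an illustration: the MySQL column names, the `COLUMN_TO_FIELD` table, and the shape of the push body (a JSON list of `cmd`/`fields` entries) are assumptions, so check the help documentation for the real format before pushing anything.

```python
import json

# Hypothetical mapping: MySQL column name -> index field name.
COLUMN_TO_FIELD = {
    "heap_id": "id",
    "heap_name": "name",
    "py_name": "pinyin_name",
    "tag_list": "tags",
    "checkin_type": "checkin_type",
}


def row_to_doc(row):
    """Rename the mapped columns of one database row; drop the rest."""
    return {field: row[col] for col, field in COLUMN_TO_FIELD.items() if col in row}


def build_push_body(rows, cmd="add"):
    """Serialize rows into an assumed JSON push body."""
    entries = [{"cmd": cmd, "fields": row_to_doc(r)} for r in rows]
    return json.dumps(entries, ensure_ascii=False)
```

Keeping the mapping in one table means the same definition can drive both a bulk import and later incremental pushes.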

 

4. Create the index. Open the "Import Data" tab on the page to rebuild the index, then click "recreate now". opensearch reads data from the database according to the field mapping we just configured and creates the index. After the process ends, you can access the search interface.

 

5. Access the search interface. Click "Search test" in the upper-right corner of the application homepage.

The search is exposed as an HTTP interface that returns result data in JSON format. This is the simplest prototype of a search.

 

[Tips]

The following describes some requirements and problems you may run into:

1. Sorting requirements. For example, a hit in field A may matter more than a hit in field B (such as title versus content). In that case, you can customize the sort formula. For example, you can use the bm25 algorithm to calculate the static score, text_relevance to calculate how well a field matches, and fieldterm_proximity to measure how densely the matched terms cluster; there are also time-decay functions for time fields.
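The idea of a custom sort formula can be sketched as a weighted sum of per-field match scores plus a freshness term. This is not opensearch's formula syntax: `field_match` is a crude stand-in for text_relevance, and the weights, the half-life, and the document layout are assumptions chosen for illustration.

```python
def field_match(query_terms, text):
    """Fraction of query terms found in the field
    (a crude stand-in for a real relevance function)."""
    if not query_terms:
        return 0.0
    words = set(text.lower().split())
    return sum(t.lower() in words for t in query_terms) / len(query_terms)


def time_decay(age_days, half_life_days=30.0):
    """Exponential decay: 1.0 for a new doc, 0.5 after one half-life."""
    return 0.5 ** (age_days / half_life_days)


def score(query_terms, doc, w_title=3.0, w_content=1.0, w_fresh=0.5):
    """Weighted sum: title hits count more than content hits."""
    return (
        w_title * field_match(query_terms, doc["title"])
        + w_content * field_match(query_terms, doc["content"])
        + w_fresh * time_decay(doc["age_days"])
    )
```

With these weights, a document that matches the query in its title outranks one of the same age that matches only in its content.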

2. Recall requirements. For example, a search for "zhoujielun" should find Jay Chou and all documents related to the star (which do not necessarily contain his name). A large search engine may analyze the query through modules such as query refinement and query correction to expand the recall. Here we can take a shortcut: add a new field to the index structure that holds the specific terms we want to recall on, and give this field a sort weight so that the documents are both recalled and ranked properly.
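The shortcut in tip 2 can be sketched as follows: at indexing time the application attaches extra recall terms to each document, and at query time a hit on that field scores lower than a direct content hit. The `ALIASES` table, the `recall_terms` field name, and the weights are hypothetical.

```python
# Hypothetical alias table maintained by the application.
ALIASES = {"Jay Chou": ["zhoujielun", "周杰伦"]}


def add_recall_terms(doc):
    """Return a copy of the doc with a `recall_terms` field listing
    the aliases of every known entity mentioned in its content."""
    terms = []
    for entity, aliases in ALIASES.items():
        if entity.lower() in doc["content"].lower():
            terms.extend(aliases)
    return {**doc, "recall_terms": terms}


def search_with_recall(docs, query, w_content=1.0, w_recall=0.6):
    """Return doc ids ordered by score; alias hits score lower."""
    q = query.lower()
    scored = []
    for doc in docs:
        s = 0.0
        if q in doc["content"].lower():
            s += w_content
        if any(q == t.lower() for t in doc.get("recall_terms", [])):
            s += w_recall
        if s > 0:
            scored.append((s, doc["id"]))
    return [doc_id for _, doc_id in sorted(scored, key=lambda p: (-p[0], p[1]))]
```

The recall terms could equally be filled in by hand for documents that are related to the star without naming him; the lookup above is just the simplest way to populate the field.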

3. Searching for things nearby. The sort functions include a distance function that calculates the spherical distance. This method is O(n), so efficiency may suffer when there is a lot of data. Instead, we can add a field to the index structure and use the geohash algorithm (see http://en.wikipedia.org/wiki/Geohash) to convert each document's two-dimensional coordinates into strings that share a common prefix (for example, keeping the prefixes of length 5, 6, 7, and 8), and put these geohash strings in the new index field. At query time, the input coordinates are likewise converted into geohash strings of length 5, 6, 7, and 8. The distance function is then applied only to the documents recalled through these strings, for more precise distance calculation and sorting. This makes searching for nearby things efficient.
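The geohash approach in tip 3 can be sketched in plain Python. The encoder follows the standard geohash algorithm; the `geo_prefixes` field name, the helper names, and the use of the haversine formula for the final ordering are assumptions made for illustration rather than opensearch functions.

```python
import math

_BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"


def geohash_encode(lat, lon, precision=8):
    """Standard geohash: interleave longitude and latitude bits,
    then emit one base32 character per 5 bits."""
    lat_lo, lat_hi = -90.0, 90.0
    lon_lo, lon_hi = -180.0, 180.0
    chars, bits, nbits = [], 0, 0
    use_lon = True  # bits alternate, starting with longitude
    while len(chars) < precision:
        if use_lon:
            mid = (lon_lo + lon_hi) / 2
            bit = lon >= mid
            lon_lo, lon_hi = (mid, lon_hi) if bit else (lon_lo, mid)
        else:
            mid = (lat_lo + lat_hi) / 2
            bit = lat >= mid
            lat_lo, lat_hi = (mid, lat_hi) if bit else (lat_lo, mid)
        bits = (bits << 1) | int(bit)
        use_lon = not use_lon
        nbits += 1
        if nbits == 5:
            chars.append(_BASE32[bits])
            bits, nbits = 0, 0
    return "".join(chars)


def index_doc(doc, lengths=(5, 6, 7, 8)):
    """Attach geohash prefixes of several lengths to a document."""
    full = geohash_encode(doc["lat"], doc["lon"], max(lengths))
    return {**doc, "geo_prefixes": [full[:n] for n in lengths]}


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance on a sphere of radius 6371 km."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dlon = math.radians(lon2 - lon1)
    a = math.sin((p2 - p1) / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlon / 2) ** 2
    return 2 * 6371.0 * math.asin(math.sqrt(a))


def nearby(docs, lat, lon, prefix_len=5):
    """Recall docs sharing the query's geohash cell, then sort the
    small candidate set by exact distance."""
    cell = geohash_encode(lat, lon, prefix_len)
    candidates = [d for d in docs if cell in d["geo_prefixes"]]
    return sorted(candidates, key=lambda d: haversine_km(lat, lon, d["lat"], d["lon"]))
```

One caveat of this sketch: a point close to a cell boundary can have near neighbours in an adjacent cell, so a fuller version would also query the neighbouring cells or fall back to a shorter prefix.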

4. Data volume. Even on our index with the largest data volume, search results still come back within 10 ms at 500 QPS (of course, the fields displayed for each doc in the search results should not be too large; otherwise the response time grows because of the network transfer).

 

[Feature requests]

1. We also found some bad cases in query analysis and in how the results for the query terms are intersected. For example, for the search "Jay Chou middle school photos", the terms rank by importance as Jay Chou > middle school = photos. Even if no document matches every term, documents about Jay Chou's photos or about Jay Chou should still be recalled. I believe this will keep getting better.

Opensearch: A new feature is being developed that rewrites a user's query along multiple dimensions, for example downgrading low-weight terms and supporting custom dictionaries (synonyms, spelling corrections, stop words, and domain-specific terms). This will further improve search quality for long-tail queries and reduce the zero-result rate.
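One way to picture downgrading low-weight terms is a search that first requires every term and, when nothing matches, drops the least important term and retries. This is only a guess at the general idea, not how opensearch implements it; the term weights and the `relaxed_search` helper are hypothetical.

```python
def relaxed_search(docs, weighted_terms):
    """Require all terms first; while there are no hits, drop the
    lowest-weight term and retry. Returns (doc ids, terms used).

    weighted_terms is a list of (term, weight) pairs. Among terms of
    equal weight, the one listed last is dropped first.
    """
    terms = sorted(weighted_terms, key=lambda tw: -tw[1])  # stable sort
    while terms:
        required = [t.lower() for t, _ in terms]
        hits = [
            d["id"] for d in docs
            if all(t in d["text"].lower() for t in required)
        ]
        if hits:
            return hits, [t for t, _ in terms]
        terms = terms[:-1]
    return [], []
```

For the query in the example above, weights such as `[("Jay Chou", 1.0), ("middle school", 0.5), ("photos", 0.5)]` mean the search gives up "photos" first, then "middle school", and only returns nothing if no document mentions Jay Chou at all.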

 

2. It would be better if a related-search feature were provided :). For example, based on the query logs of this search application, suggest other words that a user who issued a given query is likely to search for.

Opensearch: We have already planned features such as related search and drop-down suggestions, as well as click feedback and personalized search.

 

[Summary]

In short, opensearch takes care of many search features for us, so that we can easily customize our own search engine to fit our product's needs in a short time.

 

Exchange: http://www.laiwang.com/

http://weibo.com/1644971875/Bj0XQhN28?mod=weibotime&type=repost#_rnd1408437347760
