Iveely Search Engine: two or three questions to solve with your wisdom!

Source: Internet
Author: User

This evening I put together two simple questions about search engines. Both come from the iveely search engine, and I am sharing them so that you can apply your wisdom to them. They are not difficult, but I hope we can find the best solutions.

Question 1:

Background:

During a user search, we split the user's query into keywords and then match each keyword. For example, if the user enters "program life", word segmentation yields "program" and "life". We can then extract the web page set for "program" (9.00235, 123.00691, 96.00035, ...) and the web page set for "life" (6.00025, 123.00128, 95.00245, ...). In each value, the integer part is the web page number and the fractional part is the actual weight (value) of the keyword on that page. Next, we merge the web page sets for "program" and "life" and return the result to the user.

Problem:

The two sets may contain the same web page (page 123 in the example above). When that happens, we add the fractional parts and leave the integer part unchanged. If the fractional part would become greater than 1, multiply the fractional parts of the entire set by 0.1 and then accumulate.

Problems to be Solved:

Please design a data structure that solves the above problem with the lowest possible time complexity and space complexity.
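As a starting point for discussion, here is a minimal Python sketch of one possible approach: a hash map keyed by page number, which merges the sets in expected linear time and linear space. It is not the iveely implementation. The names `decode` and `merge_postings` are hypothetical, the entries are passed as strings so that `Decimal` keeps the weights exact, and the rescaling rule is my reading of the problem statement: sum first, then multiply every weight by 0.1 if any sum no longer fits in the fractional part.

```python
from decimal import Decimal


def decode(entry: str) -> tuple[int, Decimal]:
    """Split an encoded entry such as "123.00691" into (page number, weight)."""
    value = Decimal(entry)
    page_id = int(value)           # integer part: web page number
    return page_id, value - page_id  # fractional part: keyword weight


def merge_postings(*posting_lists):
    """Merge encoded page sets into a dict {page number: summed weight}."""
    merged = {}
    for postings in posting_lists:
        for entry in postings:
            page_id, weight = decode(entry)
            merged[page_id] = merged.get(page_id, Decimal(0)) + weight

    # Assumption: the source says "greater than 1"; this sketch also rescales
    # at exactly 1, because a weight of 1 would spill into the page number.
    while any(weight >= 1 for weight in merged.values()):
        merged = {page: weight * Decimal("0.1") for page, weight in merged.items()}
    return merged


program = ["9.00235", "123.00691", "96.00035"]
life = ["6.00025", "123.00128", "95.00245"]
result = merge_postings(program, life)
print(result[123])  # summed weight of the page shared by both keywords
```

If both sets are already sorted by page number, a two-pointer merge would avoid the hash map entirely; which variant is better depends on how the sets are stored.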

Question 2:

Background:

In a search engine, each keyword corresponds to a very large number of web pages, and each web page corresponds to several keywords. Once the search engine receives a keyword, it must retrieve the set of web pages containing that keyword as quickly as possible. The most common approach today is the inverted index. However, although an inverted file lets us quickly extract the web pages for a keyword, the weights of those pages differ; that is, the objects to be sorted come out unordered.
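To make the background concrete, here is a minimal Python sketch of an inverted index built from a forward mapping of pages to weighted keywords. The function name `build_inverted_index` and the sample data are hypothetical, and sorting each posting list by weight at build time is only one possible way to deal with the ordering issue described above, not necessarily what iveely does.

```python
from collections import defaultdict


def build_inverted_index(pages):
    """Turn {page: {keyword: weight}} into {keyword: [(page, weight), ...]}."""
    index = defaultdict(list)
    for page_id, keywords in pages.items():
        for keyword, weight in keywords.items():
            index[keyword].append((page_id, weight))

    # Sort each posting list once, heaviest page first, so that a lookup
    # returns the pages already ordered by weight.
    for postings in index.values():
        postings.sort(key=lambda posting: posting[1], reverse=True)
    return dict(index)


# Hypothetical sample data: two pages with weighted keywords.
pages = {
    1: {"program": 0.2, "life": 0.5},
    2: {"program": 0.7},
}
index = build_inverted_index(pages)
print(index["program"])  # pages containing "program", heaviest first
```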

 

Next, we abstract the problem using Beijing subway information. Every station is a keyword and every line is a web page. Each station is served by multiple lines (each keyword is contained in several web pages), and each line contains multiple stations (each web page contains multiple keywords).

How the problem arises:

The inverted file lets us quickly extract the lines that correspond to a station. For example, a user who searches for Xizhimen gets back Metro Line 2, Metro Line 4 and Metro Line 13. Unfortunately, the result does not tell us that Metro Line 4 and Metro Line 2 have another intersection: Xuanwumen. Why do we need to know about Xuanwumen? In the iveely design, the author believes that when an intersecting station appears in enough of the search results to reach a certain level, it may also be a station the user is interested in (mathematical proof omitted). For example, a user searching for Xizhimen may want to transfer there. If many of the returned subway lines also contain Xuanwumen, we assume that Xuanwumen can also be a good transfer option.

Problems to be Solved:

Please design a data structure that computes, with the lowest possible time and space complexity, the sorted set of stations shared by the search results (ranked by the number of results that contain each one). For example, if the user enters Xizhimen, Xuanwumen can be returned as a recommendation. If there are other such stations, list them in order of their number of occurrences.
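As one possible baseline for Question 2, the Python sketch below keeps both directions of the mapping: line to stations (the forward file) and station to lines (the inverted file). For a query it walks every line that contains the queried station and counts the other stations it sees. Everything here is an assumption made for illustration: the function names, the `min_count` threshold that stands in for "a certain level", and the toy data, in which only Xizhimen and Xuanwumen come from the text above and the other stations are placeholders.

```python
from collections import Counter, defaultdict

# Toy data: line -> stations. "Station A/B/C" are placeholders, not real stations.
LINES = {
    "Line 2": ["Xizhimen", "Xuanwumen", "Station A"],
    "Line 4": ["Xizhimen", "Xuanwumen", "Station B"],
    "Line 13": ["Xizhimen", "Station C"],
}


def build_station_index(lines):
    """Invert {line: [stations]} into {station: {lines}}."""
    station_to_lines = defaultdict(set)
    for line, stations in lines.items():
        for station in stations:
            station_to_lines[station].add(line)
    return station_to_lines


def related_stations(query, lines, station_to_lines, min_count=2):
    """Rank the other stations by how many of the query's lines contain them."""
    counts = Counter()
    for line in station_to_lines.get(query, ()):
        for station in set(lines[line]):  # count each station once per line
            if station != query:
                counts[station] += 1
    # Most frequent first; ties are broken by name so the order is stable.
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [(station, count) for station, count in ranked if count >= min_count]


index = build_station_index(LINES)
print(related_stations("Xizhimen", LINES, index))
```

The query cost is proportional to the total number of stations on the lines that contain the queried station. A better solution might precompute or cache these counts, which trades space for query time.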

I came up with these questions myself; they are problems I ran into while developing the open-source iveely. I think they are meaningful questions, because they test not only our thinking but also our coding skills and, most importantly, our mathematics. I will post other similar questions one after another so that we can discuss and learn from them together. You are welcome to follow the iveely search engine. If you have any good comments or suggestions, you can email me at liufanping@iveely.com.

 
