We use search engines (SE) every day, but they still leave a lot of open questions, so let's discuss a few.
1. Deduplication is poor.
For example, searching for the topic [compile the android kernel] returns only two or three genuinely different results (counting only the first 10 pages). Most of the retrieved content is repeated: the same text reprinted by different people on different websites. The SE does nothing to condense this information, which causes us a lot of trouble.
Solution I expect: aggregated reading
During retrieval, the SE naturally collects information based on the search terms, and it could then aggregate that information into a new article. In the early stage, this article may not read like a complete article written by a person, but more like a tree-structured entry, as in an encyclopedia. In the later stage, as machine intelligence improves, it may come to read like an article written by a person. But how can this article be produced? I think a clustering algorithm applied per keyword should be very effective; the cluster sizes and the rank weights of the articles can then be combined to assemble the new article.
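As a rough illustration of the clustering step, here is a minimal sketch that groups reprinted results by the Jaccard similarity of their word shingles and then orders the groups by size. The function names, the shingle length, and the 0.5 threshold are all assumptions made for this example; a real SE would likely use something more scalable, and assembling the clusters into an actual article is not attempted here.

```python
def shingles(text, k=3):
    """Return the set of k-word shingles of a text (lowercased)."""
    words = text.lower().split()
    if len(words) < k:
        return {tuple(words)}
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a, b):
    """Jaccard similarity of two sets; 0.0 when both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def cluster_results(docs, threshold=0.5):
    """Greedily group near-duplicate documents.

    `docs` is assumed to be in rank order (best first). Each cluster is a
    list of indices into `docs`; its first index is its best-ranked member.
    """
    sigs = [shingles(d) for d in docs]
    clusters = []
    for i, sig in enumerate(sigs):
        for cluster in clusters:
            # Compare against the cluster's best-ranked member only.
            if jaccard(sig, sigs[cluster[0]]) >= threshold:
                cluster.append(i)
                break
        else:
            clusters.append([i])
    # Larger clusters first; ties broken by the best rank in the cluster.
    clusters.sort(key=lambda c: (-len(c), c[0]))
    return clusters


docs = [
    "how to compile the android kernel from source step by step",
    "how to compile the android kernel from source step by step guide",
    "android kernel configuration options explained in detail",
    "how to compile the android kernel from source step by step",
]
clusters = cluster_results(docs)
```

Showing one representative per cluster (for example `docs[c[0]]`) would already collapse the reprints into a single entry; the cluster size then serves as one signal of how important that entry is.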
2. Word segmentation
When a complete sentence is entered into an SE, the results are actually matches for the individual words after segmentation. It is very rare for a result to contain the sentence itself. Yet when we search for the entire sentence as an exact phrase, many pages containing it can be retrieved. I think this problem arises because the SE does not carefully model how adjacent words combine.
Solution I expect: dynamic programming
In fact, the problem we want to solve is very similar to matrix-chain multiplication: both are about how to group the elements of an ordered sequence. Take the sentence "I am a number from Heilongjiang", which segments (stop words included) into something like (I, am, from Heilongjiang, number). First we retrieve the results for each word separately, then for runs of 2, 3, ... k adjacent words. The larger k is, the higher the weight given to a match, so results that keep the words together are favored. An even better approach would be to analyze the sentence with natural language processing, estimate which words are more likely to belong together, and adjust the weights accordingly!
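One way to picture the weighting idea is the sketch below. It scores a document by every contiguous run of query words that also appears contiguously in the document, giving a run of length k the weight k*k. The function name and the quadratic weight are assumptions for illustration only. The dynamic program used here is the longest-common-substring table rather than a matrix-chain style one, because it finds all matching runs in a single pass.

```python
def phrase_score(query, doc, weight=lambda k: k * k):
    """Score `doc` (a token list) against `query` (a token list).

    Every contiguous run of query tokens that occurs contiguously in the
    document contributes weight(k), where k is the run length.
    """
    # prev[j] = length of the longest common run ending at the previous
    # query token and at doc[j - 1].
    prev = [0] * (len(doc) + 1)
    score = 0
    for q in query:
        cur = [0] * (len(doc) + 1)
        for j, d in enumerate(doc, start=1):
            if q == d:
                cur[j] = prev[j - 1] + 1
        # Longest run ending at this query token found anywhere in doc;
        # every shorter run ending here is therefore also present.
        best = max(cur)
        score += sum(weight(k) for k in range(1, best + 1))
        prev = cur
    return score


query = ["i", "am", "from", "heilongjiang"]
in_order = ["i", "am", "from", "heilongjiang"]
shuffled = ["heilongjiang", "from", "am", "i"]

score_in_order = phrase_score(query, in_order)
score_shuffled = phrase_score(query, shuffled)
```

Both documents contain all four words, but only the first keeps them adjacent and in order, so it scores much higher. Swapping in a different `weight` function is where segmentation probabilities from language analysis could be plugged in.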
That's all for now; I'll add more later.