Basics of search indexing and retrieval

Source: Internet
Author: User
Keywords: search engine, comparison, text

This is the first lesson, so a brief self-introduction: I am Xiao Peng, born in the 1990s. I have been doing SEO for three years, accumulating some experience and learning non-stop along the way. I am publishing this tutorial only to push myself to keep studying and not slack off.

Disclaimer: I only write text tutorials, with no audio or video, and I do not take on any SEO projects.

What I cover here is the basics, starting with the index:

Indexing and retrieval are mainly concerned with the structure, analysis, organization, storage, and retrieval of information.

The earliest indexes were text-based:

All search engines have been developed and extended around these topics. The work began in the 1950s, with the focus on text and on text in document form (text documents).
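To make "text-based index" concrete, here is a minimal sketch of an inverted index, the structure commonly used for text retrieval. The tutorial does not name a specific structure, so this is an assumption for illustration; the function names and sample documents are made up.

```python
from collections import defaultdict

def tokenize(text):
    """Lowercase and split on whitespace; real engines use far richer analysis."""
    return text.lower().split()

def build_inverted_index(docs):
    """Map each term to the sorted list of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in tokenize(text):
            index[term].add(doc_id)
    return {term: sorted(ids) for term, ids in index.items()}

def search(index, query):
    """Return ids of documents that contain every query term (boolean AND)."""
    terms = tokenize(query)
    if not terms:
        return []
    result = set(index.get(terms[0], []))
    for term in terms[1:]:
        result &= set(index.get(term, []))
    return sorted(result)

# Hypothetical sample documents.
docs = {
    1: "heavy rainstorm hits Jiaozuo",
    2: "Jiaozuo weather forecast for the week",
    3: "search engine basics",
}
index = build_inverted_index(docs)
print(search(index, "jiaozuo weather"))
```

The index is built once and each query only touches the entries for its own terms, which is why this kind of structure suits storage and retrieval over large text collections.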

Text comparison:

Defining the meaning of a word, a sentence, a paragraph, or an entire news story is much harder than defining a field such as a name in a database, so comparing text is both important and difficult for an index. For a person, comparing articles is simple: a reader can easily judge the quality of an article at a glance. A search engine's ability to understand text is much weaker, so search engines keep trying to simulate and model how people read and judge an article. Implementing this comparison accurately is the core of information retrieval.
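One simple way a program can compare two texts is to count words and measure the cosine similarity of the counts. This is a common textbook technique, not something the tutorial itself specifies, and the function names here are illustrative. It only measures word overlap, which shows how far a machine is from a human reader's judgment.

```python
import math
from collections import Counter

def term_counts(text):
    """Bag-of-words representation: how often each word occurs."""
    return Counter(text.lower().split())

def cosine_similarity(text_a, text_b):
    """Cosine of the angle between two word-count vectors (0.0 to 1.0)."""
    a, b = term_counts(text_a), term_counts(text_b)
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty text is similar to nothing
    return dot / (norm_a * norm_b)

print(cosine_similarity("rainstorm hits Jiaozuo", "Jiaozuo rainstorm warning"))
print(cosine_similarity("rainstorm hits Jiaozuo", "search engine basics"))
```

Two articles that share many words score close to 1, and articles with no words in common score 0, regardless of whether a person would consider them related.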

Special cases:

Pictures, video, and audio (music and speech). Like text, these media are mostly compared through their textual descriptions, but direct comparison of the media content itself is making progress. Pictures, for example, can be roughly distinguished by their color content.
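As a rough sketch of comparing pictures by color, the example below builds a coarse color histogram for each picture and measures how much the two histograms overlap. The technique (histogram intersection) and all names are assumptions chosen for illustration. Pictures are represented as plain lists of (r, g, b) tuples so that no image library is needed.

```python
def color_histogram(pixels, bins_per_channel=4):
    """Normalized histogram over coarse RGB buckets.

    `pixels` is a list of (r, g, b) tuples with values 0..255.
    """
    step = 256 // bins_per_channel
    top = bins_per_channel - 1
    hist = [0.0] * (bins_per_channel ** 3)
    for r, g, b in pixels:
        # Clamp each bucket index so 255 always lands in the last bucket.
        ri, gi, bi = min(r // step, top), min(g // step, top), min(b // step, top)
        hist[(ri * bins_per_channel + gi) * bins_per_channel + bi] += 1
    total = len(pixels)
    return [count / total for count in hist] if total else hist

def histogram_intersection(hist_a, hist_b):
    """1.0 for identical color distributions, 0.0 for no overlap."""
    return sum(min(a, b) for a, b in zip(hist_a, hist_b))

# Hypothetical tiny "pictures".
red = [(255, 0, 0)] * 4
blue = [(0, 0, 255)] * 4
mixed = [(255, 0, 0)] * 2 + [(0, 0, 255)] * 2

print(histogram_intersection(color_histogram(red), color_histogram(mixed)))
```

This tells pictures apart only by their overall colors, not by what they show, which matches the "rough" distinction described above.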

User query methods:

Ordinary user queries are called ad hoc search, because the range of possible queries is very large and cannot be predicted in advance. Beyond ad hoc search, other tasks have developed: filtering, classification, and question answering.

Filtering: Also called tracking. The engine infers a user's hobbies or interests from their behavior and then retrieves results, such as news reports, that match those interests.

Classification: Documents are labeled using a predefined set of labels or categories.

Question answering: For example, "Which country has the largest population in the world?" There is not much to explain here; try such a search and you will see.
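Of the three tasks above, filtering is the easiest to sketch in a few lines. The example below builds an interest profile from the titles a user has clicked and keeps only new documents that match it. The scoring rule, the threshold, and all names are assumptions made for illustration, not how any particular search engine works.

```python
from collections import Counter

def build_profile(clicked_titles):
    """Count the words in titles the user clicked; frequent words hint at interests."""
    return Counter(word for title in clicked_titles for word in title.lower().split())

def interest_score(profile, document):
    """Sum the profile counts of the distinct words in the document."""
    return sum(profile[word] for word in set(document.lower().split()))

def filter_documents(profile, documents, min_score=2):
    """Keep documents whose interest score reaches the (assumed) threshold."""
    return [doc for doc in documents if interest_score(profile, doc) >= min_score]

# Hypothetical click history and incoming documents.
profile = build_profile(["Jiaozuo weather today", "weather forecast Jiaozuo"])
incoming = ["Jiaozuo weather warning", "football results"]
print(filter_documents(profile, incoming))
```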

Relevance:

Relevance is a central part of any search engine. It sounds simple, but when a person decides whether an article is relevant, many factors affect the decision. Chinese search is especially complex because the same concept can be expressed with so many different words, which leads to the vocabulary mismatch problem.
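The problem of different words expressing the same concept (often called vocabulary mismatch) can be shown with a small example. One common mitigation is to expand the query with synonyms. The synonym table below is hand-made and hypothetical; real systems derive such relations from much larger resources.

```python
# Hypothetical, hand-made synonym table.
SYNONYMS = {
    "rainstorm": {"downpour", "storm"},
    "weather": {"climate"},
}

def expand_query(query, synonyms):
    """Return the query terms plus any known synonyms for them."""
    terms = set(query.lower().split())
    expanded = set(terms)
    for term in terms:
        expanded |= synonyms.get(term, set())
    return expanded

def matches(document, terms):
    """True if the document shares at least one word with the given terms."""
    return bool(set(document.lower().split()) & set(terms))

document = "heavy downpour floods Jiaozuo streets"
print(matches(document, {"rainstorm"}))                        # exact terms only
print(matches(document, expand_query("rainstorm", SYNONYMS)))  # with synonyms
```

The document is about a rainstorm but never uses that word, so exact matching misses it while the expanded query finds it.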

Topical relevance and user relevance must be distinguished. They are two completely different concepts, and basic optimization is only possible once you can tell them apart. A text is topically relevant to a query if the two are about the same topic. For example, a news report about a rainstorm in Jiaozuo is topically relevant to the query "Jiaozuo bad weather". A user who searches for "Jiaozuo weather" is a different case: what they want is not rainstorm news but a weather forecast, and that is what is relevant to the user. With such a simple example, I think the difference is easy to understand.

There are many relevance models; here I have only introduced the two most common and important concepts. There are also so-called optimization tools that generate the click-stream data search engines rely on in order to raise a site's ranking. This is very common, but once you stop feeding in the click-stream data the site is at risk, and its other rankings can be dragged down as well. The cause is the abnormal data the site has accumulated.

That is all on relevance. I will not say much about how to judge a search engine's performance, because it is of limited help to most readers. Performance is generally judged by query throughput, user interaction, indexing speed, and response time. For time-sensitive content such as news, coverage and freshness are judged as well.
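Two of these measures, query throughput and response time, are easy to measure directly. The sketch below times a search function over a list of queries. The `toy_search` function and the query list are placeholders; any callable that takes a query string would work.

```python
import time

def measure(search_fn, queries):
    """Run each query once and report response time and throughput."""
    latencies = []
    start = time.perf_counter()
    for query in queries:
        t0 = time.perf_counter()
        search_fn(query)
        latencies.append(time.perf_counter() - t0)
    elapsed = time.perf_counter() - start
    n = len(latencies)
    return {
        "queries": n,
        "avg_response_s": sum(latencies) / n if n else 0.0,
        "max_response_s": max(latencies, default=0.0),
        # Queries per second over the whole run.
        "throughput_qps": n / elapsed if elapsed > 0 else float("inf"),
    }

# Placeholder search function over hypothetical documents.
DOCS = ["Jiaozuo weather forecast", "rainstorm hits Jiaozuo", "search engine basics"]

def toy_search(query):
    words = set(query.lower().split())
    return [doc for doc in DOCS if words & set(doc.lower().split())]

stats = measure(toy_search, ["jiaozuo", "weather", "basics"])
print(stats)
```

The numbers depend on the machine and the load, so they are only meaningful when compared across runs under similar conditions.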

Conclusion: Collect data and watch how its curve changes over time; every site has its own data curve.
