The classification system and performance evaluation of the Yahoo search engine

Source: Internet
Author: User
Keywords: search engine, evaluation, Yahoo

At present, many search engines organize network information resources by combining a hierarchical subject directory with keyword searching provided by computer retrieval software. Yahoo is the typical representative of this kind of classified, subject-guide search engine.

Yahoo's appeal lies in its browsable, classified subject index. Building on this index, which provides a comprehensive classification framework, and combining it with high-quality search software, Yahoo has established a distinctive mechanism for managing and organizing information that makes comprehensive searching of network information a reality. This paper discusses Yahoo's category system, classification principles, retrieval methods and performance.

I. Category system

Yahoo consists of 14 top-level categories: Arts & Humanities, Business & Economy, Computers & Internet, Education, Entertainment, Government, Health, News & Media, Recreation & Sports, Reference, Regional, Science, Social Science, and Society & Culture.

Depending on the amount of information or number of websites, and on the needs of knowledge organization, each top-level category is subdivided into several levels of subcategories; the deeper the subcategory, the more specific the subject of the sites it lists. The result is a fairly detailed hierarchical directory, running from category headings down through successive subcategories. The headings are well designed, the structure is complete and comprehensive, the hierarchy is distinct, and the levels vary appropriately in depth and breadth. This provides the foundation for classifying the rich information resources available online, and especially for classifying them precisely.

II. Principles of classification

Aimee Glassel, a cataloguing specialist with the Internet Scout Project, has said: "There is a close link between the Colon Classification of S. R. Ranganathan, the renowned Indian classificationist and librarian, and Yahoo's main directory of network information resources." This observation reveals that Yahoo classifies network information resources by means of facet analysis. The following points give a deeper understanding of Yahoo's faceted classification principles and basic process.

1. Using broad subject areas to build the classification index

To give its classification system both unlimited capacity and considerable specificity, Yahoo starts from relatively broad subject areas and builds a fairly complete classification index through analysis and synthesis. This is consistent with faceted classification: dividing knowledge into broad, class-like facets and expressing subject content from several aspects, so as to avoid the linear, one-directional structure of enumerative schedules, is the main principle of Ranganathan's Colon Classification.

2. Combining terms to express content in context

Judging from its classification structure, Yahoo might seem very close to a thesaurus, because it too uses words rather than notational symbols to form concept strings. Its combinatorial power, however, is far greater than that of an ordinary thesaurus. By analyzing the content features of a web page, Yahoo forms a concept string (or indexing string) from category terms in its classification framework and places the page at the corresponding level. Each individual term in such a string has its own meaning, but once combined with other terms it enters a contextual relationship and takes on a fuller meaning. In this respect, too, Yahoo closely resembles faceted classification.

3. Using the colon to mark information content

Take "research on the treatment of tuberculosis of the lungs by X-ray, conducted in India in the 1950s" as the content to be classified, and compare the notation of the two systems:

In Ranganathan's Colon Classification, the content is notated as: L,45;421:6;253:f.44'N5

Replacing each symbol with the corresponding word gives:

Medicine,Lungs;Tuberculosis:Treatment;X-ray:Research.India'1950s

If the various punctuation marks in the facet formula are all replaced with colons, the resulting string takes the form Yahoo uses to describe information content. In Yahoo, the same subject is expressed as:

Health:Diseases and Conditions:Tuberculosis

The two descriptions are strikingly similar. By using the colon as a uniform separator to organize and describe information content, Yahoo retains the character of faceted notation while simplifying the notation system to some extent, which greatly improves the efficiency of classifying and indexing information.
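The notational step described above can be sketched in a few lines of Python. This is a minimal illustration, not Yahoo's software: the function names (`split_facets`, `to_colon_string`, `path_levels`) are hypothetical, and the sketch only shows how the five Colon Classification separators collapse into a single colon. The Yahoo path quoted above is a shorter, editor-chosen string, so the code does not derive it mechanically from the facet formula.

```python
import re

# Word form of the Colon Classification example above. Its separators carry facet
# roles: "," personality, ";" matter, ":" energy, "." space, "'" time.
CC_WORD_FORM = "Medicine,Lungs;Tuberculosis:Treatment;X-ray:Research.India'1950s"


def split_facets(formula: str) -> list[str]:
    """Split a word-form facet formula on any of the five Colon Classification separators."""
    return [part.strip() for part in re.split(r"[,;:.']", formula) if part.strip()]


def to_colon_string(terms: list[str]) -> str:
    """Re-join terms with one uniform separator, the colon."""
    return ":".join(terms)


def path_levels(path: str) -> list[str]:
    """Split a Yahoo-style category path into its levels, broadest first."""
    return [level.strip() for level in path.split(":") if level.strip()]


print(to_colon_string(split_facets(CC_WORD_FORM)))
print(path_levels("Health:Diseases and Conditions:Tuberculosis"))
```

One trade-off shows up in the output: once every separator becomes a colon, the notation no longer records which facet (matter, energy, space, time) each term belongs to; only the order of the terms survives.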

4. Providing multiple classification entry paths

The "virtual collection of information" is a major advantage of Yahoo, embodied in the flexibility of its conceptual model and citation order (i.e., the facet order). In a traditional library, a book can occupy only one fixed position on the shelf. In the digital world, however, electronic information resources are no longer tied to a single physical location, so one information source can be placed at several positions in the class structure. By applying facet analysis to the organization of network information resources, Yahoo can provide several entry paths to the same source within its vast classification hierarchy. A query can therefore be completed along different paths, and users who approach the same content from different directions are all served.

For example, if you are looking for the homepage of the University of Wisconsin-Madison in the United States, Yahoo offers the following category (search) paths:

(1) Starting from the Regional category, the path is: Regional:U.S. States:Wisconsin:Cities:Madison:Education:Colleges and Universities:University of Wisconsin-Madison.

(2) Starting from the Education category, the first part of the path is Education:Higher Education:Colleges and Universities. In the Colleges and Universities directory, selecting the geographic subcategory "United States@" leads back into the Regional directory, after which the path is the same as in (1). The secret lies in the "@" symbol, which provides a cross-reference that guides users from one branch of Yahoo's browsing hierarchy to a subcategory in another branch.
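One way to picture the two entry paths is as a directed graph rather than a strict tree. The sketch below is a hypothetical model, not Yahoo's actual data structure: nodes are identified by their full colon path, an entry ending in "@" is resolved through a cross-reference table, and a depth-first search lists every route from a top-level category to the university's listing. `HIERARCHY`, `CROSS_REFS`, `children` and `find_paths` are illustrative names.

```python
# Hypothetical excerpt of the two branches in the example. Nodes are identified by
# their full colon path; each value lists the names of that node's subcategories.
HIERARCHY: dict[str, list[str]] = {
    "Regional": ["U.S. States"],
    "Regional:U.S. States": ["Wisconsin"],
    "Regional:U.S. States:Wisconsin": ["Cities"],
    "Regional:U.S. States:Wisconsin:Cities": ["Madison"],
    "Regional:U.S. States:Wisconsin:Cities:Madison": ["Education"],
    "Regional:U.S. States:Wisconsin:Cities:Madison:Education": ["Colleges and Universities"],
    "Regional:U.S. States:Wisconsin:Cities:Madison:Education:Colleges and Universities": [
        "University of Wisconsin-Madison"
    ],
    "Education": ["Higher Education"],
    "Education:Higher Education": ["Colleges and Universities"],
    "Education:Higher Education:Colleges and Universities": ["United States@"],
}

# An entry ending in "@" is a cross-reference: following it jumps to another branch.
CROSS_REFS: dict[str, str] = {
    "Education:Higher Education:Colleges and Universities:United States@": "Regional:U.S. States",
}


def children(node: str) -> list[str]:
    """Return the nodes one click below `node`, resolving "@" cross-references."""
    result = []
    for name in HIERARCHY.get(node, []):
        child = f"{node}:{name}"
        result.append(CROSS_REFS.get(child, child))
    return result


def find_paths(start: str, target: str) -> list[list[str]]:
    """Depth-first search for every click sequence from `start` to a node named `target`."""
    found: list[list[str]] = []

    def dfs(node: str, trail: list[str]) -> None:
        if node.split(":")[-1] == target:
            found.append(trail)
            return
        for nxt in children(node):
            if nxt not in trail:  # cross-references could otherwise create cycles
                dfs(nxt, trail + [nxt])

    dfs(start, [start])
    return found


for top in ("Regional", "Education"):
    for path in find_paths(top, "University of Wisconsin-Madison"):
        print(f"{top}: {len(path) - 1} clicks -> {path[-1]}")
```

Both searches end at the same node, which is the point of the "virtual collection": the listing is stored once but reached from two branches. The `nxt not in trail` check matters because cross-references can point back up the hierarchy and create cycles.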

III. Search methods

Yahoo provides both simple search and detailed search. Simple search mainly covers the top-level categories of its classification structure, while detailed search lets users combine keywords with Boolean logic; the retrieval software is supplied mainly by Open Text. The two make a perfect match: one provides a powerful, high-quality subject-guide directory, the other a high-level search tool. In addition, a Yahoo search covers not only its own subject directory but also the Open Text database of 1 million web documents.

Admittedly, Yahoo's search has some drawbacks: it offers only keyword search and supports only the Boolean operators AND and OR, with no proximity operators such as NEAR. However, links at the bottom of its home page to other engines such as AltaVista and Lycos guide users to search there, which makes up for some of Yahoo's shortcomings. As a result, Yahoo remains one of the most popular query tools on the WWW.
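The Boolean keyword search described in this section can be illustrated with a small inverted index. This is a generic sketch, not Open Text's engine; `build_index`, `boolean_search` and the sample documents are hypothetical. It supports only AND and OR, mirroring the limitation noted above.

```python
def build_index(docs: dict[str, str]) -> dict[str, set[str]]:
    """Build an inverted index: lowercase word -> IDs of documents containing it."""
    index: dict[str, set[str]] = {}
    for doc_id, text in docs.items():
        for word in text.lower().split():
            index.setdefault(word, set()).add(doc_id)
    return index


def boolean_search(index: dict[str, set[str]], terms: list[str], operator: str = "AND") -> set[str]:
    """Combine the posting sets of `terms` with AND (intersection) or OR (union)."""
    postings = [index.get(term.lower(), set()) for term in terms]
    if not postings:
        return set()
    if operator == "AND":
        return set.intersection(*postings)
    if operator == "OR":
        return set.union(*postings)
    # A NEAR operator would need word positions, which this index does not store.
    raise ValueError(f"unsupported operator: {operator}")


DOCS = {
    "d1": "tuberculosis treatment in india",
    "d2": "history of tuberculosis",
    "d3": "treatment of lung disease",
}
INDEX = build_index(DOCS)
print(boolean_search(INDEX, ["tuberculosis", "treatment"], "AND"))
print(boolean_search(INDEX, ["tuberculosis", "treatment"], "OR"))
```

Supporting NEAR would require storing word positions in each posting, not just document IDs; this index stores IDs only.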

IV. Performance evaluation

As a model subject-guide search engine, Yahoo has the following advantages:

1. A seamless combination of subject directory and retrieval software

Information management specialists compile the subject directory using facet analysis, which brings human knowledge and judgment to the selection and organization of information and raises the quality of the directory. At the same time, submitted web pages are manually filtered, classified and organized according to the subject directory, which overcomes the weaknesses of classification performed automatically by search software alone and strengthens the hierarchy of the classification. Embedding the corresponding retrieval software and integrating it with the directory provides high-quality, efficient retrieval: it speeds up the system's response, improves retrieval precision, and brings results closer to users' information needs.

2. Reducing the difficulty of information retrieval

Yahoo's database is organized into 14 categories, each with a varying number of subcategories, and its classification system is very detailed, so it is a good starting point for searching broad topics. New users and users with vague needs in particular find it more natural to browse a subject index that can be expanded step by step than to construct a search expression. Moreover, beneath the category the user has selected, Yahoo shows how many entries each subcategory contains; if there seem to be too many, the user can run a keyword search within that category alone. Yahoo's directory features and context-sensitive services make retrieval fast and easy, reducing the difficulty of Internet information retrieval to some extent and making the system more user-friendly.
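The per-subcategory entry counts and the option to search within a category can be sketched as follows. The listings, paths and function names are hypothetical; the sketch assumes each listing is stored with its full colon path, so "searching within a category" reduces to a prefix test on that path.

```python
from collections import Counter

# Hypothetical listings: (category path, site title).
LISTINGS = [
    ("Health:Diseases and Conditions:Tuberculosis", "Tuberculosis Treatment Guide"),
    ("Health:Diseases and Conditions:Tuberculosis", "History of Tuberculosis"),
    ("Health:Diseases and Conditions:Asthma", "Asthma Treatment Options"),
    ("Health:Nutrition", "Vitamin Guide"),
    ("Science:Biology", "Cell Biology Guide"),
]


def subcategory_counts(listings, parent: str) -> Counter:
    """Count listings under each immediate subcategory of `parent`."""
    prefix = parent + ":"
    counts: Counter = Counter()
    for path, _title in listings:
        if path.startswith(prefix):
            # The first level after the parent is the subcategory shown to the user.
            counts[path[len(prefix):].split(":")[0]] += 1
    return counts


def search_within(listings, scope: str, keyword: str) -> list[str]:
    """Keyword search restricted to `scope` and everything beneath it."""
    prefix, kw = scope + ":", keyword.lower()
    return [title for path, title in listings
            if (path == scope or path.startswith(prefix)) and kw in title.lower()]


for name, n in subcategory_counts(LISTINGS, "Health").items():
    print(f"{name} ({n})")
print(search_within(LISTINGS, "Health", "guide"))
```

Because the scope test is a simple prefix match, "Cell Biology Guide" under Science is excluded even though it matches the keyword.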

3. Search results presented by category

Working from its classification paths, Yahoo ultimately presents search results grouped by category, which greatly helps users choose among them. It also annotates entries in the result list with descriptive words or phrases to ease browsing and selection: a (*) or (cool) tag marks an item judged better than others in content and layout; (new) marks content added within the last three days; the "@" mentioned above marks a cross-reference; a number in parentheses gives the number of entries in a category; and so on. In addition, Yahoo has broadened the types of results it displays, returning related websites, related web pages, news and other forms of results. In short, Yahoo keeps developing new ways to improve its information retrieval service and serve users better.
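Grouping results by category and tagging recent additions might look like the following. It is a hypothetical sketch: the hit tuples and dates are invented for illustration, and the three-day "(new)" window comes from the text but is treated here as inclusive, which is an assumption. The count printed after each category imitates the number in parentheses.

```python
from collections import defaultdict
from datetime import date, timedelta


def group_results(hits, today: date, new_window_days: int = 3) -> dict[str, list[str]]:
    """Group (category, title, date_added) hits by category, tagging recent additions "(new)"."""
    grouped: dict[str, list[str]] = defaultdict(list)
    for category, title, added in hits:
        is_new = today - added <= timedelta(days=new_window_days)
        grouped[category].append(f"{title} (new)" if is_new else title)
    return dict(grouped)


# Hypothetical search hits with hypothetical dates of addition.
HITS = [
    ("Health:Diseases and Conditions:Tuberculosis", "Tuberculosis Treatment Guide", date(1998, 5, 9)),
    ("Regional:Countries:India:Health", "Tuberculosis in India", date(1998, 5, 1)),
    ("Health:Diseases and Conditions:Tuberculosis", "History of Tuberculosis", date(1997, 1, 1)),
]

for category, labels in group_results(HITS, today=date(1998, 5, 10)).items():
    print(f"{category} ({len(labels)})")
    for label in labels:
        print("  ", label)
```

Categories appear in the order of their first hit, since Python dictionaries preserve insertion order.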

Alongside Yahoo's advantages, its shortcomings also deserve attention; they are largely common to subject-guide search engines in general:

1. Internet information grows so fast that the rate of collecting it falls far behind the growth of network resources, let alone the rate of compiling the subject directory. As a result, the database is relatively small and some categories list only a limited number of documents, so users often come with high hopes and leave disappointed, their information needs unmet.

2. Simple search joins words with OR by default and applies automatic truncation, so searches often return many irrelevant documents, which lowers precision.

3. To meet the needs of different users, Yahoo often provides several entry paths to the same content and uses the "@" symbol to build cross-references. On the one hand this makes classification work harder; on the other, it makes classification consistency difficult to guarantee, so users sometimes start down a path and fail to find content that Yahoo actually contains.

4. The growing complexity of web pages and other content to be listed also quietly increases the difficulty of precise classification; for example, documents related to ActiveX technology are hard to classify accurately in Yahoo.

5. Compiling a high-quality subject directory that keeps pace with the growth of network resources requires considerable manpower, material and financial resources, and the qualifications demanded of the staff doing this work keep rising. Without these, the quality of the subject directory cannot be guaranteed, and high-quality service cannot be provided at all.
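The precision loss described in drawback 2 can be reproduced on a toy collection. This sketch assumes that automatic truncation means prefix matching (so "art" also matches "article"); Yahoo's exact truncation rules are not given in the text, and the documents and function names are hypothetical. Precision here is the fraction of retrieved documents that are relevant.

```python
def term_matches(text: str, term: str, truncate: bool) -> bool:
    """Exact word match, or prefix match when automatic truncation is on."""
    words = text.lower().split()
    term = term.lower()
    if truncate:
        return any(word.startswith(term) for word in words)
    return term in words


def simple_search(docs: dict[str, str], terms: list[str], operator: str = "OR", truncate: bool = True) -> set[str]:
    """Defaults model the behaviour described above: OR between words, truncation on."""
    combine = any if operator == "OR" else all
    return {doc_id for doc_id, text in docs.items()
            if combine(term_matches(text, t, truncate) for t in terms)}


def precision(retrieved: set[str], relevant: set[str]) -> float:
    """Fraction of retrieved documents that are relevant."""
    return len(retrieved & relevant) / len(retrieved) if retrieved else 0.0


DOCS = {
    "d1": "modern art museum guide",
    "d2": "article writing tips",
    "d3": "artificial intelligence news",
    "d4": "natural history museum",
}
RELEVANT = {"d1"}  # the user is looking for art museums

for op, trunc in [("OR", True), ("OR", False), ("AND", True), ("AND", False)]:
    hits = simple_search(DOCS, ["art", "museum"], op, trunc)
    print(op, trunc, sorted(hits), precision(hits, RELEVANT))
```

With the defaults (OR plus truncation) all four documents are returned and precision is 0.25; turning truncation off raises it to 0.5, and requiring both words raises it to 1.0 in this example.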

V. Lessons and recommendations

Yahoo's key and most successful achievement is that it has set a benchmark for search engines, and especially for the design and development of subject-guide search engines. Drawing on Yahoo's advanced experience to further improve the organization and management of network information resources, especially Chinese-language resources, is a responsibility that history has given us. To build a high-quality, efficient "navigator" for online Chinese information resources, we offer the following recommendations:

1. Yahoo successfully applied the idea of facet analysis to the organization of digital information, building a complete, comprehensive and clearly hierarchical subject directory that improves the quality of information organization. This is well worth studying and learning from.

At present, many Chinese search engines cannot keep up with developments, either because they lack classification paths or because their directory systems lack the necessary grounding in classification and subject-indexing theory, which creates a series of difficulties for accurate classification and retrieval. We need not copy Yahoo's classification model. In actual compilation, we should establish a suitable classification framework based on Chinese habits of thought and search and on existing theoretical systems such as the Chinese Library Classification.

2. The database should be enlarged gradually, laying the material foundation for successful retrieval. We recommend supplementing it in two ways: first, by encouraging users to submit the addresses (URLs) of their own web pages through an online form; second, by using the engine's own crawler to discover new online documents and add them to the database. While supplementing the database, its content should also be updated regularly. Some Chinese search engines still fall short here: they keep adding information indiscriminately without the maintenance a database requires, resulting in a huge database, low retrieval efficiency, outdated content and poor precision.

3. Because relying solely on manual classification is inefficient, research should be strengthened into whether content-processing techniques such as automatic classification, automatic indexing and automatic summarization, which are already implemented for plain text and are being further refined, can be applied to the organization of network information resources. Combining manual work with machine assistance would raise efficiency and improve the quality of information organization and management.

In addition, research and development of search software should continue. WWW pages include images, animation, sound, video and other multimedia information, so we should actively explore ways of retrieving such information rather than relying only on keyword search. The storage, indexing and retrieval of multimedia information is attracting growing attention in the fields of computing and information management; this technology should be tracked closely and applied to the retrieval software of Chinese search engines.

4. The quality of the staff who collect, screen and organize information directly or indirectly affects the quality of the subject classification system. All departments and enterprises in the network information service industry, especially those developing Chinese search engines, should therefore strengthen staff training, particularly in information classification and organization and in computer retrieval. Organizing and developing network information resources is difficult but promising work; librarians, information managers and computer specialists should update their thinking as soon as possible and join this effort, continuously raising the quality of the workforce.
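The combination of manual and machine-assisted classification proposed in recommendation 3 could begin with something as simple as the sketch below. It is a deliberately naive keyword-overlap scorer, not a real automatic classification method; the category profiles and the `suggest_categories` function are hypothetical. The machine only ranks candidate categories, and a human editor makes the final assignment.

```python
# Hypothetical keyword profiles for three top-level categories (illustrative only).
PROFILES: dict[str, set[str]] = {
    "Health": {"disease", "medicine", "tuberculosis", "treatment", "hospital"},
    "Education": {"university", "college", "school", "students", "courses"},
    "Science": {"physics", "chemistry", "biology", "research", "laboratory"},
}


def suggest_categories(text: str, profiles: dict[str, set[str]] = PROFILES,
                       top_n: int = 2) -> list[tuple[str, int]]:
    """Rank categories by how many profile keywords occur in `text`.

    The result is a suggestion list for a human editor, not a final classification.
    """
    words = set(text.lower().split())
    scores = {cat: len(words & keywords) for cat, keywords in profiles.items()}
    # Highest score first; ties broken alphabetically so the order is deterministic.
    ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return [(cat, score) for cat, score in ranked[:top_n] if score > 0]


print(suggest_categories("University hospital research on tuberculosis treatment"))
```

Categories with no matching keywords are dropped, so a page the profiles do not cover yields an empty list and goes straight to manual classification.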
