Internet information retrieval capability

Source: Internet
Author: User
Tags iweb web database

Internet information acquisition is an essential skill for graduate students. Timely and accurate access to research progress in one's field is a prerequisite for innovative research; acquiring information across a wide range is an effective way to broaden knowledge; and quickly locating the desired information within a vast volume of material is an effective way to solve problems.

With the rapid development of the Internet, more and more information is available, to the point that problems such as information overload and resource disorientation have emerged [1]. The advent of search engines has, to a certain extent, eased the difficulty of obtaining Internet information. In recent years, some 400 to 500 articles published in China have discussed Internet information acquisition and the features and usage of various search engines from different application backgrounds and technical perspectives [2]. However, how to systematically improve graduate students' Internet information acquisition capability (specifically, which indicators or criteria should be applied and how to apply them) remains a new question worth exploring and summarizing. Based on the actual needs of postgraduate thesis work, this article proposes five indicators of Internet information acquisition (breadth, purity, depth, speed, and flexibility) and uses examples to discuss methods and techniques for obtaining Internet information with search tools such as Google.
I. Expanding the breadth of Internet information acquisition
When a graduate student first enters a research field, the knowledge of that field is usually unfamiliar or even entirely new, and learning from the Internet is a convenient way in. Expanding Internet information acquisition in a targeted manner effectively broadens one's knowledge of the field. The breadth of Internet information acquisition is defined as W = I_acquired / I_Internet, where I_acquired is the amount of information already obtained and I_Internet is the amount of information obtainable from the Internet; the amount of information is generally measured as a number of webpages or documents. Expanding the breadth means increasing W, which from the user's point of view can only be achieved by increasing I_acquired. I_acquired is constrained by the search engine's index size and recall, and also by the user's search commands. Breadth can therefore be expanded by choosing an appropriate search engine and by expanding the set of keywords as needed.
Example 1: Use Google's English interface to widen the language range of available information. Google indexes nearly 4.3 billion webpages, covers more than 250 countries, and supports 132 languages, making it currently the largest search engine. On first use, Google chooses the interface language according to the operating system (generally Simplified Chinese in mainland China). Most Chinese graduate students are familiar only with Chinese and English, so even when they find material in other languages they may not be able to read it. If the Google interface language is set to English, Google can provide English translations of Italian, French, Spanish, German, and Portuguese pages (click the "translate this page" link next to a search result), which greatly widens the language range of available information. For example, for the keyword "robot", the 765 million French and 163 million German webpages found are beyond the reach of the Chinese-language Google interface.
Example 2: Expand keywords to extend the coverage of relevant information. Taking a search for knowledge about "submarine security" as an example, the keywords can be expanded to: submarine, underwater, ocean, naval, safety, accident, marine damage, sinking, escape, rescue, lifeboat, rescue bell, rescue capsule, concealment, unsinkability, underwater sound, communication, sonar, torpedo, mine, attack, antisubmarine, underwater operation, hydrodynamics, underwater robot, underwater vehicle, life saving, simulation, and so on. The expanded keywords are then searched in appropriate combinations. For example, searching all websites in Google with only "submarine security" as the keyword finds just 43 300 webpages, whereas a variant form of that keyword finds 300 webpages (including pages not in the results of the first search), and "submarine life saving" finds 700 million webpages. During the search, keywords can be expanded step by step by considering synonyms, near-synonyms, antonyms, homophones, mistyped words (wrong pinyin, wrong Wubi input, misspellings), special characters, wildcards, simplified and traditional Chinese characters, Chinese and foreign-language forms, and abbreviations.
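The combination step described in this example can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the two synonym groups are a small hand-picked subset of the keyword list above, and the helper name expand_queries is hypothetical, not part of any search tool.

```python
from itertools import product


def expand_queries(groups):
    """Return one query string per combination, taking one term from each group."""
    return [" ".join(terms) for terms in product(*groups)]


# Hypothetical synonym groups, hand-picked from the expanded keyword list.
subject_terms = ["submarine", "underwater vehicle"]
topic_terms = ["safety", "accident", "rescue"]

queries = expand_queries([subject_terms, topic_terms])
for query in queries:
    print(query)
```

Each printed line is one candidate query to submit in turn. Adding a third group (for example, language variants) multiplies the number of combinations, so in practice the groups are best kept short and extended gradually.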
Expanding the breadth of Internet information acquisition ensures comprehensive and extensive coverage, but it often brings the side effect of information overload, which raises the question of how to control the purity of the information acquired.
II. Controlling the purity of Internet information acquisition
The purity of Internet information acquisition is defined as P = I_valuable / I_acquired, where I_valuable is the amount of usable information among what has been obtained; P is meaningful only when W > 0. From the user's side, P can only be increased by reducing I_acquired, which runs opposite to increasing W. The precision of the search engine is a prerequisite for improving purity (Google uses its patented PageRank technology to provide search results with very high accuracy), and the user's search commands are the direct means of controlling it. The basic ways to purify search results are adding keywords (logical AND), excluding keywords (logical NOT), and phrase search, all of which search engines generally support. Google also supports restricting a search to a specific file type (filetype), website domain (site), URL (inurl or allinurl), and webpage title (intitle or allintitle).
Example 3: Use logical combinations to narrow the search range. Taking a search for material on "agent-based intelligent robot technology" as an example, Table 1 shows the Google results for various logical combinations of keywords. Table 1 makes the process and effect of purity control clear.

Table 1. Searching with logical combinations
Search method                 Keyword expression                    Webpages found
Word                          robot                                 430 000
Words, logical AND            intelligent robot                     526 000
Words, logical AND            agent intelligent robot               109 000
Words, logical AND and NOT    agent intelligent robot -internet     49 900
Phrase, logical AND and NOT   agent "intelligent robot" -internet   850
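The keyword expressions in Table 1 follow a regular pattern, which the following Python sketch tries to make explicit. The function build_query and its parameter names are hypothetical; the sketch only assembles query strings (space-separated terms for logical AND, a leading minus sign for logical NOT, double quotes for a phrase) and does not contact any search engine.

```python
def build_query(all_of=(), phrases=(), none_of=()):
    """Compose a query string: plain terms are ANDed, quoted text is a phrase,
    and a leading '-' excludes a term."""
    parts = list(all_of)
    parts += [f'"{phrase}"' for phrase in phrases]
    parts += [f"-{term}" for term in none_of]
    return " ".join(parts)


# The five narrowing steps of Table 1, from broadest to narrowest.
steps = [
    build_query(all_of=["robot"]),
    build_query(all_of=["intelligent", "robot"]),
    build_query(all_of=["agent", "intelligent", "robot"]),
    build_query(all_of=["agent", "intelligent", "robot"], none_of=["internet"]),
    build_query(all_of=["agent"], phrases=["intelligent robot"], none_of=["internet"]),
]
for step in steps:
    print(step)
```

Keeping the three kinds of terms in separate arguments makes it easy to tighten a query one constraint at a time, which mirrors the step-by-step purification shown in the table.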

Example 4: Use qualifier operators to search within a specific range. Taking a search for "MIT robot research literature" as an example, Table 2 shows the Google results after restricting, in turn, the website domain, the URL, and the file type. The links of the pages found (omitted here for reasons of space) show that restricted searches are highly targeted and their results quite accurate.
Table 2. Searching with qualifier operators
Search method          Keyword expression                            Webpages found
Word                   robot                                         430 000
Domain restricted      robot site:                                   12 800
URL restricted         robot site: inurl:Publications                247
File type restricted   robot site: inurl:Publications filetype:PDF   148
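The successive restrictions of Table 2 can be sketched as follows. The helper restrict is hypothetical, and because the domain used in the table is not reproduced here, the sketch uses the placeholder domain example.edu, which is an assumption for illustration only. The operator names site, inurl, filetype, and intitle are the ones named in the text above.

```python
def restrict(query, site=None, inurl=None, filetype=None, intitle=None):
    """Append operator:value qualifiers to a base query, skipping unset ones."""
    parts = [query]
    qualifiers = (
        ("site", site),
        ("inurl", inurl),
        ("filetype", filetype),
        ("intitle", intitle),
    )
    for operator, value in qualifiers:
        if value:
            # No space after the colon, so the value binds to the operator.
            parts.append(f"{operator}:{value}")
    return " ".join(parts)


# example.edu is a placeholder domain, not the one used in Table 2.
print(restrict("robot"))
print(restrict("robot", site="example.edu"))
print(restrict("robot", site="example.edu", inurl="publications"))
print(restrict("robot", site="example.edu", inurl="publications", filetype="pdf"))
```

Each call adds one qualifier to the previous query, so the printed lines correspond to the four rows of the table in order.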

III. Deepening Internet information acquisition
Before starting their research, graduate students need to read a large body of specialized literature. The vast majority of professional and technical documents on the Internet are stored in web databases, which general-purpose search engines usually cannot reach; the online retrieval system dedicated to each web database must be used to obtain the necessary information. The depth of Internet information acquisition is defined as D = I_Web-DB / I_valuable, where I_Web-DB is the amount of information retrieved from web databases; D is meaningful only when I_valuable > 0, and it can only be increased by increasing I_Web-DB.
Existing web databases differ in style, but scientific literature databases are searched in similar ways, generally following a "log in, search, download" sequence. University libraries provide retrieval portals, account information, and user guides for the literature databases available to them, so the details are not repeated here.
IV. Improving the speed of Internet information acquisition
The speed of Internet information acquisition is defined as S = P / t_search, where t_search is the search time taken to obtain usable information, that is, the time spent purifying the information. S can be increased by raising the purity P or by reducing t_search. The response speed of the search engine affects t_search, but only slightly: Google, for example, has more than 15 000 servers and more than 200 T3-class broadband connections, and a search generally takes no more than 0.2 seconds. t_search is therefore determined mainly by the search method. If the user supplies keywords that are as complete and necessary as possible as search clues from the outset (rather than searching by repeated trial and revision) and makes use of the search engine's special features, the target information can be located quickly. This process is consistent with improving the purity of information acquisition; it is the process of accelerating the growth of P.
Example 5: Use "I'm Feeling Lucky" to reach the information in one step. During thesis work, graduate students often need to consult a laboratory's publication list or a university library's electronic resources to learn about related research or to retrieve and download documents, but do not remember the website address. In that case, use the full name of the target website as the keyword together with the "I'm Feeling Lucky" button on the Google homepage; this generally opens the target page directly, without spending much time hunting for the URL. For example, entering "Tsinghua University Library" as the keyword and clicking "I'm Feeling Lucky" leads directly to the homepage of the Tsinghua University Library.
V. Enhancing the flexibility of Internet information acquisition
The flexibility of Internet information acquisition is defined as F = (W + P + D + S)/4, which gives an overall evaluation of Internet information acquisition. The four indicators discussed previously depend more on the functions supported by search engines or search tools, whereas this indicator depends more on the user's experience and skills. To improve flexibility, the user must be familiar with extracting and combining keywords, with where and in what forms the target information is likely to appear on the Internet, and with the usage and particular strengths of the various search engines.
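The five indicators can be computed together, as in the following Python sketch. All numbers are hypothetical tallies invented for illustration, and the class and function names are not from the article. One point is an explicit assumption: speed is computed as S = P / t_search, chosen because the text states that S rises with P and falls as t_search decreases.

```python
from dataclasses import dataclass


@dataclass
class AcquisitionCounts:
    """Hypothetical tallies for one search task (names are illustrative)."""
    i_internet: float  # information obtainable from the Internet (pages or documents)
    i_acquired: float  # information actually obtained
    i_valuable: float  # obtained information that is usable
    i_web_db: float    # information retrieved from web databases
    t_search: float    # search time spent, in any consistent unit


def ratio(numerator, denominator, name):
    """Divide, rejecting the cases in which the indicator is not meaningful."""
    if denominator <= 0:
        raise ValueError(f"{name} is undefined when its denominator is not positive")
    return numerator / denominator


def indicators(c):
    w = ratio(c.i_acquired, c.i_internet, "W")  # breadth
    p = ratio(c.i_valuable, c.i_acquired, "P")  # purity
    d = ratio(c.i_web_db, c.i_valuable, "D")    # depth
    s = ratio(p, c.t_search, "S")               # speed; ASSUMPTION: S = P / t_search
    f = (w + p + d + s) / 4                     # flexibility
    return {"W": w, "P": p, "D": d, "S": s, "F": f}


# Invented example values, not measurements.
example = AcquisitionCounts(i_internet=1000, i_acquired=200, i_valuable=50,
                            i_web_db=20, t_search=2.0)
result = indicators(example)
for name, value in result.items():
    print(f"{name} = {value:.3f}")
```

The guard in ratio reflects the conditions stated in the text: P is only meaningful once something has been acquired, and D only once some of it is usable.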
Example 6: Use search engines flexibly to enhance the flexibility of information acquisition. When reading English documents, graduate students often meet unfamiliar terms or abbreviations that they cannot translate, and a search engine can assist with the translation. Take "RC" in the sentence "These activities included mapping, soil and rock chip sampling, geophysical surveys and RC and diamond drilling." After online dictionaries such as Kingsoft, Yinghua, and Dictionary returned nothing, a search engine was tried. Table 3 shows the process of obtaining the translation from the Internet. The search for "RC diamond drilling" turned up "... Both diamond and reverse-circulation (RC) drilling ...", and the search for "Reverse Circulation" on Simplified Chinese pages turned up the Chinese term next to "... Reverse Circulation Drilling ..."; "RC" is therefore the abbreviation of "reverse circulation drilling".
Table 3. Using Google to obtain translation information
Step   Search range                       Keyword expression    Results (items)
1      Kingsoft Mac                       RC                    0
2      Yinghua golden                     RC                    0
3      Dictionary                         RC                    0
4      Google, Simplified Chinese pages   RC                    103 000
5      Google, all websites               RC                    14 900 000
6      Google, all websites               RC diamond drilling   12 100
7      Google, Simplified Chinese pages   Reverse Circulation   417
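Once a search has returned snippets such as the one quoted above, spotting the expansion of an abbreviation can be partly automated. The following Python sketch is one possible heuristic, not a method described in the article: it looks for the abbreviation in parentheses and checks whether the initials of the words just before it spell the abbreviation. The function name find_expansions is hypothetical.

```python
import re


def find_expansions(abbreviation, text):
    """Return phrases that directly precede "(ABBR)" in text and whose
    word initials spell the abbreviation (case-insensitive)."""
    n = len(abbreviation)
    expansions = []
    for match in re.finditer(r"\(%s\)" % re.escape(abbreviation), text):
        # Take the last n alphabetic words before the parenthesis;
        # hyphenated compounds count as separate words.
        words = list(re.finditer(r"[A-Za-z]+", text[:match.start()]))[-n:]
        if len(words) < n:
            continue
        initials = "".join(word.group()[0] for word in words)
        if initials.lower() == abbreviation.lower():
            expansions.append(text[words[0].start():words[-1].end()])
    return expansions


snippet = "Both diamond and reverse-circulation (RC) drilling"
print(find_expansions("RC", snippet))
```

The heuristic only covers the common "full form (ABBR)" pattern; abbreviations introduced in other ways would still need to be read in context.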

[1] Wang Jicheng et al. Research Progress of Web Information Retrieval. Computer Research and Development, 187 (2): 193-
[2] Jiang Fulan. Search Engine Usage Skills. Scientific and Technological Intelligence Development and Economics, 178 (5): 179-
