Application of Artificial Intelligence in the Retrieval of Search Engine Resources


 

The Internet has become the world's largest information resource. Faced with such a huge ocean of information, search engines have emerged to meet people's requirements for fast, accurate, and comprehensive information retrieval. With a search engine, you can easily find the information you need on the network.

 

1. Introduction to Search Engines

 

A search engine uses a web crawler program to obtain web page information, build a database from it, and provide a query system. By working principle, search engines fall into two basic categories: full-text search engines and classification directories. A full-text search engine's database relies on software called a web crawler, which automatically gathers large amounts of web page information by following links across the network and organizes it according to certain rules. Google and Baidu are typical full-text search engine systems. Classification directories are compiled manually to form the database [1]; examples include Yahoo China and the classification directories of Sohu, Sina, and NetEase in China. In addition, some navigation sites on the Internet can also be regarded as primitive classification directories.
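As a purely illustrative sketch of this crawling idea (not the actual implementation of any of the engines named above; all names are hypothetical), a minimal breadth-first crawler in Ruby might look like this:

require 'net/http'
require 'uri'

# Minimal breadth-first crawler sketch (illustrative only; all names are hypothetical).
def crawl(seed, limit = 100)
  queue   = [seed]   # links waiting to be visited
  visited = {}       # links already visited
  index   = {}       # url => page text; this stands in for the engine's database
  until queue.empty? || visited.size >= limit
    url = queue.shift
    next if visited[url]
    visited[url] = true
    begin
      html = Net::HTTP.get(URI(url))
    rescue StandardError
      next                                 # skip pages that cannot be fetched
    end
    index[url] = html
    html.scan(/href="(http[^"]+)"/) { |m| queue << m[0] }   # follow every link found
  end
  index
end

The query system would then search the collected index; the sections below are concerned with making the crawling order itself more intelligent.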

 

Full-text search engines and classification directories each have their own characteristics [2]. Because a full-text search engine relies on software, its database capacity is very large, but its query results are often not accurate enough. A classification directory relies on manual collection and organization of websites, so it can provide more accurate query results, but the content it covers is very limited. To complement each other, many search engines now provide both types of query. Queries against a full-text search engine are generally called searches of "all websites" or "all web pages", for example Google's full-text search; queries against a classification directory are called searches of "classification directories" or "classified websites", such as Sina search and Yahoo Chinese search.

 

The integration of these two types of search engine has also produced other search services, mainly the meta search engine and the integrated search engine (all-in-one search page). The working principle of a full-text search engine is shown in Figure 1.

Figure 1. Working principle of a full-text search engine

 

 

2. Intelligent Agent Technology

 

From an application perspective, the intelligent agent is a new achievement of AI research. It is a computing entity that can automatically execute tasks delegated by the user, based on the user's requirements, without needing the user to specify how. It is widely used, for example in e-mail filtering agents, information retrieval agents, and desktop automation agents, and it makes web sites and applications more intelligent and practical.

 

2.1 Key Technologies of Intelligent Agents

 

Like graphical user interfaces (GUIs), intelligent agents involve several key technologies. Not every intelligent agent must use all of them, but the more of these technologies an agent or application uses, the more intelligent it will be and the greater its agent capabilities. These key technologies can generally be grouped into four categories: machine technology, content technology, access technology, and security technology.

 

2.2 Intelligent Agent Framework

 

An intelligent agent is generally composed of a communicator, an inference engine, a transaction processor, a learning machine, and a knowledge base, and it can interact with the outside world and with users. Its structure is shown in Figure 2.

 

 

Figure 2. Architecture of an intelligent agent

 

 

The knowledge base stores the agent's knowledge; it can be extended by the user or through new knowledge acquired by the agent itself. The inference engine uses the existing knowledge to control the communicator, the transaction processor, and the learning machine and to carry out reasoning. The communicator is responsible for communication with the outside world. The transaction processor continuously processes tasks in order to achieve the goal to be completed. The learning machine continuously sums up experience as the agent runs [3].
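A rough Ruby skeleton of these five parts (an illustrative sketch only; all class and method names are hypothetical and not taken from the article) might be organised as follows:

# Skeleton of the five-part agent structure described above (illustrative sketch).
class IntelligentAgent
  def initialize
    @knowledge_base = Hash.new { |h, k| h[k] = [] }  # stores and accumulates knowledge
    @experience     = []                             # the learning machine's record
  end

  # communicator: receives a request from the outside world or the user
  def handle(request)
    process(infer(request))
  end

  # inference engine: uses existing knowledge to decide what to do
  def infer(request)
    known = @knowledge_base[request]
    known.empty? ? [request] : known
  end

  # transaction processor: carries out the steps until the delegated task is done
  def process(plan)
    results = plan.map { |step| "done: #{step}" }
    learn(plan, results)
    results
  end

  # learning machine: sums up experience and feeds it back into the knowledge base
  def learn(plan, results)
    @experience << [plan, results]
    plan.zip(results).each { |step, r| @knowledge_base[step] << r }
  end
end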

 

3. Heuristic Search Algorithm

 

A heuristic search algorithm can extract the most valuable information from a large amount of information and visit it first, avoiding access to too much irrelevant information, and therefore greatly improves search accuracy [4]. The basic process of heuristic search is as follows (a minimal sketch of this loop is given after the list):
(1) Given the initial state S, a finite description of one state is generated.
(2) Use the successor function Q(x) to generate each successor state of S.
(3) Check whether the generated states include the target state G. If so, the search succeeds.
(4) If the target state G has not appeared, evaluate these nodes with the evaluation function f(x), select the most promising node, and use Q(x) to generate its child nodes. Repeat from step (3).
(5) If all possible nodes have been expanded with Q(x) and the target state G still has not appeared, the search fails. The evaluation function f(x) estimates the value of each state in the Open table, and the states are re-sorted according to these values.
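The following minimal Ruby sketch of this loop is illustrative only; the successor function q, the evaluation function f, and the goal test are assumed to be supplied by the caller, and the state representation is left abstract.

# Minimal best-first (heuristic) search loop following steps (1)-(5) above.
# q    - proc returning the successor states of a state
# f    - evaluation function used to rank states in the Open table
# goal - predicate that recognises the target state G
def heuristic_search(initial, q, f, goal)
  open_table   = [initial]     # states generated but not yet expanded
  closed_table = []            # states already expanded
  until open_table.empty?
    open_table.sort_by! { |s| f.call(s) }    # re-sort by estimated value
    state = open_table.shift                 # take the most promising state
    return state if goal.call(state)         # target state found: success
    closed_table << state
    q.call(state).each do |child|
      open_table << child unless closed_table.include?(child) || open_table.include?(child)
    end
  end
  nil   # Open table exhausted without reaching G: search fails
end

With f defined as in formula (1) below and q returning the links found on a page, this loop becomes the crawling search described in section 3.1.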

 

The design of the evaluation function should consider two aspects: the cost already paid and the cost still to be paid. Generally, the evaluation function f(n) is defined as the estimated cost of the minimum-cost path from the initial node, through node n, to the target node. Its general form is:
f(n) = g(n) + h(n)    (1)
where g(n) is the actual cost from the initial node S to n, and h(n) is the estimated cost of the optimal path from n to the target node G. The actual cost can be computed from the search tree already generated, but the estimated cost must use relevant heuristic information to make an empirical estimate for the part of the search path not yet generated. This estimate comes from an understanding of some features of the problem's solution, in the hope that these features will allow the solution to be found quickly. Therefore, h(n) mainly reflects the heuristic information of the search.
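As a purely illustrative choice of the two components (an assumption, not the article's definition), g(n) could be taken as the number of links already followed to reach a node and h(n) as the number of keywords not yet found in its text; formula (1) would then look like this in Ruby:

# Illustrative evaluation function f(n) = g(n) + h(n); the concrete choices of
# g and h below are assumptions, not the article's definitions.
def f(node, keywords)
  g = node[:depth]                                      # actual cost already paid
  h = keywords.count { |w| !node[:text].include?(w) }   # estimate of cost still to pay
  g + h
end

# e.g. f({ depth: 3, text: "search engine crawler" }, ["search", "agent"])  #=> 3 + 1 = 4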

 

3.1 The A* Algorithm Applied in Search

 

The A* algorithm is used in artificial intelligence to search for solutions in finite state spaces. In retrieving search engine resources, however, the web pages cannot be exhausted, so not all states can be enumerated, although the search can still be carried out within a limited space [5]. Some adjustments are therefore needed when applying the A* algorithm in the web crawler program: because the network cannot be exhausted, one cannot decide whether the search is optimal by reaching a final, definite state. Instead, any search sequence that satisfies the values of the evaluation function can be treated as the result of the A* algorithm.

 

In the experimental algorithm, the links still to be visited are placed in the Open table, and the links already visited are placed in the Closed table. The links in the Open table are sorted by the value of their evaluation function and taken in that order. If a sub-link generated from a link is not in the Closed table, it is placed in the Open table, and the Open table is then re-sorted by evaluation value.
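The following condensed Ruby sketch shows how such a loop could be organised; it is an assumption rather than the authors' code, and fetch_page, extract_links, and score_link are hypothetical placeholders, with score_link standing in for the evaluation function described in section 3.2.

# Heuristic crawler loop: Open table of scored links, Closed table of visited links.
def heuristic_crawl(seed, page_budget, score_link, fetch_page, extract_links)
  open_table   = [[score_link.call(seed), seed]]
  closed_table = []
  while !open_table.empty? && closed_table.size < page_budget
    open_table.sort_by! { |score, _| -score }   # best-scored link first
    _, url = open_table.shift
    next if closed_table.include?(url)
    closed_table << url
    page = fetch_page.call(url)
    extract_links.call(page).each do |link|
      unless closed_table.include?(link) || open_table.any? { |_, u| u == link }
        open_table << [score_link.call(link), link]   # insert; re-sorted next round
      end
    end
  end
  closed_table
end

Termination here is by a page budget rather than by reaching a goal state, reflecting the adjustment to A* discussed in section 3.1.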

 

3.2 Internal Process and Code of the Evaluation Function

 

The evaluation function evaluates a link, so its input is the link to be judged. The specific process is shown in Figure 3. The code corresponding to this process, written in Ruby, evaluates external links:

 

# md[0]  - candidate link extracted by a regular-expression match
# te     - anchor text of the link; word - list of keywords (read with trailing newlines)
# out    - links accepted so far; outscore - their scores; outnum - next free index
if md[0] != nil && !md[0].include?(key.to_s) && !out.include?(md[0])
  p md[0]                       # debug print of the accepted link
  out_score = 0
  out[outnum] = md[0]
  if te != "" && te != nil
    for i2 in 0...word.length
      # a keyword appearing in the anchor text raises the score
      out_score += 10 if te.include?(word[i2].chop)
      # a keyword appearing in the link itself also raises the score
      out_score += 10 if md[0].include?(word[i2].chop)
    end
  end
  outscore[outnum] = out_score
  outnum = outnum + 1
end

Figure 3. Internal flowchart of the evaluation function

 

 

4. Algorithm Application and Result Analysis

 

In the experiment, the "relative rate of return" is used to evaluate the performance of resource acquisition by the search engine. The formula for the relative rate of return is:

(2)

With this algorithm, the final relative rate of return is 91%. If breadth-first or depth-first traversal is used instead, the relative rate of return is only 23.38%, as shown in Figure 4.

Figure 4. Page access ratio

 

 

 

5. Conclusion

 

In this article, a heuristic search algorithm is applied to the retrieval of search engine resources and achieves satisfactory results, a great improvement over the traditional breadth-first and depth-first algorithms.

 

 

References:

 

[1] Shen Hongfang. Internet search engine and function optimization [J]. Journal of Model Intelligence, 2000, 18(1): 7-9.
[2] Zheng Jiaheng, Song Wen. Automatic classification of Chinese information [J]. Journal of Information Technology, 2002, 21(5): 32-36.
[3] Jing Bo. Intrusion detection model based on intelligent agent [D]. Taiyuan University of Technology, 2003.
[4] Zhang Li, Li Xing. New algorithm for automatic classification of Chinese web pages [J]. Journal of Tsinghua University (Natural Science Edition), 2000, 40(1): 39-42.
[5] Ren Ruijuan, Li Hongjian. Comparison of Chinese WWW search engines [J]. Journal of the University Library, 1999, (5): 55-61.

 

 

Source: Xie Juanwen, Qin Shujuan, and Jiao Aisheng (1. School of Technical Engineering, Lanzhou University of Technology, Lanzhou 730050, China; 2. Guangdong Ocean University, Zhanjiang 524088, China; 3. Department of Mechanical Engineering, Lanzhou Industrial College, Lanzhou 730050, Gansu, China)
