Some Ideas for Writing a Search Engine in PHP

Source: Internet
Author: User
Tags: urlencode

Editor's note: This is an excellent programming tutorial. It not only analyzes in detail how search engines work, but also presents the author's ideas for writing a search engine in PHP. The article explains its subject in simple terms, and both experts and beginners should find plenty of inspiration in it.

When web search engines are mentioned, most people think of Yahoo. Yahoo did indeed usher in the era of Internet search. However, the technology Yahoo currently uses to search the web was not developed in-house. In August 2000, Yahoo adopted the technology of Google (www.google.com), a startup founded by Stanford University students. The reason is simple: Google's search engine finds the required information faster and more accurately than the technology Yahoo used before.

Designing and developing a powerful, efficient search engine and its database ourselves is probably not feasible in the short term, for technical and financial reasons among others. But since Yahoo itself relies on someone else's technology, why shouldn't we make use of an existing search engine site as well?

Analysis of programming ideas

The idea is this: simulate a query by sending a search command in the appropriate format to a search engine site, receive the search results, analyze the HTML code of those results, strip out the unwanted characters and code, and finally display what remains on our own web page in whatever format we need.

The key to this approach is choosing a search engine that is accurate (so that our searches are meaningful), fast (because analyzing and reformatting the results takes extra time), and concise in its result pages (so that the HTML source is easy to analyze and strip). Google, one of the new generation of search engines, has all of these qualities, so we use it as our example to see how PHP can search Google (www.google.com) in the background and present the results in a customized form in the foreground.

Let's first look at how a Google query is composed. Go to www.google.com, type "ABCD" in the search box, and click the search button. The browser's address bar changes to: http://www.google.com/search?q=abcd&btnG=Google%CB%D1%CB%F7&hl=zh-CN&lr=. As this shows, Google passes its query parameters and submits the query through a form that uses the GET method. We can use PHP's file() function to simulate this query.
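As a rough illustration of how such a GET query string is put together, the sketch below builds a similar URL with PHP's http_build_query(). The helper name build_search_url() and its default argument are assumptions made for this example; the parameter names q, hl and lr are taken from the address-bar URL quoted above, and btnG is left out because its value is a pre-encoded byte sequence.

```php
<?php
// Hypothetical helper for this sketch: builds a Google-style search URL.
// The parameter names come from the URL quoted in the article; the function
// name and the default language value are assumptions.
function build_search_url(string $keywords, string $lang = 'zh-CN'): string
{
    $params = [
        'q'  => $keywords, // the search terms
        'hl' => $lang,     // interface language
        'lr' => '',        // left empty, as in the quoted URL
    ];
    // http_build_query() URL-encodes each value and joins the pairs with "&".
    return 'http://www.google.com/search?' . http_build_query($params);
}

echo build_search_url('ABCD'), "\n";
echo build_search_url('php search'), "\n";
```

Because http_build_query() encodes the values itself, the keyword can be passed in raw form.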

Understanding the file() function

Syntax: array file(string filename);

The return value is an array: the entire file is read into it, one line per element. The file can be local or remote, and for a remote file the protocol must be given explicitly. For example, $result = file("http://www.google.com/search?q=abcd&btnG=Google%CB%D1%CB%F7&hl=zh-CN&lr="); simulates a Google query for "ABCD" and stores the returned page in the array variable $result, one line per element. The protocol prefix "http://" cannot be omitted, because the file being read is remote.
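To see the line-per-element behaviour of file() without depending on the network, here is a minimal sketch that reads a small temporary local file. The file contents are made up for the example. Reading a remote URL works the same way, provided the allow_url_fopen setting is enabled in the PHP configuration.

```php
<?php
// Write a small three-line file so the example does not need the network.
$path = tempnam(sys_get_temp_dir(), 'demo');
file_put_contents($path, "first line\nsecond line\nthird line\n");

// file() returns an array with one element per line; each element keeps
// its trailing newline unless FILE_IGNORE_NEW_LINES is passed.
$lines = file($path);

echo count($lines), "\n";     // number of lines read
echo rtrim($lines[1]), "\n";  // the second line, without its newline

unlink($path); // clean up the temporary file
```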

To let the user search for arbitrary terms, we can build a text input box and a submit button, and replace the search term "ABCD" above with a variable:

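<?php
echo '<form>'; // a form with no attributes submits to the same page using GET by default
echo '<input type="text" name="keywords">'; // build a text input box
echo '<input type="submit" value="Search">'; // build a submit button
echo '</form>';
if (isset($_GET['keywords'])) // set only after the form is submitted, so the code below runs only then
{
    // URL-encode the user's input; urlencode() returns the encoded string, so assign it
    $keywords = urlencode($_GET['keywords']);
    // substitute the variable into the query URL and store the result lines in the array $result
    $result = file('http://www.Google.com/search?q=' . $keywords . '&btnG=Google%CB%D1%CB%F7&hl=zh-CN&lr=');
    $result_string = join(' ', $result); // merge the array $result into one string, elements joined by spaces
    // ... further processing
}
?>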

Note: the same thing can be done with the file_get_contents() function, which returns the content directly as a string: $result_string = file_get_contents("http://...");
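The relationship between the two functions can be checked on a local file. The following sketch uses a made-up temporary file and shows that gluing the lines from file() back together with an empty separator gives exactly what file_get_contents() returns. Joining with a space, as in the program above, inserts one extra space between lines, which is harmless for HTML.

```php
<?php
// Made-up sample content standing in for a downloaded page.
$path = tempnam(sys_get_temp_dir(), 'demo');
file_put_contents($path, "<b>one</b>\n<b>two</b>\n");

$joined = implode('', file($path));  // array of lines glued back together
$whole  = file_get_contents($path);  // the same content read in one call

var_dump($joined === $whole);

unlink($path); // clean up the temporary file
```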

The program above can now run a query based on user input and combine the returned result into the string variable $result_string. Note that the user's input must be URL-encoded with urlencode() so that ordinary characters, spaces, and other special characters can all be queried correctly. This mimics Google's own query command as closely as possible and helps ensure that the search results are correct.
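A short sketch of what urlencode() does to typical input may help here. The sample keywords are made up for the example. Note that urlencode() returns the encoded string rather than modifying its argument, so the return value has to be used.

```php
<?php
// Sample inputs chosen for illustration only.
$inputs = ['ABCD', 'php search engine', 'C++ & PHP'];

$encoded = [];
foreach ($inputs as $keywords) {
    // Spaces become "+", and reserved characters become %XX escapes.
    $encoded[$keywords] = urlencode($keywords);
    echo $keywords, ' => ', $encoded[$keywords], "\n";
}
```

Without the encoding, the "&" in the last input would be read as a parameter separator and cut the query short.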

Analyzing Google's search results

To keep things easy to follow, let's assume that all we really need from the search results is each result's title, URL, and summary, which is a simple and typical requirement. What we have to do, then, is remove the header and footer from Google's result page, including the Google logo, the search-again input box, and the description of the search results, and then strip the original HTML formatting tags from the remaining results and replace them with the format we want.
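One possible way to carry out the tag-stripping part of this plan is PHP's built-in strip_tags(). This is an assumption of this sketch and not necessarily the method the author had in mind, and the HTML fragment below is made up for illustration, not real Google markup.

```php
<?php
// Made-up fragment standing in for one search result.
$fragment = '<p><a href="http://example.com/">Example <b>Title</b></a><br>' . "\n"
          . '<font size=-1>A short summary of the page.</font></p>';

// strip_tags() removes every tag; the optional second argument lists tags to keep.
$plain = strip_tags($fragment);
$links = strip_tags($fragment, '<a>');

echo $plain, "\n";
echo $links, "\n";
```

Keeping the anchor tags, as in $links, preserves each result's URL while dropping the original formatting, so new formatting can be wrapped around the text afterwards.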

To do this, we must carefully study the HTML source of Google's result page and find its regularities. It is not hard to see that the body of the search results always lies between the first occurrence of one tag and the second-to-last occurrence of another, and that this second-to-last tag is immediately followed by a table tag. (The tag names themselves, and the remainder of this sentence, are missing from the source text.)

All of the code below goes in place of the "further processing" placeholder in the program above.

// Each "..." below marks a tag string that is missing from the source text.
$result_string = strstr($result_string, "..."); // keep $result_string from the first start tag onward, removing the Google header
$position = strpos($result_string, "..."); // position of the table tag
$result_string = substr($result_string, 0, $position); // keep only the string before the first table tag, removing the footer
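The trimming technique with strstr(), strpos() and substr() can be tried out on a small made-up page. In the sketch below the page content and both marker strings are assumptions chosen for illustration; they are not the tags used in the original article, which are not recoverable from this text. The sketch also checks for a false return value, since both strstr() and strpos() return false when the marker is absent.

```php
<?php
// Made-up page; the markers below are hypothetical stand-ins.
$page = '<html><body><img src="logo.gif"><p>Result one<br>Result two'
      . '<div><table><tr><td>footer</td></tr></table></div></body></html>';

$start_marker = '<p>';          // hypothetical: where the results begin
$end_marker   = '<div><table';  // hypothetical: where the footer begins

// strstr() returns the string from the first match onward, or false.
$body = strstr($page, $start_marker);
if ($body !== false) {
    // strpos() returns the offset of the first match, or false if absent.
    $position = strpos($body, $end_marker);
    if ($position !== false) {
        $body = substr($body, 0, $position); // drop the footer
    }
    echo $body, "\n";
}
```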
