A Complete Analysis of Search Engine Techniques in PHP Development




When it comes to web search engines, many people think of Yahoo. Indeed, Yahoo ushered in the era of Internet search. However, the technology Yahoo currently uses to search the web was not developed by the company itself. In August 2000, Yahoo adopted the technology of Google, a company founded by students at Stanford University. The reason is simple: Google's search engine finds the information a user needs faster and more accurately than the technology Yahoo used before.

Designing and developing a powerful, efficient search engine and database ourselves in a short time is not feasible in terms of technology or money. But since Yahoo itself relies on someone else's technology, can't we also make use of an existing search engine site?

  Analysis of programming ideas

The idea is this: simulate a query by sending a search engine site a request in the format it expects, receive the search results, analyze the returned HTML, strip out the extra characters and code, and finally display the results on our own page in the format we need.

So the key is to choose a search site whose results are accurate (so that our search is more meaningful), fast (because analyzing and redisplaying the results takes extra time), and concise (so that the HTML source is easy to analyze and strip). The new-generation search engine Google has all of these qualities, so we choose it as the example and look at how to use PHP to query Google in the background and display the results in a personalized way in the foreground.

Let's first look at how a Google query command is composed. Go to the Google site, type "abcd" in the query field, and click the search button. The browser's address bar changes to "http://www.google.com/search?q=abcd&btnG=Google%CB%D1%CB%F7&hl=zh-cn&lr=", which shows that Google passes the query parameters and submits the query through the form's GET method. We can use PHP's file() function to simulate this query process.
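To make the structure of such a query URL visible, it can be taken apart with PHP's standard parse_url() and parse_str() functions. The following is only an illustrative sketch; the variable names are chosen for this example and do not come from the original program.

```php
<?php
// The query URL observed in the browser's address bar.
$url = 'http://www.google.com/search?q=abcd&btnG=Google%CB%D1%CB%F7&hl=zh-cn&lr=';

// Take the part after "?" and split it into name => value pairs.
$query = parse_url($url, PHP_URL_QUERY);
parse_str($query, $params);

foreach ($params as $name => $value) {
    // rawurlencode() re-escapes the raw bytes so they print safely.
    echo $name, ' = ', rawurlencode($value), "\n";
}
```

The search term travels in the q parameter; the remaining parameters describe the button that was pressed and the language settings.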

  Understanding the file() function

Syntax: array file(string filename);

The return value is an array into which the whole file is read, one line per element. The file can be local or remote, and a remote file must specify the protocol being used. For example, $result = file("http://www.google.com/search?q=a ... mp;hl=zh-cn&lr="); simulates querying the word "abcd" on Google, and the search results are returned in the array variable $result, with each line as one element. Because the file read here is remote, the protocol name "http://" must not be omitted.
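The line-per-element behavior of file() can be checked without any network access by reading a small local file. This is a minimal sketch; the temporary file and its contents are invented for the example. Reading an "http://" URL works the same way, provided allow_url_fopen is enabled in the PHP configuration.

```php
<?php
// Write a three-line sample file so the example needs no network access.
$path = tempnam(sys_get_temp_dir(), 'demo');
file_put_contents($path, "first line\nsecond line\nthird line\n");

// file() returns an array with one element per line, line endings included.
$lines = file($path);
echo count($lines), " lines read\n";

// FILE_IGNORE_NEW_LINES drops the trailing newline from each element.
$trimmed = file($path, FILE_IGNORE_NEW_LINES);
echo $trimmed[1], "\n";

unlink($path); // remove the temporary file
```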

If we want to let users search for whatever they type, we can build a form with a text input box and a submit button, and replace the search term "abcd" above with a variable:

<?php
echo '<form>'; // a form with no attributes submits with GET, to the page itself
echo '<input type="text" name="keywords">'; // the text input box
echo '<input type="submit">'; // the submit-query button
echo '</form>';
// After a submit the input arrives as $_GET['keywords'] (current PHP no longer
// creates a $keywords variable automatically), so the code below runs only then.
if (isset($_GET['keywords']))
{
$keywords = urlencode($_GET['keywords']); // URL-encode the user's input
$result = file("http://www.google.com/search?q=" . $keywords . "&btnG=Google%CB%D1%CB%F7&hl=zh-cn&lr=");
// substitute the variable into the query and save the response lines in the array $result
$result_string = join(" ", $result); // merge the array $result into one string, gluing the elements together with a space
// ... further processing
}
?>

The program above can already query whatever the user enters, and it combines the returned results into a single string variable, $result_string. Note that the user input must be URL-encoded with urlencode() so that input containing spaces and other special characters is queried correctly. Imitating Google's own query command as closely as possible helps ensure that the search results are correct.
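The effect of urlencode() is easy to see on a search term that contains a space and an ampersand. The input string below is a made-up example; without encoding, the "&" would be read as the start of a new URL parameter and the query would be cut short.

```php
<?php
// Hypothetical user input containing spaces and a reserved character.
$keywords = 'php & mysql tutorial';

// urlencode() turns spaces into "+" and reserved characters into %XX escapes.
$encoded = urlencode($keywords);

// Build the query URL; only some of the parameters from the text are used here.
$url = 'http://www.google.com/search?q=' . $encoded . '&hl=zh-cn&lr=';
echo $url, "\n";
```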

  An analysis of Google

To keep things easy to follow, assume that what we really need is the title, URL, and description of each search result; this is a concise and typical requirement. So all we have to do is remove the header and footer from Google's result page, including the Google logo, the box for searching again, and the line describing the search results, and then strip the original HTML formatting tags from the remaining results and replace them with the format we want.

To do this, we must carefully analyze the HTML source of Google's search results and find its regularities. It is not hard to see that the body of the search results always lies between the first occurrence of one tag and the second-to-last occurrence of another tag in the source, and that this second-to-last tag is immediately followed by the table marker.

All of the following statements go, in order, into the "further processing" part of the program above.

$result_string = strstr($result_string, ""); // keep $result_string from the first opening tag onward, removing the Google header
$position = strpos($result_string, ""); // find the position of the table marker
$result_string = substr($result_string, 0, $position); // keep only the part before the first table marker, removing the footer
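The header-and-footer trimming described above can be tried on a small sample string. This is a sketch under stated assumptions: the sample page, the two comment markers "<!--START-->" and "<!--END-->", and the helper name cut_between() are all hypothetical and stand in for whatever tags the real result page uses. Unlike the statements above, the helper also drops the start marker itself.

```php
<?php
// Hypothetical page: the two comment markers stand in for the real tags.
$page = '<html><body><h1>Logo</h1><!--START--><p>Result one</p>'
      . '<p>Result two</p><!--END--><div>Footer</div></body></html>';

function cut_between(string $html, string $start, string $end): ?string
{
    // strstr() returns the string from the first $start onward, or false.
    $tail = strstr($html, $start);
    if ($tail === false) {
        return null;
    }
    // strpos() returns false when the marker is missing; 0 is a valid
    // offset, so the comparison must be strict.
    $position = strpos($tail, $end);
    if ($position === false) {
        return null;
    }
    // Keep everything before $end and drop the start marker itself.
    return substr($tail, strlen($start), $position - strlen($start));
}

echo cut_between($page, '<!--START-->', '<!--END-->'), "\n";
```

Checking for false explicitly matters here: if the page layout changes and a marker disappears, the unchecked version would silently return an empty or truncated string.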

  Application and implementation

Now that we have the useful core of the HTML source, the remaining question is how to display it in our own way. Examining the search result entries, we find that they are separated very regularly: each entry forms a paragraph. Using this feature, we split the entries apart with the explode() function:

Syntax: array explode(string separator, string string);

Returns an array whose elements are the substrings obtained by cutting string at every occurrence of separator.

So:

$result_array = explode("", $result_string); // cut the result at the string ""

This gives us the array $result_array, in which each element is one search result entry. All that remains is to study each entry and its HTML formatting code and replace it as required. The following loop processes each entry in $result_array:

for ($i = 0; $i < count($result_array); $i++) {
// ... process each entry
}
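The splitting and looping steps can be sketched on a small sample. The sample string and the choice of "<p>" as the separator are assumptions made for this example, based on the observation that each entry forms a paragraph.

```php
<?php
// Hypothetical result body; "<p>" is assumed to be the separator.
$result_string = '<p>First entry<p>Second entry<p>Third entry';

// explode() yields an empty first element because the string starts with
// the separator; array_filter() drops empty pieces and array_values()
// renumbers the remaining elements from 0.
$result_array = array_values(array_filter(explode('<p>', $result_string), 'strlen'));

foreach ($result_array as $i => $entry) {
    echo $i, ': ', $entry, "\n";
}
```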

Looking at each entry, some features are easy to spot: every entry consists of a title, a summary, a description, a category, and a URL, and each part sits on its own line, ending with a line-break tag, so we can split once more. (The following statements go inside the loop above.)

$every_item = explode("", $result_array[$i]);

This gives us the array $every_item, in which $every_item[0] is the title and $every_item[1] and $every_item[2] are the two summary lines. If $every_item[3], $every_item[4], and so on begin with "Introduction:" or "<font size=-1 color=#6f6f6f>Category:</font>", the item is the description or the category (some result entries lack these items); if an item begins with "<font color=green>", it is definitely the URL. For this kind of test we usually use regular expressions (omitted here), which also make replacement convenient. For example, the title $every_item[0] is itself a link, and we would like to modify the link's attributes so that it opens in a new window:

echo eregi_replace( ... ); // show the title, with its link changed to open in a new window
for ($j = 1; $j < count($every_item); $j++) {
// ... handle each item in the entry except the first (the first item is the title, already shown)
// ... more format modifications
}

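This changes the link's attributes. Many other display formats can be modified, stripped, or replaced in the same way with the regular-expression replacement function eregi_replace(). (The ereg family of functions was removed in PHP 7; on current versions, preg_replace() with the i modifier does the same job.)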
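As a hedged illustration of both steps, the sketch below rewrites a title link with preg_replace() and recognizes a URL item by its leading marker. The sample strings are invented, the exact pattern is one possible choice rather than the original author's, and the "<font color=green>" marker is taken from the description above.

```php
<?php
// Hypothetical title item as it might appear in a result entry.
$title = '<A HREF="http://www.example.com/">Example <b>site</b></A>';

// Add target="_blank" to every opening <a ...> tag. The "i" modifier makes
// the match case-insensitive; \b keeps tags such as <abbr> untouched.
$new_title = preg_replace('/<a\b/i', '<a target="_blank"', $title);
echo $new_title, "\n";

// Recognize a URL item by the marker it starts with.
$item = '<font color=green>www.example.com/</font>';
$is_url = preg_match('/^<font color=green>/i', $item) === 1;

// strip_tags() removes the remaining markup and leaves only the text.
echo $is_url ? 'URL: ' . strip_tags($item) : 'other item', "\n";
```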

So far we can extract every item of every search result, change the format of each item, and even wrap the output in a nice table. A good program, however, should adapt to a variety of operating conditions, and this one is no exception. What we have discussed is only a framework for stripping the HTML of the search results; a polished implementation has to consider much more, such as how many results to display and how many pages to split them into. You can even strip out Google-specific code such as the "category" and "description" items so that users never see the original site. All of this can be done by parsing and stripping the HTML in the same way. Now you can do it yourself and build a richly personalized search engine.
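Splitting the parsed entries into pages, one of the refinements mentioned above, could look roughly like this. The entry list, the page size, and the requested page number are all hypothetical values chosen for the sketch.

```php
<?php
// Hypothetical list of already-parsed entries.
$entries = ['entry 1', 'entry 2', 'entry 3', 'entry 4', 'entry 5'];
$per_page = 2;

// array_chunk() splits the list into pages of at most $per_page entries.
$pages = array_chunk($entries, $per_page);
$page_count = count($pages);

// Clamp the requested page number (1-based) to the valid range.
$requested = 3;
$current = max(1, min($requested, $page_count));

foreach ($pages[$current - 1] as $entry) {
    echo $entry, "\n";
}
echo "page $current of $page_count\n";
```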


What is the difference between a Web site developed with PHP and ASP?

Here is my view in detail:

1. First, what are PHP and ASP? Put simply, ASP technology is simple and can meet the great majority of website construction needs; its technical threshold is low and its technical cost is relatively low, and it is the prevailing technology on the market. PHP has a slightly higher technical threshold and a higher technical cost than ASP, but as PHP has come into wider use, the cost of building a PHP website has dropped to an acceptable level. It is an inevitable trend that PHP will replace ASP in the website construction industry.

2. PHP scripts run very fast, faster than ASP, and large websites today are mostly developed with PHP, for example ICBC's website.

3. Most PHP hosts support pseudo-static URLs, whereas ASP hosts mostly do not. With pseudo-static URLs, search engines will not treat the site as a copycat or spam site, which is very important for promoting a site. In addition, most websites are built with ASP and far fewer with PHP, which is also more favorable for optimizing and promoting the site, making it easier for a business to do network marketing.

4. ASP technology is very mature and very common, so designers find it convenient to work with and the cost is naturally much lower; a designer may finish a simple site in 1-2 days. PHP development demands more skill, is naturally more difficult, and involves more work, so the labor cost is higher.

5. Sites built with PHP tend to use higher-quality construction techniques and generally adopt DIV+CSS, which keeps page size to a minimum and keyword density at its highest.

What is a PHP engine program?

Do you mean a PHP search engine or a template engine?

One search engine is PhpDig, a web crawler and search engine developed in PHP. It builds a vocabulary by indexing both dynamic and static pages. When you run a query, it displays a results page listing the pages that contain the keywords, sorted by a certain ranking rule. PhpDig includes a template system and can index PDF, Word, Excel, and PowerPoint documents. It covers the three most basic search engine technologies: spidering, extraction of structured information (metadata) from web pages, and word segmentation/indexing. Unlike traditional general-purpose search engines, PhpDig is a good choice for more specialized, in-depth, personalized search, such as building a vertical search engine for a particular field.

There are many other open-source, free PHP search engines and related tools: OpenWebSpider, RiSearch PHP, Sphider, Snoopy, Sphinx, SEO Rank Checker, and PHPCrawl.

There are also many template engines:
Smarty is a template engine written in PHP and one of the best-known PHP template engines. It separates logic code from presentation, providing an easy-to-manage way to pull apart PHP code that used to be mixed with HTML. Put simply, the goal is to separate the work of PHP programmers from that of front-end designers: programmers can change the program logic without affecting the page design, and designers can rework the pages without affecting the program logic, which is especially important in multi-person projects.
Heyes Template Class
A very easy-to-use but powerful and fast template engine that helps you separate page layouts and designs from your code.
Fasttemplate
A simple variable interpolation template class that parses your template and separates the value of the variable from the HTML code.

Shellpage
An easy-to-use class that allows your entire site layout to be based on template files and modify the template to change the entire site.

STP Simple Template Parser
A simple, lightweight, and easy-to-use template analysis class. It can assemble a page from multiple templates and output the resulting page to a browser or file system.

OO Template Class
A caching template class that you can use in your own programs.

Simpletemplate
A template engine that can create and structure Web sites. It can parse and compile templates.

Btemplate
A short but fast template class that allows you to detach the PHP logic code from the HTML-decorated code.

Savant
A powerful and lightweight PEAR-compatible templating system. It is non-compiling and uses the PHP language itself as its template language.

Ets-easy Template System
Template systems that can use the exact same data to reorganize templates.

easytemplatephp
A simple but powerful templating system for your site.

Vlibtemplate
A fast, versatile templating system that contains a cache and Debug class.

Avantemplate
A multi-byte secure template engine that consumes very little system resources. It supports variable substitution, and content blocks can be set to show or hide.
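The basic idea shared by the simpler classes above, substituting variables into an HTML template so that markup and logic stay apart, can be sketched in a few lines of plain PHP. The "{name}" placeholder syntax and the render() helper are assumptions made for this illustration and are not the syntax or API of any engine listed here.

```php
<?php
// Hypothetical template text with {name}-style placeholders.
$template = '<h1>{title}</h1><p>Hello, {user}!</p>';

function render(string $template, array $vars): string
{
    $map = [];
    foreach ($vars as $name => $value) {
        // Escape values so user data cannot inject markup into the page.
        $map['{' . $name . '}'] = htmlspecialchars((string) $value);
    }
    // strtr() replaces all placeholders in a single pass.
    return strtr($template, $map);
}

echo render($template, ['title' => 'Search', 'user' => 'A & B']), "\n";
```

A designer can edit the template string (or a template file) freely, while the program only supplies the values.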
