A friend built a website with Dreamweaver. It has no dynamic content, just some of his favorite articles, a personal introduction, and so on. Now that it holds a lot of content, I wanted to help him add a search engine. To be honest, this is not difficult, and I put one together quickly. I have also seen people on other forums asking how to do this, so I want to share the approach, with the focus on understanding the method.
Before writing any code, let's work out an approach. Here is mine. Someone may well have a better one, but keep in mind that this is only about the method:

1. Traverse all the files.
2. Read each file's content and search it for the keyword.
3. Store every file that matches in an array.
4. Finally, output that array.

This assumes your pages are standard HTML with a title (<title></title>) and a body (<body ...></body>). Pages made with Dreamweaver or FrontPage have both unless you deliberately delete them. Now let's build and refine the search engine step by step.
1. Design the search form
Create a file named search.htm in the root directory of the website with the following content:
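<html>
<head>
<title>Search form</title>
<meta http-equiv="Content-Type" content="text/html; charset=gb2312">
</head>
<body>
<form method="post" action="search.php">
<input type="text" name="keyword"> <input type="submit" value="Search"></form>
</body>
</html>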
2. Write the search program
Create a file named search.php in the root directory. It processes the data submitted by the search.htm form. Its content is as follows:
<?php
// Get the search keyword ("" if the field was not submitted)
$keyword = isset($_POST["keyword"]) ? trim($_POST["keyword"]) : "";
// Check whether it is empty (== compares; a single = would assign)
if ($keyword == "") {
    echo "The search keyword cannot be empty.";
    exit; // stop the script
}
?>
With this check, a visitor who submits an empty keyword sees a prompt instead of running a search.
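The same check can be packaged as a small helper. This version also copes with two edge cases. The field may be missing, for example when search.php is opened directly. It may also not be a string, as in a request with keyword[]=x. On PHP 8, trim() throws a TypeError for the second case. This is a minimal sketch: the name get_keyword() is hypothetical, and passing $_POST to it is an assumption.

```php
<?php
// Hypothetical helper: read the "keyword" field from a request array
// (pass $_POST in real use) and return it trimmed, or "" when the field
// is missing or is not a plain string.
function get_keyword(array $request): string
{
    if (!isset($request["keyword"]) || !is_string($request["keyword"])) {
        return "";
    }
    return trim($request["keyword"]);
}
```

In search.php this would become $keyword = get_keyword($_POST);, followed by the same empty check and exit.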
Next, we traverse all the files, which we do recursively. PHP offers opendir()/readdir() as well as the Directory class returned by dir(). Here we use the former.
<?php
// Recursively traverse all files under $dir
function listfiles($dir) {
    $handle = opendir($dir);
    while (false !== ($file = readdir($handle))) {
        if ($file != "." && $file != "..") {
            // Descend into subdirectories
            if (is_dir("$dir/$file")) {
                listfiles("$dir/$file");
            } else {
                // Process the file here
            }
        }
    }
    closedir($handle);
}
?>
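For comparison, here is a sketch of the same recursion using the Directory class that dir() returns. Unlike listfiles(), it returns the list of paths instead of processing each file in place. The function name list_files_dir() is made up for this example.

```php
<?php
// Recursively collect file paths under $dir using the Directory object
// returned by dir(), as an alternative to opendir()/readdir().
function list_files_dir(string $dir): array
{
    $files = [];
    $d = dir($dir);
    if ($d === false) {
        return $files; // unreadable directory: nothing to list
    }
    while (false !== ($entry = $d->read())) {
        if ($entry === "." || $entry === "..") {
            continue; // skip the current and parent directory entries
        }
        $path = "$dir/$entry";
        if (is_dir($path)) {
            // Merge in the files found in the subdirectory
            $files = array_merge($files, list_files_dir($path));
        } else {
            $files[] = $path;
        }
    }
    $d->close();
    return $files;
}
```

is_dir() follows symbolic links. A link that points back up the tree would therefore make either version recurse forever. That is unlikely on a plain static site, but worth knowing.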
Each file that is found can be read and processed in the else branch, where the placeholder comment sits. Next, we read each file's content and check whether it contains $keyword. If it does, we add the file's path to an array.
<?php
// $dir is the directory to search, $keyword is the search keyword,
// and $array (passed by reference) collects the matching file paths
function listfiles($dir, $keyword, &$array) {
    $handle = opendir($dir);
    while (false !== ($file = readdir($handle))) {
        if ($file != "." && $file != "..") {
            if (is_dir("$dir/$file")) {
                listfiles("$dir/$file", $keyword, $array);
            } else {
                // Read the file content
                $data = file_get_contents("$dir/$file");
                // Do not search this script itself
                if ($file != "search.php") {
                    // Case-insensitive match
                    if (eregi($keyword, $data)) {
                        $array[] = "$dir/$file";
                    }
                }
            }
        }
    }
    closedir($handle);
}
// Define the result array
$array = array();
// Run the search, using "php" as a test keyword
listfiles(".", "php", $array);
// Print the search results
foreach ($array as $value) {
    echo "$value" . "<br>\n";
}
?>
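On PHP 7 or later, eregi() no longer exists, and the code above stops with a fatal "Call to undefined function" error. The following is a sketch of the same search for current PHP, under these assumptions:

- The keyword should match as plain text, not as a regular expression, so stripos() does the case-insensitive test.
- file_get_contents() reads each file.
- The results are returned rather than written through a reference.

The name search_files() and its $self parameter are hypothetical.

```php
<?php
// Recursively search $dir for files whose content contains $keyword
// (case-insensitive, plain text). $self is skipped so the search script
// does not match itself. Returns the matching paths.
function search_files(string $dir, string $keyword, string $self = "search.php"): array
{
    $matches = [];
    $handle = opendir($dir);
    if ($handle === false) {
        return $matches;
    }
    while (false !== ($file = readdir($handle))) {
        if ($file === "." || $file === "..") {
            continue;
        }
        $path = "$dir/$file";
        if (is_dir($path)) {
            $matches = array_merge($matches, search_files($path, $keyword, $self));
        } elseif ($file !== $self) {
            $data = file_get_contents($path);
            // stripos() returns false when not found and 0 for a match at
            // the very start, so compare strictly with false
            if ($data !== false && stripos($data, $keyword) !== false) {
                $matches[] = $path;
            }
        }
    }
    closedir($handle);
    return $matches;
}
```

Usage would be $array = search_files(".", $keyword);. The keyword must be non-empty, which the earlier check already guarantees.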
Now combine this with the keyword check from the beginning, passing $keyword to listfiles() instead of the test keyword "php". After you enter a keyword, every matching file on your website is listed. Next, we improve the results.
1. List page titles
Replace
if (eregi($keyword, $data)) {
    $array[] = "$dir/$file";
}
with
if (eregi($keyword, $data)) {
    if (eregi("<title>(.+)</title>", $data, $m)) {
        $title = $m[1];
    } else {
        $title = "No title";
    }
    $array[] = "$dir/$file $title";
}
The idea is simple. If the file content contains <title>XXX</title>, XXX is used as the title. If no title is found, the result is labeled "No title". The path and title are stored together, separated by a space.
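The title lookup can also be written with preg_match(), as sketched below. It keeps the case-insensitive matching of eregi() but differs in three assumed ways:

- The opening tag may carry attributes.
- The lazy (.*?) stops at the first </title>, so it cannot run on to a later closing tag.
- Surrounding whitespace is trimmed, and an empty title also falls back to "No title".

extract_title() is a hypothetical name.

```php
<?php
// Return the text between <title ...> and </title>, or $fallback.
// "i" = case-insensitive, "s" = let "." match newlines,
// ".*?" = stop at the first closing tag.
function extract_title(string $html, string $fallback = "No title"): string
{
    if (preg_match('/<title[^>]*>(.*?)<\/title>/is', $html, $m)) {
        $title = trim($m[1]);
        if ($title !== "") {
            return $title;
        }
    }
    return $fallback;
}
```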
2. Search only the body text of each page
A page contains a lot of HTML code. That code is not what we want to search, so we remove it. I use a regular expression together with strip_tags() for this.
Replace
$data = file_get_contents("$dir/$file");
// Do not search this script itself
if ($file != "search.php") {
    // Case-insensitive match
    if (eregi($keyword, $data)) {
with
$data = file_get_contents("$dir/$file");
// Keep only what is inside <body ...>...</body>, then remove the tags
if (eregi("<body([^>]*)>(.+)</body>", $data, $b)) {
    $body = strip_tags($b[2]);
} else {
    $body = strip_tags($data);
}
if ($file != "search.php") {
    if (eregi($keyword, $body)) {
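The body extraction can likewise be sketched with preg_match(). Here <body[^>]*> accepts a bare <body> tag as well as one with attributes, and the result is trimmed. One caveat applies to both versions. strip_tags() removes tags but keeps the text between them, so the contents of inline <script> or <style> blocks in the body are still searched. extract_body_text() is a hypothetical name.

```php
<?php
// Return the searchable text of a page: the part inside <body ...>...</body>
// with tags removed, or the whole document stripped if no body is found.
function extract_body_text(string $html): string
{
    if (preg_match('/<body[^>]*>(.*)<\/body>/is', $html, $m)) {
        $html = $m[1]; // greedy (.*) runs to the last </body>
    }
    return trim(strip_tags($html));
}
```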
3. Add a link to the title
Replace
foreach ($array as $value) {
    echo "$value" . "<br>\n";
}
with
foreach ($array as $value) {
    // Split "path title" at the first space
    list($filedir, $title) = explode(" ", $value, 2);
    // Output the title as a link to the file
    echo "<a href=\"$filedir\">$title</a>" . "<br>\n";
}
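The link output above inserts the path and title into the HTML unescaped. A title containing <, > or &, or a quote in the path, can therefore break the page. Below is a minimal sketch of an escaped version, assuming htmlspecialchars() is acceptable for both the attribute and the link text; render_result() is hypothetical. htmlspecialchars() returns an empty string for input that is invalid in the given charset. Pages saved as gb2312, like the search form, should therefore pass "GB2312" as $charset.

```php
<?php
// Build one result line as an HTML link. Both the path and the title are
// escaped, so <, >, & and quotes cannot break the generated markup.
// $charset must match the encoding of the pages (e.g. "GB2312").
function render_result(string $path, string $title, string $charset = "UTF-8"): string
{
    $href = htmlspecialchars($path, ENT_QUOTES, $charset);
    $text = htmlspecialchars($title, ENT_QUOTES, $charset);
    return "<a href=\"$href\" target=\"_blank\">$text</a><br>\n";
}
```

In the output loop, echo render_result($filedir, $title, "GB2312"); would replace the hand-built echo line.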
4. Prevent timeouts
With many files, the script may exceed PHP's maximum execution time. To prevent this, add
set_time_limit(600);
The argument is in seconds, so this allows up to 10 minutes.
The complete program is therefore:
<?php
// Allow up to 10 minutes for large sites
set_time_limit(600);
// Get the search keyword
$keyword = isset($_POST["keyword"]) ? trim($_POST["keyword"]) : "";
// Check whether it is empty
if ($keyword == "") {
    echo "The search keyword cannot be empty.";
    exit; // stop the script
}
// Recursively search $dir and collect "path title" strings in $array
function listfiles($dir, $keyword, &$array) {
    $handle = opendir($dir);
    while (false !== ($file = readdir($handle))) {
        if ($file != "." && $file != "..") {
            if (is_dir("$dir/$file")) {
                listfiles("$dir/$file", $keyword, $array);
            } else {
                $data = file_get_contents("$dir/$file");
                // Search only the text inside <body>, with tags removed
                if (eregi("<body([^>]*)>(.+)</body>", $data, $b)) {
                    $body = strip_tags($b[2]);
                } else {
                    $body = strip_tags($data);
                }
                if ($file != "search.php") {
                    if (eregi($keyword, $body)) {
                        // Use the page title, or "No title" if there is none
                        if (eregi("<title>(.+)</title>", $data, $m)) {
                            $title = $m[1];
                        } else {
                            $title = "No title";
                        }
                        $array[] = "$dir/$file $title";
                    }
                }
            }
        }
    }
    closedir($handle);
}
$array = array();
listfiles(".", $keyword, $array);
foreach ($array as $value) {
    // Split "path title" at the first space
    list($filedir, $title) = explode(" ", $value, 2);
    // Output the title as a link that opens in a new window
    echo "<a href=\"$filedir\" target=\"_blank\">$title</a>" . "<br>\n";
}
?>
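As written, the complete program reads every file under the site root into memory, including images and other binary files. It skips only search.php, by name. Below is a sketch of a filter that restricts the search to static pages, assuming all of the site's pages end in .htm or .html. is_searchable_file() and its default extension list are assumptions to adjust.

```php
<?php
// Hypothetical helper: treat only static pages as searchable, so images,
// archives and scripts are never read. The extension check ignores case.
function is_searchable_file(string $file, array $extensions = ["htm", "html"]): bool
{
    $ext = strtolower(pathinfo($file, PATHINFO_EXTENSION));
    return in_array($ext, $extensions, true);
}
```

Wrap the file-reading branch of listfiles() in if (is_searchable_file($file)) { ... }. The separate search.php check then becomes unnecessary, since .php files are no longer read.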
Your own search engine is now ready. You can improve it further by modifying the content-processing part, for example to search only titles or only content, or by adding pagination. I leave these to you.
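One way to add paging is to collect all results as before and show only one slice of the array per request. The sketch below assumes the page number arrives as a 1-based integer, for example from $_GET["page"]. The keyword must then survive from one page link to the next, so the form would likely need method="get", or each page link would need the keyword appended. paginate() and its return keys are hypothetical.

```php
<?php
// Return one page of $results plus the page number actually used and the
// total page count. $page is 1-based; out-of-range values are clamped.
function paginate(array $results, int $page, int $perPage = 10): array
{
    $pages = max(1, (int) ceil(count($results) / $perPage));
    $page = min(max(1, $page), $pages);
    $items = array_slice($results, ($page - 1) * $perPage, $perPage);
    return ["items" => $items, "page" => $page, "pages" => $pages];
}
```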
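Finally, replacing eregi() with preg_match() makes the search much faster. I used eregi() here only because it is common and easy to understand. Note, however, that eregi() was deprecated in PHP 5.3 and removed in PHP 7.0. On any current PHP version, every eregi() call must therefore be rewritten with preg_match().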
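Converting an eregi() call is mostly mechanical: wrap the pattern in delimiters and add the i modifier. For a visitor's keyword, one extra step is worth taking. Both eregi() and preg_match() interpret the keyword as a regular expression, so input such as c++ or (test is treated as pattern syntax, which either changes what matches or produces an error. The sketch below escapes the keyword with preg_quote() first. contains_keyword() is a hypothetical name.

```php
<?php
// Case-insensitive keyword test with preg_match() instead of eregi().
// preg_quote() escapes regex metacharacters (and the "/" delimiter), so
// the visitor's keyword is matched literally.
function contains_keyword(string $text, string $keyword): bool
{
    $pattern = '/' . preg_quote($keyword, '/') . '/i';
    return preg_match($pattern, $text) === 1;
}
```

In the complete program, eregi($keyword, $body) would become contains_keyword($body, $keyword). The fixed patterns translate directly, for example eregi("<title>(.+)</title>", $data, $m) becomes preg_match('/<title>(.+)<\/title>/is', $data, $m).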