ccterran (original)
Author: iwind
A friend built a website with Dreamweaver. It has no dynamic content, just a collection of articles, a personal introduction, and so on. The site has grown large, so I wanted to help him add a search engine. To be honest, this is not difficult, and I put one together quickly. I have since seen people on other forums asking how to do the same thing, so I want to share what I learned, focusing on the method.
Before writing a program, work out the approach. The following is mine. Someone may have a better one, but this is only a question of method: traverse all files, read each file's content, search it for the keyword, and if it matches, add the file to an array; finally, read the array. Before going through these steps, I assume that all your web pages are standard: each has a title (<title></title>) and a body (<body></body>). If you use Dreamweaver or FrontPage, these exist unless you deliberately delete them. Let's go step by step and improve the search engine as we go.
1. Design the search form
Create a search.htm file in the root directory of the website, containing a search form. The form sends a text field named keyword to search.php using the POST method.
2. The search program
Create a search.php file in the root directory to process the data submitted from the search.htm form. Its content is as follows:
<?php
// Get the search keyword
$keyword = trim($_POST["keyword"]);
// Check whether it is empty
if ($keyword == "") {
    echo "The search keyword cannot be empty";
    exit; // end the program
}
?>
This way, if the visitor submits an empty keyword, a prompt is displayed. Next, we traverse all the files.
We can traverse all files recursively, using either opendir and readdir or PHP's Directory class. Here we use the former.
<?php
// Function for traversing all files
function listFiles($dir) {
    $handle = opendir($dir);
    while (false !== ($file = readdir($handle))) {
        if ($file != "." && $file != "..") {
            // If it is a directory, search inside it
            if (is_dir("$dir/$file")) {
                listFiles("$dir/$file");
            }
            else {
                // Process it here
            }
        }
    }
    closedir($handle);
}
?>
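The Directory-class route mentioned above is not shown in this article. As a rough sketch of a related alternative, PHP's SPL iterators can do the recursion for us. The function name listFilesSpl is my own invention for illustration, and this is only one possible way to write it:

```php
<?php
// Sketch: collect every file path under $dir using SPL iterators.
// listFilesSpl is a hypothetical name, not part of the original program.
function listFilesSpl(string $dir): array
{
    $paths = [];
    $iterator = new RecursiveIteratorIterator(
        // SKIP_DOTS leaves out the "." and ".." entries
        new RecursiveDirectoryIterator($dir, FilesystemIterator::SKIP_DOTS)
    );
    foreach ($iterator as $fileInfo) {
        // Keep regular files only
        if ($fileInfo->isFile()) {
            $paths[] = $fileInfo->getPathname();
        }
    }
    // Directory order is not guaranteed, so sort for stable output
    sort($paths);
    return $paths;
}
```

Calling listFilesSpl(".") returns a flat array of paths, which can then be processed in an ordinary foreach loop instead of inside the recursion.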
In listFiles, at the "// Process it here" comment, we can read and process each file that is found. The following version reads the file content and checks whether it contains the keyword $keyword. If it does, the file path is added to an array.
<?php
// $dir is the search directory, $keyword is the search keyword, and $array stores the results
function listFiles($dir, $keyword, &$array) {
    $handle = opendir($dir);
    while (false !== ($file = readdir($handle))) {
        if ($file != "." && $file != "..") {
            if (is_dir("$dir/$file")) {
                listFiles("$dir/$file", $keyword, $array);
            }
            else {
                // Read the file content
                $data = fread(fopen("$dir/$file", "r"), filesize("$dir/$file"));
                // Do not search this script itself
                if ($file != "search.php") {
                    // Match
                    if (eregi("$keyword", $data)) {
                        $array[] = "$dir/$file";
                    }
                }
            }
        }
    }
    closedir($handle);
}
// Define the array $array
$array = array();
// Execute the function
listFiles(".", "php", $array);
// Print the search results
foreach ($array as $value) {
    echo "$value" . "<br>\n";
}
?>
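One weak spot in the listing above is the fread(fopen(...), filesize(...)) line: for an empty file, filesize returns 0, and fread does not accept a length of 0. A possible replacement, sketched here with a hypothetical helper name readPage, is file_get_contents, which reads the whole file in one call and leaves no open handle behind:

```php
<?php
// Sketch: read a whole file, returning "" when it is empty or unreadable.
// readPage is a hypothetical helper name, not part of the original program.
function readPage(string $path): string
{
    if (!is_file($path) || !is_readable($path)) {
        return "";
    }
    $data = file_get_contents($path);
    // file_get_contents returns false on failure
    return $data === false ? "" : $data;
}
```

With this helper, the read line in listFiles could become $data = readPage("$dir/$file"); and an empty string simply never matches the keyword.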
Now combine this with the keyword check from the beginning (passing $keyword instead of the test string "php"), enter a keyword, and every page on your website that contains it will be listed. Let's now improve it.
1. List the page titles
Replace
if (eregi("$keyword", $data)) {
    $array[] = "$dir/$file";
}
with
if (eregi("$keyword", $data)) {
    if (eregi("<title>(.+)</title>", $data, $m)) {
        $title = $m["1"];
    }
    else {
        $title = "no title";
    }
    $array[] = "$dir/$file $title";
}
The principle is that if <title>xxx</title> is found in the file, xxx is taken as the title. If no title is found, the page is labeled "no title".
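For readers on a PHP version without eregi, the same title lookup can be sketched with preg_match. The helper name extractTitle is hypothetical, and the pattern is slightly more forgiving than the one above: it ignores case, lets the title span line breaks, and stops at the first closing tag.

```php
<?php
// Sketch: pull the text between <title> and </title>, or fall back to "no title".
// extractTitle is a hypothetical helper name.
function extractTitle(string $html): string
{
    // "i" = case-insensitive, "s" = let "." match line breaks,
    // ".*?" = non-greedy, so matching stops at the first </title>
    if (preg_match('/<title[^>]*>(.*?)<\/title>/is', $html, $m)) {
        $title = trim($m[1]);
        if ($title !== "") {
            return $title;
        }
    }
    return "no title";
}

echo extractTitle("<html><head><TITLE> My page </TITLE></head></html>"), "\n";
```

Treating an empty title as "no title" is my own choice here; the original code only falls back when the tag is missing.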
2. Search only the body of the page
A web page contains a lot of HTML code, which is not what we want to search, so we need to remove it. I use a regular expression together with strip_tags.
Replace
$data = fread(fopen("$dir/$file", "r"), filesize("$dir/$file"));
// Do not search this script itself
if ($file != "search.php") {
    // Match
    if (eregi("$keyword", $data)) {
with
$data = fread(fopen("$dir/$file", "r"), filesize("$dir/$file"));
if (eregi("<body([^>]+)>(.+)</body>", $data, $b)) {
    $body = strip_tags($b["2"]);
}
else {
    $body = strip_tags($data);
}
if ($file != "search.php") {
    if (eregi("$keyword", $body)) {
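The body extraction can be sketched with preg_match in the same spirit. The helper name extractBodyText is hypothetical. Note one deliberate difference from the eregi pattern: [^>]* also matches a bare <body> tag with no attributes, which ([^>]+) would skip.

```php
<?php
// Sketch: return the visible text of the <body> element,
// or of the whole document when no <body> element is found.
// extractBodyText is a hypothetical helper name.
function extractBodyText(string $html): string
{
    // "i" = case-insensitive, "s" = let "." match line breaks
    if (preg_match('/<body[^>]*>(.*)<\/body>/is', $html, $b)) {
        return strip_tags($b[1]);
    }
    return strip_tags($html);
}

$page = '<html><head><title>T</title></head>'
      . '<body bgcolor="#fff"><p>Hello <b>world</b></p></body></html>';
echo extractBodyText($page), "\n";
```

Because only the body is kept, a keyword that appears solely in the page title no longer produces a hit.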
3. Add a link to the title
Replace
foreach ($array as $value) {
    echo "$value" . "<br>\n";
}
with
foreach ($array as $value) {
    // Split the entry into file path and title
    list($filedir, $title) = split("[ ]", $value, 2);
    // Output
    echo "<a href=\"$filedir\">$title</a>" . "<br>\n";
}
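Joining the path and title with a space and splitting them again later goes wrong when a file path itself contains a space. One way around this, sketched below under my own assumptions, is to store each hit as a small associative array and to escape the values when printing. The helper name renderResults and the array keys "path" and "title" are hypothetical.

```php
<?php
// Sketch: print search hits as links.
// Each hit is an array like ["path" => ..., "title" => ...] (hypothetical keys).
function renderResults(array $results): string
{
    $html = "";
    foreach ($results as $result) {
        // Escape both values so characters like & or " cannot break the markup
        $href = htmlspecialchars($result["path"], ENT_QUOTES, "UTF-8");
        $text = htmlspecialchars($result["title"], ENT_QUOTES, "UTF-8");
        $html .= "<a href=\"$href\">$text</a><br>\n";
    }
    return $html;
}

$results = [
    ["path" => "./articles/my page.htm", "title" => "Tips & tricks"],
];
echo renderResults($results);
```

Inside listFiles, the matching line would then become something like $array[] = array("path" => "$dir/$file", "title" => $title); and no split call is needed.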
4. Prevent timeouts
If there are many files, PHP may exceed its execution time limit. To prevent this, add the following at the top of the program:
set_time_limit(600);
The unit is seconds, so this sets the limit to 10 minutes.
So the complete program is:
<?php
set_time_limit(600);
// Get the search keyword
$keyword = trim($_POST["keyword"]);
// Check whether it is empty
if ($keyword == "") {
    echo "The search keyword cannot be empty";
    exit; // end the program
}
function listFiles($dir, $keyword, &$array) {
    $handle = opendir($dir);
    while (false !== ($file = readdir($handle))) {
        if ($file != "." && $file != "..") {
            if (is_dir("$dir/$file")) {
                listFiles("$dir/$file", $keyword, $array);
            }
            else {
                $data = fread(fopen("$dir/$file", "r"), filesize("$dir/$file"));
                if (eregi("<body([^>]+)>(.+)</body>", $data, $b)) {
                    $body = strip_tags($b["2"]);
                }
                else {
                    $body = strip_tags($data);
                }
                if ($file != "search.php") {
                    if (eregi("$keyword", $body)) {
                        if (eregi("<title>(.+)</title>", $data, $m)) {
                            $title = $m["1"];
                        }
                        else {
                            $title = "no title";
                        }
                        $array[] = "$dir/$file $title";
                    }
                }
            }
        }
    }
    closedir($handle);
}
$array = array();
listFiles(".", $keyword, $array);
foreach ($array as $value) {
    // Split the entry into file path and title
    list($filedir, $title) = split("[ ]", $value, 2);
    // Output
    echo "<a href=\"$filedir\">$title</a>" . "<br>\n";
}
?>
You now have your own search engine. You can improve it further by modifying the content-processing part, for example to search titles only or content only. You could also add paging. These are left to you.
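As a starting point for paging, here is a minimal sketch built on array_slice. The helper name pageOfResults, the 1-based page number, and the default of 10 results per page are all my own assumptions:

```php
<?php
// Sketch: return the slice of $results that belongs on page $page.
// pageOfResults is a hypothetical helper; $page counts from 1.
function pageOfResults(array $results, int $page, int $perPage = 10): array
{
    // Treat page numbers below 1 as the first page
    $page = max(1, $page);
    $offset = ($page - 1) * $perPage;
    // array_slice returns an empty array when the offset is past the end
    return array_slice($results, $offset, $perPage);
}

// With 25 results, page 3 holds the last five
print_r(pageOfResults(range(1, 25), 3));
```

The page number could come from the request, for example a page field sent along with the keyword, and the total page count is ceil(count($results) / $perPage).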
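One more note: replacing eregi with preg_match would be much faster. The code here uses eregi only because it is familiar and easy to follow. Also be aware that eregi and split were deprecated in PHP 5.3 and removed in PHP 7.0, so on current PHP versions you must use preg_match and preg_split (or explode) instead.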
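A preg_match version of the keyword test might look like the sketch below. The helper name matchesKeyword is hypothetical. It also changes one behavior on purpose: preg_quote makes the keyword match as literal text, whereas the eregi calls above treat whatever the visitor types as a regular expression. Whether literal matching is what you want is an assumption on my part.

```php
<?php
// Sketch: case-insensitive literal keyword test with preg_match.
// matchesKeyword is a hypothetical helper name.
function matchesKeyword(string $keyword, string $text): bool
{
    // preg_quote escapes regex metacharacters and the "/" delimiter,
    // so input such as "c++" is matched literally; "i" ignores case
    $pattern = "/" . preg_quote($keyword, "/") . "/i";
    return preg_match($pattern, $text) === 1;
}

var_dump(matchesKeyword("php", "Learn PHP today"));
var_dump(matchesKeyword("c++", "I like C++"));
```

In the complete program, the line if (eregi("$keyword", $body)) would then read if (matchesKeyword($keyword, $body)).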