First, what is Phpdig?
Phpdig is a very popular vertical search engine product from abroad (not so much a product as a search technique that distinguishes it from a traditional search engine). It is written in PHP, which runs efficiently and keeps search response times short. Like Google, Baidu, and other search engines, it can search the internet. Besides ordinary web pages, it can search TXT, DOC, XLS, PDF, and other files, so it offers strong content search and file parsing. Like a traditional search engine, Phpdig is built on the following three basic technologies:
1. Spider (crawling) technology
2. Structured information extraction from web pages, or metadata acquisition
3. Word segmentation and indexing
Unlike traditional search engines, Phpdig targets more specialized, deeper, and more personalized search, which makes it well suited to building a vertical search engine for a particular field.
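The three technologies listed above can be illustrated with a small sketch. It is not Phpdig's code: the pages live in a hypothetical in-memory dictionary (`PAGES`) instead of being fetched over HTTP, and the helper names `TextExtractor`, `tokenize`, and `build_index` are invented for illustration.

```python
import re
from collections import defaultdict
from html.parser import HTMLParser

# Hypothetical in-memory "web": URL -> HTML. A real spider would fetch these over HTTP.
PAGES = {
    "http://example.test/a": "<html><title>PHP search</title><body>Search engines index pages.</body></html>",
    "http://example.test/b": "<html><title>Spiders</title><body>A spider crawls pages.</body></html>",
}

class TextExtractor(HTMLParser):
    """Collects the page title and all text found in the page."""

    def __init__(self):
        super().__init__()
        self.in_title = False
        self.title = ""
        self.text = []

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self.in_title = True

    def handle_endtag(self, tag):
        if tag == "title":
            self.in_title = False

    def handle_data(self, data):
        if self.in_title:
            self.title += data
        self.text.append(data)

def tokenize(text):
    # Naive segmentation: lowercase runs of ASCII letters and digits.
    return re.findall(r"[a-z0-9]+", text.lower())

def build_index(pages):
    index = defaultdict(set)  # word -> set of URLs containing it
    titles = {}
    for url, html in pages.items():          # 1. "spider": visit each page
        parser = TextExtractor()
        parser.feed(html)                     # 2. extract title and text
        parser.close()
        titles[url] = parser.title
        for word in tokenize(" ".join(parser.text)):  # 3. segment and index
            index[word].add(url)
    return index, titles

index, titles = build_index(PAGES)
print(sorted(index["pages"]))
```

Looking up a word in `index` returns the set of pages containing it, which is the basic operation a search page would perform against the stored index.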
Second, how to obtain Phpdig?
Phpdig is free (the copyright notice must be retained). The latest version is phpdig-1.8.9, but to avoid compatibility problems with particular Apache and MySQL versions, an earlier version is recommended. The website is http://www.phpdig.net and the download page is http://www.phpdig.net/navigation.php?action=download. Note: I tried phpdig-1.8.9 and ran into many problems; phpdig-1.8.8 had fewer.
Third, the specific steps
1. Get the Product
Visit http://www.phpdig.net/navigation.php?action=download, download phpdig-1.8.8 to the desktop, and unzip it into the Apache server's HTML directory, typically D:\usr\www\html\. (If you do not already have Apache installed, Mappm-server v1.1.9 final is recommended. It has a one-click installer, and once installed it makes it easy to debug and run PHP/CGI and MySQL programs.)
2. Run and configure the Phpdig database
Open a browser, enter http://localhost/phpdig/ and press Enter. Because there is no default home page file (default, index), the page lists all of Phpdig's files and folders. Clicking search.php produces the error message "Unable to connect to database: check the connection script." The database connection fails because we have not yet completed the Phpdig database configuration.
Go to the admin directory, find install.php, and click it to run. The interface is entirely in English (note: no current version of Phpdig supports a Chinese interface). That is not a problem: if you have localization experience, you can translate it yourself. I provide my own Chinese translation file, cn-language.php, for download (copy it to the locales directory). You also need to modify config.php (language setting) and style.css (fonts and styles) in the includes directory.
After you open install.php, the system asks for the Phpdig admin username and password, both of which default to admin. After logging in, you see the following interface (shown after translation into Chinese):
(Fig. 1)
The information you need to provide is:
If you are testing locally, the server name defaults to localhost (the default server name under Mappm-server, which bundles a MySQL database). The database server port defaults to 3306 (MySQL's standard port) and can be left blank. The database socket is empty by default. The username defaults to root (the Mappm-server default), and the password is the one you set when installing Mappm-server. The database name defaults to phpdig and can be changed freely. You can also set a prefix for the table names; it is empty by default.
If you are uploading to a web server connected to the internet, ask your hosting provider for the MySQL server name or IP address, port, socket, username, and password, as well as the database name and table prefix to use.
As for the four radio buttons on the right, choose according to your situation. For a first-time install, keep the default, "Build database".
After confirming that the information is correct, click the Install button. If the connection fails, the error message "Unable to connect to the database" appears. If it succeeds, you are taken directly to the administration page:
(Fig. 2)
3. Introduction to the interface area
Area 1 is a text input area. The default text has three lines, each starting with http. This is where you enter the addresses of the sites to spider (spidering one site at a time is recommended).
Area 2 holds the spider options. Search depth is the number of levels the spider descends into the site, and links per page is the maximum number of links followed from each page. Both default to 0, which means the whole site is spidered.
Area 3 displays database status information, including sites, keywords, index entries, and information on the spidered sites.
Area 4 is a drop-down list of the sites that have been spidered. Select a site there, then use area 5 to clean up or update it.
Area 5 provides the cleanup and update operations for the site selected in area 4, and also gives access to statistics and to spider control.
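To make the two spider options in area 2 concrete, here is a minimal sketch of a breadth-first crawl with a depth limit and a per-page link limit. It is not Phpdig's implementation: `LINKS` is a hypothetical link graph standing in for a real site, `crawl` is an invented helper, and treating 0 as "no limit" simply follows the description above.

```python
from collections import deque

# Hypothetical link graph standing in for a real site: page -> links found on it.
LINKS = {
    "/": ["/a", "/b", "/c"],
    "/a": ["/a1", "/a2"],
    "/b": ["/"],
    "/c": [],
    "/a1": ["/deep"],
    "/a2": [],
    "/deep": [],
}

def crawl(start, max_depth=0, links_per_page=0):
    """Breadth-first crawl; 0 means "no limit", following the article's description."""
    seen = {start}
    order = [start]
    queue = deque([(start, 0)])
    while queue:
        page, depth = queue.popleft()
        if max_depth and depth >= max_depth:
            continue  # do not follow links beyond the depth limit
        links = LINKS.get(page, [])
        if links_per_page:
            links = links[:links_per_page]  # keep only the first N links on the page
        for link in links:
            if link not in seen:
                seen.add(link)
                order.append(link)
                queue.append((link, depth + 1))
    return order

print(crawl("/"))               # whole site
print(crawl("/", max_depth=1))  # start page plus the pages it links to
```

With both options at 0 the crawl visits every reachable page, which is why a full-site spider of a large site can take so long.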
4. Running the spider for a specific site
Suppose you are interested in the content of the Yesky software channel (soft.yesky.com). You can build a search engine for that content that is more specialized than Google and covers it more comprehensively and deeply. Below, we spider the Yesky software channel as an example of how to spider a website.
1) Enter http://soft.yesky.com in area 1 of Figure 2, and leave the search depth and the number of links per page at the default of 0.
2) Click the Spider button. The page jumps to the spider information page, and the program automatically starts spidering http://soft.yesky.com.
Note: Spidering a site is very slow. If the site has a lot of content, the process may take anywhere from a few hours to a day, but you do not have to worry about the script timing out, because the system's timeout is set to as long as 48 hours. During this time you can interrupt the spider and later restart it to continue spidering the site. Be aware that if you accidentally close the spider page while it is running, the spider does not stop and keeps consuming system resources. Reopen the spider page and click the Stop Spider link to release them.
(Fig. 3)
5. Search with Phpdig
After a while, the spider will have stored the information from http://soft.yesky.com in the server's database, mainly each page's title, keywords, and address. At that point you can search by visiting search.php.
(Fig. 4)
You can choose how many results are displayed, whether the search is fuzzy or exact, and which site to search. By default, all sites that have been spidered are searched.
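The difference between these search options can be sketched as follows. This is an illustration only, not Phpdig's query logic: `ROWS` is a hypothetical stand-in for the spider's database, `search` is an invented helper, and "fuzzy" is assumed here to mean a substring match.

```python
# Hypothetical index rows (site, page URL, keyword) standing in for the spider's database.
ROWS = [
    ("soft.yesky.com", "http://soft.yesky.com/1", "qq2006"),
    ("soft.yesky.com", "http://soft.yesky.com/2", "qq"),
    ("example.test", "http://example.test/x", "qq2006beta"),
]

def search(query, mode="exact", site=None, limit=10):
    """Return up to `limit` page URLs whose keyword matches the query."""
    query = query.lower()
    hits = []
    for row_site, url, keyword in ROWS:
        if site is not None and row_site != site:
            continue  # restrict the search to one spidered site
        # Exact: the keyword equals the query. Fuzzy (assumed): the keyword contains it.
        matched = keyword == query if mode == "exact" else query in keyword
        if matched and url not in hits:
            hits.append(url)
    return hits[:limit]

print(search("QQ2006"))                # exact match only
print(search("QQ2006", mode="fuzzy"))  # also matches "qq2006beta"
```

Leaving `site` unset mirrors the default of searching every spidered site.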
(Fig. 5)
The figure above shows the results page for a search for "QQ2006".
6. Problems that exist
Because of Phpdig's language settings, the system's word segmentation, and the character handling of the MySQL database, searching for Chinese words still involves many uncertainties, and these issues need further work. Interested readers are welcome to discuss them in the Web Pottery bar Phpdig theme community.
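The word segmentation problem can be seen in a short sketch. Splitting on whitespace works for English but leaves a Chinese phrase as a single token, so partial queries cannot match it. Overlapping character bigrams are one common workaround for Chinese text. This is an illustration under that assumption, not something Phpdig is known to do, and both function names are invented.

```python
def whitespace_tokens(text):
    # Works for space-delimited languages such as English.
    return text.split()

def bigram_tokens(text):
    # Overlapping two-character tokens: a common fallback for unsegmented Chinese text.
    chars = [c for c in text if not c.isspace()]
    if len(chars) < 2:
        return ["".join(chars)] if chars else []
    return [chars[i] + chars[i + 1] for i in range(len(chars) - 1)]

print(whitespace_tokens("search engine"))  # ['search', 'engine']
print(whitespace_tokens("搜索引擎"))        # ['搜索引擎'] (one unsplit token)
print(bigram_tokens("搜索引擎"))            # ['搜索', '索引', '引擎']
```

With bigrams, a query for 搜索 can match a page containing 搜索引擎, at the cost of a larger index and some false matches.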