Use Phpdig to build your own Google

Source: Internet
Author: User

I. What is Phpdig?

Phpdig is a popular vertical search engine product from abroad (it is better thought of as a packaged product than as a traditional search engine). It is written in PHP and runs efficiently, which greatly improves search response times. Like Google, Baidu, and other search engines, it can search the Internet, and besides ordinary web pages it can search txt, doc, xls, pdf, and other files, so it has strong content search and file parsing capabilities. Like a traditional search engine, Phpdig rests on three basic technologies:

1. Spider technology

2. Structured information extraction from web pages, or metadata acquisition

3. Word segmentation and indexing
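
As a rough illustration of how these three pieces fit together, here is a minimal Python sketch (not Phpdig's own code, which is written in PHP). It parses one page, extracts the title and links, splits the text into words, and stores them in an inverted index. The class and function names and the example URL are assumptions made for this illustration.

```python
from collections import defaultdict
from html.parser import HTMLParser
import re


class PageParser(HTMLParser):
    """Collects the title, outgoing links, and text of one page."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.links = []       # links a spider would follow next
        self.text_parts = []  # text to be segmented and indexed
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True
        elif tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data
        self.text_parts.append(data)


def tokenize(text):
    """Lowercase and split on non-word characters (adequate for English only)."""
    return [w for w in re.split(r"\W+", text.lower()) if w]


def index_page(index, url, html):
    """Parse one page and record which words occur at which URL."""
    parser = PageParser()
    parser.feed(html)
    for word in tokenize(" ".join(parser.text_parts)):
        index[word].add(url)
    return parser


index = defaultdict(set)  # word -> set of URLs containing it
page = index_page(
    index,
    "http://example.com/a",  # hypothetical URL
    "<html><head><title>Free Tools</title></head>"
    "<body><a href='/b'>More tools</a></body></html>",
)
```

A real spider would fetch each page over HTTP and queue the collected links for later visits; the sketch leaves that out to keep the three roles visible.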

Unlike a traditional search engine, Phpdig is suited to personalized search engines that are more specialized and go deeper into their subject. It is a strong choice for building a vertical search engine for a particular field.

II. How to obtain Phpdig

Phpdig is free to use, provided the copyright notice is retained. The latest version is phpdig-1.8.9, but to avoid compatibility problems with your Apache and MySQL versions, a slightly older release is recommended. The website is http://www.phpdig.net, and the download address is http://www.phpdig.net/navigation.php?action=download. To explain: I tried phpdig-1.8.9 and ran into many problems, while PHPdig-1.8.8 gave me far fewer.

III. Specific steps

1. Obtain the product

Go to http://www.phpdig.net/navigation.php?action=download, download PHPdig-1.8.8 to the desktop, and extract it into the Apache server's html directory, which is usually d:\usr\www\html. (If you have not installed an Apache server, install one first. Mappm-server v1.1.9 final is recommended: it has a simple one-step installer that sets everything up at once, which makes it easy to debug and run PHP/CGI + MySQL programs.)

2. Run and configure the Phpdig database

Open a browser, enter http://localhost/phpdig/ and press Enter. The page lists all Phpdig files and folders, and you will notice that there is no default home page file (default or index). Click search.php and you get the error: Unable to connect to database:check the connection script. The database connection cannot be made because we have not yet configured the Phpdig database. Go to the admin directory, find install.php, and click it to run it. The interface is entirely in English (no current version of Phpdig supports a Chinese interface). That is not a problem: if you have localization experience you may want to translate it into Chinese yourself, or you can download the cn-language.php file I made (copy it to the locales directory). You will also need to modify the config.php file (language setting) and the style.css file (fonts and styles) in the includes directory. For help, please visit the Web page Tao Bar Phpdig theme community.

After opening install.php, the system asks for the Phpdig admin username and password; by default both are admin. After entering them you see the following interface (shown after Chinese localization):

(Figure 1)

The information you need to provide is:

If you are testing locally, enter the default server name localhost (localhost is the default server name under Mappm-server, that is, the default host name of the MySQL database built into Mappm-server). The database server port can be left blank (the article gives the default as 3126; note that a standard MySQL installation listens on 3306). The database socket setting is empty by default. The username defaults to root (the Mappm-server default), and the password is the one you entered when installing Mappm-server. The Phpdig database name defaults to phpdig and can be changed freely. You can also set a prefix for the table names in the database; the default is empty.

If you are uploading to a Web server connected to the Internet, ask the server provider for the name or IP address of the MySQL server, as well as the database server port, socket setting, username, and password. The database name and table prefix are set as described above.

As for the four radio buttons on the right, choose according to your situation; for a first installation, keep the default "build database".

Confirm that the information above is correct and click the Install button. If the database connection fails, the error message "Cannot connect to the database" is shown. If it succeeds, you are taken directly to the administration page, shown below:
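
To make the install-form fields easier to keep track of, here is a small Python sketch that gathers them into one settings dictionary with the defaults described above. It is only an illustration: the function names, the dictionary keys, and the example table name "sites" are assumptions, not Phpdig's actual configuration format.

```python
def build_db_settings(host="localhost", port=None, socket_path="",
                      user="root", password="", database="phpdig",
                      table_prefix=""):
    """Return connection settings mirroring the fields on the install form.

    port=None stands for leaving the port field blank on the form.
    """
    if port is not None and not 1 <= int(port) <= 65535:
        raise ValueError("port must be between 1 and 65535")
    return {
        "host": host,
        "port": port,
        "socket": socket_path,
        "user": user,
        "password": password,
        "database": database,
        "table_prefix": table_prefix,
    }


def table_name(settings, base):
    """Apply the optional table prefix to a base table name."""
    return settings["table_prefix"] + base


# Local test setup: everything left at its default.
local = build_db_settings()

# Hosted setup: values supplied by the provider (all hypothetical here).
hosted = build_db_settings(host="db.example.com", port=3306,
                           user="site_user", password="secret",
                           database="site_db", table_prefix="pd_")
```

A table prefix is mainly useful on shared hosting, where several applications have to live in a single database.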

(Figure 2)

3. Overview of the interface areas

Area 1 is a text input area. The default text has three lines, each beginning with http. Enter the address of the site you want to spider here (it is recommended to spider only one site at a time).

Area 2 holds the spider options. The search depth is how many directory levels of the site the spider descends, and the number of links per page is the maximum number of linked pages it follows from any one page. Both default to 0, which means the whole site is spidered.

Area 3 displays database status information, including the sites already spidered, keywords, indexes, and the site currently being spidered.

Area 4 is a drop-down list of the URLs of sites that have been spidered. Select one of them to clean it up or update it in Area 5.

Area 5 provides the cleanup and update operations for the site selected in Area 4, as well as access to related statistics and control of the spider.
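
The two spider options in Area 2 can be pictured with a short Python sketch of a breadth-first crawl over an in-memory link table. It follows the article's description that a value of 0 means "no limit"; the function name, parameter names, and the tiny example site are assumptions for illustration and are not taken from Phpdig's source.

```python
from collections import deque


def crawl(start, get_links, max_depth=0, max_links_per_page=0):
    """Breadth-first crawl; a limit of 0 means unlimited (per the article)."""
    seen = {start}
    order = []                    # pages in the order they are visited
    queue = deque([(start, 0)])   # (url, depth below the start page)
    while queue:
        url, depth = queue.popleft()
        order.append(url)
        if max_depth and depth >= max_depth:
            continue              # do not follow links any deeper
        links = get_links(url)
        if max_links_per_page:
            links = links[:max_links_per_page]
        for link in links:
            if link not in seen:
                seen.add(link)
                queue.append((link, depth + 1))
    return order


# Hypothetical site: each page maps to the pages it links to.
SITE = {
    "/": ["/a", "/b", "/c"],
    "/a": ["/a1"],
    "/b": [],
    "/c": [],
    "/a1": [],
}


def site_links(url):
    return SITE.get(url, [])


whole_site = crawl("/", site_links)
one_level = crawl("/", site_links, max_depth=1)
two_links = crawl("/", site_links, max_links_per_page=2)
```

With both limits at 0 every reachable page is visited, which is why spidering a large site with the defaults can take a long time.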

4. Run Spider for specific sites

If you are interested in the content of the Yesky software channel, you can build a specialized search engine for it that covers that content more comprehensively and more deeply than Google does. Let's use the Yesky software channel as an example of how to spider a website.

1) Enter http://soft.yesky.com in Area 1 of Figure 2, and leave the search depth and the number of links per page at their default of 0.

2) Click the Spider button. The page jumps to the spider information page, and the program automatically starts spidering the content of http://soft.yesky.com.

Note: Spidering a website is very slow. If the site has a lot of content, the process may take anywhere from several hours to a full day, but you do not need to worry about the script timing out, because the system timeout is set to as much as 48 hours. During this process you can interrupt the spider, and you can later restart it to continue with sites that were not finished. Be aware that if you accidentally close the spider page while it is running, the system does not stop spidering and keeps consuming system resources. You can reopen the spider page and click the Stop spider link to release them.

(Figure 3)

5. Using Phpdig to search

After some time, the spider has finished and the information from http://soft.yesky.com has been stored in the server database, mainly each page's title, keywords, and address. At this point you can search by visiting search.php.

(Figure 4)

You can choose how many results to display, and whether to use fuzzy or exact matching. You can also restrict the search to a single site; by default, all spidered sites are searched.
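
The following Python sketch illustrates these three search options against a tiny hand-made index. The article does not say how Phpdig implements fuzzy matching, so "fuzzy" is simplified here to a substring match; the index contents, URLs, and function name are all assumptions made for the example.

```python
from urllib.parse import urlparse

# Hypothetical inverted index: keyword -> URLs of pages containing it.
INDEX = {
    "download": {"http://soft.example.com/a", "http://news.example.com/x"},
    "downloads": {"http://soft.example.com/b"},
    "driver": {"http://soft.example.com/c"},
}


def search(index, term, exact=True, site=None, limit=10):
    """Look up a term, optionally fuzzily, for one site, with a result cap."""
    term = term.lower()
    hits = set()
    for word, urls in index.items():
        # Exact: the keyword equals the term. Fuzzy: the keyword contains it.
        matched = (word == term) if exact else (term in word)
        if matched:
            hits |= urls
    if site:
        hits = {u for u in hits if urlparse(u).netloc == site}
    return sorted(hits)[:limit]


exact_hits = search(INDEX, "download")
fuzzy_hits = search(INDEX, "download", exact=False)
soft_only = search(INDEX, "download", exact=False, site="soft.example.com")
```

Exact matching returns fewer but more precise results; the fuzzy variant also picks up related word forms such as "downloads".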

(Figure 5)

The image above is the results page for a search on "QQ2006".

6. Remaining problems

Because of Phpdig's language settings, its word segmentation, and character handling in the MySQL database, searching for Chinese words with Phpdig is still unreliable. These issues still need to be solved and refined. Anyone interested is welcome to discuss them in the Web page Tao Bar Phpdig theme community.
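
The word segmentation problem can be seen in a few lines of Python. Chinese text has no spaces between words, so a tokenizer that splits on whitespace returns a whole phrase as one token, and a search for a word inside it finds nothing. Overlapping two-character tokens (bigrams) are one common workaround; this is a general technique shown for illustration, not something Phpdig is known to do, and the function names are assumptions.

```python
import re


def whitespace_tokens(text):
    """Split on whitespace, as a tokenizer built for English might."""
    return text.split()


def cjk_bigrams(text):
    """Split each run of CJK characters into overlapping two-character tokens."""
    tokens = []
    for run in re.findall(r"[\u4e00-\u9fff]+", text):
        if len(run) == 1:
            tokens.append(run)
        else:
            tokens.extend(run[i:i + 2] for i in range(len(run) - 1))
    return tokens


phrase = "垂直搜索引擎"  # "vertical search engine"

naive = whitespace_tokens(phrase)  # one token: the whole phrase
bigrams = cjk_bigrams(phrase)      # five overlapping two-character tokens

# The word for "search" is only findable in the bigram version.
found_naive = "搜索" in naive
found_bigram = "搜索" in bigrams
```

Bigrams also produce tokens that are not real words, so they trade some precision for recall; a dictionary-based segmenter is the more accurate but heavier alternative.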
