Building a Cross-Platform Search Engine with Java, MySQL, and PHP

Source: Internet
Author: User

This search engine is suited to a medium-sized LAN. Because the pages it finds are stored in a database, it can search not only static HTML pages but also dynamic pages such as PHP and ASP. For a system of 50,000 pages (with a PII-400 as the server), a search responds in about 2 to 10 seconds, which fully meets requirements. Because Java, MySQL, and PHP are all cross-platform software, this search engine runs not only on Windows servers but also on other systems such as Linux.

First, create the database and data tables that the search engine needs.

Create the database:

c:\mysql\bin\> mysqladmin -uroot -pmypasswd create Spider

Then build the table structure in the database:

c:\mysql\bin\> mysql -uroot -pmypasswd Spider < Spider.mysql

Where Spider.mysql is a text file with the following contents:

CREATE TABLE link (
Id int(10) unsigned NOT NULL auto_increment,
Url varchar(120) NOT NULL,
Class tinyint(3) unsigned NOT NULL default 0 ,
IsSearchLink tinyint(3) unsigned default 0,
PRIMARY KEY (Url),
UNIQUE Id (Id),
KEY Url (Url),
KEY Class (Class)
);

# The initial home page address of this LAN; the search spider starts from this URL and follows links to all other pages

INSERT INTO link VALUES ('1', 'http://102.211.69.1/', '0', '0');

# Table webpagelocal stores all of the downloaded pages

CREATE TABLE webpagelocal (
Id int(10) unsigned NOT NULL auto_increment,
Url varchar(120) NOT NULL,
Content text NOT NULL,
PRIMARY KEY (Url),
UNIQUE Id (Id),
KEY Url (Url)
);

# Table webpagefindfast

# makefast.php extracts 512 bytes of retrieval information for each page from the table webpagelocal and stores it here

CREATE TABLE webpagefindfast (
Id int(10) unsigned NOT NULL,
Url varchar(120) NOT NULL,
Title varchar(64),
Content blob,
PRIMARY KEY (Url),
KEY Url (Url),
KEY Title (Title)
);
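
The comments in Spider.mysql say that makefast.php condenses each downloaded page into 512 bytes of retrieval information, and the webpagefindfast table caps Title at 64 characters. The script itself is not shown here, so the following is only a minimal sketch in Java of what such an extraction step might look like. The class name, the method names, the tag-stripping approach, and the choice of UTF-8 as the byte encoding are all assumptions for illustration, not the article's actual implementation.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: reduces a stored page to the Title and Content
// values that a row of webpagefindfast could hold.
final class FastIndexSketch {
    private static final Pattern TITLE = Pattern.compile(
            "<title[^>]*>(.*?)</title>", Pattern.CASE_INSENSITIVE | Pattern.DOTALL);
    private static final Pattern SCRIPT_OR_STYLE = Pattern.compile(
            "<(script|style)[^>]*>.*?</\\1>", Pattern.CASE_INSENSITIVE | Pattern.DOTALL);
    private static final Pattern TAG = Pattern.compile("<[^>]*>");

    // Returns the page title, trimmed to 64 characters to fit Title varchar(64).
    static String extractTitle(String html) {
        Matcher m = TITLE.matcher(html);
        String title = m.find() ? m.group(1).replaceAll("\\s+", " ").trim() : "";
        return title.length() > 64 ? title.substring(0, 64) : title;
    }

    // Returns the visible text of the page, cut to at most 512 bytes.
    static String extractText(String html) {
        String body = TITLE.matcher(html).replaceAll(" ");
        body = SCRIPT_OR_STYLE.matcher(body).replaceAll(" ");
        String text = TAG.matcher(body).replaceAll(" ").replaceAll("\\s+", " ").trim();
        return truncateUtf8(text, 512);
    }

    // Cuts a string so its UTF-8 encoding is at most maxBytes long,
    // without splitting a multi-byte character (UTF-8 is an assumption).
    static String truncateUtf8(String s, int maxBytes) {
        int bytes = 0;
        int i = 0;
        while (i < s.length()) {
            int cp = s.codePointAt(i);
            int n = cp < 0x80 ? 1 : cp < 0x800 ? 2 : cp < 0x10000 ? 3 : 4;
            if (bytes + n > maxBytes) {
                break;
            }
            bytes += n;
            i += Character.charCount(cp);
        }
        return s.substring(0, i);
    }

    public static void main(String[] args) {
        String html = "<html><head><title>Home Page</title></head>"
                + "<body><p>Hello <b>LAN</b> search</p></body></html>";
        System.out.println(extractTitle(html));
        System.out.println(extractText(html));
    }
}
```

Keeping the short summary in a separate table means a search only has to scan small rows rather than the full page text in webpagelocal, which is presumably why the schema splits the two.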
