Self-developed search engine: full-text index of 5 million web pages on a single machine, with no search taking more than 20 ms

Source: Internet
Author: User

 

The source code of Search Engine 1.0 is organized as follows:

1. gg3m.Search.Demo
The search engine website, which provides the retrieval service.
Currently supported: keyword search with dynamic summaries, keyword highlighting, automatic paging, and a configurable number of results per page (10 by default).
Not yet implemented: related search keywords, sorting by relevance, snapshots, web page titles, collection time, web page URLs, and parallel cluster search.
Snapshots, web page titles, collection time, and web page URLs can be added as your situation requires.
Related search keywords, sorting by relevance, and parallel cluster search will be provided in the next version.
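
The display features listed in item 1 (keyword highlighting, dynamic summaries, and paging) can be illustrated with a minimal Python sketch. This is not the project's code: the function names, the `<em>` tag, and the 40-character summary window are all assumptions chosen for illustration. Only the default of 10 results per page comes from the description above.

```python
import html
import re


def highlight(text, keywords, tag="em"):
    """Wrap each keyword occurrence in <tag> and HTML-escape everything else."""
    # Longest keywords first so a longer term wins over its own prefix.
    terms = sorted((re.escape(k) for k in keywords if k), key=len, reverse=True)
    if not terms:
        return html.escape(text)
    pattern = re.compile("|".join(terms), re.IGNORECASE)
    out, pos = [], 0
    for m in pattern.finditer(text):
        out.append(html.escape(text[pos:m.start()]))
        out.append(f"<{tag}>{html.escape(m.group(0))}</{tag}>")
        pos = m.end()
    out.append(html.escape(text[pos:]))
    return "".join(out)


def dynamic_summary(text, keywords, width=40):
    """Return a window of at most `width` characters around the earliest keyword hit."""
    lowered = text.lower()
    hits = [i for i in (lowered.find(k.lower()) for k in keywords if k) if i >= 0]
    start = max(0, min(hits) - width // 2) if hits else 0
    return text[start:start + width]


def paginate(results, page=1, page_size=10):
    """Return the results for a 1-based page number and the total page count."""
    total_pages = max(1, -(-len(results) // page_size))  # ceiling division
    page = min(max(page, 1), total_pages)  # clamp out-of-range page numbers
    first = (page - 1) * page_size
    return results[first:first + page_size], total_pages
```

A results page would typically call `paginate` first, then build `dynamic_summary` and `highlight` output only for the entries on the requested page, so the per-result work stays proportional to the page size.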
2. gg3m.Search.Index
Full-text indexing.
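
The description does not say how the index is structured internally. As general background only, a full-text index is commonly built as an inverted index that maps each term to the documents containing it. The Python sketch below shows that generic technique; the class name, the regex tokenizer (a stand-in for dictionary-based word segmentation), and the AND query semantics are all assumptions, not details of gg3mindex.dll.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase and split on non-word characters (stand-in for a real segmenter)."""
    return re.findall(r"\w+", text.lower())


class InvertedIndex:
    """Hypothetical in-memory inverted index: term -> set of document ids."""

    def __init__(self):
        self.postings = defaultdict(set)

    def add(self, doc_id, text):
        """Record every term of `text` as occurring in document `doc_id`."""
        for term in tokenize(text):
            self.postings[term].add(doc_id)

    def search(self, query):
        """Return the ids of documents that contain every term of the query."""
        terms = tokenize(query)
        if not terms:
            return set()
        # .get avoids creating empty posting sets for unknown terms.
        sets = sorted((self.postings.get(t, set()) for t in terms), key=len)
        result = set(sets[0])  # start from the rarest term
        for s in sets[1:]:
            result &= s
        return result
```

Because each term is looked up directly in a hash table, the cost of a query in this sketch depends on the posting lists of the query terms rather than on a scan of all documents.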

3. The 3,500 TXT files in the demodoc directory are used for index testing. For testing, place them in E:\Index\demodoc, or change the path in the code to suit your environment.

4. The .idx files in the index directory are the index files created during testing. For testing, place them in E:\Index\index, or change the configuration file to suit your environment.

5. Notes:
A. The current index storage path is E:\Index\. You can change it in the configuration files (app.config and Web.config) to suit your environment.

B. The dictionary required for indexing is stored in E:\Index\app_data. You can change this path in app.config.
After changing it, copy all files from the app_data folder to the new directory.
C. The dictionary required for searching is stored in the website project's app_data directory and does not need to be changed.
D. gg3mindex.dll is the core library. It is written in C and can be placed in the System32 directory, or in the bin or release directory from which the project runs.

6. The current version is 2.0, with the following performance figures:
A. A single machine can index 5 million web pages.
B. On an ordinary PC (AMD 2.0 CPU, 7200 RPM hard drive, 2 GB memory), the text of 1,000 web pages (including HTML parsing) is indexed every 4 minutes.
C. Any 50-word search completes in no more than 20 milliseconds.
D. Retrieval speed does not change with the size of the index, and indexing speed does not slow down as the number or size of documents grows.
E. Development tool: VS2010. Test environment: ipvs7, CPU (AMD 2.0), 5400 RPM hard drive, and 2 GB memory.
Note: The standalone index is currently limited to 5 million web pages. Documents beyond that limit are not indexed.
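
The figures in item 6 imply a rough time budget for filling a single-machine index. The Python sketch below works through that arithmetic. It assumes the rate in item B (1,000 pages every 4 minutes) stays constant up to the 5 million page limit, as item D claims; the constant and function names are hypothetical.

```python
MAX_PAGES = 5_000_000      # single-machine limit from the note above
PAGES_PER_BATCH = 1_000    # pages indexed per batch (item B)
MINUTES_PER_BATCH = 4      # time per batch (item B)


def pages_accepted(current_pages, new_pages):
    """Return how many of `new_pages` still fit under the single-machine limit."""
    return max(0, min(new_pages, MAX_PAGES - current_pages))


def indexing_minutes(page_count):
    """Estimate indexing time in minutes at the constant rate from item B."""
    return page_count / PAGES_PER_BATCH * MINUTES_PER_BATCH


if __name__ == "__main__":
    minutes = indexing_minutes(MAX_PAGES)
    print(f"{minutes:.0f} minutes, about {minutes / 60 / 24:.1f} days")
```

At that rate, indexing the full 5 million pages takes 20,000 minutes, or roughly 13.9 days of continuous indexing on the PC described in item B.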

7. This version was developed in my spare time in less than one month, so it has many imperfections. Corrections are welcome.
I will make improvements as soon as possible based on your suggestions and release new versions afterwards.

gg3m search engine help center: http://www.gg3m.com/help/

8. This software is entirely my own work and contains essentially no borrowed or third-party code. You can safely use it for learning and commercial purposes.
