Open source: fully self-developed Search Engine 1.0, source code and instructions. Full-text indexing of 4 million web pages on a single machine, with any 50-word query answered in no more than 20 milliseconds

Source: Internet
Author: User

The Search Engine 1.0 source code and related instructions are as follows:

1. bwsyq.Search.Demo
The search engine website, which provides the retrieval service.
It currently supports keyword search with dynamic summaries, keyword highlighting, automatic paging, and a configurable number of results per page (10 by default).
Not yet implemented: related search keywords, sorting by relevance, snapshots, web page titles, collection time, web page URLs, and parallel cluster search.
Snapshots, web page titles, collection time, and web page URLs can be implemented as your situation requires.
Related search keywords, sorting by relevance, and parallel cluster search will be provided in the next version.
2. bwsyq.Search.Index
The full-text indexer.
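
To make the division of labor between the indexer and the demo website concrete, here is a minimal sketch in Python. It is not the author's implementation (the real core is a C library whose internals are not described here). It assumes a plain in-memory inverted index, whitespace-style word tokenization, and AND semantics for multi-word queries. Every function name below is hypothetical.

```python
import re
from collections import defaultdict

def tokenize(text):
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"\w+", text.lower())

def build_index(docs):
    """Map each word to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for word in tokenize(text):
            index[word].add(doc_id)
    return index

def search(index, query):
    """Return the sorted ids of documents containing every query word."""
    words = tokenize(query)
    if not words:
        return []
    hits = set(index.get(words[0], set()))
    for word in words[1:]:
        hits &= index.get(word, set())
    return sorted(hits)

def paginate(results, page, page_size=10):
    """Return one page of results; pages are numbered from 1."""
    start = (page - 1) * page_size
    return results[start:start + page_size]

def highlight(text, query):
    """Wrap every whole-word query match in <b>...</b>, ignoring case."""
    words = tokenize(query)
    if not words:
        return text
    pattern = r"\b(" + "|".join(re.escape(w) for w in words) + r")\b"
    return re.sub(pattern, r"<b>\1</b>", text, flags=re.IGNORECASE)

# Tiny illustrative corpus (made up for this sketch).
docs = {
    1: "Full-text index of web pages",
    2: "Search engine demo website",
    3: "The index supports keyword search",
}
index = build_index(docs)
hits = search(index, "index search")       # documents containing both words
page = paginate(hits, page=1)              # first page, 10 results per page
snippet = highlight(docs[3], "index search")
```

The default page size of 10 mirrors the demo site's default. Relevance sorting is deliberately left out, since it is listed as not yet implemented.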

3. The 3,500 TXT files in the demodoc directory are used for index testing. For the test, place them in D:\Index\demodoc, or change the path in the code as needed.

4. The .idx files in the index directory are the index files created during testing. For the test, place them in D:\Index\index, or change the path in the configuration file as needed.

5. Notes:
A. The current index storage path is D:\Index\. You can change it in the configuration files app.config and Web.config as needed:
<Keyinfo key="searchidx" value="D:\Index\"/>
B. The dictionary required for indexing is stored in D:\Index\app_data. You can change it in the configuration file app.config as needed:
<Keyinfo key="physicapath" value="D:\Index\"/>
After the change, copy the app_data folder and all of its files to the corresponding directory.
C. The dictionary required for searching is stored in the app_data directory of the website project and does not need to be changed.
D. bwsyqindex.dll is the core library. It is written in C and can be placed in the System32 directory, or in the bin or release directory from which the project runs.
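
As an illustration of how the two path settings in notes A and B fit together, the Python sketch below reads them from a configuration fragment. Only the two Keyinfo entries come from the notes. The enclosing `<configuration>` element and the helper name `read_keyinfo` are assumptions made for this example, and the project itself reads these values through its own .NET code.

```python
import xml.etree.ElementTree as ET

# Hypothetical config fragment: the <configuration> wrapper is assumed;
# the two Keyinfo entries are the ones quoted in notes A and B.
CONFIG_XML = r"""
<configuration>
  <Keyinfo key="searchidx" value="D:\Index\"/>
  <Keyinfo key="physicapath" value="D:\Index\"/>
</configuration>
"""

def read_keyinfo(xml_text):
    """Collect the key/value pair of every Keyinfo element."""
    root = ET.fromstring(xml_text)
    return {el.get("key"): el.get("value") for el in root.iter("Keyinfo")}

settings = read_keyinfo(CONFIG_XML)
index_dir = settings["searchidx"]                       # index storage path
dictionary_dir = settings["physicapath"] + "app_data"   # dictionary folder
```

With the default values, `dictionary_dir` resolves to D:\Index\app_data, which is why the app_data folder has to be copied whenever `physicapath` is changed.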

6. The current version is 1.0 and has the following performance figures:
A. A single machine can index 4 million web pages.
B. On an ordinary PC (AMD 2.0 CPU, 7200 RPM hard drive, 2 GB of memory), about 1,000 pages of HTML-parsed text are indexed every 4 minutes.
C. Any 50-word query is answered in no more than 20 milliseconds.
D. Retrieval speed does not change with the size of the index, and indexing speed does not slow down as the number or size of documents grows.
E. Development tool: VS2008. The test environment was ipvs7 with an AMD 2.0 CPU, a 5400 to GB hard drive, and 2 GB of memory.
Note: The current single-machine index is limited to 4 million web pages. Pages beyond that limit are not indexed.
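
For a sense of scale, the short Python calculation below combines figures A and B. It assumes the rate in B stays constant all the way to the limit, which is what D claims. The `can_index` guard is a hypothetical illustration of the 4 million page cap, not the library's actual check.

```python
PAGES_PER_BATCH = 1000     # pages indexed per batch (figure B)
MINUTES_PER_BATCH = 4      # minutes per batch (figure B)
MAX_PAGES = 4_000_000      # single-machine limit (figure A and the note)

pages_per_minute = PAGES_PER_BATCH / MINUTES_PER_BATCH   # 250 pages/minute
total_minutes = MAX_PAGES / pages_per_minute             # 16,000 minutes
total_days = total_minutes / 60 / 24                     # roughly 11.1 days

def can_index(current_count, limit=MAX_PAGES):
    """Hypothetical guard: accept a new page only while under the limit."""
    return current_count < limit
```

At that rate, filling a 4 million page index on the PC described in B would take roughly 11 days of continuous indexing.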

7. This version took less than three weekends and three Sundays of my spare time to develop. I also had a cold for a week during that period, so there are many rough edges. Corrections are welcome.
I will make improvements as soon as possible based on your suggestions and release new versions as they are ready.

8. This software was written entirely from scratch and contains no referenced or third-party code. You can safely use it for learning and commercial purposes.

In my own tests, I searched 30 million web pages on a single machine, and any 50-word query took no more than 50 milliseconds. I was worried that this would sound like bragging, so I am first releasing the index and search code for 4 million web pages for you to examine.

I hope the experts will offer criticism, corrections, and more valuable comments. Thank you!

1: http://ishare.iask.sina.com.cn/f/17697171.html
2: http://www.everbox.com/f/xkdvEM3eyrwYXp9yG4d2b7aawZ
3: http://www.everbox.com/f/xO8NFbNyW5frP5SXSlaTzNdDIf
4: http://www.everbox.com/f/XcC7HMzMD7PXRKkNZKEVLfLcEK
5: http://www.everbox.com/f/WXFsqChwLTbLmX2P6gQuQy1z51
6: http://www.everbox.com/f/Bi9izWoxroLcn9mBs3N2S4PBeG
7: http://www.everbox.com/f/Tf3XzMdLVXDcX233qirYPTkMvr

Copyright: Do not reprint without permission. Please contact me if you have any questions. QQ: 1, 99923309. Mail: Bwsyq@bwsyq.com
