Iveely search engine 0.3.0 released & how to build your own search engine


After a month of intensive testing, Iveely Search Engine 0.3.0 is finally here. The theme of this release is real-time information retrieval.

The project and source code are available at http://iveelyse.codeplex.com. You may be wondering whether I mean "real-time search". My answer is that this release is a big step toward real-time search.

So what is new in 0.3.0? Both the crawler policy and the index policy have changed.

Crawler: we abandoned the previous approach of traversing an entire website in full. The crawler now traverses 3,000 pages breadth-first; in the next round it traverses 3,000 different pages breadth-first and also updates the pages crawled earlier. This ensures that the latest data becomes searchable as soon as possible.

Indexing: we discarded the previous data structure and adopted a new two-dimensional table to locate index entries quickly.

In addition, an iveelyse.resource project was added to guarantee that the resource files are present. Many users could not find some files in the first two versions; you no longer need to worry about this. We also added a helper tools project, which you can use to view some of Iveelyse's data, although it is not yet comprehensive.

(Figure: the PageRank view in Iveelyse after crawling cnblogs, the "Blog Garden".)
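The round-based crawl policy described above can be sketched as follows. This is a minimal illustration under stated assumptions, not Iveelyse's actual (.NET) code: the class name IncrementalCrawler and the fetch_links callback are hypothetical, and fetch_links stands in for downloading a page and extracting its links. batch_size defaults to 3,000 to match the number quoted above.

```python
from collections import deque
from typing import Callable, Iterable, List, Tuple


class IncrementalCrawler:
    """Hypothetical round-based breadth-first crawler (illustration only)."""

    def __init__(self, seeds: Iterable[str],
                 fetch_links: Callable[[str], List[str]],
                 batch_size: int = 3000):
        self.frontier = deque(seeds)        # discovered but not yet crawled URLs
        self.seen = set(self.frontier)      # every URL ever queued, to avoid repeats
        self.crawled: List[str] = []        # pages crawled in earlier rounds
        self.fetch_links = fetch_links      # stand-in for "download page, return its links"
        self.batch_size = batch_size

    def run_round(self) -> Tuple[List[str], List[str]]:
        """Refresh previously crawled pages, then crawl up to batch_size new ones."""
        # 1. Re-fetch pages from earlier rounds so their indexed content stays current.
        refreshed = list(self.crawled)
        for url in refreshed:
            self.fetch_links(url)  # a real crawler would re-download and re-index here

        # 2. Continue the breadth-first traversal with up to batch_size *new* pages.
        new_pages: List[str] = []
        while self.frontier and len(new_pages) < self.batch_size:
            url = self.frontier.popleft()
            new_pages.append(url)
            for link in self.fetch_links(url):
                if link not in self.seen:
                    self.seen.add(link)
                    self.frontier.append(link)

        self.crawled.extend(new_pages)
        return new_pages, refreshed
```

Because the frontier persists between rounds, each round picks up where the previous breadth-first pass stopped, so every round adds different pages while the refresh step keeps older ones up to date.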

 

Next, let's look at how to build your own search engine with Iveelyse 0.3.0 in one minute.

Step 1: Environment configuration (optional)

Iveelyse is developed for Windows on .NET 4.0, so make sure .NET 4.0 is installed on your computer or server. If your machine runs Linux, install Mono to provide a .NET environment. If the environment is already in place, skip this step.

Step 2: Locate the application assembly iveelyse.run.task.exe

By default, the Iveelyse applications are located in the \iveelyse\iveelyse.programfile folder. We use iveelyse.run.task.exe to run our tasks.

Unlike previous versions, this version supports one-click operation, so you can ignore the other files for now. Do not conclude that they are unimportant, though: every file generated by Iveelyse serves a purpose, so do not delete any of them. The main files are as follows:

| File name | Purpose |
|---|---|
| strai.aiml | Knowledge file for intelligent responses; supports function extension |
| stopword.txt | Stop-word file |
| iveelyse.index.exe | Index processing application |
| iveelyse.run.task.exe | Task driver application |
| iveelyse.segment.exe | Word segmentation program |
| iveelyse.spider.exe | Crawler application |
| iveelyse.tools.exe | Data viewing tool |
| iveelyse.ai.dll | Intelligent response processing library |
| iveelyse.cache.dll | Cache handler |
| iveelyse.classify.dll | Classification handler (temporarily retained) |
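Since missing files were a common problem in earlier versions, a quick check before running can save time. The helper below is a hypothetical Python sketch that compares a program folder against the file list above, case-insensitively as on Windows; the folder path you pass in is your own.

```python
from pathlib import Path
from typing import List

# File names taken from the list above; adjust if your build differs.
REQUIRED_FILES = [
    "strai.aiml", "stopword.txt",
    "iveelyse.index.exe", "iveelyse.run.task.exe", "iveelyse.segment.exe",
    "iveelyse.spider.exe", "iveelyse.tools.exe",
    "iveelyse.ai.dll", "iveelyse.cache.dll", "iveelyse.classify.dll",
]


def missing_files(program_dir: str) -> List[str]:
    """Return required files absent from program_dir (names compared case-insensitively)."""
    present = {p.name.lower() for p in Path(program_dir).iterdir() if p.is_file()}
    return [name for name in REQUIRED_FILES if name.lower() not in present]
```

For example, `missing_files(r"C:\iveelyse\iveelyse.programfile")` returns an empty list when everything is in place.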

Step 3: Modify the configuration file iveelyse.config

iveelyse.config stores all of Iveelyse's configuration information; every setting is configured here. The configuration items are as follows:

| Configuration item | Description |
|---|---|
| Highlight | Highlight color |
| Delimiter | Delimiter |
| Trainfile | Training file for the hidden Markov model word segmenter |
| Crawlertemp | Temporary directory for crawler data |
| Datadir | Directory for the official data files |
| Pagerankfile | File storing web page weights |
| Pagerankcontent | File associating web page URLs |
| Pageranklist | Set of web page weights |
| Indextemp | Temporary index files |
| Indexdir | Index files |
| Systemupdate | Whether the system has been updated, i.e. whether new data has been generated |
| Currentpageindex | Number of the current web page record |
| Crawler | Crawler entry addresses; separate multiple addresses with commas (,) |
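The post does not show the on-disk syntax of iveelyse.config, so the sketch below assumes a simple `Key=Value` per line layout purely for illustration. It shows how the comma-separated Crawler item would be split into individual entry addresses; parse_config and crawler_entries are hypothetical helpers, not part of Iveelyse.

```python
from typing import Dict, List


def parse_config(text: str) -> Dict[str, str]:
    """Parse assumed 'Key=Value' lines; blank lines and '#' comments are skipped."""
    settings: Dict[str, str] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        # partition splits at the first '=', so URLs containing '=' stay intact
        key, sep, value = line.partition("=")
        if sep:
            settings[key.strip()] = value.strip()
    return settings


def crawler_entries(settings: Dict[str, str]) -> List[str]:
    """Split the Crawler item into its comma-separated entry addresses."""
    raw = settings.get("Crawler", "")
    return [url.strip() for url in raw.split(",") if url.strip()]
```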

Step 4: Run the task by executing iveelyse.run.task.exe

Run the following command:

(Figure 1)

 

(Figure 2)
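If you prefer to start the task from a script, the following Python sketch launches iveelyse.run.task.exe directly on Windows and through Mono elsewhere (`mono program.exe` is Mono's standard way to run a .NET executable). The helper names are hypothetical, and setting the working directory to the program folder is an assumption about how the executable locates iveelyse.config.

```python
import platform
import subprocess
from pathlib import Path
from typing import List


def task_command(program_dir: str) -> List[str]:
    """Build the command line for iveelyse.run.task.exe on the current OS."""
    exe = str(Path(program_dir) / "iveelyse.run.task.exe")
    # .NET executables run natively on Windows; elsewhere they go through Mono.
    return [exe] if platform.system() == "Windows" else ["mono", exe]


def run_task(program_dir: str) -> int:
    """Run the task driver from its own folder and return its exit code."""
    return subprocess.run(task_command(program_dir), cwd=program_dir).returncode
```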

Step 5: Perform a search by opening http://127.0.0.1:8088/query=yourkeyword

At first you will see whatever is searchable at that moment. After a while, your search results will include newer information, even items that were only just posted on the homepage.

 

As the results above show, newly posted information could be found after just four minutes.
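For scripting or testing outside a browser, the Step 5 query can also be issued programmatically. The Python sketch below assumes the keyword is appended after `query=` in percent-encoded form and that the response body is text (decoded as UTF-8 if the server declares no charset). The helper names query_url and search are hypothetical.

```python
from urllib.parse import quote
from urllib.request import urlopen

BASE_URL = "http://127.0.0.1:8088/query="  # default address from Step 5


def query_url(keyword: str, base: str = BASE_URL) -> str:
    """Append the percent-encoded keyword (spaces, '#', non-ASCII) to the query URL."""
    return base + quote(keyword, safe="")


def search(keyword: str, timeout: float = 10.0) -> str:
    """Fetch the result page for keyword and return it as text."""
    with urlopen(query_url(keyword), timeout=timeout) as resp:
        charset = resp.headers.get_content_charset() or "utf-8"
        return resp.read().decode(charset)
```

With the task from Step 4 running, `print(search("iveely"))` prints the raw result page.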

 Step 6: Customize the search page (optional)

You may have noticed that Iveelyse does not ship a search client program; searching is done entirely through browser requests. You can therefore write your own search interface on the browser side.

How do you write a search interface?

The first access method follows the search rule http://127.0.0.1:8088/query=My keywords: your page simply requests this link for each search. If you need a more complex custom page, you can use the second access method instead: connect to the server on port 5001 over TCP/IP, send a keyword, and the server returns the search results as text.
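The second access method can be sketched as a plain TCP client. The wire protocol is not documented in this post, so this sketch assumes the client sends the keyword as UTF-8 text, half-closes the connection to mark the end of the request, and reads the reply until the server closes the socket. tcp_search is a hypothetical name; adjust the framing if the actual server expects a delimiter or keeps the connection open.

```python
import socket


def tcp_search(keyword: str, host: str = "127.0.0.1", port: int = 5001,
               timeout: float = 10.0) -> str:
    """Send a keyword over TCP and return the server's full text reply."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(keyword.encode("utf-8"))
        sock.shutdown(socket.SHUT_WR)  # assumed protocol: EOF marks the end of the request
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:  # server closed the connection: reply complete
                break
            chunks.append(data)
    return b"".join(chunks).decode("utf-8")
```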

 

At this point, your own search engine has been born. But don't celebrate too soon: I don't only bring good news, and I want to tell you about Iveelyse's current shortcomings as well.

1. Iveelyse is not suitable for large-scale data processing. Although the BigData project exists, I dropped big-data support in 0.3.0: big-data processing requires a distributed architecture, and at this stage a distributed architecture adds little to Iveelyse's direction. It may well matter a great deal in the future, but for now we are more focused on Iveelyse's distinctive features.
2. The current version of Iveelyse makes no performance guarantees. From what I know of its current performance, it can handle a website with 1 million URLs and meet reasonable requirements. However, on servers with little memory or weak CPUs, I cannot rule out some performance degradation. Performance will certainly remain an important issue.

What are the main objectives of Iveelyse 0.4.0? After 0.1.0 through 0.3.0, version 0.4.0 will focus on knowledge extraction. What is knowledge extraction? Suppose a web page says: "The chairman of Microsoft Asia-Pacific R&D Group is Zhang Yaqin, who was admitted to the Special Class for the Gifted Young at the University of Science and Technology of China at the age of 12." We hope Iveelyse can extract this knowledge, so that a user searching "Who is the chairman of Microsoft Asia-Pacific R&D Group?" gets "Zhang Yaqin", and a user searching "How old was Zhang Yaqin when he entered the Special Class for the Gifted Young at the University of Science and Technology of China?" gets "12 years old". You may find this hard to believe, but it is exactly Iveelyse's guiding creed: What Do You Want To Know Most? If Iveelyse does not hold firm to this direction, it will be meaningless.
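To make the idea concrete, here is a deliberately naive pattern-matching sketch of extracting a fact and then answering a question from it. It is not Iveelyse's planned approach, which the post does not describe; the regular expressions and the function names extract_facts and answer are hypothetical, and they only handle the single sentence shape "The R of S is O".

```python
import re
from typing import Dict, Optional, Tuple

# Toy patterns only: real knowledge extraction needs parsing, entity recognition, etc.
FACT = re.compile(r"[Tt]he (?P<rel>\w+) of (?P<subj>.+?) is (?P<obj>[^,.]+)")
QUESTION = re.compile(r"[Ww]ho is the (?P<rel>\w+) of (?P<subj>.+?)\s*\?")

Facts = Dict[Tuple[str, str], str]


def extract_facts(text: str) -> Facts:
    """Map (relation, subject) -> object for every 'The R of S is O' phrase found."""
    return {(m["rel"].lower(), m["subj"].strip()): m["obj"].strip()
            for m in FACT.finditer(text)}


def answer(question: str, facts: Facts) -> Optional[str]:
    """Answer 'Who is the R of S?' from extracted facts, or return None."""
    m = QUESTION.search(question)
    if not m:
        return None
    return facts.get((m["rel"].lower(), m["subj"].strip()))
```

Applied to the example sentence above, extract_facts stores ("chairman", "Microsoft Asia-Pacific R&D Group") mapped to "Zhang Yaqin", and answer returns that name for the matching question.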

Iveely is short for "I void everything and enjoy loving you", which expresses a searcher's love for search engines. All source code is open for knowledge sharing. If you have good ideas or suggestions, email me at liufanping@iveely.com or contact me on Weibo. If you want to participate and contribute code, please get in touch.

 

 

 
