After a month of hard testing, Iveely Search Engine 0.3.0 is finally here. The theme of this version is real-time information retrieval.
Project and source code: http://iveelyse.codeplex.com
You may be wondering whether I mean "real-time search". My answer is that this version is a huge step towards real-time search. So what is new in 0.3.0?
Both the crawler policy and the index policy have changed. On the crawler side, we abandoned the previous full traversal of an entire website. After traversing 3000 web pages breadth-first, the next round traverses 3000 different pages breadth-first and also updates the pages crawled earlier, which makes the latest data searchable as soon as possible. On the indexing side, we discarded the previous data structure and adopted a new two-dimensional table that locates index items quickly.
We also added iveelyse.resource to make sure the resource files are present. Many users could not find some files in the first two versions, so this is no longer a concern. In addition, we added a helper tools project, which can be used, for example, to view some of Iveelyse's data, although it is not yet comprehensive.
(Figure: the Iveelyse PageRank view after crawling Blog Garden.)
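As a rough illustration of the crawl policy described above (a bounded batch of new pages per round, breadth-first, plus a refresh of pages from earlier rounds), here is a minimal Python sketch. It is not Iveelyse's actual crawler code: the class name BatchCrawler, the fetch_links callback, and the in-memory SITE link graph are all hypothetical, and the batch size is shrunk from 3000 to 2 so the behaviour is easy to follow.

```python
from collections import deque


class BatchCrawler:
    """Hypothetical sketch: crawl in rounds of at most batch_size new pages."""

    def __init__(self, seeds, fetch_links, batch_size=3000):
        seeds = list(dict.fromkeys(seeds))  # drop duplicate seeds, keep order
        self.fetch_links = fetch_links      # callback: url -> list of outgoing links
        self.batch_size = batch_size
        self.frontier = deque(seeds)        # discovered but not yet crawled (FIFO = breadth-first)
        self.seen = set(seeds)              # every URL that was ever queued
        self.crawled = []                   # URLs crawled in earlier rounds

    def _visit(self, url):
        # Fetch the page and queue any link that has not been seen before.
        for link in self.fetch_links(url):
            if link not in self.seen:
                self.seen.add(link)
                self.frontier.append(link)

    def run_round(self):
        # 1. Refresh pages from earlier rounds so new content is picked up.
        refreshed = list(self.crawled)
        for url in refreshed:
            self._visit(url)
        # 2. Crawl up to batch_size pages that were never crawled before.
        new_pages = []
        while self.frontier and len(new_pages) < self.batch_size:
            url = self.frontier.popleft()
            self._visit(url)
            new_pages.append(url)
        self.crawled.extend(new_pages)
        return new_pages, refreshed


# Hypothetical link graph standing in for a real website.
SITE = {"/": ["/a", "/b"], "/a": ["/c"], "/b": ["/c", "/d"], "/c": [], "/d": ["/"]}

if __name__ == "__main__":
    crawler = BatchCrawler(["/"], lambda url: SITE.get(url, []), batch_size=2)
    for round_no in range(1, 4):
        new_pages, refreshed = crawler.run_round()
        print(round_no, "new:", new_pages, "refreshed:", refreshed)
```

Each round returns the pages crawled for the first time and the pages that were re-fetched, so recently changed pages are revisited without waiting for a full site traversal.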
Next, let's look at how Iveelyse 0.3.0 lets you build your own search engine in 1 minute.
Step 1: Environment configuration (optional)
Iveelyse is developed on Windows with .NET 4.0, so please make sure that .NET 4.0 is installed on your computer or server. If your computer or server runs Linux, see the Mono installation instructions to set up a .NET environment. If the environment is already in place, skip this step.
Step 2: Locate the application: iveelyse.run.task.exe
By default, the Iveelyse applications are located in the \iveelyse\iveelyse.programfile folder. As in the previous version, we use iveelyse.run.task.exe to run our tasks.
This program provides one-click operation. For now, do not treat the other files as unimportant: every file that Iveelyse generates is meaningful, so do not delete any of them. The main files are:
File name                Meaning
Strai.aiml               Knowledge file for intelligent responses; supports function extension
Stopword.txt             Stop word file
iveelyse.index.exe       Index processing application
iveelyse.run.task.exe    Task driver application
iveelyse.segment.exe     Word segmentation program
iveelyse.spider.exe      Crawler application
iveelyse.tools.exe       Data viewing tools
iveelyse.AI.dll          Intelligent response handler
iveelyse.cache.dll       Cache handler
iveelyse.classify.dll    Classification handler (temporarily retained)
Step 3: Modify the configuration file: iveelyse.config
iveelyse.config is where Iveelyse stores its configuration; every setting is configured here. The configuration items are:
Configuration item    Description
Highlight             Highlight color
Delimiter             Delimiter
Trainfile             Training file for the hidden Markov word segmentation model
Crawlertemp           Temporary directory for storing crawler data
Datadir               Storage directory for the official data files
Pagerankfile          Web page weight storage file
Pagerankcontent       File associating web page URLs
Pageranklist          Set of web page weights
Indextemp             Temporary index files
Indexdir              Index files
Systemupdate          Whether the system has been updated; flags whether new data has been generated
Currentpageindex      Number of the current web page record
Crawler               Crawler entry addresses; separate multiple addresses with commas (,)
Step 4: Run the task by executing iveelyse.run.task.exe
Run the following command:
(Figure 1)
(Figure 2)
Step 5: Perform a search by opening http://127.0.0.1:8088/query=yourkeyword
You will first see whatever is searchable at that moment. After a while, newer information appears in your search results, even items that have only just been posted on the home page.
As the above shows, new information on a page can be found after just four minutes.
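The same search request can also be issued from a script instead of a browser. The following Python sketch builds the URL in the form given in Step 5 and fetches the response. The function names are hypothetical, and two details are assumptions rather than documented Iveelyse behaviour: that the server accepts a percent-encoded keyword, and that the response body is UTF-8 text.

```python
from urllib.parse import quote
from urllib.request import urlopen


def build_search_url(keyword, host="127.0.0.1", port=8088):
    # Step 5 places the keyword directly after "query=" in the path.
    # Assumption: the server accepts a percent-encoded keyword.
    return f"http://{host}:{port}/query={quote(keyword, safe='')}"


def search(keyword, host="127.0.0.1", port=8088, timeout=10):
    # Requires iveelyse.run.task.exe to be running on the given host.
    with urlopen(build_search_url(keyword, host, port), timeout=timeout) as response:
        raw = response.read()
    # Assumption: the response is UTF-8; undecodable bytes are replaced.
    return raw.decode("utf-8", errors="replace")


if __name__ == "__main__":
    print(build_search_url("yourkeyword"))
```

Calling search("yourkeyword") only works while the Iveelyse task from Step 4 is running; build_search_url can be used on its own to check the request URL.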
Step 6: Customize the search page (optional)
You may have noticed that the Iveelyse search client involves no program of its own: it works entirely through browser requests. On the browser side, you can therefore write your own search interface.
How to write the search interface?
The first access method follows the search rule http://127.0.0.1:8088/query=My keywords: your page simply requests this link for every search. If you need a more complex custom page, you can also connect to port 5001 on the server over TCP/IP. When a keyword is sent, the server returns the search results as text (the second access method).
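The second access method can be sketched in Python as a small TCP client. The article only states that port 5001 accepts a keyword and returns text, so the wire details here are assumptions: the keyword is sent as UTF-8, the client half-closes the connection to mark the end of the keyword, and the server is expected to reply and then close the connection. The function name search_over_tcp is hypothetical.

```python
import socket


def search_over_tcp(keyword, host="127.0.0.1", port=5001, timeout=10.0):
    """Send a keyword to the search server and return the reply as text."""
    with socket.create_connection((host, port), timeout=timeout) as conn:
        # Assumption: the server expects the raw keyword encoded as UTF-8.
        conn.sendall(keyword.encode("utf-8"))
        # Assumption: half-closing tells the server the keyword is complete.
        conn.shutdown(socket.SHUT_WR)
        chunks = []
        while True:
            data = conn.recv(4096)
            if not data:  # the server closed the connection: reply is complete
                break
            chunks.append(data)
    # Assumption: the reply is UTF-8 text; undecodable bytes are replaced.
    return b"".join(chunks).decode("utf-8", errors="replace")
```

If the real server frames its messages differently (for example with a length prefix or a terminator), only the send and receive steps need to change; the rest of a custom search page can stay the same.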
At this point, a search engine of your own has been born. But don't get too excited yet: I am not here only to bring good news, and I still want to tell you about the shortcomings of Iveelyse at this stage.
1. Iveelyse is not suitable for large-scale data processing. Although the bigdata project exists, I dropped big data support in 0.3.0, because big data processing has to be distributed and, at this stage, a distributed architecture adds little to Iveelyse. It will of course matter a great deal in the future, but for now we care more about the features of Iveelyse itself.
2. The current version of Iveelyse makes no performance guarantees. As far as I know, it can currently handle a website of 1 million URLs with performance that meets basic requirements. On servers with little memory and a weak CPU, however, I cannot rule out some performance degradation. Performance will nonetheless have to be treated as an important issue.
What is the main goal of Iveelyse 0.4.0? After 0.1.0 through 0.3.0, version 0.4.0 focuses on knowledge extraction. What is knowledge extraction? Suppose a web page contains the sentence "The chairman of Microsoft Asia Pacific R&D Group is Zhang Yaqin, and he was admitted to the junior class of the University of Science and Technology of China at the age of 12." We hope that Iveelyse can extract this knowledge, so that when a user searches for "Who is the chairman of Microsoft Asia Pacific R&D Group?" it returns "Zhang Yaqin", and for "At what age did Zhang Yaqin enter the junior class of the University of Science and Technology of China?" it returns "12 years old". You may find this incredible, but it is exactly Iveelyse's guiding creed: What do you most want to know? If Iveelyse does not hold firmly to this direction, it will be meaningless.
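To make the idea concrete, here is a toy Python sketch of pattern-based knowledge extraction. It is not how Iveelyse 0.4.0 is or will be implemented: the two regular expressions, the function names, and the fact table keyed by (role, organization) are all assumptions chosen for illustration, and they only cover sentences of the form "the <role> of <organization> is <Name>".

```python
import re

# Hypothetical pattern: "the <role> of <organization> is <Capitalized Name>".
FACT_PATTERN = re.compile(
    r"[Tt]he (?P<role>\w+) of (?P<org>.+?) is (?P<name>[A-Z][a-z]+(?: [A-Z][a-z]+)*)"
)
# Hypothetical pattern: "Who is the <role> of <organization>?".
QUESTION_PATTERN = re.compile(r"[Ww]ho is the (?P<role>\w+) of (?P<org>.+?)\s*\?")

TEXT = (
    "The chairman of Microsoft Asia Pacific R&D Group is Zhang Yaqin, and he was "
    "admitted to the junior class of the University of Science and Technology "
    "of China at the age of 12."
)


def extract_facts(text):
    """Return a dict mapping (role, organization) to the extracted person name."""
    facts = {}
    for match in FACT_PATTERN.finditer(text):
        key = (match.group("role").lower(), match.group("org").lower())
        facts[key] = match.group("name")
    return facts


def answer(question, facts):
    """Answer a "Who is the ... of ...?" question, or return None if unknown."""
    match = QUESTION_PATTERN.fullmatch(question.strip())
    if match is None:
        return None
    return facts.get((match.group("role").lower(), match.group("org").lower()))


if __name__ == "__main__":
    facts = extract_facts(TEXT)
    print(answer("Who is the chairman of Microsoft Asia Pacific R&D Group?", facts))
```

A fixed pattern like this breaks as soon as the wording changes, and it does not handle the second example question about age at all, which is why real knowledge extraction is a much harder problem than this sketch suggests.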
Iveely is short for "I void everything and enjoy loving you", which expresses a searcher's love for search engines. All source code is open, for the sake of knowledge sharing. If you have good ideas or suggestions, you can email me at liufanping@iveely.com or reach me on Weibo. If you want to participate and contribute your code, contact me.