Building a high-performance search engine with HubbleDotNet + Mongodb: Overview


HubbleDotNet has supported Mongodb in its official code since version 1.2.3. Mongodb is a NoSQL database developed by 10gen. Its read/write performance is generally much higher than that of traditional relational databases, and it is very convenient for distributed deployment. By supporting Mongodb, HubbleDotNet also offers a NoSQL solution. This article focuses on the features of Hubble + Mongodb and on some performance tests comparing it with Hubble + SQL Server and Lucene.net.

Mongodb Installation

Before getting to the main topic, here is a brief introduction to installing Mongodb on Windows. Installation on other operating systems is not covered here.

First, find the version for your operating system at http://www.mongodb.org/downloads, download it, and unzip it. We assume the package is extracted to C:\mongodb.

Create a data directory and a log directory under C:\mongodb, so that C:\mongodb contains the bin, data, and log directories.

Open a command prompt (run cmd from the Run dialog) and run cd C:\mongodb\bin to enter the directory that contains the mongodb executables.

Next, run mongod --dbpath ../data --logpath ../log/mongodb.log

After completing the preceding steps, the mongodb service program starts.
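The setup steps above can be sketched as a small script. This is only a minimal sketch, not part of mongodb or HubbleDotNet: the helper name prepare_mongodb_dirs is hypothetical, and the only mongod options it uses are the --dbpath and --logpath options shown above.

```python
import os
import tempfile


def prepare_mongodb_dirs(root):
    """Create the data and log directories under root and return the
    debug start command from the text as an argument list."""
    for name in ("data", "log"):
        os.makedirs(os.path.join(root, name), exist_ok=True)
    # Relative paths, as in the text: mongod is started from <root>/bin.
    return ["mongod", "--dbpath", "../data", "--logpath", "../log/mongodb.log"]


if __name__ == "__main__":
    # Demo in a temporary directory rather than the real install folder.
    with tempfile.TemporaryDirectory() as root:
        print(" ".join(prepare_mongodb_dirs(root)))
```

The function only prepares the directories and builds the argument list; actually starting mongod from the bin directory is left to the caller.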

Default port

To access mongodb remotely, we need to open mongodb's default port in the firewall. The default ports of the mongodb-related services are listed below:

  • Standalone mongod: 27017
  • mongos: 27017
  • Shard server (mongod --shardsvr): 27018
  • Config server (mongod --configsvr): 27019
  • Web stats page for mongod: add 1000 to the port number (28017 by default)
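After opening the firewall, it can be useful to confirm that the ports are actually reachable. The following is a minimal sketch using only the Python standard library; the names DEFAULT_PORTS, web_stats_port, and is_port_open are chosen for this example and are not part of mongodb.

```python
import socket

# Default ports from the list above.
DEFAULT_PORTS = {
    "standalone mongod": 27017,
    "mongos": 27017,
    "shard server": 27018,
    "config server": 27019,
}


def web_stats_port(port=27017):
    """The web stats page listens on the mongod port plus 1000."""
    return port + 1000


def is_port_open(host, port, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    for name, port in DEFAULT_PORTS.items():
        state = "open" if is_port_open("127.0.0.1", port) else "closed"
        print(f"{name}: {port} is {state}")
```

A successful TCP connection only shows that something is listening on the port; it does not prove that the listener is mongodb.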
Install as a service

The command line above is suitable only for debugging. For a production deployment on a server, mongodb should be installed and run as a service.

The command line for installing the mongodb service on Windows is as follows:

C:\mongodb\bin> mongod --dbpath c:\mongodb\data --logpath c:\mongodb\log\mongodb.log --logappend --install

Pay attention to the following two points:

  1. The paths in the command line must be absolute. If you enter a relative path, you must modify the service startup path afterwards; otherwise, the service cannot run.
  2. On Windows 7 or Windows Server 2008, run cmd as administrator.
Uninstall Service

If you no longer need the mongodb service, run the following command to uninstall it:

C:\mongodb\bin> mongod --remove
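The install and uninstall commands can be assembled programmatically, which also makes it easy to enforce the absolute-path rule. This is a minimal sketch under stated assumptions: the function names are hypothetical, and only the mongod options that appear in this article are used. Python's ntpath module applies Windows path rules on any operating system.

```python
import ntpath


def mongod_install_command(dbpath, logpath):
    """Build the 'install as a service' command line from the text.

    Raises ValueError for relative paths, since the text warns that a
    service installed with relative paths cannot run as-is.
    """
    for path in (dbpath, logpath):
        if not ntpath.isabs(path):
            raise ValueError(f"path must be absolute: {path}")
    return ["mongod", "--dbpath", dbpath, "--logpath", logpath,
            "--logappend", "--install"]


def mongod_remove_command():
    """Build the command line that uninstalls the service."""
    return ["mongod", "--remove"]


if __name__ == "__main__":
    cmd = mongod_install_command(r"c:\mongodb\data",
                                 r"c:\mongodb\log\mongodb.log")
    print(" ".join(cmd))
```

The resulting list can be passed to subprocess.run from an administrator prompt; the sketch itself does not execute anything.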

 

Connection string of the mongodb database adapter in Hubble

HubbleDotNet does not use mongodb's own connection string format to connect to mongodb. Instead, it uses a standard database connection string, as shown in the examples below.

Connect to mongodb without a user name or password

Use the following connection string for the mongodb database adapter in HubbleDotNet when mongodb has no user name or password set.

Data Source=127.0.0.1;Initial Catalog=News;Integrated Security=True

You only need to specify the server IP address (Data Source) and the database name (Initial Catalog).

Connect to mongodb using the user name and password

Use the following connection string for the mongodb database adapter in HubbleDotNet when mongodb has a user name and password set.

Data Source=127.0.0.1;Initial Catalog=News;User Id=myUsername;Password=myPassword;
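The two connection string forms differ only in their credential part, so they can be generated from one helper. The following is a minimal sketch; build_connection_string and parse_connection_string are hypothetical names, and the sketch makes no attempt to escape values that contain ';' or '='.

```python
def build_connection_string(server, database, user=None, password=None):
    """Build a connection string in the key=value format shown above."""
    parts = [f"Data Source={server}", f"Initial Catalog={database}"]
    if user is None:
        # No credentials: same form as the first example.
        parts.append("Integrated Security=True")
    else:
        parts.append(f"User Id={user}")
        parts.append(f"Password={password}")
    return ";".join(parts) + ";"


def parse_connection_string(text):
    """Split a connection string into a dict, ignoring extra spaces."""
    result = {}
    for item in text.split(";"):
        if item.strip():
            key, _, value = item.partition("=")
            result[key.strip()] = value.strip()
    return result


if __name__ == "__main__":
    print(build_connection_string("127.0.0.1", "News"))
    print(build_connection_string("127.0.0.1", "News",
                                  "myUsername", "myPassword"))
```

The parser is handy for checking a configured string before handing it to HubbleDotNet, for example to confirm that Initial Catalog names the intended database.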

 

Overview of HubbleDotNet + Mongodb Functions
  • Supports standard data types such as int, string, double, and datetime.
  • Supports full-text indexing and querying of specified string fields in mongodb. Mongodb itself does not support full-text query; once HubbleDotNet is configured, full-text queries against mongodb become possible. The query syntax is the same as for the other database types. This is the core function of Hubble + Mongodb.
  • The primary database can be a relational database such as SQL Server, with mongodb used for the mirror table. This is a recommended approach for read/write splitting and distributed deployment.
  • Supports active mode indexing with mongodb as the primary database. This mode is useful for real-time indexing.
  • Supports passive mode indexing with mongodb as the primary database. In this mode, Hubble cannot currently synchronize the index automatically, because mongodb does not support triggers. You need to write your own code to synchronize manually.
  • Supports distributed data deployment with mongodb.
  • Supports non-full-text queries on mongodb using SQL statements, for example: select top 10 * from table where price > 100 and price < 200 order by price. This feature is unique to HubbleDotNet, which implements a syntax conversion from SQL to bson. Callers can access mongodb through standard SQL statements, just as they would access SQL Server, which is a great convenience for developers who do not like bson queries.
  • You can also use bson statements to access or configure mongodb. HubbleDotNet provides two stored procedures, sp_excutesql and sp_querysql, which let you operate on mongodb directly with bson through Hubble.
  • Support for incomplete documents. Mongodb is a document database. Unlike a relational database, it does not force every record to have the same fixed fields, so the fields of each record may differ. HubbleDotNet supports this design: a field that does not appear in a record is treated as NULL or, if a default value is specified, as the default value.
  • Sub-field support. Mongodb is a document database and supports sub-fields. Hubble will support sub-fields in a later version.
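Two of the features above are easier to picture with a small example: the mapping from a simple SQL range query to a mongodb-style query, and the treatment of fields missing from a document. The sketch below is illustrative only and is not HubbleDotNet's implementation. The function names and the "filter", "sort", and "limit" keys are labels chosen for this example; $gt and $lt are mongodb's standard comparison operators.

```python
def range_query(field, low, high, top):
    """Mongodb-style equivalent of the SQL statement:
    select top <top> * from table
    where <field> > <low> and <field> < <high> order by <field>
    """
    return {
        "filter": {field: {"$gt": low, "$lt": high}},
        "sort": [(field, 1)],  # 1 means ascending order
        "limit": top,
    }


def fill_missing_fields(document, defaults):
    """Return a copy of document in which every field named in defaults
    is present. A missing field takes its default value; a default of
    None stands for NULL."""
    filled = dict(document)
    for field, default in defaults.items():
        filled.setdefault(field, default)
    return filled


if __name__ == "__main__":
    print(range_query("price", 100, 200, 10))
    print(fill_missing_fields({"title": "hello"},
                              {"title": None, "price": 0.0}))
```

With a mongodb driver, the three parts returned by range_query would typically be passed as the query filter, the sort specification, and the result limit.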
Performance test environment:

Software Version

HubbleDotNet version 1.2.5.0

Mongodb version 2.0.5

SQL SERVER 2008

Lucene.net 2.9.4

System Environment

Intel i5-2430M 2.40 GHz, 8 GB RAM, Windows 7 64-bit

7800 RPM HDD

Test Data

The test data is 20 million rows of Internet webpage data. The data file size is 4 GB.

Purpose:

Test the performance of Hubble + SQL Server, Hubble + Mongodb, and Lucene.net under high concurrency on a single-host system.

Test method:

The test code issues 10 queries per second, searching for 840 common English words. Each query returns the title and content of the top 10 results, sorted by relevance score.
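The original test code is not shown, so the following is only a sketch of how such a fixed-rate test could be written. The name run_benchmark is hypothetical, and query stands for any callable that runs one search against the system under test. The sketch issues queries one after another, so the rate drops below the target whenever a single query takes longer than its time slot.

```python
import time


def run_benchmark(query, words, queries_per_second=10,
                  clock=time.perf_counter, sleep=time.sleep):
    """Issue one query per word at a fixed rate and collect timings.

    Returns (maximum, average, minimum) query time in milliseconds.
    """
    interval = 1.0 / queries_per_second
    timings_ms = []
    for word in words:
        start = clock()
        query(word)
        elapsed = clock() - start
        timings_ms.append(elapsed * 1000.0)
        # Wait out the rest of the slot to hold the target rate.
        if elapsed < interval:
            sleep(interval - elapsed)
    return (max(timings_ms),
            sum(timings_ms) / len(timings_ms),
            min(timings_ms))


if __name__ == "__main__":
    # Stand-in query that only simulates a few milliseconds of work.
    stats = run_benchmark(lambda word: time.sleep(0.002),
                          ["search", "engine", "index"])
    print("max/avg/min ms:", stats)
```

The clock and sleep parameters exist so the timing logic can be checked without real delays; in normal use they keep their defaults.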

Test Case 1:

In this test case, we disable all HubbleDotNet caches so that HubbleDotNet reads the index from the hard disk every time, and Lucene.net is likewise set to read from files. Each of the 840 English words is queried for the first time. The computer is restarted before each test to clear the operating system's file cache.

The test results are as follows:

 

| System | Queries per second | Maximum query time (ms) | Average query time (ms) | Minimum query time (ms) |
|---|---|---|---|---|
| Hubble + Mongodb | 10 | 1573 | 431 | 3 |
| Hubble + SQL SERVER | 10 | 8997 | 931 | 4 |
| Lucene.net | 10 | 209196 | 108665 | 9430 |
 

 

From this test, we can see that HubbleDotNet + Mongodb performs best on a cold start, while Lucene.net is on the order of 200 times slower than HubbleDotNet + Mongodb (about 250 times on average query time). Lucene.net is so much slower mainly because its disk access is slow and its index is more than 4 times larger than HubbleDotNet's: the Lucene.net index is 3.6 GB, while the HubbleDotNet index is only 800 MB. HubbleDotNet 1.2.5.0 also optimizes disk I/O for the first query, which is another main reason why it outperforms Lucene.net.

Test Case 2:

In this test case, we set RamIndex to Full in HubbleDotNet so that all indexes are loaded into memory, and Lucene.net is also set to use an in-memory index. This test mainly measures the performance of in-memory indexes.

The test results are as follows:

 

| System | Queries per second | Maximum query time (ms) | Average query time (ms) | Minimum query time (ms) | Memory (MB) |
|---|---|---|---|---|---|
| Hubble + Mongodb | 10 | 148 | 5.53 | 1 | 1,164 |
| Hubble + SQL SERVER | 10 | 157 | 6.17 | 0 | 1,170 |
| Lucene.net | 10 | 230 | 3.58 | 0 | 3,611 |

 

The results show that Lucene.net's average query time is faster than HubbleDotNet's, while HubbleDotNet's maximum query time is faster than Lucene.net's. Lucene.net has the faster average because it runs in a single process, whereas a HubbleDotNet query involves three processes: the test process, the Hubble service process, and the mongodb process (or SQL Server process). With three processes interacting, every query triggers process switches, which consume some system resources; the shorter the query time, the more prominent this overhead becomes. Even so, the query speed is sufficient for practically any environment. Judging from this test, at full system load HubbleDotNet may support about 200 queries per second, which is equivalent to roughly 16 million queries a day. That is very high for a single-host system. A website with that level of traffic would most likely need to consider a distributed solution anyway.
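The article does not say how the throughput estimate was derived. One plausible reading, assumed here, is that queries run back to back at the measured average of 5.53 ms, which gives about 181 queries per second and about 15.6 million queries a day, in line with the rounded figures above. The function names in this sketch are hypothetical.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400


def max_queries_per_second(avg_query_ms):
    """Upper bound when queries run one after another with no idle time."""
    return 1000.0 / avg_query_ms


def queries_per_day(qps):
    """Number of queries served in a day at a constant rate."""
    return qps * SECONDS_PER_DAY


if __name__ == "__main__":
    qps = max_queries_per_second(5.53)  # Hubble + Mongodb average (ms)
    print(round(qps), round(queries_per_day(qps)))
```

At exactly 200 queries per second the daily total would be 17.28 million, so the figures in the text should be read as rough orders of magnitude.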

In maximum query time, Hubble is nearly twice as fast as Lucene.net (148 ms versus 230 ms), which basically reflects the performance of the query algorithm. This is broadly consistent with my other test results (described in other articles): when sorting by score, Hubble's query speed is about twice that of Lucene.net, and when sorting by other fields, about 5 times.

Memory usage: Lucene.net occupies 3.6 GB of memory and HubbleDotNet occupies 1.1 GB. This is mainly because HubbleDotNet's index is smaller than Lucene.net's. HubbleDotNet's memory usage can also be optimized further; in this example it can be reduced to about 800 MB after optimization.

In terms of persistence, HubbleDotNet's in-memory indexes are persisted automatically: additions, deletions, and modifications made at run time are written to files as well as applied in memory, so no data is lost even if the machine is restarted. Lucene.net's in-memory index solution is not persisted automatically; you need to write a separate program for persistence.
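The general idea of an in-memory index that survives a restart can be illustrated with a toy example. This is not how HubbleDotNet implements persistence, which the article does not describe; it is a minimal sketch of one common approach, in which every change is appended to a log file before memory is updated and the log is replayed on start-up. The class name and file format are assumptions made for this example.

```python
import json
import os
import tempfile


class PersistentMemoryIndex:
    """Toy in-memory index mapping a word to a set of document ids."""

    def __init__(self, log_path):
        self.log_path = log_path
        self.postings = {}
        # Replay earlier changes so a restart restores the index.
        if os.path.exists(log_path):
            with open(log_path, encoding="utf-8") as log:
                for line in log:
                    self._apply(json.loads(line))

    def _apply(self, change):
        op, word, doc_id = change
        if op == "add":
            self.postings.setdefault(word, set()).add(doc_id)
        else:  # "delete"
            self.postings.get(word, set()).discard(doc_id)

    def _record(self, change):
        # Write to the file first, then update memory.
        with open(self.log_path, "a", encoding="utf-8") as log:
            log.write(json.dumps(change) + "\n")
        self._apply(change)

    def add(self, word, doc_id):
        self._record(["add", word, doc_id])

    def delete(self, word, doc_id):
        self._record(["delete", word, doc_id])

    def search(self, word):
        return sorted(self.postings.get(word, set()))


if __name__ == "__main__":
    path = os.path.join(tempfile.mkdtemp(), "index.log")
    index = PersistentMemoryIndex(path)
    index.add("hubble", 1)
    index.add("hubble", 2)
    index.delete("hubble", 1)
    # A new instance stands in for the process after a restart.
    print(PersistentMemoryIndex(path).search("hubble"))
```

A real engine would also compact the log and guard against partial writes; the sketch leaves both out to keep the idea visible.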

 

HubbleDotNet home: http://hubbledotnet.codeplex.com/

HubbleDotNet open-source full-text search database project-technical explanation

HubbleDotNet Weibo: http://weibo.com/hubbledotnet
