Building a high-performance search engine with HubbleDotNet + MongoDB: Overview

Last Update:2014-05-09 Source: Internet

Author: User

Starting with version 1.2.3, the official HubbleDotNet code supports integration with MongoDB. MongoDB is a NoSQL database developed by 10gen. Its read and write performance is much faster than that of traditional relational databases, and it is easy to deploy in a distributed fashion. By supporting MongoDB, HubbleDotNet itself also offers a NoSQL solution. This article covers the features of Hubble + MongoDB and compares its performance with Hubble + SQL Server and Lucene.Net.

Installing MongoDB

Before getting started, let's briefly describe how to install MongoDB under Windows. Installation under other operating systems is similar.

First, find the version for your operating system at http://www.mongodb.org/downloads, then download and unzip it. Let's say we extract it to C:\mongodb.

We create a data directory and a log directory under C:\mongodb, next to the bin directory.

Then we open cmd from the Run dialog and execute cd c:\mongodb\bin to enter the MongoDB executable directory.

Next, execute mongod --dbpath ../data --logpath ../log/mongodb.log

After completing the above steps, the MongoDB server is running.

Default ports

To access MongoDB remotely, we need to open MongoDB's default ports in the firewall. The default ports for the MongoDB services are:

Standalone mongod: 27017
mongos: 27017
Shard server (mongod --shardsvr): 27018
Config server (mongod --configsvr): 27019
Web stats page for mongod: add 1000 to the port number (28017 by default)
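After opening the firewall, it can be handy to confirm that the ports are actually reachable. The following is a minimal Python sketch, not part of MongoDB or HubbleDotNet; the names DEFAULT_PORTS, web_stats_port and is_port_open are made up for this illustration, and the host 127.0.0.1 is only an example.

```python
import socket

# Default ports as listed above (the dictionary keys are illustrative labels).
DEFAULT_PORTS = {
    "standalone mongod": 27017,
    "mongos": 27017,
    "shard server": 27018,
    "config server": 27019,
}

# The web stats page listens on the mongod port plus 1000.
WEB_STATS_OFFSET = 1000


def web_stats_port(mongod_port=27017):
    """Return the web stats port that belongs to a given mongod port."""
    return mongod_port + WEB_STATS_OFFSET


def is_port_open(host, port, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    # Replace 127.0.0.1 with the address of the remote MongoDB server.
    for name, port in DEFAULT_PORTS.items():
        print(name, port, is_port_open("127.0.0.1", port))
```

A successful TCP connection only shows that something is listening on the port; it does not prove that the listener is MongoDB.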

Installing as a service

The command line above is only suitable for debugging. For a real deployment on a server, MongoDB should be installed and run as a service.

The command line to install the MongoDB service under Windows is as follows:

C:\mongodb\bin>mongod --dbpath c:\mongodb\data --logpath c:\mongodb\log\mongodb.log --logappend --install

Pay attention to the following two points:

The paths in the command line must be absolute. If you enter a relative path, you need to modify the service's startup path, or the service will not run.
If you are installing under Windows 7 or a Windows Server operating system, you must run cmd as Administrator.
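The absolute-path requirement is easy to check before the command is run. The sketch below is a hypothetical helper (build_install_command is not a MongoDB tool) that assembles the same mongod arguments as above and rejects relative paths. It uses Python's ntpath module so that Windows paths are interpreted the same way on any platform.

```python
import ntpath


def build_install_command(dbpath, logpath, mongod="mongod"):
    """Build the argument list for installing MongoDB as a Windows service.

    Raises ValueError if either path is relative, because the service
    would otherwise start in a different working directory and fail.
    """
    for label, path in (("dbpath", dbpath), ("logpath", logpath)):
        if not ntpath.isabs(path):
            raise ValueError(f"{label} must be an absolute path: {path!r}")
    return [
        mongod,
        "--dbpath", dbpath,
        "--logpath", logpath,
        "--logappend",
        "--install",
    ]


if __name__ == "__main__":
    args = build_install_command(r"c:\mongodb\data", r"c:\mongodb\log\mongodb.log")
    print(" ".join(args))
```

The returned list could be passed to subprocess.run from an Administrator prompt; that step is left out here so the sketch has no side effects.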

Uninstalling the service

If we no longer need MongoDB, we can uninstall the service with the following command line:

C:\mongodb\bin>mongod --remove

Connection string for the MongoDB database adapter in Hubble

HubbleDotNet does not use MongoDB's own connection string format to connect to MongoDB. It uses the standard connection string format instead.

Connecting to MongoDB without a user name and password

This is the connection string used by the MongoDB database adapter in HubbleDotNet when MongoDB has no user name and password set:

Data Source=127.0.0.1;Initial Catalog=news;Integrated Security=true

We only need to specify the server IP address (Data Source) and the database name (Initial Catalog).

Connecting to MongoDB with a user name and password

This is the connection string used by the MongoDB database adapter in HubbleDotNet when MongoDB has a user name and password set:

Data Source=127.0.0.1;Initial Catalog=news;User Id=myusername;Password=mypassword;
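To make the format concrete, here is a small Python sketch that splits such a connection string into its key/value pairs. It only illustrates the format shown above. It is not HubbleDotNet's parser, the function names are invented for this example, and treating keys as case-insensitive is an assumption.

```python
def parse_connection_string(text):
    """Split a 'key=value;key=value' connection string into a dict.

    Keys are lower-cased so that 'Data Source' and 'data source' match.
    """
    settings = {}
    for part in text.split(";"):
        part = part.strip()
        if not part:
            continue  # tolerate a trailing semicolon
        key, separator, value = part.partition("=")
        if not separator:
            raise ValueError(f"expected key=value, got {part!r}")
        settings[key.strip().lower()] = value.strip()
    return settings


def uses_credentials(settings):
    """Return True if both a user name and a password are present."""
    return "user id" in settings and "password" in settings


if __name__ == "__main__":
    anonymous = parse_connection_string(
        "Data Source=127.0.0.1;Initial Catalog=news;Integrated Security=true"
    )
    print(anonymous["data source"], anonymous["initial catalog"])
    print(uses_credentials(anonymous))
```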

HubbleDotNet + MongoDB features at a glance

Standard data types such as int, string, double and datetime are supported.
Full-text indexing and querying of specified MongoDB string fields. MongoDB itself does not support full-text queries. HubbleDotNet can be configured to provide full-text querying of MongoDB, with the same syntax as for other database types. This is the core function of Hubble + MongoDB.
A relational database, such as SQL Server, as the main store, with MongoDB as the mirror table. This is the recommended way to separate reads from writes and to deploy data in a distributed fashion.
MongoDB as the main store with an active-mode index, which is useful for real-time indexing.
MongoDB as the main store with a passive-mode index. In this mode Hubble currently cannot synchronize the index automatically, so you need to write synchronization code yourself. The reason is that MongoDB does not support triggers.
Distributed deployment of data through MongoDB.
Non-full-text querying of MongoDB with SQL statements, such as select top * from table where price > + Price < $ ORDER BY price. This is a unique feature of HubbleDotNet. HubbleDotNet implements a SQL-to-BSON syntax translation that lets callers access MongoDB through standard SQL statements, just as they would access SQL Server. This is a great convenience for developers who do not like BSON queries.
Accessing or configuring MongoDB through BSON statements. HubbleDotNet provides two stored procedures, sp_excutesql and sp_querysql, that let users operate on MongoDB directly with BSON through Hubble.
Incomplete documents. MongoDB is a document database. Unlike a relational database, it does not force every record to have the same fixed fields, so each record may have different fields. HubbleDotNet supports this design: a field that does not appear in a record is treated as null, or as the default value if one is specified.
Sub-fields (planned). MongoDB is a document database that supports sub-fields. Hubble will support them in a later release.
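Two of the features above, SQL-style range conditions and incomplete documents, can be illustrated with a short Python sketch. This is not HubbleDotNet's translator. The function names, the tuple representation of conditions, and the example values 10 and 100 are all assumptions made for illustration. The MongoDB operators $gt, $gte, $lt, $lte and $ne are real query operators.

```python
# Mapping from SQL comparison operators to MongoDB query operators.
SQL_TO_MONGO_OPERATORS = {
    ">": "$gt",
    ">=": "$gte",
    "<": "$lt",
    "<=": "$lte",
    "<>": "$ne",
}


def conditions_to_filter(conditions):
    """Turn AND-ed (field, operator, value) conditions into a MongoDB filter.

    Only the comparison operators in SQL_TO_MONGO_OPERATORS are handled.
    """
    query = {}
    for field, operator, value in conditions:
        mongo_operator = SQL_TO_MONGO_OPERATORS[operator]
        query.setdefault(field, {})[mongo_operator] = value
    return query


def fill_missing_fields(document, schema_defaults):
    """Project a document onto a fixed schema.

    Fields missing from the document get the schema default, or None
    (null) when the schema specifies no default. Fields that are not in
    the schema are dropped in this sketch.
    """
    return {
        field: document.get(field, default)
        for field, default in schema_defaults.items()
    }


if __name__ == "__main__":
    # Hypothetical equivalent of: where price > 10 and price < 100
    print(conditions_to_filter([("price", ">", 10), ("price", "<", 100)]))
    # 'price' is absent from the record, so its default 0.0 is used.
    print(fill_missing_fields({"title": "a"}, {"title": None, "price": 0.0}))
```

A real translator would also have to parse the SQL text and handle ORDER BY, TOP and OR conditions, which this sketch leaves out.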

Performance test

Test environment:

Software version

HubbleDotNet version 1.2.5.0

MongoDB version 2.0.5

SQL Server 2008

Lucene.Net 2.9.4

System environment

Intel i5 2430M 2.40 GHz, 8 GB RAM, Windows 7 64-bit

7800 RPM mechanical hard disk

Test data

The test data is 20 million rows of Internet web page data. The data file size is 4 GB.

Test target:

Compare the performance of Hubble + SQL Server, Hubble + MongoDB and Lucene.Net under high concurrency on a single machine.

Test method:

The test code issues 10 queries per second, searching for 840 commonly used English words. Each query returns the title and content of the top 10 results, sorted by relevance.
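The original test code is not shown in this article. As a rough idea of what such a harness does, here is a minimal Python sketch that paces queries at a fixed rate and reports the maximum, average and minimum query time. The run_query callback and all function names are placeholders. Unlike a truly concurrent test, this sketch issues queries one after another, so it falls behind schedule whenever a query takes longer than the pacing interval.

```python
import time


def run_paced_queries(run_query, terms, queries_per_second=10):
    """Call run_query(term) for each term at a fixed rate.

    Returns the duration of each query in milliseconds.
    """
    interval = 1.0 / queries_per_second
    durations_ms = []
    next_start = time.perf_counter()
    for term in terms:
        delay = next_start - time.perf_counter()
        if delay > 0:
            time.sleep(delay)  # wait for the next scheduled slot
        started = time.perf_counter()
        run_query(term)
        durations_ms.append((time.perf_counter() - started) * 1000.0)
        next_start += interval
    return durations_ms


def summarize(durations_ms):
    """Return the maximum, average and minimum query time in milliseconds."""
    return {
        "max_ms": max(durations_ms),
        "avg_ms": sum(durations_ms) / len(durations_ms),
        "min_ms": min(durations_ms),
    }


if __name__ == "__main__":
    # Stand-in for a real search call; replace with an actual query.
    def fake_query(term):
        time.sleep(0.001)

    print(summarize(run_paced_queries(fake_query, ["news", "music", "video"])))
```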

Test Case 1:

In this test case, we disable all HubbleDotNet caches so that HubbleDotNet reads the index from the hard disk every time, and Lucene.Net is also set to read from files. Each of the 840 English words is queried for the first time, and the computer is restarted before each test to clear the operating system's file cache.

The test results are as follows:

Configuration	Queries per second	Maximum query time (ms)	Average query time (ms)	Minimum query time (ms)
Hubble+MongoDB	10	1573	431	3
Hubble+SQL Server	10	8997	931	4
Lucene.Net	10	209196	108665	9430

In this test, HubbleDotNet + MongoDB performs best, while Lucene.Net is roughly 250 times slower than HubbleDotNet + MongoDB on average query time (108,665 ms versus 431 ms). Lucene.Net is so much slower than HubbleDotNet because its disk I/O is slower and its index is more than 4 times larger: the Lucene.Net index is 3.6 GB, while the HubbleDotNet index is only 800 MB. HubbleDotNet 1.2.5.0 also optimizes disk I/O for first-time queries, which is another main reason it outperforms Lucene.Net here.

Test Case 2:

In this test case, we set Ramindex to full in HubbleDotNet so that the whole index is loaded into memory, and Lucene.Net is also set to use an in-memory index. This test mainly measures the performance of in-memory indexes.

The test results are as follows:

Configuration	Queries per second	Maximum query time (ms)	Average query time (ms)	Minimum query time (ms)	Memory (MB)
Hubble+MongoDB	10	148	5.53	1	1,164
Hubble+SQL Server	10	157	6.17	0	1,170
Lucene.Net	10	230	3.58	0	3,611

In these results, Lucene.Net is slightly faster than HubbleDotNet on average query time, while HubbleDotNet is faster than Lucene.Net on maximum query time. My analysis of Lucene.Net's faster average is as follows. Lucene.Net runs in a single process, whereas HubbleDotNet involves three interacting processes: the test process, the Hubble service process and the MongoDB process (or the SQL Server process). With three processes interacting, every query triggers process switches, which consume a certain amount of system resources. The loss is especially noticeable when query times are as short as they are here.

Still, this query speed is sufficient in any environment. Judging from this test, at full system load HubbleDotNet can probably sustain around 200 queries per second, which is equivalent to about 17 million queries per day. That is very high for a single machine. A website with that much traffic most likely needs to consider a distributed solution anyway.
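The capacity estimate can be checked with simple arithmetic. The Python sketch below is only a back-of-the-envelope calculation. It assumes queries are served strictly one at a time at the measured average latency, which ignores concurrency and system overhead, and the function names are illustrative.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400


def sequential_ceiling_qps(avg_query_ms):
    """Queries per second if queries ran back to back at the average latency."""
    return 1000.0 / avg_query_ms


def queries_per_day(queries_per_second):
    """Total queries in one day at a steady rate."""
    return queries_per_second * SECONDS_PER_DAY


if __name__ == "__main__":
    # Average query time of Hubble+MongoDB in Test Case 2: 5.53 ms.
    print(round(sequential_ceiling_qps(5.53), 1))  # 180.8
    # At a steady 200 queries per second:
    print(queries_per_day(200))  # 17280000
```

So an average of 5.53 ms per query corresponds to roughly 180 queries per second when served back to back, which is the same order of magnitude as the estimate of 200 queries per second.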

On maximum query time, Hubble is roughly 1.5 times as fast as Lucene.Net (148 ms versus 230 ms), which basically reflects the performance of the query algorithm. This is broadly consistent with my other test results (in another article): when sorting by score, Hubble queries are about twice as fast as Lucene.Net, and when sorting by other fields, about 5 times as fast.

In terms of memory consumption, Lucene.Net uses 3.6 GB of memory and HubbleDotNet uses about 1.1 GB, mainly because HubbleDotNet's index is smaller than Lucene.Net's. HubbleDotNet's memory footprint can be optimized further. For this example it should be possible to reduce it to around 800 MB.

In terms of persistence, HubbleDotNet's in-memory index is persisted automatically: inserts, updates and deletes made while the service is running are automatically written to files and applied in memory, so no data is lost even if the machine restarts. Lucene.Net's in-memory index is not persisted automatically, so you need to write separate code to persist it.

Transferred from: http://www.cnblogs.com/eaglet/archive/2012/05/10/2494073.htm
