HubbleDotNet+MongoDB: Building a High-Performance Search Engine (Overview)




Starting with version 1.2.3, the official HubbleDotNet code supports integration with MongoDB. MongoDB is a NoSQL database developed by 10gen; its read and write performance is much higher than that of traditional relational databases, and it is easy to deploy in a distributed configuration. By supporting MongoDB, HubbleDotNet itself also offers a NoSQL solution. This article focuses on the features of Hubble+MongoDB and compares its performance with Hubble+SQL Server and Lucene.Net in several tests.


Installing MongoDB


Before getting to the main topic, here is a brief description of how to install MongoDB on Windows. Installation on other operating systems is similar.



First, find the version for your operating system at http://www.mongodb.org/downloads, download it and unzip it. Let's say we extract it to C:\mongodb.



We create a data directory and a log directory under C:\mongodb. C:\mongodb then contains the bin, data and log directories.


Then open a command prompt (run cmd from the Run dialog) and execute cd c:\mongodb\bin to change to the MongoDB executable directory.



Next, execute mongod --dbpath ../data --logpath ../log/mongodb.log



Once these steps are complete, the MongoDB server is running.





Default ports


To access MongoDB remotely, open MongoDB's default ports in the firewall. The default ports for the MongoDB services are as follows:


    • Standalone mongod: 27017
    • mongos: 27017
    • Shard server (mongod --shardsvr): 27018
    • Config server (mongod --configsvr): 27019
    • Web stats page for mongod: add 1000 to the port number (28017 by default)
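
After opening the ports in the firewall, it can be useful to confirm from a client machine that they are actually reachable. The following Python sketch is one way to do that with a plain TCP connection attempt; the helper name is_port_open and the DEFAULT_PORTS table are illustrative and not part of MongoDB or HubbleDotNet.

```python
import socket

# Default ports listed above (taken from the text).
DEFAULT_PORTS = {
    "mongod": 27017,
    "mongos": 27017,
    "shardsvr": 27018,
    "configsvr": 27019,
    "web_stats": 28017,  # mongod port + 1000
}


def is_port_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable hosts.
        return False
```

Calling is_port_open("127.0.0.1", DEFAULT_PORTS["mongod"]) on the server itself, and then with the server's address from a remote machine, helps separate a firewall problem from a mongod that is not running.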
Install as a service


The command line above is only suitable for debugging. For a production deployment on a server, MongoDB should be installed and run as a service.



The command line to install the MongoDB service under Windows is as follows:



C:\mongodb\bin>mongod --dbpath c:\mongodb\data --logpath c:\mongodb\log\mongodb.log --logappend --install



Two points need attention here:


    1. The paths on the command line must be absolute. If you enter relative paths, you must change the service's startup path, or the service will not run.
    2. On Windows 7 or Windows Server, you must run cmd as Administrator.
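
The absolute-path requirement in point 1 is easy to get wrong in deployment scripts. As a minimal sketch, assuming the install is driven from Python, the following hypothetical helper builds the argument list for the install command shown above and rejects relative paths before mongod is ever invoked. It uses ntpath so that Windows path rules apply on any platform.

```python
import ntpath
from typing import List


def build_install_command(dbpath: str, logpath: str) -> List[str]:
    """Build the argument list for `mongod ... --logappend --install`.

    Raises ValueError for relative paths, because a service installed with
    relative paths would not start (point 1 above).
    """
    for label, path in (("dbpath", dbpath), ("logpath", logpath)):
        # ntpath applies Windows path rules regardless of the host OS.
        if not ntpath.isabs(path):
            raise ValueError(f"{label} must be an absolute path: {path!r}")
    return ["mongod", "--dbpath", dbpath, "--logpath", logpath,
            "--logappend", "--install"]
```

The returned list could be passed to subprocess.run from an Administrator command prompt; the helper itself only validates and assembles the arguments.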
Uninstall Service


If MongoDB is no longer needed, the service can be uninstalled with the following command line:



C:\mongodb\bin>mongod --remove





Connection strings for the MongoDB database adapter in Hubble


HubbleDotNet does not use MongoDB's own connection string format to connect to MongoDB; it uses a standard connection string instead.


Connecting to MongoDB without a user name and password


This is the connection string used by the MongoDB database adapter in HubbleDotNet when no user name and password are set on MongoDB:



Data Source=127.0.0.1;Initial Catalog=news;Integrated Security=true



We only need to specify the server IP address (Data Source) and the database name (Initial Catalog).


Connecting to MongoDB with a user name and password


This is the connection string used by the MongoDB database adapter in HubbleDotNet when a user name and password are set on MongoDB:



Data Source=127.0.0.1;Initial Catalog=news;User Id=myusername;Password=mypassword;
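
The two forms differ only in their last part, so they can be generated from the same inputs. The Python sketch below is purely illustrative: build_connection_string is a hypothetical helper, not a HubbleDotNet API, and it simply reproduces the two strings shown above.

```python
from typing import Optional


def build_connection_string(host: str, database: str,
                            user: Optional[str] = None,
                            password: Optional[str] = None) -> str:
    """Return a connection string in one of the two forms shown above."""
    base = f"Data Source={host};Initial Catalog={database};"
    if user is None:
        # No credentials set on MongoDB: use the integrated-security form.
        return base + "Integrated Security=true"
    return base + f"User Id={user};Password={password};"
```

For example, build_connection_string("127.0.0.1", "news") yields the first string, and passing user and password yields the second.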





HubbleDotNet+MongoDB features at a glance


  • Supports standard data types such as int, string, double, datetime, etc.
  • Supports full-text indexing and querying of specified string fields in MongoDB. MongoDB itself does not support full-text queries; HubbleDotNet can be configured to provide full-text querying over MongoDB, with the same syntax as for other database types. This is the core function of Hubble+MongoDB.
  • Supports using a relational database such as SQL Server as the primary store, with MongoDB as a mirror table. This is the recommended approach, because it enables read/write separation and distributed deployment.
  • Supports active-mode indexing with MongoDB as the primary store, which is useful for real-time indexing.
  • Supports passive-mode indexing with MongoDB as the primary store. In this mode Hubble cannot currently synchronize the index automatically; you have to write code to synchronize it manually, because MongoDB does not support triggers.
  • Supports distributed deployment of data through MongoDB.
  • Supports non-full-text querying of MongoDB with SQL statements, for example select top <n> * from table where price > <low> and price < <high> order by price, with concrete values in place of the placeholders. This is a unique feature of HubbleDotNet. HubbleDotNet implements a SQL-to-BSON translation that lets callers access MongoDB through standard SQL statements, as they would SQL Server, which is a great convenience for developers who do not like BSON queries.
  • Supports accessing or configuring MongoDB through BSON statements. HubbleDotNet provides two stored procedures, sp_excutesql and sp_querysql, that let users operate on MongoDB directly with BSON via Hubble.
  • Supports incomplete documents. MongoDB is a document database; unlike a relational database, it does not require every record to have the same fixed fields, so each record may have different fields. HubbleDotNet supports this design: a field that does not appear in a record is treated as null, or as the default value if one is specified.
  • Sub-fields. MongoDB is a document database that supports sub-fields. Hubble will support sub-fields in a later release.
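
Two of the features above are easier to picture with a small example. The Python sketch below is not HubbleDotNet's implementation; it only illustrates, under assumed names and values, what a SQL range query corresponds to in MongoDB's query-document form (using the standard $gt and $lt operators), and how missing fields in an incomplete document can be filled with null or a default.

```python
from typing import Any, Dict


def range_query(field: str, low: Any, high: Any, limit: int) -> Dict[str, Any]:
    """Express `select top <limit> * ... where field > low and field < high
    order by field` as the filter, sort and limit of a MongoDB find."""
    return {
        "filter": {field: {"$gt": low, "$lt": high}},
        "sort": [(field, 1)],  # 1 means ascending order
        "limit": limit,
    }


def fill_missing(doc: Dict[str, Any], defaults: Dict[str, Any]) -> Dict[str, Any]:
    """Return a record containing every declared field.

    `defaults` maps each declared field to its default value, or to None
    when no default is specified, so absent fields come back as null.
    """
    return {name: doc.get(name, default) for name, default in defaults.items()}


if __name__ == "__main__":
    # The numbers here are made up for illustration only.
    print(range_query("price", 10, 100, 5))
    print(fill_missing({"title": "a"}, {"title": None, "price": 0.0, "tag": None}))
```

With a MongoDB driver, the three parts returned by range_query would be passed to the collection's find call; how Hubble builds its BSON internally is not described in this article.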
Performance test


Test environment:


Software versions


  • HubbleDotNet 1.2.5.0
  • MongoDB 2.0.5
  • SQL Server 2008
  • Lucene.Net 2.9.4


System environment


  • Intel i5-2430M 2.40 GHz, 8 GB RAM, Windows 7 64-bit
  • 7800 RPM mechanical hard disk


Test data


The test data consists of 20 million rows of Internet web page data. The data file is 4 GB.


Test goal:


Compare the performance of Hubble+SQL Server, Hubble+MongoDB and Lucene.Net under high concurrency on a single machine.


Test method:


The test code issues 10 queries per second, searching for 840 commonly used English words in turn. Each query returns the title and content of the top 10 results, sorted by relevance score.
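
The original test code is not included in this article. As a rough sketch of the method described, assuming a caller-supplied query function, the following Python harness issues queries at a fixed rate and reports the same statistics as the result tables. Note one simplification: it sends queries one after another, so a query that takes longer than the interval delays the next one, whereas the real test may have let slow queries overlap.

```python
import time
from typing import Callable, Dict, Iterable


def run_paced_queries(query: Callable[[str], object],
                      words: Iterable[str],
                      rate_per_second: float = 10.0) -> Dict[str, float]:
    """Issue one query per word at a fixed rate and report latency in ms."""
    interval = 1.0 / rate_per_second
    latencies = []
    next_start = time.perf_counter()
    for word in words:
        # Wait for the next slot so the request rate stays constant.
        delay = next_start - time.perf_counter()
        if delay > 0:
            time.sleep(delay)
        started = time.perf_counter()
        query(word)  # hypothetical search call supplied by the caller
        latencies.append((time.perf_counter() - started) * 1000.0)
        next_start += interval
    if not latencies:
        raise ValueError("no words to query")
    return {
        "count": len(latencies),
        "max_ms": max(latencies),
        "avg_ms": sum(latencies) / len(latencies),
        "min_ms": min(latencies),
    }
```

Here query would wrap a call to Hubble or Lucene.Net that returns the top 10 titles and contents for one word, and words would be the list of 840 English words.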


Test Case 1:


In this test case, we disable all of HubbleDotNet's caches so that HubbleDotNet reads the index from the hard disk every time, and Lucene.Net is likewise set to read from files. Each of the 840 English words is queried for the first time, and the computer is restarted before each test to clear the operating system's file cache.



The test results are as follows:


System               Queries per second   Max query time (ms)   Avg query time (ms)   Min query time (ms)
Hubble+MongoDB       10                   1573                  431                   3
Hubble+SQL Server    10                   8997                  931                   4
Lucene.Net           10                   209196                108665                9430

In this test, Hubble+MongoDB performs best, while Lucene.Net is on the order of 200 times slower than Hubble+MongoDB. Lucene.Net is so much slower than HubbleDotNet because its disk I/O is slower and its index is about 4 times larger: the Lucene.Net index is 3.6 GB, whereas the HubbleDotNet index is only 800 MB. HubbleDotNet 1.2.5.0 also optimizes disk I/O for first-time queries, which is another major reason why it outperforms Lucene.Net.
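
The slowdown factors can be recomputed directly from the Test Case 1 results. The short Python sketch below only restates the published numbers; the dictionary and function names are illustrative.

```python
from typing import Dict

# Query times in milliseconds, copied from the Test Case 1 results.
AVG_MS = {"Hubble+MongoDB": 431, "Hubble+SQL Server": 931, "Lucene.Net": 108665}
MAX_MS = {"Hubble+MongoDB": 1573, "Hubble+SQL Server": 8997, "Lucene.Net": 209196}


def slowdown(times: Dict[str, float],
             baseline: str = "Hubble+MongoDB") -> Dict[str, float]:
    """Return each system's time divided by the baseline system's time."""
    return {name: value / times[baseline] for name, value in times.items()}


if __name__ == "__main__":
    for label, table in (("average", AVG_MS), ("maximum", MAX_MS)):
        for name, factor in slowdown(table).items():
            print(f"{label:8s} {name:18s} {factor:8.1f}x")
```

By average query time Lucene.Net comes out about 252 times slower than Hubble+MongoDB, and by maximum query time about 133 times slower, which brackets the rough figure quoted above.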


Test Case 2:


In this test case, we set Ramindex to full in HubbleDotNet so that the whole index is loaded into memory, and Lucene.Net is likewise configured to use an in-memory index. This test mainly measures the performance of in-memory indexes.



The test results are as follows:


System               Queries per second   Max query time (ms)   Avg query time (ms)   Min query time (ms)   Memory (MB)
Hubble+MongoDB       10                   148                   5.53                  1                     1,164
Hubble+SQL Server    10                   157                   6.17                  0                     1,170
Lucene.Net           10                   230                   3.58                  0                     3,611

These results show that Lucene.Net's average query time is slightly better than HubbleDotNet's, while HubbleDotNet's maximum query time is better than Lucene.Net's. My analysis of why Lucene.Net has the better average is as follows: Lucene.Net runs in a single process, whereas the HubbleDotNet test involves three interacting processes, namely the test process, the Hubble service process and the MongoDB (or SQL Server) process. With three processes, every query triggers process switches, which consume system resources, and the loss is especially noticeable when individual queries take so little time.



Even so, this query speed is sufficient in any environment. Judging from this test, HubbleDotNet could probably sustain about 200 queries per second at full system load, which is on the order of 16 to 17 million queries a day. That is very high for a single machine; a website with that much traffic most likely needs to consider a distributed solution anyway.
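
The capacity estimate can be sanity-checked with simple arithmetic. The Python sketch below makes one assumption that is mine rather than the author's: that a single stream of back-to-back queries, each taking the measured average time, gives a rough bound on throughput.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def sequential_qps(avg_query_ms: float) -> float:
    """Queries per second if queries run back to back at the average latency."""
    return 1000.0 / avg_query_ms


def queries_per_day(qps: float) -> float:
    """Total queries in 24 hours at a constant rate."""
    return qps * SECONDS_PER_DAY


if __name__ == "__main__":
    qps = sequential_qps(5.53)  # Hubble+MongoDB average from Test Case 2
    print(f"{qps:.0f} queries/s, {queries_per_day(qps):,.0f} queries/day")
    print(f"200 queries/s, {queries_per_day(200):,.0f} queries/day")
```

At the measured 5.53 ms average this gives roughly 181 queries per second, or about 15.6 million a day, while a flat 200 queries per second corresponds to 17.28 million a day.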



Hubble's maximum query time is considerably shorter than Lucene.Net's (148 ms versus 230 ms), which largely reflects the performance of the query algorithm. This is broadly consistent with my other test results (in another article): when sorting by score, Hubble queries are about twice as fast as Lucene.Net, and when sorting by other fields, about 5 times as fast.



In terms of memory consumption, Lucene.Net uses 3.6 GB of memory and HubbleDotNet uses about 1.1 GB, mainly because the HubbleDotNet index is smaller than Lucene.Net's. HubbleDotNet's memory footprint can be optimized further; for this example it should come down to around 800 MB.



As for persistence, HubbleDotNet's in-memory index is persisted automatically: additions, deletions and updates made at run time are written to files on disk and applied to memory, so no data is lost even if the machine restarts. Lucene.Net's in-memory index is not persisted automatically; persistence requires writing separate code.
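
To make the idea of an automatically persisted in-memory index concrete, here is a minimal Python sketch of one common way to get that behavior: every change is appended to a log file before it is applied in memory, and the log is replayed on startup. This is an illustration under my own assumptions, not a description of how HubbleDotNet stores its index; the class name and file format are invented for the example.

```python
import json
import os
from typing import Dict, List


class PersistentMemoryIndex:
    """In-memory word -> document ids map backed by an append-only log file."""

    def __init__(self, log_path: str) -> None:
        self.log_path = log_path
        self.postings: Dict[str, List[int]] = {}
        # Replay earlier changes so the index survives a restart.
        if os.path.exists(log_path):
            with open(log_path, "r", encoding="utf-8") as log:
                for line in log:
                    self._apply(json.loads(line))

    def _apply(self, op: dict) -> None:
        ids = self.postings.setdefault(op["word"], [])
        if op["action"] == "add" and op["doc_id"] not in ids:
            ids.append(op["doc_id"])
        elif op["action"] == "delete" and op["doc_id"] in ids:
            ids.remove(op["doc_id"])

    def _record(self, op: dict) -> None:
        # Write to disk first, then update memory.
        with open(self.log_path, "a", encoding="utf-8") as log:
            log.write(json.dumps(op) + "\n")
            log.flush()
            os.fsync(log.fileno())
        self._apply(op)

    def add(self, word: str, doc_id: int) -> None:
        self._record({"action": "add", "word": word, "doc_id": doc_id})

    def delete(self, word: str, doc_id: int) -> None:
        self._record({"action": "delete", "word": word, "doc_id": doc_id})

    def search(self, word: str) -> List[int]:
        return list(self.postings.get(word, []))
```

Creating a second PersistentMemoryIndex on the same log path after a restart returns the same search results, which is the property the paragraph above describes.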



Reposted from: http://www.cnblogs.com/eaglet/archive/2012/05/10/2494073.htm

