Introduction to HubbleDotNet and comparison with Lucene.net

Source: Internet
Author: User

After a full year of development, HubbleDotNet is ready for real use. Yesterday I released the second test version, 0.7.1.0, which is a clear improvement over the first test version. HubbleDotNet currently provides only basic functions. Many advanced functions, such as grouping statistics, deduplication, and multi-table join queries, will be developed in subsequent versions.

 

System Overview

HubbleDotNet is a free, open-source full-text search database component based on the .NET Framework, released under the Apache 2.0 license. It provides an SQL-based full-text retrieval interface, so users can quickly learn to perform full-text retrieval through SQL operations. HubbleDotNet aims to provide a series of full-text retrieval and data mining functions, including full-text indexing and query, multi-field retrieval and sorting, grouping statistics, deduplication, classification, clustering, and multi-table join queries. It provides an open database adapter interface so that it can be integrated with a variety of databases, adding full-text retrieval and data mining functions to those database systems. HubbleDotNet has built-in concurrency control: program data can be added, deleted, modified, and queried concurrently from multiple threads without conflict. It also implements cache and memory management to help users maximize query efficiency. HubbleDotNet strives to become the most popular full-text retrieval component in the .NET development environment over the next few years.

0.7 feature list

1. Index

2. Query

3. Delete

4. Update

5. SQL-based SQLClient interface

6. Index-level cache

7. Query-level cache

8. Data-level cache

9. Multi-field sorting

10. Full-text and metadata combination query

11. Specify the keyword weight

12. Specify the field weight

13. Specify record weight

14. Automatic index optimization

15. Manual index optimization

16. Custom word segmenter (analyzer)

17. Custom database adapter

18. System stored procedures

19. Query analyzer

20. Create and delete tables

21. Concurrency Control

Physical View

HubbleDotNet integrates full-text search with relational databases and queries both full-text and relational data through SQL statements. The HubbleDotNet component builds the inverted index for full-text data and stores that index in the directory specified for the table, while the data itself is stored in the relational database associated with HubbleDotNet. HubbleDotNet provides an IDBAdapter interface that lets you adapt it to a custom database; how to add a custom database adapter is described in the stored procedure section. Building an inverted index requires word segmentation of the input text, so HubbleDotNet also provides an IAnalyzer interface for custom word segmentation; how to add a custom word segmenter is likewise described in the stored procedure section. After installation, HubbleDotNet runs as a system service, and the Hubble.SQLClient component is used to interact with that service. The SQLClient interface is similar to the SqlClient interface in ADO.NET and is described in the SQLClient section.
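To make the relationship between word segmentation and the inverted index concrete, here is a minimal Python sketch. It is not HubbleDotNet code: the `Analyzer` type, `simple_analyzer`, and `build_inverted_index` are hypothetical names chosen for illustration, and the whitespace splitter stands in for a real word segmenter.

```python
from collections import defaultdict
from typing import Callable, Dict, List, Set

# Hypothetical analyzer type: any callable that turns text into a list of terms.
Analyzer = Callable[[str], List[str]]


def simple_analyzer(text: str) -> List[str]:
    """Lowercase the text and split it on whitespace (illustrative only)."""
    return text.lower().split()


def build_inverted_index(docs: Dict[int, str], analyzer: Analyzer) -> Dict[str, Set[int]]:
    """Map each term to the set of document ids whose text contains it."""
    index: Dict[str, Set[int]] = defaultdict(set)
    for doc_id, text in docs.items():
        for term in analyzer(text):
            index[term].add(doc_id)
    return index


docs = {1: "Full text search", 2: "Relational database search"}
index = build_inverted_index(docs, simple_analyzer)
print(sorted(index["search"]))    # [1, 2]
print(sorted(index["database"]))  # [2]
```

Swapping in a different analyzer function changes how text is split into terms without touching the indexing loop, which is the role a pluggable segmenter interface plays.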

Logical View

Like a relational database, HubbleDotNet has the concepts of databases and tables. However, its databases and tables only provide a mapping to the corresponding relational database and hold no data entities of their own. When you use SQL statements to operate on HubbleDotNet databases and tables, HubbleDotNet automatically associates them with the corresponding entities in the relational database. From the user's perspective, HubbleDotNet behaves like a database in its own right.

HubbleDotNet is responsible for creating inverted indexes for text fields and single-value indexes for untokenized fields, while the relational database is responsible for creating B+ tree indexes. If a query does not search any full-text field, it is forwarded directly to the database, which answers it using its own indexes.
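The forwarding rule can be pictured with a small Python sketch. This is an assumption-laden illustration, not the actual HubbleDotNet logic: the function `route_query`, the dictionary representation of query conditions, and the returned labels are all hypothetical.

```python
from typing import Dict, Set


def route_query(conditions: Dict[str, str], fulltext_fields: Set[str]) -> str:
    """Decide which engine evaluates a query (hypothetical rule).

    conditions maps a field name to the value being searched for.
    """
    # Any condition on a full-text field means the inverted index is needed.
    if any(field in fulltext_fields for field in conditions):
        return "fulltext-index"
    # Otherwise the query can go straight to the relational database.
    return "relational-database"


fulltext_fields = {"title", "content"}
print(route_query({"title": "hubble"}, fulltext_fields))  # fulltext-index
print(route_query({"id": "42"}, fulltext_fields))         # relational-database
```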

Three-Level Cache

HubbleDotNet provides three levels of caching.

Index cache: The index-level cache holds inverted indexes and single-value indexes. It is managed automatically by the system and cannot be disabled. The index-level cache monitors data additions, deletions, and modifications and updates itself accordingly.

Query cache: The query-level cache stores the results of query conditions. The HubbleDotNet system service caches the document IDs that match each query condition, so the next time the same condition is queried the matching document IDs come straight from the cache, without accessing the lower-level cache or the index. Unlike the index-level cache, the query-level cache becomes invalid when the table data changes and has to be filled again.

Data cache: The data-level cache runs on the client and stores the data the client has retrieved. On the next query the data is read directly from the data cache instead of from the HubbleDotNet system service. Like the query-level cache, the data-level cache becomes invalid when table data changes and has to be filled again.
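The invalidation behaviour described for the query-level and data-level caches can be sketched as follows. This is a minimal illustration under assumptions of my own: the class `QueryCache`, its version counter, and the `slow_search` stand-in are hypothetical and do not describe how HubbleDotNet implements its caches.

```python
from typing import Callable, Dict, List, Tuple


class QueryCache:
    """Caches document ids per query condition; entries expire when the table changes."""

    def __init__(self, search: Callable[[str], List[int]]) -> None:
        self._search = search        # the underlying (slower) index lookup
        self._table_version = 0      # bumped on every data change
        self._entries: Dict[str, Tuple[int, List[int]]] = {}

    def notify_table_changed(self) -> None:
        """Mark all cached results as stale by advancing the table version."""
        self._table_version += 1

    def query(self, condition: str) -> List[int]:
        cached = self._entries.get(condition)
        if cached is not None and cached[0] == self._table_version:
            return cached[1]         # still valid: skip the index entirely
        doc_ids = self._search(condition)
        self._entries[condition] = (self._table_version, doc_ids)
        return doc_ids


calls: List[str] = []


def slow_search(condition: str) -> List[int]:
    calls.append(condition)          # record each real index lookup
    return [1, 2, 3]


cache = QueryCache(slow_search)
cache.query("title match 'hubble'")
cache.query("title match 'hubble'")  # served from the cache
cache.notify_table_changed()
cache.query("title match 'hubble'")  # recomputed after the data change
print(len(calls))                    # 2
```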

Concurrency Control

HubbleDotNet includes a concurrency control mechanism that allows addition, deletion, modification, and query operations to run at the same time without conflict.

Memory Management

Unlike Lucene, HubbleDotNet runs as a system service and does not share memory with the application. It has its own memory management mechanism, and you can set the maximum amount of memory it may use. Once HubbleDotNet exceeds that limit, it automatically starts a memory cleanup routine that removes infrequently used caches from memory to free up space. You can use the sp_configure stored procedure to view and manage the memory settings.
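A memory limit with automatic eviction can be sketched in Python as below. The source only says that infrequently used caches are removed, so the least-recently-used policy here is an assumption, and the class `BoundedCache` and its byte accounting are hypothetical rather than HubbleDotNet's actual mechanism.

```python
from collections import OrderedDict
from typing import Optional


class BoundedCache:
    """Keeps cached values under a byte limit by evicting least recently used entries."""

    def __init__(self, max_bytes: int) -> None:
        self.max_bytes = max_bytes
        self.used_bytes = 0
        self._items: "OrderedDict[str, bytes]" = OrderedDict()

    def get(self, key: str) -> Optional[bytes]:
        value = self._items.get(key)
        if value is not None:
            self._items.move_to_end(key)   # mark as recently used
        return value

    def put(self, key: str, value: bytes) -> None:
        if key in self._items:             # replacing: release the old size first
            self.used_bytes -= len(self._items.pop(key))
        self._items[key] = value
        self.used_bytes += len(value)
        # Evict from the least recently used end until back under the limit.
        while self.used_bytes > self.max_bytes and self._items:
            _, evicted = self._items.popitem(last=False)
            self.used_bytes -= len(evicted)


cache = BoundedCache(max_bytes=10)
cache.put("a", b"12345")
cache.put("b", b"12345")
cache.get("a")                 # "a" is now more recently used than "b"
cache.put("c", b"12345")       # exceeds the limit, so "b" is evicted
print(cache.get("b"))          # None
print(cache.used_bytes)        # 10
```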

Hubble.net Resources

HubbleDotNet project homepage

Chinese discussion group

HubbleDotNet installation and demo video

Chinese user manual

Comparison of HubbleDotNet and Lucene.net Functions

| Function | Lucene.net | HubbleDotNet |
|---|---|---|
| Search by term (TermQuery) | Supported | Supported in version 0.7 |
| "And" search (BooleanQuery) | Supported | Supported in version 0.7 |
| Search within a range (RangeQuery) | Supported | Supported in version 0.7 |
| Search by prefix (PrefixQuery) | Supported | Planned for version 0.8 |
| Search for multiple keywords (PhraseQuery) | Supported | Supported in version 0.7 |
| Search for similar words (FuzzyQuery) | Supported | Handled through word segmentation; the EnglishAnalyzer provided in 0.7.1.0 offers similar functionality |
| Search by wildcard (WildcardQuery) | Supported | Planned for version 0.8 |
| Update data (Update) | Records must be deleted and then added again | The update statement is called directly; if only non-full-text fields are updated, no re-indexing is needed and the update is very fast |
| Incremental index | Supported | Supported in version 0.7 |
| Different word segmenters for different fields | Not supported | Supported in version 0.7 |
| Group statistics (Group) | Not supported | Planned for version 0.9 |
| Association with relational databases | Not supported | Version 0.7 supports single-table association; later versions will support multi-table association |
| Concurrency control | Read, write, and optimization cannot be performed at the same time | Read, write, and optimization can be performed at the same time; supported in version 0.7 |
| Memory management | Not supported | You can set a maximum memory usage threshold; when it is reached, infrequently accessed caches are cleared automatically; supported in version 0.7 |
| Re-indexing (data stays in place, only the full-text index is rebuilt) | Not supported | Planned for version 0.8 |
| Multi-table join query | Not supported | Planned for a later version |
| Deduplication (Distinct) | Not supported | Planned for a later version |
| Data mining functions such as classification and clustering | Not supported | Planned for a later version |

 

Performance Comparison Between HubbleDotNet and Lucene.net

 

Indexing speed

 

HubbleDotNet is slower than Lucene in terms of indexing speed. My analysis suggests that the indexing itself is not the slow part; the main cost is that HubbleDotNet also has to insert the data into the relational database. I believe that binding to a lightweight database such as SQLite would speed this up considerably. However, I have not yet developed an SQLite-based database adapter, so I have no test results for that case.

 

Time of first query

The chart for this test plots the time taken to retrieve a single keyword for the first time, including disk read time. The row counts cited for the test are 10 thousand, 0.1 million, and 0.5 million rows.

Average query time

The chart for this test plots the average time taken to retrieve a single keyword, excluding disk read time. The row counts cited for the test are 10 thousand, 0.1 million, and 0.5 million rows.

The test data is the average time of 100 executions.
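For readers who want to reproduce this kind of measurement, the following Python sketch separates the first (cold) query from the average over repeated runs. It is an assumption about how such a test could be structured, not the author's actual test harness; the function `measure` and the stand-in `fake_query` are hypothetical.

```python
import time
from typing import Callable, List, Tuple


def measure(query: Callable[[], object], runs: int = 100) -> Tuple[float, float]:
    """Return (first_run_seconds, average_seconds_over_runs)."""
    # The first run is timed on its own because it may include disk reads.
    start = time.perf_counter()
    query()
    first = time.perf_counter() - start

    # Later runs are averaged; they are expected to hit warm caches.
    start = time.perf_counter()
    for _ in range(runs):
        query()
    average = (time.perf_counter() - start) / runs
    return first, average


executed: List[int] = []


def fake_query() -> None:
    executed.append(1)   # stand-in for sending one search to the engine


first, average = measure(fake_query, runs=100)
print(len(executed))     # 101
```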

 

Query time sorted by metadata field (untokenized index field)

The chart for this test plots the average time taken to retrieve a single keyword and sort the results by a metadata field, excluding disk read time. The row counts cited for the test are 10 thousand, 0.1 million, and 0.5 million rows.

The metadata field used for testing is an auto-increment integer field.

 

Query time of Range Query

The chart for this test compares the performance of Lucene.net and HubbleDotNet for range queries on an auto-increment integer field.

The horizontal axis represents query ranges of 0-999, 0-1999, 0-2999, and so on.
