After a full year of development, HubbleDotNet is now ready for real use. Yesterday I released the second test version, 0.7.1.0, which is a definite improvement over the first test version. Hubble.net currently provides only basic functions. Many advanced functions, such as grouping statistics, deduplication, and multi-table join queries, will be developed in subsequent versions.
System Overview
HubbleDotNet is a free, open-source full-text search database component based on the .NET Framework. It is released under the Apache 2.0 license. HubbleDotNet provides an SQL-based full-text retrieval interface, so users who already know SQL can quickly learn to perform full-text retrieval with it. HubbleDotNet provides a series of full-text retrieval and data mining functions, including full-text indexing and query, multi-field retrieval and sorting, grouping statistics, deduplication, classification, clustering, and multi-table join queries. HubbleDotNet provides an open database adapter interface that lets it integrate with a variety of databases and add full-text retrieval and data mining functions to them. HubbleDotNet has a thorough concurrency control design: data can be added, deleted, modified, and queried concurrently from multiple threads without conflict. HubbleDotNet also implements cache and memory management to help users maximize query efficiency. HubbleDotNet strives to become the most popular full-text retrieval component in the .NET development environment over the next few years.
0.7 feature list
1. Index
2. Query
3. Delete
4. Update
5. SQL-based SQLClient interface
6. Index-level cache
7. Query-level cache
8. Data-level cache
9. Multi-field sorting
10. Combined full-text and metadata queries
11. Specify keyword weight
12. Specify field weight
13. Specify record weight
14. Automatic index optimization
15. Manual index optimization
16. Custom word segmenter
17. Custom database adapter
18. System stored procedures
19. Query analyzer
20. Create and delete tables
21. Concurrency Control
Physical View
HubbleDotNet integrates full-text search with relational databases and lets you query both full-text and relational data with SQL statements. The HubbleDotNet component builds the inverted index for full-text data and stores the index in the directory specified for the table. The data itself is stored in the relational database associated with Hubble.net. HubbleDotNet provides an IDBAdapter interface that allows you to adapt it to a custom database. How to add a custom database adapter is described in the stored procedure section. To create an inverted index, the input text must first go through word segmentation. HubbleDotNet provides an IAnalyzer interface for implementing custom word segmentation. How to add a custom word segmenter is described in the stored procedure section. After installation, HubbleDotNet runs as a system service. Hubble.net provides a Hubble.SQLClient component for interacting with the HubbleDotNet system service. The SQLClient interface is similar to the SqlClient interface in ADO.NET and is described in the SQLClient section.
Logic View
Like a relational database, HubbleDotNet has the concepts of databases and database tables. However, the databases and tables in Hubble.net are only mappings to the corresponding relational database; they are not physical entities themselves. When you use SQL statements to operate on Hubble.net databases and tables, Hubble.net automatically associates them with the corresponding entities in the relational database. From the user's perspective, Hubble.net behaves like a database entity.
HubbleDotNet is responsible for creating inverted indexes for text fields and single-value indexes for untokenized fields. The relational database is responsible for creating B+ tree indexes. If a query statement does not include a full-text field search, it is forwarded directly to the database, which answers it using its own indexes.
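To make the idea of an inverted index for text fields concrete, here is a minimal sketch in Python. It is not HubbleDotNet's implementation: the `tokenize` function, the `InvertedIndex` class, and the whitespace-based segmentation are all assumptions made for illustration only.

```python
from collections import defaultdict


def tokenize(text):
    """Naive analyzer (assumption): lowercase and split on whitespace.
    A real word segmenter would be far more sophisticated."""
    return text.lower().split()


class InvertedIndex:
    """Hypothetical in-memory inverted index: term -> set of document IDs."""

    def __init__(self):
        self.postings = defaultdict(set)

    def add(self, doc_id, text):
        # Record the document ID under every term the analyzer produces.
        for term in tokenize(text):
            self.postings[term].add(doc_id)

    def search(self, query):
        # "AND" semantics: a document must contain every query term.
        terms = tokenize(query)
        if not terms:
            return set()
        result = set(self.postings.get(terms[0], set()))
        for term in terms[1:]:
            result &= self.postings.get(term, set())
        return result


index = InvertedIndex()
index.add(1, "full text search for .NET")
index.add(2, "relational database index")
index.add(3, "full text index")

print(index.search("full text"))   # documents containing both terms
print(index.search("index"))
```

The lookup cost depends on the length of the posting lists for the query terms rather than on the total number of rows, which is why a full-text engine keeps this structure alongside the B+ tree indexes of the relational database.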
Three-Level Cache
HubbleDotNet provides a three-level caching scheme.
Index cache: The index-level cache holds inverted indexes and single-value indexes. It is managed automatically by the system and cannot be turned off. The index-level cache monitors data additions, deletions, and modifications and updates itself accordingly.
Query cache: The query-level cache caches the results of query conditions. The Hubble.net system service caches the document IDs that match each query condition, so the next time the same query runs, the matching document IDs are read directly from the cache without accessing the lower-level cache or the index. Unlike the index-level cache, the query-level cache becomes invalid when the table data changes, and the results must be cached again.
Data cache: The data-level cache runs on the client and caches the data the client has retrieved. The next time the same query runs, the data is read directly from the data cache instead of from the Hubble.net system service. Like the query-level cache, the data-level cache becomes invalid when the table data changes, and the data must be cached again.
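The invalidation behavior of the query-level cache can be sketched as follows. This is an illustrative Python model, not HubbleDotNet's code: the `QueryCache` class, the table version counter, and the `cached_search` helper are hypothetical names chosen for the example.

```python
class QueryCache:
    """Hypothetical query-level cache: query text -> matching document IDs.
    Entries are tagged with the table version at the time they were stored,
    so any change to the table makes older entries stale."""

    def __init__(self):
        self._table_version = 0
        self._entries = {}  # query -> (table version, doc IDs)

    def notify_table_changed(self):
        # Called on every insert, update, or delete against the table.
        self._table_version += 1

    def get(self, query):
        entry = self._entries.get(query)
        if entry is None:
            return None
        version, doc_ids = entry
        if version != self._table_version:
            # The table changed after this entry was cached: drop it.
            del self._entries[query]
            return None
        return doc_ids

    def put(self, query, doc_ids):
        self._entries[query] = (self._table_version, list(doc_ids))


def cached_search(cache, query, search_fn):
    """Return cached document IDs if still valid, else run the search."""
    doc_ids = cache.get(query)
    if doc_ids is None:
        doc_ids = search_fn(query)
        cache.put(query, doc_ids)
    return doc_ids


calls = []  # records how often the underlying search actually runs


def slow_search(query):
    calls.append(query)
    return [1, 3]


cache = QueryCache()
first = cached_search(cache, "full text", slow_search)
second = cached_search(cache, "full text", slow_search)  # served from cache
cache.notify_table_changed()                             # table data changed
third = cached_search(cache, "full text", slow_search)   # recomputed
```

A version counter is only one way to invalidate entries; clearing the whole cache on every change would give the same observable behavior at the cost of more bookkeeping during writes.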
Concurrency Control
HubbleDotNet has a thorough concurrency control mechanism. You can perform addition, deletion, modification, and query operations at the same time without conflict.
Memory Management
HubbleDotNet runs as a system service, so unlike Lucene it does not share memory with the application. HubbleDotNet includes a memory management mechanism that lets you set a maximum memory usage. Once HubbleDotNet exceeds that limit, it automatically starts a memory cleanup routine that removes infrequently used caches from memory to free up space. You can use the sp_configure stored procedure to view and manage the memory settings.
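The threshold-based cleanup described above can be sketched with a size-bounded cache. This is a simplified Python model under stated assumptions: it evicts the least recently used entries, which is one common policy and not necessarily the one HubbleDotNet uses, and the `BoundedCache` class and caller-supplied sizes are hypothetical.

```python
from collections import OrderedDict


class BoundedCache:
    """Hypothetical cache with a memory budget. When the budget is exceeded,
    the least recently used entries are evicted until usage fits again."""

    def __init__(self, max_bytes):
        self.max_bytes = max_bytes
        self.used_bytes = 0
        self._items = OrderedDict()  # key -> (value, size in bytes)

    def get(self, key):
        if key not in self._items:
            return None
        self._items.move_to_end(key)  # mark as most recently used
        return self._items[key][0]

    def put(self, key, value, size):
        # Replace any existing entry so its size is not counted twice.
        if key in self._items:
            self.used_bytes -= self._items.pop(key)[1]
        self._items[key] = (value, size)
        self.used_bytes += size
        self._evict()

    def _evict(self):
        # Drop entries from the least recently used end until under budget.
        while self.used_bytes > self.max_bytes and self._items:
            _, (_, size) = self._items.popitem(last=False)
            self.used_bytes -= size


cache = BoundedCache(max_bytes=100)
cache.put("a", "postings for a", 40)
cache.put("b", "postings for b", 40)
cache.get("a")                        # "a" is now more recent than "b"
cache.put("c", "postings for c", 40)  # exceeds the budget, so "b" is evicted
```

In this model the caller reports each entry's size; a real service would have to measure or estimate the memory used by its cached structures.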
Hubble.net Resources
Hubbledotnet project Homepage
Chinese discussion group
HubbleDotNet installation and demo video
Chinese User Manual
Feature Comparison of HubbleDotNet and Lucene.net
| Function | Lucene.net | Hubble.net |
| --- | --- | --- |
| Term search (TermQuery) | Supported | Supported in version 0.7 |
| "AND" search (BooleanQuery) | Supported | Supported in version 0.7 |
| Range search (RangeQuery) | Supported | Supported in version 0.7 |
| Prefix search (PrefixQuery) | Supported | To be provided in version 0.8 |
| Multi-keyword phrase search (PhraseQuery) | Supported | Supported in version 0.7 |
| Similar-word search (FuzzyQuery) | Supported | Handled through word segmentation; the EnglishAnalyzer provided in 0.7.1.0 offers similar functionality |
| Wildcard search (WildcardQuery) | Supported | To be provided in version 0.8 |
| Update data (Update) | Records must be deleted and then re-added | The UPDATE statement is called directly. If only non-full-text fields are updated, no re-indexing is needed and the update is very fast |
| Incremental indexing | Supported | Supported in version 0.7 |
| Different word segmenters for different fields | Not supported | Supported in version 0.7 |
| Grouping statistics (GROUP BY) | Not supported | To be provided in version 0.9 |
| Association with relational databases | Not supported | Version 0.7 supports single-table association; later versions will support multi-table association |
| Concurrency control | Read, write, and optimization cannot be performed at the same time | Read, write, and optimization can be performed at the same time; supported in version 0.7 |
| Memory management | Not supported | You can set a maximum memory usage threshold; when it is reached, infrequently accessed caches are cleared automatically; supported in version 0.7 |
| Re-indexing (data stays in place; only the full-text index is rebuilt) | Not supported | To be provided in version 0.8 |
| Multi-table join query | Not supported | Planned for a later version |
| Deduplication (DISTINCT) | Not supported | Planned for a later version |
| Data mining functions such as classification and clustering | Not supported | Planned for a later version |
Performance Comparison Between HubbleDotNet and Lucene.net
Indexing speed
Hubble.net is slower than Lucene at indexing. My analysis suggests that the main cause is not the indexing itself but the cost of inserting the data into the relational database. I believe that binding to a lightweight database such as SQLite would speed this up considerably. However, I have not yet developed a SQLite-based database adapter, so I have no test results for that case.
Time of first query
The chart shows the time taken by the first retrieval of a single keyword (including disk read time), with 100,000 matching rows, as the data volume ranges from 500,000 rows to 10 thousand rows.
Average query time
The chart shows the average time for retrieving a single keyword (excluding disk read time), with 100,000 matching rows, as the data volume ranges from 500,000 rows to 10 thousand rows.
Each data point is the average time over 100 executions.
Query time sorted by metadata field (untokenized index field)
The chart shows the average time for retrieving a single keyword and sorting by a metadata field (excluding disk read time), with 100,000 matching rows, as the data volume ranges from 500,000 rows to 10 thousand rows.
The metadata field used for testing is an auto-increment integer field.
Range query time
The chart compares the performance of Lucene.net and HubbleDotNet when running a range query on an auto-increment integer field.
The horizontal axis shows the query range: 0-999, 0-1999, 0-2999, and so on.