Database index: Bitmap index

Source: Internet
Author: User

While studying Hive, I came across bitmap indexes and looked up some material. The notes below organize what I found into a basic introduction.

One: Concept introduction:

Baidu Encyclopedia:

Bitmap indexes are primarily created for columns with many repeated values (for example: category, operator, department ID, storeroom ID). An index row in an index block stores the key value, the start and end ROWIDs, and the position encoding for that key value. Each bit of the position encoding represents a data row that corresponds to the key value. A single index block may therefore point to dozens or even hundreds of rows of data.

Of course, you know how Baidu Encyclopedia can be ... so let's take a look at Wikipedia as well.

Wikipedia:

Bitmap indexes have traditionally been considered to work well for low-cardinality columns, which have a modest number of distinct values, either absolutely, or relative to the number of records that contain the data. The extreme case of low cardinality is Boolean data (e.g., does a resident in a city have internet access?), which has two values, True and False. Bitmap indexes use bit arrays (commonly called bitmaps) and answer queries by performing bitwise logical operations on these bitmaps. Bitmap indexes have a significant space and performance advantage over other structures for query of such data. Their drawback is that they are less efficient than the traditional B-tree indexes for columns whose data is frequently updated: consequently, they are more often employed in read-only systems that are specialized for fast query, e.g., data warehouses, and are generally unsuitable for online transaction processing applications.

Some researchers argue that bitmap indexes are also useful for moderate or even high-cardinality data (e.g., unique-valued data) which is accessed in a read-only manner, and where queries access multiple bitmap-indexed columns using the AND, OR or XOR operators extensively.

OK, that should be straightforward to follow.

Two: Example:

If the introduction was hard to follow, let's look at an example:

1. Requirements Description:

There is a table named table, composed of three columns: name, gender, and marital status. Gender has only two values, male and female; marital status has three values: married, unmarried, and divorced. The table has 1,000,000 records in total. Now there is a query: SELECT * FROM table WHERE gender = 'male' AND marital = 'unmarried';

Name (name)   Gender (gender)   Marital status (marital)
Zhang San     male              married
Li Si         female            married
Wang Wu       male              unmarried
Zhao Liu      female            divorced
Sun Qi        female            unmarried
...           ...               ...

1) Without an index

Without an index, the database can only scan every record, one row at a time, and check whether each one satisfies the query conditions.
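As a rough illustration of what a full scan does, here is a minimal Python sketch. It assumes a tiny in-memory list standing in for the table (only the gender and marital columns, in the same row order as the example above); the function name full_scan is hypothetical and not part of any database API.

```python
# Hypothetical in-memory stand-in for the example table: (gender, marital) per row.
rows = [
    ("male", "married"),
    ("female", "married"),
    ("male", "unmarried"),
    ("female", "divorced"),
    ("female", "unmarried"),
]

def full_scan(rows, gender, marital):
    """Check every row against both predicates; the work grows with the table size."""
    matches = []
    for row_id, (g, m) in enumerate(rows, start=1):
        if g == gender and m == marital:
            matches.append(row_id)
    return matches

print(full_scan(rows, "male", "unmarried"))  # [3]
```

With 1,000,000 records the loop body would run 1,000,000 times, whatever the number of matches.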

2) B-Tree Index

For gender, the only possible values are 'male' and 'female', and each may account for about 50% of the table's data. Even with a B-tree index, the query would still have to fetch about half of the table, so the index brings no benefit. Conversely, if a field has a wide range of values with little repetition, such as an ID card number, a B-tree index is the more appropriate choice. In fact, when the rows to be fetched make up most of the table, databases such as Oracle and MySQL will generally not use the B-tree index even if one exists, and will most likely perform a full table scan instead.
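The argument above comes down to selectivity: the fraction of rows a predicate matches. The sketch below computes it for a 50/50 gender column and for a unique-valued column. The cutoff INDEX_THRESHOLD and the helper prefer_index are hypothetical and only illustrate the idea; real optimizers such as those in Oracle or MySQL use cost models rather than a fixed cutoff.

```python
def selectivity(column_values, wanted):
    """Fraction of rows whose value equals `wanted`."""
    return sum(1 for v in column_values if v == wanted) / len(column_values)

# Hypothetical cutoff, for illustration only; not taken from any real optimizer.
INDEX_THRESHOLD = 0.2

def prefer_index(column_values, wanted, threshold=INDEX_THRESHOLD):
    """Toy rule: a B-tree lookup only pays off when few rows match."""
    return selectivity(column_values, wanted) < threshold

gender = ["male", "female"] * 500                  # 1,000 rows, 50% each
id_numbers = [f"ID{n:06d}" for n in range(1000)]   # every value is unique

print(selectivity(gender, "male"))           # 0.5
print(prefer_index(gender, "male"))          # False
print(prefer_index(id_numbers, "ID000042"))  # True
```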

2. Bitmap indexing

If the columns a user queries have very low cardinality, that is, only a few fixed values (such as gender, marital status, or administrative district), a bitmap index is the kind of index to build on those columns.

For the gender column, the bitmap index forms two vectors. The male vector is 10100 ..., where each bit indicates whether the corresponding row is male: 1 if it is, 0 if it is not. In the same way, the female vector is 01011 ...

RowId    1  2  3  4  5
Male     1  0  1  0  0
Female   0  1  0  1  1
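The construction of these vectors can be sketched in a few lines of Python. This is only an illustration under simple assumptions: the column is a plain list in row order, bitmaps are kept as uncompressed strings of 0 and 1, and the function name build_bitmaps is hypothetical.

```python
def build_bitmaps(column_values):
    """Return {value: bit string}, with one bit per row in row order."""
    bitmaps = {}
    # dict.fromkeys keeps each distinct value once, in first-seen order.
    for value in dict.fromkeys(column_values):
        bitmaps[value] = "".join("1" if v == value else "0" for v in column_values)
    return bitmaps

gender_column = ["male", "female", "male", "female", "female"]
gender_bitmaps = build_bitmaps(gender_column)
print(gender_bitmaps)  # {'male': '10100', 'female': '01011'}
```

Each distinct value gets its own vector, which is why the number of distinct values matters so much for this kind of index.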

For the marital status column, the bitmap index generates three vectors: married is 11000 ..., unmarried is 00101 ..., and divorced is 00010 ...

RowId       1  2  3  4  5
Married     1  1  0  0  0
Unmarried   0  0  1  0  1
Divorced    0  0  0  1  0

When we run the query SELECT * FROM table WHERE gender = 'male' AND marital = 'unmarried'; the database first takes the male vector 10100 ..., then takes the unmarried vector 00101 ..., and performs an AND operation on the two. This produces a new vector 00100 ..., in which the third bit is 1, so the third row of the table is the result we are looking for.

RowId           1  2  3  4  5
Male            1  0  1  0  0
AND Unmarried   0  0  1  0  1
Result          0  0  1  0  0
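The AND step in the table above can be reproduced with Python integers, which support bitwise operators directly. This is a minimal sketch: the bit strings are the five-row vectors from the example, and the helper name matching_row_ids is hypothetical.

```python
def matching_row_ids(bits_a, bits_b):
    """AND two equal-length bit strings; return the result and the 1-based set positions."""
    if len(bits_a) != len(bits_b):
        raise ValueError("bitmaps must cover the same number of rows")
    width = len(bits_a)
    combined = int(bits_a, 2) & int(bits_b, 2)     # bitwise AND on integers
    result_bits = format(combined, f"0{width}b")   # pad back to the row count
    row_ids = [i for i, bit in enumerate(result_bits, start=1) if bit == "1"]
    return result_bits, row_ids

male = "10100"
unmarried = "00101"
result_bits, row_ids = matching_row_ids(male, unmarried)
print(result_bits)  # 00100
print(row_ids)      # [3]
```

Swapping & for | or ^ gives the OR and XOR combinations in the same way.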

3. Applicable conditions for bitmap indexing

As stated above, bitmap indexes suit columns with only a few fixed values, such as gender, marital status, or administrative district. A column such as ID card number, where almost every value is distinct, is not suitable for a bitmap index.
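A quick back-of-the-envelope calculation shows why cardinality matters. The sketch below assumes the simplest possible layout: one uncompressed bit per row for each distinct value. Real databases usually compress their bitmaps, so actual sizes will be smaller; the function name is hypothetical.

```python
def uncompressed_bitmap_bytes(row_count, distinct_values):
    """One bit per row for each distinct value, rounded up to whole bytes per bitmap."""
    bytes_per_bitmap = (row_count + 7) // 8
    return distinct_values * bytes_per_bitmap

ROWS = 1_000_000  # the table size used in the example

print(uncompressed_bitmap_bytes(ROWS, 2))     # gender: 250000 bytes
print(uncompressed_bitmap_bytes(ROWS, 3))     # marital status: 375000 bytes
print(uncompressed_bitmap_bytes(ROWS, ROWS))  # unique values: 125000000000 bytes
```

Two or three vectors cost a few hundred kilobytes, while one vector per unique value would need on the order of 125 GB under this naive layout.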

In addition, bitmap indexes suit static data, not indexed columns that are updated frequently.

For example, suppose there is a field busy that records whether each machine is busy: busy is 1 when the machine is busy and 0 when it is not. Someone might suggest a bitmap index here, because busy has only two values. Fine, let's build a bitmap index on the busy field! Suppose user A updates the busy value of one machine, for example with UPDATE table SET table.busy = 1 WHERE rowid = 100; but does not commit. User B then updates the busy value of another machine with UPDATE table SET table.busy = 1 WHERE rowid = 12; At this point user B's update cannot proceed and has to wait for user A to commit. The cause: when user A sets a machine's busy value to 1, the bitmap vector for busy = 1 changes, so the database locks all the rows covered by that vector (all the busy = 1 rows) and releases the lock only after the commit.
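The blocking behaviour described above can be imitated with a toy model. The sketch below is not how any particular database implements locking; it simply assumes that an update touches the bitmap of the old value and the bitmap of the new value, and that a transaction holds both until it commits. The class ToyBitmapIndex and its methods are hypothetical, and it uses four rows instead of the rowids 100 and 12 from the text.

```python
class ToyBitmapIndex:
    """Toy model: one bit list per value, plus a lock owner per bitmap."""

    def __init__(self, column_values):
        self.values = list(column_values)
        self.bitmaps = {
            v: [1 if x == v else 0 for x in self.values]
            for v in set(self.values)
        }
        self.locked_by = {}  # bitmap key -> name of the transaction holding it

    def update(self, txn, row_id, new_value):
        """Try to change one row; return False if another txn holds a needed bitmap."""
        old_value = self.values[row_id - 1]
        touched = {old_value, new_value}
        if any(self.locked_by.get(key, txn) != txn for key in touched):
            return False  # would have to wait for the other transaction to commit
        for key in touched:
            self.locked_by[key] = txn
        self.bitmaps.setdefault(new_value, [0] * len(self.values))
        self.bitmaps[old_value][row_id - 1] = 0  # clear the bit in the old vector
        self.bitmaps[new_value][row_id - 1] = 1  # set the bit in the new vector
        self.values[row_id - 1] = new_value
        return True

    def commit(self, txn):
        """Release every bitmap lock held by `txn`."""
        self.locked_by = {k: t for k, t in self.locked_by.items() if t != txn}

index = ToyBitmapIndex([0, 0, 0, 0])  # four machines, all idle (busy = 0)
print(index.update("A", 1, 1))  # True: A changes row 1 and holds both bitmaps
print(index.update("B", 2, 1))  # False: B needs the bitmaps that A still holds
index.commit("A")
print(index.update("B", 2, 1))  # True: B can proceed once A has committed
```

Even though A and B touch different rows, they contend for the same vectors, which is the reason a frequently updated column is a poor fit for a bitmap index.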

Data sources:

http://www.360doc.com/content/14/0508/15/11965070_375805586.shtml

https://en.wikipedia.org/wiki/Bitmap_index

http://baike.baidu.com/link?url=V3bMusSRzP6u-1bBHYDX1WeYs2OAmE9v9KKwmVAwRt_Huk7Ky2QmqXgDbdN5oaZtZgh6rPk740r8r8QN6Jdm6_
