Overview: The role of bitmap indexing in data warehousing-Oracle Data Warehouse-Cnoug ____oracle

Source: Internet
Author: User

About star schema  
  
In the construction of the Data Warehouse, the star pattern shown in the following illustration is almost the most commonly used. It is called star mode because the E-R graphic in the pattern is like a star (it feels strange to say). &NBSP
   
as shown in the figure, the center is a large fact table with some dimension tables around it. The fact table contains the primary information for the Data Warehouse, and each dimension table contains the specific attributes of that fact table. The  
  
Star Query is a connection between a fact table and some dimension tables, which use primary keys to connect to the foreign key of the fact table, and the dimension tables are not connected. &NBSP
  
about bitmap indexing  
 &NBSP
The proper implementation of bitmap indexes in the Data warehouse results in a significant gain in Data Warehouse query performance. Before the bitmap index appears, handling star queries is a way to use the traditional composite B*tree index. Unfortunately, queries such as B*tree index processing are inefficient. &NBSP
  
Consider the following query: Copy content to clipboard code: SELECT sales.time_id, Customers.cust_gender, SALES.AMOUNT&NBSP
from sales, customers 
WHERE sales.cust_id = customers.cust_id;

If the B*tree index is used, then all the dimension tables in the WHERE clause are connected to the Cartesian product and then connected using the composite b*tree and the primary key of the fact table. This is a poor performance.

The appearance of bitmap indexes has changed this situation. The rationale for bitmap indexing is to use bitmaps instead of column values in the index. There is usually a very low set of potential between the keys of the fact table and the dimension table (cardinality), with bitmap indexing, storage is more efficient and requires less storage than the B*tree index, so that more records can be read per read, and bitmap indexes compare to B*tree indexes , the connection and aggregation become the bit arithmetic operations, which greatly reduce the running time, thus getting a great performance improvement. In the example above, a bitmap index is established on the foreign key column of the fact table, and the query first accesses the fact table and then the connection to the dimension table. The results returned on the fact table will be reduced rather than Cartesian product operations on all possible dimension table attributes, so they are extremely effective.

Using Bitmap indexes

How to use bitmap indexes reasonably. Here are a few things to consider.

* If you want to use a bitmap index, the initialization parameter star_transformation_enabled should be set to
TRUE.
* The optimization model should be the CBO. For a data warehouse environment, you should always consider using the CBO (cost-based
OPTIMIZER).
* Bitmap indexes should be built on the foreign key columns of each fact table. (This is just a general rule.)

In addition, for the data table cardinality how to determine the objective is also a problem, 10,000 data contains only 3 values of the set and is low, then 100 million records containing 30,000 records is not low. In this case, we recommend a few lines of data simulation test, in general, in the Data Warehouse environment, bitmap index performance better than the B*tree index. Also note that bitmap indexes are not designed for OLTP databases and should not be used extensively in OLTP databases, especially for those with update operations

Take a closer look today, found that there are several errors in the text, if I do not say much, we teach  
: "If you use the B*tree index, then all the dimension tables in the WHERE clause will be connected to the Cartesian product before you use the composite b* Tree is connected to the primary key of the fact table. This is a poor performance. "&NBSP
does not make Cartesian, either the nested loop or the hash or the sort merge join does not cartesian the product." &NBSP
  
for the query in the text, this establishes an index that avoids the dynamic connection operation of the table. Rather than just improving speed. &NBSP
CREATE BITMAP INDEX sales_cust_gender_bjix 
   on Sales (Customers.cust_gender)  
   from sales, customers 
   where sales.cust_id = customers.cust_id; 
  &NBSP
* Bitmap indexes should be built on the foreign key columns of each fact table. (This is just a general rule.)  &NBSP
  
I'm in favor of that.  

As in the example above:
SELECT sales.time_id, Customers.cust_gender, Sales.amount
From sales, customers
WHERE sales.cust_id = customers.cust_id;
Bitmap indexes should be built on the foreign key columns of each fact table. (This is just a general rule.)

I feel a little confused:
Whether you want to establish an FK (foreign key) point dimension table (customers) on the Ville (for example, cust_id) on the fact (sales) table in Oracle to ensure data consistency;
The benefits of doing this are needless to say (consistency of data, automated Oracle CBO processing), but I'm afraid to do this in the actual DW project because:
1, the FK greatly reduced the load performance;
2, because the dimension table also needs to load, and the load is not necessarily successful, but if because the load of the dimension table is not successful, then put a part of the fact table data in the outside; this is wrong;


I do not know how you deal with the heroes.

My point of view:
1, the Data warehouse in the foreign key constraints are mainly for the database to provide metadata for the optimizer to use, to ensure that the work of data consistency should be completed in the ETL.
2, the data load is best to disable all constraints, the index removed, and then enable constraints and rebuild the index, so much faster (speed up the problem of data loading I recommended an article http://www.cnoug.org/viewthread.php?tid=26155).
3, I do not you understand, you said the dimension table load is not successful, fact table part of the data will also be loaded, it does not get the wrong statistics, or because there is no dimension data fact table data through the association will be filtered out. My approach is to send a warning to the administrator or reload the data (which may require human intervention) if the load is unsuccessful, depending on the situation.

Related Article

Contact Us

The content source of this page is from Internet, which doesn't represent Alibaba Cloud's opinion; products and services mentioned on that page don't have any relationship with Alibaba Cloud. If the content of the page makes you feel confusing, please write us an email, we will handle the problem within 5 days after receiving your email.

If you find any instances of plagiarism from the community, please send an email to: info-contact@alibabacloud.com and provide relevant evidence. A staff member will contact you within 5 working days.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

  • Sales Support

    1 on 1 presale consultation

  • After-Sales Support

    24/7 Technical Support 6 Free Tickets per Quarter Faster Response

  • Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.