OLAP Design Concept of Oracle Analysis System

Last Update:2014-06-15 Source: Internet

Author: User

Tags ibm database database sharding

Developer on Alibaba Coud: Build your first app with APIs, SDKs, and tutorials on the Alibaba Cloud. Read more ＞

Oracle was originally designed to be transaction-based, so it won't be authentic in terms of processing analyticdb. Recent developments in Oracle-related technologies have gradually adapted to OLAP needs.

1 partition technology 2 compression technology 3 index organization Table 4 blocks 5 parallel technology 6 memory result set.

The original Oracle design is based on a short and sophisticated large-scale concurrent transaction. with this, Oracle occupies most of the database market share. microsoft SQLSERVER occupies the small and medium-sized enterprise market with its comprehensive, friendly and easy-to-use type.

To design an Oracle 10 Gb analysis system, you cannot select the default installation method. Even the storage of machines cannot follow the OLTP thinking.

Based on the current RAC technology, it is designed for the high availability of OLTP. It is not suitable for the machine system where the Analysis System resides. Oracle does not have an IBM database sharding database and can be horizontally expanded without limit.

Select an analysis system server, which is basically a single machine with RAID 0 + RAID 5 + SSD large memory + multiple CPUs.

SSD is used to analyze the demand for real-time reports. RAID0 mainly stores data for one month, which is basically a requirement for frequent access to data reports. RAID5 or RAID10 are used to store historical data.

Large Memory is mainly used for group by operations, and multi-CPU users query in parallel.

The hierarchical design of the analysis system. The hierarchical design is mainly to gradually reduce the essence of data and provide a lot of flexibility.

1. on the Data Synchronization layer, set the user and mode DATA_synch to get data from each data source to the user mode. The space of the tablespace DATA_DAY, DATA_MON, DATA_HIS three time periods.

2. Data sharding and summarization. Set the user and mode data_split_sum to extract tables with certain fields from the source data, tables with extraction time granularity, or tables of some users.

For example, an active user table

3. Report Result layer: The data_result layer stores the desired data.

These three layers can be stored in the same database or on different machines.

Tablespace design: Data and index tablespaces are also divided into SSD, RAID0 RAID5 space: ssd_index, ssd_data, raid0_index, raid0_data, raid5_index, raid5_data

Pay attention to putting important tables, important operations involving tables, and tables with high real-time requirements into the SSD tablespace.

Data Synchronization is important to the SSD tablespace, and other unimportant data is stored in the RAID0 tablespace. data stored in the History tablespace for more than a month is stored.

All tables should be indexed into organizational tables because they are stored in sequence. SSD or important tables at the day and week levels can be used to organize tables in the original heap.

The RAID0 tablespace stores the data of the current month. Therefore, you can use non-compressed indexes to organize tables. The FREE block space is 0-10, which mainly depends on the data update cycle of the table, that is, the stable time. for example, after the data in the table is dragged from the outside, the data will be dragged to change the value of the previous day. this is the stable period. if the hard disk space is redundant, you can set another RAID0_DAY tablespace to store tables that take some time to stabilize. it is synchronized to the monthly tablespace only after it is stable. in this way, the FREE block can be set to 0.

The RAID5 space table is designed to be partitioned, dual-partition, compressed, and indexed. The FREE partition is 0. After all, this space mainly stores historical data for more than a month.

The block of all tables should be designed as 64KB-128KB.

Use parallel technology in development/* + paraller (4 )*/

Memory table: with _ as shares the same memory data

In addition to extracting some fields, splitting tables in a database also requires frequent addition of new users, active users, recharge users, purchase users, and many other tables associated with the statistical data at the result layer, resulting in slow speed.

These tables can be made into a family table or a column.

For example, user name, registration time, first recharge time, first purchase time, and first time.

In general, 1 data volume is small, 2 data blocks are read in a small amount, and the number of reads and the number of blocks are included.

This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only. This website makes no representation or warranty of any kind, either expressed or implied, as to the accuracy, completeness ownership or reliability of the article or any translations thereof. If you have any concerns or complaints relating to the article, please send an email, providing a detailed description of the concern or complaint, to info-contact@alibabacloud.com. A staff member will contact you within 5 working days. Once verified, infringing content will be removed immediately.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

Get Started for Free

Sales Support

1 on 1 presale consultation

Chat Contact Sales
After-Sales Support

24/7 Technical Support 6 Free Tickets per Quarter Faster Response

Open a Ticket
Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.

Learn More