Optimization of traditional data warehouses (Oracle + DataStage)

Source: Internet
Author: User

Common techniques in data-warehouse work include partitioning, hash joins, analytic (window) functions, materialized views, bitmap indexes, and so on. The tips listed below are the optimization tools and techniques most commonly used in our projects. The items marked as unconventional (highlighted in green in the original page) apply only to fairly extreme scenarios and need careful consideration.

Oracle parallel scenarios:
  • The PARALLEL parameter of SQL*Loader
  • The fast_start_parallel_rollback parameter, for parallel rollback of failed transactions
  • The PARALLEL parameter of EXPDP, together with multiple Data Pump dump files
  • Bulk processing plus the PARALLEL hint: reduce the number of SELECTs and keep the logic clear; where possible, do one SELECT ... JOIN and then handle everything with analytic functions:
    SELECT /*+ PARALLEL(table_name, degree) */
      aggregate functions (SUM, AVG), CASE ... WHEN ... THEN ... ELSE ... END,
      OVER (PARTITION BY ... ORDER BY ...) analytic functions (LEAD/LAG, RANK, RATIO_TO_REPORT, period-over-period comparisons, etc.)
    FROM table_name
    GROUP BY ROLLUP(...), CUBE(...), etc.
  • Creating and rebuilding indexes with a PARALLEL clause (e.g., for a full ETL reload: drop the indexes, run the ETL, then recreate them)
  • The DEGREE parameter when gathering statistics
  • Parallel DML:
    ALTER SESSION ENABLE PARALLEL DML;
    INSERT /*+ APPEND PARALLEL(table_i, degree) */
    INTO table_i NOLOGGING
    SELECT /*+ PARALLEL(A, degree) PARALLEL(B, degree) PARALLEL(C, degree) */
    ......
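The "one SELECT, then analytic functions" pattern described above can be sketched as follows; the table and column names (sales_fact, region, month_id, amount) and the parallel degree are hypothetical:

```sql
-- One parallel scan of a hypothetical monthly sales fact table
-- feeds aggregates and all analytic functions in a single pass.
SELECT /*+ PARALLEL(f, 8) */
       f.region,
       f.month_id,
       SUM(f.amount)                                 AS month_amt,
       -- prior month's total in the same region (period-over-period)
       LAG(SUM(f.amount)) OVER (PARTITION BY f.region
                                ORDER BY f.month_id) AS prev_month_amt,
       -- rank of each region within the month
       RANK() OVER (PARTITION BY f.month_id
                    ORDER BY SUM(f.amount) DESC)     AS region_rank,
       -- each region's share of the month's total
       RATIO_TO_REPORT(SUM(f.amount))
           OVER (PARTITION BY f.month_id)            AS share_of_month
FROM   sales_fact f
GROUP  BY f.region, f.month_id;
```

Because the analytic functions are evaluated over the grouped result, one scan replaces what would otherwise be several self-joining queries.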
NOLOGGING is often useful in DML-heavy scenarios (INSERT, UPDATE, DELETE):

1. The fastest approach is still CREATE TABLE new_tab AS SELECT * FROM old_tab.
2. When the amount of data to delete is too large, consider a full rebuild instead: CREATE TABLE new_tab NOLOGGING PARALLEL AS SELECT * FROM old_tab WHERE <condition excluding the rows to delete>, then TRUNCATE the original table and RENAME the new table into place. UPDATE can be handled the same way: express the update logic inside the SELECT, then truncate the original table and rename the new one.

MERGE (which can likewise be used with PARALLEL and NOLOGGING) reduces the number of table scans and replaces an INSERT-then-UPDATE pair. For example: every month the ledger-account costs are calculated in the production database and loaded into a fact table in the data warehouse, but a small number of reversal vouchers can still affect recent months. Instead of the original full DELETE + INSERT, or INSERT of new data plus an UPDATE over a time range, a single MERGE whose time condition covers the range in which reversals can occur does the job.

Parameters:
1. ALTER SESSION SET workarea_size_policy = manual;
   ALTER SESSION SET sort_area_size = 107341824;
   ALTER SESSION SET sort_area_retained_size = 107341824;
   ALTER SESSION SET db_file_multiblock_read_count = 128;
ETL jobs doing parallel DML or parallel SELECT can add the parameters above; on the 10g version they seem to need to be issued twice before taking effect. db_file_multiblock_read_count combined with 32K or 16K large-block tablespaces is effective on traditional SATA and FC disks, increasing the amount of data returned per I/O. In the same vein, denormalizing the dimensions into the fact table to form one long table also performs well with db_file_multiblock_read_count plus 32K/16K blocks, but the applicable scenarios are narrower: mostly time-oriented subject analysis and data marts.
2. Modify the hidden parameters _smm_auto_min_io_size and _smm_auto_max_io_size to increase the memory allocated to each hash join and improve GROUP BY performance, in combination with a large PGA.
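A minimal sketch of the MERGE pattern described above; the table and column names (ledger_fact, prod_ledger, account_id, month_id, cost), the month value, and the parallel degree are assumptions for illustration:

```sql
-- Refresh only the months in which reversal vouchers may still occur,
-- instead of delete+insert or insert+update over the whole fact table.
ALTER SESSION ENABLE PARALLEL DML;

MERGE /*+ PARALLEL(f, 8) */ INTO ledger_fact f
USING (SELECT account_id, month_id, SUM(cost) AS cost
       FROM   prod_ledger
       WHERE  month_id >= '202301'   -- time range covering possible reversals
       GROUP  BY account_id, month_id) s
ON (f.account_id = s.account_id AND f.month_id = s.month_id)
WHEN MATCHED THEN UPDATE SET f.cost = s.cost
WHEN NOT MATCHED THEN INSERT (account_id, month_id, cost)
                      VALUES (s.account_id, s.month_id, s.cost);
```

Both the scan of the source and the write into the fact table happen once, where the old approach scanned and touched the target twice.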
3. In an extreme environment (or a test environment, or a non-critical staging database used for data synchronization), open the parameter ALTER SYSTEM SET commit_write = 'batch,nowait'; (a feature introduced in 10gR2). This lets the database return from COMMIT without waiting for the log buffer to be written to the redo log file. You need to assess the risk of data loss on a power failure; with a UPS it can be considered. Note: "extreme environment" means frequent commits have made log file sync waits the bottleneck, and only then should opening this parameter be considered. In most cases a data warehouse does not have this problem.
More extreme still, you can enlarge the online redo log files to 1-2 GB or even larger and disable archiving, reducing waits on log switches. This must be weighed against the scenario; do not use it in a production environment.

External tables
    • Cannot be the target of DML, cannot have indexes, do not support partitioning
    • Suitable for data that is loaded once and never modified; convenient to load, can be queried in parallel, and can participate in both nested-loop joins and hash joins
    • A common scenario is MERGE with an external table as the source
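A sketch of the external-table-plus-MERGE scenario; the directory path, file name, and all table and column names are hypothetical:

```sql
-- Hypothetical external table over a comma-delimited flat file.
CREATE DIRECTORY etl_dir AS '/data/etl/in';

CREATE TABLE ext_ledger (
  account_id NUMBER,
  month_id   VARCHAR2(6),
  cost       NUMBER
)
ORGANIZATION EXTERNAL (
  TYPE ORACLE_LOADER
  DEFAULT DIRECTORY etl_dir
  ACCESS PARAMETERS (
    RECORDS DELIMITED BY NEWLINE
    FIELDS TERMINATED BY ','
  )
  LOCATION ('ledger.csv')
)
PARALLEL 4
REJECT LIMIT UNLIMITED;

-- The external table serves directly as the MERGE source: the file is
-- read in parallel without first staging it into a regular table.
MERGE INTO ledger_fact f
USING ext_ledger s
ON (f.account_id = s.account_id AND f.month_id = s.month_id)
WHEN MATCHED THEN UPDATE SET f.cost = s.cost
WHEN NOT MATCHED THEN INSERT (account_id, month_id, cost)
                      VALUES (s.account_id, s.month_id, s.cost);
```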
Global temporary tables (no DML locks, little or no redo), transaction-level or session-level.

Direct-path insert.

Materialized views: trading space for time.

Tablespace migration: transportable tablespaces can move partitions of a partitioned table. The transfer happens at the physical-file level rather than the SQL level, giving the highest performance for scenarios where regional sub-databases are aggregated into a central library.

DataStage-side handling
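A minimal sketch combining a session-level global temporary table with a direct-path insert; table and column names are hypothetical:

```sql
-- Session-level GTT: rows survive across commits in the same session
-- but disappear at logout; DML on it generates reduced redo.
CREATE GLOBAL TEMPORARY TABLE gtt_stage (
  account_id NUMBER,
  cost       NUMBER
) ON COMMIT PRESERVE ROWS;

-- Direct-path (APPEND) insert into the staging table.
INSERT /*+ APPEND */ INTO gtt_stage
SELECT account_id, SUM(cost)
FROM   prod_ledger
GROUP  BY account_id;

COMMIT;
```

The staged rows can then feed later joins or a MERGE within the same session without the locking and redo cost of a permanent work table.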
Read side: enable partitioned reads; use modulus-mode partitioned reads on an integer column.
Write side: in the Oracle connector, select bulk-load mode.
Drop all indexes, primary keys, and so on before the bulk load, and rebuild them after it finishes. When the DataStage host has multiple CPUs, it is advisable to configure multiple parallel nodes for the ETL jobs, which makes it easy to push I/O to its limit. (You can test building the nodes' scratch files on Linux tmpfs, i.e. /dev/shm/, to speed things up: it keeps ETL data from passing through the DataStage host's disks and adding an I/O bottleneck point. This needs testing.) Similarly, on a host with large memory, Oracle's temp tablespace can be placed there, provided usage of the temp tablespace has been stable, the DBA can estimate its range of fluctuation, and automatic growth is no longer set. Rigorous testing is required!
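A sketch of a multi-node DataStage parallel configuration (APT config) file with scratch space on tmpfs, as suggested above. The node names, host name, and paths are all hypothetical, and this is a config fragment to adapt, not a drop-in file:

```
{
  node "node1" {
    fastname "etl_host"
    pools ""
    resource disk "/data/ds/dataset1" { pools "" }
    /* scratch on tmpfs keeps intermediate ETL data off the host disks */
    resource scratchdisk "/dev/shm/ds_scratch" { pools "" }
  }
  node "node2" {
    fastname "etl_host"
    pools ""
    resource disk "/data/ds/dataset2" { pools "" }
    resource scratchdisk "/dev/shm/ds_scratch" { pools "" }
  }
}
```

As the text warns, size /dev/shm carefully and test rigorously: tmpfs competes with the database and the ETL processes for the same physical memory.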
