Optimization of traditional data warehouses (Oracle + DataStage)

Source: Internet
Author: User

Common techniques in data-warehouse work include partitioning, hash joins, analytic (window) functions, materialized views, bitmap indexes, and so on. The tips listed below are the optimization tools and techniques most commonly used in our projects. The items marked as unconventional (highlighted in green in the original page) apply only to fairly extreme scenarios and need careful consideration.

Oracle parallel scenarios:
  • The PARALLEL parameter of SQL*Loader
  • The fast_start_parallel_rollback parameter, for parallel rollback of failed transactions
  • The PARALLEL parameter of EXPDP, together with multiple Data Pump dump files
  • Bulk processing plus the PARALLEL hint: reduce the number of SELECTs and keep the logic clear; where possible, do one SELECT ... JOIN and then handle everything with analytic functions:
    SELECT /*+ PARALLEL(table_name, degree) */
      aggregate functions (SUM, AVG), CASE ... WHEN ... THEN ... ELSE ... END,
      OVER (PARTITION BY ... ORDER BY ...) analytic functions (LEAD/LAG, RANK, RATIO_TO_REPORT, period-over-period comparisons, etc.)
    FROM table_name
    GROUP BY ROLLUP(...), CUBE(...), etc.
  • Creating and rebuilding indexes with a PARALLEL clause (e.g., for a full ETL reload: drop the indexes, run the ETL, then recreate them)
  • The DEGREE parameter when gathering statistics
  • Parallel DML:
    ALTER SESSION ENABLE PARALLEL DML;
    INSERT /*+ APPEND PARALLEL(table_i, degree) */
    INTO table_i NOLOGGING
    SELECT /*+ PARALLEL(A, degree) PARALLEL(B, degree) PARALLEL(C, degree) */
    ......
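The "one SELECT, then analytic functions" pattern described above can be sketched as follows; the table and column names (sales_fact, region, month_id, amount) and the parallel degree are hypothetical:

```sql
-- One parallel scan of a hypothetical monthly sales fact table
-- feeds aggregates and all analytic functions in a single pass.
SELECT /*+ PARALLEL(f, 8) */
       f.region,
       f.month_id,
       SUM(f.amount)                                 AS month_amt,
       -- prior month's total in the same region (period-over-period)
       LAG(SUM(f.amount)) OVER (PARTITION BY f.region
                                ORDER BY f.month_id) AS prev_month_amt,
       -- rank of each region within the month
       RANK() OVER (PARTITION BY f.month_id
                    ORDER BY SUM(f.amount) DESC)     AS region_rank,
       -- each region's share of the month's total
       RATIO_TO_REPORT(SUM(f.amount))
           OVER (PARTITION BY f.month_id)            AS share_of_month
FROM   sales_fact f
GROUP  BY f.region, f.month_id;
```

Because the analytic functions are evaluated over the grouped result, one scan replaces what would otherwise be several self-joining queries.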
NOLOGGING is often useful in DML-heavy scenarios (INSERT, UPDATE, DELETE):

1. The fastest approach is still CREATE TABLE new_tab AS SELECT * FROM old_tab.
2. When the amount of data to delete is too large, consider a full rebuild instead: CREATE TABLE new_tab NOLOGGING PARALLEL AS SELECT * FROM old_tab WHERE <condition excluding the rows to delete>, then TRUNCATE the original table and RENAME the new table into place. UPDATE can be handled the same way: express the update logic inside the SELECT, then truncate the original table and rename the new one.

MERGE (which can likewise be used with PARALLEL and NOLOGGING) reduces the number of table scans and replaces an INSERT-then-UPDATE pair. For example: every month the ledger-account costs are calculated in the production database and loaded into a fact table in the data warehouse, but a small number of reversal vouchers can still affect recent months. Instead of the original full DELETE + INSERT, or INSERT of new data plus an UPDATE over a time range, a single MERGE whose time condition covers the range in which reversals can occur does the job.

Parameters:
1. ALTER SESSION SET workarea_size_policy = manual;
   ALTER SESSION SET sort_area_size = 107341824;
   ALTER SESSION SET sort_area_retained_size = 107341824;
   ALTER SESSION SET db_file_multiblock_read_count = 128;
ETL jobs doing parallel DML or parallel SELECT can add the parameters above; on the 10g version they seem to need to be issued twice before taking effect. db_file_multiblock_read_count combined with 32K or 16K large-block tablespaces is effective on traditional SATA and FC disks, increasing the amount of data returned per I/O. In the same vein, denormalizing the dimensions into the fact table to form one long table also performs well with db_file_multiblock_read_count plus 32K/16K blocks, but the applicable scenarios are narrower: mostly time-oriented subject analysis and data marts.
2. Modify the hidden parameters _smm_auto_min_io_size and _smm_auto_max_io_size to increase the memory allocated to each hash join and improve GROUP BY performance, in combination with a large PGA.
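A minimal sketch of the MERGE pattern described above; the table and column names (ledger_fact, prod_ledger, account_id, month_id, cost), the month value, and the parallel degree are assumptions for illustration:

```sql
-- Refresh only the months in which reversal vouchers may still occur,
-- instead of delete+insert or insert+update over the whole fact table.
ALTER SESSION ENABLE PARALLEL DML;

MERGE /*+ PARALLEL(f, 8) */ INTO ledger_fact f
USING (SELECT account_id, month_id, SUM(cost) AS cost
       FROM   prod_ledger
       WHERE  month_id >= '202301'   -- time range covering possible reversals
       GROUP  BY account_id, month_id) s
ON (f.account_id = s.account_id AND f.month_id = s.month_id)
WHEN MATCHED THEN UPDATE SET f.cost = s.cost
WHEN NOT MATCHED THEN INSERT (account_id, month_id, cost)
                      VALUES (s.account_id, s.month_id, s.cost);
```

Both the scan of the source and the write into the fact table happen once, where the old approach scanned and touched the target twice.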
3. In an extreme environment (or a test environment, or a non-critical staging database used for data synchronization), open the parameter ALTER SYSTEM SET commit_write = 'batch,nowait'; (a feature introduced in 10gR2). This lets the database return from COMMIT without waiting for the log buffer to be written to the redo log file. You need to assess the risk of data loss on a power failure; with a UPS it can be considered. Note: "extreme environment" means frequent commits have made log file sync waits the bottleneck, and only then should opening this parameter be considered. In most cases a data warehouse does not have this problem.
More extreme still, you can enlarge the online redo log files to 1-2 GB or even larger and disable archiving, reducing waits on log switches. This must be weighed against the scenario; do not use it in a production environment.

External tables
    • Cannot be the target of DML, cannot have indexes, do not support partitioning
    • Suitable for data that is loaded once and never modified; convenient to load, can be queried in parallel, and can participate in both nested-loop joins and hash joins
    • A common scenario is MERGE with an external table as the source
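A sketch of the external-table-plus-MERGE scenario; the directory path, file name, and all table and column names are hypothetical:

```sql
-- Hypothetical external table over a comma-delimited flat file.
CREATE DIRECTORY etl_dir AS '/data/etl/in';

CREATE TABLE ext_ledger (
  account_id NUMBER,
  month_id   VARCHAR2(6),
  cost       NUMBER
)
ORGANIZATION EXTERNAL (
  TYPE ORACLE_LOADER
  DEFAULT DIRECTORY etl_dir
  ACCESS PARAMETERS (
    RECORDS DELIMITED BY NEWLINE
    FIELDS TERMINATED BY ','
  )
  LOCATION ('ledger.csv')
)
PARALLEL 4
REJECT LIMIT UNLIMITED;

-- The external table serves directly as the MERGE source: the file is
-- read in parallel without first staging it into a regular table.
MERGE INTO ledger_fact f
USING ext_ledger s
ON (f.account_id = s.account_id AND f.month_id = s.month_id)
WHEN MATCHED THEN UPDATE SET f.cost = s.cost
WHEN NOT MATCHED THEN INSERT (account_id, month_id, cost)
                      VALUES (s.account_id, s.month_id, s.cost);
```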
Global temporary tables (no DML locks, little or no redo), transaction-level or session-level.

Direct-path insert.

Materialized views: trading space for time.

Tablespace migration: transportable tablespaces can move partitions of a partitioned table. The transfer happens at the physical-file level rather than the SQL level, giving the highest performance for scenarios where regional sub-databases are aggregated into a central library.

DataStage-side handling
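A minimal sketch combining a session-level global temporary table with a direct-path insert; table and column names are hypothetical:

```sql
-- Session-level GTT: rows survive across commits in the same session
-- but disappear at logout; DML on it generates reduced redo.
CREATE GLOBAL TEMPORARY TABLE gtt_stage (
  account_id NUMBER,
  cost       NUMBER
) ON COMMIT PRESERVE ROWS;

-- Direct-path (APPEND) insert into the staging table.
INSERT /*+ APPEND */ INTO gtt_stage
SELECT account_id, SUM(cost)
FROM   prod_ledger
GROUP  BY account_id;

COMMIT;
```

The staged rows can then feed later joins or a MERGE within the same session without the locking and redo cost of a permanent work table.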
Read side: enable partitioned reads; use modulus-mode partitioned reads on an integer column.
Write side: in the Oracle connector, select bulk-load mode.
Drop all indexes, primary keys, and so on before the bulk load, and rebuild them after it finishes. When the DataStage host has multiple CPUs, it is advisable to configure multiple parallel nodes for the ETL jobs, which makes it easy to push I/O to its limit. (You can test building the nodes' scratch files on Linux tmpfs, i.e. /dev/shm/, to speed things up: it keeps ETL data from passing through the DataStage host's disks and adding an I/O bottleneck point. This needs testing.) Similarly, on a host with large memory, Oracle's temp tablespace can be placed there, provided usage of the temp tablespace has been stable, the DBA can estimate its range of fluctuation, and automatic growth is no longer set. Rigorous testing is required!
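A sketch of a multi-node DataStage parallel configuration (APT config) file with scratch space on tmpfs, as suggested above. The node names, host name, and paths are all hypothetical, and this is a config fragment to adapt, not a drop-in file:

```
{
  node "node1" {
    fastname "etl_host"
    pools ""
    resource disk "/data/ds/dataset1" { pools "" }
    /* scratch on tmpfs keeps intermediate ETL data off the host disks */
    resource scratchdisk "/dev/shm/ds_scratch" { pools "" }
  }
  node "node2" {
    fastname "etl_host"
    pools ""
    resource disk "/data/ds/dataset2" { pools "" }
    resource scratchdisk "/dev/shm/ds_scratch" { pools "" }
  }
}
```

As the text warns, size /dev/shm carefully and test rigorously: tmpfs competes with the database and the ETL processes for the same physical memory.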
