Introduction to OLTP and OLAP
Data processing can be broadly divided into two broad categories: online transaction processing OLTP (on-line transaction processing), online analytical processing OLAP (On-line Analytical Processing). OLTP is the main application of the traditional relational database, mainly basic, daily transaction processing, such as bank transaction. OLAP is the main application of Data Warehouse system, supports complex analysis operations, focuses on decision support, and provides intuitive and understandable query results. OLTP system emphasizes the efficiency of database memory, emphasizes the command rate of various indexes of memory, emphasizes the binding variables and emphasizes concurrent operation;
OLAP Systems emphasize data analysis, emphasizing SQL execution market, emphasizing disk I/O, emphasizing partitioning, and so on.
comparison between OLTP and OLAP :
OLTP, also called online transaction processing (online Transaction processing), represents a highly transactional system, typically a highly available online system, with small transactions and small queries as the primary evaluation of their systems, It generally looks at the number of transaction and execute SQL executed per second. In such a system, a single database often processes more than hundreds of transaction per second, or thousands of, and the Select statement executes thousands of or even tens of thousands of per second. Typical OLTP systems include e-commerce systems, banks, securities, etc., such as the US ebay business database, which is a typical OLTP database.
The most likely bottleneck for OLTP systems is the CPU and disk subsystem.
(1) CPU bottlenecks often appear in the total amount of logical reading and computational function or is the process, the total amount of logical reading equals the logical read of a single statement multiplied by the number of executions, if a single statement execution speed is very fast, but the number of executions is very high, it can also lead to a large amount of logical reading. The method of design and optimization is to reduce the logical reading of individual statements, or to reduce their number of executions. In addition, some computational functions, such as custom functions, decode, and so on, can consume a lot of CPU time, cause the load of the system, correct design method or optimization method, avoid the calculation process as much as possible, such as saving the calculation result to the statistic table is a good method.
(2) disk subsystem in an OLTP environment, its load capacity generally depends on its IOPS processing power. Because in an OLTP environment, disk physical reads are generally DB file sequential read, which is a single block read, but this is a very frequent number of reads. Large performance issues can occur if the disk subsystem is frequently not able to host its iops.
OLTP is more commonly used in design and optimization of the cache technology and B-tree indexing technology, the cache determines that many statements do not need to obtain data from the disk subsystem, so the WEB cache and Oracle data buffer is important to the OLTP system. In addition, in terms of index use, the simpler the better, so the execution plan is also stable, and must use bound variables, reduce statement resolution, minimize Table association, minimize distributed transactions, the basic non-use of partition technology, MV Technology, parallel technology and bitmap indexing. Because of the high concurrency, batch updates are quickly submitted in batches to avoid blocking.
An OLTP system is a system in which data blocks change very frequently and SQL statements are submitted very frequently. For data blocks, the data block should be kept in memory as much as possible, and for SQL, use variable binding techniques to achieve SQL reuse, reduce physical I/O and duplicate SQL parsing, which greatly improves the performance of the database.
This affects performance in addition to binding variables, and may be hot-fast (warm block). When a block is read by multiple users at the same time, Oracle needs to use latch to serialize the user's actions in order to maintain data consistency. When a user obtains the latch, the other users can only wait, the more users get the block, the more waiting is more obvious. This is the problem of hot and fast. This heat can be either a block of data or a rollback end block. For data blocks, usually the data distribution of the database is not uniform, if it is the data block of the index, you can consider creating a reverse index to achieve the purpose of redistribution of data, for the rollback segment data block, you can add a few more rollback segments to avoid this contention.
OLAP, also called online analytical processing (online analytical processing) system, sometimes called DSS decision support system, is what we say the Data Warehouse. In such a system, the execution amount of the statement is not an assessment, because the execution time of a statement can be very long, and the data read is very much. Therefore, in such a system, the standard of assessment is often the disk subsystem throughput (bandwidth), such as the number of MB/s to reach the traffic.
Disk subsystem throughput is often dependent on the number of disks, this time, the cache basically has no effect, the database read-write type is basically db file scattered read and direct path read/write. Should try to use more than the number of disks and relatively large bandwidth, such as 4Gb fiber interface.
in OLAP system, partitioning technology and parallel technology are often used.
The importance of partitioning technology in OLAP system is mainly embodied in the database management, such as database loading, can be implemented by partition Exchange, backup can be implemented by backup partition table space, delete data can be deleted by partition, as for the performance impact of partition, It can make scanning of some large tables very fast (scan only a single partition). In addition, if the partition is combined in parallel, it can also make the whole table scan become very fast. In short, the main function of the partition is the convenience of management, it does not guarantee the improvement of query performance, sometimes the partition will bring performance improvement, sometimes reduce.
Parallel technology In addition to the partitioning technology, in the Oracle 10g, in conjunction with RAC to achieve multi-node simultaneous scanning, the effect is very good, can be a task, such as select Full table scan, the average allocation to multiple RAC nodes up.
In an OLAP system, binding (BIND) variables are not required because the execution of the entire system is small, the analysis time can be ignored for execution time, and the wrong execution plan can be avoided. However, OLAP can use a lot of bitmap index, materialized view, for large transactions, as far as possible to seek speed optimization, there is no need to like OLTP requirements for fast submission, or even deliberately slow down the speed of execution.
The real purpose of a bound variable is in an OLTP system, which typically has the characteristics of a large number of users, a very dense user request, and most of the SQL for these requests can be reused.
For OLAP systems, most of the time the database is running a report job, the execution is basically an aggregate class of SQL operations, such as group by, it is appropriate to set the optimizer mode to All_rows. For some of the Web site class database with more paging operations, setting it to first_rows would be better. But sometimes for OLAP systems, where we have pagination, we can consider using hint in every SQL. Such as:
Select a.* from Table A;
separate design and optimization
Pay special attention to the design, such as in a highly available OLTP environment, do not blindly put the technology of OLAP to use.
such as partitioning technology, assuming that the partitioning keyword is not used in a wide range, and that other fields are used as the where condition, then, if the local index, you will have to scan multiple indexes, and performance becomes much lower. If it is a global index, it loses the meaning of the partition.
Parallel technology is the same, generally in the completion of large-scale tasks to use, such as in real life, translation of a book, you can first arrange for more than one person, each person to translate different chapters, so that can improve the speed of translation. If you only translate a page of books, but also to assign different people to translate different lines, and then combine, there is no need, because in the time allotted work, a person may have been translated.
The same is true of bitmap indexes, which can easily cause blocking and deadlocks if used in an OLTP environment. However, in an OLAP environment, it is possible to improve OLAP query speed because of its unique characteristics. MV is also basically the same, including triggers, etc., on DML frequent OLTP systems, it is easy to become a bottleneck, or even the library cache waiting, and in the OLAP environment, it may be appropriate to improve the query speed.
For OLAP systems, there is little room to optimize memory, increasing CPU processing speed and disk I/O speed is the most straightforward way to improve database performance, which also means increased system costs.
For example, we want to aggregate hundreds of millions of or billions of data, this massive amount of data, all in-memory operation is difficult, but also not necessary, because this data is seldom reused, cache it is not meaningful, but also cause physical I/o quite large. So the bottleneck of this system is often the disk I/O.
For OLAP systems, the optimization of SQL is very important, because it has a large amount of data, and the performance difference between full-table scan and index is very large.
other
The templates available for selection during the previous version of Oracle 10g build are:
Data Warehouse (DW)
General Purpose (Common Purpose, general purpose)
New Database
Transaction processing (transaction processing)
The templates available for selection in the Oracle 11g build repository process are:
General purpose or transaction processing
Customizing the Database
Data Warehouse
The individual's understanding of these templates is: online analytical processing (olap,on-line analytical Processing) with large data volumes and fewer DML. Working with Data Warehouse templates
Online transaction processing (oltp,on-line Transaction processing), with a small amount of data, frequent DML, and many parallel transactions, but generally very short. Use a general purpose or transaction processing template.
DSS (Dds,decision support System), the typical operation is full table scan, long query, long transaction, but the number of general transactions is very small, often is a transaction exclusive system.