Detailed description of Oracle Database Parallel Processing Technology

Last Update:2013-12-17 Source: Internet

Author: User

Parallel processing is a core technology of Oracle Database. It enables organizations to manage and access terabytes of data efficiently. Without efficient parallel processing, these very large databases, which are typically used for data warehouses but increasingly for operational business systems as well, would not be practical.

In short, parallel processing means using multiple CPUs and I/O resources to perform a single database operation. Although every major database vendor claims to provide parallel processing capabilities, there are key differences between the architectures the vendors offer.

This article discusses the Oracle9i parallel processing architecture and demonstrates its superiority over other architectures in practical applications. The main advantage of the Oracle9i parallel processing architecture is that it fully utilizes the underlying hardware infrastructure (every processor, every byte of memory, and all available I/O bandwidth) under any circumstances. This white paper also describes how the Oracle parallel processing component integrates seamlessly with other key business components, such as Oracle Real Application Clusters.

Introduction to Parallel Processing Technology for Oracle databases

Today's databases, whether used in data warehouses, operational data stores (ODS), or OLTP systems, contain a wealth of information. Because of the sheer volume of data involved, however, finding and presenting that information in a timely manner is a huge challenge. Oracle Database parallel processing technology addresses this challenge: with it, you can process several terabytes of data in minutes instead of hours or days. It achieves this performance by leveraging all available hardware resources: multiple CPUs, multiple I/O channels, multiple storage arrays and disk drives, and large amounts of memory. The more efficiently the database software uses these resources, the faster it can process queries and other database operations.

In addition, today's database applications are far more complex: they must support not only a large number of concurrent users but also different types of users. A parallel query architecture should therefore not only ensure that all resources of the underlying hardware platform are fully utilized, but also allocate those resources appropriately among multiple concurrent requests. Obviously, a request that supports the CEO's strategic decision is more important than a batch report. The parallel query architecture should be able to handle these business requirements by allocating resources dynamically, based not only on the request itself but also on who submitted it and on the system resources currently available.

The Oracle9i parallel processing architecture fully meets these requirements. It not only provides industry-leading performance but is also the only architecture that adapts and adjusts dynamically.

The Oracle9i parallel processing architecture takes full advantage of every hardware investment (SMP, cluster, or MPP) to ensure optimal throughput and continuous, optimized system usage at all times.

The Oracle9i database balances all parallel operations based on available resources, request priorities, and the actual system load.

Parallel Design Strategies for Oracle Database Parallel Processing Technology: static and dynamic

The idea of parallel processing is to break a single task into multiple smaller units. Instead of doing all the work in one process, the work is parallelized so that multiple processes work on the smaller units at the same time. This can greatly improve performance and make better use of the system. The most important part of parallel processing, however, is deciding correctly how to divide a single task into smaller units of work.
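The idea of dividing one task into smaller units can be illustrated with a minimal Python sketch. It is not Oracle code: the function names `split_into_units` and `parallel_sum` are hypothetical, and threads are used only to show the structure (split, process units concurrently, combine partial results), not to demonstrate a real speedup.

```python
from concurrent.futures import ThreadPoolExecutor


def split_into_units(rows, unit_count):
    """Divide a list of rows into at most unit_count contiguous work units."""
    unit_size = max(1, -(-len(rows) // unit_count))  # ceiling division
    return [rows[i:i + unit_size] for i in range(0, len(rows), unit_size)]


def parallel_sum(rows, workers=4):
    """Sum all rows by handing each worker one unit and combining the partial sums."""
    units = split_into_units(rows, workers)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partial_sums = list(pool.map(sum, units))
    return sum(partial_sums)


rows = list(range(1, 1001))
total = parallel_sum(rows, workers=4)
print(total)
```

How the units are chosen is the interesting design decision; here they are simply equal-sized contiguous slices.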

Typically, there are two approaches to implementing parallel processing in database systems. The main difference between them is whether a physical data layout, that is, static data partitioning, is required as a prerequisite for parallel processing.

Oracle Database Parallel Processing Technology: static parallelism through physical data partitioning (non-shared, or shared-nothing)

In a pure non-shared database architecture, database files must be partitioned across the nodes of a multi-computer system to enable parallel processing. Each node owns a subset of the data, and all access to that subset is performed exclusively by the owning node, using a single process or thread. Data within a partition cannot be accessed in parallel. (Sometimes the term "virtual processor" is used instead of "node". A virtual processor is a mechanism for simulating non-shared nodes on an SMP computer. For simplicity, we use the term "node" when discussing non-shared architectures.) In other words, a pure non-shared system uses a partitioned, restricted-access scheme to divide the work among multiple processing nodes. Changes in data ownership are rare; typical causes are database reorganization and the addition or removal of nodes to adapt to changing business needs. In a non-shared system, such a change in data ownership always means manual administration.
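The exclusive ownership described above can be sketched in a few lines of Python. The modulo placement rule and the names `owner_node`, `partition_rows`, and `nodes_needed` are assumptions made for illustration; real non-shared systems use their own placement schemes.

```python
def owner_node(key, node_count):
    """Static ownership rule (hypothetical modulo scheme): the one node that owns a key."""
    return key % node_count


def partition_rows(keys, node_count):
    """Place each key on the single node that owns it."""
    partitions = {node: [] for node in range(node_count)}
    for key in keys:
        partitions[owner_node(key, node_count)].append(key)
    return partitions


def nodes_needed(requested_keys, node_count):
    """Nodes that must take part in a request; no other node may read the data."""
    return sorted({owner_node(key, node_count) for key in requested_keys})


partitions = partition_rows(range(12), node_count=4)
single_partition_query = nodes_needed([1, 5, 9], node_count=4)
full_scan = nodes_needed(range(12), node_count=4)
print(partitions)
print(single_partition_query, full_scan)
```

In this sketch a request for keys 1, 5, and 9 can only be served by one node, however many other nodes are idle, while a full scan requires every node to participate.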

Conceptually, a pure non-shared system is very similar to a distributed database. To perform the required read/write operations, a transaction on one node must send messages to the other nodes that own the data it needs and coordinate the work done on those nodes. Passing a message to another node and asking it to perform a specific operation (function) on its own data set is called function transfer. If, on the other hand, plain data is requested from a remote node, the complete data set must be accessed and sent from the owning node to the requesting node (data transfer).
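The difference between the two mechanisms can be pictured with a small Python sketch. The `Node` class and its method names are hypothetical; the point is only that function transfer moves a small request and a small result, whereas data transfer moves the whole data set to the requester.

```python
class Node:
    """A node that exclusively owns a set of rows (illustrative only)."""

    def __init__(self, rows):
        self.rows = list(rows)

    def run_function(self, function):
        """Function transfer: apply the function locally and return only its result."""
        return function(self.rows)

    def send_rows(self):
        """Data transfer: return a full copy of the local data set to the requester."""
        return list(self.rows)


def count_multiples_of_ten(rows):
    return sum(1 for row in rows if row % 10 == 0)


owner = Node(range(1, 101))

# Function transfer: one value crosses the interconnect.
count_via_function = owner.run_function(count_multiples_of_ten)

# Data transfer: all 100 rows cross the interconnect, then the requester computes.
shipped_rows = owner.send_rows()
count_via_data = count_multiples_of_ten(shipped_rows)

print(count_via_function, count_via_data, len(shipped_rows))
```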

Parallel processing in a non-shared architecture works like a distributed database. Each node owns its data partition exclusively. No other node can access this data, which makes the owning node both the single point of access and a single point of failure.

This method has some basic disadvantages and cannot meet the requirements of high-end environments for scalability and high availability:

(1) First, the non-shared approach is not optimal on shared-everything SMP hardware. To gain the benefits of parallel processing, the data must be physically partitioned. On a shared-everything SMP system, where every processor has direct and equal access to all data, this is clearly an artificial and outdated requirement.

(2) Second, the strictly partition-based parallel processing strategy of the non-shared approach often leads to skewed resource utilization, for example when not all partitions of a table need to be accessed, or when a large non-partitioned table owned by a single node is part of the operation. In these cases, the tight ownership model, which rules out parallel processing within a partition, cannot use all the available processing power and therefore cannot deliver the best possible performance.
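A deliberately idealized model makes the utilization argument concrete. The Python sketch below ignores coordination and I/O overhead, and the function names and work figures are invented for illustration; it only compares "each node processes its own partition" with "any worker may process any data".

```python
def static_elapsed(partition_work):
    """Each node processes only its own partition, so the busiest node sets the elapsed time."""
    return max(partition_work)


def dynamic_elapsed(partition_work, worker_count):
    """Any worker may read any data, so total work is spread evenly (idealized)."""
    return sum(partition_work) / worker_count


def utilization(partition_work, worker_count, elapsed):
    """Fraction of the available processing capacity that did useful work."""
    return sum(partition_work) / (worker_count * elapsed)


# Work units per node for a query that touches only one of four partitions.
partition_work = [100, 0, 0, 0]

t_static = static_elapsed(partition_work)
t_dynamic = dynamic_elapsed(partition_work, worker_count=4)

print(t_static, utilization(partition_work, 4, t_static))
print(t_dynamic, utilization(partition_work, 4, t_dynamic))
```

Under these assumptions the static scheme leaves three of the four nodes idle, while the dynamic scheme keeps all four busy.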

(3) Third, because nodes are tied to physical data partitions, a non-shared system is inflexible in adapting to changing business needs. When the business grows, the system cannot easily be expanded incrementally to keep up. Either you upgrade all existing nodes, which keeps them symmetric and avoids repartitioning the data but is in most cases too expensive, or you add new nodes and reorganize (physically repartition) the existing database. A solution that needs no reorganization is always better than one that does, even if the most sophisticated reorganization tools are available.
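To give a feel for the cost of physical repartitioning, the following Python sketch counts how many keys change owner when a node is added. It assumes a simple modulo placement rule, which is a hypothetical choice for this example; other placement schemes move less data, so the figure illustrates the problem rather than measuring any particular product.

```python
def owner_node(key, node_count):
    """Hypothetical static placement rule: key modulo the number of nodes."""
    return key % node_count


def moved_fraction(keys, old_node_count, new_node_count):
    """Fraction of keys whose owning node changes after the node count changes."""
    keys = list(keys)
    moved = sum(
        1 for key in keys
        if owner_node(key, old_node_count) != owner_node(key, new_node_count)
    )
    return moved / len(keys)


fraction = moved_fraction(range(12000), old_node_count=4, new_node_count=5)
print(fraction)
```

With this placement rule, growing from four to five nodes relocates most of the keys, which is why such an expansion amounts to a reorganization of the database.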

(4) Finally, because of its strictly restricted access model, a non-shared system cannot fully exploit the fault tolerance that a cluster system can provide to ensure high system reliability.

There is no doubt that a non-shared architecture based on static data distribution can parallelize and scale well under laboratory conditions. In every real environment, however, the problems described above must be solved properly to meet the requirements of today's high-end, mission-critical applications.

Dynamic Parallel Execution in Oracle Database Parallel Processing Technology: sharing everything

With Oracle's dynamic parallel processing framework, all data is shared. The decisions about how to parallelize an operation and divide the work into smaller units are not restricted by any predefined static data distribution made when the database was created.

Because it can construct unrestricted, optimized data subsets for each statement, dynamic parallelism at execution time can provide scalability equal to or even better than that of a non-shared architecture.

Each query has its own characteristics in how it accesses, joins, and processes different parts of the data. Therefore, each SQL statement is optimized and parallelized when it is parsed. If the data changes, a better parallel execution plan becomes available, or a node is added to the system, Oracle adapts to the new situation automatically. This provides the highest degree of flexibility for parallelizing any kind of operation:

(1) Before a statement is executed, the physical data subsets to be accessed in parallel are dynamically optimized for the requirements of each query.

(2) The degree of parallelism is optimized for each query. Unlike in a non-shared environment, there is no required minimum degree of parallelism; a non-shared system must invoke all nodes, because that is the basic requirement for accessing all of the data.

(3) Operations can run in parallel on one or more Real Application Clusters nodes, depending on the current workload and on the characteristics and importance of the query.
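One way to picture per-query data subsets is to split a table's block range into ranges at execution time, sized according to the degree of parallelism chosen for that query. The Python sketch below is an illustration under stated assumptions, not Oracle's actual algorithm: the function name, the `granules_per_server` parameter, and the even-split rule are all hypothetical.

```python
def block_range_granules(first_block, last_block, degree_of_parallelism,
                         granules_per_server=3):
    """Split an inclusive block range into roughly equal ranges, several per server."""
    total_blocks = last_block - first_block + 1
    if total_blocks <= 0:
        return []
    granule_count = min(total_blocks, degree_of_parallelism * granules_per_server)
    base, extra = divmod(total_blocks, granule_count)
    granules = []
    start = first_block
    for index in range(granule_count):
        size = base + (1 if index < extra else 0)  # spread the remainder over the first ranges
        granules.append((start, start + size - 1))
        start += size
    return granules


# The same table, divided differently for two queries with different degrees of parallelism.
small_query = block_range_granules(0, 99, degree_of_parallelism=2, granules_per_server=2)
large_query = block_range_granules(0, 99, degree_of_parallelism=4)
print(small_query)
print(len(large_query))
```

Because the ranges are computed per statement, nothing about the physical layout has to change when a different degree of parallelism is chosen.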

Once a statement has been optimized and parallelized, all subsequent parallel subtasks are known. The original process becomes the query coordinator. Parallel execution (PX) servers are allocated from the common pool of parallel execution servers on one or more nodes and execute the operation in parallel.
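The coordinator-and-pool arrangement can be sketched as follows. This Python model is purely illustrative: the `ServerPool` class, its "most idle node first" rule, and the node names are assumptions and do not describe how Oracle actually selects parallel execution servers.

```python
class ServerPool:
    """A common pool of idle parallel execution servers spread over several nodes."""

    def __init__(self, servers_per_node):
        # free[node] = number of idle servers currently available on that node
        self.free = dict(servers_per_node)

    def allocate(self, requested):
        """Grant up to `requested` servers, taking each from the node with the most idle servers."""
        granted = []
        for _ in range(requested):
            node = max(self.free, key=self.free.get)
            if self.free[node] == 0:
                break  # the pool is exhausted; the request runs with fewer servers
            self.free[node] -= 1
            granted.append(node)
        return granted

    def release(self, granted):
        """Return servers to the pool when the statement finishes."""
        for node in granted:
            self.free[node] += 1


pool = ServerPool({"node1": 2, "node2": 3})
first_query = pool.allocate(4)    # the coordinator asks for four servers
second_query = pool.allocate(3)   # only one server is left for a concurrent request
pool.release(first_query)
pool.release(second_query)
print(first_query, second_query, pool.free)
```

The second request receiving fewer servers than it asked for mirrors the idea that the degree of parallelism depends on the resources available at the time.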

As in a non-shared architecture, each parallel processing server in the shared architecture works independently on its own data subset. The mechanisms for transferring data or functions between parallel processes are similar or identical to those of the non-shared architecture described above. Once the parallel plan for a request has been determined, each parallel processing server knows its data set and its task, and inter-process communication is as rare as in a non-shared environment.

Unlike in the non-shared architecture, however, each SQL statement processed in parallel can be optimized without regard to any restrictions imposed by the physical database layout. This allows an optimal data subset to be constructed for each parallel operation, providing better scalability and performance than a purely non-shared architecture. Wherever it is beneficial, consecutive steps of a parallel operation are combined and processed by a single parallel processing server, which reduces the need for data transfer or function transfer.

Oracle Database Parallel Processing Technology: why is sharing everything better than sharing nothing?

The non-shared architecture dates back to the time when massively parallel processing (MPP) systems were the only hardware architecture that could provide scalable high-end parallel computing. Each node in an MPP system has its own system components (CPU, memory, and disk), works on a different subtask, and cannot share any of its resources.

That time is over. Today, the most successful and widely used parallel hardware systems are symmetric multiprocessor (SMP) systems, either stand-alone or as loosely coupled clusters. An SMP system uses multiple processors that share common memory and disk resources, which is why it is also called a "share everything" system.

Supporters of a purely non-shared architecture always claim that shared-everything architectures, especially in cluster environments, lack scalability in high-end environments and incur significant overhead, and therefore cannot be used for high-end applications with a high degree of parallelism and/or concurrency. This claim is incorrect. Today's hardware and software technologies, such as high-speed cluster interconnects and the Cache Fusion architecture of Oracle Real Application Clusters, have solved these past problems.

Oracle's dynamic parallel processing framework is based on the same parallel computing fundamentals as non-shared software. It has all the advantages of that approach, extends its capabilities, and overcomes the architectural disadvantages of the non-shared approach. Software based on the non-shared principle can be regarded as the first, but now outdated, generation of database parallel processing software.

This article introduces the Parallel Processing Technology for Oracle databases. I hope you can learn more about the parallel processing technology for Oracle databases.

This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only. This website makes no representation or warranty of any kind, either expressed or implied, as to the accuracy, completeness ownership or reliability of the article or any translations thereof. If you have any concerns or complaints relating to the article, please send an email, providing a detailed description of the concern or complaint, to info-contact@alibabacloud.com. A staff member will contact you within 5 working days. Once verified, infringing content will be removed immediately.
