Oracle RAC Learning Notes: Basic Concepts and Getting Started (April 19, 2010, 10:39). Source: the book's blog. Editor: Xiao Xiong
Oracle 10g Real Application Clusters Introduction
1. What is a cluster
A cluster is made up of two or more independent, network-connected servers. Hardware vendors have offered cluster solutions for a variety of requirements for years. Some clusters are intended only for high availability: when the current active node fails, the workload is transferred to a surviving node. Others are designed for scalability, distributing connections and work across the nodes. Another common feature of a cluster is that, to an application, it appears as a single server; likewise, managing several servers should be as simple as managing one. The cluster manager software provides this functionality.
Because the nodes are separate servers, data files must be stored where every node can access them. Several different storage topologies address this data access problem; which one is used depends mainly on the primary goals of the cluster design.
A dedicated physical network connection, the interconnect, links the cluster nodes to each other for direct node-to-node communication.
In short, a cluster is a set of independent servers that work together to form a single system.
2. What is Oracle Real Application Clusters (RAC)
RAC is software that enables you to use cluster hardware by running multiple instances against the same database. The database files are stored on disks that are physically or logically connected to every node, so that each active instance can read from and write to them.
The RAC software manages access to the data, so that change operations are coordinated between the instances and every instance sees the same image of the data.
The RAC architecture also provides redundancy: applications can still reach the database through the remaining instances even when one system crashes or becomes inaccessible.
3. Why use RAC
RAC can take advantage of standard, modular cluster hardware to reduce server costs.
RAC automatically provides workload management for services. Application services can be grouped or categorized into business components that complete application tasks. Services in RAC enable continuous, uninterrupted database operations and support multiple services on multiple instances. A service can be designed to run on one or more instances, with alternate instances serving as backups. If the primary instance fails, Oracle moves the service from the failed instance to a surviving alternate instance. Oracle also automatically balances the connection load across instances.
RAC uses several inexpensive computers to deliver, together, the database service of a single large computer, serving workloads that otherwise only a large SMP machine could handle.
Because RAC is built on a shared-disk architecture, capacity can be increased or reduced on demand without manually partitioning the data across the cluster, and servers can simply be added to or removed from the cluster.
4. Clusters and scalability
If your application scales transparently on a symmetric multiprocessing (SMP) machine, it is likely to scale on RAC as well, without any changes to the application code.
When a node fails, RAC excludes the failed database instance and the node itself from the cluster, guaranteeing database integrity.
Here are some examples of scalability:
* More concurrent batch processes are allowed.
* A higher degree of parallel execution is possible.
* The number of concurrent users in an OLTP system can be greatly increased.
1) Levels of scalability: there are four main levels
* Hardware scalability: interconnectivity is the key, and it generally depends on high bandwidth and low latency.
* Operating system scalability: within the OS, the synchronization method can determine the scalability of the system. In some cases, the potential scalability of the hardware is lost because the OS cannot maintain concurrent access to multiple requested resources.
* Database management system scalability: a key factor in a parallel architecture is whether concurrency is coordinated by internal or external processes. The answer to this question affects the synchronization mechanism.
* Application-level scalability: applications must be explicitly designed to scale. If, in most cases, every session updates the same data, a bottleneck occurs. This applies not only to RAC but to single-instance systems as well.
It should be clear that if any one level does not scale, the clustered workload may fail to scale regardless of how scalable the other levels are. The typical cause of poor scalability is contention for a shared resource, which forces concurrent operations to serialize at that bottleneck. This is a limitation not only in RAC, but in all architectures.
2) Scaleup and speedup
* Scaleup is the ability to sustain the same level of performance (response time) when both workload and resources are increased proportionally:
scaleup = (volume parallel) / (volume original), with response time held constant
* Speedup is the proportional reduction in execution time obtained by adding resources to complete a fixed workload:
speedup = (time original) / (time parallel)
In both formulas, the time spent on IPC (shorthand for interprocess communication) is overhead that reduces the ideal gain.
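As a worked illustration (the figures are invented for this note, not taken from any measurement): suppose a fixed workload takes 200 seconds on one node and 60 seconds on four nodes, of which 10 seconds is IPC overhead. The arithmetic can even be done from SQL*Plus:

```sql
-- Hypothetical figures: 200 s on 1 node, 60 s on 4 nodes, 10 s of IPC.
-- The ideal (linear) speedup on 4 nodes would be 4.
SELECT 200 / 60        AS measured_speedup,    -- about 3.33
       200 / (60 - 10) AS speedup_without_ipc  -- 4, the linear ideal
FROM dual;
```

The gap between the two columns is exactly the IPC overhead term in the formula above.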
RAC Architecture and Concepts
1. RAC software principles
In a RAC instance you will see background processes that do not exist in an ordinary single instance. They maintain database consistency across the instances and manage global resources, as follows:
* LMON: Global Enqueue Service monitor
* LMD0: Global Enqueue Service daemon
* LMSx: Global Cache Service processes, where x ranges from 0 to j
* LCK0: lock process
* DIAG: diagnosability process
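You can check which of these background processes are running on an instance with a quick query (a sketch; the name filter simply matches the processes listed above):

```sql
-- RAC-specific background processes that are currently running
-- (a non-zero PADDR means the process exists).
SELECT name, description
FROM   v$bgprocess
WHERE  paddr <> '00'
AND    (name LIKE 'LM%' OR name IN ('LCK0', 'DIAG'));
```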
At the cluster layer you will find the main Cluster Ready Services processes, which provide a standard cluster interface on all platforms and perform high-availability operations. The following processes can be seen on each cluster node:
* CRSD and RACGIMON: the engine for high-availability operations
* OCSSD: provides cluster membership and group services
* EVMD: event detection process, runs as the oracle user
* OPROCD: cluster monitoring process
There are also several tools for managing the various cluster resources at the global level: the ASM instances, the RAC database, the services, and the CRS node applications. The tools covered here are mainly Server Control (SRVCTL), DBCA, and Enterprise Manager.
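As a taste of SRVCTL (a sketch: the database name RACDB, instance name RACDB1, and node name node1 are made-up placeholders), typical commands look like this:

```sh
# Status of the cluster database and all of its instances
srvctl status database -d RACDB
# Start or stop one instance
srvctl start instance -d RACDB -i RACDB1
srvctl stop instance -d RACDB -i RACDB1
# Node applications (VIP, GSD, ONS, listener) on one node
srvctl status nodeapps -n node1
```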
2. RAC Software Storage principle
The RAC installation of Oracle 10g is divided into two stages. The first stage is the installation of CRS, followed by the installation of the database software with the RAC components and the creation of the cluster database. The Oracle Home used by the CRS software must be different from the Home used by the RAC software. Although the CRS and RAC software could be shared across the cluster by using a cluster file system, the software is normally installed in the local file system of each node. This supports rolling patch upgrades and avoids making the software a cluster-wide single point of failure. In addition, two files must be stored on a shared storage device:
* Voting file: used essentially by the Cluster Synchronization Services daemon to monitor node membership. Its size is about 20 MB.
* Oracle Cluster Registry (OCR) file: also a key part of CRS. It maintains information about the high-availability components of the cluster, such as the cluster node list, the mapping of cluster database instances to nodes, and the list of CRS application resources (services, virtual IP addresses, and so on). This file is maintained automatically by administration tools such as SRVCTL. Its size is about 100 MB.
The voting file and the OCR file cannot be stored in ASM because they must be accessible before any Oracle instance starts. Both must be placed on redundant, reliable storage, such as RAID. The recommended best practice is to put these files on raw devices.
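The location and health of these two files can be checked with the Clusterware command-line utilities (a sketch; exact commands and output vary by release):

```sh
# Where is the voting disk?
crsctl query css votedisk
# Check the OCR file and its integrity
ocrcheck
```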
3. The structure of OCR
The configuration information for the cluster is maintained in the OCR. The OCR relies on a distributed shared-cache architecture to optimize queries against the cluster repository. Each node in the cluster maintains a copy of the OCR in memory, accessed through its local OCR process. In fact, only one OCR process in the cluster actually reads and writes the OCR file on shared storage. This process is responsible for refreshing its own local cache as well as the OCR caches of the other nodes. For reads of the cluster repository, OCR clients talk directly to the local OCR process. When a client needs to update the OCR, its local OCR process forwards the change to the process that performs the physical reads and writes of the OCR file.
OCR client applications include Oracle Universal Installer (OUI), SRVCTL, Enterprise Manager (EM), DBCA, DBUA, NETCA, and the Virtual IP Configuration Assistant (VIPCA). In addition, the OCR maintains dependency and status information for the application resources defined within CRS, in particular databases, instances, services, and node applications.
The configuration file is ocr.loc, and its configuration variable is ocrconfig_loc. The location of the cluster repository is not restricted to raw devices: the OCR can be placed on shared storage managed by a cluster file system.
Note: the OCR is also used as the configuration file for single-instance ASM, with one OCR per node.
4. RAC Database Storage principle
The main difference from single-instance Oracle storage is that in RAC all data files must be stored on shared devices (either raw devices or a cluster file system) so that every instance of the database can access them. At least two redo log groups must be created for each instance, and all redo log groups must also be stored on shared devices so that any instance can perform crash recovery for any other. The set of online redo log groups of one instance is called that instance's online redo thread.
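For example, adding and enabling a redo thread for a second instance can be sketched as follows (the group numbers, raw device names, and sizes are invented for illustration):

```sql
-- Add two redo log groups to thread 2 on shared storage,
-- then enable the thread so instance 2 can use it.
ALTER DATABASE ADD LOGFILE THREAD 2
  GROUP 3 ('/dev/raw/raw_redo2_1') SIZE 100M,
  GROUP 4 ('/dev/raw/raw_redo2_2') SIZE 100M;
ALTER DATABASE ENABLE PUBLIC THREAD 2;
```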
In addition, for Oracle's recommended automatic undo management feature, you must create a dedicated undo tablespace for each instance on shared storage. Each undo tablespace must be accessible to all instances, primarily for recovery purposes.
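A sketch of the per-instance undo configuration (the tablespace names, the raw device path, and the SID values RAC1/RAC2 are invented):

```sql
-- One undo tablespace per instance, created on shared storage
CREATE UNDO TABLESPACE undotbs2
  DATAFILE '/dev/raw/raw_undo2' SIZE 500M;
-- In the shared SPFILE, point each instance at its own undo tablespace
ALTER SYSTEM SET undo_tablespace = 'UNDOTBS1' SCOPE = SPFILE SID = 'RAC1';
ALTER SYSTEM SET undo_tablespace = 'UNDOTBS2' SCOPE = SPFILE SID = 'RAC2';
```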
Archived logs cannot be stored on raw devices because their names are generated automatically and differ from one another, so they must be kept in a file system. If you use a cluster file system (CFS), the archived logs are accessible from any node at any time. If you do not use CFS, you must make them available to the other cluster members during recovery in some other way, for example through a network file system (NFS). If you use the recommended Flash Recovery Area feature, it must be stored in a shared location so that all instances can access it (the shared location can be an ASM disk group or a CFS).
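These destinations can be sketched with initialization parameters like the following (the paths and sizes are invented, and /cfs is assumed to be a cluster file system mount visible from every node):

```sql
-- Archived logs go to a CFS directory visible from every node
ALTER SYSTEM SET log_archive_dest_1 = 'LOCATION=/cfs/arch' SCOPE = SPFILE;
-- Flash Recovery Area on shared storage (could also be an ASM disk group)
ALTER SYSTEM SET db_recovery_file_dest_size = 10G SCOPE = SPFILE;
ALTER SYSTEM SET db_recovery_file_dest = '/cfs/flash_recovery_area' SCOPE = SPFILE;
```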
5. RAC and shared storage technologies
Storage is a key component of grid technology. Traditionally, storage was directly attached to each individual server (direct attached storage, DAS). Over the past few years, more flexible storage has emerged and come into use, accessed mainly over storage area networks or plain Ethernet. These new storage methods allow multiple servers to access the same set of disks, simplifying data access in distributed environments.
The storage area network (SAN) represents the current point in the evolution of data storage technology. Traditionally, data in client/server systems was stored inside the server or on devices directly attached to it. Next came network attached storage (NAS), which separated the storage devices from the servers and connected them through a network. The SAN carries this principle further by letting storage devices exist on their own network, exchanging data directly over high-speed media. Users reach the data on the storage devices through server systems that are connected to both the local area network (LAN) and the SAN.
The choice of file system is key for RAC. Traditional file systems do not support concurrent mounting by more than one system. Therefore, the files must be kept either on raw volumes with no file system at all or on a file system that supports concurrent access by multiple systems.
Therefore, the three main approaches to RAC shared storage are:
* Raw volumes: directly attached raw devices used for storage, operated in block mode.
* Cluster file system: also accessed in block mode. One or more cluster file systems can store all RAC files.
* Automatic Storage Management (ASM): a lightweight, dedicated cluster file system optimized for Oracle database files.
6. Oracle Cluster File System
Oracle Cluster File System (OCFS) is a shared file system designed specifically for Oracle RAC. OCFS removes the need to bind Oracle database files to logical drives and lets all nodes share a single Oracle Home instead of keeping a local copy on each node. OCFS volumes can span one or more shared disks for redundancy and better performance.
The following classes of files can be placed on OCFS:
* Oracle software installation files: in 10g this is supported only on Windows 2000; later versions were said to add Linux support, but I have not verified this.
* Oracle files (control files, data files, redo logs, BFILEs, and so on)
* Shared configuration files (SPFILE)
* Files created by Oracle at run time
* Voting and OCR files
Oracle Cluster File System is free for developers and users and can be downloaded from the official website.
7. Automatic Storage Management (ASM)
ASM is a new feature of 10g. It provides a vertically integrated file system and volume manager dedicated to Oracle database files. ASM can manage storage for a single SMP machine or across the multiple nodes of an Oracle RAC cluster.
ASM eliminates manual I/O tuning by automatically spreading the I/O load across all available resources to optimize performance. It helps the DBA manage a dynamic database environment by allowing the database to grow without shutting it down to adjust storage allocation.
ASM can maintain redundant copies of the data to improve fault tolerance, or it can be built on top of reliable external storage.
8. Advantages of choosing raw devices or CFS
* CFS: easy installation and management of RAC; Oracle Managed Files (OMF) can be used with RAC; a single, shared Oracle software installation; Oracle data files can be auto-extended; and uniform access to the archived logs when a physical node fails.
* Raw devices: generally used when CFS is not available or not supported by Oracle. Raw devices give the best performance, since no intermediate layer sits between Oracle and the disk, but if space runs out, auto-extension on a raw device fails. ASM, the Logical Storage Manager, or a logical volume manager can simplify working with raw devices; they also allow space to be added to raw devices online and let you create easily managed names for them.
9. Typical cluster stack of RAC
Each node in the cluster needs a supported interconnect software protocol to carry inter-instance traffic, and TCP/IP support is required for CRS polling. On all UNIX platforms, User Datagram Protocol (UDP) over Gigabit Ethernet is the main protocol for inter-instance IPC in RAC. Other supported vendor-specific protocols include Remote Shared Memory for SCI and SunFire Link interconnects, and Hyper Messaging Protocol for HyperFabric interconnects. In every case, the interconnect must be certified by Oracle for the platform.
Using Oracle Clusterware reduces installation and support complexity. However, vendor clusterware may be required if you use non-Ethernet interconnects or if you run clusterware-dependent applications alongside RAC.
As with the interconnect, the shared storage solution must be certified by Oracle for the platform. If a CFS is available on the target platform, both the database area and the flash recovery area can be created on CFS or ASM. If no CFS is available, the database area can be created on ASM or on raw devices (with a volume manager), and the flash recovery area must then be created in ASM.
10. RAC certification matrix: it is designed to answer any RAC-related certification questions. Use the following steps:
* Connect to and log in at http://metalink.oracle.com
* Click the "Certify and Availability" button on the menu bar
* Click the "View Certifications by Product" link
* Select RAC
* Choose the correct platform
11. The necessary global resources
In a single-instance environment, locks coordinate access to a shared resource such as a row in a table. A lock prevents two processes from modifying the same resource at the same time.
In a RAC environment, inter-node synchronization is critical: it keeps the processes on different nodes consistent and prevents them from modifying the same resource data simultaneously. Inter-node synchronization ensures that each instance sees the most recent version of a block in its buffer cache and that no instance works on a stale copy.
1) Coordination of global Resources
Cluster operation requires that access to shared resources be synchronized across all instances. RAC uses the Global Resource Directory (GRD) to record resource usage information for the cluster database. The Global Cache Service (GCS) and the Global Enqueue Service (GES) manage the information in the GRD.
Each instance maintains part of the GRD in its local SGA. For each resource, GCS and GES designate one instance to manage all of that resource's information; this instance is called the resource master. Every instance knows which instance masters which resource.
Maintaining cache coherency is an important part of RAC activity. Cache coherency is the technique of keeping the multiple versions of a block consistent across the different Oracle instances. GCS implements cache coherency through the so-called Cache Fusion algorithm.
GES manages all inter-instance resource operations that are not handled by Cache Fusion and tracks the status of all Oracle enqueues. Its main controlled resources are the dictionary cache locks and the library cache locks. It also performs deadlock detection for all deadlock-sensitive enqueues and resources.
2) Global cache coordination: an example
Suppose a data block has been modified (made dirty) by the first instance, and that cluster-wide there is only this one copy of the block, stamped with its SCN. The steps are as follows:
① The second instance, wanting to modify the block, sends a request to the GCS.
② The GCS forwards the request to the holder of the block; here the first instance is the holder.
③ The first instance receives the message and sends the block to the second instance. The first instance keeps the dirty buffer for recovery purposes; this dirty image of the block is called a past image of the block. A past image block can no longer be modified.
④ On receiving the block, the second instance informs the GCS that it now holds the block.
3) Write-to-disk coordination: an example
The caches of the instances in the cluster may contain different modified versions of the same block. The write protocol managed by GCS ensures that only the most recent version is written to disk and that all earlier versions are purged from the other caches. A write request can originate from any instance, whether it holds the current version of the block or a past image. Suppose the first instance, which holds a past image of the block, asks Oracle to write the buffer to disk. The procedure is as follows:
① The first instance sends a write request to the GCS.
② The GCS forwards the request to the second instance, the holder of the current version of the block.
③ On receiving the write request, the second instance writes the block to disk.
④ The second instance informs the GCS that the write has completed.
⑤ The GCS then orders all holders of past images of the block to discard them; the images are no longer needed for recovery.
12. RAC and Instance/crash Recovery
1) When an instance fails and the failure is detected by another instance, that second instance performs the following recovery steps:
① In the first phase of recovery, GES remasters its enqueues.
② The GCS then remasters its resources. The GCS process remasters only those resources that have lost their master. During this time, all GCS resource requests and write requests are temporarily suspended; however, transactions can continue to modify data blocks as long as they have already acquired the necessary resources.
③ After enqueue reconfiguration, one active instance takes possession of the instance recovery enqueue. While the GCS resources are being remastered, SMON determines the set of blocks that needs recovery, called the recovery set. Because Cache Fusion ships block contents from one instance to a requesting instance without writing the blocks to disk, the on-disk version of a block may lack the modifications made by processes of other instances. SMON therefore has to merge the redo logs of all failed instances to determine the recovery set; a failed thread may leave holes in the redo stream that must be filled for the affected blocks, so the redo thread of a failed instance cannot simply be applied serially. The redo threads of the active instances need no recovery, because SMON can use the past and current images held in their buffer caches.
④ Buffer space for recovery is allocated, and the resources identified while reading the redo logs are claimed as recovery resources. This prevents other instances from accessing those resources.
⑤ All resources required for the subsequent recovery operations are now held, and the GRD is unfrozen. Any data block that does not need recovery can be accessed, so the system is already partially available. If a past image or current image of a block to be recovered exists in some cache in the cluster database, the most recent image is the starting point of recovery for that block. If neither a past image nor a current image of the block is in any active instance's cache, SMON performs a log merge of the failed instances. SMON recovers and writes each block identified in step ③, releasing its recovery resources immediately afterward, so that more and more blocks become available as recovery proceeds.
When all blocks have been recovered and the recovery resources released, the system is fully available again.
Note: during recovery, the cost of the log merge is proportional to the number of failed instances and to the size of each instance's redo logs.
2) Instance Recovery and database availability
The following shows how much of the database is available at each step of an instance recovery:
A. RAC runs on multiple nodes
B. A node failure is detected
C. The enqueue portion of the GRD is reconfigured and resource management is redistributed among the surviving nodes. This step completes quickly.
D. The buffer cache portion of the GRD is reconfigured, and SMON reads the redo logs of the failed instance to identify the set of blocks that needs recovery.
E. SMON issues requests to the GRD to obtain all the database blocks in the recovery set. Once the requests complete, all other blocks are accessible.
F. Oracle performs roll-forward recovery. The redo logs of the failed thread are applied to the database, and blocks that are fully recovered become accessible immediately.
G. Oracle performs rollback recovery, applying undo blocks to the database for transactions that had not yet committed.
H. Instance recovery is complete and all data is accessible.
13. Efficient inter-node row-level locking
Oracle supports efficient row-level locks. These row-level locks are created mainly during DML operations such as UPDATE and are held until the transaction commits or rolls back. Any process requesting a lock on the same row waits.
Cache Fusion block transfers are independent of these user-visible row-level locks. The transfer of a block by GCS is a low-level operation that can begin without waiting for the current row-level locks to be released: a block may be shipped from one instance to another even while rows in it remain locked.
GCS provides access to data blocks in a way that allows multiple transactions to proceed concurrently.
14. Additional memory requirements of RAC
Most RAC-specific memory is allocated from the shared pool when the SGA is created. Because blocks may be cached in multiple instances, larger buffer caches are also required. Therefore, when you migrate a single-instance database to RAC while keeping the per-instance workload unchanged, increase each RAC instance's buffer cache by about 10% and its shared pool by about 15%. These values are only empirical starting points for RAC sizing; the actual requirements are typically higher.
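For example (the numbers are invented): migrating a single-instance database that used an 800M buffer cache and a 400M shared pool, the starting RAC values would be about 880M and 460M:

```sql
-- 800M * 1.10 = 880M buffer cache, 400M * 1.15 = 460M shared pool
ALTER SYSTEM SET db_cache_size = 880M SCOPE = SPFILE;
ALTER SYSTEM SET shared_pool_size = 460M SCOPE = SPFILE;
```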
If you are using the recommended automatic memory management feature, you can make this adjustment by modifying the SGA_TARGET initialization parameter. On the other hand, since the same user population is spread across multiple nodes, the memory requirement of each individual instance may decrease.
The actual resource usage of the GCS and GES entries can be queried on each instance through the CURRENT_UTILIZATION and MAX_UTILIZATION columns of the V$RESOURCE_LIMIT view, as follows:
SELECT resource_name, current_utilization, max_utilization
FROM v$resource_limit
WHERE resource_name LIKE 'g%s_%';
15. RAC and parallel execution
The Oracle optimizer is based on execution cost, and it takes the cost of parallel execution into account when choosing the optimal execution plan.
In a RAC environment, the optimizer's parallelism choices cover both intra-node and inter-node parallelism. For example, if a particular query requires six query processes to complete and six idle parallel execution slaves are available on the local node, the query is executed using local resources only. This demonstrates efficient intra-node parallelism, avoiding the coordination overhead of a multi-node parallel query. If only two parallel execution slaves are available on the local node, those two processes execute the query together with four processes from another node. In that case, both intra-node and inter-node parallelism are used to speed up the query.
In real-world decision support applications, queries are not perfectly partitioned across the parallel execution servers, so some servers finish their work before others and become idle. Oracle's parallel execution technology dynamically detects idle processes and reassigns work from the table queues of overloaded processes to the idle ones. In this way, Oracle efficiently redistributes the query workload across all processes, and RAC extends this efficiency to the entire cluster.
16. Global dynamic performance views
Global dynamic performance views show information about all open instances accessing the RAC database. The standard dynamic performance views show only information about the local instance.
For every V$ view there is a corresponding GV$ view, apart from a few special cases. In addition to the columns of the V$ view, each GV$ view contains an extra column named INST_ID, which shows the instance number in the RAC. GV$ views can be queried from any open instance.
To query the GV$ views, the PARALLEL_MAX_SERVERS initialization parameter on each instance must be set to at least 1, because GV$ queries use a special form of parallel execution. The parallel execution coordinator runs on the instance the client is connected to, and one slave is allocated on each instance to query its local V$ view. If PARALLEL_MAX_SERVERS is 0 on some instance, no data is returned from that node; likewise, no result is obtained if all parallel servers are busy. In both cases, you get no warning or error message.
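For example, to see every open instance of the cluster database from any node:

```sql
-- One row per open instance; INST_ID identifies the node's instance
SELECT inst_id, instance_name, host_name, status
FROM   gv$instance
ORDER  BY inst_id;
```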
17. RAC and Service
18. Virtual IP Address and RAC
When a node fails completely, the virtual IP address (VIP) is essential to all affected applications. When a node fails, its VIP is automatically reassigned to another node in the cluster. When this happens:
* CRS binds the VIP to the MAC address of a network interface on another node. This is transparent to the user; directly connected clients receive errors.
* Subsequent packets destined for the VIP are directed to the new node, which answers with RST packets. This lets clients receive the error quickly and retry their connections against the surviving nodes.
If you do not use a VIP, clients connecting to a failed node's address may wait for a TCP timeout of about 10 minutes before receiving an error.