Some basic concepts of Oracle 11g RAC (IV)

After Grid Infrastructure has been installed, attention in a RAC build turns to installing the Oracle database software on the cluster. Grid Infrastructure provides the framework for running RAC, including cluster communication links, node fencing, and node membership services. ASM is Oracle's preferred storage option for the database; RAC leverages these concepts and extends the basic services as required. Once Grid Infrastructure/Clusterware is installed successfully, Oracle Universal Installer detects the cluster environment and offers the option of installing RAC across the whole cluster or on a subset of specified nodes. It is good practice to run the Cluster Verification Utility (cluvfy) to check that the prerequisites for the RDBMS installation are met. As with the cluster installation, Oracle Universal Installer first copies and links the software on the first node and then pushes the Oracle home to the other specified nodes.
Unlike Grid Infrastructure, the Oracle RDBMS software can also be installed on a shared cluster file system (such as OCFS2 or ACFS). This simplifies adding new nodes to the cluster, because no software has to be reinstalled on them, and it also simplifies patching, since only one Oracle home needs to be patched. However, such patches cannot be applied in a rolling fashion, so downtime is unavoidable. During installation, Oracle Universal Installer prompts the administrator to create or upgrade a database, or to install the software only. If a newer patch set has been released, it is usually better to install the software only, apply the patches, and then create the database.

Single-instance and RAC databases

RAC and single-instance databases differ in many important ways. In RAC, one database on shared storage is accessed concurrently by instances running on multiple servers. The database files, online redo log files, control files, and server parameter file must all be shared. Flashback logs, archived redo logs, Data Pump dump files, and Data Guard broker configuration files can also be shared, depending on your configuration (this is optional, but strongly recommended). When using ASM, each RAC node also has a local pfile that points to the spfile in the relevant disk group. Another local file is the Oracle password file. Users of a cluster file system usually place these files in a shared location and point to them from $ORACLE_HOME/dbs via symbolic links.

Database files

The database files contain all the data belonging to the database, including tables, indexes, the data dictionary, and compiled PL/SQL code. In RAC there is only one copy of each data file; it resides on shared storage and is accessed by all instances. Oracle does not mirror data files by default, and most sites rely on redundancy in the storage layer to protect against data loss caused by media failure. Oracle ASM can provide this redundancy when the storage array does not.

Control files

The control files store information about the physical structure of the database, including its status. If you use RMAN without a dedicated RMAN catalog database, RMAN backup metadata can also be kept in the control file. In both single-instance databases and RAC, the control file should be mirrored to protect against corruption and storage failures. When ASM and a flash recovery area are used together, multiplexing happens automatically: by default, Oracle multiplexes the control file into the disk groups specified by db_create_file_dest and db_recovery_file_dest, and if you use an spfile the control_files parameter is updated automatically. Bear in mind that the control files can become a point of contention in RAC because they are updated frequently, so do not create more copies than necessary and place them on fast storage.

Redo and archiving in RAC

In RAC, each instance has its own set of online redo logs, known as a thread. Thread information can be viewed in v$log and related views.
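For example, the mapping of redo log groups to threads can be checked with a simple query against v$log (the columns below are standard; the output, of course, depends on your configuration):

    SELECT thread#, group#, status, bytes/1024/1024 AS size_mb
      FROM v$log
     ORDER BY thread#, group#;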
Each thread needs at least two redo log groups, and if you are not using ASM or a flash recovery area you should consider manually multiplexing the members of each group. The spfile is responsible for the mapping between instances and threads (via the thread initialization parameter). When a new instance is added to the cluster, a corresponding redo thread is required. This can be achieved in two ways: manually, by executing the SQL statement ALTER DATABASE ADD LOGFILE THREAD y GROUP x, or automatically in a policy-managed database. The thread then has to be enabled before Oracle can use it.
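A minimal sketch of the manual approach, assuming a third instance is being added; the +DATA disk group and the 512 MB log size are placeholder values only:

    ALTER DATABASE ADD LOGFILE THREAD 3
      GROUP 7 ('+DATA') SIZE 512M,
      GROUP 8 ('+DATA') SIZE 512M;

    ALTER DATABASE ENABLE PUBLIC THREAD 3;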
The LGWR background process flushes the redo buffer to the online redo logs. The online redo logs must be placed on fast storage; otherwise they can become a point of contention, especially in systems with a high commit rate. Tuning a poorly designed application usually starts with reducing the commit frequency, and at the very least the redo logs and control files should be moved to fast storage to relieve some of the performance bottleneck. In systems with frequent log switches, increasing the number of redo log groups per thread can help, because it gives the archiver processes more time to archive the redo logs. This also used to help when the archiver had to ship archived redo to a standby database; however, most systems now use the Log Network Server (LNSn) processes to transmit redo asynchronously to the Remote File Server (RFS) process on the standby database. In Oracle 10.2 and 11.1 there is one LNS process per destination; in 11.2 the LNSn processes are replaced by the NSSn and NSAn background processes, where NSSn transmits redo synchronously and NSAn transmits it asynchronously. The guiding principle for sizing the online redo logs is that log switches should not be too frequent (AWR and Statspack can help determine an appropriate size). Oracle 11.2 also lets administrators choose the redo log block size: modern storage units often use 4 KB sectors instead of the traditional 512 bytes.
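As a complement to AWR or Statspack, the log switch frequency per thread can be estimated with a query along these lines (v$log_history is a standard view; the hourly grouping is just one reasonable choice):

    SELECT thread#,
           TO_CHAR(first_time, 'YYYY-MM-DD HH24') AS hour,
           COUNT(*) AS log_switches
      FROM v$log_history
     GROUP BY thread#, TO_CHAR(first_time, 'YYYY-MM-DD HH24')
     ORDER BY hour, thread#;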
When an instance in RAC fails, the redo from all threads is merged to build the recovery set, and the SMON (system monitor) process of a surviving instance performs the necessary roll forward and rollback operations.
Once LGWR has filled an online redo log, one of the archiver processes (ARCn) copies the file to the configured archive destination.
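To check whether the database runs in archivelog mode and where each instance writes its archived redo, something like the following can be used (both views are standard; only enabled destinations are of interest here):

    SELECT log_mode FROM v$database;

    SELECT dest_id, destination, status
      FROM v$archive_dest
     WHERE status = 'VALID';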
The flash recovery area, introduced in Oracle 10.1, is the best location for archived logs. If you do not use a flash recovery area, it is recommended to keep archived logs on a shared file system so that every node can access them. Unlike a single-instance database, RAC needs the archived logs from all threads: when an instance performs media recovery, you can see in its alert log that Oracle applies the log files of every thread.
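To confirm that archived redo from every thread is arriving at the shared destination, a query such as this can be used (DEST_ID 1 is only an assumption; substitute the destination you actually archive to):

    SELECT thread#, MAX(sequence#) AS last_archived_seq
      FROM v$archived_log
     WHERE dest_id = 1
     GROUP BY thread#
     ORDER BY thread#;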

Undo tablespaces are handled much like redo threads: each instance of a cluster database has its own undo tablespace, and a one-to-one mapping between instance and undo tablespace is configured in the spfile. This mapping does not mean that the undo tablespace is permanently bound to that instance; all other instances can also read it to create read-consistent before-images of blocks. When an instance is added to the cluster, a new undo tablespace must be created and mapped to it, just as with redo logs; in a policy-managed database Oracle does this automatically (a sketch of the corresponding SQL follows the storage option list below). Although manual undo management is still possible, automatic undo management (AUM) is strongly recommended.

Storage options for the RAC database

The administrator can choose from the following options:
  • ASM is the preferred storage option for Oracle and is the only configuration supported in RAC Standard Edition.
  • OCFS2
  • Raw devices; these are not recommended, both because they are deprecated in newer Linux kernels and because they are no longer supported in Oracle 11.2.
  • Network File System (NFS)
  • Red Hat Global File System (GFS), supported only on Red Hat Enterprise Linux and Oracle Enterprise Linux; it can be used for the flash recovery area as well as for database files.
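Returning to the undo discussion above, here is a minimal sketch of preparing undo for a new third instance; the tablespace name UNDOTBS3, the instance name PROD3, the +DATA disk group, and the 2 GB size are all placeholder assumptions:

    CREATE UNDO TABLESPACE undotbs3
      DATAFILE '+DATA' SIZE 2G AUTOEXTEND ON;

    ALTER SYSTEM SET undo_tablespace = 'UNDOTBS3'
      SCOPE = SPFILE SID = 'PROD3';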
RAC instances

A RAC database consists of two or more instances, typically one per node, each made up of shared memory structures and background processes. Every instance has its own SGA, allocated at instance startup. Oracle introduced Automatic Shared Memory Management (ASMM) in 10g and Automatic Memory Management (AMM) in 11g; note, however, that AMM is incompatible with huge pages on Linux, which is a problem on systems with large amounts of memory. Access to shared memory has to be synchronized, both within the local instance and across the cluster, since data held in the SGA of one instance may be needed by the others. Within RAC, the Oracle kernel protects shared memory using the same mechanisms as in a single instance: latches and locks. A latch is a low-level, lightweight serialization device. Processes requesting a latch do not queue: if a process cannot obtain the latch, it spins, entering a tight loop to avoid being taken off the CPU by the operating system scheduler. If it still cannot obtain the latch after some time, it sleeps and retries at intervals. Latches are instance-local; there are no cluster-wide latches. Locks, on the other hand, are taken out for longer periods and are more sophisticated than latches. A lock can be shared or exclusive. Processes requesting a lock wait in first-in, first-out (FIFO) order, and an enqueue controls access to the lock; this queuing is cluster-wide. The need for cache coherence makes locks and latches in RAC more complex than in a single instance: as in a single instance, access to blocks in the buffer cache and to enqueues must be managed within the local instance, but access from remote instances must be managed as well. For this purpose Oracle uses the Global Resource Directory (GRD) and a number of additional background processes. (Oracle combines each V$ view with the instance ID to form the corresponding GV$ view; a GV$ view contains the rows of that dynamic performance view from all instances in the cluster.)

Global Resource Directory (GRD)

RAC uses several additional background processes for cache synchronization between instances. Remember that RAC uses the Cache Fusion architecture to simulate a global SGA spanning all nodes of the cluster. Access to blocks in the buffer cache has to be coordinated between readers and writers, and enqueues on shared resources are likewise global across the cluster. The Global Cache Service (GCS) manages access to the common buffer cache, and the Global Enqueue Service (GES) manages enqueues in the cluster; both are transparent to applications. The underlying structure they use is the GRD mentioned earlier, which is maintained by the GCS and GES processes. The GRD is distributed across all nodes of the cluster and is part of the SGA, which is why the SGA of a RAC database is larger than that of a comparable single-instance database. Resource management is negotiated by GCS and GES: a given resource is managed by exactly one instance, its resource master, but mastership is not fixed. Oracle 9.2 and later implement dynamic resource mastering (DRM); before 9.2, remastering occurred only when an instance failed and the GRD was rebuilt. In newer releases, remastering takes place when Oracle detects that
an instance other than the resource master accesses a particular resource more than a threshold number of times within a given interval. In that case, the resource is remastered to the other node; that is, the node that accesses the resource most frequently becomes its new master. Many users have reported problems with dynamic remastering: when it happens too frequently, it can cause unnecessary overhead, and in that case DRM can be disabled.
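As an aside on the GV$ views mentioned above, the difference from their V$ counterparts is simply the additional INST_ID column and the cluster-wide scope; for example:

    SELECT inst_id, instance_name, host_name, status
      FROM gv$instance
     ORDER BY inst_id;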
(The GRD also records which resources are mastered by which instances, which is very convenient when an instance fails and recovery is needed.) The following describes how GCS and GES work together to maintain the GRD.

Global Cache Service (GCS)

The LMSn background processes use the GCS to maintain cache coherency in the global buffer cache, in which multiple copies of the same data block can exist (only one of which is current). The GCS tracks the status and location of data blocks and ships blocks to instances on other nodes over the interconnect.

Global Enqueue Service (GES)

Whereas the GCS operates at the block level, the GES manages global enqueues in the cluster. As a rule of thumb, if an operation does not involve controlling or moving data blocks in the global buffer cache, it is most likely handled by the GES. The GES is responsible for resource operations spanning all instances, such as access to the dictionary cache and library cache, and for global transaction management; it can also detect deadlocks within the cluster. It tracks the status of Oracle's enqueue mechanism when resources are accessed by multiple instances at the same time. The Global Enqueue Service Monitor (LMON) and the Global Enqueue Service Daemon (LMD) background processes form part of the GES, and the lock process LCK0 handles non-Cache-Fusion access such as library cache and row cache requests.

Cache Fusion

Cache Fusion is the latest evolution of inter-instance data transfer; it replaced block ping, which was used up to Oracle 8i. Oracle uses the high-speed interconnect to transfer data blocks between all nodes. Transferring blocks between instances with block ping was very expensive, and it was recommended to partition the workload by instance to minimize inter-instance block transfers. In Oracle Parallel Server, when an instance requested a block for modification that was currently held by another instance, it signaled the holding instance to write the block to disk, and the holder then signaled back that the block could be read. This amount of messaging and disk I/O was unsatisfactory. Cache Fusion block transfers rely on the Global Resource Directory and never require more than three hops, regardless of the installation and the number of nodes. Obviously, if a cluster has only two nodes, every transfer is a two-way exchange; with more than two nodes, the number of hops is limited to three. Oracle uses dedicated wait events to measure the messaging involved in cache transfers, recording whether a transfer was two-way or three-way. When an instance requests a block via Cache Fusion, it first contacts the resource master to determine the current status of the resource. If the resource is not currently in use, it can be acquired locally and the block read from disk. If the resource is in use, the resource master ships the resource to the requesting instance. If the resource is subsequently requested for modification by one or more instances, that fact is recorded in the GRD. The resource master, the requestor, and the current holder of the block can all be different instances, in which case a maximum of three hops is needed to obtain the block. The two-way and three-way transfers mentioned above are determined by how the resource is mastered: when the resource master itself holds the requested resource, the request can be satisfied immediately and the block shipped directly. This is a two-way exchange.
In the three-way scenario, the requestor, the master, and the holder are all different, so the resource master has to forward the request, adding another hop. As this discussion shows, the work needed to coordinate data blocks and their images in the global buffer cache should not be underestimated. In a RAC database, Cache Fusion often represents both the greatest benefit and the greatest cost. The benefit is that Cache Fusion in theory allows the workload to scale almost linearly as nodes are added; the additional overhead imposed by Cache Fusion, however, is commonly in the range of 10 to 20 percent.
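The wait events referred to above can be inspected per instance; a sketch of such a query, assuming the usual 'gc ... 2-way'/'gc ... 3-way' event naming used by 10g and 11g:

    SELECT inst_id, event, total_waits,
           ROUND(time_waited_micro / 1000000) AS seconds_waited
      FROM gv$system_event
     WHERE event LIKE 'gc%block%way'
     ORDER BY time_waited_micro DESC;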

Read consistency

One of the main features of the Oracle database is its ability to present several views of the data simultaneously; this feature is known as multi-version read consistency. Queries are read-consistent, writers do not block readers, and vice versa. Multi-version read consistency is of course just as effective in RAC, but it involves some additional work. The system change number (SCN), Oracle's internal timestamp, is central to read consistency. If the local instance requests a read-consistent version of a block, it contacts the block's resource master to determine whether a version of the block with the same SCN, or a newer one, exists in the buffer cache of a remote instance. If such a block exists, the resource master asks the remote instance to ship the consistent-read version of the block to the local instance. If the remote instance holds the block at the requested SCN, it ships the block immediately; if it holds a newer version, it creates a copy of the block (referred to as a past image), applies undo to the copy to roll it back to the requested SCN, and then ships it across the interconnect.

System change number (SCN)

SCNs are internal timestamps generated and used by the Oracle database. All events in the database, including transactions, are marked with an SCN, and Oracle's read consistency depends heavily on SCNs and on the information in the undo tablespaces. SCNs have to be synchronized across the cluster, and RAC offers two schemes for propagating SCNs among all nodes: broadcast-on-commit and the Lamport scheme. Broadcast-on-commit has been the default since 10.2. Before that, the default was the Lamport scheme, which promised better scalability by letting SCNs propagate piggybacked on other cluster messages rather than immediately after a commit on a node; this is sufficient in most cases. The Lamport scheme has a drawback, however: the SCN of one node can lag behind the SCN of another, especially when there is little message traffic, and this lag means that a transaction committed on one node can briefly appear not yet committed when viewed from a lagging instance. The broadcast-on-commit scheme, on the other hand, is more resource-intensive: after every commit, the LGWR process updates the global SCN and broadcasts it to all other instances. Up to RAC 11.1, the initialization parameter max_commit_propagation_delay allowed the database administrator to change the default behavior; this parameter was removed in 11.2.
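Since every V$ view has a GV$ counterpart, the SCN that each instance currently reports can be compared directly; under normal operation the values returned by a query like the one below differ only momentarily:

    SELECT inst_id, current_scn
      FROM gv$database
     ORDER BY inst_id;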
