After the Grid Infrastructure has been installed, attention shifts to installing the Oracle RDBMS software on the cluster. As we have seen, Grid Infrastructure provides the framework for running RAC, including cluster interconnect communication, node fencing, node membership, and other services, and ASM is Oracle's preferred way of storing databases. RAC builds on these concepts and extends the basic services where needed.

Installation Options

After Grid Infrastructure/Clusterware has been installed successfully, Oracle Universal Installer detects that it is running in a clustered environment and offers the option of installing RAC either on the whole cluster or on a subset of nodes chosen by the user. It is good practice to run the Cluster Verification Utility (cluvfy) to check whether the prerequisites for an RDBMS installation are met. As with the Grid Infrastructure installation, Oracle Universal Installer first copies and links the software on the first node and then pushes the Oracle home to the other selected nodes. Unlike Grid Infrastructure, the Oracle RDBMS can be installed on a shared file system (for example OCFS2 or ACFS); this simplifies adding new nodes to the cluster, because the software does not have to be reinstalled on the new node. Patching is also simplified, since only one Oracle home has to be patched, but such patches cannot be applied in a rolling fashion, so downtime is unavoidable. During the installation, Oracle Universal Installer prompts the administrator to create or upgrade a database, or to install the software only. If a newer patch release is available at installation time, it is good practice to install the software only and to create the database after patching.

Single-Instance and RAC Databases

RAC and single-instance databases differ in several important ways. In RAC, a database residing on shared storage is accessed by instances running on multiple servers. The data files, online redo logs, control files, and server parameter file (spfile) must all be shared. In addition, flashback logs, archived redo logs, Data Pump dump files, and Data Guard broker configuration files can also be shared, depending on your configuration (this is optional, but strongly recommended). When ASM is used, each RAC node also has a local pfile that points to the spfile in the corresponding disk group. Another file stored locally is the Oracle password file. Users of a cluster file system often place these files in a shared location and reference them from $ORACLE_HOME/dbs through symbolic links.
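A quick way to confirm that every instance really points to the shared spfile is to query gv$parameter. This is only an illustrative sketch, not a required step:

    SELECT inst_id, value
    FROM   gv$parameter
    WHERE  name = 'spfile';

Each instance should report the same location inside the shared disk group.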
Database Files

Data files contain all the data of the database, including tables, indexes, the data dictionary, and compiled PL/SQL code. In RAC there is only one copy of each data file; it resides on shared storage and is accessed by all instances. Oracle does not mirror data files by default, and most users rely on redundancy at the storage level to prevent data loss caused by media failure. Where the storage array does not offer this capability, Oracle ASM can be used to provide the redundancy.

Control Files

Control files store information about the physical structure of the database, including its status. If you use RMAN and do not have a dedicated RMAN catalog database, information about RMAN backups can also be stored in the control file. In single-instance databases as well as in RAC, control files should be multiplexed to protect against corruption and storage failure. When ASM and a Fast Recovery Area are used, they are multiplexed automatically: by default, Oracle multiplexes the control files across the disk groups specified by db_create_file_dest and db_recovery_file_dest, and if an spfile is used, the control_files parameter is updated automatically. Be aware that the control files can become a point of contention in RAC because they are updated frequently, so do not create too many copies, and place them on fast storage.
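To check the result, the current control file copies and the stored parameter value can be listed as follows (a minimal sketch):

    SELECT name FROM v$controlfile;
    SELECT value FROM v$spparameter WHERE name = 'control_files';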
Redo and Archiving

In RAC, every instance has its own set of online redo logs, known as a thread. Thread information can be viewed in v$log and related views.
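For example (a sketch only, using the standard v$log columns), the redo log groups of all threads can be listed with:

    SELECT thread#, group#, bytes/1024/1024 AS size_mb, members, status
    FROM   v$log
    ORDER  BY thread#, group#;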
Each thread needs at least two redo log groups, and if you are not using ASM and a Fast Recovery Area, you should consider multiplexing the members of each group manually. The spfile is responsible for the mapping between instances and threads (through the initialization parameter thread). When a new instance is added to the cluster, a corresponding redo thread is required. This can be done in two ways: either by executing the SQL statement ALTER DATABASE ADD LOGFILE THREAD y GROUP x, or, in a policy-managed database, automatically by Oracle. The thread is then enabled.
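A manual example might look like the following (a sketch only; the group numbers, sizes, and the +DATA disk group are assumptions for illustration):

    -- add two redo log groups for the new thread 3
    ALTER DATABASE ADD LOGFILE THREAD 3 GROUP 7 ('+DATA') SIZE 512M;
    ALTER DATABASE ADD LOGFILE THREAD 3 GROUP 8 ('+DATA') SIZE 512M;
    -- enable the thread so that the new instance can use it
    ALTER DATABASE ENABLE PUBLIC THREAD 3;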
The LGWR background process flushes the redo buffer to the online redo logs. The online redo logs need to be placed on fast storage, otherwise they can become a point of contention, especially in systems that commit frequently. Often, the first optimization steps for a poorly designed application are to reduce the commit frequency and, at the very least, to move the redo logs and control files to fast storage, which removes part of the performance bottleneck. In systems with frequent log switches it can help to increase the number of redo log groups per thread, which gives the archiver processes more time to archive the redo logs. This approach can also be beneficial when the archiver has to ship archived redo to a standby database, although most systems now use the log network server (LNSn) processes to transfer redo asynchronously to the Remote File Server (RFS) sessions on the standby database. In Oracle 10.2 and 11.1 there is one LNS process per destination; in 11.2 the LNSn processes are replaced by the NSSn and NSAn background processes, where NSSn transfers redo synchronously and NSAn transfers it asynchronously. Redo logs should be sized so that log switches do not happen too frequently (AWR and Statspack can help determine an appropriate size). Oracle 11.2 also allows the administrator to choose the block size of the redo logs: with modern storage units, a 4 KB sector size can be used instead of the original 512 bytes.
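To get a feeling for the switch frequency (a simple sketch based on v$log_history; AWR or Statspack reports give a more complete picture), log switches can be counted per hour and thread:

    SELECT thread#,
           TO_CHAR(first_time, 'YYYY-MM-DD HH24') AS switch_hour,
           COUNT(*)                               AS log_switches
    FROM   v$log_history
    GROUP  BY thread#, TO_CHAR(first_time, 'YYYY-MM-DD HH24')
    ORDER  BY switch_hour, thread#;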
When an instance in a RAC cluster fails, the redo of all threads is merged to build the recovery set, and the SMON background process of a surviving instance performs the roll-forward and rollback operations.
After the LGWR process has filled an online redo log, one of the archiver processes copies the file to the specified destination.
The Fast Recovery Area (called the Flash Recovery Area when it was introduced in Oracle 10.1) is the best place to store archived logs. If you are not using a Fast Recovery Area, it is recommended to place the archived logs on a shared file system so that every node can access them. Unlike a single-instance database, RAC needs the archived logs of all threads. When an instance performs media recovery, you can see in its alert log that Oracle uses the log files of every thread.
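As an illustration (a sketch that assumes a Fast Recovery Area has been configured; the destination and the check query are examples, not mandatory steps), archiving into the FRA can be configured for all instances and then verified per thread:

    ALTER SYSTEM SET log_archive_dest_1 = 'LOCATION=USE_DB_RECOVERY_FILE_DEST' SID='*';

    SELECT thread#, MAX(sequence#) AS last_archived_seq
    FROM   v$archived_log
    WHERE  dest_id = 1
    GROUP  BY thread#;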
Undo Tablespaces

Similar to the redo threads, every instance of a clustered database uses its own undo tablespace. The one-to-one mapping between instance and undo tablespace is configured in the spfile. This mapping does not mean that the undo tablespace is permanently bound to that instance; all other instances can access it as well to create read-consistent images of blocks. When you add an instance to the cluster, you need to add a new undo tablespace and map it to that instance, just as with the redo threads. In a policy-managed database, Oracle can do this for you. Although manual undo management is still possible, automatic undo management (AUM) is strongly recommended.
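For a manually managed cluster, the steps for a new third instance might look like this (a sketch; the tablespace name and the instance name PROD3 are assumptions, and the CREATE statement relies on db_create_file_dest being set):

    -- create the undo tablespace for the new instance
    CREATE UNDO TABLESPACE undotbs3 DATAFILE SIZE 4G;
    -- map it to the new instance in the spfile
    ALTER SYSTEM SET undo_tablespace = 'UNDOTBS3' SCOPE=SPFILE SID='PROD3';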
Storage Options for the RAC Database

The administrator can choose from the following storage options:
- ASM: this is Oracle's preferred storage option and the only configuration supported with RAC Standard Edition
- OCFS2
- Raw devices: not recommended, not only because they are deprecated in recent Linux kernels, but also because they are no longer supported in Oracle 11.2
- Network File System (NFS)
- Red Hat Global File System (GFS): supported only on Red Hat Enterprise Linux and Oracle Enterprise Linux; it can be used for the Fast Recovery Area as well as for database files
RAC Instances

A RAC database consists of two or more instances, each typically running on its own node and made up of shared memory structures and background processes. Every instance has its own SGA, which is allocated at instance startup. Oracle introduced Automatic Shared Memory Management (ASMM) in 10g and Automatic Memory Management (AMM) in 11g, but AMM is incompatible with huge pages on Linux, which is a problem for systems with large amounts of memory. Oracle has to coordinate access to local shared memory as well as to shared memory across the whole cluster: all instances can access the SGAs of the other instances. In RAC, the Oracle kernel protects shared memory in the same way as in a single instance, by using latches and locks. A latch is a low-level, lightweight serialization device. Processes requesting a latch are not queued; if a process does not get the latch, it spins. Spinning means the process enters a tight loop to avoid being taken off the CPU by the operating system scheduler. If a process fails to obtain the latch for a long time, it goes to sleep and tries again after an interval. Latches are instance-local; there is no such thing as a cluster-wide latch. Locks, on the other hand, are held for longer periods and are more complex than latches. A lock can be shared or exclusive; processes requesting a lock wait in a first-in, first-out (FIFO) queue, and enqueues control access to locks cluster-wide. The requirement for cache coherence makes locks and latches more complex in RAC than in a single instance. As in a single instance, access to blocks in the buffer cache and to enqueues has to be managed for the local instance, but access by remote instances has to be managed as well. For this reason Oracle uses the Global Resource Directory (GRD) and a number of additional background processes. (Oracle combines the v$ views with the instance identifier to form the gv$ views; a gv$ view contains the dynamic performance information of all instances in the cluster.)

Global Resource Directory (GRD)

RAC uses additional background processes to synchronize the caches between instances. Remember that RAC uses the Cache Fusion architecture to simulate a global SGA spanning all nodes of the cluster. Access to blocks in the buffer cache has to be coordinated for read-consistent and current access, and the enqueues on shared resources are now cluster-wide. The Global Cache Service (GCS) handles access to the shared buffer cache, and the Global Enqueue Service (GES) manages the enqueues in the cluster. GCS and GES are transparent to the application. The underlying structure they use is the previously mentioned GRD, which is maintained by the GCS and GES processes. The GRD is distributed across all nodes of the cluster and is part of the SGA, which is why the SGA of a RAC database is larger than that of a single-instance database in an otherwise identical configuration. Resource management is negotiated by GCS and GES: a particular resource is managed entirely by a single instance, its resource master. Mastership is not fixed, however. Oracle 9.2 and later implement dynamic resource mastering (DRM); before 9.2, resource remastering occurred only when an instance failed and the GRD was rebuilt. In newer versions, remastering occurs when Oracle detects that an instance other than the resource master accesses a particular resource too frequently within a given interval.
In that case, the resource is remastered to the other node: the node that frequently accesses the resource becomes its new resource master. Many users have reported problems with dynamic remastering, which can cause unnecessary overhead when it happens too often; in such cases, DRM can be disabled.
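As a small illustration of the gv$ views mentioned above (a sketch only), the instances of a cluster and their status can be listed with:

    SELECT inst_id, instance_name, host_name, status
    FROM   gv$instance
    ORDER  BY inst_id;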
(The GRD also records which resources are managed by which instances, which is very useful when an instance fails.) The following sections explain how GCS and GES work together to maintain the GRD.

Global Cache Service (GCS)

The LMSn background processes use the GCS to maintain cache coherence in the global buffer cache, in which multiple copies of the same block can exist (only one of them being the current version). The GCS tracks the state and the location of blocks and ships blocks to the instances of other nodes over the interconnect.

Global Enqueue Service (GES)

Similar to the GCS, which works at the block level, the GES manages the global enqueues in the cluster. As a rule of thumb, if an operation does not involve controlling or moving a data block in the global buffer cache, it is most likely handled by the GES. The Global Enqueue Service is responsible for resource operations affecting all instances, such as access to the data dictionary and library caches or the global management of transactions. It can also detect deadlocks within the cluster. It tracks the state of Oracle's enqueue mechanism when resources are accessed by multiple instances at the same time. The Global Enqueue Service Monitor (LMON) and the Global Enqueue Service Daemon (LMD) form part of the Global Enqueue Service. The lock process LCK0 handles non-Cache-Fusion access, such as library cache and row cache requests.

Cache Fusion

Cache Fusion is the latest evolution of data transfer between instances. Replacing the block ping method used in Oracle 8i, Oracle now transfers data blocks between all nodes over the high-speed interconnect. Transferring blocks between instances with block ping was very expensive, and it was recommended to associate workloads with instances so as to minimize inter-instance block transfers. In Oracle Parallel Server, when an instance requested a block for modification that was currently held by another instance, it signaled the holding instance to write the block to disk, and the holder then signaled back that the block could be read. The amount of messaging and disk I/O involved in this method was unsatisfactory. Cache Fusion block transfers rely on the Global Resource Directory and never need more than three hops, regardless of the size of the installation and the number of nodes. Obviously, if a cluster has only two nodes, there is always a two-way cache transfer; with more than two nodes, the number of hops is limited to three. Oracle uses dedicated wait events to measure the traffic caused by Cache Fusion and, depending on the situation, performs a two-way or a three-way cache transfer. When an instance requests a block via Cache Fusion, it first contacts the block's resource master to determine the current state of the resource. If the resource is not currently in use, it can be acquired by reading the block from the local disk. If the resource is in use, the resource master arranges for it to be passed to the requesting instance. If the resource then receives modification requests from one or more instances shortly thereafter, this is recorded in the GRD; the master, the requestor, and the holder of the resource can all be different instances, in which case up to three hops are needed to obtain the block. The two-way and three-way block transfers just mentioned are therefore related to how resources are managed.
When the resource master itself holds the requested resource, the request can be satisfied immediately and the block is shipped directly; this is two-way communication. In the three-way case, where requestor, master, and holder are all different, the resource master has to forward the request, which triggers an additional hop. As the preceding discussion shows, the coordination of blocks and their images in the global buffer cache should not be underestimated. In a RAC database, Cache Fusion usually represents both the greatest benefit and the highest cost. The benefit is that Cache Fusion theoretically allows the workload to scale with the cluster, potentially achieving near-linear scalability; however, the additional overhead imposed by Cache Fusion may be in the range of 10 to 20 percent.
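To see how much two-way and three-way Cache Fusion traffic each instance experiences (a sketch; the wait event names follow the 'gc ... 2-way'/'gc ... 3-way' naming used since 10g), the global cache wait events can be examined:

    SELECT inst_id, event, total_waits, time_waited
    FROM   gv$system_event
    WHERE  event LIKE 'gc%block%way'
    ORDER  BY inst_id, time_waited DESC;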
Read Consistency

One of the main features of the Oracle database is its ability to provide different views of the data simultaneously, a feature known as multi-version read consistency. Queries are read-consistent, writers do not block readers, and vice versa. Multi-version read consistency is, of course, just as effective in RAC, but it involves some additional work. The System Change Number (SCN) is Oracle's internal timestamp and is essential for read consistency. If the local instance requests a read-consistent version of a block, it has to contact the block's resource master to determine whether a version of the block with the same SCN, or a newer one, exists in the buffer cache of a remote instance. If such a block exists, the resource master sends a request to the remote instance concerned, asking it to forward a read-consistent version of the block to the local instance. If the remote instance holds the block at the SCN of the request, it sends the block immediately. If it holds a newer version of the block, it creates a copy of the block (called a past image), applies undo to the copy to roll it back to the requested SCN, and ships it over the interconnect.

The System Change Number (SCN)

The SCN is an internal timestamp generated and used by the Oracle database. All events that occur in the database are marked with an SCN, and so are transactions. Oracle's read consistency relies heavily on the SCN and on the information in the undo tablespaces. SCNs need to be synchronized across the cluster; RAC uses one of two schemes to keep the SCN common to all nodes: broadcast-on-commit and Lamport. Broadcast-on-commit is the default scheme since 10.2; it addresses a problem with the Lamport scheme, which used to be the default. Lamport promised better scalability by letting SCNs propagate piggy-backed on other messages in the cluster rather than immediately after each commit on a node. This is sufficient in most cases, but the Lamport scheme has one problem: the SCN of one node can lag behind the SCN of another node, especially when there is little messaging activity. This lag means that a transaction committed on one node can "look" a little too new when seen from a lagging instance. The broadcast-on-commit scheme, on the other hand, is more resource-intensive: the LGWR process updates the global SCN after every commit and broadcasts it to all other instances. In RAC 11.1, the initialization parameter max_commit_propagation_delay allowed the database administrator to change the default behaviour; this parameter was removed in 11.2.

Reprinted from: http://blog.sina.com.cn/s/blog_5fe8502601016avf.html