Tri Hua, Network name Dbsnake
Oracle ACE Director,acoug Core expert
("Tri Hua article series" has been Tri Hua exclusive authorized in the cloud and the official account of the ink will be released, please continue to pay attention to)
This article details the connection management in the Oracle RAC environment, and describes what the connect time load Balancing, Runtime Connection load Balancing, connect is Connecti On Failover and Runtime Connection Failover, as well as the inside of the TAF, ONS, FCF, FAN, LBA and many other knowledge points. This article focuses on connection management in an Oracle RAC 11GR2 environment, but also illustrates the Oracle RAC 10GR2/9IR2 to reflect differences in connection management between them.
The so-called "connection management", mainly reflected in the Load balancing and Failover two aspects. The load balancing and Failover under Oracle RAC 11gR2 are divided into connect time Load balancing, Runtime Connection depending on whether a pre-existing connection is used (such as a connection in the connection pool) The 4 types of Load balancing, connect time Connection Failover and runtime Connection Failover, usually with the "runtime" prefix, refer to the situation where the connection already exists, such as using The connection pool.
first, Introduction to Connect time Connection Failover
Connect time Connection Failover refers to the Failover when connecting directly to an Oracle database without getting an existing connection from the connection pool. Before the Oracle RAC 11gR2, the Connect time Connection Failover was very easy to implement, requiring multiple VIPs to be specified in the relevant Tnsnames.ora, as well as specifying Failover=on. As shown below:
(description= (Failover=on) (Address_list= (Load_balance=off) (Address= (PROTOCOL=TCP) (HOST=RAC1-VIP) (port=1521)) (Address= (PROTOCOL=TCP) (HOST=RAC2-VIP) (port=1521)) (Address= (PROTOCOL=TCP) (HOST=RAC3-VIP) (port=1521)) (Address= (PROTOCOL=TCP) (HOST=RAC4-VIP) (port=1521)) ) (Connect_data= (service_name=rac10g))
Here the client process first attempts to connect the RAC1-VIP, if not connected, will try to RAC2-VIP, and then not connected, will continue to try, until all the VIP addresses appearing in Address_list in full order until the end of the attempt. The Failover of this client when connecting to an Oracle database is not only applicable to the RAC environment, but also to the Data Guard environment. As shown below:
description= ( failover=on) ( address_list= (load_balance=off) (address= (protocol=tcp) (host= PRIMARY-IP) (port=1521)) (address= (protocol=tcp) (HOST=STANDBY-IP) (port=1521)) (Connect_data= ( service_name=service10g)))
Oracle RAC 11gR2 introduces scan (single client Access Name), and the client defaults to a scan to connect to the entire RAC environment, as shown in the scan's architecture diagram:
As shown, if you use DNS or GNS (Grid naming Service), you can have up to 3 scan VIPs and 3 scan Listener, and if you do not use DNS or GNS, but choose to use the Hosts file, you will only have 1 Scan VIP and one scan Listener.
This is assumed to be configured in Tnsnames.ora:
(DESCRIPTION = (Failover=on) (ADDRESS = (PROTOCOL = TCP) (HOST = Myscan) (PORT = 1521)) (Connect_data = (SERVER = dedicated) (service_name =rac11g)))
Strictly speaking, only when the RAC environment has more than 1 SCAN VIP, the above failover=on is meaningful-it means that when the client is connected to the scan VIP, if one of the scan VIP is not connected, it will immediately try another scan V Ip.
When the Hosts file is used to specify the scan VIP, that is, in the case of only 1 scan VIPs throughout the RAC environment, Failover actually exists, except in this case the Failover will be slower. Because when the scan VIP node is down, the scan VIP will Failover to the other nodes together with the associated scan Listener, but this Failover takes time and the client needs to wait until the Failover process is complete before it can be connected again. Rac.
second, the next introduction Runtime Connection Failover
Runtime Connection Failover refers to a Failover that is connected to an existing condition. This existing connection may be a connection that is being used in the connection pool, or a connection that is not connected to the Oracle database through the connection pool, directly through the OCI client (such as Sqlplus).
This Runtime Connection Failover, that is, if the connection already exists, if the Oracle database side of the abnormal situation (such as Service outage, Instance crashes, the Session is broken) and caused the existing connection interrupted, how Failover's problem.
There are two ways to implement the Runtime Connection Failover, respectively, TAF (Transparent application Failover) and FCF (Fast Connection Failover).
First, introduce TAF. TAF has some of the following knowledge points that need our attention:
1, it can be defined in the connection string in the client side of the Tnsnames.ora, can also be defined in the server side of the service, but the service side of the settings will supersede (override) the settings in the client Tnsnames.ora:
The client can set TAF like this:
(DESCRIPTION = (Failover=on) (ADDRESS = (PROTOCOL = TCP) (HOST = Myscan) (PORT = 1521)) (Connect_data = (SERVER = dedicated) (service_name = Email) (Failover_mode= (Type=select) (Method=basic) (retries=180) (delay=5)))
The server side can set TAF like this:
Srvctl Modify service-d rac11g-s email-q true-p basic-e select-z 180-w 5-j LONG The meaning of the specific parameters can be found in the following comments:
Usage:srvctl Modify service-d <db_unique_name>-S <service_name> [-c {UNIFORM | SINGLETON}] [-p {basic| Preconnect| NONE}] [-l [Primary][,physical_standby][,logical_standby][,snapshot_standby]] [-y {AUTOMATIC | manual}][-q {True|false}] [-x {true|false}] [-j {short| LONG}] [-b {none| service_time| Throughput}] [-e {none| session| SELECT}] [-m {none| BASIC}] [-Z <integer>] [-w <integer>]-D <db_unique_name> unique name for the database-s < ;service> service Name-c {UNIFORM | SINGLETON} service runs on every active server in the server pool hosting this Service (UNIFORM) or just one server (Singl ETON)-P {NONE | BASIC | Preconnect} TAF policy specification-l <role> role of the service (primary, Physical_stand By, Logical_standby, Snapshot_standby)-y <policy> Management policy for the service (AUTOMATIC or M anual)-e <failover type> Failover type (NONE, SESSION,or SELECT)-M <failover method> Failover method (NONE or BASIC)-W <integer> Failover de Lay-z <integer> Failover retries-j <clb_goal> Connection Load balancing goal (short or LO NG). Default is LONG. -B <runtime Load Balancing goal> Runtime load Balancing Goal (Service_time, throughput, or NONE)-X <distr ibuted Transaction processing> Distributed Transaction Processing (TRUE or FALSE)-Q <aq HA notifications> AQ HA notifications (TRUE or FALSE)-H Print usage
2, when the type of TAF is set to select, a simple select operation (excluding select ... for update) can be done "breakpoint continuation", that is, the simple select operation after the implementation of Failover with TAF can be from the place of interruption Continue to be carried out;
3, TAF to DML operation can not do "breakpoint continuation", that is, if a transaction after using TAF implementation Failover, the transaction can not continue from the place of interruption, need to start from the beginning to execute;
4. TAF is only valid for clients and connection pools using OCI connections, where the OCI connection can be a package on the OCI connection, such as Jdbc-oci driver Support TAF, but JDBC thin driver does not support TAF (because the JDBC thin Driver is not based on the OCI).
Next, before we introduce FCF (Fast Connection Failover), we must first introduce the FAN (Fast application Notification).
FAN is a message-active notification mechanism in Oracle RAC. When the service Down/up,instance Down/up is present in the RAC and the node load changes, the Oracle database is able to publish this information through the fan events, and the clients subscribing to these fan events receive the Fan E at the first time After vents, you can respond to these FAN events by making corresponding actions.
Fan events is divided into two types, the first being fan HA events, and the second LBA events, where LBA refers to Load balancing advisory.
FAN HA Events is triggered when service Down/up, instance down/up are present in the RAC. An example of the FAN HA events is shown below:
Event 1:fan Event type:instance Properties: version=1.0 service=prod database=prod instance=prod1 host=node1 Status =down Event 2:fan Event Type:service_member Properties: version=1.0 service=erp database=prod instance=prod1 Host=node1 status=downevent 3:fan Event type:service_member properties:version=1.0 service=erp Database=PROD instance= PROD3 Host=node3 Status=up
An example of the LBA Events,lba events is also generated after the load of the nodes in the RAC is changed, as shown below:
Event 4:fan-event type:service_metrics properties:version=2.0 service=erp database=prod instance=PROD1 percent=70 Serv Ice_quality=good instance=prod2 percent=30 service_quality=good Event 5:fan-event type:service_metrics properties:version=2.0 Service=crm database=prod instance=prod2 percent=30 service_quality=good instance=PROD3 Percent=70 Service_quality=good
These FAN events may be spread across multiple channels, including ONS (Oracle Notification Service), AQ (Advanced Queue), Pmon, and more. Here are two diagrams of the fan events architecture and propagation path, which intuitively illustrate the way fan events is propagated:
Clients subscribing to FAN HA events include: JDBC implicit Connection Cache, OCI, odp.net Connection pools, Listener, Server Side callouts, etc.;
Clients subscribing to LBA events include: JDBC implicit Connection Cache, odp.net Connection pools, listener,oci Session pools, etc.;
After the introduction of fan, it is now possible to start introducing FCF:FCF, which means Fast Connection Failover, which is actually a client that is implemented by subscribing to FAN HA events. The following is an example of two clients implementing FCF by subscribing to FAN HA events:
Example one: JDBC Fast Connection Failover (FCF)
The JDBC connection here refers to the JDBC thin connection. Because the JDBC thin connection is not based on the OCI, the Runtime Connection Failover in this case cannot use TAF, only FCF. And to do the following several things before you can normally use FCF:
1, the implicit connection cache open;
2, open the fastconnectionfailoverenabled;
3, it is best to subscribe directly to the remote ONS (the version prior to Oracle 10GR2 can not directly subscribe to the remote ONS, only by installing ONS locally later to achieve the transfer of FAN events);
4, it is best to set up a TCP timeout in the Java program (the latter will specifically describe how to adjust TCP timeout in the Oracle database);
The demo code is as follows:
Oracledatasource ODS = new Oracledatasource () ... ods.setuser ("Scott") Ods.setpassword ("Tiger") Ods.setconnectioncachingenabled (TRUE); Ods.setfastconnectionfailoverenabled (TRUE); Ods.setconnectioncachename ("Mycache") ods.setconnectioncacheproperties (CP); Ods.setonsconfiguration ("nodes=racnode1:6201,racnode2.:6201"); Ods.seturl ("Jdbc:oracle:thin: @sales1-scan:1521/oltp");//tcp connect timeoutproperties prop = new Properties (); Prop.setproperty ("Minlimit", Min_conn);p rop.setproperty ("Maxlimit", Max_conn);p rop.setproperty ("InitialLimit", Init_conn);p rop.put (Oracle.net.ns.SQLnetDef.TCP_CONNTIMEOUT_STR, "1000")); This indicates that TCP timeout is set to 1000 milliseconds, or 1 seconds ods.setconnectioncacheproperties (prop);
Example two: Odp.net Fast Connection Failover (FCF)
For Odp.net, there are a few things you can do to use FCF:
1, the corresponding service of the AQ Notification open:
Srvctl Modify service-d rac11g-s email-q TRUE
2, set the value of the aq_tm_processes to 1;
3, give the specified user De-queue permissions:
exec dbms_aqadm.grant_queue_privilege (' DEQUEUE ', ' SYS. Sys$service_metrics ', <your username=>);
4. Set the HA events=true in the. NET connection string;
The demo code is as follows:
C # using System; Using Oracle.DataAccess.Client; Class Connectionpoolingsample {static void Main () {OracleConnection con = new OracleConnection ();//open a connection us ing ConnectionString attributes//related to connection pooling. Con. ConnectionString = "User Id=scott; Password=tiger;data Source=crm; "+" Min Pool size=10; Connection lifetime=120; Connection timeout=60; "+" HA events=true "," INCR Pool size=5; DECR Pool si=2 "; Con. Open (); Console.WriteLine ("Connection Pool successfully created"); Close and Dispose OracleConnection object con. Close (); Con. Dispose (); Console.WriteLine ("Connection is placed back into the pool."); } }
FCF and TAF have a very big difference is that even if the simple select operation, FCF can not be as TAF as the "breakpoint continuation." For a connection pool configured with FCF, when it receives the FAN HA events containing the Instance/service outage, the connection to the instance/service associated with the cache in the connection pool is immediately marked as invalid (invalid) At the same time, these connections are cleared, and the transaction that use these connections will be aborted and rolled back immediately. When the application catches the error message generated by this aborted transaction, it either returns the related error directly to the end user, or re-obtains a valid connection from the connection pool and executes the aborted transaction again.
When FCF is enabled, if the connection error is returned to the end user, how do you determine the source of the error message (that is, the error returned by FCF)? Very simple, with Isfatalconnectionerror (SQLException e) to judge the Good, the demo code is as follows:
Connection management in an Oracle RAC environment