Two recent Oracle Database faults are related to the firewall. The firewall here is a hardware network firewall, not a software firewall.
Let's talk about it briefly. An Oracle 9i running on a Windows system, the client cannot connect to the database, but there is no problem with the tnsping test. The solution is simple, but we still need to know the cause of the problem.
This problem begins with the entire process of listening to the database. Specifically, it refers to the dedicated server connection mode:
The listening process on the server listens on port 1521
The client initiates a database connection request.
The monitoring Process fork is an Oracle Server Process, also known as the Shadow Process ). The server process selects a port number greater than 1024 for listening. The listening process sends the port number back to the client and requires the client to reconnect to the specified port.
The client reconnects to the new port specified by the listener, that is, reconnects.
The client directly communicates with the Server Process, does not pass the listener, performs session authentication (LOGIN), and executes SQL statements.
From the above process, we can see that the final port connecting the client is actually not 1521. Generally, the firewall only opens several ports and only port 1521 to the Oracle database. In this way, when the client makes a second connection, it cannot pass the firewall, leading to database connection failure.
Fortunately, only 9i and earlier versions of Oracle on Windows have this problem. On Linux and Unix platforms, multiple processes can reuse ports. Oracle Server Process still uses a port (1521) with the listening Process ). By using strace in linux to track the client's connection to the database, we can find that the client only connects once and does not perform the second connection, which has changed compared with the process described above. On Windows, 10 Gbit/s and later versions of libraries also use port multiplexing to avoid such problems.
So how does the Oracle 9i running on Windows solve this problem? The answer is simple. Add a USE_SHARED_SOCKET under the \ HKEY_LOCAL_MACHINE \ SOFTWARE \ ORACLE \ HOMEn (Here n refers to the Oracle Home number, which is 0 when there is only one Oracle Home) Key in the Windows registry, the value is TRUE. Restart the listener and the Oracle service (restart the Oracle service instead of restarting the database. In fact, 10 Gb is the default USE_SHARED_SOCKET is TRUE.
This problem can also be solved by enabling the firewall to access all ports on the Database Host. However, such a solution is often rejected by security personnel.
The problem caused by the firewall is more complicated.
An application often reports a ORA-3113 error, checking that the ORA-3113 comes from a db link to the database. For the convenience of the following description, the database directly connected to the application is called DB_A, and the peer database connected by DB_A through the db link is called DB_ B. No trace or log is found on the DB_ B host. The SQL statement executed by the application is also a very simple SELECT statement, and the returned data volume is not large. However, the wrong statement is not a fixed SQL statement. ORA-3113 error stack trace on the database DB_A connected to the application, and no value is found.
There are many causes of ORA-3113 errors. You can refer to the ITPUB on a post "ORA-03113 Error Analysis".
In this ORA-3113 Error Problem, database DB_ B does not have any logs, one of the most likely causes of this situation is that the Server Process on DB_ B has been aborted, however, it is not because an error occurs during SQL Execution, such as being killed or Network Connection interrupted. The cause of KILL is first ruled out because many errors occur every day. Ask the maintenance personnel, saying that no KILL operation has been performed. The biggest possibility is the network. Following this clue, we checked the network connection of DB_ B with the netstat-na command on DB_A, compared with the session in v $ session in DB_ B, and found that DB_A was connected to the database session of DB_ B, this is much less than the number of network connections seen by the netstat command.
This is a major breakthrough. The first thing to doubt is the firewall. The firewall causes many Oracle connection exceptions. The DBA accessing the database. The two databases use the Cisco Firewall in different service networks. Ask the firewall Maintenance Engineer to check the firewall settings and find that the firewall has set TCP connection timeout (the firewall engineer told me this term, in fact, I personally think the literal meaning of this term is quite different from its actual function.) set it to 1 hour. That is, for all TCP connections through the firewall, if there is no activity within one hour, the firewall will be removed, which will lead to connection interruption. When the connection is removed, no data is sent to both ends of the connection to notify the connection that the connection has been removed.
When a faulty business system is used during normal working hours, DB_A may generate dozens of connections to DB_ B at the highest level. However, during off-peak hours or one night, the firewall will remove most or even all connections. In the next use, the application selects a session in DB_A through the connection pool. The db link of this session has been connected to DB_ B, but the network connection has been removed by the firewall, but this session is unknown, the connection is still considered valid, and a ORA-3113 error occurs when the result tries to submit the SQL statement to DB_ B.
In fact, many applications that use network connections can use the KeepAlive feature to maintain the activity of TCP connections. When a connection is opened, use the setsockopt function to set the socket to SO_KEEPALIVE. In this way, at the OS layer, if a TCP connection is not active within the specified time, A test package is sent to the connected peer to check whether the connected peer exists. If this time is less than the "timeout" set in the firewall, the firewall will check that there is still data in the connection and will not disconnect the connection.
The settings of the keep alive in the operating system. Different systems have different settings. For example, in Linux, set net. ipv4.tcp _ keepalive_time = 120 in sysctl to 120 seconds, that is, 2 minutes. In AIX, use the no command to set the tcp_keepidle parameter to 240, indicating that the detection time is 120 seconds. Note that the unit of this parameter in AIX is 1/2 seconds, while in Linux it is 1 second.
Fortunately, Oracle provides a similar mechanism. That is, Dead Conneciton Detection ). Add the following line to the $ ORACLE_HOME/network/admin/sqlnet. ora file:
Expire_time = NNN
Here NNN is the number of minutes. When the session IDLE time exceeds the specified time, the Oracle database checks whether the session peer (that is, the client) is valid. Avoid session persistence because the client exits abnormally.
Therefore, we can solve the ORA-3113 problem mentioned above by setting expire_time in the sqlnet. ora file in the DB_ B database.
Author: zhl_shanghai