The previous post covered a number of system calls. Combined with some hands-on experience, that knowledge can be used to analyze and solve real problems at work.
Problem: sqlplus installed on a Linux server in our Hong Kong data center could not connect to the Oracle server in the company's Shenzhen data center. Running sqlplus xxxx/[email protected] produced no response at all, and after about 2 minutes it reported:
SQL*Plus: Release 11.2.0.1.0 Production on Tue Apr 21 15:44:04 2015
Copyright (c) 1982, Oracle. All rights reserved.
ERROR:
ORA-12547: TNS:lost contact
Environment:
Client address: 172.17.5.13 (Red Hat 5.8 + Oracle 11g client)
Server address: 192.168.1.48 (Solaris 9 + Oracle 10g server)
Tests already done:
1. tnsping rwdb works, which shows that the client's tnsnames configuration is fine.
2. In another environment, a Red Hat 5.8 + Oracle 11g client connects to a Solaris 9 + Oracle 10g server successfully, so this is not a compatibility issue.
So how do we continue troubleshooting? This is where strace comes in handy.
Run on the client machine:
# strace -o strace.log sqlplus xxxx/[email protected]
While sqlplus is hung, strace.log shows:
brk(0x9196000) = 0x9196000
socket(PF_INET, SOCK_STREAM, IPPROTO_IP) = 9
fcntl(9, F_SETFL, O_RDONLY|O_NONBLOCK) = 0
connect(9, {sa_family=AF_INET, sin_port=htons(1521), sin_addr=inet_addr("192.168.1.48")}, 16) = -1 EINPROGRESS (Operation now in progress)
times({tms_utime=1, tms_stime=1, tms_cutime=0, tms_cstime=0}) = 2186561939
mmap(NULL, 528384, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x2ae433057000
poll([{fd=9, events=POLLOUT}], 1, 60000) = 1 ([{fd=9, revents=POLLOUT}])
getsockopt(9, SOL_SOCKET, SO_ERROR, [-132438043876392960], [4]) = 0
fcntl(9, F_GETFL) = 0x802 (flags O_RDWR|O_NONBLOCK)
fcntl(9, F_SETFL, O_RDWR) = 0
getsockname(9, {sa_family=AF_INET, sin_port=htons(53136), sin_addr=inet_addr("172.17.5.13")}, [549755813904]) = 0
getsockopt(9, SOL_SOCKET, SO_SNDBUF, [366915001648168960], [4]) = 0
getsockopt(9, SOL_SOCKET, SO_RCVBUF, [366915001648239956], [4]) = 0
setsockopt(9, SOL_TCP, TCP_NODELAY, [1], 4) = 0
fcntl(9, F_SETFD, FD_CLOEXEC) = 0
rt_sigaction(SIGPIPE, {0x1, ~[ILL ABRT BUS FPE SEGV USR2 XCPU XFSZ SYS RTMIN RT_1], SA_RESTORER|SA_RESTART|SA_SIGINFO, 0x35a240ebe0}, {SIG_DFL, [], 0}, 8) = 0
write(9, "\0\324\0\0\1\0\0\0\1:\1,\fa \0\177\377\177\10\0\0\1\0\0\232\0:\0\0\10\0"..., 212) = 212
read(9,
Analysis:
Here fd 9 is the client's socket to port 1521 on the server (192.168.1.48). The socket is established normally, but after write sends the first message the process hangs in read, waiting for a reply that never arrives. This gives us good reason to suspect the network: is an unstable network causing the sqlplus failure?
Sure enough, testing showed that the VPN link between Hong Kong and Shenzhen was unstable and dropping packets:
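The call sequence in the trace (non-blocking connect, poll for POLLOUT, SO_ERROR check, then a blocking write and read) can be reproduced in a few lines. The following Python sketch is only an illustration for Linux, not Oracle's client code; the function name probe, the one-byte payload, and the read timeout are hypothetical. It reports which stage stalls, which is the same information the trace gave here.

```python
import errno
import select
import socket


def probe(host, port, payload=b"\0", connect_ms=60000, read_s=5.0):
    """Mimic the traced call sequence and report the stage reached.

    Returns "connect-failed", "connect-timeout", "read-timeout",
    "closed" or "replied". Names and defaults are illustrative only.
    """
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        s.setblocking(False)                  # fcntl(F_SETFL, O_NONBLOCK)
        rc = s.connect_ex((host, port))       # usually EINPROGRESS
        if rc not in (0, errno.EINPROGRESS):
            return "connect-failed"
        p = select.poll()
        p.register(s, select.POLLOUT)
        if not p.poll(connect_ms):            # poll(..., 60000)
            return "connect-timeout"
        if s.getsockopt(socket.SOL_SOCKET, socket.SO_ERROR) != 0:
            return "connect-failed"
        # Assumption: a read timeout is used here so the probe returns;
        # the traced client simply blocks in read().
        s.settimeout(read_s)
        s.sendall(payload)                    # write(9, ..., 212)
        try:
            data = s.recv(4096)               # read(9, ... in the trace
        except socket.timeout:
            return "read-timeout"
        return "replied" if data else "closed"
    finally:
        s.close()
```

A result of "read-timeout" corresponds to the situation in this case: the TCP connection is established and the request is sent, but no reply comes back.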
# ping -S 172.17.5.13 1024x768 > ping.log
Packet loss was 14%:
530 packets transmitted, 455 packets received, 14% packet loss
round-trip (ms) min/avg/max = 11/11/17
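The 14% figure follows directly from the two counters in the summary line. As a small worked check (the helper name loss_percent and the regular expression are illustrative, written against the summary format shown above):

```python
import re


def loss_percent(summary):
    """Return packet loss in percent from a ping summary line."""
    m = re.search(r"(\d+) packets transmitted, (\d+) (?:packets )?received",
                  summary)
    if m is None:
        raise ValueError("not a ping summary line")
    sent, received = int(m.group(1)), int(m.group(2))
    return 100.0 * (sent - received) / sent


line = "530 packets transmitted, 455 packets received, 14% packet loss"
print(f"{loss_percent(line):.1f}%")  # 75 of 530 packets lost; prints 14.2%
```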
After we asked the network administrator to switch the route from the VPN to a leased line, the problem was resolved and sqlplus connected successfully. The case also shows that sqlplus is very demanding of the network. I had always assumed it could still connect despite some packet loss, but that turned out not to be true.
Summary:
Analyzing strace output may not solve a problem in one step; other factors still have to be considered. In this case the problem could eventually have been found by testing the network alone, without strace. But strace provides solid evidence and a point of reference: here, the stalled socket communication is that evidence. Without it, what would have pointed to the network as the cause?
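One way to pull that evidence out of a long strace.log is to look for the call that never returned: a completed call ends with "= <result>", while the call the process was blocked in (the read on fd 9 above) has no result. The following Python sketch assumes the plain single-process "strace -o" output format used in this article; the function name unfinished_calls is hypothetical.

```python
import re

# A completed strace line has ") = <return value>", optionally followed
# by an errno name and message or by decoded flags.
COMPLETED = re.compile(r"\)\s+= (-?\d+|0x[0-9a-f]+|\?)")


def unfinished_calls(lines):
    """Return the lines that show a syscall without a return value."""
    return [ln.rstrip() for ln in lines
            if ln.strip()
            and not ln.startswith(("+++", "---"))  # exit/signal markers
            and not COMPLETED.search(ln)]


log = [
    "fcntl(9, F_SETFD, FD_CLOEXEC) = 0\n",
    'write(9, "\\0\\324\\0\\0"..., 212) = 212\n',
    "read(9, \n",
]
print(unfinished_calls(log))  # prints ['read(9,']
```

For a real log, the same function can be applied to the lines of the file written by strace -o.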
This article is from the "Memory Fragment" blog. Reprinting is not permitted.
Practical example: Using strace to analyze database connection problems