A case of WebSphere service failure caused by automatic update and restart of Windows Server

Source: Internet
Author: User
Tags: log

Recently, the company purchased two Windows Server R2 servers to provide web services. Machine A runs an IHS (IBM HTTP Server) + DM (Deployment Manager) + WAS 8.5 cluster, and machine B runs Oracle 11g R2 for data storage. Both machines can connect to the Internet.

The services were deployed overnight and tested without any problems. In the morning, users called to report that they could not access the site. After logging in remotely, I found that the IHS and DM services were running normally but the cluster had not started: Task Manager showed no nodeagent process and no processes for the servers in the cluster. After I started nodeagent manually and then started the cluster, both servers came up and service returned to normal. At the time I did not suspect that the server had been restarted and assumed it was a problem with the application, and with other work at hand I did not follow up. The same thing happened again the next day: I woke up in the morning to find the service inaccessible. This time it could not be ignored. I collected the relevant logs, restored service by manually starting nodeagent and the cluster, and then began troubleshooting.

1. Check the WebSphere server logs

Checking the SystemOut.log of the servers in the cluster showed that the following entry suddenly appeared at 3:15:

[ --3- A 3: the: -:482CST] 0000004e Peer I ODCF8534I: Removed neighbor ip=192.168.1.8 udp=11011 tcp=11012 ID=A0AFD7F939EF4C971FE6825780126B1741B2F9FF version=0; cellname=win-ru03cb21qgacell01;bridgedcells=[];structuredgateway=false;properties={inodc=1, epoch=1458522523691, member_startup_time=1458522519269, membername=win-ru03cb21qgacell01\win-ru03cb21qganode01\appsrv02, member_version=4}, the neighbor set is now 2 nodes

0 ip=192.168.1.8 udp=11008 tcp=11007 ID=f271d5e15b5f3696eb6b30d9ef41532f9c5a81e8 version=0; cellname=win-ru03cb21qgacell01;bridgedcells=[];structuredgateway=true;properties={inodc=1, epoch=1458522483936, member_startup_time=1458522480920, membername=win-ru03cb21qgacell01\win-ru03cb21qganode01\nodeagent, member_version=4}

1 ip=192.168.1.8 udp=11005 tcp=11006 ID=63A7EFDDBD567D67083EFB4FC6A7727DD79C4C32 version=0; cellname=win-ru03cb21qgacell01;bridgedcells=[];structuredgateway=true;properties={inodc=1, member_version=4, epoch=1458503412906, odc_publisher_only=false, member_startup_time=1458503408859, membername=win-ru03cb21qgacell01\win-ru03cb21qgacellmanager01\dmgr}.

The remaining lines contain nothing relevant and are omitted.
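Finding such an entry by hand in a large SystemOut.log is tedious, so the search can be scripted. The following is a minimal sketch, assuming the log uses the usual basic format prefix `[M/D/YY H:MM:SS:mmm TZ] threadId component level message`; the function names `parse_line` and `find_messages` are made up for this illustration and are not part of WebSphere.

```python
import re
from datetime import datetime

# Assumed prefix of a basic-format SystemOut.log line:
# [M/D/YY H:MM:SS:mmm TZ] threadId component level message
LINE_RE = re.compile(
    r"^\[(?P<date>\d{1,2}/\d{1,2}/\d{2}) "
    r"(?P<time>\d{1,2}:\d{2}:\d{2}):(?P<ms>\d{3}) (?P<tz>\S+)\] "
    r"(?P<thread>[0-9a-fA-F]{8}) (?P<component>\S+)\s+(?P<level>[A-Z])\s+"
    r"(?P<message>.*)$"
)


def parse_line(line):
    """Return a dict for a log line, or None if it lacks the assumed prefix."""
    m = LINE_RE.match(line)
    if m is None:
        return None  # continuation lines and stack traces end up here
    stamp = datetime.strptime(m["date"] + " " + m["time"], "%m/%d/%y %H:%M:%S")
    stamp = stamp.replace(microsecond=int(m["ms"]) * 1000)
    return {
        "time": stamp,  # naive local time; the zone name is kept separately
        "tz": m["tz"],
        "component": m["component"],
        "level": m["level"],
        "message": m["message"],
    }


def find_messages(lines, message_id, start, end):
    """Collect entries whose message starts with message_id within [start, end]."""
    hits = []
    for line in lines:
        entry = parse_line(line.rstrip("\n"))
        if entry is None:
            continue
        if start <= entry["time"] <= end and entry["message"].startswith(message_id):
            hits.append(entry)
    return hits
```

For the incident described here, one would open the cluster member's SystemOut.log and call `find_messages(f, "ODCF8534I", start, end)` with a window around 3:15 on the night in question.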

2. Check the WebSphere DM log

Checking the DM's SystemOut.log showed that at around 3:15 at night the DM logged a service stop followed by a service start, but gave no explanation for either.

3. Check the WebSphere FFDC log

Sorting the log files in the dmgr FFDC directory by date turned up two log files from March 22:

Dmgr_exception.log.1458587814531.txt

Dmgr_25be7f2a_16.03.22_03.16.54.5782445606813376690951.txt
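The 13-digit number in the first file name appears to be a Unix timestamp in milliseconds, which gives a quick way to cross-check when the exception was recorded. The sketch below decodes it, assuming that "CST" in these logs means China Standard Time (UTC+8); the function name `ffdc_exception_log_time` is made up for this illustration.

```python
import re
from datetime import datetime, timedelta, timezone

# Assumption: "CST" in these logs is China Standard Time (UTC+8).
CST = timezone(timedelta(hours=8), "CST")
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def ffdc_exception_log_time(filename, tz=CST):
    """Decode the trailing 13-digit millisecond timestamp of an FFDC file name."""
    m = re.search(r"\.(\d{13})\.txt$", filename)
    if m is None:
        return None
    # Integer milliseconds added to the epoch avoid floating-point rounding.
    utc_time = EPOCH + timedelta(milliseconds=int(m.group(1)))
    return utc_time.astimezone(tz)


when = ffdc_exception_log_time("Dmgr_exception.log.1458587814531.txt")
print(when.isoformat())
```

Under that time zone assumption the result is 03:16:54 on 2016-03-22, which agrees with the `16.03.22_03.16.54` part of the second file name.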

They contain the following output:

[3-3: From:578 CST]     FFDC Exception:java.io.IOException SourceId:com.ibm.ws.management.discovery.DiscoveryService.sendQuery ProbeId:189 Reporter:[email protected]

java.io.IOException: ADMD0004E: Unable to open TCP socket: WIN-ru03cb21qga:7272. Check to see if the remote process has opened the port.

"Unable to open TCP sockets" is not a network problem, then what is the network problem? is the network not allowed to restart the service? is the operating system itself doing what? Then look at the operating system log according to the time point.

4. Check the logs in Windows Event Viewer

Click "Start" -> "Administrative Tools" -> "Event Viewer". Under the "Windows Logs" node, click "System", then filter the event list on the right around the time of the incident (3:15). That finally revealed the problem.

It turned out that the operating system provided by the cloud service provider was configured to install system updates at 3:00 in the morning and to restart automatically once the updates were installed.
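The same check can be made from a script with the built-in `wevtutil` tool instead of the Event Viewer GUI. The sketch below is only an illustration: the choice of event IDs (1074 for a requested shutdown or restart, 6006 and 6005 for the event log service stopping and starting) is an assumption about what is worth looking for, and the function names are made up.

```python
import subprocess
import sys

# Assumed set of restart-related System event IDs:
# 1074 = shutdown/restart requested, 6006 = event log service stopped,
# 6005 = event log service started.
RESTART_EVENT_IDS = (1074, 6006, 6005)


def build_query(event_ids=RESTART_EVENT_IDS):
    """Build an XPath filter matching any of the given event IDs."""
    ids = " or ".join(f"EventID={i}" for i in event_ids)
    return f"*[System[({ids})]]"


def build_command(event_ids=RESTART_EVENT_IDS, count=20):
    """Build a wevtutil call that prints the newest matching System events."""
    return [
        "wevtutil", "qe", "System",
        f"/q:{build_query(event_ids)}",
        f"/c:{count}",   # maximum number of events to return
        "/rd:true",      # newest events first
        "/f:text",       # human-readable output
    ]


def recent_restart_events(count=20):
    """Run the query; only meaningful on Windows."""
    if sys.platform != "win32":
        raise RuntimeError("wevtutil is only available on Windows")
    result = subprocess.run(build_command(count=count),
                            capture_output=True, text=True, check=True)
    return result.stdout
```

Calling `recent_restart_events()` on the affected server and reading the entries around 3:15 should show whether a restart took place and which process requested it.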

On Windows, IHS and the DM are registered as services by default and therefore start together with the operating system. Nodeagent is not registered as a service and does not start with the operating system, which is why the cluster did not come back up after the automatic restart.
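One possible way to avoid a repeat, not described in the original write-up, is to register nodeagent as a Windows service with the `WASService.exe` tool that ships with WebSphere Application Server. The sketch below only assembles the command line. The installation path, profile path and service name are hypothetical placeholders, and whether the cluster members then start on their own also depends on how the servers are configured.

```python
import ntpath
import subprocess


def build_wasservice_add(was_home, profile_path,
                         service_name="nodeagent_service",
                         server_name="nodeagent"):
    """Assemble a WASService.exe call that registers a server as a service."""
    exe = ntpath.join(was_home, "bin", "WASService.exe")
    return [
        exe,
        "-add", service_name,          # hypothetical service name
        "-serverName", server_name,
        "-profilePath", profile_path,
        "-wasHome", was_home,
        "-startType", "automatic",     # start together with the OS
    ]


if __name__ == "__main__":
    # Hypothetical paths; replace them with the real installation and profile.
    cmd = build_wasservice_add(
        r"C:\IBM\WebSphere\AppServer",
        r"C:\IBM\WebSphere\AppServer\profiles\AppSrv01",
    )
    # Print the command for review instead of running it directly.
    print(subprocess.list2cmdline(cmd))
```

Printing the command first keeps the change reviewable; it would then be run from an administrator prompt on machine A.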
