Fault diagnosis tools
In the previous article, we gave a brief overview of the support resources available in the WebSphere environment. These resources can help you find information and obtain help, and some even provide the main functions of WebSphere fault diagnosis tools. In this article, we look more closely at some practical functions and tools that you can use for WebSphere Application Server fault diagnosis.
The following basic statement applies to the core of most fault diagnosis exercises:
When the system is not running properly, I need to obtain information about its internal state and analyze that information to determine the cause of the problem.
Our development team has therefore invested a lot of effort in improving the mechanisms for obtaining and processing this information. Let's take a look at some of the main areas.
Logging and tracing
When you diagnose a fault in WebSphere Application Server, logging is likely to be the first problem determination facility you use. It gives users and IBM support the insight into runtime behavior that is needed to identify basic issues.
The WebSphere Application Server logging infrastructure is based on the standard Java logging infrastructure (java.util.logging). In a typical WebSphere Application Server configuration, logging is set to write general and severe log messages to two files, SystemOut.log and SystemErr.log.
The main tools for viewing log messages are the troubleshooting panels of the WebSphere Application Server administrative console and the Log and Trace Analyzer (LTA).
The administrative console is the simpler of the two. Its troubleshooting menu provides a basic viewer for local or remote log files. Although it lacks the advanced features of more sophisticated tools, you can use it to take a quick look at captured logs and traces.
LTA, on the other hand, is a capable and user-friendly log viewer that can correlate logs across multiple products and servers. Simply put, it gives you a higher-level view of the logs. LTA uses the Common Base Event standard, which provides a common format for messages from different products. LTA is part of the Autonomic Computing Toolkit, and a version of it is also provided in the WebSphere Application Server Toolkit bundled with WebSphere Application Server.
The same underlying logging infrastructure in the WebSphere Application Server runtime is also used for tracing. The main difference between the two lies in their purpose:
Logs: Generally used only to report important events in the life cycle of the system. Logging is enabled by default and has minimal performance overhead. Log messages are generally translated into local languages for end users and administrators.
Tracing: Provides extremely detailed information about the sequence of events in the system. Tracing is generally disabled by default, and enabling it can incur significant performance overhead (depending on how much tracing is selected). Trace messages are generally in English only and are intended for technical users who specialize in fault diagnosis.
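The split between logging and tracing maps naturally onto java.util.logging levels. The following is a minimal sketch, not WebSphere code: the logger name com.example.orders, the messages, and the in-memory handler are assumptions made for illustration. It shows that with the logger at INFO only the log-style message is recorded, while at ALL the fine-grained trace messages are recorded too.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.logging.Handler;
import java.util.logging.Level;
import java.util.logging.LogRecord;
import java.util.logging.Logger;

// Illustrative sketch only; names and messages are hypothetical.
class LogVersusTrace {

    /** Keeps every record it is handed so the effect of the level can be inspected. */
    static class CollectingHandler extends Handler {
        final List<LogRecord> records = new ArrayList<>();
        @Override public void publish(LogRecord entry) { records.add(entry); }
        @Override public void flush() { }
        @Override public void close() { }
    }

    /** Emits one log-style and two trace-style messages with the logger at the given level. */
    static List<LogRecord> emit(Level loggerLevel) {
        Logger logger = Logger.getLogger("com.example.orders"); // hypothetical component
        CollectingHandler handler = new CollectingHandler();
        logger.setUseParentHandlers(false); // keep the demo output off the console
        logger.addHandler(handler);
        logger.setLevel(loggerLevel);
        try {
            logger.info("Order service started");     // log: important life-cycle event
            logger.fine("Entering placeOrder");       // trace: detailed event sequence
            logger.finest("Cart contents: 3 items");  // trace: very detailed
        } finally {
            logger.removeHandler(handler);
        }
        return handler.records;
    }

    public static void main(String[] args) {
        System.out.println("Level INFO: " + emit(Level.INFO).size() + " record(s)");
        System.out.println("Level ALL:  " + emit(Level.ALL).size() + " record(s)");
    }
}
```

The extra records at the finer levels are also where the additional overhead of tracing comes from: every enabled trace point has to build and publish a record.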
Although LTA can also be used to examine trace files, some engineers prefer a dedicated Trace Analyzer tool. It provides additional functions that are particularly useful for viewing and studying complex trace files, such as filtering trace information by thread or component, generating call stack information from long traces, and checking for gaps in the timeline.
Other log-related tools include:
IBM service log (activity.log): This log consolidates all key messages on a specific WebSphere Application Server node and contains extended service information that helps identify problems.
JMX-based monitoring: In addition to being written to files, most key WebSphere Application Server events processed by the logging infrastructure are also exported as JMX events. Tools can therefore be designed to monitor and capture log information remotely. Tivoli Monitoring for Web Infrastructure is an important example.
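To make the JMX-based approach concrete, here is a small sketch that uses only the standard javax.management API. It is not how WebSphere Application Server exports its events: the MBean LogSource, the object name com.example:type=LogSource, and the notification type example.log.message are all hypothetical. It shows a component publishing a message as a JMX notification and a listener, registered through the MBean server, receiving it.

```java
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.List;
import javax.management.JMException;
import javax.management.MBeanServer;
import javax.management.Notification;
import javax.management.NotificationBroadcasterSupport;
import javax.management.NotificationListener;
import javax.management.ObjectName;

// Illustrative sketch only; all names are hypothetical.
class JmxLogEvents {

    /** Standard MBean interface; JMX expects the name <ClassName>MBean. */
    public interface LogSourceMBean {
        int getSentCount();
    }

    /** Republishes messages as JMX notifications. */
    public static class LogSource extends NotificationBroadcasterSupport implements LogSourceMBean {
        private int sent;

        public synchronized void publish(String message) {
            sendNotification(new Notification("example.log.message", this, ++sent,
                    System.currentTimeMillis(), message));
        }

        @Override public synchronized int getSentCount() { return sent; }
    }

    /** Registers the MBean, attaches a listener, publishes one message, returns what was received. */
    static List<String> demo() {
        MBeanServer server = ManagementFactory.getPlatformMBeanServer();
        LogSource source = new LogSource();
        List<String> received = new ArrayList<>();
        NotificationListener listener =
                (notification, handback) -> received.add(notification.getMessage());
        try {
            ObjectName name = new ObjectName("com.example:type=LogSource");
            server.registerMBean(source, name);
            try {
                server.addNotificationListener(name, listener, null, null);
                source.publish("Connection pool exhausted");
            } finally {
                server.unregisterMBean(name);
            }
        } catch (JMException e) {
            throw new IllegalStateException(e);
        }
        return received;
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

A remote monitoring tool would attach its listener over a JMX connector rather than in the same process, but the listener contract is the same.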
Specialized tracing and runtime checks
WebSphere Application Server also provides several specialized forms of runtime checking and tracing to help diagnose some specific, common problems. Two examples:
Connection leak detection: If an application fails to release a database connection after using it, this special trace can help you identify the cause of the problem.
Session crossover detection: This special trace and check detects scenarios in which, because of a defect or handling error in the application or the runtime, data from one user's HTTP session is served to another user's HTTP session.
To enable these tools, you only need to trace a specific component or set a specific custom property. Similar tools for other common problems may be added to the product where there is a clear need.
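The general idea behind leak detection can be sketched in a few lines. This is an illustration of the technique, not the WebSphere implementation: the class LeakTracker and its methods are hypothetical. The tracker remembers where each resource was acquired, so that anything never released can be reported together with the stack trace of the code that obtained it.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Illustrative sketch only; not the WebSphere connection leak trace.
class LeakTracker {
    // Resource -> stack trace captured at the point of acquisition.
    private final Map<Object, Throwable> acquisitionSites = new ConcurrentHashMap<>();

    /** Call when a resource (for example, a pooled connection) is handed out. */
    void acquired(Object resource) {
        acquisitionSites.put(resource, new Throwable("acquired here"));
    }

    /** Call when the resource is returned. */
    void released(Object resource) {
        acquisitionSites.remove(resource);
    }

    /** Acquisition sites of everything that has not been released yet. */
    List<Throwable> unreleased() {
        return new ArrayList<>(acquisitionSites.values());
    }

    public static void main(String[] args) {
        LeakTracker tracker = new LeakTracker();
        Object closedProperly = new Object();
        Object leaked = new Object();
        tracker.acquired(closedProperly);
        tracker.acquired(leaked);
        tracker.released(closedProperly);
        for (Throwable site : tracker.unreleased()) {
            site.printStackTrace(); // shows which code path obtained the leaked resource
        }
    }
}
```

Capturing a stack trace on every acquisition is not free, which is one reason a check like this is normally switched on only while investigating a suspected leak.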
First Failure Data Capture
First Failure Data Capture (FFDC) is a facility built into the WebSphere Application Server runtime that attempts to automatically capture and save key information when a potentially abnormal event occurs. Because many problems encountered in WebSphere Application Server are associated with a Java exception, FFDC monitors the exceptions raised by server operations. When an exception is thrown, a real-time check determines whether the exception is unexpected or might be part of a problem. If so, FFDC writes a record (an FFDC incident record) to a file. The record contains the stack trace, the context in which the exception occurred, and, optionally, a short dump of the state of the server component that raised the exception. You can examine FFDC incident records later to learn more about what happened.
The information captured by FFDC can help diagnose many kinds of problems (in theory, any problem closely tied to a specific exception). In practice, however, interpreting FFDC incident records can be very challenging, mainly because it is difficult to predict which exceptions are benign and which hold the key to a diagnosis. FFDC therefore tends to generate a large number of incident records. (Capturing a record that turns out to be unnecessary is better than missing one that is needed.) In future articles, we will provide tutorials on how to configure and use FFDC.
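As a rough illustration of what an incident record brings together, the sketch below formats an exception, an identifier for the code that caught it, and a snapshot of component state into one text record. The class IncidentRecorder, the record layout, and the sample values are assumptions made for this example. They do not reproduce the actual FFDC file format or API.

```java
import java.io.PrintWriter;
import java.io.StringWriter;
import java.util.LinkedHashMap;
import java.util.Map;

// Illustrative sketch only; the record layout is invented, not the FFDC format.
class IncidentRecorder {

    /** Builds a text record from the exception, where it was caught, and component state. */
    static String describe(Throwable error, String sourceId, Map<String, Object> componentState) {
        StringWriter text = new StringWriter();
        PrintWriter out = new PrintWriter(text);
        out.println("Source: " + sourceId);
        out.println("Exception: " + error.getClass().getName() + ": " + error.getMessage());
        out.println("Component state:");
        for (Map.Entry<String, Object> entry : componentState.entrySet()) {
            out.println("  " + entry.getKey() + " = " + entry.getValue());
        }
        out.println("Stack trace:");
        error.printStackTrace(out);
        out.flush();
        return text.toString();
    }

    public static void main(String[] args) {
        Map<String, Object> state = new LinkedHashMap<>();
        state.put("poolSize", 10);   // hypothetical state values
        state.put("inUse", 10);
        try {
            throw new IllegalStateException("no free connections");
        } catch (IllegalStateException e) {
            System.out.print(describe(e, "com.example.Pool.borrow", state));
        }
    }
}
```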
Diagnostic providers are a new facility introduced in WebSphere Application Server V6.1. If you suspect that a problem has occurred, they let you selectively query a specific component in the application server for detailed information about its state. The diagnostic provider for each component can be used to start a self-test of the component, to obtain a dump of the component's static configuration, and to obtain a dump of its runtime state. You can also configure how much diagnostic data each provider captures for the runtime state, to manage performance overhead. You control diagnostic providers through the administrative console or the wsadmin command-line tool.
In addition, a number of log and error messages generated by the WebSphere Application Server runtime now contain a diagnostic provider ID that uniquely identifies the diagnostic provider at the source of the reported error. You can then query that diagnostic provider for more information to diagnose the cause of the error.
In the current version (V6.1), the connection manager, the Web container, and the system management component (as well as the Performance and Diagnostic Advisor, discussed later) have dedicated diagnostic providers. More diagnostic providers, and a growing set of tools that take advantage of the information they provide, are expected in future WebSphere Application Server versions. Applications can also use the diagnostic provider framework to expose their own diagnostic data.
At its core, WebSphere Application Server is first and foremost a JVM process, so the diagnostic tools available for any JVM process naturally apply to it as well. In fact, some problems encountered by WebSphere Application Server users first show up at the JVM level, such as out-of-memory conditions and crashes.
The verbosegc log is probably the most common type of JVM diagnostic. It shows the sequence of garbage collection cycles over the lifetime of the JVM. As a first aid for problem determination it is often invaluable: it is used to detect and diagnose all kinds of memory allocation anomalies in the JVM, such as memory leaks, fragmentation, and GC-related performance issues. PMAT, available through IBM Support Assistant, is currently the basic tool for analyzing verbosegc log files.
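The verbosegc log itself is produced by the JVM (typically when it is started with the -verbose:gc option) and its format depends on the JVM vendor and version. For a quick programmatic look at the same kind of information, the standard java.lang.management API exposes cumulative collection counts and times. The following is a minimal sketch with a hypothetical class name. It is a complement to the verbosegc log, not a substitute, because it reports totals rather than individual cycles.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

// Illustrative sketch only; reports cumulative GC figures for the current JVM.
class GcSummary {

    /** Total number of collections reported by all collectors in this JVM. */
    static long totalCollections() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long count = gc.getCollectionCount(); // -1 when the collector does not report it
            if (count > 0) {
                total += count;
            }
        }
        return total;
    }

    public static void main(String[] args) {
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            System.out.println(gc.getName()
                    + ": collections=" + gc.getCollectionCount()
                    + ", time(ms)=" + gc.getCollectionTime());
        }
        System.out.println("Total collections so far: " + totalCollections());
    }
}
```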
The thread dump is another very common type of JVM diagnostic. A thread dump (also known as a javacore) can be triggered at the administrator's request or automatically when the JVM encounters certain conditions. It is a text file that contains a relatively short snapshot of key aspects of the JVM state. The most commonly used part of this snapshot is the list of active threads in the JVM, which is where the thread dump gets its name. Thread dumps are most often used to diagnose the cause of hangs, crashes, or high CPU usage in the JVM. Because a thread dump is a relatively short text file, it can be examined in a simple text editor. However, it is often more effective to use a dedicated tool that parses and organizes the content and automatically detects and highlights key information and anomalies. There are currently two main tools for this purpose: ThreadAnalyzer, provided in IBM Support Assistant, and Thread and Monitor Dump Analyzer, available on alphaWorks.
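To give a feel for the thread-list portion of such a snapshot, the sketch below builds a simplified listing of live threads and their stacks with the standard ThreadMXBean API, and checks for threads deadlocked on object monitors. The class name and the output layout are made up for this example. A real javacore contains considerably more information and uses its own format.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;

// Illustrative sketch only; the output layout is not the javacore format.
class ThreadSnapshot {

    /** Returns the name, state, and stack of every live thread as text. */
    static String dump() {
        ThreadMXBean threads = ManagementFactory.getThreadMXBean();
        StringBuilder out = new StringBuilder();
        for (ThreadInfo info : threads.dumpAllThreads(false, false)) {
            if (info == null) {
                continue; // defensive: skip threads that ended while the dump was taken
            }
            out.append('"').append(info.getThreadName()).append("\" ")
               .append(info.getThreadState()).append('\n');
            for (StackTraceElement frame : info.getStackTrace()) {
                out.append("    at ").append(frame).append('\n');
            }
        }
        return out.toString();
    }

    /** Number of threads currently deadlocked on object monitors. */
    static int monitorDeadlockedThreads() {
        long[] ids = ManagementFactory.getThreadMXBean().findMonitorDeadlockedThreads();
        return ids == null ? 0 : ids.length;
    }

    public static void main(String[] args) {
        System.out.print(dump());
        System.out.println("Threads in monitor deadlock: " + monitorDeadlockedThreads());
    }
}
```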
The heap dump is another form of dump generated by the JVM, either on demand or automatically when certain conditions are met. A heap dump is generally a large file that lists all objects in the JVM heap at that moment. It is used for in-depth analysis of out-of-memory conditions. For example, an analyst can find out which objects occupy the most space in the heap and which are growing. Because a heap dump is so large, examining it by hand is impractical. Memory Dump Diagnostic for Java (MDD4J), available in IBM Support Assistant, is currently the main tool for this kind of analysis.
The third type of JVM dump is the system dump, or simply core file. It is the most expensive dump to produce, but also the most complete. It is a huge binary file that reflects the entire contents of the JVM: every Java object and its fields, every thread, every memory area, and so on. The original purpose of a system dump is to help diagnose crashes, hangs, or complex memory allocation problems when other types of dump are insufficient or cannot be generated. However, because the system dump is complete, it can also be used to obtain information about the current state of the WebSphere Application Server runtime, and even about the applications executing in it. System dumps are expected to be used more in this way in the future. Relatively few tools for examining the contents of a WebSphere Application Server JVM system dump are available outside IBM, so system dumps generally have to be sent to IBM support for in-depth analysis. However, IBM recently introduced the Diagnostic Tooling Framework for Java (DTFJ), a new technology that will make it easier to develop tools for examining system dumps. DTFJ-based tools are expected to become widely used.
Finally, the JVM provides its own JVM tracing facility. Unlike WebSphere tracing, it traces at the level of individual Java method invocations and of events internal to the JVM implementation. This type of tracing is most commonly used to diagnose problems inside the JVM and only occasionally to diagnose problems at the WebSphere Application Server level. Method-level tracing is nevertheless a useful complement to WebSphere tracing. We plan to expand its use and document it in the near future.
The Java Diagnostics Guide is the main source of information about the various JVM diagnostic facilities and how to generate each type of dump. Each of the analysis tools also has its own documentation.
Although the primary function of performance-related tools is to monitor, measure, and tune system performance, they also provide an important way to gain insight into the internal state of the application server. This is invaluable for diagnosing many kinds of problems, whether or not they are directly related to performance.
The WebSphere Application Server runtime includes two types of performance tools:
Performance Monitoring Infrastructure (PMI): Provides counters for many statistics that reflect internal server operations, such as the number of requests processed per second in each servlet or EJB, the average response time, and the usage of various resources (such as threads and database connections). PMI is a WebSphere-specific facility that provides detailed information about WebSphere Application Server and related products.
Request metrics: Provides a mechanism to track the execution flow of a single request as it passes through the system and to measure the processing time of each step in that flow. Request metrics information can be accessed through the standard Application Response Measurement (ARM) infrastructure. Request metrics are available from multiple products, which makes it possible to track a request end to end through a complex system, even when it involves several different server components and tiers.
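The kind of counter that PMI exposes can be pictured with a very small sketch. The class RequestStats below is hypothetical and does not use the PMI API. It only shows how a request count and an average response time can be maintained cheaply enough to leave enabled while the server handles requests on many threads.

```java
import java.util.concurrent.atomic.AtomicLong;

// Illustrative sketch only; not the PMI API.
class RequestStats {
    private final AtomicLong count = new AtomicLong();
    private final AtomicLong totalMillis = new AtomicLong();

    /** Records one completed request and how long it took. */
    void record(long elapsedMillis) {
        count.incrementAndGet();
        totalMillis.addAndGet(elapsedMillis);
    }

    long count() {
        return count.get();
    }

    /** Average response time in milliseconds, or 0 if nothing was recorded. */
    double averageMillis() {
        long n = count.get();
        return n == 0 ? 0.0 : (double) totalMillis.get() / n;
    }

    public static void main(String[] args) {
        RequestStats servletStats = new RequestStats();
        servletStats.record(100);
        servletStats.record(300);
        System.out.println(servletStats.count() + " requests, average "
                + servletStats.averageMillis() + " ms");
    }
}
```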
The information provided by these two tools is useful for general fault diagnosis:
Problem isolation: By observing which subsystems or components are active or are receiving requests for a particular operation, we can often determine where the breakdown occurs: the point at which a request stops flowing or fails to complete normally indicates which component is at fault.
Problem identification: By comparing the statistics provided by PMI or request metrics with their normal values in a healthy system, we can spot specific problems that show up as anomalies in a particular statistic, such as excessive resource utilization or an overflow.
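The comparison against a healthy baseline described under problem identification can be sketched as follows. The class BaselineCheck, the statistic names, and the tolerance rule (flag a statistic when it deviates from its baseline by more than a given fraction) are assumptions chosen for illustration. Real monitoring tools apply their own thresholds.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Illustrative sketch only; statistic names and the tolerance rule are hypothetical.
class BaselineCheck {

    /**
     * Returns the names (sorted) of statistics whose current value differs from the
     * baseline by more than tolerance * |baseline|. Statistics missing from the
     * current sample are ignored.
     */
    static List<String> anomalies(Map<String, Double> baseline,
                                  Map<String, Double> current,
                                  double tolerance) {
        List<String> flagged = new ArrayList<>();
        for (Map.Entry<String, Double> entry : new TreeMap<>(baseline).entrySet()) {
            Double now = current.get(entry.getKey());
            if (now == null) {
                continue;
            }
            double allowed = Math.abs(entry.getValue()) * tolerance;
            if (Math.abs(now - entry.getValue()) > allowed) {
                flagged.add(entry.getKey());
            }
        }
        return flagged;
    }

    public static void main(String[] args) {
        Map<String, Double> baseline = new TreeMap<>();
        baseline.put("avgResponseMillis", 200.0);
        baseline.put("poolInUse", 10.0);
        Map<String, Double> current = new TreeMap<>();
        current.put("avgResponseMillis", 210.0);
        current.put("poolInUse", 48.0);
        System.out.println(anomalies(baseline, current, 0.5));
    }
}
```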
Both PMI and request metrics information can be exported through public APIs for use by dedicated or third-party tools. Tivoli Performance Viewer, in the WebSphere Application Server console, is the main tool dedicated to examining PMI information. The IBM Tivoli Composite Application Manager (ITCAM) family of tools provides a more comprehensive platform for performance diagnosis, drawing on PMI, request metrics, and other technologies across multiple products and application environments.
Monitoring and detecting problems
WebSphere Application Server also contains several important tools that monitor for and detect problems as they occur (rather than after the fact). If your application is stable and healthy, you may not even be aware that these tools are running, but if there is a problem, you will find that they generate alerts. They include:
Hung thread detection: Automatically issues a warning when a thread takes too long to complete a request, to help diagnose hangs and slowdowns. It is a hook built into the WebSphere Application Server runtime that works by specifying a timeout (called the hang threshold) for threads. If a thread runs longer than the specified hang threshold, the tool issues a notification that the thread may be hung.
Performance and Diagnostic Advisor: Monitors the system in the background and offers advice on settings for specific WebSphere Application Server runtime components or for the JVM. It provides several types of advice, including advice on ORB and Web container pool settings, session settings, and memory leaks, and even diagnostic advice on data sources. Each advisor can be enabled or disabled in the WebSphere Application Server console.
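The threshold mechanism behind hung thread detection can be sketched in a few lines. This is a simplified illustration, not the WebSphere implementation: the class HangDetector and its methods are hypothetical, and timestamps are passed in explicitly so that the behavior is easy to follow. The detector records when each thread starts a request and reports the threads whose request has been running longer than the threshold.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Illustrative sketch only; not the WebSphere hung thread detector.
class HangDetector {
    private final long thresholdMillis;
    // Thread name -> time at which its current request started.
    private final Map<String, Long> startedAt = new ConcurrentHashMap<>();

    HangDetector(long thresholdMillis) {
        this.thresholdMillis = thresholdMillis;
    }

    void requestStarted(String threadName, long nowMillis) {
        startedAt.put(threadName, nowMillis);
    }

    void requestFinished(String threadName) {
        startedAt.remove(threadName);
    }

    /** Threads whose current request has been running longer than the threshold. */
    List<String> possiblyHung(long nowMillis) {
        List<String> suspects = new ArrayList<>();
        for (Map.Entry<String, Long> entry : startedAt.entrySet()) {
            if (nowMillis - entry.getValue() > thresholdMillis) {
                suspects.add(entry.getKey());
            }
        }
        return suspects;
    }

    public static void main(String[] args) {
        HangDetector detector = new HangDetector(60_000); // hypothetical 60-second threshold
        detector.requestStarted("WebContainer-1", 0);
        detector.requestStarted("WebContainer-2", 50_000);
        detector.requestFinished("WebContainer-2");
        System.out.println("Possibly hung at t=90s: " + detector.possiblyHung(90_000));
    }
}
```

A report of this kind says only that a thread may be hung. A long-running but healthy request looks the same, so the threshold has to be chosen with the application's normal response times in mind.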
Investigating problems related to a specific subsystem
The tools described above apply broadly across many situations in WebSphere Application Server. In addition, there is a growing set of specialized tools for particular types of problems or subsystems. For example:
System management configuration validation: This tool performs automated checks to detect inconsistencies and errors in the complex set of XML files that contains the complete WebSphere Application Server system configuration. Because of the many run-time safety checks, such errors rarely occur in recent versions of the product. However, undiscovered product defects, unexpected events during configuration operations (such as crashes), or operator errors during configuration can still cause them. The tool is built into the WebSphere Application Server runtime and can be invoked from the administrative console (in the troubleshooting panels).
dumpNameSpace: This tool provides a simple dump of the contents of the JNDI name tree as seen by applications on a specific server. It is typically used to help identify problems caused by incorrect configuration of JNDI resources on the server or by application code that accesses those resources incorrectly. The dumpNameSpace tool is a standalone program in the bin directory of the WebSphere Application Server installation.
Class Loader Viewer: Lets administrators see inside the class loading subsystem of the application server and examine its sometimes complex configuration. The Class Loader Viewer helps with many class loading problems; the most common example is ClassNotFoundException. It is bundled with WebSphere Application Server and can be started from the troubleshooting menu in the administrative console.
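When investigating a class loading problem, the first question is usually which loader loaded a class and what its parents are. The sketch below answers that with standard Java calls. The class LoaderChain is hypothetical and shows far less than the Class Loader Viewer does. The loader class names it prints depend on the JVM and on the environment in which the code runs.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch only; walks the parent chain of a class's loader.
class LoaderChain {

    /** Loader class names from the defining loader up to the bootstrap loader. */
    static List<String> chainFor(Class<?> type) {
        List<String> chain = new ArrayList<>();
        ClassLoader loader = type.getClassLoader();
        while (loader != null) {
            chain.add(loader.getClass().getName());
            loader = loader.getParent();
        }
        chain.add("bootstrap"); // the bootstrap loader is represented by null
        return chain;
    }

    public static void main(String[] args) {
        System.out.println("String:      " + chainFor(String.class));
        System.out.println("LoaderChain: " + chainFor(LoaderChain.class));
    }
}
```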
Tools for solving installation problems
Several tools address specific installation problems, each with its own purpose:
Installation Verification Tool: Lets you run a simple test against a single profile of a specific installation to check that the WebSphere Application Server profile is basically operational. It is often used right after the product is installed, to confirm that everything was installed correctly.
Installation Verification Utility: Lets you see the file-level changes that occur over the life of the system. It also helps IBM support determine whether the supported set of files has been installed correctly with WebSphere Application Server and is in the right place.
In addition, WebSphere Application Server includes other problem determination aids that help you install service packs or fix packs and determine which fixes are already installed:
Update Installer: Lets you install service packs or individual fixes to WebSphere Application Server.
versionInfo, historyInfo, and genHistoryReport: These tools let you query a WebSphere Application Server installation to determine the software levels currently or previously installed. genHistoryReport presents the information in HTML format.
All of these tools and commands are bundled with the standard WebSphere Application Server installation.
Debuggers and profiling tools
Debugging and profiling capabilities are often very valuable in application support. WebSphere Application Server provides them through the JVM and its JVMPI (or JVMTI) interface. WebSphere Application Server system management lets you easily set the appropriate JVM parameters to enable these tools, through the WebSphere Application Server console or wsadmin scripts.
Rational and other Eclipse-based development tools, including Rational Application Developer and the WebSphere Application Server Toolkit, all have powerful debuggers and profilers that connect to these interfaces. You may also consider the Performance Inspector tool set, which provides various tools that extract runtime performance information from the JVM and analyze it, using the same basic interfaces. These tools can be downloaded from alphaWorks (for Windows) or SourceForge (for Linux).