Slow and hung request detection
When an application takes longer than expected to complete a request, the request can be logged as slow or hung. The Request Timing feature automatically detects slow and hung requests and provides detailed information to help you find the root cause of the delay.
The Request Timing feature identifies incoming requests that run longer than a slow request threshold. The Request Timing feature documents slow and hung requests as they occur, so you don’t need to re-create the problem to collect more data. In addition to the global settings for slow and hung request thresholds, you can set thresholds for specific servlets or JDBC requests. For more information, see the Request Timing feature.
A request can be slow because its code path length is long. Slow requests can also be caused if a resource, which the request depends on, is busy. Users need to know what actions occurred during the processing of that request, and the processing stage of the request.
When a request runs for longer than the slow request threshold, the Request Timing feature generates a warning message in the messages log file. The message includes details about the request and events that made up the request. The default threshold value is 10 seconds.
The following example shows the format of a log message:
TRAS0112W: Request <(Request ID)> has been running on thread <THREADID> for at least <DURATION>. The following stack trace shows that this thread is currently running. <STACK TRACE> <DURATION AND OPERATIONS Table>
Request ID can be used to search for log and trace messages that correspond to the request. If you use binary logging, you can search for log and trace entries with the same
requestID extension by using the binary log command. The
STACK TRACE indicates that the method is running. After the
STACK TRACE, the
DURATION AND OPERATIONS Table provides the format of the request that includes the duration and operation. The
Duration column indicates the time that is taken by the corresponding operation of the request. The plus sign (+) that is shown in the
Duration column indicates events within the request that are still running. The
Duration column also includes duration without the plus sign to indicate that the corresponding operation completed in the specified duration. The
Operation column shows the
EVENT TYPE and optional
CONTEXT INFO for the specified operation.
The message helps you analyze the reason behind the slow request, though it is difficult to determine whether the request is hung at that point or is still running slowly. To address this issue, three messages are displayed for any slow request at the interval specified by the slowRequestThreshold attribute. The stack trace data gives the information about the method that is running, while the request data gives details about the request, like the request ID and duration. The three different stack trace and request data help you better understand where the request is going wrong by looking into what the thread is running currently. No further warnings are logged about the request unless the duration of the request crosses the hung request detection threshold.
Requests that take longer than the hung request threshold are typically blocked by lack of access to a shared resource, for example, a section of code that allows only serial access and transfers information sequentially. Such longer requests are likely to be hung, and can be blocked for a long time or even indefinitely. One particular form of hung request, called a deadlock, occurs when two threads are blocked waiting on each other to return shared resources. For a hung request, users need to know where all the threads are at in their processing, and which locks the threads are waiting on.
If the request exceeds the hung request threshold, the Request Timing feature generates a warning message in the messages log file with the request details and events that make up the request. A series of three Java cores is recorded with a 1-minute delay among the cores. Further Java cores are created if new requests are detected.
The following log message sample shows the log messages for a request that crosses the hungRequestThreshold attribute. The default duration value is 10 min. The value that is configured in the following example is 4 min.
[WARNING ] TRAS0114W: Request AAA7WlpP7l7_AAAAAAAAAAA was running on thread 00000021 for at least 240001.015ms. The following table shows the events that have run during this request. Duration Operation 240001.754ms + websphere.servlet.service | TestWebApp | TestServlet?sleepTime=480000 0.095ms websphere.session.setAttribute | mCzBMyzMvAEnjMJJx9zQYIw | userID 0.007ms websphere.session.setAttribute | mCzBMyzMvAEnjMJJx9zQYIw | visitCount