Failing fast and recovering from errors

duration 20 minutes

Use MicroProfile's Timeout and Retry policies to fail fast and recover when running into failures.

What you'll learn

Explore how to use MicroProfile Timeout and Retry policies from the Fault Tolerance feature to make your microservice more resilient to common failures like network problems or an IOException.

You'll start with a sample bank scenario and experience a network glitch. You'll then enable MicroProfile Fault Tolerance and use the Timeout policy to quickly fail requests that are hanging for too long. Finally, you'll explore the Retry policy and how it can help overcome temporary intermittent failures your application might experience.

When you arrive at the section about the Interactive timeout retry playground, you can modify the parameters of the MicroProfile Timeout and Retry policies in any combination. You can then run a sample scenario to see how the parameter values affect the results.

Background concepts

Use the MicroProfile Timeout and Retry policies to fail quickly and recover from brief intermittent issues. An application might experience these transient failures when a microservice is undeployed, a database is overloaded by queries, the network connection becomes unstable, or the site host has a brief downtime. Use Timeout and Retry together to help overcome or alleviate these kinds of transient failures.

Timeout

You might encounter a page that never finishes loading. Timeout helps by ending requests that have taken too long and are unlikely to return successfully.

Retry

Sometimes the underlying issues might be short lived. In these cases, rather than failing quickly on these transient failures, the Retry policy provides another chance for the request to succeed. Simply retrying the request might be all you need to do to make it succeed.

Example: bank scenario

Imagine that you’re developing a microservice that allows bank clients to check their transaction history. Occasionally, when customers try to view their transaction history, an unforeseen problem might prevent the data from loading and result in an indefinite page load.

Begin by requesting an online transaction history.

To retrieve the transaction history, click Open page or enter the following URL into the sample browser: https://global-ebank.openliberty.io/transactions
Click Refresh in the browser to see that the page loads indefinitely, unable to retrieve the transaction history.

Enabling MicroProfile Fault Tolerance in Open Liberty

MicroProfile Fault Tolerance allows microservices to handle unavailable services. It uses policies such as Timeout and Retry to control how a piece of logic executes and what result it returns. The MicroProfile Fault Tolerance 3.0 feature provides an environment to support resilient microservices through patterns that include timeout and retry. Enable the MicroProfile Fault Tolerance 3.0 feature in the server.xml file of the Open Liberty server.

In the editor, add the following element declaration to the featureManager element that is in the server.xml file.
<feature>mpFaultTolerance-3.0</feature>

Alternatively, click Add.
Then, click Run on the editor menu pane.

Adding the @Timeout annotation

Now that you’ve seen the page load indefinitely due to a server-side problem, let's apply a Timeout policy to limit the time the request waits.

The @Timeout annotation specifies the time in milliseconds allowed for the request to finish. Optionally, you can configure the @Timeout annotation to change the default time unit from milliseconds. For example, @Timeout(value = 2, unit = ChronoUnit.SECONDS).

  • value: The time allowed for the request to finish. The integer value must be greater than or equal to 0. A value of 0 means that the Timeout policy is not applied. If the value is not specified, the default is 1000 ms.
  • unit: The unit of time for the value parameter as described by the java.time.temporal.ChronoUnit class. The default is ChronoUnit.MILLIS for milliseconds.
  • The java.time.temporal.ChronoUnit class defines a standard set of date period units, including NANOS, MICROS, MILLIS, SECONDS, MINUTES, and HOURS.

With a Timeout policy, a TimeoutException occurs when the timeout value has elapsed.
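
The effect of a Timeout policy can be approximated in plain Java. The following is a minimal sketch, not how Open Liberty implements @Timeout. The TimeoutSketch class, the SketchTimeoutException type, and the loadTransactions method are hypothetical names, and the sketch uses only JDK classes so that it runs without a MicroProfile runtime.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.function.Supplier;

class TimeoutSketch {

    // Hypothetical unchecked exception that is used only in this sketch.
    static class SketchTimeoutException extends RuntimeException {
        SketchTimeoutException(String message) {
            super(message);
        }
    }

    // Runs the task on a background thread and waits at most timeoutMillis for the result.
    static <T> T callWithTimeout(Supplier<T> task, long timeoutMillis) {
        CompletableFuture<T> future = CompletableFuture.supplyAsync(task);
        try {
            return future.get(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (java.util.concurrent.TimeoutException e) {
            // Stop waiting. The background task itself is not interrupted here.
            future.cancel(true);
            throw new SketchTimeoutException("No result within " + timeoutMillis + " ms");
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        }
    }

    // Hypothetical stand-in for a back-end call that takes workMillis to answer.
    static String loadTransactions(long workMillis) {
        try {
            Thread.sleep(workMillis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return "transactions";
    }

    public static void main(String[] args) {
        String result = callWithTimeout(() -> loadTransactions(10), 2000);
        System.out.println(result);
        try {
            callWithTimeout(() -> loadTransactions(1000), 100);
        } catch (SketchTimeoutException e) {
            System.out.println("Failed fast: " + e.getMessage());
        }
    }
}
```

The caller gets control back as soon as the limit elapses, which is the fail-fast behavior that the Timeout policy is meant to provide.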

After you modify your server.xml file to include the Fault Tolerance feature, add a Timeout policy to the transaction history microservice.

Add the @Timeout annotation with a value of 2000 on line 7, before the showTransactions method.
@Timeout(2000)

Alternatively, click Add.
This annotation limits the allowed time for the transaction history request to 2000 ms before a TimeoutException occurs.
Then, click Run on the editor menu pane. In the examples, a timeout of 2000 milliseconds is used for demonstration. However, a lower value might be more appropriate.
To retrieve the account transaction history, click Open page or enter the following URL into the sample browser: https://global-ebank.openliberty.io/transactions
The initial request times out after 2 seconds.

Adding the @Retry annotation

A Retry policy helps an application recover from transient failures, like a temporary network glitch. The MicroProfile @Retry annotation defines when to reattempt an operation that has failed. The policy parameters include options that identify how many retry attempts are allowed, how long to continue to retry an operation, or how to set a retry or abort condition based on a specific exception.

A request to a service might fail for many different reasons. The default Retry policy initiates a retry for every java.lang.Exception. However, you can base a Retry policy on a specific exception by using the retryOn parameter.

  • retryOn: Specifies an exception class that triggers a retry. You can identify more than one exception as an array of values. For example,
    @Retry(retryOn = {RuntimeException.class, TimeoutException.class}).
    The default is java.lang.Exception.class.

After modifying the server.xml file to include the Fault Tolerance feature and adding a Timeout policy to the sample application, the transaction history microservice fails with a TimeoutException if the request does not complete within 2000 ms. Now, add a Retry policy to the code to retry the request to the transaction history microservice when a TimeoutException occurs.

Add the @Retry annotation on line 12, before the showTransactions method, to retry the service request only when a TimeoutException occurs.
@Retry(retryOn = TimeoutException.class)

Alternatively, click Add.
Then, click Run on the editor menu pane.
With this Retry policy in place, the request to retrieve the account transaction history is automatically retried if a TimeoutException occurs. To demonstrate, click Open page or enter the following URL into the sample browser: https://global-ebank.openliberty.io/transactions
The initial request times out after 2 seconds. However, instead of posting an error message, the request is automatically retried, and you see the transaction history appear.
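
The combination of a timeout followed by an automatic retry can be sketched in plain Java. This is an illustration under stated assumptions, not the guide's sample code: the RetryOnTimeoutSketch class and its methods are hypothetical, the simulated service hangs only on its first call, and the time values are shortened so that the example finishes quickly.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.concurrent.atomic.AtomicInteger;

class RetryOnTimeoutSketch {

    static final AtomicInteger attempts = new AtomicInteger();

    // Daemon threads so that a hanging call never keeps the JVM alive.
    static final ExecutorService executor = Executors.newCachedThreadPool(runnable -> {
        Thread thread = new Thread(runnable);
        thread.setDaemon(true);
        return thread;
    });

    // Hypothetical service: the first call hangs for 1000 ms, later calls answer at once.
    static String showTransactions() {
        if (attempts.incrementAndGet() == 1) {
            try {
                Thread.sleep(1000);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
        return "transaction history";
    }

    // Tries once, then retries up to maxRetries times, but only after a timeout.
    static String callWithTimeoutAndRetry(long timeoutMillis, int maxRetries) {
        for (int attempt = 0; ; attempt++) {
            CompletableFuture<String> future =
                    CompletableFuture.supplyAsync(RetryOnTimeoutSketch::showTransactions, executor);
            try {
                return future.get(timeoutMillis, TimeUnit.MILLISECONDS);
            } catch (TimeoutException e) {
                if (attempt >= maxRetries) {
                    throw new IllegalStateException("Still timing out after " + maxRetries + " retries");
                }
                // Otherwise continue with the next loop iteration, which is the retry.
            } catch (Exception e) {
                // Any other failure is not covered by this policy, so it is not retried.
                throw new IllegalStateException(e);
            }
        }
    }

    public static void main(String[] args) {
        String result = callWithTimeoutAndRetry(200, 3);
        System.out.println(result + " after " + attempts.get() + " attempts");
    }
}
```

As in the bank scenario, the first attempt is abandoned when the timeout elapses, and a later attempt returns the result without the user having to reload the page.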

Adding retry limits

You can set limits on the number of retry attempts to prevent a busy service from becoming overloaded with retry requests. Numerous requests tie up resources, and the service takes longer to recover from its failing condition. The @Retry annotation has parameters that limit the number of retry attempts and that limit the amount of time a service can spend retrying.

  • maxRetries: The maximum number of retry requests. The integer value must be greater than or equal to -1. A value of -1 indicates to continue retrying indefinitely. The default is 3 requests.
  • maxDuration: The maximum amount of time to perform all requests, including the initial request and all retry attempts. After the duration is reached, no more retry attempts are initiated. This integer value must be greater than or equal to 0. The default is 180000 units, as defined by the durationUnit parameter.
  • durationUnit: The unit of time for the maxDuration parameter as described by the java.time.temporal.ChronoUnit class. The default is ChronoUnit.MILLIS for milliseconds.

    The java.time.temporal.ChronoUnit class defines a standard set of date period units, including NANOS, MICROS, MILLIS, SECONDS, MINUTES, and HOURS.
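
The way the two limits interact can be sketched as a plain Java loop. This is a simplified illustration with hypothetical names (RetryLimitsSketch, callWithRetry), not the MicroProfile implementation. It treats maxDuration as a plain upper bound, retries on any RuntimeException, and takes the clock as a parameter so that the time limit can be exercised without waiting.

```java
import java.util.function.LongSupplier;
import java.util.function.Supplier;

class RetryLimitsSketch {

    // Calls the task once, then retries failed calls until a limit is reached.
    // A maxRetries value of -1 means that the number of retries is not limited.
    // nowMillis supplies the current time in milliseconds.
    static <T> T callWithRetry(Supplier<T> task, int maxRetries, long maxDurationMillis,
                               LongSupplier nowMillis) {
        long start = nowMillis.getAsLong();
        int retries = 0;
        while (true) {
            try {
                return task.get();
            } catch (RuntimeException failure) {
                boolean retriesLeft = maxRetries == -1 || retries < maxRetries;
                boolean timeLeft = nowMillis.getAsLong() - start < maxDurationMillis;
                if (!retriesLeft || !timeLeft) {
                    throw failure; // give up and surface the last failure
                }
                retries++;
            }
        }
    }

    public static void main(String[] args) {
        int[] calls = {0};
        try {
            callWithRetry(() -> {
                calls[0]++;
                throw new IllegalStateException("service unavailable");
            }, 4, 10_000, System::currentTimeMillis);
        } catch (IllegalStateException e) {
            // One initial call plus four retries.
            System.out.println("Gave up after " + calls[0] + " calls");
        }
    }
}
```

Whichever limit is reached first ends the retries, so a service that keeps failing is called a bounded number of times.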

Update the @Retry annotation on line 13 to limit the number of retry attempts to 4 and the retry duration to 10 seconds.
@Retry(retryOn = TimeoutException.class, maxRetries = 4, maxDuration = 10, durationUnit = ChronoUnit.SECONDS)

Alternatively, click Update.
This Retry policy initiates a retry request for each TimeoutException that occurs but limits retry attempts to no more than 4 retries. The operation aborts if the total duration of all retries lasts more than 10 seconds.
Then, click Run on the editor menu pane.
To retrieve the transaction history, click Open page or enter the following URL into the sample browser: https://global-ebank.openliberty.io/transactions
The timeline shows that the initial request times out after 2 seconds. The Retry policy retries the request to the transaction history microservice up to 4 times. Each automatic retry immediately follows a timeout, and retries continue until the transaction history appears or the 10-second limit is reached.

Configuring a delay

Sometimes you might want an application to wait before it issues a retry request. For example, if the failure is caused by a sudden spike in requests or a loss of connectivity, waiting might decrease the chance that the same failure occurs again. In these cases, you can define a delay in the Retry policy.

  • delay: The amount of time to wait before retrying a request. The value must be an integer greater than or equal to 0 and be less than the value for maxDuration. The default is 0.
  • delayUnit: The unit of time for the delay parameter as described by the java.time.temporal.ChronoUnit class. The default is ChronoUnit.MILLIS.

    The java.time.temporal.ChronoUnit class defines a standard set of date period units, including NANOS, MICROS, MILLIS, SECONDS, MINUTES, and HOURS.

    Update the @Retry annotation on lines 13 - 16 to include a delay of 200 ms.
    @Retry(retryOn = TimeoutException.class, maxRetries = 4, maxDuration = 10, durationUnit = ChronoUnit.SECONDS, delay = 200, delayUnit = ChronoUnit.MILLIS)

    Alternatively, click Update.
    This Retry policy indicates that a delay of 200 ms follows each TimeoutException before the application retries another request. However, no more than 4 retries occur, and retry attempts stop after 10 seconds.
    Then, click Run on the editor menu pane.
    To retrieve the transaction history, click Open page or enter the following URL into the sample browser: https://global-ebank.openliberty.io/transactions
    The timeline shows that after each 2000 ms timeout, the Retry policy waits 200 ms and then automatically initiates another request, until the transaction history is displayed.
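
The arithmetic behind such a timeline can be worked through in a few lines of Java. The sketch assumes the worst case, in which every attempt runs into the 2000 ms timeout, and it reads the maxDuration rule as "no retry starts once the duration is reached". The RetryTimelineSketch class and its method are hypothetical, and a real runtime might schedule attempts slightly differently.

```java
import java.util.ArrayList;
import java.util.List;

class RetryTimelineSketch {

    // Returns the start time in ms of every attempt, assuming that each attempt
    // runs into the timeout and that each retry waits for the fixed delay first.
    static List<Long> attemptStartTimes(long timeoutMillis, long delayMillis,
                                        int maxRetries, long maxDurationMillis) {
        List<Long> starts = new ArrayList<>();
        long start = 0;
        starts.add(start); // the initial request
        for (int retry = 1; retry <= maxRetries; retry++) {
            long nextStart = start + timeoutMillis + delayMillis;
            if (nextStart >= maxDurationMillis) {
                break; // the duration limit is reached, so no further retry starts
            }
            starts.add(nextStart);
            start = nextStart;
        }
        return starts;
    }

    public static void main(String[] args) {
        // Timeout 2000 ms, delay 200 ms, 4 retries, 10 seconds maximum duration.
        System.out.println(attemptStartTimes(2000, 200, 4, 10_000));
        // prints [0, 2200, 4400, 6600, 8800]
    }
}
```

With these values, each retry starts 2200 ms after the previous attempt started, and all four retries begin before the 10-second limit.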

    Adding jitter to the delay

    The Retry policy also provides a way to add jitter to the delay. Jitter causes a slight variation in the delay time applied between retries. For example, a jitter of 200 ms randomly adds between -200 and 200 ms to each retry delay interval.

    • jitter: A random variation applied to the delay interval between retries. The integer value must be greater than or equal to 0. A value of 0 means that it is not set. If the specified jitter value is larger than the delay value, then the jitter is set to the delay value. The default is 200 units as defined by the jitterDelayUnit parameter.
    • jitterDelayUnit: The unit of time for the jitter parameter as described by the java.time.temporal.ChronoUnit class. The default is ChronoUnit.MILLIS.

      The java.time.temporal.ChronoUnit class defines a standard set of date period units, including NANOS, MICROS, MILLIS, SECONDS, MINUTES, and HOURS.

    Why would you want to add a jitter? Suppose, for example, that multiple applications are making requests to your microservice and causing it to become overloaded. By adding a jitter to the delay time, you allow the retry times of these requests to vary. Therefore, the cluster of retry requests is spread out over time, reducing the chance that the busy service continues to be overloaded.
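
One way to picture the jittered delay is as the fixed delay plus a random offset. The following sketch assumes a uniform offset between -jitter and +jitter and an effective delay that never drops below zero. The JitterSketch class and the nextDelayMillis method are hypothetical and are not part of the MicroProfile API.

```java
import java.util.Random;

class JitterSketch {

    // Picks the wait before the next retry: the fixed delay plus a random
    // offset between -jitterMillis and +jitterMillis, never less than zero.
    static int nextDelayMillis(int delayMillis, int jitterMillis, Random random) {
        int offset = random.nextInt(2 * jitterMillis + 1) - jitterMillis;
        return Math.max(0, delayMillis + offset);
    }

    public static void main(String[] args) {
        Random random = new Random();
        // With a delay of 200 ms and a jitter of 100 ms, each value is between 100 and 300.
        for (int i = 0; i < 5; i++) {
            System.out.println(nextDelayMillis(200, 100, random) + " ms");
        }
    }
}
```

Because each client draws its own offset, clients that failed at the same moment are unlikely to retry at the same moment.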

    Add a jitter of 100 ms to the delay in the @Retry annotation on lines 13 - 17.
    @Retry(retryOn = TimeoutException.class, maxRetries = 4, maxDuration = 10, durationUnit = ChronoUnit.SECONDS, delay = 200, delayUnit = ChronoUnit.MILLIS, jitter = 100, jitterDelayUnit = ChronoUnit.MILLIS)

    Alternatively, click Add.
    The delay between retries is now spread out between 100 and 300 ms.
    Click Run on the editor menu pane.
    To retrieve the transaction history, click Open page or enter the following URL into the sample browser: https://global-ebank.openliberty.io/transactions
    The Retry policy now includes a jitter of 100 ms. The timeline shows that the retry attempts occur somewhere between 100 and 300 ms (200 ms delay ± 100 ms jitter) after any timeout of 2000 ms that occurs within 10 seconds. To see the randomness of the jitter values, click the Refresh button in the sample browser.

    Specifying an abort failure condition

    You might decide that when certain failures occur, retry attempts should be aborted, and the service should fail immediately. For example, suppose this Retry policy was extended to include IOException.class. However, you want to prevent the service from retrying for conditions that you know the service cannot recover from, such as a FileNotFoundException, a subclass of IOException. The Retry policy has a parameter to identify these exceptions.

    • abortOn: Specifies an exception class that stops retries and fails immediately. Use the abortOn parameter to identify subclasses of the retryOn value that you know would not be successful no matter how many times the service retried. You can identify more than one exception as an array of values, such as in
      @Retry(abortOn = {FileNotFoundException.class, UTFDataFormatException.class}).
      There is no default value.
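
The relationship between retryOn and abortOn can be sketched as a small decision function. The sketch assumes that abortOn is checked before retryOn, which is what allows a subclass to stop retries while its parent class still triggers them. The AbortOnSketch class and the shouldRetry method are hypothetical names used only for illustration.

```java
import java.io.FileNotFoundException;
import java.io.IOException;
import java.util.List;
import java.util.concurrent.TimeoutException;

class AbortOnSketch {

    // Decides whether a failure is retried. abortOn is checked first, so a
    // matching subclass stops retries even if its parent class is in retryOn.
    static boolean shouldRetry(Throwable failure,
                               List<Class<? extends Throwable>> retryOn,
                               List<Class<? extends Throwable>> abortOn) {
        for (Class<? extends Throwable> type : abortOn) {
            if (type.isInstance(failure)) {
                return false;
            }
        }
        for (Class<? extends Throwable> type : retryOn) {
            if (type.isInstance(failure)) {
                return true;
            }
        }
        return false; // not listed in retryOn, so the failure is not retried
    }

    public static void main(String[] args) {
        List<Class<? extends Throwable>> retryOn =
                List.of(TimeoutException.class, IOException.class);
        List<Class<? extends Throwable>> abortOn =
                List.of(FileNotFoundException.class);

        System.out.println(shouldRetry(new IOException("connection reset"), retryOn, abortOn));
        System.out.println(shouldRetry(new FileNotFoundException("history.csv"), retryOn, abortOn));
    }
}
```

The first call reports that the IOException is retried, and the second reports that the FileNotFoundException is not, even though it is a subclass of IOException.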
    The retryOn parameter has been updated in the editor to include IOException.class on line 14. To stop all retry attempts when a FileNotFoundException occurs, add a condition to the @Retry annotation.
    @Retry(retryOn = {TimeoutException.class, IOException.class}, maxRetries = 4, maxDuration = 10, durationUnit = ChronoUnit.SECONDS, delay = 200, delayUnit = ChronoUnit.MILLIS, jitter = 100, jitterDelayUnit = ChronoUnit.MILLIS, abortOn = FileNotFoundException.class)

    Alternatively, click Add.
    This Retry policy allows all timeouts and IOExceptions from the service to be retried up to 4 times within 10 seconds of the original timeout failure with a delay of 100 to 300 ms between retry attempts. However, it halts all retry attempts and immediately returns if a FileNotFoundException occurs.
    Click Run on the editor menu pane.
    To retrieve the transaction history, click Open page or enter the following URL into the sample browser: https://global-ebank.openliberty.io/transactions
    This time the example simulates a FileNotFoundException, and the request to the transaction history microservice fails and displays an error message without any timeouts or retries.

    Interactive timeout retry playground

    Now that you have learned about Timeout and Retry, you can explore the parameters in the @Timeout and @Retry annotations and see how they all work together.

    You can make changes to these parameters:

    @Timeout

    • value: The time allowed for the request to finish. The integer value must be greater than or equal to 0. A value of 0 means that the Timeout policy is not applied. If the value is not specified, the default is 1000 ms.

    @Retry

    • maxRetries: The maximum number of retry requests. The integer value must be greater than or equal to -1. A value of -1 indicates to continue retrying indefinitely. The default is 3 requests.
    • maxDuration: The maximum amount of time to perform all requests, including the initial request and all retry attempts. Once the duration is reached, no more retry attempts are initiated. This integer value must be greater than or equal to 0. The default is 180000 ms.
    • delay: The amount of time to wait before retrying a request. The value must be an integer greater than or equal to 0 and less than the value for maxDuration. The default is 0.
    • jitter: A random variation applied to the delay interval between retries. The integer value must be greater than or equal to 0. A value of 0 means that the jitter parameter is not set. If the specified jitter value is larger than the delay value, the jitter is set to the delay value. The default is 200 ms.

    Modify the parameters for the @Timeout and the @Retry annotations. This simulation does not support the retryOn, abortOn, and units parameters. The retryOn parameter defaults to java.lang.Exception.class to include all exceptions and cannot be altered. The value for @Timeout must be greater than 0. All values for the @Timeout annotation and the maxDuration, delay, and jitter parameters for the @Retry annotation default to milliseconds. Repeat the process as many times as you like.

    Click Run in the editor and observe the timeline. The simulation ends when the message "Your recent transactions are unavailable at this time. Please try again later." is shown in the browser.

    Nice work! Where to next?

    You learned about the benefits of the MicroProfile Fault Tolerance Timeout and Retry policies and how they make your microservice more resilient to failures. You learned that the Timeout policy can be used to quickly fail requests that are hanging for too long. You also learned to use the Retry policy to reattempt an operation that failed and to recover from the transient failures your application might experience.
