InstantOn limitations and known issues

In addition to the general InstantOn prerequisites, Open Liberty InstantOn is subject to certain limitations. For example, applications that must run early startup code or that rely on certain Liberty features might require modification to use InstantOn.

For more information about InstantOn prerequisites, see Runtime and host build system prerequisites.

The following sections describe the limitations and known issues with using Open Liberty InstantOn.

Jakarta Transactions configuration limitations

Open Liberty Transaction Manager support for InstantOn has limitations around configuration updates when the application process is restored. The configuration attributes for the Transaction Manager must remain constant between the InstantOn checkpoint and restore. This limitation is true only for the configuration attributes that are specified directly with the transaction server configuration element, for example, recoveryGroup and recoverIdentity. The values for these configuration attributes must not change between checkpoint and restore.

This limitation implies that transaction recovery in a cloud environment cannot work as designed because the recoverIdentity cannot be parameterized by something like the following transaction configuration example. This example gives a unique recoverIdentity for each instance of the application:

<transaction
  ...
  recoveryGroup="peer-group-name"
  recoveryIdentity="${HOSTNAME}${wlp.server.name}"
  ...
/>

Jakarta Transaction before checkpoint

Open Liberty InstantOn does not allow transactions to begin before a checkpoint is performed for the application process. This scenario is possible if the application has early startup code that attempts to start a transaction. Consider the following Servlet:

@WebServlet(urlPatterns = { "/StartupServlet" }, loadOnStartup = 1)
public class StartupServlet extends HttpServlet {
    @Override
    public void init() {
        UserTransaction ut = UserTransactionFactory.getUserTransaction();
        try {
            ut.begin();
            ...
            ut.commit();
        } catch (Exception e) {
            // something went wrong
        }
    }

}

This Servlet example uses the loadOnStartup = 1 attribute. When you use this attribute with the afterAppStart option, the servlet initializes before the checkpoint. The runtime detects this conflict and logs the following message:

[WARNING ] WTRN0155W: An application began or required a transaction during the server checkpoint request. The following stack trace for this thread was captured when the transaction was created:

This warning is followed by a stacktrace that helps identify the application code that is attempting to begin a transaction. The server then fails to checkpoint and the following error is logged:

WTRN0154E: The server checkpoint request failed because the transaction service is unable to begin a transaction.

You can avoid this failure by using the beforeAppStart option or by modifying the component not to use early startup code. In this example, that modification is to remove the loadOnStartup = 1 attribute.

Accessing MicroProfile Config too early

If an application has early startup code and you are using the afterAppStart option, it might get injected with a configuration value from MicroProfile Config before a checkpoint is performed for the application process. If such a configuration value changes at the time the application image container runs, the application might use the stale value that was set when the application process checkpoint was performed.

The Open Liberty runtime detects this situation and logs a warning message when the application container image is run that indicates that a configuration value is changed. The following example uses an example_config configuration key with a default value set to theDefault. When the checkpoint occurs, the environment configuration source is not available to populate MicroProfile configuration values. If this @Inject annotation of the configuration is contained in a CDI bean that is created and used before the checkpoint is performed, the value of theDefault is injected.

    @Inject
    @ConfigProperty(name = "example_config", defaultValue = "theDefault")
    String exampleConfig;

When the InstantOn application container image is run, the environment variable EXAMPLE_CONFIG can be used to provide an updated value. The runtime detects this value and logs the following message:

[WARNING ] CWWKC0651W: The MicroProfile configuration value for the key example_config has changed since the checkpoint action completed on the server. If the value of the key changes after the checkpoint action, the application might not use the updated value.

In this situation, use the beforeAppStart checkpoint option. Another option is to use a Dynamic ConfigSource. The previous example can be modified to use a dynamic ConfigSource by using the Provider<String> type for the exampleConfig variable:

    @Inject
    @ConfigProperty(name = "example_config", defaultValue = "theDefault")
    Provider<String> exampleConfig;

Each call to the get() method of the Provider<String> returns the current value of the ConfigProperty annotation. This behavior allows the application to access the updated configuration value when the application process is restored during the InstantOn application container run.

Injecting a DataSource too early

If an application has early startup code and you are using the afterAppStart option, it might get injected with DataSource before a checkpoint is performed for the application process. In a cloud environment, the configuration of the DataSource likely needs to change at the time the application image container is run. Consider the following Servlet example:

@WebServlet(urlPatterns = "/ExampleServlet", loadOnStartup = 1)
public class ExampleServlet extends HttpServlet {
    @Resource(shareable = false)
    private DataSource exampleDataSource;
    ...
}

This Servlet example uses the loadOnStartup = 1 attribute. When you are using the afterAppStart option, this attribute initializes the servlet before the checkpoint. The deployment information related to the DataSource might need to be configured when you deploy the application to the cloud. Consider the following Open Liberty server.xml configuration.

  <!-- these values are place holders so we don't have to have the env set before checkpoint -->
  <variable name="DB2_DBNAME" defaultValue="placeholder" />
  <variable name="DB2_HOSTNAME" defaultValue="placeholder" />
  <variable name="DB2_PASS" defaultValue="placeholder" />
  <variable name="DB2_PORT" defaultValue="45000" />
  <variable name="DB2_PORT_SECURE" defaultValue="45001" />
  <variable name="DB2_USER" defaultValue="placeholder" />


  <dataSource id="DefaultDataSource">
    <jdbcDriver libraryRef="DB2Lib"/>
    <properties.db2.jcc
      databaseName="${DB2_DBNAME}" serverName="${DB2_HOSTNAME}" portNumber="${DB2_PORT}"
      downgradeHoldCursorsUnderXa="true"/>
    <containerAuthData user="${DB2_USER}" password="${DB2_PASS}"/>
    <recoveryAuthData user="${DB2_USER}" password="${DB2_PASS}"/>
  </dataSource>

This configuration uses placeholder values for things like the database name, hostname, ports, user, and password. This configuration allows the values to be updated with environment variable values or other configuration mechanisms, as described in Configuring microservices running in Kubernetes. These configurations must not be hardcoded into an application image and must be able to be updated when you deploy the application to the cloud.

If an application is injected with a DataSource before the checkpoint and the configuration of the DataSource changes, the application is restarted when the InstantOn application container image is run with the updated configuration. You can avoid this scenario by using the beforeAppStart option or by modifying the component not to be early startup code. In this example, that modification is to remove the loadOnStartup = 1 attribute.

Using product extensions, user features, or features that are not supported by InstantOn

InstantOn supports only a subset of Open Liberty features, as described in Open Liberty InstantOn supported features. Any public features that are enabled outside of the supported set of features for InstantOn cause checkpoint to fail with an error message like the following example:

CWWKC0456E: A checkpoint cannot be taken because the following features configured in the server.xml file are not supported for checkpoint: [usr:exampleFeature-1.0]

This error occurs for any configured features that are not supported for InstantOn. This limitation includes Liberty product extension and Liberty user features.

Updating configuration with a bootstrap.properties file

When an InstantOn application container image is run, the bootstrap.properties file is not read. Values that must be able to be configured when you run an InstantOn application container image must come from alternative sources. For example, you might use environment variables or other configuration mechanisms, as described Configuring microservices running in Kubernetes.

Java SecurityManager is not supported

If Open Liberty is configured to run with the SecurityManager, InstantOn detects this configuration during a checkpoint and fails with the following message:

CWWKE0958E: The server checkpoint request failed because the websphere.java.security property was set in the bootstrap.properties file. This property enables the Java Security Manager and is not valid when a server checkpoint occurs.

Updating JVM options

InstantOn does not support changing the jvm.options when you restore the InstantOn application process. Any JVM options that are required to be set for the JVM process must be defined during the InstantOn container image build.

The IBM Semeru JVM does have limited support for setting JVM options on restore with the use of the OPENJ9_RESTORE_JAVA_OPTIONS environment variable. For more information, see the Java Checkpoint/Restore In Userspace (CRIU) support documentation.

SELinux limitations

If SELinux mode is set to enforcing, SELinux might prevent InstantOn from performing a checkpoint of the application process when you use the checkpoint.sh script in the image template Dockerfile or Containerfile. If the virt_sandbox_use_netlink SELinux setting is disabled, the required netlink Linux system calls are blocked. This block prevents InstantOn from performing a checkpoint of the application process during the container image build. Open Liberty InstantOn detects this limitation and logs the following messages:

CWWKE0962E: The server checkpoint request failed. The following output is from the CRIU /logs/checkpoint/checkpoint.log file that contains details on why the checkpoint failed.
Warn  (criu/kerndat.c:1103): $XDG_RUNTIME_DIR not set. Cannot find location for kerndat file
Error (criu/libnetlink.c:84): Can't send request message: Permission denied
..
Error (criu/cr-dump.c:2099): Dumping FAILED.
CWWKE0963E: The server checkpoint request failed because netlink system calls were unsuccessful. If SELinux is enabled in enforcing mode, netlink system calls might be blocked by the SELinux "virt_sandbox_use_netlink" policy setting. Either disable SELinux or enable the netlink system calls with the "setsebool virt_sandbox_use_netlink 1" command.

To work around this limitation, you can either enable the virt_sandbox_use_netlink SELinux setting with the setsebool virt_sandbox_use_netlink 1 command or disable SELinux enforcing mode. Another option to work around this issue is to use the three-step process to build the InstantOn image. The three-step process requires the use of a --privileged container that grants access to the netlink system calls to the running container that performs the application process checkpoint.

Yama Linux Security Module limitations

If Yama is configured with one of the following modes, InstantOn cannot checkpoint or restore the application process in running containers:

  • 2 - admin-only attach

  • 3 - no attach

When this configuration is present, the /logs/checkpoint/restore.log contains the following error:

Error (criu/arch/x86/kerndat.c:178): 32: ptrace(PTRACE_TRACEME) failed: Operation not permitted

For InstantOn checkpoint and restore to work, Yama must be configured with one of the following modes:

  • 0 - classic ptrace permissions

  • 1 - restricted ptrace

The following supported public cloud Kubernetes services have the default for Yama set to the 1 mode, which allows InstantOn to checkpoint and restore by default:

Access to Linux system calls

As described in Required Linux system calls, CRIU requires several Linux system calls to restore the application process. This requirement might require extra configuration to grant the required system calls to the running container when you use InstantOn.

The following examples are errors that are logged to the /logs/checkpoint/restore.log file when access to specific system calls is blocked.

Blocked clone3 system call
Error (criu/kerndat.c:1377): Unexpected error from clone3: Operation not permitted
Blocked to ptrace system call
Error (criu/arch/x86/kerndat.c:178): 28: ptrace(PTRACE_TRACEME) failed: Operation not permitted
Blocked to vmsplice system call
Error (criu/pipes.c:184): 0x4c11a: Error splicing data: Operation not permitted

The supported public cloud Kubernetes Service environments currently allow the required system calls used by CRIU by default. No additional configuration is required when you use the following cloud providers:

  • Amazon Elastic Kubernetes Service (EKS)

  • Azure Kubernetes Service (AKS)

Running without the necessary Linux capabilities

Errors occur during checkpoint and restore if the required Linux capabilities are not granted. If the required capabilities are not granted for checkpoint, then the following error occurs during the InstantOn container image build:

Can't exec criu swrk: Operation not permitted
Can't read request: Connection reset by peer
Can't receive response: Invalid argument
[ERROR   ] CWWKC0453E: The server checkpoint request failed with the following message: Could not dump the JVM processes, err=-70

The Operation not permitted message indicates that the required Linux capabilities are not granted. If you are using the checkpoint.sh script, the following error occurs during the RUN checkpoint.sh instruction:

Error: building at STEP "RUN checkpoint.sh afterAppStart": while running runtime: exit status 74

To avoid this error, grant the container image build the CHECKPOINT_RESTORE, SYS_PTRACE, and SETPCAP Linux capabilities. If you use the three-step process to build the container image, make sure the container that is running the checkpoint step is a --privileged container.

If the required capabilities are not granted for restore, the following error occurs when you try to run the InstantOn application container image:

/opt/ol/wlp/bin/server: line 1430: /opt/criu/criu: Operation not permitted
CWWKE0961I: Restoring the checkpoint server process failed. Check the /logs/checkpoint/restore.log log to determine why the checkpoint process was not restored. Launching the server without using the checkpoint image.

The Operation not permitted message is an indication that the required Linux capabilities are not granted for restore.

Supported processors

Currently, the only supported processor is X86-64/AMD64. Other processors are expected to be supported in later releases of Open Liberty InstantOn.