Thursday, April 25, 2013

Avoiding and Recovering From Server Failure


A variety of events can lead to the failure of a server instance. Often one failure condition leads to another. Loss of power, hardware malfunction, operating system crashes, network partitions, and unexpected application behavior can all contribute to the failure of a server instance.
For high availability requirements, implement a clustered architecture to minimize the impact of failure events. However, even in a clustered environment, server instances may fail periodically, and it is important to be prepared for the recovery process.

Overload Protection

WebLogic Server 9.0 detects increases in system load that can affect application performance and stability, and allows administrators to configure failure prevention actions that occur automatically at predefined load thresholds.
Overload protection helps you avoid failures that result from unanticipated levels of application traffic or resource utilization.
WebLogic Server attempts to avoid failure when certain conditions occur:
  • Workload manager capacity is exceeded
  • HTTP session count increases to a predefined threshold value
  • Out-of-memory conditions are impending
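As a rough illustration of threshold-based overload detection, the sketch below checks a load snapshot against the three conditions listed above. The class names, field names, and condition labels are hypothetical and are not part of any WebLogic API; the real thresholds are set through the server's overload protection configuration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Limits:
    """Hypothetical thresholds an administrator might configure."""
    max_pending_requests: int
    max_http_sessions: int
    min_free_heap_percent: float


@dataclass(frozen=True)
class Load:
    """Hypothetical snapshot of the current server load."""
    pending_requests: int
    http_sessions: int
    free_heap_percent: float


def overload_conditions(load: Load, limits: Limits) -> list:
    """Return the labels of the overload conditions that currently hold."""
    hit = []
    if load.pending_requests > limits.max_pending_requests:
        hit.append("work-manager-capacity")
    if load.http_sessions >= limits.max_http_sessions:
        hit.append("http-session-threshold")
    if load.free_heap_percent < limits.min_free_heap_percent:
        hit.append("low-memory")
    return hit
```

An empty result means that none of the conditions applies and no prevention action is needed.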
Failover for Clustered Services

You can increase the reliability and availability of your applications by hosting them on a WebLogic Server cluster. Clusterable services, such as EJBs and Web applications, can be deployed uniformly—on each Managed Server—in a cluster, so that if the server instance upon which a service is deployed fails, the service can fail over to another server in the cluster, without interruption in service or loss of state.

Automatic Restart for Failed Server Instances

WebLogic Server self-health monitoring improves the reliability and availability of server instances in a domain. Selected subsystems within each WebLogic Server instance monitor their health status based on criteria specific to the subsystem. For example, the JMS subsystem monitors the condition of the JMS thread pool while the core server subsystem monitors default and user-defined execute queue statistics. If an individual subsystem determines that it can no longer operate in a consistent and reliable manner, it registers its health state as "failed" with the host server.
Each WebLogic Server instance, in turn, checks the health state of its registered subsystems to determine its overall viability. If one or more of its critical subsystems have reached the FAILED state, the server instance marks its own health state FAILED to indicate that it cannot reliably host an application.
Using Node Manager, server self-health monitoring enables you to automatically reboot servers that have failed. This improves the overall reliability of a domain, and requires no direct intervention from an administrator.
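The health roll-up described above can be sketched in a few lines. This is an illustrative model only, not WebLogic's implementation: the `Health` enum, the `server_health` function, and the subsystem names are assumptions made for the example.

```python
from __future__ import annotations

from enum import Enum


class Health(Enum):
    OK = "OK"
    FAILED = "FAILED"


def server_health(subsystems: dict[str, Health], critical: set[str]) -> Health:
    """Mark the server FAILED if any critical subsystem reports FAILED."""
    if any(subsystems.get(name) is Health.FAILED for name in critical):
        return Health.FAILED
    return Health.OK


# Hypothetical subsystem names; a failed non-critical subsystem is ignored.
state = server_health(
    {"jms": Health.OK, "core": Health.FAILED, "reporting": Health.FAILED},
    critical={"jms", "core"},
)
```

In this model, a process such as Node Manager would restart the server only when the rolled-up state is `Health.FAILED`.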
Service-Level Migration

WebLogic Server supports migration of an individual singleton service in addition to server-level migration. Singleton services are services that run in a cluster but must run on only a single instance at any given time, such as JMS and the JTA transaction recovery service.
An administrator can migrate a JMS server or the JTA transaction recovery service from one server instance to another in a cluster, either in response to a server failure or as part of regularly scheduled maintenance. This capability improves the availability of pinned services in a cluster, because those services can be quickly restarted on a redundant server should the host server fail.
Managed Server Independence Mode

Managed Servers maintain a local copy of the domain configuration. When a Managed Server starts, it contacts its Administration Server to retrieve any changes to the domain configuration that were made since the Managed Server was last shut down. If a Managed Server cannot connect to the Administration Server during startup, it can use its locally cached configuration information—this is the configuration that was current at the time of the Managed Server's most recent shutdown. A Managed Server that starts up without contacting its Administration Server to check for configuration updates is running in Managed Server Independence (MSI) mode. By default, MSI mode is enabled.

Directory and File Backups for Failure Recovery

Recovery from the failure of a server instance requires access to the domain's configuration and security data. This section describes file backups that WebLogic Server performs automatically, and recommended backup procedures that an administrator should perform.
The WebLogic Security service stores its configuration data in the config.xml file, and also in an LDAP repository and other files.

Back Up Domain Configuration Directory

By default, an Administration Server stores a domain's configuration data in the domain_name\config directory, where domain_name is the root directory of the domain.
Back up the config directory to a secure location in case a failure of the Administration Server renders the original copy unavailable. If an Administration Server fails, you can copy the backup version to a different machine and restart the Administration Server on the new machine.
Each time a Managed Server starts up, it contacts the Administration Server and, if there are changes to the domain configuration, updates its local copy of the domain config directory.
During operation, if changes are made to the domain configuration, the Administration Server notifies the Managed Servers, which update their local config directories. As a result, each Managed Server always has a current copy of its configuration data cached locally.
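One way to script the config directory backup is sketched below. The function name, the timestamped archive naming, and the choice of a ZIP archive are assumptions for illustration; only the location of the config directory under the domain root comes from the text above.

```python
import shutil
import time
from pathlib import Path


def backup_config(domain_root: Path, backup_root: Path) -> Path:
    """Zip <domain_root>/config into a timestamped archive under backup_root."""
    domain_root = Path(domain_root).resolve()
    backup_root = Path(backup_root).resolve()
    if not (domain_root / "config").is_dir():
        raise FileNotFoundError(f"no config directory under {domain_root}")
    backup_root.mkdir(parents=True, exist_ok=True)
    stamp = time.strftime("%Y%m%d-%H%M%S")
    archive = shutil.make_archive(
        str(backup_root / f"config-{stamp}"),  # archive path without extension
        "zip",
        root_dir=str(domain_root),  # archive paths are relative to the domain root
        base_dir="config",          # archive only the config directory
    )
    return Path(archive)
```

Keeping the paths in the archive relative to the domain root means the backup can be unpacked directly into a domain directory on a replacement machine.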
Back Up LDAP Repository

The default Authentication, Authorization, Role Mapper, and Credential Mapper providers that are installed with WebLogic Server store their data in an LDAP server. Each WebLogic Server contains an embedded LDAP server. The Administration Server contains the master LDAP server which is replicated on all Managed Servers. If any of your security realms use these installed providers, you should maintain an up-to-date backup of the following directory tree:
domain_name\servers\adminServer\data\ldap
where domain_name is the domain's root directory and adminServer is the directory in which the Administration Server stores runtime and security data.
Each WebLogic Server has an LDAP directory, but you only need to back up the LDAP data on the Administration Server: the master LDAP server replicates the LDAP data to each Managed Server when updates to security data are made. WebLogic security providers cannot modify security data while the domain's Administration Server is unavailable. The LDAP repositories on Managed Servers are replicas and cannot be modified.
The ldap\ldapfiles subdirectory contains the data files for the LDAP server. The files in this directory contain user, group, group membership, policy, and role information. Other subdirectories under the ldap directory contain LDAP server message logs and data about replicated LDAP servers.
Do not update the configuration of a security provider while a backup of LDAP data is in progress. If a change is made (for instance, if an administrator adds a user) while you are backing up the ldap directory tree, the backups in the ldapfiles subdirectory could become inconsistent. If this occurs, you can fall back on the server's own backup: once a day, a server suspends write operations, creates a backup of the LDAP data, archives it in a ZIP file below the ldap\backup directory, and then resumes write operations. This backup is guaranteed to be consistent, but it might not contain the latest security data.
For information about configuring the LDAP backup, see Configuring Backups for the Embedded LDAP Server in Administration Console Online Help.
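When falling back on the server's own daily backup, a small helper can pick the most recent archive. The sketch below assumes only what the text states, namely that the backups are ZIP files somewhere below the ldap\backup directory; the function name is hypothetical and the archive file names are not assumed.

```python
from __future__ import annotations

from pathlib import Path


def latest_ldap_backup(ldap_dir: Path) -> Path | None:
    """Return the most recently modified ZIP below <ldap_dir>/backup, if any."""
    backup_dir = Path(ldap_dir) / "backup"
    if not backup_dir.is_dir():
        return None
    backups = list(backup_dir.rglob("*.zip"))
    if not backups:
        return None
    # Choose by modification time rather than by name.
    return max(backups, key=lambda p: p.stat().st_mtime)
```

Here `ldap_dir` would be the ldap directory under the Administration Server's data directory, as given in the directory tree above.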
Back Up SerializedSystemIni.dat and Security Certificates

Each server instance creates a file named SerializedSystemIni.dat and stores it in the /security directory. This file contains encrypted security data that must be present to boot the server. You must back up this file.
If you configured a server to use SSL, you must also back up the security certificates and keys. The location of these files is user-configurable.
Restarting an Administration Server on Another Machine

If a machine crash prevents you from restarting the Administration Server on the same machine, you can recover management of the running Managed Servers as follows:

  1. Install the WebLogic Server software on the new administration machine (if this has not already been done).
  2. Make your application files available to the new Administration Server by copying them from backups or by using a shared disk. Your application files should be available in the same relative location on the new file system as on the file system of the original Administration Server.
  3. Make your configuration and security data available to the new administration machine by copying them from backups or by using a shared disk.
  4. Restart the Administration Server on the new machine.
Managed Servers and Re-started Administration Server

If an Administration Server stops running while the Managed Servers in the domain continue to run, each Managed Server periodically attempts to reconnect to the Administration Server, at the interval specified by the ServerMBean attribute AdminReconnectIntervalSecs. By default, AdminReconnectIntervalSecs is ten seconds.
When the Administration Server starts, it communicates with the Managed Servers and informs them that the Administration Server is now running on a different IP address.
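The fixed-interval reconnect behavior can be modeled with a simple retry loop. This sketch illustrates the pattern and is not WebLogic's internal code: `reconnect`, `try_connect`, and `max_attempts` are hypothetical names, and only the ten-second default comes from the text above.

```python
import time
from typing import Callable, Optional


def reconnect(
    try_connect: Callable[[], bool],
    interval_secs: float = 10.0,  # mirrors the AdminReconnectIntervalSecs default
    max_attempts: Optional[int] = None,  # None means keep trying indefinitely
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Call try_connect at a fixed interval; return the number of attempts used."""
    attempt = 0
    while True:
        attempt += 1
        if try_connect():
            return attempt
        if max_attempts is not None and attempt >= max_attempts:
            raise ConnectionError(f"no connection after {attempt} attempts")
        sleep(interval_secs)
```

Passing `sleep` as a parameter keeps the loop testable without real delays.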
Restarting a Failed Managed Server
Starting a Managed Server When the Administration Server Is Accessible

If the Administration Server is reachable by the Managed Server that failed, you can:
  • Restart it manually or automatically using Node Manager—You must configure Node Manager and the Managed Server to support this behavior.
  • Start it manually with a command or script

Starting a Managed Server When the Administration Server Is Not Accessible

If a Managed Server cannot connect to the Administration Server during startup, it can retrieve its configuration by reading its locally cached configuration data from the config directory. A Managed Server that starts in this way is running in Managed Server Independence (MSI) mode.
Understanding Managed Server Independence Mode

When a Managed Server starts, it tries to contact the Administration Server to retrieve its configuration information. If a Managed Server cannot connect to the Administration Server during startup, it can retrieve its configuration by reading configuration and security files directly. A Managed Server that starts in this way is running in Managed Server Independence (MSI) mode. By default, MSI mode is enabled.

In Managed Server Independence mode, a Managed Server:
  • looks in its local config directory for config.xml—a replica of the domain's config.xml.
  • looks in its security directory for SerializedSystemIni.dat and for boot.properties, which contains an encrypted version of your username and password.

If config.xml and SerializedSystemIni.dat are not in these locations in the server's domain directory, you can copy them from the Administration Server's domain directory.
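Before attempting an MSI-mode start, it can help to confirm that the two files are present. The following is a minimal sketch; the function name is hypothetical, and the two directories are passed in explicitly because their exact locations depend on the domain layout.

```python
from __future__ import annotations

from pathlib import Path


def missing_msi_files(config_dir: Path, security_dir: Path) -> list[Path]:
    """Return the files named above that are absent from the given directories."""
    required = [
        Path(config_dir) / "config.xml",
        Path(security_dir) / "SerializedSystemIni.dat",
    ]
    return [path for path in required if not path.is_file()]
```

An empty list means both files are in place; any path returned is a file to copy from the Administration Server's domain directory or from a backup.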
MSI Mode and Node Manager

You cannot use Node Manager to start a server instance in MSI mode, only to restart it. For a routine startup, Node Manager requires access to the Administration Server. If the Administration Server is unavailable, you must log on to a Managed Server's host machine to start the Managed Server.

MSI Mode and Managed Server Configuration Changes

If you start a Managed Server in MSI mode, you cannot change its configuration until it restores communication with the Administration Server.
Starting a Managed Server in MSI Mode

To start up a Managed Server in MSI mode:

  1. Ensure that the Managed Server's root directory contains the config subdirectory.
If the config directory does not exist, copy it from the Administration Server's root directory or from a backup to the Managed Server's root directory.
Note: Alternatively, you can use the -Dweblogic.RootDirectory=path startup option to specify a root directory that already contains these files.

  2. Start the Managed Server at the command line or using a script.

The Managed Server will run in MSI mode until it is contacted by its Administration Server.
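For a scripted start, the command line can be assembled as in the sketch below. It assumes the server is launched directly through the weblogic.Server class with the WebLogic classes already on the classpath, and that the server is named with -Dweblogic.Name; the function name, server name, and path are hypothetical. Many domains use a generated start script instead, so treat this as an illustration only.

```python
def msi_start_command(server_name: str, root_dir: str, java: str = "java") -> list:
    """Build an argument list for starting a Managed Server from a given root."""
    return [
        java,
        f"-Dweblogic.Name={server_name}",       # assumed: names the server to start
        f"-Dweblogic.RootDirectory={root_dir}",  # root that contains the config directory
        "weblogic.Server",
    ]


# Hypothetical server name and domain path.
command = msi_start_command("managed1", "/opt/domains/mydomain")
```

The resulting list can be passed to `subprocess.run` on the Managed Server's host machine.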

