Wednesday Nov 04, 2015

Connection Leak Profiling for WLS 12.2.1 Datasource

This is the first of a series of three articles that describes enhancements to datasource profiling in WLS 12.2.1. These enhancements were requested by customers and Oracle support. I think they will be very useful in tracking down problems in the application.

The pre-12.2.1 connection leak diagnostic profiling option requires that the connection pool “Inactive Connection Timeout Seconds” attribute be set to a positive value in order to determine how long before an idle reserved connection is considered leaked. Once identified as being leaked, a connection is reclaimed and information about the reserving thread is written out to the diagnostics log. For applications that hold connections for long periods of time, false positives can result in application errors that complicate debugging. To address this concern and improve usability, two enhancements to connection leak profiling are available:

1. Connection leak profile records will be produced for all reserved connections when the connection pool reaches max capacity and a reserve request results in a PoolLimitSQLException error.

2. An optional Connection Leak Timeout Seconds attribute will be added to the datasource descriptor for use in determining when a connection is considered “leaked”. When an idle connection exceeds the timeout value, a leak profile log message is written and the connection is left intact.

The existing connection leak profiling value (0x000004) must be set on the datasource connection pool ProfileType attribute bitmask to enable connection leak detection. The ProfileConnectionLeakTimeoutSeconds attribute may then be used in place of InactiveConnectionTimeoutSeconds to identify potential connection leaks.

This is a WLST script to set the values.

# java weblogic.WLST prof.py
import sys, socket, os
hostname = socket.gethostname()
datasource='ds'
svr='myserver'
connect("weblogic","welcome1","t3://"+hostname+":7001")
# Edit the configuration to set the leak timeout
edit()
startEdit()
cd('/JDBCSystemResources/' + datasource + '/JDBCResource/' + datasource +
'/JDBCConnectionPoolParams/' + datasource )
cmo.setProfileConnectionLeakTimeoutSeconds(120) # set the connection leak timeout
cmo.setProfileType(0x000004) # turn on profiling
save()
activate()
exit()

This is what the console page looks like after it is set.  Note the profile type and timeout value are set on the Diagnostics tab for the datasource.

The existing leak detection diagnostic profiling log record format is used for leaks triggered by either the ProfileConnectionLeakTimeoutSeconds attribute or when pool capacity is exceeded. In either case a log record is generated only once for each reserved connection. If a connection is subsequently released to the pool, re-reserved, and leaked again, a new record will be generated. An example resource leak diagnostic log record is shown below.  The output can be reviewed in the console or by looking at the datasource profile output text file.

####<mydatasource> <WEBLOGIC.JDBC.CONN.LEAK> <Thu Apr 09 14:00:22 EDT 2015> <java.lang.Exception
at weblogic.jdbc.common.internal.ConnectionEnv.setup(ConnectionEnv.java:398)
at weblogic.common.resourcepool.ResourcePoolImpl.reserveResource(ResourcePoolImpl.java:365)
at weblogic.common.resourcepool.ResourcePoolImpl.reserveResource(ResourcePoolImpl.java:331)
at weblogic.jdbc.common.internal.ConnectionPool.reserve(ConnectionPool.java:568)
at weblogic.jdbc.common.internal.ConnectionPool.reserve(ConnectionPool.java:498)
at weblogic.jdbc.common.internal.ConnectionPoolManager.reserve(ConnectionPoolManager.java:135)
at weblogic.jdbc.common.internal.RmiDataSource.getPoolConnection(RmiDataSource.java:522)
at weblogic.jdbc.common.internal.RmiDataSource.getConnectionInternal(RmiDataSource.java:615)
at weblogic.jdbc.common.internal.RmiDataSource.getConnection(RmiDataSource.java:566)
at weblogic.jdbc.common.internal.RmiDataSource.getConnection(RmiDataSource.java:559)
...
> <autoCommit=true,enabled=true,isXA=false,isJTS=false,vendorID=100,connUsed=false,doInit=false,'null',destroyed=false,poolname=mydatasource,appname=null,moduleName=null,
connectTime=960,dirtyIsolationLevel=false,initialIsolationLevel=2,infected=false,lastSuccessfulConnectionUse=1428602415037,secondsToTrustAnIdlePoolConnection=10,
currentUser=...,currentThread=Thread[[ACTIVE] ExecuteThread: '0' for queue: 'weblogic.kernel.Default (self-tuning)',5,Pooled Threads],lastUser=null,currentError=null,currentErrorTimestamp=null,JDBC4Runtime=true,supportStatementPoolable=true,needRestoreClientInfo=false,defaultClientInfo={},
supportIsValid=true> <[partition-id: 0] [partition-name: DOMAIN] >

For applications that may have connection leaks but also have some valid long-running operations, you will now be able to scan through a list of connections that may be problems without interfering with normal application execution.

Tuesday Nov 03, 2015

Using Eclipse with WebLogic Server 12.2.1

With the installation of WebLogic Server 12.2.1 now including the Eclipse Network Installer, which enables developers to  download and install Eclipse including the specific features of interest, getting up and running with Eclipse and WebLogic Server has never been easier.

The Eclipse Network Installer presents developers with a guided interface to enable the custom installation of an Eclipse environment through the selection of an Eclipse version to be installed and which of the available capabilities are required - such as Java EE 7, Maven, Coherence, WebLogic, WLST, Cloud and Database tools amongst others.  It will then download the selected components and install them directly on the developer's machine.

Eclipse and the Oracle Enterprise Pack for Eclipse plugins continue to provide extensive support for WebLogic Server, enabling it to be used throughout the software lifecycle; from development and test cycles with its Java EE dialogs, assistants and deployment plugins; through to automation of configuration and provisioning of environments with the authoring, debugging and running of scripts using the WLST Script Editor and MBean palette.

The YouTube video WebLogic Server 12.2.1 - Developing with Eclipse provides a short demonstration on how to install Eclipse and the OEPE components using the new Network Installer that is bundled within the WebLogic Server installations.  It then shows how to configure a new WebLogic Server 12.2.1 server target within Eclipse, and finishes by importing a Maven project containing a Java EE 7 example application that uses the new Batch API, which is deployed to the server and called from a browser to run.

Monday Nov 02, 2015

Getting Started with the WebLogic Server 12.2.1 Developer Distribution

The new WebLogic Server 12.2.1 release continues down the path of providing an installation that is smaller to download and able to be installed with a single operation, providing a quicker approach for developers to get started with the product.

New with the WebLogic Server 12.2.1 release is the use of the quick installer technology, which packages the product into an executable jar file that silently installs the product into a target directory.  Through the use of the quick installer, the installed product can now be patched using the standard Oracle patching utility - opatch - enabling developers to download and apply any patches as needed and also enabling a high degree of consistency with downstream testing and production environments.

Despite its smaller distribution size, the developer distribution delivers a full featured WebLogic Server including the rich administration console, the comprehensive scripting environment with WLST, the Configuration Wizard and Domain Builders, the Maven plugins and artifacts and of course all the new WebLogic Server features such as Java EE 7 support, MultiTenancy, Elastic Dynamic Clusters and more.

For a quick look at using the new developer distribution, creating a domain and accessing the administration console, check out the YouTube video: Getting Started with the Developer Distribution.

JMS 2.0 support in WebLogic Server 12.2.1

As part of its support for Java EE 7, WebLogic Server 12.2.1 supports version 2.0 of the JMS (Java Message Service) specification.

JMS 2.0 is the first update to the JMS specification since version 1.1 was released in 2002. One might think that an API that has remained unchanged for so long has grown moribund and unused. However, if you judge the success of an API standard by the number of different implementations, JMS is one of the most successful APIs around.

In JMS 2.0, the emphasis has been on catching up with the ease-of-use improvements that have been made to other enterprise Java technologies. While technologies such as Enterprise JavaBeans or Java persistence are now much simpler to use than they were a decade ago, JMS had remained unchanged with a successful, but rather verbose, API.

The single biggest change in JMS 2.0 is the introduction of a new simplified API for sending and receiving messages that reduces the amount of code a developer must write. For applications that run in WebLogic server itself, the new API also supports resource injection. This allows WebLogic to take care of the creation and management of JMS objects, simplifying the application even further.

Other changes in JMS 2.0 include asynchronous send, shared topic subscriptions and delivery delay. These were existing WebLogic features which are now available using an improved, standard API.

To find out more about JMS 2.0, see this 15 minute audio-visual slide presentation.

Read these two OTN articles:

See also Understanding the Simplified API Programming Model in the product documentation

In a hurry? See Ten ways in which JMS 2.0 means writing less code.

WebLogic Scripting Tool (WLST) updates in 12.2.1

A number of updates have been implemented in Oracle WebLogic Server and Oracle Fusion Middleware 12.2.1 to simplify the usage of the WebLogic Scripting Tool (WLST), especially when multiple Oracle Fusion Middleware products are being used. In his blog, Robert Patrick describes what we have done to unify the usage of WLST across the Oracle Fusion Middleware 12.2.1 product line. This information will be very helpful to WLST users who want to better understand what was implemented in 12.2.1 and any implications for their environments.

ZDT Patching: A Simple Case – Rolling Restart

To get started understanding ZDT Patching, let’s take a look at it in its simplest form, the rolling restart.  In many ways, this simple use case is the foundation for all of the other types of rollouts – Java Version, Oracle Patches, and Application Updates. Executing the rolling restart requires the coordinated and controlled shutdown of all of the managed servers in a domain or cluster while ensuring that service to the end-user is not interrupted, and none of their session data is lost.

The administrator can start a rolling restart by issuing the WLST command below:

rollingRestart("Cluster1")

In this case, the rolling restart will affect all managed servers in the cluster named “Cluster1”. This is called the target. The target can be a single cluster, a list of clusters, or the name of the domain.

When the command is entered, the WebLogic Admin Server will analyze the topology of the target and dynamically create a workflow (also called a rollout), consisting of every step that needs to be taken in order to gracefully shut down and restart each managed server in the cluster, while ensuring that all sessions on that managed server are available to the other managed servers. The workflow will also ensure that all of the running apps on a managed server are fully ready to accept requests from the end-users before moving on to the next node. The rolling restart is complete once every managed server in the cluster has been restarted.
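
As a point of reference, here is a minimal WLST sketch of kicking off a rolling restart; the host, port, and credentials are placeholders, and the exact API of the returned progress object is not shown here, so consult help('rollingRestart') in your installation before relying on it.

# java weblogic.WLST rolling_restart.py  (connection details are placeholders)
connect('weblogic', 'welcome1', 't3://adminhost:7001')
# depending on your installation, you may need to switch to the domain runtime tree first
progress = rollingRestart('Cluster1')   # returns a progress object for the generated workflow
print progress                          # inspect the rollout; see help('rollingRestart') for details
exit()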

A diagram illustrating this process on a very simple topology is shown below.  In the diagram you can see that a node is taken offline (shown in red) and end-user requests that would have gone to that node are re-routed to active nodes.  Once the servers on the offline node have been restarted and their applications are again ready to receive requests, that node is added back to the pool of active nodes and the rolling restart moves on to the next node.

Animated image illustrating a rolling restart

 Illustration of a Rolling Restart Across a Cluster.

The rolling restart functionality was introduced based on customer feedback.  Some customers have a policy of preemptively restarting their managed servers in order to refresh the memory usage of applications running on top of them. With this feature we are greatly simplifying that tedious and time consuming process, and doing so in a way that doesn’t affect end-users.

For more information about Rolling Restarts with Zero Downtime Patching, view the documentation.

Friday Oct 30, 2015

Elasticity for Dynamic Clusters

Introducing Elasticity for Dynamic Clusters

WebLogic Server 12.1.2 introduced the concept of dynamic clusters, which are clusters where the Managed Server configurations are based on a single, shared template.  It greatly simplifies the configuration of clustered Managed Servers, and allows for dynamically assigning servers to machine resources and greater utilization of resources with minimal configuration.

In WebLogic Server 12.2.1, we build on the dynamic clusters concept to introduce elasticity to dynamic clusters, allowing them to be scaled up or down based on conditions identified by the user.  Scaling a cluster can be performed on-demand (interactively by the administrator), at a specific date or time, or based on performance as seen through various server metrics.

In this blog entry, we take a high level look at the different aspects of elastic dynamic clusters in WebLogic 12.2.1.0, the next piece in the puzzle for on-premise elasticity with WebLogic Server!  In subsequent blog entries, we will provide more detailed examinations of the different ways of achieving elasticity with dynamic clusters.

The WebLogic Server Elasticity Framework

The diagram below shows the different parts of the elasticity framework for WebLogic Server:

The Elastic Services Framework is a set of services residing within the Administration Server for a WebLogic domain, and consists of:

  • A new set of elastic properties on the DynamicServersMBean for dynamic clusters to establish the elastic boundaries and characteristics of the cluster
  • New capabilities in the WebLogic Diagnostics Framework (WLDF) to allow for the creation of automated elastic policies
  • A new "interceptors" framework to allow administrators to interact with scaling events for provisioning and database capacity checks
  • A set of internal services that perform the scaling
  • (Optional) integration with Oracle Traffic Director (OTD) 12c to notify it of changes in cluster membership and allow it to adapt the workload accordingly

Note that while tighter integration with OTD is possible in 12.2.1, it is not required: if the OTD server pool is enabled for dynamic discovery, OTD will adapt as necessary to the set of available servers in the cluster.

Configuring Elasticity for Dynamic Clusters

To get started, when you're configuring a new dynamic cluster, or modifying an existing dynamic cluster, you'll want to leverage some new properties surfaced through the DynamicServersMBean for the cluster to set some elastic boundaries and control the elastic behavior of the cluster.

The new properties to be configured include:

  • The starting dynamic cluster size
  • The minimum and maximum elastic sizes of the cluster
  • The "cool-off" period required between scaling events

There are several other properties regarding how to manage the shutdown of Managed Servers in the cluster, but the above settings control the boundaries of the cluster (by how many instances it can scale up or down), and how frequently scaling events can occur.  The Elastic Services Framework will allow the dynamic cluster to scale up to the specified maximum number of instances, or down to the minimum you allow.  

The cool-off period is a safety mechanism designed to prevent scaling events from occurring too frequently.  It should allow enough time for a scaling event to complete and for its effects to be felt on the dynamic cluster's performance characteristics.

Needless to say, the values for these settings should be chosen carefully and aligned with your cluster capacity planning!
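
To make this concrete, below is a hedged WLST sketch of setting these boundaries on an existing dynamic cluster. The cluster name, path, and values are placeholders, and the attribute names shown (DynamicClusterSize, MinDynamicClusterSize, MaxDynamicClusterSize, DynamicClusterCooloffPeriodSeconds) should be verified against the DynamicServersMBean reference for your release.

# assumes a dynamic cluster named 'DynCluster' already exists in the domain
edit()
startEdit()
cd('/Clusters/DynCluster/DynamicServers/DynCluster')   # path to the dynamic servers configuration (verify for your domain)
cmo.setDynamicClusterSize(4)                           # starting cluster size
cmo.setMinDynamicClusterSize(2)                        # lower elastic boundary
cmo.setMaxDynamicClusterSize(8)                        # upper elastic boundary
cmo.setDynamicClusterCooloffPeriodSeconds(900)         # wait at least 15 minutes between scaling events
save()
activate()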

Scaling Dynamic Clusters

Scaling of a dynamic cluster can be achieved through the following means:

  • On-demand through WebLogic Server Administration Console and WLST 
  • Using an automated calendar-based schedule utilizing WLDF policies and actions
  • Through automated WLDF policies based on performance metrics

On-Demand Scaling

WebLogic administrators have the ability to scale a dynamic cluster up or down on demand when needed:

Manual Scaling using the WebLogic Server Administration Console

In the console case, the administrator simply indicates the total number of desired running servers in the cluster, and the Console will interact with the Elastic Services Framework to scale the cluster up or down accordingly, within the boundaries of the dynamic cluster.
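
The same on-demand operation is available from WLST through the scaleUp and scaleDown diagnostic commands; the sketch below is illustrative only (the cluster name and instance counts are placeholders, and the optional arguments are described in the command reference; see help('scaleUp')).

# run while connected to the Administration Server
# add two running instances to the dynamic cluster, staying within its configured maximum
scaleUp('DynCluster', 2)
# later, remove one instance, staying within the configured minimum
scaleDown('DynCluster', 1)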

Automated Scaling

In addition to scaling a dynamic cluster on demand, WebLogic administrators can configure automated policies using the Policies & Actions feature (known in previous releases as the Watch & Notifications Framework) in WLDF.

Typically, automated scaling will consist of creating pairs of WLDF policies, one for scaling up a cluster, and one for scaling it down.  Each scaling policy consists of:

  • (Optionally) A policy (previously known as a "Watch Rule") expression
  • A schedule
  • A scaling action

To create an automated scaling policy, an administrator must

  • Configure a domain-level diagnostic system module and target it to the Administration Server
  • Configure a scale-up or scale-down action for a dynamic cluster within that WLDF module
  • Configure a policy and assign the scaling action

For more information you can consult the documentation for Configuring Policies and Actions.
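
As a rough illustration of those three steps, here is a hedged WLST sketch; the module, policy, and action names are placeholders, and the bean paths and factory/attribute names follow the pattern described in the Policies and Actions documentation but should be treated as assumptions and verified against the WLDF configuration reference.

# a sketch only: creates a domain-level WLDF module, a scale-up action, and a policy that uses it
edit()
startEdit()
wldf = create('ElasticModule', 'WLDFSystemResource')                 # step 1: domain-level diagnostic module
wldf.addTarget(getMBean('/Servers/AdminServer'))                     # ...targeted to the Administration Server
cd('/WLDFSystemResources/ElasticModule/WLDFResource/ElasticModule/WatchNotification/ElasticModule')
scaleUpAction = cmo.createScaleUpAction('scaleUpAction')             # step 2: scale-up action for the dynamic cluster (assumed factory method)
scaleUpAction.setClusterName('DynCluster')
scaleUpAction.setScalingSize(1)                                      # add one instance per scaling event
policy = cmo.createWatch('scaleUpPolicy')                            # step 3: the policy ("watch") itself
policy.setRuleType('Harvester')                                      # metric-based policy evaluated on a schedule
policy.setRuleExpression('...')                                      # a Smart Rule or Java EL expression goes here
policy.addNotification(scaleUpAction)                                # attach the scaling action to the policy
save()
activate()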

Calendar Based Elastic Policies

In 12.2.1, WLDF introduces the ability for cron-style scheduling of policy evaluations.  Policies that monitor MBeans according to a specific schedule are called "scheduled" policies.  

A calendar based policy is a policy that unconditionally executes according to its schedule and executes any associated actions.   When combined with a scaling action, you can create a policy that can scale up or scale down a dynamic cluster at specific scheduled times.

Each scheduled policy has its own schedule (as opposed to earlier releases, where all policies were tied to a single evaluation frequency). The schedule is configured in calendar time, allowing you to create schedule patterns such as (but not limited to):

  • Recurring interval based patterns (e.g., every 5th minute of the hour, or every 30th second of every minute)
  • Days-of-week or days-of-month (e.g., "every Mon/Wed/Fri at 8 AM", or "every 15th and 30th of every month")
  • Specific days and times within a year  (e.g., "December 26th at 8AM EST")

So, for example, an online retailer could configure a pair of policies around the Christmas holidays:

  • A "Black Friday" policy to scale up the necessary cluster(s) to meet increased shopping demand for the Christmas shopping season
  • Another policy to scale down the cluster(s) on December 25th when the Christmas shopping season is over
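
Continuing the hedged sketch from the previous section, the retailer's scale-up policy could be made calendar-only by giving it a schedule and no expression; the schedule accessor and field names below mirror the calendar fields described in the Policies and Actions documentation, but treat them as assumptions to verify.

# a sketch only: a scheduled, unconditional scale-up policy (names and dates are placeholders)
blackFriday = cmo.createWatch('blackFridayScaleUp')
schedule = blackFriday.getSchedule()          # assumed accessor for the policy's calendar schedule
schedule.setMonth('November')
schedule.setDayOfMonth('27')                  # the particular Black Friday date being planned for
schedule.setHour('6')
schedule.setMinute('0')
blackFriday.addNotification(scaleUpAction)    # reuse the scale-up action created earlier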

Performance-based Elastic Policies

In addition to calendar-based scheduling, in 12.2.1 WLDF provides the ability to create scaling policies based on performance conditions within a server ("server-scoped") or cluster ("cluster-scoped").  You can create a policy based on various run-time metrics supported by WebLogic Server.  WLDF also provides a set of pre-packaged, parameterized, out-of-the-box functions called "Smart Rules" to assist in creating performance-based policies.

Cluster-scoped Smart Rules allow you to look at trends in a performance metric across a cluster over a specified window of time and (when combined with scaling actions) scale up or down based on criteria that you specify.  Some examples of the metrics that are exposed through Smart Rules include:

  • Throughput (requests/second)
  • JVM Free heap percentage
  • Process CPU Load
  • Pending user requests
  • Idle threads count
  • Thread pool queue length

Additionally, WLDF provides some "generic" Smart Rules to allow you to create policies based on your own JMX-based metrics.  The full Smart Rule reference can be found here.

And, if a Smart Rule doesn't suit your needs, you can also craft your own policy expressions.  In 12.2.1, WLDF utilizes Java EL 3.0 as the policy expression language, and allows you to build expressions based on JavaBean objects and functions (including Smart Rules!) that we provide out of the box.
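
Purely to illustrate the shape of a policy expression (the Smart Rule name and argument list below are hypothetical; check the Smart Rule reference for real signatures), a cluster-scoped scale-up policy might carry an expression such as:

# hypothetical rule name and arguments, shown only to illustrate the Java EL form used by policies
policy.setRuleExpression('wls:ClusterLowHeapFreePercent("DynCluster", "30 seconds", "10 minutes", 20, 60)')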

Provisioning and Safeguards with Elasticity

What if you need to add or remove virtual machines during the scaling process?  In WLS 12.2.1 you can participate in the scaling event utilizing script interceptors.  A script interceptor provides call-out hooks where you can supply custom shell scripts, or other executables, to be called when a scaling event happens on a cluster.  In this manner, you can write a script to interact with 3rd-party virtual machine hypervisors to add virtual machines prior to scaling up, or remove/reassign virtual machines after scaling down. 

WebLogic Server also provides administrators the ability to prevent overloading database capacity on a scale up event through the data source interceptor feature.  Data source interceptors allow you to set a value for the maximum number of connections allowed on a database, by associating a set of data source URLs and URL patterns with a maximum connections constraint.   When a scale up is requested on a cluster, the data source interceptor looks at what the new maximum connection requirements are for the cluster (with the additional server capacity), and if it looks like the scale up could lead to a database overload it rejects the scale up request.  While this still requires adequate capacity planning for your database utilization, it allows you to put in some sanity checks at run time to ensure that your database doesn't get overloaded by a cluster scale up.

Integration with Oracle Traffic Director

The elasticity framework also integrates with OTD through the WebLogic Server 12.2.1 life cycle management services.  When a scaling event occurs, the elasticity framework interacts with the life cycle management services to notify OTD of the scaling event so that OTD can update its routing tables accordingly.

In the event of a scale up event, for example, OTD is notified of the candidate servers and adjusts the server pool accordingly.  

In the case of a scale down, the life cycle management services notify OTD which instances are going away.  OTD then halts sending new requests to the servers being scaled down, and routes new traffic to the remaining set of instances in the cluster, allowing the instances being removed to be shut down gracefully without losing any requests.

In order for OTD integration to be active, you must enable life cycle management services for the domain as documented here.

The Big Picture - Tying It All Together

The elasticity framework in 12.2.1 provides a lot of power and flexibility to manage the capacity in your on-premise dynamic clusters.  As part of your dynamic cluster capacity planning, you can use elasticity to take into account your dynamic cluster's minimum, baseline, and peak capacity needs, and incorporate those settings into your dynamic servers configuration on the cluster.  Utilizing WLDF policies and actions, you can create automated policies to scale your cluster at times of known increased or decreased capacity, or to scale up or down based on cluster performance.

Through the use of script interceptors, you can interact with virtual machine pools to add or remove virtual machines during scaling, or perhaps even move shared VMs between clusters based on need.  You can also utilize the data source interceptor to prevent exceeding the capacity of any databases affected by scale up events.

And, when so configured, the Elasticity Framework can interact with OTD during scaling events to ensure that new and in-flight sessions are managed safely when adding or removing capacity in the dynamic cluster.

In future blogs (and maybe vlogs!) we'll go into some of the details on these features.  This is really just an overview of the new features that are available to help our users implement elasticity with dynamic clusters.  We will follow on in the upcoming weeks and months with more detailed discussions and examples of how to utilize these powerful new features.

In the meantime, you can download a demonstration of policy based scaling with OTD integration from here, with documentation about how to set it up and run it here

Feel free to post any questions you have here, or email me directly.  In the meantime, download WebLogic Server 12.2.1 and start poking around! 

Resources

Policy Based Scaling demonstration files and documentation

WebLogic Server 12.2.1 Documentation

Configuring Elasticity for Dynamic Clusters in Oracle WebLogic Server

Configuring WLDF Policies and Actions

Dynamic Clusters Documentation

End-To-End Life Cycle Management and Configuring WebLogic Server MT: The Big Picture

Oracle Traffic Director (OTD) 12c

Java EL 3.0 Specification

Thursday Oct 29, 2015

Oracle WebLogic Server 12.2.1 Continuous Availability

New in Oracle WebLogic Server 12.2.1: Continuous Availability! Continuous Availability is an end-to-end solution for building multi data center architectures. With Continuous Availability, applications running in multi data center environments can run continuously in Active-Active configurations. When one site fails, the other site will recover work for the failed site. During upgrades, applications can still run continuously with zero down time. What ties it all together is automated data site failover, reducing human error and risk during failover or switchover events.

Reduce Application Downtime

· WebLogic Zero Down Time Patching (ZDT): Automatically orchestrates the rollout of patches and updates, while avoiding downtime and session loss. Reduces risk, cost and session downtime by automating the rollout process. ZDT automatically retries on failure and rolls back if the retry fails.  Please read the blog Zero Downtime Patching Released! to learn more about this feature.

· WebLogic Multitenant Live Partition Migration: In Multitenant environments, Live Partition Migration is the ability to move running partitions and resource groups from one cluster to another without impacting application users. During upgrade, load balancing, or imminent failure, partitions can be migrated with zero impact to applications.

· Coherence Persistence: Persists cache data and metadata to durable storage. In case of failure of one or more Coherence servers, or the entire cluster, the persisted data and metadata can be recovered.


Replicate State for Multi-Datacenter Deployments

· WebLogic Cross Domain XA Recovery: When a WebLogic Server domain fails in one site, or the entire site comes down, transactions can be automatically recovered in a domain on the surviving site. This allows automated transaction recovery in Active-Active Maximum Availability Architectures.

· Coherence Federated Caching: Distributes Coherence updates across distributed geographical sites with conflict resolution. The modes of replication are Active-Active, with data being continuously replicated and providing applications access to their local cached data; Active-Passive, with the passive site serving as backup of the active site; and Hub-Spoke, where the hub replicates the cache data to distributed spokes.


Operational Support for Site Failover

· Oracle Traffic Director (OTD): Fast, reliable, and scalable software load balancer that routes traffic to application servers and web servers in the network. Oracle Traffic Director is aware of server availability; when a server is added to the cluster, OTD starts routing traffic to that server. OTD itself can be highly available, in either Active-Active or Active-Passive mode.

· Oracle Site Guard: Provides end-to-end Disaster Recovery automation. Oracle Site Guard automates failover or switchover by starting and stopping site components in a predetermined order, and by running scripts and post-failover checks. Oracle Site Guard minimizes down time and human error during failover or switchover.


Continuous Availability provides flexibility by supporting different topologies to meet application needs.

· Active-Active Application Tier with Active-Passive Database Tier

· Active-Passive Application Tier with Active-Passive Database Tier

· Active-Active Stretch Cluster with Active-Passive Database Tier


Continuous Availability provides applications with maximum availability and productivity, data integrity and recovery, local access to data in multi data center environments, real-time access to data updates, automated failover and switchover of sites, and reduced human error and risk during failover/switchover. Protect your applications from down time with Continuous Availability. If you want to learn more, please read the Continuous Availability documentation or watch the Continuous Availability video.

Dynamic Debug Patches in WebLogic Server 12.2.1

Introduction

Whether we like it or not, we know that no software is perfect. Bugs happen, in spite of the best efforts by the developers. Worse, in many circumstances, they show up in unexpected ways. They can also be intermittent and hard to reproduce in some cases. In such cases, there is often not enough information even to understand the nature of the problem if the product is not sufficiently instrumented to reveal the underlying causes. Direct access to a customer's production environment is usually not an option. To get a better understanding of the underlying problem, instrumented debug patches are usually created with the hope that running the applications with debug patches will provide more insight. This can be a trial and error method and can take several iterations before hitting upon the actual cause. The folks creating a debug patch (typically Support or Development teams in the software provider organization) and the customers running the application are almost always different groups, often in different companies. Thus, each iteration of creating a debug patch, providing it to the customer, getting it applied in the customer environment and getting the results back can take substantial time. In turn, it can result in delays in problem resolution.

In addition, there can be other significant issues with deploying such debug patches. Applying patches in a Java EE environment requires bouncing servers and domains or at least redeploying applications. In mission critical deployments, it may not be possible to immediately apply patches. Moreover, when a server is bounced, its state is lost. Thus, vital failure data in memory may be lost. Also, an intermittent failure may not show up for a long time after restarting servers, making quick diagnosis difficult.

Dynamic Debug Patches

In the WebLogic Server 12.2.1 release, a new feature called Dynamic Debug Patches is introduced which aims to simplify the process of capturing diagnostic data for quicker problem resolution. With this feature, debug patches can be dynamically activated without having to restart servers or clusters or redeploy applications in a WebLogic domain. It leverages the JDK's instrumentation feature to hot-swap classes from specified debug patches using run-time WLST commands. With the provided WLST commands (as described below), one or more debug patches can be activated within the scope of selected servers, clusters, partitions and applications. Since no server restart or application redeployment is needed, associated logistical impediments are a non-issue. For one, since the applications and services continue to run, there is less of a barrier to activate these patches in production environments. Also, there is no loss of state. Thus, the instrumented code in newly activated debug patches has a better chance at revealing erroneous transient state and providing meaningful diagnostic information.

Prerequisites

Dynamic debug patches are ordinary jar files containing patched classes with additional instrumentation such as debug logging, print statements, etc. Typically, product development or support teams build these patch jars and make them available to system operations teams for their activation in the field. To make them available to the WebLogic Server's dynamic debug patches feature, system administrators need to copy them to a specific directory in a domain. By default, this directory is the debug_patches subdirectory under the domain root. However, it can be changed by reconfiguring the DebugPatchDirectory attribute of the DebugPatchesMBean.
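
If a different location is needed, it can be changed with a short WLST edit session; this is only a sketch, and the edit-tree path to the DebugPatchesMBean shown below is an assumption (navigate with ls() or consult the MBean reference for the actual location).

# point the domain at a shared debug patch directory (path and bean location are examples)
edit()
startEdit()
cd('/DebugPatches/mydomain')                             # assumed location of the DebugPatchesMBean
cmo.setDebugPatchDirectory('/shared/debug_patches')
save()
activate()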

Another requirement is to start the servers in the domain with the debugpatch instrumentation agent with the following option in the server's startup command. It is automatically added by the startup scripts created for WebLogic Server 12.2.1 domains.

-javaagent:${WL_HOME}/server/lib/debugpatch-agent.jar

Using Dynamic Debug Patches Feature

We will illustrate the use of this feature by activating and deactivating debug patches on a simple toy application.

The Application

We will use a minimalist toy web application which computes the factorial value of an input integer and returns it to the browser.

FactorialServlet.java:

package example;

import java.io.IOException;
import javax.servlet.GenericServlet;
import javax.servlet.ServletException;
import javax.servlet.ServletRequest;
import javax.servlet.ServletResponse;
import javax.servlet.annotation.WebServlet;

import java.util.Map;
import java.util.HashMap;
import java.util.concurrent.ConcurrentHashMap;

/**
 * A trivial servlet: Returns the factorial of an input integer.
 */
@WebServlet(value="/factorial", name="factorial-servlet")
public class FactorialServlet extends GenericServlet {

  public void service(ServletRequest request, ServletResponse response)
      throws ServletException, IOException {
    String n = request.getParameter("n");
    System.out.println("FactorialServlet called for input=" + n);
    int result = Factorial.getInstance().factorial(n);
    response.getWriter().print("factorial(" + n + ") = " + result);
  }
}

The servlet delegates to the Factorial singleton to compute the factorial value. As an optimization, the Factorial class maintains a Map of previously computed values which serves as an illustration of retaining stateful information while activating or deactivating dynamic debug patches.

Factorial.java:

package example;

import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

class Factorial {
  private static final Factorial SINGLETON = new Factorial();
  private Map<String, Integer> map = new ConcurrentHashMap<String, Integer>();

  static Factorial getInstance() {
    return SINGLETON;
  }

  public int factorial(String n) {
    if (n == null) {
      throw new NumberFormatException("Invalid argument: " + n);
    }
    n = n.trim();
    Integer val = map.get(n);
    if (val == null) {
      int i = Integer.parseInt(n);
      if (i < 0)
        throw new NumberFormatException("Invalid argument: " + n);
      int fact = 1;
      while (i > 0) {
        fact *= i;
        i--;
      }
      val = new Integer(fact);
      map.put(n, val);
    }
    return val;
  }
}

Building and Deploying the Application

To build the factorial.war web application, create the FactorialServlet.java and Factorial.java files as above in an empty directory. Build the application war file with the following commands:

mkdir -p WEB-INF/classes
javac -d WEB-INF/classes FactorialServlet.java Factorial.java
jar cvf factorial.war WEB-INF

Deploy the application using WLST (or WebLogic Server Administration Console):

$MW_HOME/oracle_common/common/bin/wlst.sh
Initializing WebLogic Scripting Tool (WLST) ...
Welcome to WebLogic Server Administration Scripting Shell
Type help() for help on available commands
connect(username, password, adminUrl)  # e.g. connect('weblogic', 'weblogic', 't3://localhost:7001')
Connecting to t3://localhost:7001 with userid weblogic ...
Successfully connected to Admin Server "myserver" that belongs to domain "mydomain".
Warning: An insecure protocol was used to connect to the server.
To ensure on-the-wire security, the SSL port or Admin port should be used instead.
deploy('factorial', 'factorial.war', targets='myserver')

Note that in the illustration above, we targeted the application only to the administration server. It may be targeted to other managed servers or clusters in the real world. We will discuss how to activate and deactivate debug patches over multiple managed servers and clusters in a subsequent article.

Invoke the web application from your browser. For example: http://localhost:7001/factorial/factorial?n=4 You should see the result in the browser and a message in the server's stdout window such as:

FactorialServlet called for input=4

The Debug Patch

The application as written does not perform a lot of logging and does not reveal much about its functioning. Perhaps there is a problem and we need more information when it executes. We can create a debug patch from the application code and provide it to the system administrator so he/she can activate it on the running server/application. Let us modify the above code to put in additional print statements for getting more information (i.e. the lines with "MYDEBUG" below).

Updated (version 1)  Factorial.java:

class Factorial {
  private static final Factorial SINGLETON = new Factorial();
  private Map<String, Integer> map = new ConcurrentHashMap<String, Integer>();
  static Factorial getInstance() {
    return SINGLETON;
  }
  public int factorial(String n) {
    if (n == null) {
      throw new NumberFormatException("Invalid argument: " + n);
    }
    n = n.trim();
    Integer val = map.get(n);
    if (val == null) {
      int i = Integer.parseInt(n);
      if (i < 0)
        throw new NumberFormatException("Invalid argument: " + n);
      int fact = 1;
      while (i > 0) {
        fact *= i;
        i--;
      }
      val = new Integer(fact);
      System.out.println("MYDEBUG> saving factorial(" + n + ") = " + val);
      map.put(n, val);
    } else {
      System.out.println("MYDEBUG> returning saved factorial(" + n + ") = " + val);
    }
    return val;
  }
}

Build the debug patch jar. Note that this is a plain jar file, that is, not built as an application archive. Also note that we need not compile the entire application (although it would not hurt). The debug patch jar should contain only the classes which have changed (in this case, Factorial.class).

mkdir patch_classes
javac -d patch_classes Factorial.java
jar cvf factorial_debug_01.jar -C patch_classes .

Activating Debug Patches

In most real world scenarios, creators (developers) and activators (system administrators) of debug patches would be different people. For the purpose of illustration, we will wear multiple hats here. Assuming that we are using the default configuration for the location of the debug patches directory, create the debug_patches directory under the domain directory if it is not already there. Copy the factorial_debug_01.jar debug patch jar into the debug_patches directory.  Connect to the server with WLST as above.

First, let us check which debug patches are available in the domain. This can be done with the listDebugPatches command.

Hint: To see available diagnostics commands, issue the help('diagnostics') command. To get information on a specific command, issue help(commandName), e.g. help('activateDebugPatch').

wls:/mydomain/serverConfig/> listDebugPatches()         
myserver:
Active Patches:
Available Patches:
    factorial_debug_01.jar
    app2.0_patch01.jar
    app2.0_patch02.jar 

factorial_debug_01.jar is the newly created debug patch. app2.0_patch01.jar and app2.0_patch02.jar were created in the past to investigate issues with some other application. The listing above shows no "active" patches since none have been activated so far.

Now, let us activate the debug patch with the activateDebugPatch command.

tasks=activateDebugPatch('factorial_debug_01.jar', app='factorial', target='myserver')
wls:/mydomain/serverConfig/> print tasks[0].status                                                                 
FINISHED
wls:/mydomain/serverConfig/> listDebugPatches()     
myserver:
Active Patches:
    factorial_debug_01.jar:app=factorial
Available Patches:
    factorial_debug_01.jar
    app2.0_patch01.jar
    app2.0_patch02.jar

The command returns an array of tasks which can be used to monitor the progress and status of the activation command. Multiple managed servers and/or clusters can be specified as targets if applicable. Corresponding to each applicable target server, there is a task in the returned tasks array. The command can also be used to activate debug patches at the server and middleware level. Such patches will typically be created by Oracle Support as needed. The output of the listDebugPatches() command above shows that factorial_debug_01.jar is now activated on the application "factorial".

Now, let us send some requests to the application: http://localhost:7001/factorial/factorial?n=4 and http://localhost:7001/factorial/factorial?n=5

Server output:

FactorialServlet called for input=4
MYDEBUG> returning saved factorial(4) = 24
FactorialServlet called for input=5
MYDEBUG> saving factorial(5) = 120

Notice that for input=4, saved results were returned since the values were computed and saved in the map due to a prior request. Thus, the debug patch was activated without destroying existing state in the application. For input=5, values were not previously computed and saved, thus a different debug message showed up.

Activating Multiple Debug Patches

If needed, multiple patches which potentially overlap can be activated. A patch which is activated later would mask the effects of a previously activated patch if there is an overlap. Say, in the above case, we need more detailed information from the factorial() method as it is executing its inner loop. Let us create another debug patch, copy it to debug_patches directory and activate it.

Updated (version 2) Factorial.java:

class Factorial {
  private static final Factorial SINGLETON = new Factorial();
  private Map<String, Integer> map = new ConcurrentHashMap<String, Integer>();
  static Factorial getInstance() {
    return SINGLETON;
  }
  public int factorial(String n) {
    if (n == null) {
      throw new NumberFormatException("Invalid argument: " + n);
    }
    n = n.trim();
    Integer val = map.get(n);
    if (val == null) {
      int i = Integer.parseInt(n);
      if (i < 0)
        throw new NumberFormatException("Invalid argument: " + n);
      int fact = 1;
      while (i > 0) {
        System.out.println("MYDEBUG> multiplying by " + i);
        fact *= i;
        i--;
      }
      val = new Integer(fact);
      System.out.println("MYDEBUG> saving factorial(" + n + ") = " + val);
      map.put(n, val);
    } else {
      System.out.println("MYDEBUG> returning saved factorial(" + n + ") = " + val);
    }
    return val;
  }
}

Build factorial_debug_02.jar

javac -d patch_classes Factorial.java
jar cvf factorial_debug_02.jar  -C patch_classes .
cp factorial_debug_02.jar $DOMAIN_DIR/debug_patches

Activate factorial_debug_02.jar

wls:/mydomain/serverConfig/> listDebugPatches()     
myserver:
Active Patches:
    factorial_debug_01.jar:app=factorial
Available Patches:
    factorial_debug_01.jar
    factorial_debug_02.jar
    app2.0_patch01.jar
    app2.0_patch02.jar
wls:/mydomain/serverConfig/> tasks=activateDebugPatch('factorial_debug_02.jar', app='factorial', target='myserver')
wls:/mydomain/serverConfig/> listDebugPatches()                                                                    
myserver:
Active Patches:
    factorial_debug_01.jar:app=factorial
    factorial_debug_02.jar:app=factorial
Available Patches:
    factorial_debug_01.jar
    factorial_debug_02.jar
    app2.0_patch01.jar
    app2.0_patch02.jar

Now, let us send some requests to the application: http://localhost:7001/factorial/factorial?n=5 and http://localhost:7001/factorial/factorial?n=6

FactorialServlet called for input=5
MYDEBUG> returning saved factorial(5) = 120
FactorialServlet called for input=6
MYDEBUG> multiplying by 6
MYDEBUG> multiplying by 5
MYDEBUG> multiplying by 4
MYDEBUG> multiplying by 3
MYDEBUG> multiplying by 2
MYDEBUG> multiplying by 1
MYDEBUG> saving factorial(6) = 720

We see the additional information printed due to code in factorial_debug_02.jar.

Deactivating Debug Patches

When a debug patch is not needed any more, it can be deactivated with the deactivateDebugPatches command. To get help on it, execute help('deactivateDebugPatches').

wls:/mydomain/serverConfig/> tasks=deactivateDebugPatches('factorial_debug_02.jar', app='factorial', target='myserver')            
wls:/mydomain/serverConfig/> listDebugPatches()                                                                        
myserver:
Active Patches:
    factorial_debug_01.jar:app=factorial
Available Patches:
    factorial_debug_01.jar
    factorial_debug_02.jar
    app2.0_patch01.jar
    app2.0_patch02.jar

Now, executing http://localhost:7001/factorial/factorial?n=2 gets us the following output in server's stdout window:

FactorialServlet called for input=2
MYDEBUG> saving factorial(2) = 2

Note that when we had activated factorial_debug_01.jar and factorial_debug_02.jar in that order, the classes in factorial_debug_02.jar masked those in factorial_debug_01.jar. After deactivating factorial_debug_02.jar, the classes in factorial_debug_01.jar were unmasked and became effective again. A comma-separated list of debug patches may be specified with the deactivateDebugPatches command. To deactivate all active debug patches on applicable target servers, the deactivateAllDebugPatches() command may be used.
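
For example (hedging on the exact form the command expects for multiple jars), deactivating both of our patches in one call, or clearing everything on the target, would look something like this:

# deactivate both debug patches in a single call using a comma-separated list
tasks=deactivateDebugPatches('factorial_debug_01.jar,factorial_debug_02.jar', app='factorial', target='myserver')
# or remove every active debug patch on the targeted server(s)
deactivateAllDebugPatches(target='myserver')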

WLST Commands

The following diagnostic WLST commands are provided to interact with the Dynamic Debug Patches feature. As noted above, help(command-name) shows help for that command.

  • activateDebugPatch: Activate a debug patch on specified targets.
  • deactivateAllDebugPatches: De-activate all debug patches from specified targets.
  • deactivateDebugPatches: De-activate debug patches on specified targets.
  • listDebugPatches: List activated and available debug patches on specified targets.
  • listDebugPatchTasks: List debug patch tasks from specified targets.
  • purgeDebugPatchTasks: Purge debug patch tasks from specified targets.
  • showDebugPatchInfo: Show details about a debug patch on specified targets.

Limitations

The Dynamic Debug Patches feature leverages the JDK's hot-swap capability. The hot-swap capability has a limitation that hot-swapped classes cannot have a different shape than the original classes. This means that the classes which are swapped in cannot add, remove or update constructors, methods, fields, super classes, implemented interfaces, etc. Only changes in method bodies are allowed. It should be noted that debug patches typically only gather additional information and do not attempt to "fix" the problems as such. Minor fixes which would not change the shape of classes may be tried, but that is not the main purpose of this feature. Therefore, we don't expect this to be a big limitation in practice.

One issue, however, is that in some cases the new debug code may need to maintain some state. For example, perhaps we want to collect some data in a map and only dump it out when some threshold is reached. The JDK limitation regarding shape-change creates problems in that case. The Dynamic Debug Patches feature provides a DebugPatchHelper utility class to help address some of those concerns. We will discuss that in a subsequent article. Please check back to read about it.

Using Diagnostic Context for Correlation

The WebLogic Diagnostics Framework (WLDF) and Fusion Middleware Diagnostics Monitoring System (DMS) provide correlation information in diagnostic artifacts such as logs and Java Flight Recorder (JFR).

The correlation information flows along with a Request across threads within and between WebLogic server processes, and can also flow across process boundaries to/from other Oracle products (such as from OTD or to the Database). This correlation information is exposed in the form of unique IDs which can be used to identify and correlate the flow of a specific request through the system. This information can also provide details on the ordering of the flow.

The correlation IDs are described as follows:

  • DiagnosticContextID (DCID) and ExecutionContextID (ECID). This is the unique identifier which identifies the Request flowing through the system. While the name of the ID may be different depending on whether you are using WLDF or DMS, it is the same ID. I will be using the term ECID as that is the name used in the broader set of Oracle products.
  • Relationship ID (RID). This ID is used to describe where in the overall flow (or tree) the Request is currently at. The ID itself is an ordered set of numbers that describes the location of each task in the tree of tasks. The leading number is usually a zero. A leading number of 1 indicates that it has not been possible to track the location of the sub-task within the overall sub-task tree.

These correlation IDs have been around for quite a long time; what is new in 12.2.1 is that WLDF now picks up some capabilities from DMS (even when DMS is not present):

  1) The RelationshipID (RID) feature from DMS is now supported
  2) The ability to handle correlation information coming in over HTTP
  3) The ability to propagate correlation out over HTTP when using the WebLogic HTTP client
  4) The concept of a non-inheritable Context (not covered in this blog, may be the topic of another blog)

For this blog, we will walk through a simple contrived scenario to show how an administrator can make use of this correlation information to quickly find the data available related to a particular Request flow. This diagram shows the basic scenario:


Each arrow in the diagram shows where a Context propagation could occur; however, in our example propagation occurs only where we have solid blue arrows. The reason for this is that in our example we are using a browser client which does not supply a Context, so for our example case the first place where a Context is created is when MySimpleServlet is called. Note that a Context could propagate into MySimpleServlet if it is called by a client capable of providing the Context (for example, a DMS enabled HTTP client, a 12.2.1+ WebLogic HTTP client, or OTD).

In our contrived applications, we have each level querying the value of the ECID/RID using the DiagnosticContextHelper API, and the servlet will report these values. A real application would not be doing this; it is just for our example purposes so our servlet can display them.

We also have the EJB hard-coded to throw an Exception if the servlet request was supplied with a query string. The application will log warnings when that is detected, and the warning log messages will automatically get the ECID/RID values included in them. The application does not need to do anything special to get them.

The applications used here, as well as some basic instructions, are attached in blog_example.zip.

First we will show hitting our servlet with an URL that is not expected to fail (http://myhost:7003/MySimpleServlet/MySimpleServlet):




From the screen shot above we can see that all of the application components are reporting the same ECID (f7cf87c6-9ef3-42c8-80fa-e6007c56c21f-0000022f). We can also see that the RIDs reported by each component are different and show the relationship between the components:


Next we will show hitting our servlet with an URL that is expected to fail (http://myhost:7003/MySimpleServlet/MySimpleServlet?fail):

We see that the EJB reported that it failed. In our contrived example app, we can see that the ECID for the entire flow where the failure occurred was "f7cf87c6-9ef3-42c8-80fa-e6007c56c21f-00000231". In a real application, that would not be the case. An administrator would most likely first see warnings reported in the various server logs, and see the ECID reported with those warnings. Since we know the ECID in this case, we can "grep" for it to show what those warnings would look like and that they have the ECID/RID reported in them:

Upon seeing that we had a failure, the admin will capture JFR data from all of the servers involved. In a real scenario, the admin may have noticed the warnings in the logs, or perhaps had a Policy/Action (formerly known as Watch/Notification) configured to automatically notify or capture data. For our simple example, a WLST script is included to capture the JFR data.
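
The script attached in the zip is the authoritative version; as a rough sketch of the idea, the WLDF captureAndSaveDiagnosticImage() command, run while connected to each server, captures a diagnostic image that includes the Flight Recorder data (server URLs and credentials below are placeholders, and the image only contains JFR data when the WLDF diagnostic volume is enabled).

# capture a diagnostic image (containing the JFR recording) from each server involved
for url in ['t3://myhost:7001', 't3://myhost:7003', 't3://myhost:7005']:
    connect('weblogic', 'welcome1', url)
    captureAndSaveDiagnosticImage()     # the saved image zip includes the Flight Recorder data
    disconnect()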


The assumption is that folks here are familiar with JFR and Java Mission Control (JMC), and that they have installed the WebLogic Plugin for JMC (video on installing the plugin).

Since we have an ECID in hand already related to the failure (in a real case this would be from the warnings in the logs), we will pull up the JFR data in JMC and go directly to the "ECIDs" tab in the "WebLogic" tab group. This tab initially shows us an unfiltered view from the AdminServer JFR, which includes all ECIDs present in that JFR recording:

Next we will copy/paste the ECID "f7cf87c6-9ef3-42c8-80fa-e6007c56c21f-00000231" into the "Filter Column" for "ECID":


With only the specific ECID displayed, we can select that and see the JFR events that are present in the JFR recording related to that ECID. We can right-click to add those associated events to the "operative set" feature in JMC. Once in the "operative set" other views in JMC can also be set to show only the operative set as well, see Using WLDF with Java Flight Recorder for more information.

Here we see screen shots showing the same filtered view for the ejbServer and webappServer JFR data:



In our simple contrived case, the failure we forced was entirely within application code. As a result, the JFR data we see here shows us the overall flow for example purposes, but it is not going to give us more insight into the failure in this specific case itself. In cases where something that is covered by JFR events caused a failure, it is a good way to see what failed and what happened leading up to the failure.

For more related information, see:

Wednesday Oct 28, 2015

Zero Downtime Patching Released!

Patching and Updating WebLogic servers just got a whole lot easier!  The release of Zero Downtime Patching marks a huge step forward in Oracle's commitment both to simplifying the maintenance of WebLogic servers, and to our ability to provide continuous availability.

Zero Downtime Patching allows you to rollout distributed patches to multiple clusters or to your entire domain with a single command. All without causing any service outages or loss of session data for the end-user. It takes what was once a tedious and time-consuming task and replaces it with a consistent, efficient, and resilient automated process.

By automating this process, we're able to drastically reduce the amount of human input required (and with it the opportunity for errors), and we're able to verify the input that is given before making any changes. This will have a huge impact on the consistency and reliability of the process, and it will also greatly improve the efficiency of the process.

The process is resilient in that it can retry steps when there are errors, it can pause for problem resolution and resume where it left off, or if desired, it can revert the entire environment back to its original state.

As an administrator, you create and verify a patched OracleHome archive with existing and familiar tools, and place the archive on each node that you want to upgrade. Then, a simple command like the one below will handle the rest.

rolloutOracleHome("Cluster1, Cluster2", "/pathTo/patchedOracleHome.jar", "/pathTo/backupOfUnpatchedOracleHome")

The way the process works is that we take advantage of existing clustering technology, combined with an Oracle Traffic Director (OTD) load balancer, to allow us to take individual nodes offline one at a time to be updated. We communicate with the load balancer and instruct it to redirect requests to active nodes. We also created some advanced techniques for preserving active sessions so the end-user will never even know the patching is taking place.

We can leverage this same process for updating the Java version used by servers, and even for doing some upgrades to running applications, all without service downtime for the end-user.

There are a lot of exciting aspects to Zero Downtime (ZDT) Patching that we will be discussing here, so check back often!

For more information about Zero Downtime Patching, view the documentation.

WebLogic Server Multitenant Info Sources

In Will Lyons’s blog entry, he introduced the idea of WebLogic Server Multitenant, and there have been a few other blog entries related to WebLogic Server Multitenant since then. Besides these blogs and the product documentation, there are a couple of other things to take a look at:

I just posted a video on YouTube at https://youtu.be/C5GP_JB88VY. This video includes a high-level introduction to WebLogic Server Multitenant. It is a little longer than my other videos, but there are a lot of good things to talk about in WebLogic Server Multitenant.

We also have a datasheet at http://www.oracle.com/us/products/middleware/cloud-app-foundation/weblogic/weblogic-server-multitenant-ds-2742664.pdf, which includes a fair amount of detail regarding the value and usefulness of WebLogic Server Multitenant.

I’m at OpenWorld this week where we are seeing a lot of interest in these new features. One aspect of value that seems to keep coming up is the value of running on a shared platform. There are cases where every time a new environment is added, it needs to be certified against security requirements or standard operating procedures. By sharing a platform, those certifications only need to be done once for the environment. New applications deployed in pluggable partitions would not need a ground-up certification. This can mean faster roll-out times/faster time to market and reduced costs.

 That’s all for now. Keep your eye on this blog. More info coming soon!

WebLogic Server 12.2.1 Multi-Tenancy Diagnostics Overview

Introduction

The WebLogic Server 12.2.1 release includes support for multitenancy, which allows multiple tenants to share a single WebLogic domain. Tenants have access to domain partitions, which provide an isolated slice of the WebLogic domain's configuration and runtime infrastructure.

This blog provides an overview of the diagnostics and monitoring capabilities available to tenants for applications and resources deployed to their respective partitions.

These features are provided by the WebLogic Server Diagnostic Framework (WLDF) component.

The following topics are discussed in the sections below.

Log and Diagnostic Data

Log and diagnostic data from different sources are made available to the partition administrators. They are broadly classified into the following groups:

  1. Shared data - Log and diagnostic data not directly available to the partition administrators in raw persisted form. It is only available through the WLDF Accessor component.
  2. Partition scoped data - These logs are available to the partition administrators in raw form under the partition file system directory.

Note that the WLDF Data Accessor component provides access to both the shared and partition scoped log and diagnostic data available on a WebLogic Server instance for a partition.

The following shared logs and diagnostic data are available to a partition administrator.

  • Server Log: Log events from Server and Application components pertaining to the partition, recorded in the Server log file.
  • Domain Log: Log events collected centrally from all the Server instances in the WebLogic domain pertaining to the partition, in a single log file.
  • DataSource: DataSource log events pertaining to the partition.
  • HarvestedData Archive: Metrics data gathered by the WLDF Harvester from MBeans pertaining to the partition.
  • Instrumentation Events Archive: WLDF Instrumentation events generated by applications deployed to the partition.

The following partition scoped log and diagnostic data is available to a partition administrator.

  • HTTP access.log: HTTP access.log from the partition virtual target's WebServer.
  • JMSServer: JMS server message life-cycle events for JMS server resources defined within a resource group or resource group template scoped to a partition.
  • SAF Agent: SAF agent message life-cycle events for SAF agent resources defined within a resource group or resource group template scoped to a partition.
  • Connector: Log data generated by Java EE resource adapter modules deployed to a resource group or resource group template within a partition.
  • Servlet Context: Servlet context log data generated by Java EE web application modules deployed to a resource group or resource group template within a partition.

WLDF Accessor

The WLDF Accessor provides the RuntimeMBean interface to retrieve diagnostic data over JMX. It also provides a query capability to fetch only a subset of the data.

Please refer to the documentation on WLDF Data Accessor for WebLogic Server for a detailed description of this functionality.

WLDFPartitionRuntimeMBean (child of PartitionRuntimeMBean) is the root of the WLDF Runtime MBeans. It provides a getter for the WLDFPartitionAccessRuntimeMBean interface, which is the entry point for the WLDF Accessor functionality scoped to a partition. There is an instance of WLDFDataAccessRuntimeMBean for each log instance available for partitions.

Different logs are referred to by their logical names according to a predefined naming scheme.

The following lists the logical name patterns for the shared and partition scoped logs.

Shared Logs

  • Server Log: ServerLog
  • Domain Log: DomainLog
  • JDBC Log: DataSourceLog
  • Harvested Metrics: HarvestedDataArchive
  • Instrumentation Events: EventsDataArchive

Partition Scoped Logs

  • HTTP Access Log: HTTPAccessLog/<WebServer-Name>
  • JMS Server Log: JMSMessageLog/<JMSServer-Name>
  • SAF Agent Log: JMSSAFMessageLog/<SAFAgent-Name>
  • Servlet Context Log: WebAppLog/<WebServer-Name>/context-path
  • Connector Log: ConnectorLog/connection-Factory-jndiName$partition-name
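As a rough illustration of how these logical names are used with the WLDF Accessor, the following WLST sketch lists the per-partition WLDFDataAccessRuntime instances. The runtime-tree paths, the partition name, and the credentials are assumptions and may differ by release.

# list_partition_logs.py -- minimal sketch; tree paths, names and credentials are assumptions
connect("partitionAdmin", "welcome1", "t3://adminhost:7001")

serverRuntime()
# Assumed path: partition runtime -> WLDF partition runtime -> accessor runtime
cd('PartitionRuntimes/myPartition')
cd('WLDFPartitionRuntime/WLDFPartitionRuntime')
cd('WLDFPartitionAccessRuntime/Accessor')

ls()   # one WLDFDataAccessRuntime per logical name, e.g. ServerLog, DataSourceLog
disconnect()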

Logging Configuration

WebLogic Server MT supports configuring levels for the java.util.logging.Logger instances used by application components running within a partition. This allows Java EE applications that use java.util.logging to configure levels for their respective loggers even though they do not have access to the system-level java.util.logging configuration mechanism. For shared logger instances used by libraries common across partitions, the level configuration is also applied to a Logger instance when it is doing work on behalf of the configuring partition.

This feature is available if the WebLogic System Administrator has started the server with the -Djava.util.logging.manager=weblogic.logging.WLLogManager command-line system property.

If WebLogic Server was started with the custom log manager as described above, the partition administrator can configure logger levels through the PartitionLogMBean.PlatformLoggerLevels attribute.

Please refer to the sample WLST script in the WLS-MT documentation.
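A minimal WLST sketch of what such a script might look like is shown below. The edit-tree path, the partition name, the logger name, and the use of java.util.Properties for PlatformLoggerLevels are assumptions based on the attribute described here, not a verified recipe.

# set_partition_logger_levels.py -- minimal sketch; path, names and the Properties type are assumptions
from java.util import Properties

connect("partitionAdmin", "welcome1", "t3://adminhost:7001")
edit()
startEdit()

# Assumed edit-tree path to the PartitionLogMBean for partition "myPartition"
cd('/Partitions/myPartition/PartitionLog/myPartition')

levels = Properties()
levels.setProperty('com.example.app', 'FINE')   # hypothetical application logger and level
cmo.setPlatformLoggerLevels(levels)

save()
activate()
exit()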

Note that the level configuration specified in the PartitionLogMBean.PlatformLoggerLevels attribute applies only to the owning partition. If a logger instance with the same name is used by another partition, each logger's effective level at runtime is defined by the respective partition's PartitionLogMBean.PlatformLoggerLevels configuration.

Server Debug 

For certain troubleshooting scenarios you may need to enable debug output from WebLogic Server subsystems specific to your partition. The server debug output is useful for debugging internal server code when it is doing work on behalf of a partition. This needs to be done carefully, in collaboration with the WebLogic System Administrator and Oracle Support. The WebLogic System Administrator must first enable the ServerDebugMBean.PartitionDebugLoggingEnabled attribute and will advise you to enable certain debug flags. These flags are boolean attributes defined on the ServerDebugMBean configuration interface. The specific debug flags to be enabled for a partition are configured via the PartitionLogMBean.EnabledServerDebugAttributes attribute, which contains an array of String values naming the specific debug outputs to be enabled for the partition. The debug output thus produced is recorded in the server log, from which it can be retrieved via the WLDF Accessor and provided to Oracle Support for further analysis. Note that once troubleshooting is done, the debug output should be disabled, as enabling server debug incurs a performance overhead.

Please refer to the sample WLST script in the WebLogic Server MT documentation on how to enable partition specific server debug.
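A rough sketch of what such a script might do is shown below. The edit-tree path and the particular debug flag are assumptions; the actual flags to enable should come from Oracle Support.

# enable_partition_debug.py -- minimal sketch; path and flag name are assumptions
import jarray
from java.lang import String

connect("partitionAdmin", "welcome1", "t3://adminhost:7001")
edit()
startEdit()

# Assumed edit-tree path to the PartitionLogMBean for partition "myPartition"
cd('/Partitions/myPartition/PartitionLog/myPartition')

# Enable the debug outputs advised by the system administrator / Oracle Support
cmo.setEnabledServerDebugAttributes(jarray.array([String('DebugJDBCSQL')], String))

save()
activate()
exit()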

Diagnostic System Module for Partitions

A Diagnostic System Module provides the Harvester and the Policies and Actions components, which can be defined within a resource group or resource group template deployed to a partition.

Harvester

The WLDF Harvester provides the capability to poll MBean metric values periodically and archive the data in the harvested data archive for later diagnosis and analysis. All WebLogic Server Runtime MBeans visible to the partition, including the PartitionRuntimeMBean and its child MBeans as well as custom MBeans created by applications deployed to the partition, are allowed for harvesting. The Harvester configuration defines the sampling period, the MBean types and instance specification, and their respective MBean attributes that need to be collected and persisted.

Note that the archived harvested metrics data is available from the WLDF Accessor component as described earlier.

The following is an example of harvester configuration persisted in the Diagnostic System Resource XML descriptor.

<harvester>
  <enabled>true</enabled>
  <sample-period>2000</sample-period>
  <harvested-type>
    <name>weblogic.management.runtime.ServletRuntimeMBean</name>
    <harvested-attribute>ExecutionTimeAverage</harvested-attribute>
    <namespace>ServerRuntime</namespace>
  </harvested-type>
  <harvested-type>
    <name>sandbox.mbean.SandboxCustomMBeanImpl</name>
    <namespace>ServerRuntime</namespace>
  </harvested-type>
</harvester>

For further details refer to the WLDF Harvester documentation.

Policies and Actions

Policies are rules that are defined in Java Expression Language (EL) for conditions that need to be monitored. WLDF provides a rich set of actions that can be attached to policies that get triggered if the rule condition is satisfied. 

The following types of rule based policies can be defined.

  • Harvester - Based on WebLogic Runtime MBean or Application owned custom MBean metrics.
  • Log events - Log messages in the server and domain logs.
  • Instrumentation Events - Events generated from Java EE application instrumented code using WLDF Instrumentation.

The following snippets show the configuration of the policies using the EL language.

<watch>
  <name>Session-Count-Watch</name>
  <enabled>true</enabled>
  <rule-type>Harvester</rule-type>
  <rule-expression>wls.partition.query("com.bea:Type=WebAppComponentRuntime,*", "OpenSessionsCurrentCount").stream().anyMatch(x -> x >= 1)</rule-expression>
  <schedule>
    <minute>*</minute>
    <second>*/2</second>
  </schedule>
  <notification>jmx-notif1</notification>
</watch>
<watch>
  <name>Partition-Error-Log-Watch</name>
  <rule-type>Log</rule-type>
  <rule-expression>log.severityString == 'Error'</rule-expression>
  <notification>jmx-notif1,r1,r2</notification>
</watch>
<watch>
  <name>Inst-Trace-Event-Watch</name>
  <rule-type>EventData</rule-type>
  <rule-expression>instrumentationEvent.eventType == 'TraceAction'</rule-expression>
  <notification>jmx-notif1</notification>
</watch>

The following types of actions are supported for partitions:

  • JMS
  • SMTP
  • JMX
  • REST
  • Diagnostic Image

For further details refer to the Configuring Policies and Actions documentation.

Instrumentation for Partition Applications

WLDF provides a byte code instrumentation mechanism for Java EE applications deployed within a partition scope. The Instrumentation configuration for the application is specified in the META-INF/weblogic-diagnostics.xml descriptor file.  

This feature is available only if the WebLogic System Administrator has enabled server level instrumentation. Also it is not available for applications that share class loaders across partitions.

The following shows an example WLDF Instrumentation descriptor.

<instrumentation>
  <enabled>true</enabled>
  <wldf-instrumentation-monitor>
    <name>Servlet_Before_Service</name>
    <enabled>true</enabled>
    <action>TraceAction</action>
  </wldf-instrumentation-monitor>
  <wldf-instrumentation-monitor>
    <name>MyCustomMonitor</name>
    <enabled>true</enabled>
    <action>TraceAction</action>
    <location-type>before</location-type>
    <pointcut>execution( * example.util.MyUtil * (...))</pointcut>
  </wldf-instrumentation-monitor>
</instrumentation>

For further details refer to the WLDF Instrumentation documentation.

Diagnostic Image

The Diagnostic Image is similar to a core dump; it captures the state of the different WebLogic Server subsystems in a single image zip file. WLDF supports the capture of partition-specific diagnostic images.

Diagnostic images can be captured in the following ways:

  • From WLST by the partition administrator.
  • As the configured action for a WLDF policy.
  • By invoking the captureImage() operation on the WLDFPartitionImageRuntimeMBean.

Images are output to the logs/diagnostic_images directory in the partition file system.
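As a rough sketch of the WLST route, a partition administrator might capture an image as follows. The runtime-tree path and names are assumptions, while captureImage() is the operation named above.

# capture_partition_image.py -- minimal sketch; tree path and names are assumptions
connect("partitionAdmin", "welcome1", "t3://adminhost:7001")

serverRuntime()
# Assumed path to the partition-scoped image runtime MBean
cd('PartitionRuntimes/myPartition')
cd('WLDFPartitionRuntime/WLDFPartitionRuntime')
cd('WLDFPartitionImageRuntime/Image')

cmo.captureImage()   # writes a zip under the partition's logs/diagnostic_images directory
disconnect()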

The image for a partition contains diagnostic data from different sources such as:

  • Connector
  • Instrumentation
  • JDBC
  • JNDI
  • JVM
  • Logging
  • RCM
  • Work Manager
  • JTA

For further details refer to the WLDF documentation.

RCM Runtime Metrics

WebLogic Server 12.2.1 introduces the Resource Consumption Management (RCM) feature. This feature is only available with Oracle JDK 8u40 and above.

To enable RCM, add the following command-line switches at server startup:

-XX:+UnlockCommercialFeatures -XX:+ResourceManagement -XX:+UseG1GC

Please note that RCM is not enabled by default in the startup scripts.

The PartitionResourceMetricsRuntimeMBean, a child of the PartitionRuntimeMBean, provides a number of useful metrics for monitoring purposes.

  • isRCMMetricsDataAvailable(): Checks whether RCM metrics data is available for this partition.
  • getCpuTimeNanos(): Total CPU time spent, in nanoseconds, in the context of the partition.
  • getAllocatedMemory(): Total allocated memory, in bytes, for the partition. This metric value increases monotonically over time.
  • getThreadCount(): Number of threads currently assigned to the partition.
  • getTotalOpenedSocketCount() / getCurrentOpenSocketCount(): Total and current number of sockets opened in the context of the partition.
  • getNetworkBytesRead() / getNetworkBytesWritten(): Total number of bytes read/written from/to sockets for the partition.
  • getTotalOpenedFileCount() / getCurrentOpenFileCount(): Total and current number of files opened in the context of the partition.
  • getFileBytesRead() / getFileBytesWritten(): Total number of file bytes read/written in the context of the partition.
  • getTotalOpenedFileDescriptorCount() / getCurrentOpenFileDescriptorCount(): Total and current number of file descriptors opened in the context of the partition.
  • getRetainedHeapHistoricalData(): Returns a snapshot of the historical data for retained heap memory usage for the partition. Data is returned as a two-dimensional array of the retained heap usage scoped to the partition over time; each item contains a tuple of [timestamp (long), retainedHeap (long)] values.
  • getCpuUtilizationHistoricalData(): Returns a snapshot of the historical data for CPU usage for the partition. The CPU utilization percentage indicates the percentage of CPU utilized by the partition with respect to the CPU available to WebLogic Server. Data is returned as a two-dimensional array of the CPU usage scoped to the partition over time; each item contains a tuple of [timestamp (long), cpuUsage (long)] values.

Please note that the PartitionMBean.RCMHistoricalDataBufferLimit attribute limits the size of the data arrays for Heap and CPU.
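To make this concrete, here is a WLST sketch that reads a few of these getters. The runtime-tree path and partition name are assumptions; the getter names are the ones listed above.

# read_rcm_metrics.py -- minimal sketch; tree path and partition name are assumptions
connect("weblogic", "welcome1", "t3://adminhost:7001")

serverRuntime()
# Assumed path to the partition's resource metrics runtime MBean
cd('PartitionRuntimes/myPartition')
cd('PartitionResourceMetricsRuntime/PartitionResourceMetricsRuntime')

if cmo.isRCMMetricsDataAvailable():
    print 'CPU time (ns):        ', cmo.getCpuTimeNanos()
    print 'Allocated memory (b): ', cmo.getAllocatedMemory()
    print 'Open file descriptors:', cmo.getCurrentOpenFileDescriptorCount()
disconnect()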

Java Flight Recorder

WLDF provides integration with the Java Flight Recorder, which enables WebLogic Server events to be included in the JVM recording. WebLogic Server events generated in the context of work being done on behalf of the partition are tagged with the partition-id and partition-name. These events and the flight recording data are only available to the WebLogic System Administrator.

Conclusion

WLDF provides a rich set of tools to capture and access different types of monitoring data, which is very useful for troubleshooting and diagnosis tasks. This blog provided an introduction to the WLDF surface area for partition administrators. You are encouraged to take a deeper dive, explore these features further, and leverage them in your production environments. More detailed information is available in the WLDF documentation for WebLogic Server and the Partition Monitoring section in the WebLogic Server MT documentation.

Partition Import/Export

This article discusses common use-case scenarios for Import/Export Partition in WebLogic Multitenant.

Tuesday Oct 27, 2015

Resource Consumption Management in WebLogic Server MultiTenant 12.2.1 to Control Resource Usage of Domain Partitions

[This blog post is part of a series of posts that introduce new features in the recently announced Oracle WebLogic Server 12.2.1; it introduces an exciting performance isolation feature that is part of the release.]

With the increasing push to "doing more with less" in the enterprise, system administrators and deployers are constantly looking to increase density and improve hardware utilization for their enterprise deployments. The support for micro-containers/pluggable Domain Partitions in WebLogic Server Multitenant helps system administrators collocate their existing siloed, business-critical Java EE deployments into a single Multitenant domain.

Say a system administrator creates two partitions, "Red" and "Blue", in a shared JVM (a WebLogic Multitenant Server instance) and deploys Java EE applications and resources to them. The system administrator would like to avoid the situation where one partition's applications (say, the "Blue" partition's) hog all the shared resources in the Server instance's JVM (heap) and the operating system (CPU, file descriptors), negatively affecting the "Red" partition applications' access to those resources.


Runtime Isolation

Therefore, while consolidating existing enterprise workloads into a single Multitenant Server instance, system administrators require better control (tracking, managing, monitoring) over the usage of shared resources by collocated Domain Partitions so that:

 

  • One partition doesn't consume all available resources and starve the other collocated partitions. This helps a system administrator plan for, and support, consistent performance for all collocated partitions.
  • Fair and efficient allocation of available resources is provided to collocated partitions. This helps a system administrator confidently place complementary workloads in the same environment while achieving enhanced density and significant cost savings.


Control Resource Consumption Management


Resources

In Fusion Middleware 12.2.1, Oracle WebLogic Server Multitenant supports establishing resource management policies on the following resources:

 

  • Heap Retained: Track and control the amount of heap retained by a partition.
  • CPU Utilization: Track and control the CPU utilization of a partition.
  • Open File Descriptors: Track and control the number of open file descriptors (due to file I/O, sockets, etc.) used by a partition.


Recourse Actions

When a trigger is breached, a system administrator may want to react by automatically taking certain recourse actions in response. The following actions are available out of the box with WebLogic:

  • Notify: inform the administrator that a threshold has been surpassed.
  • Slow: reduce the partition's ability to consume resources, predominantly through manipulation of work manager settings; this should cause the system to self-correct in certain situations.
  • Fail: reject requests for the resource, i.e. throw an exception (only supported for file descriptors today).
  • Stop: as an extreme step, initiate the shutdown sequence for the offending partition on the current server instance.

Policies

The Resource Consumption Management feature in Oracle WebLogic Server Multitenant enables a system administrator to specify resource consumption management policies on resources and direct WebLogic to automatically take specific recourse actions when the policies are violated. A policy can be created as one of the following two types:

  • Trigger: This is useful when resource usage by partitions is predictable, and takes the form "when a resource's usage by a partition crosses a threshold, take a recourse action."

For example, a sample resource consumption policy that a system administrator may establish on the "Blue" partition to ensure that it doesn't run away with all the heap looks like: when the "Retained Heap" (resource) usage for "Blue" (partition) crosses "2 GB" (trigger), "stop" (action) the partition.

  • Fair share: Similar to the Work Manager fair share policy in WebLogic, this policy allows a system administrator to specify "shares" of a bounded-size shared resource for a partition. WebLogic then ensures that this resource is shared effectively (yet fairly) by competing consumers while honouring the "shares" allocated by the system administrator.

For example, a system administrator who prefers the "Red" partition over "Blue" may set the fair share for the "CPU Utilization" resource in the ratio 60:40 in favour of "Red".

When complementary workloads are deployed to collocated partitions, fair-share policies also help achieve maximal utilization of resources. For instance, when there are no or limited requests for the "Blue" partition, the "Red" partition is allowed to "steal" and use all the available CPU time. When traffic resumes on the "Blue" partition and there is contention for CPU, WebLogic allocates CPU time as per the fair-share ratio set by the system administrator. This helps system administrators reuse a single shared infrastructure, saving infrastructure costs in turn, while still retaining control over how those resources are allocated to partitions.

Policy configurations can be defined at the domain level and reused across multiple pluggable partitions, or they can be defined unique to a partition. Policy configurations are flexible enough to support different combinations of trigger-based and fair-share policies for multiple resources to meet your unique business requirements. Policies can also be dynamically reconfigured without requiring a restart of the partition.
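To give a feel for how the 2 GB heap trigger described above might be expressed, here is an illustrative WLST sketch only. The resource-management MBean hierarchy and the create/set method names used here are assumptions, not the documented API; consult the "Configuring Resource Consumption Management" chapter for the real steps.

# rcm_trigger_policy_sketch.py -- illustrative only: the MBean hierarchy and
# create/set methods below are assumptions and may not match the actual RCM API.
connect("weblogic", "welcome1", "t3://adminhost:7001")
edit()
startEdit()

cd('/')
rcm = cmo.createResourceManagement()                 # assumed domain-level container
manager = rcm.createResourceManager('BluePolicy')    # assumed policy for the "Blue" partition
heap = manager.createHeapRetained('BlueHeapLimit')   # assumed heap-retained resource
trigger = heap.createTrigger('TwoGBStop')            # assumed trigger child
trigger.setValue(2 * 1024 * 1024 * 1024)             # 2 GB threshold
trigger.setAction('shutdown')                        # assumed action keyword for "stop"

save()
activate()
exit()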

The picture below shows how a system administrator could configure two resource consumption management policies (a stricter "trial" policy and a lax "approved" policy) and how they could be assigned to individual Domain Partitions. Heap and CPU resource usage by the two domain partitions is then governed by the policies associated with each of them.

WLS 12.2.1 RCM resource manager sample schematic


Enabling Resource Management

The Resource Consumption Management feature in WebLogic Server 12.2.1 is built on top of the resource management support in Oracle JDK 8u40. WebLogic RCM requires Oracle JDK 8u40 and the G1 garbage collector. In WebLogic Server Multitenant, you need to pass the following additional JVM arguments to enable resource management:

-XX:+UnlockCommercialFeatures -XX:+ResourceManagement -XX:+UseG1GC


Track Resource Consumption

Resource consumption metrics are also available on a per-partition basis and are provided through a monitoring MBean, PartitionResourceMetricsRuntimeMBean. Detailed per-partition usage metrics are available through this MBean, and system administrators may use them for tracking, sizing, analysis, and monitoring, and for configuring business-specific Watch and Harvester WLDF rules.


Conclusion

Resource Consumption Managers in WebLogic Multitenant help provide the runtime isolation and protection needed for applications running in your shared and consolidated environments.


For More Information

This blog post only scratches the surface of the possibilities with the Resource Consumption Management feature. For more details on this feature, how you can configure resource consumption management policies in a consolidated Multitenant domain using the WebLogic Scripting Tool (WLST) and Fusion Middleware Control, and best practices, please refer to the detailed technical document "Resource Consumption Management (RCM) in Oracle WebLogic Server Multitenant (MT) - Flexibility and Control Over Resource Usage in Consolidated Environments".

The WebLogic Multitenant documentation's chapter "Configuring Resource Consumption Management" also has more details on using the feature.

This feature is a result of deep integration between the Oracle JDK and WebLogic Server. If you are attending Oracle OpenWorld 2015 in San Francisco, head over to the session titled "Multitenancy in Java: Innovation in the JDK and Oracle WebLogic Server 12.2.1" [CON8633] (Wednesday, Oct 28, 1:45 p.m. | Moscone South, Room 302) to hear us talk about this feature in more detail.

We are also planning a series of videos on using the feature, and we will update this blog entry as they become available.
