Holistic APM?
Can holistic APM approaches help IT managers
solve the performance-management puzzle? Maybe, but only if all the
pieces fall into place.
Sudoku may be the latest
paper-and-pencil craze, but application-performance management is
the puzzle driving IT managers nuts. Application architectures are
complicated, and data center virtualization adds complexity to the
quagmire of performance management. If your bottom line depends on
application speed and user satisfaction, application performance is
critical—and duct-taping overlapping tools together is not an
adequate strategy. APM vendors want to deliver predictive
application performance based on new, holistic approaches that
could offer better, faster and more reliable monitoring, but how
can your organization be sure of a good fit?
Unlike many first-generation products that depend on a manual,
labor-intensive approach, products like Hewlett-Packard’s
Mercury Application Mapping strive to discover and map
relationships between apps and underlying infrastructure
automatically. Although few other APM vendors offer this mapping
feature, products from Digital Fuel Technologies, Integrien,
Oblicore and OpTier are working to fill this gap or can act as an
overall consolidation point for APM data from a variety of vendors.
Once a problem has been identified, holistic APM should either take
corrective action to resolve performance issues or integrate
with other tools that can, such as those from Opalis Software,
Opsware/iConclude or RealOps. Corrective action may include allocating
additional network bandwidth or server processing capability,
or even rolling back configuration changes. Without
corrective-action capability, organizations must be content with
fast problem identification and notification. This is a big step
forward in many cases, but not ideal.
Holistic Architecture
Holistic design generally implies a
setup that functions in harmony with the surrounding environment,
all the disparate pieces fitting gracefully into the bigger
picture, but in APM, this technological nirvana has yet to be fully
realized. The APM architecture includes several components
installed at the network and application layers. Although complex
to implement, agents are critical, especially on application
servers and supporting system components. Without information from
the agents, it’s difficult to pinpoint a problem’s
cause—one of the most common sources of
application-performance puzzles.
Additionally, agents installed on application tiers, the OS,
hardware, database components and even client workstations will
detect conditions that are affecting the app, such as excessive memory
usage, CPU load and network activity. Many IT organizations may forgo the
client agent that tracks performance glitches resulting from the
user workstation, but where customer satisfaction counts, this is
critical. Likewise, organizations may need to install agents on
numerous VMs, depending on their needs. Since pricing for these
tools is based on a per-deployment model, the architecture can have
an impact on purchasing decisions. Synthetic transaction monitoring
detects performance snags during off-peak hours and finds problems
users may experience but not report, while network probes capture
actual user data so you can monitor and baseline the end-user
experience and detect problems during an application
slowdown.
Even if organizations collect all this
data, without a central engine to provide correlations and
analysis, IT managers will quickly be overwhelmed and unable to
isolate and resolve problems. In a holistic APM architecture, data
must be collected by a midtier server, then forwarded to a
correlation engine for root-cause analysis. Many organizations are
then looking to problem-automation software to take corrective
action once the correlation engine has detected a problem. A
holistic APM architecture (see graph ‘Holistic APM
Architecture’) takes all aspects of performance into
consideration and helps dig beneath the surface diagnostic to find
the real cause of a problem. As more converged apps are widely
deployed, the network demand and the number of potential
performance problems will only increase. Gaps in the architecture
will create holes in your understanding of a problem, and
ultimately increase the MTTR (mean time to repair), frustrations
and cost.
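The correlation step can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not any vendor's engine: the Alert record, the 60-second window, the component names and the dependency map are all hypothetical. It groups alerts from agents, probes and synthetic monitors that arrive close together in time, then flags as likely root causes the alerting components whose own dependencies are quiet.

```python
from dataclasses import dataclass
from typing import Dict, List, Sequence


@dataclass(frozen=True)
class Alert:
    timestamp: float   # seconds, from a shared clock (assumption)
    component: str     # hypothetical component name
    source: str        # "agent", "probe" or "synthetic"


def group_by_window(alerts: Sequence[Alert], window: float = 60.0) -> List[List[Alert]]:
    """Group alerts that fall within `window` seconds of a group's first alert."""
    groups: List[List[Alert]] = []
    for alert in sorted(alerts, key=lambda a: a.timestamp):
        if groups and alert.timestamp - groups[-1][0].timestamp <= window:
            groups[-1].append(alert)
        else:
            groups.append([alert])
    return groups


def root_cause_candidates(group: Sequence[Alert],
                          depends_on: Dict[str, Sequence[str]]) -> List[str]:
    """Return alerting components none of whose direct dependencies are alerting."""
    alerting = {alert.component for alert in group}
    return sorted(
        component for component in alerting
        if not alerting.intersection(depends_on.get(component, ()))
    )


# Hypothetical application map: each component lists what it depends on.
DEPENDS_ON = {
    "storefront": ["app_server"],
    "app_server": ["database"],
    "database": [],
}

alerts = [
    Alert(0.0, "storefront", "synthetic"),
    Alert(5.0, "app_server", "agent"),
    Alert(8.0, "database", "agent"),
    Alert(500.0, "storefront", "probe"),
]

for group in group_by_window(alerts):
    print(root_cause_candidates(group, DEPENDS_ON))
```

In the first group all three tiers complain at once, so only the database is reported. This is also where a gap in the architecture hurts: if the database had no agent, the same heuristic would blame the application server.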
Active, Passive and Beyond
Most vendors use the term APM to address
any type of application-performance management, defining products
using basic active/passive categories. While these categories may
be over-simplified in most cases, active and passive tools still
provide the backbone of holistic approaches designed to provide as
much data from different perspectives as possible in order to
triangulate and target the actual performance problem. Although
this can result in duplicate monitoring and redundancy with
existing management, a holistic approach can save IT time and
reduce headaches when trying to identify and troubleshoot the
issue.
Synthetic transaction products avoid the need to deploy a specific
agent to detect a performance problem. Instead they mimic the
actions of real users on your system, and may place additional load
on the app being monitored. Users often need to work with the
applications team to build appropriate synthetic transactions.
Also, certain applications may require some
modification—after all, you don’t actually want to ship
the product a simulated customer orders from your e-commerce
site.
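A synthetic transaction boils down to a scripted sequence of timed steps. The Python sketch below is illustrative only: the step names are hypothetical, and the lambdas stand in for whatever would really drive the application (HTTP requests, a browser script and so on). The last step is named to reflect the point above, that a simulated order needs to be marked as a test so nothing ships.

```python
import time
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple

Step = Tuple[str, Callable[[], bool]]


@dataclass(frozen=True)
class StepResult:
    name: str
    ok: bool
    seconds: float


def run_synthetic_transaction(
    steps: Sequence[Step],
    clock: Callable[[], float] = time.perf_counter,
) -> List[StepResult]:
    """Run scripted steps in order, timing each one; stop at the first failure."""
    results: List[StepResult] = []
    for name, action in steps:
        start = clock()
        try:
            ok = bool(action())
        except Exception:
            ok = False  # an exception counts as a failed step
        results.append(StepResult(name, ok, clock() - start))
        if not ok:
            break
    return results


# Hypothetical stand-ins for real user actions against the application.
steps = [
    ("login", lambda: True),
    ("add_to_cart", lambda: True),
    ("checkout_marked_as_test", lambda: True),
]

for result in run_synthetic_transaction(steps):
    status = "ok" if result.ok else "FAILED"
    print(f"{result.name}: {status} in {result.seconds:.4f}s")
```

Run on a schedule, a script like this reports availability and per-step timing even when no real users are active, but on its own it says nothing about why a step was slow.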
Passive monitoring tools track network application traffic and
avoid any additional load on the apps, or they deploy agents on
clients, application servers or hardware. Some tools track and
measure end-user response time without an agent. As TCP application
packets travel through the network, passive monitors track network
round-trip time, server response time, data transfer time and other
key metrics. Although this method is less intrusive, its viability
is determined by your network architecture. In a distributed
environment, you may need many passive appliances to track all
application data.
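As a rough illustration of how these metrics fall out of packet timing, the Python sketch below derives them from timestamps a passive probe might record for one TCP transaction. The field names are hypothetical, and the method is an assumption: it uses the TCP three-way handshake for round-trip time and ignores retransmissions, which a real appliance would have to handle.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class ObservedTransaction:
    """Probe-clock timestamps in seconds for one transaction (hypothetical fields)."""
    syn: float             # client SYN seen
    syn_ack: float         # server SYN-ACK seen
    ack: float             # client ACK seen, handshake complete
    request_end: float     # last packet of the client request seen
    response_first: float  # first packet of the server response seen
    response_last: float   # last packet of the server response seen


def decompose(t: ObservedTransaction) -> Dict[str, float]:
    """Split one transaction into network, server and transfer components."""
    return {
        # SYN to final ACK, as seen at the probe, spans one full round trip.
        "network_round_trip": t.ack - t.syn,
        # Wait between the end of the request and the first response packet.
        # Seen from the probe, this includes the probe-to-server round trip.
        "server_response": t.response_first - t.request_end,
        # Time to deliver the rest of the response.
        "data_transfer": t.response_last - t.response_first,
    }


sample = ObservedTransaction(
    syn=10.000, syn_ack=10.020, ack=10.050,
    request_end=10.060, response_first=10.310, response_last=10.460,
)
for metric, seconds in decompose(sample).items():
    print(f"{metric}: {seconds * 1000:.0f} ms")
```

Because every value comes from the probe's vantage point, where the appliance sits in the network changes what it can see, which is why a distributed environment may need several of them.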
Synthetic-transaction-monitoring products simulate an end-user and
perform scripted, macro-like transactions against an application
and report on the results. This may identify performance problems
from a user perspective, but without agent technology it’s
difficult to isolate the actual cause of the performance problem on
the application or the hardware and OS infrastructure. Symantec
offers Indepth, Inform and Insight, a suite of products it
calls i3, which provides several APM options that include agents
and synthetic transactions. Indepth agents provide deep application
metrics for J2EE and .Net apps as well as other common enterprise
apps. Indepth also provides information on the internals of the
application that may be causing a performance problem. Inform adds
alerting, trending and performance reporting, while Insight
aggregates response information across app tiers, applying
algorithms to correlate activity to individual
transactions.
Symantec introduced Insight Inquire this
January; this product adds a synthetic transaction monitoring
capability to track availability and performance of critical Web
apps. Insight Inquire injects synthetic transactions into an
application’s transaction stream to monitor availability and
performance. Multiple instances are available at no
additional charge, so it can be installed at different geographic
locations. Application performance is the primary focus, however:
Symantec doesn’t provide visibility into network performance,
and it needs server agents to determine what’s wrong with an
application.
HP’s Mercury End User Management proactively monitors Web
site and app availability in real time, from an end-user
perspective. It simulates end-user processes against
common Web and enterprise apps from PeopleSoft, Oracle, SAP and
others. Mercury Real User Monitor complements synthetic monitors
for environments where there are a large number of users
distributed across multiple locations. It tracks individual user
information to the specific application that handles the user
request, allowing IT managers to focus on discrete users or periods
in time to catch problems. Given the breadth of product offerings
from HP and its licensing model, many IT pros might find it
confusing to choose the right products and components for their
particular situation. BMC’s Performance Manager/Transaction
Manager, IBM’s Tivoli Composite Application Manager,
NetIQ’s AppManager and Quest Software’s Foglight
products are also strong in this area.
Network Probe Monitoring
In complex enterprises, one of the
keys to understanding and resolving application-performance puzzles
is correlating application response time with other application and
network activity. NetScout’s nGenius monitors test the response
time of key business apps, providing a broad context for analyzing
problems. They test application traffic against service-level
delivery targets and provide measurements for troubleshooting
end-user problems. NetScout’s approach provides a context for
application-response time that includes traffic volume,
utilization, error conditions, alarms, hosts, conversations and
packet captures. However, once a performance problem is detected,
nGenius doesn’t offer server agents that can indicate what
component of the application may be causing the glitch.
NetQoS’ SuperAgent also tracks and measures end-user response
time—without desktop or server agents. It monitors all TCP
application packets as they travel through the network, providing a
way to measure round-trip time, server-response time, data-transfer
time and other metrics. SuperAgent breaks response time into its
basic components: application, network and server latency. NetQoS
continually measures and analyzes performance for all transactions,
compares the response time against the baselines, and alerts IT
when performance deteriorates. As with NetScout’s product,
you’ll need another piece, such as collection agents from
Quest Software, Symantec or Wily Technology, to grab information
within an app after a performance problem is detected.
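Baseline comparison of this kind can be sketched briefly in Python. This is not NetQoS's algorithm, which the article does not describe. It is a simple stand-in that assumes a rolling window of recent response times and alerts when a new sample exceeds the window's mean by a set number of standard deviations. The class name and all parameter values are hypothetical.

```python
from collections import deque
from statistics import mean, pstdev
from typing import Optional


class ResponseTimeBaseline:
    """Rolling baseline of response times with a simple deviation alert."""

    def __init__(self, window: int = 100, tolerance: float = 3.0,
                 min_samples: int = 10) -> None:
        self.samples = deque(maxlen=window)  # most recent normal samples
        self.tolerance = tolerance           # allowed standard deviations
        self.min_samples = min_samples       # learning period before alerting

    def threshold(self) -> Optional[float]:
        """Return the alert threshold in seconds, or None while still learning."""
        if len(self.samples) < self.min_samples:
            return None
        return mean(self.samples) + self.tolerance * pstdev(self.samples)

    def observe(self, seconds: float) -> bool:
        """Record one response time; return True if it breaches the baseline."""
        limit = self.threshold()
        degraded = limit is not None and seconds > limit
        if not degraded:
            # Only normal samples update the baseline, so a slowdown
            # does not quietly become the new normal.
            self.samples.append(seconds)
        return degraded


baseline = ResponseTimeBaseline(window=50, tolerance=3.0, min_samples=10)
for seconds in [0.19, 0.21] * 5 + [0.20, 1.00]:
    if baseline.observe(seconds):
        print(f"ALERT: {seconds:.2f}s exceeds baseline")
```

Keeping degraded samples out of the window is a design choice with a cost: a permanent, legitimate shift in response time would keep alerting until someone resets the baseline.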
Merging Monitors & Probes
Although some organizations find it
difficult to deploy agents on all applications, many are using
passive, network probe technology to monitor all application
traffic on their networks, while using agent technology to provide
a deeper level of monitoring of critical systems. Wily, purchased
by CA in March 2006, offers application agents and network traffic
analyzers to collect the detailed information required to diagnose
performance issues. Wily Introscope agents collect performance data
from various components inside Web applications, then report these
metrics to the Collector Enterprise Manager. It acts as the
repository of performance metrics and receives data from one or
more Introscope agents, letting users collect data centrally from
many applications, application servers and supporting
systems.
With that information, the Collector
Enterprise Manager processes performance data and makes it
available to users for production monitoring, triage and diagnosis.
Introscope’s approach uses byte-code instrumentation of
J2EE applications, adding the agents’ probes to the Java classes as they are loaded.
Wily in March announced a synthetic transaction product that will
complement its Customer Experience Manager appliance, which monitors
all actual transactions. This appliance resides at the
switch level, connected to a SPAN port. Wily doesn’t focus on
the underlying network-performance management, nor does it offer an
easy way to integrate and correlate network- or
application-performance data from other systems into its
product.
Quest also provides tools that combine server-agent technology with
network-probe analysis. Foglight Experience Monitor uses network
traffic monitoring and minimizes the impact on the network
infrastructure and applications. Experience Monitor tracks each
individual user’s interaction with the applications, and
aggregates the data to a central reporting platform. Also included
in the Foglight product line are a number of ‘Cartridges’
that focus on specific active monitoring solutions. Foglight
Cartridges monitor a variety of applications including Java, .Net,
Oracle, SAP and PeopleSoft, collecting diagnostic information on
poorly performing transactions. Foglight can discover application
problems and, through a host of other products, offer some help in
this area.
Putting the Pieces Together
While the ideal, holistic APM is worth contemplating,
it comes with a few realistic limitations. The ability to
correct performance problems may be limited by external factors
beyond APM capability. If the problem lies in the application, for
example, and wasn’t detected in testing or quality assurance,
it may be outside the scope of APM. Performance problems might be
the result of a change such as a new security patch, in which case
configuration-management vendors come into play to roll back those
changes and pinpoint the problem. Critical for the success of the
next generation of application management will be the ability to
automatically correlate network, system and application problems
and take corrective action to resolve them.