Almost all of today's business processes are supported and complemented by enterprise IT applications. These applications are often business-critical and therefore tied to strict Service Level Agreements (SLAs). Most enterprises use sophisticated monitoring tools to track system health at the application, operating system, hypervisor, hardware, network, and storage levels. These health monitoring tools alert the administrator if any of the warning lights turn yellow or red. When all lights are green, there should be no problem.
Unfortunately, this is often not the case. Green lights can be deceptive, because most monitoring solutions do not include workload performance in their dashboards, and ignoring workload performance means overlooking a significant health factor. The vast majority of enterprise applications create and consume workloads, so slow or failing workloads often lead directly to the failure of vital business applications such as CRM, ERP, eCommerce, or accounting.
What makes matters worse is that enterprise applications often rely on hundreds or even more than a thousand workload processing jobs. These job sequences can be quite fragile: if a crucial element early in the chain fails, its dependents fail with it. Such failures can prevent the enterprise application from performing as required by the SLA, and the business loses productivity and, ultimately, money.
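The cascade is easy to picture with a small sketch. The job names and dependency structure below are purely hypothetical, but they show how a single failure early in the chain blocks everything downstream of it, and with it the application's SLA.

```python
# Minimal sketch (hypothetical job names): how one failed job early in a
# dependency chain blocks everything downstream of it.

# Each job maps to the jobs it depends on (its predecessors).
DEPENDENCIES = {
    "extract_orders":   [],
    "load_warehouse":   ["extract_orders"],
    "update_inventory": ["load_warehouse"],
    "invoice_batch":    ["load_warehouse"],
    "nightly_reports":  ["update_inventory", "invoice_batch"],
}

def blocked_jobs(failed_job):
    """Return every job that cannot run because `failed_job` failed."""
    blocked = set()
    changed = True
    while changed:
        changed = False
        for job, preds in DEPENDENCIES.items():
            if job in blocked or job == failed_job:
                continue
            # A job is blocked if it depends on the failed job
            # or on any job that is already blocked.
            if failed_job in preds or blocked & set(preds):
                blocked.add(job)
                changed = True
    return blocked

# One early failure takes out the rest of the chain.
print(sorted(blocked_jobs("extract_orders")))
# ['invoice_batch', 'load_warehouse', 'nightly_reports', 'update_inventory']
```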
The reason workload is not considered part of IT system health is a historical one. The complexity of corporate workloads has grown organically over the years. Many organizations still run multiple job schedulers, which means they have no way to coherently monitor workload health across the organization.
So what is needed to give workload automation (WLA) its rightful place as an infrastructure layer that is critical to the performance of most enterprise software?
1. Consolidated Job Scheduling: The ability to centrally schedule workloads from a single dashboard is a basic but important requirement. Whether the actual scheduler resides on the mainframe or on an x86 server machine does not matter, as long as there is only a single instance of the WLA software running, centrally controlling a set of agents for all target resource environments.
2. Configuration Management Database (CMDB) and Event Correlation: Based on a simple rule set, event correlation automatically consolidates incident information into an easy-to-interpret report and therefore significantly reduces the complexity of conducting a business impact analysis (BIA). The input for the correlation engine should ideally come from a CMDB containing information on all workload-relevant IT resources and their relationships (a minimal correlation sketch follows this list).
3. Workload Abstraction, Automated Provisioning and Load Balancing: To enable true resource optimization, workloads have to be decoupled from their underlying hardware resources. This decoupling allows the automatic provisioning and load balancing engine to distribute workloads based on business policies, service classes, infrastructure performance, workload capacity demands, and resource availability (see the placement sketch after this list).
4. Predictive Capabilities and Heuristic Thresholds: Few WLA vendors offer true predictive capabilities, including heuristically derived performance thresholds based on historical performance data. However, these predictive abilities are key to truly dynamic automation and resource optimization (a simple threshold example follows below).
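To make requirement 2 a bit more tangible, here is a purely illustrative sketch of rule-based correlation. The CMDB fragment, host names, and alert messages are all invented, and a real engine would of course use far richer rules and relationship data; the point is simply that raw alerts roll up into one incident per business service.

```python
# Hedged sketch for requirement 2: a toy correlation rule that groups raw
# alerts by the business service they map to in a (hypothetical) CMDB.
from collections import defaultdict

# Hypothetical CMDB fragment: which configuration item supports which service.
CMDB = {
    "db-server-01":  "ERP",
    "app-server-03": "ERP",
    "etl-agent-07":  "eCommerce",
}

raw_alerts = [
    {"ci": "db-server-01",  "message": "job DWH_LOAD failed"},
    {"ci": "app-server-03", "message": "job GL_POST overdue"},
    {"ci": "etl-agent-07",  "message": "agent unreachable"},
]

def correlate(alerts):
    """Collapse individual alerts into one incident report per business service."""
    incidents = defaultdict(list)
    for alert in alerts:
        service = CMDB.get(alert["ci"], "unmapped")
        incidents[service].append(f'{alert["ci"]}: {alert["message"]}')
    return dict(incidents)

for service, details in correlate(raw_alerts).items():
    print(f"{service}: {len(details)} related alert(s) -> {details}")
```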
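For requirement 3, the following sketch shows the basic shape of a policy-driven placement decision once workloads are decoupled from specific hardware. The hosts, service classes, and the "most headroom wins" policy are assumptions for illustration, not how any particular WLA product works.

```python
# Hedged sketch for requirement 3: picking a target resource for a workload
# based on availability, capacity, and service class (all names invented).
candidates = [
    {"host": "vm-cluster-a", "available": True,  "free_capacity": 0.25, "service_class": "gold"},
    {"host": "vm-cluster-b", "available": True,  "free_capacity": 0.60, "service_class": "silver"},
    {"host": "vm-cluster-c", "available": False, "free_capacity": 0.90, "service_class": "gold"},
]

def place(workload, candidates):
    """Return the best available host that satisfies the workload's policy."""
    eligible = [
        c for c in candidates
        if c["available"]
        and c["free_capacity"] >= workload["capacity_demand"]
        and c["service_class"] == workload["required_class"]
    ]
    # Toy policy: prefer the most headroom; real engines weigh many more factors.
    return max(eligible, key=lambda c: c["free_capacity"])["host"] if eligible else None

print(place({"capacity_demand": 0.2, "required_class": "gold"}, candidates))
# vm-cluster-a (cluster c has more headroom but is unavailable)
```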
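And for requirement 4, here is one simple way a heuristic threshold could be derived from historical run times. The three-standard-deviation rule and the sample durations are hypothetical, but they illustrate the difference between a static, hand-set threshold and one learned from the job's own history.

```python
# Hedged sketch for requirement 4: deriving a "too slow" alert threshold
# heuristically from historical run durations instead of a fixed value.
from statistics import mean, stdev

# Hypothetical history of a job's run times in minutes.
history = [42, 45, 41, 44, 47, 43, 46, 44, 45, 43]

def heuristic_threshold(samples, k=3):
    """Flag runs that exceed the historical mean by more than k standard deviations."""
    return mean(samples) + k * stdev(samples)

threshold = heuristic_threshold(history)
print(f"alert if run exceeds {threshold:.1f} minutes")

tonight = 58  # hypothetical duration of tonight's run
if tonight > threshold:
    print("predictive alert: tonight's run is abnormally slow")
```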
Vendors like CA Technologies, Cisco, BMC, ASG, IBM, UC4, and Terma Software Labs offer all of these capabilities in varying degrees and shapes. EMA's WLA Radar Report from 2010 takes a good look at these and other vendors and provides much more detail on the above four requirements. I am also excited to announce that there will be a new EMA WLA Radar Report coming out in the spring of 2012, where we will take a close look at the state of the discipline and, specifically, at all the leading WLA vendors.