What is AIOps?

AIOps has the potential to transform how NetOps and DevSecOps maintain IT infrastructure


AIOps has become one of the hottest IT operations topics of the decade. With the increase in compute power, advances in artificial intelligence (AI) and machine learning (ML), and the democratization of AI/ML capabilities, it’s easy to understand why. With the right data, machines can now make correlations and reveal insights in mere seconds, a task that would take an experienced engineer hours, or even days. Industry projections put the growth of the AIOps market at 7x from 2021 to 2027, so the hype around AIOps isn’t likely to diminish anytime soon.

However, despite its potential, AIOps isn’t a panacea. The application-first world that today’s businesses operate in is complex. The increased analytical horsepower of AIOps alone cannot address fundamental observability challenges. Enterprises that want to get the most out of AIOps must understand its benefits and its limitations.

In this post, we’ll take a closer look at what AIOps is — and isn’t — to separate the hype from the real-world business opportunity.

The core components of AIOps

Gartner, the research firm that coined the term, defines AIOps this way:

“AIOps combines big data and machine learning to automate IT operations processes, including event correlation, anomaly detection, and causality determination.”

While specific AIOps platforms will vary, the core components of AIOps are:

  • Data: AIOps platforms ingest data about IT infrastructure from sources such as IoT devices, servers, network management systems (NMS), ticketing software, and remote monitoring and management (RMM) tools.
  • AI/ML engine: A set of intelligent algorithms, often including domain-specific ones, processes and analyzes the data and continuously updates baselines.
  • Baselines: Dynamic baselines provide a near real-time reference for normal behavior in an environment. AIOps platforms maintain and adapt baselines over time based on real-world data, thereby adding context to anomaly detection.
  • Contextualized information: Using baselines for context, AIOps platforms transform data into actionable information—such as problem detection, predictive alerts, event correlations, and root cause analyses.
  • Process automation: AIOps takes the automation in traditional NMS, IT service management (ITSM), and RMM tools to the next level by partially or completely automating workflows such as problem remediation, ticket creation, and notifications.
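To make the baseline idea concrete, here is a minimal sketch in Python of one common approach: an exponentially weighted moving average (EWMA) of a metric's mean and variance, with an alert when a new reading falls more than k standard deviations from that baseline. The class name `DynamicBaseline` and the parameter values (`alpha`, `k`, `warmup`) are illustrative assumptions; commercial AIOps platforms use their own, typically more sophisticated, models.

```python
import math
from dataclasses import dataclass


@dataclass
class DynamicBaseline:
    """Adaptive baseline of "normal" for a single metric (illustrative only).

    alpha: how quickly the baseline adapts to new data (0 < alpha <= 1).
    k: how many standard deviations from the baseline count as anomalous.
    warmup: readings to observe before any alert can fire.
    """
    alpha: float = 0.1
    k: float = 3.0
    warmup: int = 10
    mean: float = 0.0
    var: float = 0.0
    count: int = 0

    def update(self, value: float) -> bool:
        """Score value against the current baseline, then fold it into the baseline.

        Returns True if value is anomalous relative to the baseline as it
        stood before this reading.
        """
        self.count += 1
        if self.count == 1:
            self.mean = value  # the first reading seeds the baseline
            return False
        deviation = value - self.mean
        anomalous = self.count > self.warmup and abs(deviation) > self.k * math.sqrt(self.var)
        # Incremental EWMA update of mean and variance.
        increment = self.alpha * deviation
        self.mean += increment
        self.var = (1 - self.alpha) * (self.var + deviation * increment)
        return anomalous


# Usage: latency (ms) hovering around 101, then a sudden spike.
baseline = DynamicBaseline()
normal_flags = [baseline.update(100.0 if i % 2 == 0 else 102.0) for i in range(30)]
spike_flag = baseline.update(150.0)
print(any(normal_flags), spike_flag)  # False True
```

Because every reading updates the baseline, a sustained level shift (for example, after a planned capacity change) eventually becomes the new normal instead of alerting forever, which is the adaptive behavior a static threshold lacks.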

Fundamentally, what makes AIOps different from traditional monitoring tools is the shift from static rules to dynamic self-learning algorithms.

Modern enterprises operate in an application-first world where even simple user journeys are enabled by complex IT infrastructures. Using traditional monitoring solutions in these environments can quickly lead to challenges such as alert fatigue and siloed monitoring data that lacks context. For example, alerts triggered by a server exceeding static resource thresholds could — depending on context — indicate anything from malware to normal operation. Static rules can’t provide context. They can only trigger an alert that a human needs to investigate. Conversely, AIOps can layer in information from existing baselines and other data sources to provide context for legitimate alerts and reduce noise when an alert isn’t necessary.
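The difference between a static rule and a context-aware check can be sketched in a few lines of Python. This hypothetical example assumes hourly CPU history is available for each server: a nightly batch job that routinely runs above 90% CPU trips the static rule but not the baseline check, while an unusual daytime spike (the kind of pattern that might indicate malware) does the opposite. The threshold, the `k` multiplier, the spread floor, and the sample data are assumptions for illustration.

```python
from statistics import mean, pstdev

STATIC_CPU_THRESHOLD = 90.0  # hypothetical fixed rule: alert above 90% CPU


def static_rule(cpu: float) -> bool:
    """Traditional monitoring: alert whenever the fixed threshold is crossed."""
    return cpu > STATIC_CPU_THRESHOLD


def contextual_rule(cpu: float, hour: int, history: dict[int, list[float]], k: float = 3.0) -> bool:
    """Baseline-aware check: alert only if cpu is unusual for this hour of day.

    history maps hour of day -> past CPU readings (%) observed at that hour.
    """
    past = history.get(hour, [])
    if len(past) < 2:
        return static_rule(cpu)  # no baseline yet, so fall back to the static rule
    # Floor the spread at 1 percentage point so a flat history doesn't flag tiny changes.
    spread = max(pstdev(past), 1.0)
    return abs(cpu - mean(past)) > k * spread


# Hypothetical history: a nightly batch job at 02:00, light load at 14:00.
history = {2: [91.0, 93.0, 92.0, 94.0, 90.0], 14: [30.0, 35.0, 32.0, 33.0, 31.0]}
print(static_rule(93.0), contextual_rule(93.0, 2, history))   # True False
print(static_rule(70.0), contextual_rule(70.0, 14, history))  # False True
```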

The benefits of AIOps

AIOps enables teams to manage and maintain IT infrastructure more efficiently and intelligently. Let’s break down the three major benefits for NetOps and DevSecOps teams.

Benefit #1: Save time and money

It’s no secret that incident management is expensive. First, the cost of downtime is notoriously high and can easily exceed thousands of dollars per minute. Second, instead of focusing on productive, value-adding work, IT professionals spend their time responding to incidents, troubleshooting, and restoring systems.

In modern IT environments, there are simply too many logs, dashboards, and consoles for a human to analyze at once. Drilling down to determine a root cause takes expertise and time. Where large-scale data processing is required, AI excels and can dramatically reduce incident response time and cost. For example, identifying a correlation between events across multiple cloud providers that contributed to an uptick in authentication failures might take an experienced engineer an hour or more. AI can detect those correlations in less than a minute.
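As a rough illustration of cross-source event correlation, the sketch below scores each event type by the fraction of authentication-failure spikes it preceded within a time window. It assumes events from every cloud provider have already been normalized into a common `Event` record; the field names, the five-minute window, and the sample data are hypothetical. A high score signals a candidate correlation for an engineer to confirm, not proof of causation.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    timestamp: float  # seconds since epoch
    source: str       # which cloud provider or tool emitted the event
    kind: str         # normalized event type


def correlate(events: list[Event], spikes: list[float], window: float = 300.0) -> list[tuple[str, float]]:
    """Rank "source:kind" keys by the fraction of spikes they preceded within `window` seconds."""
    hits: dict[str, int] = defaultdict(int)
    for spike in spikes:
        # Count each event type at most once per spike.
        seen = {f"{e.source}:{e.kind}" for e in events if spike - window <= e.timestamp <= spike}
        for key in seen:
            hits[key] += 1
    scores = [(key, count / len(spikes)) for key, count in hits.items()]
    # Highest score first; ties broken alphabetically for stable output.
    return sorted(scores, key=lambda item: (-item[1], item[0]))


# Hypothetical data: three authentication-failure spikes and events from two clouds.
auth_failure_spikes = [1000.0, 5000.0, 9000.0]
events = [
    Event(900.0, "cloud-a", "idp_cert_rotation"),
    Event(950.0, "cloud-b", "dns_timeout"),
    Event(3000.0, "cloud-b", "autoscale"),
    Event(4800.0, "cloud-a", "idp_cert_rotation"),
    Event(8950.0, "cloud-a", "idp_cert_rotation"),
]
for key, score in correlate(events, auth_failure_spikes):
    print(f"{key}: {score:.2f}")
```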

Benefit #2: Bridge skill gaps and simplify infrastructure management

Escalating incident response and troubleshooting from less-experienced team members (such as level-one support or a junior engineer) to experienced engineers often comes with two hidden costs. First, the time the engineer takes to respond to the escalation increases mean time to resolution (MTTR). Second, the time spent on issue remediation is taken away from other meaningful work.

Machine reasoning (MR) capabilities in AIOps can enable less-experienced team members to resolve issues without escalation. For example, with tools like Cisco DNA Center, a level-one responder can see MR-enabled suggestions for issue remediation, including guided workflows and granular detail such as specific commands.
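Machine reasoning is commonly built on rule-based inference. The following Python sketch shows forward chaining over a small, hypothetical knowledge base: observed facts trigger rules, fired rules add derived facts, and each fired rule carries human-readable remediation steps that a level-one responder could follow. It is not how Cisco DNA Center implements its MR engine; the rules, fact names, and steps are illustrative assumptions.

```python
from dataclasses import dataclass, field


@dataclass
class Rule:
    name: str
    conditions: frozenset[str]  # facts that must all be known for the rule to fire
    conclusion: str             # fact the rule adds when it fires
    steps: list[str] = field(default_factory=list)  # guided remediation steps


def infer(facts: set[str], rules: list[Rule]) -> tuple[set[str], list[Rule]]:
    """Forward-chain: keep firing rules until no rule adds a new fact."""
    known = set(facts)
    fired: list[Rule] = []
    changed = True
    while changed:
        changed = False
        for rule in rules:
            if rule not in fired and rule.conditions <= known:
                known.add(rule.conclusion)
                fired.append(rule)
                changed = True
    return known, fired


# Hypothetical knowledge base for a slow access-layer link.
rules = [
    Rule("duplex_mismatch", frozenset({"crc_errors_high", "late_collisions"}), "duplex_mismatch",
         ["Compare the duplex setting on both ends of the link",
          "Configure both ends identically (both auto-negotiate or both fixed)"]),
    Rule("slow_client_uplink", frozenset({"duplex_mismatch", "clients_report_slowness"}),
         "uplink_degraded_by_duplex",
         ["Apply the duplex fix in a maintenance window", "Re-test client throughput"]),
]
observed = {"crc_errors_high", "late_collisions", "clients_report_slowness"}
known, fired = infer(observed, rules)
for rule in fired:
    print(rule.name, "->", "; ".join(rule.steps))
```

Chaining matters here: the second rule can only fire after the first has derived `duplex_mismatch`, which is how a reasoning engine turns raw symptoms into a diagnosis plus an ordered set of steps.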

Benefit #3: Increase digital transformation speed and reduce risk

Data silos and the risk of change are two of the biggest roadblocks to digital transformations. Limited visibility means teams can’t quickly make informed decisions and rapidly build new products and services without significant risk to system reliability. Similarly, when a change is introduced, low visibility means a limited understanding of the real impact. AIOps unifies and contextualizes information so teams can understand dependencies, better quantify performance from the user perspective, and reduce the risk of change.

For example, suppose a change to a microservice in a public cloud causes errors in a user-facing app. Teams with legacy monitoring tools might not notice until trouble tickets begin flowing in and an engineer correlates the errors with the change by inspecting logs. In contrast, AIOps can proactively detect deviations from baseline performance and identify potential root causes before end users notice problems.
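One simple way to connect a deviation to a likely cause is to compare a user-facing app's error rate before and after each change event. The Python sketch below assumes error-rate samples and deployment timestamps are both available (for instance, from monitoring and a CI/CD pipeline); the ten-minute window, the `k` multiplier, the 0.5 floor on spread, and the change IDs are hypothetical.

```python
from statistics import mean, pstdev


def suspect_changes(
    error_rates: list[tuple[float, float]],
    changes: list[tuple[float, str]],
    window: float = 600.0,
    k: float = 3.0,
) -> list[str]:
    """Return change IDs followed by an error rate well above the pre-change baseline.

    error_rates: (timestamp, errors per minute) samples for a user-facing app.
    changes: (timestamp, change_id) deployment events.
    """
    suspects = []
    for t, change_id in changes:
        before = [r for ts, r in error_rates if t - window <= ts < t]
        after = [r for ts, r in error_rates if t <= ts < t + window]
        if len(before) < 2 or not after:
            continue  # not enough data to judge this change
        # Floor the spread so a very steady baseline doesn't flag trivial noise.
        baseline, spread = mean(before), max(pstdev(before), 0.5)
        if mean(after) > baseline + k * spread:
            suspects.append(change_id)
    return suspects


# Hypothetical data: one sample per minute; errors jump after the second deploy.
error_rates = [
    (float(ts), (1.5 if (ts // 60) % 2 == 0 else 2.5) if ts < 1200 else 12.0)
    for ts in range(0, 1800, 60)
]
changes = [(600.0, "deploy-41"), (1200.0, "deploy-42")]
print(suspect_changes(error_rates, changes))  # ['deploy-42']
```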

Is AIOps all you need? (No.)

AIOps is not a comprehensive observability platform.

AIOps is a game-changer for many NetOps and DevSecOps workflows, particularly those related to incident response and root cause analysis. However, AIOps alone can’t overcome data silos, nor can it move the needle on key business outcomes by itself.

With AIOps, teams get a tool that drastically improves analytical capabilities for a given data set, but the quality and comprehensiveness of the underlying data still matter. For many enterprises, observability data is siloed in individual clouds, unexposed to analytics tools, and effectively unobserved in any meaningful way.

AIOps can’t directly solve this problem. And this is where full-stack observability comes in.

Benefits of full-stack observability with AIOps

To understand the benefits of full-stack observability, let’s compare it to domain observability. Modern IT infrastructure includes a wide range of separate domains such as SaaS products, Kubernetes clusters, serverless functions, and on-premises bare metal servers. Domain observability provides visibility into just one of those domains (for example, a Kubernetes monitoring stack). However, the end-user experience depends on all the different domains working together. A problem in any specific domain may or may not directly impact user outcomes, and a human is often left to tie together the outputs of discrete domain observability tools to address business problems.

Full-stack observability goes beyond siloed monitoring of individual domains (or slices of infrastructure) and consolidates data from all clouds and edges. By capturing, enriching, and aggregating telemetry data (such as metrics, events, and logs), a full-stack observability platform breaks down silos. Only then can AIOps deliver its richest insights.
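The capture, enrich, and aggregate steps can be sketched as merging per-domain telemetry streams into a single enriched timeline. In this Python example, the `Telemetry` schema, the `service_map` that links entities to business services, and the sample records are all assumptions; a real full-stack observability platform would likely rely on standardized schemas and discovered topology rather than a hand-written dictionary.

```python
import heapq
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Telemetry:
    timestamp: float
    domain: str               # e.g. "kubernetes", "network", "saas"
    signal: str               # "metric", "event", or "log"
    entity: str               # pod, device, or tenant the record describes
    message: str
    service: str = "unknown"  # filled in by enrichment


def unify(streams: list[list[Telemetry]], service_map: dict[str, str]) -> list[Telemetry]:
    """Merge per-domain streams (each already sorted by time) into one enriched timeline."""
    merged = heapq.merge(*streams, key=lambda record: record.timestamp)
    # Enrichment: tag each record with the business service its entity supports.
    return [replace(r, service=service_map.get(r.entity, "unknown")) for r in merged]


# Hypothetical records from three domain-specific tools.
kubernetes = [
    Telemetry(10.0, "kubernetes", "event", "pod/cart-7f", "OOMKilled"),
    Telemetry(30.0, "kubernetes", "metric", "pod/cart-7f", "restarts=3"),
]
network = [Telemetry(20.0, "network", "log", "lb-edge-1", "HTTP 5xx rate rising")]
saas = [Telemetry(25.0, "saas", "event", "payments-tenant", "API latency above SLO")]
service_map = {"pod/cart-7f": "checkout", "lb-edge-1": "checkout", "payments-tenant": "checkout"}

for record in unify([kubernetes, network, saas], service_map):
    print(record.timestamp, record.service, record.domain, record.message)
```

With every record tagged by service and placed on one timeline, an AIOps engine can relate the pod restarts, load balancer errors, and SaaS latency behind the checkout service in one pass, instead of leaving a human to stitch together the output of three separate domain tools.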

Full-stack observability in an application-first world

Although AIOps is powerful, it isn’t a replacement for observability. Instead, AIOps should be viewed as a key piece of a larger full-stack observability solution that includes big data, visibility across clouds, NetOps, and DevSecOps. Full-stack observability ties together all the pieces an organization needs to help improve reliability, performance, and user experience in an application-first world.

For a deeper dive into AIOps, full-stack observability, and how teams can use them to navigate the complexity of modern IT environments, download the eBook titled “The Role of AIOps in Full-Stack Observability.”

Resources

Learning Lab: Cisco DNA Center

Site: Cisco Developer Full-stack observability Hub

Site: Cisco full-stack observability solutions