The world is getting automated. We have self-driving cars that can navigate roads and avoid obstacles. There are virtual assistants that can search the internet for answers to your most obscure questions. Within the information technology landscape, we have Artificial Intelligence for IT Operations (AIOps) to manage and assist with our day-to-day IT needs. Currently, AIOps is siloed into technology-specific and use-case-specific solutions, while IT operations teams need a more holistic way to manage the whole ecosystem.
AIOps is a term, coined by Gartner in 2017, for an emerging field that is defined as the “combination of big data and machine learning to automate IT operations processes, including event correlation, anomaly detection and causality determination.” With such a broad definition there are bound to be complex challenges that need to be addressed.
First, the architecture of our IT environments is complex. There are many technologies that interact with each other. We have dynamic routing protocols steering packets through firewalls, load balancers, and other network services to connect clients with applications and databases. DNS is used to map names to IP addresses. There are containers, hypervisors, public and private clouds, and a multitude of other components.
Second, the applications themselves are complicated. They consist of multiple components including, but not limited to, the front-end processor, middleware engines, and databases. They have certain flows and processes that are business driven and not inherently discoverable through the network infrastructure.
Solutions with Tunnel Vision
Today’s AIOps solutions focus on very specific use cases or on a specific technology. There are solutions that focus on network routing, or firewall security policies, or individual application availability. This is to reduce the complexity necessary for the solution to work effectively. By restricting the parameters that the AIOps solution is responsible for, we simplify the machine learning algorithms that it needs to manage.
However, the IT environment does not consist of isolated solutions that neither interact with nor affect each other. The local area network, server, application security, storage, wide area network, and cloud components of the IT architecture are all related and interconnected.
We must understand these interconnections to be able to properly augment and automate our IT operations. This means the AIOps engine needs to understand how the different technologies (HTTP, DNS, BGP, .NET, etc.) using different languages and data formats (XML, syslog, SNMP, node.js, etc.) are used in different services (load balancing, routing, data loss prevention, etc.) to create end-to-end client-application connectivity.
Even when the technology is the same, different vendors implement it in different ways. Vendors customize the technology and expose different application programming interfaces (APIs) for access to the analytics. This makes it harder for the AIOps solution to create an integrated holistic model. The problem of fitting a square peg into a round hole is real.
Connecting the Dots
The AIOps system must be able to understand the interactions between the different technologies. The system must also have some knowledge of the business processes. It cannot just look at data and make analytical assumptions merely because the data sets happen to be correlated.
Consider an example. An increase in DNS requests for a website means an increase in the number of web connections. This, in turn, means that the network utilization is increasing as well. Next, there will be an increase in CPU load on those web servers. Finally, this all means a slower application and increased user latency. It is hard for an AIOps system to predict that an increase in DNS requests means that there will be an increase in application latency and a corresponding decrease in application experience (AX) unless the correlation is already taught to the system. Human operators who have prior knowledge of the architecture’s interactions must transfer their knowledge to the AIOps engine for the machine learning to be accurate and effective.
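One way to picture this knowledge transfer is as a small directed graph that an operator hands to the engine. The sketch below is only an illustration: the metric names and the CAUSAL_LINKS table are hypothetical, and a real AIOps engine would likely weigh and time-shift these relationships rather than simply walk them.

```python
from collections import deque

# Hypothetical operator-supplied knowledge: each metric maps to the
# metrics it is expected to influence next. All names are illustrative.
CAUSAL_LINKS = {
    "dns_requests": ["web_connections"],
    "web_connections": ["network_utilization"],
    "network_utilization": ["web_server_cpu_load"],
    "web_server_cpu_load": ["application_latency"],
    "application_latency": [],
}


def downstream_effects(metric, links):
    """Return every metric reachable from `metric`, in breadth-first order."""
    seen = {metric}
    order = []
    queue = deque([metric])
    while queue:
        current = queue.popleft()
        for nxt in links.get(current, []):
            if nxt not in seen:
                seen.add(nxt)
                order.append(nxt)
                queue.append(nxt)
    return order


# Which metrics should the engine watch once DNS requests start to climb?
effects = downstream_effects("dns_requests", CAUSAL_LINKS)
print(effects)
```

With the chain encoded this way, a rise in DNS requests can be linked to application latency without the engine having to rediscover every intermediate step from raw data.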
Different use cases have different correlations and thresholds. For different environments, the same data sets may have different relationships. An AIOps engine must be adjusted for each architecture and for each business use case.
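As a rough sketch of what adjusting for each environment could look like, the same metric can carry a different alert threshold per environment. The environment names, metric name, and numbers below are invented for illustration and do not come from any real deployment.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Threshold:
    metric: str
    warn_above: float


# Hypothetical per-environment thresholds for the same metric.
THRESHOLDS = {
    "retail_web": Threshold("dns_requests_per_sec", 5000.0),
    "internal_hr": Threshold("dns_requests_per_sec", 200.0),
}


def is_anomalous(environment, value, thresholds):
    """Return True if `value` exceeds the threshold set for `environment`."""
    return value > thresholds[environment].warn_above


# The same reading is normal in one environment and alarming in another.
reading = 1000.0
for env in THRESHOLDS:
    print(env, is_anomalous(env, reading, THRESHOLDS))
```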
From Islands to Holistic Ecosystems
For the AIOps systems to be able to properly understand and manage the entire IT infrastructure, they must evolve from the siloed models that exist today. Two key changes to their design need to occur for this to happen.
First, the AIOps engine needs to become multilingual. All IT environments use multiple technologies and many different vendors. The machine learning engine needs to collect data from the multiple vendors, multiple technologies, and multiple systems. It needs to be able to distill this information into a common database where informative analytics can be performed.
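A minimal sketch of this distillation step follows, assuming two made-up vendor formats: one emits JSON and the other emits key=value text. The vendor names, field names, and the Event schema are all hypothetical. The point is only that each source gets its own small parser and everything lands in one common shape.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    """Common record that every vendor format is mapped into."""
    source: str
    device: str
    metric: str
    value: float


def from_vendor_a_json(raw):
    # Hypothetical vendor A: JSON with the keys "host", "name", "val".
    doc = json.loads(raw)
    return Event("vendor_a", doc["host"], doc["name"], float(doc["val"]))


def from_vendor_b_kv(raw):
    # Hypothetical vendor B: "device=lb01 metric=cpu_load value=0.82".
    fields = dict(pair.split("=", 1) for pair in raw.split())
    return Event("vendor_b", fields["device"], fields["metric"],
                 float(fields["value"]))


# One parser per source; supporting a new vendor means adding one entry.
PARSERS = {
    "vendor_a": from_vendor_a_json,
    "vendor_b": from_vendor_b_kv,
}


def normalize(source, raw):
    """Translate a raw vendor record into the common Event schema."""
    return PARSERS[source](raw)


events = [
    normalize("vendor_a", '{"host": "fw01", "name": "cpu_load", "val": "0.41"}'),
    normalize("vendor_b", "device=lb01 metric=cpu_load value=0.82"),
]
for event in events:
    print(event)
```

Once the records share a schema, analytics can compare a firewall and a load balancer from different vendors without caring how each one originally reported its data.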
Second, the AIOps engine needs to be made aware of the business policies that drive the design of the IT architecture. Different applications are used at different times. They have different business priorities. The components of the architecture communicate with each other based on the application designs and user requirements. Much of this information is hard, if not impossible, to learn through observation. Knowledge transfer from the architect to the building manager is required.
Ultimately, the AIOps engine needs to look at the overall IT architecture as a forest. It is a living, breathing ecosystem where a change in one aspect can have ripple effects through the environment. At the same time, the AIOps solution needs to micromanage the small details of the individual components in order to maintain the health of this forest.