Decoding AI Success: How Enterprises Navigate and Measure AI Adoption
You can’t manage what you can’t measure, so...
Introduction
Today, Enterprises across the world are at different levels of AI maturity. Gartner’s AI Maturity Model provides a framework to assess where an Enterprise stands: Level 1 - Aware, Level 2 - Active, Level 3 - Operational, Level 4 - Systemic, Level 5 - Futuristic. Thanks to the immense interest that LLM-based chatbots like ChatGPT, Bard and Claude have generated, most Enterprises (unless their leadership is living under a rock) are now aware of AI and have reached at least Level 1 of AI maturity. As Enterprises increase their adoption of AI, a key decision their leadership must make is how to measure the success of that adoption. As Peter Drucker once said, “You can’t manage what you can’t measure”, so it is important that leaders clearly define the Objectives and Key Results for any investment they make in AI. However, given the probabilistic nature of AI algorithms, measuring the success of AI initiatives will not be as easy for Enterprises as it was for their other Digital transformation initiatives of the past. In this post, let us look at examples of AI initiatives across different Enterprise functions and some strategies that leaders can use to define success for those initiatives.
A/B Testing to the Rescue
Across Enterprise functions, Sales and Marketing has been one of the earliest adopters of GenAI. Whether it is adopting co-pilots to improve efficiency or using LLM-based chatbots to generate text, image and video content that sharpens their messaging, Sales and Marketing teams are increasingly relying on GenAI. If you are a CRO or CMO looking to adopt AI in your Organization, how do you define success for such initiatives? One approach is to employ A/B testing to compare the effectiveness of synthetic content with that of human-generated content. Researchers at MIT, Yunhao Zhang and Renee Richardson Gosline, conducted a study to gauge how people perceive AI-created content; their findings are detailed in the paper “Human Favoritism, Not AI Aversion”. In this study, the researchers took an interesting approach, testing four types of content: human-only, augmented human (content generated by AI and then improved by humans), augmented AI (content generated by humans and then improved by AI) and AI-only. They then collected feedback on all four types of content to assess how participants perceived each. This provides a cue for Organizations to adopt similar A/B testing to evaluate the performance of their GenAI initiatives, as sketched below.
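As an illustration, here is a minimal sketch (in Python, using statsmodels) of how two content variants might be compared with a two-proportion z-test; the click and impression counts are hypothetical placeholders, not figures from the MIT study.

```python
# A minimal sketch of an A/B test comparing click-through rates (CTR) of
# human-written vs. AI-generated marketing copy. All counts are hypothetical.
from statsmodels.stats.proportion import proportions_ztest

human_clicks, human_impressions = 420, 10_000  # variant A: human-only copy
ai_clicks, ai_impressions = 465, 10_000        # variant B: AI-generated copy

# Two-proportion z-test: is the difference in CTR statistically significant?
z_stat, p_value = proportions_ztest(
    count=[human_clicks, ai_clicks],
    nobs=[human_impressions, ai_impressions],
)

print(f"Human CTR: {human_clicks / human_impressions:.2%}")
print(f"AI CTR:    {ai_clicks / ai_impressions:.2%}")
print(f"z = {z_stat:.2f}, p = {p_value:.4f}")
# A small p-value (say, < 0.05) suggests the variants genuinely differ; the
# same test can be run pairwise across all four arms (human-only, augmented
# human, augmented AI, AI-only).
```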
Tracking Type I and II Errors
Enterprises are increasingly adopting Predictive Maintenance (PdM) to improve asset availability while lowering maintenance costs. Predictive maintenance is a proactive maintenance strategy that uses data analysis tools and techniques to detect anomalies and predict equipment failures before they occur. PdM uses Machine Learning algorithms to predict which equipment or machines require what kind of maintenance, and when. Because these algorithms are probabilistic in nature, they may sometimes recommend maintenance (for example, replacement of a part) even when it is not actually needed (a Type I error), or miss a genuine impending failure (a Type II error). A conservative implementation that errs on the side of caution drives up unnecessary maintenance costs, while a lenient one risks unplanned downtime; either extreme leads to sub-optimal business outcomes. If you are an operations leader, you may want to define success for PdM as minimizing both Type I errors (false positives) and Type II errors (false negatives).
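To make this measurable, here is a minimal sketch in Python of tracking both error rates; the labels are hypothetical and would in practice come from the PdM model’s predictions and actual maintenance records.

```python
# A minimal sketch of tracking Type I and Type II error rates for a
# predictive maintenance (PdM) model. All labels below are hypothetical.

# 1 = "maintenance needed", 0 = "no maintenance needed"
y_actual    = [0, 0, 1, 1, 0, 1, 0, 0, 1, 0]   # what actually happened
y_predicted = [0, 1, 1, 0, 0, 1, 0, 1, 1, 0]   # what the PdM model flagged

fp = sum(a == 0 and p == 1 for a, p in zip(y_actual, y_predicted))  # unnecessary maintenance
fn = sum(a == 1 and p == 0 for a, p in zip(y_actual, y_predicted))  # missed failures
tn = sum(a == 0 and p == 0 for a, p in zip(y_actual, y_predicted))
tp = sum(a == 1 and p == 1 for a, p in zip(y_actual, y_predicted))

type_i_rate  = fp / (fp + tn)   # false positive rate
type_ii_rate = fn / (fn + tp)   # false negative rate

print(f"Type I  (false positive) rate: {type_i_rate:.2%}")
print(f"Type II (false negative) rate: {type_ii_rate:.2%}")
```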
Statistical Measures
Based on business value and feasibility, Gartner ranked Demand / Revenue Forecasting as the top use case for AI adoption in the Corporate Finance Organization. Demand Forecasting uses Machine Learning algorithms to predict future demand from past data as well as changing customer preferences and macro conditions; the models are trained on data collected from multiple sources: internal data, external data and partner-provided data. Once trained, these forecasting models are deployed to production and then continuously monitored and retrained. Defining success for these initiatives is relatively easy, as there are well-defined metrics for measuring forecast accuracy, such as Mean Absolute Percentage Error (MAPE), Weighted MAPE (WMAPE) and Mean Absolute Deviation (MAD).
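For reference, here is a minimal sketch in Python (using NumPy) of how these three metrics are computed; the demand figures are hypothetical placeholders.

```python
# A minimal sketch of common forecast-accuracy metrics. All figures are hypothetical.
import numpy as np

actual   = np.array([120, 150, 170, 90, 200], dtype=float)   # observed demand
forecast = np.array([110, 160, 165, 100, 190], dtype=float)  # model forecast

abs_error = np.abs(actual - forecast)

mape  = np.mean(abs_error / actual) * 100        # Mean Absolute Percentage Error
wmape = abs_error.sum() / actual.sum() * 100     # Weighted MAPE (volume-weighted)
mad   = abs_error.mean()                         # Mean Absolute Deviation

print(f"MAPE:  {mape:.1f}%")
print(f"WMAPE: {wmape:.1f}%")
print(f"MAD:   {mad:.1f} units")
```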
Confusion Matrix Provides Clarity
A study published by Harvard Business Review shows how Employee Experience impacts an Enterprise’s bottom line. Tracking and improving Employee Experience is increasingly becoming an imperative for CHROs, given its impact on Revenue and Profits. There are many ways to track Employee Experience; one approach is Employee Sentiment Analysis, the use of AI (particularly Natural Language Processing) to analyze employee feedback and other unstructured data and assess employee engagement levels. Accuracy, Precision, Recall and F1 Score (derived from Precision and Recall) are some of the most commonly used measures to evaluate the performance of Sentiment Analysis models, and Data Scientists use what is called a confusion matrix, a table of true and false positives and negatives, to summarize that performance and derive these measures.
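As an illustration, here is a minimal sketch in Python (using scikit-learn) of computing a confusion matrix and the associated measures; the sentiment labels are hypothetical and would in practice come from annotated employee feedback and model predictions.

```python
# A minimal sketch of evaluating a sentiment model with a confusion matrix,
# Accuracy, Precision, Recall and F1. All labels below are hypothetical.
from sklearn.metrics import (accuracy_score, confusion_matrix, f1_score,
                             precision_score, recall_score)

# 1 = positive sentiment, 0 = negative sentiment
y_true = [1, 0, 1, 1, 0, 0, 1, 0, 1, 1]   # human-labelled feedback
y_pred = [1, 0, 0, 1, 0, 1, 1, 0, 1, 0]   # model predictions

print("Confusion matrix:\n", confusion_matrix(y_true, y_pred))
print(f"Accuracy:  {accuracy_score(y_true, y_pred):.2f}")
print(f"Precision: {precision_score(y_true, y_pred):.2f}")
print(f"Recall:    {recall_score(y_true, y_pred):.2f}")
print(f"F1 Score:  {f1_score(y_true, y_pred):.2f}")
```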
Conclusion
The journey towards effective AI adoption in enterprises is multifaceted, requiring a nuanced understanding of various strategies and metrics. Leaders may have to select KPIs depending on the business function, the use case and the AI model(s) involved. As leaders navigate this evolving landscape, it’s crucial to remember that the key to unlocking AI’s full potential lies in the thoughtful selection of KPIs that align with core business objectives. By doing so, enterprises can not only measure but also enhance their competitiveness and innovation in an increasingly AI-driven world.
How do you measure the success of AI adoption in your Organization?