Q&A: Evaluating the ROI of AI implementation


Many development teams are beginning to experiment with how AI can improve their efficiency, but for an implementation to succeed, they need ways to assess whether their investment in AI is actually providing proportional value.

A recent Gartner survey from May of this year said that 49% of respondents claimed the primary obstacle to AI adoption is the difficulty in estimating and demonstrating the value of AI projects. 

On the most recent episode of our podcast What the Dev?, Madeleine Corneli, lead product manager of AI/ML at Exasol, joined us to share tips on doing just that. Here is an edited and abridged version of that conversation:

Jenna Barron, news editor of SD Times: AI is everywhere. And it almost seems unavoidable, because it feels like every development tool now has some sort of AI assistance built into it. But despite the availability and accessibility, not all development teams are using it. And a recent Gartner survey from May of this year said that 49% of respondents claimed the primary obstacle to AI adoption is the difficulty in estimating and demonstrating the value of AI projects. We’ll get into specifics of how to assess the ROI later, but just to start our discussion, why do you think companies are struggling to demonstrate value here?

Madeleine Corneli: I think it starts with actually identifying the appropriate uses, and use cases for AI. And I think what I hear a lot both in the industry and kind of just in the world right now is we have to use AI, there’s this imperative to use AI and apply AI and be AI driven. But if you kind of peel back the onion, what does that actually mean? 

I think a lot of organizations and a lot of people actually struggle to answer that second question, which is what are we actually trying to accomplish? What problem are we trying to solve? And if you don’t know what problem you’re trying to solve, you can’t gauge whether or not you’ve solved the problem, or whether or not you’ve had any impact. So I think that lies at the heart of the struggle to measure impact.

JB: Do you have any advice for how companies can ask that question and get to the bottom of what they are trying to achieve?

MC: I spent 10 years working in various analytics industries, and I got pretty practiced at working with customers to try to ask those questions. And even though we’re talking about AI today, it’s kind of the same question that we’ve been asking for many years, which is, what are you doing today that is hard? Are your customers getting frustrated? What could be faster? What could be better? 

And I think it starts with just examining your business or your team or what you’re trying to accomplish, whether it’s building something or delivering something or creating something. And where are the sticking points? What makes that hard? 

Start with the intent of your company and work backwards. And then also when you’re thinking about your people on your team, what’s hard for them? Where do they spend a lot of their time? And where are they spending time that they’re not enjoying? 

And you start to get into more manual tasks, and into questions that are hard to answer, whether it’s business questions, or just where do I find this piece of information?

And I think focusing on the intent of your business and the experience of your people, and figuring out where there’s friction in each, are really good places to start as you attempt to answer those questions.

JB: So what are some of the specific metrics that could be used to show the value of AI?

MC: There are lots of different types of metrics, and there are different frameworks that people use to think about them. Input and output metrics is one common way to break it down. Input metrics are things you have control over and can actually change, and output metrics are the things that you’re actually trying to impact.

So a common example is customer experience. If we want to improve customer experience, how do we measure that? It’s a very abstract concept. You have customer experience scores and things like that. But it’s an output metric, it’s something you tangibly want to improve and change, but it’s hard to do so. And so an input metric might be how quickly we resolve support tickets. It’s not necessarily telling you you’re creating a better customer experience, but it’s something you have control over that does affect customer experience.

I think with AI, you have both input and output metrics. So if you’re trying to actually improve productivity, that’s a pretty nebulous thing to measure. And so you have to pick these proxy metrics. So how fast did the test take before versus how fast it takes now? And it really depends on the use case, right? So if you’re talking about productivity, time saved is going to be one of the best metrics. 
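As a rough illustration of the before-versus-now comparison described here, a time-saved proxy metric could be computed along these lines. This is a minimal sketch: the function name `time_saved` and the sample durations are hypothetical, made up for illustration and not taken from the conversation.

```python
from statistics import mean


def time_saved(before_secs, after_secs):
    """Compare mean task durations before and after introducing AI.

    Returns a tuple: (seconds saved on average, fraction of baseline saved).
    """
    before = mean(before_secs)
    after = mean(after_secs)
    saved = before - after
    return saved, saved / before


# Hypothetical durations in seconds for the same task, invented for this example.
baseline_runs = [120, 130, 110]    # before the AI tool was introduced
ai_assisted_runs = [90, 100, 80]   # after the AI tool was introduced

saved, fraction = time_saved(baseline_runs, ai_assisted_runs)
print(f"Mean time saved: {saved:.1f}s ({fraction:.0%} of baseline)")
```

In practice the two samples would need to cover comparable tasks for the difference to say anything about the AI tool itself.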

Now a lot of AI is also focused not on productivity, but it is kind of experiential, right? It’s a chatbot. It’s a widget. It’s a scoring mechanism. It’s a recommendation. It’s things that are intangible in many ways. And so you have to use proxy metrics. And I think interactions with AI are a good starting place.

How many people actually saw the AI recommendation? How many people actually saw the AI score? And then was a decision made? Or was an action taken because of that? If you’re building an application of almost any kind, you can typically measure those things. Did someone see the AI? And did they make a choice because of it? I think if you can focus on those metrics, that’s a really good place to start.
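The two questions above (did someone see the AI, and did they act because of it) can be sketched as a pair of rates. This is only an illustrative sketch: the `Session` record, its field names, and the sample data are assumptions, and a real application would derive them from its own event logs.

```python
from dataclasses import dataclass


@dataclass
class Session:
    saw_ai: bool       # the AI recommendation or score was shown to the user
    acted_on_ai: bool  # the user made a choice attributed to that output


def interaction_metrics(sessions):
    """Return the share of sessions exposed to AI and the share of those that acted."""
    total = len(sessions)
    seen = sum(s.saw_ai for s in sessions)
    # Only count an action when the AI output was actually seen.
    acted = sum(s.saw_ai and s.acted_on_ai for s in sessions)
    return {
        "exposure_rate": seen / total if total else 0.0,
        "action_rate": acted / seen if seen else 0.0,
    }


# Hypothetical sessions, invented for this example.
sessions = [
    Session(saw_ai=True, acted_on_ai=True),
    Session(saw_ai=True, acted_on_ai=False),
    Session(saw_ai=False, acted_on_ai=False),
    Session(saw_ai=True, acted_on_ai=True),
]

metrics = interaction_metrics(sessions)
print(metrics)
```

Keeping exposure and action as separate rates helps distinguish an AI feature nobody sees from one that is seen but ignored.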

JB: So if a team starts measuring some specific metrics, and they don’t come out favorably, is that a sign that they should just give up on AI for now? Or does it just mean they need to rework how they’re using it, or maybe they don’t have some important foundations in place that really need to be there in order to meet those KPIs?

MC: It’s important to start with the recognition that not meeting a goal on your first try is okay. And especially as we’re all very new to AI, even customers that are still evolving their analytics practices, there are plenty of misses and failures. And that’s okay. So those are great opportunities to learn. Typically, if you’re unable to hit a metric or a goal that you’ve set, the first thing you want to go back to is double check your use case.

So let’s say you built some AI widget that does a thing and you’re like, I want it to hit this number. Say you miss the number or you go too far over it or something, the first check is, was that actually a good use of AI? Now, that’s hard, because you’re kind of going back to the drawing board. But because we’re all so new to this, and I think because people in organizations struggle to identify appropriate AI applications, you do have to continually ask yourself that, especially if you’re not hitting metrics. That creates kind of an existential question. And the answer might be yes, this is the right application of AI. So if you can revalidate that, great.

Then the next question is, okay, we missed our metric, was it the way we were applying AI? Was it the model itself? So you start to narrow into more specific questions. Do we need a different model? Do we need to retrain our model? Do we need better data? 

And then you have to think about that in the context of the experience that you are trying to provide. Say it was the right model and all of those things, but were we actually delivering that experience in a way that made sense to customers or to people using this?

So those are kind of like the three levels of questions that you need to ask: 

  1. Was it the right application? 
  2. Was I hitting the appropriate metrics for accuracy?
  3. Was it delivered in a way that makes sense to my users? 
