Working toward AIOps maturity? It’s never too early (or late) for platform engineering


Until about two years ago, many enterprises were experimenting with isolated proofs of concept or managing limited AI projects, with results that often had little impact on the company’s overall financial or operational performance. Few companies were making big bets on AI, and even fewer executive leaders lost their jobs when AI initiatives didn’t pan out.

Then came the GPUs and LLMs.

All of a sudden, enterprises in all industries found themselves in an all-out effort to position AI – both traditional and generative – at the core of as many business processes as possible, with as many employee- and customer-facing AI applications in as many geographies as they could manage concurrently. They’re all trying to get to market ahead of their competitors. Still, most are finding that the informal operational approaches they took to their modest AI initiatives are ill-equipped to support distributed AI at scale.

They need a different approach.

Platform engineering must move beyond the application development realm

Meanwhile, in DevOps, platform engineering is reaching critical mass. Gartner predicts that 80% of large software engineering organizations will establish platform engineering teams by 2026 – up from 45% in 2022. As organizations scale, platform engineering becomes essential to creating a more efficient, consistent, and scalable process for software development and deployment. It also helps improve overall productivity and creates a better employee experience.

The rise of platform engineering for application development, coinciding with the rise of AI at scale, presents a massive opportunity. A helpful paradigm has already been established: Developers appreciate platform engineering for the simplicity these solutions bring to their jobs, abstracting away the peripheral complexities of provisioning the infrastructure, tools, and frameworks they need to assemble their ideal dev environments; operations teams love the automation and efficiencies platform engineering introduces on the ops side of the DevOps equation; and the executive suite is sold on the return the broader organization is seeing on its platform engineering investment.

Similar outcomes are possible within the organization’s AI operations (AIOps). Enterprises with mature AIOps can have hundreds of AI models in development and production at any time. In fact, according to a new study of 1,000 IT leaders and practitioners conducted by S&P Global and commissioned by Vultr, the respondents’ enterprises have, on average, 158 AI models in development or production concurrently, and the vast majority of these organizations expect that number to grow very soon.

When bringing AIOps to a global scale, enterprises need an operating model that provides the agility and resiliency to support operations of that magnitude. Without a tailored approach to AIOps, they risk a perfect storm of inefficiency and delays and, ultimately, the loss of revenue, first-to-market advantages, and even crucial talent due to the impact on the machine learning (ML) engineer experience.

Fortunately, platform engineering can do for AIOps what it already does for traditional DevOps.

The time is now for platform engineering purpose-built for AIOps

Even though platform engineering for DevOps is an established paradigm, a platform engineering solution for AIOps must be purpose-built; enterprises can’t take a platform engineering solution designed for DevOps workflows and retrofit it for AI operations. The requirements of AIOps at scale are vastly different, so the platform engineering solution must be built from the ground up to address those particular needs.

Platform engineering for AIOps must support mature AIOps workflows, which can vary slightly between companies. However, distributed enterprises should deploy a hub-and-spoke operating model that generally comprises the following steps:

  • Initial AI model development and training on proprietary company data by a centralized data science team working in an established AI Center of Excellence

  • Containerization of proprietary models and storage in private model registries to make all models accessible across the enterprise

  • Distribution of models to regional data center locations where local data science teams fine-tune models on local data

  • Deployment and monitoring of models to deliver inference in edge environments

In addition to enabling the self-serve provisioning of the infrastructure and tooling preferred by each ML engineer in the AI Center of Excellence and the regional data center locations, platform engineering solutions built for distributed AIOps automate and simplify the workflows of this hub-and-spoke operating model.
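
To make the hub-and-spoke flow concrete, here is a minimal Python sketch of the four steps as a pipeline a platform team might automate. It is an illustration only: the `ModelArtifact` class, the stage names, the in-memory `registry` dictionary, and the region names are all hypothetical stand-ins, not part of any real platform or registry API.

```python
from dataclasses import dataclass, field


@dataclass
class ModelArtifact:
    """A model version moving through the (hypothetical) hub-and-spoke stages."""
    name: str
    version: str
    stage: str = "trained"   # trained -> registered -> fine-tuned -> deployed
    region: str = "hub"      # "hub" stands in for the AI Center of Excellence
    history: list = field(default_factory=list)

    def advance(self, stage, region):
        # Return a new artifact instead of mutating, so each stage stays auditable.
        return ModelArtifact(
            name=self.name,
            version=self.version,
            stage=stage,
            region=region,
            history=self.history + [(self.stage, self.region)],
        )


def train_at_hub(name, version):
    # Step 1: the central data science team trains on proprietary data.
    return ModelArtifact(name=name, version=version)


def register(model, registry):
    # Step 2: containerize and store in a private registry (a dict here).
    registered = model.advance("registered", "hub")
    registry[(model.name, model.version)] = registered
    return registered


def distribute_and_fine_tune(model, regions):
    # Step 3: each regional team fine-tunes its own copy on local data.
    return [model.advance("fine-tuned", region) for region in regions]


def deploy_to_edge(models):
    # Step 4: deploy each regional variant for inference near its users.
    return [m.advance("deployed", m.region) for m in models]


def run_pipeline(name, version, regions):
    registry = {}
    base = register(train_at_hub(name, version), registry)
    return deploy_to_edge(distribute_and_fine_tune(base, regions))


if __name__ == "__main__":
    for model in run_pipeline("churn-model", "1.0", ["eu-west", "ap-south"]):
        print(model.region, model.stage, model.history)
```

In a real platform, each function would wrap actual training, container build, registry, and deployment tooling. The point of the sketch is the shape of the workflow: one centrally trained model fans out into regional variants, each carrying its own lineage.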


Mature AI involves more than just operational and business efficiencies. It must also include responsible end-to-end AI practices. The ethics of AI underpin public trust. As with any new technological innovation, improper management of privacy controls, data, or biases can harm adoption (and with it, user and business growth) and invite increased governmental scrutiny.

The EU AI Act, approved by the European Parliament in March 2024, is the most notable legislation to date governing the commercial use of AI. It’s likely only the start of new regulations to address short- and long-term risks. Staying ahead of regulatory requirements is essential not only for remaining in compliance but also for protecting business dealings, which may be affected around the globe for organizations that fall out of compliance. As part of the right platform engineering strategy, responsible AI practices can identify and mitigate risks through:

  • Automating workflow checks that look for bias and verify adherence to ethical AI practices

  • Creating a responsible AI “red” team to test and validate models

  • Deploying observability tooling and infrastructure to provide real-time monitoring
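
As one illustration of an automated workflow check, the Python sketch below gates a model on a simple fairness metric before it is promoted. It rests on assumptions the article does not make: it uses demographic parity (the gap in positive-prediction rates between groups), which is only one of several possible fairness measures, and the function names and the 0.1 threshold are hypothetical choices for the example.

```python
def selection_rates(predictions, groups):
    """Share of positive (1) predictions for each group label."""
    totals, positives = {}, {}
    for pred, group in zip(predictions, groups):
        totals[group] = totals.get(group, 0) + 1
        positives[group] = positives.get(group, 0) + (1 if pred == 1 else 0)
    return {g: positives[g] / totals[g] for g in totals}


def demographic_parity_gap(predictions, groups):
    """Largest difference in selection rate between any two groups."""
    rates = selection_rates(predictions, groups)
    return max(rates.values()) - min(rates.values())


def passes_bias_gate(predictions, groups, max_gap=0.1):
    # max_gap is an illustrative threshold, not a regulatory or industry standard.
    return demographic_parity_gap(predictions, groups) <= max_gap


if __name__ == "__main__":
    preds = [1, 0, 1, 1, 0, 0, 1, 0]
    grps = ["a", "a", "a", "a", "b", "b", "b", "b"]
    print(selection_rates(preds, grps))
    print(passes_bias_gate(preds, grps))
```

A check like this could run as one step of a promotion pipeline, blocking a model whose gap exceeds the agreed threshold and routing it to the red team for review.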

Platform engineering also future-proofs enterprise AI operations

As AI growth and the resulting demands on enterprise resources compound, IT leaders must align their global IT architecture with an operating model designed to accommodate distributed AI at scale. Doing so is the only way to prepare data science and AIOps teams for success.

Purpose-built platform engineering solutions enable IT teams to meet business needs and operational requirements while providing companies with a strategic advantage. These solutions also help organizations scale their operations and governance, ensuring compliance and alignment with responsible AI practices.

There is no better approach to scaling AI operations. It’s never too early (or late) to build platform engineering solutions to pave your company’s path to AI maturity.

