Why should CIOs plan for multi-cloud and on-premise abstraction?
Infrastructure is perceived as a major hurdle. 54% of enterprises named infrastructure as their top AI challenge in 2023, compared with 61% who named data as their top challenge [Source: Run.ai survey of 450 industry professionals across the US and Western EU].
Deployment initiatives fail at a high rate. The share of organizations deploying fewer than half of their models to production rose from 77% to 88% in 2022 [Source: Run.ai survey of 450 industry professionals across the US and Western EU].
AI model deployment has long lead times. More than 50% of enterprises spend between 8 and 90 days deploying an AI model; only 14% do it in under 7 days [Source: Google].
So what is the underlying reason?
To be honest, while model-serving complexities do make it harder to package inference for different deployment environments, the underlying problem applies equally to the non-AI workloads that enterprises build and deploy.
The cloud has given rise to cloud-native architecture for software development, and on-premise application development is gradually adopting it as well. But that alone is not enough.
A clean abstraction is needed. A build-once-run-anywhere technology. Like what Java did for compiled code. Like what virtualization did for workloads running on bare metal.
An abstraction that would orchestrate and schedule cloud-native containerized workloads consistently across on-premise environments and the popular public clouds offered by hyperscalers such as AWS, Azure, GCP, IBM Cloud, and others. The same abstraction is also required for self-hosted Kubernetes environments.
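To make that consistency concrete, here is a minimal sketch using the official Kubernetes Python client: one Deployment spec, applied unchanged to clusters in each environment. The kubeconfig context names and the container image are hypothetical placeholders, not a prescription.

```python
# A minimal "build once, run anywhere" sketch: the same Deployment spec is
# applied, unchanged, to EKS, AKS, GKE, and an on-premise cluster. Context
# names and the image are hypothetical; contexts come from your kubeconfig.
from kubernetes import client, config

def make_deployment() -> client.V1Deployment:
    """One portable spec; nothing in it is cloud-specific."""
    container = client.V1Container(
        name="inference",
        image="registry.example.com/models/resnet50:1.0",  # hypothetical image
        ports=[client.V1ContainerPort(container_port=8080)],
        resources=client.V1ResourceRequirements(
            requests={"cpu": "2", "memory": "4Gi"},
        ),
    )
    return client.V1Deployment(
        metadata=client.V1ObjectMeta(name="inference"),
        spec=client.V1DeploymentSpec(
            replicas=2,
            selector=client.V1LabelSelector(match_labels={"app": "inference"}),
            template=client.V1PodTemplateSpec(
                metadata=client.V1ObjectMeta(labels={"app": "inference"}),
                spec=client.V1PodSpec(containers=[container]),
            ),
        ),
    )

# Hypothetical kubeconfig contexts, one per environment.
for context in ["eks-prod", "aks-prod", "gke-prod", "onprem-dc1"]:
    api = client.AppsV1Api(config.new_client_from_config(context=context))
    api.create_namespaced_deployment(namespace="default", body=make_deployment())
```

The point of the sketch is that the workload definition never changes; only the target context does. Everything cloud-specific stays below the Kubernetes API line.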
Compute, storage, and network components work somewhat differently across the hyperscalers: think instances, filesystems, object stores, block stores, load balancers, service meshes, and managed orchestration layers such as EKS (Kubernetes on AWS), AKS (Kubernetes on Azure), and GKE (Kubernetes on GCP). Tooling and automation in these environments also behave differently. An enterprise invests, learns, trains, and builds around these tools only to find itself locked into the environment.

There are also a couple of other reasons why on-premise orchestrated environments enter the picture alongside the public clouds. Compliance departments impose data-hosting restrictions on some lines of business or certain data sets, which forces model training and post-training optimization to happen on-premise. And some enterprises have adopted a strategy of running commercial off-the-shelf (COTS) compute in public clouds while reserving specialized hardware, such as FPGAs and purpose-built accelerators, for their on-premise environments. These are the critical technological drivers for embracing a cloud-agnostic, multi-cloud abstraction in application development practices; a sketch of what that can look like at the storage layer follows below.
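One way to contain the provider differences described above is to confine them to small adapters behind a single interface. Here is a minimal sketch: an ObjectStore protocol with only the S3 adapter shown (Azure Blob and GCS adapters would implement the same interface). The bucket, key, and file paths are hypothetical.

```python
# A minimal storage-abstraction sketch: application code depends on one
# ObjectStore interface; provider-specific details live in small adapters.
from typing import Protocol

class ObjectStore(Protocol):
    def put(self, key: str, path: str) -> None: ...
    def get(self, key: str, path: str) -> None: ...

class S3Store:
    """Adapter over boto3: the only place AWS-specific code lives."""
    def __init__(self, bucket: str):
        import boto3  # AWS SDK for Python
        self._s3 = boto3.client("s3")
        self._bucket = bucket

    def put(self, key: str, path: str) -> None:
        self._s3.upload_file(Filename=path, Bucket=self._bucket, Key=key)

    def get(self, key: str, path: str) -> None:
        self._s3.download_file(Bucket=self._bucket, Key=key, Filename=path)

def publish_model(store: ObjectStore, version: str) -> None:
    # Depends only on the interface, not on any one cloud, so moving
    # environments means swapping the adapter, not rewriting the codebase.
    store.put(f"models/resnet50/{version}/model.onnx", "build/model.onnx")

publish_model(S3Store(bucket="ml-artifacts"), version="1.0")
```

The design choice matters more than the code: when the provider surface is this thin, the cost of switching (or adding an on-premise target) is one new adapter rather than a rewrite.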
Another significant business driver for embracing this abstraction is the need to deploy enterprise solutions into customers' localized edge environments. Enterprise customers execute their own cloud and/or edge strategies, and the enterprise must then deliver its software into those environments in a compliant manner.
Therefore, the cloud has become the new server. Every reason we had for virtualizing the server applies to the cloud as well.
Management teams change, bringing in different infrastructure and cloud strategies and causing churn across development teams, developer toolchains, processes, and productivity. When this happens, cost and time to market take a significant hit.
If you have been paying attention, you will notice we have not talked much about Artificial Intelligence (AI).
I believe AI presents an excellent opportunity for enterprises to get this abstraction right, because the journey is just starting out.
This is the vision encapsulated within our Ai-MicroCloud. I'll discuss the value of this abstraction specifically for AI workloads in a subsequent post. Thank you for reading.