Serverless on Kubernetes for AI/ML

Tom Corcoran
6 min read · Mar 1, 2021

by Tom Corcoran, a Solution Architect at Red Hat

AI/ML workflows

Successful Artificial Intelligence and Machine Learning (AI/ML) workflows can be complex, involving many parties and stages. So much so that many organisations fail to realise the business potential of AI/ML. Reflecting this, Gartner in 2019 colourfully predicted that "through 2020, 80% of AI projects will remain alchemy, run by wizards whose talents won't scale in the organization."

The reasons for this widespread shortfall in the realisation of business value are manifold. One of the primary reasons is that organisations tend to put a strong emphasis on the data science or machine learning component, which is clearly the core activity, but this emphasis often leads to a neglect of the many areas surrounding it. A diagram produced by Google in its paper Hidden Technical Debt in Machine Learning Systems highlights the many responsibilities that need to be executed well for successful AI/ML workflows and projects:

McKinsey, in its State of AI 2020 survey, concluded that organisations realising above-average business value from AI/ML are much more likely than others to have built a standardised end-to-end platform for their AI/ML workflows.

In a series of previous blog posts (see References below), I argue that the most appropriate platform to use is Kubernetes, and I illustrate this with examples from Red Hat OpenShift. That series analyses how Kubernetes alleviates several of the challenges Google exposes in this diagram, particularly:

  • Data preparation
  • GPU acceleration of AI/ML workloads for resource management
  • The value of an MLOps workflow

In this blog post, we examine another responsibility that Google exposes above: serving infrastructure. In particular, we examine how an open-source, Kubernetes-native serverless solution such as Knative can be a particularly effective approach that fits well into modern cloud-native architectures.

Serving Infrastructure

When we talk about serving infrastructure in the context of AI/ML workflows on modern container-based platforms such as Kubernetes, we are generally referring to the exposure of AI/ML models through RESTful APIs. Solutions such as Seldon and TensorFlow Serving make it easy to expose these APIs, adding value to intelligent applications by allowing them to make inference calls against the models.
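
To make this concrete, here is a minimal sketch of an inference call against a TensorFlow Serving REST endpoint, using Python's requests library. The host and model name are hypothetical placeholders; the /v1/models/&lt;name&gt;:predict path and the "instances" payload follow TensorFlow Serving's REST API.

```python
import requests

# TensorFlow Serving exposes models at /v1/models/<model-name>:predict.
# The host "model-server.example.com" and the model "fraud-model" are
# hypothetical placeholders for this sketch.
URL = "http://model-server.example.com:8501/v1/models/fraud-model:predict"

# One feature vector per instance; the shape must match the model signature.
payload = {"instances": [[0.12, 1.95, 3.4, 0.0]]}

response = requests.post(URL, json=payload, timeout=10)
response.raise_for_status()

# TensorFlow Serving returns one prediction per submitted instance.
print(response.json()["predictions"])
```

Seldon offers comparable HTTP endpoints, though the exact payload schema differs, so the shape of the consuming application is much the same whichever model server you choose.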

For the purposes of today's discussion, I would like to expand the exposition on the value of Knative serverless beyond the serving of models to also include the applications that consume them.

Why Kubernetes and Knative for model serving and consumption?

First, let's level set: what is Knative? Red Hat, one of its leading contributors, describes it as an open-source community project that adds components for deploying, running, and managing serverless, cloud-native applications on Kubernetes. Knative comprises two main components:

  • Knative Serving
    In Serving mode, Knative manages autoscaling (including scaling to zero during periods of no traffic), as well as revision control and tracking.
  • Knative Eventing
    Knative Eventing allows software components to publish events or subscribe to events published by others. These software components include AI/ML model serving services and model consuming applications, and Eventing has advantages for both as we’ll explain below.
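
As a minimal sketch of what a Knative-deployable model server can look like, the Flask application below exposes a prediction endpoint over HTTP. The scoring logic is a stand-in rather than a real model; the one Knative-specific convention it follows is reading the PORT environment variable, which Knative Serving injects into the container.

```python
import os

from flask import Flask, jsonify, request

app = Flask(__name__)


@app.route("/v1/predict", methods=["POST"])
def predict():
    # Stand-in scoring logic; a real service would invoke a trained model here.
    instances = request.get_json()["instances"]
    predictions = [sum(features) for features in instances]
    return jsonify({"predictions": predictions})


if __name__ == "__main__":
    # Knative Serving tells the container which port to listen on via $PORT.
    app.run(host="0.0.0.0", port=int(os.environ.get("PORT", "8080")))
```

Packaged into a container image, an app like this can be deployed with a single Knative Service manifest; Knative then handles routing, revision tracking and autoscaling for it.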

So what advantages do serverless computing, and in particular the Kubernetes-based Knative, bring to AI/ML workloads and workflows? As a producer or a consumer of AI/ML, why should I care?

There are many reasons; let's discuss some of them.

  • Serverless for AI/ML
    One of the big draws of the serverless paradigm is that the serverless engine abstracts away the tasks of provisioning and managing servers. The developer or data scientist worries less about these concerns and can concentrate more on writing code and creating models.
    This drives real value, as they spend more time on the work at which they are most productive.
  • Kubernetes based Knative for AI/ML
    There are several advantages of using an open-source Kubernetes native serverless implementation such as Knative.
    While there are mature serverless alternatives to Knative, they tend to be specific to individual public cloud providers: AWS Lambda, Google Cloud Functions, and Microsoft Azure Functions. They offer little interoperability across clouds and infrastructures. Contrast that with Knative, which offers a consistent technology stack across public clouds and the data centre: anywhere Kubernetes runs, Knative runs. Knative therefore allows you to avoid cloud vendor lock-in. Furthermore, when using an enterprise-grade supported Kubernetes such as Red Hat OpenShift, the entire experience is identical across all major public clouds, on-premises virtual and physical infrastructure, and increasingly at the edge.
  • Knative Serving for AI/ML
    Knative Serving provides two broad capabilities that make it interesting and useful as a deployment engine for AI/ML workloads (a sketch combining both follows this list).
    Autoscaling and scale-to-zero
    Knative Serving's architecture provides autoscaling for the containers (or pods) it manages. Importantly, this includes the ability to scale to zero in quiet periods. Many AI/ML workloads experience vastly different traffic loads at different times of the day and week. For example, a credit-decisioning model may see the majority of its traffic during normal business hours, driven by staff activity. Similarly, in the medical arena, an image recognition model designed to detect tumours may see very occasional use. The ability to scale up automatically in busy periods and scale down to zero when idle can drive significant cost savings, particularly on public clouds, where idle periods can then incur zero cost.
    This same scale-to-zero paradigm can be applied to the intelligent applications consuming the models, allowing them to enjoy the same cost savings.
    Staged releases
    Knative Serving draws traffic-management capabilities from Istio, the service mesh Knative can use as its networking layer. These include end-to-end encryption, retries, circuit breaking, load balancing and many more. One particularly useful capability is staged releases: the ability to declaratively configure the percentage of traffic that each revision of your model-serving component receives. You may have a new revision of your model that you want to test in the wild, but initially on only a small percentage of traffic; you then gradually send more and more traffic to it as it proves its superiority over the current revision, which is gradually retired.
    This is a powerful capability that also allows you to abort a new release if it proves less accurate than required.
  • Knative Eventing for AI/ML
    Knative Eventing allows systems and workflows to be designed that conform to modern event-driven architectures. Knative Eventing enables subscription to events from a variety of sources, such as GitHub, Apache Camel, and Apache Kafka topics, to name just a few. When an event we subscribe to fires, it is dispatched to an event sink. A popular sink type is a Knative Service, which can be a model-serving component such as Seldon or TensorFlow Serving, or a consuming application (see the receiver sketch after this list).
    There are two big advantages to event-driven architectures facilitated by Knative Eventing:
  1. These architectures allow components to function more independently of one another and thereby bring greater resiliency through reduced coupling.
  2. The AI model-serving system can fit into, and be consistent with, a wider event-driven enterprise architecture that includes not only AI/ML workflows but also CI/CD and the intelligent applications consuming AI/ML models, all enjoying many of the same Kubernetes cluster-based services we discuss in our other blog posts (see References below).
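
To illustrate the Serving capabilities above, here is a sketch that creates a Knative Service using the official Kubernetes Python client. The image, namespace and revision names are hypothetical placeholders; the autoscaling annotations and the traffic block reflect Knative Serving's serving.knative.dev/v1 API, with min-scale "0" enabling scale-to-zero and the traffic section staging 10% of requests onto the latest revision.

```python
from kubernetes import client, config

# A Knative Service definition; image and revision names are hypothetical.
model_service = {
    "apiVersion": "serving.knative.dev/v1",
    "kind": "Service",
    "metadata": {"name": "fraud-model"},
    "spec": {
        "template": {
            "metadata": {
                "annotations": {
                    # min-scale "0" lets the service scale to zero when idle.
                    "autoscaling.knative.dev/min-scale": "0",
                    "autoscaling.knative.dev/max-scale": "10",
                }
            },
            "spec": {
                "containers": [{"image": "quay.io/example/fraud-model:v2"}]
            },
        },
        # Staged release: 10% of traffic to the new revision, 90% to the old.
        "traffic": [
            {"latestRevision": True, "percent": 10},
            {"revisionName": "fraud-model-00001", "percent": 90},
        ],
    },
}

config.load_kube_config()
client.CustomObjectsApi().create_namespaced_custom_object(
    group="serving.knative.dev",
    version="v1",
    namespace="ml-serving",  # hypothetical namespace
    plural="services",
    body=model_service,
)
```

Note that the annotation values are strings, as Kubernetes annotations require; the same manifest could equally be applied as YAML with kubectl.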
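
And as a sketch of the Eventing side: a Knative Service acting as an event sink receives events as HTTP POSTs in CloudEvents format. The snippet below uses the CloudEvents Python SDK (the cloudevents package) with Flask to decode an incoming event; the event type it checks for is a hypothetical example.

```python
from cloudevents.http import from_http
from flask import Flask, request

app = Flask(__name__)


@app.route("/", methods=["POST"])
def receive():
    # Knative Eventing delivers events to sinks as CloudEvents over HTTP.
    event = from_http(request.headers, request.get_data())

    # "com.example.transaction.created" is a hypothetical event type.
    if event["type"] == "com.example.transaction.created":
        print("scoring transaction:", event.data)

    return "", 204
```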

Conclusion

Model serving infrastructure is a key element to get right in order for AI/ML to deliver value to the business. In this article, we explore how serverless, and in particular an open-source, infrastructure-agnostic serverless implementation like Knative, can be an excellent choice for both model serving and the applications consuming models. Its primary advantages include:

  • Allowing data scientists and developers to concentrate on code and model training/building, free of infrastructural concerns
  • Avoiding cloud vendor lock-in
  • Saving costs by scaling to zero during idle times
  • Plugging into modern, resilient, Kubernetes-based event-driven architectures

All of this drives cost savings and business value, the realisation of which remains a huge challenge in AI/ML according to Gartner and other thought leaders.

References

Gartner: Top Strategic Predictions for 2019
McKinsey: The State of AI in 2020
Google: Hidden Technical Debt in Machine Learning Systems
Blog: Knative for DL inference
Seldon.io: Knative Eventing

My blog posts on AI/ML

Kubernetes: the Savior of AI/ML Business value?
Business Centric AI/ML With Kubernetes — Part 2: Data Preparation
Business Centric AI/ML With Kubernetes — Part 3: GPU Acceleration
Business Centric AI/ML With Kubernetes — Part 4: ML/OPs
Kubernetes — a Platform approach to AI/ML


Tom Corcoran

Solution Architect at Red Hat ANZ, specialising in Integration and AI/ML.