MLOps: The end of end-to-end

Published on
February 20, 2020

Machine Intelligence is one of our main themes at Mosaic. That means we spend much of our time researching ML-driven applications and enabling infrastructure, and meeting the teams that build them.

Looking back at 2019, one of the main themes of our conversations was, how do organisations implement ML into their core technology? The question was not simply one about hiring - as ML engineers are notably difficult to find - but rather about structuring their internal processes.

Was their data labeled such that it would produce meaningful results? How could they make feature engineering repeatable and recyclable? How could they monitor the performance of models after they are deployed?

This process of implementing ML we call "MLOps". We'll talk more about that term in a moment.

To learn about the space, we have met with dozens of founders and experts. The story we heard about MLOps tools and teams is broadly analogous to the one we heard about serverless. Operationalising machine learning processes at an enterprise scale, like deploying serverless applications, is still nascent. It is mainly larger enterprises with access to dedicated engineering resources for R&D which are building out their infrastructure. And much of what is being built in enterprises today is born out of open source libraries.

What was unique about our conversations about MLOps, though, was how enterprises were thinking about building their infrastructure. There has been much conversation among founders and VCs alike about providing companies with an "end-to-end" MLOps experience, but based on what we heard, that is not what enterprises want.

As promised, let's take a step back. What do we mean when we say MLOps? If your first thought is DevOps for Machine Learning, that is a fair start. MLOps encompasses the best practices and tools to bring ML into production, including the most efficient ways for engineers and data scientists to collaborate.

Kyle Gallatin has created two images illustrating the need for MLOps, and what good MLOps supports.

The first elicits nervous laughter:

Engineers and data scientists are often poorly connected when developing machine learning applications.

The second reminds you why MLOps remains an emergent category:

Building and deploying models is a circular and complex process.

Errors in this process can have damaging consequences. Deloitte's Chief Cloud Strategy Officer shared a frustrating postmortem from one company whose ops team could not manage its training database. We don't agree with his conclusion that machine learning systems offer "more risk than reward", but we understand where his sentiment comes from.

Open source projects addressing ML's inherent complexity have flourished. Databricks' MLflow v1.0 was released in June 2019 and supports model management and, since October, model governance (it was not the first project to open source governance, but it is an important addition). Kubeflow has aimed to simplify ML development since December 2017, providing a comprehensive toolkit for MLOps on top of Kubernetes. TensorFlow Extended (TFX) was built for continuous training in ML pipelines.
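To make "model management" and "model governance" more concrete, here is a minimal Python sketch of the kind of bookkeeping such tools take on: versioning trained models and controlling which version is promoted to production. It is a toy, in-memory illustration and not the API of MLflow, Kubeflow, or TFX. The `ModelRegistry` class, its methods, and the stage names are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class ModelVersion:
    name: str
    version: int
    artifact_uri: str        # where the trained model artifact lives
    stage: str = "staging"   # hypothetical stages: staging, production, archived


@dataclass
class ModelRegistry:
    """Toy in-memory registry (illustrative only, not a real library's API)."""
    _models: Dict[str, List[ModelVersion]] = field(default_factory=dict)

    def register(self, name: str, artifact_uri: str) -> ModelVersion:
        """Record a new version; version numbers start at 1 per model name."""
        versions = self._models.setdefault(name, [])
        model_version = ModelVersion(name, len(versions) + 1, artifact_uri)
        versions.append(model_version)
        return model_version

    def promote(self, name: str, version: int) -> None:
        """Move one version to production and archive the one it replaces."""
        # Look up the target first so a bad version number changes nothing.
        target = self._get(name, version)
        for model_version in self._models[name]:
            if model_version.stage == "production":
                model_version.stage = "archived"
        target.stage = "production"

    def production(self, name: str) -> Optional[ModelVersion]:
        """Return the version currently serving traffic, if any."""
        for model_version in self._models.get(name, []):
            if model_version.stage == "production":
                return model_version
        return None

    def _get(self, name: str, version: int) -> ModelVersion:
        for model_version in self._models.get(name, []):
            if model_version.version == version:
                return model_version
        raise KeyError(f"{name} has no version {version}")
```

Even in this toy form, the registry keeps an audit trail of which version was in production and which one replaced it. That trail is a large part of what governance asks for.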

None of these resources can be used alone. Teams must decide which stack best fits their needs, sometimes before identifying what their long-term needs will be. The good news is that the builders of these open source projects are aware of this reality and keen to collaborate. Of course, some are better suited to working alongside each other than others, depending on the ecosystems (AWS, Spark, etc.) to which they are tied.

Enterprises get incredible value from these open source tools. This fact brings us directly back to the main learning we teased at the beginning of this post: do enterprises want an end-to-end solution?

Companies with ML resources, which tend to be the same ones that can afford powerful MLOps tools, are already building part of the pipeline themselves.

We saw this to be true across tech verticals. One travel company was clear that they would rather build than buy, and has assembled a team of ops engineers who work with their data scientists and software developers to facilitate ML deployments. A senior engineer at another company cited Lyft and Bloomberg as examples of companies building their own ML infrastructure. A payments business knew they did not have the resources to build everything internally, so they built most of their ML pipeline and then looked externally to fill in the gaps.

Given this reality today, what should founders build that companies are not building themselves, and how much can they give away for free before cannibalising their sales growth?

First, identify an underserved (no pun intended) problem, and stay focused on that. A few examples: our own portfolio company Canotic uses AI + humans to generate, structure, and label data. Seldon enables enterprises to deploy and manage models at scale on Kubernetes, and created the Alibi open-source software libraries for ML explanations and monitoring. Perceptilabs is dead set on simplifying model building and training. Pecan.ai transforms raw data to reduce time-to-model (more generalisable for AI). While some of these tools enable other aspects of the ML development process, those features are not mandatory; these companies have intelligently built their platforms to integrate with users' existing infrastructure.
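One way to picture a focused tool that slots into existing infrastructure, rather than replacing it, is a pipeline whose stages share a small contract, so that any single stage can be swapped for an external one. The Python sketch below is purely illustrative: the `Stage` protocol, the `run_pipeline` helper, and the stage classes are hypothetical names with placeholder logic, not the API of any product mentioned above.

```python
from typing import Any, Dict, List, Protocol


class Stage(Protocol):
    """Hypothetical contract that every pipeline stage agrees on."""

    def run(self, payload: Dict[str, Any]) -> Dict[str, Any]: ...


class InHouseLabeler:
    """Stands in for a labelling step the team built itself."""

    def run(self, payload: Dict[str, Any]) -> Dict[str, Any]:
        # Placeholder rule: label a text True if it is longer than 5 characters.
        labels = [len(text) > 5 for text in payload["texts"]]
        return {**payload, "labels": labels, "labeled_by": "in-house"}


class VendorLabeler:
    """Stands in for an external labelling tool behind the same contract."""

    def run(self, payload: Dict[str, Any]) -> Dict[str, Any]:
        # Placeholder rule: label a text True if it is all upper case.
        labels = [text.isupper() for text in payload["texts"]]
        return {**payload, "labels": labels, "labeled_by": "vendor"}


class InHouseTrainer:
    """Placeholder for training: it only records the share of True labels."""

    def run(self, payload: Dict[str, Any]) -> Dict[str, Any]:
        labels = payload["labels"]
        rate = sum(labels) / len(labels) if labels else 0.0
        return {**payload, "positive_rate": rate}


def run_pipeline(stages: List[Stage], payload: Dict[str, Any]) -> Dict[str, Any]:
    """Pass the payload through each stage in order and return the result."""
    for stage in stages:
        payload = stage.run(payload)
    return payload
```

Swapping `InHouseLabeler()` for `VendorLabeler()` in the list passed to `run_pipeline` changes one stage and leaves the rest of the pipeline untouched, which is the integration story a team that has already built most of its pipeline is likely to want.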

Second, be careful about what you open source. Red Hat's CEO was very clear in an earnings call last year about how companies that start with open source software can make money: "[There have] only been two successful models for monetizing open source. One is the public cloud model...offering open source as a service, and then Red Hat's model, which is to be a significant contributor and offer on premise and drive roadmaps for customers." It sounds simple, and yet, decades after Red Hat's founding, many companies building evangelical developer followings struggle (or simply don't try) to monetise. Thus, founders must be able to articulate, ideally from the first software release, which features will and won't be free, even if they are not ready to introduce a paid version.

Exceptional engineers and data scientists are tackling each stage of the MLOps pipeline. At Mosaic, we continue to be excited by highly differentiated tools that are thoughtful about where and how they will integrate into enterprises' existing infrastructures - without trying to solve everything at once.

Our team will be following up on this post with thoughts on related topics, including explainability and standards in MLOps.

Founders - if we have not yet met, please do reach out. We would love to hear about what you are building and share perspectives.

Juliet & Jacob

Many thanks to the humans who read and reviewed this post!