Our investment in Canotic: the AI compiler
The resources invested into machine learning over the past five years augur a profound transformation of software development, in which the structure and operation of computer programs for many valuable problems are not designed by engineers but discovered by neural networks and related architectures. This has been called “Software 2.0”, and Mosaic is not alone in thinking it has the potential, like the previous version, to eat the world.
However, the new paradigm is currently highly exclusive. Machine learning systems require curated, labeled data in large volumes. Outside of companies with established data science teams, most valuable data is unstructured and unlabeled (as is the overwhelming majority of data generated globally). Many companies have sprung up over recent years to serve this new market; indeed, there are entire data factories in China refining the ‘new oil’. The process retains a 19th-century quality: grinding, repetitive labour that holds little appeal for individual domain experts and means labels will likely scale only linearly with the number of labelers.
We were very excited, then, to discover that Brad Cordova and Henry Setiawan were working on a deeply considered solution: an “AI compiler” called Canotic. We knew Brad as the co-founder of TrueMotion, a pioneering applied ML success story, and Henry from the machine learning systems he built from scratch at Google Brain, Google Research and Microsoft Research.
Canotic addresses both generating labels at scale and ensuring the quality of each labeler’s work, while beating incumbents on quality, cost and speed.
Three core technical principles of their approach really impressed us. First, abstracting the core components of the data labeling process into modules, enabling the construction of labeling pipelines of arbitrary and dynamic complexity. Second, considering all possible sources of labels: however noisy or inaccurate a source may be, Canotic extracts signal from it by carefully routing labeling work to the optimal source. Third, training a meta-model that can generalize and scale beyond these initial labeling sources. This intricate technical innovation allows a product which, true to the team’s devotion to abstraction, hides the mechanics from the user, who can simply post data to an API and receive back labels faster, cheaper and more accurately than anywhere else.
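To give a flavour of the second principle, extracting signal from noisy label sources: a classic weak-supervision pattern is to weight each source by an estimated reliability and iterate between consensus labels and reliability estimates (in the spirit of Dawid–Skene). The sketch below is purely illustrative and not Canotic’s actual algorithm; all names and data in it are hypothetical.

```python
# Illustrative only: reliability-weighted aggregation of noisy labels
# from multiple sources. NOT Canotic's implementation.
from collections import defaultdict

def aggregate_labels(votes, n_rounds=5):
    """votes: dict mapping item_id -> {source_id: label}.
    Returns (consensus, reliability): consensus maps item -> label;
    reliability maps source -> estimated accuracy in [0, 1]."""
    sources = {s for v in votes.values() for s in v}
    reliability = {s: 0.8 for s in sources}  # optimistic prior
    consensus = {}
    for _ in range(n_rounds):
        # Step 1: weighted vote per item, each source weighted by its
        # current estimated reliability.
        for item, v in votes.items():
            scores = defaultdict(float)
            for src, label in v.items():
                scores[label] += reliability[src]
            consensus[item] = max(scores, key=scores.get)
        # Step 2: re-estimate each source's accuracy against the
        # current consensus labels.
        for src in sources:
            hits = total = 0
            for item, v in votes.items():
                if src in v:
                    total += 1
                    hits += (v[src] == consensus[item])
            reliability[src] = hits / total if total else 0.0
    return consensus, reliability

# Hypothetical toy data: one expert and two crowd workers.
votes = {
    "img1": {"expert": "cat", "crowd_a": "cat", "crowd_b": "dog"},
    "img2": {"expert": "dog", "crowd_a": "dog", "crowd_b": "dog"},
    "img3": {"expert": "cat", "crowd_a": "dog", "crowd_b": "dog"},
}
consensus, reliability = aggregate_labels(votes)
```

The point of the pattern is that even a noisy source contributes usable signal once its error rate is estimated; routing work to the cheapest source that meets a target reliability then becomes an optimization problem rather than a leap of faith.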
The power and generality of Brad and Henry’s principled decomposition of this hard problem is evident in the range of delighted customers they’ve already acquired, in industries from agriculture to enterprise software to manufacturing, with data from satellite imagery to meeting notes.
The gains from machine learning in recent years have accrued to the small number of organisations with the budget and access to quality data labels. We are thrilled to be co-leading a $5M round with Pioneer Square Labs to help make machine learning truly accessible to any company with data.
Canotic is based in Berlin, and the team would love to hear from talented product and engineering folks who are interested in joining them on their journey.
Toby & Bart