Of Amazon’s top 50 best-sellers in “Children's Vaccination & Immunisation”, close to 20 are by anti-vaccine polemicists, and 5 are novels about fictional pandemics. This poses two questions. First, how much content moderation should a universal bookshop do? Second, does Amazon really know what it sells?
The content moderation questions here are closely related to those that applied when Facebook and Twitter banned the US president. A single newspaper or bookshop has no obligation to give you a platform, but there are other newspapers and other bookshops. What does it mean if there are only three newspapers (or only three with significant reach) and they all ban you? Should they allow you on the platform, but not ‘amplify’ you, whether with an ‘algorithm’ or with something as mechanical as a best-seller list (and of course being on the list will increase your sales, so that’s also a moderation choice)? What books, exactly, do we want Amazon to ban, or to ‘down-rank’? Who decides? What if Amazon put those books in ‘conspiracy theories’ instead? I don’t think we have a settled consensus.
More interesting to me in this case, though, is the fact that five of the top 50 are not about “Children's Vaccination & Immunisation” at all - they’re novels! This is a much more general problem, and I think it reflects a pretty fundamental aspect of Amazon as a retailer - it does not, in important ways, actually know what it sells, and that has always been inherent to the model.
There’s an old cliché that ecommerce has infinite shelf space, but that’s not quite true for Amazon. It would be more useful to say that it has one shelf that’s infinitely long. Everything it sells has to fit on the same shelf and be treated in the same way - it has to fit into the same retailing model and the same logistics model. That’s how Amazon can scale indefinitely to new products and new product categories - it doesn’t need to create new infrastructure and new retailing to sell new things (and it’s why grocery, which requires an entirely different logistics chain, is such a departure). Instead, it turns products into packets in a network, and the whole point of a packet-switched network is that you don’t have to know what the payload is. That means it’s never really had any idea what any of those SKUs are - they’re a number, a size and a weight. It can say ‘people who bought X also bought Y’, but it doesn’t really know what X and Y are. And, of course, this is compounded by Marketplace - 60% of what’s sold on Amazon isn’t sold by Amazon.
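To make the ‘people who bought X also bought Y’ point concrete, here is a minimal sketch of co-purchase counting. It assumes a hypothetical order history in which every product is nothing more than an opaque SKU ID, and it uses plain pair counting as an illustrative stand-in, not Amazon’s actual recommendation system. Note that the recommender never needs to know what any SKU is.

```python
from collections import Counter, defaultdict
from itertools import combinations

# Hypothetical order history: each order is just a set of opaque SKU IDs.
# Nothing here records what any of these products actually is.
orders = [
    {"SKU-1001", "SKU-2002", "SKU-3003"},
    {"SKU-1001", "SKU-2002"},
    {"SKU-2002", "SKU-4004"},
    {"SKU-1001", "SKU-3003"},
]


def co_purchase_counts(orders):
    """Count how often each pair of SKUs appears in the same order."""
    counts = defaultdict(Counter)
    for order in orders:
        for a, b in combinations(sorted(order), 2):
            counts[a][b] += 1
            counts[b][a] += 1
    return counts


def also_bought(counts, sku, k=2):
    """Return up to k SKUs most often bought alongside `sku`."""
    return [other for other, _ in counts[sku].most_common(k)]


counts = co_purchase_counts(orders)
print(also_bought(counts, "SKU-2002"))
```

The only input is which IDs turn up together in baskets, so a new category of product needs no new logic, but equally the system has no way to notice what kind of thing any SKU is.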
This has always been the long-term Amazon market share question - how many product categories can be converted to Amazon’s model of pure commodities and how many need a different kind of retailing? More generally, this has been the story of ecommerce for the last 25 years - originally we thought there were some things people would never buy online, but in fact the question was how to convert a product from a high-touch experience in a shop to a low-touch experience online. Does it turn out that you don’t actually need that high-touch experience at all? Or do you need to replace it with something else - free returns, or recommendation, or video, or something else again? Casper, for example, discovered that you don’t need to sit on a mattress before buying it, but also that returns (and an explosion of competition) destroyed the margin.
The puzzle for Amazon, perhaps, is that each Amazon fulfilment centre has several million SKUs and it doesn’t really know what they are, in any profound sense. The reason a Stephen King thriller appears on a list of books about Children's Vaccination & Immunisation is that Amazon is effectively still indexing its products the way search worked in the 1990s. It’s a ‘BOOK’, and we can see the keywords ‘CHILD’ and ‘VIRUS’. That’s probably a gross over-simplification, and unfair to all the clever people at Amazon working hard managing categories and building product indices. But then how else do you explain this?
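The 1990s-style indexing described above can be sketched as a simple keyword rule. This is a hypothetical illustration only: the category rule, keyword set, threshold and product metadata are all invented, and it is not how Amazon actually classifies products. It just shows how a product-type check plus keyword matching can file a pandemic novel under vaccination.

```python
import re

# Hypothetical category rule: a product type plus trigger keywords.
# A product matches if it has the right type and mentions enough keywords.
CATEGORY_RULES = {
    "Children's Vaccination & Immunisation": {
        "type": "BOOK",
        "keywords": {"child", "virus", "vaccine", "immunisation"},
        "min_hits": 2,
    },
}


def tokenize(text):
    """Lower-case the text and return the set of alphabetic words in it."""
    return set(re.findall(r"[a-z]+", text.lower()))


def categorise(product):
    """Return every category whose keyword rule the product's metadata satisfies."""
    words = tokenize(product["title"] + " " + product["description"])
    matches = []
    for name, rule in CATEGORY_RULES.items():
        hits = words & rule["keywords"]
        if product["type"] == rule["type"] and len(hits) >= rule["min_hits"]:
            matches.append(name)
    return matches


# Hypothetical supplier metadata for three books.
novel = {
    "type": "BOOK",
    "title": "Hypothetical Pandemic Thriller",
    "description": "A thriller in which a lab-made virus wipes out most of humanity and one child must survive.",
}
guide = {
    "type": "BOOK",
    "title": "Your Child's First Year",
    "description": "A practical guide to every vaccine your child will be offered.",
}
cookbook = {
    "type": "BOOK",
    "title": "Quick Family Dinners",
    "description": "Thirty recipes for busy weeknights.",
}

for book in (novel, guide, cookbook):
    print(book["title"], "->", categorise(book))
```

The novel and the genuine guide both satisfy the rule, because nothing in it represents what a book is about, only which words appear in its metadata.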
Part of the problem is that Amazon’s data about the products comes from the suppliers. That supplier data was designed in a world of barcodes and mainframes, and in a world where most of this didn’t really matter, because you weren’t trying to sell millions of SKUs in one shop. The customer was never supposed to see this data, but now it’s how you sell. It’s also not a problem unique to retail - this is my carefully curated collection of ‘Rock music’, courtesy of record company metadata (half of which still runs on mainframes).
Hence, the moonshot project for Amazon is to move from taking the raw data from the supplier to having a sense of what the thing itself really is. That’s not simple (!), nor is it a single project, and even Google fudged it by realising you could use hyperlinks as a vast mechanical turk. And what does ‘is’ mean, anyway? But if Amazon actually knew what was in the packets, and indeed knew what I bought, then it might be a rather different kind of retailer.
Benedict Evans is a Venture Partner at Mosaic Ventures and previously a partner at a16z.