The first Amazon Echo, all the way back in 2014, was pitched as a device for a few simple things: play music, ask basic questions, get the weather forecast. Since then, Amazon has found a few new things for people to do, like control smart home devices. But ten years later, Alexa is still mainly used for playing music, answering basic questions and checking the weather. And that’s largely because, even as Amazon made Alexa ubiquitous in devices and homes around the world, it never convinced developers to care.
Alexa was never meant to have an app store. Instead, it had “skills” that Amazon hoped developers would use to connect Alexa with new functionality and information. Developers weren’t supposed to build their own things on top of an operating system; they were supposed to build new things for Alexa to do. The difference is subtle but important. Our phones are mostly a series of disconnected experiences: Instagram is a universe completely separate from TikTok and Snapchat and your calendar app and Gmail. That just doesn’t work for Alexa or any other successful assistant. If it knows your to-do list but not your calendar, or your favorite pizza but not your credit card number, there’s not much it can do. It needs access to everything, and all the necessary tools at its disposal, to get things done for you.
In Amazon’s dream world, where ambient computing is perfect and everywhere, you would simply ask Alexa a question or give it an instruction: “Find something fun to do this weekend.” “Book my train to New York next week.” “Teach me about deep learning.” Alexa would have access to all the apps and information sources it needs, but you would never have to worry about that; Alexa would just handle it as needed and give you the answers. There are a thousand complicated questions about how it would actually work, but that’s still the big idea.
“Alexa Skills made it fast and easy for developers to build voice-enabled experiences, unlocking a whole new way for developers and brands to connect with their customers,” Amazon spokesperson Jill Tornifoglio said in a statement. Customers use them billions of times a year, she said, and as the company embraces generative AI, “we’re excited about what’s next.”
In retrospect, Amazon’s idea was almost exactly right. All these years later, OpenAI and other companies are also trying to build their own third-party ecosystems around chatbots, which are just another take on the idea of an interactive interface for the web. But for all its foresight about the AI revolution, Amazon has never figured out how to make skills work. It never solved some fundamental problems for developers, never cracked the user interface, and never found a way to show people what their Alexa device could do if they just asked.
Amazon has certainly done its best to deliver on skills. The company steadily rolled out new tools to developers, paid them in AWS credits and cash when their skills were used (though it recently stopped doing that), and tried to make skill development virtually effortless. And on some level, all that effort has paid off: Amazon says there are more than 160,000 skills available for the platform. That pales in comparison to the millions of app store apps on smartphones, but it’s still a large number.
However, the interface for finding and using all those skills has always been a mess. Let’s take a simple example: if you ask Alexa to order pizza for you, it might tell you it has a few skills for that and recommend Domino’s. (Why would Amazon choose Domino’s and not Pizza Hut or DoorDash or some other pizza-ordering service? Good question. No idea.) You answer yes. “Here’s Domino’s,” says Alexa. A moment later: “Here’s the Domino’s skill, by Domino’s Pizza, LLC.” A little while after that: “To connect your Domino’s Pizza profile, go to the Skills setting in your Alexa app. To place a guest order, we need your email address. Enable the ‘Email address’ permission in your Alexa app.” At this point you need to find a hidden setting in an app that you may not even have on your phone; it would be much easier to just go to Domino’s website. Or call the place.
If you know the skill you’re looking for, the system is a little better. You can say “Alexa, open nature sounds” or “Alexa, turn on Jeopardy,” and the skill with that name will open. But if you don’t remember that the skill is called “Easy Yoga,” asking Alexa to start a yoga workout won’t get you anywhere.
There are small points of friction everywhere in the system. Once you activate a skill, you must explicitly say “stop” or “cancel” to exit it and use another one. You can’t easily do things across multiple skills: I would like to compare prices on my pizza, but Alexa won’t let me. And perhaps most frustrating of all, even once you’ve enabled a skill, you still have to address it specifically. Saying “Alexa, ask AnyList to add spaghetti to my shopping list” isn’t a seamless interaction with an omniscient assistant; it means you have to learn a computer’s incredibly specific language to use it properly.
It turns out that many of the most popular Alexa skills have two things in common: they’re simple trivia games, and they’re made by a company called Volley. From Song Quiz to Jeopardy to Who Wants to Be a Millionaire to Are You Smarter Than a 5th Grader?, Volley is one of the companies that has figured out how to develop skills that really work. And Max Child, co-founder and CEO of Volley, says getting a skill in front of users is one of the most important, and toughest, parts of the job.
“I think one of the underrated reasons the iOS and Android app stores are so successful is because Facebook ads are so good,” he says. The pipeline from a hyper-targeted ad to an app install has been relentlessly perfected over the years, and something like that simply doesn’t exist for voice assistants. The closest equivalent is probably people asking their Alexa devices what they can do (which Child says is happening!), but that just can’t compete with in-feed ads and hours of social scrolling. “Because you don’t have that hyper-targeted marketing, you end up having to do broad marketing and build broad games.” Hence games like Jeopardy and Millionaire: major brands that appeal to almost everyone.
One way Volley makes money is through subscriptions. The full Jeopardy experience costs $12.99 per month, for example, and like so many other modern subscriptions, it’s a lot easier to subscribe than it is to cancel. It’s also one of the few ways to monetize a skill: developers are allowed to run audio ads in certain skills, or ask users to add their credit card information directly like Domino’s does, but asking a voice-first user to grab their phone and dig through settings is a high bar to clear. Advertisements are only useful at large scale. There was a brief moment when many media companies thought so-called “flash briefings” could be a hit, but they didn’t amount to much.
These are not exactly unique challenges. Mobile app stores have similarly major discovery issues, monetization issues, sketchy subscription systems, and more. It’s just that with Alexa, the solution seemed so enticing: you shouldn’t even need an app store. You should be able to just ask for whatever you want, and Alexa can do it for you.
Ten years later, it seems that an all-powerful, omni-compatible voice AI may be impossible to achieve. If Amazon were to make everything so seamless and fast that you never even have to know you’re dealing with a third-party developer and your pizza magically appears at your door, it would raise huge privacy concerns and questions about how Amazon chooses its providers. If it asks you to choose all those default settings for yourself, it signs up every new user for an awful lot of busywork. If it allows developers to own and exploit even more of the experience, it destroys the ambient simplicity that makes Alexa so appealing in the first place. Too much simplicity and abstraction is actually a problem.
However, we are at something of a turning point. Ten years after its launch, Alexa is changing in two important ways. One is good news for the future of skills; the other may be bad. The good news is that Alexa is no longer just a voice or even voice-first experience: as Echo Show and Fire TV devices have become more popular, more people are interacting with Alexa with a screen nearby. That could solve many interaction problems and give developers new ways to demonstrate their skills to users. (Screens are also a great place to advertise your skills, a fact Amazon may know all too well.) If Alexa can show you things, it can do so much more.
Child says the majority of Volley’s players are already on a device with a screen. “We have been working on smart TVs for a very long time,” he says, laughing. “Every smart TV sold now has a microphone in the remote control. I really think that casual voice games… can make a lot of sense, and I think they can be even more immersive.”
Amazon is also about to redesign Alexa around LLMs, which could be the key to making this all work. A smarter, AI-powered Alexa could finally understand what you’re actually trying to do, eliminating some of the tricky syntax required to use skills. It could understand more complicated questions and multi-step instructions and use skills on your behalf. “Developers now only need to describe the capabilities of their device,” Amazon’s Charlie French said at Amazon’s AI Alexa launch event last year. “They don’t have to try to predict what a customer is going to say.” Amazon is just one of the companies promising that LLMs can do things on your behalf without any additional work required. Should skills even exist in that world, or will the model just figure out how to order pizza?
There are indications that Amazon is lagging behind in its AI work and that plugging in a language model won’t suddenly make Alexa great. (Even the best LLMs feel like they’re only somewhat close to being good enough to do this kind of thing.) But even if that’s true, the bigger question only becomes more important: What can virtual assistants really do for us? And how do we ask them to do that? The correct answers are “anything you want” and “however you want.” That requires a lot of developers to give Alexa new powers. And that requires Amazon to give them a product and a business worth their while.