“Alexa, how are you different than Siri?” “I’m more of a homebody.”

I’m away from my desk, so I guess I can’t ask Alexa. No problem, I’ve got an iPhone in my pocket.

“Hey Siri, what’s the status of my Amazon order?” “I wish I could, but Amazon hasn’t set that up with me yet.” Doh!

IPAs (intelligent personal assistants*) are in their infancy, but they are the next major step in human-computer interaction. With the expected concurrent growth of IoT and connected devices, IPAs will soon be everywhere. Consider that it is easier to fit a small mic and speaker into a device than a screen and keyboard, and that away from the desktop it is often easier to interact with such a device by voice.

However, as the highly contrived scenario above illustrates (after all, I actually am at my desk typing this, and Alexa is giving me dirty looks), IPAs have different capabilities, strengths, and weaknesses. While Alexa and Siri both want to be my concierge, I’m more likely to talk to Watson when I want to discuss cancer treatments or need to pwn Jeopardy. When I’m hungry after midnight, it’s TacoBot to the rescue.

As a user, I already interact with more than one IPA, and over time this number is only going to grow. I want to use the IPA that is both best and most convenient for my immediate need. I have no interest in being restricted to a single IPA vendor’s ecosystem; likewise I don’t want to have to juggle endpoints and IPAs for every little task. And Taco Bell wants to craft their own brand and persona into TacoBot instead of subsuming it into one of the platform IPAs or chasing every third-party platform in a replay of the mobile app days.

What I really need is for the assorted IPAs in my life to work together on my behalf as a team. When I’m out and about, I want Siri to go ask Alexa when my order will arrive. Neither IPA alone can meet my criterion of reporting on order status while I’m away from home, but Siri [mobile] and Alexa [connected to Amazon ordering] can achieve this collaboratively. Consider some of the aspects of complex, non-trivial tasks:

  • Mobility and location
  • Interactions with multiple, cross-vendor external systems
  • Asynchronous: real-world actions may take time to occur and aren’t containable within a one-time “conversation” with a current-state IPA
  • Deep understanding of both complicated domains and of my highly-personalized circumstances and context

So how do we herd these cats? One challenge is the mechanics of IPA-to-IPA communication. Will they speak the same language? How will each understand what another is good at? What if the other is knowledgeable about an area completely outside the first IPA’s own?

APIs are the first, easiest option. They generally require explicit programming, but the interfaces are highly efficient and well-defined structurally. This is both a strength and a weakness, as well-defined structure imparts a rigidity and implication of understanding on both “client” and “server”. The Semantic Web was one attempt to address understanding gaps without explicit programming on both sides.

Another option is human language. IPAs are rapidly getting better at this defining skill, and if they can communicate with people, why not use that natural language capability with each other? Human language can be very expressive, if limited in information rate (good luck speaking more bits/s than a broadband connection), but efficiency and accuracy are concerns, at least with the current state of technology. One argument is that an IPA that does not fully understand a user’s language may better serve the user by simply relaying the words to another, more suitable IPA instead of attempting to parse that poorly understood language into an appropriate API call.

Of course, this is not an either/or decision and both may be utilized to better effect.
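To make the hybrid idea concrete, here is a minimal Python sketch, not a description of how any real assistant works. Everything in it is hypothetical: the `Assistant` and `ApiCall` classes, the keyword-matching "parser", and the peer list are stand-ins invented for illustration. An assistant first tries to turn the user's words into a structured API call; if it cannot, it relays the raw words to a peer that can.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ApiCall:
    """A structured request, as an API-style interface would require."""
    action: str
    params: dict


@dataclass
class Assistant:
    """Hypothetical IPA: a name plus the keywords it claims to understand."""
    name: str
    keywords: set

    def parse(self, utterance: str) -> Optional[ApiCall]:
        """Toy intent parser: succeeds only if a known keyword appears."""
        words = set(utterance.lower().replace("?", "").split())
        hits = words & self.keywords
        if not hits:
            return None
        return ApiCall(action=sorted(hits)[0], params={"text": utterance})

    def handle(self, utterance: str, peers: list) -> str:
        """Use the API path when parsing works; otherwise relay the words."""
        call = self.parse(utterance)
        if call is not None:
            return f"{self.name}: API call {call.action}"
        # Poorly understood language: pass the user's words along untouched
        # to the first peer that says it can make sense of them.
        for peer in peers:
            if peer.parse(utterance) is not None:
                return peer.handle(utterance, [])
        return f"{self.name}: sorry, nobody understood that"


siri = Assistant("Siri", {"weather", "timer"})
alexa = Assistant("Alexa", {"order", "package"})
print(siri.handle("What's the status of my Amazon order?", [alexa]))
```

The point of the sketch is the fallback order: structured calls where understanding is solid, relayed natural language where it is not. A real system would need something far better than keyword matching to decide which peer is "more suitable".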

Language Interface for Conversation AIs

As this team of IPAs becomes more collaborative, another issue emerges that any manager will appreciate: how best to coordinate so that these IPAs function as a team rather than an inefficient collection of individuals.

  • One low-friction model is command-and-control. Alexa (or Siri, or Cortana, or Google, or… ) is the boss, makes dispatch decisions, and delegates to other IPAs.
  • Agile methodologies may provide inspiration for more collaborative processes. Goals are jointly broken down and estimated in terms of confidence, capability, etc. by the team of IPAs, and individual subtasks agreed upon and committed to by a voting system.
  • Because computation is cheap and generally fast in human time, a Darwinian approach may also work. Individual IPAs can proceed in competition and the best, or fastest, result wins. Previous wins, within a given context, will add a statistical advantage for the winning IPA in future tasks.
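The Darwinian model in the last bullet can be sketched in a few lines of Python. This is only an illustration under loud assumptions: the `Arena` class, the self-reported quality score between 0 and 1, and the fixed per-win `bonus` are all invented here, and a real system would need a trustworthy way to judge results. Each assistant attempts the task, the highest adjusted score wins, and past wins in the same context tilt future contests.

```python
from collections import defaultdict


class Arena:
    """Runs hypothetical IPAs in competition and remembers wins per context."""

    def __init__(self, assistants, bonus=0.1):
        # assistants maps a name to a callable: task -> (answer, quality 0..1)
        self.assistants = assistants
        self.bonus = bonus                 # assumed advantage per previous win
        self.wins = defaultdict(int)       # (context, name) -> win count

    def run(self, context, task):
        """Let every assistant try the task; assumes at least one assistant."""
        best = None
        for name, solve in self.assistants.items():
            answer, quality = solve(task)
            # Previous wins in this context add a small edge to the raw score.
            score = quality + self.bonus * self.wins[(context, name)]
            if best is None or score > best[0]:
                best = (score, name, answer)
        _, winner, answer = best
        self.wins[(context, winner)] += 1
        return winner, answer


arena = Arena({
    "Alexa": lambda task: ("arrives Tuesday", 0.6),
    "Siri": lambda task: ("not sure", 0.55),
})
print(arena.run("shopping", "order status"))
```

Command-and-control would replace the loop with a single dispatcher that picks one assistant up front, and the agile model would replace the score comparison with a vote. The competitive version trades extra computation for not needing to know in advance who is best.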

As IPAs become more and more entwined in our daily lives and embedded into the devices that surround us, we will learn to utilize them as a collaborative community rather than as individual devices. Unique personas become a “customer service” skill, but IPAs with whom we do not always wish to communicate directly still have value to provide. This collective intelligence is one of the directions in which we can expect to see significant advances.

* Also delicious, delicious beers. Mmmmmm beer…