Posts

Sometimes I get to thinking that Alexa isn’t really my friend. I mean sure, she’s always polite enough (well, usually, but it’s normal for friends to fight, right?). But she sure seems chummy with that pickle-head down the hall too. I just don’t see how she can connect with us both — we’re totally different!

So that’s the state of the art of conversational AI: a common shared agent that represents an organization. A spokesman. I guess she’s doing her job, but she’s not really representing me or M. Pickle, and she can’t connect with either of us as well as she might if she didn’t have to cater to both of us at the same time. I’m exaggerating a little bit – there are some personalization techniques (*cough* crude hacks *cough*) in place to help provide a custom experience:

  • There is a marketplace of skills. Recently, I can even ask her to install one for me.
  • I have a user profile. She knows my name and zip code.
  • Through her marketplace, she can access my account and run my purchase through a recommendation engine (the better to sell you with, my dear!)
  • I changed her name to “Echo” because who has time for a third syllable? (If only I were hamming this up for the post; sadly, a true story)
  • And if I may digress to my other good friend Siri, she speaks British to me now because duh.

It’s a start but, if we’re honest, none of these change the agent’s personality or capabilities to fit with all of my quirks, moods, and ever-changing context and situation. Ok, then. What’s on my wishlist?

  • I want my own agent with its own understanding of me, able to communicate and serve as an extension of myself.
  • I want it to learn everything about how I speak. That I occasionally slip into a Western accent and say “ruf” instead of “roof”. That I throw around a lot of software dev jargon; Python is neither a trip to the zoo nor dinner (well, once, and it wasn’t bad. A little chewy.) That Pickle Head means my colleague S… nevermind. You get the idea.
  • I want my agent to extract necessary information from me in a way that fits my mood and situation. Am I running late for a life-changing meeting on a busy street uphill in a snowstorm? Maybe I’m just goofing around at home on a Saturday.
  • I want my agent to learn from me. It doesn’t have to know how to do everything on this list out of the box – that would be pretty creepy – but as it gets to know me it should be able to pick up on my cues, not to mention direct instructions.

Great, sign me up! So how do I get one? The key is to embrace training (as opposed to coding, crafting, and other manual activities). As long as there is a human in the loop, it is simply impossible to scale an agent platform to this level of personalization. There would be a separate and ongoing development project for every single end user… great job security for developers, but it would have to sell an awful lot of stuff.

To embrace training, we need to dissect what goes into training. Let’s over-simplify the “brain” of a conversational AI for a moment: we have NLU (natural language understanding), DM (dialogue management), and NLG (natural language generation). Want an automatically-produced agent? You have to automate all three of these components.

  • NLU – As of this writing, this is the most advanced component of the three. Today’s products often do incorporate at least some training automation, and that’s been a primary enabler that leads to the assistants that we have now. Improvements will need to include individualized NLU models that continually learn from each user, and the addition of (custom, rapid) language models that can expand upon the normal and ubiquitous day-to-day vocabulary to include trade-specific, hobby-specific, or even made-up terms. Yes, I want Alexa to speak my daughter’s imaginary language with her.
  • DM – Sorry developers, if we make plugin skills ala Mobile Apps 2.0 then we aren’t going to get anywhere. Dialogues are just too complex, and rules and logic are just too brittle. This cannot be a programming exercise. Agents must learn to establish goals and reason about using conversation to achieve those goals in an automated fashion.
  • NLG – Sorry marketing folks, there isn’t brilliant copy for you to write. The agent needs the flexibility to communicate to the user in the most effective way, and it can’t do that if it’s shackled by canned phrases that “reflect the brand”.

In my experience, most current offerings are focusing on the NLU component – and that’s awesome! But to realize the potential of MicroAgents (yeah, that’s right. MicroAgents. You heard it here first) we need to automate the entire agent, which is easier said than done. But that’s not to say that it’s not going to happen anytime soon – in fact, it might happen sooner than you think.  

Echo, I’m done writing. Post this sucker.

Doh!