Of Children and Artificial Intelligence Agents

“Hey Google, tell me a joke!” If you share your home with a child and any sort of voice assistant (Google Home, Cortana, Alexa, Siri), you may be used to hearing this sort of request. It typically yields a groan-worthy pun, but alternative joke requests like “Do you know any good ones?,” “Knock, knock,” or “Make me laugh” may receive the reply “I don’t know how to help with that.” It’s a shame that making artificial intelligence agents (like voice assistants) understand what you want can be so hard, because voice assistants could help children stay curious about their world and find answers to questions without first needing to learn spelling and typing. To find out how we can help kids use voice assistants, we asked 87 children (ages 5–12) to try out three prototype voice assistants at the Minnesota State Fair research facility and tell us what they thought.

A young child identifying his favorite interface during the study and describing why he likes it.

First, we wanted to know whether it mattered how the prototype AI agent talked about itself and the child. Personified agents referred to themselves with a name and used the “I” pronoun (similar to Alexa), while non-personified ones simply asked for a question to answer (similar to Google search in your browser). Personalized agents referred to the child by name, while non-personalized ones knew nothing about the child. It turns out that children had a strong preference for personified interfaces, but didn’t really care whether the interface knew anything about them. In fact, some kids found it “creepy” if the agent knew their name and age!

Second, we wanted to know how children reacted when the voice assistant had trouble figuring out what they meant. We asked each child to puzzle out an answer to a question about the State Fair. To get the voice assistant to understand the question, they had to reword it or break it into multiple parts. This was hard for kids! Most kids just tried saying the question louder or swapping in synonyms for specific words. It took them several tries to find a strategy that worked (and many younger kids couldn’t do it without a grown-up helping). The problem is that current voice assistants don’t provide many clues about why they’re having trouble understanding a request. Thanks to the help of the kids in our study, we came up with lots of ideas for how voice assistants can be better and more useful to children and families.

There are many more findings in our paper, which recently received the Best Paper Award at IDC 2018. This work was made possible with funding from Mozilla Research Grants and Google Faculty Research Awards.

87% of People Got This Question about Their Door Lock Wrong!

“You drive home and park. Your car is full of groceries and other shopping, which take many trips to bring into the house. Five minutes after you drove in, you are still making trips to the car. Is the door locked or unlocked?” What if I told you that 87% of people got this question wrong? Sensors and “smart” devices for your home may hold the promise of making life more convenient, but they may also make it harder to understand and predict things like the state of your “smart” door lock in common situations like the one above.

The main issue at hand is “feature interaction.” This is the idea that some features of your future smart home may want one thing (e.g., the door locked for security), while others may want another (e.g., the door unlocked for convenience). Software engineers who program future smart homes must come up with a clear set of rules for a device like a smart door lock so that it always behaves in clear and predictable ways. But what is clear and predictable to a computer may not be clear and predictable to a person. My collaborator (Pamela Zave from AT&T Labs Research) and I found this out the hard way by running a study that asked people to predict the state of their door lock in scenarios like the one above, given three resolution rules applied to interactions among four features. None of the people in our study got all the questions right (and the one who got closest was a lawyer!). See how you do by taking the 15-question quiz below:

How did it go? People in our study made some common mistakes that we describe in our paper. The bottom line is that “feature interaction” resolution rules that are simple for computers may require more effort for humans to understand. We think that at the core of this is a mismatch between logic and intuition. People intuit that an automated smart door lock should err on the side of keeping the door locked, even in situations where it would be more convenient (and more like a regular non-smart door lock) to keep it unlocked. It is important for researchers from multiple fields to work together to understand people’s intuitions and errors before programming future home systems, so that we won’t be left wondering whether our door is locked or unlocked!
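To make the idea concrete, here is a minimal sketch of priority-based feature interaction resolution. The feature names and priorities are hypothetical illustrations, not the actual four features and three rules from our study:

```python
from dataclasses import dataclass

# Hypothetical features and priorities, for illustration only;
# not the four features or three rules from the study.
@dataclass
class FeatureRequest:
    feature: str        # which feature is asking
    wants_locked: bool  # the lock state this feature wants
    priority: int       # higher number wins a conflict

def resolve_lock_state(requests, default_locked=True):
    """Resolve conflicting feature requests: the highest-priority
    request wins; with no active requests, use the default state."""
    if not requests:
        return default_locked
    winner = max(requests, key=lambda r: r.priority)
    return winner.wants_locked

# The grocery scenario: the security feature wants the door locked,
# but a convenience feature has noticed repeated trips to the car
# and wants it unlocked.
requests = [
    FeatureRequest("security", wants_locked=True, priority=1),
    FeatureRequest("unloading_mode", wants_locked=False, priority=2),
]
print(resolve_lock_state(requests))  # False: the door is unlocked
```

The resolution is one line of code and perfectly deterministic, yet a person standing in the driveway still has to know which feature currently outranks the other, which is exactly the kind of gap between machine logic and human intuition that tripped up our participants.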

Want more detail? Check out our CHI 2017 publication.

Purpose, Visibility, and Intersubjectivity in Video-Mediated Communication Technologies

Video-mediated communication may benefit from a number of novel technologies, but designing for a good experience requires considering purpose, visibility, and intersubjectivity for both partners.

Skype, Google Hangouts, FaceTime, and ShareTable are all examples of real-time video-mediated communication technologies. Designing, implementing, and deploying novel systems of this sort is a big research priority for me, and every semester a few enterprising students approach me with ideas for cool new technology to try in this space: robots, virtual reality, augmented reality, projector-camera systems, and more. I frequently ask them to consider a few things first, and if you’re new to thinking about computer-mediated communication, these may be helpful for you as well (many of these ideas come from my work on play over videochat).

In this case, let’s assume the “base case” of two people—Alice and Bob—using a potential new technology to communicate with each other (though the questions below can definitely be expanded to consider multi-user interfaces). Consider:

  1. (Purpose) Why is Alice using this technology? Why is Bob? The answer should be specific (e.g., not just “to communicate,” but “to plan a surprise party for Eve together”) and may be different for the two parties. It’s good to come up with at least three such use cases for the next questions.
  2. (Visibility) What does Alice see using this technology? What does Bob see? Consider how Alice is represented in Bob’s space and how Alice can control her view (and then flip it and consider the same things for Bob). Consider whether this is appropriate for their purposes. For example, maybe Alice is wearing VR goggles and controlling a robot moving through Bob’s room. It’s cool that she can see 360-degree views and control her gaze direction, but what does Bob see? Does he see a robot with a screen that shows Alice’s face encased in VR goggles? Does this achieve a level of visibility that is appropriate for their purpose?
  3. (Intersubjectivity) How does Alice/Bob show something to the other person? How does Alice/Bob understand what the other person is seeing? The first important case to consider is how Alice/Bob bring attention to themselves and how they know whether their partner is actually paying attention to them. If Alice is projected onto a wall but the camera for the system is on a robot, it will likely be difficult for her to know when Bob is looking at her (i.e., when he’s looking at the wall display, he will seem to be looking away from the camera). It’s also useful to consider the ability to refer to other objects. With current videochat this is actually quite hard! If Alice points at her screen toward a book on the shelf behind Bob, Bob would have no idea where she’s pointing (other than generally behind him). Solving this is hard—it’s definitely an open problem in the field—but the technology should at least address it well enough to support the scenarios posed in question 1 (the sketch after this list illustrates why pointing over video is so ambiguous).
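To see why deictic gestures fail over flat video, consider what a system actually learns when Alice taps the book in her view of Bob’s room. Below is a minimal sketch assuming a simple pinhole camera model; the field of view, resolution, and pixel coordinate are hypothetical numbers, not from any real system:

```python
import math

# Hypothetical parameters for Bob's webcam (illustrative numbers only).
FOV_DEG = 60           # horizontal field of view, in degrees
IMAGE_WIDTH_PX = 1280  # video frame width, in pixels

def pixel_to_bearing(x_px):
    """Convert a horizontal pixel coordinate in Bob's video frame
    into a bearing angle (radians) out from Bob's camera."""
    focal_px = (IMAGE_WIDTH_PX / 2) / math.tan(math.radians(FOV_DEG / 2))
    return math.atan2(x_px - IMAGE_WIDTH_PX / 2, focal_px)

# Alice taps the book she sees at pixel x = 900 in her view.
bearing = pixel_to_bearing(900)
print(f"Alice is pointing {math.degrees(bearing):.1f} degrees right of center")

# The tap recovers only a direction (a ray from the camera), never a
# point in Bob's room: a bookshelf at 3 m and a chair at 1 m can sit
# on the same ray. Without depth data, a second view, or feedback
# rendered in Bob's space, neither Bob nor the system can tell which
# object Alice means.
```

This is only one way to frame the problem, but it shows why a single flat video stream cannot disambiguate a pointing gesture: the referent’s distance along the ray is simply not captured.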

Generally, I find that new idea pitches tend to propose inventions that provide a reasonable experience for Alice but a poor one for Bob. It is important to consider purpose, visibility, and intersubjectivity for both of them in order to conceive a system that is actually compelling.