STSM interview: Study of breathing helps to build talking machines

Casual conversation, whether it be idle chatter at the bus stop or fervent debate in the public houses about hockey plays, forms a bedrock of camaraderie between people. In this spirit we sat down for a natter with spoken dialogue researcher and LITHME Short Term Scientific Mission grant recipient, Emer Gilmartin.

Emer is a linguist from Ireland whose home institution is the ADAPT Centre in Trinity College, Dublin. She will be conducting her Short Term Scientific Mission for fifteen days in the University of Stockholm and KTH Royal Institute of Technology, both in Stockholm, Sweden.

“My main research is about how humans conduct conversation and dialogue, especially in terms of the timing of dialogue and turn-taking. There are two reasons I do this: one, of course, is to know more about human dialogue, but I do a lot of work with spoken dialogue technology, building machines that people can talk to naturally.”

After a master’s degree where Emer researched how speakers affected each other during dialogue, she decided to go in a more technological direction for her PhD.

“So for my PhD I decided I’d work on casual talk because what we mostly do when we talk to each other is chat or casual talk. What we’re basically doing is we’re building our social relationships and this kind of talk is harder to model.

By a process of starting off studying human-human dialogue, I then became interested in human-machine dialogue because, as with anything, if you can recreate or model a process, you understand it better.”

After hearing about the LITHME project, Emer jumped at the chance to participate.

“I got an email notification last year and I thought that LITHME could help me do some work over with some colleagues in Sweden, particularly around human casual conversation, with a view to recording people.

They have some really exciting measuring technology over there where they can measure people’s breathing and how this correlates with how conversation is managed. They’re pretty much the only lab that do this, so the idea was to get some casual conversation material, because the thing is there’s very, very little data of people just chatting. It’s surprising: most data is just people doing a sort of a puzzle together, or something, and what we need is just hangin’ out. So the idea was just to record some of it over there with the breathing monitors on.”

Not only enthusiastic about the possibilities of her own research, Emer is excited about the LITHME project as a whole.

“I really like the idea that it’s bringing linguists and technicians together. Now, most linguists are technical; there’s no real hard and fast cut-off between them. But there are areas of linguistics which are very specialised and which are going to become more and more important on the tech side.

For example, until now a lot of spoken dialogue, well, a lot of dialogue, natural dialogue between people and machines, has been, first of all, text-based and then based on very simple language. If you think about what happens with an Alexa, you’re basically just either doing a voice search or, you know, ordering it to do something.

Whereas there are all sorts of factors in true human dialogue, for example breathing in my case, which need to be integrated into the tech. And doing that obviously requires very close collaboration between the linguistic scientists who are studying these phenomena and the more techy side, the computer scientists really, who are designing the tech and exploiting the research for tech. I don’t think it’s a one-way street; I think there’s give and take on both sides.”

Emer is also particularly interested in the use of language-learning to help migrants integrate into their new home countries: she is involved with a non-profit organisation in Ireland (Listen Here), where they build systems so that people can talk to machines.

“Part of the idea with this casual talk is that we already use it, and we’re planning to use it more, in voice-enabled agents for migrants learning English. Obviously for other countries it would be for migrants learning the language of a host country.

And obviously you want to be able to organise, you know, filling out a form. A huge part of living in another country is just feeling part of. And there’s a big difference between learning how to have a friendly chat and learning how to, you know, open a bank account. And both are valuable.”

Emer remains optimistic about the future possibilities of human-machine interaction; however, she is also cautious about the ethical concerns it raises and about users’ perceptions of a machine’s capabilities.

“At the moment there’s huge interest in remote learning, which is kind of what we do with the migrants in speech tech and AI.

However, unless the applications are realistic and are doing things that people want to do, that they’re really providing extra rather than just talking you through a form, I think there’s going to be sort of a shift to really sitting down and thinking about where we need human-like interaction.

And there are also huge ethical considerations coming up that no one would have dreamed of. For example, if you build something that sounds like a human, people form an opinion on the abilities of that. So this is huge in healthcare, where people are building bots that can do very limited tasks. But because they sound human and use natural language, people imagine they have human intelligence. So there are a lot of considerations coming up. But the future’s pretty good in the area in general, I think!”

Emer Gilmartin will do her STSM in the University of Stockholm and KTH Royal Institute of Technology, Stockholm, for fifteen days. Her scientific report will be published on the LITHME website after she completes her STSM.

Emer Gilmartin was interviewed by LITHME intern Peadar Faherty.