STSM interview – The impact of different face coverings on speech processing

Chloe Patman is a PhD candidate at the University of Cambridge. During her undergraduate studies she learned about voice and text analysis in relation to criminal investigations, which sparked an interest in forensic phonetics. She gained hands-on experience of casework at the acoustic level, as well as of automatic methods of identifying voices.

“This led me to want to do more research in this area. During the early months of my PhD, I saw this opportunity for STSM. I like the idea that this grant opportunity was kind of approaching people that had a background in linguistics but wanted to learn more of the technological side of things.”

“So far, I’ve had a really, really good experience with it. I’ve come into this with a lot of knowledge of linguistics, but not as much knowledge in computational linguistics and speech processing. It’s been a really nice way of throwing myself into something completely new with support and expertise from the institution and a professor who has knowledge in these areas. I think that especially working with speech-to-text software you can easily get an error that you don’t at first understand, so having someone there who has knowledge of this to support and explain to you why the error is happening is crucial.” Patman explains that many people with an interest in AI-based systems may find diving in daunting at first. According to her, the STSM visit has made a daunting task much more accessible and enjoyable. It has also boosted her confidence.

“This experience has shown me that I can do something like this and apply to more work in this field.”

Patman discovered the theme of face coverings and their impact on speech intelligibility during her Master’s project, which in turn led to the proposal for her STSM project on speech processing. The recordings that Patman had worked with at the time of the interview were of female speakers. The main motivation for focusing on female speakers is that much of the work done in forensics so far has been driven by male voices. Patman explains that current research, reference material and reference analyses mostly concern the male voice. In addition, data calibration is all done on the male voice. Patman is therefore interested in applying the same methods to female data to get a better understanding.

“Obviously, males and females speak differently, but whether these methods, both automatic and manual, can accurately just be transferred to female voices, or whether we should treat them differently is the question.”

During her Master’s degree project, Patman worked on face-mask speech. Both its relevance for forensics and the COVID-19 pandemic motivated her to learn how to process speech using speech-to-text software.

“We’re using a lot of the methods that have not been tested as much on female voices as male voices, and whether the measures that we use for males are as good for females. We don’t have as much reference data to compare based on the sample of females.”

During her Short-Term Scientific Mission, Patman is studying the effects of face masks on the speech of female speakers and looking at how well two different AI speech-to-text programs can transcribe this data. Focusing on female voices in the context of forensic speech and forensic science is important to Patman because she has encountered cases involving female voices in forensic phonetics. According to her, understanding how well different methods and systems can deal with female voices recorded through face masks is an important aspect of forensic phonetics. Patman explains that wearing face masks has remained more common since the COVID-19 pandemic.

“Understanding the effect of face coverings and different conditions, such as low quality, is important even from the perspective of a police investigation, if there’s a witness who’s female and the police interview needs to be transcribed. In terms of this STSM we look at how well open-source programs can transcribe face-mask speech, and again this is with female voices.”

Patman is also experimenting with different levels of background noise to see how much they affect the transcription. Comparing studio-quality recordings with low-quality recordings of face-mask speech, she wants to know how face masks affect speech-to-text software, which people might use for anything from device passwords to transcribing notes or other commercial applications. Degrading the material also links directly to forensics, because much of the material in forensic casework consists of lower-quality recordings. It is therefore important to know how well the programs can transcribe speech when a suspect or someone else involved in a crime is wearing a face covering and the recording quality is poor.

Patman says that she has not had much contact with the working groups yet, but she has reached out to them and is going to present some of her work to get their feedback and input. She found contacting them and asking for feedback in this way useful, as her project overlaps with several areas. She specifically names Working Group 1: Computational linguistics and Working Group 2: Language and law.

“For me, coming from a more phonetics background, I didn’t have much experience in computational linguistics, and I think one of the great things about this Short-Term Scientific Mission is that it allows a specific knowledge in linguistics, but also the exploration of some area that you don’t have as much knowledge in. It allows you to explore different areas and learn more about them. So, without this funding, I wouldn’t have been able to explore more into setting up these texts and interacting with people that know how to set up speech-to-text software.”

Patman plans to present her work to members of both Working Groups 1 and 2. She hopes to show how different elements of speech and technology can be brought together to create a project. She says that the structure and layout of the working groups allow for a better understanding of the connections between them.

Her experience during the STSM visit has been positive. She emphasises flexibility and choice as important aspects: “I 100% would recommend STSM to others. The flexibility of it in terms of states, institutions and supervisors has been as well and allowed you to pick a place and a person that has very specific interests and what you want to look at.”

Patman says that having a good amount of time to focus on the project and a supportive environment have made for a good experience: “It has been nice that it kind of sets aside an amount of time for you to get stuck into things, but not too much to lose focus. It allows you to really delve into something in a space of time. Everyone here has been really receptive and kind of like even just approaching the working groups has been a really pleasant experience and everyone’s been really supportive. So, I would definitely recommend it.”

Chloe Patman, a dedicated PhD candidate at the University of Cambridge, stands at the forefront of forensic phonetics and linguistics research. Armed with a Bachelor’s degree in English linguistics, she delved deep into the intricate workings of language. Building upon this foundation, she pursued a Master’s degree in forensic phonetics at the University of York, refining her expertise in the intersection of linguistics and law. With a fervent interest in voice and text analysis, Patman’s work is underpinned by a solid understanding of acoustics, enabling her to decipher the hidden truths within speech patterns and textual evidence. Her unwavering commitment to unravelling linguistic complexities showcases her as a formidable force in the pursuit of scientific truth.

Chloe Patman visited the University of Zurich, Switzerland, from 14 August 2023 to 9 October 2023. Her scientific report will be published on the LITHME website. Patman was interviewed by LITHME assistant Enni Kuusela.