Talking about speech at UK Speech 2022

Many conversations took place at the start of the 2022 UK Speech conference at the University of Edinburgh, fuelled by excitement that the event was once again taking place in person. This is a conference about natural language processing and speech technology, including computation associated with speech. Several of the COG-MHEAR teams presented their research and found out about developments in software and hardware that are improving communication.

The keynote speakers showed the range of work being carried out, starting with Dr Joanne Cleland’s work on the assessment and treatment of speech disorders in children. She explained how ultrasound can give real-time feedback on the way in which children’s tongues are moving. This helps clinicians to teach children with speech sound disorders how to articulate correctly.

Speech privacy was covered by Jennifer Williams, who showed a triangle linking convenience, privacy and security, three aims that have to be balanced against each other when developing and using speech technology. It is becoming more and more difficult to opt out of speech technology. Researchers can and should take part in discussions with regulators about the best ways forward, to ensure that safety is balanced with ease of use as technologies develop. Ideally, users will also be able to alter privacy settings easily to suit each situation.

Prof Naomi Harte talked about the way in which speech is much more than a string of sounds. Visual and gestural aspects play their part, as do pauses. Analysing visuals as well as audio can be a great help in identifying exactly what is being said. This can also go wrong: Naomi showed a video of the McGurk effect (https://www.youtube.com/watch?v=2k8fHR9jKVM) where the shape of a mouth can fool the listener into hearing a different sound from that which is being made. More datasets of audio-visual speech are needed, which is one aspect of work on the COG-MHEAR research programme. And more work is needed to ensure that technology can decipher speech correctly.

The poster sessions covered a great range of research. One poster showed how speech technology was included in the creation of a cyborg, so that Peter Scott-Morgan could continue to speak as his muscle movements became limited to eye movement, due to ALS (motor neuron disease).

Networking opportunities continued beyond the poster sessions, with a reception in the Scottish National Gallery giving delegates the chance to admire great paintings, including Frans Hals’s portrait of the great talker Pieter Verdonck clutching a jawbone. Then a ceilidh gave everyone the opportunity to find out how hard it can be to coordinate actions in real time: a small taste of the complexity that is required to process speech.

The teams are now looking forward to next year’s Interspeech conference in Dublin (Interspeech 2023).