
We are beginning to recognise the potential of voice

 A colourful illustrative image. A fragment of a face on the left. A graphic representation of sound waves extends from the mouth.

Photo: Dreamstime


Just utter a few sentences, and an application on your phone will tell you whether you should see a cardiologist. Thanks to work carried out at the AGH UST, this is what the future of heart arrhythmia diagnosis might look like.

Voices can be high or low, unpleasant or soothing; they can be whispers or screams; they can tremble and break. The indispensable parts of every utterance include intonation, the speed at which successive sounds and words are articulated, and the frequency and depth of breaths. Finally, there are also things that we cannot hear – truncated words or sentences and sounds at frequencies inaudible to the human ear.

Studies show that these properties might hold significant clues about our health. Using specialised methods of voice processing, scientists are able to detect, with high probability, changes that co-occur not only with respiratory diseases, but also with Alzheimer’s, Parkinson’s, and psychomotor hyperactivity. Psychologists and psychiatrists also look forward to the results of such investigations, because they are searching for objective methods of assessing patients with suspected disorders, e.g. depression. Fortunately, they might just get what they want, as the diagnostic possibilities of voice processing are expanding rapidly.

‘When I was just starting to write my doctoral dissertation, the literature contained relatively few publications on the use of voice technologies in the context of so-called “digital therapeutics”; above all, however, it was the commercial world that was not open to applying these technologies. I think this changed when COVID appeared’, says Dr (Eng.) Daria Hemmerling from the Faculty of Electrical Engineering, Automatics, Computer Science, and Biomedical Engineering. ‘When we lost the ability to meet face-to-face in the doctor’s office, it turned out that the overwhelming portion of medical services can be transferred to the auditory channel and that consultations can take place exclusively by phone. Now we have got used to the fact that some services can be provided in this way – it simply makes things more efficient. Therefore, more and more companies became interested in this, which in turn brought about the involvement of research institutions, because to introduce something effective, it must be studied first’.

What do voices tell us?

Uttering words and sentences is a complex activity that requires the joint effort of many muscles and organs and engages large areas of our brains, which is why it can be an indicator of problems in several important areas. Hence, changes in the voice are the first alarming symptom that the loved ones of people struggling with various disorders can detect, for example disorders resulting from progressing neurodegenerative changes. The voice analysis methods studied by scientists are based on the same mechanism, but are marked by considerably higher sensitivity and precision. In her previous investigations, Dr Daria Hemmerling managed to identify certain traits of the voice that change with the progression of Parkinson’s disease. These traits include some of the so-called ‘biomarkers’ – properties that can be objectively measured and whose changes indicate the progression of biological processes. That research continues in a project led by Dr Hemmerling within the LIDER programme, funded by the National Centre for Research and Development.

In her most recent project, the AGH UST scientist will continue to work on voices, but will focus on a different category of health problems – this time investigating cardiological issues. The National Science Centre granted Dr Hemmerling PLN 48,400 for her project related to the quantification of vocal biomarkers in patients with cardiac failure.

‘This grant will give us the opportunity to demonstrate in the Polish population that these vocal biomarkers work and can objectively indicate discrepancies that occur as a result of various disorders, in this case heart arrhythmia. We want to investigate which biomarkers can differentiate healthy and sick people and, if we eventually find those biomarkers, to determine how the disorder progresses and what begins to happen there, because this has not yet been fully discovered’, the project leader says.

The idea for this project grew out of a different grant, implemented in cooperation with Techmo, a company that originated at the AGH UST.

‘Cooperating with them, I saw both the potential and the difficulties they have, which is why I was able to establish a beautiful cooperation with the Upper-Silesian Medical Centre in Katowice and its doctors, especially with Tomasz Jadczyk, MD, whose incredible energy and faith in the ability of digital technologies to change the medical world are unparalleled – above all to help people, but also to make it easier for doctors to diagnose patients. Tomasz Jadczyk and I both see the potential of vocal diagnosis – he from the medical perspective, I from the technological one’, says Dr Hemmerling passionately.

Within the project, 100 patients from the cardiological department of the Upper-Silesian Medical Centre in Katowice will be studied. A special anechoic chamber has been built at this facility for such purposes; that is, a room designed specifically to reduce unwanted sound as far as possible. As a result, the recorded sound will not be distorted by the properties of the room in which it was made, and the specialised recording equipment can be used to its full capacity to achieve the best possible sound quality. This is crucial to the premise of the research project, which focuses on minuscule variations in the process of speech production.

Vowels to the front row

It is accepted that the signal humans generate when uttering vowels is stable and can be distorted only by a developing disorder. Therefore, these sounds are frequently analysed in medicine. This will also be the first step in the research carried out by Dr Hemmerling’s team. Subsequently, to test speech while reducing both the stress of taking part in an experimental procedure and the emotional distance between participants and scientists, patients will talk about a simple text with which they will have had a chance to familiarise themselves moments earlier. Interestingly, the text itself will be about daily life, so the participants’ answers will allow scientists to draw conclusions about what interests their subjects and what lifestyle they lead – the latter plays a crucial role in assessing the risk of civilisational diseases, including heart conditions. Finally, the participants will hear an open and emotionally neutral question. All responses will be recorded and later thoroughly analysed by scientists.

‘This will be free speech, which will give us information about whether these people are tired after a period of vocal effort, whether they start to make longer pauses or keep the intonation at an unchanged level, whether they stutter frequently, or whether they truncate certain consonants, for example, at the end of words; because such things can point to various diseases. The same goes for the scope of vocabulary, whether it is limited, whether the subjects repeat certain elements of words or entire words – all this can be indicative of various disorders’, explains Dr Hemmerling. ‘(...) I think that, given the context of heart arrhythmia, everything needs to be investigated – changes in frequency, amplitude, voice strength, number and duration of pauses for breaths, cadence of speech. These might be our indicators’.

Artificial intelligence to track our heart rate

The core research question will be answered during the analysis of digital recordings. The first stage of signal processing is called parametrisation and relies on extracting concrete parameters from the voice signal.

‘These parameters allow us to represent the way in which we hear those sounds as a string of numbers. And this we can interpret’, Dr Hemmerling tells us about her plans. ‘In fact, there are several parameters connected to energy, signal envelope, melody, articulator function, and the participation of various organs in the speech generation process. In general, this system gets quite complicated. Identifying, within this multiparameter space, the properties that point to the development of pathologies will be a milestone for this project’.
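To make the idea of turning a recording into ‘a string of numbers’ more concrete, here is a minimal sketch in Python. It is not the team's pipeline: the function names, the frame sizes, and the choice of just two parameter families (short-time energy and an autocorrelation-based estimate of the fundamental frequency, i.e. the ‘melody’) are assumptions made purely for illustration. It also assumes a mono recording that is already loaded as a NumPy array.

```python
import numpy as np

def frame_signal(signal, frame_len, hop):
    """Split a 1-D signal into overlapping frames, one frame per row."""
    n_frames = 1 + (len(signal) - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    return signal[idx]

def estimate_f0(frame, sample_rate, f_min=75.0, f_max=400.0):
    """Rough fundamental-frequency estimate (Hz) from the autocorrelation peak."""
    frame = frame - frame.mean()
    ac = np.correlate(frame, frame, mode="full")[len(frame) - 1:]
    lag_min = int(sample_rate / f_max)   # shortest period considered
    lag_max = int(sample_rate / f_min)   # longest period considered
    lag = lag_min + np.argmax(ac[lag_min:lag_max + 1])
    return sample_rate / lag

def parametrise(signal, sample_rate, frame_ms=40, hop_ms=10):
    """Return [mean energy, energy std, mean F0, F0 std] for one recording.

    Hypothetical helper: real systems extract many more parameters.
    """
    frame_len = int(sample_rate * frame_ms / 1000)
    hop = int(sample_rate * hop_ms / 1000)
    frames = frame_signal(signal, frame_len, hop)
    energy = np.mean(frames ** 2, axis=1)
    f0 = np.array([estimate_f0(f, sample_rate) for f in frames])
    return np.array([energy.mean(), energy.std(), f0.mean(), f0.std()])

# Synthetic stand-in for a sustained vowel: a pure 150 Hz tone, one second long.
sample_rate = 16000
t = np.arange(sample_rate) / sample_rate
tone = 0.5 * np.sin(2 * np.pi * 150.0 * t)
print(parametrise(tone, sample_rate))
```

A real sustained vowel is far richer than a pure tone, so in practice the spread of the energy and pitch values across frames would carry more information than this toy input suggests.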

In this fundamental task, our scientist will be helped by doctors and algorithms. Standard medical tests conducted by a cardiologist will make it possible to recognise pathological processes and determine their stage of development. The role of algorithms, a.k.a. ‘recipes’ for data processing, will be to find correlations between changes in the vocal signal and specific pathologies. In other words, scientists want artificial intelligence to help them determine which aspects of the voice are indicative of developing pathologies and, consequently, which could be used as diagnostic guidelines.

‘As humans, we cannot visualise ten dimensions. Algorithms allow us to reduce such multidimensional spaces to, let’s say, three dimensions that we can very well imagine’.
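The article does not say which algorithm would be used for this reduction. One common choice is principal component analysis (PCA), sketched below under that assumption. The data are random numbers standing in for ten voice parameters measured in 100 recordings, and the function name is hypothetical.

```python
import numpy as np

def pca_reduce(features, n_components=3):
    """Project rows of `features` onto their first principal components.

    Returns the projected points and the share of total variance
    that each kept component explains.
    """
    centred = features - features.mean(axis=0)
    # Rows of `components` are the principal directions, strongest first.
    _, singular_values, components = np.linalg.svd(centred, full_matrices=False)
    explained = singular_values ** 2 / np.sum(singular_values ** 2)
    projected = centred @ components[:n_components].T
    return projected, explained[:n_components]

# Made-up data: 100 recordings, 10 parameters each (not real measurements).
rng = np.random.default_rng(0)
features = rng.normal(size=(100, 10))
projected, explained = pca_reduce(features)
print(projected.shape)  # (100, 3)
```

Each recording becomes a point in three dimensions that can be plotted, and the explained-variance shares indicate how much of the original spread survives the reduction.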

Such algorithms, created for multifarious purposes, abound, and the choice among them will depend, among other things, on the amount of data that the scientists are able to collect during the recording sessions. The best results can be obtained from deep learning models, that is, models that take in data from many different sources and learn to draw conclusions on their own; however, using them may not be possible if too little data is collected.

‘Perhaps we will touch on deep learning models or focus on hybrid methods, that is, the kind where we combine deep learning with manual methods, i.e. those where we ourselves designate the limit values that can already indicate pathologies’, Dr Hemmerling says.
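As a rough illustration of what such a hybrid could look like, the sketch below combines hand-set limit values with a score from a learned model. Everything in it is assumed for the example: the parameter names, the limits (which are not clinical values), the 0.5 score threshold, and the rule that either source may trigger a referral. The article does not describe how the two would actually be combined.

```python
from dataclasses import dataclass

@dataclass
class Limit:
    low: float
    high: float

# Hypothetical limit values chosen for illustration only, NOT clinical thresholds.
LIMITS = {
    "pauses_per_minute": Limit(0.0, 12.0),
    "f0_std_hz": Limit(5.0, 60.0),
}

def rule_flags(params, limits):
    """Names of the parameters that fall outside their manually set limits."""
    return [name for name, lim in limits.items()
            if not lim.low <= params[name] <= lim.high]

def hybrid_decision(params, model_score, limits, score_threshold=0.5):
    """Combine manual rules with a learned model's score (assumed in [0, 1])."""
    flags = rule_flags(params, limits)
    refer = bool(flags) or model_score >= score_threshold
    return {"flags": flags, "refer": refer}

params = {"pauses_per_minute": 15.0, "f0_std_hz": 20.0}
print(hybrid_decision(params, model_score=0.2, limits=LIMITS))
```

Here `model_score` stands in for the output of whatever learned model is eventually trained; the manual rules stay readable and can be checked by a clinician independently of it.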

We might ask: if such vocal biomarkers are found, does it automatically mean that a person can be diagnosed with a cardiological disorder?

‘I think we would have to validate the results, because the hundred patients that we will examine within this project are not enough. Later, we would have to think about how we could test a larger group of subjects. Our sample can yield evidence for a certain direction of work, and this is beautiful; after that, we can consider a path to implementing this project. Maybe it will be possible to create an application or a separate device independent of mobile phones, but one that could be easily distributed among patients. I think the experiment could then be presented as a game of sorts, or the screen could display an avatar showing a pleasant memory, which could lead us through the entire procedure. Using such diagnostic methods could even be entertaining’.

Better to prevent, but also provide improved treatment

The research for which Dr Hemmerling received funding is fundamental, which means that it should expand knowledge in a given field, but its results do not have to translate into practical applications. However, in the case of vocal biomarkers, potential uses abound, and creating an application is just one of them. We might expect that each will bring significant benefits. First and foremost, there is the efficiency of diagnosis – not only could the examination be completed in a matter of minutes, but the initial diagnosis would also not require the presence of a doctor. This would lift a major burden off the healthcare system, as doctors would see patients who already had an initial diagnosis, which would shorten waiting lists. At the same time, it would be easier to monitor current patients, which would improve the quality of care with minimal time expenditure. Even greater hopes are connected with the stage at which a disease is diagnosed – it is possible that vocal biomarkers would allow us to detect diseases significantly sooner than is possible with current methods. Early diagnosis gives a better chance of complete recovery and reduces the amount of medicine taken by patients and the number of doctor’s appointments – therefore, it takes the pressure off the healthcare system in more than one way.

‘What interests and excites me the most is pinpointing the moment at which atheromatous changes originate. It affects young people: their cholesterol levels increase, but they are not yet aware that this is the first symptom of atherosclerosis. As inexperienced people without medical expertise, they cannot recognise that moment, but they also tend to forget about regular check-ups – especially those who have small children’, Dr Hemmerling shares her concerns. ‘In the future, I hope that we will be able to switch on a “yellow light” above the heads of people in whom such processes begin to brew and tell them: stop and think about yourself now’.
