How the Human Brain Detects the ‘Music’ of Speech
(Originally published by UCSF)
Researchers at UC San Francisco have identified neurons in the human brain that respond to pitch changes in spoken language, which are essential to clearly conveying both meaning and emotion.
The study was published online Aug. 24, 2017, in Science by the lab of Edward Chang, MD, a professor of neurological surgery at the UCSF Weill Institute for Neurosciences, and led by Claire Tang, a fourth-year graduate student in the Chang lab.
“One of the lab’s missions is to understand how the brain converts sounds into meaning,” Tang said. “What we’re seeing here is that there are neurons in the brain’s neocortex that are processing not just what words are being said, but how those words are said.”
Changes in vocal pitch during speech – part of what linguists call speech prosody – are a fundamental part of human communication, nearly as fundamental as melody is to music. In tonal languages such as Mandarin Chinese, pitch changes can completely alter the meaning of a word, but even in a non-tonal language like English, differences in pitch can significantly change the meaning of a spoken sentence.
For instance, “Sarah plays soccer,” in which “Sarah” is spoken with a descending pitch, can be used by a speaker to communicate that Sarah, rather than some other person, plays soccer; in contrast, “Sarah plays soccer,” with the emphasis on “soccer,” indicates that Sarah plays soccer, rather than some other game. And adding a rising tone at the end of a sentence (“Sarah plays soccer?”) indicates that the sentence is a question.
The brain’s ability to interpret these changes in tone on the fly is particularly remarkable, given that each speaker also has their own typical vocal pitch and style (that is, some people have low voices, others have high voices, and others seem to end even statements as if they were questions). Moreover, the brain must track and interpret these pitch changes while simultaneously parsing which consonants and vowels are being uttered, what words they form, and how those words are being combined into phrases and sentences – with all of this happening on a millisecond scale.
Previous studies in both humans and non-human primates have identified areas of the brain’s frontal and temporal cortices that are sensitive to vocal pitch and intonation, but none have answered the question of how neurons in these regions detect and represent changes in pitch to inform the brain’s interpretation of a speaker’s meaning.
Neurons Distinguish Speaker, Phonetics, and Intonation
Chang, a neurosurgeon at the UCSF Epilepsy Center, specializes in surgeries to remove brain tissue that causes seizures in patients with epilepsy. In some cases, to prepare for these operations, he places high-density arrays of tiny electrodes onto the surface of the patients’ brains, both to help identify the location triggering the patients’ seizures and to map out other important areas, such as those involved in language, to make sure the surgery avoids damaging them.
In the new study, Tang asked 10 volunteers awaiting surgery with these electrodes in place to listen to recordings of four sentences as spoken by three different synthesized voices:
“Humans value genuine behavior”
“Movies demand minimal energy”
“Reindeer are a visual animal”
“Lawyers give a relevant opinion”
The sentences were designed to have the same length and construction, and could be played with four different intonations: neutral, emphasizing the first word, emphasizing the third word, or as a question. You can see how these intonation changes alter the meaning of the sentence: “Humans [unlike Klingons] value genuine behavior;” “Humans value genuine [not insincere] behavior;” and “Humans value genuine behavior?” [Do they really?]
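As a rough illustration of the stimulus design described above, the sketch below enumerates every combination of sentence, voice, and intonation. It assumes the design was fully crossed (each sentence rendered in each voice with each intonation), which the article does not state explicitly; the voice and intonation labels are hypothetical names chosen for this example.

```python
from itertools import product

# The four sentences quoted in the article.
sentences = [
    "Humans value genuine behavior",
    "Movies demand minimal energy",
    "Reindeer are a visual animal",
    "Lawyers give a relevant opinion",
]

# Hypothetical labels: the article only says there were three synthesized voices.
voices = ["voice_1", "voice_2", "voice_3"]

# The four intonation conditions named in the article (label names are ours).
intonations = ["neutral", "emphasis_word_1", "emphasis_word_3", "question"]

# Assumption: a fully crossed design, one stimulus per combination.
stimuli = list(product(sentences, voices, intonations))

print(len(stimuli), "stimuli under the fully crossed assumption")
```

Crossing the three factors this way is what would let an analysis ask whether a given response tracks the sentence, the voice, or the intonation independently of the other two.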
Tang and her colleagues monitored the electrical activity of neurons in a part of the volunteers’ auditory cortices called the superior temporal gyrus (STG), which previous research had shown might play some role in processing speech prosody.
They found that some neurons in the STG could distinguish between the three synthesized speakers, primarily based on differences in their average vocal pitch range. Other neurons could distinguish between the four sentences, no matter which speaker was saying them, based on the different kinds of sounds (or phonemes) that made up the sentences (“reindeer” sounds different from “lawyers” no matter who’s talking). And yet another group of neurons could distinguish between the four different intonation patterns. These neurons changed their activity depending on where the emphasis fell in the sentence, but didn’t care which sentence it was or who was saying it.
To prove to themselves that they had cracked the brain’s system for pulling intonation information from sentences, the team designed an algorithm to predict how neurons’ responses to any sentence should change based on speaker, phonetics, and intonation. They then used this model to predict how the volunteers’ neurons would respond to hundreds of recorded sentences by different speakers. They showed that while the neurons responsive to the different speakers were focused on the absolute pitch of the speaker’s voice, the ones responsive to intonation were more focused on relative pitch: how the pitch of the speaker’s voice changed from moment to moment during the recording.
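The distinction between absolute and relative pitch can be made concrete with a small sketch. This is not the team’s model; it is a minimal illustration that assumes relative pitch can be approximated as semitones above or below a speaker’s own average pitch. The contour values, base frequencies, and function names are all hypothetical.

```python
import math
from statistics import mean

def relative_pitch(f0_hz):
    """Express a pitch contour in semitones relative to the speaker's own
    geometric-mean pitch (an assumed, simplified normalization)."""
    ref_hz = math.exp(mean(math.log(f) for f in f0_hz))
    return [12.0 * math.log2(f / ref_hz) for f in f0_hz]

def synthesize(base_hz, shape_semitones):
    """Build a contour in Hz by applying semitone offsets to a base pitch."""
    return [base_hz * 2 ** (s / 12.0) for s in shape_semitones]

# Hypothetical question-like contour: flat, then rising at the end.
shape = [0.0, 0.0, 1.0, 3.0, 6.0]

low_voice = synthesize(100.0, shape)   # made-up low-pitched speaker
high_voice = synthesize(200.0, shape)  # made-up high-pitched speaker

# Absolute pitch differs between the two speakers...
abs_low = mean(low_voice)
abs_high = mean(high_voice)

# ...but the relative contour, the intonation pattern, is the same.
rel_low = relative_pitch(low_voice)
rel_high = relative_pitch(high_voice)

print("mean pitch (Hz):", round(abs_low, 1), round(abs_high, 1))
print("relative contour, low voice: ", [round(s, 2) for s in rel_low])
print("relative contour, high voice:", [round(s, 2) for s in rel_high])
```

In this toy setup, a feature based on mean pitch separates the two speakers, while the speaker-normalized contour is identical for both, which mirrors the split the article describes between speaker-sensitive and intonation-sensitive neurons.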
“To me this was one of the most exciting aspects of our study,” Tang said. “We were able to show not just where prosody is encoded in the brain, but also how, by explaining the activity in terms of specific changes in vocal pitch.”
These findings reveal how the brain begins to take apart the complex stream of sounds that make up speech and identify important cues about the meaning of what we’re hearing, Tang said. Who is talking, what are they saying, and just as importantly, how are they saying it?
“Now, a major unanswered question is how the brain controls our vocal tracts to make these intonational speech sounds,” said Chang, the paper’s senior author. Chang is also a member of the Kavli Institute for Fundamental Neuroscience at UCSF. “We hope we can solve this mystery soon.”
Volunteers Enable Deeper Look into Human Brain
The patients involved in the study were all at UCSF undergoing surgery for severe, untreatable epilepsy. Brain surgery is a powerful way to halt epilepsy in its tracks, potentially completely stopping seizures overnight, and its success is directly related to the accuracy with which a medical team can map the brain, identifying the exact pieces of tissue responsible for an individual’s seizures and removing them.
The UCSF Comprehensive Epilepsy Center is a leader in the use of advanced intracranial monitoring to map out elusive seizure-causing brain regions. The mapping is done by surgically placing a flexible electrode array under the skull on the brain’s outer surface, or cortex, and recording the brain’s activity in order to pinpoint the parts of the brain responsible for triggering seizures. In a second surgery a few weeks later, the electrodes are taken out and the unhealthy brain tissue that causes the seizures is removed.
This setting also offers a rare opportunity to ask basic questions about how the human brain works, such as how it controls speaking. The neurological basis of speech motor control has remained largely unknown because scientists cannot study speech mechanisms in animals and because non-invasive imaging methods lack the ability to track the very rapid time course of the brain signals that drive the muscles that create speech, which change in hundredths of seconds.
But presurgical brain mapping can record neural activity directly, and can detect changes in electrical activity on the order of a few milliseconds.
Liberty S. Hamilton, PhD, of UCSF was also a co-author on the new study.
The research was supported by the National Institutes of Health, the New York Stem Cell Foundation, the Howard Hughes Medical Institute, the McKnight Foundation, The Shurl and Kay Curci Foundation, and The William K. Bowes Foundation.