
Communication via Brain-Computer Interface


Machine Translates Thoughts into Speech in Real Time

Model of the brain-machine interface for real-time synthetic speech production. The stroke-induced lesion (red X) disables speech output, but speech motor planning in the cerebral cortex remains intact. Signals collected from an electrode in the speech motor cortex are amplified and sent wirelessly across the scalp as FM radio signals. The Neuralynx System amplifies, converts, and sorts the signals. The neural decoder then translates the signals into speech commands for the speech synthesizer. Credit: Guenther, et al.

Decoding Spoken And Imagined Word Groups Using Electrocorticographic Signals in Humans

G. Schalk1,2,3,4,5, D. Barbour6, E.C. Leuthardt3, X. Pei1

1. Brain-Computer Interface R&D Prog, Wadsworth Center, NYS Dept Health, Albany, NY; 2. Dept Neurol, Albany Medical College, Albany, NY; 3. Dept Neurosurg, Washington Univ, St. Louis, MO; 4. Dept Biomed Eng, Rensselaer Polytechnic Institute, Troy, NY; 5. Dept Biomed Sci, State Univ of New York, Albany, NY; 6. Dept Biomed Eng, Washington Univ, St. Louis, MO

Signals from the brain can provide a new communication channel - a brain-computer interface (BCI) - for people who are paralyzed. BCIs allow people to perform simple functions, such as word processing on a computer. Unfortunately, the current generation of BCI devices faces significant problems of performance or practicality that impede widespread clinical use of this exciting new way to communicate.

It is possible that the detection of imagined words in electrocorticographic (ECoG) activity recorded from the cortical surface could be the basis for a BCI system that is powerful, easy to learn, and suitable for widespread dissemination and long-term use. To lay the groundwork for such a system, we have begun to decode different groups of words using ECoG signals.

In this study, we evaluated nine patients who were temporarily implanted with an ECoG array prior to resective epilepsy surgery. Each subject spoke or imagined speaking words that were presented one at a time, either auditorily or visually. In offline analyses, we divided the words into four groups based on either their vowels (ee, eh, ah, oo) or their consonants. We then extracted several frequency-based features, as well as the Local Motor Potential (LMP), from each signal channel and applied a naive Bayes classifier to assign each spoken or imagined word to one of the four vowel or one of the four consonant groups. Finally, we mapped the classification accuracy achieved at each location onto a three-dimensional brain model, which highlighted the brain areas that gave the most information about the consonant or vowel group. For imagined speech, the brain areas with the highest classification accuracy were Wernicke's area and the supramarginal gyrus; for actual speech, the areas with the highest accuracy were the supplementary motor area (SMA) and motor cortex. Classification accuracies for groups based on consonants were higher than those for vowels. Thus, ECoG feature changes over specific cortical areas could be used by a BCI to decode the intended word group, or even the intended word, and thereby communicate the subject's intent.
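
As an illustration of the decoding approach, a minimal Python sketch of the per-channel analysis is given below. This is not the authors' code: the sampling rate, frequency bands, spectral estimator, and cross-validation scheme are assumptions made for the example, but the structure (band-power features plus the LMP per channel, a naive Bayes classifier, and a per-channel accuracy score) follows the description above.

import numpy as np
from scipy.signal import welch
from sklearn.naive_bayes import GaussianNB
from sklearn.model_selection import cross_val_score

def channel_features(trials, fs=1200):
    # trials: (n_trials, n_samples) array for a single ECoG channel.
    # Returns an (n_trials, n_features) matrix of log band powers plus the LMP.
    bands = [(8, 12), (18, 26), (70, 110), (130, 170)]  # illustrative bands only
    feats = []
    for x in trials:
        f, pxx = welch(x, fs=fs, nperseg=min(len(x), fs // 2))
        band_power = [np.log(pxx[(f >= lo) & (f < hi)].mean() + 1e-12)
                      for lo, hi in bands]
        lmp = x.mean()  # Local Motor Potential: time-domain average of the raw signal
        feats.append(band_power + [lmp])
    return np.asarray(feats)

def channel_accuracy(trials, labels, fs=1200):
    # labels: vowel- or consonant-group index (0-3) for each trial.
    # Returns cross-validated naive Bayes accuracy for this channel; these
    # per-channel accuracies are what would be mapped onto the brain model.
    X = channel_features(trials, fs)
    return cross_val_score(GaussianNB(), X, labels, cv=5).mean()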

This research initiative could lead to important and clinically practical BCI protocols of value to people with or without disabilities. This work may also contribute to fundamental investigations of language processing in humans.


Speech Synthesizer

Real-time audio feedback was achieved by instantaneous computer speech synthesis of the formant frequencies predicted by the Kalman filter. Specifically, a C-language implementation of the Klatt formant-based speech synthesizer was used for speech synthesis. The formant synthesizer uses a total of 63 user-specified parameters, of which only two, corresponding to the first and second formant frequencies (F1 and F2), were actively controlled by the participant. The remaining parameters, whose values affect sound quality and naturalness, were fixed to typical values. The computational overhead of the speech synthesizer was very low, requiring less than 1 ms of computation time to synthesize a 10 ms sound waveform. Synthesized waveforms were buffered directly to the onboard sound card via the DirectSound interface and played on speakers positioned in front of the participant.
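
For readers unfamiliar with formant synthesis, the following is a minimal two-formant cascade synthesizer, sketched in Python purely for illustration. It is a toy stand-in for the full 63-parameter Klatt synthesizer used in the study: the sampling rate, fundamental frequency, formant bandwidths, and impulse-train source model are all assumed values, and only the resonator structure follows the Klatt design.

import numpy as np

def resonator(x, freq, bw, fs):
    # Klatt-style second-order IIR resonator (one formant):
    # y[n] = A*x[n] + B*y[n-1] + C*y[n-2]
    C = -np.exp(-2.0 * np.pi * bw / fs)
    B = 2.0 * np.exp(-np.pi * bw / fs) * np.cos(2.0 * np.pi * freq / fs)
    A = 1.0 - B - C
    y = np.zeros(len(x))
    for n in range(len(x)):
        y1 = y[n - 1] if n >= 1 else 0.0
        y2 = y[n - 2] if n >= 2 else 0.0
        y[n] = A * x[n] + B * y1 + C * y2
    return y

def synth_chunk(f1, f2, fs=10000, dur=0.010, f0=120.0):
    # Synthesize one 10 ms chunk of voiced sound at formant frequencies F1, F2.
    n = int(fs * dur)
    source = np.zeros(n)
    source[::int(fs / f0)] = 1.0           # crude glottal source: impulse train at f0
    out = resonator(source, f1, 80.0, fs)  # first formant (assumed 80 Hz bandwidth)
    out = resonator(out, f2, 100.0, fs)    # second formant, in cascade
    return out / (np.abs(out).max() + 1e-9)

# Example: one frame of decoded formants, roughly in the range of the vowel UH.
chunk = synth_chunk(f1=500.0, f2=1400.0)

In the actual system, successive 10 ms chunks would be generated from successive decoded F1/F2 values and streamed to the sound card, as described above.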

Real-Time Vowel Production Task

The real-time experimental task was based on the "center-out" design used in many motor control studies. Each trial of a center-out task involves movement from a central location to a target location randomly chosen from a set of peripheral target locations. For example, a computer cursor may be moved by a mouse from a central location on a screen to one of eight peripheral targets [29]. In the current case, the center-out task was carried out in an auditory space defined by the formant frequency plane, involving movement from a central vowel location (UH) to one of three peripheral vowel locations (IY, A, OO). The three target vowels are located at three extreme corners of the F1/F2 space for English vowels. The target stimulus was randomly chosen on each trial and was presented acoustically to the subject before the start of the production attempt. After target stimulus presentation, the subject was given an instruction to "speak" the recently heard stimulus. After the speak instruction, the BMI began synthesizing formant frequencies predicted from the current neural activity. The trial ended after a maximum duration of 6 seconds, or sooner if the decoded formant frequencies entered and remained inside a target region around the endpoint vowel for 500 ms. These circular target regions spanned approximately 150 Hz in F1 and 300 Hz in F2 and contained a small attractor force to help keep the participant's production within the target region once it was entered. This attractor force did not affect the formant trajectory outside the target regions. For the first 10 sessions, no visual feedback was provided to the subject. In the last 15 sessions, the subject could view a cursor position in the formant plane corresponding to the ongoing sound output. No difference in performance was noted between sessions with and without visual feedback.
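
To make the trial-ending rules concrete, the following Python sketch checks, frame by frame, whether the decoded formant pair has dwelled inside the target region for 500 ms before the 6-second timeout. It is an illustration only: the 10 ms frame rate, the elliptical form of the target test, and the reset of the dwell timer on leaving the region are assumptions, and the attractor force applied inside the target region is omitted.

DWELL_S = 0.5        # required continuous time inside the target region (500 ms)
MAX_TRIAL_S = 6.0    # maximum trial duration (6 s)
RADIUS_F1 = 150.0    # approximate target extent in F1 (Hz)
RADIUS_F2 = 300.0    # approximate target extent in F2 (Hz)

def in_target(f1, f2, target_f1, target_f2):
    # True if the decoded formant pair lies within the target region,
    # modeled here as an ellipse with the radii given above.
    return ((f1 - target_f1) / RADIUS_F1) ** 2 + \
           ((f2 - target_f2) / RADIUS_F2) ** 2 <= 1.0

def run_trial(decoded_frames, target_f1, target_f2, frame_s=0.01):
    # decoded_frames: iterable of (F1, F2) pairs from the decoder, one per
    # (assumed 10 ms) frame. Returns True if the trial ends in a hit.
    t_elapsed = 0.0
    t_inside = 0.0
    for f1, f2 in decoded_frames:
        t_elapsed += frame_s
        if in_target(f1, f2, target_f1, target_f2):
            t_inside += frame_s
            if t_inside >= DWELL_S:
                return True        # hit: dwelled long enough in the target
        else:
            t_inside = 0.0         # dwell resets when the trajectory leaves the target
        if t_elapsed >= MAX_TRIAL_S:
            return False           # timeout
    return False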

Author Contributions 

Conceived and designed the experiments: FG PRK. Performed the experiments: JB EJW DA SS JLB PE HM PRK. Analyzed the data: FG JB EJW ANC JAT MP RL. Wrote the paper: FG JB. Constructed electronics: DA. Performed implant surgery: PE. Performed pre-surgical neuroimaging: HM.