Interview with:
Han Shu '96, MEng '97
Han Shu '96, who moved from mainland China to the US when he was 15, is pursuing a PhD with the Spoken Language Systems group at MIT's Laboratory for Computer Science. As an MIT undergraduate, he was a 6A co-op student in BBN's Speech and Language Processing Department. There he contributed to the development of handwriting recognition, fully automated telephone number retrieval, face recognition, and speech recognition in both English and Mandarin Chinese.

What problems in speech and language processing are you working on in your PhD research?

My PhD research focuses on modeling the dynamics of speech sounds for speech recognition. The currently dominant approach extracts features from the speech signal at a constant rate. However, the acoustic cues that matter for phonetic classification are typically not uniformly distributed in the speech signal. Motivated by this observation, another approach extracts features from hypothesized phonetic segments. I am attempting to place the two approaches in a common framework so that they can be combined.

What are some differences between representing speech in English and in Mandarin Chinese?

People in the speech community have found that techniques for recognizing and synthesizing speech in one language generally carry over to another. However, some differences remain because languages have different attributes. For example, Mandarin is tonal while English is not. Modeling pitch information therefore improves discrimination between Chinese characters, but it is less helpful for English.

What are some potential societal benefits of learning to synthesize and recognize speech?

Air travel has brought people from different cultural backgrounds together with lightning speed, but in many cases language barriers still prevent them from communicating with one another.
Once speech recognition and synthesis, language translation, and computer-assisted language learning technologies enable people without a common language to communicate with ease, we will truly live in a small global village. According to a recent New York Times article, 6 million people, or 5% of the US labor force, currently work in the telesales and service industry. Advances in speech technology and in the understanding of human dialog have started to change this, but a more pervasive shift to human-like automated information agents is still to come. As we learn more about speech user interfaces, using speech to interact with computers, cell phones, and other devices will become commonplace. These new technologies will only be possible with more fundamental research. Hopefully, government-funded research by DARPA and NSF at universities and corporate research laboratories will continue to play a pivotal role.