Keynote Talks

IWSLT 2012 is happy to feature three keynote talks, sponsored by NICT.

  • Dr. Dong Yu, Microsoft Research, USA
  • Prof. Hideki Isozaki, Okayama Prefectural University, Japan
  • Dr. Chai Wutiwiwatchai, National Electronics and Computer Technology Center (NECTEC), Thailand

Dr. Dong Yu

Who Can Understand Your Speech Better — Deep Neural Network or Gaussian Mixture Model?
Dr. Dong Yu, Microsoft Research

Abstract: Recently we have shown that the context-dependent deep neural network (DNN) hidden Markov model (CD-DNN-HMM) can do surprisingly well for large vocabulary speech recognition (LVSR), as demonstrated on several benchmark tasks. Since then, much work has been done to understand its potential and to further advance the state of the art. In this talk I will share some of these insights and introduce some of the recent progress we have made.

In the talk, I will first briefly describe the CD-DNN-HMM and offer some insights into why DNNs can do better than shallow neural networks and Gaussian mixture models. My discussion will be based on the view that a DNN can be considered a joint model of a complicated feature extractor and a log-linear model. I will then describe how some of the obstacles to adopting CD-DNN-HMMs, such as training speed, decoding speed, sequence-level training, and adaptation, can be removed thanks to recent advances. After that, I will show ways to further improve DNN structures to achieve better recognition accuracy and to support new scenarios. I will conclude the talk by arguing that DNNs not only perform better but are also simpler than GMMs.
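That view of a DNN, a deep feature extractor topped by a log-linear (softmax) classifier, can be made concrete with a toy sketch. The layer sizes, ReLU units, and weights below are illustrative assumptions, not details from the talk:

```python
import math

def softmax(z):
    m = max(z)                               # shift for numerical stability
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]

def affine(W, b, h):
    return [sum(w * v for w, v in zip(row, h)) + bi for row, bi in zip(W, b)]

def dnn_posterior(x, hidden_layers, W_out, b_out):
    """A DNN viewed as a feature extractor (the hidden layers) followed by
    a log-linear model (the softmax output layer) over learned features."""
    h = x
    for W, b in hidden_layers:                      # "complicated feature extractor"
        h = [max(0.0, v) for v in affine(W, b, h)]  # ReLU units (an assumption)
    return softmax(affine(W_out, b_out, h))         # log-linear model

# Toy example: 2-dim input, one hidden layer of 2 units, 2 output states
x = [1.0, 2.0]
hidden = [([[1.0, 0.5], [-1.0, 2.0]], [0.0, 0.1])]
W_out, b_out = [[1.0, -1.0], [0.5, 0.5]], [0.0, 0.0]
p = dnn_posterior(x, hidden, W_out, b_out)
print(p)  # a valid posterior: non-negative, sums to 1
```

However the hidden layers are trained, the top layer alone is an ordinary log-linear classifier; the depth only changes the features it sees.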

Bio: Dr. Dong Yu joined Microsoft Corporation in 1998 and the Microsoft Speech Research Group in 2002, where he is currently a senior researcher. He holds a PhD degree in computer science from the University of Idaho, an MS degree in computer science from Indiana University at Bloomington, an MS degree in electrical engineering from the Chinese Academy of Sciences, and a BS degree (with honors) in electrical engineering from Zhejiang University. His recent work focuses on deep neural networks and their applications to large vocabulary speech recognition. Dr. Dong Yu has published over 100 papers in speech processing and machine learning and is the inventor/co-inventor of around 50 granted/pending patents. He is currently serving as an associate editor of the IEEE Transactions on Audio, Speech, and Language Processing (2011-) and has served as an associate editor of the IEEE Signal Processing Magazine (2008-2011) and the lead guest editor of the IEEE Transactions on Audio, Speech, and Language Processing special issue on deep learning for speech and language processing (2010-2011).

Prof. Hideki Isozaki

Head Finalization: Translation from SVO to SOV
Prof. Hideki Isozaki, Okayama Prefectural University

Abstract: Asian languages such as Japanese and Korean follow Subject-Object-Verb (SOV) word order, which is completely different from European languages such as English and French, which follow Subject-Verb-Object (SVO) word order. The difference is not limited to the position of the "Object", or accusative case: SOV languages are also called head-final, and SVO languages head-initial. Because of this difference, phrase-based SMT between SVO and SOV languages does not work well. This talk introduces Head Finalization, which reorders source sentences into head-final word order. According to the results of the NTCIR-9 workshop, Head Finalization was quite effective for English-to-Japanese patent translation.
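The core idea of head-final reordering can be sketched on a toy dependency parse: move every head word after all of its dependents. This is only an illustrative simplification; the actual method described in the talk operates on richer syntactic parses and handles details such as case marking, which this sketch ignores:

```python
def head_finalize(words, heads):
    """Reorder a sentence so that every head word follows all of its
    dependents, yielding head-final (SOV-like) word order.
    words    -- list of tokens
    heads[i] -- index of token i's head, or -1 for the root
    """
    children = [[] for _ in words]
    root = -1
    for i, h in enumerate(heads):
        if h == -1:
            root = i
        else:
            children[h].append(i)

    out = []
    def visit(i):
        for c in children[i]:    # dependents first, in original order...
            visit(c)
        out.append(words[i])     # ...then the head: head-final order
    visit(root)
    return out

# "John hit the ball": "John" and "ball" depend on "hit"; "the" on "ball"
words = ["John", "hit", "the", "ball"]
heads = [1, -1, 3, 1]
print(" ".join(head_finalize(words, heads)))  # "John the ball hit"
```

The reordered English ("John the ball hit") now matches Japanese word order, so a standard phrase-based SMT system can translate it largely monotonically.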

Bio: Hideki Isozaki is a professor at Okayama Prefectural University, Japan. He received his B.E., M.E., and Ph.D. degrees from the University of Tokyo in 1983, 1986, and 1998, respectively. Since joining Nippon Telegraph and Telephone Corporation (NTT) in 1986, he has worked on logical inference, information extraction, named entity recognition, question answering, summarization, and machine translation. From 1990 to 1991, he was a visiting scholar at Stanford University. He has authored or coauthored over 100 papers as well as Japanese books, including LaTeX with Complete Control and Question Answering Systems.

Dr. Chai Wutiwiwatchai

Toward Universal Network-based Speech Translation
Dr. Chai Wutiwiwatchai, National Electronics and Computer Technology Center (NECTEC)

Abstract: Speech translation technology is widely expected to play an important role in today's global communication. This talk will address the activities of a recently formed international consortium, the Universal Speech Translation Advanced Research (U-STAR) consortium, which comprises 26 research organizations from 23 Asian and European countries. This consortium, the largest of its kind, has jointly developed a network-based speech translation service that supports translation among 23 languages and accepts speech input in 17 languages. The service has been developed based on shared language resources in the travel and sports domains. Users can access the service via VoiceTra4U-M, a freely available iPhone application. This talk will begin by describing the founding of the U-STAR consortium, followed by a summary of development issues on both the language-resource and system-engineering sides. Some statistics and analyses of global usage during the first few months of field testing after the service launch will also be presented. Finally, the challenges of improving translation accuracy and extending the number of supported languages and translation domains will be discussed.

Bio: Chai Wutiwiwatchai received his BEng (first-class honors) and MEng degrees in electrical engineering from Thammasat University and Chulalongkorn University, Thailand, in 1994 and 1997, respectively. He received his PhD in computer science from the Tokyo Institute of Technology in 2004 under a Japanese Government scholarship. He is now the Head of the Speech and Audio Technology Laboratory, National Electronics and Computer Technology Center (NECTEC), Thailand. His research spans a wide range of speech and language processing and includes several international collaborative projects, among them Universal Speech Translation Advanced Research (U-STAR), the PAN Localization Network (PANL10N), and ASEAN Machine Translation. He is a member of the International Speech Communication Association (ISCA) and the Institute of Electronics, Information and Communication Engineers (IEICE), and served as a country representative on the ISCA International Affairs Committee from 2007 to 2009.
